
Thread: Discussion about a Resource Allocation Model

  1. #1

    Discussion about a Resource Allocation Model

    Originally posted by jjjjL
    I have this nagging feeling that a stunningly beautiful model could be constructed that took into account the likelihood of prime discoveries by n-level, sieve speed, expected sieve speedup, and prp speed to properly calculate the weighting of cpu between sieving and prping. ... give me your thoughts on all this.

    -Louie
    I'm thinking about ways to build such a model, and wondering about the data to drive it. I figure other people may be noodling about this, too, so a place to share information would be useful. Here are some of the things I think could be useful in the model.

    1. I can find the current distribution of remaining candidates in MikeH's sieve statistics.

    2. The time for a "standard PC" to complete a primality test - 3 days?

    3. The way primality testing will slow down with increasing n - will n's that are twice as large require twice as long? 4 times? sqrt(2)? ln(2)?

    4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p. Doubling the p value decreases the number of primes in a range, but it also, I assume, increases the time to test each p. The Prime Number Theorem tells us the first component - but how does the time per test change?

    5. The rate at which factors will be found by sieving at various p levels - I think we can estimate this from Merten's Theorem and the number of remaining candidates.

    6. The rate at which primes are found among the candidates - I think we can use the Proth-Weight adjusted prime number theorem for this. A full probabilistic model is probably complex - a reasonable first cut could use the average n-value for finding the next primes.

    7. The speedup of sieving with fewer k's to sieve - linear but with a fixed overhead?

    8. The increase in resources from the publicity of finding a new prime - 20%?

    9. The increase in resources during periods of no new primes - for purposes of optimal allocation this probably doesn't matter.

    I'm presently thinking about applying the above parameters to a system of differential equations. p(t) would be the prime threshold for sieving, n(t) would be the n threshold for primality testing. These equations could be solved for fixed proportions of resources for sieving and primality testing. (For variable allocations, too, if we have any idea about how to vary them.) A longer term planning horizon is probably appropriate - perhaps minimizing the predicted time until four more primes are found.

    Longer term there are other resource allocation problems that I think we can ignore at this time. I'd ignore the question of when to begin sieving beyond n=20M - it seems clear that deeper sieving on the present range has vastly greater payoff. I'd ignore the question of double-checking - my explorations of this suggest it is MUCH too early to devote serious resources to this, although to be sure it would be nice to know the answer to 10. Do we have any validated estimate of the error probability of reported results?

    Please jump in with data and ideas.
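
    To make the moving parts concrete, here is a minimal sketch (Python) of how these parameters might be collected; every name and number below is just a placeholder taken from the guesses in this post, not a measured value.

    Code:
    # Placeholder parameter set for items 1-9 above, all meant to be replaced by real data.
    PARAMS = {
        "prp_days_at_3_5M": 3.0,    # item 2: days for a "standard PC" per test
        "prp_exponent": 2.0,        # item 3: guess - doubling n costing 4x means exponent 2
        "sieve_gp_per_day": 24.0,   # item 4: the "1G per hour" guess (very uncertain)
        "publicity_boost": 1.20,    # item 8: resource increase after a new prime
    }

    def prp_days(n, n_ref=3.5e6, params=PARAMS):
        """Crude test-time scaling: time grows as (n/n_ref)**exponent."""
        return params["prp_days_at_3_5M"] * (n / n_ref) ** params["prp_exponent"]

    print(prp_days(7.0e6))   # a test at n=7M under the "4 times" guess: ~12 days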

  2. #2
    Senior Member
    Join Date
    Jan 2003
    Location
    U.S
    Posts
    123
    quote:
    _________________________________________________
    4. The rate at which a "standard PC" can sieve at the present level - 1G per hour?
    _________________________________________________

    Is that a figure for the double check sieve?
    If so, that's reasonable, but if not, sieving 1 G per hour is
    definitely too fast for the main sieve.
    1G per hour is about 278 kp/sec, which is impossible to achieve currently, even if your "standard PC" is a 2 GHz Athlon or P4.

  3. #3
    Originally posted by Moo_the_cow
    quote:
    _________________________________________________
    4. The rate at which a "standard PC" can sieve at the present level - 1G per hour?
    _________________________________________________

    Is that a figure for the double check sieve?
    No - it's a clueless, unresearched guess for the main sieving effort, although it seems to be missing the leading decimal point I had intended. Do we have a definition of the "standard PC"? And an informed estimate of its sieving rate?

  4. #4
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    4. You can find many benchmarks in the client thread. Just take a look at the posts after the release of version 1.28. BTW, the speed of my PC increases by 6% for each doubling of p (for the main sieve), across the same client version of course.

  5. #5

    Re: Discussion about a Resource Allocation Model

    Originally posted by wblipp
    9. The increase in resources during periods of no new primes
    Maybe this was optimistic. On April 29th the project stats page is showing 843 users active in the last two weeks. The thread on 3000 users seems to say this figure was 1000 on February 23. Perhaps the number of users declines during periods of no primes. These numbers suggest 8% net decline per month.

    Eyeballing the system stats chart at the top of the stat page, there was a big increase from Feb 10 to March 26th, then a sharp decline and unclear trends after that, but all much higher than the February 10th level.

    Maybe "Random fluctuations around a flat level" is the best guess.

  6. #6
    3. The way primality testing will slow down with increasing n - will n's that are twice as large require twice as long? 4 times? sqrt(2)? ln(2)?
    It all depends.

    To perform a prp test, N-1 iterations have to be done. Within one FFT range, iteration times are constant. So if N is 1.5 times higher, but falls in the same FFT range, a test will take about 50% longer.

    One iteration with a twice as large FFT takes more than twice as long. So if an exponent is only 100 higher, but falls in a twice as large FFT range, a test will take more than twice as long.

    I don't know if the SoB client only supports 2^N FFT sizes or also some intermediate values like Prime95 does, but in general, an n twice as large will take more than 4 times longer (maybe 5 times?).

    More than twice as long per iteration for the larger FFT size, and twice as many iterations need to be performed.
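
    A rough way to picture it: the total test time is (N-1) iterations times a per-iteration cost that jumps at each FFT-size boundary. The sketch below (Python) uses made-up FFT breakpoints and per-iteration times purely for illustration; the real SoB client's FFT sizes and timings will differ.

    Code:
    import bisect

    # Hypothetical FFT breakpoints (largest n each FFT size handles) and per-iteration
    # times in milliseconds - illustrative numbers only, not measured from the client.
    FFT_LIMITS = [2_500_000, 5_000_000, 10_000_000, 20_000_000]
    MS_PER_ITER = [25.0, 55.0, 120.0, 260.0]    # each step a bit more than 2x the previous

    def test_days(n):
        """(N-1) iterations at the per-iteration cost of the FFT size n falls into."""
        idx = bisect.bisect_left(FFT_LIMITS, n)
        return (n - 1) * MS_PER_ITER[idx] / 1000.0 / 86400.0

    # Same FFT range: n 1.5x larger -> test ~1.5x longer.
    # Crossing into the next FFT range: much more than the ratio of n alone.
    print(test_days(3_000_000), test_days(4_500_000), test_days(6_000_000))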

  7. #7
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
    2. The time for a "standard PC" to complete a primality test - 3 days?
    Possibly a quote from me. This referred to my AMD XP2100+, which is about right for n=3.5M.

    4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p. Doubling the p value decreases the number of primes in a range, but it also, I assume, increases the time to test each p. The Prime Number Theorem tells us the first component - but how does the time per test change?
    On the same PC it takes about 1.4 hours/G, and that's for 300K-20M sieving. Doubling the value of p approximately halves the number of factors found. The sieve speed, however, does not change with p.

    Also remember to include in the model that a P4 sieves at about 35-40% of the speed of P3s and AMDs. To illustrate, I have a P4-1.7 which also takes about 3 days to PRP test n=3.5M, but for sieving its rate is about 3.8 hours/G. Thus I never use P4s (that can PRP) for sieving.
    Last edited by MikeH; 04-30-2003 at 01:50 PM.

  8. #8
    Originally posted by wblipp
    4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p.
    ----------------
    Originally posted by Nuri
    4. You can find many benchmarks in the client thread. Just take a look at the posts after the release of version 1.28. BTW, the speed of my pc increases 6% for each doubling of p (for the main sieve), across same client version of course.
    Here's what I found for P3s and Celerons on the main sieve
    Joe O.   450 MHz    55 kp/sec
    alexr    600 MHz    75 kp/sec
    Slatz    1 GHz     116 kp/sec
    Nuri     1 GHz     126 kp/sec


    These numbers work out to about 120 kp/sec for a 1 GHz P3, or about 10 Gp per day (24-hour usage). Only Nuri posted performance for a P4, although others agreed they are slower. It appears that at the same clock speed a P4 is slower on sieving by a factor of 3.

    These numbers were posted around April 9th. I think the p's being tested were in the vicinity of 14T.

    Using Nuri's speedup, the number of p's cleared per day should be proportional to p^0.09. Calibrating to the April 9th values, the number of p's removed per day by a 1Ghz PIII is 6.6*10^8*p^0.09.

    If we have "a" computers sieving, and p(t) is the sieve threshold reached on day t, p(t) satisfies the differential equation

    dp/dt = a*6.6*10^8*p^0.09

    My D.E. skills are rusty, but I think this separates easily -

    p(t)^0.91 = p0^0.91 + 6a*10^8*t, so p(t) = (p0^0.91 + 6a*10^8*t)^(1/0.91)

    Any corrections or comments?
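
    For what it's worth, here is a small sketch (Python) checking that closed form against a brute-force integration; the number of sievers and the starting p are example values only.

    Code:
    # Check the closed form of dp/dt = a*6.6e8*p^0.09 by Euler integration.
    A_SIEVERS = 20              # example: 20 standard computers sieving
    P0 = 15e12                  # example starting sieve depth (~15T)
    RATE = 6.6e8

    def p_closed(t):
        # Separating variables gives p^0.91 = p0^0.91 + 0.91*a*RATE*t
        return (P0**0.91 + 0.91 * A_SIEVERS * RATE * t) ** (1 / 0.91)

    p, dt = P0, 0.01
    for _ in range(int(365 / dt)):              # integrate one year in small steps
        p += dt * A_SIEVERS * RATE * p**0.09
    print(p, p_closed(365.0))                   # the two should agree closely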

  9. #9
    Member
    Join Date
    Oct 2002
    Location
    Austria
    Posts
    37
    some sieve benchmarks (main sieve)

    P3 Celeron 1300@1560 200kp/sec
    P4 Celeron 2000@2440 106kp/sec
    AMD Duron 1300 148kp/sec

    for prp testing don't forget to have a look at the memory speed:

    P4 Celeron 2000@2440 260kcEMs/sec
    FSB 122MHz, FSB/memory 3/4, memoryclock=162MHz

    P4 Celeron 1700@1989 335kcEMs/sec
    FSB 117MHz, FSB/memory 3/5, memoryclock=195MHz

    slower CPU but faster system !

  10. #10
    Junior Member
    Join Date
    Jan 2003
    Location
    Flensburg, Germany
    Posts
    10
    And another benchmark:
    AMD Athlon XP 1800+ 182kp/sec
    sieving around 15T

  11. #11
    Member
    Join Date
    Oct 2002
    Location
    Austria
    Posts
    37
    another sieve benchmark

    P4 Celeron 1700@1989 86kp/sec

    seems that the sieving speed doesn't depend on the memory speed

  12. #12
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by wblipp
    Using Nuri's speedup, the number of p's cleared per day should be proportional to p^0.09.
    Any corrections or comments?
    Since the only data related to speedup came from me, I felt uncomfortable and decided to run a test on speed change w.r.t. p values.

    As can be seen from the table below, the values are different from the 6% figure I previously mentioned. That 6% data was an approximation based on a couple of ranges I sieved on PIII-1000.

    The test in the table below was conducted on 10M p ranges, for the p values shown in the table, each one starting from the exact T value up to T+10M. The machine used in the test is, again, the P4-1700, and the client used is SoBsieve v1.28.
    [Attached image: table of sieve speed vs. p on the P4-1700 with SoBsieve 1.28]

  13. #13
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    5. The rate at which factors will be found by sieving at various p levels - I think we can estimate this from Merten's Theorem and the number of remaining candidates.
    Wblipp, could you please also check the 40000 = y * (x^1.06) formula I am using in the
    Sieve: Distribution of Factors vs. p value thread against Merten's theorem? The formula is for the 3M-20M sieve. y = # factors per G and x = p level in terms of G.

    PS: Please also note that the formula also includes the duplicates. I don't know by how much, but if the data were to use unique factors only, the formula would probably tilt a bit more downwards (though probably not much).

  14. #14
    9. The increase in resources during periods of no new primes
    Moo has asked a question on the prediction thread that got me thinking long term about the computer power on the project. Over the scale of years, Moore’s Law says that computing power per dollar should double every eighteen months. Many of us are taking a Moore’s Law Dividend by spending less money on computers over time, but still the power per user should double every 24 months or so. So even if the number of people stays flat, the number of “standard PC’s” should double every 24 months. Measuring time in days, this is approximately

    dc/dt = 0.001c

    c(t) = c0*e^(0.001t)

    4b. The rate at which … (sieving) will change with p.
    Wow! What a thorough and well documented job Nuri did on this! Thanks! Based on Nuri’s speedup spreadsheet, the rate at which p’s will be cleared is proportional to p^0.03.

    4a. The rate at which a "standard PC" can sieve at the present level.
    Thanks to priwo, jimmy, and Nuri, we have some new data about this, too.

    For P3’s and AMDs
    Joe O.   450 MHz    55 kp/sec
    alexr    600 MHz    75 kp/sec
    Slatz    1 GHz     116 kp/sec
    Nuri     1 GHz     126 kp/sec
    priwo    1.56 GHz  200 kp/sec
    priwo    1.3 GHz   148 kp/sec
    jimmy    1.8 GHz   182 kp/sec

    Standardized to 1Ghz, these numbers are 122, 125, 116, 126, 128, 100, 113. The median is 122 – I’m comfortable sticking to the 120Kp/sec assumption.

    This is about 10Gp/day. We need the constant of proportionality so that at 15T the sieving rate for a standard 1Ghz 24/7 P3 is 10Gp/day – c*(15T)^0.03 = 10G – the constant is about 4G.


    For P4’s
    priwo    2.44 GHz  106 kp/sec
    priwo    1.99 GHz   86 kp/sec
    Nuri     1.70 GHz   70 kp/sec

    Standardized to 1Ghz, these numbers are 43, 43, and 41. So the P4’s are slower on sieving by a factor of 2.8.

    Let s(t) be the proportion of standard computers that are dedicated to sieving, and set aside the P3 vs P4 question for the moment. The number of computers at time t is c(t), so the number of sieving computers is s(t)*c(t). The p level is p(t), so the rate at which each computer clears p’s is 4G*p(t)^0.03. All together

    dp/dt = s(t)*c(t)*4G*p(t)^0.03

    Any comments, complaints or corrections? When we have the computers (c(t)) and sieving process (p(t)) in good shape, we will move on to the issues around the number of candidates and the rate at which sieving and primality testing remove them.
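
    For what it's worth, here is a rough numerical sketch (Python) of this equation; C0 and the sieving fraction S are placeholder values, while the 4G constant and the p^0.03 exponent are the ones derived above.

    Code:
    import math

    # Euler integration of dp/dt = s(t)*c(t)*4e9*p^0.03 with c(t) = c0*exp(0.001*t).
    C0 = 1000.0        # equivalent "standard PCs" at t = 0 (placeholder)
    S = 0.02           # fraction of them sieving, held constant here (placeholder)
    P0 = 15e12         # current sieve depth, ~15T

    def simulate(days, dt=0.1):
        p = P0
        for i in range(int(days / dt)):
            c = C0 * math.exp(0.001 * i * dt)
            p += dt * S * c * 4e9 * p**0.03
        return p

    print(f"sieve depth after one year: {simulate(365):.3e}")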

  15. #15
    Originally posted by smh
    in general, an n twice as large will take more than 4 times longer (maybe 5 times?).

    More than twice as long per iteration for the larger FFT size, and twice as many iterations need to be performed.
    This article says the complexity of multiplying 2 N-digit numbers is N(log(N))^2. We have n-bit numbers, but that shouldn't affect the argument. We still need to account for the n iterations, so using his academic argument, the time for a primality test would go up as n^2*(log(n))^2. Using your "5 times" number, the work would go up as n^2.32. The difference is probably smaller than our uncertainty. Here's a short table of how much more work each test takes at various n levels, compared to n=3.5M (first column is the n^2.32 scaling, second is n^2*(log(n))^2):

    n=7M       5.0     4.4
    n=20M       57      41
    n=50M      480     282
    n=200M     12K      5K
    n=1G      504K    154K

    So even at the 1G level, the two approaches differ by a factor of 3.3 when the extra work is a factor of at least 150,000.
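
    For reference, here is a short sketch (Python) that reproduces the ratios in the table; n0=3.5M is the baseline, and the 2.32 exponent is just log2(5).

    Code:
    import math

    def academic(n, n0=3.5e6):
        """n^2 * (ln n)^2 scaling, relative to n0."""
        return (n * math.log(n)) ** 2 / (n0 * math.log(n0)) ** 2

    def power_law(n, n0=3.5e6):
        """n^2.32 scaling (doubling n costs 5x), relative to n0."""
        return (n / n0) ** 2.32

    for n in (7e6, 20e6, 50e6, 200e6, 1e9):
        print(f"n={n:.0e}  n^2.32: {power_law(n):.0f}  academic: {academic(n):.0f}")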

    Does anyone have strong feelings, or better yet strong arguments, for one of these approaches?

  16. #16
    Originally posted by wblipp
    {...}
    jimmy    1.8 GHz   182 kp/sec
    {...}
    Dunno how you're counting that, but he wrote "Athlon XP 1800+", which runs at less than 1.8 GHz (something like 1530 MHz? Thereabouts?), which would bring him up from 100 to 118, more in range with the others...

    [Edit: And while I'm thinking about it, the "smart resource allocation" scheme should really do two different types of math for P4s and everything else -- i.e. should avoid the notion of a "standard PC". There may well come a point where the smart thing to do is putting P4s on PRP while keeping everything else sieving or such.]

  17. #17
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    The Athlon XP 1800+ runs at 11.5 x 133 MHz = 1533 MHz.

    There may well come a point where the smart thing to do is putting P4s on PRP while keeping everything else sieving or such.
    I think Athlons and Athlon XPs with DDR memory should also be on PRP (they aren't as efficient as P4's, but much more so than P3's).
    In the near future, same for Opterons and Athlon 64s (which should be even more efficient than K7 cores, as they have SSE2, a faster memory subsystem and, in some cases, a larger L2 cache).

  18. #18
    Originally posted by Troodon
    The Athlon XP 1800+ runs at 11,5x133 MHz=1533 MHz.
    Originally posted by Lagardo
    the "smart resource allocation" scheme should really do two different types of math for P4s and everything else
    Thanks for the correction. Knowing the correct speed for the Athlon XP 1800+ makes the standardized result 119 instead of 100. Fixing a transposition error and a roundoff error, the seven standardized performance numbers are now 122, 125, 116, 126, 128, 114, 119. The median is still 122, but the spread has narrowed.

    Yes, I agree that we are going to need to make separate decisions about the P4s and the other processors because they bring different relative skills to the two tasks. I've been trying to track that difference by thinking that a standard PC is a 1 GHz P3, and recognizing that a P4 is 2.8 times slower than standard at sieving. Priwo's data suggests that we may need to track memory speed instead of processor speed for primality testing, so "standard" may become even less useful.

  19. #19
    Senior Member
    Join Date
    Jan 2003
    Location
    U.S
    Posts
    123
    Main sieve (3M-20M) CPU benchmark-

    P-III 800 Mhz - 93.5 kp/sec

    Hope this helps....

  20. #20

    Merten's Theorem

    Merten's Theorem is that the product of (1-1/p) for all primes less than x is asymptotically exp(-gamma)/ln(x). This is useful for estimating the effectiveness of trial factoring because the product is the fraction of numbers left after removing primes less than x. This is potentially useful in our resource model for what I have been thinking of as two distinct issues – the number of candidates presently in the system and the rate at which sieving will remove these candidates.

    We will need to modify Merten’s theorem by Proth Weights. We need Proth Weights because our sequences have divisibility properties that differ from the divisibility of sequential integers in standard ways. For example, in sequential integers, ½ are divisible by 2, then 1/3 of the remainder are divisible by 3, then 1/5 of the remainder are divisible by 5. For our remaining k values, none of the numbers we are testing for prime are divisible by 2, ½ are divisible by 3, ½ of the remainder are divisible by 5. It gets even stranger with 7 – for most of our k values 1/3 of the remainder are divisible by 7, but for two of the k values, none of the remainder are divisible by 7. I explored these divisibility properties by constructing partial coverings in the prediction thread, but better estimates of the Proth Weight are contained in Table 5 of Yves Gallot’s paper on weights.

    We might apply Merten’s Theorem in absolute terms or in relative terms. It’s most elegant in relative terms. If we previously sieved up through p=S, the number of values left from a range of size N will be N*a / ln(S), where a is a constant calculated as the product of the Proth Weight and exp(-gamma). If we now extend the sieve through p=T, there will now be N*a / ln(T) left. The reduction factor is ln(S)/ln(T). This has some simple consequences. For example, the main sieve is now around 15T. This predicts that if we extend that sieve to 200T, the number of candidates will reduce by a factor of ln(15T)/ln(200T) = 0.92 – about 8% of the remaining candidates will be removed. It also predicts that removing half the candidates requires squaring the sieve limit.

    This sounds good, but does it work? Let’s see how well it predicts the current status. We can check the status of sieving in the Sieve Coordination Thread. When I checked, the sieving was nearly complete through 15.35T, there was a completed block from 17.42T-18.14T, and some scattered ranges even higher. The effect of the higher block should be ln(17.42T)/ln(18.14T)=0.9987, so we can ignore that. Using Yves Gallot’s Proth Weights, the average weight for the twelve remaining k values is 0.163. The value 0.163 * exp(-gamma) / ln(15.35T) = 0.00302. In each range of 1 million n’s, from the 12 million original candidates, this method predicts that on average there will be 36,245 remaining candidates. Looking at MikeH's Sieve Statistics for the main sieve (n=3M to 20M), we see that this estimate is about 1% high:

    3M<n< 4M: 35900
    4M<n< 5M: 35784
    5M<n< 6M: 36124
    6M<n< 7M: 35839
    7M<n< 8M: 35866
    8M<n< 9M: 36026
    9M<n<10M: 35862
    10M<n<11M: 35856
    11M<n<12M: 36015
    12M<n<13M: 35815
    13M<n<14M: 36157
    14M<n<15M: 35699
    15M<n<16M: 35731
    16M<n<17M: 35990
    17M<n<18M: 35851
    18M<n<19M: 36028
    19M<n<20M: 35905

    These average 35909. If we lower the Proth Weight estimate to 0.1618 we match very closely. Yves Gallot only reported the Proth Weights to 2 digits, so an adjustment of this magnitude seems reasonable.
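
    For reference, here's a quick sketch (Python) of that calculation; the 0.163 average weight and the 15.35T sieve depth are the figures used above, and nothing else is assumed.

    Code:
    import math

    GAMMA = 0.5772156649           # Euler-Mascheroni constant
    avg_proth_weight = 0.163       # average over the twelve remaining k's (Gallot, Table 5)
    sieve_depth = 15.35e12         # p reached by the main sieve

    survival = avg_proth_weight * math.exp(-GAMMA) / math.log(sieve_depth)
    predicted_per_million = survival * 12e6       # 12 k's x 1M n's per range
    print(survival, predicted_per_million)        # ~0.0030 and ~36,200, close to the 36,245 above

    # Adjusting the weight to match the observed average of 35,909 per million:
    print(avg_proth_weight * 35909 / predicted_per_million)   # ~0.1618, the adjusted value used above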

    Merten’s theorem with the adjusted average proth weight seems well suited for calculating the present number of candidates. The remaining question is how well it describes the incremental process of sieving. Fortunately Nuri has excellent data on how that process has been going:
    Originally posted by Nuri
    Wblipp, could you please also check the 40000 = y * (x^1.06) formula I am using in the
    Sieve: Distribution of Factors vs. p value thread against Merten's theorem? The formula is for the 3M-20M sieve. y = # factors per G and x = p level in terms of G.
    Between n=3M and n=20M for twelve k-values, there are 12*17*10^6 candidate values before sieving. Merten’s Theorem predicts that after sieving to “p” in G, the number of candidates left will be 12*17*10^6 * 0.1618 * exp(-gamma) / ln(p*10^9). The difference between this value at “p” and at “p+1” represents the numbers sieved away, and every one of those removals represents a factor found. The following is a plot of the number of factors found using Nuri’s function and using the Merten prediction. Blue is Nuri and red is Merten. The agreement is excellent. I’m convinced we can use Merten’s Theorem to model the sieving.
    [Attached image: plot of factors found per G, Nuri's formula (blue) vs. the Merten prediction (red)]
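
    Here is a rough sketch (Python) of that comparison; it uses the adjusted 0.1618 weight and the 12 k, 17M n range from above, and Nuri's formula as quoted. The sample p values in the loop are arbitrary.

    Code:
    import math

    GAMMA = 0.5772156649
    NUMERATOR = 12 * 17e6 * 0.1618 * math.exp(-GAMMA)   # Merten-style candidate-count numerator

    def remaining(p_in_g):
        """Candidates left in 3M < n < 20M after sieving to p (p given in G = 1e9)."""
        return NUMERATOR / math.log(p_in_g * 1e9)

    def merten_factors_per_g(p_in_g):
        """Candidates removed between p and p+1 G, i.e. factors found in that 1G range."""
        return remaining(p_in_g) - remaining(p_in_g + 1)

    def nuri_factors_per_g(p_in_g):
        return 40000 / p_in_g**1.06

    for p in (1000, 5000, 15000, 50000):    # sample depths: 1T, 5T, 15T, 50T
        print(f"p={p}G  Merten: {merten_factors_per_g(p):.1f}  Nuri: {nuri_factors_per_g(p):.1f}")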

  21. #21

    Model Changes After finding Primes

    I have proposed running the resource allocation model for the period of finding the next three primes because running this kind of model for exactly one period sometimes leads to false “edge effects” where the model says to do strange things because it thinks the world ends at the first stage. By running for three stages, these edge effects are usually pushed into the third stage, and the results are more realistic for planning the first stage. When the first stage ends (finding the next prime in our case), the model should be rolled out to look at stages 2, 3, and 4 to make plans during stage 2.

    To run the model in later stages, we need to know what parts of the model change after we have found a prime. There are three effects.

    1. c(t) – the computing power available at time t. I have proposed that this power increases by 20% every time we find a prime (Item 8). There hasn’t been any corrective input about this guess yet. This would be in addition to the slow steady growth I have proposed for Moore’s Law.
    In stage 1 c(t) = c0*e^(0.001t)
    In stage 2 c(t) = c0*1.2*e^(0.001t)
    In stage 3 c(t) = c0*1.2^2*e^(0.001t)

    2. p(t) – the sieving limit increases faster because we are testing fewer k’s. As Mike discussed here, the sieve speedup is less than linear. One explanation is that there is an overhead equivalent to three k-values, so removing three k-values makes the sieve run at (12+3)/(9+3) = 1.25 times the speed, i.e. 25% faster. Using this approach, the sieving rate will increase by a factor of 15/14 in stage 2 and 15/13 in stage 3.

    3. Remaining Candidates. There will be 11, then 10, candidates left to test, so the primality testing will proceed faster. How much faster depends on which k-values we find primes for. The k’s with higher Proth Weights are more likely to be found first because they have more opportunities. That is, every remaining candidate is equally qualified as a number with no divisors under 15T, but the k’s with high Proth Weight have more of these candidates than the ones with low Proth Weight. Since their likelihood of removal is proportional to their weights, we can calculate the average weight removed by taking a weighted average. The average weight for the next prime discovered is 0.19.

    The average weight for the next two primes can be calculated by the same weighted average method, but now there are 66 pairs of k’s that might be the next two. For each pair we can calculate the amount of that pair – the sum of their Proth Weights – and the likelihood of that pair – the product of their Proth Weights. The weighted average for the next two primes discovered is 0.3767.

    So the total Proth Weight in Stage 1 will be 1.96, in Stage 2 will be 1.77, and in Stage 3 will be 1.5833. These values were all calculated using Yves Gallot’s Proth Weights, so we will reduce them by about 1% as discussed in the post about Merten’s Theorem, so the actual values are 1.9416, 1.7532, and 1.5685.
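
    Here is a small sketch (Python) of the weighted-average calculation described above. The twelve weights below are placeholders that just happen to sum to 1.96; the real values come from Gallot's Table 5.

    Code:
    from itertools import combinations

    # Placeholder Proth weights for the twelve remaining k's (sum = 1.96, as above).
    WEIGHTS = [0.10, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.21]

    def expected_weight_next_prime(weights):
        """Each k is removed with probability proportional to its own weight."""
        total = sum(weights)
        return sum(w * (w / total) for w in weights)

    def expected_weight_next_two(weights):
        """Average over the 66 pairs: amount = sum of the pair, likelihood ~ product."""
        pairs = list(combinations(weights, 2))
        likelihoods = [a * b for a, b in pairs]
        amounts = [a + b for a, b in pairs]
        return sum(l * amt for l, amt in zip(likelihoods, amounts)) / sum(likelihoods)

    print(expected_weight_next_prime(WEIGHTS))   # the post gets 0.19 with the real weights
    print(expected_weight_next_two(WEIGHTS))     # the post gets 0.3767 with the real weights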

  22. #22

    Rate of completing n's

    MikeH reported that an Athlon XP2100+ completes a primality test in 3 days at n=3.5M. I think this is equivalent to a 1GHz machine completing in 5 days. Using the academic scaling, a test at other values of “n” will take 5*[(n*ln(n))/(3.5M*ln(3.5M))]^2 days to complete, ignoring the lumpiness of the FFT transitions. At time t the sieving will have progressed to p(t), so based on Merten’s Theorem, the number of remaining candidates per n-step will be 1.9416 * exp(-gamma) / ln(p(t)), where 1.9416 is the total Proth Weight for all remaining candidates.

    Using, as before, c(t) for the total equivalent computing power available to the project and s(t) for the fraction of power devoted to sieving, we get the relationship

    dn/dt = c(t) * (1-s(t)) / [5 * [(n*ln(n))/(3.5M*ln(3.5M))]^2 * 1.9416 * exp(-gamma)/ln(p(t))]

    Estimates elsewhere indicate about 1000 computers, with about 2% sieving.
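
    For what it's worth, here is a rough sketch (Python) that couples this equation with the earlier sieving equation, using the 1000-computer / 2% figures just mentioned; the starting n of 3.5M and the 5-day test time are the working assumptions from earlier in the thread, and everything here is a placeholder to be refined.

    Code:
    import math

    GAMMA = 0.5772156649
    C0, S = 1000.0, 0.02           # ~1000 standard computers, ~2% of them sieving
    DAYS_AT_3_5M = 5.0             # days per test for a standard 1 GHz machine at n = 3.5M

    def prp_days(n):
        """Academic n^2*(ln n)^2 scaling of the test time."""
        return DAYS_AT_3_5M * ((n * math.log(n)) / (3.5e6 * math.log(3.5e6))) ** 2

    def candidates_per_n(p):
        """Merten-style density of surviving candidates per unit n."""
        return 1.9416 * math.exp(-GAMMA) / math.log(p)

    def simulate(days, dt=0.5):
        n, p = 3.5e6, 15e12
        for i in range(int(days / dt)):
            c = C0 * math.exp(0.001 * i * dt)                              # Moore's Law growth
            p += dt * S * c * 4e9 * p**0.03                                # sieving equation
            n += dt * c * (1 - S) / (prp_days(n) * candidates_per_n(p))    # this equation
        return n, p

    print(simulate(365))   # rough (n, p) reached after one year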

    Comments, complaints, or corrections?

  23. #23
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132

    Re: Rate of completing n's

    Originally posted by wblipp
    MikeH reported that an Athlon XP2100+ completes a primality test in 3 days at n=3.5M. I think this is equivalent to a 1Ghz machine completing in 5 days.
    Please note that a 1 GHz Athlon is faster than a 1 GHz Pentium III. And a 1 GHz Pentium III is faster than a 1 GHz C3. So we can't speak about a "1 GHz machine".
    For k=10223 n=3461705, my Tualatin@1420 MHz (142 MHz FSB) needs nearly 160 CPU hours (~6.5 days)!

    I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.

  24. #24

    Re: Re: Rate of completing n's

    Originally posted by Troodon
    I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.
    Great! I love clarity. Do we know how long this baseline machine needs to complete a primality test near n=3.5M?

  25. #25
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    I think it would need at least 260 hours. Yes, 100 hours more than my Tualatin because:
    - it's 420 MHz slower.
    - the FSB and RAM speeds are 42 MHz slower.
    - Coppermines don't have prefetch logic.
    - The i815 chipset is a bit slower than the BX chipset I'm using.
    - The n is higher (+50000).

  26. #26
    Maybe the "baseline machine" should be something that someone here actually has at their disposal? So we could run a PRP and see how long it takes?

    "1G coppermine" sounds like a pretty highfalutin' chip for its time. Contrary to, say, "1G Duron" or "1.2G Athlon", of which there were uncounted bazillions (at least amongst the people I know). Somewhat more modern equivalents would probably be either a P4-2.4 or an XP1700 or some such.

  27. #27
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    Maybe the "baseline machine" should be something that someone here actually has at their disposal? So we could run a PRP and see how long it takes?
    I'll set the Tualatin at 100x10 MHz to see how long it takes to complete a unit of the current test I'm running (n=3.46M).

    "1G coppermine" sounds like a pretty highfalutin' chip for its time. Contrary to, say, "1G Duron" or "1.2G Athlon", of which there were uncounted bazillions (at least amongst the people I know).
    I said a Coppermine not because it's a very popular chip, but because I think it's a good baseline (not too fast, not too slow). Don't forget that there are many people here with machines slower than 1 GHz.

  28. #28

    Finalizing Parameters

    New developments on the Sieving Client thread indicate a 35% or so increase in sieving speed with version 1.31. The numbers are coming in around 160 kp/sec per gigahertz instead of the 120 we have found in this thread. This will increase the constant in the sieving equation from 4G to 5.6G.

    The last parameter we need some agreement about is the time to complete a primality test in the vicinity of n=3.5M for a "standard" machine. This seems to have bogged down in the mire of "what is standard." I've been hoping that a simple approximation based on processor speed scaled to 1 GHz would be good enough - that seemed to work well for sieving speed, with the caveat that if we consider putting P4's into sieving we need to remember that they are 2.8 times slower than "standard."

    So far we've got estimates that range from 5 days to 11 days for a 1 GHz machine to finish a primality test. If I don't get more information soon, I'll pick "8 days" as a central estimate. I'm eager to start playing with the model and see if it offers insights into the appropriate level of sieving in the short term and long term.
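
    A quick back-of-the-envelope check of that new constant (Python); the 160/120 ratio is the per-gigahertz speedup quoted above, everything else is from earlier posts.

    Code:
    # Re-derive the sieving constant after the version 1.31 speedup.
    # Old calibration: constant * (15e12)**0.03 = 10 Gp/day  ->  constant ~ 4e9.
    old_rate = 10e9                      # p's per day for a standard 1 GHz P3 at ~15T
    speedup = 160.0 / 120.0              # "35% or so"
    old_constant = old_rate / (15e12)**0.03
    print(old_constant, old_constant * speedup)   # ~4.0e9 and ~5.4e9 (the post uses 5.6e9)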

  29. #29
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
    So far we've got estimates that range from 5 days to 11 days for a 1 GHz machine to finish a primality test. If I don't get more information soon, I'll pick "8 days" as a central estimate.
    I have a P3-850 which completed a PRP test where n~3.50M in 5.6 days.

    So scaled for that average 1GHz PC, I guess that would be 4.75 days.

  30. #30
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    5.6 days?! That's too fast. Or maybe something is wrong with my Tualatin. Currently, it does ~82000 cEM/s and needs ~50 minutes per unit.

  31. #31
    Code:
    [11:46] User [ice:~/software/sob] > grep name /proc/cpuinfo
    model name      : AMD Athlon(tm) XP 1900+
    [11:47] User [ice:~/software/sob] > grep MHz /proc/cpuinfo
    cpu MHz         : 1569.350
    [11:47] User [ice:~/software/sob] > grep "blocks left" sclient.log | tail
    [Mon May 12 08:35:06 2003] n.high = 2494915  .  55 blocks left in test
    [Mon May 12 08:55:56 2003] n.high = 2514560  .  54 blocks left in test
    [Mon May 12 09:16:45 2003] n.high = 2534205  .  53 blocks left in test
    [Mon May 12 09:37:37 2003] n.high = 2553850  .  52 blocks left in test
    [Mon May 12 09:58:39 2003] n.high = 2573495  .  51 blocks left in test
    [Mon May 12 10:19:35 2003] n.high = 2593140  .  50 blocks left in test
    [Mon May 12 10:40:31 2003] n.high = 2612785  .  49 blocks left in test
    [Mon May 12 11:01:27 2003] n.high = 2632430  .  48 blocks left in test
    [Mon May 12 11:22:22 2003] n.high = 2652075  .  47 blocks left in test
    [Mon May 12 11:43:18 2003] n.high = 2671720  .  46 blocks left in test
    [11:47] User [ice:~/software/sob] >
    So on an Athlon at least I get 10 blocks in 3h:8m:12s. For n=3.5M (that's about 172 blocks), that's just under 54 hours for the whole test. Or scaled to 1 GHz, about 3 days 12.75 hours for a full test on a 1 GHz T-bred.
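
    For anyone who wants to redo that arithmetic, a tiny sketch (Python):

    Code:
    # Convert the log above into a full-test estimate, then scale to 1 GHz.
    secs_per_10_blocks = 3 * 3600 + 8 * 60 + 12        # 3h:8m:12s
    blocks_for_test = 172                              # stated count for a test near n=3.5M
    hours = blocks_for_test * secs_per_10_blocks / 10 / 3600
    print(hours)                                       # ~54 hours on the XP 1900+
    print(hours * 1569.35 / 1000 / 24)                 # ~3.5 days scaled to 1 GHz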

  32. #32
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
    Code:
    [Sun Apr 20 19:40:51 2003] n.high = 61050  .  170 blocks left in test
    [Sun Apr 20 20:26:32 2003] n.high = 81400  .  169 blocks left in test
    [Sun Apr 20 21:12:09 2003] n.high = 101750  .  168 blocks left in test
    [Sun Apr 20 21:57:47 2003] n.high = 122100  .  167 blocks left in test
    [Sun Apr 20 22:43:25 2003] n.high = 142450  .  166 blocks left in test
    [Sun Apr 20 23:29:05 2003] n.high = 162800  .  165 blocks left in test
    This is the log from the P3-850 quoted above (it's nothing special, a Dell Latitude laptop). As you can see, it's ~46 minutes per block, 173 blocks for this test (n=3504985), so that's just over 5.5 days.

  33. #33
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    The problem is probably the memory and front side bus. The P3 likely uses PC100 SDR SDRAM, whereas the Athlon should use DDR SDRAM - I guess PC266?

    AFAIK, this makes a big difference...

    Plus, Lagardo used the real CPU frequency, not the Quantispeed value. The Palomino Athlon was basically on par with the Coppermine P3, but the XP-versions provide more bang for the buck .... eh... instructions per second.
    I think one would have to average the standard PC over all the different PC architectures to be thorough. But maybe that's too much effort? I don't know...

  34. #34
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    Code:
    [Mon May 12 14:57:03 2003] n.high = 2357406  .  53 blocks left in test
    [Mon May 12 15:49:43 2003] n.high = 2378268  .  52 blocks left in test
    [Mon May 12 16:40:19 2003] n.high = 2399130  .  51 blocks left in test
    [Mon May 12 17:31:03 2003] n.high = 2419992  .  50 blocks left in test
    [Mon May 12 18:21:38 2003] n.high = 2440854  .  49 blocks left in test
    [Mon May 12 19:11:37 2003] n.high = 2461716  .  48 blocks left in test
    MikeH's Pentium III 850 is a bit faster than my Tualatin@1420 and Lagardo's Athlon XP at the same speed would be twice as fast. So there's a problem. Could anyone test SB on a Tualatin, please?

  35. #35
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    Wasn't the cache on the Tualatin 256K? Whereas the cache on the older PIII was 512K.
    Joe O

  36. #36
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    It's exactly the other way - when Intel moved to the Tualatin core, they used the 0.13 (?) micron process, which gave them more space for a bigger L2 cache...

  37. #37
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    It isn't a cache size-related problem. My Tualatin and Lagardo's and MikeH's CPU's all have 256 kB of on-die full speed L2 cache.

    Here's a little table I've made for you

    CPU | L1 Cache | L2 Cache
    ----------------------------------------------------------------------------------
    Pentium II (Klamath and Deschutes) | 32 kB | 512 kB, external (64 bit bus), 1/2 core speed
    Pentium III (Katmai) | 32 kB | 512 kB, external (64 bit bus), 1/2 core speed
    Pentium III (Coppermine) | 32 kB | 256 kB, on-die (256 bit bus), full core speed
    Pentium III/Celeron (Tualatin) | 32 kB | 256 kB, on-die (256 bit bus), full core speed
    Pentium III-S (Tualatin) | 32 kB | 512 kB, on-die (256 bit bus), full core speed

    Pentium 4 (Willamette) | 8 kB | 256 kB, on-die (256 bit bus), full core speed
    Pentium 4 (Northwood) | 8 kB | 512 kB, on-die (256 bit bus), full core speed
    Pentium 4 (Prescott) | 8 kB | 1024 kB, on-die (256 bit bus), full core speed

    Celeron (Covington) | 32 kB | None
    Celeron (Mendocino) | 32 kB | 128 kB, on-die (64 bit bus), full core speed
    Celeron (Coppermine) | 32 kB | 128 kB, on-die (256 bit bus), full core speed
    Celeron (Willamette/Northwood) | 8 kB | 128 kB, on-die (256 bit bus), full core speed

    Athlon (K7/K75) | 128 kB | 512 kB, external (64 bit bus), 1/2 or 2/3 core speed
    Athlon/ Athlon XP (Thunderbird, Palomino, Thoroughbred A/B) | 128 kB | 256 kB, on-die (64 bit bus), full core speed
    Athlon XP (Barton) | 128 kB | 512 kB, on-die (64 bit bus), full core speed

    Duron (Spitfire/Morgan) | 128 kB | 64 kB, on-die (64 bit bus), full core speed
    Last edited by Troodon; 05-12-2003 at 06:03 PM.

  38. #38
    Originally posted by Mystwalker
    Plus, Lagardo used the real CPU frequency, not the Quantispeed value. The Palomino Athlon was basically on par with the Coppermine P3, but the XP-versions provide more bang for the buck .... eh... instructions per second.
    Granted -- if I think of it as "1900 MHz" instead of the ('real' number) "1569", then it'd scale to 4 days 6.5 hours at 1 GHz even. Of course there never was an "Athlon XP 1000" or such.

  39. #39
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Troodon:

    The P3-m has 512 KByte L2 cache.

    (I just checked it with wcpuid to make sure )

  40. #40
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    Mystwalker: yes and no.
    When it appeared, the name "Pentium III-M" was used in several places (not officially) to refer to the 256 kB version of the desktop Tualatin and differentiate it from the 512 kB version, the "Pentium III-S". That's why I made that mistake. Currently, Intel refers to Tualatins as "Pentium III 256KB L2 Cache" and "Pentium III with 512KB L2 Cache".
    You're speaking about the mobile version of the Tualatin also called "Pentium III-M". And AFAIK, all mobile Tualatins have 512 kB of L2 cache.
    Please note that my table is only about desktop processors.

