1. Discussion about a Resource Allocation Model

Originally posted by jjjjL
I have this nagging feeling that a stunningly beautiful model could be constructed that took into account the likelihood of prime discoveries by n-level, sieve speed, expected sieve speedup, and prp speed to properly calculate the weighting of cpu between sieving and prping. ... give me your thoughts on all this.

-Louie
I'm thinking about ways to build such a model, and wondering about the data to drive it. I figure other people may be noodling about this, too, so a place to share information would be useful. Here are some of the things I think could be useful in the model.

1. I can find the current distribution of remaining candidates in MikeH's sieve statistics.

2. The time for a "standard PC" to complete a primality test - 3 days?

3. The way primality testing will slow down with increasing n - will n's that are twice as large require twice as long? 4 times? sqrt(2)? ln(2)?

4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p. Doubling the p value decreases the number of primes in a range, but it also, I assume, increases the time to test each p. The Prime Number Theorem tells us the first component - but how does the time per test change?

5. The rate at which factors will be found by sieving at various p levels - I think we can estimate this from Merten's Theorem and the number of remaining candidates.

6. The rate at which primes are found among the candidates - I think we can use the Proth-Weight adjusted prime number theorem for this. A full probabilistic model is probably complex - a reasonable first cut could use the average n-value for finding the next primes.

7. The speedup of sieving with fewer k's to sieve - linear but with a fixed overhead?

8. The increase in resources from the publicity of finding a new prime - 20%?

9. The increase in resources during periods of no new primes - for purposes of optimal allocation this probably doesn't matter.

I'm presently thinking about applying the above parameters to a system of differential equations. p(t) would be the prime threshold for sieving, n(t) would be the n threshold for primality testing. These equations could be solved for fixed proportions of resources for sieving and primality testing. (For variable allocations, too, if we have any idea about how to vary them.) A longer term planning horizon is probably appropriate - perhaps minimizing the predicted time until four more primes are found.

Longer term there are other resource allocation problems that I think we can ignore at this time. I'd ignore the question of when to begin sieving beyond n=20M - it seems clear that deeper sieving on the present range has vastly greater payoff. I'd ignore the question of double-checking - my explorations of this suggest it is MUCH too early to devote serious resources to this, although to be sure it would be nice to know the answer to 10. Do we have any validated estimate of the error probability of reported results?

Please jump in with data and ideas.

2. quote:
_________________________________________________
4. The rate at which a "standard PC" can sieve at the present level - 1G per hour?
_________________________________________________

Is that a figure for the double check sieve?
If so, that's reasonable, but if not, sieving 1 G per hour is
definitely too fast for the main sieve.
1 G per hour ~ 278 Kp/sec, which is impossible to achieve
currently, even if your "standard PC" is a 2 GHz Athlon or P4.

3. Originally posted by Moo_the_cow
quote:
_________________________________________________
4. The rate at which a "standard PC" can sieve at the present level - 1G per hour?
_________________________________________________

Is that a figure for the double check sieve?
No - it's a clueless, unresearched guess for the main sieving effort, although it seems to be missing the leading decimal point I had intended. Do we have a definition of the "standard PC"? And an informed estimate of its sieving rate?

4. 4. You can find many benchmarks in the client thread. Just take a look at the posts after the release of version 1.28. BTW, the speed of my PC increases 6% for each doubling of p (for the main sieve), using the same client version of course.

5. Re: Discussion about a Resource Allocation Model

Originally posted by wblipp
9. The increase in resources during periods of no new primes
Maybe this was optimistic. On April 29th the project stats page showed 843 users active in the last two weeks. The thread on 3000 users seems to say this figure was 1000 on February 23. Perhaps the number of users declines during periods with no primes. These numbers suggest an 8% net decline per month.

Eyeballing the system stats chart at the top of the stat page, there was a big increase from Feb 10 to March 26th, then a sharp decline and unclear trends after that, but all much higher than the February 10th level.

Maybe "Random fluctuations around a flat level" is the best guess.

6. 3. The way primality testing will slow down with increasing n - will n's that are twice as large require twice as long? 4 times? sqrt(2)? ln(2)?
It all depends.

To perform a prp test, N-1 iterations have to be done. Within one FFT range, iteration times are constant. So if N is 1.5 times higher, but falls in the same FFT range, a test will take about 50% longer.

One iteration with a twice-as-large FFT takes more than twice as long. So if an exponent is only 100 higher, but falls in a twice-as-large FFT range, a test will take more than twice as long.

I don't know if the SoB client only supports 2^N FFT sizes or also some intermediate values like Prime95 does, but in general, a test with n twice as large will take more than 4 times longer (maybe 5 times?).

More than twice as long for the larger FFT size, and twice as many iterations need to be performed.

7. 2. The time for a "standard PC" to complete a primality test - 3 days?
Possibly a quote from me. This referred to my AMD XP2100+, which is about right for n=3.5M.

4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p. Doubling the p value decreases the number of primes in a range, but it also, I assume, increases the time to test each p. The Prime Number Theorem tells us the first component - but how does the time per test change?
On the same PC it takes about 1.4 hours/G, and that's for 300K-20M sieving. Doubling the value of p approximately halves the number of factors found. The sieve speed, however, does not change with p.

Also remember to include in the model that a P4 sieves at about 35-40% of the speed of P3s and AMDs. To illustrate, I have a P4-1.7 which also takes about 3 days to PRP test n=3.5M, but for sieving its rate is about 3.8 hours/G. Thus I never use P4s (that can PRP) for sieving.

8. Originally posted by wblipp
4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p.
----------------
Originally posted by Nuri
4. You can find many benchmarks in the client thread. Just take a look at the posts after the release of version 1.28. BTW, the speed of my pc increases 6% for each doubling of p (for the main sieve), across same client version of course.
Here's what I found for P3s and Celerons on the main sieve
Joe O.  450 MHz   55 Kp/sec
alexr   600 MHz   75 Kp/sec
Slatz   1 GHz    116 Kp/sec
Nuri    1 GHz    126 Kp/sec

These numbers are about 120Kp/sec for a 1Ghz P3, about 10Gp per day (24 hours usage). Only Nuri posted performance for a P4, although others agreed they are slower. It appears that at the same clock speed a P4 is slower on sieving by a factor of 3.

These numbers were posted around April 9th. I think the p's being tested were in the vicinity of 14T.

Using Nuri's speedup, the number of p's cleared per day should be proportional to p^0.09. Calibrating to the April 9th values, the number of p's removed per day by a 1Ghz PIII is 6.6*10^8*p^0.09.

If we have "a" computers sieving, and p(t) is the sieve threshold reached on day t, p(t) satisfies the differential equation

dp/dt = a*6.6*10^8*p^0.09

My D.E. skills are rusty, but this one is separable - integrating p^(-0.09) dp = a*6.6*10^8 dt gives

p(t) = (p0^0.91 + 0.91*a*6.6*10^8*t)^(1/0.91)
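Since D.E. skills get rusty, the separable solution p(t) = (p0^0.91 + 0.91*a*6.6*10^8*t)^(1/0.91) can be cross-checked numerically. A minimal sketch, where the machine count a and the starting depth p0 are illustrative assumptions:

```python
# Numeric check of the separable ODE dp/dt = a*K*p^0.09 against its
# closed form p(t) = (p0^0.91 + 0.91*a*K*t)^(1/0.91).
K = 6.6e8          # p's cleared per day by one 1 GHz P3, per unit of p^0.09
a = 100.0          # assumed number of sieving machines (hypothetical)
p0 = 15e12         # starting sieve depth, ~15T

def closed_form(t):
    return (p0**0.91 + 0.91 * a * K * t) ** (1 / 0.91)

def rk4(t_end, steps=10000):
    """Independent RK4 integration of dp/dt = a*K*p^0.09."""
    f = lambda p: a * K * p**0.09
    p = p0
    h = t_end / steps
    for _ in range(steps):
        k1 = f(p)
        k2 = f(p + 0.5 * h * k1)
        k3 = f(p + 0.5 * h * k2)
        k4 = f(p + h * k3)
        p += (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
    return p

t = 365.0  # one year of sieving
print(closed_form(t), rk4(t))  # the two should agree closely
```

If the closed form and the numeric integration agree, the algebra above is right.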

9. some sieve benchmarks (main sieve)

P3 Celeron 1300@1560 200kp/sec
P4 Celeron 2000@2440 106kp/sec
AMD Duron 1300 148kp/sec

for prp testing don't forget to have a look at the memory speed:

P4 Celeron 2000@2440 260kcEMs/sec
FSB 122MHz, FSB/memory 3/4, memoryclock=162MHz

P4 Celeron 1700@1989 335kcEMs/sec
FSB 117MHz, FSB/memory 3/5, memoryclock=195MHz

slower CPU but faster system !

10. And another benchmark:
AMD Athlon XP 1800+ 182kp/sec
sieving around 15T

11. another sieve benchmark

P4 Celeron 1700@1989 86kp/sec

seems that the sieving speed doesn't depend on the memory speed

12. Originally posted by wblipp
Using Nuri's speedup, the number of p's cleared per day should be proportional to p^0.09.
Since the only data related to speedup came from me, I felt uncomfortable and decided to run a test on speed change w.r.t. p values.

As can be seen from the table below, the values are different from the 6% figure I previously mentioned. That 6% data was an approximation based on a couple of ranges I sieved on PIII-1000.

The test in the table below was conducted on 10m p ranges, one for each p value shown in the table, each starting from the exact T value and running up to T+10m. The machine used in the test is, again, P4-1700, and the client used is SoBsieve v.1.28.

13. 5. The rate at which factors will be found by sieving at various p levels - I think we can estimate this from Merten's Theorem and the number of remaining candidates.
Wblipp, could you please also check the 40000 = y * (x^1.06) formula I am using in the
Sieve: Distribution of Factors vs. p value thread against Merten's theorem? The formula is for the 3m-20m sieve. y = # factors per G and x = p level in terms of G.

PS: Please also note that the formula also includes duplicates. So, I don't know how much, but if the data were to use unique factors only, the formula would probably tilt a bit more downwards (but probably not much).

14. 9. The increase in resources during periods of no new primes
Moo has asked a question on the prediction thread that got me thinking long term about the computer power on the project. Over the scale of years, Moore’s Law says that computing power per dollar should double every eighteen months. Many of us are taking a Moore’s Law Dividend by spending less money on computers over time, but still the power per user should double every 24 months or so. So even if the number of people stays flat, the number of “standard PC’s” should double every 24 months. Measuring time in days, this is approximately

dc/dt = 0.001c

c(t) = c0*e^0.001t

4b. The rate at which … (sieving) will change with p.
Wow! What a thorough and well documented job Nuri did on this! Thanks! Based on Nuri’s speedup spreadsheet, the rate at which p’s will be cleared is proportional to p^0.03.

4a. The rate at which a "standard PC" can sieve at the present level.

For P3’s and AMDs
Joe O.  450 MHz    55 Kp/sec
alexr   600 MHz    75 Kp/sec
Slatz   1 GHz     116 Kp/sec
Nuri    1 GHz     126 Kp/sec
priwo   1.56 GHz  200 Kp/sec
priwo   1.3 GHz   148 Kp/sec
jimmy   1.8 GHz   182 Kp/sec

Standardized to 1Ghz, these numbers are 122, 125, 116, 126, 128, 100, 113. The median is 122 – I’m comfortable sticking to the 120Kp/sec assumption.

This is about 10Gp/day. We need the constant of proportionality so that at 15T the sieving rate for a standard 1Ghz 24/7 P3 is 10Gp/day – c*(15T)^0.03 = 10G – the constant is about 4G.

For P4’s
priwo   2.44 GHz  106 Kp/sec
priwo   1.99 GHz   86 Kp/sec
Nuri    1.70 GHz   70 Kp/sec

Standardized to 1Ghz, these numbers are 43, 43, and 41. So the P4’s are slower on sieving by a factor of 2.8.

Let s(t) be the proportion of standard computers that are dedicated to sieving, and set aside the P3 vs P4 question for the moment. The number of computers at time t is c(t), so the number of sieving computers is s(t)*c(t). The p level is p(t), so the rate at which each computer clears p’s is 4G*p(t)^0.03. All together

dp/dt = s(t)*c(t)*4G*p(t)^0.03

Any comments, complaints or corrections? When we have the computers (c(t)) and sieving process (p(t)) in good shape, we will move on to the issues around the number of candidates and the rate at which sieving and primality testing remove them.
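For anyone who wants to play with the sieving equation, here's a minimal sketch of integrating dp/dt = s(t)*c(t)*4G*p(t)^0.03 with the Moore's Law growth c(t) = c0*e^(0.001t). The machine count c0 and the constant sieving fraction s are illustrative assumptions, not project data:

```python
# Euler integration of dp/dt = s*c(t)*4G*p^0.03, c(t) = c0*e^(0.001 t).
import math

c0 = 800.0   # assumed number of standard 1 GHz P3 equivalents (hypothetical)
s = 0.25     # assumed fraction of machines sieving (hypothetical)
G = 4e9      # calibrated constant: each machine clears 4G * p^0.03 p's/day

def step_sieve(days, p0=15e12, h=0.1):
    """Integrate the sieve depth forward `days` days in steps of h days."""
    p, t = p0, 0.0
    while t < days:
        c = c0 * math.exp(0.001 * t)   # growing machine count
        p += h * s * c * G * p**0.03   # p's cleared in this time step
        t += h
    return p

print(f"sieve depth after 90 days: {step_sieve(90):.3e}")
```

Swapping in different s values is then just a parameter change, which is the whole point of the model.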

15. Originally posted by smh
in general, a test with n twice as large will take more than 4 times longer. (maybe 5 times?).

More than twice for the larger FFT size, and twice as many iterations need to be performed.
This article says the complexity of multiplying 2 N-digit numbers is N(log(N))^2. We have n-bit numbers, but that shouldn't affect the argument. We still need to account for the n cycles, so using his academic argument, the time for a primality test would go up as n^2*(log(n))^2. Using your "5 times" number, the work would go up as n^2.32. The difference is probably smaller than our uncertainty. Here's a short table of how much more work each test takes at various n levels, compared to n=3.5M (first column is the n^2.32 scaling, second is the academic n^2*(log(n))^2 scaling):

n=7M     5.0     4.4
n=20M    57      41
n=50M    480     282
n=200M   12K     5K
n=1G     504K    154K

So even at the 1G level, the two approaches differ by a factor of 3.3 when the extra work is a factor of at least 150,000.

Does anyone have strong feelings, or better yet strong arguments, for one of these approaches?
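For reference, the work-ratio table above can be reproduced with a few lines; both scalings are relative to n=3.5M, and the 2.32 exponent comes from the "5 times per doubling" estimate:

```python
# Work per primality test at level n, relative to n0 = 3.5M,
# under the "academic" (n ln n)^2 scaling and the empirical n^2.32 scaling.
import math

def academic(n, n0=3.5e6):
    return (n * math.log(n))**2 / (n0 * math.log(n0))**2

def empirical(n, n0=3.5e6):
    return (n / n0)**2.32

for n in (7e6, 20e6, 50e6, 200e6, 1e9):
    print(f"n={n:.0e}  empirical={empirical(n):10.3g}  academic={academic(n):10.3g}")
```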

16. Originally posted by wblipp
{...}
jimmy   1.8 GHz   182 Kp/sec
{...}
Dunno how you're counting that, but he wrote "Athlon XP 1800+", which runs at less than 1.8 GHz (something like 1530 MHz? Thereabouts?). That would bring him up from 100 to 118, more in the range with the others...

[Edit: And while I'm thinking about it, the "smart resource allocation" scheme should really do two different types of math for P4s and everything else -- i.e. should avoid the notion of a "standard PC". There may well come a point where the smart thing to do is putting P4s on PRP while keeping everything else sieving or such.]

The Athlon XP 1800+ runs at 11.5x133 MHz = 1533 MHz.

There may well come a point where the smart thing to do is putting P4s on PRP while keeping everything else sieving or such.
I think Athlons and Athlon XPs with DDR memory should also be on PRP (they aren't as efficient as P4's, but much more so than P3's).
In the near future, same for Opterons and Athlon 64s (which should be even more efficient than K7 cores, as they have SSE2, a faster memory subsystem and, in some cases, a larger L2 cache).

18. Originally posted by Troodon
The Athlon XP 1800+ runs at 11,5x133 MHz=1533 MHz.
Originally posted by Lagardo
the "smart resource allocation" scheme should really do two different types of math for P4s and everything else
Thanks for the correction. Knowing the correct speed for the Athlon XP 1800+ makes the standardized result 119 instead of 100. Fixing a transposition error and a roundoff error, the seven standardized performance numbers are now 122, 125, 116, 126, 128, 114, 119. The median is still 122, but the spread has narrowed.

Yes, I agree that we are going to need to make separate decisions about the P4s and the other processors because they bring different relative skills to the two tasks. I've been trying to track that difference by thinking that a standard PC is a 1 Ghz P3, and recognizing that a P4 is 2.8 times slower than standard at sieving. Priwo's data suggests that we may need to track memory speed instead of processor speed for primality testing, so "standard" may become even less useful.

19. Main sieve (3M-20M) CPU benchmark-

P-III 800 Mhz - 93.5 kp/sec

Hope this helps....

20. Merten's Theorem

Merten's Theorem is that the product of (1-1/p) for all primes less than x is asymptotically exp(-gamma)/ln(x). This is useful for estimating the effectiveness of trial factoring because the product is the fraction of numbers left after removing primes less than x. This is potentially useful in our resource model for what I have been thinking of as two distinct issues – the number of candidates presently in the system and the rate at which sieving will remove these candidates.

We will need to modify Merten's theorem by Proth Weights. We need Proth Weights because our sequences have divisibility properties that differ in standard ways from the divisibility of sequential integers. For example, among sequential integers, 1/2 are divisible by 2, then 1/3 of the remainder are divisible by 3, then 1/5 of the remainder are divisible by 5. For our remaining k values, none of the numbers we are testing for primality are divisible by 2, 1/2 are divisible by 3, and 1/2 of the remainder are divisible by 5. It gets even stranger with 7 - for most of our k values 1/3 of the remainder are divisible by 7, but for two of the k values none of the remainder are divisible by 7. I explored these divisibility properties by constructing partial coverings in the prediction thread, but better estimates of the Proth Weight are contained in Table 5 of Yves Gallot's paper on weights.

We might apply Merten's Theorem in absolute terms or in relative terms. It's most elegant in relative terms. If we previously sieved up through p=S, the number of values left from a range of size N will be N*a / ln(S), where a is a constant calculated as the product of the Proth Weight and exp(-gamma). If we now extend the sieve through p=T, there will be N*a / ln(T) left. The reduction factor is ln(S)/ln(T). This has some simple consequences. For example, the main sieve is now around 15T. This predicts that if we extend that sieve to 200T, the number of candidates will be reduced by a factor of ln(15T)/ln(200T) = 0.92 - about 8% of the remaining candidates will be removed. It also predicts that removing half the candidates requires squaring the sieve limit.
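The relative form is easy to sanity-check numerically:

```python
# The relative form of Merten's Theorem: extending the sieve from S to T
# leaves a fraction ln(S)/ln(T) of the candidates.
import math

def survival_ratio(S, T):
    return math.log(S) / math.log(T)

print(survival_ratio(15e12, 200e12))      # 15T -> 200T: ~0.92, i.e. ~8% removed
print(survival_ratio(15e12, (15e12)**2))  # squaring the limit: exactly half survive
```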

This sounds good, but does it work? Let’s see how well it predicts the current status. We can check the status of sieving in the Sieve Coordination Thread. When I checked, the sieving was nearly complete through 15.35T, there was a completed block from 17.42T-18.14T, and some scattered ranges even higher. The effect of the higher block should be ln(17.42T)/ln(18.14T)=0.9987, so we can ignore that. Using Yves Gallot’s Proth Weights, the average weight for the twelve remaining k values is 0.163. The value 0.163 * exp(-gamma) / ln(15.35T) = 0.00302. In each range of 1 million n’s, from the 12 million original candidates, this method predicts that on average there will be 36,245 remaining candidates. Looking at MikeH's Sieve Statistics for the main sieve (n=3M to 20M), we see that this estimate is about 1% high:

3M<n< 4M: 35900
4M<n< 5M: 35784
5M<n< 6M: 36124
6M<n< 7M: 35839
7M<n< 8M: 35866
8M<n< 9M: 36026
9M<n<10M: 35862
10M<n<11M: 35856
11M<n<12M: 36015
12M<n<13M: 35815
13M<n<14M: 36157
14M<n<15M: 35699
15M<n<16M: 35731
16M<n<17M: 35990
17M<n<18M: 35851
18M<n<19M: 36028
19M<n<20M: 35905

These average 35909. If we lower the Proth Weight estimate to 0.1618 we match very closely. Yves Gallot only reported the Proth Weights to 2 digits, so an adjustment of this magnitude seems reasonable.
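Here is the whole calculation in script form; small differences from the 36,245 figure above may come from rounding in the intermediate constants:

```python
# Mertens/Proth-weight prediction of remaining candidates per 1M block of n,
# compared against MikeH's observed counts, and the implied adjusted weight.
import math

GAMMA = 0.5772156649  # Euler-Mascheroni constant

observed = [35900, 35784, 36124, 35839, 35866, 36026, 35862, 35856,
            36015, 35815, 36157, 35699, 35731, 35990, 35851, 36028, 35905]

def predicted(weight, p=15.35e12, k=12, block=1_000_000):
    """Candidates surviving a sieve to depth p, per block of n, for k sequences."""
    return k * block * weight * math.exp(-GAMMA) / math.log(p)

avg = sum(observed) / len(observed)
print(predicted(0.163))                # prediction with Gallot's average weight
print(avg)                             # observed average
print(0.163 * avg / predicted(0.163))  # weight adjusted to match observation
```

The adjusted weight comes out right around the 0.1618 quoted above.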

Merten’s theorem with the adjusted average proth weight seems well suited for calculating the present number of candidates. The remaining question is how well it describes the incremental process of sieving. Fortunately Nuri has excellent data on how that process has been going:
Originally posted by Nuri
Wblipp, could you please also check the 40000 = y * (x^1.06) formula I am using in the
Sieve: Distribution of Factors vs. p value thread against Merten's theorem? The formula is for the 3m-20m sieve. y = # factors per G and x = p level in terms of G.
Between n=3M and n=20M for the twelve k-values, there are 12*17*10^6 n values before sieving. Merten's Theorem predicts that after sieving to "p" in G, the number of candidates left will be 12*17*10^6 * 0.1618 * exp(-gamma) / ln(p*10^9). The difference between this value at "p" and at "p+1" represents the numbers sieved away, and each one represents a factor. The following is a plot of the number of factors found using Nuri's function and using the Merten prediction. Blue is Nuri and red is Merten. The agreement is excellent. I'm convinced we can use Merten's Theorem to model the sieving.
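The comparison can also be done without a plot; this sketch evaluates Nuri's curve and the Merten difference at a few sieve depths:

```python
# Compare Nuri's empirical factors-per-G curve, y = 40000 / x^1.06 (x = p in G),
# with the Mertens prediction: factors in [p, p+1G] equals
# N*a/ln(p) - N*a/ln(p+1G), with N = 12*17*10^6 and a = 0.1618*exp(-gamma).
import math

GAMMA = 0.5772156649
N = 12 * 17 * 10**6
a = 0.1618 * math.exp(-GAMMA)

def nuri(x):
    """Factors found per G at sieve depth x (x in units of G)."""
    return 40000 / x**1.06

def mertens(x):
    """Factors removed between p = x*1e9 and p = (x+1)*1e9."""
    return N * a * (1 / math.log(x * 1e9) - 1 / math.log((x + 1) * 1e9))

for x in (100, 1000, 10000):
    print(x, round(nuri(x)), round(mertens(x)))
```

The two columns track each other to within about 10% over this range, which is the "excellent agreement" of the plot.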

21. Model Changes After finding Primes

I have proposed running the resource allocation model for the period of finding the next three primes, because running this kind of model for exactly one period sometimes leads to false "edge effects" where the model says to do strange things because the model thinks the world ends at the first stage. By running for three stages, these edge effects are usually pushed into the third stage, and the results are more realistic for planning the first stage. When the first stage ends (finding the next prime in our case), the model should be rolled out to look at stages 2, 3, and 4 to make plans during stage 2.

To run the model in later stages, we need to know what parts of the model change after we have found a prime. There are three effects.

1. c(t) – the computing power available at time t. I have proposed that this power increases by 20% every time we find a prime (Item 8). There hasn’t been any corrective input about this guess yet. This would be in addition to the slow steady growth I have proposed for Moore’s Law.
In stage 1 c(t) = c0*e^0.001t
In stage 2 c(t) = c0*1.2*e^0.001t
In stage 3 c(t) = c0*1.2^2*e^0.001t

2. p(t) – the sieving limit increases faster because we are testing fewer k's. As Mike discussed here, the sieve speedup is less than linear. One explanation is that there is an overhead equivalent to three k-values, so removing three k-values makes the sieve run (12+3)/(9+3) = 1.25 times as fast, i.e. 25% faster. Using this approach, the sieving rate will increase by a factor of 15/14 in stage 2 and 15/13 in stage 3.

3. Remaining Candidates. There will be 11, then 10, candidates left to test, so the primality testing will proceed faster. How much faster depends on which k-values we find primes for. The k’s with higher Proth Weights are more likely to be found first because they have more opportunities. That is, every remaining candidate is equally qualified as a number with no divisors under 15T, but the k’s with high Proth Weight have more of these candidates than the ones with low Proth Weight. Since their likelihood of removal is proportional to their weights, we can calculate the average weight removed by taking a weighted average. The average weight for the next prime discovered is 0.19.

The average weight for the next two primes can be calculated by the same weighted average method, but now there are 66 pairs of k’s that might be the next two. For each pair we can calculate the amount of that pair – the sum of their Proth Weights – and the likelihood of that pair – the product of their Proth Weights. The weighted average for the next two primes discovered is 0.3767.

So the total Proth Weight in Stage 1 will be 1.96, in Stage 2 will be 1.77, and in Stage 3 will be 1.5833. These values were all calculated using Yves Gallot’s Proth Weights, so we will reduce them by about 1% as discussed in the post about Merten’s Theorem, so the actual values are 1.9416, 1.7532, and 1.5685.
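A sketch of the weighted-average method; the twelve per-k weights below are placeholders (the real values are in Table 5 of Gallot's paper), chosen only so that they total 1.96:

```python
# Expected Proth Weight removed by the next one and two prime discoveries,
# using the weighted-average method described above. The individual
# weights are hypothetical; only the method is being illustrated.
from itertools import combinations

weights = [0.05, 0.08, 0.10, 0.12, 0.14, 0.15,
           0.17, 0.19, 0.21, 0.24, 0.25, 0.26]  # placeholders, total 1.96

# Next prime: each k is removed with probability proportional to its weight,
# so the expected weight removed is sum(w * w/total).
total = sum(weights)
first = sum(w * (w / total) for w in weights)

# Next two primes: 66 pairs; pair likelihood ~ product of the two weights,
# pair amount = sum of the two weights.
pairs = list(combinations(weights, 2))
norm = sum(w1 * w2 for w1, w2 in pairs)
two = sum((w1 + w2) * (w1 * w2) / norm for w1, w2 in pairs)

print(first, two)
```

With the real weights from Table 5, this computation should reproduce the 0.19 and 0.3767 figures above.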

22. Rate of completing n's

MikeH reported that an Athlon XP2100+ completes a primality test in 3 days at n=3.5M. I think this is equivalent to a 1Ghz machine completing in 5 days. Using the academic scaling, a test at other values of "n" will take 5*[(n*ln(n))/(3.5M*ln(3.5M))]^2 days to complete, ignoring the lumpiness of the FFT transitions. At time t the sieving will have progressed to p(t), so based on Merten's Theorem, the number of remaining candidates per n-step will be 1.9416 * exp(-gamma) / ln(p(t)), where 1.9416 is the total Proth Weight for all remaining candidates.

Using, as before, c(t) for the total equivalent computing power available to the project and s(t) for the fraction of power devoted to sieving, we get the relationship

dn/dt = c(t) * (1-s(t)) / [5 * ((n*ln(n))/(3.5M*ln(3.5M)))^2 * 1.9416 * (exp(-gamma)/ln(p(t)))]
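Putting the pieces together, here is a rough sketch of integrating the coupled sieve/prp model. The machine count c0, the sieving fraction s, and the starting points are illustrative assumptions, not project data:

```python
# Euler integration of the coupled model:
#   dp/dt = s*c(t)*4G*p^0.03                          (sieve depth)
#   dn/dt = (1-s)*c(t) / (prp_days(n) * density(p))   (prp frontier)
# with c(t) = c0*e^(0.001 t).
import math

GAMMA = 0.5772156649
c0, s = 800.0, 0.25   # assumed machine count and sieving fraction (hypothetical)

def prp_days(n, n0=3.5e6):
    """Days for a standard 1 GHz machine to prp-test at level n (academic scaling)."""
    return 5.0 * ((n * math.log(n)) / (n0 * math.log(n0)))**2

def density(p):
    """Remaining candidates per unit n, from Merten's Theorem."""
    return 1.9416 * math.exp(-GAMMA) / math.log(p)

def integrate(days, p=15e12, n=3.5e6, h=0.1):
    t = 0.0
    while t < days:
        c = c0 * math.exp(0.001 * t)
        p += h * s * c * 4e9 * p**0.03
        n += h * (1 - s) * c / (prp_days(n) * density(p))
        t += h
    return p, n

p1, n1 = integrate(365)
print(f"after one year: sieve depth ~ {p1:.2e}, prp frontier ~ {n1:.2e}")
```

Varying s and rerunning is then exactly the "optimal allocation" experiment this thread is aiming at.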

23. Re: Rate of completing n's

Originally posted by wblipp
MikeH reported that an Athlon XP2100+ completes a primality test in 3 days at n=3.5M. I think this is equivalent to a 1Ghz machine completing in 5 days.
Please note that a 1 GHz Athlon is faster than a 1 GHz Pentium III. And a 1 GHz Pentium III is faster than a 1 GHz C3. So we can't speak about a "1 GHz machine".
For k=10223 n=3461705, my Tualatin@1420 MHz (142 MHz FSB) needs nearly 160 CPU hours (~6.5 days)!

I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.

24. Re: Re: Rate of completing n's

Originally posted by Troodon
I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.
Great! I love clarity. Do we know how long this baseline machine needs to complete a primality test near n=3.5M?

25. I think it would need at least 260 hours. Yes, 100 hours more than my Tualatin because:
- it's 420 MHz slower.
- the FSB and RAM speeds are 42 MHz slower.
- Coppermines don't have prefetch logic.
- The i815 chipset is a bit slower than the BX chipset I'm using.
- The n is higher (+50000).

26. Maybe the "baseline machine" should be something that someone here actually has at their disposal? So we could run a PRP and see how long it takes?

"1G coppermine" sounds like a pretty highfalutin' chip for its time, as opposed to, say, a "1G Duron" or "1.2G Athlon", of which there were uncounted bazillions (at least among the people I know). Somewhat more modern equivalents would probably be either a P4-2.4 or an XP1700 or some such.

27. Maybe the "baseline machine" should be something that someone here actually has at their disposal? So we could run a PRP and see how long it takes?
I'll set the Tualatin at 100x10 MHz to see how long it takes to complete a unit of the current test I'm running (n=3.46M).

"1G coppermine" sounds like a pretty highfalutin' chip for its time. Contrary to, say, "1G Duron" or "1.2G Athlon", of which there were uncounted bazillions (at least amongst the people I know).
I said a Coppermine not because it's a very popular chip, but because I think it is a good baseline (not too fast, and not too slow). Don't forget that there are many people here with machines slower than 1 GHz.

28. Finalizing Parameters

New developments on the Sieving Client thread indicate a 35% or so increase in sieving speed with version 1.31. The numbers are coming in around 160Kp per Gigahertz instead of the 120 we have found in this thread. This will increase the constant in the sieving equation from 4G to 5.6G.

The last parameter we need some agreement about is the time to complete a primality test in the vicinity of n=3.5M for a "standard" machine. This seems to have bogged down in the mire of "what is standard." I've been hoping that a simple approximation based on processor speed scaled to 1Ghz would be good enough - that seemed to work well for sieving speed, with the caveat that if we consider putting P4's into sieving we need to remember that they are 2.8 times slower than "standard."

So far we've got estimates that range from 5 days to 11 days for a 1Ghz machine to finish a primality test. If I don't get more information soon, I'll pick "8 days" as a central estimate. I'm eager to start playing with the model and see if it offers insights into the appropriate level of sieving in the short term and long term.

29. So far we've got estimates that range from 5 days to 11 days for a 1Ghz machine to finish a primality test. If I don't get more information soon, I'll pick "8 days" as a central estimate.
I have a P3-850 which completed a PRP test where n~3.50M in 5.6 days.

So scaled for that average 1GHz PC, I guess that would be 4.75 days.

30. 5.6 days?! That's too fast. Or maybe something is wrong with my Tualatin. Currently, it does ~82000 cEM/s and needs ~50 minutes per unit.

31. Code:
```
[11:46] User [ice:~/software/sob] > grep name /proc/cpuinfo
model name      : AMD Athlon(tm) XP 1900+
[11:47] User [ice:~/software/sob] > grep MHz /proc/cpuinfo
cpu MHz         : 1569.350
[11:47] User [ice:~/software/sob] > grep "blocks left" sclient.log | tail
[Mon May 12 08:35:06 2003] n.high = 2494915  .  55 blocks left in test
[Mon May 12 08:55:56 2003] n.high = 2514560  .  54 blocks left in test
[Mon May 12 09:16:45 2003] n.high = 2534205  .  53 blocks left in test
[Mon May 12 09:37:37 2003] n.high = 2553850  .  52 blocks left in test
[Mon May 12 09:58:39 2003] n.high = 2573495  .  51 blocks left in test
[Mon May 12 10:19:35 2003] n.high = 2593140  .  50 blocks left in test
[Mon May 12 10:40:31 2003] n.high = 2612785  .  49 blocks left in test
[Mon May 12 11:01:27 2003] n.high = 2632430  .  48 blocks left in test
[Mon May 12 11:22:22 2003] n.high = 2652075  .  47 blocks left in test
[Mon May 12 11:43:18 2003] n.high = 2671720  .  46 blocks left in test
[11:47] User [ice:~/software/sob] >
```
So on an Athlon at least I get 10 blocks in 3h:8m:12s. For n=3.5M (that's about 172 blocks), just under 54 hours for the whole test. Or, scaled to 1 GHz, about 3 days 12.75 hours for a full test on a 1 GHz Tbred.
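The arithmetic from the log can be redone mechanically:

```python
# Redo the block-rate arithmetic from the log above: 10 blocks between two
# timestamps, scaled to a full n=3.5M test (~172 blocks) and to a 1 GHz clock.
from datetime import datetime

fmt = "%a %b %d %H:%M:%S %Y"
first = datetime.strptime("Mon May 12 08:35:06 2003", fmt)
last = datetime.strptime("Mon May 12 11:43:18 2003", fmt)

blocks = 10
per_block = (last - first).total_seconds() / blocks   # seconds per block
full_test_h = per_block * 172 / 3600                  # hours for ~172 blocks
scaled_1ghz_days = full_test_h * 1.56935 / 24         # scale 1569 MHz -> 1 GHz

print(f"{per_block:.0f} s/block, {full_test_h:.1f} h per test, "
      f"{scaled_1ghz_days:.2f} days at 1 GHz")
```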

32. Code:
```
[Sun Apr 20 19:40:51 2003] n.high = 61050  .  170 blocks left in test
[Sun Apr 20 20:26:32 2003] n.high = 81400  .  169 blocks left in test
[Sun Apr 20 21:12:09 2003] n.high = 101750  .  168 blocks left in test
[Sun Apr 20 21:57:47 2003] n.high = 122100  .  167 blocks left in test
[Sun Apr 20 22:43:25 2003] n.high = 142450  .  166 blocks left in test
[Sun Apr 20 23:29:05 2003] n.high = 162800  .  165 blocks left in test
```
This is the log from the P3-850 quoted above (it's nothing special a Dell latitude laptop). As you can see ~46 minutes per block, 173 blocks for this test (n=3504985), so that's just over 5.5 days.

The problem is probably the memory and front side bus. The P3 likely uses PC100 SDR SDRAM, whereas the Athlon should use DDR SDRAM - PC266, I guess?

AFAIK, this makes a big difference...

Plus, Lagardo used the real CPU frequency, not the QuantiSpeed rating. The Palomino Athlon was basically on par with the Coppermine P3, but the XP versions provide more bang for the buck... eh... instructions per second.
To be thorough, one would have to average the standard PC over all the different PC architectures. But maybe that's too much effort? I don't know...

34. Code:
```
[Mon May 12 14:57:03 2003] n.high = 2357406  .  53 blocks left in test
[Mon May 12 15:49:43 2003] n.high = 2378268  .  52 blocks left in test
[Mon May 12 16:40:19 2003] n.high = 2399130  .  51 blocks left in test
[Mon May 12 17:31:03 2003] n.high = 2419992  .  50 blocks left in test
[Mon May 12 18:21:38 2003] n.high = 2440854  .  49 blocks left in test
[Mon May 12 19:11:37 2003] n.high = 2461716  .  48 blocks left in test
```
MikeH's Pentium III 850 is a bit faster than my Tualatin@1420 and Lagardo's Athlon XP at the same speed would be twice as fast. So there's a problem. Could anyone test SB on a Tualatin, please?

35. Wasn't the cache on the Tualatin 256K? Whereas the cache on the older PIII was 512K.

36. It's exactly the other way - when Intel moved to the Tualatin core, they used the 0.13 (?) micron process, which gave them more space for a bigger L2 cache...

37. It isn't a cache size-related problem. My Tualatin and Lagardo's and MikeH's CPU's all have 256 kB of on-die full speed L2 cache.

Here's a little table I've made for you

CPU | L1 Cache | L2 Cache
----------------------------------------------------------------------------------
Pentium II (Klamath and Deschutes) | 32 kB | 512 kB, external (64 bit bus), 1/2 core speed
Pentium III (Katmai) | 32 kB | 512 kB, external (64 bit bus), 1/2 core speed
Pentium III (Coppermine) | 32 kB | 256 kB, on-die (256 bit bus), full core speed
Pentium III/Celeron (Tualatin) | 32 kB | 256 kB, on-die (256 bit bus), full core speed
Pentium III-S (Tualatin) | 32 kB | 512 kB, on-die (256 bit bus), full core speed

Pentium 4 (Willamette) | 8 kB | 256 kB, on-die (256 bit bus), full core speed
Pentium 4 (Northwood) | 8 kB | 512 kB, on-die (256 bit bus), full core speed
Pentium 4 (Prescott) | 8 kB | 1024 kB, on-die (256 bit bus), full core speed

Celeron (Covington) | 32 kB | None
Celeron (Mendocino) | 32 kB | 128 kB, on-die (64 bit bus), full core speed
Celeron (Coppermine) | 32 kB | 128 kB, on-die (256 bit bus), full core speed
Celeron (Willamette/Northwood) | 8 kB | 128 kB, on-die (256 bit bus), full core speed

Athlon (K7/K75) | 128 kB | 512 kB, external (64 bit bus), 1/2 or 2/3 core speed
Athlon/ Athlon XP (Thunderbird, Palomino, Thoroughbred A/B) | 128 kB | 256 kB, on-die (64 bit bus), full core speed
Athlon XP (Barton) | 128 kB | 512 kB, on-die (64 bit bus), full core speed

Duron (Spitfire/Morgan) | 128 kB | 64 kB, on-die (64 bit bus), full core speed

38. Originally posted by Mystwalker
Plus, Lagardo used the real CPU frequency, not the Quantispeed value. The Palomino Athlon was basically on par with the Coppermine P3, but the XP-versions provide more bang for the buck .... eh... instructions per second.
Granted -- if I think of it as "1900MHz" instead of the ('real' number) "1569", then it'd scale to 4 days 6.5 hours at 1 GHz even. Of course, there never was an "Athlon XP 1000" or such.

39. Troodon:

The P3-m has 512 KByte L2 cache.

(I just checked it with wcpuid to make sure )

40. Mystwalker: yes and no.
When it appeared, the name "Pentium III-M" was used in several places (not officially) to refer to the 256 kB version of the desktop Tualatin and differentiate it from the 512 kB version, the "Pentium III-S". That's why I made that mistake. Currently, Intel refers to Tualatins as "Pentium III 256KB L2 Cache" and "Pentium III with 512KB L2 Cache".
You're speaking about the mobile version of the Tualatin, also called "Pentium III-M". And AFAIK, all mobile Tualatins have 512 kB of L2 cache.