
Discussion about a Resource Allocation Model



wblipp
04-29-2003, 06:21 PM
Originally posted by jjjjL
I have this nagging feeling that a stunningly beautiful model could be constructed that took into account the likelihood of prime discoveries by n-level, sieve speed, expected sieve speedup, and prp speed to properly calculate the weighting of cpu between sieving and prping. ... give me your thoughts on all this.

-Louie

I'm thinking about ways to build such a model, and wondering about the data to drive it. I figure other people may be noodling about this, too, so a place to share information would be useful. Here are some of things I think could be useful in the model.

1. I can find the current distribution of remaining candidates in MikeH's sieve statistics (http://www.aooq73.dsl.pipex.com/scores.txt).

2. The time for a "standard PC" to complete a primality test - 3 days?

3. The way primality testing will slow down with increasing n - will n's that are twice as large require twice as long? 4 times? sqrt(2)? ln(2)?

4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p. Doubling the p value decreases the number of primes in a range, but it also, I assume, increases the time to test each p. The Prime Number Theorem tells us the first component - but how does the time per test change?

5. The rate at which factors will be found by sieving at various p levels - I think we can estimate this from Merten's Theorem and the number of remaining candidates.

6. The rate at which primes are found among the candidates - I think we can use the Proth-Weight adjusted prime number theorem for this. A full probabilistic model is probably complex - a reasonable first cut could use the average n-value for finding the next primes.

7. The speedup of sieving with fewer k's to sieve - linear but with a fixed overhead?

8. The increase in resources from the publicity of finding a new prime - 20%?

9. The increase in resources during periods of no new primes - for purposes of optimal allocation this probably doesn't matter.

I'm presently thinking about applying the above parameters to a system of differential equations. p(t) would be the Prime threshold for sieving, n(t) would be the n threshold for primality testing. These equations could be solved for fixed proportions of resources for sieving and primality testing. (For variable allocations, too, if we have any idea about how vary them.) A longer term planning horizon is probably appropriate - perhaps minimizing the predicted time until four more primes are found.

Longer term there are other resource allocation problems that I think we can ignore at this time. I'd ignore the question of when to begin sieving beyond n=20M - it seems clear that deeper sieving on the present range has vastly greater payoff. I'd ignore the question of double-checking - my explorations of this suggest it is MUCH too early to devote serious resources to this, although to be sure it would be nice to know the answer to 10. Do we have any validated estimate of the error probability of reported results?

Please jump in with data and ideas.

Moo_the_cow
04-29-2003, 08:52 PM
quote:
_________________________________________________
4. The rate at which a "standard PC" can sieve at the present level - 1G per hour?
_________________________________________________

Is that a figure for the double check sieve?
If so, that's reasonable, but if not, sieving 1 G per hour is
definitely too fast for the main sieve.
1 G per hour ~ 278 kp/sec, which is impossible to achieve
currently, even if your "standard PC" is a 2 GHz Athlon or P4.

wblipp
04-29-2003, 10:00 PM
Originally posted by Moo_the_cow
quote:
_________________________________________________
4. The rate at which a "standard PC" can sieve at the present level - 1G per hour?
_________________________________________________

Is that a figure for the double check sieve?

No - it's a clueless, unresearched guess for the main sieving effort, although it seems to be missing the leading decimal point I had intended. Do we have a definition of the "standard PC?" And an informed estimate of its sieving rate?

Nuri
04-29-2003, 10:03 PM
4. You can find many benchmarks in the client thread. Just take a look at the posts after the release of version 1.28. BTW, the speed of my pc increases 6% for each doubling of p (for the main sieve), across the same client version of course.

wblipp
04-30-2003, 12:09 AM
Originally posted by wblipp
9. The increase in resources during periods of no new primes

Maybe this was optimistic. On April 29th the project stats page (http://www.seventeenorbust.com/stats/) is showing 843 users active in the last two weeks. The thread on 3000 users (http://www.free-dc.org/forum/showthread.php?s=&threadid=2618&highlight=users) seems to say this figure was 1000 on February 23. Perhaps the number of users declines during periods of no primes. These numbers suggest 8% net decline per month.

Eyeballing the system stats chart at the top of the stat page, there was a big increase from Feb 10 to March 26th, then a sharp decline and unclear trends after that, but all much higher than the February 10th level.

Maybe "Random fluctuations around a flat level" is the best guess.

smh
04-30-2003, 09:08 AM
3. The way primality testing will slow down with increasing n - will n's that are twice as large require twice as long? 4 times? sqrt(2)? ln(2)?


It all depends.

To perform a prp test, N-1 iterations have to be done. Within one FFT range, iteration times are constant. So if N is 1.5 times higher, but falls in the same FFT range, a test will take about 50% longer.

One iteration with a twice as large FFT takes more than twice as long. So if an exponent is only 100 higher, but falls in a twice as large FFT range, a test will take more than twice as long.

I don't know if the SoB client only supports 2^N FFT sizes or also some intermediate values like Prime95 does, but in general, a test on an n twice as large will take more than 4 times longer (maybe 5 times?).

More than twice as long per iteration for the larger FFT size, and twice as many iterations need to be performed.

MikeH
04-30-2003, 01:22 PM
2. The time for a "standard PC" to complete a primarily test - 3 days? Possibly a quote from me. This referred to my AMD XP2100+. Which is about right for n=3.5M.


4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p. Doubling the p value decreases the number of primes in a range, but it also, I assume, increases the time to test each p. The Prime Number Theorem tells us the first component - but how does the time per test change? On the same PC it takes about 1.4 hours/G, and that's for 300K-20M sieving. Doubling the value of p approximately halves the number of factors found. The sieve speed, however, does not change with p.

Also remember to include in the model that a P4 sieves at about 35-40% of the speed of P3s and AMDs. To illustrate, I have a P4-1.7 which also takes about 3 days to PRP test n=3.5M, but for sieving its rate is about 3.8 hours/G. Thus I never use P4s (that can PRP) for sieving.

wblipp
04-30-2003, 10:27 PM
Originally posted by wblipp
4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p.
----------------
Originally posted by Nuri
4. You can find many benchmarks in the client thread. Just take a look at the posts after the release of version 1.28. BTW, the speed of my pc increases 6% for each doubling of p (for the main sieve), across same client version of course.

Here's what I found for P3s and Celerons on the main sieve
Joe O.  450 MHz  55 Kp/sec
alexr  600 MHz  75 Kp/sec
Slatz  1 GHz  116 Kp/sec
Nuri  1 GHz  126 Kp/sec

These numbers are about 120Kp/sec for a 1Ghz P3, about 10Gp per day (24 hours usage). Only Nuri posted performance for a P4, although others agreed they are slower. It appears that at the same clock speed a P4 is slower on sieving by a factor of 3.

These numbers were posted around April 9th. I think the p's being tested were in the vicinity of 14T.

Using Nuri's speedup, the number of p's cleared per day should be proportional to p^0.09. Calibrating to the April 9th values, the number of p's removed per day by a 1Ghz PIII is 6.6*10^8*p^0.09.

If we have "a" computers seiving, and p(t) is the seive threshold reached on day t, p(t) satisfies the differential equation

dp/dt = a*6.6*10^8*p^0.09

My D.E. skills are rusty, but I think this is trivial -

p(t) = 6a*10^8*p^1.09 + p0

Any corrections or comments?
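For anyone who wants to check the arithmetic, here is a small C sketch (illustrative only, not code anyone here has run; a = 100 sieving machines and p0 = 14T are assumptions) comparing the closed form above with a crude Euler integration of dp/dt = a*6.6*10^8*p^0.09:

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double a = 100.0;          /* assumed number of sieving PCs          */
    const double k = a * 6.6e8;      /* p-values cleared per day ~ k * p^0.09  */
    const double p0 = 14e12;         /* assumed starting sieve depth (~14T)    */
    const double days = 365.0;

    /* Separable ODE: p^0.91/0.91 = k*t + p0^0.91/0.91, so
       p(t) = (p0^0.91 + 0.91*k*t)^(1/0.91). */
    double closed = pow(pow(p0, 0.91) + 0.91 * k * days, 1.0 / 0.91);

    /* Crude forward-Euler integration as a sanity check. */
    double p = p0, dt = 0.01;
    for (double t = 0.0; t < days; t += dt)
        p += k * pow(p, 0.09) * dt;

    printf("closed form after %.0f days: %.3e\n", days, closed);
    printf("Euler integration:           %.3e\n", p);
    return 0;
}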

priwo
05-01-2003, 02:49 AM
some sieve benchmarks (main sieve)

P3 Celeron 1300@1560 200kp/sec
P4 Celeron 2000@2440 106kp/sec
AMD Duron 1300 148kp/sec

for prp testing don't forget to have a look at the memory speed:

P4 Celeron 2000@2440 260kcEMs/sec
FSB 122MHz, FSB/memory 3/4, memoryclock=162MHz

P4 Celeron 1700@1989 335kcEMs/sec
FSB 117MHz, FSB/memory 3/5, memoryclock=195MHz

slower CPU but faster system !

jimmy
05-01-2003, 03:52 AM
And another benchmark:
AMD Athlon XP 1800+ 182kp/sec
sieving around 15T

priwo
05-01-2003, 06:32 AM
another sieve benchmark

P4 Celeron 1700@1989 86kp/sec

seems that the sieving speed doesn't depend on the memory speed

Nuri
05-01-2003, 05:22 PM
Originally posted by wblipp
Using Nuri's speedup, the number of p's cleared per day should be proportional to p^0.09.
Any corrections or comments?

Since the only data related to speedup came from me, I felt uncomfortable and decided to run a test on speed change w.r.t. p values.

As can be seen from the table below, the values are different from the 6% figure I previously mentioned. That 6% data was an approximation based on a couple of ranges I sieved on PIII-1000.

The test in the table below was conducted on 10m p ranges, for the p values shown in the table, each one starting from the exact T value up to T+10m. The machine used in the test is, again, P4-1700, and the client used is SoBsieve v.1.28.

Nuri
05-01-2003, 05:33 PM
5. The rate at which factors will be found by sieving at various p levels - I think we can estimate this from Merten's Theorem and the number of remaining candidates.

Wblipp, could you please also check the 40000 = y * (x^1.06) formula I am using in
Sieve: Distribution of Factors vd. p value (http://www.free-dc.org/forum/showthread.php?s=&threadid=2442&perpage=40&pagenumber=2) thread against Merten's theorem? The formula is for the 3m-20m sieve. y = # factors per G and x = p level in terms of G.

PS: Please also note that the formula also includes duplicates. So, I don't know how much, but if the data were to use unique factors only, the formula would probably tilt a bit more downwards (but probably not much).

wblipp
05-01-2003, 10:06 PM
9. The increase in resources during periods of no new primes
Moo has asked a question on the prediction thread that got me thinking long term about the computer power on the project. Over the scale of years, Moore’s Law says that computing power per dollar should double every eighteen months. Many of us are taking a Moore’s Law Dividend by spending less money on computers over time, but still the power per user should double every 24 months or so. So even if the number of people stays flat, the number of “standard PC’s” should double every 24 months. Measuring time in days, this is approximately

dc/dt = 0.001c

c(t) = c0*e^(0.001t)


4b. The rate at which … (sieving) will change with p.
Wow! What a thorough and well documented job Nuri did on this! Thanks! Based on Nuri’s speedup spreadsheet, the rate at which p’s will be cleared is proportional to p^0.03.


4a. The rate at which a "standard PC" can sieve at the present level.
Thanks to priwo, jimmy, and Nuri we have some new data about this, too.

For P3’s and AMDs
Joe O.  450 MHz  55 Kp/sec
alexr  600 MHz  75 Kp/sec
Slatz  1 GHz  116 Kp/sec
Nuri  1 GHz  126 Kp/sec
priwo  1.56 GHz  200 Kp/sec
priwo  1.3 GHz  148 Kp/sec
jimmy  1.8 GHz  182 Kp/sec

Standardized to 1Ghz, these numbers are 122, 125, 116, 126, 128, 100, 113. The median is 122 – I’m comfortable sticking to the 120Kp/sec assumption.

This is about 10Gp/day. We need the constant of proportionality so that at 15T the sieving rate for a standard 1Ghz 24/7 P3 is 10Gp/day – c*(15T)^0.03 = 10G – the constant is about 4G.


For P4’s
priwo  2.44 GHz  106 Kp/sec
priwo  1.99 GHz  86 Kp/sec
Nuri  1.70 GHz  70 Kp/sec

Standardized to 1Ghz, these numbers are 43, 43, and 41. So the P4’s are slower on sieving by a factor of 2.8.

Let s(t) be the proportion of standard computers that are dedicated to sieving, and set aside the P3 vs P4 question for the moment. The number of computers at time t is c(t), so the number of sieving computers is s(t)*c(t). The p level is p(t), so the rate at which each computer clears p’s is 4G*p(t)^0.03. All together

dp/dt = s(t)*c(t)*4G*p(t)^0.03

Any comments, complaints or corrections? When we have the computers (c(t)) and sieving process (p(t)) in good shape, we will move on to the issues around the number of candidates and the rate at which sieving and primality testing remove them.
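Here is a rough C sketch of this sieving equation (a toy illustration, not project code; c0 = 1000 machines, s = 0.02 and a starting depth of 15T are assumptions, not measured values) that integrates dp/dt forward together with the Moore's-Law growth in c(t):

#include <stdio.h>
#include <math.h>

int main(void)
{
    const double c0 = 1000.0;   /* assumed standard PCs at t = 0      */
    const double s  = 0.02;     /* assumed fraction of PCs sieving    */
    double p = 15e12;           /* assumed current sieve depth (~15T) */
    const double dt = 0.1;

    for (double t = 0.0; t < 365.0; t += dt) {
        double c = c0 * exp(0.001 * t);          /* machines at time t      */
        p += s * c * 4e9 * pow(p, 0.03) * dt;    /* p-values cleared per dt */
    }
    printf("sieve depth after one year: %.1fT\n", p / 1e12);
    return 0;
}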

wblipp
05-01-2003, 11:22 PM
Originally posted by smh
in general, a test on an n twice as large will take more than 4 times longer (maybe 5 times?).

More than twice as long per iteration for the larger FFT size, and twice as many iterations need to be performed.
This article (http://numbers.computation.free.fr/Constants/Algorithms/fft.html#e.numericalerror) says the complexity of multiplying 2 N-digit numbers is N(log(N))^2. We have n-bit numbers, but that shouldn't affect the argument. We still need to account for the n iterations, so using his academic argument, the time for a primality test would go up as n^2*(log(n))^2. Using your "5 times" number, the work would go up as n^2.32. The difference is probably smaller than our uncertainty. Here's a short table of how much more work each test takes at various n levels, compared to n=3.5M

n=7M    academic 4.4    n^2.32 5.0
n=20M   academic 41     n^2.32 57
n=50M   academic 282    n^2.32 480
n=200M  academic 5K     n^2.32 12K
n=1G    academic 154K   n^2.32 504K

So even at the 1G level, the two approaches differ by a factor of 3.3 when the extra work is a factor of at least 150,000.

Does anyone have strong feelings, or better yet strong arguments, for one of these approaches?
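Either scaling rule is easy to tabulate; this short C sketch (my own check of the table above, nothing more) reproduces both columns relative to n=3.5M:

#include <stdio.h>
#include <math.h>

static double academic(double n) { double x = n * log(n); return x * x; }  /* n^2*(ln n)^2 */

int main(void)
{
    const double base = 3.5e6;
    const double ns[] = { 7e6, 20e6, 50e6, 200e6, 1e9 };
    for (int i = 0; i < 5; i++) {
        double a = academic(ns[i]) / academic(base);  /* academic model  */
        double b = pow(ns[i] / base, 2.32);           /* "5 times" model */
        printf("n=%5.0fM  academic %9.0f   n^2.32 %9.0f\n", ns[i] / 1e6, a, b);
    }
    return 0;
}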

Lagardo
05-02-2003, 01:04 PM
Originally posted by wblipp
{...}
jimmy  1.8 GHz  182 Kp/sec
{...}

Dunno how you're counting that, but he wrote "Athlon XP 1800+", which runs at less than 1.8 GHz (something like 1530 MHz? Thereabouts?) That would bring him up from 100 to 118, more in the range with the others...

[Edit: And while I'm thinking about it, the "smart resource allocation" scheme should really do two different types of math for P4s and everything else -- i.e. should avoid the notion of a "standard PC". There may well come a point where the smart thing to do is putting P4s on PRP while keeping everything else sieving or such.]

Troodon
05-02-2003, 01:43 PM
The Athlon XP 1800+ runs at 11.5 x 133 MHz = 1533 MHz.


There may well come a point where the smart thing to do is putting P4s on PRP while keeping everything else sieving or such.

I think Athlons and Athlons XP with DDR memory should also be on PRP (they aren't as efficient as P4's, but much more than P3's).
In the near future, same for Opterons and Athlons 64 (which should be even more efficient than K7 cores, as they have SSE2, a faster memory subsystem and in some cases, a larger L2 cache).

wblipp
05-02-2003, 02:54 PM
Originally posted by Troodon
The Athlon XP 1800+ runs at 11,5x133 MHz=1533 MHz.
Originally posted by Lagardo
the "smart resource allocation" scheme should really do two different types of math for P4s and everything elseThanks for the correction. Knowing the correct speed for the Athlon XP 1800+ makes the standardized result 119 instead of 100. Fixing a transposition error and a roundoff error, the seven standardized performance numbers are now 122, 125, 116, 126, 128, 114, 119. The median is still 122, but the spread has narrowed.

Yes, I agree that we are going to need to make separate decisions about the P4s and the other processors because they bring different relative skills to the two tasks. I've been trying to track that difference by thinking that a standard PC is a 1 GHz P3, and recognizing that a P4 is 2.8 times slower than standard at sieving. Priwo's data suggests that we may need to track memory speed instead of processor speed for primality testing, so "standard" may become even less useful.

Moo_the_cow
05-02-2003, 07:29 PM
Main sieve (3M-20M) CPU benchmark-

P-III 800 Mhz - 93.5 kp/sec

Hope this helps....

wblipp
05-03-2003, 08:50 PM
Merten's Theorem (http://primes.utm.edu/glossary/page.php?sort=MertensTheorem) is that the product of (1-1/p) for all primes less than x is asymptotically exp(-gamma)/ln(x). This is useful for estimating the effectiveness of trial factoring because the product is the fraction of numbers left after removing primes less than x. This is potentially useful in our resource model for what I have been thinking of as two distinct issues – the number of candidates presently in the system and the rate at which sieving will remove these candidates.

We will need to modify Merten’s theorem by Proth Weights. We need Proth Weights because our sequences have divisibility properties that differ from the divisibility of sequential integers in standard ways. For example, in sequential integers, ½ are divisible by 2, then 1/3 of the remainder are divisible by 3, then 1/5 of the remainder are divisible by 5. For our remaining k values, none of the numbers we are testing for prime are divisible by 2, ½ are divisible by 3, ½ of the remainder are divisible by 5. It gets even stranger with 7 – for most of our k values 1/3 of the remainder are divisible by 7, but for two of the k values, none of the remainder are divisible by 7. I explored these divisibility properties by constructing partial coverings in the prediction thread (http://www.free-dc.org/forum/showthread.php?s=&threadid=2884), but better estimates of the Proth Weight are contained in Table 5 of Yves Gallot’s paper on weights (http://perso.wanadoo.fr/yves.gallot/papers/weight.pdf).

We might apply Merten’s Theorem in absolute terms or in relative terms. It’s most elegant in relative terms. If we previously sieved up through p=S, the number of values left from a range of size N will be N*a / ln(S), where a is a constant calculated as the product of the Proth Weight and exp(-gamma). If we now extend the sieve through p=T, there will now be N*a / ln(T) left. The reduction factor is ln(S)/ln(T). This has some simple consequences. For example, the main sieve is now around 15T. This predicts that if we extend that sieve to 200T, the number of candidates will reduce by a factor of ln(15T)/ln(200T) = 0.92 – about 8% of the remaining candidates will be removed. It also predicts that removing half the candidates requires squaring the sieve limit.

This sounds good, but does it work? Let’s see how well it predicts the current status. We can check the status of sieving in the Sieve Coordination Thread (http://www.free-dc.org/forum/showthread.php?s=&threadid=2406). When I checked, the sieving was nearly complete through 15.35T, there was a completed block from 17.42T-18.14T, and some scattered ranges even higher. The effect of the higher block should be ln(17.42T)/ln(18.14T)=0.9987, so we can ignore that. Using Yves Gallot’s Proth Weights, the average weight for the twelve remaining k values is 0.163. The value 0.163 * exp(-gamma) / ln(15.35T) = 0.00302. In each range of 1 million n’s, from the 12 million original candidates, this method predicts that on average there will be 36,245 remaining candidates. Looking at MikeH's Sieve Statistics (http://www.aooq73.dsl.pipex.com/scores.txt) for the main sieve (n=3M to 20M), we see that this estimate is about 1% high:

3M<n< 4M: 35900
4M<n< 5M: 35784
5M<n< 6M: 36124
6M<n< 7M: 35839
7M<n< 8M: 35866
8M<n< 9M: 36026
9M<n<10M: 35862
10M<n<11M: 35856
11M<n<12M: 36015
12M<n<13M: 35815
13M<n<14M: 36157
14M<n<15M: 35699
15M<n<16M: 35731
16M<n<17M: 35990
17M<n<18M: 35851
18M<n<19M: 36028
19M<n<20M: 35905

These average 35909. If we lower the Proth Weight estimate to 0.1618 we match very closely. Yves Gallot only reported the Proth Weights to 2 digits, so an adjustment of this magnitude seems reasonable.
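The Merten/Proth estimate is a one-liner to reproduce; here is a C sketch of it (just a check of the numbers above, using the 0.163 average weight and the 15.35T sieve depth quoted in this post):

#include <stdio.h>
#include <math.h>

#define EULER_GAMMA 0.5772156649

int main(void)
{
    double weight = 0.163;     /* average Proth weight from Gallot's table */
    double sieve  = 15.35e12;  /* sieve depth                              */
    double frac   = weight * exp(-EULER_GAMMA) / log(sieve);
    printf("surviving fraction:           %.5f\n", frac);        /* ~0.0030 */
    printf("candidates per 1M n, 12 k's:  %.0f\n", frac * 12e6);  /* ~36,200 */
    return 0;
}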

Merten’s theorem with the adjusted average proth weight seems well suited for calculating the present number of candidates. The remaining question is how well it describes the incremental process of sieving. Fortunately Nuri has excellent data on how that process has been going:

Originally posted by Nuri
Wblipp, could you please also check the 40000 = y * (x^1.06) formula I am using in
Sieve: Distribution of Factors vd. p value (http://www.free-dc.org/forum/showthread.php?s=&threadid=2442&perpage=40&pagenumber=2) thread against Merten's theorem? The formula is for the 3m-20m sieve. y = # factors per G and x = p level in terms of G.

Between n=3M and n=20M for twelve k-values, there are 12*17*10^6 n values before sieving. Merten’s Theorem predicts that after sieving to “p” in G, the number of candidates left will be 12*17*10^6 * 0.1618 * exp(-gamma) / ln(p*10^9). The difference between this value at “p” and at “p+1” represents the numbers sieved away, and every one of the sievings represents a factor. The following is a plot of the number of factors found using Nuri’s function and using the Merten prediction. Blue is Nuri and red is Merten. The agreement is excellent. I’m convinced we can use Merten’s Theorem to model the sieving.
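As a numerical version of that comparison, here is a C sketch (my cross-check, not Nuri's spreadsheet) that prints both curves - Nuri's fit y = 40000/x^1.06 and the factors-per-G implied by Merten's Theorem - at a few p levels:

#include <stdio.h>
#include <math.h>

#define EULER_GAMMA 0.5772156649

int main(void)
{
    /* candidates(p) = C / ln(p) for the 3M-20M range, twelve k's, weight 0.1618 */
    double C = 12.0 * 17e6 * 0.1618 * exp(-EULER_GAMMA);
    for (double xG = 10.0; xG <= 100.0; xG *= 2.0) {
        double p = xG * 1e9;
        double merten = C / (xG * log(p) * log(p)); /* -d/dx of C/ln(x*1e9), per G */
        double nuri   = 40000.0 / pow(xG, 1.06);
        printf("p = %5.0fG   Merten %6.0f   Nuri %6.0f factors/G\n", xG, merten, nuri);
    }
    return 0;
}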

wblipp
05-06-2003, 09:58 PM
I have proposed running the resource allocation model for the period of finding the next three primes because running this kind of model for exactly one period sometimes leads to false “edge effects” where the model says to do strange things because the model thinks the world ends at the first stage. By running for three stages, these edge effects are usually pushed into the third stage, and the results are more realistic for planning the first stage. When the first stage ends (finding the next prime in our case), the model should be rolled out to look at stages 2, 3, and 4 to make plans during stage 2.

To run the model in later stages, we need to know what parts of the model change after we have found a prime. There are three effects.

1. c(t) – the computing power available at time t. I have proposed that this power increases by 20% every time we find a prime (Item 8). There hasn’t been any corrective input about this guess yet. This would be in addition to the slow steady growth I have proposed for Moore’s Law.
In stage 1 c(t) = c0*e^(0.001t)
In stage 2 c(t) = c0*1.2*e^(0.001t)
In stage 3 c(t) = c0*1.2^2*e^(0.001t)

2. p(t) – the sieving limit increases faster because we are testing fewer k’s. As Mike discussed here, (http://www.free-dc.org/forum/showthread.php?s=&threadid=2870) the sieve speed increase is less than linear. One explanation is that there is an overhead equivalent to three k-values. So removing three k-values makes the sieve run (12+3)/(9+3) = 1.25 times as fast, 25% faster. Using this approach, the sieving speed will increase by a factor of 15/14 in stage 2 and 15/13 in stage 3.

3. Remaining Candidates. There will be 11, then 10, candidates left to test, so the primality testing will proceed faster. How much faster depends on which k-values we find primes for. The k’s with higher Proth Weights are more likely to be found first because they have more opportunities. That is, every remaining candidate is equally qualified as a number with no divisors under 15T, but the k’s with high Proth Weight have more of these candidates than the ones with low Proth Weight. Since their likelihood of removal is proportional to their weights, we can calculate the average weight removed by taking a weighted average. The average weight for the next prime discovered is 0.19.

The average weight for the next two primes can be calculated by the same weighted average method, but now there are 66 pairs of k’s that might be the next two. For each pair we can calculate the weight of that pair – the sum of their Proth Weights – and the likelihood of that pair – the product of their Proth Weights. The weighted average for the next two primes discovered is 0.3767.

So the total Proth Weight in Stage 1 will be 1.96, in Stage 2 will be 1.77, and in Stage 3 will be 1.5833. These values were all calculated using Yves Gallot’s Proth Weights, so we will reduce them by about 1% as discussed in the post about Merten’s Theorem, so the actual values are 1.9416, 1.7532, and 1.5685.
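The pair calculation is easy to get wrong by hand, so here is a C sketch of the weighted-average method (the twelve weights are placeholders chosen only to sum to 1.96; they are not Gallot's published values):

#include <stdio.h>

int main(void)
{
    /* hypothetical Proth weights for the 12 remaining k values (placeholders) */
    double w[12] = { 0.10, 0.12, 0.13, 0.14, 0.15, 0.16,
                     0.17, 0.18, 0.19, 0.20, 0.21, 0.21 };
    double total = 0.0;
    for (int i = 0; i < 12; i++) total += w[i];

    /* next prime: weight removed, averaged with likelihood proportional to weight */
    double num1 = 0.0;
    for (int i = 0; i < 12; i++) num1 += w[i] * w[i];

    /* next two primes: 66 pairs, likelihood ~ w[i]*w[j], weight removed w[i]+w[j] */
    double num2 = 0.0, den2 = 0.0;
    for (int i = 0; i < 12; i++)
        for (int j = i + 1; j < 12; j++) {
            den2 += w[i] * w[j];
            num2 += w[i] * w[j] * (w[i] + w[j]);
        }

    printf("total weight:                               %.4f\n", total);
    printf("expected weight removed by next prime:      %.4f\n", num1 / total);
    printf("expected weight removed by next two primes: %.4f\n", num2 / den2);
    return 0;
}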

wblipp
05-08-2003, 11:04 PM
MikeH reported that an Athlon XP2100+ completes a primality test in 3 days at n=3.5M. I think this is equivalent to a 1Ghz machine completing in 5 days. Using the academic scaling, a test at other values of “n” will take 5*[(n*ln(n))/(3.5M*ln(3.5M))]^2 days to complete, ignoring the lumpiness of the FFT transitions. At time t the sieving will have progressed to p(t), so based on Merten’s Theorem, the number of remaining candidates per n-step will be 1.9416 * exp(-gamma) / ln(p(t)), where 1.9416 is the total Proth Weight for all remaining candidates.

Using, as before, c(t) for the total equivalent computing power available to the project and s(t) for the fraction of power devoted to sieving, we get the relationship

dn/dt = c(t) * (1-s(t)) / [5 * [(n*ln(n))/(3.5M*ln(3.5M))]^2 * 1.9416 * exp(-gamma)/ln(p(t))]

Estimates elsewhere indicate about 1000 computers, with about 2% sieving.
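Putting the pieces of this thread together, here is a C sketch of the coupled model (an illustration only, not wblipp's actual calculation; the starting points p = 15T and n = 3.5M are rough assumptions, while the 1000 machines, 2% sieving, 4G sieve constant and 5 GHz-days per test come from the posts above):

#include <stdio.h>
#include <math.h>

#define EULER_GAMMA 0.5772156649

int main(void)
{
    const double c0 = 1000.0, s = 0.02;     /* machines and sieving fraction  */
    double p = 15e12;                       /* assumed sieve depth            */
    double n = 3.5e6;                       /* assumed current test level     */
    const double W = 1.9416;                /* total Proth weight, twelve k's */
    const double base = 3.5e6 * log(3.5e6); /* normalisation point n = 3.5M   */
    const double dt = 0.1;

    for (double t = 0.0; t < 365.0; t += dt) {
        double c = c0 * exp(0.001 * t);
        p += s * c * 4e9 * pow(p, 0.03) * dt;                    /* sieving     */
        double days_per_test = 5.0 * pow(n * log(n) / base, 2.0);
        double cand_per_n    = W * exp(-EULER_GAMMA) / log(p);
        n += c * (1.0 - s) / (days_per_test * cand_per_n) * dt;  /* prp testing */
    }
    printf("after one year: p = %.1fT, n = %.2fM\n", p / 1e12, n / 1e6);
    return 0;
}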

Comments, complaints, or corrections?

Troodon
05-09-2003, 09:40 AM
Originally posted by wblipp
MikeH reported that an Athlon XP2100+ completes a primality test in 3 days at n=3.5M. I think this is equivalent to a 1Ghz machine completing in 5 days.

Please note that a 1 GHz Athlon is faster than a 1 GHz Pentium III. And a 1 GHz Pentium III is faster than a 1 GHz C3. So we can't speak about a "1 GHz machine".
For k=10223 n=3461705, my Tualatin@1420 MHz (142 MHz FSB) needs nearly 160 CPU hours (~6.5 days)!

I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.

wblipp
05-09-2003, 10:32 AM
Originally posted by Troodon
I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.

Great! I love clarity. Do we know how long this baseline machine needs to complete a primality test near n=3.5M?

Troodon
05-09-2003, 04:37 PM
I think it would need at least 260 hours. Yes, 100 hours more than my Tualatin because:
- it's 420 MHz slower.
- the FSB and RAM speeds are 42 MHz slower.
- Coppermines don't have prefetch logic.
- The i815 chipset is a bit slower than the BX chipset I'm using.
- The n is higher (+50000).

Lagardo
05-09-2003, 04:59 PM
Maybe the "baseline machine" should be something that someone here actually has at their disposal? So we could run a PRP and see how long it takes?

"1G coppermine" sounds like a pretty highfalutin' chip for its time. Contrary to, say, "1G Duron" or "1.2G Athlon", of which there were uncounted bazillions (at least amongst the people I know). Somewhat more modern equivalents would probably be either a P4-2.4 or an XP1700 or some such.

Troodon
05-09-2003, 07:44 PM
Maybe the "baseline machine" should be something that someone here actually has at their disposal? So we could run a PRP and see how long it takes?

I'll set the Tualatin at 100x10 MHz to see how long it takes to complete a unit of the current test I'm running (n=3.46M).


"1G coppermine" sounds like a pretty highfalutin' chip for its time. Contrary to, say, "1G Duron" or "1.2G Athlon", of which there were uncounted bazillions (at least amongst the people I know).

I said a Coppermine, not because it's a very popular chip, but because I think it is a good baseline (not too fast, and not too slow). Don't forget that there are many people here with machines slower than 1 GHz.

wblipp
05-12-2003, 10:51 AM
New developments on the Sieving Client thread indicate a 35% increase or so in sieving speed with version 1.31. The numbers are coming in around 160Kp per Gigahertz instead of the 120 we have found in this thread. This will increase the constant in the sieving equation from 4G to 5.6G.

The last parameter we need some agreement about is the time to complete a primality test in the vicinity of n=3.5M for a "standard" machine. This seems to have bogged down in the mire of "what is standard." I've been hoping that a simple approximation based on processor speed scaled to 1Ghz would be good enough - that seemed to work well for sieving speed with the caveat that if we consider putting P4's into sieving we need to remember that they are 2.8 times slower than "standard."

So far we've got estimates that range from 5 days to 11 days for a 1Ghz matchine to finish a primality test. If I don't get more information soon, I'll pick "8 days" as a central estimate. I'm eager to start playing with the model and see if offers insignts into the appropriate level of sieving in the short term and long term.

MikeH
05-12-2003, 12:20 PM
So far we've got estimates that range from 5 days to 11 days for a 1Ghz matchine to finish a primality test. If I don't get more information soon, I'll pick "8 days" as a central estimate. I have a P3-850 which completed a PRP test where n~3.50M in 5.6 days.

So scaled for that average 1GHz PC, I guess that would be 4.75 days.

Troodon
05-12-2003, 01:54 PM
5.6 days?! That's too fast. Or maybe something is wrong with my Tualatin. Currently, it does ~82000 cEM/s and needs ~50 minutes per unit.

Lagardo
05-12-2003, 02:56 PM
[11:46] User [ice:~/software/sob] > grep name /proc/cpuinfo
model name : AMD Athlon(tm) XP 1900+
[11:47] User [ice:~/software/sob] > grep MHz /proc/cpuinfo
cpu MHz : 1569.350
[11:47] User [ice:~/software/sob] > grep "blocks left" sclient.log | tail
[Mon May 12 08:35:06 2003] n.high = 2494915 . 55 blocks left in test
[Mon May 12 08:55:56 2003] n.high = 2514560 . 54 blocks left in test
[Mon May 12 09:16:45 2003] n.high = 2534205 . 53 blocks left in test
[Mon May 12 09:37:37 2003] n.high = 2553850 . 52 blocks left in test
[Mon May 12 09:58:39 2003] n.high = 2573495 . 51 blocks left in test
[Mon May 12 10:19:35 2003] n.high = 2593140 . 50 blocks left in test
[Mon May 12 10:40:31 2003] n.high = 2612785 . 49 blocks left in test
[Mon May 12 11:01:27 2003] n.high = 2632430 . 48 blocks left in test
[Mon May 12 11:22:22 2003] n.high = 2652075 . 47 blocks left in test
[Mon May 12 11:43:18 2003] n.high = 2671720 . 46 blocks left in test
[11:47] User [ice:~/software/sob] >


So on an Athlon at least I get 10 blocks in 3h:8m:12s. For n=3.5M (that's about 172 blocks) or just under 54 hours for the whole test. Or, scaled to 1GHz, about 3 days 12.75 hours for a full test on a 1GHz Tbred.

MikeH
05-12-2003, 03:08 PM
[Sun Apr 20 19:40:51 2003] n.high = 61050 . 170 blocks left in test
[Sun Apr 20 20:26:32 2003] n.high = 81400 . 169 blocks left in test
[Sun Apr 20 21:12:09 2003] n.high = 101750 . 168 blocks left in test
[Sun Apr 20 21:57:47 2003] n.high = 122100 . 167 blocks left in test
[Sun Apr 20 22:43:25 2003] n.high = 142450 . 166 blocks left in test
[Sun Apr 20 23:29:05 2003] n.high = 162800 . 165 blocks left in test

This is the log from the P3-850 quoted above (it's nothing special, a Dell Latitude laptop). As you can see ~46 minutes per block, 173 blocks for this test (n=3504985), so that's just over 5.5 days.

Mystwalker
05-12-2003, 03:16 PM
Problem should be the memory and front side bus. The P3 likely uses PC100 SDR SDRAM, whereas the Athlon should use DDR SDRAM - I guess PC266?

AFAIK, this makes a big difference...

Plus, Lagardo used the real CPU frequency, not the Quantispeed value. The Palomino Athlon was basically on par with the Coppermine P3, but the XP-versions provide more bang for the buck .... eh... instructions per second.
I think one would have to average the standard PC over all the different PC architectures to be thorough. But maybe that's too much effort? I don't know...

Troodon
05-12-2003, 03:20 PM
[Mon May 12 14:57:03 2003] n.high = 2357406 . 53 blocks left in test
[Mon May 12 15:49:43 2003] n.high = 2378268 . 52 blocks left in test
[Mon May 12 16:40:19 2003] n.high = 2399130 . 51 blocks left in test
[Mon May 12 17:31:03 2003] n.high = 2419992 . 50 blocks left in test
[Mon May 12 18:21:38 2003] n.high = 2440854 . 49 blocks left in test
[Mon May 12 19:11:37 2003] n.high = 2461716 . 48 blocks left in test


MikeH's Pentium III 850 is a bit faster than my Tualatin@1420 and Lagardo's Athlon XP at the same speed would be twice as fast. So there's a problem. Could anyone test SB on a Tualatin, please?

Joe O
05-12-2003, 03:39 PM
Wasn't the cache on the Tualatin 256K? Whereas the cache on the older PIII was 512K.

Mystwalker
05-12-2003, 04:05 PM
It's exactly the other way - when Intel moved to the Tualatin core, they used the 0.13 (?) micron process, which gave them more space for a bigger L2 cache...

Troodon
05-12-2003, 04:15 PM
It isn't a cache size-related problem. My Tualatin and Lagardo's and MikeH's CPU's all have 256 kB of on-die full speed L2 cache.

Here's a little table I've made for you ;)

CPU | L1 Cache | L2 Cache
----------------------------------------------------------------------------------
Pentium II (Klamath and Deschutes) | 32 kB | 512 kB, external (64 bit bus), 1/2 core speed
Pentium III (Katmai) | 32 kB | 512 kB, external (64 bit bus), 1/2 core speed
Pentium III (Coppermine) | 32 kB | 256 kB, on-die (256 bit bus), full core speed
Pentium III/Celeron (Tualatin) | 32 kB | 256 kB, on-die (256 bit bus), full core speed
Pentium III-S (Tualatin) | 32 kB | 512 kB, on-die (256 bit bus), full core speed

Pentium 4 (Willamette) | 8 kB | 256 kB, on-die (256 bit bus), full core speed
Pentium 4 (Northwood) | 8 kB | 512 kB, on-die (256 bit bus), full core speed
Pentium 4 (Prescott) | 8 kB | 1024 kB, on-die (256 bit bus), full core speed

Celeron (Covington) | 32 kB | None
Celeron (Mendocino) | 32 kB | 128 kB, on-die (64 bit bus), full core speed
Celeron (Coppermine) | 32 kB | 128 kB, on-die (256 bit bus), full core speed
Celeron (Willamette/Northwood) | 8 kB | 128 kB, on-die (256 bit bus), full core speed

Athlon (K7/K75) | 128 kB | 512 kB, external (64 bit bus), 1/2 or 2/3 core speed
Athlon/ Athlon XP (Thunderbird, Palomino, Thoroughbred A/B) | 128 kB | 256 kB, on-die (64 bit bus), full core speed
Athlon XP (Barton) | 128 kB | 512 kB, on-die (64 bit bus), full core speed

Duron (Spitfire/Morgan) | 128 kB | 64 kB, on-die (64 bit bus), full core speed

Lagardo
05-12-2003, 04:17 PM
Originally posted by Mystwalker
Plus, Lagardo used the real CPU frequency, not the Quantispeed value. The Palomino Athlon was basically on par with the Coppermine P3, but the XP-versions provide more bang for the buck .... eh... instructions per second.

Granted -- if I think of it as "1900MHz" instead of the ('real' number) "1569" then it'd scale to 4 days 6.5 hours at 1GHz even. Of course there never was an "Athlon XP1000" or such.

Mystwalker
05-12-2003, 04:23 PM
Troodon:

The P3-m has 512 KByte L2 cache.

(I just checked it with wcpuid to make sure ;))

Troodon
05-12-2003, 05:52 PM
Mystwalker: yes and no.
When it appeared, the name "Pentium III-M" was used in several places (not officially) to refer to the 256 kB version of the desktop Tualatin and differentiate it from the 512 kB version, the "Pentium III-S". That's why I made that mistake. Currently, Intel refers to Tualatins as "Pentium III 256KB L2 Cache" and "Pentium III with 512KB L2 Cache".
You're speaking about the mobile version of the Tualatin also called "Pentium III-M". And AFAIK, all mobile Tualatins have 512 kB of L2 cache.
Please note that my table is only about desktop processors.

Mystwalker
05-12-2003, 06:04 PM
Troodon: Ah, ok! :)

Joe O
05-14-2003, 12:00 AM
in this thread. (http://www.free-dc.org/forum/showthread.php?s=&postid=27991#post27991)


1.28 203kp/s
1.33 254kp/s

on a P3-Celeron(Tualatin) with 1500 MHz, 128 MB SDRAM, alpha=2.5




n=3673052 k=21181
cEMs/sec=155,000 time left (for full test)=91 hours 5 minutes

wblipp
05-14-2003, 02:54 PM
on a P3-Celeron(Tualatin) with 1500 MHz, 128 MB SDRAM,

n=3673052 k=21181
cEMs/sec=155,000 time left (for full test)=91 hours 5 minutes

priwo uses about 5.7 Ghz days at n=3.5M


This is the log from the P3-850 quoted above (it's nothing special a Dell latitude laptop). As you can see ~46 minutes per block, 173 blocks for this test (n=3504985), so that's just over 5.5 days.

MikeH uses about 4.7 Ghz days at n=3.5M




Lagardo uses about 3.5 Ghz days at n=3.5M

Most of the variation is probably memory speed. 5Ghz days looks like a conservative round number averaged over machines.

The numbers for the 31337 tests around 13.5M are coming in less than our scaling model would suggest, although this may be related to break points in the FFT scaling. Our scaling model suggests a factor of {[13.5*10^6*ln(13.5*10^6)]/[3.5*10^6*ln(3.5*10^6)]}^2 = 17.7, so an elite test should take 88 Ghz days.
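For reference, the GHz-day bookkeeping and the 13.5M scaling factor are easy to reproduce; this C sketch just redoes that arithmetic with priwo's figures (91 hours 5 minutes at 1.5 GHz), nothing more:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double hours = 91.08, ghz = 1.5;  /* priwo's Tualatin figures quoted above */
    printf("%.1f GHz-days at n ~ 3.5M\n", hours / 24.0 * ghz);          /* ~5.7 */

    /* academic scaling from n = 3.5M to n = 13.5M */
    double f = pow((13.5e6 * log(13.5e6)) / (3.5e6 * log(3.5e6)), 2.0);
    printf("scaling 3.5M -> 13.5M: x%.1f, so about %.0f GHz-days\n", f, 5.0 * f);
    return 0;
}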


Originally posted by OberonBob
ok, on my PII 350 here is what I am seeing:

~ 152000 cEMs/sec

155.8 days :)
54.5 Ghz days



Originally posted by OberonBob
alrighty, I am running an AMD 1900+ here is what I am seeing.

k: 21181
n: 13466972

cEMs/sec: ~680000

# of blocks: 9773

average block time: 5:42

est. time to complete: 3342366 secs or 38.7 days
figuring this as 1.5 Ghz , it's 58 Ghz days.

The discrepancy may indicate a problem in our scaling estimate, or it may be variation caused by different processors and lumpiness in the FFT size transitions.

OberonBob
05-14-2003, 03:36 PM
An AMD Athlon XP 1900+ runs at 1600 MHz, FYI.

Troodon
05-16-2003, 11:04 AM
I've found the problem: the In-order queue depth option was set to 1. If I set it to 8, it works ~2X faster!
Now, I think a good estimate would be 5 or 6 GHz days.

Nuri
05-19-2003, 03:28 PM
So, wblipp, what's the result?

Sorry if you have already mentioned it and I could not understand. If so, could you please rephrase it? If not, could you please inform us on what other figures you need?

Thx in advance.

wblipp
06-04-2003, 11:25 AM
Originally posted by Nuri
So, wblipp, what's the result?

The major results are that within wide bounds the sieving level doesn’t matter much and that our model has failed to capture an important aspect of the system. More on this failure and what I'm going to do about it, later.

I updated some of the data and started working through questions. The simplest questions went fine.

For the update, we will soon be testing n=3.8 million, and we have sieved through about 23*10^12. Assuming no prime found by 3.8 million, the 50th percentile points for the next three primes are increased to 4.85 million, 7.10 million, and 16.5 million.

The simplest question is "How long will it take assuming we continue with 2% of resources dedicated to sieving? The answer is that will reach the 50th percentile milestones in 255, 784, and 2625 days.

The next simple question is "If we continue with a fixed percentage of resources dedicated to sieving, what is the optimal fixed percentage?" The answer is that 2.7% of resources would make these numbers 256, 783, and 2624.

The next simple question is where the problems show up. It seems intuitive to many of us that as sieving proceeds to higher levels it becomes less effective, so the resources devoted to sieving should decline over time. One way to achieve that would be to have the percentage of resources devoted to sieving set at some level today, and to decline as the sieving progresses, perhaps as a negative power of the sieving level. The next simple question is “If the percentage of sieving resources is a function of the form a/(SieveLevel)^b, what values of a and b are optimal?” Functions of this form can be greater than 100% in the early times, so we must interpret these values as 100% sieving. Following the search path, we soon notice that something strange is going on. We soon move into the range where the sieving starts at 100%, but we keep finding slightly better solutions at endlessly higher and higher powers. After a while it becomes necessary to stop mindlessly cranking numbers and actually think about what is going on.

Here’s what is going on. The question we are really asking the model is “What is the fastest way to get to n=16.5 million?” We know the third prime isn’t really going to be at exactly 16.5 million – this is just the midpoint of a huge range of numbers where it might show up. But it seemed like a good proxy for the real question, and for the earlier primes it gave us an easy way to adjust the model for the changes that happen when we find primes. But think for a moment about the best way to get to any fixed value of n. You would sieve some and test some. One thing that would slow you down would be to test a number that is later removed by sieving, so you would want to avoid that. To avoid that loss of efficiency, you would want to do all the sieving first, and then follow it by all the testing. That's what the functions are trying to achieve! At any power, the sieving is 100% for a while and then drops off rapidly. As we progress to higher and higher powers, the drop off comes faster and faster, getting closer and closer to this optimal way to reach a fixed number. The limit is that the fastest way to reach n=16.5 million is to sieve through 5.8*10^15 and then test the remaining n values. This approach will reach the three milestones in 474, 880, and 2609 days.

The first thing to observe is that within wide bounds the level of sieving doesn’t make much difference. Most of the work is going to be testing, and it’s going to take as long as it takes. Optimizing our sieving for reaching n=16.5 million will save 15 days out of 7 years.

I think the problem with the model is that it fails to capture the situation that the next prime may come very soon or it may be a long ways off. We want some resources testing in case it comes soon, and we want some resources sieving in case it is far off. The fixed milestone approach loses that balancing of uncertainty. I’m going to try changing the model by focusing on the next prime, and find the 10 mid-decile values – the values by which there is a 5% chance of finding the next prime, 15%, 25% … 95%. Then find the times to reach these 10 values and ask about the average of these times and how it changes with sieving.

Nuri
06-06-2003, 07:27 PM
wblipp, thanks for the response.

5.8*10^15 :shocked:

That's huge... 5,800T (or 5,800,000G). If I recall correctly, the largest figure pronounced so far at this forum was 400T. At levels that high, we would probably be finding a factor per 177G on average (or roughly one every two weeks, may change depending on the machine speed of course). I'm not sure, but I guess that's also the time to prp test a candidate at n=16.5m.

Just some brainstorming. I'd be glad if you could respond to the questions below.

A question: Does your model consider the distinction between factors found above/below the effective prp n level? (For ex: as of today, we're at n=3.8m, so effectively, roughly 95% of the unique factors found today are helping to reduce the number of candidates we will be testing in the future, and the remaining 5% eliminates candidates that were already tested. The same percentages would be 20% to 80% at n=16.5m, so if we were not concerned about double checking prp test runs at all, the limit for sieve depth would be where it finds unique factors at least five times faster than a prp test for 16.5m could be finished).

A second question: Does the model adjust for sieve client speed increase due to removal of one of the k values when a prime is found? (and of course, the decrease in the number of unique factors found per sieve range)

Anyway, I am curious about something else too. I guess your model can solve for this:

Let's assume that we are aiming to reach p=500T at sieve when n reaches 18m at PRP. Let's also assume that we have the ability to keep a fixed percentage of our resources at sieve.

If so, what would that fixed percentage be, and when would we be reaching that?

PS: 500T and 18m are arbitrary. I just assumed that 500T would be deep enough, and we would want to shift our sieving effort to 20m+ ranges as we approach 20m at prp. I'd be very happy if you could also look at some sensitivities around those two numbers.

Regards,

Nuri

wblipp
06-06-2003, 10:26 PM
Originally posted by Nuri
A question: Does your model consider the distinction between factors found above/below effective prp n level?

The model knows that only sieved values above the current test point increase the progress of testing. First the sieving differential equation determines the sieving level at time t. After that has been solved, the sieving level enters the differential equation for the progress of "n" via Merten's Theorem, which tells us how many "n's" we will increase for every test performed.


A second question: Does the model adjust for sieve client speed increase due to removal of one of the k values when a prime is found? (and of course, the decrease in the number of unique factors found per sieve range)

Yes. At each milestone the sieving gets faster because there are fewer k values to sieve and the testing gets faster because there are fewer k values to test.

Perhaps I should have written a summary for the model, but it's all in this thread someplace, often with several revisions as people pointed out errors and improvements.


Let's assume that we are aiming to reach p=500T at sieve when n reaches 18m at PRP. Let's also assume that we have the ability to keep a fixed percentage of our resources at sieve.

If so, what would that fixed percentage be, and when would we be reaching that?

That should be easy, but it may be a few days until I get to it.

jjjjL
06-07-2003, 03:39 AM
i think the underlying assumptions need to be reanalyzed when the model recommends delaying the discovery of the next prime by 100% (250 --> 500 days) in order to save two weeks over 7 years in the discovery of prime 8. this model ignores what we discussed earlier about how user growth is non-existent or negative in long periods without primes and huge afterwards.

i think the model shows two things:

1) making this model is hard ;)

2) whether we stop sieving now or sieve another 5000T, we won't dramatically change the rate at which primes are found.

this is both good and bad. it means that all sieving is worth doing, but also means that all prping is worth doing too. the majority of the work needs to be done with SB and I think we have a good balance of resources right now.

i think looking short-term may make analyzing it easier. if our goal was only to find the next prime and to ignore double-checking for the moment, that would mean that to compare sieve rates to prp rates, you'd compare the (weighted) average time it takes your computer to do a test between where we are and the next prime, and the number of candidates you can remove by sieving between now and the next prime.

since "the next prime" is an unknown value, lets use the prediction n=4.8M. so we need to check all numbers between 3.8-4.8M before we have a better than average chance of "next prime." remembering that about 20% (not positive on that figure, someone could calculate it more accurately) of the results are for numbers that already have a factor, that means that you should compare your prp rate to 80% of 1/17th (or 1/20th for dual) of your sieve production. So figure out how large a range you can sieve in the time you can do one prp test. divide the # of factors by 25, and if it's over 1, the sieving is still a short term benefit to the project. lets say that number falls below 1, does that mean it's no longer a benefit of the project? no, because we're discounting the factoring for DC and the factors found for n > 4.8M BUT it does mean that it is only a benefit for the project in the long term. i'd say when the ratio falls below 1, it makes sense to stop sieving and start prping. save the sieving until after a prime is found. at that point, the sieve will be faster and the ratio will be improved because "next prime" will be higher.

i believe the ratio is still over 1 right now, but i don't think it will stay that way much longer. i've just run through some numbers. it's like an economic marginal benefit analysis.

assumptions:
a factor for a number between the current test TOP bound(3.8)-4.8M is a perfect substitute for a prp test of a number n=current test TOP bound-4.8M.

trends:
factor density --> rapidly decreasing, exponentially
current test TOP bound --> increasing but at a slowing rate
test length time --> increasing, but sub-exp

my data points:
SB v1.10 and SoBSieve v1.34
1GHz athlon (100MHz fsb)

current sieve range = p ~ 25T
current test size = n ~ 3.8M

sieve speed = 160k p/s = 0.576G / hour
factor density = 35 factors / 42.2G ~ 0.830 factors / G

test size = n = 3.8M ~ 221 blocks
SB speed = 160k CEMs/s = 2 blocks / hour

therefore, test time = 221 / 2 = 110.5 hours / full test

so marginal analysis would ask "how much of the perfect substitute (factors in the pre-"next prime" window) could i have if I were to sieve with my 110.5 hours instead of do the test?"

since the factor density has already fallen to less than 1 factor / G, it turns out the total throughput for my machine is only 53 factors in the time I could do one test. correcting for the fact that 20% of those are duplicates and 95% of them are in irrelevant ranges (based on the "next prime" window), that means that only about 2 factors are expected to be immediately useful in finding the next prime.
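The same arithmetic as a small C sketch (just a restatement of the figures above, to make it easy to redo as the numbers change):

#include <stdio.h>

int main(void)
{
    double test_hours     = 110.5;  /* hours per prp test at n ~ 3.8M          */
    double sieve_rate     = 0.576;  /* G sieved per hour                       */
    double factor_density = 0.83;   /* factors per G at p ~ 25T                */
    double duplicate_rate = 0.20;   /* fraction of factors that are duplicates */
    double window_share   = 0.05;   /* fraction of factors landing in 3.8-4.8M */

    double factors = test_hours * sieve_rate * factor_density;
    double useful  = factors * (1.0 - duplicate_rate) * window_share;

    printf("factors found instead of one test: %.0f\n", factors); /* ~53 */
    printf("factors useful before next prime:  %.1f\n", useful);  /* ~2  */
    return 0;
}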

also remember that the window is decreasing in size the closer we get to the next prime. even worse, the factor density is dropping off like crazy. I may have a factor density of 0.83 at p=25T but just look at the sieve coordination thread where people have done high ranges and reported the number of factors found:

49950-50005 chvo [complete] 25 factors found = 0.45 factors / G
50005-50100 biwema [complete] 43 factors found = 0.45 factors / G

so if the project tests up to n=4M by the time we reach p=50T, the falloff in factor density would effectively turn further sieving from both a short AND long term benefit into simply a long term one.

the point i want to make is that even if you ignore all the intangible long term benefits of sieving such as the factors that prevent future double checks or the factors that prevent future prp tests beyond the immediate goal of the next prime you can still make a sorta "worst case" comparison where you show sieving in the dimmest possible light. since it can be shown that further sieving is good even then, it must still be good *with* the extra, hard-to-quantify benefits of sieving. heck, in a way, this almost proves the opposite of what i started off saying... that it doesn't really matter how much we sieve. if i had more faith that the n=4.8M number was a perfect prediction, it would mean that all resources should be taken off prp testing until we hit p=50T.

-Louie

jjjjL
06-07-2003, 04:01 AM
Originally posted by jjjjL
2) whether we stop sieving now or sieve another 5000T, we won't dramatically change the rate at which primes are found.

just to reinforce this point some more, let's say that we did sieve up to p=50T, effectively doubling our sieve depth. how would this affect the speed with which we find the next prime?

some rough numbers would be
prp test = 110.5 hours
factor density between 25-50T = 0.65
sieve rate = 0.576G / hour
duplicate rate = 0.20
window size / total sieve width = 0.05

so that means in the time it takes to sieve from p=25T to 50T, we could have done 393 tests. however, we would have eliminated 640 tests from the window. that's a savings of 247 tests.

since SB completed over 200 regular sized tests today, even if we sieved to the more optimal level of 50T, we would only speed up the discovery of the next prime by about 1 day.

-Louie

garo
06-07-2003, 04:56 AM
What does your model say about stopping sieving at once? How would the time required to get to the three primes change then?

wblipp
06-07-2003, 09:52 AM
Originally posted by jjjjL
also remember that the window is decreasing in size the closer we get to the next prime.

This part isn't right - it's an artifact of using the 50 percentile milestone as a substitute for finding the next prime. We have what the reliability folks call increasing residual lifetime - that is, the longer we don't find a prime, the further away the expected time becomes. For a metaphor, it's like flipping pennies looking for a head, but every time we get a tail, the weight on the coin changes to make heads less likely on the next flip. The time until failure for most semiconductors has this property - it's part of why burn-in is so effective.

As I mentioned towards the bottom of my message, I think these fixed milestones are the biggest problem with the model. I think it's also the source of the problem that the model says "double the time to find the next prime so you can save a few weeks in finding the third prime." I'll get around to trying the mid-decile modification after I respond to some open questions on this thread.

jjjjL
06-07-2003, 10:55 AM
Originally posted by wblipp
increasing residual lifetime - that is, the longer we don't find a prime, the further away the expected time becomes.

that right there is why i hate these prediction methods so much. how absurd is it that the closer we get to a fixed number, the further we believe we are from it? the day we find the next prime will be the same day that this model is the most incorrect. how can we even call that a prediction model? shouldn't it be able to tell us something about the future in a reliable way?

explain to me how a model that is always wrong, and continues to get more wrong can be useful.

-Louie

wblipp
06-08-2003, 12:28 AM
Originally posted by Nuri

5.8*10^15

That's huge... 5,800T (or 5,800,000G). If I recall correctly, the largest figure pronounced so far at this forum was 400T.

I see that for exponents in the range of 15 million, GIMPS does trial factoring (http://www.mersenne.org/math.htm) to 2^65, which is almost 4*10^19. They must be harder to shock.


Originally posted by Nuri
Let's assume that we are aiming to reach p=500T at sieve when n reaches 18m at PRP. Let's also assume that we have the ability to keep a fixed percentage of our resources at sieve.

If so, what would that fixed percentage be, and when would we be reaching that?


0.1% of resources devoted to sieving is sufficient to reach this. We would get to n=18 million in 2925 days.


Originally posted by garo
What does your model say about stopping sieving at once? How would the time required to get to the three primes change then?

With zero additional sieving, the time to reach the milestones would be 258, 808, and 2736 days. That's about 4 months later for reaching the third milestone.

Keroberts1
06-10-2003, 01:04 AM
Wblipp,
According to your model, how does the time for the 50% probability completion of the 17th prime change with changes in resource allocation between sieve and prp? Would the differences be much more prevalent than for the lower numbers? I'm assuming that once we sieve up to a certain p value we will start sieving for above 20,000,000, but I also understand this is going to be far off in the future. Maybe your model could help us to determine the best time to switch over for a maximum benefit to the project. Mainly I'd like to know how much the sieve is helping to bring the completion of the project down to a more reasonable level. After all, I'd like to be alive to be at the celebration party when the 17th is found.

Keroberts1
06-20-2003, 05:55 PM
any word on this?

Keroberts1
06-20-2003, 05:56 PM
could P-1 be incorporated and maybe make a bigger difference?

Keroberts1
07-08-2003, 02:34 AM
I've been trying to find a response to this for a while; I've even tried to figure out some of the math and build my own model (I failed miserably). But if someone could let me know - I know that the sieving doesn't matter much for the next few prime discoveries, but for the completion of the project it is my belief that it will have a much larger effect. Please, someone offer some input here.

smh
07-08-2003, 11:11 PM
I see that for exponents in the range of 15 million, GIMPS does trial factoring to 2^65, which is almost 4*10^19. They must be harder to shock.

Factors of Mersenne numbers are of the form 2*K*P+1

For a single P, not every K produces a prime, and even if it does, it needs to be 1 or 7 mod 8 (IIRC). So they won't have to test every prime factor below 4*10^19, but just a very small fraction ( < 1% ??).

The larger P gets, the faster trial factoring goes.

jjjjL
07-09-2003, 03:18 AM
Originally posted by smh
Factors of Mersenne numbers are of the form 2*K*P+1

For a single P, not every K produces a prime, and even if it does, it needs to be 1 or 7 mod 8 (IIRC). So they won't have to test every prime factor below 4*10^19, but just a very small fraction ( < 1% ??).

The larger P gets, the faster trial factoring goes.

right, if you just take the 2*p out of the picture, they have a 2^25 sieving advantage over us due solely to the form of their factors. That means that us sieving to 2^45 would equate to GIMPS sieving to 2^70. However, they have only sieved to 2^65 (for the range originally being commented on.)

This basically means that the parallel nature of our sieve makes it many times more efficient than the trial factoring done by GIMPS... they just happen to have a nice factor form that makes it appear as though theirs is more efficient.

-Louie

tqft
07-10-2003, 02:01 AM
has the final analysis been completed or is someone waiting for some more data, or trying to find time to do it around their day job?

I and probably some of the rest of us would be interested.

The thread above has some analysis, saying basically that more sieving would only subtract a few days - is this definitive? I did read above somewhere that some people (Louie I think) did not like some of the "artifacts" produced by the type of model and/or that the model threw out some strange stuff.

jasong
07-30-2005, 09:09 PM
*bump* because I'm hoping for new eyes and opinions.

hhh
07-31-2005, 05:30 AM
Whew, all this is way too complicated.
I liked the easy model: as long as we get more factor throughput than we can run tests, we should sieve (adjusted with some constants). According to this one, we should sieve more. We had some discussions like this some time ago.

Unfortunately, everything has become even more complicated by the threshold of proth sieve, which forces us to double-check until the moment we decide that we need new factors now and continue the FP sieve even with slower software (which will happen at 300G? 500G?)

BTW, why the hell 2^50? 2^48 or 2^64 I would understand, but 2^50? There must be a reason, though. Does somebody know why this problem cannot be fixed by changing 50 to 51 in the sourcecode somewhere?

And then, if we join with PSP, it becomes even more complicated.

I hope that in V3, there will be an option to factor the number before PSP, so that at least this question disappears.

Sieving is still for people who have a particular interest, read the forum etc., and there is no way to convince the "only-client" people to join sieve. That's no problem though. We can support them by sieving as much as we can.

Anyway, is there an allocation problem? We know what would be better (more sieving).

The only interest I see is in a model in order to join the statistics of PRP and sieving/factoring one day. Is there such an intention? Don't know.

I'm not interested in personal stats, so I will leave this alone anyway.

Sorry for my bad thoughts. Yours, H.

Greenbank
07-31-2005, 10:37 AM
The 2^50 limit comes from a trick using the floating point unit to help perform the mod-mul operation in the baby-steps giant-steps algorithm.

Because the x86 FPU is natively 80-bit (unlike others, which are 64-bit), the mantissa is large enough (64 bits) that there is room to perform a relatively accurate mulmod on numbers of up to about 60 bits.

Phil Carmody's post on sci.crypt (http://groups-beta.google.com/group/sci.crypt/browse_frm/thread/16584f78726c6824/048e74f0b0707882?lnk=st&q=64+bit+integer+multiply+modulus&rnum=5&hl=en#048e74f0b0707882)

The code (in C, fixed from an incorrect version posted in the thread) is:-

__int64 mulmod(__int64 x, __int64 y, __int64 p)
{
    __int64 xyl = x * y;                  /* low 64 bits of x*y, i.e. x*y mod 2^64      */
    double xyf = (double)x * (double)y / (double)p;
    __int64 r = p * (__int64)xyf;         /* CRITICAL: approximately floor(x*y/p) * p   */
    __int64 res = xyl - r;                /* off from the true remainder by at most +-p */
    if (res < 0) res += p;
    if (res >= p) res -= p;
    return res;
}

There is also some benefit to pre-computing:

double recip_p=1.0/(double)p;

and changing the third line to:-

double xyf = (double)x * recip_p * (double)y;
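Assembling the two fragments, the precomputed-reciprocal variant might look like this (my paraphrase of the fragments above, not proth_sieve's actual source):

__int64 mulmod_recip(__int64 x, __int64 y, __int64 p, double recip_p)
{
    __int64 xyl = x * y;                           /* low 64 bits of x*y       */
    double  xyf = (double)x * recip_p * (double)y; /* approx. x*y/p via 1/p    */
    __int64 r   = p * (__int64)xyf;                /* approx. floor(x*y/p) * p */
    __int64 res = xyl - r;
    if (res < 0)  res += p;
    if (res >= p) res -= p;
    return res;
}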

Because we are using x86 we could extend the sieving up to 62 bits (let's say 60 to be safe). That is, if proth_sieve is implemented in the same way that Phil Carmody implemented his.

Given that it has taken a considerable time (over 2 years) to sieve to 2^50 (and we aren't even there yet) then another 10 bits (to 2^60) should keep us busy for another 2048 years (given same participation level and processor speed).