Discussion about a Resource Allocation Model
Quote:
Originally posted by jjjjL
I have this naging feeling that a stunningly beutiful model could be constructed that took into account the likelihood of prime discoveries by n-level, sieve speed, expected sieve speedup, and prp speed to properly calculate the weighting of cpu between sieving and prping. ... give me your thoughts on all this.
-Louie
I'm thinking about ways to build such a model, and wondering about the data to drive it. I figure other people may be noodling about this, too, so a place to share information would be useful. Here are some of the things I think could be useful in the model.
1. I can find the current distribution of remaining candidates in MikeH's sieve statistics.
2. The time for a "standard PC" to complete a primality test - 3 days?
3. The way primality testing will slow down with increasing n - will n's that are twice as large require twice as long? 4 times? sqrt(2)? ln(2)?
4. The rate at which a "standard PC" can sieve at the present level - 1G per hour? And the way this will change with p. Doubling the p value decreases the number of primes in a range, but it also, I assume, increases the time to test each p. The Prime Number Theorem tells us the first component - but how does the time per test change?
5. The rate at which factors will be found by sieving at various p levels - I think we can estimate this from Mertens' Theorem and the number of remaining candidates (a rough sketch of this estimate follows the list).
6. The rate at which primes are found among the candidates - I think we can use the Proth-Weight adjusted prime number theorem for this. A full probabilistic model is probably complex - a reasonable first cut could use the average n-value for finding the next primes.
7. The speedup of sieving with fewer k's to sieve - linear but with a fixed overhead?
8. The increase in resources from the publicity of finding a new prime - 20%?
9. The increase in resources during periods of no new primes - for purposes of optimal allocation this probably doesn't matter.
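To make item 5 concrete, here is a rough sketch of the Mertens-based estimate. This is my own illustration, not project code; the candidate count and sieve depths are placeholder numbers, and the only real content is the ln(p1)/ln(p2) survival heuristic.
Code:
from math import log

# Mertens' theorem heuristic: a candidate that has already survived sieving to
# depth p1 survives further sieving to p2 with probability about ln(p1)/ln(p2),
# so the expected number of new factors is N_remaining * (1 - ln(p1)/ln(p2)).
def expected_factors(n_remaining, p1, p2):
    return n_remaining * (1.0 - log(p1) / log(p2))

# Placeholder example: 500,000 remaining candidates, sieve extended from 15T to 25T.
print(expected_factors(500000, 15e12, 25e12))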
I'm presently thinking about applying the above parameters to a system of differential equations. p(t) would be the prime threshold for sieving, n(t) would be the n threshold for primality testing. These equations could be solved for fixed proportions of resources for sieving and primality testing. (For variable allocations, too, if we have any idea about how to vary them.) A longer term planning horizon is probably appropriate - perhaps minimizing the predicted time until four more primes are found.
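To show the shape of such a model, here is a minimal Euler-integration skeleton - my own sketch, not a worked-out model. Every rate function and constant in it is a placeholder standing in for the estimates in items 1-7, and a real version would also track the expected number of primes found rather than just the two thresholds.
Code:
# Skeleton: a fixed fraction f_sieve of total power advances the sieve depth
# p(t); the remainder advances the primality-testing threshold n(t).
def dp_dt(p, power):                     # sieve depth growth (placeholder rate)
    return power * 1.0e12 / (p / 15.0e12)

def dn_dt(n, power):                     # PRP threshold growth (placeholder rate)
    return power * 1.0e5 / (n / 3.5e6) ** 2

def simulate(f_sieve, years=5.0, dt=0.01):
    p, n, t = 15.0e12, 3.5e6, 0.0        # assumed starting depths
    while t < years:
        c = 2.718 ** (0.365 * t)         # Moore's-Law-style growth (assumed)
        p += dp_dt(p, f_sieve * c) * dt
        n += dn_dt(n, (1.0 - f_sieve) * c) * dt
        t += dt
    return p, n

# Compare two fixed allocations, e.g. 10% vs 30% of power on sieving.
print(simulate(0.10))
print(simulate(0.30))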
Longer term there are other resource allocation problems that I think we can ignore at this time. I'd ignore the question of when to begin sieving beyond n=20M - it seems clear that deeper sieving on the present range has vastly greater payoff. I'd ignore the question of double-checking - my explorations of this suggest it is MUCH too early to devote serious resources to this, although to be sure it would be nice to know the answer to one more question: 10. Do we have any validated estimate of the error probability of reported results?
Please jump in with data and ideas.
Re: Discussion about a Resource Allocation Model
Quote:
Originally posted by wblipp
9. The increase in resources during periods of no new primes
Maybe this was optimistic. On April 29th the project stats page is showing 843 users active in the last two weeks. The thread on 3000 users seems to say this figure was 1000 on February 23. Perhaps the number of users declines during periods of no primes. These numbers suggest 8% net decline per month.
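For reference, the arithmetic behind that figure (my own quick check): 1000 users on February 23 down to 843 on April 29 is roughly 2.2 months.
Code:
# 843/1000 over about 2.2 months, expressed as a per-month rate of decline.
print(1.0 - (843.0 / 1000.0) ** (1.0 / 2.2))   # ~0.075, i.e. roughly 8% per month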
Eyeballing the system stats chart at the top of the stats page, there was a big increase from February 10 to March 26, then a sharp decline and unclear trends after that, but all much higher than the February 10 level.
Maybe "Random fluctuations around a flat level" is the best guess.
Model Changes After Finding Primes
I have proposed running the resource allocation model for the period of finding the next three primes because running this kind of model for exactly one period sometimes leads to false "edge effects" - the model says to do strange things because it thinks the world ends at the first stage. By running for three stages, these edge effects are usually pushed into the third stage, and the results are more realistic for planning the first stage. When the first stage ends (finding the next prime in our case), the model should be rolled out to look at stages 2, 3, and 4 to make plans during stage 2.
To run the model in later stages, we need to know what parts of the model change after we have found a prime. There are three effects.
1. c(t) – the computing power available at time t. I have proposed that this power increases by 20% every time we find a prime (Item 8). There hasn’t been any corrective input about this guess yet. This would be in addition to the slow steady growth I have proposed for Moore’s Law.
In stage 1 c(t) = c0*e^0.001t
In stage 2 c(t) = c0*1.2*e^0.001t
In stage 3 c(t) = c0*1.2^2*e^0.001t
2. p(t) – the sieving limit increases faster because we are testing fewer k's. As Mike discussed here, the sieve speedup is less than linear. One explanation is that there is an overhead equivalent to three k-values, so removing three of the twelve k's makes the sieve run at (12+3)/(9+3) = 1.25 times its former speed, 25% faster. Using this approach, the sieving rate will increase by a factor of 15/14 in stage 2 and 15/13 in stage 3 (a small sketch of this rule appears after item 3).
3. Remaining Candidates. There will be 11, then 10, candidates left to test, so the primality testing will proceed faster. How much faster depends on which k-values we find primes for. The k’s with higher Proth Weights are more likely to be found first because they have more opportunities. That is, every remaining candidate is equally qualified as a number with no divisors under 15T, but the k’s with high Proth Weight have more of these candidates than the ones with low Proth Weight. Since their likelihood of removal is proportional to their weights, we can calculate the average weight removed by taking a weighted average. The average weight for the next prime discovered is 0.19.
The average weight for the next two primes can be calculated by the same weighted average method, but now there are 66 pairs of k's that might be the next two. For each pair we can calculate the weight removed by that pair – the sum of their Proth Weights – and the likelihood of that pair – the product of their Proth Weights. The weighted average for the next two primes discovered is 0.3767 (a sketch of this pair calculation also appears after item 3).
So the total Proth Weight in Stage 1 will be 1.96, in Stage 2 will be 1.77, and in Stage 3 will be 1.5833. These values were all calculated using Yves Gallot's Proth Weights, so we will reduce them by about 1% as discussed in the post about Mertens' Theorem, so the actual values are 1.9416, 1.7532, and 1.5685.
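A small sketch of the overhead rule from item 2 (the "equivalent to three k-values" figure is the assumption; everything else follows from it):
Code:
# Sieve speedup when dropping from k_before to k_after k-values, assuming a
# fixed overhead equivalent to three extra k-values.
def sieve_speedup(k_before, k_after, overhead=3):
    return float(k_before + overhead) / (k_after + overhead)

print(sieve_speedup(12, 9))    # 1.25 - the 25% example above
print(sieve_speedup(12, 11))   # 15/14 - stage 2
print(sieve_speedup(12, 10))   # 15/13 - stage 3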
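And a sketch of the pair calculation from item 3. The twelve weights below are placeholders, not Yves Gallot's actual Proth Weights; with the real values the single-prime average comes out near 0.19 and the two-prime average near 0.3767, as stated above.
Code:
from itertools import combinations

# Expected total weight removed by the next prime: each k is removed with
# probability proportional to its weight, so this is a weight-weighted average.
def expected_weight_next_one(weights):
    return sum(w * w for w in weights) / sum(weights)

# Expected total weight removed by the next two primes: weight each of the 66
# pairs' combined weight (sum) by that pair's relative likelihood (product).
def expected_weight_next_two(weights):
    pairs = list(combinations(weights, 2))
    total_likelihood = sum(a * b for a, b in pairs)
    return sum((a + b) * a * b for a, b in pairs) / total_likelihood

weights = [0.10, 0.12, 0.13, 0.14, 0.15, 0.16,
           0.17, 0.18, 0.19, 0.20, 0.21, 0.22]   # placeholder values
print(expected_weight_next_one(weights))
print(expected_weight_next_two(weights))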
Re: Rate of completing n's
Quote:
Originally posted by wblipp
MikeH reported that an Athlon XP2100+ completes a primality test in 3 days at n=3.5M. I think this is equivalent to a 1 GHz machine completing in 5 days.
Please note that a 1 GHz Athlon is faster than a 1 GHz Pentium III. And a 1 GHz Pentium III is faster than a 1 GHz C3. So we can't speak about a "1 GHz machine".
For k=10223 n=3461705, my Tualatin@1420 MHz (142 MHz FSB) needs nearly 160 CPU hours (~6.5 days)!
I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.
Re: Re: Rate of completing n's
Quote:
Originally posted by Troodon
I think the baseline should be a 1 GHz Pentium III Coppermine, with 100 MHz FSB, SDRAM and i815 chipset.
Great! I love clarity. Do we know how long this baseline machine needs to complete a primality test near n=3.5M?