# Thread: A New Estimate for When Will Find Primes above n = 3 million

1. ## A New Estimate for When Will Find Primes above n = 3 million

I wasn’t content with the existing estimates, so I built a probability model to estimate when we will find the remaining primes. Based on this model, there is a 50% probability that we will find one more prime before n=3.7 million, and two more primes before n=5.1 million. There is a 95% probability of finding one more before n=7.2 million, and two more before n=13 million. But if the model is reasonable, we may be here a very very long time. There is a 50% probability that we will get all 12 primes by n=2.9 x 10^13, and a 95% probability we will find all twelve by 1.9 x 10^23.

My approach started by finding a partial covering set for each of the remaining twelve “k” values. If you don’t yet understand covering sets, look at Team PrimeRib’s proof that 78557 is a Sierpinski Number. That proof shows that seven specific prime numbers “cover” all the possible values of 78557 x 2^n + 1. These prime divisors repeat in cycle of length 36. For the remaining twelve k-values I created partial coverings with a cycle length of 720 – there are twenty-eight primes that can contribute to such a cycle. Then I calculated the number of “holes” in each partial covering – the number of values in each sequence of 720 “n” values that is not divisible by one of the twenty-eight specific primes. The number of holes ranged from eight holes for k=22699 and k=67607 through forty-four holes for k=55459.

Next I undertook adjustments to the generic probability that a number is prime. The generic probability that “x” is prime is approximately 1/ln(x). For one of our Proth numbers, this is approximately 1/(ln(k)+n*ln(2)). But we can make three adjustments to this generic probability because we have special knowledge.

For the first adjustment, consider the twenty-eight specific primes and p=2. The generic probability includes the factor that “x” is not divisible by any of these twenty-nine specific primes as the product of (1-1/p) – this product is 0.144861. But we have special knowledge from the partial covering set that at least one of these numbers divides the Proth number unless n falls into one of the holes in the cycle of 720. So we can “replace” the factor of 0.144861 by (# of Holes)/720.

For the second adjustment, consider the primes not in the specific set. The generic probability counts each of these as “not dividing with probability (1-1/p) = (p-1)/p.” But look at the proof for 78557 again. The divisibility of primes cycles in a period of length (p-1) or a divisor of (p-1). If the period is length (p-1), the correct probability that p does not divide the Proth number is (p-2)/(p-1). If the cycle length is a divisor, the correct value is either (d-1)/d or it is “1.0”, depending on whether p ever divides the Proth numbers for that k. But averaged over all k values, the result is again (p-2)/(p-1). This adjustment ought to made for every prime that could divide a Proth number, replacing the (p-1)/p with (p-2)/(p-1). This adjustment factor converges to 0.992318.

For the third adjustment, consider the primes that divide “k.” If these primes are in the list of specific primes, we have already adjusted for them. Otherwise the generic probability (after the second adjustment) includes a factor of (p-2)/(p-1) that the Proth number is not divisible by p – but we know the correct probability is “1.0”, so we can make those corrections.

The net result of these adjustments is that the generic probability is reduced by a factor that ranges from 0.076114 for k=67607 to 0.418855 for k=55459. To finish, we need to accumulate these adjusted probabilities for values of n starting at 3 million.

Two more simple tricks and the model is complete. To accumulate these probabilities over large numbers of “n” values, we can integrate instead adding. The result of the addition or integration is actually the expected number of primes in the range. To calculate the probability of zero primes in a range with expected number “a”, we can use the Poisson approximation of exp(-a). Now we have all the tools to estimate the probability that a particular k value has zero – or more than zero – primes in a range of n values starting at 3 million. The estimates at the beginning of this message are built from combinations of these  Reply With Quote

2. ## Wblipp,

Very Good!

I started out with the following 28 primes:

Prime Order-(2,p)
3 2
5 4
7 3
11 10
13 12
17 8
19 18
31 5
37 36
41 20
61 60
73 9
97 48
109 36
151 15
181 180
241 24
257 16
331 30
433 72
577 144
631 45
673 48
1321 60
23311 45
38737 72
54001 180
61681 40
I chose these because the order(2,p) for these primes was a factor of 720. The order(2,p) is the least m such that 2^m == 1 (mod p). Note that m will never be greater than p-1 becasue of Fermat's Little Theorem
Let p be a prime which does not divide the integer a, then a^p-1 == 1 (mod p).
which gives us 2^(p-1) == 1 (mod p) for all primes greater than 2.

But I think that I ended up with far fewer "holes" than you did:

K Holes
4847 18
5359 36
10223 36
19249 9
21181 24
22699 8
24737 18
27653 15
28433 12
33661 18
55459 28
67607 8
I have attached the details, all n (mod720) for which a prime p could exist such that p | k*2^n+1 aka "holes". The numbers in red are those n values (mod 720) that are common to two k values. There are 14 common values so we have 216 distinct values of n (mod 720) instead of the 230 that you would expect by totaling up the column above.

I don't know if this affects your analysis or not. If anything, it may enable you to refine your analysis.

EDIT: I just reread your post and saw that you talked about the factors of k, so just for the record:
K Factors/Prime
4847 37*131
5359 23*233
10223 Prime
19249 Prime
21181 59*359
22699 Prime
24737 29*853
27653 Prime
28433 Prime
33661 41*821
55459 31*1789
67607 Prime  Reply With Quote

3. Joe

I couldn't access your files, but I checked my work and found a transcription error that resulted in too many holes for k=55459. I agree with your results for the twenty-eight spedific primes and their orders. After making the one correction, I now agree with your count of holes. I think you have a typographical error in the prime factorization of k=4847 - I have it as 37*131.

I was surprised at how little difference the correction makes in the estimate, although there was a typographical error in one of the exponents. The revised estimates are:

&nbsp &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp 50% &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp 95%
Next 1 &nbsp &nbsp &nbsp &nbsp 3.7 million &nbsp &nbsp &nbsp &nbsp &nbsp 7.6 million
Next 2 &nbsp &nbsp &nbsp &nbsp 5.2 million &nbsp &nbsp &nbsp &nbsp &nbsp 14.4 million
ALL &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp 2.9*10^13 &nbsp &nbsp &nbsp &nbsp &nbsp &nbsp 1.9*10^22  Reply With Quote

4. An interesting analysis...I would also mention that you have a unique opportunity to check the accuracy of your estimates by applying them to the five primes already found. What would your method judge the probability to be that we would have found these primes by now?

By the way, I'm not at all sure that Poisson is correct for the PDF of the number of primes found in a region, but since no one knows a better one, I guess it will have to do!  Reply With Quote

5. Originally posted by MathGuy
By the way, I'm not at all sure that Poisson is correct for the PDF of the number of primes found in a region, but since no one knows a better one, I guess it will have to do!
It may be a while until I have time to backtest the model over the previously found primes. What "n" levels did Seventeen Or Bust begin with?

I'll provide my defense of the Poisson approximation now. It may be a lousy approximation for the number of primes in a stretch of consecutive numbers, but it seems well suited for the number of primes from k-n pairs that fall into holes in the partial covers. Poisson approximations are often suited for rare event probabilities, and the remaining k-values have been self-selected to have rare primes. If we think of the Poisson model applying to the number of events on a line segment, our line segment consists of successive bits from the probability that each of the holes is prime. The Poisson model requires the probability of events on that symbolic line to be divisible and independent.

The departure from the divisible property is that the Poisson model implies that a single value of n and k could have two or more primes, something we know isn't really possible. If the probability of a prime for a particular k and n is a small value "q", then the Poisson model says the probability of two or more is about 0.5 * q-squared. For the "two primes by 14 million" calculaton, the typical q-value will be about one in 5 million. Over that whole range there are about five million k-n pairs that haven't been covered. Hence the divisibility error is that the Poisson model says there is about a one in ten-million chance of multiple events where we know there are none. This should not be a problem as long as we avoid predictions of multiple events with tiny probability.

On the independence issue, first observe that the partial coverings explicitly account for most short correlations - such as every second n being divisible by 3 and every third n being divisible by 7. Then also observe that our predictions do not depend on relationships between values, so that even missing strong correlations shouldn't affect our result much as long as we have the averages correct. The largest correlation failure of the model is that once a hole is found that is divisible by 23, every 7920 n values later will also be divisible by 23. If we have the right average number of values divisible by 23, it's hard to see how their precise distribution is going to severly impair the Poisson model.

Note that the thinning of k values to these extreme points may result in the wrong average effect for divisibility by 23 and other primes. For example, over random k-values, half of the time the values of n that are divisible by 5 (something mod 4) will fall on top of the n's that are divisible by 3 (something mod 2), resulting in no new coverage. However, that half of the k-values all have primes at low values of n - our remaining twelve k-values always cover distinct numbers with 3 and 5. The same thing might happen with divisibility by 23 - our model assumes that half of the 23's fall on top of the 3's. But if this error exists, it's an error in the adjustment calculation, not an error in the Poisson model.

I'll respond about the backtest when I have an opportunity to complete it.

(edited to fix Poisson example by a factor of 2)  Reply With Quote

6. k N where SoB started
44131 690012
46157 617063
54767 1023127
65567 981643
69109 1125094
k N for which a prime was found
44131 995972
46157 698207
54767 1337287
65567 1013803
69109 1157446  Reply With Quote

7. Originally posted by Joe O
k N where SoB started
44131 690012
46157 617063
54767 1023127
65567 981643
69109 1125094
Thanks, Joe. Do you know the starting points for the twelve k-values still under test? I think we need that to help avoid some of the common statistical errors. It's like the difference between asking "What's the probability Rondal Eaton would win Powerball on February 1st (depending on how many tickets he bought, one in serveral million), and asking "What's the probability somebody somewhere would have won Powerball on Feb 1st? (depending on how many tickets were sold, probably 30-80%). Although it's very unlikely to have been Rondal, it's quite likely that there would be somebody that won and a name we could then stick into the question.  Reply With Quote

8. k N where SoB started
4847 2006031
5359 875350
10223 610025
19249 1307678
21181 800204
22699 900190
24737 300127
27653 340089
28433 2000353
33661 645696
55459 540046
67607 400091
Although, Louie has initated an effort, aided by others whose names I do not know, to doublecheck(go down about 7 posts) these k values starting with n around 1000. I hope that you do not need exact values. I have them, though, if you really need them.  Reply With Quote

9. Regarding Poisson: I do think that it's the best one of the standard distributions for this purpose. I agree also that whatever correlation exists here is highly unlikely to be a problem, nor is the divisibility a problem (this is why Poisson is used for rare events - if events are sufficiently rare, you can almost always ignore correlations and "multiple event" issues). The problem I was really referring to was that the probability of primality is not constant over the interval in question. Poisson might work well to estimate the probability of having no airplane crashes over the last year (since that instantaneous probability is relatively constant), but it wouldn't work to estimate the probability of having no airplane crashes over the previous two centuries (since the underlying probability has changed dramatically over the last two centuries).

Since the underlying probability here is roughly proportional to 1/(ln k + n ln 2), the difference in probability between n=3,000,000 and n=6,000,000 are probability would certainly be significant. Choosing a Poisson model that matches the aggregate probability may well give a quite good estimate, but it is still an additional estimate. If, for example, you had an *exact* probability P(k,n) for the probability that k*2^n + 1 is prime (given whatever special knowledge you like), the probability of no primes between n=n1 and n2 for fixed k would be

1 - [1-P(k,n1)]*[1-P(k,n1+1)]*...*[1-P(k,n2)]

This would only be *approximated* by

exp(-[P(k,n1)+P(k,n1+2)+...P(k,n2)])

which is the result of using a Poisson distribution that matches the expected value of the number of primes in that interval.

I really don't mean to be overly pedantic here...I suspect that Poisson gives fairly accurate results and it's clearly *much* easier (and vastly better from a computational noise point of view) than using the formula for the probability above. Given that the probabilities themselves are only estimates as well, it probably doesn't make much sense to do anything other than use the Poisson estimate...

Edit: the initial "1-" in the probability expression above shouldn't be there...standard excuse applies: it was late and I was tired!  Reply With Quote

10. ## Re: Wblipp,

Joe - would you check my work on the five finished k-values? For holes I get

&nbsp k &nbsp &nbsp Holes
44131 &nbsp 40
46157 &nbsp 30
54767 &nbsp 28
65567 &nbsp 12
69109 &nbsp 24

For factors I get

&nbsp k &nbsp &nbsp &nbsp Factors
44131 &nbsp Prime
46157 &nbsp 101*457
54767 &nbsp Prime
65567 &nbsp 173*379
69109 &nbsp Prime  Reply With Quote

11. Originally posted by MathGuy
Regarding Poisson: I do think that it's the best one of the standard distributions for this purpose. ... The problem I was really referring to was that the probability of primality is not constant over the interval in question.
Text-only media are tricky; I'm a bit worried this may sound like I'm trying to pick a fight with somebody that agrees with me. But I find it useful and fun to understand the weaknesses of models and approximations, so I'm going to continue dialog on these alleged weaknesses a bit longer.

I think the non-constancy issue should be seen as an artifact of different scales. The Poisson model applies across a scale where numbers in the vicinity of n=6,000,000 are only about half as wide as numbers in the vicinity of n=3,000,000. This is analogous to continuous random variables being uniformly distributed across their density, a fact commonly used in generating pseudo-random numbers. This property would make important differences of we used the model to ask questions like "what's the smallest gap likely to be among the remaining twelve primes?" It doesn't matter for the questions I've asked so far.

Originally posted by MathGuy

[1-P(k,n1)]*[1-P(k,n1+1)]*...*[1-P(k,n2)]

This would only be *approximated* by

exp(-[P(k,n1)+P(k,n1+2)+...P(k,n2)])
I think this is really the divisibility issue again. The statement is true even when the P's are all identical, and the error in both cases is exactly the divisibility error - if each P is subdivided into smaller and smaller intervals, the expressions converge in the limit.  Reply With Quote

12. Originally posted by wblipp
Text-only media are tricky; I'm a bit worried this may sound like I'm trying to pick a fight with somebody that agrees with me. But I find it useful and fun to understand the weaknesses of models and approximations, so I'm going to continue dialog on these alleged weaknesses a bit longer.

I agree on both counts...Poisson is probably a good-enough solution *and* it's useful to continue discussion about it until we agree completely and/or get too tired to continue! Personally, I do a fair amount of joint research with collaborators who are also good friends, so I'm quite accustomed to having disagreements without being disagreeable (and, I might add, very familiar with the experience of ultimately being wrong!)

As long as everyone involved agrees that we're still having fun, I'm perfectly happy to continue talking through the model. The only possible outcome, really, is to strengthen overall confidence in whatever estimates eventually result. I'm also fairly confident that our estimates are highly unlikely to be rendered moot by us finding all 12 primes before we get the discussion of the estimates finished! As far as the text-only media is concerned, my collaborators and I often use TeX code in text...are you comfortable with that? If so, it does clean things up considerably (my earlier product then becomes \$\prod_{i=n_1}^{n_2} (1-P(k,i))\$ rather than that abomination I typed before...it might make things confusing for onlookers, however.  Reply With Quote

13. ## Re: Re: Wblipp,

Originally posted by wblipp
Joe - would you check my work on the five finished k-values? For holes I get

&nbsp k &nbsp &nbsp Holes
44131 &nbsp 40
46157 &nbsp 30 &nbsp &nbsp <<<&nbsp &nbsp 18
54767 &nbsp 28
65567 &nbsp 12
69109 &nbsp 24 &nbsp &nbsp <<<&nbsp &nbsp 16 &nbsp &nbsp <<<&nbsp &nbsp Really is 24

For factors I get

&nbsp k &nbsp &nbsp &nbsp Factors
44131 &nbsp Prime
46157 &nbsp 101*457
54767 &nbsp Prime
65567 &nbsp 173*379
69109 &nbsp Prime
Wblipp,
I agree with all your results, except for the two I marked up above. If you send me a list of your holes(or attach them here), I'll let you know what primes I think cover them.
Joe.

Edit to correct an error.  Reply With Quote

14. Speaking as an "onlooker" it seems you guys are omitting an important factor somewhere: one (or even mutiple) of the remaing 12 k may well be a sierpinsky number. I don't think there's a basis for assigning some kind of "probability" to this possibility, but it seems to me that there is a finite expectation of some "very small chance" to need "inifinitely many" more n -- or at least that it would be hard to exclude such a thing.

It just seems that you're trying to find probabilities for "general numbers" when we're dealing with numbers that are special at least in the sense that they might not produce primes at all.

I'd be curious to know how all these analyses have to modified if we allow for all 12 remaining k to be Sierpinsky numbers (quite improbable but certainly not impossible); or if 11 out of them are; or 10; etc. It seems that will lead to (generally) larger estimates for "average n at which something might be found" as we're including the possibility of a k that doesn't follow dp = (ln k + n ln 2) but is indeed dp = 0 .  Reply With Quote

15. You're right, of course, that any of the 12 might in fact be a Sierpinski number. What these estimates would give is some idea of how long we would expect to run if they *aren't*

These estimates would be primarily useful if, for example, there were one particular "k" for which we would expect to have found the "n" by now, but we haven't...if this were the case, we might switch our focus to showing that this "k" *is* a Sierpinski number (rather than showing that it isn't).

On the other hand, if, as it now appears, we haven't run anywhere near high enough to reasonably expect to have found the "n" for any of the remaining "k" then we can happily continue for a while longer...  Reply With Quote

16. ## Re: Re: Re: Wblipp,

Originally posted by Joe O
Wblipp,
I agree with all your results, except for the two I marked up above. If you send me a list of your holes(or attach them here), I'll let you know what primes I think cover them.
Joe.
Joe,

I found a mistake in my work for k=46157 and now agree there are 18 holes. I can't find an error in k=69109, though. I have the following twenty-four holes (I hope I copy these correctly - most of my problems have been copying numbers from one place to another wrong):

22
46
70
94
142
166
190
214
262
286
310
334
382
406
430
454
502
526
550
574
622
646
670
694  Reply With Quote

17. Originally posted by MathGuy
As far as the text-only media is concerned, my collaborators and I often use TeX code in text...are you comfortable with that? If so, it does clean things up considerably (my earlier product then becomes \$\prod_{i=n_1}^{n_2} (1-P(k,i))\$ rather than that abomination I typed before...it might make things confusing for onlookers, however.
My "text-only" concern was that the absence of body language and intonation can sometimes result in friendly statements being misread as aggressive. I haven't used TeX code, but it appears intuitive to figure out. I agree that it may make things harder for onlookers, but use it to the extent you believe the clarification outweighs the confusion.  Reply With Quote

18. ## Re: Re: Re: Re: Wblipp,

Originally posted by wblipp
Joe,

I can't find an error in k=69109, though.
That's because there isn't any. I'm not sure what I did wrong the first time, but when I redid it I got 24 holes.  Reply With Quote

19. Originally posted by MathGuy
you have a unique opportunity to check the accuracy of your estimates by applying them to the five primes already found. What would your method judge the probability to be that we would have found these primes by now?
One of the tricky issues in backtests, where you see how well a model would have predicted results that are already known, is figuring out the “right” or “fair” questions to ask. I think we should ask how many primes the model predicts between starting and n=3 million. From this point of view, the fact that there were four primes between 0.9 and 1.4 million, then none between there and 3 million becomes a question about clusters and gaps.

If you agree that the relevant questions are about the number of primes between the start and n=3 million, then the model works pretty well. The expected number of primes in this region was 5.23, and the most likely number of primes was 5. The probabilities are

#primes Probabilty
0 &nbsp &nbsp 0.001234
1 &nbsp &nbsp 0.011003
2 &nbsp &nbsp 0.044761
3 &nbsp &nbsp 0.110232
4 &nbsp &nbsp 0.183790
5 &nbsp &nbsp 0.219836
6 &nbsp &nbsp 0.195009
7 &nbsp &nbsp 0.130790
8 &nbsp &nbsp 0.067002
9 &nbsp &nbsp 0.026306
10 &nbsp &nbsp 0.007895
11 &nbsp &nbsp 0.001796
12 &nbsp &nbsp 0.000305
13 &nbsp &nbsp 0.000038
14-17 &nbsp 0.000003

(For each k-value I used the Poisson approximation to calculate the probability of zero primes. I then combined the 17 sets of "zero-or-one" probabilities to calculate the probability of zero to 17 primes. This approach avoids counting the probability of two primes for a single k.)  Reply With Quote

20. Certainly an impressive backtest...

Incidentally, to satisfy my curiosity about the impact of the Poisson assumption, I calculated the probability of finding zero primes in various regions of one of the sequences (10223) using the product formula (actually, using sums and logarithms for numerical reasons) and calculated the same probabilities using the Poisson assumption [in both cases the only approximation I used was replacing summation by infinite-precision integration, then evaluated the resulting expression with 100-digit floating-point arithmetic]. To make a long story short, the largest difference occurred in the seventh decimal place...now, I believe!

By the way, lest anyone think I was unfairly critical of poor Poisson, who isn't around to defend himself, the Mathematics Genealogy Project http://genealogy.math.ndsu.nodak.edu/ informs me that he was my great-great-great-great-great-grandteacher - I should have known better than to doubt the work of my ancestors...  Reply With Quote

21. Originally posted by MathGuy
impact of the Poisson assumption, ... the largest difference occurred in the seventh decimal place
So the Poisson part of the model is pretty solid.

I'm coming to understand that the weight part of the model is a mediocre approximation to the Proth weight, though. Yves Gallot provides Proth weight estimates based on p and n through 10,000 in Table 5 here. I'm wondering if MikeH's Sieving Statisics (here) can be used to make improved estimates of the Proth weights. Looking at the bottom of the table, I think I understand that for k=4847 there are presently 72959 values of n below 20 million that are not divisible by "something known." I'm not clear, though, about whether there is a simple statement of "something known." It would be ideal for Proth weight estimates if "something known" is "primes less than m." Can someone help me understand what sieving has achieved so far?  Reply With Quote

22. I was just shutting down my computer to be AFK for a while. If no one else steps up to the plate, I will try to help you when I get back.  Reply With Quote

23. Originally posted by wblipp
I'm wondering if MikeH's Sieving Statisics can be used to make improved estimates of the Proth weights. Can someone help me understand what sieving has achieved so far?
Well, I don't know how, but I think there is a way to connect the stats we have to the proth weights. The relative densities of remaining candidates (and naturally, found factors as well) w.r.t their k values are very close to the ratios between proth weights.

This relation was the logic behind how I found out that there was a problem with the first double check sieve attempt (on post dated 02-23-2003).

The relationship between the proth weight and the number of factors found (as of the two monhts ago when sieving was just at 700G) can be seen here (on post dated 02-08-2003).

Unfortunately I could not find a way to use the factors data to calculate the revised proth weights. My math knowledge is not enough to find it out. Wblipp, as far as I know, you can read the numbers like this;

There are a total of 735295 numbers with n < 20m that are not divisible to something known.

There are a total of 287766 numbers with n < 20m that are divisivle to something 1G < p < 3T. (one billion to three trillion)

And, finally there are a total of 30827 numbers with n < 20m that are divisivle to something 3T < p.

So, there were a total of 1,053,888 numbers with n< 20m that were not divisible to something known, when the sieving was at 1G.

PS: Numbers are as of 11/04/2003 12:02 (GMT) update.

I'm not sure if that explanation might help you in any way, but I hope it could.

And JoeO, please add your comments or correct me where I'm wrong. You and Mike probably know more than me on that issue.  Reply With Quote

24. We need MikeH to check in here. Did he include all the numbers less than 3M or just those > 30000? And unless he has a private source of information he doesn't yet have all the factors found under 1T for n<3M.

While we are waiting for him, you might want to check out the following thread.  Reply With Quote

25. We need MikeH to check in here. Did he include all the numbers less than 3M or just those > 30000? And unless he has a private source of information he doesn't yet have all the factors found under 1T for n<3M.
The sieving stats are based on the range 100<n<50M, where p>1G.

And unfortunately, I don't have my own private source of information, so the 100<n<3M range where p<1T is now missing some contributions. I was planning to wait until DC completes 2T before asking Louie to update the small factors file. That will then probably be the last update required for some time - the results.txt file is expectedly growing quite slowly now.

Can I also take this opportunity to thank those that are viewing and using the sieve stats.

Mike.  Reply With Quote

26. Originally posted by MikeH
The sieving stats are based on the range 100<n<50M, where p>1G.
I have some elementary questions.

1. Are we removing all primes in numerical error except for coordination issues, or are we doing something more clever?

2. MikeH says the stats are based on n up to 50M, but the table at the bottom of the Sieving Stats page only goes to 20M - why the discrepancy?

3. Looking at the Sieve Coordination Thread it appears that all primes through 9.7T have been tested, and all primes through 14.020T have been reserved - am I reading that correctly?  Reply With Quote

27. 1. Are we removing all primes in numerical error except for coordination issues, or are we doing something more clever?
Yes, we are removing candidates in numerical order. Some people (myself included) have 'played' with the high ranges, but these are really information gathering exercises (what are the densities of factors in very high ranges?). I think we all accept that the most efficient way to sieve is by strict p order.

2. MikeH says the stats are based on n up to 50M, but the table at the bottom of the Sieving Stats page only goes to 20M - why the discrepancy?
Perhaps I was thinking too far into the future, but I created an sob.dat file up to 50M as my 'check' file. But for the purposes of factor removal metrics, currently the numbers you see are for n<20M, since the tables that have n sliced by 1M chunks ignore all ranges where no factors have been submitted. But that has now made me realise that the submission of a few factors where n>20M will really mess up that lowest table!

3. Looking at the Sieve Coordination Thread it appears that all primes through 9.7T have been tested, and all primes through 14.020T have been reserved - am I reading that correctly?
Yep, that's about right. Most of 9700 - 12500 are also complete, but yes 9700 and above is still "work in progress".

Mike.  Reply With Quote

28. I think that the only thing Mike did not cover was that there are two sieving efforts. The first and primary is for n such that 3M < n <20M. The second, "double check" is for n such that 101 < n < 3M. The second effort is not as far advanced as the first. It started much later, and is now only done for p < 1T, with some higher ranges of P done as well for various reasons, mostly for checking the density as Mike states.  Reply With Quote

29. But that has now made me realise that the submission of a few factors where n>20M will really mess up that lowest table!
I really shouldn't have said that! But the culprit is named.   Reply With Quote

30. Yeah, I submitted that 1 factor >20M along with
my DC sieve factors as a practical joke.
(since I wanted to know what would really happen to the
stats). If it bothers anyone, Mike can remove it from
the stats.

Of course, if there are people out there who like factors above
20M, I still have 5 factors not submitted yet   Reply With Quote

31. No problem Moo I'd suggest hanging onto those extra factors right now. I'm 99% certain that Louie doesn't have the data base set-up above 20M, which means if we ever do get that far, those factors will possibly need some manual intervention.

Cheers,
Mike.  Reply With Quote

32. Well since we're getting close to 3700000 adn haven't found a prime yet I assume the probability of a prime before 3700000 is not still 50% If you've got some spare time perhaps mathguy could generate some new probabilities or at least give us some idea of when we should expect to see the 6th prime. It would be really nice ot see another one because it has been a long time since one was found and I'm sure some peopl are starting ot get discouraged as ot wether or not we're actually gonna find more.  Reply With Quote

33. sorry ot be confused it was wblipp that generated the last probabilitys wasn't it well I guess he's the one I'm asking the favor of then. but if anyone else knows how to recreate them then I am curious where the values stand  Reply With Quote

34. Yes, it's true that or sixth k should have cracked by now, but I guess it's a bit early to get discouraged. The primes in the numbers we're searching are rare, and it's that feature which makes them so valuable. What I do, not to discourage myself is, I always keep the hope that the candidate I am testing at the moment will turn out to be a prime.

Millions of people have crunched (and are still crunching) SETI for years just for the hope of hearing from ET. And there is no guarantee that either their receiver is strong enough, or they are looking at the right direction at the right time, or they are looking at the right type of data, or they are analysing it the way they should. And even if all of these are perfectly as they should be, still there is no guarantee that ET really exists. At least, I'm sure that we're at the right track for our purpose. Sooner or later, we'll reach our goal.

Anyway, I also had a probability analysis before, and it was giving results close to that of wblipp. The reason why I did not share it is that I'm afraid assigning probabilities might dislead people sometimes.

Let me try to explain what I mean. If we were to run these probabilities just after we found our last prime, the model would have suggested 50% chance of finding at least one prime up until n=1.7 million, and 95% chance up until n=3.75 million (and 94% up until n=3.55 million, where we're currently returning our leading edge tests). 98% and 99% figures were n=5.2 million and n=6.6 million, respectively back then.

At the moment, roughly one tenth of the currently pending tests are below 3.4 million, which corresponds to roughly 300 tests. So, assuming none of these 300 tests will actually return a prime, then the 50% and 95% marks are now n=4.3 million and n=9.5 million, respectively. And, the probability of finding a prime up until n=3.75 million dropped from 95% at the time of our latest discovery, to 25% now.

I don't know what figures wblipp would find out, but I guess it wouldn't be too far off from my figures (may be a few 0.xm less). As for comparison, my model was suggesting 46% for 3.7 million and 92% for 7.2 for at least one prime, and 43% for 5.1 million and 91% for 13 million for at least two primes.

I know, there is a logic in the change in probabilities as we go further and find no primes, and I understand that. I have the feeling that the very last (17) primes are behaving strangely that the probaility analysis is not sufficient to explain the behavior. Finding five primes within the n=0.7m-1.3m range was probably much more frequent than expected, and now, finding no primes up until n=3.4 million is much sparse than expected. If we were to rely on probability analysis, we would have failed in both.

Although I know that the probability is really low (0.0000000004%), there is no guarantee that we will not find all of the remaining 12 primes up until n=4m, and have a great story to tell our grand children. So, please keep crunching.  Reply With Quote

35. Originally posted by Keroberts1
Well since we're getting close to 3700000 and haven't found a prime yet I assume the probability of a prime before 3700000 is not still 50%.
The original message at the head of this thread went through extensive effort to estimate adjustments that I now understand are the same thing as Proth Weights. And it isn't a particularly good estimate because it gets the effect for a few primes exactly right instead of getting the effect for many primes approximately right. In the following estimate I'll use Yves Gallot's estimate of Proth Weight from Table 5 of his paper on weights.

Given there are no new primes under n=3.5M, there is a 50% probability the next prime will be found by n=4.47M, and a 50% probability that two will be found by n=6.54M.

The 95% probability numbers are 10.0M and 20.5M.  Reply With Quote

36. Wow, our results are closer than I expected. My figures for n=3.5m were, 4.45m, 6.45m, 9.85m, and 19.7m, respectively.  Reply With Quote

37. I don't want to sound negative here, but there's also the small chance that we've missed it already.  Reply With Quote

38. Originally posted by smh
I don't want to sound negative here, but there's also the small chance that we've missed it already.
Yes. I've looked into that, too. The only estiimate I've seen for an error probability came from you - you reported that GIMPS tests indicate that 1% of all tests produce a wrong result in this thread . I put out a request for better information in item 10 of this thread on Modeling.

Using your 1% estimate, I figure there is a 3.9% probability that we have missed a prime and a 0.07% probability we have missed two primes. That level isn't sufficiently high to warrant any action beyond the sieving that is going on. Finding more primes will make the double check, if necessary, shorter, too. Somewhere between 10% and 80% I'd suggest a small scale double-check effort begin at a pace sufficient to keep this probability bounded.

For an example of the other extreme in reliability, we can look at what United Devices does in their cancer drug screening at grid.org. Every calculation is repeated on many machines - it appears that an average of eighteen repetitions with a minimum five is used. But their project doesn't have any self-correction nor any inherent error indication. In our project we have self-correction because we might find the second prime if we miss the first, and we have inherent error indication from sufficiently long periods of no primes.  Reply With Quote

39. ## GIMPS error rates

Here is George Woltman's original post about error rates. The following excerpt is the heart of the matter:
Our overall error rate is roughly 3.5%. If you have an error-free run, the error rate is in the 1.4% to 2% area.
There is more discussion in this thread where George states:
The 1% I quote in the FAQ is for a clean run - although I may not state that. The error rate has probably gone up over time too.
This is where the 1% figure came from, but as stated in the first quote, it is higher than this.  Reply With Quote

40. quote:
___________________________________________________
There is a 50% probability that we will get all 12 primes by n=2.9 x 10^13
___________________________________________________

Does this mean that there is a 50% probability that we will
NOT see the sierpinski problem solved in our lifetimes,
assuming that no quantum computers are invented?  Reply With Quote

Page 1 of 2 12 Last #### Posting Permissions

• You may not post new threads
• You may not post replies
• You may not post attachments
• You may not edit your posts
•