Sieving vs. prping vs. prime density [Archive]

View Full Version : Sieving vs. prping vs. prime density

jasong

05-20-2005, 05:19 PM

I've noticed threads that claim, for instance, that,"Everything but a P4 needs to be sieving," and others that say,"Factoring is far behind."

Unfortunately, these statements are just that. Statements. If someone could generate some actual statistics, for instance, the odds of finding a random prime before n=10 million as opposed to the amount of effort required to sieve everything in general, people might actually be able to make a better decision.

What would be REALLY fabulous is if there was a thread that explained the math(for the hardcore among us) and another thread that supplied statistics every, say, two weeks or a month, and encouraged people to do this or that based on the actual statistics.

If someone could supply me with the basic information I need, perhaps I may have the math skills to provide some useful statistics. I'd even be willing to spend some time in IRC or AOL IM, if that would help.

vjs

05-20-2005, 05:39 PM

It's not as easy as it sounds and the problems still boil down to stastics.

Here is a good start...

http://primes.utm.edu/howmany.shtml

http://www.mersenne.org/math.htm

Add to this the influences of error rates, sieving, factoring, new clients, advances with computers, etc...

You see why it's just alot easier to make blanket statments, there are a few posts if you really search that have a little math in them.

jasong

05-20-2005, 05:56 PM

Originally posted by vjs
It's not as easy as it sounds and the problems still boil down to stastics.

Here is a good start...

http://primes.utm.edu/howmany.shtml

http://www.mersenne.org/math.htm

Add to this the influences of error rates, sieving, factoring, new clients, advances with computers, etc...

You see why it's just alot easier to make blanket statments, there are a few posts if you really search that have a little math in them. I'm the perfect person for this to come up with statistics. High math IQ, no life, no job, jaded with Tivo.

Looks like I've found a potential hobby.

jasong

05-20-2005, 07:28 PM

Link to where I'm setting up camp to solve this problem (http://www.mersenneforum.org/showthread.php?p=54714#post54714)

Questions:

What is the approximate sieving speed of SOB in general?

How many prp tests are done in a day, in general?

How many tests are generally sieved in a day?

What is the total number of tests?

For now, I'm going to ignore factoring, since I'm not sure what the impact is.

jasong

05-20-2005, 09:43 PM

I've found the following equation:

factoring_cost < chance_of_finding_factor * 2 * primality_test_cost

When it says "cost" I'm assuming them mean "time." (Is there any disagreement?)

Assuming this is true, the cost of finding a factor is:

1000/15=66.66666667 (average number of billion you go through before finding a factor) divided by

average number of billion gone through in a day, assumes a 500 kp/s rate(?)=43.2

equals cost: 1.543 days per factor

So we have:

factoring cost(1.543 days)<1/log 2^1,900,000*2*8.681 days

So, apparently, we need the logarithm of 2^9,113,000 to figure this out.

hhh

05-20-2005, 10:03 PM

Some of your questions find their answer in the secret stats (http://www.seventeenorbust.com/secret). You see for axample that we make less DC tests then Firstpass. Firstpass is 'pulling away'.　But that doesn't really help. The calculation is the following.
We have some error rate r. If the ratio 'Time for a DC test/Time for a FP test' is larger then the error rate r, we shouldn't DC but look for a prime in the new tests. If it is smaller, we should DC. The optimal case is when both are equal.

I'm still waiting for an update of the error rate.

We have some discussions like this from time to time, e.g. here (http://www.free-dc.org/forum/showthread.php?s=&threadid=7929), but it is a little hard to read again such a conversation. Might be an indication.

As for sieving, it is basically the same: If a machine finds more factors by sieving then it would perform PRP tests in the same time, it is better to sieve. This is definitely the case, with PIII for example. Especially with the new dat.

This is of course only true on the long run. If we knew that there is a new prime in the next 10000 tests, one should finish them up as soon as possible. But we have no such information. That's why I think one should think on the long run, and why I personally don't like discussions like 'When will come the next prime.' I believe we have no idea. But never mind.:|ot|:

Again, waiting for a factor by sieving and PRP should take the same time, this would be optimal.

As for factoring, it is true that even with the optimal bounds, the gain of time is slight, by far not so big as with sieving. But it is there, and we have always ranges 'passed by', meaning that we are not able to make a quick factoring attempt for every test; that's why I said once that is is 'far behind'. It is not so far behind as sieving.

I hope I helped you to understand. H.

hhh

05-20-2005, 10:15 PM

Sorry, jasong, but even though I have got a high math IQ, too, I don't get your formula. :p In fact I don't know what you are calculating.

Anyway, as far as I get it, you seem to claim that with a particular machine, you find a factor every 1.5 days. (With sieving, isn't it?)
This factor saves a test with 1900000<n<50000000. On average, this test will take more then 1.5 days. So it makes sense to sieve, Logarithms or not, no?
:confused: <--confused but one cannot see the ??? H.

jasong

05-20-2005, 10:29 PM

As for sieving, it is basically the same: If a machine finds more factors by sieving then it would perform PRP tests in the same time, it is better to sieve. This is definitely the case, with PIII for example. Especially with the new dat.
This is the only part I disagree with, and the reason is simple. I ask that people read the entire explanation before responding. If you don't, I won't flame you(that happens way too much on the Internet), I'm just trying to expedite things.

Here goes:

Fact: Sieving will never find a prime. From a statistical and logistics point, it is an incredibly necessary part of the project, but it is simply a tool, a means to an end.

Fact: As a very general rule, the chance of a random number(x) being prime(I'm quoting now, possibly I trusted someone I should not have) is 1/log x

Fact: Once we find a prime, the other Ns for that k are useless in terms of the projects main goal.

Maybe I'm about to get lofty and a little ridiculous, but I'd like to point out that the majority of our understanding of things is gotten from trusting other people. I simply take it for granted that the earth is round, even though I don't have any experience that proves it. While I allow for the fact that this is a volunteer project and people have the right to crunch what they want, maybe we need some intelligent people that know their math to actually crunch the numbers so that people see where there's a need instead of hearing this or that opinion.

This project is about numbers, so people, and I'm not stating this as a demand, need to be guided by the numbers.

Just my opinion.

jasong

05-20-2005, 10:38 PM

Originally posted by hhh
Sorry, jasong, but even though I have got a high math IQ, too, I don't get your formula. :p In fact I don't know what you are calculating.

Anyway, as far as I get it, you seem to claim that with a particular machine, you find a factor every 1.5 days. (With sieving, isn't it?)
This factor saves a test with 1900000<n<50000000. On average, this test will take more then 1.5 days. So it makes sense to sieve, Logarithms or not, no?
:confused: <--confused but one cannot see the ??? H.
That would be correct assuming that a purely known, linear approach to finding a prime was used to find a prime. Actually the prime could be the very next n or millions of n values away. From a statistical standpoint, simply sieving more values than you prp doesn't gaurantee that you're making the best use of your time. The odds that the intermediate values could be the prime are important as well.

Look at it this way:

Let's say you have a choice of making a 2 minute walk to the store or taking a 1 minute bike ride. You do this every day and prefer to ride your bike since it's the same amount of effort to pedal as to walk. The problem is that you have a hole in your tire and it takes you 5 minutes every 3 days to pump up your tires. I'm not actually going to do the math, but the point is that this is more than a linear problem because of the uncertainty involved.

Edit: the above is my retarded attempt to make sense, I apologize for wasting your time. I'm going to open up a word document and work my butt off, so be patient. It WILL make sense, I promise.

I'm just trying to help people be as efficient as possible.

jasong

05-20-2005, 11:26 PM

Edit: the above is my retarded attempt to make sense, I apologize for wasting your time. I'm going to open up a word document and work my butt off, so be patient. It WILL make sense, I promise.

Apparently, I'm a liar. What follows is what I managed to write:

Okay here goes. I assumed that the average computer sieves at 500kp/s and used the 20M .dat "statistic" of 15 per 1000G.

That makes the factoring cost 1.543 days per factor.

The chance that the very next k/n pair is prime is figured by the equation (1/log x) where x is the actual value. I assigned x the approximate value of 2^9,133,000. Since my calculator couldn't go that high, I assumed an approximate linear progression of the logarithm(I hope this wasn't a mistake) and came up with the number 1/2742842. We're figuring this in terms of the numbers in general, so I multiplied it by the number of tests until 10,000,000 which is 123,214 and got .04492 which, unless I screwed up royally(possible) means that there is an approximately 4 and 1/2 % chance of finding a prime before 10,000,000. I multiplied this by the cost of a primality test (8.681 days). For some reason the example said to multiply the result by 2, which confuses me(maybe someone could explain that specific part?) and got .7799.

Unfortunately, this is an inaccurate estimation because the number of tests is supposed to include ALL the tests

The basic concept is good, but I've gotta figure out how to factor in the fact that there is sieving going on while we're counting up the n values.

I contracted a brain disorder in my Junior year of high school and don't have a heck of a lot of college math. Can anybody help me with this?

hhh

05-21-2005, 01:16 AM

Ok, jasong, I'm going to get it.

I overread your last posts and appearently you are searching the deeper approach. So I apologize for giving you only the quick answer, and will think about everything before my next post. You are right, if you look at it like this, it becomes quick very complicated.
See you later, Yours H.

hhh

05-21-2005, 04:15 AM

jasong, in your last post, you try to find the probability to find a prime until 10000000, right. I have no idea if your formula applies, but I would like to correct your calculations.
The numbers we are dealing with have the style 2^9100000, I agree. the k has no weight.
so log 2^9100000=9100000*log 2= 9100000* 0.6931471=~6300000.
The probability to find no prime is approximately
(1-1/6300000)^123,214=0.98
Using your formula, I get a 2% chance of finding a prime.

The 2 in your calculations is probably due to the fact that every n needs two tests.

I say again that I don't understand your formulas, what they calculate, and so.
In particular, I doubt that the probability that a number is prime is still the same after sieving . Here we need help.
I made some back of the envelope style calculations, too, I'll post them later. H.

hhh

05-21-2005, 08:38 AM

OK, here we go.

The most difficult problem is the estimation of the probability to get a prime. That's why I found an other approach.

How many primes can we expect until n=50000000? 1? 2? 3? 7? 11? Choose a number. If you chose more then 3, you are really optimistic, IMHO.
I think it is max three; even with the best k eliminated, only about half of the tests disappear. In other words, 50% of the sieving work is lost.

We know the optimality equation: PRP time = Factor finding time; adjusted by the circumstances that we loose half of the sieving work and save 2 PRP tests by every factor,
2*PRP=sieving/50%.
So, waiting for a factor should take as long as one PRP test.
Conclusion: Sieving is way behind.
PS: the fact that, if a prime is found, there has been only one PRP test saved may push the balance a little bit in favor of PRP, but even though the ratio you posted
PRP 8 days
Sieving 1.5 days
is not rectified.

I am sorry that this is again no real math, but it convinces me, at least. Additionally, I don't know how to make it better. If somebody finds a way, I would be glad.

Yours H.

jasong

05-21-2005, 03:19 PM

hhh, your math makes perfect sense. My attempts to make sense of it had to deal with the fact that

I HAVE NO IDEA WHAT I'M DOING except for guessing at algebraic concepts, which I usually screw up at least three times before getting correct(the fact that there are no statistically consistent real-life examples doesn't help).

Unless someone comes up with a better idea, I think your equation should be paraded like an American Flag at a 4th of July celebration(non-Americans, please mentally place your own simile here).

Off-topic:hhh, I got your PM before I read this topic and all I have to say is I wish you long life, vast riches, and a loyal circle of friends. I can only imagine what you're like in person. You are the type of person I constantly try to be myself.

jasong

05-21-2005, 04:31 PM

Guys, I must be getting old. I could've sworn my math skills are better than this.

Okay if x is the maximum number, 1/log x is the distribution of primes

(log 2^50,000,000)-(log 2^2,000,000)

Is the above correct? I never realized how rusty my skills had gotten.

15,000,000(approx.)-600,000(approx.)=14,400,000

I'm going to go out on a limb and hope that I'm correct in saying that about
1/14,500,000 numbers are prime for the 2M to 50M range.

I'm not going to give any more statistics until this is confirmed or denied.:help:

hhh

05-21-2005, 09:22 PM

Originally posted by jasong

Off-topic:hhh, I got your PM before I read this topic and all I have to say is I wish you long life, vast riches, and a loyal circle of friends. I can only imagine what you're like in person. You are the type of person I constantly try to be myself.

Yeah, that's me. Smart, handsome, successful, tons of groupies... and so on.http://www.arnold.ro/images/galerie/foto22.jpg

:jester:

Edit: I'm sorry, will not happen anymore.:|ot|:

vjs

05-22-2005, 02:30 PM

O.K. I'll comment here...

Cost = computing time correct... If you want to get picky you can think of the following costs differences.
P4's are better at prp and P-1 than P3's athlons etc and thankfully the reverse is true.
Athlons are better at sieve than prp and p4's don't sieve well compared to prp performance.

So when calculating costs p4's get a bonus (sse2) when it comes to prp where athlons short pipeline, large L1 cache or ?? get a boots with sieve.
So the ratio of sieve to prp is better for some processors than others.

________

Great I'm glad your taking the effort it would be great if someone could set up a spread sheet with variable so that as things change (secondpass level, error rate, first pass, clients, primes, sieve level, ...) we can see the differences.

What is the approximate sieving speed of SOB in general?

How many prp tests are done in a day, in general?

How many tests are generally sieved in a day?

What is the total number of tests?
[quote]

What is the approximate sieving speed of SOB in general?
Hard to predict it changes from day to day but you could follow the 90% level on mikes site.

How many prp tests are done in a day, in general?

I'd estimate somewhere around:
160 (first pass)
120 (Second pass) This number has been increasing lately

How many tests are generally sieved in a day?
Both Sieve and P-1 remove about 25 tests on average.
(This might mean we are progressing around 1500G per day with sieve on average.)

What is the total number of tests?
Infinte until we find all 10 remaining primes...

There are >40,000 tests per 1,000,000n for the remaining 10k those tests are distributed between the k's in the following ratio's.

4847 63397/524548
10223 72956/524548
19249 26519/524548
21181 62359/524548
22699 25525/524548
24737 65806/524548
27653 36545/524548
33661 62064/524548
55459 87239/524548
67607 22138/524548

Now those ~40K test have been reduced to approximately ~27k per 1M-n-range through sieve and factor.

Sieve reduces the total number of tests by about 4% for each 2^n p sieved.
We are currently beyond 2^49 and are expected to remove about 4% of the tests from sieving from 2^49 to 2^50( 563T - 1126T), by "about" the same amount we reduced it by for sieving 2^48 ~ to 2^49 ( 281T - 563T)

Note we cannot sieve beyond 1126T with the current client in it's current config. Other sieve programs are available but they are about ???90% slower???

[quote]
1000/15=66.66666667 (average number of billion you go through before finding a factor) divided by
average number of billion gone through in a day, assumes a 500 kp/s rate(?)=43.2
equals cost: 1.543 days per factor

I think your on the right track but, what is the cost of the factor, first the factor is random so it's any n. And relating that n cost is what's important.
If the n is very high n=49M we may find a prime for that number before then... If not the n is very high... are we going to test it once or twice?
If the factor is less than first pass prp n=9.2M we already tested it once, will it remove a double? Probably, if you think we missed a prime it's worth 0.9 of the test if it's after the missed prime n if you think we missed two primes it's 0.8 after the n of both primes.

You may consider than on average the factor you find is worth the average test time of the n-range sieved x 1.8 test costs. Some people argue it's worth 2.2 test costs. Since we test n after the prime if the prime is missed. If you stick with factor is worth 2 tests it's hard to argue.

Also I have the feeling your thinking sieve isn't worth it... work the stats to make sieve look bad (i.e. lowest factor density, error rate is zero, we find 5 primes before 50M, etc,) I'm pretty sure sieve will still come out on top.

Also

15 tests per 1000G sieved... this is for the 1M<n<20M dat, for the 991<n<50M dat multiply by 2.5 38 factors per 1000G, factors denisty does not increase or decrease with n (significantly) only p.

Also,

Go with the lowest factor denisty possible per 100G of sieve, pixl97 has already sieved a very large range at the limit of proth. I trust his work and that he didn't miss any factors.

He found 13.5 factors per 1000G using 1m<n<20M would have been 36 had he used 991<n<50M

Again, waiting for a factor by sieving and PRP should take the same time, this would be optimal.

No not really, first we don't know if the test is 100% correct and we will probably test most of the numbers twice eventually.

I personally don't like discussions like 'When will come the next prime.'
Agreed it's all in good fun though, we have no way of predicting beyond stats.

In particular, I doubt that the probability that a number is prime is still the same after sieving . Here we need help.

The probability is obviously better, one thing that you have to remember is we have already missed a tremendous number of primes. Lots and lots and lots of primes have been passed by in our search!!!
Why? because those primes didn't fit our equation k*2^n+1 for one of our k's.

We are only testing those numbers which fit the equation k*2^n+1 all of those n's for the k's have to be tested until we find a prime (k,n).
So going back to the >40K tests per 1M-n range, we have not really increased our probability of finding a prime on a per test basis we have reduced the number of tests required to find one.
(At least look at it this way for your calculations) You could also say that since we started with >40K tests per 1M-n range (with very very little sieving) and we are now around ~27K we have increased our chances by better than 30%...
All we really do with sieve is reduce the needless work.

How many primes can we expect until n=50000000? 1? 2? 3? 7? 11? Choose a number. If you chose more then 3, you are really optimistic, IMHO.
I think it is max three; even with the best k eliminated, only about half of the tests disappear. In other words, 50% of the sieving work is lost.

At best you could even assume 5k eliminated by 50M, who knows we may find all by 30M but unlikely.

Sorry I'm going through this and commenting what I understand as I read... you guys are doing great and probably undeerstand by now why blanket statements are made.

We know the optimality equation: PRP time = Factor finding time; adjusted by the circumstances that we loose half of the sieving work and save 2 PRP tests by every factor,
2*PRP=sieving/50%.
So, waiting for a factor should take as long as one PRP test.
Conclusion: Sieving is way behind.
PS: the fact that, if a prime is found, there has been only one PRP test saved may push the balance a little bit in favor of PRP, but even though the ratio you posted
PRP 8 days
Sieving 1.5 days
is not rectified.

This is great!!!

(log 2^50,000,000)-(log 2^2,000,000)

Is the above correct? I never realized how rusty my skills had gotten.

15,000,000(approx.)-600,000(approx.)=14,400,000

No I'm pretty sure you have to intergrate x/log x from n^2M to n^50M... even then it's an approximation.

Nuri

05-23-2005, 07:55 AM

I've been collecting snapshots from Mike's Summary 100T<p<1000T page (http://www.aooq73.dsl.pipex.com/gaps/Gaps_n3_20_ps0u.htm) every now and then. The timing of data points are pretty arbitrary, but still I guess it'll give you an idea on the progress.

May be not the particular points themselves, but the trends in Daily Change column would give an idea.