Page 2 of 3
Results 41 to 80 of 83

Thread: Sieve: Distribution of Factors vd. p value

  1. #41
    yeah! that's what i like to see. forget this linear sieving business. from now on, just test the numbers that produce factors people!

    why sieve a billion numbers when there aren't even a million numbers left to be factored?? just check the correct 650,000 numbers so we can be done with all this!!


    -Louie




    </sarcasm>

  2. #42
    Originally posted by jjjjL
    yeah! that's what i like to see. forget this linear sieving business. from now on, just test the numbers that produce factors people!
    The funny thing is that to an extent that I've not measured yet, it might be possible to find families of higher-yield primes!

    With a single k value the skew would be trivial to find, but with 12 k values they're all competing with their own biases, and the skew is blurred. There is still some bias measurable though.

    Much of this bias is already absorbed in the Jacobi test that Payam introduced to Paul and me, so in reality the exploitable bias is now even smaller.

    Also add to that the fact that the period of the pattern that could be found would be so large that the chances of finding primes in the high yield families would be much lower, and therefore the bias would be diluted by the fact that you can't make use of it so often.

    Many a true word spoken in jest...

    Phil

  3. #43
    exactly! bias would be a very interesting property to explore.

    the idea of possibly "missing" factors will likely be repugnant to some folks at first, but if a fast pass through a range can get 50% of the factors in 10% of the time, it would seem to be worth doing those families first. In fact, if you use specific families, wouldn't it be possible to make the program run in "high yield" and "low yield" modes? That way the purists could come back through a range and expend the other 90% of the time in it to find the other 50% of the factors (using my made up numbers from above) without completely resieving that range? There might be a small overhead from determining families, but it would be more than worth it if even a modest difference in factor densities could be revealed.

    for those who like to think hypothetically, this would also imply a few other interesting scenarios... for instance, instead of having a single "optimal" sieve level, there would now be distinct levels for biased and non-biased sieves. in other words, let's say that the ideal sieve limit for our range were 100T. If a new, dual biased/non-biased sieve were to come out, there would be a new factor removal equilibrium established for each type of sieve. and although the two limits would be dependent on the bias delta, i'm pretty sure that the "low yield" optimal sieve level would fall a bit below the original level and the "high yield" optimal sieve level would go quite far above the other limit. this means that, assuming a sieve came out like this, it would be in the siever's best interest to sieve the "high yield" immediately and then only return to the "low yield" once the factors slowed to the rate at which the low yield sieve could remove factors faster. for my 50%/50% removal sieve that runs at a 90%/10% speed split, the analysis isn't too hard. the optimal biased sieve level would be a little less than 1.7x the non-biased sieve level. this is because, in the same range, each sieve would produce the same number of factors, one would just take 9x as long. however, if the range is 1.7x higher, the density of primes is about 9x lower, so it would be just as beneficial to run the unbiased sieve again at the lower level.

    those numbers are all very coarse and could be off a tad... i'm just trying to give an idea of some of the interesting dynamics that could occur.

    also, phil, i see what you mean about how it will be difficult to see bias with the multiple k values to worry about. and it most likely wouldn't be worth factoring them separately... however, such schemes can't necessarily be ruled out. they would have to be extremely good to offset the parallel benefit of performing the sieve on all k values at once. it's still good to plan though... as the sieve continues, the parallel benefit is likely to decrease (aka we will eliminate k values as we find primes hopefully ) so individual bias may eventually be worth exploiting.

    let us know if you discover a practical way to exploit 2 or more bias levels. as i hint above, if the bias levels can be kept mutually exclusive and the overhead isn't crippling, almost any level of bias (>5% i'd guess) would be beneficial.
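the breakeven argument above can be sketched numerically. the code below is illustrative only: it uses the made-up 50%-yield/10%-time numbers from this post and assumes factor density falls off roughly as 1/p, so the biased pass's factor rate at any level is 5x the unbiased rate there. under that particular density model the breakeven lands at 5x the original limit rather than 1.7x; the exact multiple depends entirely on which falloff you plug in.

```python
# Sketch of the biased-sieve breakeven, using the made-up 50%-of-factors-
# in-10%-of-time example from this thread. Assumes factor density falls
# off as roughly 1/p, so sieving at level m*p0 yields about 1/m the
# factors per G that sieving at p0 does. All numbers are illustrative.

def factors_per_day(p, speed_G_per_day, density_at_1T, yield_fraction=1.0):
    """Expected factors found per day sieving near level p (in T)."""
    density = density_at_1T / p          # ~1/p falloff, factors per G
    return speed_G_per_day * density * yield_fraction

base_level = 100.0        # T, the thread's example "optimal" unbiased limit
speed = 10.0              # G/day for the full (unbiased) sieve, made up
density_1T = 40.0         # factors per G at p = 1T, made-up scale

unbiased_rate = factors_per_day(base_level, speed, density_1T)

# The biased pass finds 50% of the factors but runs 10x faster, so its
# factor rate at any given level is 5x the unbiased rate there.
# Breakeven: push the biased sieve up until its rate falls back to the
# unbiased rate at the original limit. With 1/p density that is 5x.
breakeven = base_level * 5.0

biased_rate_at_breakeven = factors_per_day(breakeven, 10 * speed,
                                           density_1T, yield_fraction=0.5)
print(unbiased_rate, biased_rate_at_breakeven)  # equal by construction
```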

    -Louie

  4. #44
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    The first chart is 1G to 10T, log-log scale, bands widened progressively to counter the flaring.

    The second chart is a zoom in to the 1T-10T range, normal scale, bands widened progressively to counter the flaring. (Trust me, you wouldn't want to see the non-widened version).
    Attached Images

  5. #45
    Member
    Join Date
    Feb 2003
    Location
    Lucerne, Switzerland
    Posts
    30

    searching big factors

    Hi everyone,

    As Louie suggested, I've tried to find a 1P factor as well. This is quite tricky because they become rather sparse. According to Phil's calculations there are about 11 times fewer factors when p gets 10 times bigger.
    Hence I tested the area around 2^50 and planned to scan a range of 100G. After unsuccessfully finishing that I continued for another 25G. It is the range from 1125900 – 1126025 G. In that range there should be on average 1 factor every 50G, so the probability of not finding any factor is about 6%.
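The no-factor probability follows from treating factors as a Poisson process; a quick sketch using the round numbers above (with the 50G mean spacing the result comes out near 8%, in the same ballpark as the 6% quoted):

```python
import math

# Probability of seeing no factor in a range, if factors arrive as a
# Poisson process with a given mean spacing (numbers from the post).
range_G = 1126025 - 1125900   # 125 G scanned around 2^50
mean_spacing_G = 50.0         # ~1 factor per 50 G expected there

expected_factors = range_G / mean_spacing_G   # 2.5 factors expected
p_no_factor = math.exp(-expected_factors)     # chance of an empty range
print(f"{p_no_factor:.1%}")
```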

    Using a Thunderbird 1.4GHz, sobsieve 1.22 had a speed of 120000 p per second. That makes about 10G per 23 hours. If we expect one factor every 50G, we can eliminate one candidate every 115 hours. What is the size of a sierpinski exponent which can be PRPed by a P4 2GHz in that time? Maybe that information helps to find an appropriate sieving limit.

    By the way, if you also want to work in that range, use version 1.22 of sobsieve, because version 1.24 does not work beyond 2^48 due to the optimisation of the code. I hope that I was not just unsuccessful because of an unknown bug which only shows up beyond that limit. Maybe there is someone who is also searching above 2^50 using Phil's nbegon and finds a factor to verify.

    It was also asked in that forum why I had chosen such strange ranges to check before. The answer is quite easy: I just took some ranges near powers of 2.

    Is there a p-1 factorer available which can handle 3Mbit sierpinski numbers? It would be quite fun to find huge factors.
    Maybe I should concentrate on smaller p, as it is much more effective. We will see.

    biwema (Reto Keiser)

  6. #46
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    Originally posted by jjjjL
    I want a 1P factor next time people... quit your slacking

    -Louie
    Well I tried, but I think that I overdid it. I tried to submit: 30844005186493245889187 | 27653*2^177+1
    but it wouldn't take it. The other factor is
    171743661917981147488578793152538891
    which I know is too big!
    Joe O

  7. #47
    But that's cheating; it's just factoring a relatively small number.

    I guess it's trivial to find an even larger example

  8. #48
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by FatPhil
    The funny thing is that to an extent that I've not measured yet, it might be possible to find families of higher-yield primes!
    Phil
    Any good news?

  9. #49
    Originally posted by Joe O
    Well I tried, but I think that I overdid it. I tried to submit: 30844005186493245889187 | 27653*2^177+1
    but it wouldn't take it. The other factor is
    171743661917981147488578793152538891
    which I know is too big!
    Cool. Very neat finds.

    The verifier didn't accept them because the factors are stored in the database as 64-bit ints... since neither of the known sievers can break that limit anyway. Which only makes me more curious as to how you found those.

    Good job.

    -Louie

  10. #50
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    I used ZZMath and verified it with bigint.

    I've reached the limit of these tools and need to find something else to use. Any suggestions?
    Joe O

  11. #51
    If you are looking for factors of numbers with a low N, you can try GMP-ECM

  12. #52
    Member
    Join Date
    Feb 2003
    Location
    Lucerne, Switzerland
    Posts
    30
    Hi,

    Finally I managed to find a 1P factor. On average one would expect 1 factor every 55 billion, but there was no factor in a range of 200G. I was even more surprised when I found 3 factors in the following 10G. Statistical distribution, I guess.

    Actually I don't know if we'll manage to sieve the range that far. At 10^15, the factor removal rate would still be higher than a prp test of a 15M sierpinski, but that implies that half of the work has to be done with sieving. If we find a prime, the situation will change anyway.

    we will see.

    biwema

  13. #53
    Senior Member
    Join Date
    Jan 2003
    Location
    U.S
    Posts
    123
    Excellent job biwema !

    Now if I could only find a way to get back into the top 25 sievers again (according to Mike H's sieve stats)..........

  14. #54
    Nuri, can we get an updated graph now that we are approaching 12T sieved!!

    Slatz

  15. #55
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Sure. I'll update it this weekend.

  16. #56
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Here's the updated graph as of today's results.txt file. The concept is the same. The only difference is that the formula of the blue line is now 40000 = y * (x^1.06), i.e. y = 40000 / (x^1.06).

    Regards.
    Attached Images

  17. #57
    thanks for the graph...looks like the sieving is still doing better than prp testing

    Slatz

  18. #58
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Ah, that's for sure. As for how deep the sieving should go, I can confidently say that we're still at the beginning.

    Just a quick and dirty calculation:

    prp test @ 3m takes 2 days on my pc.

    I can find ~10 factors per day on the same machine (p=10T).

    The factors have n=11.5m on average (i.e. (3+20)/2). I don't have data on a prp test @ 11.5m, but I guess it should take ~10 days (i.e. 0.1 tests per day).

    So, it's something like 10 vs. 0.1.


    Of course, the actual effectiveness is not a full 100 times higher:

    - there are some duplicate factors (7% of factors above 1T so far)
    - we're sieving some ranges that we've already finished prp testing (2% of the 3m-20m range so far. This will become 6% when prp testing reaches 4m, 12% @ 5m, 41% @ 10m, etc. But I guess this will not be a problem; we'll be able to adjust our sob.dat files if the need arises.)
    - hopefully we'll find some new primes soon, and the factor data for those k's will be useless, because we'll not need to test them any more (I cannot quantify that, but I think none of us will be upset about the loss of effectiveness in this respect)
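The quick-and-dirty comparison above, with the duplicate correction folded in, is just this arithmetic (all inputs are the figures quoted in the post; the ~10-day PRP estimate at n=11.5m is a guess, as noted):

```python
# Sieve vs. PRP effectiveness, using the figures quoted in the post.
factors_per_day = 10.0       # factors found per day sieving at p ~ 10T
duplicate_rate = 0.07        # ~7% of factors above 1T are duplicates
prp_tests_per_day = 0.1      # ~10 days per PRP test at n ~ 11.5m (a guess)

useful_factors_per_day = factors_per_day * (1 - duplicate_rate)
advantage = useful_factors_per_day / prp_tests_per_day
print(f"sieving eliminates candidates ~{advantage:.0f}x faster")
```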

    Regards,

    Nuri
    Last edited by Nuri; 03-30-2003 at 08:06 PM.

  19. #59
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    I'm not sure whether or not we should adjust the lower bound for the sieving, as it is useful for a double-check.
    Apart from that, I totally agree with you.

  20. #60
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643

    Nuri,

    Is it time for graphs and/or gap analysis?
    Joe O

  21. #61
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
    Here's a bit of gap analysis while you're waiting for Nuri's graphs.

    I've attached the raw output from my gap (hole) finder.

    To summarise we have about 5G left to sieve in order to complete p<20T.

    I've checked all the reported gaps, and all are still being sieved, so no obvious holes present right now.
    Attached Files

  22. #62
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Sure, I'll take a look during weekend.

    PS: I'm planning to create another chart for DC sievers as well (and I am really excited to see how it will look). But I'll probably put it in the DC sieve thread so as not to confuse anybody.

  23. #63
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643

    MikeH

    Did you really mean this:
    Code:
     Range (n): 3.00G - 20.00M
    or did you mean this:
    Code:
     Range (n): .3M - 20.00M
    Anyway, can you tell me how the "Swiss Cheese" looks today?
    Joe O

  24. #64
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
    Ah, well spotted Joe. It really should have been

    Range (n): 3.00M - 20.00M

    I'll get back with the latest holey stuff in a short while.

  25. #65
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
    OK, here it is.

    Gap summary: 3.1G left to complete p<20T.

    Haven't checked any of these gaps to see if they are real holes, but when I checked a few days ago, all was in order.
    Attached Files

  26. #66
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by Nuri
    Sure, I'll take a look during weekend.

    PS: I'm planning to create another chart for DC sievers as well (and I am really excited to see how it will look). But I'll probably put it in the DC sieve thread so as not to confuse anybody.
    Joe et al,

    I'm waiting for a bit more progress before the update. That's why I didn't post it last weekend. But I guess I'll do it within this week.

    While this is the case for the main sieve graph, I've generated the first graph for DC sieve. You can find it on the Sieve Double Checking (1<n<3M) thread.

  27. #67
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Here's the updated graph for distribution of factors vs. p value at the main sieve.

    There's no change in the formula. The enhancement on the blue line (in terms of fitting the actual data) towards the left side of the graph is basically due to the technique used in fitting the graph. I can explain the reason later if anybody is interested.

    Anyway, if the formula is right, from the ranges where we stand, it suggests:

    A total of 2,300 new factors up to 20T,
    A total of 22,000 new factors up to 50T (19,700 of which is for 20T-50T),
    A total of 36,100 new factors up to 100T (14,100 of which is for 50T-100T),
    A total of 49,700 new factors up to 200T (13,600 of which is for 100T-200T), and
    A total of 57,400 new factors up to 300T (7,700 of which is for 200T-300T).

    PS: Please note that the figures above include duplicates as well.
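The projections above follow from integrating the fitted density y = 40000 / x^1.06 (factors per G, with x in G) over each p range; a sketch reproducing the 20T-50T figure:

```python
# Expected factors in a p range, from the fitted density 40000 / x^1.06
# (x in G, i.e. p in billions). Closed-form integral of C * x^(-s).
def expected_factors(p1_G, p2_G, C=40000.0, s=1.06):
    return C / (s - 1) * (p1_G ** (1 - s) - p2_G ** (1 - s))

# 20T..50T -> 20,000G..50,000G
print(round(expected_factors(20_000, 50_000)))  # ~19,700, matching the post
```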

    Next update will not come up until we reach 30T.

    Regards,

    Nuri
    Attached Images

  28. #68
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
    I have updated the gap analysis stuff. It is now available (updated daily), along with sieve scores and links to the three different sob.dat files, here.

    When I have a little more time, I'll take some of the good words from Nuri, Joe, etc., add appropriate links, and try to turn this into a "one stop shop" for SoB sieving.

    Mike.

  29. #69
    Great page!!!

    Could you post the different sobsieve executables too?

    (since louie doesn't update his page and browsing this forum isn't the most exciting thing to do)

    regards,

    ola.

  30. #70
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    Joe O

  31. #71
    Member
    Join Date
    Feb 2003
    Location
    Lucerne, Switzerland
    Posts
    30

    largest accepted gap

    Hi,

    Here I have 2 candidates for the largest accepted gap:

    candidate 1: between
    1125875640686911 and
    1126101624584753
    Size: 225.98 G

    1 factor expected every 64.9 G,
    so it is 3.47 times the average distance between factors of that size


    candidate 2: between
    281406122591719 and
    281479089948269
    Size: 72.95 G

    1 factor expected every 14.94 G,
    so it is 4.88 times the average distance between factors of that size

    Both gaps are completely tested.
    Sometimes it was quite tiring to wait so long for the next factor...

    About the list of gaps: maybe it is more accurate to detect the gaps using the distribution equation (40000/nG^1.06). For example, listing all gaps whose probability of being that big is less than 1% or so.
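That test is easy to sketch: under the fitted density, the mean factor spacing at level x (in G) is x^1.06 / 40000 G, and a gap of size g then has probability about exp(-g / mean) of occurring by chance. The helper names below are made up; the two checks use biwema's candidates from this post.

```python
import math

def mean_spacing_G(p_G, C=40000.0, s=1.06):
    """Mean gap between factors at level p (in G), from y = C / x^s."""
    return p_G ** s / C

def gap_probability(p_G, gap_G):
    """Chance of a factor-free stretch this long, assuming Poisson spacing."""
    return math.exp(-gap_G / mean_spacing_G(p_G))

# biwema's two candidates:
print(gap_probability(1_126_000, 225.98))  # ~3%   -> survives a 1% cutoff
print(gap_probability(281_000, 72.95))     # ~0.8% -> would be flagged
```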

    biwema

  32. #72
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Thanks for the input biwema. It helps us understand the nature of the distribution of factors on high p ranges, where we hopefully will be sieving for the months to come.

    Just a few comments for others who are not familiar with the hole finding issue:

    Mike's hole stats are concentrated on where the main activity is going on (i.e. 3M < n < 20M, and 20T < p < 30T currently). The holes biwema mentioned above are at p=~1126T (T=trillion) and p=~281T. So, it's normal that those holes cannot be seen in Mike's hole stats.

    AFAIK, Mike uses p/x as one of the methods to find the largest acceptable gap. The current x in the p/x formula is 1900. So, the largest acceptable gap for, say, p=~19T is ~10G.

    I find this approach reasonable. In fact, I was using the same thing (with x=3000) when I was looking for holes at lower ranges (up to 10T).

    If x=1900 is used, the largest acceptable gap would be 592G (vs 225.98 G) and 148G (vs 72.95 G) for 1126T and 281T, respectively.

    Maybe the x value decreases (the size of gaps relative to the p value increases) as we go up in p. This also seems reasonable to me. The 1.06 figure in the 40000/nG^1.06 formula suggests that the density of factors diminishes "a bit" faster than the increase in the size of p. Anyway, we'll see.

  33. #73
    Member
    Join Date
    Feb 2003
    Location
    Lucerne, Switzerland
    Posts
    30
    Nuri, are there new graphs about the distribution of factors available? I think it would be quite interesting.

    It seems as if the approximation is still correct around 40T.

    On the other hand, I completed a range of 460G around 281T and found only 18 factors instead of 30. I agree that might just be the statistical distribution, but in that range there were 3 gaps of >70G each and one of 56G. That seems somewhat suspicious.
    I hope there is no software-related border effect near 2^48.
    Maybe it makes sense to doublecheck with a different version of the sieve sometime.

    biwema

  34. #74
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Here's the latest update as of 07 July 2003.

    The formula seen in the charts is the line equation for excel's trendline. It suggests y = 45840 / (x^1.0759). In fact, the red line is very close to the former one, just rotated clockwise (meaning fewer factors at deeper sieve levels). That formula, and the former one [ y = 40000 / (x^1.06) ], both use all factor data available, which starts at 1G (one billion).

    Due to the clockwise rotation effect, I've also looked at what excel would have suggested if the data below 1T were ignored. Taking into consideration data above 1T only, the line rotates further, to y = 60008 / (x^1.1045).

    And as a last checkpoint, the same with data above 10T only. As expected, more rotation... y = 77140 / (x^1.1293).

    Naturally, the number of expected factors changes depending on which formula is used. While the original formula suggests a total of 76,500 factors up to 2000T (2,000,000G), the formula which takes only 10T+ data into consideration suggests a total of 62,700 factors up to 2000T.
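The spread between the fits follows from integrating each density over the remaining range; a sketch (taking the current sieve depth as roughly 35T, which is an assumption on the editor's part, not a figure from the post):

```python
# Remaining factors up to 2000T under each fitted density C / x^s
# (x in G). The ~35T current depth is an assumed starting point.
def remaining(C, s, p_from_G=35_000, p_to_G=2_000_000):
    return C / (s - 1) * (p_from_G ** (1 - s) - p_to_G ** (1 - s))

print(round(remaining(40000, 1.06)))    # all-data fit: ~77,000
print(round(remaining(77140, 1.1293)))  # 10T+ fit:     ~63,000
```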

    I'll post what the four formulas suggest within a few minutes. Also, for ease of comparison with Mike's data, I'll be giving expected number of factors for the same ranges.
    Attached Images

  35. #75
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Here's the comparison of formulas (i.e. how many remaining factors they suggest at the current sieve level).
    Attached Images

  36. #76
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    BTW, considering current settings (sieve speed, number of PCs, number of remaining k's, etc.) it would take 6 months to finish 100T, and 15 years to finish all the way up to 2000T.

  37. #77
    Now that we are approaching 100T in sieving... how do all of your graphs look?

    Slatz

  38. #78
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by Nuri
    BTW, considering current settings (sieve speed, number of PCs, number of remaining k's, etc.) it would take 6 months to finish 100T, and 15 years to finish all the way up to 2000T.
    Hm, did you apply some special maths?
    As it takes 0.5 years to complete 100T, one might suspect that 20 times that range would take at most 20 times as long, thus 10 years.
    Sieve speed improvements, increased PC count, faster PCs (and, as you've mentioned, possible k elimination) left unconsidered...

  39. #79
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    That quote is from july, things have sped up since then.

  40. #80
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Ah, I see...

    I shouldn't be allowed to reanimate such old threads...

