
Sieve: Distribution of Factors vs. p value



Nuri
01-27-2003, 05:37 PM
Using the data in the results.txt file, I made the attached graph.

The data is the number of factors found per 1G p interval. (Data for incomplete 1G intervals are excluded, i.e. if there are only factors up to xx.6 billion, I assumed the full 1G range had not been submitted yet, so I omitted that range.)

Well, I guess if an always-updated SoB.dat had been used throughout the p range, the curve would slope downwards more steeply as p increases, but I don't know exactly by how much.

Regards,

Nuri

jjjjL
01-27-2003, 06:21 PM
that's a really cool graph. i'm glad to see someone found a good use for the results.txt file i set up.

btw, it would be interesting if someone wrote a "hole" detector that looked through results.txt to make sure ranges had the expected number of factors in them. that way if someone accidentally missed a patch, it could be easily spotted and filled in. I'll do it for p < 100G once the other two people in that range are complete, but i'm more or less counting on the community to monitor the rest themselves.

think about it if you have time and want an easy scripting/coding project that could help the project out.
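Something like this minimal Python sketch could be a starting point. It assumes results.txt has one "p | k*2^n+1" line per factor (adjust the parsing to the real format), and it guesses the expected count per G from a rough C/p density calibrated on the data itself:

from collections import defaultdict

G = 10**9
counts = defaultdict(int)                    # factors found per 1G band

with open("results.txt") as f:
    for line in f:
        if "|" not in line:
            continue                         # skip blank/header lines
        p = int(line.split("|")[0].strip())
        counts[p // G] += 1

# calibrate the constant C in (factors per G) ~ C / p from the data itself;
# the median is fairly robust against bands that really are incomplete
samples = sorted(n * (band + 0.5) for band, n in counts.items())
C = samples[len(samples) // 2]

for band in sorted(counts):
    expected = C / (band + 0.5)
    if counts[band] < 0.8 * expected:        # flag bands well under the estimate
        print(f"{band}G-{band + 1}G: {counts[band]} factors, expected ~{expected:.0f}")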


-Louie

Moo_the_cow
01-27-2003, 09:41 PM
I hope that this won't be a pain, but could you update the graph
once a week? I just want to see how accurate your prediction
was.

nuutti
01-28-2003, 05:22 AM
Hello,

Can you tell me the exact link to the results.txt file?
I forgot where I can find it.

Nuutti

alexr
01-28-2003, 05:29 AM
http://www.seventeenorbust.com/sieve/results.txt

nuutti
01-28-2003, 05:33 AM
Thanks !

Nuutti

jjjjL
01-28-2003, 11:09 AM
Someone emailed me this so I figured I'd post it. Let me know if it works:

-Louie



your idea of writing a program searching for holes in results.txt is quite good. i wrote a little program in delphi. it shows the number of factors in results.txt for each G. i used a formula to estimate the factors for each G. if the factors in results.txt are under 80% of the estimated value my program will show them.

check it out at http://www.geocities.com/yummie_55555/holes.zip

here is a small output from my program for ranges already shown as completed in the Sieve Coordination Thread:
258G has 10%
315G has 12%
638G has 23%
650G has 32%
774G has 44%
of the estimated number of factors. perhaps some are missing

PS: i can't post to the forum because i deleted the email account needed for verification of the registration. that doesn't matter. post it in the forum if you want.

Thommy

Nuri
01-28-2003, 04:20 PM
Originally posted by Moo_the_cow
I hope that this won't be a pain, but could you update the graph
once a week? I just want to see how accurate your prediction
was.

Please do not consider this a promise, but to be honest, I was already doing it for myself and it would not be a big problem to post it (I don't know, maybe once a week, or more/less frequently if submitted factors increase significantly).

On the prediction for the 10,000 - 10,001 range, I should admit that I simply made it up. In other words, I hope you find at least three factors, but there is no intelligent mechanism behind it.

Well, on the hole detecting issue, unfortunately I don't know any programming language, but I do tons of analysis/number crunching every day in Excel, it's part of my job. :D

Anyway, I can detect big holes while generating the graph, but please consider this a double check. A simple script would definitely be more useful, and easier to implement.

As an alternative to the estimated density per G, I simply look at the difference between two consecutive submitted p values. If the difference increases significantly (like more than x times the average of the differences between the last 10 p values), then it signals a hole (or a range not submitted yet) to me.
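In script form, that check might look something like this (a rough Python sketch; the window of 10 gaps and the multiplier x are the knobs mentioned above, and ps is assumed to be the list of submitted p values):

def find_suspicious_gaps(ps, window=10, x=5.0):
    """Flag places where the gap to the next submitted p is more than
    x times the average of the previous `window` gaps."""
    ps = sorted(ps)
    gaps = [b - a for a, b in zip(ps, ps[1:])]
    suspects = []
    for i in range(window, len(gaps)):
        avg = sum(gaps[i - window:i]) / window
        if gaps[i] > x * avg:
            suspects.append((ps[i], ps[i + 1], gaps[i]))   # (start, end, size) of the hole
    return suspects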

PS: I personally prefer not to assume there are any holes unless the full range is announced as complete by the siever. None of us can know how the siever distributed the range between boxen and/or how frequently he/she is submitting, etc. Looking at the above figures for example, it seems like in most cases they sent all of the factors found so far at the time they submitted them, i.e. sieving was not yet finished on that particular range.

nuutti
01-28-2003, 05:04 PM
to Nuri : You don't speak fluent Excel vba ?

Nuutti

Nuri
01-28-2003, 05:32 PM
Originally posted by nuutti
to Nuri : You don't speak fluent Excel vba ?

Nuutti

Well, to be honest, not really. In fact, I learned some back in my undergraduate and MBA days (I know, it's a shame to talk about an MBA in a prime number related forum). :o But I found no need to use any of it in six years of business life, so I forgot everything. It's like my German: "Ich erinnere nichts!".

Edit: To any German speaking friends: please do not feel offended. None intended. :)

nuutti
01-28-2003, 05:43 PM
OK. I do number crunching sometimes and I have found that VBA is sometimes very useful. I have also used it in prime-number related projects, like when I had to find the first 10,000 primes :-)
and also when I had to manage my numerous result files
in different projects.

Nuutti

Nuri
01-28-2003, 05:46 PM
The newly uploaded 20,000 factors deserve an update.

olaright
01-29-2003, 09:36 AM
Very nice indeed! :cheers:


Did you let Excel fit the graph, or did you plot it using an equation?
In any case, could you give me the equation? (Maybe I'll write a hole detector :)

Cheers,
Ola

Mystwalker
01-29-2003, 11:49 AM
It could be interesting to see the correlation between removal density and the n values removed. I wonder if there's a connection...

btw. it should better be "Ich erinnere mich an nichts." :D

Nuri
01-29-2003, 12:40 PM
Originally posted by Mystwalker
btw. it should better be "Ich erinnere mich an nichts." :D

You see, that's exactly what I meant. :D

Nuri
01-29-2003, 01:16 PM
Originally posted by olaright
Did you let Excel fit the graph, or did you plot it using an equation?

OK, here's what I assume about the relationship between the range and the # of factors.

At the very beginning, I was just curious whether range * factors = constant (like 100G*300 = 300G*100 = 1,000G*30 = 10,000G*3, etc.)

I simply chose the type of trendline that I think should fit best, which is power. That's available in excel.

y = a * x^b

where
a = 1, b = -1 (my initial assumptions),
x = p range, y = # factors

=> ~ 30,500 = y * x (for the data so far)

I think this will definitely change (downwards) as we update the SoB.dat file, simply because some of the factors will not be submitted again, as their candidates were already removed at smaller ranges.
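For reference, Excel's power trendline is just a least-squares fit of a straight line in log-log space; a rough Python sketch of the same fit with numpy (the sample numbers below are made up to roughly match the ~30,500 constant above):

import numpy as np

def fit_power(x, y):
    """Fit y = a * x**b by least squares on log y = log a + b * log x,
    which is what a spreadsheet 'power' trendline does."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)
    return np.exp(log_a), b

x = np.array([10.5, 50.5, 100.5, 300.5])     # band midpoints, p in G (made up)
y = np.array([2900.0, 600.0, 305.0, 100.0])  # factors per band (made up)
a, b = fit_power(x, y)
print(f"y ~= {a:.0f} * x^{b:.2f}")           # b near -1 means y * x is roughly constant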

Mystwalker
01-29-2003, 03:03 PM
I just checked the distribution of found factors over all n's.
The attached text file holds the count, grouped by 100K intervals. Right now, there are ~250-330 values removed per interval.
While writing this program, I saw that ceselb sent in some values < 3M. That made me curious...


Oh, and concerning that "You see, that's exactly what I meant. :D":
http://www.mystwalker.de/tb/stackenblochen.avi :rotfl:

ceselb
01-29-2003, 03:34 PM
That's my 4847 sieve results on my last two ranges. Louie wanted them submitted to the database. Please feel free to ignore them. :cool:

Mystwalker
01-29-2003, 04:23 PM
Already done. Java doesn't like negative array indexes. ;)

FatPhil
01-31-2003, 05:02 AM
Originally posted by Nuri
Using the data in the results.txt file, I made the attached graph.

Wonderful!

To my eye, it's looking like 1/log(p), which is exactly as predicted. If you want to try to get a straight line, then simply plot #factors removed against log(p). Can you do another chart with those axes please, I'd be very interested to see it.

I've updated my comments on sieve depth at http://fatphil.org/maths/sierpinski/
And this time there's a, shock horror, numeric prediction!

Better than that, there are several, as I indicate why the first prediction is likely to be wrong!

It will make a lot more sense when the whole range has been sieved to 1T, and I can re-do the predictions with greater accuracy. I'll see if I can get Jack Brennen or Yves Gallot to perform some of their wizardry with the data too. They know this field better than anyone.

Phil

FatPhil
01-31-2003, 05:26 AM
Originally posted by FatPhil
To my eye, it's looking like 1/log(p)

Too early in the morning... the derivative thereof (and the negative of that - third time lucky!)
I.e. it ought to slowly creep below the 1/p line (by a squared logarithmic rate, which to be honest isn't actually obvious to the eye.)
Plotting #cands*p versus p should show the logarithmic factors (without them, it would be a flat line), but it will be one that flattens off quite quickly, and as the sieving has already got to 5.7G maybe it's already quite flat. Sieving results from down at 1G, or even 100M, would make the trend more obvious.
Plotting #cands*p versus log(p) should give a straight line.

Phil

Nuri
01-31-2003, 06:04 PM
WOW. - 70,000 factors submitted above 100G. For comparison, the first graph was drawn with ~21,000 factors, and the second graph with ~41,000 factors.

IF my above assumption holds, and if there is no SoB.dat file update, then I'd expect roughly 115,000 factors by the time all factors below 5T (5,000G) are submitted. That is, roughly 45,000 factors more. AFAIK, the current .dat file has 746961 candidates. If it's updated with a newer file that has, let's say, 8/9 of the candidates, i.e. 663,965 candidates (although I guess what we have is roughly 690,000 as of today), and everybody starts using it right now, then we should have 110,000 factors submitted for the 100G - 5T range (70,000 + 45,000*8/9).

Of course, slightly fewer candidates will be removed, because we have duplicates.

Anyways, about the graph:

The red line is the same as the black line in the above graphs (Excel's power trendline).
The blue line is the y = a * x^b I mentioned above. The only difference is that with the current data the function is 29,405 = y * x. The change in the constant comes from the updates we made to the SoB.dat file. That effect can be seen on the graph too. The blue line assumes fewer factors per G at the lower p values and more factors at higher p values, because it uses the average of all submissions. On the other hand, the red line uses actual data, so it shows more factors at low p (when we used a larger .dat file), etc.

In the next post, I will show what the function and the trendline predict for p up to 5T.

Nuri
01-31-2003, 06:08 PM
As mentioned above, this is what I expect up to 5T (based on current data). :D

Nuri
01-31-2003, 06:25 PM
Originally posted by FatPhil
To my eye, it's looking like 1/log(p), which is exactly as predicted. If you want to try to get a straight line, then simply plot #factors removed against log(p). Can you do another chart with those axes please, I'd be very interested to see it.

Phil,

I plotted both alternatives. Please let me know if you meant something else.

Nuri

FatPhil
02-01-2003, 02:51 AM
Originally posted by Nuri
I plotted both alternatives. Please let me know if you meant something else.


Yeah, sorry Nuri, I've been messin' you about with my early morning bug-ridden postings. My second post had the correct requests. As it's early morning again, I'm frightened to make any requests!
sqrt(delta(cands)/p) vs. log(p)
and
delta(cands)/log(p)^2 vs. p
should both be straight lines (where delta(cands) = number of candidates removed by that p-band).

The logic behind that is that if
remaining cands ~= 1/log(p)
then
delta(cands) ~= d/dp (1/log(p))
= -1/p * 1/log(p)^2

You can smooth the results by taking larger p bands. You're not short of data points, so you can afford to do this.
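In case it helps, a rough Python sketch that turns per-band counts into those two series (band_counts is assumed to map each band's midpoint p, in G, to the number of candidates removed in that band; widen the bands first if you want smoother points):

import math

def phil_series(band_counts):
    """Return the two transformed series suggested above:
    (log p, sqrt(delta/p)) and (p, delta / log(p)**2),
    where delta is the number of candidates removed in the band at p."""
    s1, s2 = [], []
    for p_G, delta in sorted(band_counts.items()):
        p = p_G * 1e9                        # convert from G to an absolute p
        s1.append((math.log(p), math.sqrt(delta / p)))
        s2.append((p, delta / math.log(p) ** 2))
    return s1, s2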

Phil

Nuri
02-01-2003, 07:31 AM
OK, it's no problem at all. I wouldn't bother with the analysis at all if I wasn't curious too. The point is, every idea (or request) helps.

I guess I understood what you are looking for. I'll take a look at it this weekend.

frmky
02-01-2003, 02:42 PM
here is a small output from my program for ranges already shown as completed in the Sieve Coordination Thread:
258G has 10%
315G has 12%
638G has 23%
650G has 32%
774G has 44%
of the estimated number of factors. perhaps some are missing


I sieved 638G and a quick check confirms that all factors have been submitted. There are probably an equivalent number of ranges with more than the expected number of factors.

Greg

jjjjL
02-03-2003, 02:50 PM
to keep bandwidth down on the main server, updates to results.txt will now only show p > 200 billion (instead of 100 billion). this file is still located at:

http://www.seventeenorbust.com/sieve/results.txt

and it's still updated every night at 3:40am EST.

however, there is now a static file that shows new data previously not available for all factors between 1 - 200 billion. it is available on my umich space, off the server at:

http://www-personal.engin.umich.edu/~lhelm/results200.txt

this data will be useful for future trend estimation, graphs, and hole detection that the community may want to do. a quick look through the data appears to indicate that 1-200 billion is entirely complete, but i certainly haven't done extensive checking myself, so anyone interested is thoroughly encouraged to look through it to help verify the data.

hopefully this data makes it into one of the fine graphs that have been produced lately too. :)

-Louie

Nuri
02-03-2003, 04:37 PM
It may be a good idea to put up zipped versions of both files too. Just an idea.

I think most people would prefer downloading that version, and then browsing through it after unzipping.

I guess that would be a wiser way of using the bandwidth on both sides. Zipping reduces the file size by a factor of three, and I guess it's worth it.

Of course, .txt versions should also be kept for those who still prefer to look at the data from their browser.

PS: Notepad has problems with columns, but wordpad works fine.

EDIT: OMG! It took 30 min to download results200.txt (6MB)... :scared:

Nuri
02-03-2003, 05:30 PM
Wow! That graph is really sexy. :D

Nuri
02-03-2003, 05:35 PM
As I mentioned earlier, the blue line is the straight line which uses the average of all submitted factors, whereas the red line is the trendline based on actual data. The difference comes from the changes we make to the SoB.dat file.

Regards,

Nuri

Nuri
02-03-2003, 06:24 PM
Originally posted by jjjjL
a quick look through the data appears to indicate that 1-200 billion is entirely complete, but i certainly haven't done extensive checking myself, so anyone interested is thoroughly encouraged to look through it to help verify the data.

I found the 49585693249 - 50006546839 range missing.

Doing it. Will submit as soon as it finishes. :D

Edit: There is another hole at 11971526549 - 12000264097. Sieving that too.

I suspect there are no more holes in the 1G-200G range.

Regards,

Nuri

Nuri
02-03-2003, 06:56 PM
Cool. :D

74 new factors on the 11971526549 - 12000264097 range (with the latest SoB.dat).

Finished and submitted.

49585693249 - 50006546839 will take a couple of hours. Will report later. (I expect ~315 factors there too).

It was definitely worth scanning through (especially now that we're down to 10-20 factors per G).

Regards,

Nuri

jjjjL
02-03-2003, 08:28 PM
Nuri - looks like it was a good idea to make a text dump of the lower factor ranges ;)

great job! i can't tell you how happy i am to have you working with us. :) between finding two factor holes and the awesome graphs... you've been doing lots of great work... stuff that i probably couldn't have done on my own. it's nice to have such great helpers. thanks again.

-Louie

Nuri
02-04-2003, 01:42 AM
You're welcome. No need to mention names, but I would never devote this much of my time to some of the other DC projects I participated in earlier. The wonderful atmosphere here at SoB encourages the contribution I make.

BTW, a total of 317 (=74+243) new factors found and submitted in two holes below 200G mentioned above.

FatPhil
02-04-2003, 05:08 AM
Originally posted by Nuri
Wow! That graph is really sexy. :D


To counter the flaring that you get at the high ps, you can either band the ps by their logs rather than by their numeric p value, or simply make the bands 10 times wider.

In some ways the log/log graph looks depressing as it says "soon you'll be removing no more factors" (as the #candidates value plummets), however, it's not "soon" at all, as it takes longer and longer to progress to the right along the logarithmic p axis.

In real-world terms (what the users see), sieving doesn't slow down much at all, as you spend _most_ of the time in the "still a bit better than PRP-ing" zone. This initial spurt is alas going to start to run out soon, and then everyone's in for the long haul of only one factor every few hours (i.e. still better than PRP-ing), which will continue for quite a while. (Which is one reason knowing when to stop sieving is so difficult.)

Keep these graphs coming, they're good to look at.

Louie - would it be possible for you to stick up a table of #candidates remaining versus completed sieving depth so that I can update my predictions, please?

Cheers,
Phil

Nuri
02-04-2003, 02:56 PM
Originally posted by FatPhil
To counter the flaring that you get at the high ps, you can either band the ps by their logs rather than by their numeric p value, or simply make the bands 10 times wider.

I'll try that on the next update. :)

Originally posted by FatPhil
In some ways the log/log graph looks depressing as it says "soon you'll be removing no more factors" (as the #candidates value plummets), however, it's not "soon" at all, as it takes longer and longer to progress to the right along the logarithmic p axis.

I agree with you that it looks depressing. In fact, extrapolating the line suggests 1 factor per G somewhere around 32T (32,000G).

Originally posted by FatPhil
Keep these graphs coming, they're good to look at.

Sure. But it will not be as frequent as before. I guess there is no need to update more than once a week any more.

jjjjL
02-05-2003, 02:59 PM
Originally posted by FatPhil

Louie - would it be possible for you to stick up a table of #candidates remaining versus completed sieving depth so that I can update my predictions, please?

Cheers,
Phil

Sure. What exactly do you want?

As of right now, there are 251669 factors for n > 3000000.

leaving 659590 n values for the 12 k where n > 3000000 (and < 20000000).

divisor (p)    # factors
1G - 10G       89458
10G - 100G     77415
100G - 1T      66045
1T - 10T       18744
10T+               7

I just grabbed those from the db. 1-10G is done. 10-100G is done. 100G-1T is very close to complete but does have a couple of G-sized holes... it will probably go up a few hundred when done. 1T-10T isn't even half complete... well, linearly speaking.

The other 7 factors are beyond me. I knew of

10000127155987 | 55459*2^3236734+1

which Moo submitted...

the other 6 over 10T were submitted by someone who didn't even log into the site:

17598251446627 | 24737*2^11993551+1
17598663878903 | 21181*2^6492260+1
17598830641963 | 5359*2^13178142+1
17598922454137 | 10223*2^9388157+1
17599242410803 | 5359*2^10545390+1

and

100005426934979 | 10223*2^15670937+1 :)

yeah, that's 100T+... and it only took 5.5G to find it assuming the dude started at 100T even. I want a 1P factor next time people... quit your slacking :p ;)

-Louie

Moo_the_cow
02-05-2003, 10:15 PM
Yeah!!!!!!!

I just hit the jackpot (well sort of ;) )

25000000918691 | 33661*2^17056080+1 :D

I was kind of bored... and I felt like sieving 25T-25.001T for
the fun of it........and I was shocked to find that factor, which
took just a few seconds. Whew...... if I'm this lucky, I'll find
1,088 factors in this 1G range :) I'm not reserving that
range, however, so competition is welcome.

ceselb
02-05-2003, 10:37 PM
You've got some competition. Check the Sieve Coordination Thread. :D :thumbs:

jjjjL
02-05-2003, 10:45 PM
yeah! that's what i like to see. forget this linear sieving business. from now on, just test the numbers that produce factors people! :thumbs:

why sieve a billion numbers when there aren't even a million numbers left to be factored?? just check the correct 650,000 numbers so we can be done with all this!! :D


-Louie




</sarcasm> ;)

FatPhil
02-06-2003, 04:14 AM
Originally posted by jjjjL
yeah! that's what i like to see. forget this linear sieving business. from now on, just test the numbers that produce factors people! :thumbs:


The funny thing is that to an extent that I've not measured yet, it might be possible to find families of higher-yield primes!

With a single k value the skew would be trivial to find, but with 12 k values they're all competing with their own biases, and the skew is blurred. There is still some bias measurable though.

Much of this bias is already absorbed in the Jacobi test that Payam introduced to Paul and me, so in reality the exploitable bias is now even smaller.

Also add to that the fact that the period of the pattern that could be found would be so large that the chances of finding primes in the high yield families would be much lower, and therefore the bias would be diluted by the fact that you can't make use of it so often.

Many a true word spoken in jest...

Phil

jjjjL
02-06-2003, 06:40 PM
exactly! bias would be a very interesting property to explore.

the idea of possibly "missing" factors will likely be repugnant to some folks at first, but if a fast pass through a range can get 50% of the factors in 10% of the time, it would seem to be worth doing those families first. In fact, if you use specific families, wouldn't it be possible to make the program run in "high yield" and "low yield" modes? That way the purists could come back through a range and expend the other 90% of the time in it to find the other 50% of the factors (using my made up numbers from above) without completely resieving that range. There might be a small overhead from determining families, but it would be more than worth it if even a modest difference in factor densities could be revealed.

for those who like to think hypothetically, this would also imply a few other interesting scenarios... for instance, instead of having a single "optimal" sieve level, there would now be distinct levels for biased and non-biased sieves. in other words, let's say that the ideal sieve limit for our range were 100T. If a new, dual biased/non-biased sieve were to come out, there would be a new factor removal equilibrium established for each type of sieve. and although the two limits would be dependent on the bias delta, i'm pretty sure that the "low yield" optimal sieve level would fall a bit below the original level and the "high yield" optimal sieve level would go quite far above the other limit. This means that, assuming a sieve came out like this, it would be in the siever's best interest to sieve the "high yield" immediately and then only return to the "low yield" once the factors slowed down to the rate at which the low yield sieve could remove factors faster. for my 50%/50% removal sieve that runs at a 90%/10% speed the analysis isn't too hard. the optimal biased sieve level would be a little less than 1.7x the non-biased sieve level. this is because, in the same range, each sieve would produce the same amount of factors, one would just take 9x as long. however, if the range is 1.7x higher, the density of primes is about 9x lower, so it would be just as beneficial to run the unbiased sieve again at the lower level.

those numbers are all very coarse and could be off a tad... i'm just trying to give an idea of some of the interesting dynamics that could occur.

also, phil, i see what you mean about how it will be difficult to see bias with the multiple k values to worry about. and it most likely wouldn't be worth factoring them separately... however, such schemes can't necessarily be ruled out. they would have to be extremely good to offset the parallel benefit of performing the sieve on all k values at once. it's still good to plan though... as the sieve continues, the parallel benefit is likely to decrease (aka we will eliminate k values as we find primes, hopefully ;)) so individual bias may eventually be worth exploiting.

let us know if you discover a practical way to exploit 2 or more bias levels. as i hint above, if the bias levels can be kept mutually exclusive and the overhead isn't crippling, almost any level of bias (>5% i'd guess) would be beneficial.

-Louie

Nuri
02-13-2003, 09:53 PM
The first chart is 1G to 10T, log-log scale, bands widened progressively to counter the flaring.

The second chart is a zoom in to the 1T-10T range, normal scale, bands widened progressively to counter the flaring. (Trust me, you wouldn't want to see the non-widened version). :D

biwema
02-17-2003, 08:59 PM
Hi everyone,

As Louie suggested I’ve tried to find a 1P factor as well. This is quite tricky because they become quite sparse. According to Phil’s calculations there are about 11 times fewer factors when p gets 10 times bigger.
Hence I tested the area around 2^50 and planned to scan a range of 100G. After unsuccessfully finishing that I continued for another 25G. It is the range from 1125900 – 1126025 G. In that range there should be on average 1 factor every 50G, so the probability of not finding any factor is about 6%.

Using a 1.4GHz Thunderbird, SoBSieve 1.22 had a speed of 120000 p per second. That makes about 10G per 23 hours. If we expect one factor every 50G, we can eliminate one candidate every 115 hours. What is the size of a Sierpinski exponent that can be PRPed by a P4 2GHz in that time? Maybe that information helps to find an appropriate sieving limit.
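For reference, the arithmetic behind those numbers (a rough Python sketch, figures copied from above):

p_per_sec = 120_000                      # SoBSieve 1.22 on the 1.4 GHz Thunderbird
g_per_hour = p_per_sec * 3600 / 1e9      # ~0.43 G sieved per hour
print(10 / g_per_hour)                   # ~23 hours for a 10G range
print(50 / g_per_hour)                   # ~115 hours per expected factor (1 per 50G)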

By the way, if you also want to work in that range, use version 1.22 of SoBSieve, because version 1.24 does not work beyond 2^48 due to the optimisation of the code. I hope that I was not just unsuccessful because of an unknown bug which only shows up beyond that limit. Maybe there is someone who is also searching above 2^50 using Phil's nbegon and finds a factor to verify.

It was also asked in this forum why I had chosen such strange ranges to check before. The answer is quite easy: I just took some ranges near powers of 2.

Is there a P-1 factorer available which can handle 3Mbit Sierpinski numbers? It would be quite a lot of fun to find huge factors.
Maybe I should concentrate on smaller p, as it is much more effective. We will see.

biwema (Reto Keiser)

Joe O
02-27-2003, 12:07 PM
Originally posted by jjjjL
I want a 1P factor next time people... quit your slacking :p ;)

-Louie

Well, I tried, but I think that I overdid it. I tried to submit: 30844005186493245889187 | 27653*2^177+1
but it wouldn't take it. The other factor is
171743661917981147488578793152538891, which I know is too big!

smh
02-27-2003, 02:33 PM
But that's cheating, it's just factoring a relatively small number. ;)

I guess it's trivial to find an even larger example.

Nuri
02-27-2003, 04:29 PM
Originally posted by FatPhil
The funny thing is that to an extent that I've not measured yet, it might be possible to find families of higher-yield primes!
Phil

Any good news?

jjjjL
02-28-2003, 12:51 AM
Originally posted by Joe O
Well, I tried, but I think that I overdid it. I tried to submit: 30844005186493245889187 | 27653*2^177+1
but it wouldn't take it. The other factor is
171743661917981147488578793152538891, which I know is too big!

Cool. Very neat finds.

The verifier didn't accept them because the factors are stored in the database as 64-bit ints... since neither of the known sievers can break that limit anyway. Which only makes me more curious as to how you found those. ;)

Good job.

-Louie

Joe O
03-02-2003, 02:43 PM
I used ZZMath (http://www.spacefire.com/numbertheory/zmathdoc.htm) and verified it with bigint (http://www.acme.com/software/bigint/).

I've reached the limit of these tools and need to find something else to use. Any suggestions?

smh
03-03-2003, 03:45 AM
If you are looking for factors of numbers with a low N, you can try GMP-ECM

biwema
03-10-2003, 07:08 PM
Hi,

Finally I managed to find a 1P factor. On average one would expect 1 factor every 55 billion, but there was no factor in a range of 200G. I was even more surprised when I found 3 factors in the following 10G. That's the statistical distribution for you.

Actually I don't know if we will manage to sieve the range that far. At 10^15, the factor removal rate would still be higher than a PRP test of a 15M Sierpinski number, but that implies that half of the work has to be done by sieving. If we find a prime, the situation will change anyway.

we will see.

biwema

Moo_the_cow
03-10-2003, 07:30 PM
Excellent job biwema ! :cheers:

Now if I could only find a way to get back into the
top 25 sievers again (according to Mike H's sieve
stats)..........

Slatz
03-29-2003, 03:01 PM
Nuri, can we get an updated graph now that we are approaching 12T sieved!!

Slatz

Nuri
03-29-2003, 06:18 PM
Sure. I'll update it this weekend.

Nuri
03-30-2003, 12:32 PM
Here's the updated graph as of today's results.txt file. The concept is the same. Only difference is, the formula of the blue line is now 40000 = y * (x^1.06) => y = 40000 / (x^1.06).

Regards.

Slatz
03-30-2003, 07:35 PM
thanks for the graph...looks like the sieving is still doing better than prp testing

Slatz

Nuri
03-30-2003, 07:41 PM
Ah, that's for sure. As for the depth to which sieving should eventually go, I can confidently say that we're still at the beginning.

Just a quick and dirty calculation:

prp test @ 3m takes 2 days on my pc.

I can find ~10 factors per day on the same machine (p=10T).

The factors have n=11.5m (i.e. (3+20)/2) on average. I don't have data on a prp test @ 11.5m, but I guess it should take ~10 days (i.e. 0.1 tests per day).

So, it's something like 10 vs. 0.1.


Of course, the actual effectiveness is not 100 times higher:

- there are some duplicate factors (7% of factors above 1T so far),
- we're sieving some ranges that we've already finished prp testing (2% of the 3m-20m range so far. This will become 6% when prp testing reaches 4m, 12% @ 5m, 41% @ 10m, etc. But I guess this will not be a problem. I guess we'll be able to adjust our sob.dat files if the need arises.)
- hopefully we'll find some new primes soon, and the factor data for those k's will be useless, because we'll not need to test them any more (I cannot quantify that, but I think none of us will be upset about the loss of effectiveness in this respect :D)

Regards,

Nuri

Mystwalker
03-31-2003, 04:56 AM
I'm not sure whether or not we should adjust the lower bound for the sieving, as it is useful for a double-check.
Apart from that, I totally agree with you. :)

Joe O
04-24-2003, 03:33 AM
Is it time for graphs and/or gap analysis?

MikeH
04-24-2003, 12:57 PM
Here's a bit of gap analysis while you're waiting for Nuri's graphs.

I've attached the raw output from my gap (hole) finder.

To summarise we have about 5G left to sieve in order to complete p<20T.

I've checked all the reported gaps, and all are still being sieved, so no obvious holes present right now.

Nuri
04-24-2003, 06:22 PM
Sure, I'll take a look during weekend.

PS: I'm planning to create another chart for DC sievers as well (and I am really excited to see how it will look). :D But I'll probably put it in the DC sieve thread so as not to confuse anybody. :confused:

Joe O
05-02-2003, 08:52 PM
Did you really mean this:


Range (n): 3.00G - 20.00M

or did you mean this:


Range (n): .3M - 20.00M

Anyway, can you tell me how the "Swiss Cheese" looks today?

MikeH
05-03-2003, 06:41 AM
Ah, well spotted Joe. It really should have been

Range (n): 3.00M - 20.00M

I'll get back with the latest holey stuff in a short while.

MikeH
05-03-2003, 07:54 AM
OK, here it is.

Gap summary: 3.1G left to complete p<20T.

Haven't checked any of these gaps to see if they are real holes, but when I checked a few days ago, all was in order.

Nuri
05-04-2003, 10:07 PM
Originally posted by Nuri
Sure, I'll take a look during weekend.

PS: I'm planning to create another chart for DC sievers as well (and I am really excited to see how it will look). :D But I'll probably put it in the DC sieve thread so as not to confuse anybody. :confused:

Joe et al,

I'm waiting for a bit more progress before the update. That's why I didn't post it last weekend. But I guess I'll do it within this week.

While this is the case for the main sieve graph, I've generated the first graph for DC sieve. You can find it on the Sieve Double Checking (1<n<3M) (http://www.free-dc.org/forum/showthread.php?s=&postid=27050#post27050) thread.

Nuri
05-11-2003, 09:41 PM
Here's the updated graph for distribution of factors vs. p value at the main sieve.

There's no change in the formula. The improvement in how well the blue line fits the actual data towards the left side of the graph is basically due to the technique used in fitting the graph. I can explain the reason later if anybody is interested.

Anyway, if the formula is right, from the ranges where we stand, it suggests:

A total of 2,300 new factors up to 20T,
A total of 22,000 new factors up to 50T (19,700 of which is for 20T-50T),
A total of 36,100 new factors up to 100T (14,100 of which is for 50T-100T),
A total of 49,700 new factors up to 200T (13,600 of which is for 100T-200T), and
A total of 57,400 new factors up to 300T (7,700 of which is for 200T-300T).

PS: Please note that the figures above include duplicates as well.

Next update will not come up until we reach 30T.

Regards,

Nuri

MikeH
05-17-2003, 06:15 AM
I have updated the gap analysis stuff. It is now available (updated daily), along with sieve scores and links to the three different sob.dat files, here (http://www.aooq73.dsl.pipex.com/).

When I have a little more time, I'll take the various good words from Nuri, Joe, etc., add appropriate links, and try to turn this into a "one stop shop" for SoB sieving.

Mike.

olaright
05-17-2003, 12:59 PM
Great page!!!

Could you post the different sobsieve executables too?

(since louie doesn't update his page and browsing this forum isn't the most exciting thing to do)

regards,

ola.

Joe O
05-18-2003, 03:56 PM
Try this post. (http://www.free-dc.org/forum/showthread.php?s=&postid=28279#post28279)

biwema
05-24-2003, 07:39 PM
Hi,

Here I have 2 candidates for the largest accepted gap:

candidate 1: between
1125875640686911 and
1126101624584753
Size: 225.98 G

1 factor expected every 64.9 G,
so it is 3.47 times the average distance between factors of that size


candidate 2: between
281406122591719 and
281479089948269
Size: 72.95 G

1 factor expected every 14.94 G,
so it is 4.88 times the average distance between factors of that size

Both gaps are completely tested.
Sometimes it was quite tiring to wait so long for the next factor...

About the list of the gaps: maybe it is more accurate to detect gaps using the distribution equation (40000/nG^1.06). For example, listing all gaps whose probability of being that big is less than 1% or so.
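Something like this rough Python sketch would do it, treating factors as a Poisson process whose density comes from the 40000/p^1.06 fit (p in G, factors per G); the two gaps reported above are plugged in as examples, with p rounded to the nearest G:

import math

def gap_probability(p_G, gap_G):
    """Chance of seeing NO factor across a gap of gap_G (in G) around p_G (in G),
    assuming factors arrive like a Poisson process with density 40000 / p^1.06 per G."""
    density = 40000 / p_G ** 1.06
    return math.exp(-density * gap_G)

# flag a gap if it would occur less than ~1% of the time by chance
for p_G, gap_G in [(1125876, 225.98), (281406, 72.95)]:
    prob = gap_probability(p_G, gap_G)
    print(f"{gap_G}G gap near {p_G}G: probability {prob:.2%}"
          + ("  <-- suspicious" if prob < 0.01 else ""))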

biwema

Nuri
05-24-2003, 10:32 PM
Thanks for the input biwema. It helps us understand the nature of the distribution of factors on high p ranges, where we hopefully will be sieving for the months to come. :thumbs:

Just a few comments for others who are not familiar with the hole finding issue:

Mike's hole stats are concentrated on where the main activity is going on (i.e. 3M < n < 20M, and 20T < p < 30T currently). The holes biwema mentioned above are at p=~1126T (T=trillion) and p=~281T. So, it's normal that those holes cannot be seen in Mike's hole stats.

AFAIK, Mike uses p/x as one of the methods to find the size of the largest accepted gap. The current x in the p/x formula is 1900. So, the size of the largest acceptable gap for, say, p=~19T is ~10G.

I find this approach reasonable. In fact, I was using the same thing (with x=3000) when I was looking for holes at lower ranges (up to 10T).

If x=1900 is used, the largest acceptable gap would be 592G (vs 225.98 G) and 148G (vs 72.95 G) for 1126T and 281T, respectively.

Maybe the x value decreases (the size of gaps relative to the p value increases) as we go up in p. This also seems reasonable to me. The 1.06 figure in the 40000/nG^1.06 formula suggests that the density of factors diminishes "a bit" faster than the increase in the size of p. Anyways, we'll see.

biwema
07-07-2003, 09:03 AM
Nuri, are there new graphs of the distribution of factors available? I think they would be quite interesting.

It seems as if the approximation is still correct around 40T.

On the other hand, I completed a range of 460G around 281T and found only 18 factors instead of 30. I agree, that might just be the statistical distribution, but in that range, there were 3 gaps of >70G each and one of 56G. That seems somewhat suspicious.
I hope there is no software-related border effect near 2^48.
Maybe it makes sense to double-check with a different version of the sieve sometime.

biwema

Nuri
07-08-2003, 03:11 PM
Here's the latest update as of 07 July 2003.

The formula seen in the charts is the line equation for Excel's trendline. It suggests y = 45840 / (x^1.0759). In fact, the red line is very close to the former one, just rotated clockwise (meaning fewer factors at deeper sieve levels). That formula, and the former one [ y = 40000 / (x^1.06) ], both use all the factor data available, which starts at 1G (one billion).

Due to the clockwise rotation effect, I've also looked at what Excel would have suggested if the data below 1T was ignored. Taking into consideration data above 1T only, the line rotates further, to y = 60008 / (x^1.1045).

And as a last checkpoint, the same with data above 10T only. As expected, more rotation... y = 77140 / (x^1.1293).

Naturally, the number of expected factors does change depending on which formula is used. While the original formula suggests a total of 76,500 factors up to 2000T (2000000G), the formula which takes only 10T+ data into consideration suggests a total of 62,700 factors up to 2000T.

I'll post what the four formulas suggest within a few minutes. Also, for ease of comparison with Mike's data, I'll be giving expected number of factors for the same ranges.

Nuri
07-08-2003, 03:20 PM
Here's the comparison of the formulas (i.e. how many remaining factors they suggest at the current sieve level).

Nuri
07-08-2003, 03:36 PM
BTW, considering current settings (sieve speed, number of PCs, number of remaining ks, etc.), it would take 6 months to finish 100T, and 15 years to finish all the way up to 2000T. :D

Slatz
10-03-2003, 05:25 PM
Now that we are approaching 100T in sieving... how do all of your graphs look?

Slatz

Mystwalker
10-04-2003, 05:41 AM
Originally posted by Nuri
BTW, considering current settings (sieve speed, number of PCs, number of remaining ks, etc.), it would take 6 months to finish 100T, and 15 years to finish all the way up to 2000T. :D

Hm, did you apply some special maths?
As it takes 0.5 years to complete 100T, one might suspect that 20 times that range would take at most 20 times as long, thus 10 years.
Sieve speed improvements, increased PC count, faster PCs (and, as you've mentioned, possible k elimination) left unconsidered... :confused:

ceselb
10-04-2003, 09:06 AM
That quote is from July; things have sped up since then.

Mystwalker
10-04-2003, 09:55 AM
Ah, I see...

I shouldn't be allowed to reanimate such old threads... ;)

Nuri
10-04-2003, 03:57 PM
Originally posted by Mystwalker
Ah, I see...

I shouldn't be allowed to reanimate such old threads...

Mystwalker, no problem. It's nice to remember how things were looking back then, and to compare it with what has changed.

Ceselb is right. That was almost three months ago, and it was a projection assuming everything stayed constant.

The math behind is pretty straightforward.

I was keeping track of the average cumulative sieve speed, making use of Mike's gap analysis, and the average sieve speed was 343.5 G/day for the 24/06-04/07 period, suggesting 10.3T/month.

Rounding that number to 10T per month, and considering that the active sieve range was around 40T in those days, it would have taken 6 months to finish the remaining 60T to reach 100T, given the settings (sieve speed, number of PCs, number of remaining ks, etc.) as of that date.


Things have changed since then.

- Sieve speed increased by ~40%
- Number of PCs also increased, but it's hard to tell how much. My guesstimate would be, by ~50%
- Number of remaining ks (unfortunately) did not change yet.

That would suggest: 10T * (1+40%) * (1+50%) = ~21T per month.

So, we've (roughly) doubled the throughput in terms of G/day since the beginning of July.


Another way of looking at the average speed now would be to keep an average of the change in the size of the holes as I did before, but I'm not recording that data nowadays.

Still, there is a way to estimate it. From Mike's stats (project page), the average number of unique factors submitted over the last 14 days for p>3T is 242*. Assuming that the factors we submitted within the last 14 days were from the ranges that produced an average of ~0.35 factors/G (+-0.02?), and also assuming that 8% of these were duplicates:

= 242 unique factors/day / (0.35 factors/G * (1-8%) unique factors/factors)

= 752 G/day = 23T per month.

* One should subtract the number of unique factors per day coming from P-1 from the 242 figure to get the average factors from sieving alone, but I think it's insignificant (please don't take this to mean that P-1 is useless).

Mystwalker
10-04-2003, 06:01 PM
Now that's a thorough explanation! :)

But wouldn't it be easier to just look at Mike's 90% sieving point?

Nuri
10-04-2003, 07:12 PM
Sure. That's what I'm doing also. ;)

Mike is doing a very nice job of keeping us informed about the sieve stats.

The purpose of the answer was to explain the difference in speed, and to appreciate the progress we've made over the last three months (which came on top of many other previous leaps with every new version of the clients we have used since the beginning of the sieve project). We were eliminating roughly 300 factors per day three months ago, and despite the fact that the active sieve range has more than doubled since then, we're still eliminating 250 factors per day. Hopefully, as newer clients emerge, we'll be able to continue making a significant contribution to the project, even at ranges we previously thought were not achievable.