
Thread: Sieve: Distribution of Factors vs. p value

  1. #1
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752

    Sieve: Distribution of Factors vs. p value

    Using the data in the results.txt file, I made the attached graph.

    The data is the number of factors found per 1G p interval. (Data for incomplete 1G intervals is excluded, i.e. if there are only factors up to xx.6 billion, I assumed the full 1G range was not submitted yet, so I omitted that range.)

    Well, I guess if an always-updated SoB.dat had been used throughout the p range, the curve should have trended further downwards as p increases, but I don't know exactly by how much.

    Regards,

    Nuri
    Attached Images

  2. #2
    that's a really cool graph. i'm glad to see someone found a good use for the results.txt file i set up.

    btw, it would be interesting if someone wrote a "hole" detector that looked through results.txt to make sure ranges had the expected number of factors in them. that way, if someone accidentally missed a patch, it could be easily spotted and filled in. I'll do it for p < 100G once the other two people in that range are complete, but i'm more or less counting on the community to monitor the rest themselves.

    think about it if you have time and want an easy scripting/coding project that could help the project out.


    -Louie
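
    A minimal sketch of such a detector in Python (an illustration only, not a project tool; it assumes results.txt carries one factor per line in the form "p | k*2^n+1", matching the factor lines quoted later in this thread):

```python
# A rough illustration (not a project tool): count factors per 1G band of p,
# assuming each line of results.txt looks like "p | k*2^n+1", as in the
# factors quoted later in this thread.
from collections import Counter

G = 10**9

def factors_per_g(path="results.txt"):
    counts = Counter()
    with open(path) as f:
        for line in f:
            if "|" not in line:
                continue                      # skip headers / blank lines
            p = int(line.split("|")[0].strip())
            counts[p // G] += 1               # 1G band this factor falls in
    return counts

if __name__ == "__main__":
    counts = factors_per_g()
    for band in sorted(counts):
        print(f"{band}G - {band + 1}G: {counts[band]} factors")
```

    Eyeballing bands with conspicuously low counts then gives the hole candidates Louie describes.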

  3. #3
    Senior Member
    Join Date
    Jan 2003
    Location
    U.S
    Posts
    123
    I hope that this won't be a pain, but could you update the graph
    once a week? I just want to see how accurate your prediction
    was.

  4. #4
    Member
    Join Date
    Sep 2002
    Location
    London
    Posts
    94
    Hello,

    Can you tell me the exact link to the results.txt file?
    I forgot where I can find it.

    Nuutti

  5. #5

  6. #6
    Member
    Join Date
    Sep 2002
    Location
    London
    Posts
    94
    Thanks !

    Nuutti

  7. #7
    Someone emailed me this so I figured I'd post it. Let me know if it works:

    -Louie

    your idea of writing a program that searches for holes in results.txt is quite good. i wrote a little program in delphi. it shows the number of factors in results.txt for each G. i used a formula to estimate the factors for each G. if the factors in results.txt are under 80% of the estimated value, my program will show them.

    check it out at http://www.geocities.com/yummie_55555/holes.zip

    here is a small output from my program for ranges already shown as completed in the Sieve Coordination Thread:
    258G has 10%
    315G has 12%
    638G has 23%
    650G has 32%
    774G has 44%
    of the estimated number of factors. perhaps some are missing

    PS: i can't post to the forum because i deleted the email address needed for verification of the registration. that doesn't matter. post it in the forum if you want.

    Thommy
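
    Thommy's Delphi source isn't reproduced here, but the same 80% check is easy to sketch in Python. The estimation formula below is an assumption (factors per G taken as roughly C/p, with C around 30,000, in line with the constant fitted later in this thread); only the 80% threshold is Thommy's:

```python
# A sketch of Thommy's 80% check. Assumed detail: the expected number of factors
# in the band starting at p = B gigs is taken as C / B with C ~= 30,000 (roughly
# the constant fitted later in this thread); Thommy's own formula isn't given.
C = 30_000

def suspicious_bands(counts, threshold=0.80):
    """Return bands whose factor count is below `threshold` of the estimate."""
    holes = []
    for band, found in sorted(counts.items()):
        if band == 0:
            continue                  # skip 0-1G, where this crude estimate blows up
        expected = C / band
        if found < threshold * expected:
            holes.append((band, found, expected))
    return holes

# Using counts as produced by factors_per_g() in the earlier sketch:
# for band, found, expected in suspicious_bands(counts):
#     print(f"{band}G has {found / expected:.0%} of the estimated {expected:.0f} factors")
```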

  8. #8
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by Moo_the_cow
    I hope that this won't be a pain, but could you update the graph
    once a week? I just want to see how accurate your prediction
    was.
    Please do not consider this a promise, but to be honest, I was already doing it for myself and it would not be a big problem to post it (I don't know, maybe once a week, or maybe more/less frequently if submitted factors increase significantly).

    On the prediction for the 10,000 - 10,001 range, I should admit that I simply made it up. In other words, I hope you find at least three factors, but there is no intelligent mechanism behind it.

    Well, on the hole detecting issue, unfortunately I don't know any programming language, but I do tons of analysis/number crunching every day in Excel; it's part of my job.

    Anyway, I can detect big holes while generating the graph, but please consider this a double check. A simple script would definitely be more useful, and easier to implement.

    As an alternative to the estimated density per G, I simply look at the difference between two consecutive submitted p values. If the difference increases significantly (say, more than x times the average of the differences between the last 10 p values), it signals a hole (or a range not yet submitted) to me.
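
    A rough sketch of that gap rule (my illustration of the idea, not Nuri's spreadsheet; x = 10 is an arbitrary choice, and `ps` is assumed to be the sorted list of submitted p values):

```python
# A sketch of the gap rule described above (illustrative only: x = 10 is an
# arbitrary choice, and `ps` is assumed to be the sorted list of submitted p values).
def gap_signals(ps, window=10, x=10):
    signals = []
    for i in range(window + 1, len(ps)):
        recent = [ps[j] - ps[j - 1] for j in range(i - window, i)]   # last `window` gaps
        avg_gap = sum(recent) / window
        gap = ps[i] - ps[i - 1]
        if gap > x * avg_gap:
            signals.append((ps[i - 1], ps[i], gap))   # possible hole, or range not yet submitted
    return signals
```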

    PS: I personally prefer not to assume there are any holes unless the full range is announced as complete by the siever. None of us can know how the siever distributed the range between boxen and/or how frequently he/she is submitting, etc. Looking at the above figures, for example, it seems like in most of the cases they sent all of the factors found so far at the time they submitted, i.e. sieving was not yet finished on that particular range.
    Last edited by Nuri; 01-28-2003 at 05:59 PM.

  9. #9
    Member
    Join Date
    Sep 2002
    Location
    London
    Posts
    94
    to Nuri: You don't speak fluent Excel VBA?

    Nuutti

  10. #10
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by nuutti
    to Nuri : You don't speak fluent Excel vba ?

    Nuutti
    Well, to be honest, not really. In fact, I learned some back in my undergraduate and MBA days (I know, it's a shame to talk about an MBA in a prime number related forum). But I found no need to use any in six years of business life, so I forgot everything. It reminds me of my German: "Ich erinnere nichts!" ("I remember nothing!").

    Edit: To any German-speaking friends: please do not feel offended. None intended.
    Last edited by Nuri; 01-28-2003 at 05:50 PM.

  11. #11
    Member
    Join Date
    Sep 2002
    Location
    London
    Posts
    94
    OK. I do number crunching sometimes and I have found that VBA is sometimes very useful. I have also used it in prime-number-related projects, like when I had to find the first 10,000 primes :-) and also when I had to manage my numerous result files in different projects.

    Nuutti

  12. #12
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    The newly uploaded 20,000 factors deserve an update.
    Attached Images

  13. #13
    Very nice indeed!


    Did you let Excel fit the graph, or did you plot it using an equation?
    In any case, could you give me the equation? (Maybe I'll write a hole detector.)

    Cheers,
    Ola

  14. #14
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    It could be interesting to see the correlation between removal density and the n values removed. I wonder if there's a connection...

    btw, it should rather be "Ich erinnere mich an nichts."

  15. #15
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by Mystwalker
    btw. it should better be "Ich erinnere mich an nichts."
    You see, that's exactly what I meant.

  16. #16
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by olaright
    Did you let excel fit the graph, or did you plot it using an equation.
    OK, here's what I assume about the relationship between range and # factors.

    At the very beginning, I was just curious whether range * factors = constant (like 100G*300 = 300G*100 = 1,000G*30 = 10,000G*3, etc.)

    I simply chose the type of trendline that I think should fit best, which is a power trendline. That's available in Excel.

    y = a * x^b

    where
    a = 1, b = -1 (my initial assumptions),
    x = p range, y = # factors

    => y * x ~= 30,500 (for the data so far)

    I think this will definitely change (downwards) as we update the SoB.dat file, simply because some of the factors will not be submitted again at larger p, as the corresponding candidates have already been removed at smaller ranges.
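
    For anyone who wants to reproduce the fit outside Excel, here is a sketch of the same power-law fit done in log-log space (illustrative only; `p_g` and `factors` stand for hypothetical arrays of the per-1G data points):

```python
# A sketch of the same power-law fit outside Excel (illustrative; `p_g` and
# `factors` stand for arrays of the per-1G data points from results.txt).
import numpy as np

def fit_power(p_g, factors):
    """Fit factors ~= a * p_g**b by least squares in log-log space."""
    b, log_a = np.polyfit(np.log(p_g), np.log(factors), 1)
    return np.exp(log_a), b

# a, b = fit_power(p_g, factors)
# If b comes out close to -1, then a is roughly the constant in y * x ~= constant
# (about 30,500 for the data described in this post).
```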

  17. #17
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    I just checked the distribution of found factors over all n's.
    The attached text file holds the count, grouped by 100K intervals. Right now, there are ~250-330 values removed per interval.
    While writing this program, I saw that ceselb sent in some values < 3M. That made me curious...


    Oh, and concerning that "You see, that's exactly what I meant.":
    http://www.mystwalker.de/tb/stackenblochen.avi
    Attached Files

  18. #18
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    Those are my 4847 sieve results from my last two ranges. Louie wanted them submitted to the database. Please feel free to ignore them.

  19. #19
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Already done. Java doesn't like negative array indexes.

  20. #20

    Re: Sieve: Distribution of Factors vs. p value

    Originally posted by Nuri
    Using the data on results.txt file, I made the attached graph.
    Wonderful!

    To my eye, it's looking like 1/log(p), which is exactly as predicted. If you want to try to get a straight line, then simply plot #factors removed against log(p). Can you do another chart with those axes please? I'd be very interested to see it.

    I've updated my comments on sieve depth at http://fatphil.org/maths/sierpinski/
    And this time there's a, shock horror, numeric prediction!

    Better than that, there are several, as I indicate why the first prediction is likely to be wrong!

    It will make a lot more sense when the whole range has been sieved to 1T, and I can re-do the predictions with greater accuracy. I'll see if I can get Jack Brennen or Yves Gallot to perform some of their wizardry with the data too. They know this field better than anyone.

    Phil

  21. #21

    Re: Re: Sieve: Distribution of Factors vs. p value

    Originally posted by FatPhil
    To my eye, it's looking like 1/log(p)
    Too early in the morning... the derivative thereof (and the negative of that - third time lucky!)
    I.e. it ought to slowly creep below the 1/p line (by a squared logarithmic rate, which to be honest isn't actually obvious to the eye.)
    Plotting #cands*p versus p should show the logarithmic factors (without them, it would be a flat line), but it will be one that flattens off quite quickly, and as the sieving has already got to 5.7G maybe it's already quite flat. Sieving results from down at 1G, or even 100M, would make the trend more obvious.
    Plotting #cands*p versus log(p) should give a straight line.

    Phil

  22. #22
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    WOW. ~70,000 factors submitted above 100G. For comparison, the first graph was drawn with ~21,000 factors, and the second graph with ~41,000 factors.

    IF my above assumption holds, and if there is no SoB.dat file update, then I'd expect roughly 115,000 factors by the time all factors below 5T (5,000G) are submitted. That is, roughly 45,000 more factors. AFAIK, the current .dat file has 746,961 candidates. If it's updated with a newer file that has, let's say, 8/9 of the candidates, i.e. 663,965 candidates (although I guess what we actually have is roughly 690,000 as of today), and everybody starts using it right now, then we should have about 110,000 factors submitted for the 100G - 5T range (70,000 + 45,000*8/9).

    Of course, slightly fewer candidates will be removed, because we have duplicates.

    Anyways, about the graph:

    The red line is the same as the black line in the graphs above (Excel's power trendline).
    The blue line is the y = a * x^b I mentioned above. The only difference is that with the current data the function is 29,405 = y * x. The change in the constant comes from the updates we made to the SoB.dat file. I calculate the constant as the average of p (in Gs) * # factors for that G. That effect can be seen on the graph too. The blue line assumes fewer factors per G at the lower p values and more factors at the higher p values, because it uses the average of all submissions. On the other hand, the red line uses the actual data, so it shows more factors at low p (when we used a larger .dat file), etc.
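
    As a back-of-envelope check of the 115,000 and 110,000 figures above (my own arithmetic, assuming the factor density really follows c/p with the fitted c = 29,405):

```python
# Back-of-envelope check of the 115,000 / 110,000 figures above,
# assuming factors per G ~= c / p with the fitted constant c = 29,405.
import math

c = 29_405
found_above_100g = 70_000

expected_100g_to_5t = c * math.log(5000 / 100)            # integral of c/p from 100G to 5000G, ~115,000
remaining = expected_100g_to_5t - found_above_100g        # ~45,000 still to come
with_trimmed_dat = found_above_100g + remaining * 8 / 9   # ~110,000 if SoB.dat shrinks to 8/9

print(round(expected_100g_to_5t), round(remaining), round(with_trimmed_dat))
```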

    I will show in the next post what the function and the trendline predict for p up to 5T.
    Attached Images

  23. #23
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    As mentioned above, this is what I expect up to 5T (based on current data).
    Attached Images

  24. #24
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752

    Re: Re: Sieve: Distribution of Factors vs. p value

    Originally posted by FatPhil
    To my eye, it's looking like 1/log(p), which is exactly as predicted. If you want to try to get a straight line, then simply plot #factors removed against log(p). Can you do another chart with those axes please, I'd be very interested to see it.
    Phil,

    I plotted both alternatives. Please let me know if you meant something else.

    Nuri
    Attached Images

  25. #25

    Re: Re: Re: Sieve: Distribution of Factors vs. p value

    Originally posted by Nuri
    I plotted both alternatives. Please let me know if you meant something else.
    Yeah, sorry Nuri, I've been messin' you about with my early morning bug-ridden postings. My second post had the correct requests. As it's early morning again, I'm frightened to make any requests!
    sqrt(delta(cands)/p) vs. log(p)
    and
    delta(cands)/log(p)^2 vs. p
    should both be straight lines (where delta(cands) = the number of candidates removed by that p-band).

    The logic behind that is that if
    remaining cands ~= 1/log(p)
    then
    delta(cands) ~= d/dp (1/log(p))
    = -1/p * 1/log(p)^2

    You can smooth the results by taking larger p bands. You're not short of data points, so you can afford to do this.

    Phil
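
    A sketch of the two transformed series Phil asks for (not Phil's code; `ps` again stands for the factor p values parsed from results.txt, and the 10G band width is just a convenient smoothing choice):

```python
# A sketch (not Phil's code) of the two transformed series he asks for above,
# assuming `ps` is the list of factor p values parsed from results.txt.
import math

BAND = 10_000_000_000            # 10G-wide bands to smooth the counts, as suggested

def band_counts(ps, band=BAND):
    counts = {}
    for p in ps:
        counts[p // band] = counts.get(p // band, 0) + 1
    return sorted(counts.items())

def transformed_series(ps):
    """Return (p, sqrt(delta/p), delta/log(p)^2) per band, for the two plots."""
    rows = []
    for band_index, delta in band_counts(ps):
        p_mid = (band_index + 0.5) * BAND           # representative p for the band
        rows.append((p_mid,
                     math.sqrt(delta / p_mid),      # plot this against log(p)
                     delta / math.log(p_mid) ** 2   # plot this against p
                     ))
    return rows
```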

  26. #26
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    OK, it's no problem at all. I wouldn't bother with the analysis at all if I weren't curious too. The point is, every idea (or request) helps.

    I guess I understood what you are looking for. I'll take a look at it this weekend.

  27. #27
    here is an small output from my program of ranges allready shown comleted in the Sieve Coordination Thread:
    258G has 10%
    315G has 12%
    638G has 23%
    650G has 32%
    774G has 44%
    of the estimated number of factors. perhaps some are missing
    I sieved 638G and a quick check confirms that all factors have been submitted. There's probably an equivalent number of ranges with a higher than expected number of factors.

    Greg

  28. #28

    two-fold update on the results.txt file

    to keep bandwidth down on the main server, updates to results.txt will now only show p > 200 billion (instead of 100 billion). this file is still located at:

    http://www.seventeenorbust.com/sieve/results.txt

    and it's still updated every night at 3:40am EST.

    however, there is now a static file containing previously unavailable data for all factors between 1 and 200 billion. it is available on my umich space, off the main server, at:

    http://www-personal.engin.umich.edu/...results200.txt

    this data will be useful for future trend estimation, graphs, and hole detection that the community may want to do. a quick look through the data appears to indicate that 1-200 billion is entirely complete, but i certainly haven't done extensive checking myself, so anyone interested is thoroughly encouraged to look through it and help verify the data.

    hopefully this data makes it into one of the fine graphs that have been produced lately too.

    -Louie

  29. #29
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    It may be a good idea to put up zipped versions of both files too. Just an idea.

    I think most people would prefer downloading that version, and then browsing through after unzipping.

    I guess that would be a wiser way of using the bandwidth on both sides. Zipping reduces the file size by a factor of three, and I guess it's worth it.

    Of course, .txt versions should also be kept for those who still prefer to look at the data from their browser.

    PS: Notepad has problems with the columns, but WordPad works fine.

    EDIT: OMG! It took 30 min to download results200.txt (6MB)...
    Last edited by Nuri; 02-03-2003 at 05:04 PM.

  30. #30
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Wow! That graph is really sexy.
    Attached Images

  31. #31
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    As I mentioned earlier, the blue line is the straight line that uses the average of all submitted factors, whereas the red line is the trendline based on the actual data. The difference comes from the changes we make to the SoB.dat file.

    Regards,

    Nuri

  32. #32
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752

    Re: two-fold update on the results.txt file

    Originally posted by jjjjL
    a quick look though the data appears to indicate that 1-200 billion is entirely complete but i certainly haven't done extensive checking myself so anyone interested is throughly encouraged to look though to help verify the data.
    I found the 49585693249 - 50006546839 range missing.

    Doing it. Will submit as soon as it finishes.

    Edit: There is another hole at 11971526549 - 12000264097. Sieving that too.

    I suspect there are no more holes in the 1G-200G range.

    Regards,

    Nuri
    Last edited by Nuri; 02-03-2003 at 06:34 PM.

  33. #33
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Cool.

    74 new factors on the 11971526549 - 12000264097 range (with the latest SoB.dat).

    Finished and submitted.

    49585693249 - 50006546839 will take a couple of hours. Will report later. (I expect ~315 factors there too).

    It was definitely worth scanning through (especially now that we have dropped to 10-20 factors per G).

    Regards,

    Nuri

  34. #34
    Nuri - looks like it was a good idea to make a text dump of the lower factor ranges

    great job! i can't tell you how happy i am to have you working with us. between finding two factor holes and the awesome graphs... you've been doing lots of great work... stuff that i probably couldn't have done on my own. it's nice to have such great helpers. thanks again.

    -Louie

  35. #35
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    You're welcome. No need to mention names, but I would never devote this much of my time to some of the other DC projects I participated in earlier. The wonderful atmosphere here at SoB encourages the contribution I make.

    BTW, a total of 317 (= 74 + 243) new factors were found and submitted in the two holes below 200G mentioned above.

  36. #36
    Originally posted by Nuri
    Wow! That graph is really sexy.

    To counter the flaring that you get at the high p's, you can either band the p's by their logs rather than by their numeric p value, or simply make the bands 10 times wider.

    In some ways the log/log graph looks depressing as it says "soon you'll be removing no more factors" (as the #candidates value plummets), however, it's not "soon" at all, as it takes longer and longer to progress to the right along the logarithmic p axis.

    In real-world terms, i.e. what the users see, sieving doesn't slow down much at all, as you spend _most_ of the time in the "still a bit better than PRP-ing" zone. This initial spurt is, alas, going to start to run out soon, and then everyone's in for the long haul of only one factor every few hours (i.e. still better than PRP-ing), which will continue for quite a while. (Which is one reason knowing when to stop sieving is so difficult.)

    Keep these graphs coming, they're good to look at.

    Louie - would it be possible for you to stick up a table of #candidates remaining versus completed sieving depth, so that I can update my predictions, please?

    Cheers,
    Phil
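
    A small sketch of the log-banding idea from the first paragraph above (my illustration; `ps` is the list of factor p values, and a band width of 0.1 in log10(p) is an arbitrary choice):

```python
# A sketch of banding by log(p) instead of by numeric p (illustrative; `ps` is the
# list of factor p values, and a band width of 0.1 in log10(p) is an arbitrary choice).
import math
from collections import Counter

def log_band_counts(ps, width=0.1):
    counts = Counter()
    for p in ps:
        counts[math.floor(math.log10(p) / width)] += 1
    # Each key is the lower edge of a band in log10(p); equal-width bands in log(p)
    # get 10 times wider with every decade of p, which removes the flaring at high p.
    return {round(k * width, 2): v for k, v in sorted(counts.items())}
```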

  37. #37
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by FatPhil
    To counter the flaring that you get at the high p's, you can either band the p's by their logs rather than by their numeric p value, or simply make the bands 10 times wider.
    I'll try that on the next update.
    In some ways the log/log graph looks depressing as it says "soon you'll be removing no more factors" (as the #candidates value plummets), however, it's not "soon" at all, as it takes longer and longer to progress to the right along the logarithmic p axis.
    I agree with you that it looks depressing. In fact, extrapolating the line suggests 1 factor per G somewhere around 32T (32,000G).
    Keep these graphs coming, they're good to look at.
    Sure. But it will not be as frequent as it has been until now. I guess there is no need to update more than once a week any more.

  38. #38
    Originally posted by FatPhil

    Louie - would it be possible for you to stick up a table of #candidates remaining verses completed sieving depth so that I can update my predictions, please?

    Cheers,
    Phil
    Sure. What exactly do you want?

    As of right now, there are 251669 factors for n > 3000000,

    leaving 659590 n values for the 12 k's where n > 3000000 (and < 20000000).

    divisor [p]    # factors
    1G - 10G         89458
    10G - 100G       77415
    100G - 1T        66045
    1T - 10T         18744
    10T+                 7

    I just grabbed those from the db. 1-10G is done. 10-100G is done. 100G-1T is very close to complete but does have a couple of G-sized holes... it will probably go up a few hundred when done. 1T-10T isn't even half complete... well, linearly speaking.

    The other 7 factors are beyond me. I knew of

    10000127155987 | 55459*2^3236734+1

    which Moo submitted...

    the other 6 over 10T were submitted by someone who didn't even log into the site:

    17598251446627 | 24737*2^11993551+1
    17598663878903 | 21181*2^6492260+1
    17598830641963 | 5359*2^13178142+1
    17598922454137 | 10223*2^9388157+1
    17599242410803 | 5359*2^10545390+1

    and

    100005426934979 | 10223*2^15670937+1

    yeah, that's 100T+... and it only took 5.5G of sieving to find it, assuming the dude started at exactly 100T. I want a 1P factor next time, people... quit your slacking

    -Louie

  39. #39
    Senior Member
    Join Date
    Jan 2003
    Location
    U.S
    Posts
    123
    Yeah!!!!!!!

    I just hit the jackpot (well sort of )

    25000000918691 | 33661*2^17056080+1

    I was kind of bored... and I felt like sieving 25T-25.001T for the fun of it... and I was shocked to find that factor, which took just a few seconds. Whew... if I'm this lucky, I'll find 1,088 factors in this 1G range. I'm not reserving that range, however, so competition is welcome.

  40. #40
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    You've got some competition. Check the Sieve Coordination Thread.
