Sob.dat file with 10 k etc. (following 28433=Prime)



MikeH
01-03-2005, 09:07 AM
The sob.dat files have been updated to remove k=28433, leaving only 10 k.

Only the daily updated files are now available.

The sob.dat file for sieving (http://www.aooq73.dsl.pipex.com/sobdat/SobDat_n1M-20M.zip), zipped (0.5MB).
A clear text (http://www.aooq73.dsl.pipex.com/sobdat/SobDat_n1M-20M_txt.zip) version of the same (1MB), just for info.
The very small zip file (http://www.aooq73.dsl.pipex.com/sobdat/SobSoon.zip) that has clear text .dat files that indicate the next couple of days of main and DC PRP efforts, and a log file that indicates the daily progress of the shrinkage of the sob.dat file, again just for info.

A sob.dat for P-1 factoring ONLY (http://www.aooq73.dsl.pipex.com/sobdat/SobDat_P1.zip) with the next 0.5M main candidates (becomes out of date if not updated every ~30 days).

All time scoring (http://www.aooq73.dsl.pipex.com/scores.htm) and 2005 scoring (http://www.aooq73.dsl.pipex.com/2005/scores.htm) have been adjusted to reflect the newly found prime. Any factors for k=28433 submitted before today will now have their score frozen; any submitted from now onwards will score zero. I expect that Louie or Dave will stop accepting factors for this k on the submission form soon.

I've tested the new sob.dat file for sieving on a large sample of one PC, and I see a 4% speed improvement. I hope others see better improvements.

Over the next couple of days I'll be changing other aspects of the scoring pages, and I'll also be re-organising the sieving web site a little. Sorry in advance if some pages become inaccessible (the .dat files above won't be moving).

vjs
01-03-2005, 11:48 AM
Mike,

Is it just me, or has the link to this dat not been updated? I still downloaded the 11k dat, same size etc.

Humm, created my own... looks like this wasn't a very demanding k. Looks like about a 5% speed increase. Still very good, and I can't say I'm disappointed in any respect.

Correct me if I'm wrong, but the new 10k dat should be around 2.77MB (2,908,006 bytes). I'll download the 10k dat again and see what happens.

Mystwalker
01-03-2005, 12:00 PM
http://www.aooq73.dsl.pipex.com/sobdat/SobDat_n1M-20M.zip gave me a dat file which at least says "10" in the beginning...

vjs
01-03-2005, 12:57 PM
Humm,

It worked this time; possibly an internet caching issue? I deleted my temp files etc., and it looks like the file is (and was) correct...

pixl97
01-03-2005, 01:20 PM
I'm seeing a slight speed increase. I am sieving close to 2^50 currently.

Old dat
pmin=1120143710441429 @ 586 kp/s
pmin=1120143720441463 @ 577 kp/s
pmin=1120143730441481 @ 578 kp/s
New dat
pmin=1120143740441483 @ 614 kp/s
pmin=1120143750441491 @ 607 kp/s
pmin=1120143760441523 @ 612 kp/s
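
For anyone who wants to compare rates the same way, here is a rough Python sketch that averages the kp/s figures above and reports the relative change. The function name is only illustrative, and three readings per dat is a very small sample, so treat the result as indicative at best.

```python
from statistics import mean

# kp/s readings copied from the sieve output above
old_rates = [586, 577, 578]
new_rates = [614, 607, 612]

def speedup_percent(old, new):
    """Relative change of the mean rate, in percent."""
    return (mean(new) / mean(old) - 1.0) * 100.0

print(f"old mean: {mean(old_rates):.1f} kp/s")
print(f"new mean: {mean(new_rates):.1f} kp/s")
print(f"speed-up: {speedup_percent(old_rates, new_rates):.1f}%")
```

The same function can be pointed at longer runs from several machines, which would give a steadier number than a single PC.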

vjs
01-03-2005, 01:45 PM
This is in line with the 4.5% speed increase I'm seeing across my machines.

Unfortunate in a way:

11/10 = 1.1 = 110%, so a 10% speed increase would be average.

I think if we eliminate 67607 we get something like a 14% speed increase.

Does anyone know how the k's are tied together? I thought 67607 was somehow different. I guess 28433 was tied to another k (a shared computational variable with another k); was it 55459?
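
The 11/10 arithmetic can be written out as a small Python sketch. It assumes, purely for illustration, that sieve time is proportional to the number of k values in the dat and that every k costs the same; the roughly 5% measured here suggests that assumption does not hold for the real sieve. Both function names are made up.

```python
def ideal_speedup_percent(k_before, k_after):
    """Rate gain if sieve time scaled linearly with the number of k values."""
    return (k_before / k_after - 1.0) * 100.0

def work_removed_percent(k_before, k_after):
    """Share of k values removed (and of the work, under the same assumption)."""
    return (1.0 - k_after / k_before) * 100.0

# Going from 11 k to 10 k after the k=28433 prime
print(f"{ideal_speedup_percent(11, 10):.2f}% faster")   # 11/10 - 1
print(f"{work_removed_percent(11, 10):.2f}% less work")  # 1 - 10/11
```

The two numbers answer different questions: the first is how much the rate would rise, the second is how much of the work would disappear.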

MikeH
01-04-2005, 12:47 PM
First batch of changes to the score pages are complete. Key change being that I've moved away from arbitrary n and p boundaries for breaking down results, and have instead moved to categorising by number of PRP tests saved.

I'll update the main page to try to give an explanation of what is being shown, but I think you’ll figure most of it out.

Changes apply to all time (http://www.aooq73.dsl.pipex.com/scores.htm) and 2005 (http://www.aooq73.dsl.pipex.com/2005/scores.htm).

...one thing we can see from the new data (http://www.aooq73.dsl.pipex.com/2005/scores_p.htm) - number of main PRP tests saved from factors found this year = 1. Number of DC PRP tests saved from factors found this year = 0. :help:

Nuri
01-04-2005, 01:21 PM
Wonderful!! And very useful. But maybe not very easy for all users to understand.

Mike, could you please explain how the logic works when you find some time? Thx.

For example, should we read the

Total: 120775(202987) 6219( 18512) 54091

figures like...

Of all of the factors found so far;

120775 came before first-pass PRP tests, and PRP level is now above their n level.

202987 came before first-pass PRP tests, and PRP has not yet reached that level. So, it's not guaranteed that they will save tests (i.e. if a prime is found)

6219 came after first-pass PRP tests, but before second-pass PRP tests, and second-pass is now above their n level.

18512 came after first-pass PRP tests, but before second-pass PRP tests, and second-pass has not yet reached that level. So, it's not guaranteed that they will save tests (i.e. if a prime is found by second-pass PRP)

So far, my reasoning looks OK to me. But hey, what about 54091? AFAIK, these figures are for unique factors only. So, why would 3513 n=19m-20m candidates not save tests? The figures seem reasonable up to n=5m, but what's going on after that?

There's definitely something (if not all) that I'm misinterpreting.. :cry:
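
To make the layout of that Total row concrete, here is a small Python sketch that splits it into named fields following the reading proposed above. The field names are invented for illustration and are not labels from the scores page.

```python
import re

total_row = "Total: 120775(202987) 6219( 18512) 54091"

# Hypothetical field names, in the order the numbers appear in the row.
FIELDS = [
    "before_first_pass_passed",    # first pass is now above their n
    "before_first_pass_pending",   # first pass has not reached their n yet
    "between_passes_passed",       # second pass is now above their n
    "between_passes_pending",      # second pass has not reached their n yet
    "no_test_saved",
]

def parse_total_row(row):
    """Pull the five counts out of the row, ignoring brackets and padding."""
    numbers = [int(x) for x in re.findall(r"\d+", row)]
    if len(numbers) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, got {len(numbers)}")
    return dict(zip(FIELDS, numbers))

counts = parse_total_row(total_row)
for name, value in counts.items():
    print(f"{name:28s}{value:>8d}")
print("all factors:", sum(counts.values()))
```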

MikeH
01-04-2005, 02:30 PM
Originally posted by Nuri
120775 came before first-pass PRP tests, and PRP level is now above their n level.

Correct.

Originally posted by Nuri
202987 came before first-pass PRP tests, and PRP has not yet reached that level. So, it's not guaranteed that they will save tests (i.e. if a prime is found)

Correct.

Originally posted by Nuri
6219 came after first-pass PRP tests, but before second-pass PRP tests, and second-pass is now above their n level.

Correct.

Originally posted by Nuri
18512 came after first-pass PRP tests, but before second-pass PRP tests, and second-pass has not yet reached that level. So, it's not guaranteed that they will save tests (i.e. if a prime is found by second-pass PRP)

Correct. See, I said you'd figure most of it out.

Originally posted by Nuri
So far, my reasoning looks OK to me. But hey, what about 54091? AFAIK, these figures are for unique factors only. So, why would 3513 n=19m-20m candidates not save tests? The figures seem reasonable up to n=5m, but what's going on after that?

These are a combination of factors where two tests had already been performed when the factor was found (this will be the makeup of the smaller n bands) and factors that have become useless because a prime has been found. For k=5359, that's any first timer's n>5272167 and DC's n>400000. For k=28433, that's n>7882623 and n>1255752 respectively. The higher n bands '0 test saved' are made up exclusively of these "factors made useless by prime" type.

This is intended to show one of the negative aspects of sieving - some factors will be made redundant when primes are found. But the really positive thing that can be seen from this table is that >16689 tests will be saved in 8M<n<9M. This is more than the 16644 for 3M<n<4M, even though two primes have been found in the meantime.

The most important thing on that page is the "candidates remaining" (unchanged), but you need to compare that with an old version of the page to see that it's going down, which isn't good.

I'm now thinking that maybe I should add an "estimated number of PRP tests performed" for each n band. That would then make it really clear that the number of PRP tests per n slice keeps coming down as time goes by, and that is something that wasn't obvious before, and still isn't obvious now. More work in progress methinks...:idea:

maddog1
01-04-2005, 02:51 PM
Originally posted by MikeH
...one thing we can see from the new data (http://www.aooq73.dsl.pipex.com/2005/scores_p.htm) - number of main PRP tests saved from factors found this year = 1. Number of DC PRP tests saved from factors found this year = 0. :help:

Looks like I am the one that is credited with saving the first test of 2005.
http://www.aooq73.dsl.pipex.com/2005/scores.htm
(if I understand the stats correctly) :D

Still, to prove Nuri's point that a better explanation of the stats for us sieving n00bs is needed: I don't really understand what the difference is between the "main PRP" and "DC PRP" tests you mention here. How come we have saved a main test without also saving a double check?
Also, the factor that supposedly saved one test (k=33661, n=7596144) in the main stats is also listed as a "possible ongoing test". Have we saved anything at all here, or did it come too late?
It is also listed as 2 tests saved in my personal stats page (it shows a 2 without the parentheses):
http://www.aooq73.dsl.pipex.com/2005/ui/1792.htm
Something I am missing here?
Maybe a detailed idiot-proof explanation of the figures is necessary; not all of us are math geniuses here! :confused: (I will be the first to admit I'm stupid or something...) :)

Finally, a slightly off-topic stats related question. My personal stats page shows only the range 490100-491000 as reserved by me and incomplete. In fact, I have completed this almost a month ago, did 545-546 afterwards and now I'm working on 599-600. Why are these shown as non-reserved ranges?

Mike, I do not intend to nag at all, your work is great! Just wanted to show that some aspects of the sieving are still not for the average user and will require quite some clarification if we are to get more users involved in it...
Thanks for your great effort anyway :thumbs:

vjs
01-04-2005, 03:07 PM
It looks really good now, Mike, and I can see where you're going with it.

I would assume the standard number (+number) is the total tests (that day).

I'm still not understanding the n>8m as tests saved, zero?

I'd personally say this just makes things confusing and get rid of it, since these numbers will go to zero once everyone uses the 10k dat.

I guess it's the finding of a prime that makes these pages confusing.

Rather than this I would use...

A separate entry similar to the 90% sieve point that somehow represents the % of sieve factors wasted. Something like: wasted % = 100% x (factors found above the k/n prime) / (total factors found).

This would only have to be updated every prime and would explain a lot more. Also remember we only received a 5% speed increase from removing this k from the sieve, not 9.09%.

I like what you have done thus far a lot :D

vjs
01-04-2005, 03:17 PM
Thanks for the comments mad dog,

I'll clear up a few things.

Currently, if you find a factor above PRP (good to see you picked up some terms :) ) you do in fact eliminate two tests. If it's below PRP then only one.

PRP is the testing 90+% of the users do with the 2.3 client. Main PRP, or first-time PRP, is just that: the k/n pair has never been tested before. DC PRP is the Double Check PRP, where we are testing the same k/n pairs again to look for errors.

Main PRP is at ~ n=7,900,000, whereas DC PRP is somewhere around n=1,300,000.

Once the k/n is double checked and the results match, it will never be tested again. If they don't match then a third and possibly a 4th test are done.

Also you're picking up on things and starting to ask the big questions...
"possible ongoing test" (k=33661, n=7596144)

This basically means it's within a region where someone may still be testing that number for primality in vain, because you recently found a factor. We have no way of telling this person "hey, stop your test". The only thing we can do is post it there. Had we/you found that factor before PRP reached n=7596144, the server never would have assigned it.

Mike will probably update everyone's stats for ranges once all the havoc is done and he's finished the updates. Ranges are the only thing that is still manual.

Nuri
01-04-2005, 06:11 PM
Originally posted by MikeH
These are a combination of factors where two tests had already been performed when the factor was found (this will be the makeup of the smaller n bands) and factors that have become useless because a prime has been found. For k=5359, that's any first timer's n>5272167 and DC's n>400000. For k=28433, that's n>7882623 and n>1255752 respectively. The higher n bands '0 test saved' are made up exclusively of these "factors made useless by prime" type.

This is intended to show one of the negative aspects of sieving - some factors will be made redundant when primes are found. But the really positive thing that can be seen from this table is that >16689 tests will be saved in 8M<n<9M. This is more than the 16644 for 3M<n<4M, even though two primes have been found in the meantime.

The most important thing on that page is the "candidates remaining" (unchanged), but you need to compare that with an old version of the page to see that it's going down, which isn't good.

I'm now thinking that maybe I should add an "estimated number of PRP tests performed" for each n band. That would then make it really clear that the number of PRP tests per n slice keeps coming down as time goes by, and that is something that wasn't obvious before, and still isn't obvious now. More work in progress methinks...:idea:

How about using two separate columns for 0? (i.e. 0i for "factors where two tests had already been performed when the factor was found" and 0ii for "factors that have become useless because a prime has been found"), with a short explanation of 0i and 0ii as a footnote to the table, of course.

I know, as we've started using the n min adjusted sob.dat, the 0i figures will not change much. Only three sources of increase come to mind: i) users that seldom update their sob.dats, ii) users that purposely search for and submit factors for those k/n pairs, and iii) users that dump their out of range factors.

Still, on the broader perspective, it would be interesting to observe one going down, and one going up as n million bands increase, especially if we find a few more primes on the way.


Originally posted by MikeH
I'm now thinking that maybe I should add an "estimated number of PRP tests performed" for each n band. That would then make it really clear that the number of PRP tests per n slice keeps coming down as time goes by, and that is something that wasn't obvious before, and still isn't obvious now.

Why not use actual figures instead of estimated? I'm sure Louie will be able to grab it from the database, and I guess he'll be willing to provide this data to you, in the form you would use, from a link every 6 hours. Of course, the resulting figures will have some side effects (like those that come from dropped tests, or from the fact that we once dumped residues of previous researchers into the database, etc.) that might mislead the end stats fanatic and will include parameters that are beyond the scope of the sieve, but it might still be interesting.

Nuri
01-04-2005, 06:28 PM
One other suggestion to make understanding of the table easier for the average user is to add three subtotal lines just above the Total row.

These lines should indicate:

- subtotal of factors below DC active n level (n<1259597 currently),
- subtotal of factors in between (1259597<n<7899646 currently), and
- subtotal of factors above main active n level (7899646<n<20000000).

This will also make it possible to follow up on how many first time PRP tests are left to reach 20m. Or even better, one will easily assess the expected proportion of his new factors (i.e. what percent of new factors will save two tests, and what percent will save only one), especially if he updates his sob.dat on a regular basis. Currently, the 532336 figure shown in the table for candidates remaining does not make much sense, as it's almost guaranteed that some of the candidates (i.e. those below the DC n threshold) will remain there forever.
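
The three subtotals could be computed along these lines. This is only a Python sketch with made-up sample exponents; the band edges are the current levels given in the list above, and the function name is an assumption, not anything taken from the actual stats scripts.

```python
from collections import Counter

DC_LEVEL = 1_259_597      # current DC active n level
MAIN_LEVEL = 7_899_646    # current main active n level
N_MAX = 20_000_000        # upper end of the sieve range

def band(n):
    """Name of the subtotal band an exponent n falls into."""
    if n < DC_LEVEL:
        return "below DC"
    if n < MAIN_LEVEL:
        return "between DC and main"
    if n < N_MAX:
        return "above main"
    return "out of range"

# Made-up exponents of factored candidates, for illustration only
sample_factor_n = [900_000, 1_300_000, 5_272_167, 7_596_144, 8_123_456, 19_500_000]

subtotals = Counter(band(n) for n in sample_factor_n)
for name in ("below DC", "between DC and main", "above main"):
    print(f"{name:22s}{subtotals[name]:>6d}")
```

Because the levels move every day, the band edges would have to be refreshed along with the rest of the stats.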

Just a thought.

Nuri
01-04-2005, 06:54 PM
Originally posted by vjs
I'm still not understanding the n>8m as tests saved, zero?

They will count as saved once the PRP first-pass n level passes them. Until then, they are only candidates for saving two PRP tests (2).

When the PRP first-pass n level passes them, they will move to PRP tests saved twice.

Or, am I understanding the question wrong?

vjs
01-04-2005, 07:52 PM
Nuri,

This question was based upon something Mike already changed, or something I saw differently earlier, but thanks. I think Mike's pages look great and add a lot... of course they are complicated for the first timer, even upon initial examination, but what are we expecting from the page? I think the required level of understanding for this page is high, but that's what this page is for...

Other pages like the scores page are what newer people will focus on at first. Then they will try to understand the details, there is something for everyone now:D.

vjs
01-05-2005, 02:03 PM
Hey Mike, your all users page doesn't work; did you take it down?

The link: http://www.aooq73.dsl.pipex.com/ui/9999.htm

MikeH
01-05-2005, 03:30 PM
Originally posted by vjs
Hey Mike, your all users page doesn't work; did you take it down?

It's now http://www.aooq73.dsl.pipex.com/ui/19999.htm. If you use the link which is the 'total' at the bottom of the user score page, that should always work.

vjs
01-05-2005, 03:52 PM
Thanks Mike :D , this is one of my favorite pages; it shows the project as a whole, who is doing what etc.

:cheers:

Frodo42
01-05-2005, 10:55 PM
Originally posted by vjs
Thanks Mike :D , this is one of my favorite pages; it shows the project as a whole, who is doing what etc. :cheers:

Wow ... I never saw that page ... that's surely also going to be one of my favorites :corn: