Well, almost a year has passed (okay, 8 months) since the new .dat file arrived.
Can we gain anything from creating a new dat file?
Any performance increase or lower memory usage?
I think 1% would be a good increase.
1% would be a good increase. I thought it was said before that simply making a new dat file wouldn't change the speed at all. Only something drastic like removing a k from the file would speed things up, but even then removing 1/11th of the file would only yield something like a 3-5% increase, if that.
So in other words we're probably going to have to wait until we get another prime for another dat file.
I think that a better thing to change would be the results.txt file. That file contains all factors with p>25T. It should be changed to all factors with p>200T, so the file won't take so long to download.
*nod* 275T at this point, 300T soon. This is in regard to P-1, right?

Originally posted by Moo_the_cow:
I think that a better thing to change would be the results.txt file. That file contains all factors with p>25T. It should be changed to all factors with p>200T, so the file won't take so long to download.
I second the p>300T; aside from 2 small ranges, almost everything is sieved below 300T.
How difficult is this to do anyways?
Based on a guess, it should be easy. Just edit a config file or a source file, and replace 25 with 300, etc. Hopefully that is the case.
As PRP double checking soon reaches 1M, we could increase the lower bound of the sieve sometime in the near future...
Hopefully this will be the case as well when we find another prime.
From what I understood, we are actually sieving a combined range right now, 1m<n<20m.
Before, there were two sieve ranges, 300k<n<3m, and then they added 3m<n<20m.
It was said somewhere before that there was very little slowdown by switching from 3m<n<20m to 1m<n<20m.
But I'm sure that if we reach 2m in double checks, or even 1.5m, they will create a new dat file from the lowest double check to 20m using only the remaining 10 k's, and include all the found factors. Even if it only yields a 2% increase, it's something to hope for in addition to the prime!!!
BTW, how is your sieve range coming along? I know you're quite active in P-1.
550000-550500 Mystwalker
There has always been guessing about how much of a difference removing one k from the sieve would make to sieve speed, so I did a test.
Basically, I eliminated one k from the dat file: 33661.
33661 is a fairly heavy k, with about 11% of the k/n pairs.
Ref: http://www.aooq73.dsl.pipex.com/scores_p.htm
Basically, my sieve speed went from 689 kps to ~723 kps, a 4.8% increase.
The one thing I'm not clear on is how many k/n pairs have been eliminated since 1/1/2004, the creation of the current dat file. I'm estimating 8,000, but even if I were to overestimate that, say, 16,000 were eliminated, the resulting sieve speed increase would only be ~1.25%.
This assumes that the 4.8% speed increase comes only from reducing the number of k/n pairs in the dat file, not the number of k values (probably not correct). So currently, by my napkin guesstimations, the best we could hope for would be a 0.7% speed increase from updating the dat file.
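vjs's extrapolation above can be sanity-checked with a few lines. Note that the total pair count used here is my own rough assumption, not a number from this thread:

```python
# Rough linear model of vjs's napkin math (assumption: speedup scales with
# the fraction of k/n pairs removed, calibrated on the 33661 measurement).
TOTAL_PAIRS = 550_000            # ASSUMPTION: approximate dat file size then
CALIBRATION = 4.8 / 11.0         # 4.8% speedup from removing ~11% of pairs

def estimated_speedup(pairs_removed):
    """Percent speed increase expected from removing `pairs_removed` pairs."""
    percent_removed = 100.0 * pairs_removed / TOTAL_PAIRS
    return CALIBRATION * percent_removed

print(round(estimated_speedup(8_000), 2))   # 0.63 -- close to the ~0.7% guess
print(round(estimated_speedup(16_000), 2))  # 1.27 -- close to the ~1.25% figure
```

With a different assumed total, the absolute numbers shift, but the linear shape of the estimate is the same.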
Let's hope we get a prime for 55459 soon; it accounts for roughly 15% of the k/n pairs. 67607 only contributes 4% of the k/n pairs.
Candidates eliminated since 1/1/2004: ~17,000.

Originally posted by vjs:
The one thing I'm not clear on is how many k/n pairs have been eliminated since 1/1/2004, the creation of the current dat file. I'm estimating 8,000, but if I were to overestimate that 16,000 were eliminated, for example, the resulting sieve speed increase would only be ~1.25%.
O.K., thanks Mike for the update.
Something's weird; I tried two more dat files.
Original dat file on my machine: 690 kps
Without 33661: 723 kps (4.8% speed increase)
Without 67607: 743 kps (7.6% speed increase)
Without 33661 and 67607 (only 9 k's): 790 kps (14.5% speed increase)
Can someone explain this? 67607 has a lot fewer k/n's than 33661, 4% compared to 11%???
This sounds like a topic for mklasson.
Basically, the number of k/n pairs has hardly any influence on the performance. Maybe 67607 is computationally more intensive as it is larger than 33661. I'm not sure, but the FFT size of the new factor, for example, heavily depends on the k size...

Originally posted by vjs:
Can someone explain this? 67607 has a lot fewer k/n's than 33661, 4% compared to 11%???
Setting the lower bound to 2M would give a speed increase of 2.7% if effort were O(sqrt(n range)). It is less, so it might be more like 1.5-2% - which is better than nothing, I think...
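For what it's worth, the 2.7% figure falls straight out of the square-root model:

```python
import math

# If sieving effort scaled as O(sqrt(n-range)), raising the floor from 1M to
# 2M (with the 20M ceiling unchanged) shrinks the range from 19M to 18M:
old_range = 20e6 - 1e6
new_range = 20e6 - 2e6
speedup_percent = (math.sqrt(old_range / new_range) - 1) * 100
print(round(speedup_percent, 1))  # 2.7
```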
To be honest, I'm not making any progress currently, as sieving speed in that range is halved for an unknown reason. I hope there will be an explanation (and a solution) once the main sieve effort reaches this depth. So it serves as a spare range I fall back to once in a while...

Originally posted by vjs:
BTW, how is your sieve range coming along? I know you're quite active in P-1.
550000-550500 Mystwalker
Concerning P-1, I'm more like a part time factorer.
As you may remember, the speed difference between 3m-20m and 1m-20m (or was it 300k-20m?..) was a mere ~1%.
It seems to me like we'll (hopefully) find a prime before double check PRP reaches 3m. So, updating the dat file when we find our next prime, and deciding on the lower bound based on the level of double check PRP at that time sounds reasonable to me.
And another note on the speed differences between taking out 33661 and 67607. Maybe it has something to do with this:
I wouldn't count on any significant speed increase from removing n candidates in the dat. Removing a k is good though -- some 5% as vjs noted.
The reason why removing 67607 gives a better speedup than 33661 might be that more p values can then be skipped entirely (I'm not sure if this is actually the case, but it sounds like a reasonable explanation). The p values that are potentially interesting for 33661 could happen to be shared with other k's most of the time, while 67607's might not.
The data Nuri posted means that 67607 has only 4 possible n values mod 360, while 33661 has 9 mod 360, but also that 67607 has 1 n value mod 8 while 33661 has 1 n value mod 24. The first part is good for 67607, but the latter is bad and could possibly influence what I wrote above. I'm too rusty...
Mikael
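Mikael's mod-360 counting can be reconstructed roughly like this (my own sketch, not his code; the prime list is simply the small primes whose multiplicative order of 2 divides 360):

```python
# Small primes p for which ord_p(2) divides 360, so divisibility of k*2^n+1
# by p depends only on n mod 360.
SMALL_PRIMES = [3, 5, 7, 13, 17, 19, 37, 73]

def allowed_residues(k, modulus=360):
    """Residues r (mod `modulus`) for which k*2^r + 1 is not divisible by
    any of the small primes above -- the n classes these primes alone
    cannot discard from the sieve."""
    return [r for r in range(modulus)
            if all((k * pow(2, r, p) + 1) % p != 0 for p in SMALL_PRIMES)]
```

For example, 67607 ≡ 2 (mod 3), so every even n is killed by p=3 and all surviving residues are odd; 33661 ≡ 1 (mod 3), so there it is the odd n that are killed. Whether this exact prime list reproduces the 4-vs-9 counts Mikael quotes is not something I've verified.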
Good to see you, Mikael!
I recently encountered a small feature request for proth_sieve:
How about a switch to start processing after a parametrized amount of time (like "-d 10" for a delay of 10 seconds)? This would help get things in order when proth_sieve is loaded at startup.
Especially RieselSieve sievers with their 29MB(?) dat file could possibly improve the startup as a whole...
Dennis,
http://mklasson.com/proth_sieve_042b.zip should do the trick. "-delay 10" waits 10 seconds at startup.
Please test to make sure it's not slower or anything -- I don't really remember if I've done any other changes since the last release.
Thanks!
I haven't tested it yet, but I'll look at the performance.
Concerning performance, I created a SoB.dat file with 2M < n < 20M and tested it.
The observed performance increase was a mere 0.61%...
(The test was made over a range of 0.1G, starting at 400T. I took 10 intermediate results of 0.01G each and averaged them; the computer had next to no load except proth_sieve, and the deviation of the intermediate results was very low [perfmax - perfmin = 1 or 2 kp/s with perf ~ 280 kp/s; only the first intermediate result was a tad higher in both cases].)
I don't know the speed increase from leaving out already-found factors - has someone made a test? 17K fewer factors (most likely still > 16K when the lower bound = 2M) could also have a positive impact...
I wouldn't consider raising the range beyond what double check has passed at this point. n>1m actually takes a while to process on a machine now, so it's not worth the slight speed gain. Even if it were 3%, I'd still say leave it at 1m.
How difficult would it be to have a dat file updated, say, daily or even weekly, with factors found and the double check region removed? I understand that the speed increase would be almost insignificant, but there would also be a reduction in the amount of memory that the program uses. I would see this as a benefit, as the machines I use are used for sieving because of their shortage of memory. Otherwise they'd be factorers. Also, any increase, even a small one, does help the project, and if it would be an easy task to have new .dats spit out automatically, it would save the need for this discussion in the future.
Although I'm probably not the right person to do this, I do have all the data and the tools available. While I don't think there'll be any noticeable performance gains, I really like the idea of a .dat file where the floor is slowly lifted as the double check PRP effort proceeds. It would also be nice to address the situation for the P-1 factorers - having a fresh .dat file that very slowly shrinks each day is more logical than an ever-expanding results.txt file.

"how difficult would it be to have a dat file updated say daily or even weekly with factors found and the double check region removed"
With that in mind I'll take a look. As you say, if nothing else "it would save the need for this discussion in the future"
Thanks Mike, this would be highly appreciated and would put this issue to rest, as you said.
Also, if it's not too much trouble and you have the permissions available, it would be nice to have the list of k/n pairs that require testing from the double check level up. I know it's possible to generate this from a dat file, but a simple text file auto-updated weekly, or at your leisure, would be cool.
Perhaps you could chime in on submitting factexcl and factrange.
Since the double check effort just went through 1m I guess there is no reason to submit factrange.txt anymore.
The nice thing about factors is that they can be verified easily and provide 100% certainty, compared with matching residues, which are only very close to 100%.

Originally posted by vjs:
Since the double check effort just went through 1m I guess there is no reason to submit factrange.txt anymore.
Perhaps we could also eliminate all tests that already have two matching residues. I don't believe there are many of these, but since they are not going to be retested anyway, there is no reason to include them in the sieve.
This is very true, but remember that in order to pass double check, a number must have two matching residues. If the residues don't match, it's tested a third time.

"100% truth when compared with matching residues"

I think two matching residues are just as good as a factor for our purposes.
In any case, these k/n pairs will not be tested again, so there is no reason to sieve for them.
I second this, if it's not that much effort, but I don't really think there are many of these, are there???

"perhaps we could also eliminate all tests that already have two matching residues"
It would be really nice if the .dat file could be generated by the server on a weekly basis; the dat file should include all k/n pairs that don't have factors or matching residues.
How difficult is this to do? If it's more of a bandwidth issue, our community could surely come up with some bandwidth.
For that matter, we could set up a BitTorrent, or Yahoo/Gmail, etc.
"Also if it's not too much trouble, the possibility exists, and you have the permissions available, it would be nice to have the list of k/n pairs that require testing from double check up."

How would you like the data in this text file to be formatted? Just a set of lines of the form
k * 2 ^ n
or something that's usable by another utility?
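If the plain `k * 2 ^ n` format were chosen, it would be trivial to pick up in another utility; a minimal sketch (the regex and names here are mine, not from any project tool):

```python
import re

# One "k * 2 ^ n" candidate per line; whitespace around the operators is
# treated as optional.
PAIR_RE = re.compile(r"^\s*(\d+)\s*\*\s*2\s*\^\s*(\d+)\s*$")

def parse_pairs(lines):
    """Yield (k, n) tuples from lines like '67607 * 2 ^ 12345',
    silently skipping lines that don't match."""
    for line in lines:
        m = PAIR_RE.match(line)
        if m:
            yield int(m.group(1)), int(m.group(2))
```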
Cheers,
Mike.
P.S. I've got the sob.dat daily update basically sorted, but I want to run it for a few days locally to pick the fluff off.
Perhaps others can suggest the best format. I don't really care about the format; I'm just glad you're doing it.
I would think the same format as the dat file would work best and take the least amount of work/effort. Something easily picked up in notepad or a simple text editor like the dat file.
So listing by k, then increasing n, all in one column works just fine (the actual n values, not the n1+x=n2 delta encoding).
Unless someone else wants something different for a good reason, that is. I just see having each pair listed as k.2^N...., or k's followed by tabbed n's, as an issue for people wanting to pick up the values in Excel, stats, P-1 programs, etc. Also, it makes the file about 70% larger, if not more.
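For reference, the `n1+x=n2` encoding mentioned above means the dat file stores a first n for each k and then only increments. A rough decoder under that assumption (the exact SoB.dat syntax may differ in details):

```python
def decode_deltas(first_n, deltas):
    """Recover absolute n values from a starting n and a list of increments,
    i.e. the inverse of the delta encoding."""
    ns = [first_n]
    for d in deltas:
        ns.append(ns[-1] + d)  # each stored value is an offset from the last n
    return ns
```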
BTW, if you'd like me to beta test the new dat, send me a copy; I'll double sieve some portion and send you the fact files etc.
mustang35157L
followed by
yahoo dot com
Something like this do? There's about two days' worth of PRP effort in the files (one for the main effort, one for DC).

"I would think the same format as the dat file would work best and take the least amount of work/effort. Something easily picked up in notepad or a simple text editor like the dat file."
Mike I think this format is perfect.
Any objections????
Hmm, I think everyone is just waiting for your sieve dat file, but nice work.
Mike,
You didn't mention yet whether your new dat file excludes those numbers that were already double checked by the old secret (or was it garbage)? Just curious...
I assume your new dat file was created by taking the old dat file, removing submitted factors, and cutting it off at the double check level - therefore not including those from the old garbage etc.
It almost makes me think we should try to bring back the old secret page for each k...
The new sob.dat will each day remove all new factors found, plus all candidates dropped as the floor is trimmed by secret moving forward. In theory this means that candidates for residue-recovery are being ignored, but then the floor of the sob.dat has been 1M for some time, so secret was already being excluded.

"You didn't mention yet whether your new dat file excludes those numbers that were already double checked by the old secret (or was it garbage)? Just curious..."

...and to answer the original question: yes, it will exclude the previous (and current) secret and supersecret efforts, but not the previous (and current) garbage, since many of those are first-time tests anyhow.
Cheers,
Mike.
P.S. If all looks OK tomorrow, I'll post some links to the new files.
I'm not sure, but I believe he (vjs) may have been referring to the 2,000,000-range tests performed by secret after it finished all of the below-1,000,000 tests it was scheduled to do.
Keroberts,
Perhaps I'm mistaken, but I thought some time ago, perhaps a year, the queue assignments were a little different.
Supersecret was doing the lowest double checks (unchanged)
Garbage was picking up the high n long tests etc (unchanged)
But secret was doing some double check testing around 3m or 2m, because there were thoughts that if a prime had been missed it would be around that level.
So in other words, it was like the start of a double check effort from some medium value at the time. I think prp-n was around 5m then.
In any case, this wouldn't account for very many k/n pairs.
Yeah, you are correct, I'd forgotten that - it started at 2M and, looking back at my records, got to 2016345. No, these tests haven't been excluded from the new sob.dat file; it's only about 400 candidates, which is about 5 days' work for the current DC effort, so in the interests of simplicity I'd rather ignore those.

"But secret was doing some double check testing around 3m or 2m b/c there were thoughts that if a prime were missed it would be around that number."
The update seems to be working fine, so the links to the new files are ...
The sob.dat file zipped (0.5 MB)
A clear text version of the same (1.13 MB), just for info
A very small zip file that has clear text .dat files that indicate the next couple of days of main and DC PRP efforts, and a log file that indicates the daily progress of the shrinkage of the sob.dat file, again just for info.
These files will be updated daily at about 03:00 UK time.
I would suggest that P-1 users seriously consider a regular download of this file instead of the results.txt file since this is significantly smaller (0.5 MB vs 1.81 MB), and while the sob.dat will slowly shrink, the results.txt will slowly grow.
If anyone has any problems, please let me know. The sob.dat file that we've been using for a while is still accessible from here; please revert to that if you have any problems or concerns.
Good Job Mike,
Trying it out on my fastest machine right now. First observation is that it doesn't increase the speed by a noticeable amount, but the kps jumps around a bit; generally I see a steady 680-684, but now I'm seeing anything between 670-700.
I'll have to run it for a day or two to determine whether it increases or decreases speed etc.
Did you personally notice any anomalies???
Thanks for the quick feedback. In the tests I've done I haven't seen any noticeable speed change, but over time the slow lifting of the floor will speed things up.

"I'll have to run it for a day or two to determine whether it increases or decreases speed etc.
Did you personally notice any anomalies???"

No, I didn't see any problems. The last sentence was just one of my usual caveats, kind of a "there shouldn't be any problems, but if someone finds something and I don't pop by in the next few days....."
Well, the client with the new dat file seems to have stabilized. Generally I don't look at it much except when I first sit down at the computer, so I guess some variability in the kps output is standard.
Anyway, here are the results, averaged from 300 reports per dat file:
Old dat file: peak memory usage 27.4 MB, speed 682.86 kps, stdev 17.4 kps
New dat file: peak memory usage 26.7 MB, speed 692.75 kps, stdev 6.40 kps
So at best it's 2% faster, but I don't like the sampling error. It's obviously not slower, which is very cool.
Good job Mike
Mike quick question about one of your new files:
In your SobDat_history.txt file
There are several columns: n-min, n-max, and (remaining) candidates are self-explanatory, but
under "eliminated" you have:
results
n-min
n-max
I noted that: previous day's candidates - results - n-min = current day's candidates.
Do these mean the following?
results - the number of candidates eliminated through sieve and P-1?
n-min - the number of candidates eliminated by the double-check (supersecret) raising the floor?
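If those guesses about the columns are right, the daily bookkeeping reduces to a single identity; a trivial sketch (column meanings are guessed here, not confirmed by Mike):

```python
def current_day_candidates(previous, results, floor_eliminated):
    """Candidates remaining after a day: the previous count minus factors
    found ('results') minus pairs dropped when n-min was raised."""
    return previous - results - floor_eliminated
```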
Maybe it's time to update http://www.aooq73.dsl.pipex.com/
And please tell me, should we use this new file to sieve?
:bs:

Originally posted by vjs:
1% would be a good increase. I thought it was said before that simply making a new dat file wouldn't change the speed at all. Only something drastic like removing a k from the file would speed things up, but even then removing 1/11th of the file would only yield something like a 3-5% increase, if that.
So in other words we're probably going to have to wait until we get another prime for another dat file.