
Thread: How about new .dat file?

  1. #1
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1

    How about new .dat file?

Well, almost a year has passed (okay, 8 months) since the new .dat file arrived.

Can we gain something from creating a new dat file?
Any performance increase or lower memory usage?

I think 1% would be a good increase.
    wbr, Me. Dead J. Dona \


  2. #2
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
1% would be a good increase, but I thought it was said before that simply making a new dat file wouldn't change the speed at all. Only something drastic like removing a k from the file would speed things up, and even then removing 1/11th of the file would only yield something like a 3-5% increase, if that.

So in other words, we're probably going to have to wait until we get another prime before we get another dat file.

  3. #3
    Senior Member
    Join Date
    Jan 2003
    Location
    U.S
    Posts
    123
I think a better thing to change would be the results.txt file. That file contains all factors with p>25T. It should be changed to all factors with p>200T, so the file won't take as long to download.

  4. #4
    Originally posted by Moo_the_cow
I think a better thing to change would be the results.txt file. That file contains all factors with p>25T. It should be changed to all factors with p>200T, so the file won't take as long to download.
*nod* 275T at this point, 300T soon. This is in regard to P-1, right?

  5. #5
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
I second the p>300T suggestion; aside from 2 small ranges, almost everything below 300T is sieved.

How difficult is this to do, anyway?

  6. #6
    Based on a guess, it should be easy. Just edit a config file or a source file, and replace 25 with 300, etc. Hopefully that is the case.

  7. #7
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
As PRP double checking will soon reach 1M, we could raise the lower bound of the sieve sometime in the near future...

  8. #8
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Hopefully this will be the case as well when we find another prime.

From what I understood, we are actually using a dual-range effort right now, 1m<n<20m.
Before, there were two sieve ranges, 300k<n<3m, and then they added 3m<n<20m.
It was said somewhere before that there was very little slowdown when switching from 3m<n<20m to 1m<n<20m.

But I'm sure that if we reach 2m in double checks, or even 1.5m, they will create a new dat file from the lowest double check to 20m using only the remaining 10 k's and including all the found factors. Even if it only yields a 2% increase, it's something to hope for in addition to the prime!!!

BTW, how is your sieve range coming along? I know you're quite active in P-1.
550000-550500 Mystwalker

  9. #9
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    There has always been guessing about how much of a difference removing one k from the sieve would make to sieve speed, so I did a test.

Basically, I eliminated one k from the dat file: 33661.
33661 is a fairly heavy k, with about 11% of the k/n pairs.

    Ref: http://www.aooq73.dsl.pipex.com/scores_p.htm

My sieve speed went from 689 kps to ~723 kps, a 4.8% increase.



The one thing I'm not clear on is how many k/n's have been eliminated since 1/1/2004, the creation of the new dat file. I'm estimating 8,000, but even if I were to overestimate that 16,000 were eliminated, the resulting sieve speed increase would only be ~1.25%.

This is assuming that the 4.8% speed increase comes from reducing the number of k/n pairs in the dat file only, not the number of k values (probably not correct). So currently, by my napkin guesstimation, the best we could hope for would be a 0.7% speed increase from updating the dat file.




Let's hope we get a prime for 55459 soon; it accounts for roughly 15% of the k/n pairs, while 67607 contributes only 4% of the k/n pairs.
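    For anyone who wants to repeat the test, here is a rough sketch of the idea in Python (the "k=<value>" block layout and the file names are simplifying assumptions for illustration, not the exact SoB.dat format):

    Code:
    # Sketch of the experiment above: write a copy of the dat file with one
    # k's whole block removed, then benchmark the sieve on both copies.
    # Assumes a simplified layout where each k's block starts with "k=<value>"
    # and runs until the next "k=" line; the real SoB.dat differs in detail.

    def strip_k(src, dst, k_to_drop):
        dropping = False
        with open(src) as fin, open(dst, "w") as fout:
            for line in fin:
                if line.startswith("k="):
                    dropping = (int(line.strip().split("=")[1]) == k_to_drop)
                if not dropping:
                    fout.write(line)

    strip_k("SoB.dat", "SoB_no33661.dat", 33661)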

  10. #10
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
The one thing I'm not clear on is how many k/n's have been eliminated since 1/1/2004, the creation of the new dat file. I'm estimating 8,000, but even if I were to overestimate that 16,000 were eliminated, the resulting sieve speed increase would only be ~1.25%.
    Candidates eliminated since 1/1/2004 ~17000

  11. #11
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
O.K., thanks for the update, Mike.

Something's weird. I tried two more dat files:

Original dat file on my machine: 690 kps

Without 33661: 723 kps (4.8% speed increase)

Without 67607: 743 kps (7.6% speed increase)

Without 33661 and 67607 (only 9 k's): 790 kps (14.5% speed increase)

Can someone explain this? 67607 has a lot fewer k/n's than 33661, 4% compared to 11%???

  12. #12
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    This sounds like a topic for mklasson.

  13. #13
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by vjs
Can someone explain this? 67607 has a lot fewer k/n's than 33661, 4% compared to 11%???
Basically, the number of k/n pairs has hardly any influence on the performance. Maybe 67607 is computationally more intensive, as it is larger than 33661. I'm not sure, but the FFT size for a new factor, for example, depends heavily on the k size...

Setting the lower bound to 2M would give a speed increase of 2.7% if effort were O(sqrt(n range)), since sqrt(19M/18M) ≈ 1.027. The actual scaling is weaker than that, so it might be more like 1.5-2% - which is better than nothing, I think...


BTW, how is your sieve range coming along? I know you're quite active in P-1.
550000-550500 Mystwalker
To be honest, I'm not making any progress currently, as sieving speed in that range is halved for an unknown reason. I hope there will be an explanation (and a solution) once the main sieve effort reaches this depth. So it serves as a spare range I fall back to once in a while...

Concerning P-1, I'm more of a part-time factorer.

  14. #14
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
As you may remember, the speed difference between 3m-20m and 1m-20m (or was it 300k-20m?..) was a mere ~1%.

    It seems to me like we'll (hopefully) find a prime before double check PRP reaches 3m. So, updating the dat file when we find our next prime, and deciding on the lower bound based on the level of double check PRP at that time sounds reasonable to me.

And another note on the speed difference between taking out 33661 and 67607 - maybe it has something to do with this:
Attached Images

  15. #15
    Senior Member
    Join Date
    Feb 2003
    Location
    Sweden
    Posts
    158
    I wouldn't count on any significant speed increase from removing n candidates in the dat. Removing a k is good though -- some 5% as vjs noted.

The reason why removing 67607 gives a better speedup than 33661 might be that more p values can then be skipped entirely (I'm not sure if this is actually the case, but it sounds like a reasonable explanation). The p values that are potentially interesting for 33661 could happen to be shared with other k's most of the time, while 67607's might not.

    The data Nuri posted means that 67607 has only 4 possible n values mod 360, while 33661 has 9 mod 360, but also that 67607 has 1 n value mod 8 while 33661 has 1 n value mod 24. The first part is good for 67607, but the latter is bad and could possibly influence what I wrote above. I'm too rusty...

    Mikael
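    To make the mod-360 figures concrete, here is a rough sketch that counts which residue classes n mod 360 survive elimination by small primes (the prime list is an assumption, and the published 4-vs-9 counts come from Nuri's attachment, not from this sketch):

    Code:
    # Count the residue classes n mod 360 that survive divisibility by small
    # primes p whose multiplicative order of 2 divides 360.  If k*2^n + 1 is
    # divisible by such a p for one n in a class, it is divisible for every n
    # in that class, so the whole class drops out of the sieve.
    # The prime list is an assumption; a longer list may eliminate more classes.

    def order_of_2(p):
        """Smallest e > 0 with 2^e == 1 (mod p)."""
        e, x = 1, 2 % p
        while x != 1:
            x = (x * 2) % p
            e += 1
        return e

    def surviving_classes(k, modulus=360,
                          primes=(3, 5, 7, 13, 17, 19, 37, 73, 109, 241)):
        useful = [p for p in primes if modulus % order_of_2(p) == 0]
        return [r for r in range(modulus)
                if all((k * pow(2, r, p) + 1) % p != 0 for p in useful)]

    for k in (33661, 67607):
        print(k, len(surviving_classes(k)))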

  16. #16
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Good to see you, Mikael!
I recently came up with a small feature request for proth_sieve:
How about a switch to start processing after a parametrized amount of time (like "-d 10" for a delay of 10 seconds)? This would help get things into order when proth_sieve is loaded at startup.

RieselSieve sievers especially, with their 29MB(?) dat file, could possibly improve their startup as a whole...

  17. #17
    Senior Member
    Join Date
    Feb 2003
    Location
    Sweden
    Posts
    158
    Dennis,

    http://mklasson.com/proth_sieve_042b.zip should do the trick. "-delay 10" waits 10 seconds at startup.

Please test it to make sure it's not slower or anything -- I don't really remember whether I've made any other changes since the last release.

  18. #18
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Thanks!
    I haven't tested it yet, but I'll look at the performance.

    Concerning performance, I created a SoB.dat file with 2M < n < 20M and tested it.
    The observed performance increase was a mere 0.61%...

(The test was made over a range of 0.1G, starting at 400T - I took 10 intermediate results of 0.01G each and averaged them; the computer had next to no load other than proth_sieve, and the deviation of the intermediate results was very low [perfmax - perfmin = 1 or 2 kp/s with perf ~ 280 kp/s; only the first intermediate result was a tad higher in both cases].)

I don't know the speed increase from leaving out already-found factors - has anyone made a test? 17K fewer candidates (most likely still > 16K when the lower bound is 2M) could also have a positive impact...

  19. #19
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
I wouldn't consider raising the lower bound past the point double checking has already reached. Tests at n>1m actually take a while to process on a machine now, so it's not worth the slight speed gain. Even if it were 3%, I'd still say leave it at 1m.

  20. #20
How difficult would it be to have a dat file updated, say, daily or even weekly, with found factors and the double-check region removed? I understand that the speed increase would be almost insignificant, but there would also be a reduction in the amount of memory the program uses. I would see this as a benefit, as the machines I use are assigned to sieving because of their shortage of memory. Otherwise they'd be factorers. Also, any increase, even a small one, does help the project, and if it were an easy task to have new .dats spit out automatically, it would save the need for this discussion in the future.

  21. #21
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
How difficult would it be to have a dat file updated, say, daily or even weekly, with found factors and the double-check region removed?
Although I'm probably not the right person to do this, I do have all the data and the tools available. Although I don't think there'll be any noticeable performance gains, I really like the idea of a .dat file where the floor is slowly being lifted as the double-check PRP effort proceeds. It would also be nice to address the situation for the P-1 factorers - having a fresh .dat file that very slowly shrinks each day is more logical than an ever-expanding results.txt file.

    With that in mind I'll take a look. As you say, if nothing else "it would save the need for this discussion in the future"
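    In outline, the daily rebuild could look something like this rough sketch (the one-column "k: <value>" layout, the results-file columns, and the file names are assumptions for illustration, not the project's actual formats):

    Code:
    # Raise the n floor to the current double-check level and drop every
    # k/n pair that already has a factor.  Assumes a simplified layout
    # ("k: <value>" header lines followed by one absolute n per line) and
    # a factors file with "p k n" columns per line - both assumptions.

    def load_factored_pairs(path):
        factored = set()
        with open(path) as f:
            for line in f:
                parts = line.split()
                if len(parts) >= 3:            # p, k, n columns assumed
                    factored.add((int(parts[1]), int(parts[2])))
        return factored

    def rebuild_dat(src, dst, n_floor, factored):
        with open(src) as fin, open(dst, "w") as fout:
            k = None
            for line in fin:
                line = line.strip()
                if line.startswith("k:"):
                    k = int(line.split(":")[1])
                    fout.write(f"k: {k}\n")
                elif line:
                    n = int(line)
                    if n >= n_floor and (k, n) not in factored:
                        fout.write(f"{n}\n")

    rebuild_dat("sob_candidates.txt", "sob_candidates_new.txt",
                n_floor=1_000_000,
                factored=load_factored_pairs("results.txt"))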

  22. #22
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
Thanks, Mike, this would be highly appreciated and would put the issue to rest, as you said.

Also, if it's not too much trouble, the possibility exists, and you have the permissions, it would be nice to have a list of the k/n pairs that require testing, from the double-check level up. I know it's possible to generate this from a dat file, but a simple text file auto-updated weekly(?) or at your leisure would be cool.

Perhaps you could also chime in on submitting factexcl and factrange.

Since the double-check effort just went through 1m, I guess there is no reason to submit factrange.txt anymore.

  23. #23
    Originally posted by vjs
Since the double-check effort just went through 1m, I guess there is no reason to submit factrange.txt anymore.
The nice thing about factors is that they can be verified easily and provide 100% truth, whereas matching residues are only very close to 100%.

  24. #24
Perhaps we could also eliminate all tests that already have two matching residues. I don't believe there are many of these, but since they are not going to be retested anyway, there is no reason to include them in the sieve.

  25. #25
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    100% truth when compared with matching residues
This is very true, but remember that in order to pass double check a test must have two matching residues. If the residues don't match, it's tested a third time.

I think two matching residues are just as good as a factor for our purposes.

In any case, these k/n pairs will not be tested again, so there is no reason to sieve for them.

Perhaps we could also eliminate all tests that already have two matching residues.
I second this, if it's not too much effort, but I don't really think there are many of these, are there???

It would be really nice if the .dat file could be generated by the server on a weekly basis; it should include all k/n pairs that don't have factors or matching residues.

How difficult is this to do? If it's more of a bandwidth issue, our community could surely come up with some bandwidth.

For that matter, we could set up a BitTorrent, or Yahoo/Gmail, etc.

  26. #26
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
Also, if it's not too much trouble, the possibility exists, and you have the permissions, it would be nice to have a list of the k/n pairs that require testing, from the double-check level up. I know it's possible to generate this from a dat file, but a simple text file auto-updated weekly(?) or at your leisure would be cool.
    How would you like the data in this text file to be formatted? Just a set of lines of the form

    k * 2 ^ n

    or something that's usable by another utility?

    Cheers,
    Mike.

    P.S. I've got the sob.dat daily update basically sorted, but I want to run it for a few days locally to pick the fluff off.

  27. #27
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
Perhaps others can suggest the best format; I don't really care about the format, I'm just glad you're doing it.

I would think the same format as the dat file would work best and take the least amount of work/effort - something easily opened in Notepad or a simple text editor, like the dat file.

So, listed by k and then by increasing n, all in one column, works just fine (the actual n, not the n1+x=n2 delta encoding).

Unless someone else wants something different for a good reason, I just see having each pair listed as k.2^N..., or k's followed by tabbed n's, as a problem for people wanting to pick up the values in Excel, stats, P-1 programs, etc. Also, it makes the file about 70% larger, if not more.

BTW, if you'd like me to beta test the new dat, send me a copy; I'll double-sieve some portion and send you the fact files, etc.

    mustang35157L
    followed by
    yahoo dot com

  28. #28
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
I would think the same format as the dat file would work best and take the least amount of work/effort - something easily opened in Notepad or a simple text editor, like the dat file.
Would something like this do? There's about two days' worth of PRP effort in the files (one for the main effort, one for DC).
Attached Files

  29. #29
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
Mike, I think this format is perfect.

Any objections????

Hmm, I think everyone is just waiting for your sieve dat file, but nice work.

  30. #30
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Mike,

You didn't mention yet whether your new dat file excludes those numbers that were already double checked by the old secret (or was it garbage?) effort. Just curious...

I assume your new dat file was created by taking the old dat file, removing the submitted factors, and trimming it at the double-check level, therefore not including those from the old garbage effort, etc.

It almost makes me think we should try to bring back the old secret page for each k...

  31. #31
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
You didn't mention yet whether your new dat file excludes those numbers that were already double checked by the old secret (or was it garbage?) effort. Just curious...
The new sob.dat will each day remove all newly found factors, and all candidates dropped as a result of the floor being raised as secret moves forward. In theory this means that candidates for residue recovery are being ignored, but then the floor of the sob.dat has been 1M for some time, so secret was already being excluded.

...and to answer the original question: yes, it will exclude the previous (and current) secret and supersecret efforts, but not those for the previous (and current) garbage, since many of those are first-time tests anyhow.

    Cheers,
    Mike.

P.S. If it all looks OK tomorrow, I'll post some links to the new files.

  32. #32
I'm not sure, but I believe he (vjs) may have been referring to the tests in the 2,000,000 range performed by secret after it finished all of the below-1,000,000 tests it was scheduled to do.

  33. #33
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Keroberts,

Perhaps I'm mistaken, but I thought that some time ago, perhaps a year, the queue assignments were a little different:

Supersecret was doing the lowest double checks (unchanged).
Garbage was picking up the high-n long tests, etc. (unchanged).

But secret was doing some double-check testing around 3m or 2m, because there were thoughts that if a prime had been missed it would be around that level.

So in other words, it was like the start of a double-check effort from some medium value at the time. I think the main PRP effort was around n=5m then.

In any case, this wouldn't account for very many k/n pairs.

  34. #34
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
But secret was doing some double-check testing around 3m or 2m, because there were thoughts that if a prime had been missed it would be around that level.
Yeah, you are correct, I'd forgotten that - it started at 2M and, looking back at my records, got to 2016345. No, these tests haven't been excluded from the new sob.dat file; it's only about 400 candidates, which is about 5 days' work for the current DC effort, so in the interests of simplicity I'd rather ignore them.

    The update seems to be working fine, so the links to the new files are ...

    The sob.dat file zipped (0.5 MB)
    A clear text version of the same (1.13 MB), just for info
    A very small zip file that has clear text .dat files that indicate the next couple of days of main and DC PRP efforts, and a log file that indicates the daily progress of the shrinkage of the sob.dat file, again just for info.

    These files will be updated daily at about 03:00 UK time.

    I would suggest that P-1 users seriously consider a regular download of this file instead of the results.txt file since this is significantly smaller (0.5 MB vs 1.81 MB), and while the sob.dat will slowly shrink, the results.txt will slowly grow.

If anyone has any problems, please let me know. The sob.dat file that we've been using for a while is still accessible from here; please revert to that if you have any problems or concerns.

  35. #35
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
Good job, Mike.

Trying it out on my fastest machine right now. The first observation is that it doesn't increase the speed by a noticeable amount, but the kps jumps around a bit; generally I see a steady 680-684, but now I'm seeing anything between 670 and 700.

I'll have to run it for a day or two to determine whether it increases or decreases speed, etc.

Did you personally notice any anomalies???

  36. #36
    Senior Member
    Join Date
    Jan 2003
    Location
    UK
    Posts
    479
I'll have to run it for a day or two to determine whether it increases or decreases speed, etc.
Did you personally notice any anomalies???
Thanks for the quick feedback. In the tests I've done I haven't seen any noticeable speed change, but over time the slow lifting of the floor will speed things up.

No, I didn't see any problems. The last sentence was just one of my usual caveats, kind of a "there shouldn't be any problems, but if someone finds something and I don't pop by in the next few days ....."

  37. #37
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
Well, the client with the new dat file seems to have stabilized. Generally I don't look at it much except when I first sit down at the computer, so I guess some variability in the kps output is normal.

Anyway, here are the results, averaged from 300 reports per dat file:

Old dat file: peak memory usage 27.4 MB, speed 682.86 kps, stdev 17.4 kps

New dat file: peak memory usage 26.7 MB, speed 692.75 kps, stdev 6.40 kps

So at best it's 2% faster, but I don't like the sampling error. It's obviously not slower, which is very cool.

    Good job Mike
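    For reference, the comparison above boils down to something like this small sketch (the sample values are placeholders, not the actual 300 reports):

    Code:
    # Average the reported kps samples for each dat file and compute the
    # relative speed change.  The sample lists are made-up placeholders.
    from statistics import mean, stdev

    old_kps = [681.0, 684.2, 702.5, 660.1]   # placeholder samples
    new_kps = [690.3, 694.8, 688.9, 696.1]   # placeholder samples

    old_avg, new_avg = mean(old_kps), mean(new_kps)
    print(f"old {old_avg:.2f} kps (stdev {stdev(old_kps):.2f})")
    print(f"new {new_avg:.2f} kps (stdev {stdev(new_kps):.2f})")
    print(f"change {100 * (new_avg - old_avg) / old_avg:+.2f}%")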

  38. #38
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
Mike, quick question about one of your new files.

In your SobDat_history.txt file

there are several columns: n-min, n-max, and (remaining) candidates are self-explanatory, but

under "eliminated" you have:

results
n-min
n-max

I noted that: current day's candidates = previous day's candidates - results - n-min.

Do these mean the following?

results - the number of candidates eliminated through sieving and P-1?
n-min - the number of candidates eliminated by the double check (supersecret) raising the floor?
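    If the columns mean what's guessed above, each row of SobDat_history.txt should pass a simple bookkeeping check; here is a rough sketch with made-up numbers (the column interpretation is an assumption pending Mike's answer):

    Code:
    # Bookkeeping check: yesterday's candidate count, minus the pairs removed
    # by new factors ("results") and by the rising floor ("n-min"), should
    # equal today's candidate count.  The rows below are made-up examples.
    rows = [
        # (candidates, eliminated_results, eliminated_nmin)
        (541000, 150, 90),
        (540760, 120, 80),
        (540560, 0, 0),
    ]
    for (prev_cand, res, nmin), (cur_cand, _, _) in zip(rows, rows[1:]):
        assert prev_cand - res - nmin == cur_cand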

  39. #39
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1

So it happened

Maybe it's time to update http://www.aooq73.dsl.pipex.com/

And please tell me, should we use this new file to sieve?
    wbr, Me. Dead J. Dona \


  40. #40
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1
    Originally posted by vjs
1% would be a good increase, but I thought it was said before that simply making a new dat file wouldn't change the speed at all. Only something drastic like removing a k from the file would speed things up, and even then removing 1/11th of the file would only yield something like a 3-5% increase, if that.

So in other words, we're probably going to have to wait until we get another prime before we get another dat file.
    :bs:







    wbr, Me. Dead J. Dona \


