Thread: New .dat

  1. #1
    Senior Member engracio's Avatar
    Join Date
    Jun 2004
    Location
    Illinois
    Posts
    237

    New .dat

    Congratulations to all of us. Great job. Does that mean we need a new .dat already? Woohoo

    e



  2. #2
    Until then, I've just manually deleted 27653 from my .dat and gained about 10% in sieve speed.

    I have placed it here -> http://shoelaceplace.homeip.net:7777/sobdat.html
    Last edited by ShoeLace; 06-15-2005 at 09:59 AM.

  3. #3
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    I'll have a new dat posted ASAP...

    I'd also like to encourage people to start sieving with the 991<n<50M dat.

    BTW, a 10% increase is a little high; I found closer to a 6% increase in sieve speed.

    Joe is currently working on processing a new 991<n<50M dat with all factors applied and, of course, the k removed.

  4. #4
    This k removal should make up for the added time taken sieving a larger range, so maybe those people currently using the smaller sieve range can switch to the larger one with no speed decrease?



  5. #5
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Close; it will make up some of the difference, or at least people will see less of a speed hit. But the 991<n<50M dat will still be 15-20% slower than any 2M<n<20M dat, unfortunately.

    Perhaps Louie, Dave, and Mike will now consider supporting all n<50M in the server, among other suggestions.

    To let everyone know, 991<n<50M is 100% complete to 44T and very close to having everything below 50T complete.

    Perhaps a new results.txt with all n<50M and p>50T or p>40T is in order?

  6. #6
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    O.K. hopefully a new 991<n<50M dat will be out momentarily...

    Joe sent it to me and I sent it to Mike of teamprimerib.

    http://www.teamprimerib.com/vjs

    It's called SoB_9k-Prime_991-50M.zip

    It extracts as SoB.dat.

    Simply stop your sieve client, paste the new dat over the old one, and start the client again.

    I'm seeing a 5-6% speed increase.


    Switching from the old 2M<n<20M 10k dat to this 9k 991<n<50M dat will only show a 10-15% speed decrease, but you will be producing 220% more work once the 20M<n<50M factors are considered.

  7. #7
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643

    New 991-50M dat

    Zip versions of the 991-50M dat for 9 k values are now available on both Yahoo groups.

    BZ2 versions are also now available on both Yahoo groups.


    Private group (requires Yahoo ID and sign-up): http://groups.yahoo.com/group/SierpinskiSieve/

    Public group (requires Yahoo ID): http://groups.yahoo.com/group/kerobertsdatfiles/

    Yes, these are both sub-par, but they work.
    Last edited by vjs; 06-15-2005 at 03:11 PM.
    Joe O

  8. #8
    Both yahoo groups?!? A link to them wouldn't hurt...



  9. #9
    Increased my kps by 30, thanks



  10. #10
    I just thought about something... when we remove a k by PRP and take its tests out of the dat, we get the impression of sieving faster, because the rate goes up, but in fact we find fewer factors! The tests were eliminated by PRP, not by sieving. If we take out a test eliminated by sieving, then sieving becomes genuinely faster.
    Well, all this doesn't mean very much; it's just so we aren't happy about a gain that doesn't exist.
    We find fewer factors by sieving, but we have a prime, and we have saved a lot of PRP work.
    H.

  11. #11
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Originally posted by hhh
    when we remove a k by PRP and take its tests out of the dat, we get the impression of sieving faster, because the rate goes up, but in fact we find fewer factors!
    Yep. And if I recall correctly, the increase in sieve speed is less than the decrease in the number of factors. Still, this is a good thing, as we come closer to the finish line...


    Originally posted by hhh
    If we take out a test eliminated by sieving, then sieving becomes genuinely faster.
    I doubt that... (or maybe I did not understand what you meant). If you mean taking out individual k/n pairs that were factored, we're already doing that, and it has minimal, if any, effect on speed. On the other hand, if you mean eliminating the whole set of n's for a particular k through sieving (which will obviously not be the case for the remaining k's), that would in fact imply that we have a covering set for that k, and thus a Sierpinski number.


    Originally posted by hhh
    Well, all this doesn't mean very much; it's just so we aren't happy about a gain that doesn't exist.
    We find fewer factors by sieving, but we have a prime, and we have saved a lot of PRP work.
    H.
    In fact, to my understanding, when we find a prime through PRP, it also means that we save sieving work, because all that matters, as far as sieving is concerned, is the factors for the remaining k's. Once a prime is found for a particular k, all of the factors found thereafter for that k are not worth a dime any more (as far as the project goals are concerned).
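
    To make the covering set remark concrete, here is a minimal sketch (illustrative only, not project code) verifying the classic example: for the known Sierpinski number k=78557, every 78557*2^n+1 is divisible by one of {3, 5, 7, 13, 19, 37, 73}.

    [code]
    # Verify the covering set of the known Sierpinski number k = 78557:
    # every 78557*2^n + 1 has a factor in {3, 5, 7, 13, 19, 37, 73}.
    k = 78557
    cover = [3, 5, 7, 13, 19, 37, 73]

    # The multiplicative orders of 2 modulo these primes all divide 36,
    # so checking one full period n = 0..35 covers every n.
    for n in range(36):
        value = k * 2**n + 1
        assert any(value % p == 0 for p in cover), f"n={n} is not covered"
    print("covering set confirmed for k=78557")
    [/code]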

  12. #12
    Member
    Join Date
    Dec 2002
    Location
    new york
    Posts
    76
    I have a question about this 991-50M file... I switched from the 7 MB sob.dat file to the newer one that has only 8 k's and got a small boost. Now, if PRP has already tested n < 6.5 million for all remaining k's (and in some cases higher), why does the dat file go all the way down to 991? What am I missing?

  13. #13
    It's mostly the k values and the max n value that determine the speed of the siever, so the .dat also serves as a record of all values that don't have factors. A few others and I have been playing with the small n values, trying to factor them all just for fun.

    Once the second-pass and error-check runs are completed there may be some benefit to trimming the bottom out of the .dat, but I've been told the speed boost would only be perhaps 1%. It's pretty much a waste of time to even change over, unless you really think that 1% would be worth the effort of trimming the .dat file. Perhaps with the double-check reaching 5 million, though, there may be some potential for more gain. Does anyone know exactly what the speed increase would be?

    Factors have been found for every n value below 1000 except n=991 for k=24737. It's been checked up to the 50-digit level and is being pushed towards the 55-digit one. This is of course not P-1 factoring but ECM. Currently I'm running some ECM curves on 10223*2^1181+1; I believe it is the 3rd smallest still unfactored. This doesn't really have any practical value except to see if we can do it, and because finding big factors gets you onto Mike H's stats page. Basically a pissing contest, but it's fun. Who knows, factoring the n=991 candidate might even yield a record ECM factor.
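
    As a toy illustration of this kind of small-n factor hunting (trial division with small primes only; the real searches above used P-1 and ECM, and the limit below is arbitrary):

    [code]
    # Toy factor hunt for k*2^n + 1 by trial division with small primes.
    # Uses modular exponentiation so the huge number is never built.

    def small_primes(limit):
        """Sieve of Eratosthenes: all primes below limit."""
        sieve = [True] * limit
        sieve[0] = sieve[1] = False
        for i in range(2, int(limit ** 0.5) + 1):
            if sieve[i]:
                sieve[i * i::i] = [False] * len(range(i * i, limit, i))
        return [p for p, flag in enumerate(sieve) if flag]

    def smallest_factor(k, n, limit=10**6):
        """Smallest prime p < limit dividing k*2^n + 1, or None."""
        for p in small_primes(limit):
            if (k * pow(2, n, p) + 1) % p == 0:
                return p
        return None

    # The survivor mentioned above: trial division this shallow finds
    # nothing for 24737*2^991 + 1, as expected.
    print(smallest_factor(24737, 991))
    [/code]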

  14. #14
    Member
    Join Date
    Dec 2002
    Location
    new york
    Posts
    76
    >there may be some benefit to trimming the bottom out of the .dat but I've been told the speed boost would only be perhaps 1%.

    Thanks, this is what I was wondering. And I think what you are saying about double-checking is that there is actually PRP testing going on in that low range.

  15. #15
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    See http://www.seventeenorbust.com/secret/ for current assignment queues.

  16. #16
    Yes, some testing, but only scattered tests at the moment. Hopefully we'll have matching residues for everything up to 5 million or higher, and we'll know with close to 100% certainty that none of them are prime. I guess this is just the admins being extra cautious after finding the missed prime. We wouldn't wanna find out later that there was another prime we had missed twice. Especially since third tests are only issued when non-matching residues are returned, which is only about 5% of total n values.

  17. #17
    So, now that in a few short weeks we'll have two residues for everything under 6 million, is it reasonable to say that a cropped .dat may result in significant speed increases? If anyone has tried this out, please do tell... Of course we'd still have to keep a master .dat file, but it would only be needed by people playing with low-n factoring. For the sievers and main PRP testers it would be useful to eliminate any n values that already have matching residues.

  18. #18
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    IIRC, the speed difference between the 19m dat (1.xm-20m) and the 50m dat (991-50m) was only 15%.

    I guess cropping the dat file to 6m (or 5m, whatever) would contribute only 2-3%.

    And, as we're left with (relatively) very little to sieve, I'd prefer we keep the dat file the way it is for the moment. There's not much boost anyway.

    But, of course, once we're finished with 2^50 and left with the second sieve run only, it might be a good idea to revisit this issue, looking at where min n (at second-pass PRP) and min p (at second-pass sieve) will be at that point in time, check the relative speeds, and decide accordingly.

  19. #19
    That's basically what I'm saying: check speeds and decide accordingly. If there is a 5% or more gain, I believe it would be worthwhile.

  20. #20
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    I disagree with this, and here's why (with a simplified example).

    The efficiency of the sieve is down to how much work it has to do (funnily enough).

    Take k=67607 for example. If you look at the n values that we need to find factors for, they fall into these residue classes:-

    67607: n = {11,27} mod 40

    (The number 40 comes from the fact that it is the minimum increment value in the dat file for k=67607).

    proth_sieve cannot handle multiple modular residues, so it reduces the mod size until there is just one residue:

    67607: n = {7, 11} mod 20
    67607: n = {1, 7 } mod 10
    67607: n = 3 mod 8

    So in the BSGS stage it may need to check 50M/8 = 6,250,000 possible values.

    If you run the sieve on a smaller range of numbers you will not be removing n's from the higher parts of the range, so it will have no impact on these relations.

    By running the sieve over the full range there is more chance that we can increase the modulus value and therefore check fewer values.

    I know that Joe_O and Chuck are working towards this for jjsieve. Instead of reducing down to a single residue, it can handle multiple residues, i.e.

    67607: n = 11 mod 40
    67607: n = 27 mod 40

    That way it only needs to check 50M/40 = 1,250,000 * 2 = 2,500,000 values (compare this to 6.25M before).

    There may even be faster relations to be found; however, finding the optimal set of relations takes a bit of time, something the sieve could not feasibly do every time it starts up (although it does put in a reasonable effort). I'm knocking up a simple program that tries to find an optimal small set of relations that covers all of the n values. This program only needs to be run once per .dat file, so it can be given plenty of execution time.
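
    As a rough sketch of that kind of search (a toy illustration only, not the actual program): score each candidate modulus m by the number of BSGS candidates it implies, len(residues) * n_max / m, and keep the best.

    [code]
    # Toy search for a good modulus: for each candidate m, collect the
    # residues {n mod m} actually present among the remaining n values,
    # and score m by the implied number of BSGS candidates.

    def best_modulus(ns, n_max=50_000_000, max_m=500):
        best = None
        for m in range(2, max_m + 1):
            residues = {n % m for n in ns}
            checks = len(residues) * n_max // m
            if best is None or checks < best[2]:
                best = (m, sorted(residues), checks)
        return best

    # Synthetic data for the k=67607 example: n = {11, 27} mod 40.
    # (The real dat is sparser, because factored n's are gone; that is
    # what lets a larger modulus like 360 win in practice.)
    ns = sorted(list(range(11, 10**6, 40)) + list(range(27, 10**6, 40)))
    m, residues, checks = best_modulus(ns)
    print(m, residues, checks)   # expect residues {11, 27} mod 40 -> 2,500,000
    [/code]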

    If the program finds that a more optimal set is possible then it could be given to the siever in a separate file (or encoded in the .dat file somehow).

    Indeed, for k=67607 it can cover all n values with 4 residues mod 360, meaning it has to check 4*50,000,000/360 ≈ 555,555 values.

    This doesn't translate directly into a 10x speed improvement, as BSGS is just one part of the overall algorithm and, depending on how SPH turns out, may not even be used for a particular p. But it does provide a certain speed boost.

    The other, quite rare, possibility is that one of the residue classes has a very small number of members: small enough that, hopefully, with some P-1 or ECM effort, factors can be found for all of them, and then that relation is not needed any more. However, I haven't found any relations with a small enough set yet.

    In conclusion, running the sieve on reduced n ranges may offer some very short term gains but in the long run it is much more efficient to sieve to the maximum range (50M).
    Quad 2.5GHz G5 PowerMac. Mmmmm.
    My Current Sieve Progress: http://www.greenbank.org/cgi-bin/proth.cgi

  21. #21
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    I also disagree with reducing the dat size at this point. There's probably a 4% increase at most; however, you never know what we are going to be doing with the produced factors at a later point... remember the 3M<n<20M dat, that didn't work out well in the end.

    There is also a group of people who are combined-sieving with PSP; the reduced dat wouldn't work well for that at all.

  22. #22
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1
    vjs, can we gain some speed improvement by repacking the dat file now,
    or can it be repacked only after the next prime? I.e., there's nothing to remove from it except a k.
    wbr, Me. Dead J. Dona \


  23. #23
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Sorry for taking so long to post. I will make a fairly detailed post so that everything is covered for those who want to know. It's been a while since we had a dat discussion.

    The only reason for "repacking" the dat right now is to reduce its size and memory footprint. Basically the dat consists of k's and n's which we will have to test.

    The k's can only be eliminated through the finding of primes.
    The n's are stored as the lowest n followed by increments: each larger n is the sum of the lowest n and all the +###'s up to that particular n, whew!!!

    The +###'s can be removed as factors are found. We are finding factors every day; however, the majority of those found are actually in the combined PSP/SoB arena.
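
    To make the encoding concrete, here is a minimal sketch (the layout is simplified and the function names are illustrative; the real SoB.dat carries more structure): decoding the lowest-n-plus-increments list into absolute n values, and repacking it after factored n's are removed.

    [code]
    # Sketch of the "lowest n plus +### increments" encoding described
    # above. Simplified/hypothetical layout, not the real SoB.dat format.

    def decode(lowest_n, deltas):
        """Expand a lowest n and its +### increments into absolute n's."""
        ns = [lowest_n]
        for d in deltas:
            ns.append(ns[-1] + d)
        return ns

    def repack(ns, factored):
        """Drop factored n's, then re-encode as lowest n + increments."""
        kept = [n for n in ns if n not in factored]
        deltas = [b - a for a, b in zip(kept, kept[1:])]
        return kept[0], deltas

    ns = decode(991, [40, 24, 16])   # -> [991, 1031, 1055, 1071]
    print(repack(ns, {1031}))        # -> (991, [64, 16]): fewer +###'s
    [/code]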

    Regardless, since we have sieved quite far in both SoB (~1150T) and SoB/PSP (~450T), the shrinkage of the dat due to found factors is getting smaller and smaller. So when the dat is repacked the shrinkage is minimal, as is the subsequent decrease in memory footprint.

    Simple repacking of the dat at this point does not speed up the sieve client either.

    So no, repacking the dat will not speed up the client; the only thing that will speed up the client at this point is the removal of k's. On that note, even removing one k will not significantly speed up the sieve. The speed decrease in moving from a SoB 8k dat to a SoB/PSP 19k dat was not that significant either, considering the increased amount of work done.

    Joe, Chuck, and others have also done such a wonderful job with the new client that dat repacking is not really required anymore from a performance standpoint (they fixed memory leaks and some other issues where dat repacking used to help).

    The next dat will probably come out due to one of three occurrences:
    - finding a prime in SoB
    - finding a prime in PSP
    - just because Joe wants to*

    *- One of these days Joe may just decide, for one reason or another not project-specific, to generate a new dat. If this happens it won't be critical to download the new dat file.

    All in all, let's just look forward to finding another prime.

  24. #24
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1
    Thank you for the thorough explanation. )))

    And of course, if we can gain 1% of memory or speed, this IS a GAIN!
    wbr, Me. Dead J. Dona \

