
Thread: New Sieve Client Updates

  1. #1
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331

    New Sieve Client Updates

    OK, here is a direct copy-and-paste from the creator of the new sieve program himself.



    b2 wrote:
    vjs wrote:
    A couple of questions:

    Will we still use a dat file?

    Have you tested it against previous sieve efforts?

    Joe_O and I have sieved several hundred G worth of large n up to 100M; I could send you the factors/dats for comparison/testing purposes.

    I'd also like to test this client against high n and large ranges of n eventually.


    I believe Chuck and I identified problems in the sieve code that pertain only to our project. Some code after an "if riesel" statement leaves doubt as to whether our sieve can accurately handle n > 20 million at this moment.

    I'll defer to Chuck to explain this issue. I believe Chuck may have already taken care of the problem in the future sieve.



    From Chuck....

    Ok.. guess it's my turn to fill you all in since I've been so quiet.....


    B2 is correct in that we cannot guarantee... specifically, if I may rephrase, we have very low confidence in the current version's ability to go above 22M, and we definitely know that above 24M there is almost zero confidence. I have factors which are legitimate but which the program ignored/discarded. If it DOES produce any, most are good... but no guarantees.

    I can share the math, but the problem lies with the numbers 1<<25, 1<<22, 1<<19, and 1<<13. These do not work properly when there are more than 20 K values, nor will the sieve work at all once P > 2^50.

    It will appear to work, but it is wasting time... part of the software is making a very honest effort to produce a real factor (and does), only for it to be discarded or incorrectly 'reassembled' before printing out. This happens because it detects a 2^25 partial factor and discards the entire factor. That is incorrect logic: the factor should be reduced to the lowest set of primes FIRST, recalculated in terms of K.2^N +/- 1, and only THEN tested for N > 2^25.

    The reverse method (which I use) is to ensure the final N < 2^25 during the factoring / reduction / reconstruction process...
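    The reduction described above hinges on the identity k·2^n == (2k)·2^(n-1): factors of two can be shuttled between k and the exponent before the N limit is tested. A hypothetical sketch of that idea (generic illustration, not the actual sieve code; the function name and limit handling are assumptions):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch of the identity k*2^n == (2k)*2^(n-1):
   fold factors of 2 from the exponent into k until n is within the
   limit, then test.  Returns 0 if k would overflow 64 bits first. */
static int reduce_n(uint64_t *k, uint32_t *n, uint32_t n_limit)
{
    while (*n > n_limit) {
        if (*k > (UINT64_MAX >> 1))
            return 0;       /* k would overflow; give up        */
        *k <<= 1;           /* fold one factor of 2 into k ...  */
        (*n)--;             /* ... and lower the exponent       */
    }
    return 1;               /* test succeeds after reduction    */
}
```

The point is that the value k·2^n is unchanged at every step, so the limit test runs against an equivalent, fully reduced form instead of rejecting the factor outright.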

    Hope that makes sense... it is difficult to describe multiple 32- and 64-bit words that you 'slide' above and below each other (in a stack) until they cancel out (the cross-cancelling is the key).


    The existing bitmap is basically flawed due to coding errors and, mostly, hardware changes. It was originally designed for SOB, and this is obvious in the # of bits used as boundary conditions. It was also designed for fewer than 20 'K' values.


    All of this is very evident if you do the following:

    1) Sieve 1G -> 24T, 24 -> 48T, 48 -> 76.8T, 76.8 -> 96T, and finally 96T -> 112T. The sieve will collapse at 112T. You will see speed slowdowns at each marker; these correspond to the bit positions 2^45, 2^46, 2^47, 2^48, and 2^49.

    or

    2) Sieve any range 2 -> 20M and record the results. Clear the files (start fresh) and run 20 -> 40M; you will notice 4000 days of work vs. 1 hr of work. Then do it again (full reset) at 40 -> 60M.





    I am not criticising the design, coding or anything... PLEASE don't get me wrong. It was adapted from +1 to -1; it is a highly complex piece of work of 3000+ lines of code, and the translation was a non-trivial task. I am pleased by how much of it works as well as it does; it is quite remarkable. It was created before 64 bits was 'native'... now 64-bit math is 'normal'.





    In the new sieve, none of this is a problem, because neither 2^50 nor 2^25 is a limit within the sieve... The only limits are:

    a) Nmin -> Nmax factor values are searched (2^2 -> 2^64) fast mode.
    aa) Nmax > 2^64 is a bit slower.

    b) Nmax < 2^25 (an LLR limit for now)... so even putting 20M <= N <= 40M would gain nothing...
    ..... UNLESS it can refactor it. Otherwise it would chop at 2^25 (32 meg) if it cannot find a way to reduce N to within the LLR limit, and then you are subject to the possibility of a 'duplicate' or 'out of range'.



    Essentially, after I complete the repairs to the existing sieve, which take priority, I will finish the new sieve.

    This will allow us to maintain a complete verification database of factors output.


    Repairs / bug fixes are currently prioritized as:

    1) bitmap
    2) ranging (Scanning N, and Output 2^N)
    3) refactoring limit boundary conditions.

    Correct operation is priority 1; correct operation yields performance. I will implement as many corrections as possible in the existing sieve, as rapidly as possible (where time-worthy), to make it equivalent to the new sieve except for the removed limits (which are new-sieve only).


    I will aim to make this sieve more efficient in its current implementation with NO changes to the math... the core algorithms & techniques will remain the same. The end result will be less memory, less wasted CPU time, and fewer page faults, resulting in more P/sec.


    I would be greatly interested in knowing the P range and/or the value/conditions that caused the trap/core dump... it would point me to possibly yet another area that I have yet to examine.


    Please forgive the prose.. the hour here is late. If I am unclear in anything, please feel free to correct me.


    Thanks
    Chuck

  2. #2
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Perhaps someone with great programming knowledge could offer a hand?

    Chuck's a really great guy and would probably welcome the help from the right person.

  3. #3
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Anyone???

    From chuck....

    Please help me with the smallest, tightest, non-assembly C code to do pow2mod() and powmod() that you can think of.

    I am spending 25+% of my time in powmod()... " e & 1" isn't hacking it after you decide.... and the sqr_mul_mod, etc... that all has to get straightened out...

    We could also use some nice pseudo-code for <SIMD> (then we can have macros for the assemblers or STL)... this would let Macs and their AltiVecs kick in, and sieving would light on fire. PS: I would prefer #ifdefs and macros to STL as they're more portable... C++ is too expensive.
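    For reference, a minimal non-assembly C sketch of the kind of powmod() being asked for: textbook right-to-left binary exponentiation, using a 128-bit intermediate (a GCC/Clang extension) so the multiply-mod cannot overflow. This is a generic routine, not Chuck's code:

```c
#include <assert.h>
#include <stdint.h>

/* Modular multiplication via a 128-bit intermediate
   (__uint128_t is a GCC/Clang extension on 64-bit targets). */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t m)
{
    return (uint64_t)((__uint128_t)a * b % m);
}

/* b^e mod m by right-to-left binary exponentiation. */
static uint64_t powmod(uint64_t b, uint64_t e, uint64_t m)
{
    uint64_t r = 1 % m;     /* handles m == 1 */
    b %= m;
    while (e) {
        if (e & 1)
            r = mulmod(r, b, m);   /* multiply in this bit's power */
        b = mulmod(b, b, m);       /* square the base each step    */
        e >>= 1;
    }
    return r;
}

/* pow2mod: 2^e mod m is just powmod with base 2. */
static uint64_t pow2mod(uint64_t e, uint64_t m)
{
    return powmod(2, e, m);
}
```

For hot loops, the `% m` inside mulmod is the expensive part; Montgomery or Barrett reduction is the usual next step, but the above is the smallest correct baseline.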

  4. #4

    I'll give it a go

    I'm interested in trying to help out, but I'll need more information:
    * What algorithm is currently being used?
    * Chuck's email
    * Source code (at least to current functions wanting modification)
    * Is there a test bench for this function already?
    * Other stuff I can't think of now

    I can't say that I will be the best for this job but I can certainly program and I have enough education to understand/workout the algorithms involved.

    So can you point me in the right direction?

    Craig.

    cAjBoChDnEsFtGoH@tIiJtKrL.MuNoOwP.QeRdSuT.UaVu
    remove capitals to despamify the email address.

  5. #5
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Maybe a look here could be useful. GMP is a very fast math library, so the algorithms there should be quite optimal.

  6. #6
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Craig and Mystwalker,

    Thanks, guys/gals. What I'm going to do is forward all of these messages/e-mail responses to Chuck in a semi-private thread. I'll let Chuck take it from there.

    This topic is still open...

  7. #7
    Senior Member
    Join Date
    Dec 2002
    Location
    Madrid, Spain
    Posts
    132
    Have you tried contacting Mikael Klasson, Paul Jobling and/or Phil Carmody?

  8. #8
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Reply from Chuck,

    cjohnston, check your PMs (please read the post below; hmm, I think you have PMs turned off in your profile, so PM me to get Chuck's e-mail)

    Copied...

    Vjs,
    Ok, I will *try* to write again without hitting the WRONG mouse button. (To be bluntly honest, the mouse is now on the desk BEHIND ME where I can't reach it! LOL)

    So, let's see if I can get back to where we were before the porcelain got into the picture.

    You are definitely right that Proth has a severe problem with 2^48. This gives you a 'max prime' it can factor of 2^24. Now, we all know that the max prime is 2^25... the trick is simple: pull out one '2' and shove it to the front of the prime list (having pulled it from the exponent of 2^n), hence our 2^50 and 2^25 max prime.

    Now, as for the hash bits and all that... In SoB, I believe (please correct me if I'm wrong), you had 12 original K. This gave rise to the Proth design of 12 hash bits and the 22-bit sieve window size. Seeing whether the bit was set or not (as compared to P) made for a mega-fast pre-sieve, which is followed by a mapping of +1 or (for Riesel: -1) factors. (I had previously written a great deal more, but I hope this makes more sense and is clearly less verbose.)

    Where Riesel has an issue is the large quantity of K. Your (and I believe B2's) original statements and calculations, about slicing into blocks of 12 and sieving blocks of 12 taking longer, do indeed hold true in that B*sqrt() style fashion, where B is the number of blocks it gets sliced into. However, there is one thing that none of us took into consideration... specifically the 1<<25 max-prime limit. The bitmap interleave falls apart if overloaded with large N and numerous K in Riesel (I am not sure about SoB). If you hit a prime that is > 2^25, it's all over. You will find that you 'sieve like a race horse' but you are really accomplishing nothing, because the -1/+1 factor mapping and the mod bitmaps are no longer in sync and will never align to produce a factor.

    This is all because of one thing that I find is commonly done: a 5:1 interleaving of hash bits. You will frequently see the code
    mask[j >> 5] |= 1 << (j & 31);
    If I understand it correctly, the initial mod against P, using the remainder as the seed, sets the starting point of which 'thread' into the interleaved list we take. This is a nice space saver in my opinion, but also problematic, as we now have a potential 5:1 overload of bits. (This is why for many factors, factor mod(10) == 3... all because the 'mod 10' is a mod 2 and a mod 5, leaving the 3. There are other F mod(x) == m patterns that show up for valid factors; all of these are artifacts of the list of primes left over from factoring.) The 'factor recombination' phase of proth eliminates these artifacts fairly well (reference S-P-H and the CRT recombination phase). We are still stuck with forcibly testing the resultant factor for primality after the +1/-1 operation. (Rather ironic, isn't it... we want odd numbers, but we sieve on the even numbers and use the odd-number remainders to guide us because of this.)
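    The quoted expression packs one bit per candidate into 32-bit words: j >> 5 selects the word and j & 31 the bit within it. A self-contained illustration of that bitmap idiom (generic code, not the proth source):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Generic 32-bit-word bitmap, as in mask[j>>5] |= 1 << (j & 31). */
static void bit_set(uint32_t *mask, uint32_t j)
{
    mask[j >> 5] |= (uint32_t)1 << (j & 31);  /* word j/32, bit j%32 */
}

static int bit_test(const uint32_t *mask, uint32_t j)
{
    return (mask[j >> 5] >> (j & 31)) & 1;
}
```

The shift-and-mask form works because 32 is a power of two, so j / 32 and j % 32 reduce to j >> 5 and j & 31.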

    As a side note, I'm finding that the 5:1 overload/interleave significantly reduces the effectiveness of a single mod(2310) bitmap.

    I'm currently working on a way to expand this bitmap above 5. If I can simply get to 509 it would be a huge improvement, but I wonder whether 509 is worth the effort and is the answer, as it is huge by comparison and leaves the high 3 bits (of a 512-bit-wide field) unused. It definitely is more efficient per word. I'm experimenting empirically with various mask bit widths and mask values, as well as expanding the resultant bitmap from 32 bits to 64 bits in width. 23 and 29 are holding some promise for non-interlaced bitmaps, but nothing definitive at this point (I am literally running tests on 1 -> 100T). What this will do for us is reduce the extra mod()s that need to be done and the painting of q throughout the bitmap, and significantly reduce the effort required to construct the final factorization (i.e. less overloading), not to mention not break at 2^50. It would break somewhere in the 2^25000 range (as the largest prime value in a factor list).
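    The mod(2310) bitmap is presumably a wheel over the first five primes (2·3·5·7·11 = 2310): only residues coprime to 2310 can contain primes above 11, and φ(2310) = 480, so 480 of the 2310 residue classes survive. A hypothetical sketch of building such a residue map (illustrative only; names are assumptions):

```c
#include <assert.h>
#include <stdint.h>

#define WHEEL 2310u                      /* 2 * 3 * 5 * 7 * 11 */
#define WHEEL_WORDS ((WHEEL + 31) / 32)  /* 73 x 32-bit words  */

static unsigned gcd_u(unsigned a, unsigned b)
{
    while (b) {
        unsigned t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Set one bit per residue r (mod 2310) with gcd(r, 2310) == 1; only
   these residue classes can contain primes larger than 11.
   Returns the count of surviving residues, phi(2310) = 480. */
static unsigned build_wheel(uint32_t map[WHEEL_WORDS])
{
    unsigned count = 0;
    for (unsigned i = 0; i < WHEEL_WORDS; i++)
        map[i] = 0;
    for (unsigned r = 0; r < WHEEL; r++) {
        if (gcd_u(r, WHEEL) == 1) {
            map[r >> 5] |= (uint32_t)1 << (r & 31);
            count++;
        }
    }
    return count;
}
```

Since 480/2310 ≈ 21% of candidates survive, a wheel of this size skips nearly four fifths of the work before any per-prime sieving starts.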


    If you're wondering why 509 (i.e. 512)? Simple... 8 -> 64 -> 512. 512 is the next in the progression and 509 is the closest prime. However, this presumes a 512-bit-wide computer, and I don't care to do everything in 512-bit increments (even though GMP can quite easily). This means instead of working in 8- and 64-bit increments, I would be working in 64- and 512-bit increments, with greater ease, as I don't hit the overload until P is well into 'practical infinity' and our projects are complete.

    Peter Montgomery showed basic 'Montgomery math' with 64 bits... I'm working on some functions that would allow us to extend that, with the nice, clean cutoff at 32-bit intervals (which proth does now).
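    For context, a textbook sketch of 64-bit Montgomery reduction (R = 2^64) of the kind alluded to above. This is generic reference code, not Chuck's; it assumes an odd modulus below 2^63 and the GCC/Clang 128-bit integer extension:

```c
#include <assert.h>
#include <stdint.h>

/* Textbook Montgomery arithmetic with R = 2^64.
   Assumes m is odd and m < 2^63 (so intermediates fit in 128 bits). */

/* Compute -m^{-1} mod 2^64 by Newton iteration: for odd m, inv = m is
   correct to 3 bits, and each step doubles the precision (3->6->...->96). */
static uint64_t mont_ninv(uint64_t m)
{
    uint64_t inv = m;
    for (int i = 0; i < 5; i++)
        inv *= 2 - m * inv;        /* wrapping arithmetic mod 2^64 */
    return (uint64_t)(0 - inv);    /* negate: want -m^{-1}         */
}

/* REDC(t) = t * R^{-1} mod m, valid for t < m * R. */
static uint64_t mont_redc(__uint128_t t, uint64_t m, uint64_t ninv)
{
    uint64_t u = (uint64_t)t * ninv;              /* makes t + u*m = 0 mod 2^64 */
    uint64_t r = (uint64_t)((t + (__uint128_t)u * m) >> 64);
    return r >= m ? r - m : r;                    /* result is < 2m, so one subtract */
}

/* Multiply a*b mod m where a, b are in Montgomery form (aR, bR mod m);
   the result is abR mod m, i.e. still in Montgomery form. */
static uint64_t mont_mul(uint64_t a, uint64_t b, uint64_t m, uint64_t ninv)
{
    return mont_redc((__uint128_t)a * b, m, ninv);
}
```

The payoff is that the division by m inside every mulmod is replaced by shifts and multiplies; the one-time cost is converting operands into and out of Montgomery form.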


    However, in extending to 512 bits, I have to get it working on paper first. I don't quite have the relationships defined correctly right now, as what I have shoots 2310 right in the foot. Killing the first four primes is not a good idea unless I want to make a special case of them. Assuming I do (which I will), I ask this: we are K*2^N... could we not easily factor a K*2310^N back to K*2^N form by simply multiplying the 3, 5, 7 and 11 into the 'K' value? If that's true, then what about any base which is a composite of the base primes? Where is the limit? Is there one? It would certainly open up the max P value as well as the max prime factor limit.



    With respect to the GMP and GIMPS/Mersenne forums... I love GMP. The Mersenne forums are restricted too much for my taste. Wolfram's MathWorld has been the greatest source of pure knowledge. It has taken me past all those things I learned in German but could never translate to English, as but one example (another is the pure L-L equation, which led to countless ideas and insight into fixing / extending our existing proth sieve... yeah, you are catching my drift: a sieve that can pop out factors for various uses, including MM, as well as popping out factors which are primes themselves and not just factor mod(P) == 0). I am seeking the math advancement... stepping up the science... not the prize and all the rules for claiming it. Please forgive me if I offend anyone... but anyone with any smarts is using their own techniques (which they have directly mapped to GIMPS).


    If you're good at bit-shifting... and that is the key... then 512-bit fields become a reality very, very quickly. All our processors have 1024-bit cache line widths, so 2 entries per row fit really nicely. I think that using GMP as the high-end 'titanic' handler and native 32- or 64-bit for the lower-end finite work will be the key to our success. We can play with the individual branches of GMP easily, and they are VERY fast. Done properly, our 512-bit digits are directly accessible.
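    A 512-bit field can be held as eight 64-bit words, and the bit-shifting mentioned above then reduces to per-word shifts with a carry between neighbours. A generic sketch (assumed little-endian word order, shift amount under 64; not the project's code):

```c
#include <assert.h>
#include <stdint.h>

#define WORDS 8  /* 8 x 64 = 512 bits */

/* Shift a 512-bit value (little-endian word order: v[0] holds the low
   64 bits) left by s bits, for 0 <= s < 64. */
static void shl512(uint64_t v[WORDS], unsigned s)
{
    if (s == 0)
        return;  /* also avoids the undefined shift by 64 below */
    for (int i = WORDS - 1; i > 0; i--)
        v[i] = (v[i] << s) | (v[i - 1] >> (64 - s));  /* carry in from below */
    v[0] <<= s;
}
```

Shifts of 64 bits or more are just a word rotation plus a sub-64 shift, so this single routine is the whole primitive.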

    I'm going to stop here for now; it's 1am. I have been busy today and tomorrow will be busier... tomorrow I must see the doctors and make preparations for travel, and tomorrow evening I must travel for a Tuesday 10am appointment. The earliest I will probably be online is Wednesday.


    I will email (and thank you for the spamproofed address).

    Have a good night.
    Chuck

    PS: I have the mouse back, so here we go... and PLEASE let me know what I missed... I do apologize for being so tired.
    -----------

    Thanks, guys and gals; here's his response. I'm removing his e-mail from the post.
    Please don't bug him unless you can help; he's really a great guy, and I just want to avoid a lot of "send me your program so I can test it". I do that too much myself. ceselb already has a copy and is testing / helping out.
