
Thread: High n-range sieving

  1. #1
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331

    High n-range sieving

    This is the start of the high range sieving thread.

    What is proposed is to start higher n range sieving.

    Facts:

    1. There are more factors at lower p
    2. A 1m<n<100m dat file is quite large and may not even work with cmov
    3. Starting a new dat file from 1T wouldn't find additional factors below n=20m until we get to ~500T
    4. Some people believe sieving above 20m right now is pointless.
    5. Our factor density is decreasing, leading to difficulty in determining whether a range has been sieved or not.
    6. This project will have to continue past 20m if it is to finish.
    7. Sieving a 20m<n<80m range runs at about 60% of the speed of sieving 1m<n<20m.
    8. This speed is probably representative of a 1m<n<60m file.
    9. Eliminating k's with primes will speed up sieving
    10. A new client that hasn't made it to beta yet may increase sieve speeds by 5x.
    11. Factor files don't take up a lot of space; if they do, as a community we can find space for them.


    It is my belief we should soon start sieving a larger range, always including those n's above the double check level.

    Example: stop the [dc (double check) < n < 20m] sieve at some T, say 600T, once we find a prime, or once we have a new client.

    Continue from that point forward with a new *.dat (dc < n < some high n). I suggest 80m; others believe 100m.

    I don't think it's right to sieve 1-20, then dc-40, then dc-60, then dc-80, etc... It doesn't make sense. Widen the sieve range to the maximum we are ever going to do asap (dc<n<100m), then continue sieving to very deep p.

    A lot of factors will be removed from n=20m+ at low p. This can be a second effort for those who wish to do it now, or we could start the low-p range once we get to n=18m.

    Thoughts, comments...

    We could also use the low p-range to test the new client.

    Thanks to all

  2. #2
    It is my belief that the sieving effort is already fragmented to the point of being almost ineffective. The sole purpose of sieving is to eliminate candidates from the prp tests. All the low n ranges being sieved in the dc ranges should be reserved for the slowest machines, as this area is not (or should not be) a high priority. Everything else available should be sieving, followed closely by the P-1 effort, just ahead of the main prp testing.

    Our focus should be on eliminating as many candidates as possible, as fast as possible, just ahead of the main prp test range.

    Moving off into the distant future of prp testing (sieving in the >20m range) is IMO a waste of resources. It will be years before the main prp testing could hope to be in that area. By then there may well be processors that could prp test those numbers faster than we can now sieve them.

    Then again, what do I know.




  3. #3
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    minbari,

    Yes, you are correct that sieving n's out past 20m is sort of useless at this point with cmov. Cmov is still good at factor production and test elimination on a per-machine basis, and sieving is still much better at eliminating k/n pairs than prp'ing them with P4s and the new prp client.

    However, there is a possibility of a new sieve client in the works that could speed the sieve up greatly. Currently there isn't any reason to believe that it even works, but in the mac sieve client discussion there is a link. That link directs you to the riesel forum where someone is creating a new client; whether it works for sob.dat is still questionable....

    If it works as reported, the initial speed increase was from something like 200 kp/s to 1800 kp/s. If this is the case, we could start sieving a 1m<n<100m dat file at a greater speed than we have now, since that client is not as strongly dependent on n-range.

    However, if we do decide to start sieving a larger range, why start again from 0 after we decide to stop with 1m<n<20m? Why not start asap from where we are with 1m<n<100m, if there is no appreciable speed decrease?

    Well, one major problem is the dat file would be very large, >30mb. However, sieving out the first 150G (000000-000150) or so will reduce the dat file to less than half its size. Something more manageable for users.

    And how do we know whether this client works with large n or not? We need a list of factors from the current client up to 100m over a decent range. Then we check whether the factors are correct and whether either client misses anything, by sieving the same range with both clients...

    Whether this will happen, who knows... but we'll have those factors when someone needs them.

    But the point is: let's not start yet, but at some point, and let's be smart about our decision of when and how deep. Also, let others know what we are doing and post our findings.


    <joke>
    If we had an insanely fast sieve client, 1000's of Yp/s, we could probably complete all k/n pairs up to 100m within a few months; we would just have to sieve all p values to p=2^50m.
    </joke>

  4. #4
    Yes, I am aware of the "discussion/development" of the new sieve client. I truly would like to believe it will be as fast as described, however I am reserving judgement until I see it in action and the results can be verified. I have run several sieving clients (proth_sieve, NewPGen, ksieve, etc.) on a lot of different projects, some public, some not, and have never seen anything close to the speeds reported. If true, it will be one heck of an improvement!

    On a different note, will the current client even be able to test that high? I know at one time the GIMPS (Prime95) client was limited to something like 74-79M exponents.

  5. #5
    I believe that the huge speed improvements are stated for 64 bit processors only, although some improvements will be seen all around. However, in a few years time 64 bit will be the standard. Also, with a significantly deeper sieve depth we will also be eliminating a good portion of the tests that would have to be PRP'd well into the future. There is no reason not to save the effort now; if we wait and do the extra work later we'll have to redo all of our sieve efforts. Why not just do them both at once and go twice as deep into the P ranges? In due time we will reach depths far beyond 20,000,000 with PRP, and even if 6 primes are found before 100,000,000 then 70% of the factors we find will still save at least one test.

  6. #6
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    Not only 64 bit. The numbers stated are for 32 bit (benchmarked with a very rough alpha test suite; 8x faster on my P4-1.5).

  7. #7
    geesh gimme gimme gimme

  8. #8
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Looks promising....

    I'm excited to see the new client at work and try it with a couple of range alternatives.

    I guess it would be wise to decide on what to do with some data on hand, i.e. after seeing performance results at various n range alternatives.

    If that new client fails to come soon enough, I think it would be wise to wait until our next prime for a major change. Yes, we have estimates for when the next prime will come, but who knows really when...

    In case, for example, we have no new sieve client and no prime for the next two years, and sieve builds up to the point where it becomes pointless - or has very little value added - to go further (i.e. p=2^50 ??), we might even consider shifting relatively more powerful machines to P-1 for some time until the next prime finally pops up, n at PRP increases significantly, or a faster sieve client comes in handy. This is, of course, one of the worst case scenarios (I hope).

  9. #9
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    As Ceselb can tell you, maintaining the coordination thread is a lot of work. The fewer passes we do, the fewer threads, the less work for the coordinators. Though I think that we should wait before starting a high-n sieve effort, I have uploaded SoB.dat_20M_100M_S25G.zip to the kerobertsdatsdatfiles group under Yahoo. Thanks to Nuri for allowing this.

    I have the factors from 25G to 26G for 20M_100M, but these are not reflected in this file. VJS has the factors from 26G to 175G for 20M_80M and is working on the factors for 80M_100M. When he sends them to me, I will create a new sob.dat file with his factors removed as well.

    Happy Thanksgiving!
    Joe O

  10. #10
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    Originally posted by Joe O
    As Ceselb can tell you, maintaining the coordination thread is a lot of work.
    This is very true. I'm reluctant to get involved in more coordination unless it's really needed.
    It's good that you're thinking about it, but until we reach 17-18M I don't think there's any rush. Tests at that size will take a really long time to finish.

    Chuck is also going to implement an automated sieve, so we should wait for that imho.

  11. #11
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Joe,

    I sent you a mail regarding the factors, let me know if you can pick them up that way.

    Were you going to send me the 80m<n<100m dat so that I can close the gap from 26-175G quicker?

    I can also send you the out-of-range ones, but I'd prefer to simply sieve that region with the 80 to 100m dat.

  12. #12
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    BTW, if we get a sieve client that's 8x faster, it will also mean the current sieve effort will be 8x more productive, i.e. any machine will get 8x more factors per day, which would make sieving the current dat deeper more meaningful.

  13. #13
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    It would be cool to see cumulative sieve factors eliminated per day catch, or even pass, that of PRP tests completed per day again.

  14. #14
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Let's see, we average around 40 factors most days, so 8X would be over 300 a day. That would be really easy to pass prp.

    Also, if we were to get a sieve client 8X faster, I think you would see a lot of people in prp switch to sieve. I know a lot of people on my team are actually running some fast Athlon processors in prp, :-( . Fast enough that they can still complete WUs in a reasonable fashion; imagine if these people switched to sieve.

    I would think a 100m dat up to several thousand T within a year would be more than possible.

  15. #15
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    How about an update on that missed factor at 95G...

  16. #16
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1

    so

    vjs, do you want to have 2 parallel sieving projects, or do you want to completely switch?

    how long would it take the admins to patch the script to accept 20M+ n's?
    and the database will require some changes...
    wbr, Me. Dead J. Dona \


  17. #17
    Nothing would need to be changed really, just configured. We've had multiple sieving efforts before (300k - 3M), (3M - 20M), (1M - 20M). We aren't anywhere close to needing to sieve deep in 20M+, but if people want to get a head start, that's fine. Just make sure to coordinate however you have to, but keep it small (read: high quality) until it is added to the main sieving pages (sob.com/sieve, Mike's stats).

  18. #18
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    I have been working on this for some time, giving it thought, etc. I'm not convinced it's time to switch to a higher range, but I wanted to gather the data so there is less guesswork involved, and in gathering the data perhaps make a switch to a higher range easier.

    But first, let me point out that under no circumstances should we stop sieving from 1.15m (doublecheck) up to at least 20m. We are still getting a lot of factors and it will be beneficial to continue the effort as it stands for quite some time.

    The only question that remains is whether we should consider sieving beyond some point (a T level, another prime, or two primes) with a larger range. At that point, be it 600T or 2000T, all ranges reserved above that point would be sieved with, for example, a 2m<n<40m dat file.

    I'll post findings later today.

  19. #19
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    High n-sieving and extending sieve ranges: a discussion to ponder and understand...
    For the past week or so Joe_O and I have been looking at the effects of high n-sieving, evaluating clients and the efficiency of doing so. This project as a whole is multifaceted, so what's best for sieve might not be best for the project.

    It's always a trade-off between how fast you want results now vs the efficiency of doing things slowly and correctly the first time. Finding the exact perfect spot for your tent, setting it up pointing true north on flat ground while you lay out the campsite, might take the better part of a day. But if you want to throw it up any old way it can be done in less than 5 min. Which one is better??? Depends if it's raining, I guess :-)

    Definitions:

    Sieve Range – This is the difference between the highest and lowest n, for all k/n’s tested in the Sob.dat file. 1m<n<20m [1-20] is a 19m range containing all k/n pairs that require testing with the main client.

    The Sob.dat file contains all of the k/n pairs in a particular sieve range which we have not found factors for yet. It’s larger with a larger range and will decrease in size when k/n pairs are eliminated through factoring.

    Sieve speed – The speed of the client reported in kp/s, ie 500 kp/s, means that 500,000 p’s are tested against the entire dat file in one second.

    p's – These are prime numbers which are checked against the k/n pairs to see whether they are factors of those k/n pairs. If a p divides a k/n pair, that pair cannot be prime; a prime itself has no factors but can be a factor of a larger number.

    1/Sieve speed – Inverse of the sieve speed, or the number of seconds required to sieve a given number of p across a dat file. Example: I will be using 1 Gp/s, which is 1,000,000 kp/s. If your speed is 200 kp/s, or 0.0002 Gp/s, it would take you 1/0.0002 = 5000 sec to sieve 1G over the dat file.

    1/Sieve speed/Range – The above divided by the size of the range in m. If you were able to sieve at 200kp/s over a 20m range vs a 40m range it would take you the same time to complete the range but you would do twice as much work.

    Example

    50 sec / Gp / Mrange = It takes 50 seconds to sieve 1Gp over a 1m range such as 2m<n<3m.

    The lower this number, the more efficient the client. If this value stays the same, then whether you do 1G over a 1M range or 0.1G over a 10M range, the amount of sieving work done is the same, but the time it takes to cover a given range of G will be different.
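    Below is a small sketch of how these quantities relate (my own illustration, not part of the original analysis), using the 622 kp/s and 18.9M figures from the tables further down as the example inputs:

    Code:
    # Illustrative only: compute the metrics defined above from a client speed
    # (in kp/s) and a dat range size (in millions of n).
    def sieve_metrics(kps, range_m):
        sec_per_gp = 1_000_000 / kps                  # seconds to sieve 1G of p over the whole dat
        sec_per_gp_per_mrange = sec_per_gp / range_m  # same, normalized to a 1M n-range
        return sec_per_gp, sec_per_gp_per_mrange

    # Example: 622 kp/s over the current 18.9M dat
    print(sieve_metrics(622, 18.9))  # -> roughly (1608 sec/Gp, 85.1 sec/Gp per Mrange)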

    O.K. Here is what we have found.
    All of the data collected by myself was completed on a 2500 MHz Barton Athlon chip, on an nForce2 single-channel board with 256MB of memory, on Windows 2000, running proth 0.42. Most data from 1M up to 25G was done by Joe_O.

    Joe_O did most if not all of the dat files, removing k/n pairs with factors etc.

    We both have all the fact.txt and factrange.txt files for the range of 20m<n<100m, sieved from 1M (0.001G) to 175G.

    I have all of the factexcl.txt on my machine over 300mb.

    The 20m<n<100m dat file has been reduced in size from over 30mb to less than 17mb by removing all factors found with sieving up to 175G.

    Now for the data and what happened.

    The original dat files used were taken from Keroberts' yahoo group and created by Nuri. The 20m<n<80m dat was sieved by myself from 1G to 175G.
    Joe used the 20m<n<100m dat file and sieved it from 0.001G to 25G.
    Joe later created an 80m<n<100m dat which I sieved from 25G to 175G.

    Joe used various programs which he can comment on; I used proth 0.42, the most recent version as of Dec 1, 2004, exclusively.

    Large portions of the ranges I sieved produced out-of-range factors (factrange.txt). Coincidentally, while sieving the 20m<n<80m dat file, its factrange.txt had already removed 10% of the factors which were later found in the fact.txt for the 80m<n<100m dat. The fact.txt for 80-100 didn't miss any of the factors the 20-80 factrange.txt found.

    Note: In sieving the 20-80 dat, a factor for a k/n pair where n=11m was found in factrange.txt. This factor should have been eliminated by the sieve effort already using the 1-20 dat file; the user for some reason missed it. Sieving that particular range found the factor with the current proth and dat.

    Joe removed a tremendous number of factors from 1m-25G, and more were removed by myself from 25G to 170G. Ranges were double-checked for missed factors by dividing the data into different ranges, re-sieving portions with an updated dat, and checking for new factors; no missed factors were found through this method.

    This means proth finds factors equally well (or badly) over a range no matter how it is divided up.

    Early sieving using the 20m<n<80m dat file was slow with proth, ~100 kp/s; however, speeds increased drastically by 90G (340 kp/s) and continued to increase to around 370 kp/s by 175G. Benchmarks were done at 170-171G with the updated dat files.

    Proth sieve will not sieve a range much larger than ~70M, i.e. 1m<n<70m. Removing k's from the dat file doesn't extend the range; the limit seems to be more dependent on range size than on range value or number of k.

    Data

    Range (M)   R.Size (M)  kp/s  sec/Gp  sec/Gp per range(M)  sec/Gp per 40M  sec/Gp per 20M
    1.1m-20m    18.9        622   1608    85.1                 3403            1701
    80m-100m    20          609   1642    82.1                 3284            1642
    20m-50m     30          567   1764    58.8                 2352            1176
    20m-60m     40          537   1862    46.6                 1862            931
    60m-100m    40          523   1912    47.8                 1912            956
    20m-80m     60          368   2717    45.3                 1812            906
    20m-100m*   80          199   5025    62.8                 2513            1256

    All range speeds are at 170G; error is less than 2%.
    *- Range completed with sobsieve

    From the above table one can see that as range size increases, the sieve speed decreases.

    Sieving the same range size at high n, e.g. 1.1-20 (yes, a 19m range, I know) vs 80-100, or 20-60 vs 60-100, has little effect on sieve speed: <5% difference.

    sec/Gp per range(M) is a measure of efficiency: although the speed decreases for a larger range, the larger range is more efficient since more work is done.

    Sobsieve is less efficient than proth 0.42 even though it can sieve larger ranges.

    Sieve efficiency increases drastically with n-range, tailing off by a 40m range, i.e. (20-60) or potentially (1m<n<40m).

    The efficiency of a 60m dat isn't much better than a 40m dat.

    Second table combining ranges and efficiency

    More data combining ranges.

    If one were to sieve from 20m<n<100m, it could be done in several ways with the above figures/values: using sobsieve on the whole range, or splitting the dat file into parts: 20m<n<60m and 60m<n<100m, or 20m<n<80m and 80m<n<100m, or 20m<n<40m, 40m<n<60m, 60m<n<80m and 80m<n<100m.


    Range (M)     R.Size (M)  kp/s  sec/Gp  sec/Gp per range(M)  sec/Gp per 40M  sec/Gp per 20M
    20m-60m       40          537   1862    46.6                 1862            931
    60m-100m      40          523   1912    47.8                 1912            956

    Combined
    20m-60m-100m  80          N/A   3774    47.2                 1887            944


    Range (M)     R.Size (M)  kp/s  sec/Gp  sec/Gp per range(M)  sec/Gp per 40M  sec/Gp per 20M
    20m-80m       60          368   2717    45.3                 1812            906
    80m-100m      20          609   1642    82.1                 3284            1642

    Combined
    20m-80m-100m  80          N/A   4359    54.5                 2180            1090

    As you can see from the above, from a sieve perspective dividing into two equal 40m ranges is most efficient, since it uses two 40m dat files; the use of a 20m and a 60m dat isn't that much worse, it only takes about 15% longer.

    However dividing into 4 equal 20m ranges produces



    Range (M)         kp/s  sec/Gp  sec/Gp per range(M)  sec/Gp per 40M  sec/Gp per 20M
    1.1m-20m          622   1608    85.1                 3403            1701
    20m-40m (est)           1608    85.1                 3403            1701
    60m-80m (est)           1642    82.1                 3284            1642
    80m-100m          609   1642    82.1                 3284            1642

    Combined
    1.1-20-40-60-100        6500    81.3                 3254            1627

    This would take roughly 1.7 times as long, a time increase of about 72%, over two 40m ranges.

    What this means is: if we needed to sieve the entire range from 20m<n<100m for all k in the shortest amount of time, up to some specified T, two 40m ranges would work best. If we needed to sieve from 1m<n<100m, two equally divided ~50m ranges would be most efficient.
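    To make that concrete, here is a quick sketch (my own arithmetic, simply re-adding the sec/Gp values from the tables above) of the ways of covering the high range, plus the four-dat variant that includes the current range:

    Code:
    # Illustrative only: reproduce the "Combined" totals above from the per-dat sec/Gp values.
    sec_per_gp = {
        "20m-60m": 1862, "60m-100m": 1912,   # two 40M dats
        "20m-80m": 2717, "80m-100m": 1642,   # one 60M dat plus one 20M dat
        "1.1m-20m": 1608, "20m-40m": 1608,   # four ~20M dats (20m-40m and 60m-80m estimated)
        "60m-80m": 1642,
    }

    two_40m = sec_per_gp["20m-60m"] + sec_per_gp["60m-100m"]                # 3774 sec/Gp
    one_60m_plus_20m = sec_per_gp["20m-80m"] + sec_per_gp["80m-100m"]       # 4359 sec/Gp
    four_small = (sec_per_gp["1.1m-20m"] + sec_per_gp["20m-40m"]
                  + sec_per_gp["60m-80m"] + sec_per_gp["80m-100m"])         # 6500 sec/Gp

    print(two_40m, one_60m_plus_20m, four_small)
    print(f"four small dats take {four_small / two_40m:.2f}x as long as two 40M dats")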

    The best scenario would be a client able to sieve the entire range in a more efficient manner. If proth were capable of using one 100m range that might be best, but the increased efficiency (sec/Gp per range(M)) may only work out to around ??43??.

    Before anyone draws conclusions on sieve ranges etc, one must also realize that as k’s are eliminated the sieve speed will increase, decreasing the efficiency of sieving higher n that will never be needed.

    However, it is predicted that only 3 more primes will be found by 20m, increasing the current sieve speed by ~3/11ths:

    7th 6.5M-10.1M
    8th 10.1M-16.8M
    9th 16.8M-29.5M

    and only two between 20m-100m

    10th 29.5M-57M
    11th 57M-127M

    If we keep eliminating k's with primes, sieve speed will continue to increase, but it will not double by 100m.

    Questions, comments, further explanations, corrections etc. (other than grammatical of course :-) it's late...
    Last edited by vjs; 12-02-2004 at 03:23 PM.

  20. #20
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by vjs
    Sieve speed – The speed of the client reported in kp/s, ie 500 kp/s, means that 500,000 p’s are tested against the entire dat file in one second.

    p’s – These are prime numbers which are checked against the k/n pairs to see if they are factors for those k/n pairs. If it’s a factor that k/n cannot be prime, a prime has no factors but can be a factor of a larger number.
    Just a little correction:

    p's in the speed-sense are just plain numbers (which hopefully are factors). The high performance we experience now is partly due to the fact that non-prime numbers can be left out (omitting all even numbers doubles the speed, and so on).

    So although only primes are tested for trial factoring, the speed is metered just in numbers...
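    To illustrate (this is my own toy example, not how proth_sieve or any real sieve client is implemented), here is how skipping numbers with small prime factors thins out the candidate p's that actually get tested:

    Code:
    # Toy example: count how many candidate p's in a block survive a simple "wheel"
    # that skips multiples of 2, 3 and 5. About 8/30 (~27%) of the numbers remain,
    # which is part of why the reported kp/s figures can be so high.
    def surviving(start, width, small_primes=(2, 3, 5)):
        return sum(1 for p in range(start, start + width)
                   if all(p % q != 0 for q in small_primes))

    width = 1_000_000                          # sample 1M consecutive numbers
    kept = surviving(600_000_000_000, width)   # somewhere around p = 600T
    print(kept, kept / width)                  # roughly 266,667 candidates, ~0.267 of the block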

  21. #21
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Mystwalker,

    It was/is my understanding that the reported speed took this into account, or that the number of tested p in 1G of sieve range doesn't change much from one sieve range to another anyway.

    Regardless, all data was collected using the same sieve range "T", so the number of p in a range wouldn't change the above data points or analysis, just the number of factors found in a range of p's. I could repeat a small range around 600T if someone is really dismissing the above due to T level. But thanks for pointing out the error.


    ^
    |
    |

    "I'm never wrong, but there was one time I though I made a mistake, but I was mistaken"

    If anyone goes through and understands this analysis, do you agree with the following.

    Proth %eff is maximized using ~40m ranges... and does the following make sense for the analysis:

    The decrease in time "sec/Gp per range(M)" as you approach a 40m range size starts to be offset by the growth of "sec/Gp" with range size, so maximum efficiency looks to be around a 40m range size???

    Note this only concerns sieving I don't want to bring in reducing the number of k yet.

  22. #22
    It would be interesting to see how that other sieve client reacts to the data. Since you have so much information about the factors from .001G to 175G, testing the new client I heard mention of on that data might give some evidence of its ability to find factors correctly. With that client, doing an 80M dat might not be a problem. Still though, unless we see a huge jump in PRP, it will be a while before this work will be (1) needed, and (2) effective. With an estimated 2-3 primes before we get to that horizon, that's 2/11 or 3/11 of your work invalidated. I don't think that thinking about 20m to 100m is a bad thing. Instead, it is very good to plan ahead. It's just that right now, if you want to put 27.3% of your cycles into the project and have it possibly be wasted... (Not trying to be mean, but you understand what I'm saying.)

  23. #23
    2 or 3 elevenths would not be wasted, only the factors found for N values larger than where the prime is found. Also, a prime doesn't speed up the sieve by one eleventh; it will only end up being around 7% (I believe), so in reality the effect of finding primes on the sieve is only measured in the amount of factors found.

  24. #24
    vjs,

    I would agree with your statement that
    If anyone goes through and understands this analysis, do you agree with the following.

    Proth %eff is maximized using ~40m ranges... and does the following make sense for the analysis.
    but more than that, proth efficiency is maximized by using the minimum number of equal-sized dat files over a given range.

    so for 20m-100m, a range of 80m:
    a single 80m range is most efficient
    BUT since the client does not handle 80m ranges (or does not handle them well)
    2 * 40m is the next most efficient

    likewise

    20-200m -> 180m range
    again a single 180m range would be best
    but failing that, and assuming the current clients,
    3 * 60M is left as an efficient option.

    which i guess raises the question (no begging here)
    to which i may have just missed the answer.. WHY 100m?

    if we can (using prothsieve) use 60m ranges.. why not 20-80m and 80-140m?

  25. #25
    The idea is that the range needs to be expanded; exactly how much is uncertain. However, a range from 1-80m will take about 20% less time to sieve than a 1-160m range, but to sieve both a 1-80m range and an 80-160m range would take about twice as long as the 1-80m range alone and 66% longer than the 1-160m range.

  26. #26
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    Some minor errors in reasoning, but in the scope of these calculations it doesn't matter.

    I'd like to add one thing however:
    • Available memory will play in more as the n range grows.
    • It will take at least 2-3 years to reach 20M (possibly more).
    • Most new computers will have at least 1GB of RAM at higher bus speeds than now.

    Based on that I'd say that splitting isn't a good idea, keeping 20-100M is better imo.

  27. #27
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1

    tell me

    well, as I understand it, the n's in sieving are the same as the n in a regular sob test...

    so why do we sieve to 20M and not 10M? The main effort will catch up to n=10M in a year or two. Maybe we should try a 5-10M range? vjs, can you do some research on this range?
    wbr, Me. Dead J. Dona \


  28. #28

    Re: tell me

    Originally posted by Death
    well, as I understand it, the n's in sieving are the same as the n in a regular sob test...

    so why do we sieve to 20M and not 10M? The main effort will catch up to n=10M in a year or two. Maybe we should try a 5-10M range? vjs, can you do some research on this range?
    Because we want to sieve as much as possible before PRP starts there at all. The large size of the future tests, combined with the fast return of new factors, means that we should sieve all ranges where we will do PRP as much as possible before starting the PRP-testing. The low n ranges had a lot more PRP-tests than the ranges we do now because of the increased number of factors through sieving (and factoring).

  29. #29
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959

    Re: tell me

    Originally posted by Death
    so why do we sieve to 20M and not 10M? The main effort will catch up to n=10M in a year or two. Maybe we should try a 5-10M range? vjs, can you do some research on this range?
    It's more efficient to do e.g. 5M-20M than 5M-10M, 10M-15M and 15M-20M.
    5M-20M is not that much slower than 5M-10M - as you can see below, doubling the range slows down sieving speed by < 20%, whereas it would be 50% when you do those ranges one after another.

    So in the short run, it would be better to just concentrate on the current PRP range. But as this window moves along, we'll lose effort in the long run...

  30. #30
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    A lot of really good questions and responses here and I’ll try to answer a lot of them in a few posts,

    Post from Death,

    can you tell something about 5-10M range?

    Yes, Death I did a few more short tests to find actual numbers for you…

    Examples

    Range (M)    R.Size (M)  kp/s  sec/Gp  sec/Gp per range(M)  sec/Gp per 40M  sec/Gp per 20M
    7.56m-7.58m  0.02        866   1155    57736.7              2309469         1154734
    0.3m-3m      2.7         700   1429    529.1                21164           10582
    1.1m-20m     18.9        622   1608    85.1                 3403            1701
    20m-60m      40          537   1862    46.6                 1862            931

    These are just for the dat’s we have available since they do take time to create etc.

    I'm going to use the current dat as a reference. There are two other dat's that we have tested fully: the low n sieve range from 300k-3m, a 2.7M range which sieves at 700 kp/s as compared to ~622 on my machine, and a 7.56m-7.58m dat, a 0.02M range which sieves at 866 kp/s.

    As you can see they are fast, but it is the speed of the client over a particular dat size that's important. If you start looking at the "sec/Gp per range(M)", the numbers are terrible. Sure, you cruise through a 100G range looking for factors in the active window 7.56m-7.58m in 70% of the time. But we wouldn't be able to sieve very deep before the factors fall into double check. The effect is less so with a 2.7M dat, saving only 13% of your time.

    The point above is that you would not be able to sieve very deep with a small range before the factors are only useful for double checks; this is true for very small dat's.

    Let's look at this another way: say we sieve from 1m-40m. Those 20m-40m dat's won't be used for a long, long time, but if we did them now, sure, it would decrease our speed, but by how much... 14%. So if we know for certain that we will sieve again from 20m<n<40m with the exact same client, then by sieving the two ranges in conjunction as a 1m<n<40m we save ~86% of the time that second range would otherwise cost on its own.

    As an aside, and I believe Keroberts1 commented on this: with the 20m<n<40m effort, some of the factors produced won't be used. Which ones? The ones for k's we find primes for, and only those k/n's above the prime and below 40m. Also, it's predicted that 3 of the 11 k's will fall by 20m, but that's a guess; it might be 4, could be 2, and it's highly unlikely to be none. My estimates were very close to Keroberts1's; I thought 8%... we would have an effective 8% loss of efficiency due to primes being found with a 1m<n<40m dat from here on.

    ShoeLace,

    I see you understand my post fully. Yes, the largest range that proth can sieve is the most efficient, so sieving a 60M range would be the best.

    However, the client slowdown, or the time it takes to finish ranges, is unacceptable IMHO for a 60M range. The increased efficiency of the client is outweighed by the time it takes to finish a range and produce factors that we will actually use within ~2-5 years, not more. After all, the goal of sieve is to eliminate prp testing in the long run, to make that effort more efficient.

    If we were to start now with above-20m sieving only, then yes, 60m or the largest possible. But if we were to simply extend the current dat, then I'd say only extend it to around a 40m range, maybe slightly more, perhaps less, but 30m is too small.

    Basically, what it boils down to is how deep we want to be in sieve T, how far ahead we expect our next couple of primes, and how fast our current client is. Of course you know this and I see what you're getting at... So if we have to sieve in two ranges, we will of course sieve the smaller one first, as far as we need to, then start the second as prp first pass approaches.

    And to answer the question why 100m... because that's what we had available as dat's. If I had dat's to 1000m I would have tried to get an end point for the graph, but I think everyone can agree that sieving to 1000m would be insane. (Hold on, let me check with myself and I; yes, all three of us are in agreement :-)

    Also, now is definitely not the time for 20-80m, but extending from 1m-60m would be more in line and reasonable. But I do think 60m is too far, and we should also only continue from our current T.

    Ceselb,

    I’m not sure regarding the errors in reasoning, could you point them out…

    As for the clients consumed memory, the current dat file consumes ~25Mb, the 20-80 dat consumes ~43Mb, so it’s an additional 22Mb.

    I think most machines running proth right now wouldn’t suffer from an additional 22mb. Also I’d say 80% if not more of the “effort” currently dedicated is from machines with at least 256mb already and that number will increase as you said. I also ran all of the tests on a 256mb system just for that reason.

    I agree; I also think 2 years might be a little optimistic, 3 years is more plausible.
    So that basically means we have about 1 year, maybe more, before we need to start the 20m+ sieving if we want to get to any appreciable depth.

    Mystwalker,

    I agree with your last post except for the 5M-20m comment, which I'm sure you didn't intend to suggest. The lowest n we should sieve for should always be the double check prp. Or 20m from 0T to where we leave the 1-20m dat.
    Last edited by vjs; 12-03-2004 at 12:54 PM.

  31. #31
    Sieve it, baby!
    Join Date
    Nov 2002
    Location
    Potsdam, Germany
    Posts
    959
    Originally posted by vjs
    Mystwalker,

    I agree with your last post except for the 5M-20m comment, which I'm sure you didn't intend to suggest. The lowest n we should sieve for should always be the double check prp. Or 20m from 0T to where we leave the 1-20m dat.
    Of course, we should fully include PRP double checking. The 5M lower bound was "e.g." and originated from the "5M-10M" in Death's posting.

  32. #32
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Yes Mystwalker, and thanks; it's always good to hear confirmation and additional explanations from others.

    Should I post my excel files and graphs for:

    sec /Gp per range(M) calculations etc.

  33. #33
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Here is a graph; more data is needed in the 30-50M range and a largest dat file "end point".

    I cut the d off the end of the "speed" label.

    Everything is normalized,

    Speed - The actual sieve client speed, higher the better of course

    Speed per 1M range - The total speed of the client divided by the range size, the higher the better

    Optimal Range Size - This takes some explaining: it correlates the actual speed of the client vs the range speed of the client.

    The Optimal Range is basically a plot of the effective speed of the client, but the lower the number the better.

    Also, everything crosses over at ~19M b/c that's what everything is normalized to.
    Attached Images
    Last edited by vjs; 12-03-2004 at 04:09 PM.

  34. #34
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    O.K., my last post on this topic for some time, unless someone wishes to discuss this issue.

    Here is the final graph below. The y axis represents the % change relative to the current dat size. The x axis represents the dat size (i.e. a 1.1M<n<20M dat, our current dat, has an 18.9M size).

    As you can see, everything crosses at an 18.9m dat size with 100% relative efficiency; this is simply b/c I normalized everything to the current dat.

    A couple interesting conclusions can be drawn:

    First, in our testing of proth, the size of the n-values within the dat does not seem to affect client speed.

    - a 1M<n<21M, 20M<n<40M, 40M<n<60M, 60M<n<80M, and 80M<n<100M, all run at about the same speed.

    - Decreasing the dat file size does decrease the amount of time required to finish a range; however, our current dat is quite small to begin with.

    Our dat file would have to shrink from the current ~18.9m size to approximately a 5m dat to see a 10% increase in sieve speed (i.e. an 8m<n<13m dat would only be 10% faster).

    - Increasing the dat file size does increase the time required to complete a range.

    This is the blue trace on the graph: a 1m<n<41m dat would require roughly 20% more time to complete the same range compared to the current dat.

    A 1m<n<50m dat roughly 25% more time.
    A 1m<n<55m dat roughly 35% more time.

    However, due to some effect proth has with very large ranges, a 1m<n<63m dat would take twice as long to complete: 100% more time.

    This can be seen as the sharp increase in sieve time of the blue plot.

    As for the pink plot, this represents the time required to sieve a range divided by the size of the dat file.

    Sieve time (or client speed) does not increase dramatically for ranges less than 50m: sieving a 20M range may take 100 minutes, but sieving a 40M range would take only 120 minutes. In effect, the time spent on the first half of that range was only 60 minutes, and 60 minutes on the second half. Of course you have to do both halves of the dat at the same time, but in essence you sieve what would have taken 100 minutes in 60 minutes; you just have to spend another 60 sieving the other half.

    This can be seen in the decrease of the pink line with dat file size; however, some effect takes over at around 55m, increasing the time again.

    What this leads to is a maximum efficiency of the client.

    The yellow plot represents this efficiency:

    at approximately a 53m dat file size the client efficiency peaks, at a bit more than twice the efficiency of the current dat.

    So what this means is that, from a purely sieving stance, we could sieve a dat file from 1m<n<55m (over 2.5x larger) in only about 30% more time and be more than twice as efficient with our sieve effort.
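    A quick sanity check of that claim (my own arithmetic; the ~54M size and the ~30% extra time are the only inputs, taken from the figures above):

    Code:
    # Rough check of the 1m<n<55m claim, illustrative only.
    current_range  = 18.9   # M of n, the current 1.1m<n<20m dat
    proposed_range = 54.0   # M of n, roughly a 1m<n<55m dat
    extra_time     = 1.30   # ~30% more time to sieve the same p range

    relative_efficiency = (proposed_range / current_range) / extra_time
    print(f"~{relative_efficiency:.1f}x the n-range covered per unit of sieving time")
    # -> ~2.2x, i.e. a bit more than twice the per-M efficiency of the current dat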


    The only drawback in doing so is that we would decrease our current factor production for the 1m<n<20m range by ~30%. However, we would get all of the factors between 20m<n<55m and wouldn't have to re-sieve this range later.

    Of course there are other factors at work, primes, prp time, etc.

    What I propose is: at some point, if we are still using proth, extend the current effort from 1M<n<20M to 1M<n<50M, and only continue from the current T level at that time.

    Of course, if we decide to wait for a prime, the 30% decrease in sieve speed would be offset by a ~10% increase from the eliminated k.

    It's also important to note that we should find two more primes before we reach 20m but probably not 3.
    Last edited by vjs; 12-07-2004 at 03:18 PM.

  35. #35
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    The graph for above
    Attached Images

  36. #36
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    O.K. one more post,

    The Y-axis is total factors found (i.e. k/n pairs removed from testing)
    vs
    sieve depth in T.

    basically what this shows is

    Sieving from 0T- ~4T removed 250,000 factors
    Sieving from 0T- 10T removed 265,000 factors
    Sieving from 0T- 99T removed 312,000 factors

    So sieving from 10T to 100T removed an additional 17% or 47,000 tests...

    The plot basically shows the fall off in useful factors with T.

    Regardless, sieving removes more k/n pairs than prp does in the same amount of time until at least 5000T; we are currently at about 580T.
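    As a rough illustration of that fall-off, here is the incremental yield implied by the totals above (my own arithmetic, treating the first point as exactly 4T):

    Code:
    # Incremental factor yield per T implied by the totals quoted above (illustrative only).
    totals = [(0, 0), (4, 250_000), (10, 265_000), (99, 312_000)]  # (sieve depth in T, total factors)

    for (t0, f0), (t1, f1) in zip(totals, totals[1:]):
        rate = (f1 - f0) / (t1 - t0)
        print(f"{t0}T-{t1}T: ~{rate:,.0f} factors per T of sieving")
    # 0T-4T:   ~62,500 factors per T
    # 4T-10T:  ~2,500 factors per T
    # 10T-99T: ~528 factors per T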
    Attached Images

  37. #37
    Moderator ceselb's Avatar
    Join Date
    Jun 2002
    Location
    Linkoping, Sweden
    Posts
    224
    Originally posted by vjs
    Regardless, sieving removes more k/n pairs than prp does in the same amount of time until at least 5000T; we are currently at about 580T.
    If we can go as fast above 1125T (2^50).

  38. #38
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Update,

    Just to update everyone: we have almost doublechecked all sieve ranges less than 25T with a 991<n<50M dat. This dat is slower than the 1.5M<n<20M dat, but not by much considering its size, and it covers all of the k/n pairs contained in the 1.5M<n<20M dat as well. There are a few holes left unsieved <25T, but not many, and we will finish these shortly.

    Please consider all sieve ranges <25000G doublechecked and no longer in need of retesting.

    0-25000 [completed] (a double check sieve for the current dat)



    An update on our progress...

    In doing this 0-25000G sieve we have been collecting factrange factors for n values over 50M and applying them to a 50M-100M dat. (May as well keep them.) By sieving up to a little less than 25T we have managed to eliminate almost half of the k/n pairs we originally started with; it's quite surprising. We have also found quite a few missed factors in the 1.5M<n<20M range. Is this a good use of our computing power? Questionable, but we feel the effort is valid for the missed factors and our curiosity.

    I'd ask that nobody double check any sieve ranges at this time without posting first. We may have already sieved the region you have in question. We would also like to keep track of which ranges have been doublechecked.

    Thanks for your time.
    Last edited by vjs; 02-13-2005 at 05:15 PM.

  39. #39
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Update on the high-n sieving,

    For the past few weeks Joe_O, myself, and a very few others have been doing some high n sieving. We have gone through various dats etc. as you can see from the previous discussion, and are currently sieving a dat with a size of ~50M. (This is near optimal from a pure sieve stance; optimal is ~53M.)

    The dat we are using is 991<n<50M, ~2.7 times larger than the 1.5M<n<20M dat currently in use by the main effort. This 2.7X size increase reduces the client speed by roughly 15% when compared to the 18.5M dat in current use.

    Since our dat overlaps the current effort we have found several factors missed by the previous and current efforts.

    What we have sieved so far is everything from p=~64k to p=7T with the 991<n<50M dat; several holes exist, but basically all p<25T are either in progress or complete.

    In addition to this low-p sieving we have been collecting large factors >20M and small factors <1.5M from the factrange.txt files, submitted from those people who chose to submit factrange.txt factors to factrange@yahoo.com. We have taken these factors as well as factrange from our efforts and applied them to Joe's database of all unfactored k/n pairs in the range of 0<n<100M. Currently the smallest unfactored n is n=991.

    In order to reduce the number of total factors in this database we have also sieved 50M<n<100M to approximately 3T b/c the factor density is so high at low p.

    Our results...

    We have managed to reduce the 991<n<50M dat size from >20mb to approximately 8.1Mb currently; also, the memory requirement for running proth sieve has been reduced from a high of ~50Mb to ~32.5Mb, while speeding up the client.

    Table of results...

    See above post for descriptions of table

    Code:
    (n>)	(n<)	Start	Now	T.Fact	10K	2.5T	3T	3T+	5T+
    0	1	28187	27609	578	39	17	139	251	132
    1	3	53908	53158	750	23	22	76	378	251
    3	8	131984	131369	615	0	0	284	143	188
    8	10	53115	52847	268	0	0	124	61	83
    10	20	265330	264119	1211	240	335	0	284	352
    20	30	648872	300372	348500	331271	6492	0	7520	3217
    30	40	648663	301172	347491	330829	6236	0	7610	2816
    40	50	649463	302275	347188	330923	6099	0	7371	2795
    50	60	649117	312789	336328	318159	11629	0	5938	602
    60	70	648603	315006	333597	315355	12319	0	5696	227
    70	80	648590	315497	333093	310861	16388	0	5712	132
    80	90	648497	314856	333641	310689	17239	0	5639	74
    90	100	648923	315669	333254	310061	17379	0	5792	22
    -	Sum	5723252	3006738	2716514	2558450	94155	623	52k	~11k
    
    
    (n>)	(n<)	Start	Now	T.Fact	10K	2.5T	3T	3T+	5T+
    
    0	1	28187	27609	578	39	17	139	251	132
    dat	%	100	97.95	2.05	0.14	0.06	0.49	0.89	0.47
    
    1	20	504337	501493	2844	263	357	484	866	874
    dat	%	100	99.44	0.56	0.05	0.07	0.10	0.17	0.17
    
    0	50	2479522	1432921	1046601	993325	19201	623	23618	9834
    dat	%	100	57.79	42.21	40.06	0.77	0.03	0.95	0.40
    
    20	50	1946998	903819	1043179	993023	18827	0	22501	8828
    dat	%	100	46.42	53.58	51.00	0.97	0.00	1.16	0.45
    
    50	100	3243730	1573817	1669913	1565125	74954	0	28777	1057
    dat	%	100	48.52	51.48	48.25	2.31	0.00	0.89	0.03
    
    0	100	5723252	3006738	2716514	2558450	94155	623	52395	10891
    dat	%	100	52.54	47.46	44.70	1.65	0.01	0.92	0.19
    I can't seem to get the formatting correct on the board

    As you can see, the total number of unfactored k/n pairs has been reduced by a little more than 47% for 991<n<100M.

    We have reduced the number of k/n pairs between 20M<n<50M by 53.5%, more than half.

    Perhaps Joe can comment regarding factors missed by the main effort which would require testing. I know we found >100, but a lot of them were between first pass and second pass (<3M); however, more than a handful have been above first pass.

    Does anyone have questions or comments?


    P.S. I'd like to personally thank all of you who have submitted your factrange.txt files.
    Last edited by vjs; 02-18-2005 at 02:12 PM.

  40. #40
    Unholy Undead Death's Avatar
    Join Date
    Sep 2003
    Location
    Kyiv, Ukraine
    Posts
    907
    Blog Entries
    1

    How do you submit them?

    What do you do with these missed factors? Submit them at http://seventeenorbust.com/sieve?

    How are they scored?


    you are welcome.
    wbr, Me. Dead J. Dona \


