
Thread: Bounds

  1. #1
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271

    Bounds

    OK, I've been looking at the results.txt file and seeing what we can work out based on previous factors. It's a 3 or 4 week old results.txt file but it should show the general trend.

    I've factored every p-1 in the results.txt factor file and computed B1 and B2 for these numbers. A quick summary is here:-

    size=44 count=14199 avgB1=39831 avgB2=482409760958
    size=45 count=55314 avgB1=53624 avgB2=692779684881
    size=46 count=44620 avgB1=68881 avgB2=1391798543402
    size=47 count=37615 avgB1=99281 avgB2=2525446194684
    size=48 count=98548 avgB1=131137 avgB2=4993330745133
    size=49 count=71495 avgB1=154455 avgB2=7787886979296
    size=50+ count=833 avgB1=1731790 avgB2=1.05555454971016e+15
    (The summary ignores any factors over 2^64.)

    size is in bits
    count is the number of factors of that size
    avgB1 is the average B1 value
    avgB2 is the average B2 value
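    (For anyone wanting to reproduce these numbers: the per-factor B1 and B2 are just read straight off the factorisation of p-1. A rough Python sketch of the idea, not the actual script I ran, using sympy for the factoring:)

        from sympy import factorint

        def p1_bounds(p):
            # Factor p-1; factorint returns {prime: exponent}.
            f = factorint(p - 1)
            # B2 is the single largest prime factor of p-1 (what stage 2 must reach).
            B2 = max(f)
            # B1 is the largest prime power left after removing one copy of B2
            # (what stage 1 must reach). Brent-Suyama style extensions are ignored.
            rest = dict(f)
            rest[B2] -= 1
            B1 = max((q ** e for q, e in rest.items() if e > 0), default=1)
            return B1, B2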

    The data for size=50+ will be skewed as many of those factors were found with p-1 factoring programs themselves!

    The B2 value as presented in the output above isn't very useful on its own, so does anyone have suggestions for graphs or other ways to present the data?

    I was thinking along the lines of a table for each factor bit size (44, 45, 46, 47, 48, 49). Each row would represent a B1 value (1000, 2000, 5000, 10000, 20000, 40000, 50000, etc.), each column would represent a B2 value, and the cell would show the percentage of factors that would have been found with that combination. (Working on this at the moment.)
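    (Roughly what I have in mind, as a Python sketch with made-up breakpoints; each cell is the cumulative percentage of factors whose minimal bounds fall under that B1/B2 combination:)

        # Hypothetical breakpoints, just for illustration.
        B1_STEPS = [1000, 2000, 5000, 10000, 20000, 40000, 50000]
        B2_STEPS = [100000, 500000, 1000000, 5000000, 20000000]

        def hit_table(bounds):
            # bounds: list of (minimal B1, minimal B2) pairs, one per factor
            # of a given bit size, as computed above.
            total = len(bounds)
            table = {}
            for b1 in B1_STEPS:
                for b2 in B2_STEPS:
                    hits = sum(1 for x, y in bounds if x <= b1 and y <= b2)
                    table[(b1, b2)] = 100.0 * hits / total
            return table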

    I might also limit it on ranges of n too (i.e. 9M < n < 10M).

    It's also available for download at (6.5MB):- http://www.greenbank.org/sob/factb1b2.txt.bz2

    The format of the file is:-

    <p-1>: <factors> : B1=x B2=y

    I wouldn't mind getting my hands on the lowresults.txt file too, to have the data for < 44 bits. In fact, I'd like to get my hands on every factor found (i.e. factrange and factexcl included, though I know these are not complete). Why?

    Consider a k,n pair for which we have found a low factor (say 43 bits) whose p-1 is nowhere near smooth (B1=80000, B2=4000000000). If we later find a much larger factor whose p-1 is smooth (say B1=1000, B2=20000), it will only be recorded in the factexcl files and not in results.txt.

  2. #2
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    Tables for hit rates of B1/B2 bounds at: http://www.greenbank.org/sob/p-1.html

    Caveats:-

    1) The percentages displayed are not the chances of finding a factor with a particular B1/B2 combination.

    Consider the 44-bit table (which we have sieved in its entirety for the n < 20M range). There are many duplicate factors that may be smoother than the first factor found (the only one listed in results.txt). Also there are many k,n pairs in this range for which we have no factor.

    2) The 50-bit+ table data is skewed by the fact that we haven't really sieved up there, so most of the factors have come from P-1 factoring and will naturally fit low B1/B2 bounds.

    3) The percentages are cumulative.

    I'm going to work on automating this using a database and a bunch of scripts.
    Last edited by Greenbank; 08-24-2005 at 09:12 AM.

  3. #3
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Wow, this is cool. These are obviously factors found through sieving as well, not only P-1 and ECM. But it's important data either way.

    I think there are two things to think about when it comes to B1:B2.

    First is the B1:B2 ratio, if we're thinking of finding these factors with Prime95 rather than by sieving... Prime95 spends equal time in stage 1 and stage 2 when you choose a 1:14 ratio.

    Second is how many factors fall below a particular B1 value, how many more you pick up if you double that value, and so on.

    It might be useful to only consider factors with 9M<n<20M and p>2^47 or 2^48, to see if p makes a difference.

  4. #4
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    Originally posted by vjs
    Wow, this is cool. These are obviously factors found through sieving as well, not only P-1 and ECM. But it's important data either way.

    I think there are two things to think about when it comes to B1:B2.

    First is the B1:B2 ratio, if we're thinking of finding these factors with Prime95 rather than by sieving... Prime95 spends equal time in stage 1 and stage 2 when you choose a 1:14 ratio.

    Second is how many factors fall below a particular B1 value, how many more you pick up if you double that value, and so on.

    It might be useful to only consider factors with 9M<n<20M and p>2^47 or 2^48, to see if p makes a difference.
    OK, I'll have a look at implementing some of this tomorrow.

    I've reimplemented everything with scripts and a MySQL DB (no more working with flat files, thank God). Previously I was just working with p (and p-1) to get B1 and B2; the DB includes columns for k and n, which means I can limit the searches to ranges of n or p much more easily.

    Also, during my validation I check the following things from the results.txt file:-
    = p actually divides k*2^n+1 (they all do)
    = my computed B1 divides p-1 (they all do)
    = my computed B2 divides p-1 (they all do)

    and finally, p is prime!
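    (The checks themselves are trivial; roughly, in Python -- a sketch rather than my actual script:)

        from sympy import isprime

        def validate(p, k, n, B1, B2):
            # p must actually divide k*2^n + 1; modular exponentiation keeps
            # this cheap even for n in the millions.
            assert (k * pow(2, n, p) + 1) % p == 0
            # The computed B1 and B2 must both divide p-1.
            assert (p - 1) % B1 == 0 and (p - 1) % B2 == 0
            # And finally, p itself must be prime.
            assert isprime(p)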

    It was during this last test that I noticed this line in results.txt (20% checked so far):-

    p=32258662112491 k=24737 n=3007

    32258662112491 is 1831 x 61231 x 287731

    How was this missed in the initial sieve?

  5. #5
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Missed in the initial sieve, yes; there were quite a few missed during initial sieving for one reason or another.

    Client issues, user error, results not submitted, etc... This was the driving force behind the 991<n<50M effort. I think you will find that we missed quite a few hundred, but not thousands.

    Second, you probably don't have all the data for p<25T; some time after sieving moved past 25T, an agreement was reached to cut the results file at 25T.

    What's important is whether you find any factors <50T (currently) that eliminate some k/n pair still in the dat, either Mike's updated 2.7M<n<20M one or our 991<n<50M one. You won't find any; if you do, there are serious problems.

    I've been looking at your data a little closer and I'm not sure what to make of it yet. What is your goal: just info, or is it geared towards a particular future aspect of the project? I can see its use in P-1 factoring, but not necessarily in sieving. It would shed some light on factoring bounds and what to expect if you made the data specific to 900T<p<2^50 and (8M<n<15M ???). But for sieving it's always most efficient to start with the smallest un-sieved p.

  6. #6
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    I guess it's to give some idea of optimal B1/B2 bounds.

    If so, I'd like to hear about the data for p>2^50 and 8M<n<10M. Almost all of the factors here would definitely be P-1 factors, and it would give an idea of where we currently stand.

  7. #7
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    Originally posted by Greenbank

    It was during this last test that I noticed this line in results.txt (20% checked so far):-

    p=32258662112491 k=24737 n=3007

    32258662112491 is 1831 x 61231 x 287731

    How was this missed in the initial sieve?
    Actually it wasn't missed!

    32258662112491 24737 3007 2437 61 e

    The e says that this was an "excluded" result, i.e. it did not exist in the original sob.dat that was sieved to 1G.

    ProthSieve (and some versions of SoBSieve) produce extraneous factors; ProthSieve writes these to factexcl.dat. This one could also have been produced by a factoring program (P-1, ECM or other), which produces factors, not necessarily prime factors.

    We probably should not be keeping these excluded factors. They only clutter up the databases. Yes, I know, I was one of the people that urged we keep them when we started the sieve effort. They do have a purpose, but a limited one. Note, I am not saying that we don't need the factexcl.txt files; there are factors in them that are important to keep. The problem is that all sorts of factors go into factexcl. The same is true of factrange: not all of the factrange factors are valuable, and some of them are also excluded factors.

    We need to keep:
    1) The lowest factor found for a k n pair
    2) the first factor found for a k n pair

    But to keep the database small, we need to limit these factors to those above some minimum value, 1M at least or 1G the way we have been doing for the most part.
    And yes, I do have a database of these factors. It is in Berkeley DB form but I can produce it as a flat file. Most of these factors are also in the file produced daily by MikeH. This file also includes duplicate and excluded factors and non-prime factors; almost everything that has been submitted is in there. A "d" at the end, possibly followed by a number, indicates a duplicate. An "e" at the end indicates an excluded factor, which in my opinion should be ignored.
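    If you process that file, the trailing flag is all you need to weed out the duplicates and exclusions. A rough sketch (Python, assuming the whitespace-separated layout shown above, with the flag as the last field when present):

        def is_primary(line):
            fields = line.split()
            if not fields:
                return False
            # A trailing "d" (possibly followed by a number) marks a duplicate,
            # a trailing "e" marks an excluded factor; skip both.
            flag = fields[-1]
            if flag == "e" or flag == "d" or (flag.startswith("d") and flag[1:].isdigit()):
                return False
            return True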

    What VJS said about missed factors is very true, but does not directly answer Greenbank's question. It took a few readings of Greenbank's post for me to realize that he was talking about a non-prime factor rather than one that had just been found and should have been found long ago. We are still finding those, as VJS said, and there are still more to find. I have a few ranges in mind and hope to have someone sieve them soon. Any volunteers?
    Joe O

  8. #8
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    Thanks Joe, I'll have a look at that zip file from Mike.

    As for those factors, my point was that if

    32258662112491 | 24737*2^3007+1

    And, 32258662112491 = 1831 x 61231 x 287731

    (I'm not talking about the factors of p-1 here, but of p itself!)

    Then:-

    1831 | 24737*2^3007+1
    61231 | 24737*2^3007+1
    287731 | 24737*2^3007+1

    It's more the point that 24737,3007 could have been filtered out by sieving with 1831.

    Also, this one is quite boggling:-

    79897555161219 | 22699*2^8600263+1

    surely it would have been easier to rule out this k,n pair by trial division with a smaller number, like, err, 3.

    As in:-

    3 | 22699*2^8600263+1
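    (That sort of check costs next to nothing; a quick Python sketch, using modular exponentiation so the size of the exponent doesn't matter:)

        def divides(q, k, n):
            # Does q divide k*2^n + 1?  pow(2, n, q) keeps the numbers tiny.
            return (k * pow(2, n, q) + 1) % q == 0

        print(divides(3, 22699, 8600263))   # True
        print(divides(1831, 24737, 3007))   # True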

    How about the two factors in results.txt that end in a 5! etc...
    Last edited by Greenbank; 08-25-2005 at 06:42 AM.

  9. #9
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    We need to keep:
    1) The lowest factor found for a k n pair
    2) the first factor found for a k n pair
    And I would like to keep:
    3) The factors with the lowest B1/B2 for a k n pair.

    By looking at as many factors as possible I want to get an idea of the B1:B2 values required to find factors by p-1.

    We've sieved everything up to 48 bits for some n ranges so we can use that info to work out what would be best for p-1 factoring.
    Last edited by Greenbank; 08-25-2005 at 06:41 AM.

  10. #10
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    Greenbank,
    It's more the point that 24737,3007 could have been filtered out by sieving with 1831
    Yes, and it was. That was the point that I was making by posting:
    32258662112491 24737 3007 2437 61 e

    The e says that this was an "excluded" result, i.e. it did not exist in the original sob.dat that was sieved to 1G.
    79897555161219 22699 8600263 1577 0 e

    was also eliminated before the first dat was ever made available for download.

    As for the factors ending in 5, etc.:

    Please reread my post carefully. All factors are accepted by Louie and therefore MikeH. There is no filter! As long as it is a factor for the k n pair it is accepted. Note, a factor, not a prime factor. It is accepted even if a factor is already known for that k n pair, even if we already have four factors for that k n pair! MikeH categorizes these factors by placing a d or e as the last item on the line.

    So just because a factor is found in results.txt does not mean it was the factor that ruled out that k n pair. It just means that it was submitted and accepted.
    Joe O

  11. #11
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    OK, sorry! I get it now. ;-)

    My interest in any and all *prime* factors is that the first factor found, or the lowest factor, may not necessarily have the smoothest p-1.

    Anyway, all factors are imported and p-1 factored (except the 347-bit monster). I'll see how the tables look at various ranges.

  12. #12
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Greenbank,

    It may also be interesting to compare not only B1:B2 ratios but...

    B0:B1 ratios.

    Example

    32258662112491 = 1831 x 61231 x 287731

    B1= 61231
    B2= 287731

    But the next smallest is

    B0= 1831

    It's my guess that the ratio and distribution for B1:B2 is very similar to B0:B1; if it's not, that's also interesting.

    Also, I would limit myself to values above 2^48 when you start thinking about bounds, for the simple fact that it doesn't take a very large B1 or B2 for p-1 before you exceed even 2^50.

    Using 2^20 etc. you are probably biasing the values towards lower B1/B2; I don't think the ratio will change, however.

    My thought is to look at the B0:B1 and B1:B2 ratios first, then compare the average values over a p range.

    My guess: the ratios stay constant, the B values increase with p, and the distribution becomes wider.

  13. #13
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    Originally posted by vjs
    Greenbank,

    It may also be interesting to compare not only B1:B2 ratios but...

    B0:B1 ratios.
    Hmm, not convinced but I could do it.

    Originally posted by vjs
    Also, I would limit myself to values above 2^48 when you start thinking about bounds, for the simple fact that it doesn't take a very large B1 or B2 for p-1 before you exceed even 2^50.

    Using 2^20 etc. you are probably biasing the values towards lower B1/B2; I don't think the ratio will change, however.
    I agree that 20-bit factors won't produce anything interesting, but I'll still produce the tables (or whatever else I do) for every factor size I have that makes sense (30 to 49).

    The reason I want to look at these figures for the small factor sizes (48 bits and below) is that, by sieving, we have (hopefully) a complete list of factors (well, within certain ranges of n we do).

    The majority of factors we have above 49 bits have come from P-1 itself, so they will obviously have much tighter bounds.

    Anyway, the cricket is on TV and I'm off for drinks and a BBQ.

  14. #14
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    O.K., I looked a little deeper into your tables; the more I look, the more I see.

    All I can say at this point is hey this is really really great.

    I was also surprised that the total % of factors found for a particular bit size didn't change much with the B1:B2 bounds... for 47, 48, 49.

    I think this goes back to what Garo has stated all along.

    Paraphrasing: don't let any k/n pair go untested before PRP.

  15. #15
    Moderator Joe O's Avatar
    Join Date
    Jul 2002
    Location
    West Milford, NJ
    Posts
    643
    Originally posted by Greenbank
    OK, sorry! I get it now. ;-)

    My interest in any and all *prime* factors is that the first factor found, or the lowest factor, may not necessarily have the smoothest p-1.

    Anyway, all factors are imported and p-1 factored (except the 347-bit monster). I'll see how the tables look at various ranges.
    I can definitely see the interest in the smoothest factors.


    Which 347-bit monster?
    If you mean 220528904798205719000498203364081911739479244053844761127250434190141218156054826480862481275277097054871,

    it's composite anyway.
    Joe O

  16. #16
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    You'd better double-check that all those p are prime.

    A few of them are not; I know of at least 2 that are composite. If you do a P-1 on those, the bounds are huge, but the P-1 bounds on their prime factors are fairly small. Those found through sieving should be prime... pretty sure.

  17. #17
    Originally posted by Greenbank
    By looking at as many factors as possible I want to get an idea of the B1:B2 values required to find factors by p-1.
    You will probably find that the factors are distributed very much like Proth-weight adjusted random numbers, and you can make excellent predictions from Mertens' Theorem and the Dickman function. Although I was asking different questions, I found excellent matches when I was studying SOB factors to make predictions two years ago.
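    For a rough feel of what those predictions look like: the chance that p-1 is B1-smooth is roughly Dickman's rho at u = ln(p)/ln(B1), and u^-u is a serviceable first approximation. A quick illustrative sketch (my own back-of-the-envelope, not the exact model I used):

        import math

        def rough_smooth_prob(p, B1):
            # Crude estimate of Pr[p-1 is B1-smooth] via the u^-u approximation
            # to Dickman's rho, with u = ln(p) / ln(B1).
            u = math.log(p) / math.log(B1)
            return u ** -u

        # e.g. a 48-bit factor against the average B1 from the first post:
        print(rough_smooth_prob(2 ** 48, 131137))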

    William
    Poohbah of the search for Odd Perfect Numbers
    http://OddPerfect.org

  18. #18
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    Updated with full list of factors from MikeH's zip file.

    Limited to 64 bits though...

    http://www.greenbank.org/sob/p-1.html

    Working on limiting it to ranges of n (i.e. 9->11M)

    http://www.greenbank.org/sob/p-1_limit.html 9M <= n <= 11M

  19. #19
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    Hey Greenbank, nice job, but consider the following.

    We are not really going to try P-1 with anything less than
    B1=20k and B2=200k, and probably nothing more than
    B1=10M and B2=500M.

    Anyway, just a suggestion that you make the columns and rows reflect a real-life P-1 effort a little more closely.

    If it were me, I'd break at:

    B1
    10k
    20k
    30k
    40k
    60k
    75k
    100k
    150k
    200k
    300k
    500k
    1M
    1.5M
    2M
    1000000 (<-something really high)

    Then B2:
    100k
    200k
    300k
    400k
    500k
    750k
    1M
    1.5M
    2M
    3M
    5M
    10M
    15M
    20M
    10000000

    That might make for a better table. I've been looking mostly at how many more factors would have been found by doubling B1, B2, etc.; one can consider doubling both bounds as taking twice as long per test. It would also be good to have the columns and rows satisfy B2 = B1 x 10, so that as you move along the diagonals the ratio doesn't change.
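    (The kind of comparison I mean, as a rough Python sketch against the per-factor bounds computed earlier:)

        def extra_from_doubling(bounds, b1, b2):
            # bounds: per-factor (minimal B1, minimal B2) pairs.
            # How many additional factors would doubling both limits have caught?
            base    = sum(1 for x, y in bounds if x <= b1 and y <= b2)
            doubled = sum(1 for x, y in bounds if x <= 2 * b1 and y <= 2 * b2)
            return doubled - base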

    Looks like there is a sweet spot where increasing B2 beyond 2M doesn't help that much.
    Also the benefit from going beyond 50k for B1 is not great, but all increases up to that point are worth it. (This is only looking at sieved values of course, 48-49 bit.)

  20. #20
    Senior Member
    Join Date
    Jun 2005
    Location
    London, UK
    Posts
    271
    OK, updated http://www.greenbank.org/sob/p-1_limit.html with suggested bounds (for 9M <= n <= 11M).

  21. #21
    Moderator vjs's Avatar
    Join Date
    Apr 2004
    Location
    ARS DC forum
    Posts
    1,331
    This looks very good, Greenbank.

    Based upon your work, what does this say to the layman about P-1 bounds?
