Bounds



Greenbank
08-24-2005, 06:18 AM
OK, I've been looking at the results.txt file and seeing what we can work out based on previous factors. It's a 3- or 4-week-old results.txt file, but it should show the general trend.

I've factored every p-1 in the results.txt factor file and computed B1 and B2 for these numbers. A quick summary is here:-

size=44 count=14199 avgB1=39831 avgB2=482409760958
size=45 count=55314 avgB1=53624 avgB2=692779684881
size=46 count=44620 avgB1=68881 avgB2=1391798543402
size=47 count=37615 avgB1=99281 avgB2=2525446194684
size=48 count=98548 avgB1=131137 avgB2=4993330745133
size=49 count=71495 avgB1=154455 avgB2=7787886979296
size=50+ count=833 avgB1=1731790 avgB2=1.05555454971016e+15
although it ignores any factors over 2^64

size is in bits
count is the number of factors of that size
avgB1 is the average B1 value
avgB2 is the average B2 value

The data for size=50+ will be skewed as many of those factors were found with p-1 factoring programs themselves!

The B2 value as presented in the output above is a bit useless, so does anyone have any suggestions for graphs or other ways to show the data?

I was thinking along the lines of a table for each factor bit size (44, 45, 46, 47, 48, 49). Each row would represent a B1 value (1000, 2000, 5000, 10000, 20000, 40000, 50000, etc.). Each column would represent a B2 value, and the cell would show the percentage of factors that would have been found with this combination. (Working on this at the moment.)

I might also limit it to ranges of n too (e.g. 9M < n < 10M).

It's also available for download at (6.5MB):- http://www.greenbank.org/sob/factb1b2.txt.bz2

The format of the file is:-

<p-1>: <factors> : B1=x B2=y
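In other words, B2 is the largest prime factor of p-1 and B1 is the largest remaining prime power: the smallest bounds with which a P-1 run would have found the factor. A rough sketch of that computation in Python, assuming sympy is available; the helper name is made up:

from sympy import factorint

def required_bounds(p):
    """Smallest (B1, B2) with which a P-1 run would find the factor p."""
    f = factorint(p - 1)                  # {prime: exponent}
    b2 = max(f)                           # largest prime factor of p-1
    # every remaining prime power must be covered by stage 1
    powers = [q ** e for q, e in f.items() if q != b2]
    if f[b2] > 1:                         # extra copies of b2 are stage-1 work
        powers.append(b2 ** (f[b2] - 1))
    b1 = max(powers) if powers else 1
    return b1, b2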

I wouldn't mind getting my hands on the lowresults.txt file too, to have the data for < 44 bits. In fact, I'd like to get my hands on every factor found (i.e. factrange and factexcl included, although I know these are not complete). Why?

Consider a k,n pair that we have found a low factor for (say 43 bits), but whose p-1 is nowhere near smooth (B1=80000, B2=4000000000). Later on we find a much larger factor whose p-1 is smooth (say B1=1000, B2=20000); this will only be recorded in the factexcl files and not in results.txt.

Greenbank
08-24-2005, 07:23 AM
Tables for hit rates of B1/B2 bounds at: http://www.greenbank.org/sob/p-1.html

Caveats:-

1) The percentages displayed are not the chances of finding a factor with a particular B1/B2 combination.

Consider the 44-bit table (which we have sieved in its entirety for the n < 20M range). There are many duplicate factors that may be smoother than the first factor found (the only one listed in results.txt). Also there are many k,n pairs in this range for which we have no factor.

2) The 50bit+ table data is skewed by the fact that we haven't really sieved up there, so most of the factors have come from P-1 factoring and will naturally fit low B1/B2 bounds.

3) The percentages are cumulative.
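For reference, a cumulative table like that takes only a few lines to build. A sketch in Python, where bounds is a list of the per-factor required (B1, B2) pairs and the function name is made up:

def hit_rate_table(bounds, b1_breaks, b2_breaks):
    # Cell [i][j] is the % of factors whose required bounds satisfy
    # B1 <= b1_breaks[i] and B2 <= b2_breaks[j] -- hence cumulative.
    total = len(bounds)
    return [[100.0 * sum(1 for b1, b2 in bounds
                         if b1 <= cap1 and b2 <= cap2) / total
             for cap2 in b2_breaks]
            for cap1 in b1_breaks]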

I'm going to work on automating this using a database and a bunch of scripts.

vjs
08-24-2005, 12:45 PM
Wow, this is cool. These are obviously factors found through sieving as well, not only P-1 and ECM, but the data is important all the same.

I think there are two things to think about when it comes to B1:B2.

First is the B1:B2 ratio, when considering using Prime95 (rather than sieving) to find these factors... Prime95 spends equal time in stage 1 and stage 2 when you choose a 1:14 ratio.

Second is how many factors fall below a particular B1 value; double that value, how many more, and so on.

It might be useful to only consider factors with 9M<n<20M and p>2^47 or 2^48, to see if p makes a difference.

Greenbank
08-24-2005, 01:29 PM
Originally posted by vjs
Wow, this is cool. These are obviously factors found through sieving as well, not only P-1 and ECM, but the data is important all the same.

I think there are two things to think about when it comes to B1:B2.

First is the B1:B2 ratio, when considering using Prime95 (rather than sieving) to find these factors... Prime95 spends equal time in stage 1 and stage 2 when you choose a 1:14 ratio.

Second is how many factors fall below a particular B1 value; double that value, how many more, and so on.

It might be useful to only consider factors with 9M<n<20M and p>2^47 or 2^48, to see if p makes a difference.

OK, I'll have a look at implementing some of this tomorrow.

I've reimplemented everything with scripts and a MySQL DB (no more working with files, thank God). I was just working with p (and p-1) to get B1 and B2. The DB includes columns for k and n, which means I can limit the searches to ranges of n or p much more easily.
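Something like the following, sketched with Python's sqlite3 purely for illustration; the real DB is MySQL and the table and column names here are guesses:

import sqlite3

con = sqlite3.connect("factors.db")
con.execute("""CREATE TABLE IF NOT EXISTS factors
               (p INTEGER, k INTEGER, n INTEGER, b1 INTEGER, b2 INTEGER)""")

# e.g. average required bounds for 9M <= n <= 11M only
# (p fits in SQLite's 64-bit INTEGER as long as it is below 2^63):
row = con.execute("""SELECT COUNT(*), AVG(b1), AVG(b2) FROM factors
                     WHERE n BETWEEN 9000000 AND 11000000""").fetchone()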

Also, during my validation I check the following things from the results.txt file:-
= p actually divides k*2^n+1 (they all do)
= my computed B1 divides p-1 (they all do)
= my computed B2 divides p-1 (they all do)

and finally, p is prime!
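Roughly, the four checks in code; a sketch assuming sympy for the primality test, with b1 and b2 being the computed bounds:

from sympy import isprime

def validate(p, k, n, b1, b2):
    assert (k * pow(2, n, p) + 1) % p == 0   # p actually divides k*2^n+1
    assert (p - 1) % b1 == 0                 # computed B1 divides p-1
    assert (p - 1) % b2 == 0                 # computed B2 divides p-1
    assert isprime(p)                        # the test that fails below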

It was during this last test that I noticed this line in results.txt (20% checked so far):-

p=32258662112491 k=24737 n=3007

32258662112491 is 1831 x 61231 x 287731

How was this missed in the initial sieve?

vjs
08-24-2005, 02:07 PM
Missed in the initial sieve, yes; there were quite a few missed during initial sieving for one reason or another.

Client, user, not submitted, etc... This was the driving force behind the 991<n<50M effort. I think you will find that we missed quite a few hundred, but not thousands.

Second, you probably don't have all the data for p<25T; some time after p>25T was sieved, an agreement was reached to cut the results file at 25T.

What's important is whether you find any factors <50T (currently) that factor some k/n pair in the dat, either Mike's updated 2.7M<n<20M or our 991<n<50M. You won't find any; if you do, there are serious problems.

I've been looking at your data a little closer, and I'm not sure what to make of it yet. What is your goal: just info, or is it geared towards a particular future aspect of the project? I can see its use in P-1 factoring but not necessarily in sieving. It would shed some light on factoring bounds, what to expect, etc., if you made the data specific to 900T<p<2^50 and (8M<n<15M ???). But for sieving it's always most efficient to start with the smallest un-sieved p.

Nuri
08-24-2005, 03:14 PM
I guess it's to give some idea of optimal B1/B2 bounds.

If so, I'd like to see the data for p>2^50 and 8M<n<10M. Almost all of the factors there would definitely be P-1 factors, and it would give an idea of where we currently stand.

Joe O
08-24-2005, 03:27 PM
Originally posted by Greenbank

It was during this last test that I noticed this line in results.txt (20% checked so far):-

p=32258662112491 k=24737 n=3007

32258662112491 is 1831 x 61231 x 287731

How was this missed in the initial sieve?

Actually it wasn't missed!

32258662112491 24737 3007 2437 61 e

The e says that this was an "excluded" result, i.e. it did not exist in the original sob.dat that was sieved to 1G.

ProthSieve (and some versions of SoBSieve) produce extraneous factors; ProthSieve writes these to factexcl.dat. This could also have been produced by a factoring program (P-1, ECM or other), which produces factors, not necessarily prime factors.

We probably should not be keeping these excluded factors. They only clutter up the databases. Yes, I know, I was one of the people who urged we keep them when we started the sieve effort. They do have a purpose, but a limited one. Note, I am not saying that we don't need the factexcl.txt files; there are factors in them that are important to keep. The problem is that all sorts of factors go into factexcl. The same is true of factrange: not all of the factrange factors are valuable, and some of the factrange factors are also excluded factors.

We need to keep:
1) The lowest factor found for a k n pair
2) The first factor found for a k n pair

But to keep the database small, we need to limit these factors to those above some minimum value: at least 1M, or 1G the way we have been doing for the most part.
And yes, I do have a database of these factors. It is in Berkeley DB form, but I can produce it as a flat file. Most of these factors are also in this file produced daily by MikeH. (http://www.aooq73.dsl.pipex.com/results_duplicates_excluded_marked.zip) This file also includes duplicate and excluded factors and non-prime factors. Almost everything that has been submitted is in there. A "d" at the end, possibly followed by a number, indicates a duplicate. An "e" at the end indicates an excluded factor, which in my opinion should be ignored.

What VJS said about missed factors is very true, but does not directly answer Greenbank's question. It took a few readings of Greenbank's post for me to realize that he was talking about a non-prime factor rather than one that had just been found and should have been found long ago. We are still finding those, as VJS said, and there are still more to find. I have a few ranges in mind and hope to have someone sieve them soon. Any volunteers?

Greenbank
08-25-2005, 05:42 AM
Thanks Joe, I'll have a look at that zip file from Mike.

As for those factors, my point was that if

32258662112491 | 24737*2^3007+1

And, 32258662112491 = 1831 x 61231 x 287731

(I'm not talking about the factors of p-1 here, but of p itself!)

Then:-

1831 | 24737*2^3007+1
61231 | 24737*2^3007+1
287731 | 24737*2^3007+1

It's more the point that 24737,3007 could have been filtered out by sieving with 1831.

Also, this one is quite boggling:-

79897555161219 | 22699*2^8600263+1

surely it would have been easier to rule out this k,n pair by trial division with a smaller number, like, err, 3.

As in:-

3 | 22699*2^8600263+1

How about the two factors in results.txt that end in a 5! etc...
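Both divisibility claims are one-liners to verify with modular exponentiation, e.g. in Python; and any factor ending in a 5 is divisible by 5, so it cannot be prime:

assert (22699 * pow(2, 8600263, 3) + 1) % 3 == 0      # 3 | 22699*2^8600263+1
assert (24737 * pow(2, 3007, 1831) + 1) % 1831 == 0   # 1831 | 24737*2^3007+1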

Greenbank
08-25-2005, 05:45 AM
We need to keep:
1) The lowest factor found for a k n pair
2) The first factor found for a k n pair


And I would like to keep:
3) The factors with the lowest B1/B2 for a k n pair.

By looking at as many factors as possible I want to get an idea of the B1:B2 values required to find factors by p-1.

We've sieved everything up to 48 bits for some n ranges, so we can use that info to work out what would be best for p-1 factoring.

Joe O
08-25-2005, 07:54 AM
Greenbank,

It's more the point that 24737,3007 could have been filtered out by sieving with 1831
Yes, and it was. That was the point that I was making by posting:

32258662112491 24737 3007 2437 61 e

The e says that this was an "excluded" result, i.e. it did not exist in the original sob.dat that was sieved to 1G.

79897555161219 22699 8600263 1577 0 e

was also eliminated before the first dat was ever made available for download.

And the factors ending in 5 etc.

Please reread my post carefully. All factors are accepted by Louie and therefore MikeH. There is no filter! As long as it is a factor for the k n pair it is accepted. Note: a factor, not necessarily a prime factor. It is accepted even if a factor is already known for that k n pair, even if we already have four factors for that k n pair! MikeH categorizes these factors by placing a d or e as the last item on the line.

So just because a factor is found in results.txt does not mean it was the factor that ruled out that k n pair. It just means that it was submitted and accepted.

Greenbank
08-25-2005, 09:22 AM
OK, sorry! I get it now. ;-)

My interest in any and all *prime* factors is that the first found, or the lowest, factor may not necessarily have the smoothest p-1.

Anyway, all factors imported and p-1 factored (except the 347-bit monster). Will see how the tables look at various ranges.

vjs
08-25-2005, 10:35 AM
Greenbank,

It may also be interesting to compare not only B1:B2 ratios but...

B0:B1 ratios.

Example

32258662112491 = 1831 x 61231 x 287731

B1= 61231
B2= 287731

But the next smallest is

B0= 1831

It's my guess that the ratio and distribution of B1:B2 is very similar to B0:B1; if it's not, that's also interesting.

Also, I would limit myself to values above 2^48 when you start thinking about bounds, for the simple fact that it doesn't take a very large B1 or B2 for P-1 before you exceed even 2^50.

Using 2^20 etc. you are probably biasing the values toward lower B1/B2; I don't think the ratio will change, however.

My thoughts are: look at B0:B1 and B1:B2 ratios first, then compare what the average value is over a p range.

My guess: ratios stay constant, B values increase with p, and the distribution becomes wider.
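A sketch of the extra statistic, assuming "B0" means the next-largest prime factor of p-1 below B1; sympy assumed, helper name made up:

from sympy import factorint

def b0_b1_b2(p):
    primes = [1, 1, 1] + sorted(factorint(p - 1))   # pad for small cases
    return primes[-3], primes[-2], primes[-1]       # (B0, B1, B2)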

Greenbank
08-25-2005, 11:20 AM
Originally posted by vjs
Greenbank,

It may also be interesting to compare not only B1:B2 ratios but...

B0:B1 ratios.



Hmm, not convinced but I could do it.


Originally posted by vjs
Also, I would limit myself to values above 2^48 when you start thinking about bounds, for the simple fact that it doesn't take a very large B1 or B2 for P-1 before you exceed even 2^50.

Using 2^20 etc. you are probably biasing the values toward lower B1/B2; I don't think the ratio will change, however.


I agree that 20-bit factors won't produce anything interesting, but I'll still produce the tables (or whatever else I do) for every factor size I have that makes sense (30 to 49).

The reason I want to look at these figures for the small factor sizes (48bit and below) is that, by sieving, we have (hopefully) a complete list of factors (well, within certain ranges of n we do).

The majority of factors we have for > 49 bits have come from p-1 itself, so they will obviously have much tighter bounds.

Anyway, the cricket is on TV and I'm off for drinks and a BBQ.

vjs
08-25-2005, 12:35 PM
OK, I looked a little deeper into your tables; the more I look, the more I see.

All I can say at this point is: hey, this is really, really great.

I was also surprised that the total % of factors found for a particular bit size didn't change much with B1:B2 bounds... for 47, 48, 49.

I think this goes back to what Garo has stated all along.

Paraphrasing: don't let any k/n pair go untested before PRP.

Joe O
08-25-2005, 01:39 PM
Originally posted by Greenbank
OK, sorry! I get it now. ;-)

My interest in any and all *prime* factors is that the first found, or the lowest, factor may not necessarily have the smoothest p-1.

Anyway, all factors imported and p-1 factored (except the 347-bit monster). Will see how the tables look at various ranges.

I can definitely see the interest in the smoothest factors.


Which 347-bit monster?
If you mean 220528904798205719000498203364081911739479244053844761127250434190141218156054826480862481275277097054871,

it's composite anyway.

vjs
08-25-2005, 03:25 PM
You'd better double-check that all those p are prime.

A few of them are not; I know of at least 2 that are composite. If you do a p-1 on those, the bounds are huge, but the P-1 bounds of their prime factors are fairly small. Those found through sieving should be prime... pretty sure.

wblipp
08-25-2005, 09:57 PM
Originally posted by Greenbank
By looking at as many factors as possible I want to get an idea of the B1:B2 values required to find factors by p-1.

You will probably find that the factors are distributed very much like Proth-weight-adjusted random numbers, and you can make excellent predictions from Mertens' theorem and the Dickman function. Although I was asking different questions, I found excellent matches when I was studying SOB factors to make predictions two years ago.

William
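A rough way to test this in Python: approximate the chance that a random value near p falls to stage 1 alone, i.e. is B1-smooth, by Dickman's rho at u = ln p / ln B1. This ignores stage 2 and the Proth-weight adjustment William mentions; dickman_rho is a hand-rolled helper, not a library call:

import math

def dickman_rho(u, m=1000):
    # Euler-integrate rho'(t) = -rho(t-1)/t on a grid of step 1/m,
    # with rho(t) = 1 for t <= 1; step h=1/m at t=i/m gives h/t = 1/i.
    if u <= 1.0:
        return 1.0
    n = int(round(u * m))
    rho = [1.0] * (n + 1)
    for i in range(m + 1, n + 1):
        rho[i] = rho[i - 1] - rho[i - m] / i
    return rho[n]

# e.g. rough stage-1-only odds of pulling out a 50-bit factor at B1=50k:
print(dickman_rho(50 * math.log(2) / math.log(50000)))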

Greenbank
08-30-2005, 12:32 PM
Updated with full list of factors from MikeH's zip file.

Limited to 64 bits though...

http://www.greenbank.org/sob/p-1.html

Working on limiting it to ranges of n (e.g. 9->11M)

http://www.greenbank.org/sob/p-1_limit.html 9M <= n <= 11M

vjs
08-30-2005, 02:04 PM
Hey Greenbank, nice job, but consider the following.

We are not really going to try P-1 with anything less than
B1=20k and B2=200k, and probably nothing more than
B1=10M and B2=500M.

Anyway, just a suggestion that you make the columns and rows reflect a more real-life P-1 effort.

If it were me I'd break at:

B1
10k
20k
30k
40k
60k
75k
100k
150k
200k
300k
500k
1M
1.5M
2M
10M (<- something really high)

Then B2
100k
200k
300k
400k
500k
750k
1M
1.5M
2M
3M
5M
10M
15M
20M
500M (<- something really high)

It might make for a better table. I've been looking mostly at how many more factors would have been found by doubling B1, B2, etc.; one can consider a doubling of both bounds as taking twice as long per test. It would also be good to line up the columns and rows so that B2 = B1 x 10; then the ratio doesn't change as you go across the diagonals.

Looks like there is a sweet spot where increasing B2 beyond 2M doesn't help that much.
Also, the benefit from going beyond 50k for B1 is not great, but all increases up to that point are worth it. (This is only looking at sieved values of course, 48-49 bit.)

Greenbank
08-31-2005, 08:54 AM
OK, updated http://www.greenbank.org/sob/p-1_limit.html with suggested bounds (for 9M <= n <= 11M).

vjs
08-31-2005, 01:15 PM
This looks very good Greenbank,

Based upon your work, what does this say to the layman about P-1 bounds???