Sieve Client Thread

**paul.jobling** · 03-31-2003, 03:46 AM

Originally posted by Joe O
Mikael, We're probably boring everybody else. I'll reply to you via PM.

Hey, Joe, Mikael,

You aren't boring me - I am very interested, because if something useful comes out of this I will implement it! Unfortunately I have only had the time to skim the posts rather than think about them, but I will try to find some time to go into them in detail.

Regards,

Paul.

**mklasson** · 03-31-2003, 08:16 PM

Paul,

your "iff" statements are false. That is, the other way isn't true.
Here's a counter example:
p=31
k=26
L( -2*26 | 31 ) = 1
L( -26 | 31 ) = 1
But p does not divide any 26*2^n+1.
-26^-1 = 25
<2> = {1,2,4,8,16}

And now for my oops... L( a | p ) = -1 doesn't imply order(a,p) = p-1, it just tells you order(a,p) does not divide (p-1)/2.

Anyhow, now that we know the kronecker tests don't catch all the impossible k, I'm still proposing a consideration of something like this:
Trial-factor p-1 a little bit to get one or a few small factors. Then establish as small an upper bound u as possible for order(2,p) using the found factors. Next proceed to calculate k^u (mod p) for the different k. Should any of the k^u != 1 we discard that k.

As some sort of "proof of concept", this method applied to p = 31 and provided with the factors 2 and 3 of p-1=30 successfully eliminates all 26 impossible k values, while the "Kronecker tests" miss 10 of them. Granted, this is an extreme case... But still, it seems worth investigating further.

Regards,
Mikael

**paul.jobling** · 04-01-2003, 04:56 AM

Originally posted by mklasson
Paul,

your "iff" statements are false. That is, the other way isn't true.

I see what you mean. I'll check the original - ah, I added them here - you are right, it is just an if.

Trial-factor p-1 a little bit to get one or a few small factors. Then establish as small an upper bound u as possible for order(2,p) using the found factors. Next proceed to calculate k^u (mod p) for the different k. Should any of the k^u != 1 we discard that k.

I'll read Cohen tonight for a fast order(2,p) algorithm. This might be the hard part, as to work properly we need to get the order right. I can find the inverse of k mod p quickly; i can find a^order(2,p) quickly; so it all depends on how quickly I can find order(2,p). If I have to _completely_ factorise p-1, then all bets are probably off at the level that we are at.

I'll let you know. Thanks for a great idea!

Regards,

Paul.

**cmprince** · 04-01-2003, 10:10 AM

Joe, Mikael, Paul,

I haven't thought about things like quadratic residues and modular math in years, and using my math and physics degree to write VBA spreadsheets at work doesn't exactly help to flex my cranial muscles. I can't think of a more interesting discussion to be following along with; don't mistake silence for disinterest!

Thanks for the illuminating discussion,
Chris

**paul.jobling** · 04-02-2003, 10:26 AM

Hi all,

Well, I implemented Mikael's idea, but unfortunately the time spent getting the order of 2 mod p was too long, and overall no time was saved - in fact, the program ran at half the previous speed.

This is what I did:

First, I set order to p-1. This is my guess at the order of 2 mod p.

Then for each of the first 50 small primes, if the prime - call it r - divided p-1 then I checked to see if 2^((p-1)/r) was 1. If it was, then I divided order by r.

If a k passed the Kroneker test, I saw if k^order mod p was 1. If it was not then I did not use that k. Obviously I only tried this if order was not p-1.

I didn't manage to look at Cohen last night to see if I was going about getting the order of 2 mod p in the best way, I will have a look tonight and see.

Many thanks for the good idea though, Mikael, and keep them coming!

Regards,

Paul.

**mklasson** · 04-02-2003, 12:35 PM

Paul,

a couple of thoughts:

trying the first 50 primes is probably too much. I've tested several different counts, and almost all cases are caught by using the first 10 or so primes.

Using the method on the first 1000 primes > 11000*10^9, below are the results. Trial-dividing p-1 with the first u primes eliminates v k-values (in addition to those eliminated by your Legendre tests, that is). For w of the primes, all k-values are eliminated. Table:
u v w
5 1382 92
10 1436 98
20 1466 103
100 1471 104
100000 1471 104
So on average about 1.5 k values are eliminated for each p, and about 10% of the p values are discarded altogether (no possible k left).

The trial division itself probably isn't the most costly part though. Here's an idea for speeding up the power calculating part:
Instead of calculating 2^((p-1)/factor) you can store the found factors, calculate their product d, let lb=(p-1)/d, calculate a=2^lb, and then calculate t=a^(d/factor), setting d=d/factor whenever t=1. At the end, the upper bound for order(2,p) is lb*d.

Btw, you're allowing multiple identical prime factors, right? Seems very important.

Another idea I think you should look into, and it will definitely help this idea along as well, is to dynamically adapt your alpha value to how many k are left. Under the simplifying assumption that one step of arange costs as much as one step of brange, the optimal work when r k-values are left is something like this:
r work in % of the work for r=12
0 0
1 29
2 41
3 50
4 58
5 65
6 71
7 76
8 82
9 87
10 91
11 96
12 100

Many thanks for the good idea though, Mikael, and keep them coming!

I'll try.

I don't know what your views are on sharing the source code, but it would certainly help me out if I could try things directly on the regular sobsieve program.

Regards,
Mikael

**mklasson** · 04-03-2003, 10:13 AM

Good news, everyone!

Closer inspection of the remaining n values for each k yields the following.

for k=4847, all n values are equivalent modulo 24... Table:
k t (all remaining n equiv mod t)
4847 24
5359 8
10223 12
19249 36
21181 24
22699 72
24737 24
27653 36
28433 24
33661 24
55459 12
67607 8

Sooo, extending the Kronecker idea to handle t:th power residues instead of just QR we get:
n = a (mod t)
-k*2^n = 1
-k*2^a*2^(n-a) = 1
2^(n-a) = (-k*2^a)^-1
(2^((n-a)/t))^t = (-k*2^a)^-1
i.e. (-k*2^a)^-1 has to be a t:th power residue mod p for p to be able to divide some remaining k*2^n+1.

We check this condition by raising (-k*2^a)^-1 to the (p-1)/gcd(t,p-1):th degree. If the result mod p is different from 1, we discard this k.

The old Kronecker tests are a subset of this and can be fully replaced.

Mandatory happy figures coming up next:
for the first 1000 primes > 11000*10^9, 9015 k values (2956 more than the Kronecker tests) are discarded, and 116 p values are left with no remaining k.
If this idea is combined with my previous one about order(2,p), 365 extra k values are discarded and 83 extra p values are left empty.

For as high p values as we're at now, one can of course precalc -k*2^a for every k except two; 27653 and 22699 -- their a values are 33 and 46. One could still precalc something like -27653*2^10, 2^23, and -22699*2^23.

A newer sob.dat (mine's dated Jan 30) may allow t values to be increased further. Where can I pick up one of those?

I'm just fervently hoping I haven't taken a wrong turn somewhere now...

Regards,
Mikael

**paul.jobling** · 04-03-2003, 11:05 AM

Mikael,

If your math is right you could be on to something.... watch this space.

Regards,

Paul.

**paul.jobling** · 04-04-2003, 05:47 AM

Well, I sat down with a pad last night and convinced myself that Mikael was right, and that I would be able to come up with a fast implementation of this. I think that I have

But I need some beta testing to be performed - specifically, I need people to test that when it is given the same data as the last version of SoBSieve, it removes the same {k,n} values.

If you want to do this, send an email to me (paul.jobling at whitecross dot com) and I will send the beta version of the software to you by return.

I see an increase in rate from 75 Kp/sec to 87 Kp/sec. Which is nice...

Regards,

Paul.

**paul.jobling** · 04-04-2003, 11:23 AM

Hi all,

Well, I have got something that is about 45% faster on this PC - which is nice

I implemented Mikael's idea of only looking for t:th powers, which is an extension of the Kroneker condition that I already had. I have got extra speed by using my original Kroneker code when necessary, and also by doing as little work as possible.

You may find that you need to slightly modify alpha.

I attach a zip file containing both programs - the windows and the console version. The windows version is actually *faster* than the console version now, which is odd.

EDIT: there is a bug somewhere! I have removed the link to the doownload and I will post it back when I find the bug.

Regards,

Paul.

**paul.jobling** · 04-04-2003, 11:51 AM

Just to be on the safe side, could somebody check that the new 1.26 SoBSieve removes the same values in a known range as 1.25 did? I am 99% sure that it will, but it would be nice to have confirmation.

**Mystwalker** · 04-04-2003, 12:37 PM

It found the one (and only) factor I still have saved locally...
Speed increase was ~15% respectively ~25% - once again, you've bet yourself!

One bug report:
The help page still states version 1.24

btw: The optimal (well, at least better) alpha setting on my machines are:

900 MHz Duron: 1.5
1 GHz P3-m: 2.0

If someone finds better values, just post them or send me a pm...

**Troodon** · 04-04-2003, 12:49 PM

I got a ~30000 p/s increase! BUT there is a problem. Currently I'm sieving the range 10660-11000. The two first factors found by SoBSieve 1.25 are:
10660848993103 | 19249*2^5172926+1
10661083508609 | 55459*2^15208654+1

The first one was found by 1.26, but the second one was NOT FOUND (tried with both, console and GUI, versions).

This is the SoBStatus.dat file I used for the second test:
pmax=11000000000000
pmin=10661082500000

**Joe O** · 04-04-2003, 12:53 PM

Originally posted by ceselb
If anybody need a test range here's one:
25 - 25.34G. Done by SoBSieve 1.06, so it should be accurate.

Do you recall what sob.dat file you used for this test range?

**mklasson** · 04-04-2003, 01:47 PM

Originally posted by Troodon
[B]I got a ~30000 p/s increase! BUT there is a problem. Currently I'm sieving the range 10660-11000. The two first factors found by SoBSieve 1.25 are:
10660848993103 | 19249*2^5172926+1
10661083508609 | 55459*2^15208654+1

The first one was found by 1.26, but the second one was NOT FOUND (tried with both, console and GUI, versions).

Fortunately the math checks out ok, so it seems to be some code bug.

Regards,
Mikael

**Xrillo** · 04-04-2003, 02:20 PM

Good work Paul, and especially good math Mikael!...

I hope the bug isn't fatal..

v1.26 runs at ~110kp/s on my machine now, I'd say about 30% faster than the previous version.

**paul.jobling** · 04-04-2003, 02:34 PM

Well, I see exactly the same problem as Troodon - thanks for testing it. I'll remove the download and ask you all not to use version 1.26 until I have fixed it.

**Joe O** · 04-04-2003, 05:41 PM

Subscript? There are more than 65536 n's for k=55459. I tested it on

pmax=8791000000000
pmin=8790622787860
8790622787861 | 55459*2^14523346+1

and it worked, so it's not the k value.

**mklasson** · 04-04-2003, 06:17 PM

Originally posted by Joe O
Subscript? There are more than 65536 n's for k=55459. I tested it on and it worked, so it's not the k value.

Say, Joe, what's going on here? That's the second post I've seen from you that quotes something I've never seen before. In your previous post you quoted ceselb. I'd never seen his post either.

Am I missing some posts for some strange reason, or are you quoting from other threads or something?

Regards,
Mikael

**Joe O** · 04-04-2003, 06:53 PM

Mikael,
Ceselb's quote was from another thread, one of the original sieving threads. Since he posted all the results for that range, it would have been a good test if we knew which sob.dat file he originally used. It wouldn't do to use a newer sob.dat file and get fewer factors since they had been removed. Debugging is hard enough without looking for a problem that isn't there.

The second quote was just a way of formatting the data, I was quoting myself.

**jjjjL** · 04-04-2003, 10:04 PM

10661083508609 | 55459*2^15208654+1 is the only known factor for that number (according to the database) so Troodon's observation is not simply caused by the use of an updated sob.dat file... there really must be an error in the computations.

hope it's something minor. thanks for all the hard work paul.

-Louie

**paul.jobling** · 04-05-2003, 06:21 AM

Well, I had a look at the code and found a small bug- a 64 bit a*b mod p was returning p+1 rather than 1. On investigation, this is a very rare occurrence, so we were lucky to see it.

I have fixed the code so that this is sorted out, and the corrected version is attached. Again, I don't have any set of test cases to try this out with, so if somebody could try this out it would be good.

It is attached.

Regards,

Paul.

**Nuri** · 04-05-2003, 07:21 AM

I'll test it on some of my completed ranges (both on normal and DC sieves) for a couple of hours and give some feedback.

**priwo** · 04-05-2003, 07:31 AM

the new version of 1.26 has successfully passed the Trodoon-test

pmax=10662000000000
pmin=10660839000000
10660848993103 | 19249*2^5172926+1
10661083508609 | 55459*2^15208654+1

speed increase to version 1.25 is 22% on my Duron 1500+

**Troodon** · 04-05-2003, 08:16 AM

After some testing, SoBSieve 1.27 seems to work correctly (same results as with 1.25). On my computer (it's a Tualatin), SoBSieve 1.27 is just a bit faster than 1.25 and much slower than 1.26. I'll try to do some apha value testing.

Thanks Paul!

**Mystwalker** · 04-05-2003, 08:18 AM

I switched back to the old version right after Trodoon's posting - hope I haven't missed a factor now.

I think we should use a reference range as test bed - like caselb did. Ideally, it would be a low range, as the factor density is higher there, so there's a much greater chance to quickly miss a factor.

Paul:
I'm not familiar with the sieving algorithm (I surrendered after reading your discussion for a minute or two

). Are there different factors (pun not intended) that come into play when determining whether or not the current p is a factor? I seems so, as the p+1 problem only affects very few factors, as you've said.
Would it be possible to create a variant of the sieving program that additionally saves the way the factor was found. That way, one could search for a small test range which has all (or at least much) of the "different factor types" in it...

:

Duron 1500+

Durons also have quantispeed measuring? I thought only Athlons have it...

**Troodon** · 04-05-2003, 10:19 AM

Celeron Tualatin @ 142x10, Windows 2000, 1.27:

Alpha | ~p/s
------------------
2.000 | 148500
2.500 | 148500
3.000 | 147000

For the console version:
1.27 is ~15% slower than 1.26 on my computer. 1.27 is ~5% faster than 1.25. Anyone else is getting these performances?
Haven't done "detailed" comparision of the GUI version because is slower (and always was slower) than the console version.

Edit: added details.

**priwo** · 04-05-2003, 11:24 AM

tested range
13007417126519 | 10223*2^12492989+1
13008326296387 | 21181*2^13984292+1
13009218296443 | 5359*2^9325182+1

same result with 1.25 and 1.27
1.27 is 22% faster than 1.25
1.27 is 12 % slower than 1.26

148.500 with Duron 1500+(real clock is 1300) alpha=2,500

**Nuri** · 04-05-2003, 12:50 PM

And these are my test results:

First, there seems to be no problem with finding factors.

The machine used for testing is P4-1700, and system Win 2000.

Versions 1.25 and 1.26 were not faster than 1.24 on my machine, so I tested 1.27 vs. 1.24.

All tests are made in default alpha values.

Tested three different ranges:

1- A range of 0.01G at around 780G.
SoBSieve 1.24: 53.5k p/sec
SoBSieveConsole 1.24: 53.8k p/sec
SoBSieve 1.27: 61.0k p/sec
SoBSieveConsole 1.27: 62.5k p/sec 16% faster

2- A range of 0.01G at around 12350G.
SoBSieve 1.24: 55.6k p/sec
SoBSieveConsole 1.24: 58.8k p/sec
SoBSieve 1.27: 66.2k p/sec
SoBSieveConsole 1.27: 67.1k p/sec 14% faster

3- (for DC Sieve) A range of 0.11G at around 1500G
SoBSieve 1.24: 133.7k p/sec
SoBSieveConsole 1.24: 134.5k p/sec still faster than 1.27, as well as 1.25 and 1.26
SoBSieve 1.27: 111.8k p/sec
SoBSieveConsole 1.27: 131.3k p/sec

It seems to me like, relative performances change wrt. the hardware and system used as well as the p and "n" ranges for the sieve (normal vs. DC etc.).

Thanks again Paul. Your contributions to the sieving effort are very valuable for us.

Regards,

Nuri

**priwo** · 04-05-2003, 03:17 PM

results from a Celeron 1300@1560

range:
17907000000000
17909503000000

same factors found by 1.25 and 1.27

1.27 is 13% faster than 1.25 (all my results are with the GUI version)

**Slatz** · 04-05-2003, 06:20 PM

Just another data point, I got an 11 % increase from 1.27 over 1.24 on a PIII 1 ghz machine

Slatz

**Joe O** · 04-06-2003, 07:45 PM

Version 1.25 GUI 80633 p/sec
Version 1.27 GUI 85333 p/sec 5.83 % increase on a PIII/500 under Win98SE.

**paul.jobling** · 04-07-2003, 08:01 AM

Hi all,

I have got a version 1.28 in development that is faster that 1.27. I have managed this by consolidating the new functions that were written for the 1.27 tests together, and by making them all assembler (instead of C code that called assembler code to do the work).

To make sure that things are working, if any7body has got a body of test data that they can use (particularly for large values of p) that they can give to me or they can use to test themselves, please send me a pm.

Regards,

Paul.

**Joe O** · 04-07-2003, 12:10 PM

Your PM mailbox is full. I've tried to send you a test range, and volunteer to test the new versions. Even though there is a new version coming, you might find the following results interesting:

Code:

Version 1.27 Console 86882
Version 1.27 Gui     88903
Version 1.25 Console 92827

This is on a Celeron/450 running Windows NT. These are ~ 12 hr averages.

So the winner and still champion is Version 1.25 console dated 3/19/2003. Note not the one dated 3/20/2003, but the prior one dated 3/19/2003 with the "old" optimizations!

**Nuri** · 04-07-2003, 01:53 PM

Hi Joe,

Are these results based on DC sieve, or normal sieve?

1.27 performs worse on my machine at DC sieve, but better at normal sieve (see above).

And another benchmark on 1.24 vs 1.27:
I've changed to 1.27 on my three machines at work, all three working on 19T-20T range. One has got 10% speed increase, but no significant speed change (~1% increase) in other two.

It's interesting to see different versions perform better on different occasions.

**Joe O** · 04-07-2003, 02:37 PM

Nuri,
These are all on DC(Double Checking) Sieve (101 < n < 3M). In the past, I've obtained consistent results for both sieve ranges on my machines. I'll test the 3M < n < 20m range, when V 1.28 gets here, if not before. It depends on RL.

Joe.

**Troodon** · 04-07-2003, 03:07 PM

Originally posted by Joe O
So the winner and still champion is Version 1.25 console dated 3/19/2003. Note not the one dated 3/20/2003, but the prior one dated 3/19/2003 with the "old" optimizations!

What are the differences between them? I was using the 19 March version and didn't knew about the 20 March version.

**Joe O** · 04-07-2003, 03:20 PM

Originally posted by paul.jobling
3%, so nothing amazing. I just noticed that I built it with the old optimizations, so here is another build using the new optimizations. Is this any faster?

Regards,

Paul.

**Troodon** · 04-07-2003, 08:31 PM

Thanks Joe! Re-reading the thread I realised that I knew about it, but I forgot there were two 1.25 versions

.

**paul.jobling** · 04-08-2003, 07:33 AM

Hi all,

Attached is a 1.28 client. This is about 10% faster than the 1.27 release (on this PC).

This speed-up is due to moving more code to pure assembler, and also by performing a fast Kronecker test before seeing if 'a' is a t-th power (see previous discussion with MKlassen).

If anybody has got code for performing a similar fast test to the Kronecker test for 4th powers I would be interested.

Thanks to the beta testers.

I think my PM inbox is broken - it is empty but people say they cannot send me messages as it is full...

Regards,

Paul.

Thread: Sieve Client Thread

Thread Tools

Rate This Thread

Display

D'oh

Paul

Re: Paul

1.28 Sieve Client

Posting Permissions