
Speed of the SoB client



smh
11-01-2002, 05:26 AM
Has anybody compared the speed of the SoB client with other programs like PRP and PFGW?

Does this client use SSE2 instructions? If not, then I guess PRP will be much faster on a P4?

smh
11-04-2002, 08:41 AM
I read on another forum that the P4 wasn't faster than an Athlon; I guess that means no SSE2 instructions are used yet.

Will these be in when client 1.0 comes out?

jjjjL
11-04-2002, 08:59 AM
I just recently realized that there are no SSE2 instructions... however, PRP and PFGW don't support them either :eek:

I thought that SSE2 instructions were being used, but there is a flag built into PRP's code that disables SSE2 instructions. I don't have a P4 for testing, but if someone confirms that PRP has a way of using SSE2, then it will be in the next version. :)

-L

smh
11-04-2002, 09:19 AM
Originally posted by jjjjL
I just recently realized that there are no SSE2 instructions... however, PRP and PFGW don't support them either :eek:

I thought that SSE2 instructions were being used, but there is a flag built into PRP's code that disables SSE2 instructions. I don't have a P4 for testing, but if someone confirms that PRP has a way of using SSE2, then it will be in the next version. :)

-L

I'm not sure about the newest version of PRP (maybe ask George?), but the latest beta (or was it alpha?) of PFGW does have SSE2 instructions.

I haven't been reading the OpenPFGW list on Yahoo Groups for a while, but maybe you could ask your questions there.

Alien88
11-04-2002, 12:08 PM
Originally posted by jjjjL
I just recently realized that there are no SSE2 instructions... however, PRP and PFGW don't support them either :eek:

I thought that SSE2 instructions were being used, but there is a flag built into PRP's code that disables SSE2 instructions. I don't have a P4 for testing, but if someone confirms that PRP has a way of using SSE2, then it will be in the next version. :)

-L

This baby I'm on (the laptop) is a P4-M 2.2 GHz... it only does around 117 kcEMs/sec. My Athlon 1.2 GHz does around 90 kcEMs/sec.

Mystwalker
11-04-2002, 03:50 PM
My Duron 900 does ~60 kcEMs/sec.
Seems like a lot of L2 cache does not improve performance...

And a 30% performance hit on a P4... not good - although I know it's the CPU, not (necessarily) the software.
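The 30% figure is a per-clock comparison. A quick sketch that normalizes the throughputs reported in this thread to kcEM/s per GHz; it assumes (loosely) that throughput scales linearly with clock speed, which ignores memory and cache differences, and the names are illustrative:

```python
# Throughput figures reported in this thread: (kcEM/s, clock in GHz).
reported = {
    "P4-M 2.2 GHz": (117, 2.2),
    "Athlon 1.2 GHz": (90, 1.2),
    "Duron 900 MHz": (60, 0.9),
}


def per_ghz(kcems, ghz):
    """Normalize a kcEM/s figure to kcEM/s per GHz (assumes linear clock scaling)."""
    return kcems / ghz


athlon = per_ghz(*reported["Athlon 1.2 GHz"])
for name, (kcems, ghz) in reported.items():
    rate = per_ghz(kcems, ghz)
    print(f"{name}: {rate:.1f} kcEM/s per GHz ({rate / athlon:.0%} of the Athlon)")
```

Under that assumption the P4-M comes out at roughly 71% of the Athlon's per-clock rate and the Duron at roughly 89%.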

smh
11-04-2002, 05:04 PM
And a 30% performance hit on a P4... not good - although I know it's the CPU, not (necessarily) the software.

It's both the CPU and the software.

Without SSE2, the Athlon outperforms the P4, but if an SSE2 implementation existed, it would be about twice as fast on the P4 at the same clock speed. (Take a look at the GIMPS benchmarks.)

MAD-ness
11-05-2002, 03:37 AM
What are the primary types of calculations and instructions in the SOB client?

Is it working with an FFT like GIMPS is? There was a lot of optimization done to make the FFTs in GIMPS use the SSE2 instructions (not to mention a lot of other optimizations on stuff other than SSE2 and the P4).
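For anyone wondering what "working with an FFT" means here: these tests spend nearly all their time multiplying (squaring) huge numbers, and the fast way to do that is to split each number into digits, convolve the digit vectors with a floating-point FFT, and round the results back to integers. A minimal pure-Python sketch of that idea; it is not the SoB, PRP, or GIMPS code (production code uses hand-tuned assembly and much larger digits), and the function names and 8-bit digit size are illustrative choices:

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT of a list of complex numbers (length must be a power of two)."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)  # twiddle factor
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def fft_multiply(x, y, bits=8):
    """Multiply non-negative integers by FFT convolution of their base-2**bits digits."""
    base = 1 << bits

    def digits(v):
        d = []
        while v:
            d.append(v & (base - 1))
            v >>= bits
        return d or [0]

    dx, dy = digits(x), digits(y)
    # Transform length must hold the full linear convolution (no wraparound).
    n = 1
    while n < len(dx) + len(dy):
        n *= 2
    fx = fft([complex(d) for d in dx] + [0j] * (n - len(dx)))
    fy = fft([complex(d) for d in dy] + [0j] * (n - len(dy)))
    # Pointwise product in the frequency domain = convolution of the digit vectors.
    conv = fft([a * b for a, b in zip(fx, fy)], invert=True)
    # The inverse transform needs a 1/n scale; round each coefficient to an integer.
    coeffs = [round(c.real / n) for c in conv]
    # Coefficients can exceed the base; summing shifted terms absorbs the carries.
    result = 0
    for i, c in enumerate(coeffs):
        result += c << (bits * i)
    return result
```

Squaring is the special case `fft_multiply(x, x)`, where one forward transform could be reused.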

Ken_g6[TA]
11-05-2002, 11:54 AM
I still have the same question the first guy asked. Which is faster, your program or PRP? Since your code does complete Proth proofs, I suspect that PRP is slightly faster because it only does a probable test. Though Gallot's Proth has almost reached PRP's speed.

smh
11-05-2002, 12:57 PM
Since your code does complete Proth proofs,

Hmm, does it? I seriously doubt that. Primality tests take a lot longer; that's why PRP (and PFGW) exist and why Proth first does a PRP test.
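A PRP ("probable prime") test is essentially one large modular exponentiation. A minimal sketch of a Fermat test of N = k*2^n+1 using Python's built-in three-argument `pow`; passing it means N is very probably prime, not that it is proven prime. The function name and the default base 3 are illustrative assumptions, not taken from PRP, PFGW, or the SoB client:

```python
def is_fermat_prp(k, n, base=3):
    """Fermat probable-prime test: N = k*2**n + 1 passes if base**(N-1) == 1 (mod N)."""
    N = k * 2**n + 1
    # About log2(N) modular squarings; real programs do these with FFT arithmetic.
    return pow(base, N - 1, N) == 1
```

The "probable" matters: 561 = 35*2^4+1 is a Carmichael number, so `is_fermat_prp(35, 4, base=2)` returns True even though 561 = 3*11*17.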

Only the GIMPS client does a primality test, since the LL test is so efficient that running a PRP test first on Mersenne numbers would save very little time.
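The Lucas-Lehmer test needs only about p modular squarings for M_p = 2^p-1, and its answer is a proof, which is why a separate PRP pass buys little for Mersenne numbers. A minimal sketch for small exponents (GIMPS does the same squarings with FFT arithmetic); the function name is illustrative:

```python
def lucas_lehmer(p):
    """For prime p, M_p = 2**p - 1 is prime iff s_(p-2) == 0 (mod M_p), with s_0 = 4."""
    if p == 2:
        return True  # M_2 = 3 is prime; the recurrence below assumes p > 2
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m  # one modular squaring per step
    return s == 0
```

For example, it accepts p = 13 (M_13 = 8191 is prime) and rejects p = 11 (2047 = 23*89).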

If no one has any data comparing its speed with other programs, I might be able to do some testing, but it's a bit complicated because the SoB client needs some registry editing to test different numbers (ones not assigned by the server).

MAD-ness
11-05-2002, 01:54 PM
Ken!

Never seen you on the forums; just wanted to say thanks for optimizing the heck out of the ECCp109 client. :)

You really did a number on it. =)

Ken_g6[TA]
11-06-2002, 01:08 AM
Originally posted by smh


Hmm, does it? I seriously doubt that. Primality tests take a lot longer, thats why PRP (and pfgw) exists and why proth first does a prp test.

I've never seen Proth (http://www.utm.edu/research/primes/programs/gallot/) do two passes for a number of the form k*2^n+1; just on the other forms. And I've never seen Proth take longer to prove than to disprove a prime of that form. So I think Proth doesn't do PRP testing. But somehow, Yves Gallot (the programmer of Proth) had gotten Proth to within about 20% of PRP's speed last I checked. :confused:
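This fits the math: for N = k*2^n+1 with odd k < 2^n, Proth's theorem says N is prime exactly when a^((N-1)/2) ≡ -1 (mod N) for a suitable base a, so a full proof needs about the same number of modular squarings as one Fermat PRP test, and speed differences between programs come down to their arithmetic code. A minimal sketch, assuming the common approach of picking a as a quadratic non-residue via the Jacobi symbol (whether Proth.exe chooses its base this way isn't stated here); the helper names are illustrative:

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0; returns -1, 0, or 1."""
    a %= n
    result = 1
    while a != 0:
        # Remove factors of 2 using the second supplementary law.
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        # Quadratic reciprocity: swap, flipping the sign when both are 3 mod 4.
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_test(k, n):
    """Deterministic primality test of N = k*2**n + 1 (odd k < 2**n) via Proth's theorem."""
    if k % 2 == 0 or k >= 2**n:
        raise ValueError("Proth's theorem needs odd k < 2**n")
    N = k * 2**n + 1
    # Search for a base with Jacobi symbol (a/N) == -1.
    a = 2
    while True:
        j = jacobi(a, N)
        if j == -1:
            break
        if j == 0:
            return False  # a < N shares a factor with N, so N is composite
        a += 1
    # A single exponentiation: about n squarings, like a Fermat PRP test.
    return pow(a, (N - 1) // 2, N) == N - 1
```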



Only the GIMPS client does a primality test, since the LL test is so efficient that running a PRP test first on Mersenne numbers would save very little time.

If no one has any data comparing its speed with other programs, I might be able to do some testing, but it's a bit complicated because the SoB client needs some registry editing to test different numbers (ones not assigned by the server).

I haven't tried the client yet, but if it displays what it's doing, couldn't you just time it and run PRP and/or Proth on the same number?

P.S. Thanks, MAD-ness! I wondered how many teams got my final optimized client.

Edit: ooh, ooh, another benchmarking idea. I assume the cEM/s number I saw in the FAQ is constant? If so, you can calculate how long it would take to test a 2^300000 or 2^400000 candidate. Then run Proth and PRP on sample candidates (pick one) of those sizes; it's probably best to check both for consistency.
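One way to turn a single timing into an estimate for a larger candidate: a PRP test of k*2^n+1 takes about n modular squarings, and each FFT squaring costs roughly O(n log n), so run time grows roughly like n^2 log n. A rough sketch under that assumption (real programs switch between discrete FFT lengths, so actual times move in steps); the function name and the 10-hour timing are hypothetical:

```python
import math


def estimate_seconds(measured_seconds, measured_n, target_n):
    """Scale a measured test time to another exponent, assuming time ~ n**2 * log(n)."""
    def cost(n):
        return n * n * math.log(n)
    return measured_seconds * cost(target_n) / cost(measured_n)


# Hypothetical example: a test at n = 300000 took 10 hours; estimate n = 400000.
hours = estimate_seconds(10 * 3600, 300_000, 400_000) / 3600
print(f"about {hours:.1f} hours")
```

Going from n = 300000 to n = 400000, this model predicts a bit over 1.8 times the run time.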

smh
11-06-2002, 03:15 AM
I've never seen Proth do two passes for a number of the form k*2^n+1; just on the other forms

Yup, think you're right. Just checked it.

So then I don't know whether the SoB client does a PRP or a primality test.


I haven't tried the client yet, but if it displays what it's doing couldn't you just time it and run PRP and/or Proth on the same number?

The current numbers are a bit too large; I was thinking more of testing with numbers of around 25,000 digits.
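To pick test candidates of a given size, the number of decimal digits of k*2^n+1 is floor(log10(k) + n*log10(2)) + 1 (for n ≥ 1), so the exponent for a target size can be solved for directly. A small sketch using logarithms, which avoids building the number (Python 3.11+ also limits `str()` on very large integers by default); the function names are illustrative and k = 4847 below is just an example multiplier:

```python
import math


def approx_digits(k, n):
    """Decimal digit count of k*2**n + 1 (n >= 1), computed from logarithms."""
    return math.floor(math.log10(k) + n * math.log10(2)) + 1


def exponent_for_digits(k, digits):
    """Smallest n for which k*2**n + 1 has at least `digits` digits, by the log estimate."""
    return max(1, math.ceil((digits - 1 - math.log10(k)) / math.log10(2)))


n = exponent_for_digits(4847, 25_000)
print(n, approx_digits(4847, n))
```

For 25,000-digit tests with k = 4847, this puts n a little above 83,000.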

jjjjL
11-06-2002, 07:42 AM
I must be going insane, because I thought I had already posted to let everyone know... but there ARE SSE2 instructions in SB currently! :)

Athlons are still slightly faster... SSE2 instructions can't solve all of the P4's FPU bandwidth issues.

I was wrong when I mentioned that there weren't. Also, the client is doing a PRP test.

Also, some of you seem confused: Proth tests (done by proth.exe) are slower than PRP tests (unless something has recently changed).

-L

Halon50
11-07-2002, 09:28 AM
I thought I should pass this along:

My Celeron-433 completes an n assignment twice as quickly as my poor K6-500s. My PII-300 is 30-50% faster than my K6 machines. I assume this is an MMX and/or SSE instruction set advantage.

Ken_g6[TA]
11-07-2002, 10:23 AM
Actually, Halon, if your K6 has 3DNow!, then it has MMX as well. And I don't think any PII has SSE of any kind. Plus, I believe floating-point instructions have been found more useful in primality testing than MMX.

The most likely problem is - how do I put this delicately - your K6 is "floating-point challenged" :o :(

Halon50
11-07-2002, 03:47 PM
Heh, quite true. These K6 machines are definitely at the end of their life cycles, and I'm in the process of looking for local people to take them (and some slower machines) off my hands.

I believe this dual 2466N board (going up this weekend) will more than make up for all my K6 machines combined! :D