v2 client for SSE2 processors (FASTER!)

**Cmarc** · 10-04-2004, 10:08 PM

Gothmog:/home/marc# dpkg -l libc6
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
ii libc6 2.2.5-11.5 GNU C Library: Shared libraries and Timezone

**jjjjL** · 10-04-2004, 11:36 PM

Originally posted by Cmarc
Gothmog:/home/marc# dpkg -l libc6
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
|/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
||/ Name Version Description
+++-==============-==============-============================================
ii libc6 2.2.5-11.5 GNU C Library: Shared libraries and Timezone

How does version 1.2.5 work fine on this machine?

Cheers,
Louie

**Cmarc** · 10-04-2004, 11:48 PM

works fine

**royanee** · 10-05-2004, 02:14 AM

If your /etc/debian_version isn't at 3.1 and you are set on running stable, at least upgrade to that. I personally run sid, with some grabs from experimental, and it's more than stable enough. Sarge will be out soon, and it has libc6 2.3.2 right now.

**Cmarc** · 10-05-2004, 06:55 AM

OK, I've updated to the latest unstable release. SB 1.25 still runs fine, but 2.0 still segfaults. libc6 is now version 2.3.2. Since this is a dedicated cruncher, I'm going to do a fresh install (it needs a lighter install anyhow; there's still too much unnecessary stuff on that system). I'll chime in tomorrow with results on the other distribution. I somehow doubt it'll change, but you never know.
Cheers,
Marc

**royanee** · 10-06-2004, 01:06 AM

Make sure that cat /proc/cpuinfo lists sse2 among the flags. It probably will, but it's worth checking. Which kernel version are you running?
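For anyone who wants to script that check, here is a minimal Python sketch (the function name has_sse2 is made up for illustration). It assumes the Linux /proc/cpuinfo layout, where each CPU has a "flags" line listing feature names separated by spaces; an Athlon XP, for example, would list sse but not sse2.

```python
def has_sse2(cpuinfo_text: str) -> bool:
    """Return True if any 'flags' line in /proc/cpuinfo-style text lists sse2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Split the flags into whole names instead of doing a substring search.
        if sep and key.strip() == "flags" and "sse2" in value.split():
            return True
    return False
```

On a Linux box you would call it as has_sse2(open("/proc/cpuinfo").read()); from the shell, grep -w sse2 /proc/cpuinfo does the same job.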

**Cmarc** · 10-07-2004, 09:15 AM

royanee: nope, that wasn't it, but good idea.

Anyhow, to cut a long story short: I installed a minimal SuSE 9.1 on that particular box, and the client seems to be working fine, producing approximately 1.5M cEMs/sec. This won't help much with troubleshooting the problem, but it gives me a quick resolution and also lets me run a slimmer system on that box.

Thanks for the suggestions.

Cheers,
Marc

**Nuri** · 10-13-2004, 04:21 PM

The new client crashed at the very end of the test for k=4847, n=6997023 on an AMD Athlon(tm) XP 2400+.

I reinstalled 1.25, completed and sent in the result, then reinstalled 2.0 SSE2. This time it started to crash at the very beginning.

This post is just for feedback on the client.

I'm moving back to sieve on this machine.

**[DPC]Mobster** · 10-14-2004, 03:15 AM

This client is meant for processors with SSE2 support (hence the topic name).
Athlon XPs do not support SSE2, so the client will not work on them (or will crash, as in your case)...

**Nuri** · 10-14-2004, 04:37 PM

Ooops.

**Death** · 10-15-2004, 04:46 AM

http://distributed.org.ru/forum/?a=t...pic=122&page=6

P4 2.4 HT, Socket 478: old client ~450,000 cEMs/sec, new client ~1,350,000 cEMs/sec

Death
SB v2.0 SSE2 reports v1.10+ in its log.

A few questions: why does it write 1.10+ in the log, and should one use HT with SB?

**UoMDeacon** · 10-18-2004, 03:49 AM

I'm baffled as to why my AMD64 3000+ is not coming anywhere close to the P4 machines in terms of cEMs/s rates. Right now I'm sitting at just below 1mil cEMs/s with the new SSE2 client, while my 2.4GHz P4 Celeron is at 600,000cEMs/s. Any ideas?

**larsivi** · 10-18-2004, 04:02 AM

I heard once that even though those AMDs have SSE2, their implementation of it is vastly poorer than Intel's. Thus using SSE2 on an AMD proc will improve the speed over an AMD proc without SSE2, but will be much slower than an Intel proc with SSE2.

**[DPC]Mobster** · 10-18-2004, 04:34 AM

Originally posted by UoMDeacon
I'm baffled as to why my AMD64 3000+ is not coming anywhere close to the P4 machines in terms of cEMs/s rates. Right now I'm sitting at just below 1mil cEMs/s with the new SSE2 client, while my 2.4GHz P4 Celeron is at 600,000cEMs/s. Any ideas?

My Athlon 64 3400+ is doing 1.1M cEMs/s, so your result is not out of the ordinary. You may find it low, but at least it is 'normal'.

**Theadalus** · 10-18-2004, 08:25 AM

Originally posted by UoMDeacon
..., while my 2.4GHz P4 Celeron is at 600,000cEMs/s.

Isn't that a bit low (my "normal" 2.4GHz P4 is doing 1.43M cEMs/s)?

**expinete** · 10-18-2004, 10:27 AM

Originally posted by larsivi
I heard once that even though those AMDs have SSE2, their implementation of it is vastly poorer than Intel's. Thus using SSE2 on an AMD proc will improve the speed over an AMD proc without SSE2, but will be much slower than an Intel proc with SSE2.

I think that isn't the reason.

I think the P4 and K8 have equal SSE2 performance AT THE SAME CLOCK SPEED. Both of them execute the same number of instructions per clock, but the P4 has an advantage in clock speed, giving it more power.

At least, that's what I once read somewhere...

**priwo** · 10-18-2004, 12:31 PM

Originally posted by Theadalus
Originally posted by UoMDeacon
..., while my 2.4GHz P4 Celeron is at 600,000cEMs/s.

Isn't that a bit low (my "normal" 2.4GHz P4 is doing 1.43M cEMs/s)?

My 2.4GHz P4 Celeron (FSB 122MHz) does 700,000 cEMs/s with fast RAM.
My 2.66GHz P4 (FSB 134MHz) does 1,700,000 cEMs/s, also with fast RAM.

I think the larger cache of the real P4 makes the difference.

**priwo** · 10-20-2004, 01:32 PM

After a reboot (it had been on for some weeks), the Celeron reaches 1,200,000 cEMs/s.

**Polski Radon** · 11-04-2004, 10:30 AM

Great work guys!

I am now able to finish at least 14 WUs per month on my 2.8E @ 3.1GHz, at ~2M cEM/s.

I have calculated that, compared with my total production on the non-SSE2 version, I would have saved about US$10 on electricity per month.

(Although I'll still leave my PCs on 24/7, so there isn't going to be a difference.)

Thanks again!

**vjs** · 11-04-2004, 12:12 PM

Yes, the SSE2 version of the client is much faster; just a shout-out to all those who don't know.

If you decide to donate some of your precious clock cycles to the double check effort, the blocks will complete very quickly...

Enter the name supersecret to receive tests of n=~1.08m,
or garbage for tests of n=~3.8m.

A supersecret test k/n pair will finish in around an hour.

I still beg for a way to run ss or s k/n pairs for team credit; at least then we could put our now very slow non-SSE2 boxes on these k/n pairs.

Major reason: now that current k/n pairs take more than a week to complete (sometimes more than two weeks on slow machines), aren't those slow machines better suited to these smaller tests? Meanwhile, the latest and fastest SSE2 machines could continue to push the upper bound ahead at breakneck speed.

Third: It would give us sievers and p-1'ers more time to factor out some of the pairs before they are tested.

Sorry for hi-jacking this thread.

I suggest everyone with slow machines try supersecret to push the double-check effort ahead, until we find the next prime or at least for a few days, just for the fun of it.

You can check out the supersecret stats and progress here:

www.seventeenorbust.com/secret

Note: edited to replace secret with garbage, the correct name.

**pixl97** · 11-04-2004, 02:11 PM

VJS

Enter the name supersecret to receive tests of n=~1.08m
or secret for tests n=~3.8m

I don't think you're correct. I have 3 clients running under the username 'secret', and they are all receiving tests in the 1.08m range; that's why 70-some tests in that range are being done each day. I'll try supersecret when one of these tests finishes and see what is actually being handed out.

**vjs** · 11-04-2004, 02:17 PM

Thanks for the correction, pixl97. I never use the name secret, always supersecret. I guess garbage is the correct username for that queue.

**Mystwalker** · 11-04-2004, 03:48 PM

secret and supersecret used to be different in the past, but since the change to the new queue system, they are the same.

**DigitalConcepts** · 11-17-2004, 04:07 PM

Originally posted by ceselb
It had, but search for 'idwt percival' and you'll find some stuff about the optimizations done now.

But wasn't this already used by GIMPS and therefore in V1.2?

**ceselb** · 11-18-2004, 02:19 AM

idwt was, but not all the tricks.

**DigitalConcepts** · 11-18-2004, 01:38 PM

I'm not understanding - what tricks?

**prime95** · 11-18-2004, 03:13 PM

Crandall and Fagin's 1994 paper showed how to use Irrational Base Discrete Weighted Transforms (IBDWT) for 2^n+/-1 values.
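To make the weighting idea concrete for the 2^n+1 case, here is a toy Python sketch of the simplest weighted transform: a fixed digit size (n divisible by the transform length N) and weights theta^j with theta = e^(i*pi/N). Because theta^N = -1, the FFT's cyclic convolution turns into a negacyclic one, so the wrap-around terms pick up exactly the minus sign that 2^n = -1 (mod 2^n+1) requires. The names fft and negacyclic_mulmod are made up, the FFT is a slow recursive one, and nothing here reflects how the SoB or Prime95 code is actually organized.

```python
import cmath
import math


def fft(a, invert=False):
    """Unnormalized recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def negacyclic_mulmod(x, y, n_bits, length):
    """Return x*y mod 2^n_bits + 1 via a weighted FFT of the given length.

    Assumes length is a power of two dividing n_bits, and 0 <= x, y < 2^n_bits.
    """
    b = n_bits // length                       # fixed bits per digit
    mask = (1 << b) - 1
    xs = [(x >> (b * j)) & mask for j in range(length)]
    ys = [(y >> (b * j)) & mask for j in range(length)]
    # theta^length = -1, so weighting digit j by theta^j makes the cyclic
    # convolution wrap around with a minus sign, matching 2^n_bits = -1.
    theta = [cmath.exp(1j * math.pi * j / length) for j in range(length)]
    X = fft([t * d for t, d in zip(theta, xs)])
    Y = fft([t * d for t, d in zip(theta, ys)])
    Z = fft([p * r for p, r in zip(X, Y)], invert=True)
    # Undo the 1/length scaling and the weights; coefficients can be negative.
    z = [round((Z[j] / length / theta[j]).real) for j in range(length)]
    modulus = (1 << n_bits) + 1
    return sum(zj << (b * j) for j, zj in enumerate(z)) % modulus
```

The transform length equals the number of digits in one input, so no zero padding is needed; the price is the complex weights.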

Colin Percival's paper extends the IBDWT concept to work on values k*2^n+/-1 for small k values or for highly composite k values.

I've extended Colin Percival's ideas so that even larger k values can use the IBDWT, but Colin Percival's method is still better for highly composite k values.

In the old days, to multiply numbers using an FFT, one would double the size of the input number by padding the upper half with zeros. The IBDWT trick lets you use a smaller FFT with no zero padding, plus special weights, so that the FFT multiplication gives you the result modulo k*2^n+/-1.
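The irrational-base part can be sketched in Python as well, for the k = 1 case 2^q-1 only. This toy version follows the Crandall-Fagin recipe under these assumptions: digit j starts at bit ceil(q*j/N), so digit sizes differ by at most one bit, and the weight 2^(ceil(q*j/N) - q*j/N) makes a plain length-N cyclic convolution come out as the product mod 2^q-1, with no zero padding. The function names are made up, the FFT is a slow recursive one, and Python big integers stand in for a real carry-propagation step. Percival's and prime95's extensions to larger k are not shown.

```python
import cmath
import math


def fft(a, invert=False):
    """Unnormalized recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ibdwt_mulmod(x, y, q, length):
    """Return x*y mod 2^q - 1 using an irrational-base DWT (no zero padding).

    Assumes length is a power of two and 0 <= x, y < 2^q.
    """
    # Digit j starts at bit e[j] = ceil(q*j/length); sizes differ by at most one bit.
    e = [-(-q * j // length) for j in range(length + 1)]

    def digits(v):
        return [(v >> e[j]) & ((1 << (e[j + 1] - e[j])) - 1) for j in range(length)]

    # Weights 2^(e[j] - q*j/length), each in [1, 2) and irrational in general.
    w = [2.0 ** (e[j] - q * j / length) for j in range(length)]
    X = fft([wj * d for wj, d in zip(w, digits(x))])
    Y = fft([wj * d for wj, d in zip(w, digits(y))])
    Z = fft([p * r for p, r in zip(X, Y)], invert=True)
    # Unweighting gives integer coefficients, up to floating-point rounding error.
    z = [round(Z[j].real / length / w[j]) for j in range(length)]
    modulus = (1 << q) - 1
    # Real code propagates carries digit by digit; since 2^q = 1 mod 2^q - 1,
    # the carry out of the top digit wraps into digit 0. Here Python big
    # integers do the reassembly to keep the sketch short.
    return sum(zj << e[j] for j, zj in enumerate(z)) % modulus
```

With q = 61 and length 8, for instance, each input fits in 8 digits and the transform is 8 points long; a zero-padded product would need 16.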

**Mystwalker** · 11-18-2004, 04:00 PM

Originally posted by prime95
The IBDWT trick lets you use a smaller FFT, no zero-padding an FFT

Under certain circumstances (k/n-wise), I still encounter zero-padded FFTs (I think they were 128k) when using PRP3 for the PSP.
Is this intended or some sort of left-over?

**royanee** · 11-18-2004, 04:33 PM

Is there a nice tutorial link that could explain to me how Fourier transforms allow us to figure out if a number is prime or not? Pretty pictures would be helpful...

**prime95** · 11-18-2004, 06:11 PM

Originally posted by Mystwalker
Under certain circumstances (k/n-wise), I still encounter zero-padded FFTs (I think they were 128k) when using PRP3 for the PSP.
Is this intended or some sort of left-over?

What I said before is correct in a general hand-wavy way. Now for a few more details.

When k is one, you can use an FFT that is half the size of a zero-padded FFT. As k increases in size, a larger FFT size must be used. When k gets up to around 50,000 or so (the actual formula is complicated), then you end up using the same FFT size as the zero-padded FFT but at least you get the mod k*2^n+/-1 operation for free - so you still save some time.

So why are zero-padded FFTs ever used for SoB? Well, there are different FFTs used for k*2^n-1 and k*2^n+1, and more FFT sizes are coded for k*2^n-1. So sometimes it is faster to use a zero-padded 2^2n-1 FFT than an IBDWT FFT of k*2^n+1, but only because of the extra FFT sizes available.

There is one other time a zero-padded FFT will be used. The k*2^n-1 FFTs are slightly more accurate than the k*2^n+1 FFTs (they support about 1% higher n values). I need to do some studies to see if that 1% discrepancy is too conservative.

**Mystwalker** · 11-18-2004, 06:12 PM

For Mersenne numbers, it's pretty easy:
http://mersenne.org/math.htm

I don't know the algorithm for Proth numbers, though.
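Since the link covers the Lucas-Lehmer test, here is a bare-bones Python sketch of it, together with Proth's theorem, a standard primality test for numbers of the form k*2^n+1 with odd k < 2^n. This is not necessarily what the SoB client runs, and the function names are made up for illustration. In both tests nearly all the time goes into repeated squaring of huge numbers modulo N, which is where FFT multiplication such as the IBDWT comes in; here Python's built-in big integers stand in for it.

```python
def lucas_lehmer(p):
    """Lucas-Lehmer test: for prime p, 2^p - 1 is prime iff s_(p-2) == 0."""
    if p == 2:
        return True                      # M2 = 3 is prime
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m              # the squaring an FFT multiply would do
    return s == 0


def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0."""
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_test(k, n):
    """Proth's theorem for N = k*2^n + 1 with odd k < 2^n."""
    big_n = k * (1 << n) + 1
    a = 2
    while True:
        j = jacobi(a, big_n)
        if j == 0:
            return False                 # gcd(a, N) > 1 with a < N: composite
        if j == -1:
            # Non-residue found: N is prime iff a^((N-1)/2) = -1 (mod N).
            return pow(a, (big_n - 1) // 2, big_n) == big_n - 1
        a += 1
```

For example, lucas_lehmer(11) returns False because 2^11-1 = 2047 = 23*89, and proth_test(3, 5) returns True because 3*2^5+1 = 97 is prime.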

**Jammy** · 11-27-2004, 03:05 PM

This is my first venture into Intel Land since 1997. After so many teased me for running AMDs, I decided to take a step over to the dark side and procure myself a P4.

I had been running two instances of the 1.25 client until I chanced upon the home page of the official site. I did not read a thing, but downloaded and installed v2.0 of the SoB client.

Dang, it is fast! Gawd, I just love using this P4 with XP Pro SP2, lol.

Jammy


same stuff

My new P4 3GHz w/HT is blazing!
