Thread: v2 client for SSE2 processors (FASTER!)

  1. #41
    Cmarc (Member) | Join Date: Dec 2002 | Location: SF Bay Area | Posts: 70
    Gothmog:/home/marc# dpkg -l libc6
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
    |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
    ||/ Name Version Description
    +++-==============-==============-============================================
    ii libc6 2.2.5-11.5 GNU C Library: Shared libraries and Timezone

  2. #42
    Originally posted by Cmarc
    Gothmog:/home/marc# dpkg -l libc6
    Desired=Unknown/Install/Remove/Purge/Hold
    | Status=Not/Installed/Config-files/Unpacked/Failed-config/Half-installed
    |/ Err?=(none)/Hold/Reinst-required/X=both-problems (Status,Err: uppercase=bad)
    ||/ Name Version Description
    +++-==============-==============-============================================
    ii libc6 2.2.5-11.5 GNU C Library: Shared libraries and Timezone
    How does version 1.2.5 work fine on this machine?

    Cheers,
    Louie

  3. #43
    Cmarc (Member) | Join Date: Dec 2002 | Location: SF Bay Area | Posts: 70
    works fine

  4. #44
    If you don't have /etc/debian_version at 3.1 and you are set on running stable, at least upgrade to that. I personally run sid, with some grabs from experimental, and it's more than stable enough. Sarge will be out soon, and that has 2.3.2 right now.

  5. #45
    Cmarc (Member) | Join Date: Dec 2002 | Location: SF Bay Area | Posts: 70
    OK, I've updated to the latest unstable release. sb 1.25 still runs fine, but 2.0 still segfaults. libc6 is now at version 2.3.2. Since this is a dedicated cruncher, I'm going to do a fresh install (it needs a lighter install anyhow; there is still too much unnecessary stuff on that system). I'll chime in tomorrow with results on the other distribution. I somehow doubt it'll change anything, but you never know.
    Cheers,
    Marc

  6. #46
    Make sure that cat /proc/cpuinfo shows sse2 in the flags line. It probably will, but it's worth checking. Which version of the kernel are you running?
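
A minimal sketch of that check in Python, assuming a Linux x86 box where /proc/cpuinfo carries a "flags" line. The helper names cpu_flags and has_sse2 are made up for illustration and have nothing to do with the SB client itself.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # On x86 Linux the feature list sits on a line that starts with "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_sse2(path="/proc/cpuinfo"):
    """True if the first CPU listed in `path` reports the sse2 flag."""
    try:
        with open(path) as handle:
            return "sse2" in cpu_flags(handle.read())
    except OSError:
        # File missing or unreadable (e.g. not Linux): report "not detected".
        return False


if __name__ == "__main__":
    print("sse2 detected" if has_sse2() else "sse2 not detected")
```

Matching whole flag names avoids false hits that a plain substring search could produce, and an Athlon XP style flags line (sse but no sse2) comes back negative.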

  7. #47
    Cmarc (Member) | Join Date: Dec 2002 | Location: SF Bay Area | Posts: 70
    royanee: nope, that wasn't it, but good idea.

    Anyhow, to cut a long story short, I installed a minimal SuSE 9.1 on that particular box, and the client seems to be working fine, producing approximately 1.5M cEMs/sec. This won't help much with troubleshooting the problem, but it gives me a quick resolution and also lets me get a slimmer system on that box.

    Thanks for the suggestions.

    Cheers,
    Marc

  8. #48
    I love 67607 | Join Date: Dec 2002 | Location: Istanbul | Posts: 752
    The new client crashed at the very end of the test for k=4847, n=6997023 on an AMD Athlon(tm) XP 2400+.

    I reinstalled 125, completed and sent in the result, then reinstalled 20SSE2. This time it started to crash at the very beginning.

    This post is just for feedback on the client.

    I'm moving back to sieve on this machine.

  9. #49
    Senior Member | Join Date: Apr 2002 | Location: Oosterhout, Netherlands | Posts: 223
    This client is meant for processors with SSE2 support (hence the topic name).
    Athlon XPs do not support SSE2, so the client will not work on them (or, as in your case, will crash).
    Proud member of the Dutch Power Cows

  10. #50
    I love 67607 | Join Date: Dec 2002 | Location: Istanbul | Posts: 752
    Ooops.

  11. #51
    Death (Unholy Undead) | Join Date: Sep 2003 | Location: Kyiv, Ukraine | Posts: 907 | Blog Entries: 1

    same stuff

    http://distributed.org.ru/forum/?a=t...pic=122&page=6

    P4 2.4HT Socket478, old client ~450.000 cEMs/sec, new client ~1.350.000 cEMs/sec

    Death
    SB v2.0 SSE2 says in a log v1.10+

    A few questions: why does it write 1.10+ in the log, and should he use HT for SB?
    wbr, Me. Dead J. Dona \


  12. #52
    I'm baffled as to why my AMD64 3000+ is not coming anywhere close to the P4 machines in terms of cEMs/s rates. Right now I'm sitting at just below 1mil cEMs/s with the new SSE2 client, while my 2.4GHz P4 Celeron is at 600,000cEMs/s. Any ideas?
    -Wu

  13. #53
    I heard once that even though those AMDs have SSE2, their implementation of it is vastly poorer than Intel's. Thus using SSE2 on an AMD proc will improve the speed over an AMD proc without SSE2, but will be much slower than an Intel proc with SSE2.

  14. #54
    Senior Member | Join Date: Apr 2002 | Location: Oosterhout, Netherlands | Posts: 223
    Originally posted by UoMDeacon
    I'm baffled as to why my AMD64 3000+ is not coming anywhere close to the P4 machines in terms of cEMs/s rates. Right now I'm sitting at just below 1mil cEMs/s with the new SSE2 client, while my 2.4GHz P4 Celeron is at 600,000cEMs/s. Any ideas?
    My Athlon64 3400+ is doing 1.1 million cEMs/s, so your result is not that out of the ordinary. You may find it low, but at least it is 'normal'.
    Proud member of the Dutch Power Cows

  15. #55
    Theadalus (Grutte Pier [Wa Oars]) | Join Date: Oct 2004 | Location: Home of DPC | Posts: 37
    Originally posted by UoMDeacon
    ..., while my 2.4GHz P4 Celeron is at 600,000cEMs/s.
    Isn't that a bit low (my "normal" 2.4GHz P4 is doing 1.43M cEMs/s)?
    Powered by: Warlock, Necromancer and Sorcerer

  16. #56
    Junior Member | Join Date: Jan 2003 | Location: Spain | Posts: 12
    Originally posted by larsivi
    I heard once that even though those AMDs have SSE2, their implementation of it is vastly poorer than Intel's. Thus using SSE2 on an AMD proc will improve the speed over an AMD proc without SSE2, but will be much slower than an Intel proc with SSE2.
    I think that isn't the reason.

    I think that the P4 and K8 are equal in performance in SSE2 AT THE SAME CLOCK SPEED. Both of them execute the same number of instructions per clock. But the P4 has an advantage in clock speed, giving it more power.

    At least, that's what I once read somewhere....

  17. #57
    Member | Join Date: Oct 2002 | Location: Austria | Posts: 37
    quote:
    --------------------------------------------------------------------------------
    Originally posted by UoMDeacon
    ..., while my 2.4GHz P4 Celeron is at 600,000cEMs/s.
    --------------------------------------------------------------------------------

    Isn't that a bit low (my "normal" 2.4GHz P4 is doing 1.43M cEMs/s)?

    my 2.4GHz P4 Celeron (FSB 122MHz) does 700,000cEMs/s with fast RAM
    my 2.66GHz P4 (FSB 134MHz) does 1,700,000cEMs/s also with fast RAM

    i think the larger cache of the real P4 makes the difference

  18. #58
    Member | Join Date: Oct 2002 | Location: Austria | Posts: 37
    after a reboot (it was on for some weeks) the Celeron reaches 1,200,000cEMs/s

  19. #59
    AMD Fanboy | Join Date: Jun 2003 | Location: Gdansk, Tricity | Posts: 29
    Great work guys!

    I am now able to finish at least 14 WU's per month on my 2.8E @3.1GHz, ~2M cEM/s .

    I have calculated that, for the same total production as with the non-SSE2 version, I would save about US$10 on electricity per month.

    (Although, I'll still leave my PC's on 24/7 so there isn't going to be a difference)

    Thanks again!

  20. #60
    vjs (Moderator) | Join Date: Apr 2004 | Location: ARS DC forum | Posts: 1,331
    Yes, the sse2 version of the client is much faster. Just a shout out to all those who don't know.

    If you decide to donate some of your precious clock cycles to the double check effort, the blocks will complete very quickly...

    Enter the name supersecret to receive tests of n=~1.08m
    or garbage for tests n= ~3.8m

    A supersecret test k/n pair will finish in around an hour.

    I still beg for a way to run ss or s k/n pairs for team credit; at least we could put our now very slow boxes without sse2 on these k/n pairs.

    Major reason: now that current k/n pairs take more than 1 week to complete, and sometimes more than 2 weeks on slow machines, aren't those machines better suited to these smaller tests? Meanwhile, the latest, fastest sse2 machines could continue to press the upper bounds ahead at breakneck speed.

    Third: It would give us sievers and p-1'ers more time to factor out some of the pairs before they are tested.

    Sorry for hi-jacking this thread.

    I suggest everyone with slow machines try supersecret to press ahead with the double check effort until we find the next prime, or at least for a few days just for the fun of it.


    You can check out the supersecret stats and advancement here:

    www.seventeenorbust.com/secret



    Note: "secret" was edited to "garbage" to reflect the correct name.
    Last edited by vjs; 11-04-2004 at 02:14 PM.

  21. #61
    Forgotten Member | Join Date: Dec 2003 | Location: US | Posts: 64
    VJS
    Enter the name supersecret to recieve tests of n=~1.08m
    or secret for tests n= ~3.8m
    I don't think you're correct. I have 3 clients running username 'secret' and they are all receiving tests in the 1.08m range; that's why 70-some tests in that range are being done in a day. I'll try supersecret when one of these tests finishes and see what is actually being handed out.

  22. #62
    vjs (Moderator) | Join Date: Apr 2004 | Location: ARS DC forum | Posts: 1,331
    Thanks for the correction, pixl97. I never use the name secret, always supersecret. I guess garbage is the correct user name for that queue.

  23. #63
    Sieve it, baby! | Join Date: Nov 2002 | Location: Potsdam, Germany | Posts: 959
    secret and supersecret used to be different in the past, but since the change to the new queue system, they are the same.

  24. #64
    Originally posted by ceselb
    It had, but search for 'idwt percival' and you'll find some stuff about the optimizations done now.
    But wasn't this already used by GIMPS and therefore in V1.2?

  25. #65
    ceselb (Moderator) | Join Date: Jun 2002 | Location: Linkoping, Sweden | Posts: 224
    idwt was, but not all the tricks.

  26. #66
    I'm not understanding - what tricks?

  27. #67
    Crandall and Fagin's 1994 paper showed how to use Irrational Base Discrete Weighted Transforms (IBDWT) for 2^n+/-1 values.

    Colin Percival's paper extends the IBDWT concept to work on values k*2^n+/-1 for small k values or for highly composite k values.

    I've extended Colin Percival's ideas so that even larger k values can use the IBDWT, but Colin Percival's method is still better for highly composite k values.


    In the old days, to multiply numbers using an FFT one would double the size of the input number by padding the upper half with zeros. The IBDWT trick lets you use a smaller FFT with no zero-padding, plus special weights, so that the FFT multiplication gives you a result modulo k*2^n+/-1.
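
To make the zero-padding point concrete, here is a small Python sketch of the simplest case only: k=1, modulus 2^n-1, with n an exact multiple of the FFT length, so every DWT weight is 1 and the "irrational base" part never shows up. The function names, the 8-bit digit size, and the 8-digit length are illustrative assumptions, and the toy recursive FFT is nothing like the client's real code.

```python
import cmath

BITS = 8                          # bits per digit (assumed for illustration)
N = 8                             # digits per operand
M = (1 << (BITS * N)) - 1         # modulus 2^64 - 1


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0] * n
    for i in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * i / n)
        out[i] = even[i] + w * odd[i]
        out[i + n // 2] = even[i] - w * odd[i]
    return out


def cyclic_convolve(x, y):
    """Cyclic convolution of two equal-length integer lists via FFT."""
    n = len(x)
    spectrum = [p * q for p, q in zip(fft(x), fft(y))]
    return [round(v.real / n) for v in fft(spectrum, invert=True)]


def to_digits(value, count):
    mask = (1 << BITS) - 1
    return [(value >> (BITS * i)) & mask for i in range(count)]


def from_digits(digits):
    return sum(d << (BITS * i) for i, d in enumerate(digits))


def mul_zero_padded(a, b):
    """Old way: pad to 2N digits, get the full product, then reduce mod M."""
    x = to_digits(a, N) + [0] * N
    y = to_digits(b, N) + [0] * N
    return from_digits(cyclic_convolve(x, y)) % M


def mul_cyclic(a, b):
    """Half-size FFT: the wraparound itself folds the product mod 2^(BITS*N) - 1.

    The final % M only stands in for carry propagation between digits.
    """
    x = to_digits(a, N)
    y = to_digits(b, N)
    return from_digits(cyclic_convolve(x, y)) % M


if __name__ == "__main__":
    a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
    print(mul_zero_padded(a, b) == mul_cyclic(a, b) == (a * b) % M)
```

Both routes agree with a*b mod M, but mul_cyclic uses transforms of length N instead of 2N. Handling n that is not a multiple of the FFT length, or k greater than 1, is where the weights from the papers above come in, and that part is not shown here.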

  28. #68
    Sieve it, baby! | Join Date: Nov 2002 | Location: Potsdam, Germany | Posts: 959
    Originally posted by prime95
    The IBDWT trick lets you use a smaller FFT, no zero-padding an FFT
    Under certain circumstances (k/n-wise), I still encounter zero-padded FFTs (I think they were 128k) when using PRP3 for the PSP.
    Is this intended or some sort of left-over?

  29. #69
    Is there a nice tutorial link that could explain to me how Fourier transforms allow us to figure out if a number is prime or not? Pretty pictures would be helpful...

  30. #70
    Originally posted by Mystwalker
    Under certain circumstances (k/n-wise), I still encounter zero-padded FFTs (I think they were 128k) when using PRP3 for the PSP.
    Is this intended or some sort of left-over?
    What I said before is correct in a general hand-wavy way. Now for a few more details.

    When k is one, you can use an FFT that is half the size of a zero-padded FFT. As k increases in size, a larger FFT size must be used. When k gets up to around 50,000 or so (the actual formula is complicated), then you end up using the same FFT size as the zero-padded FFT but at least you get the mod k*2^n+/-1 operation for free - so you still save some time.

    So, why are zero-padded FFTs ever used for SoB? Well, there are different FFTs used for k*2^n-1 and k*2^n+1, and there are more FFT sizes coded for k*2^n-1. So sometimes it is faster to use a zero-padded 2^2n-1 FFT than an IBDWT FFT of k*2^n+1, but only because of the extra FFT sizes available.

    There is one other time a zero-padded FFT will be used. The k*2^n-1 FFTs are slightly more accurate than the k*2^n+1 FFTs (about 1% higher n values are supported). I need to do some studies to see if that 1% discrepancy is too conservative.
    Last edited by prime95; 11-18-2004 at 09:56 PM.

  31. #71
    Sieve it, baby! | Join Date: Nov 2002 | Location: Potsdam, Germany | Posts: 959
    For Mersenne numbers, it's pretty easy:
    http://mersenne.org/math.htm

    I don't know the algorithm for Proth numbers, though.
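
For Proth numbers N = k*2^n+1 (k odd, k < 2^n), the textbook tool is Proth's theorem: if a^((N-1)/2) = -1 (mod N) for some a, then N is prime, and when N is prime every a with Jacobi symbol (a/N) = -1 passes. Below is a small Python sketch of that test. The function names are illustrative, it relies on Python's built-in pow rather than FFT multiplication, and it is not a description of what the SB client actually runs. At SoB sizes, the repeated squarings inside that one modular exponentiation are the part where fast FFT multiplication matters.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                      # quadratic reciprocity step
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def proth_is_prime(k, n):
    """Primality of N = k*2^n + 1 via Proth's theorem (k odd, k < 2^n)."""
    if k < 1 or k % 2 == 0 or k >= 2 ** n:
        raise ValueError("need odd k with 1 <= k < 2^n")
    N = k * 2 ** n + 1
    a = 2
    while True:
        j = jacobi(a, N)
        if j == 0:
            return False                 # a < N shares a factor with N
        if j == -1:
            break                        # found a suitable base
        a += 1
    # Euler's criterion: a prime N must give -1 here; Proth gives the converse.
    return pow(a, (N - 1) // 2, N) == N - 1


if __name__ == "__main__":
    for k, n in [(3, 2), (5, 3), (7, 3), (9, 4)]:
        print(k, n, k * 2 ** n + 1, proth_is_prime(k, n))
```

The base search stops quickly for most N. A perfect-square N never yields a Jacobi symbol of -1, so the loop runs until it reaches a factor and reports composite.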

  32. #72
    quv vaj Jammy (Senior Member) | Join Date: Dec 2001 | Location: San Diego, CA | Posts: 375

    My new P4 3 GHz w/HT is blazing!

    This is my first venture into Intel Land since 1997. After so many have teased me for running AMDs... I decided to take a step over to the dark side and procure myself a P4.

    I had been running two instances of the 1.25 client until I chanced upon the home page of the official site. I did not read a thing but downloaded and installed v2.0 of the SoB client.

    Dang it is fast! Gawd but I just love using this P4 with XP Pro sp2, lol.

    Jammy


    A great many people think they are thinking when they are really rearranging their prejudices.
    --William James



    Seventeen or Bust: Team Prime Rib

