Page 1 of 18
Results 1 to 40 of 709

Thread: Sieve Client Thread

  1. #1
    Well, all this competition has proven productive: I have produced a version that is 45% faster than the 1.10 version at p=150G, though you may get different results depending on your platform and the p values that you are using. However, I have tried to be as friendly as possible to different processors.

    At the end of the day, Phil and I both know that 99% of the time in the code is spent doing a couple of things, and you can only optimise those things so far. I would expect that, in the limit, the only difference between our programs on the same platform would be the OS overhead, which is probably why his Linux version is faster than the Windows version!

    Regards,

    Paul.

  2. #2
    Originally posted by paul.jobling
    Well, all this competition has proven productive
    ...
    I think cooperation works best when both brains are truly primed for tackling the problem, cobwebs dusted off and all that. I think we're both at ramming speed now, aren't we? The amusing thing is that I know I can make my latest version much faster, but a quick attempt failed curiously, and I wanted to draw a line under it tonight for Sander.

    On the website is version 009, which is over times the speed of 006 (my initial release). That could potentially make it 5 or more times the speed of the old SoBSieve that people were using only a few days ago. My Duron 900 does 10M in 325 seconds, or >30,750 p/s.

    Solaris, unfortunately, failed to take to the latest improvement and got horribly slower. I might make a 008+ or a 009- for Solaris, though, which would have only 1 of the 2 optimisations that I put in between 008 and 009.

    Don't worry folks, this "competition" isn't a damaging one at all, I simply come up with my best ideas when under pressure, and I want to maintain that pressure just for another day or two. I have one more path I _must_ follow, and I'm just a bit focussed on it at the moment. To misquote Homer Simpson - "can't talk, coding".

    Of course my code, when it reaches its final state in C, would benefit from some asm in the important places, but I'm not sure I'm up to the job. Paul, however, is a pretty nifty asm coder, so the conclusion is obvious.

    (I remember when we were in that pub with the biltong in Reading, you tried to tell me that I was an alright asm programmer, Paul, and I denied it. Well, here's the proof - I can't asm-ify this for toffee, even though it shouldn't be hard at all!)

    So, Windows and Linux dudes, grab yourself a 009, and report back, please :-)

    Phil

  3. #3
    Remember, please use this thread for discussion of the client. Use the other thread to coordinate your sieving.

  4. #4
    OK, so with SoBSieve 1.11 I should NOT change the alpha from 1? Or was that 1.10 specific? >.<

  5. #5
    Moderator ceselb (Join Date: Jun 2002; Location: Linkoping, Sweden; Posts: 224)
    Originally posted by RangerX
    OK, so with SoBSieve 1.11 I should NOT change the alpha from 1? Or was that 1.10 specific? >.<
    You could always change it if you wanted. The reminder was to get people to stop using 3.2. An alpha of 1 should be near optimal right now, afaik.

    Btw, my PIV isn't getting the stated 45% increase, but instead around 6%. Not that I'm complaining or anything.

  6. #6
    The amusing thing is that I know I can make my latest version much faster, but a quick attempt failed curiously, and I wanted to draw a line under it tonight for Sander.
    Phil, please don't let me stop you. It's just an old P3 that is going to run until someone reboots it (probably sometime next week), so it's only of small use to the project anyway. (And I'll have access to it until tomorrow evening ;-))

    Actually I prefer SoBSieve. I can set that one up as a Windows service. (The PC automatically switches on every evening in case someone turns it off.)

    I'll do some testing tomorrow.

  7. #7
    Originally posted by smh
    Phil, please don't let me stop you
    ...
    I just think it would be stupid for me to spend a day trying to make this optimisation again, and to fail again, so I'm making sure that that with which I am happy is available.

    009 is between one and a half and twice as fast as 008 on my Durons, I thought it was worth getting it out there sooner rather than later.

    Phil

  8. #8
    Junior Member (Join Date: Nov 2002; Location: Saline, MI; Posts: 9)
    Not that it particularly matters, but the newest version of SoBSieve is giving me much larger numbers for sec/n/k. Before, I was getting around 7 or 8 when I wasn't doing other stuff, and now it's up in the 1400s. Why the big change?

    Also, SoBSieve automatically highlights all the factors in the window every time you re-open it. Is there some way this could be disabled?

  9. #9
    Sieve it, baby! (Join Date: Nov 2002; Location: Potsdam, Germany; Posts: 959)
    Phil:
    Got a 75% performance increase with my P3-m and even a bit more with the Duron. Well Done!

    One suggestion:
    Could you make the batch file customizable? Perhaps with a parameter that's printed in front of the program call.
    Then one could use a "start /low NbeGon[...]" command...

  10. #10
    Moderator ceselb (Join Date: Jun 2002; Location: Linkoping, Sweden; Posts: 224)
    I did a little timed run to compare the speed of the sievers. Range 25 - 25.01G.

    SoBSieve 1.11 12m03s
    NbeGon_009 7m30s

    Conclusion: PIV should use NbeGon!

  11. #11
    Sieve it, baby! (Join Date: Nov 2002; Location: Potsdam, Germany; Posts: 959)
    My suggestion above is more or less obsolete!
    I just created another batch file containing "start /low sob.bat"...

    But maybe you could add a key (combination) that ends the program after updating the start point in the batch file?
    That way no work is lost when one stops the program.

  12. #12
    Moderator ceselb (Join Date: Jun 2002; Location: Linkoping, Sweden; Posts: 224)
    Did another test, on an old PII this time. Range 25 - 25.005G.

    SoBSieve 1.11 14m09s
    NbeGon_009 7m39s

    Conclusion: PII (and probably P3) should use NbeGon!

  13. #13
    My Athlon 1.33GHz:

    SoBSieve 1.11 (a=1.0) --> 25,000 p/sec
    NBeGon 009 (d=2.9) --> 55,000 p/sec (!!)

    Wow. My results may not be universal because my range is low, but those speeds are amazing. Good job Paul and Phil! I don't think people (even those who have seen all the recent improvements in the public sieves) really grasp just how amazing these speedups are!

    Just to give everyone a little perspective, let's look at the previous speed king: NewPGen 2.70. Don't get me wrong, it's a GREAT program and much faster than other sieves, but the ability to sieve all 12 k's at once is a really big deal. I just benchmarked p=~14 billion for k=27653 from n=3-20 million. This runs at 11,500 p/sec. Now, correct for the fact that you have to sieve each k separately and you're down to an average total speed of 960 p/sec. That doesn't even take into account the fact that most of the sieving for SB was done with an out-of-date version of NewPGen (v2.40, I think) that ran about 40% slower. This means the previous effective rate was (assuming I had used this computer):

    580 p/sec.
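The arithmetic behind this estimate can be re-run in a few lines of Python. This is only a check of the figures quoted in the post; the function name is illustrative, and "about 40% slower" is read as "runs at 60% of the speed", which is the reading that reproduces the ~580 p/sec figure.

```python
# Re-run of the estimate in the post, using only the figures quoted there.

def effective_rate(single_k_rate, num_k, relative_speed=1.0):
    """Combined p/sec when each k must be sieved in its own run.

    relative_speed scales for an older, slower build
    (0.6 = "about 40% slower", an interpretation, not a measurement).
    """
    return single_k_rate / num_k * relative_speed

newpgen_270 = effective_rate(11_500, 12)        # ~958 p/sec (the post's "960")
newpgen_240 = effective_rate(11_500, 12, 0.6)   # ~575 p/sec (the post's "580")
speedup = 55_000 / newpgen_240                  # NBeGon 009 on the same Athlon

print(f"NewPGen 2.70, 12 separate runs: {newpgen_270:.0f} p/sec")
print(f"NewPGen 2.40, 12 separate runs: {newpgen_240:.0f} p/sec")
print(f"NBeGon 009 speedup: about {speedup:.0f}x")
```

The speedup comes out at roughly 96x, consistent with the "100x" quoted in the post.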

    I did eventually upgrade a few months ago, but still... to consider that we are now capable of running at 100x the previous speed is insane. My workstation is now capable of doing in a day what I normally did on the U of M supercomputing cluster in a week!

    Now add to the 100x speedup the fact that it is a public search and we have effectively 500-1000x the previous sieving power. Sit back and think about that for a minute and just try and tell me that you don't love Paul and Phil for all their hard work.

    -Louie

  14. #14
    Senior Member dmbrubac (Join Date: Dec 2002; Location: Ontario, Canada; Posts: 112)

    What if...

    ... we have effectively 500-1000x the previous sieving power.


    Please don't hurt me Mr. Moderator, but...

    Is there similar optimization room in the SoB client? I won't pretend to understand the math, but it just stands to reason.

    Instead of my current production sitting at about 300, it could be at 30,000! Instead of the entire production sitting at about 90 M, it would be 9 G.

    Also considering that these enhancements were done by two people over just a few days (?) I believe it is worth the effort.

  15. #15

    Re: What if...

    Originally posted by dmbrubac


    Please don't hurt me Mr. Moderator, but...

    Is there similar optimization room in the SoB client? I won't pretend to understand the math, but it just stands to reason.
    ...
    There already has been a lot of optimization done to the code.. if you only knew how slow the client was when the project first started.

    I'm sure there's room for some improvement.. there always is.. but I'll let Louie field the rest of the answer.

  16. #16
    SoB uses George Woltman's FFT libraries, which are insanely well optimised (it's almost all very carefully hand-crafted assembler). They have been worked on over a number of years (about 10 now? Maybe more?) for the Mersenne Prime project. Consequently, I don't think the project will see any major increase in speed. As Alien mentioned, there already was a huge speed increase when the Woltman libs first started to be used.

    Mike Bell.

    P.S. Great to see Paul and Phil both working on this, Paul's code was already a few times faster than mine before the last round of optimisations.....

  17. #17
    There is actually someone working on the asm core of SB. I don't know how public he wants his effort to be or I'd post more, but there is definitely some effort directed at it.

    Any asm folks out there who need a happy-fun weekend project are welcome to email me for the code. This is not a good "learning" or "first stab at asm" type of project... but if you think you have a good command of x86 asm, drop me a line and I'll hook you up.

    -Louie

  18. #18
    Phil's client (NbeGon): (OpenBSD p3-500)

    d=1
    version: 008
    9805

    version: 009

    ---
    Speed testing:

    13140 d=1
    14445 d=1.5
    14991 d=1.75
    15283 d=2
    15531 d=2.5
    15729 d=3
    15772 d=3.5
    15801 d=4

    --
    Alien88

  19. #19
    Wow!!! NbeGon_009 runs at 73,000 p/s on my Athlon XP 1800+! And this is pure, portable C code! I can't wait to see how high this goes when further optimized and the core routines are converted to ASM. 'Tis already almost 6x faster than the original SOBSieve. Go Phil and Paul!!

    Greg

  20. #20
    Senior Member dmbrubac (Join Date: Dec 2002; Location: Ontario, Canada; Posts: 112)

    Re: Re: What if...

    Originally posted by Alien88
    There already has been a lot of optimization done to the code.. if you only knew how slow the client was when the project first started.
    ...
    all righty then.

  21. #21
    Phil,

    It would be convenient if unit suffixes such as G could be used in the range settings for NBeGon.

    i.e.

    ./NbeGon_009_x86 -s=SoB.dat -d=2.0 -p=200G-210G &

    instead of

    ./NbeGon_009_x86 -s=SoB.dat -d=2.0 -p=200000000000-210000000000 &

    It would help me do automated execution on my cluster. Thanks.

    -Louie

    *EDIT* - I already found a workaround. I don't need this feature, although I think it might be good for other people.
    Last edited by jjjjL; 01-24-2003 at 01:15 AM.
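A wrapper script can do this expansion before calling the siever. The sketch below is hypothetical: `parse_range` and `parse_bound` are not part of NbeGon, and treating K, M, G and T as powers of 1000 is an assumption (the post only shows G = 10^9, as in 200G = 200000000000).

```python
from decimal import Decimal

# Assumed suffix meanings; only "G" = 10**9 is confirmed by the post.
SUFFIXES = {"": 1, "K": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}

def parse_bound(text):
    """Turn "210G" or "25.01G" into an exact integer."""
    text = text.strip().upper()
    suffix = text[-1] if text and text[-1] in SUFFIXES else ""
    number = text[:-1] if suffix else text
    # Decimal keeps fractional values such as "25.01G" exact.
    value = Decimal(number) * SUFFIXES[suffix]
    if value != value.to_integral_value():
        raise ValueError(f"bound {text!r} is not a whole number")
    return int(value)

def parse_range(spec):
    """Turn "200G-210G" into (200000000000, 210000000000)."""
    low, high = spec.split("-")
    lo, hi = parse_bound(low), parse_bound(high)
    if lo >= hi:
        raise ValueError("range start must be below range end")
    return lo, hi

lo, hi = parse_range("200G-210G")
print(f"./NbeGon_009_x86 -s=SoB.dat -d=2.0 -p={lo}-{hi} &")
```

The printed command is the same as the long-hand form Louie gives in the post.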

  22. #22
    Originally posted by jjjjL
    Phil,
    ./NbeGon_009_x86 -s=SoB.dat -d=2.0 -p=200G-210G &
    Good suggestion. v010 will be out tomorrow. It promises to be nearly 50% faster (that is a 3:2 ratio) than v009. (I made one lousy typo when doing the optimisation yesterday).

    I'm praying that this means that someone will post a 100,000p/s figure, which I think will make me quite happy.

    I'm heading off to HUT's CSC now to compile native versions for every architecture I can find there. So, briefly, it will run on the 37th most powerful supercomputer in the world!
    Then I'm off to a beer festival. Exactly what happens after that no one knows, but I'm sure Paul can guess.

    What I'll do for the 24 hours between now and release is run 010 on a vast range that someone else has already tested with SoBSieve, so that everything can be double-checked.
    Given that most factors come out at low p, I'll simply run from 10G upwards and gather a huge number.
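A double-check like this boils down to comparing two factor lists over the same p range. The sketch below assumes one factor per line in the "p | k*2^n+1" form quoted elsewhere in this thread; the real output files of SoBSieve and NbeGon may contain other lines, which this simply skips.

```python
import re

# One factor per line, e.g. "280000989923 | 21181*2^15223028+1" (assumed format).
FACTOR_LINE = re.compile(r"^\s*(\d+)\s*\|\s*(\d+)\*2\^(\d+)\+1\s*$")

def read_factors(lines):
    """Collect (p, k, n) triples, skipping lines that are not factor lines."""
    found = set()
    for line in lines:
        m = FACTOR_LINE.match(line)
        if m:
            found.add(tuple(int(g) for g in m.groups()))
    return found

def compare(reference_lines, candidate_lines):
    """Summarise agreement between a trusted run and a new build."""
    ref = read_factors(reference_lines)
    cand = read_factors(candidate_lines)
    return {
        "agree": len(ref & cand),
        "missing": sorted(ref - cand),  # found by the reference run only
        "extra": sorted(cand - ref),    # found by the new build only
    }
```

Open file objects can be passed directly, since iterating a file yields its lines, for example `compare(open("sobsieve_out.txt"), open("nbegon_out.txt"))` with hypothetical file names. A clean run shows empty `missing` and `extra` lists.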

    Phil

  23. #23

    DO NOT use OpenBSD's NbeGon_009_OB

    Originally posted by Alien88
    Phil's client (NbeGon): (OpenBSD p3-500)
    The first bug report is in! However, so far it is only reproducible on Alien88's OpenBSD machine with version 009.

    I have just tested the following, and none of them shows the error:
    OpenBSD 008 (on Alien88's machine)
    Linux 009 (tested on Duron, and PPro)
    Windows 009 (tested on Duron)
    Alpha 009
    Sparc 009(-FP16)

    The difference between 008 and 009 is the replacement of the baby-step integer code with FP equivalent, which means that it _could_ be chip-specific rather than OS/build specific. The OpenBSD code was compiled with gcc 2.95, which is old, and not exactly bug-free, so I hope it's a compiler problem!

    Anyway, if someone can check the following range
    -p=280000900000-280001000000
    for the existence of
    280000989923 | 21181*2^15223028+1
    on either other OpenBSD systems, or on a P3 using the linux/windows code, I'd be grateful. (a P4 verification would be nice too).
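A single reported factor can also be checked independently of either siever: p divides k*2^n+1 exactly when k*2^n = -1 (mod p). Python's built-in three-argument `pow` performs the modular exponentiation, so the roughly 4.6-million-digit number 21181*2^15223028+1 is never built. This is a sketch for spot-checking only, not how the sievers find factors.

```python
def divides(p, k, n):
    """True if p divides k*2^n + 1, i.e. k*2^n = -1 (mod p)."""
    # pow(2, n, p) computes 2^n mod p without ever forming 2^n itself.
    return (k * pow(2, n, p) + 1) % p == 0

# The factor reported in the post; this should print True if it is genuine.
print(divides(280000989923, 21181, 15223028))
```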

    Thanks,
    Phil

  24. #24
    Member (Join Date: Sep 2002; Location: London; Posts: 94)
    I found that factor using PII500 and win2000 and 009.

    Yours,
    Nuutti

  25. #25
    Found it on a PIII 450 (using version 9).

    Cheers,

    ola

  26. #26
    Originally posted by nuutti
    I found that factor using PII500 and win2000 and 009.
    Thanks Nuutti - I've just worked out what the problem is.
    (Literally 30 seconds ago - so yes, I'm not at HUT, I'm debugging instead, but will be at the beer festival - fancy coming? Gallows Bird, Meritulentie 30, in Niittykumpu, http://www.kolumbus.fi/gallows/ra03festival.html).

    It seems that under all x86 OSes apart from OpenBSD, the processor's default FP precision state is extended precision (80-bit) rather than double precision (64-bit), and I wasn't setting the state manually myself. I have hardcoded some magic numbers that assume 80-bit on x86 and 64-bit on other processors, so under OpenBSD my rounding was being b0rked completely. Once I've worked out which bits to diddle in the FPU control word, I should be as right as rain.
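The two ingredients of this bug can be illustrated in Python. Neither piece is taken from NbeGon: the magic constants below are the textbook "add a large constant, then subtract it" rounding trick, assumed to be the kind of number the post refers to. Python floats are IEEE doubles (53-bit mantissas), so they behave like an x87 FPU set to double precision, which is the OpenBSD situation described above. (In C on x86 Linux, glibc exposes the control word through the `_FPU_GETCW`/`_FPU_SETCW` macros in `<fpu_control.h>`; other systems have their own interfaces.)

```python
# x87 FPU control word: bits 8-9 are the precision-control (PC) field.
PC_MASK = 0x0300
PC_SINGLE, PC_DOUBLE, PC_EXTENDED = 0b00, 0b10, 0b11  # 24/53/64-bit mantissas

def set_precision(control_word, pc):
    """Return control_word with its PC field replaced by pc."""
    return (control_word & ~PC_MASK) | (pc << 8)

def precision_field(control_word):
    """Extract the PC field (0..3) from a control word."""
    return (control_word & PC_MASK) >> 8

# Rounding trick: after adding 1.5 * 2**(m - 1), where m is the mantissa
# width, the unit in the last place is 1, so the FPU itself rounds x to the
# nearest integer; subtracting the constant again leaves that integer.
MAGIC_53 = 1.5 * 2**52   # right for double precision
MAGIC_64 = 1.5 * 2**63   # right only for 80-bit extended precision

def round_via_magic(x, magic):
    return (x + magic) - magic

# 0x037F is the control word loaded by FINIT: extended precision, exceptions masked.
print(hex(set_precision(0x037F, PC_DOUBLE)))   # 0x27f
print(round_via_magic(12345.6, MAGIC_53))      # 12346.0
print(round_via_magic(12345.6, MAGIC_64))      # 12288.0, snapped to a multiple of 2**11
```

The last line shows the failure mode: a constant chosen for 64-bit mantissas, used in 53-bit arithmetic, leaves a unit in the last place of 2048 instead of 1, so "rounding" lands on the wrong integer.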

    Because I'm not at HUT, I'm doing a shorter test, 10G-10.1G, for 010 (it finished in 5950s on a 533MHz machine; I'm now waiting for SoBSieve, and we have 90 out of 90 agreement so far :-) ), and I'll get this latest version out later today, _before_ going to the beer festival.

    Phil

  27. #27
    Member (Join Date: Sep 2002; Location: London; Posts: 94)
    I'll have to miss it, but maybe some other time during this winter/spring?
    (I will move to Chicago, IL next summer.)

    Yours,

    Nuutti

  28. #28
    Hi all,

    Okay, here is the latest version. This is more cache-friendly, so it should be faster, as it spends less time waiting for the processor to get memory into its internal cache; I have also tried to keep the number of instructions to a minimum, which should make it faster. It is certainly faster on this machine.

    Also, I have removed the message box on startup. The range is now in the title bar - thanks for the suggestion!

    Oh, and the rate now shows the number of seconds between removals on a per-k basis - so if there are 10 k's, and we are removing one n every 2 seconds, then the rate is 20 seconds (on average) to remove an n for each k. This is a better measure.
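The per-k rate described here is a single multiplication. The function name below is illustrative; it restates the arithmetic in the post and is not SoBSieve's code.

```python
def seconds_per_removal_per_k(seconds_per_removal, num_k):
    """Average seconds between removals for any single k.

    If the sieve as a whole removes one n every `seconds_per_removal`
    seconds, spread over `num_k` values of k, each k sees a removal
    only every seconds_per_removal * num_k seconds on average.
    """
    return seconds_per_removal * num_k

print(seconds_per_removal_per_k(2, 10))  # 20, the example in the post
```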

    Regards,

    Paul.
    Last edited by paul.jobling; 01-24-2003 at 10:16 AM.

  29. #29
    Senior Member dmbrubac (Join Date: Dec 2002; Location: Ontario, Canada; Posts: 112)
    Look at "Analyzing Processor Activity" for more info on process and thread priority. The combinations and permutations give a total of 31 priority levels.

    Also, before the two threads were split, I responded to Phil about NbeGon and SoB not sharing CPU time well. I found that if I ran SoB at Normal and NbeGon at Below Normal, they each received 48-49%.

    I know it's not recommended to run SoB at Normal, but too bad.

  30. #30
    Originally posted by paul.jobling
    Okay, here is the latest version.
    Snap.

    My 010 has just passed a very lengthy test, and is available from http://fatphil.org/maths/sierpinski/bin/ . One and a half times the speed of 009 on linux and windows, as hoped.

    The dimension parameter will now almost certainly want to be higher - I'm using -d=8.0 for optimal results on my machine.

    That's all for today. Beer has priority over coding...

    Phil

  31. #31
    Member (Join Date: Sep 2002; Location: London; Posts: 94)
    Nice job !
    I tested range 150,000,000,000 -> 150,001,000,000
    and it took 48 sec using your sieve and 150 sec using Paul's
    Rates : Phil 21,000 and Paul : 9,100

    I have a PII 500.


    Nuutti

  32. #32
    Moderator ceselb (Join Date: Jun 2002; Location: Linkoping, Sweden; Posts: 224)
    If anybody needs a test range, here's one:
    25 - 25.34G. Done by SoBSieve 1.06, so it should be accurate.

  33. #33
    Originally posted by FatPhil
    ...
    The dimension parameter will now almost certainly want to be higher - I'm using -d=8.0 for optimal results on my machine.
    On my Athlon XP 1800+, the optimal d parameter is -d=4.5, and there it checks 97000 p/s. I'm sure someone with a faster Athlon will report your 100,000 p/s number.

    Greg

  34. #34
    Originally posted by frmky
    On my Athlon XP 1800+, the optimal d parameter is -d=4.5, and there it checks 97000 p/s. I'm sure someone with a faster Athlon will report your 100,000 p/s number.

    Greg
    I can't believe my P4 2.2G only gets 50,875 p/s...

  35. #35
    I don't know if everyone saw the other thread where I posted, but this may help Phil/Paul with verification:

    http://www.seventeenorbust.com/sieve/results.txt

    Results below 100G are not shown, to keep it from being too long. It will rebuild itself every night at 3:45am EST.

    It's meant to be a dirty dump from the dB just to make sure there aren't gaps in the sieve, but you could easily reconstruct it into sieve output for testing.

    -Louie

    PS - Athlon 1.33 = 88,000 p/sec with NBeGon 0.10 (d=3.5)

  36. #36
    Just in time to give my pc a last update before flying to KL tomorrow morning.

    My P3 450 MHz gets 19.2K
    FSB=100 MHz, 128 MB SDRAM (100 MHz)

    My P4 2400 MHz gets 55.2K
    FSB=533 MHz, 256 MB DDR RAM (166 MHz)

    It seems all P4s score much lower than expected.

    Any ideas why?

  37. #37
    Senior Member (Join Date: Jan 2003; Location: UK; Posts: 479)
    Originally posted by frmky
    On my Athlon XP 1800+, the optimal d parameter is -d=4.5, and there it checks 97000 p/s. I'm sure someone with a faster Athlon will report your 100,000 p/s number.
    Phil, we are there!! (frmky thx for the d=4.5)

    AMD XP 2100+, Win2K, NbeGon_010 ...... 112Kp/s

    SoBSieve 1.12 (alpha = 1.0) ..... 43Kp/s

    (both using p=670G)

  38. #38
    Sieve it, baby! (Join Date: Nov 2002; Location: Potsdam, Germany; Posts: 959)
    My results with NbeGon010:

    1 GHz P3-m: ~63kp/sec d=4.00
    900 MHz Duron: ~46kp/sec d=7.20

    smh:
    Did you tweak the alpha values (separately for each system)? As you can see in my example, the optimal value changes a lot between different architectures!

  39. #39
    Member (Join Date: Sep 2002; Location: London; Posts: 94)
    It would be easier to find the correct alpha if the program could calculate
    the rate directly. Right now a clock, paper and pencil are required.
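Until the programs report it themselves, the "clock, paper and pencil" step is a one-line division. The helper below is hypothetical, not part of either siever.

```python
def sieve_rate(p_start, p_end, seconds):
    """p/sec for a timed run over [p_start, p_end]."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")
    return (p_end - p_start) / seconds

# Nuutti's NbeGon timing from earlier in this thread: 1,000,000 p in 48 s.
print(round(sieve_rate(150_000_000_000, 150_001_000_000, 48)))  # 20833
```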

    Yours,

    Nuutti

  40. #40
    Sieve it, baby! (Join Date: Nov 2002; Location: Potsdam, Germany; Posts: 959)
    Well, I just let it run for approx. 1 minute (started at a round value) and look at where it stopped (actually, I take another round value somewhere nearby). Then I vary the alpha value and note how long it takes to reach that mark (from the same start, of course), until I find the alpha that needs the least time to get there. With a fixed parameter value, the time is quite constant too, provided your computer is doing no other tasks at that time.
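The tuning procedure above (same start, same stop mark, one timed run per candidate value, keep the fastest) can be sketched as follows. `run_sieve` is a hypothetical stand-in for launching the siever and waiting until it passes the mark; it is not a real SoBSieve or NbeGon API. The knob is alpha for SoBSieve and d for NbeGon.

```python
import time

def time_to_mark(run_sieve, value):
    """Wall-clock seconds for one call of the (hypothetical) run_sieve(value)."""
    start = time.perf_counter()
    run_sieve(value)
    return time.perf_counter() - start

def best_parameter(measure, candidates, repeats=1):
    """Return (best value, {value: seconds}); measure(value) returns seconds."""
    timings = {}
    for value in candidates:
        # Keep the fastest of `repeats` runs to damp noise from other tasks.
        timings[value] = min(measure(value) for _ in range(repeats))
    best = min(timings, key=timings.get)
    return best, timings
```

With a real launcher supplied, a call such as `best_parameter(lambda a: time_to_mark(run_sieve, a), [0.8, 1.0, 1.2, 1.5])` does the whole sweep. Passing `measure` in as a function keeps the selection logic separate from how the time is obtained, so a stopwatch reading typed in by hand works just as well.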


    btw.:
    What the hell happened to the memory needs of NbeGon?!??!
    Up until 009 (incl.), it needed more than 20 MByte. Now it fluctuates between 300 KByte and 3 MByte!
