Thread: Opteron DF performance?

  1. #1
    Senior Member
    Join Date
    Jan 2002
    Location
    England, near Europe
    Posts
    211

    Opteron DF performance?

    Has anybody run DF on an Opteron system yet? Has anybody seen word of the performance elsewhere?

    Cheers,
    Phil.

  2. #2
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    amdzone.com has some Seti@home benchmarks just up
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  3. #3
    Spending finger getting itchy again, Phil?

  4. #4
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    I noticed links to the Seti benchmarks over the last few days, and they had similar results with 64 bit code as we've had here, i.e. the 64 bit client ended up being slower than the 32 bit client. It'll be nice to see the benchmark differences between the 32 bit clients running on Opteron/Athlon64 vs. a normal Athlon/P4, though.

  5. #5
    Senior Member
    Join Date
    Jan 2002
    Location
    England, near Europe
    Posts
    211
    Originally posted by HaloJones
    Spending finger getting itchy again, Phil?

    Just evaluating potential future purchases. TBH, the Opteron hasn't impressed me with its benchmarks so far, while Intel seems to be pulling further and further ahead. My current P4 absolutely rocks and I am probably going to go the Xeon route when Intel implements the 800MHz FSB for that platform.
    Train hard, fight easy


  6. #6
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    Howard did mention he was gonna look at it
    Might be a client for it in say ohhhh, about a month would be perfect timing I would say

  7. #7
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Psssst, anyone visited http://www.ocworkbench.com lately?
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  8. #8
    Senior Member
    Join Date
    Apr 2002
    Location
    Oosterhout, Netherlands
    Posts
    223
    I have a dual Opteron 242 (1.6GHz) running here and yesterday I benchmarked the ECC2 and the distributed.net clients.

    Both performed about the same as running two XP2000+ chips.

    As long as there is no optimized client, you can say that a GHz of Opteron equals a GHz of Athlon XP.
    Proud member of the Dutch Power Cows

  9. #9
    Senior Member
    Join Date
    Jan 2002
    Location
    England, near Europe
    Posts
    211
    Both RC5 and ECC2 are clock speed dependent. You can directly compare a Duron to an Athlon with those clients.

    DF on the other hand should respond well to the Opteron's extra L2 cache and on-die memory controller. It'd be great if you could run the DF bench for me. This will help my next purchase decision.

    Cheers.
    Train hard, fight easy


  10. #10
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Usr time  Sys time
        6.65     0.156
        32.5     5.141

    This may be from an AMD64 3100XP, or it may not. It may be an untweaked reference MB running DDR400 at default RAM timings.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  11. #11
    Senior Member
    Join Date
    Jan 2002
    Location
    England, near Europe
    Posts
    211
    That looks similar to a ~2.3GHz AXP. Not too bad, but not as fast as I was expecting.
    Train hard, fight easy


  12. #12
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    I think an Opteron 144 would be as fast, and I don't know how much cache is in this CPU. All will be revealed, as I suspect a review is in the making. Still waiting for Opteron benches.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  13. #13
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    The 940-pin AMD64 with 1 MB of cache will be a good bit faster.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  14. #14
    If the DF client runs "twice as fast" with the -rt switch using up to 150MB of RAM, I would think maybe the 4+ GB of RAM potential could be utilized to speed things up further. Not to mention the extremely low latencies of the integrated memory controller.

    If DF could truly become multi-threaded I think you would see a much more significant boost for Opterons in dual or higher configs. As it stands, though, I think all advantages are being overcome by the low clock speed. (thank god for low IPC)

    Wasn't there a stage or two of pipeline added too? If so AMD64 chips may hit their stride at closer to 2.5GHz due to speculative error penalties. (kinda like the P4 not really stacking up until it hit 2.2-2.4 GHz.)

    And just for those who like to ride last year's pony: I saw today that 1.2GHz Athlon MPs are $55 on Pricewatch. So for the price of one 242 Opteron you could have two dualies totaling almost 5GHz! I love bleeding edge tech as much as the next guy, but sometimes the argument just comes down to GHz per $$$.
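
    The GHz-per-dollar argument is easy to put into numbers. A rough Python sketch, using only the figures quoted in this post (1.2GHz, $55 per CPU, four CPUs in two dual boxes): the Opteron 242 price is not given in the thread, so it is left out, and motherboards, RAM and power supplies are ignored. The function name is made up for illustration.

```python
# Figures quoted in the post: 1.2GHz Athlon MP at $55 each (CPU only).
ATHLON_MP_GHZ = 1.2
ATHLON_MP_PRICE_USD = 55
CPUS = 4  # two dual-CPU boxes


def aggregate(cpus, ghz_each, price_each):
    """Return (total GHz, total CPU cost in dollars, GHz per dollar)."""
    total_ghz = cpus * ghz_each
    total_cost = cpus * price_each
    return total_ghz, total_cost, total_ghz / total_cost


total_ghz, total_cost, ghz_per_dollar = aggregate(
    CPUS, ATHLON_MP_GHZ, ATHLON_MP_PRICE_USD
)
print(f"{total_ghz:.1f} GHz for ${total_cost} in CPUs "
      f"({ghz_per_dollar:.4f} GHz per dollar)")
```

    That works out to 4.8GHz of aggregate clock for $220 in CPUs, which is where the "almost 5GHz" comes from. Aggregate GHz only helps if you run one client per CPU, of course.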

    Lastly, the "big" A64 with 1MB of cache essentially IS an Opteron from everything I can see. It just won't be dual-capable (maybe).

    Me, I want to stick an Opteron in a freezer with an insane core voltage and see just how mad-scientist I can get.
    -Self Denying Stat Ho-

  15. #15
    Originally posted by AMDPHREAK
    If the DF client runs "twice as fast" with the -rt switch using up to 150MB of RAM, I would think maybe the 4+ GB of RAM potential could be utilized to speed things up further. Not to mention the extremely low latencies of the integrated memory controller.
    I don't think so, because there is nothing more to put in memory ...
    All files that need to be cached are at most 80 MB.
    The German DC Community : Team Rechenkraft.net - Join now ! Rechenkraft.net

  16. #16
    Member
    Join Date
    May 2003
    Location
    Portland, OR USA
    Posts
    79
    I don't think so, because there is nothing more to put in memory ... All files that need to be cached are at most 80 MB.
    Actually, I think that's a different chunk of storage you're considering. I don't have much Computer Science training, so I'll have to put this in more verbose layman's terms:

    The -rt switch determines whether the folding engine uses a smaller or a larger block of your system's RAM to set up data arrays for computation on each generation (think of them as stacks of spreadsheets for the moment, though I'm pretty sure we're talking about more than two or three dimensions of data here). If you give DF a large block of RAM, it will set up a cubic array as a tall stack of grids of data. If you select a smaller RAM footprint, it will hold in memory only the fewest grids of data needed to complete the computation. In the large memory model, the computing engine can set up the data arrays once and then start calculating. In the small memory model, the computer must offload the old grids of data regularly to make room for newer info. This housekeeping takes extra time.

    The proposal to take advantage of larger memory models (continuing my oversimplification) might keep the past couple of generations' data cubes in memory to aid somehow in calculation of the next generation of data.

    Memory chips are much cheaper than they ever have been before. This condition allows programmers to do one of three things: 1) add more cup-holders, blinky-lights, bells and whistles to existing code (see M$oft), 2) perform less problem simplification and data reduction in order to spend more time on actual computation of complex systems (see the -rt switch), or 3) ignore the extra resources and let the user run more tasks.

    If the folding algorithm can be made faster by keeping still more data in memory than it does today, it would be nice to see a -XRT switch that uses more memory (where available) in order to fold more proteins per hour.
    -djp
    I'm not a Stats Ho either. I just want to go and check to see that all my spare boxen are busy. Hang on a minute....

  17. #17
    djp, the reason why the -rt switch uses more memory and speeds up the folding client is because it reduces the need to read from disk.

    Without the -rt switch, the client needs to read the protein.trj file each time it works on a protein and this takes time since reading from the hard drive or a USB memory stick or CD or whatever is much slower than RAM.

    Using the -rt switch loads the protein.trj file into memory so it no longer needs to read this data from disk each time it works on a protein.

    That is why the readme says up to 150 MB of RAM: the protein.trj file for the proteins the DF project expects to work on won't exceed that limit.

    Hope that helps explain why using more memory wouldn't speed things up the way the client is currently designed.
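
    To make the disk-versus-RAM point concrete, here is a small Python sketch of the general idea. It is not the DF client's actual code: the class, the function and the dummy 1 KB file are all made up for illustration, and only the protein.trj file name comes from the description above. It counts how often the file is read from disk with and without caching it in memory.

```python
import os
import tempfile


class TrajectorySource:
    """Hypothetical loader; cache_in_ram mimics the effect described for -rt."""

    def __init__(self, path, cache_in_ram):
        self.path = path
        self.disk_reads = 0
        # With caching on, the file is read once up front and kept in RAM.
        self._cached = self._read_from_disk() if cache_in_ram else None

    def _read_from_disk(self):
        self.disk_reads += 1
        with open(self.path, "rb") as handle:
            return handle.read()

    def data(self):
        """Return the trajectory bytes, from RAM if cached, else from disk."""
        if self._cached is not None:
            return self._cached
        return self._read_from_disk()


def count_disk_reads(cache_in_ram, structures=5):
    """Simulate working on several structures and count the disk reads."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "protein.trj")
        with open(path, "wb") as handle:
            handle.write(b"\x00" * 1024)  # dummy stand-in data
        source = TrajectorySource(path, cache_in_ram)
        for _ in range(structures):
            source.data()
        return source.disk_reads


print("without cache:", count_disk_reads(False))
print("with cache:   ", count_disk_reads(True))
```

    Once the whole file is in memory there is nothing left to cache, so handing the process extra gigabytes would not remove any more disk reads in a design like this.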

    Jeff.

  18. #18
    These benches were done by an OCAU member in early July. I was hoping someone else would post them here.



    Code:
    processor	: 0 and 1
    vendor_id	: AuthenticAMD
    cpu family	: 15
    model		: 5
    model name	: AMD Opteron(TM) 64 Processor 242
    stepping	: 0
    cpu MHz		: 1593.811
    cache size	: 1024 KB
    bogomips	: 3178.49

    Code:
    # > top
    10:23pm  up  5:37,  4 users,  load average: 1.32, 1.36, 1.28
    64 processes: 61 sleeping, 3 running, 0 zombie, 0 stopped
    CPU0 states: 99.1% user,  0.0% system, 98.0% nice,  0.1% idle
    CPU1 states: 99.0% user,  0.0% system, 99.0% nice,  0.1% idle
    Mem:  8071096K av, 2253444K used, 5817652K free,       0K shrd,   35476K buff
    Swap: 16779852K av,       0K used, 16779852K free                 1861148K cached
      PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    10428 user      39  19 88644  86M  1836 R N  99.9  1.0  30:37 foldtrajlite
    10754 user      39  19 87916  85M  1784 R N  98.8  1.0   0:16 foldtrajlite
    10756 user      15   0  1044 1044   772 R     0.9  0.0   0:00 top
    
    #1 > foldtrajlite -bench
    One moment, opening rotamer library...
    Predicting secondary structure and generating trajectory distribution...
    Folding protein...
    Benchmark complete.
    Summary
    -------
              Usr time  Sys time
              --------  --------
    Maketrj      2.720     0.540
    Foldtraj    37.310     4.300
    
    #2 > foldtrajlite -bench
    <snipped>
              Usr time  Sys time
              --------  --------
    Maketrj      2.960     0.360
    Foldtraj    36.780     4.780
    
    #3 > foldtrajlite -bench
    <snipped>
              Usr time  Sys time
              --------  --------
    Maketrj      2.810     0.490
    Foldtraj    37.230     4.400
    
    # > uname -a
    Linux servername 2.4.19-SMP #1 SMP Wed Feb 12 18:42:27 UTC 2003 x86_64 unknown
    # > cat /etc/UnitedLinux-release
    UnitedLinux 1.0 (x86_64)
    VERSION = 1.0

    An experimental spare box, for only 2 weeks though.
    Unfortunately DF is compiled for a 32 bit proc under Linux,
    so 32 bit execution is emulated. Not fully optimized yet.
    Unmodded, BIOS says ~1.56V for each CPU, ~70 deg C max on load in a 1RU.

  19. #19
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Look at those scores

    Look at the results our dual MPs get at 2 GHz or higher and look at the dual Opteron. I am speechless. I am phoning my bank manager right now for a loan and getting me a dual 246 system.


    Thanks for the info
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  20. #20
    25/25Mbit is nearly enough :p pointwood's Avatar
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  21. #21
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Well, the truth of those benchmarks is that the Opteron 140 is faster than an MP2400 @ 2 GHz. So for those doing DC on dualies, it is a far superior setup. Value for money? Well, not right now.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  22. #22
    How do I interpret those benchmarks? I have a 142 at 1.8GHz...

  23. #23
    25/25Mbit is nearly enough :p pointwood's Avatar
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    All I can remember is that lower numbers are better. Some of the numbers are much more important than the others; I can't remember which, though.
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  24. #24
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184
    Originally posted by Hua Luo Han
    How do I interpret those benchmarks? I have a 142 at 1.8GHz...
    Ignore the Maketrj numbers. Add the two Foldtraj numbers together. That sum tells how long it takes to fold a standard piece of work.
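
    If you want to script that, here is a minimal Python sketch. It assumes the summary looks like the `foldtrajlite -bench` output posted earlier in this thread (a row starting with `Foldtraj` followed by the usr and sys times); the function name is made up for illustration.

```python
def foldtraj_total(bench_output: str) -> float:
    """Return usr + sys seconds from the Foldtraj row of a -bench summary."""
    for line in bench_output.splitlines():
        fields = line.split()
        # The Maketrj row is skipped; only the Foldtraj row counts.
        if len(fields) == 3 and fields[0] == "Foldtraj":
            return float(fields[1]) + float(fields[2])
    raise ValueError("no Foldtraj row found")


# First run from the dual Opteron 242 results posted above.
SAMPLE = """\
          Usr time  Sys time
          --------  --------
Maketrj      2.720     0.540
Foldtraj    37.310     4.300
"""

print(f"{foldtraj_total(SAMPLE):.3f} seconds")
```

    For that run the sum is 37.310 + 4.300 = 41.610 seconds. Lower is better.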

  25. #25
    Is this okay?
    [Attached image]

  26. #26
    25/25Mbit is nearly enough :p pointwood's Avatar
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    That's not bad, if I understand it correctly.

    Compare it to the 1.4GHz Opteron I got access to:
    Code:
    Summary
    -------
              Usr time  Sys time
              --------  --------
    Maketrj      4.740     0.670
    Foldtraj    47.440     8.750
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed
