Opteron DF performance?

**TheOtherPhil** · 07-21-2003, 07:15 AM

Has anybody run DF on an Opteron system yet? Has anybody seen word of the performance elsewhere?

Cheers,
Phil.

**Grumpy** · 07-22-2003, 01:43 PM

amdzone.com has some Seti@home benchmarks just up

**HaloJones** · 07-22-2003, 02:10 PM

Spending finger getting itchy again, Phil?

**tpdooley** · 07-22-2003, 03:58 PM

I noticed links to the Seti benchmarks over the last few days, and they had similar results with 64 bit code as we've had here; i.e. the 64 bit client ended up being slower than the 32 bit client. It'll be nice to see the benchmark differences between the 32 bit clients running on Opteron/Athlon64 vs normal Athlon/P4, though.

**TheOtherPhil** · 07-22-2003, 05:23 PM

Originally posted by HaloJones
Spending finger getting itchy again, Phil?

Just evaluating potential future purchases. TBH, the Opteron hasn't impressed me with its benchmarks so far, and Intel seems to be pulling further and further ahead. My current P4 absolutely rocks and I am probably going to go the Xeon route when Intel implements the 800MHz FSB for that platform.

**IronBits** · 07-24-2003, 12:59 AM

Howard did mention he was gonna look at it

Might be a client for it in say ohhhh, about a month would be perfect timing I would say

**Grumpy** · 08-01-2003, 05:08 AM

Psssst, anyone visited http://www.ocworkbench.com lately

**[DPC]Mobster** · 08-01-2003, 08:26 AM

I have a dual Opteron 242 (1,6GHz) running here and yesterday I benchmarked the ECC2 and the distributed.net clients.

Both turned out as being equal to running 2 XP2000+

As long as there are no optimized clients, you can say that the number of GHz of Opterons equals the number of GHz of Athlon XPs.

**TheOtherPhil** · 08-02-2003, 06:51 AM

Both RC5 and ECC2 are clockspeed dependent. You can directly compare a Duron to an Athlon with those clients.

DF on the other hand should respond well to the Opteron's extra L2 cache and on-die memory controller. It'd be great if you could run the DF bench for me. This will help my next purchase decision.

Cheers.

**Grumpy** · 08-02-2003, 08:38 AM

usrtime systtime
6.65 0.156
32.5 5.141

This may be from an AMD64 3100XP, or it may not

It may be an untweaked reference MB running at DDR400 default ram timings.

**TheOtherPhil** · 08-02-2003, 01:59 PM

That looks similar to ~2.3GHz AXP. Not too bad but not as fast as I was expecting.

**Grumpy** · 08-03-2003, 02:57 AM

I think an Opteron 144 would be as fast, and I don't know how much cache is in this CPU. All will be revealed, as I suspect a review is in the making. Still waiting for Opteron benches.

**Grumpy** · 08-04-2003, 02:02 AM

The 940 pin AMD64 with 1 MB cache will be a good bit faster

**AMDPHREAK** · 08-05-2003, 05:49 AM

If the DF client runs "twice as fast" with the -rt switch using up to 150MB of RAM, I would think maybe the 4+ GB of RAM potential could be utilized to speed things further. Not to mention the extremely low latencies of the integrated mem controller.

If DF could truly become multi-threaded I think you would see a much more significant boost for Opterons in dual or higher configs. As it stands though I think all advantages are being overcome by the low clock speed. (thank god for low IPC)

Wasn't there a stage or two of pipeline added too? If so AMD64 chips may hit their stride at closer to 2.5GHz due to speculative error penalties. (kinda like the P4 not really stacking up until it hit 2.2-2.4 GHz.)

And just for those who like to ride last year's pony - I saw today 1.2GHz Athlon MP's are $55 on pricewatch. So for the price of one 242 Opteron you could have two dualies totaling almost 5GHz! I love bleeding edge tech as much as the next guy but sometimes the argument just comes down to GHz per $$$.

Lastly, the "big" A64 with 1MB of cache essentially IS an opteron from everything I can see. It just won't be Dual-capable (maybe).

Me, I want to stick an Opteron in a freezer with an insane core voltage and see just how mad-scientist I can get.
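The GHz-per-dollar argument in the post above can be worked through in a few lines. This is only a rough sketch: it assumes each "dualie" holds two of the 1.2GHz Athlon MPs at the quoted $55 and counts CPU cost only. Motherboards, RAM, and the Opteron 242's price are not given in the post, so they are left out.

```python
# Figures quoted in the post: 1.2 GHz Athlon MP at $55 each.
ATHLON_MP_GHZ = 1.2
ATHLON_MP_PRICE_USD = 55

# Assumption: "two dualies" means two dual-CPU boxes, i.e. four CPUs in total.
cpus = 2 * 2

total_ghz = cpus * ATHLON_MP_GHZ                # aggregate clock across all CPUs
total_cpu_cost = cpus * ATHLON_MP_PRICE_USD     # CPU cost only, no boards or RAM
ghz_per_dollar = total_ghz / total_cpu_cost

print(f"{cpus} CPUs: {total_ghz:.1f} GHz total for ${total_cpu_cost} in CPUs")
print(f"{ghz_per_dollar * 100:.2f} GHz per $100")
```

Four CPUs at 1.2GHz come to 4.8GHz, which matches the "almost 5GHz" in the post. Aggregate GHz is only a fair yardstick for clients that scale with clock speed, as discussed earlier in the thread.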

**Keller** · 08-05-2003, 06:05 AM

Originally posted by AMDPHREAK
If the DF client runs "twice as fast" with the -rt switch using up to 150MB of RAM, I would think maybe the 4+ GB of RAM potential could be utilized to speed things further. Not to mention the extremely low latencies of the integrated mem controller.

I don't think so because there is nothing more to put in the memory ...
All files that need to be cached are at most 80 MB.

**djp** · 08-08-2003, 10:01 AM

Originally posted by Keller
I don't think so because there is nothing more to put in the memory ... All files that need to be cached are at most 80 MB.

Actually, I think that's a different chunk of storage you're considering. I don't have much Computer Science training, so I'll have to put this in more verbose layman's terms:

The -rt switch determines whether the folding engine uses a smaller or a larger block of your system's RAM to set-up arrays (think of them as stacks of spreadsheets for the moment, though I'm pretty sure we're talking about more than two or three dimensions of data here) of data for computation on each generation. If you give DF a large block of RAM, it will set-up a cubic array as a tall stack of grids of data. If you select a smaller RAM footprint, it will hold in memory only the fewest number of grids of data possible to complete the computation. In the large memory model, the computing engine can set-up the data arrays once and then start calculating. In the small memory model, the computer must offload the old grids of data regularly to make room for newer info. This housekeeping takes extra time.

The proposal to take advantage of larger memory models (continuing my oversimplification) might keep the past couple of generations' data cubes in memory to aid somehow in calculation of the next generation of data.

Memory chips are much cheaper than they ever have been before. This condition allows programmers to do one of three things: 1) add more cup-holders, blinky-lights, bells and whistles to existing code (see M$oft), 2) perform less problem simplification and data reduction in order to spend more time on actual computation of complex systems (see the -rt switch), or 3) ignore the extra resources and let the user run more tasks.

If the folding algorithm can be made faster by keeping still more data in memory than it does today, it would be nice to see a -XRT switch that uses more memory (where available) in order to fold more proteins per hour.

**Digital Parasite** · 08-08-2003, 10:38 AM

djp, the reason why the -rt switch uses more memory and speeds up the folding client is because it reduces the need to read from disk.

Without the -rt switch, the client needs to read the protein.trj file each time it works on a protein and this takes time since reading from the hard drive or a USB memory stick or CD or whatever is much slower than RAM.

Using the -rt switch loads the protein.trj file into memory so it no longer needs to read this data from disk each time it works on a protein.

That is why the readme says up to 150 MB of RAM because the size of the proteins the DF project is expecting to work on won't exceed that limit with the protein.trj file.

Hope that helps explain why using more memory wouldn't speed things up the way the client is currently designed.

Jeff.
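The difference Jeff describes can be sketched roughly as follows. This is not the DF client's code (the thread does not show its internals); the function names and the stand-in file are hypothetical, and the sketch only contrasts re-reading a file on every pass with loading it into RAM once.

```python
import os
import tempfile

def read_each_time(path, iterations):
    """Without caching: go back to disk on every iteration."""
    total = 0
    for _ in range(iterations):
        with open(path, "rb") as f:
            total += len(f.read())
    return total

def read_once(path, iterations):
    """With caching (the idea behind -rt): load once, then reuse the bytes."""
    with open(path, "rb") as f:
        data = f.read()
    total = 0
    for _ in range(iterations):
        total += len(data)  # no disk access inside the loop
    return total

# Hypothetical stand-in for a trajectory file; the real one is far larger.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 1024)
    demo_path = tmp.name

try:
    # Both approaches see the same data; only the number of disk reads differs.
    print(read_each_time(demo_path, 5) == read_once(demo_path, 5))
finally:
    os.remove(demo_path)
```

Once the whole file is already in memory, there is nothing further to cache, which is why extra RAM beyond the file's size would not help this particular design.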

**[ocau]Doc** · 08-09-2003, 10:06 PM

These benches were done by an OCAU member in early July; I was hoping someone else would post them here.

Code:

processor	: 0 and 1
vendor_id	: AuthenticAMD
cpu family	: 15
model		: 5
model name	: AMD Opteron(TM) 64 Processor 242
stepping	: 0
cpu MHz		: 1593.811
cache size	: 1024 KB
bogomips	: 3178.49

Code:

# > top
10:23pm  up  5:37,  4 users,  load average: 1.32, 1.36, 1.28
64 processes: 61 sleeping, 3 running, 0 zombie, 0 stopped
CPU0 states: 99.1% user,  0.0% system, 98.0% nice,  0.1% idle
CPU1 states: 99.0% user,  0.0% system, 99.0% nice,  0.1% idle
Mem:  8071096K av, 2253444K used, 5817652K free,       0K shrd,   35476K buff
Swap: 16779852K av,       0K used, 16779852K free                 1861148K cached
  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
10428 user      39  19 88644  86M  1836 R N  99.9  1.0  30:37 foldtrajlite
10754 user      39  19 87916  85M  1784 R N  98.8  1.0   0:16 foldtrajlite
10756 user      15   0  1044 1044   772 R     0.9  0.0   0:00 top

#1 > foldtrajlite -bench
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      2.720     0.540
Foldtraj    37.310     4.300

#2 > foldtrajlite -bench
<snipped>
          Usr time  Sys time
          --------  --------
Maketrj      2.960     0.360
Foldtraj    36.780     4.780

#3 > foldtrajlite -bench
<snipped>
          Usr time  Sys time
          --------  --------
Maketrj      2.810     0.490
Foldtraj    37.230     4.400

# > uname -a
Linux servername 2.4.19-SMP #1 SMP Wed Feb 12 18:42:27 UTC 2003 x86_64 unknown
# > cat /etc/UnitedLinux-release
UnitedLinux 1.0 (x86_64)
VERSION = 1.0

An experimental spare box, for only 2 weeks tho

Unfortunately DF is compiled for a 32 bit proc under Linux.
So 32 bit execution is emulated. Not fully optimized yet.

Unmodded, bios says ~1.56v for each CPU, ~70deg C max on load in a 1RU.

**Grumpy** · 08-10-2003, 02:06 AM

Look at those scores

Look at the results our dual MPs get at 2 GHz or higher and look at the dual Opteron. I am speechless. I am phoning my bank manager right now for a loan and getting me a dual 246 system

Thanks for the info

**pointwood** · 09-05-2003, 08:27 AM

http://arstechnica.infopop.net/OpenT...385#6010966385

**Grumpy** · 09-06-2003, 05:19 AM

Well, the truth of those benchmarks is the Opteron 140 is faster than an MP2400 @ 2 GHz. So for those doing DC on duallies, it is a far superior setup. Value for money, well, not right now

**Hua Luo Han** · 09-15-2003, 06:59 AM

how to interpret those benchmarks ?

I have a 142 at 1.8GHz.........

**pointwood** · 09-15-2003, 07:05 AM

All I can remember is that lower numbers are better. Some of the numbers are much more important than the others, can't remember which though

**AMD_is_logical** · 09-15-2003, 07:51 AM

Originally posted by Hua Luo Han
how to interpret those benchmarks ? I have a 142 at 1.8GHz.........

Ignore the Maketrj numbers. Add the two Foldtraj numbers together. That sum tells how long it takes to fold a standard piece of work.
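That rule is easy to apply mechanically. Here is a minimal sketch that pulls the Foldtraj row out of a benchmark summary and adds the two times. The function name `foldtraj_total` is made up for illustration, and the sample text is the first run posted earlier in the thread for the dual Opteron 242.

```python
import re

def foldtraj_total(bench_output):
    """Return Usr time + Sys time from the Foldtraj row; Maketrj is ignored."""
    match = re.search(
        r"^Foldtraj\s+([\d.]+)\s+([\d.]+)\s*$", bench_output, re.MULTILINE
    )
    if match is None:
        raise ValueError("no Foldtraj row found in benchmark output")
    usr_time, sys_time = (float(value) for value in match.groups())
    return usr_time + sys_time

# First run from the dual Opteron 242 posted earlier in this thread.
sample = """\
          Usr time  Sys time
          --------  --------
Maketrj      2.720     0.540
Foldtraj    37.310     4.300
"""

print(f"{foldtraj_total(sample):.2f} seconds")  # 41.61 seconds
```

Lower totals are better, so this gives a single number per machine to compare.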

**Hua Luo Han** · 09-15-2003, 08:23 AM

Is this okay ?

**pointwood** · 09-15-2003, 08:40 AM

That's not bad, if I understand it correctly

Compare it to the 1.4Ghz Opteron I got access to:

Code:

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      4.740     0.670
Foldtraj    47.440     8.750
