Call for Benchmarks

**TheOtherPhil** · 08-06-2003, 07:24 PM

Originally posted by Grumpy
So close to breaking 20

Yeah, that's what I was aiming for. I'll try a higher o/c tomorrow

**Grumpy** · 08-07-2003, 03:16 AM

If the Fire Brigade aint at ya front door, yor not OCing it enough

**TheOtherPhil** · 08-07-2003, 06:48 PM

Hmmm, deja vu.

Anyway, I managed to break the 20 secs mark with:

Intel P4, 3724MHz, WinXP Pro SP1, 1GB

5.734, 0.500, 19.688, 8.406 (Screenshot)

The system ran the Prime95 torture test for just over an hour before running the DF bench so it seems stable enough at that speed. The problem is the high vcore (1.8V)...it's just too high for my comfort to run 24/7. I have now dropped back to 3.5GHz and 1.65V.

**Grumpy** · 08-10-2003, 02:20 AM

Well, I have discovered that a Client Priority of 0 is best under Win2K...my Foldtraj went from 65 to 53.5 seconds

**[da'rayven]** · 08-11-2003, 02:01 PM

Client bench doesn't work on the MacOS X client anymore

At least on two machines, the bench always dies with a generic Error 3...

As for the PCs:

Athlon XP @ 2.3GHz, 200FSB, 512MB Dual Channel 11-3-2-2.5 PC3200, Windows XP

Code:

One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      6.609     0.344
Foldtraj    32.297     6.109

Press any key to continue . . .

Athlon XP Palomino @ 1.66GHz, 145FSB, 128MB 5-2-2-2 PC2100, SuSE Linux 8.2

Code:

One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      5.010     0.420
Foldtraj    58.848     7.600

P4 Northwood 2.0GHz @ 2.6GHz, 130FSB, 512MB 4-2-2-2 PC2700, Windows XP

Code:

One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      9.391     0.547
Foldtraj    37.609    11.984
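The summary blocks posted in this thread share one layout: a Maketrj row and a Foldtraj row, each with a Usr time and a Sys time in seconds. A minimal sketch of pulling those four numbers out of a pasted block so results can be compared; the names `parse_summary` and `total_seconds` are made up for illustration and are not part of the client.

```python
import re

# Matches rows such as "Maketrj      6.609     0.344" or "Maketrj 5.939, 0.300".
ROW = re.compile(r"^\s*(Maketrj|Foldtraj)\s+(\d+(?:\.\d+)?)[,\s]+(\d+(?:\.\d+)?)\s*$")

def parse_summary(text):
    """Return {stage: (usr_seconds, sys_seconds)} for each stage row found."""
    results = {}
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            stage, usr, sys_time = match.groups()
            results[stage] = (float(usr), float(sys_time))
    return results

def total_seconds(results, stage):
    """Usr + Sys for one stage (hypothetical helper; one way to compare posts)."""
    usr, sys_time = results[stage]
    return usr + sys_time

# The P4 Northwood block from the post above.
sample = """\
          Usr time  Sys time
          --------  --------
Maketrj      9.391     0.547
Foldtraj    37.609    11.984
"""
parsed = parse_summary(sample)
print(parsed)
print(round(total_seconds(parsed, "Foldtraj"), 3))
```

Whether to compare on Usr time alone or on Usr + Sys is a judgment call; posters in this thread do both.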

**dtsang** · 08-16-2003, 11:20 AM

Apple,PowerPC G4,466,MacOSX10.1.5,15.600,0.000,172.490,0.000

**[da'rayven]** · 08-16-2003, 11:40 AM

okay, in that case, either it doesn't work on 10.2.x, or it will work with the new client.... I'll try again later

**dtsang** · 08-16-2003, 06:18 PM

Originally posted by [da'rayven]
okay, in that case, either it doesn't work on 10.2.x, or it will work with the new client.... I'll try again later

I have a feeling it will work just fine. The last client did NOT work on my machine with 10.1.5 - it would fail on the trajectory thing (after gen0). It works absolutely perfectly now (with the exception of the native.val mixup).

Let me know if you are unable to get the current client to run on 10.2.x. I have a 10.3 beta installed and could see if it runs on that - if it runs fine on the 10.3 beta, then it should run under Jaguar.

**[da'rayven]** · 08-16-2003, 07:08 PM

The client runs. It's the benchmark that doesn't

I have been folding with my Macs as long as I have been folding

What I'm saying is it's been a few weeks since I tried, and maybe the new client's bench will work...

**erk** · 08-24-2003, 07:57 AM

Originally posted by TheOtherPhil
Actually Grumpy, I am not convinced that it does. I am estimating that a dual AMD is something like 70% efficient for DF....if that. I'm personally running 4x dual AMD's and a P4 (~19.6GHz). 24/7 power is 2x duals and the P4 (11.8GHz). The part time dual's (~7.8GHz) run ~8hrs a day. All run as a service with useram=1.

My daily output is ~240K/ day. I really should be getting much higher than that I feel with the power I have invested in this project.

I am going to conduct a small test within the next few weeks where I remove the procs from my 2x full time dual's and run them in uni-processor boards for a while. I am expecting to see significantly higher numbers (~+30%).

I am not getting the SMP results either: K7D motherboard with a pair of MP2800+. I tried two versions of FreeBSD and RedHat 9.0. FreeBSD 4.8-RELEASE was the quickest, but not by much:

          Usr time  Sys time
          --------  --------
Maketrj      4.836     0.875
Foldtraj    58.367     3.242

My soltek SL-75FRN2 with XP2600+ and RedHat 9.0:

          Usr time  Sys time
          --------  --------
Maketrj      3.590     0.620
Foldtraj    37.120    11.750

**erk** · 08-30-2003, 10:46 PM

Could someone please explain exactly what the four numbers returned by the benchmark mean?

**dano** · 10-11-2003, 11:52 PM

athlon 64 3200
gigabyte k8vt800pro
256 mb apacer pc3200 cl3

winXP 32bit

Maketrj 5.939, 0.300
Foldtraj 36.663, 5.438

Mandrake Linux 64 bit beta

Maketrj 2.210, 0.470
Foldtraj 33.340, 3.880

**Grumpy** · 10-12-2003, 10:54 AM

CL3 Ram

Nice time for ram at that speed...can you get the Ram down to 2.5 and try it ?

And Mandrake 64 Bit seems to be getting some extra juice too, it is a good sign for the 64 Bit Code running 32 Bit Apps at faster speeds

**dano** · 10-12-2003, 12:30 PM

Nice time for ram at that speed...can you get the Ram down to 2.5 and try it ?

I only have an adjustment for the ram voltage in the bios.

The memory controller is integrated into the CPU so I guess the CAS timing is not adjustable.

Here is the benchmark at 2.2 gig

Maketrj 1.99, 0.480
Foldtraj 30.710, 3.300

**HaloJones** · 10-12-2003, 01:44 PM

I'd love to know a bit more about this benchmark program. TheOtherPhil got sub-20 seconds with his P4 at 3.7GHz. My XP @2400 gets 35 seconds, which suggests that the benchmark program reflects pure MHz. Yet my office 2400MHz P4s suck, producing results much more slowly than my home Athlons. (I'll benchmark a sample office P4 tomorrow.)

Most crunchers here seem to agree that Athlons are faster than P4s at DF, so how come the benchmarks don't reflect that? Is the benchmark representative?

What does it actually mean?

**Grumpy** · 10-12-2003, 06:09 PM

P4 + 800 FSB + 865/875 + HT = Below 20 Seconds

Whether this transfers to real world speed over the Athlons is another question.. it is possible that only the benchmark gets a boost from the above points, I doubt we will ever prove or disprove it

**HaloJones** · 10-13-2003, 03:00 AM

Originally posted by Grumpy
P4 + 800 FSB + 865/875 + HT = Below 20 Seconds

Whether this transfers to real world speed over the Athlons is another question.. it is possible that only the benchmark gets a boost from the above points, I doubt we will ever prove or disprove it

P4 @ 3.7GHz in a phase change cooled computer. Unless DFII uses SSE2 or Netburst, it cannot compute DF as fast per MHz as an Athlon, simply due to the number of instructions per clock cycle. I'm not trying to re-start an old argument here, but programs have to be specifically written for P4s to take advantage of them. A simple x86 routine is quicker on AMD than Intel.

Perhaps Howard could enlighten us on what the benchmark actually does.

**HaloJones** · 10-13-2003, 07:04 AM

As promised:

Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      9.438     0.750
Foldtraj    47.328    15.922

P4-2400 (W2K)

Athlon XP at the same clockspeed does 35 seconds.
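To put those two figures side by side, here is a small sketch of the ratio at equal clock speed. It assumes the "35 seconds" is comparable to the P4's Foldtraj Usr time; the post does not say whether that figure is Usr alone or Usr + Sys, so treat the result as rough.

```python
def slowdown(slower_seconds, faster_seconds):
    """How many times longer the slower run took than the faster one."""
    return slower_seconds / faster_seconds

p4_foldtraj_usr = 47.328   # P4-2400 (W2K), Foldtraj Usr time from the post above
athlon_foldtraj = 35.0     # "35 seconds" as quoted; Usr vs Usr + Sys is not stated

ratio = slowdown(p4_foldtraj_usr, athlon_foldtraj)
print(f"P4 took about {ratio:.2f}x as long at the same clock")
```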

**bwkaz** · 10-13-2003, 07:02 PM

Originally posted by HaloJones
Unless DFII uses SSE2 or Netburst, it cannot compute DF as fast per MHz as an Athlon, simply due to the number of instructions per clock cycle.

And even if it does use SSE2 (or Netburst? dunno, I'm not familiar with what Netburst is), it still won't be able to compete with the Athlon.

The vast majority of the DF client's time is spent chasing pointers (AKA, doing integer arithmetic on memory addresses), not doing floating-point stuff. That's why the current client doesn't even use SSE (and may not use MMX, either) -- there's simply nothing to be gained from it, because that's not where the code hot-spots are.

**Brian the Fist** · 10-14-2003, 12:24 PM

I think I've mentioned before but the benchmark builds a .trj file (trajectory distribution) for one particular sequence, and then builds 100 structures of it (like gen. 0). The protein is always the same, regardless of what protein we are working on. The random seed is fixed as well, so the procedure is completely deterministic (will always make the same 100 structures). Unfortunately this doesn't hold true across different operating systems as the floating point rounding error seems to vary on different platforms which in turn influences the sequence of events.

Thus it should reflect well the performance of the actual client in most cases.
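The two points in the description above (a fixed seed makes the run repeatable, while rounding differences can make platforms diverge) can be illustrated with a toy sketch. This is not the client's code: `build_structures` is a made-up stand-in that only draws pseudo-random numbers.

```python
import random

def build_structures(seed, count=100):
    """Stand-in for the benchmark: draw `count` pseudo-random values from a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]

# Same seed, same sequence of draws: the run is reproducible on one platform.
assert build_structures(1234) == build_structures(1234)

# Rounding matters: floating-point addition is not associative, so a platform
# that evaluates the same expression in a different order or precision can
# get a slightly different value and then take a different branch.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right, left, right)
```

Once one comparison goes the other way, every later step works from a different state, which would explain why the 100 structures need not match across operating systems even with the same seed.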

**TheOtherPhil** · 10-18-2003, 08:54 AM

Mike, the P4 I was using was running a 266fsb (1064MHz Quad Pumped) with the RAM at very aggressive timings (~6GB/s mem bandwidth Sandra Bench). Clock for clock the Athlon may be faster but the P4 in question has a 1.3GHz Clock speed advantage over your 533fsb 2.4's and almost double the effective FSB.

FWIW, the P4 chewed through DF extremely fast and pretty much equalled my dual barton's at 2.3GHz in output.

**Grumpy** · 10-18-2003, 06:23 PM

Yer, the OCed PIV running above 800 FSB is ugly

And TheOtherPhil, your Signature is scaring the children, damn snoop coder

**HaloJones** · 10-19-2003, 07:35 AM

My office P4s are almost certainly 400MHz FSB since they are "cheap" and nasty office-use Compaq Evos.

Off-topic: why do businesses allow themselves to get so badly ripped off by the big manufacturers? I'm all for buying super-stable machines but do they need to be s o s l o w?

**dtsang** · 10-19-2003, 11:58 AM

Does anybody here have a Power Mac G5? I would love to see how one of those performed, cause my G4 just plain sucks in dfold.

**[veix]** · 10-20-2003, 04:03 AM

Not making the best performance or getting under 20sec, but interesting numbers imo.
Pentium-M "Centrino" 1,4Ghz WinXP Home SP1 256MB DDR266
Maketrj 7.571 0.651
Foldtraj 44.304 10.715
Wonder if it is possible to run that CPU on a desktop motherboard :P

**erk** · 10-20-2003, 06:08 PM

Originally posted by [veix]
Not making the best performance or getting under 20sec, but interesting numbers imo.
Pentium-M "Centrino" 1,4Ghz WinXP Home SP1 256MB DDR266
Maketrj 7.571 0.651
Foldtraj 44.304 10.715
Wonder if it is possible to run that CPU on a desktop motherboard :P

There are Mini-ITX motherboards coming out for it.

http://www.lippert-at.com/miniitx.html

**matitaccia** · 11-17-2003, 05:42 PM

p4m 1,8GHz, 512mb, WinXp Sp1

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj     17.085     0.781
Foldtraj    64.803    18.166

Sum = 64.803 + 18.166 = 82.969

Didn't know what was better to do... have tried to write everything I had.
EhEH...

Ciao!

**Xelas** · 11-21-2003, 12:19 PM

P4C 2.4 @ 3.0, (800 MHz FSB oc'd -> 1000)
512 MB PC3200 CL3 noname RAM (Samsung chips)

One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      7.156     0.484
Foldtraj    31.156     8.563

This result is with memory running synchronously (250 MHz DDR = 500 MHz, 1 GHz FSB) but with very loose timings 3-4-3-6. Even so, my mem voltage is at 2.95 volts. PAS is set to "Ultra Turbo" (fastest).

With Memory running asynch at native PC3200 speeds (200 MHz DDR = 400 MHz, 1 GHz FSB) I can set timings to 2-2-2-5, but the machine runs a tad slower, giving something around USR = 33 SYS = 8.7

System is Win XP Pro.

**yujen** · 12-18-2003, 11:33 PM

Running on an AMD Opteron 240 with 4gb ECC/Registered DDR333
Linux 2.6.0 SMP 32-bit NUMA Optimised

---
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      4.250     0.850
Foldtraj    36.880    11.280

---

I would appreciate it if anyone who knows how to configure the benchmark to run on 2 processors simultaneously could tell me.

When I ran 2 benchmarks in 2 separate windows "almost" simultaneously (press enter, switch to another window, press enter again), I got roughly the same output as above.
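The two-window approach can also be scripted so both copies start together. A rough sketch: the thread does not show the benchmark's command line, so `run_concurrently` (a made-up helper) takes whatever command you supply, and a short Python busy loop stands in as the workload here. The posted output suggests the real benchmark waits for a key press at the end, which this sketch does not handle.

```python
import subprocess
import sys
import time

def run_concurrently(command, copies=2):
    """Start `copies` instances of `command` together.

    Returns, for each instance in launch order, the elapsed time from the
    common start until that instance was seen to have finished.
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(command) for _ in range(copies)]
    durations = []
    for proc in procs:
        proc.wait()
        durations.append(time.perf_counter() - start)
    return durations

# Placeholder workload: a short Python busy loop stands in for the benchmark.
placeholder = [sys.executable, "-c", "sum(i * i for i in range(200000))"]
print(run_concurrently(placeholder))
```

On a dual-processor machine, two copies finishing in about the same time as one copy alone would suggest each got its own CPU.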

**Grumpy** · 12-19-2003, 06:20 AM

The way you describe is how I do it. Just swap windows and run the second

What Client was that tested on? The regular or one of the Test Clients? It is very fast for a 140. Almost as fast as a 3200 Barton @ 200 FSB

Damn, I am saving up now, forget the Athlon64 3000, I want a Opteron Duallie after all

O yeah, is it the Iwill MB, and what video card is it running etc etc

**Brian the Fist** · 12-19-2003, 10:45 AM

Please note that for the recent beta clients, and the new one being released now (as indicated in the whatsnew.txt), the benchmark is no longer comparable to past benchmarks, due to the changes made to the algorithm. Interestingly, the new benchmark can show how much the algorithm has been sped up compared to the old algorithm. Therefore, please don't base hardware decisions on comparisons of old vs. new benchmarks.

**Grumpy** · 12-19-2003, 04:23 PM

Yeah, that is why I asked for the Client Version. But it would have to have been done with the 108 I imagine, so 37 is very fast for the 140 all the same. It appears Linux 64 is running the Client a lot faster than Linux 32 Bit, even without a recompile

Umm, what precision are the numbers running at with the Client, Howard...64, 72?

**yujen** · 12-19-2003, 08:36 PM

Ooops, my apologies, forgot to mention that's using the new client so yeah, the numbers aren't that great... I'm more interested in how well it scales in SMP for NUMA vs non-NUMA, which is why I asked if there's a better way to run 2 clients simultaneously.

**Grumpy** · 12-19-2003, 10:58 PM

I get 36 seconds for Foldtraj with the new updated Client on a NF2 MB and Barton @ 2275 Mhz, so if the 240 1.4 Ghz Opteron gets close to this, I am very very impressed with your configuration

**yujen** · 12-19-2003, 11:08 PM

Originally posted by Grumpy
The way you describe is how I do it. Just swap windows and run the second

What Client was that tested on. The regular or one of the Test Clients. It is very fast for a 140. Almost as fast as a 3200 Barton @ 200 FSB

Damn, I am saving up now, forget the Athlon64 3000, I want a Opteron Duallie after all

O yeah, is it the Iwill MB, and what video card is it running etc etc

ya, that's using the Iwill DK8SL with on-board ATi RageXL video... it's not the DK8X workstation board unfortunately.

I noticed quite a bit of performance improvement 2 days ago when I switched to 2.6.0 NUMA optimised... I think the difference is NUMA vs non-NUMA on the Opteron, since the client is compiled in 32-bit so using a 64-bit kernel won't net any real benefit unless the calls to system libraries benefit from 64-bit in some way

Since each client uses up to 150MB of RAM, there are real benefits to be had with NUMA

**Grumpy** · 12-19-2003, 11:13 PM

If the numbers being crunched are 64 bit precision, then it will make a heck of a difference as it can run it native and not have to emulate 64 bit precision

So it is a dual 240 system? And does the MB have the RAM shared, so CPU 2 goes through CPU 1 for memory?

**yujen** · 12-19-2003, 11:36 PM

No, it has 8 ram slots, 4 per processors in a 4+4 configuration

(Iwill doesn't make castrated motherboards in a 4+0 configuration)

I use 4 sticks of 1GB ECC/Registered DDR333, so 2 sticks per processor with both processors running in 128-bit memory path.

If CPU1 has to go through CPU0 for RAM then NUMA optimisations mean squat

True, if the client uses double precision floating point computations, then compiling for x86-64 "may" see quite a significant improvement... it comes down to 32-bit vs 64-bit, although if the Ultrasparc numbers aren't anything to write home about, going to 64-bit may not be much of an improvement, if any.... you'll probably have to hack the code somewhere since a "double" on a 64-bit architecture means 128-bit precision; if you're only aiming for 64-bit then you're doing more work than you need to.
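On the precision question, one quick check worth running: with mainstream x86 and x86-64 toolchains, a C `double` is normally the 64-bit IEEE 754 type in both 32-bit and 64-bit builds, not a 128-bit type. A minimal sketch, assuming a standard CPython build (whose `float` is a C `double` underneath); it says nothing about which types the DF client itself uses.

```python
import struct
import sys

# Size in bytes of a C double as seen by this Python build.
double_bytes = struct.calcsize("d")

# Mantissa bits of Python's float, which is a C double underneath.
mantissa_bits = sys.float_info.mant_dig

print(f"double: {double_bytes * 8} bits, {mantissa_bits}-bit mantissa")
```

Running this under both a 32-bit and a 64-bit Python on the same machine is a simple way to see whether the width of `double` changes with the architecture.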

**Grumpy** · 12-19-2003, 11:40 PM

Hmmm, in the last Opteron 140 benchmark someone posted, my Barton was 32 seconds and the 140 was 46 seconds in the Fold Benchmark

**Thor** · 12-20-2003, 05:18 AM

These are the benchmarks for the **new** client on a P4 2.66GHz with 512MB PC1066 Rambus and Win2K:

One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      8.582     0.340
Foldtraj    34.329     9.113

I think that's a pretty good score...

can anybody else post some new benchmarks?

Greets thor

**Grumpy** · 12-20-2003, 06:28 AM

Here is my Dual 2400 MP

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      9.266     0.484
Foldtraj    59.859    11.875

And my AMD Barton @ 2275 Mhz

Summary
-------
          Usr time  Sys time
          --------  --------
Maketrj      6.156     0.234
Foldtraj    36.578     6.703
