If the Fire Brigade aint at ya front door, yor not OCing it enough
Originally posted by Grumpy
So close to breaking 20
Yeah, that's what I was aiming for. I'll try a higher o/c tomorrow
Train hard, fight easy
I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.
Hmmm, deja vu.
Anyway, I managed to break the 20 secs mark with:
Intel P4, 3724MHz, WinXP Pro SP1, 1GB
5.734, 0.500, 19.688, 8.406 (Screenshot)
The system ran the Prime95 torture test for just over an hour before running the DF bench so it seems stable enough at that speed. The problem is the high vcore (1.8V)...it's just too high for my comfort to run 24/7. I have now dropped back to 3.5GHz and 1.65V.
Well, I have discovered that a Client Priority of 0 is best under Win2K: my Foldtraj went from 65 to 53.5 seconds
The client bench doesn't work on the Mac OS X client anymore.
On at least two machines, the bench always dies with a generic Error 3...
As for the PCs:
Athlon XP @ 2.3GHz, 200FSB, 512MB Dual Channel 11-3-2-2.5 PC3200, Windows XP
Athlon XP Palomino @ 1.66GHz, 145FSB, 128MB 5-2-2-2 PC2100, SuSE Linux 8.2
Code:
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
Usr time Sys time
-------- --------
Maketrj 6.609 0.344
Foldtraj 32.297 6.109
Press any key to continue . . .
P4 Northwood 2.0GHz @ 2.6GHz, 130FSB, 512MB 4-2-2-2 PC2700, Windows XP
Code:
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
Usr time Sys time
-------- --------
Maketrj 5.010 0.420
Foldtraj 58.848 7.600
Code:
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
Usr time Sys time
-------- --------
Maketrj 9.391 0.547
Foldtraj 37.609 11.984
Proud member of the OCWorkbench Distributed Folding team
Apple,PowerPC G4,466,MacOSX10.1.5,15.600,0.000,172.490,0.000
Derek
okay, in that case, either it doesn't work on 10.2.x, or it will work with the new client.... I'll try again later
Originally posted by [da'rayven]
okay, in that case, either it doesn't work on 10.2.x, or it will work with the new client.... I'll try again later
I have a feeling it will work just fine. The last client did NOT work on my machine with 10.1.5 - it would fail on the trajectory thing (after gen0). It works absolutely perfectly now (with the exception of the native.val mixup).
Let me know if you are unable to get the current client to run on 10.2.x. I have a 10.3 beta installed and could see if it runs on that - if it runs fine on the 10.3 beta, then it should run under Jaguar.
Derek
The client runs. It's the benchmark that doesn't. I have been folding with my Macs for as long as I have been folding. What I'm saying is that it's been a few weeks since I tried, and maybe the new client's bench will work...
Originally posted by TheOtherPhil
Actually Grumpy, I am not convinced that it does. I am estimating that a dual AMD is something like 70% efficient for DF....if that. I'm personally running 4x dual AMDs and a P4 (~19.6GHz). 24/7 power is 2x duals and the P4 (11.8GHz). The part-time duals (~7.8GHz) run ~8hrs a day. All run as a service with useram=1.
My daily output is ~240K/day. I feel I really should be getting much higher than that with the power I have invested in this project.
I am going to conduct a small test within the next few weeks where I remove the procs from my 2x full-time duals and run them in uni-processor boards for a while. I am expecting to see significantly higher numbers (~+30%).
I am not getting the SMP results either: K7D motherboard with a pair of MP2800+. I tried two versions of FreeBSD and RedHat 9.0. FreeBSD 4.8-RELEASE was the quickest, but not by much:
Usr time Sys time
-------- --------
Maketrj 4.836 0.875
Foldtraj 58.367 3.242
My Soltek SL-75FRN2 with XP2600+ and RedHat 9.0:
Usr time Sys time
-------- --------
Maketrj 3.590 0.620
Foldtraj 37.120 11.750
Could someone please explain exactly what the four numbers returned by the benchmark mean?
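Judging from the summary tables posted in this thread, the four numbers appear to be Maketrj user time, Maketrj system time, Foldtraj user time and Foldtraj system time, all in seconds. Below is a minimal Python sketch that splits one of the comma-separated result lines into named fields. The field names (`vendor`, `cpu`, `mhz`, `os`) are my own labels for illustration, not anything the benchmark itself defines.

```python
from typing import NamedTuple


class BenchResult(NamedTuple):
    # Field names are illustrative assumptions, not part of the benchmark.
    vendor: str
    cpu: str
    mhz: float
    os: str
    maketrj_usr: float
    maketrj_sys: float
    foldtraj_usr: float
    foldtraj_sys: float


def parse_result(line: str) -> BenchResult:
    """Split one comma-separated result line into named fields."""
    parts = [p.strip() for p in line.split(",")]
    if len(parts) != 8:
        raise ValueError(f"expected 8 fields, got {len(parts)}")
    vendor, cpu, mhz, os_name = parts[:4]
    times = [float(p) for p in parts[4:]]
    return BenchResult(vendor, cpu, float(mhz), os_name, *times)


# Result line as posted earlier in the thread.
result = parse_result("Apple,PowerPC G4,466,MacOSX10.1.5,15.600,0.000,172.490,0.000")
print(result.foldtraj_usr + result.foldtraj_sys)
```

Most people in the thread quote the Foldtraj user time as "the" score, so that is probably the number to compare.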
Athlon 64 3200
Gigabyte K8VT800Pro
256 MB Apacer PC3200 CL3
WinXP 32-bit
Maketrj 5.939, 0.300
Foldtraj 36.663, 5.438
Mandrake Linux 64-bit beta
Maketrj 2.210, 0.470
Foldtraj 33.340, 3.880
CL3 RAM
Nice time for RAM at that speed... can you get the RAM down to CL2.5 and try it?
And Mandrake 64-bit seems to be getting some extra juice too; it is a good sign for 64-bit code running 32-bit apps at faster speeds
Nice time for RAM at that speed... can you get the RAM down to CL2.5 and try it?
I only have an adjustment for the RAM voltage in the BIOS.
The memory controller is integrated into the CPU, so I guess the CAS timing is not adjustable.
Here is the benchmark at 2.2 GHz
Maketrj 1.99, 0.480
Foldtraj 30.710, 3.300
I'd love to know a bit more about this benchmark program. TheOtherPhil got sub-20 seconds with his P4 at 3.7GHz. My XP @2400 gets 35 seconds, which suggests that the benchmark program reflects pure MHz. Yet my office 2400MHz P4s suck, producing much slower than my home Athlons. (I'll benchmark a sample office P4 tomorrow.)
Most crunchers here seem to agree that Athlons are faster than P4s at DF, so how come the benchmarks don't reflect that? Is the benchmark representative?
What does it actually mean?
Last edited by HaloJones; 10-12-2003 at 02:08 PM.
P4 + 800 FSB + 865/875 + HT = Below 20 Seconds
Whether this transfers to real world speed over the Athlons is another question.. it is possible that only the benchmark gets a boost from the above points, I doubt we will ever prove or disprove it
Originally posted by Grumpy
P4 + 800 FSB + 865/875 + HT = Below 20 Seconds
Whether this transfers to real world speed over the Athlons is another question.. it is possible that only the benchmark gets a boost from the above points, I doubt we will ever prove or disprove it
P4 @ 3.7GHz in a phase change cooled computer. Unless DFII uses SSE2 or Netburst, it cannot compute DF as fast per MHz as an Athlon, simply due to the number of instructions per clock cycle. I'm not trying to re-start an old argument here, but programs have to be specifically written for P4s to take advantage of them. A simple x86 routine is quicker on AMD than Intel.
Perhaps Howard could enlighten us on what the benchmark actually does.
As promised:
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
Usr time Sys time
-------- --------
Maketrj 9.438 0.750
Foldtraj 47.328 15.922
P4-2400 (W2K)
Athlon XP at the same clockspeed does 35 seconds.
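To put the "per MHz" question in numbers, here is a rough Python sketch that multiplies the Foldtraj user time by the clock speed, giving the clock cycles each chip spent on the same run. The three data points are the ones posted in this thread (35 seconds for the Athlon is a rounded figure). It deliberately ignores FSB, memory timings and OS differences, so treat it as a back-of-the-envelope comparison only.

```python
# (label, clock in MHz, Foldtraj usr time in seconds), as posted in this thread
results = [
    ("P4 @ 3724 MHz", 3724, 19.688),
    ("Athlon XP @ 2400 MHz", 2400, 35.0),
    ("P4-2400 (W2K)", 2400, 47.328),
]


def gigacycles(mhz: float, seconds: float) -> float:
    """Clock cycles spent on the run, in billions (MHz * s = millions of cycles)."""
    return mhz * seconds / 1000.0


for label, mhz, seconds in results:
    print(f"{label:22s} {gigacycles(mhz, seconds):7.1f} Gcycles")
```

Fewer cycles means more work done per clock. On these figures the stock-FSB P4 needs the most cycles, while the heavily overclocked P4 needs the fewest, which hints that bus and memory speed matter as well as raw MHz.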
Originally posted by HaloJones
Unless DFII uses SSE2 or Netburst, it cannot compute DF as fast per MHz as an Athlon, simply due to the number of instructions per clock cycle.
And even if it does use SSE2 (or Netburst? dunno, I'm not familiar with what Netburst is), it still won't be able to compete with the Athlon.
The vast majority of the DF client's time is spent chasing pointers (AKA, doing integer arithmetic on memory addresses), not doing floating-point stuff. That's why the current client doesn't even use SSE (and may not use MMX, either) -- there's simply nothing to be gained from it, because that's not where the code hot-spots are.
"If you fail to adjust your notion of fairness to the reality of the Universe, you will probably not be happy."
-- Originally posted by Paratima
I think I've mentioned before but the benchmark builds a .trj file (trajectory distribution) for one particular sequence, and then builds 100 structures of it (like gen. 0). The protein is always the same, regardless of what protein we are working on. The random seed is fixed as well, so the procedure is completely deterministic (will always make the same 100 structures). Unfortunately this doesn't hold true across different operating systems as the floating point rounding error seems to vary on different platforms which in turn influences the sequence of events.
Thus it should reflect well the performance of the actual client in most cases.
Howard Feldman
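The two points above (a fixed seed makes the run repeatable, while floating-point rounding can still differ between platforms) can be illustrated with a small Python sketch. This is only a stand-in: `build_structures` and the seed value 42 are hypothetical and have nothing to do with the real client's code or seed.

```python
import random


def build_structures(seed: int, count: int = 100) -> list:
    """Stand-in for the benchmark: draw `count` pseudo-random values from a fixed seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]


# Same seed, same sequence: the run is repeatable on one platform.
assert build_structures(42) == build_structures(42)

# Floating-point addition is not associative, so a different evaluation
# order (or intermediate precision) can change the last bits of a result.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b, a, b)
```

If a later decision in the fold depends on a comparison between such values, a last-bit difference is enough to send two platforms down different paths, even though both started from the same seed.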
Mike, the P4 I was using was running a 266fsb (1064MHz Quad Pumped) with the RAM at very aggressive timings (~6GB/s mem bandwidth Sandra Bench). Clock for clock the Athlon may be faster but the P4 in question has a 1.3GHz Clock speed advantage over your 533fsb 2.4's and almost double the effective FSB.
FWIW, the P4 chewed through DF extremely fast and pretty much equalled my dual Bartons at 2.3GHz in output.
Yer, the OCed PIV running above 800 FSB is ugly
And TheOtherPhil, your Signature is scaring the children, damn snoop coder
My office P4s are almost certainly 400MHz FSB since they are "cheap" and nasty office-use Compaq Evos.
Off-topic: why do businesses allow themselves to get so badly ripped off by the big manufacturers? I'm all for buying super-stable machines but do they need to be s o s l o w?
Does anybody here have a Power Mac G5? I would love to see how one of those performed, cause my G4 just plain sucks in dfold.
Derek
Not the best performance or getting under 20 sec, but interesting numbers imo.
Pentium-M "Centrino" 1.4GHz WinXP Home SP1 256MB DDR266
Maketrj 7.571 0.651
Foldtraj 44.304 10.715
Wonder if it is possible to run that CPU on a desktop motherboard :P
Originally posted by [veix]
Not the best performance or getting under 20 sec, but interesting numbers imo.
Pentium-M "Centrino" 1.4GHz WinXP Home SP1 256MB DDR266
Maketrj 7.571 0.651
Foldtraj 44.304 10.715
Wonder if it is possible to run that CPU on a desktop motherboard :P
There are Mini-ITX motherboards coming out for it.
http://www.lippert-at.com/miniitx.html
P4-M 1.8GHz, 512MB, WinXP SP1
Summary
-------
Usr time Sys time
-------- --------
Maketrj 17.085 0.781
Foldtraj 64.803 18.166
Sum = 64.803 + 18.166 = 82.969
I didn't know which number was better to report, so I have written down everything I had.
EhEH...
Ciao!
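For anyone who wants to total the times the same way, here is a short Python sketch that pulls the Maketrj and Foldtraj rows out of a pasted summary and adds user and system time. The regular expression is my own guess at the layout, based only on the outputs pasted in this thread (it also tolerates the comma-separated variant some posters used).

```python
import re

SUMMARY = """\
Usr time Sys time
-------- --------
Maketrj 17.085 0.781
Foldtraj 64.803 18.166
"""

# Matches a stage name followed by its user and system times.
ROW = re.compile(r"^(Maketrj|Foldtraj)[\s,]+([\d.]+)[\s,]+([\d.]+)\s*$", re.MULTILINE)


def parse_summary(text: str) -> dict:
    """Return {stage: (usr, sys)} for each stage row found in the output."""
    return {name: (float(usr), float(sys_)) for name, usr, sys_ in ROW.findall(text)}


times = parse_summary(SUMMARY)
usr, sys_ = times["Foldtraj"]
print(f"Foldtraj total = {usr + sys_:.3f} s")
```

Note that most results in the thread are compared on Foldtraj user time alone, so a usr + sys total is not directly comparable with the single figures other people quote.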
P4C 2.4 @ 3.0, (800 MHz FSB oc'd -> 1000)
512 MB PC3200 CL3 noname RAM (Samsung chips)
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
Usr time Sys time
-------- --------
Maketrj 7.156 0.484
Foldtraj 31.156 8.563
This result is with memory running synchronously (250 MHz DDR = 500 MHz, 1 GHz FSB) but with very loose timings 3-4-3-6. Even so, my mem voltage is at 2.95 volts. PAS is set to "Ultra Turbo" (fastest).
With Memory running asynch at native PC3200 speeds (200 MHz DDR = 400 MHz, 1 GHz FSB) I can set timings to 2-2-2-5, but the machine runs a tad slower, giving something around USR = 33 SYS = 8.7
System is Win XP Pro.
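The clock figures above all derive from the base bus clock, which a few lines of Python make explicit. This is just the arithmetic behind the numbers in this post; the CPU multiplier of 12 is inferred from the stock 2.4 GHz at a 200 MHz base clock and is an assumption on my part.

```python
def effective_rates(base_mhz: float, cpu_multiplier: float) -> dict:
    """Derive the commonly quoted clock figures from the base (FSB) clock."""
    return {
        "fsb_quad_pumped": base_mhz * 4,   # P4 bus transfers four times per clock
        "ddr_sync": base_mhz * 2,          # DDR transfers twice per clock
        "cpu": base_mhz * cpu_multiplier,
    }


# Multiplier 12 is inferred from the stock 2.4 GHz / 200 MHz figures (assumption).
stock = effective_rates(200, 12)
overclocked = effective_rates(250, 12)
print(stock)
print(overclocked)
```

With the memory running asynchronously, only the `ddr_sync` figure stops applying: the RAM stays at 200 MHz (DDR 400) while the bus and CPU keep the overclocked values.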
Running on an AMD Opteron 240 with 4gb ECC/Registered DDR333
Linux 2.6.0 SMP 32-bit NUMA Optimised
---
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
Usr time Sys time
-------- --------
Maketrj 4.250 0.850
Foldtraj 36.880 11.280
---
I would appreciate it if anyone knows how to configure the benchmark to run on 2 processors simultaneously.
I ran 2 benchmarks in 2 separate windows "almost" simultaneously (press enter, switch to another window, press enter again) and achieved roughly the same output as above.
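One way to take the window switching out of the picture would be to start both copies from a small script. The Python sketch below launches several instances of a command back to back and waits for all of them. I don't know of any switch that makes the client run its benchmark unattended, so the command shown is only a placeholder workload and the real command line would have to be substituted.

```python
import subprocess
import sys
import time


def run_concurrently(cmd: list, copies: int = 2) -> list:
    """Start `copies` instances of `cmd` back to back, then wait for all of them.

    Returns wall-clock seconds from the common start until each instance was
    seen to have finished (an upper bound for instances that ended earlier).
    """
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd, stdout=subprocess.DEVNULL) for _ in range(copies)]
    elapsed = []
    for proc in procs:
        proc.wait()
        elapsed.append(time.perf_counter() - start)
    return elapsed


# Placeholder workload; substitute the real benchmark command line here.
placeholder = [sys.executable, "-c", "sum(i * i for i in range(10**6))"]
print(run_concurrently(placeholder))
```

If two copies together take about as long as one copy alone, the second processor is being put to full use. The operating system still decides which CPU each process lands on.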
The way you describe is how I do it. Just swap windows and run the second.
What Client was that tested on? The regular or one of the Test Clients? It is very fast for a 140. Almost as fast as a 3200 Barton @ 200 FSB
Damn, I am saving up now, forget the Athlon64 3000, I want an Opteron Duallie after all
O yeah, is it the Iwill MB, and what video card is it running etc etc
Last edited by Grumpy; 12-19-2003 at 06:30 AM.
Please note that for the recent beta clients, and the new one being released now (as indicated in the whatsnew.txt), the benchmark is no longer comparable to past benchmarks, due to the changes made to the algorithm. Interestingly, the new benchmark can show how much the algorithm has been sped up compared to the old algorithm. Therefore, please don't base hardware decisions on old vs. new benchmarks.
Howard Feldman
Yeah, that is why I asked for the Client Version. But it would have to have been done with the 108 I imagine, so 37 is very fast for the 140 all the same. It appears Linux 64 is running the Client a lot faster than Linux 32 Bit, even without a recompile
Umm, what precision are the numbers running at with the Client, Howard... 64, 72?
Oops, my apologies, I forgot to mention that's using the new client, so yeah, the numbers aren't that great... I'm more interested in how well it scales in SMP for NUMA vs non-NUMA, which is why I asked if there's a better way to run 2 clients simultaneously.
I get 36 seconds for Foldtraj with the new updated Client on an NF2 MB and Barton @ 2275 MHz, so if the 240 1.4 GHz Opteron gets close to this, I am very very impressed with your configuration
Originally posted by Grumpy
The way you describe is how I do it. Just swap windows and run the second.
What Client was that tested on? The regular or one of the Test Clients? It is very fast for a 140. Almost as fast as a 3200 Barton @ 200 FSB
Damn, I am saving up now, forget the Athlon64 3000, I want an Opteron Duallie after all
O yeah, is it the Iwill MB, and what video card is it running etc etc
Ya, that's using the Iwill DK8SL with on-board ATi RageXL video... it's not the DK8X workstation board unfortunately.
I noticed quite a bit of performance improvement 2 days ago when I switched to 2.6.0 NUMA optimised... I think the difference is NUMA vs non-NUMA on the Opteron. The client is compiled in 32-bit, so using a 64-bit kernel won't net any real benefit unless the calls to system libraries benefit from 64-bit in some way
Since each client uses up to 150MB of RAM, there are real benefits to be had with NUMA
If the numbers being crunched are 64 bit precision, then it will make a heck of a difference as it can run it native and not have to emulate 64 bit precision
So it is a dual 240 system? And does the MB have the RAM shared, so that CPU 2 goes through CPU 1 for memory?
No, it has 8 RAM slots, 4 per processor in a 4+4 configuration (Iwill doesn't make castrated motherboards in a 4+0 configuration)
I use 4 sticks of 1GB ECC/Registered DDR333, so 2 sticks per processor with both processors running in 128-bit memory path.
If CPU1 has to go through CPU0 for RAM then NUMA optimisations mean squat
True, if the client uses double-precision floating-point computations, then compiling for x86-64 "may" see quite a significant improvement... it comes down to 32-bit vs 64-bit, although if the UltraSPARC numbers aren't anything to write home about, going to 64-bit may bring little improvement, if any. Keep in mind that a "double" is 64 bits wide on both 32-bit and 64-bit x86, so a 64-bit build does not by itself change the floating-point precision.
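On the precision question: as far as I know, a C "double" is the 64-bit IEEE 754 format on both 32-bit and 64-bit x86 builds, and it is the pointer width that changes between the two. A quick Python check on whatever machine you have at hand (this only inspects the local Python build, it says nothing about how the DF client itself was compiled):

```python
import ctypes
import struct
import sys

# Size of a C double as seen by this Python build, in bits.
double_bits = ctypes.sizeof(ctypes.c_double) * 8
# Pointer width distinguishes a 32-bit build from a 64-bit one.
pointer_bits = struct.calcsize("P") * 8

print(f"pointer: {pointer_bits}-bit, double: {double_bits}-bit, "
      f"mantissa: {sys.float_info.mant_dig} bits")
```

So any gain from a native x86-64 build would more likely come from things like the extra registers than from wider floating-point numbers.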
Hmmm, in the last Opteron 140 benchmark someone posted, my Barton was at 32 seconds and the 140 at 46 seconds in the Fold benchmark
These are the benchmarks for the new client on a P4 2.66GHz with 512MB PC1066 Rambus and Win2K:
One moment, opening rotamer library...
Predicting secondary structure and generating trajectory distribution...
Folding protein...
Benchmark complete.
Summary
-------
Usr time Sys time
-------- --------
Maketrj 8.582 0.340
Foldtraj 34.329 9.113
I think that's a pretty good score...
can anybody else post some new benchmarks?
Greets thor
Here is my Dual 2400 MP
- Summary
-------
Usr time Sys time
-------- --------
Maketrj 9.266 0.484
Foldtraj 59.859 11.875
And my AMD Barton @ 2275 Mhz
- Summary
-------
Usr time Sys time
-------- --------
Maketrj 6.156 0.234
Foldtraj 36.578 6.703