how cool is this?
not only does the computer guy post in team forums, but so does the head scientist guy!
this is ripped from the arstechnica DC forum
Frequency curves...
Hi folks,
This is Christopher Hogue, I'm the scientist in charge of the Distributed Folding project. I sensed a cry for help here...
quote:
--------------------------------------------------------------------------------
Originally posted by FoBoT:
Quote: "Originally posted by hanser:
I just took a look at that brief benchmark page, and perhaps I'm an idiot, but what exactly is the Y axis measuring? What is "frequency"?"
i don't get it either, perhaps a smart person could splain it to us
--------------------------------------------------------------------------------
No, you aren't an idiot! Good question - benchmarks for systems that take indeterminate amounts of time are hard to show accurately. Everybody wants a simple "number", but the shape of the profiles can be revealing...
So then, the FOLDTRAJ benchmark results are the ones in the figures posted at:
http://bioinfo.mshri.on.ca/yac/speed.html
We did these benchmarks before we bought our cluster.
These are time vs. frequency plots for 5000 samples of a protein called myoglobin. The frequency on the Y axis is the fraction of the samples (out of 1.0) that finish in a particular time (x axis) on each system. A better system has the frequency peak nearer to the left, and a narrower profile. Tall and thin if you like ...
If you look at the bold curve for the 500 MHz Alpha 21264, you can see that the peak is at about 1.3 seconds on the time x-axis and about 0.53 on the frequency y-axis.
This means that about 53% of the 5000 attempted structures take about 1.3 seconds to finish. The point where this curve hits the baseline, near 8-10 seconds, shows that a very small fraction of proteins take longer to complete, but not unreasonably so.
Each protein made takes a slightly different amount of time owing to backtracking and atomic collisions that occur at random. Backtracking usually causes disk I/O to occur, and unusual response times for this will cause the peak to have a broad tail to the right, like the SGI - Irix MIPS R5000 200MHz system. Various combinations of disk speed, cache size and OS buffering settings will cause this tail to broaden or narrow. The integral (area under the curve) is exactly the same for each curve and represents the sample size.
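For anyone who wants to see how a curve like this is put together, here is a minimal Python sketch (not the FOLDTRAJ code). It assumes you already have a list of per-structure completion times in seconds. The function names, the 0.5-second bin width, and the made-up lognormal sample times are all hypothetical, since the post does not say how the benchmark binned its data.

```python
import random


def frequency_curve(times, bin_width=0.5):
    """Bin completion times and return sorted (bin_start, fraction) pairs.

    Each fraction is the share of all samples (out of 1.0) that finished
    inside that time bin, which is what the Y axis is described as showing.
    """
    counts = {}
    for t in times:
        bin_index = int(t // bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    n = len(times)
    return [(i * bin_width, counts[i] / n) for i in sorted(counts)]


def peak(curve):
    """Return the (bin_start, fraction) pair with the highest fraction."""
    return max(curve, key=lambda point: point[1])


# Made-up sample data: 5000 skewed completion times with a tail to the right.
random.seed(0)
times = [random.lognormvariate(0.3, 0.5) for _ in range(5000)]

curve = frequency_curve(times)
peak_time, peak_freq = peak(curve)
total = sum(fraction for _, fraction in curve)

print(f"peak bin starts at {peak_time:.1f} s, fraction {peak_freq:.2f}")
print(f"fractions sum to {total:.3f} over {len(times)} samples")
```

The fractions always add up to 1.0 (every sample lands in some bin), which is the "same area under every curve" point. Keep in mind that the peak height depends on the bin width you pick, so two curves are only comparable if they use the same bins.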
Proteins that take a long time to complete often are quite compact - and that's a good thing for our science.
If you are interested in generating these curves for your system let us know. It is possible that we can add an option to dump this frequency data out for you from the text client in a form that you can load into a spreadsheet. E-mail Howard at
[email protected] and let him know...
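Until the client has such an option, a spreadsheet-friendly dump could look something like the sketch below. This is only an illustration: the column names and number formatting are assumptions, not the format the text client would actually produce, and the three data points are made up.

```python
import csv
import io


def write_frequency_csv(curve, out):
    """Write (time, fraction) pairs as CSV rows a spreadsheet can load."""
    writer = csv.writer(out)
    writer.writerow(["time_s", "frequency"])  # hypothetical column names
    for time_s, fraction in curve:
        writer.writerow([f"{time_s:.2f}", f"{fraction:.4f}"])


# Made-up curve: (start of time bin in seconds, fraction of samples).
example_curve = [(0.0, 0.5), (0.5, 0.25), (1.0, 0.25)]

buf = io.StringIO()
write_frequency_csv(example_curve, buf)
print(buf.getvalue())
```

To write a real file instead of a string, open it with `open("curve.csv", "w", newline="")` and pass that as `out`.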
Finally, thanks for all your help! Looks like we will hit 100 million structures this weekend.
Christopher Hogue
Senior Scientist, Samuel Lunenfeld Research Institute.
P.S. I'm in Tokyo attending a scientific meeting. Do you think it coincidence that the maid left a folded paper crane on my pillow?
these guys are better at PR than those stanford dudes :rolleyes: