Protein structure with RMSD of 3.73 found

**Aegion** · 11-06-2002, 02:36 AM

According to the official stats page, huraxprax of Rechenkraft.net just processed a protein structure with an RMSD of 3.731695! Is this the sort of unusually good structure that you would expect from this particular protein, Howard? It seems a bit surprising that it is so much better than any of the others processed so far.

**Michael H.W. Weber** · 11-06-2002, 05:17 AM

Hey - this is a great result!

It is indeed a little surprising that this structure is so much better than those following it in the top 10 ranking (re-evaluation of the stochastic element of the algorithm recommended?) but as you can see: it's possible!

Michael.

**huraxprax** · 11-06-2002, 06:03 AM

Hi,
I'm as surprised as you, especially since this one was so much better than all the others.
I hope it is valid and was not a bug, since I did nothing unusual. All machines were Athlons running the Linux icc-compiled client with -rt, just one moderately overclocked but absolutely stable and correct.
But it looks nice

Benno

**Brian the Fist** · 11-06-2002, 11:35 AM

The 3.73 structure is INCORRECT. The true RMSD is 13.40 (I downloaded the structure and checked). While it appears huraxprax or someone using huraxprax's handle has cheated, other possibilities exist. Please respond to the e-mail I have sent you.

If a machine is overclocked it is possible a math error occurred when computing the RMSD or something like that, but I will investigate and see if I can learn everything. huraxprax's best RMSD has been reset in the meantime.

**Michael H.W. Weber** · 11-06-2002, 02:30 PM

As the "leader" of the Rechenkraft.net DF team I have contacted huraxprax concerning this extraordinary result. There is NO reason whatsoever for me to believe any cheating has taken place at any time. The email Howard has sent to huraxprax has been received and - as expected - will be dealt with in detail.

From what I know it appears to me that an overclocking problem MIGHT have occurred - maybe some of you guys might want to think again about whether overclocking is really necessary. I for one do not support it in scientific dc projects making use of extensive calculations (Folding@home, Genome@home, Distributed Folding, and all the "docking" clients).

Fold on!

Michael.

**huraxprax** · 11-06-2002, 02:37 PM

I definitely did not cheat, nor did anyone else use my account.
Further details are in the mail sent to Howard.
Benno

**wirthi** · 11-06-2002, 03:11 PM

Originally posted by Michael H.W. Weber
From what I know it appears to me that an overclocking problem MIGHT have occurred - maybe some of you guys might want to think again about whether overclocking is really necessary. I for one do not support it in scientific dc projects making use of extensive calculations (Folding@home, Genome@home, Distributed Folding, and all the "docking" clients).

I absolutely agree with you. Nevertheless, you have to differentiate between the projects.

As you have seen on this occasion, Howard can verify within seconds (or minutes) whether a solution is correct. He only has to check the best results (since the other results are more or less useless), so when he starts to use the results of the computation (for example to predict a CASP protein), he just needs to check the results he uses for that. I guess that is no more than 5 results, so it takes him less than a minute. This may work in a similar way for other projects (like Muon, where it takes longer to prove a single result, but still only the best results have to be verified).

On the other hand, if you make an error on a project like Mersenne, it takes weeks to verify a result, and you have to verify EVERY result to be sure. That's the reason why they DO doublecheck ...

To come to a conclusion for Distributed Folding: errors are bad and should be avoided if possible, but I guess more "Rechenkraft" (computing power) is better than avoiding an error that occurs once every 1,000,000,000 folds. Of course that's my personal opinion; I don't know what Howard thinks about that or how much work it really is for him to verify suspicious results.
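
For a rough sense of scale, that tradeoff can be put into numbers. The sketch below assumes every fold fails independently with the same probability (1 in 1,000,000,000, the figure used above). The function names and the sample sizes are illustrative only, not project data.

```python
import math


def expected_errors(n_folds: int, p_error: float) -> float:
    """Expected number of faulty results among n_folds independent folds."""
    return n_folds * p_error


def prob_at_least_one(n_folds: int, p_error: float) -> float:
    """Probability that at least one of n_folds independent folds is faulty.

    Evaluates 1 - (1 - p)^n through log1p/expm1 so that a tiny p
    is not lost to floating-point rounding.
    """
    return -math.expm1(n_folds * math.log1p(-p_error))


if __name__ == "__main__":
    p = 1e-9  # assumed per-fold error rate (1 in 1,000,000,000)
    for n in (1_000_000, 1_000_000_000, 3_000_000_000):
        print(f"{n:>13,} folds: expected errors = {expected_errors(n, p):.3f}, "
              f"P(at least one) = {prob_at_least_one(n, p):.3f}")
```

Under these assumptions a bad result is very unlikely in a million folds, but the chance of seeing at least one rises to roughly 63% at a billion folds and roughly 95% at three billion. So a project of this size should expect the occasional faulty result even from sound machines.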

**prokaryote** · 11-06-2002, 04:12 PM

Originally posted by Michael H.W. Weber
...

From what I know it appears to me that an overclocking problem MIGHT have occurred - maybe some of you guys might want to think again about whether overclocking is really necessary. I for one do not support it in scientific dc projects making use of extensive calculations (Folding@home, Genome@home, Distributed Folding, and all the "docking" clients).

Fold on!
Michael.

I think it depends upon the approach as well; isn't F@H more of a statistical approach? If so, occasional errors won't affect it.

Also, how does one protect a "stable" system from glitches caused by outside sources (cosmic rays, natural radioactive decay from products found within the solder, etc.)? I think, if possible, clients should be made as robust as is reasonable, perhaps by storing or using some sort of error-checking value for each data entry or group of data. This may slow down the client, but if it is susceptible to memory or data "glitches" it should probably check for them.
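
One way to picture the "error checking value" idea is a checksum stored next to each data record and re-verified before the record is used. This is only a sketch: the record layout and function names are made up for illustration and say nothing about how the Distributed Folding client actually stores its data. CRC-32 from Python's standard `zlib` module stands in for whatever check a real client might use.

```python
import struct
import zlib


def pack_record(coords: list[float]) -> bytes:
    """Serialize coordinates as little-endian doubles and append a CRC-32."""
    payload = struct.pack(f"<{len(coords)}d", *coords)
    return payload + struct.pack("<I", zlib.crc32(payload))


def unpack_record(blob: bytes) -> list[float]:
    """Verify the trailing CRC-32, then decode the coordinates.

    Raises ValueError if the stored and recomputed checksums differ.
    """
    payload = blob[:-4]
    (stored,) = struct.unpack("<I", blob[-4:])
    if zlib.crc32(payload) != stored:
        raise ValueError("checksum mismatch: record is corrupted")
    return list(struct.unpack(f"<{len(payload) // 8}d", payload))


if __name__ == "__main__":
    blob = pack_record([1.5, -2.25, 3.0])
    print(unpack_record(blob))      # intact record decodes normally

    glitched = bytearray(blob)
    glitched[0] ^= 0x01             # simulate a single flipped bit
    try:
        unpack_record(bytes(glitched))
    except ValueError as err:
        print("detected:", err)
```

A check like this only catches corruption of data at rest or in transit. It would not catch an arithmetic error made while a value such as the RMSD is being computed.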

I think that it would be impossible to have some sort of policing effort to keep people who OC from contributing. For those that do OC, checking the system for stability prior to use in a DC would be a prudent and considerate thing to do.

**Aegion** · 11-06-2002, 04:21 PM

Originally posted by prokaryote

Also, how does one protect a "stable" system from glitches caused by outside sources (cosmic rays, natural radioactive decay from products found within the solder, etc.)? I think, if possible, clients should be made as robust as is reasonable, perhaps by storing or using some sort of error-checking value for each data entry or group of data. This may slow down the client, but if it is susceptible to memory or data "glitches" it should check for them.

The issue is that for this project (Distributed Folding, an entirely different project from F@H), the only consequence is that the particular structure with the error isn't of much use. Errors can be caught easily, so the slowdown from substantial error checking would be more harmful than the extremely rare error. Computer errors can produce a large number of "junk" units for the project, and in most cases they are automatically filtered out.

**Michael H.W. Weber** · 11-06-2002, 04:31 PM

1. Well, I did not mean to say that overclocking should be banned in general. I just asked myself - after this incident - whether it is really necessary. How many more structures do you generate using oc'ed systems compared to those working under the manufacturer-specified conditions? To me, it's not worth the risk. It is also clear that some projects suffer more than others from errors that may occur when overclocking.

2. As far as checking of the results is concerned: This takes just a few seconds on a standard PC. You can do it at home using - for example - Swiss-Pdb-Viewer. This free program has a superimposition tool. All you have to do is get both structure files (current template protein and structure model to be tested) and then superimpose the two.

Relevant links:
SPDBV: http://us.expasy.org/spdbv/mainpage.htm
Course on molecular modelling using SPDBV: http://www.usm.maine.edu/~rhodes/SPVTut/index.html
Protein structure data resources: http://www.rcsb.org/pdb/

Have fun,

Michael.

[edit]: Please note that - as far as I understand up to now - huraxprax's structure is VALID. It was "just" submitted with an INCORRECT RMSD.

**Brian the Fist** · 11-07-2002, 10:08 AM

That is exactly how I compared it, actually (SwissPDB). Good guess.

So yes, it takes me 2 minutes to check, although usually visual inspection is enough, since a 3.7A structure should look VERY similar to the native structure. Both are viewable with Cn3D from the stats page.

Anyhow, I have found it is indeed difficult to code software that accounts for all the things that can go wrong: faulty RAM, cosmic rays, overclocking, CPU bugs and so on. Thanks to users, I have put in warning/error messages that catch many of these problems, such as the bzip2 decompression errors and similar failures that occur with bad RAM. Nevertheless, it's not perfect. For example, besides the 3.7A structure, there were two 0A structures uploaded in the 3 billion or so we've sampled so far of this protein. Again, this is because the routine which computes RMSD failed on two users' machines, once each. Why this occurs is almost impossible to say. I have used the routine myself thousands or even millions of times and never had a problem. But when you scale up to billions of runs, one or two errors once in a while become probable rather than impossible, it seems. The RMSD calculation is a complex matrix computation using what's called SVD. It has to take into account many rare special cases (like when certain matrix elements work out to be exactly zero, etc.), so it is quite possible that 1 time in a billion there is a case which fails.
IF it does not happen repeatedly to the same user, I would not worry too much about it as it is very easy to identify and fix.
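
For readers curious what an SVD-based RMSD looks like, here is a minimal sketch of the standard Kabsch superposition using NumPy. It is not the project's actual routine. The function name and the toy coordinates are assumptions, and it handles only one special case (a reflection), whereas production code has to cope with more degenerate inputs.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    # Remove translation by centering both sets on their centroids.
    P0 = P - P.mean(axis=0)
    Q0 = Q - Q.mean(axis=0)

    # SVD of the 3x3 covariance matrix yields the optimal rotation.
    U, _, Vt = np.linalg.svd(P0.T @ Q0)

    # Special case: if the best orthogonal transform is a reflection,
    # flip the axis with the smallest singular value to keep a proper rotation.
    d = 1.0 if np.linalg.det(U @ Vt) >= 0 else -1.0
    R = U @ np.diag([1.0, 1.0, d]) @ Vt  # row-vector convention: P0 @ R ~ Q0

    diff = P0 @ R - Q0
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    # Toy "native" structure (hypothetical coordinates, in angstroms).
    native = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0],
                       [0.0, 0.0, 3.0], [1.0, 1.0, 1.0]])

    # The same structure rotated 30 degrees about z and shifted.
    t = np.radians(30.0)
    rot_z = np.array([[np.cos(t), -np.sin(t), 0.0],
                      [np.sin(t), np.cos(t), 0.0],
                      [0.0, 0.0, 1.0]])
    moved = native @ rot_z.T + np.array([5.0, -3.0, 2.0])

    print(kabsch_rmsd(native, moved))  # essentially zero: same shape
```

Because the result depends only on the shape of the two coordinate sets, anyone holding both structure files can recompute it independently of the value a client reported.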

**bwkaz** · 11-07-2002, 01:58 PM

Singular Value Decomposition? We've had to write that into a program once, too. It was part of a large software project, a sort of bridge-building CAD system almost. It was full of bugs -- the weight crossing the bridge would make it just fine if there were no supporting beams whatsoever, but once you added one beam hanging off at an angle, it would fail. Strange stuff. I think more of the problem, though, was that some of the group members (myself included) were not quite clear on how the math was even supposed to work, let alone how it did work. Whatever. As long as you're happy with it, I guess I don't care.
