Moderator
You've got most of it right, more or less.
The major bias we use right now is something very old and fairly simple, called "secondary structure prediction". If you look at a bunch of protein structures in the PDB (using software like Cn3D or Rasmol - I can post links if you don't already know where to get these), you will see they are generally built from two kinds of "secondary structure" or "motifs": helices (those twisty shapes that look like a spiral staircase or a fun waterslide) and sheets (usually drawn by the software as parallel or anti-parallel arrows).
From the sequence of the protein, there are AI methods (using neural nets, for example) that predict where these two types of structure will occur, with about 75% accuracy on a good day. The catch is that because the prediction is not perfect, we cannot use it as gospel truth. So, using some fancy probabilistic techniques, we bias the structure building (give it a nudge in the right direction) but do not force it. Thus only a certain fraction of the sampled structures will match the original prediction. This lets us avoid sampling many improbable folds, and the fragments I mentioned earlier eliminate still more improbable structures.
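Roughly, the "bias, don't force" idea looks like the following toy Python sketch. This is not our actual code, and the per-residue probabilities here are made up for illustration; in practice they would come from a neural-net secondary structure predictor.

    import random

    # Hypothetical per-residue predictions: for each residue, the
    # predicted probabilities of helix (H), sheet (E), and coil (C).
    predictions = [
        {"H": 0.70, "E": 0.10, "C": 0.20},   # residue 1: likely helical
        {"H": 0.15, "E": 0.60, "C": 0.25},   # residue 2: likely in a sheet
        {"H": 0.30, "E": 0.30, "C": 0.40},   # residue 3: uncertain
    ]

    def sample_states(preds):
        """Draw one secondary-structure state per residue, weighted by
        the prediction. The prediction biases the draw but never forces
        it, so a fraction of sampled structures will deviate from it."""
        states = []
        for p in preds:
            labels, weights = zip(*p.items())
            states.append(random.choices(labels, weights=weights)[0])
        return states

    # Each call yields one candidate assignment to seed structure building.
    print(sample_states(predictions))

Because each residue is drawn by weight rather than by taking the most likely label, some sampled structures will still disagree with the prediction - which is exactly what you want when the predictor is only right about 75% of the time.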
I am very pleased that some of the non-hard-core biologists (I think) are understanding the fundamentals of what our project is doing, and the differences between ours and F@H - that example of Seattle and Washington was amazingly accurate.
In our 2002 Proteins paper (Table IV), we were actually able to estimate the number of samples required for each of 17 different proteins in order to get one structure within 6 Å RMSD (or any other arbitrary cutoff, for that matter). That was part of the basis for starting this project off with 1 billion samples. For example, we estimated we'd need 10 billion samples of the current protein to get a 6 Å structure (which we've already exceeded! Hooray! - these are very approximate, order-of-magnitude estimates). Similarly, for 5PTI we predicted 10 billion samples would be needed to reach 6 Å (we got 5.2 Å with 1 billion). These larger sample sizes also allow us to make more accurate estimates. We are very excited about the results so far, and hope it will only get better as we continue to make the algorithm "smarter".
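The paper's actual estimator isn't reproduced here, but the back-of-the-envelope version is simple: if a fraction p of independent samples lands under the cutoff, then the number of samples N needed to see at least one hit with a given confidence solves 1 - (1 - p)^N >= confidence. A short sketch (the p_hit value below is illustrative, not a measured number):

    import math

    def samples_needed(p_hit, confidence=0.63):
        """Samples N so that P(at least one structure under the RMSD
        cutoff) reaches `confidence`, assuming independent draws with
        per-sample hit probability p_hit. At confidence ~0.63 (1 - 1/e)
        this reduces to roughly N = 1 / p_hit."""
        return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hit))

    # If roughly 1 in 10 billion samples lands under the cutoff:
    print(samples_needed(1e-10))          # ~1e10, same order of magnitude
    print(samples_needed(1e-10, 0.95))    # ~3e10 for 95% confidence

This is also why the estimates are only order-of-magnitude: p itself has to be extrapolated from the tail of the sampled distribution, and more samples pin that tail down better.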
Oh, and as for the hydrogen bond question: the client does actually look for H-bonds in the structures (you may have seen a message in the error.log about unsatisfied H-bond donors at some point). Because of the way our structures are formed, though, H-bonds are relatively uncommon, since our structures are "unrefined". Think of one of our structures as the result of taking a tree and bending a few branches here and there: because it is only slightly distorted, it is relatively easy to get it back to its normal shape. Refining the structure uses energy calculations to "relax" it into a more natural shape; visually, the change is usually hardly noticeable. It is during this refinement that most of the relevant H-bonds would actually form, so they are unfortunately hard to detect in our raw structures, and we must rely on other signs of structural quality.
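For the curious: this is not the client's actual H-bond test, but a common geometric criterion for spotting an H-bond looks roughly like this (the coordinates are made up, and real criteria also check the donor-H...acceptor angle):

    import math

    # Hypothetical atom records: (name, x, y, z) in Angstroms.
    donor    = ("N", 0.0, 0.0, 0.0)   # backbone amide nitrogen (donor)
    acceptor = ("O", 2.9, 0.4, 0.1)   # backbone carbonyl oxygen (acceptor)

    def distance(a, b):
        # Euclidean distance between the two atoms' coordinates.
        return math.dist(a[1:], b[1:])

    def is_hbond(donor, acceptor, max_dist=3.5):
        """Crude distance-only test: donor and acceptor heavy atoms
        within ~3.5 A of each other."""
        return distance(donor, acceptor) <= max_dist

    print(is_hbond(donor, acceptor))  # True for this pair

In an unrefined structure, small distortions push many donor-acceptor pairs just outside criteria like this, which is why the raw structures report so many unsatisfied donors.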