Originally posted by Scott Jensen
OK, now if someone would care to explain in real simple layman terms what the following "good points" mean, I'd truly appreciate it. First up...
OK, I don't get this at all. Narrow the field?
And candidates for what?
Candidate structures, i.e. structures that are likely to be close to the correct one.
---
How and in what way? How would it help IBM's Blue Gene? Would IBM even ask for DF's help? And how important would such assistance be to them? Will they come begging for our help or would we have a problem getting them to even return our phone calls?
The Blue Gene team is aware of our software and may indeed find a use for it later on, when their project is closer to completion.
---
What!? Why aren't we trying to find the exact structure? I would think that would be exactly the goal of Howard and Dr. Hogue.
And cannot x-ray crystallography be highly automated so ... [Scott tries to do his best Carl Sagan impression] ... billions upon billions upon billions can be done? How long does x-ray crystallography take to do one protein? How expensive is it? Wouldn't it be better to put smart people like Howard and Dr. Hogue to work on making lightning-fast x-ray crystallography technology than this guesstimating software?
Well, this is a loaded question. "Exact structure" is an oxymoron in this case. Proteins are DYNAMIC molecules, constantly moving, rolling and vibrating around in solution. The best we can hope to do is find an approximate structure, and even then we can't expect a perfect match to the crystal structure. At present it is impossible to model reality well enough to ever expect an EXACT prediction.
As for X-ray crystallography, the limiting step is growing the crystals, which can take anywhere from 3 months to a year or more for a single protein (it depends on the protein). There are indeed companies working on high-throughput crystallography, but the catch is that they can only do this for the small fraction of proteins (maybe 5%) that crystallize easily. Other people are working on ways to speed up the crystallization process.
One advantage of software is that in the future, when it works perfectly, we can design new proteins which don't yet exist and predict their structures, and thus their functions. Also, crystallizing a protein and getting its structure can be very expensive, costing tens of thousands of dollars in equipment, chemicals and labour. Finally, many proteins can never be crystallized (or are extremely difficult to crystallize) and must be solved another way.
---
"We are trying to find (and prove) a method for taking an unknown protein and 'predicting' its structure with a certain degree of accuracy."
OK, this doesn't make sense. What do you mean by "unknown protein"? How can you take it if you don't know it? How can you know you predicted it right if you don't know it in the first place?
That should probably read "a protein of unknown structure, but known sequence". Our goal is to get from the sequence to the structure.
---
"Based upon my limited understanding of the science involved, all of the results (not every single one, but the best of each protein run) in Phase IA were good enough to be of use."
For what?
For designing drugs, predicting function, finding homologs (evolutionarily related proteins), and lots of other cool stuff.
---
"For example, trying to sort/categorize/organize/identify the proteins discovered as a result of the Human Genome Project. You obviously can't do lab work on every single one of them simultaneously, so you either find ways to "sort through" them or you sit around waiting while the limited lab resources are used to find the structures of all of these unknown proteins."
But how do we know which ones are the best ones to devote our efforts to? Or are we simply taking ones willy-nilly and seeing if anything worthwhile comes out of it? If the second one, how would we even know if that simulated protein is worth anything or even what it does?
There are methods for picking out 'interesting' proteins: usually ones that have no sequence similarity to known proteins, have no known function, and appear in many organisms. We can also predict domain boundaries and things like that, but I won't go into that here.
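To make the filtering idea concrete, here is a minimal Python sketch. The record fields (`similar_to_known_protein`, `known_function`, `organism_count`) and the `min_organisms` cutoff are hypothetical; in practice those answers would come from sequence searches and annotation databases.

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    # Hypothetical fields, for illustration only.
    name: str
    similar_to_known_protein: bool  # e.g. a significant hit in a sequence search
    known_function: bool
    organism_count: int  # how many organisms the protein (family) shows up in


def pick_interesting(records, min_organisms=3):
    """Keep proteins with no sequence similarity to known proteins,
    no known function, and presence in at least `min_organisms` organisms."""
    return [
        r for r in records
        if not r.similar_to_known_protein
        and not r.known_function
        and r.organism_count >= min_organisms
    ]


candidates = [
    ProteinRecord("protein_1", similar_to_known_protein=False, known_function=False, organism_count=12),
    ProteinRecord("protein_2", similar_to_known_protein=True, known_function=False, organism_count=40),
    ProteinRecord("protein_3", similar_to_known_protein=False, known_function=True, organism_count=8),
    ProteinRecord("protein_4", similar_to_known_protein=False, known_function=False, organism_count=1),
]
print([r.name for r in pick_interesting(candidates)])  # ['protein_1']
```

Only `protein_1` passes all three tests: `protein_2` resembles a known protein, `protein_3` already has a known function, and `protein_4` turns up in too few organisms.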
---
"DF also appears to have a good apparatus for implementing very fast tests of both sampling and scoring methods as new methods are discovered/created/shared/etc.
Basically, they have a platform that allows them to test sampling and scoring methods very quickly (a very large number of iterations, a very large "sample set" in a very short amount of time). As the sampling and scoring methods become more and more accurate (and robust), the amount of more direct science that they can do with the project should increase."
Ahhh! Perhaps there's something here. That being "very fast tests". Testing for what? For whom? Why? What value do these tests represent and mean to scientists and the biotech industry?
We are not a commercial entity; we are a research institute. Our goal is to work on the protein folding problem. Whether or not this benefits the biotech industry is of no importance to us, although the industry would definitely benefit from being able to design drugs for proteins whose structures have not been, or cannot be, solved by other methods. We wish to test different scoring and sampling methods to optimize our approach and improve our predictions of protein structures.
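Here is a minimal Python sketch of what a "very fast test" of a sampling/scoring pair could look like, assuming the test protein's true (native) structure is already known. The idea is to draw many candidates, let the scoring function pick its favourite, and measure how far that pick is from the real structure. The sampler, the scorers and the simplified RMSD are toy stand-ins, not DF's actual code. The RMSD skips the superposition step only because the toy candidates are generated in the native structure's own frame.

```python
import math
import random


def plain_rmsd(a, b):
    """RMSD between matching coordinates, with no superposition step."""
    return math.sqrt(sum(math.dist(p, q) ** 2 for p, q in zip(a, b)) / len(a))


def benchmark(sampler, scorer, native, n_samples=200, seed=0):
    """Sample candidates, keep the one the scorer likes best (lowest score),
    and report that candidate's RMSD to the known native structure."""
    rng = random.Random(seed)  # fixed seed: every scorer sees the same candidates
    candidates = [sampler(rng) for _ in range(n_samples)]
    best = min(candidates, key=scorer)
    return plain_rmsd(best, native)


# Toy "native" structure: 10 beads, 3.8 Å apart, in a straight line.
NATIVE = [(3.8 * i, 0.0, 0.0) for i in range(10)]


def noisy_sampler(rng):
    """Toy sampler: the native structure plus random noise of random size."""
    sigma = rng.uniform(0.0, 3.0)
    return [(x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma), z + rng.gauss(0.0, sigma))
            for x, y, z in NATIVE]


def bond_length_scorer(coords):
    """Toy scorer: penalize neighbours that are not ~3.8 Å apart."""
    return sum((math.dist(a, b) - 3.8) ** 2 for a, b in zip(coords, coords[1:]))


def constant_scorer(coords):
    """Baseline with no information: every candidate scores the same."""
    return 0.0


for name, scorer in [("constant", constant_scorer), ("bond length", bond_length_scorer)]:
    print(name, round(benchmark(noisy_sampler, scorer, NATIVE), 2))
```

Because the seed is fixed, two scoring functions are judged on exactly the same set of candidates, so any difference in the reported RMSD comes from the scoring function alone.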
---
One, can we see these more detailed pictures?
Two, could you set up some little slide show type thing on the website with a clicker button that would enable us to click back and forth between the real protein and its simulated prediction for us to see the differences ourselves?
Not sure what you want with number two, but we'll add the actual structures to the results section so you can view them in Cn3D and see for yourself.
---
And this scoring system works how? Also, please remember you're explaining this to a moron.
It is a black box as far as you are concerned: in goes the structure, out comes the score. The present ones we are trying work by counting contact pairs in space. Basically, all residue pairs within a certain distance of each other in space are listed, and each pair gets a score depending on how likely those two residues are to be found close together. We just add up the scores for all pairs.
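The contact-pair scheme described above can be sketched in a few lines of Python. The 8 Å cutoff, the rule of ignoring residues fewer than 3 positions apart along the chain, and the toy pair table (hydrophobic pairs score -1, everything else 0, lower is better) are all assumptions made up for illustration; a real potential would have its own statistically derived value for every pair of the 20 amino acid types.

```python
import math
from itertools import combinations

HYDROPHOBIC = set("AILMFVW")  # one-letter amino acid codes


def toy_pair_score(a, b):
    """Hypothetical pair table: hydrophobic-hydrophobic contacts are favourable (-1)."""
    return -1.0 if a in HYDROPHOBIC and b in HYDROPHOBIC else 0.0


def contact_score(sequence, coords, pair_score=toy_pair_score, cutoff=8.0, min_separation=3):
    """Sum pair scores over all residue pairs closer than `cutoff` (Å) in space.

    Pairs fewer than `min_separation` positions apart in the sequence are skipped,
    since chain neighbours are always close and say little about the fold.
    """
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < min_separation:
            continue
        if math.dist(coords[i], coords[j]) <= cutoff:
            total += pair_score(sequence[i], sequence[j])
    return total


# A 6-residue hairpin with 3.8 Å spacing between neighbours.
hairpin = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
           (7.6, 3.8, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(contact_score("LAGGVI", hairpin))  # -4.0
```

The four contacts that count are L-V, L-I, A-V and A-I across the hairpin; the two glycines contribute nothing under this toy table.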
---
"So we expect similar results for novel proteins of comparable size."
I assume "novel" means "unknown". But this just sounds odd as well. I mean, how can you fold something you don't know? It doesn't make sense. Do you understand what I'm not getting? Is it that you have a string of protein components you've been told are proteins but they haven't been predicted/folded yet? Is that it? If so, how do you know these components make up a protein in the first place? Where and how did you get this list of protein parts without knowing its shape? See what I'm getting at? It's like someone comes to me with a dump truck full of vehicle parts and tells me that they'll make a complete working car.
Proteins are like beads on a string. It is easy to get their sequence - the order of the beads on the string. But it is hard to get their 3-D shape - how that string folds up into a globular structure. Again, 'unknown' implies unknown structure but known sequence.
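A small Python sketch of the "beads on a string" picture, assuming one bead per residue spaced about 3.8 Å apart (roughly the distance between consecutive alpha carbons). The sequence is the easy part; the same string can be laid out in an enormous number of shapes, and picking the right one is the hard part. The sequence below is hypothetical, and the generator does no clash checking, which a real sampler would need.

```python
import math
import random

BOND = 3.8  # approx. distance (Å) between consecutive alpha carbons


def random_unit_vector(rng):
    """Uniformly random direction in 3-D (normalized Gaussian vector)."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        n = math.sqrt(sum(c * c for c in v))
        if n > 1e-9:
            return [c / n for c in v]


def random_conformation(sequence, rng):
    """One random 3-D shape for a chain with one bead per residue (no clash checks)."""
    if not sequence:
        return []
    coords = [(0.0, 0.0, 0.0)]
    for _ in sequence[1:]:
        dx, dy, dz = random_unit_vector(rng)
        x, y, z = coords[-1]
        coords.append((x + BOND * dx, y + BOND * dy, z + BOND * dz))
    return coords


sequence = "MKTAYIAKQR"  # hypothetical 10-residue sequence
rng = random.Random(42)
shape_a = random_conformation(sequence, rng)
shape_b = random_conformation(sequence, rng)  # same beads, different shape
```

Both shapes have exactly the same beads in the same order; only the fold differs, which is why knowing the sequence does not tell you the structure.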
---
"It hasn't been called the most difficult biological problem in history for nothing after all."
Why? Nothing else in biology is this complex of a problem? Nothing? And who has said it was "the most difficult biological problem in history"? And, NO, I will not take Oprah's word for it.
Ask a biologist "What's the most difficult problem in biology?" and see what they say.
---
"Keep in mind that their goal is NOT structure prediction though, it is to investigate the folding pathway of proteins - i.e. how it gets from unfolded to folded and what all the intermediate steps are."
What?! Isn't their end goal the same as yours? To present the true structure of the protein. If that wasn't their goal as well, their willy-nilly folding of protein components would be rather stupid and silly.
Nope, I'm pretty sure their goal is to investigate folding pathways, not structure prediction. But again, I can't speak for them. Not sure what you mean by willy-nilly folding, though.
---
"As a final point, our structures are sometimes referred to as 'unrefined'. That is, they are raw predictions straight out of the program. The[y] can be subjected to energy minimization techniques such as molecular dynamics and simulated annealing which could (and should) reduce the energy, and RMSD, further."
Will someone please translate the above? Sorry, I only know English.
"raw predictions"???
Like it says, straight out of a single program, not refined. See below.
"energy minimization techniques"???
Minimize the energy of a protein model using some technique, i.e. nudge its atoms around, step by step, so that its calculated energy goes down.
"molecular dynamics"???
see F@H
"simulated annealing"???
see F@H
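For the curious, the idea behind simulated annealing is simple to sketch. This minimal Python version assumes a generic `energy` function and uses a toy double-well energy. It makes random moves, always accepts downhill moves, accepts uphill moves with probability exp(-ΔE/T), and slowly lowers the "temperature" T so the search settles into a low-energy state. All parameter values are arbitrary choices for the example.

```python
import math
import random


def simulated_annealing(energy, x0, step=0.5, t_start=2.0, t_end=0.01, n_steps=5000, seed=0):
    """Minimize `energy` over a list of numbers; returns (best_x, best_energy)."""
    rng = random.Random(seed)
    x = list(x0)
    e = energy(x)
    best_x, best_e = list(x), e
    # Geometric cooling schedule from t_start down to t_end.
    cooling = (t_end / t_start) ** (1.0 / max(1, n_steps - 1))
    t = t_start
    for _ in range(n_steps):
        # Random move: nudge one coordinate.
        trial = list(x)
        i = rng.randrange(len(trial))
        trial[i] += rng.uniform(-step, step)
        e_trial = energy(trial)
        # Metropolis criterion: always go downhill, sometimes go uphill.
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / t):
            x, e = trial, e_trial
            if e < best_e:
                best_x, best_e = list(x), e
        t *= cooling
    return best_x, best_e


def double_well(x):
    """Toy energy: each coordinate prefers +1 or -1 (energy 0 there)."""
    return sum((v * v - 1.0) ** 2 for v in x)


start = [0.0, 0.0, 0.0, 0.0]
best_x, best_e = simulated_annealing(double_well, start)
print([round(v, 2) for v in best_x], round(best_e, 4))
```

Early on, the high temperature lets the search climb out of poor regions; as T drops, uphill moves become rare and the structure "freezes" near a minimum.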
"which could (and should) reduce the energy, and RMSD, further"???
See energy minimization techniques above
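RMSD (root-mean-square deviation) measures how far, on average, the atoms of a prediction sit from the same atoms in the true structure once the two are optimally superimposed; lower is closer. Below is a minimal NumPy sketch of that calculation using the standard Kabsch superposition. It assumes both structures are given as equal-length lists of (x, y, z) coordinates for matching atoms (e.g. one alpha carbon per residue).

```python
import numpy as np


def kabsch_rmsd(p, q):
    """RMSD between two matched coordinate sets after optimal superposition (Kabsch)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape or p.ndim != 2 or p.shape[1] != 3:
        raise ValueError("expected two (N, 3) arrays of matching atoms")
    # 1. Remove translation by centring both structures on their centroids.
    p = p - p.mean(axis=0)
    q = q - q.mean(axis=0)
    # 2. Best rotation from the SVD of the 3x3 covariance matrix.
    u, _, vt = np.linalg.svd(p.T @ q)
    d = np.sign(np.linalg.det(vt.T @ u.T))  # guard against a mirror-image "rotation"
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    # 3. Rotate p onto q and average the squared per-atom distances.
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


native = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 1.0]]
# The same shape, rotated 90 degrees about z and shifted: RMSD should be ~0.
moved = [[-y + 1.0, x + 2.0, z + 3.0] for x, y, z in native]
print(kabsch_rmsd(native, moved))
```

Because rotation and translation are factored out first, RMSD only reflects genuine differences in shape, which is what refinement is meant to reduce.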
---