Hello,

I have a few questions about how energy and best structure are determined.

In Phase II FAQ it states, "Only the best structure for each generation is uploaded to the server." Does that apply to generation 0 as well?

How is the best structure from gen. 0 currently selected? Is it based on the lowest RMSD or best energy?

How is the energy calculated, and how important is its accuracy?

Also, is it possible to begin calculating or estimating the energy of the protein while it is still being assembled? And how different is the energy of a protein in its native state compared with a 'typical' unfolded state?

If the energies are very different, if energy can be calculated during assembly, and if not all structures from gen. 0 are uploaded, might it be possible to speed up generation 0 as follows?

First, an assumption: for each of the 10,000 iterations, once the fold is complete, its energy is calculated and compared with the current best energy, which is updated if the new fold is better. After all 10,000 are done, the best fold is used as the seed for generation 1.
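To make that assumption concrete, here is a minimal Python sketch of the generation-0 loop as I imagine it. Everything in it is hypothetical: `random_fold` and `toy_energy` are made-up stand-ins, not the project's actual fold builder or energy function, and I'm assuming lower energy means a better fold.

```python
import random
from typing import Callable, List, Optional, Tuple

Structure = List[float]  # hypothetical representation: one backbone angle per residue


def random_fold(n_residues: int, rng: random.Random) -> Structure:
    """Hypothetical stand-in for building one random conformation."""
    return [rng.uniform(-180.0, 180.0) for _ in range(n_residues)]


def toy_energy(s: Structure) -> float:
    """Hypothetical score, NOT the project's real energy function."""
    return sum(abs(a) for a in s) / len(s)


def generation_zero(n_iterations: int, n_residues: int,
                    energy: Callable[[Structure], float],
                    seed: int = 0) -> Tuple[Structure, float]:
    """Build n_iterations random folds and keep the lowest-energy one."""
    rng = random.Random(seed)
    best: Optional[Structure] = None
    best_e = float("inf")
    for _ in range(n_iterations):
        s = random_fold(n_residues, rng)  # complete the whole fold first...
        e = energy(s)                     # ...then score it
        if e < best_e:                    # assumed: lower energy is better
            best, best_e = s, e
    if best is None:
        raise ValueError("n_iterations must be at least 1")
    return best, best_e


# Under this assumption, the best fold would then seed generation 1.
seed_structure, seed_energy = generation_zero(10_000, 60, toy_energy)
```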

The suggestion: after the first 200 (or 500) folds are complete, start calculating the energy while each protein is being built. If the partial energy exceeds the current best energy, skip the rest of that fold. If calculating the energy after every new residue is too expensive, it could instead be done after the 20th residue has been added and then after every 5th, or on some similar schedule. I suggest waiting until 200 (or however many) folds are done because I'm guessing the energy calculation is expensive, so it would be better to build up enough complete folds quickly to have a reasonably good energy as the basis for comparison.
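A rough sketch of the suggestion, again in Python with made-up names (`residue_term`, `build_fold` and `generation_zero` are hypothetical, as are the 20/5 checkpoint schedule and the 200-fold warm-up defaults). One caveat the sketch makes explicit: skipping a fold is only guaranteed safe if the partial energy can never go down as residues are added. The toy `residue_term` below is always non-negative, so that holds here; a real energy function with attractive terms could drop later in assembly, in which case the cut-off becomes a heuristic that might occasionally throw away a fold that would have won. Each fold gets its own seeded random generator so that runs with and without pruning build identical folds and can be compared directly.

```python
import random
from typing import List, Optional, Tuple


def residue_term(angle: float) -> float:
    """Hypothetical per-residue energy contribution. It is never negative,
    so the running total can only grow as residues are added."""
    return abs(angle) / 180.0


def build_fold(n_residues: int, rng: random.Random, best_e: float,
               prune: bool, first_check: int = 20, every: int = 5
               ) -> Tuple[Optional[List[float]], float, int]:
    """Build one fold residue by residue.

    Returns (angles, energy, residues_built); angles is None if the fold
    was abandoned because its partial energy already reached best_e.
    """
    angles: List[float] = []
    partial = 0.0
    for i in range(1, n_residues + 1):
        a = rng.uniform(-180.0, 180.0)
        angles.append(a)
        partial += residue_term(a)
        # Check after residue 20, then after every 5th residue (25, 30, ...).
        checkpoint = i >= first_check and (i - first_check) % every == 0
        if prune and checkpoint and partial >= best_e:
            return None, partial, i  # cannot beat the best: skip the rest
    return angles, partial, n_residues


def generation_zero(n_iterations: int = 10_000, n_residues: int = 60,
                    warmup: int = 200, seed: int = 0):
    """Return (best_angles, best_energy, n_abandoned, residues_built)."""
    best: Optional[List[float]] = None
    best_e = float("inf")
    abandoned = 0
    built = 0
    for k in range(n_iterations):
        # One generator per fold, so fold k is identical with or without pruning.
        rng = random.Random(seed * 1_000_003 + k)
        angles, e, n_built = build_fold(n_residues, rng, best_e,
                                        prune=k >= warmup)
        built += n_built
        if angles is None:
            abandoned += 1
        elif e < best_e:
            best, best_e = angles, e
    return best, best_e, abandoned, built
```

Under these assumptions, `generation_zero(warmup=200)` and `generation_zero(warmup=10_000)` (which never prunes) should pick the same best fold, with the pruned run building fewer residues in total.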

I imagine this would speed things up because less time would be wasted resolving atomic clashes, which (judging from staring at the screensaver) seem to occur more often after the first few dozen residues have been added. And, of course, many iterations would be cut short, so generation 0 would finish sooner than it does now.

If RMSD is used to choose the best structure, it seems to me that this approach of interrupting structures in gen. 0 should be even easier to implement.


So many questions...

Michael Matisko


= = = = = = = = = = = =


http://www.distributedfolding.org/details.html

I made the most structures. Do I win?

It is not quantity, but quality that matters with our project. The 'lowest RMSD' is what you should get excited about as this is where the real science is. Generating large numbers of poor structures doesn't achieve anything. Of course, it is all random, you have no control over the generated structures, that's the whole point of creating this massively distributed task. To explain this briefly, we are sampling from the many possible conformations of a protein, and testing to see how much sampling is required to get something that looks like the true structure. ... There are literally trillions of possible shapes such a protein could take, hence the need for massive sampling.


http://www.distributedfolding.org/phaseiifaq.html

... Generation zero will remain entirely random, producing 10000 structures. The best structure from this generation is chosen to serve as the basis for the next generation. Note that the best structure is selected based on either RMSD to native structure or crease energy (will vary from version to version as we test the algorithm more). Taking the best structure and generating 50 near-neighbour structures produces generation one. ... Only the best structure for each generation is uploaded to the server.
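For reference, the scheme the FAQ describes can be sketched roughly like this. It is only an illustration: the angle-list representation, `toy_energy`, `toy_rmsd` (a plain root-mean-square difference between angle lists, not a real superposition-based coordinate RMSD), the Gaussian `neighbour` move and its step size are all my own assumptions, and I also assume the parent structure is not carried over into the next generation. The `score` argument stands in for the FAQ's choice between "RMSD to native structure or crease energy"; the returned list holds the one best structure per generation that would be uploaded.

```python
import math
import random
from typing import Callable, List

Structure = List[float]  # hypothetical: one torsion angle per residue


def toy_energy(s: Structure) -> float:
    """Hypothetical stand-in for the project's energy score."""
    return sum(abs(a) for a in s) / len(s)


def toy_rmsd(s: Structure, native: Structure) -> float:
    """Root-mean-square difference of angle lists (toy stand-in for RMSD)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s, native)) / len(s))


def neighbour(s: Structure, rng: random.Random, step: float = 10.0) -> Structure:
    """Hypothetical near-neighbour move: a small Gaussian nudge to every angle
    (angles are not wrapped back into [-180, 180] in this toy model)."""
    return [a + rng.gauss(0.0, step) for a in s]


def run(native: Structure, score: Callable[[Structure], float],
        n_gen0: int = 10_000, n_neighbours: int = 50,
        n_generations: int = 3, seed: int = 0) -> List[Structure]:
    """Return the best structure of each generation (what gets uploaded)."""
    rng = random.Random(seed)
    n = len(native)
    # Generation 0: entirely random structures.
    gen0 = [[rng.uniform(-180.0, 180.0) for _ in range(n)] for _ in range(n_gen0)]
    best = min(gen0, key=score)
    uploaded = [best]
    # Generations 1..n: near-neighbours of the previous best only.
    for _ in range(n_generations):
        candidates = [neighbour(best, rng) for _ in range(n_neighbours)]
        best = min(candidates, key=score)
        uploaded.append(best)
    return uploaded
```

Selecting by energy would be `run(native, toy_energy)`; selecting by RMSD would be `run(native, lambda s: toy_rmsd(s, native))`, which only makes sense when the native structure is already known.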