DF Algo Questions [Archive]

AMDPHREAK

07-15-2003, 01:26 AM

Pardon my simplistic view of the project, but as I understand it, we are not actually looking for "usable" proteins yet; rather, we are looking for the rules by which real proteins fold.

Wouldn't it thus make sense to use a multi-algo approach? Most of us Stat Hoes run multi-boxen, and I for one would love the ability to run different algos on different PCs to determine the "best" one.

I know RMSD isn't the be-all and end-all criterion of success, and that a truly "helpful" algo would be one that doesn't rely on a known protein for generational "best structure" selection. But wouldn't a multi-algo approach help the effort more, or would the additional overhead be prohibitive?
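For anyone following along: RMSD (root-mean-square deviation) is the square root of the mean squared distance between matching atoms of a predicted structure and the known one. Below is a minimal Python sketch, not the client's actual code. It assumes the two structures are already optimally superimposed; real comparisons normally apply a superposition step first, which is skipped here. The coordinates are made up for illustration.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    points. Assumes the structures are already superimposed: no optimal
    rotation/translation is applied here."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        # Squared distance between corresponding atoms
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))

# Two toy 3-residue "structures" (hypothetical coordinates, in angstroms)
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]
print(rmsd(native, model))  # 1.0: every atom is shifted by exactly 1 angstrom
```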

And separately, I think the current client should allow for a MUCH larger Gen 0 sample. Why bother starting 250 generations of relative modeling from a 20+ RMSD structure (which the large majority of my boxen produce in Gen 0)?

PinHead

07-15-2003, 02:54 AM

I think Gen 0 is a 10,000-structure sample.

rsbriggs

07-15-2003, 05:13 AM

He's talking about the RMSD value (of 20+) you carry forward into generation 1, after sampling that first 10,000....
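As described here, Gen 0 draws a fixed batch of independent random structures and carries only the best one forward. A minimal Python sketch of that selection step follows. `make_structure` and `score` are hypothetical placeholders, not functions from the actual client, and lower scores are assumed to be better. In the toy usage a "structure" is just a random number standing in for a real conformation.

```python
import random

def gen0_best_of_n(make_structure, score, n=10_000, rng=None):
    """Generate n independent random structures and return (best, best_score),
    where lower scores are assumed to be better."""
    rng = rng or random.Random()
    best, best_score = None, float("inf")
    for _ in range(n):
        s = make_structure(rng)  # each sample is independent of the last
        sc = score(s)
        if sc < best_score:      # keep only the best seen so far
            best, best_score = s, sc
    return best, best_score

# Toy stand-ins: a "structure" is a random number and its score is itself.
rng = random.Random(42)
best, best_score = gen0_best_of_n(lambda r: r.uniform(0.0, 30.0), lambda s: s,
                                  n=10_000, rng=rng)
print(best_score)
```

Only the single best sample survives into generation 1, which is why a poor Gen 0 winner affects everything that follows.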

Stardragon

07-15-2003, 11:03 AM

Rather than trying a number of fundamentally different algorithms with a variety of variable parameters, we are interested in evolving a single well-working algorithm that guarantees a certain level of results.

We've already seen that the results are improving, so it is preferable to tweak the parameters to their best values and then progress to a genetic algorithm.

Brian the Fist

07-15-2003, 12:23 PM

Further, it would be too complicated and chaotic to distribute different tweakable parameter sets to different users.

Please note that the 'RMSD' in generation 0 is not actually RMSD but rather a fitness score (we should probably add this to the Phase II FAQ if it's not already mentioned there).

AMDPHREAK

07-16-2003, 01:34 AM

"Neither an application coder nor a protein specialist do I be", so please bear with my perhaps irksome questions...

I fully understand where Howard is coming from in terms of the logistical nightmare of creating and supporting multiple client/algo versions. I thought maybe the actual "crunching" portion of the client was kind of a "plugin" that could have a new set of equations dropped in as an easy replacement. Doesn't sound like it though...

But is there any potential benefit to increasing the Gen 0 sample size? (Or is that what Phase 1 told us, in terms of how many random structures typically need to be generated before a certain level of "fitness" is achieved?) My non-expert logic says that since each structure in Gen 0 is independent of the last, an arbitrary fitness threshold should be used to determine when random generation may cease, rather than the current system of a fixed number of structures. Am I way off in my thinking?
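A minimal Python sketch of the threshold-based stopping rule proposed here. The names `make_structure`, `score`, `threshold` and `max_samples` are hypothetical, not part of the actual client, and lower scores are assumed to be better. A cap is kept so that sampling still ends if the threshold is never reached.

```python
import random

def gen0_until_threshold(make_structure, score, threshold,
                         max_samples=10_000, rng=None):
    """Sample independent random structures until one scores at or below
    `threshold` (lower = better) or `max_samples` have been tried.
    Returns (best_structure, best_score, samples_used)."""
    rng = rng or random.Random()
    best, best_score = None, float("inf")
    for i in range(1, max_samples + 1):
        s = make_structure(rng)
        sc = score(s)
        if sc < best_score:
            best, best_score = s, sc
        if sc <= threshold:  # good enough: stop early
            return best, best_score, i
    return best, best_score, max_samples  # cap reached without hitting threshold
```

The design trade-off is that a lucky run finishes early, while an unlucky one still costs no more than the fixed-count approach.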

I also dunno what effect changing something like this mid-phase would have, but I doubt Howard and Co would be too keen on the idea. Is there going to be a Phase IIa or III or something similar where this could be looked at?

AND KUDOS ON THE BUG FIXES! :cheers:

Brian the Fist

07-16-2003, 10:23 AM

Originally posted by AMDPHREAK
My non-expert logic says that since each structure in Gen 0 is independent from the last, an arbitrary fitness threshold should be used to determine when random generation may cease, rather than the current system of a fixed number of structures. Am I way off in my thinking?

That is quite right; we could just wait for a threshold fitness score to be exceeded. Again, though, on average it would make little difference in the end which of the two ways we chose, since the fitness score only gives us a very crude idea of how good the structure is. It is highly unlikely that after making 10,000 structures there would NOT be a single 'decent' starting point, if you get what I mean.
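The "highly unlikely" claim can be made concrete with a little arithmetic under a simplifying, hypothetical assumption: each random structure independently has the same probability p of being a 'decent' starting point. Then the chance that none of N samples is decent is (1 - p)^N. The p values below are illustrative only, not measured project figures.

```python
def prob_no_decent(p, n=10_000):
    """Probability that none of n independent samples is 'decent',
    if each one is decent with probability p."""
    return (1.0 - p) ** n

for p in (1e-4, 1e-3, 1e-2):
    print(f"p = {p:g}: P(no decent structure in 10,000) = {prob_no_decent(p):.3g}")
```

With these values the risk of coming up empty falls from roughly 37% at p = 0.0001 to about 0.005% at p = 0.001. So the argument holds as long as decent structures are not extremely rare.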