I was just curious to hear more about the science of distributed folding. I'm a grad student at Berkeley, but I hang out at Stanford too (my wife is there), and I've heard many of the big names in computational biology and protein structure prediction (Michael Levitt, David Baker, etc.) talk about work similar to what's going on here.

From what I can tell, Distributed Folding is creating a large decoy set, much like the original Park/Levitt decoy set. I guess I'm curious about two things:

(1) Why do you need distributed computing to do this? The Park/Levitt set (and other sets, e.g. Baker's) were created with far fewer resources. Are yours bigger or better? Based on the RMSDs you quote, it doesn't seem that way.
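Just so we're comparing the same numbers: by RMSD I mean Calpha RMSD after optimal superposition, roughly the calculation below. This is only a quick NumPy sketch of what I'm assuming, with decoy and native as Nx3 coordinate arrays, and not meant to be your actual code:

    import numpy as np

    def kabsch_rmsd(decoy, native):
        """C-alpha RMSD after optimal (Kabsch) superposition."""
        # Center both coordinate sets on their centroids.
        P = decoy - decoy.mean(axis=0)
        Q = native - native.mean(axis=0)
        # Optimal rotation from the SVD of the cross-covariance matrix.
        V, S, Wt = np.linalg.svd(P.T @ Q)
        # Fix a possible reflection so we end up with a proper rotation.
        d = np.sign(np.linalg.det(V @ Wt))
        R = V @ np.diag([1.0, 1.0, d]) @ Wt
        # RMSD of the superposed decoy against the native coordinates.
        diff = P @ R - Q
        return np.sqrt((diff ** 2).sum() / len(P))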

I just want to make sure that we're not wasting our time here on inefficient code.

(2) Why does one need a big decoy set in the first place? Levitt and Baker have each mentioned that decoy discrimination is the problem and that a bigger decoy set is not going to help.
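To be concrete about what I mean by discrimination: the question is whether the scoring function actually ranks the near-native decoys at the top, and generating more decoys doesn't change that test. Something like the toy check below is what I have in mind. This is hypothetical code just to illustrate the point, assuming scores and rmsds are NumPy arrays with one entry per decoy and that a lower score means a better (energy-like) score:

    import numpy as np

    def discrimination_check(scores, rmsds, top_n=10):
        """Do the best-scoring decoys have the lowest RMSDs?"""
        # Assumes lower score = better, so sort ascending by score.
        order = np.argsort(scores)
        # RMSDs of the top-scoring decoys.
        top_rmsds = rmsds[order[:top_n]]
        return {
            "mean_rmsd_of_top_scored": float(top_rmsds.mean()),
            "lowest_rmsd_in_set": float(rmsds.min()),
            # Crude check: does a better score go with a lower RMSD at all?
            "score_rmsd_correlation": float(np.corrcoef(scores, rmsds)[0, 1]),
        }

If the top-scoring decoys don't sit anywhere near the lowest-RMSD ones, piling up more decoys just makes the ranking problem harder.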

Again, I'm just curious whether it makes sense to be doing all of this.

I think your idea to use distributed computing for biology (and proteins in particular) is great, and I'm glad you're working on this. I just want to understand why. Thanks!


Raj