First off, my apologies to Howard and Dr. Hogue; my post should have said they discovered that the protein folding problem is solvable because it isn't, in fact, NP-complete. On rereading, I can see how my post could be interpreted as saying they have solved the generic NP-complete problem, which they clearly haven't done. Sorry for the confusion (I was trying so hard to avoid excessive jargon that I ended up less than clear).

Anyway, I took a long look at the curems project for Scott. Short answer: the DC project is reasonable, but the overall plan as currently expressed is equivalent to saying "If we solve the strong AI problem, all of this will be easy."

The curems project is in the really early stages, which means that any criticism is at most a raising of potential problems, not a permanent verdict. Again, I look at these projects from a computer science bias, not as a biologist, so take this with a grain of salt. Also, in my experience, every project seems to start with a certain fraction of nonsense which gets discarded as the work gets underway.

The first curems DC project is a search for a ligand for interferon-gamma (ifg), and it seems a reasonable bet that they will find such a thing. It is not clear that having a ligand for ifg will actually result in a cure for MS, and it is not clear that changing the action of this specific protein is the answer. Compare this with the recent Anthrax DC project, where it was already definitively known what had to be altered. Finding 10,000 ways that don't work is still useful science, and the operative point here is that a ligand for ifg *might* actually work. Besides which, doing ligand searches for all proteins is clearly a task under the heading of really big computer projects that need to be done, and starting with ones related to MS is as valid as any other choice of starting point.

The overall plan as currently described has some major holes.

"Once we have organized all the current data on Multiple Sclerosis we will begin running millions of evolving genetic algorithms that will continually self-organize neural networks that will perform a massive search for cures and treatments for MS. "

This is exactly the same as a computer program that can take current knowledge as input and solve arbitrary problems that we don't know the answer to. That is the strong AI problem, i.e. computer intelligence capable of generating new ideas. I don't want to start a flame war here, but this is a 50-year-old debate with no solution in sight. If (as I suspect) they don't really mean to imply solving strong AI, then applying these techniques to MS runs into some significant obstacles.

1) The data is not organized yet.
And it might not be available for quite some time. As the site states, they need to get the following information before starting: "All proteins involved in the MS disease model. All versions of these proteins' life cycles. Protein network maps. How the proteins interrelate in the myelination/demyelination process. Pathways and cellular signaling maps of all biological processes that may relate to MS. Glycobiology links to pathways and signaling maps of all related glycocarbohydrates and glycolipids."

For all of our discussion of whether DF will work or not, the information needed for the MS project seems to presuppose the existence of a completed protein database such as Dr. Hogue's BIND http://bind.ca/. It appears that the database simply isn't populated enough to do the research curems contemplates at this time. This will change in time, so it is not a fatal objection, but it implies the time is not yet ripe. DF or FAH could be categorized as efforts aimed at populating the database. The fatal objection to this plan as currently presented is that once you know all of the proteins, interactions, pathways, etc., finding a cure is not a DC problem; it becomes simply a matter of querying the database. In other words, if you presuppose perfect information, all problems tend to be trivial. My guess is that the project will evolve into an effort to fill in the database as it applies to MS.
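
To put that last point in code terms, here is a toy sketch (Python, every record and field name invented for the illustration, not anything BIND or curems actually uses) of what "finding a cure" reduces to once you assume a fully populated interaction database. It is a lookup, not a computation:

    # Toy illustration only. It presupposes a complete, perfect
    # interaction database, which is exactly what doesn't exist yet.
    pathway_db = [
        {"protein": "ifg", "process": "demyelination", "inhibited_by": "compound-A"},
        {"protein": "xyz", "process": "myelination",   "inhibited_by": "compound-B"},
        # ... every protein, interaction, and pathway, fully mapped ...
    ]

    def find_candidate_cures(disease_process):
        # With perfect information this is a filter, not a search.
        return [row["inhibited_by"] for row in pathway_db
                if row["process"] == disease_process]

    print(find_candidate_cures("demyelination"))   # -> ['compound-A']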

2) A genetic algorithm is probably not an appropriate methodology.
A GA is appropriate for problems which do not have a specific solution, only strategies for improving results to the point where they are "good enough". In simple terms, a GA takes random responses to inputs and keeps the ones that are "better" to make the next generation of possible responses. It requires that all of the inputs be fully known, plus a data structure that can map all possible inputs to all possible responses. It also requires that once the inputs have been run against the random responses, a fitness function evaluates all of the results to separate out the "better" ones.
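
For anyone who hasn't run into one, here is a bare-bones GA sketch in Python (the genome, mutation step, and objective are all invented for illustration, nothing curems has described). Note that the whole method hinges on score(), the fitness function:

    import random

    # Minimal GA skeleton. The problem-specific parts are
    # random_candidate(), mutate(), and especially score().

    def random_candidate():
        return [random.random() for _ in range(10)]   # a stand-in genome

    def mutate(candidate):
        child = list(candidate)
        child[random.randrange(len(child))] = random.random()
        return child

    def score(candidate):
        # Fitness function: must rank ANY candidate as better or worse.
        # Toy objective here: how close the genome sums to 5.0.
        return -abs(sum(candidate) - 5.0)

    population = [random_candidate() for _ in range(100)]
    for generation in range(50):
        population.sort(key=score, reverse=True)      # keep the "better" ones
        survivors = population[:20]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(80)]

    print(score(max(population, key=score)))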

I don't see how that fitness function can be written. There is no large grey area of "better" results here; a candidate is either a potential cure or it is not. If you apply a GA to the stock market, you can say one result was $10 profit and another was $100, and the "better" answer can be calculated. Or one protein fold has a "better" score than another. For this project, they are looking for a specific "right" answer, and a GA doesn't do "right" answers.
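
The problem in code form (again, a purely hypothetical sketch): a graded fitness function gives selection a slope to climb, while a cure/no-cure function is flat almost everywhere:

    def fitness_profit(trades):
        # Graded: $100 beats $10 and $10 beats $1, so partial progress
        # is rewarded and selection can climb toward better candidates.
        return sum(trades)

    def is_a_cure(candidate):
        # Stand-in for the real test, which only a wet lab can answer.
        return False

    def fitness_cure(candidate):
        # Binary: nearly every candidate scores 0, and "almost a cure"
        # scores no better than nonsense, so selection has nothing to climb.
        return 1 if is_a_cure(candidate) else 0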

3) Self-Organizing Neural Networks

It has been a while since I dealt with NNs, but my understanding is that they are mostly used to capture "expert" information by training the network. It may be possible to train a NN by simply feeding in all of the relevant information, papers, etc., but the result is not new discoveries. NNs are used to make available information that is already known. A NN can be used to bring non-experts up to expert levels, but it can't be used to exceed the knowledge of the expert. Since the purpose here is to find answers that not even the experts have, a NN does not seem an obvious methodology for achieving a cure. That is not to say a NN for MS would be without value: a NN tends to proliferate expertise quickly, and increasing the number of experts tends to have a powerful effect on progress in any field. Besides which, the creation of a NN strictly from source materials, without active intervention, would be a very worthy project in and of itself, regardless of the knowledge domain involved.
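
To make the "ceiling" argument concrete, here is a minimal training-loop sketch (a toy perceptron in Python; the data and labels are invented for the example). The point is the update rule: the target is always the expert's own label, so agreeing with the expert is the best outcome the network can ever reach:

    import random

    weights = [random.uniform(-1, 1) for _ in range(3)]

    # Each example: (features, expert's verdict). Training pushes the
    # network toward agreeing with the expert -- never past the expert.
    training_data = [
        ([1.0, 0.2, 0.7], 1),
        ([0.1, 0.9, 0.3], 0),
        ([0.8, 0.8, 0.1], 1),
    ]

    def predict(features):
        activation = sum(w * x for w, x in zip(weights, features))
        return 1 if activation > 0 else 0

    for epoch in range(100):
        for features, expert_label in training_data:
            error = expert_label - predict(features)   # the expert is the ceiling
            for i in range(len(weights)):
                weights[i] += 0.1 * error * features[i]

    print([predict(f) for f, _ in training_data])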

Overall impressions:
The curems project is fully buzzword compliant. Its current theory seems to be that throwing a lot of computers at all of the fashionable algorithms (DC, GA, NN) will have useful results. It is still in the very early stages, where such nonsense can be expected to proliferate, so the fact that there is nonsense at present does not count for much. Progress occurs as the wrong answers get cleared away. If they simply do the preliminary work they have discussed, gathering all of the known data and organizing it, they will have made a major contribution. Nothing looks set in stone, and the first steps they are taking will almost certainly lead to a much more reasonable plan of attack. The specific DC project of looking for the ligand has a high probability of success and is clearly useful, but it is not guaranteed to cure MS.

How does this fit in the overall problem?
I like to think of all of these projects as steps on the way to creating a plaid fish. First you figure out DNA (the genome projects). Then you figure out how the DNA turns into proteins, which is the DF and FAH projects. Then you take the known protein structures and figure out what drugs interact with them, which is the Anthrax, TSC, and MS ligand projects. Also at this level is figuring out protein-to-protein interactions, pathways, etc., for which the BIND project is a beginning point. After that comes figuring out how any arbitrary sequence of DNA will express as a protein and how that protein would interact with all of the others. After all of that, enough knowledge should be in place to design a plaid fish for my aquarium to go with the striped and polka-dot fishes.

Dang, that was a loooong post.