The Science of Distributed Folding?

**plaidfishes** · 04-29-2002, 05:46 PM

First off, my apologies to Howard and Dr. Hogue: my post should have said they discovered the protein folding problem is solvable because it isn't in fact NP-complete. On rereading, I can see how my post could be interpreted as saying they have solved the generic NP-complete problem, which they clearly haven't done. Sorry for the confusion (I was trying so hard to avoid excessive jargon that I ended up less than clear).

Anyway, I took a long look at the CureMS project for Scott. Short answer: the DC project is reasonable, but the overall plan as currently expressed is equivalent to saying "If we solve the strong AI problem, all of this will be easy."

The CureMS project is in the very early stages, which means that any criticism at this point only raises potential problems and may not stay valid. Again, I look at these projects with a computer science bias, not as a biologist, so take it with a grain of salt. Also, in my experience, every project seems to start with a certain fraction of nonsense which gets discarded as the work gets underway.

The first CureMS DC project is a search for a ligand to interferon-gamma (IFN-γ). It seems a reasonable bet that they will find such a thing. It is not clear that having a ligand for IFN-γ will actually result in a cure for MS, and it is not clear that changing the action of this specific protein is the answer. Compare this with the recent Anthrax DC problem, where it was already definitively known what had to be altered. Finding 10,000 ways that don't work is still useful science, and the operative point here is that a ligand for IFN-γ *might* actually work. Besides which, doing ligand searches for all proteins clearly belongs under the heading of really big computer projects that need to be done. Starting with ones related to MS is as valid as any other choice of starting point.

The overall plan as currently described has some major holes.

"Once we have organized all the current data on Multiple Sclerosis we will begin running millions of evolving genetic algorithms that will continually self-organize neural networks that will perform a massive search for cures and treatments for MS. "

This is exactly the same as asking for a computer program that can take current knowledge as input and solve arbitrary problems whose answers we don't know. That is the strong AI problem, i.e. computer intelligence capable of generating new ideas. I don't want to start a flame war here, but this is a 50-year-old debate with no solution in sight. If (as I suspect) they don't really mean to imply solving strong AI, then applying these techniques to MS faces some significant obstacles.

1) The data is not organised yet.
It might not be available for quite some time. As the site states, they need to get the following information before starting: "All proteins involved in the MS disease model. All versions of these proteins' life cycles. Protein network maps. How the proteins interrelate in the myelination/demyelination process. Pathways and cellular signaling maps of all biological processes that may relate to MS. Glycobiology links to pathways and signaling maps of all related glycocarbohydrates and glycolipids."

For all our discussion of whether DF will work, the information needed for the MS project seems to presuppose the existence of a completed protein database such as Dr. Hogue's BIND http://bind.ca/. It appears that the database simply isn't populated enough to do the research CureMS contemplates at this time. This will change, so it is not a fatal objection, but it implies the time is not yet ripe. DF or FAH could be categorised as efforts aimed at populating the database. The fatal objection to this plan as currently presented is that once you know all of the proteins, interactions, pathways etc., finding a cure is no longer a DC problem; it becomes simply a matter of querying the database. In other words, if you presuppose perfect information, all problems tend to be trivial. My guess is that the project will evolve into efforts to fill in the database as it applies to MS.
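
To make the "simply a matter of querying the database" point concrete, here is a toy sketch. It assumes a hypothetical, fully populated interaction map with placeholder names (P1 to P5 plus a "demyelination" node); the data is invented and this is not BIND's schema or API. Once such a map exists, finding a chain of interactions that reaches the disease process is an ordinary breadth-first search, not a distributed-computing job.

```python
from collections import deque
from typing import Dict, List, Optional

# Hypothetical, fully populated interaction map (placeholder names, invented data).
INTERACTIONS: Dict[str, List[str]] = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P4": ["demyelination"],
    "P5": [],
}

def pathway(start: str, goal: str, graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Breadth-first search: shortest chain of interactions from start to goal."""
    previous: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors to rebuild the chain.
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no known chain of interactions reaches the goal

print(pathway("P1", "demyelination", INTERACTIONS))  # ['P1', 'P2', 'P4', 'demyelination']
```

With perfect information the hard part is already done; the query itself is cheap.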

2) A genetic algorithm is probably not an appropriate methodology.
A GA is appropriate for problems that do not have a specific solution, only strategies for improving results until they are "good enough". In simple terms, a GA generates random responses to the inputs and keeps the ones that are "better" to make the next generation of possible responses. It requires that all of the inputs be fully known, along with a data structure that can map all possible inputs to all possible responses. It also requires that once the inputs have reacted with the random responses, a fitness function evaluates all of the results to separate out the "better" ones.
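
The loop described above can be sketched as a minimal, textbook-style GA. This is an illustration under assumptions, not anything CureMS has published: candidates are bit strings, the fitness function is the toy "count the ones" problem, selection keeps the better half, and new candidates come from one-point crossover plus random bit-flip mutation. The names `evolve` and `count_ones` are made up for this example.

```python
import random
from typing import Callable, List

Genome = List[int]

def count_ones(genome: Genome) -> float:
    """Toy fitness: more 1 bits is "better". Stands in for a real score."""
    return float(sum(genome))

def evolve(fitness: Callable[[Genome], float], length: int = 20, pop_size: int = 30,
           generations: int = 50, mutation_rate: float = 0.05, seed: int = 0) -> Genome:
    """Minimal GA: rank by fitness, keep the better half, breed the rest."""
    rng = random.Random(seed)
    # Random starting candidates.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    for _ in range(generations):
        # The fitness function separates the "better" candidates from the rest.
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        children: List[Genome] = []
        while len(parents) + len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # one-point crossover position
            child = mother[:cut] + father[cut:]
            # Mutation: flip each bit with a small probability.
            child = [1 - bit if rng.random() < mutation_rate else bit for bit in child]
            children.append(child)
        population = parents + children
    return max(population, key=fitness)

best = evolve(count_ones)
print(count_ones(best), best)
```

Because the better half always survives, the best score never gets worse from one generation to the next, but the whole method depends on `fitness` being able to rank candidates against each other.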

I don't see how that fitness function can be written. There is not a large grey area of "better" results; it is either a potential cure or it is not. If you apply a GA to the stock market, you can say one result was $10 profit and another was $100, and the "better" answer can be calculated. Or one protein fold has a "better" score than another. For this project, they are looking for a specific "right" answer. GA doesn't do "right" answers.
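
The fitness-function objection can be shown with a small, hypothetical example: a made-up 12-bit "right answer" scored two ways. A graded score gives partial credit (like profit in dollars), while an all-or-nothing score only says cure or not cure. Among candidates one to five bits away from the answer, the graded score yields five distinct levels for selection to rank, while the all-or-nothing score rates them all identically, so selection has nothing to work with. `TARGET` and the helper names are illustrative only.

```python
from typing import Callable, List

Genome = List[int]

# Hypothetical 12-bit "right answer"; a stand-in, not real biology.
TARGET: Genome = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1]

def graded(genome: Genome) -> int:
    """Partial credit: how many positions already match (like profit in dollars)."""
    return sum(g == t for g, t in zip(genome, TARGET))

def all_or_nothing(genome: Genome) -> int:
    """Either it is the cure (1) or it is not (0)."""
    return 1 if genome == TARGET else 0

def flip_first(genome: Genome, k: int) -> Genome:
    """Copy of genome with its first k bits inverted."""
    return [1 - b if i < k else b for i, b in enumerate(genome)]

def distinct_scores(fitness: Callable[[Genome], int], population: List[Genome]) -> int:
    """How many different fitness levels selection has to work with."""
    return len({fitness(g) for g in population})

# Candidates that are 1..5 bits away from the answer.
near_misses = [flip_first(TARGET, k) for k in range(1, 6)]
print(distinct_scores(graded, near_misses))          # 5
print(distinct_scores(all_or_nothing, near_misses))  # 1
```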

3) Self-Organising Neural Networks

It has been a while since I dealt with NNs, but my understanding is that they are mostly used to capture "expert" information by training the network. It may be possible to train an NN by simply putting in all of the relevant information, papers etc., but the result is not new discoveries. NNs are used to make available information that is already known. An NN can be used to bring non-experts up to expert level, but it can't be used to exceed the knowledge of the expert. Since the purpose here is to find answers that not even the experts have, an NN does not seem an obvious methodology for achieving a cure. That is not to say an NN for MS would be without value. An NN tends to proliferate expertise quickly, and increasing the number of experts tends to have a powerful effect on progress in any field. Besides which, creating an NN strictly from source materials without active intervention is a very worthy project in and of itself, regardless of the knowledge domain involved.

Overall impressions:
The CureMS project is fully buzzword compliant. Its current theory seems to be that throwing a lot of computers at all of the fashionable algorithms (DC, GA, NN) will produce useful results. It is still in the very early stages, where such nonsense can be expected to proliferate, so the fact that there is nonsense at present does not count for much. Progress occurs as the wrong answers get cleared away. If they simply do the preliminary work they have discussed, gathering all of the known data and organizing it, they will have made a major contribution. Nothing looks set in stone, and the first steps they are taking will almost certainly lead to a much more reasonable plan of attack. The specific DC project of looking for the ligand has a high probability of success and is clearly useful, but it is not guaranteed to cure MS.

How does this fit in the overall problem?
I like to think of all of these projects as steps on the way to creating a plaid fish. First you figure out DNA (the genome projects). Then you figure out how the protein chains encoded by DNA fold into their final structures, which is what the DF and FAH projects work on. Then you take the known protein structures and figure out what drugs interact with them, which is the Anthrax, TSC and MS ligand projects. Also at this level is figuring out protein-to-protein interactions, pathways etc., for which the BIND project is a starting point. After that comes figuring out how any arbitrary sequence of DNA will express as a protein and how that protein would interact with all of the others. After all of that, enough knowledge should be in place to design a plaid fish for my aquarium to go with the striped and polka-dot fishes.

Dang, that was a loooong post.

**Raj** · 04-30-2002, 02:20 AM

Hi,

Seems like this thread is nearing its end in terms of productive discussion. It will be interesting to see how the protein structure prediction field progresses. Perhaps it makes sense to revisit these questions at the end of the year. It's interesting that Dr. Hogue called these posts "FUD". I guess we'll know for sure after CASP.

Raj

PS As to Dr. Hogue's points.

(1) Molecular dynamics usually scales as O(N) with cutoffs, or O(N log N) with Particle-Mesh Ewald.
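
For readers wondering how a cutoff gets molecular dynamics down to O(N): if each particle only interacts with neighbours inside a fixed cutoff radius, then at fixed density each particle has a bounded number of partners, and a cell list finds them without checking every pair. The sketch below is an illustration under simplifying assumptions (plain Python, open boundaries rather than the periodic boxes real MD codes use, no force evaluation); `pairs_within_cutoff` and `pairs_brute_force` are made-up names. The long-range part of Particle-Mesh Ewald works differently, using FFTs, which is where its O(N log N) comes from.

```python
import itertools
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int, int]

def pairs_within_cutoff(points: List[Point], cutoff: float) -> List[Tuple[int, int]]:
    """Cell-list search: index pairs (i, j), i < j, with distance <= cutoff."""
    # Bin particles into cubic cells whose edge length equals the cutoff.
    cells: Dict[Cell, List[int]] = defaultdict(list)
    for i, (x, y, z) in enumerate(points):
        cells[(math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))].append(i)

    pairs: List[Tuple[int, int]] = []
    for (cx, cy, cz), members in cells.items():
        # A partner within the cutoff can only sit in this cell or one of its 26 neighbours.
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            for j in cells.get((cx + dx, cy + dy, cz + dz), []):
                for i in members:
                    # Requiring i < j counts every pair exactly once.
                    if i < j and math.dist(points[i], points[j]) <= cutoff:
                        pairs.append((i, j))
    return pairs

def pairs_brute_force(points: List[Point], cutoff: float) -> List[Tuple[int, int]]:
    """O(N^2) reference: check every pair."""
    return [(i, j) for i, j in itertools.combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= cutoff]
```

Both functions return the same pairs; only the cell-list version keeps the work per particle constant as the system grows at fixed density.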

(2) I found your point about the quote in the FAH paper very interesting, so I took a look. At first sight, I thought you raised a very serious point. After some reading and some thinking, I think you might be missing the point about the potentials used in MD. These potentials don't make good discriminators for structure prediction (as stated in the FAH paper), but that doesn't mean they don't make good force fields for MD. Remember that potentials used in MD are energies, not complete free energies, and thus conformational entropies (side chains, minor backbone variations, etc.) are also relevant. These entropies come automatically in MD, which samples with Boltzmann weighting. Thus, the force fields won't necessarily be able to discriminate or act as scoring functions for structure prediction, but may be perfectly fine for MD and kinetics. I think the FAH result shows that they're pretty good for folding kinetics (although probably far from perfect). Anyway, you should probably ask the FAH people. They might have a better (or at least different) answer.
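
The distinction between energy and free energy can be illustrated with a toy two-basin calculation. Everything here is assumed for illustration: the energies are invented, the units are kcal/mol, and kT is taken as 0.593 kcal/mol (roughly room temperature). One basin is a single conformation at the lowest energy; the other holds 20 conformations that are each 1 kcal/mol higher. Ranking by energy alone picks the first basin, but Boltzmann weighting (which equilibrium MD sampling should reproduce over a long enough run) puts most of the population in the second, because its extra conformational entropy lowers its free energy F = -kT ln(sum of exp(-E/kT)).

```python
import math
from typing import Dict, List

KT = 0.593  # kcal/mol, roughly room temperature (assumed units)

def basin_weight(energies: List[float], kT: float = KT) -> float:
    """Sum of Boltzmann factors exp(-E/kT) over the conformations in one basin."""
    return sum(math.exp(-e / kT) for e in energies)

def free_energy(energies: List[float], kT: float = KT) -> float:
    """Basin free energy F = -kT ln(sum exp(-E/kT)); includes conformational entropy."""
    return -kT * math.log(basin_weight(energies, kT))

def populations(basins: Dict[str, List[float]], kT: float = KT) -> Dict[str, float]:
    """Equilibrium fraction of time spent in each basin under Boltzmann weighting."""
    weights = {name: basin_weight(es, kT) for name, es in basins.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical basins: invented energies, not taken from any real protein.
basins = {
    "tight": [0.0],        # one conformation at the lowest energy
    "floppy": [1.0] * 20,  # 20 conformations, each 1 kcal/mol higher
}

lowest_energy = min(basins, key=lambda name: min(basins[name]))
lowest_free_energy = min(basins, key=lambda name: free_energy(basins[name]))
print(lowest_energy)       # tight
print(lowest_free_energy)  # floppy
print(populations(basins))
```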

**Scott Jensen** · 04-30-2002, 01:46 PM

Plaidfish,

Thanks for your review of CureMS. However, I think discussing it further in this thread would now be a bit too off-topic, even for this forum. So I've posted a lightly edited version of your evaluation of CureMS to the newsgroup comp.distributed, and I encourage you and others interested in the topic to continue the discussion there. The thread's title is "Weighing dc projects". In it I discuss three DC projects I'm considering helping with, CureMS being one of them.

Again, thanks for the review. I hope to see you and others participate in the discussion in comp.distributed.

**Shaktai** · 04-30-2002, 06:13 PM

Well, I must say this thread has been extremely fascinating reading, and very informative. Thanks to everyone who has contributed. I think Raj is correct in saying that perhaps we need to wait until CASP5 to revisit the issue.

Still it has been a learning experience, even if a lot of it was way over my head.

**MAD-ness** · 05-11-2002, 12:06 AM

Wow. Interesting thread.

The more I learn about the science that Howard and Dr. Hogue (and others) are involved in, the more interesting and exciting it becomes to watch the project progress.

My understanding of the science is...lacking and I am not a math, biology or computer science genius, so I can't follow all the diverse aspects of what is going on. However, I think I grasp the general concepts.

Some questions I have that were prompted by this thread:

The things being tested by the project are varied and potentially changing as the project continues. We have already tested and refined a distributed computing client. We have created a large amount of data using the sampling methods, and the scoring methods have been applied to it. As a result of the data analysis you are currently undertaking, you will hopefully be able to refine and improve the algorithms used for both sampling and scoring proteins. As your lab and other science projects progress, you may discover new methods of either sampling or scoring and test these out, or test various combinations of methods. As scientists elsewhere publish new sampling and scoring information, you could potentially (depending upon permissions, legal wrangling, IP law, etc.) incorporate and test methodologies and algorithms that probably didn't exist when the DF project was initially conceived.

Precisely because of the huge number of iterations produced by the Distributed Folding project, it seems to be a wonderful platform for testing and verifying new algorithms for and approaches to protein folding, both in creating samples and in scoring them. In a sense, you could put a new 'concept' into the program and in only weeks get what might take years to model in a lab (if you can get the computer time, and if your 'concept' isn't outdated by the time you finish).

Am I way off base here? (it is late and I am REALLY tired, so I might make no sense at all)

**RipItUp** · 05-27-2002, 04:03 PM

Hello Raj,

I'm afraid you've completely missed the point. You've missed it because you have put the science first and forgotten the human side of things. I'm afraid the scientists do this as well. Not a good start when you are dealing with a group of non-scientists who get home from work at 7pm and have to juggle their lives as well as doing scientific work on their PCs. We always get the raw deal!

Is this project good? Yes, to me it is. Not because I know whether it will save mankind, but because I know it has more chance of doing some good than all my home computers have of getting a good score in 3DMark2001. So I do a medical DC project. Maybe it won't work, but it has more chance than benchmarking Quake III.

But what about doing another project, like F@H? Well, I am a member of [H]ardOCP, which has done more units in F@H1 and F@H2 than any other team. I have been there from the start. And we are still there, but quite frankly the client has been so bad that 1/3 of the team went over to United Devices. They could have had even better results if the client hadn't required us to sit down at 9pm of an evening and spend 40 minutes trying to connect.

What I want out of a DC project is :-

1) It works well with little upkeep
2) It might produce a result for mankind
3) It's more useful than not doing it

This project fits all those categories.

Even if the science to this project is so flaky that it has very little chance then it still should be done to cut off this area of research and allow more productive paths.

Surely this is what brute force is all about?

So you have missed the point: you wonder why we are doing this, worrying that it might be a blind alley. It might be, we don't know yet, but it makes us happy doing it.

Like I said, the human aspect. It seems to me you know 40% of the science but 0% of why people do DC. There is a carbon side to the silicon chip, you know.

Out of interest, which DC project are you doing and why?
Regards

Andy

**RaginSteveK** · 04-06-2003, 10:23 PM

Haven't read thing one about residuals, sampling, or scoring, but I intend to.......

appreciate Howard's apparently unlimited patience with those constantly critiquing, and Dr Hogue's cool interjections...

I've diverted 3 PCs [including one indestructible 850 T-bird-powered collection of spare parts, with a 5-year-old HDD and first-rate DDR RAM] to this project.

There'd be more online, but I'd have to find a way to establish which of the spare parts [boards and CPUs] actually work, without tearing perfectly good PCs apart to insert and examine the questionable ones.
---------

Had been going 24x7 with 3 or more PCs on SETI, Genome@H, and Folding@H for over 3 years before coming here...

sorry I was so late and wasted so much time.......

RagingSteveK

**Codysluder** · 07-02-2003, 03:11 PM

Originally posted by Brian the Fist
- I remember visiting their site many times, clicking on the Publications link, and getting a 'coming soon' message).

They've got a fair complement of papers now . . .
http://www.stanford.edu/group/pandeg...ng/papers.html
