What if DF isn't the best predictor?

**Scott Jensen** · 09-11-2002, 02:50 PM

Well, it seems like CASP should be over soon. How long until then and then how long until they announce the results?

Now if DF wins CASP5, then champagne for everyone. However...

What if DF turns out not to be the best predictor?

**jlandgr** · 09-11-2002, 03:18 PM

Hi,
the CASP prediction season has ended and we have resumed working on phase Ib as described on the DF web site.
As for the results, these will be announced in December, AFAIK, and we will indeed have to see how we have fared compared to the other techniques used.

But, even if we aren't the best, I'm sure lots of interesting things can be learned even from results which are not perfect.
We'll have to see, but one way or another I am sure lots of useful information will be gained

Just my (layman) opinion

Jerome

**vsemaska** · 09-11-2002, 03:26 PM

According to the NEWS section, the CASP results will be available in early December.

If DF turns out not to be the best predictor, then they'll have to work on improving it. I think it was Einstein that said: 'If we knew what we're doing, it wouldn't be called research'.

Vic

**bwkaz** · 09-11-2002, 06:16 PM

I think it's someone's .sig on here that says "I haven't failed. I've merely found a thousand (number may vary) ways that don't work." Edison? I think that was who said it...

**Scoofy12** · 09-11-2002, 06:22 PM

It was Edison, I think it was 10,000 and the owner of the .sig is Free-DC's own beloved Moogie

**bwkaz** · 09-11-2002, 10:37 PM

That sounds about right, actually...

**tpdooley** · 09-12-2002, 01:17 AM

The folks in charge of grading the results stated that they have them graded within 2 weeks of submission, so what's the extra 2-month delay before announcing the winners?

And if you look at the list of CASP proteins, we only worked on a few of them: they had 8-10(?) for every 2-week period, and we worked on 1 a week. If so, how do we compare what we've worked on with what other approaches worked on, if they're not using the same proteins?

(Is the only way to actually compare results prior to Dec 1-5th's presentation, to visit all the other projects, and see what they announced their score to be for each protein?)

**wirthi** · 09-12-2002, 09:22 AM

As far as I remember, Brian mentioned that his group predicts some proteins "manually" (the old-fashioned way) and some using distributed computing. Even the results that came from distributed computing got some manual optimisation.

**Brian the Fist** · 09-12-2002, 10:57 AM

Just to answer a few of your questions..
The CASP targets are evaluated 'automatically' by computer, as well as manually by 'assessors'. There are about 100 groups that can submit up to 5 predictions (the assessor usually only looks at prediction #1) for each of the 65-odd targets that were given. There are 3 assessors, so that's around 2000 predictions each must evaluate. That's why they need until December (these are busy people with labs to run, etc., too).
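For anyone who wants to check the "around 2000 each" figure, here is a rough back-of-the-envelope sketch. It uses only the approximate numbers quoted above and assumes (this is an assumption, not something stated in the post) that every group submits a prediction for every target, so it is an upper bound.

```python
# Rough assessor workload, using the approximate figures from the post.
groups = 100      # "about 100 groups"
targets = 65      # "65-odd targets"
assessors = 3     # "3 assessors"

# Assumption: every group submits for every target, and only
# prediction #1 of each submission is looked at by an assessor.
first_predictions = groups * targets
per_assessor = first_predictions / assessors

print(first_predictions)        # 6500
print(round(per_assessor))      # 2167
```

That lands a little above 2000 per assessor, which matches the ballpark given once you allow for groups that skip some targets.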

We submitted a total of 13 ab initio (from scratch) predictions using the project. This encompassed all the targets which were small enough (under 200 AA) to fold using the DFP and which had no homology (that means sequence similarity) to any known structure.

38 of the targets DID have high to medium sequence identity with proteins of known structure, so for those we used Homology Modelling to do the prediction (basically start from the known structure and a sequence alignment and go from there). Homology modelling can be done on a desktop machine in a few hours and does not require parallel/distributed computing.

The remaining 14 or so targets were very large (over 200 AA) and did not have significant sequence similarity to known structures. Some of these may be predictable using a technique called "threading", which I won't get into here and which we did not attempt (since we do not have our own unique method for doing this), and the remainder would just be very hard.
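The way the targets were split up across the last three paragraphs can be summarised as a small decision rule. The sketch below is only an illustration of that description: the `Target` class, the `choose_method` function, and the example names are hypothetical, and the handling of a target with exactly 200 residues is an assumption, since the post only says "under 200" and "over 200".

```python
from dataclasses import dataclass

# "under 200 AA" cutoff from the post; exactly 200 is treated as too large
# here, which is an assumption.
MAX_AB_INITIO_LENGTH = 200


@dataclass
class Target:
    name: str          # hypothetical label, not a real CASP identifier
    length: int        # number of amino acids
    has_homolog: bool  # high-to-medium sequence identity to a known structure


def choose_method(target: Target) -> str:
    """Pick a prediction approach following the triage described in the post."""
    if target.has_homolog:
        # Start from the known structure plus a sequence alignment.
        return "homology modelling"
    if target.length < MAX_AB_INITIO_LENGTH:
        # Small and no known relative: fold from scratch with the project.
        return "ab initio (DFP)"
    # Large and no known relative: possibly threading, otherwise very hard.
    return "not attempted"


examples = [
    Target("small_novel", 95, False),
    Target("has_template", 400, True),
    Target("large_novel", 350, False),
]
for t in examples:
    print(t.name, "->", choose_method(t))
```

As a sanity check on the numbers in the post, 13 ab initio + 38 homology + roughly 14 remaining targets gives the 65 or so targets mentioned earlier.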

Whether we clean up or not, though, we are already working on ways to improve the algorithm and make it smarter, which will hopefully come to fruition in several months. In the meantime, we will continue the original experiment to see just how good a structure we can get in 10 billion samples of several different proteins. The results so far are encouraging, as we've already got one below 8.5A for the current protein (and our goal is something in the 6-7A range, hopefully). Keep those machines crunching and surely someone will pop out a 6A puppy.

**runestar** · 09-12-2002, 11:48 AM

Originally posted by Brian the Fist
Keep those machines crunching and surely someone will pop out a 6A puppy.

So we get a free pizza on you if we pop it out? =)

RS½

**bubbadog** · 09-12-2002, 12:42 PM

Originally posted by Brian the Fist
Keep those machines crunching and surely someone will pop out a 6A puppy.

Why am I picturing childbirth here?

**Michael H.W. Weber** · 09-15-2002, 06:05 AM

Originally posted by Brian the Fist
38 of the targets DID have high to medium sequence identity with proteins of known structure, so for those we used Homology Modelling to do the prediction (basically start from the known structure and a sequence alignment and go from there).

For those CASP targets with significant sequence identity to known proteins, have the results generated by the homology modelling approach been compared to ab initio predictions using the DF algorithm (performed internally in your lab, else we would of course know the answer to this question)? If so, how much deviation has been observed? If not, wouldn't this be interesting (although it is somewhat similar to what we do right now)?

Michael.

**Brian the Fist** · 09-15-2002, 11:54 AM

As you just said, that would be similar to what we are doing right now, as well as with the first 5 proteins. The best RMSD is largely dependent on the size of the protein as well as the structure, as we have observed with the first 5 proteins. Proteins with lots of helices in them (e.g. 1VII, 1ENH) are 'predicted' better than those with sheets (1PMC, 1SHG). We definitely do better by using homology modelling when it is possible, typically getting 2-4A RMSD structures for proteins as big as 400 residues or more. Ab initio approaches like the Distributed Folding Project are needed for the many proteins which have no homology to known structures.
For the CASP targets we submitted from DFP, we expect on the order of 8A RMSD structures for the proteins that were around 100 AA and maybe around 10A for the larger ones. While these are not great in terms of practical usefulness, they will probably still be a lot better than what many others can come up with, and we certainly hope to improve further upon them in the future. We'll post structural alignments of the predictions to their true structures in December, so you can judge their quality for yourselves.
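For readers unfamiliar with the RMSD figures quoted above, here is a minimal sketch of how a root-mean-square deviation between two structures can be computed. It assumes the two coordinate lists hold matching atoms in the same order and are already superimposed; a real comparison would first find the best rigid-body fit, which this sketch skips. The `rmsd` function and the sample coordinates are illustrative only and are not the project's code.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superimposed and that the i-th
    entries of both lists refer to the same atom.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal length")
    total = 0.0
    for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b):
        total += (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
    return math.sqrt(total / len(coords_a))


# Made-up coordinates in Angstroms: every atom is displaced by (3, 4, 0),
# i.e. by 5 A, so the RMSD is 5 A.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
actual = [(3.0, 4.0, 0.0), (4.0, 4.0, 0.0)]
print(rmsd(predicted, actual))  # 5.0
```

Lower is better: identical structures give 0, and the 2-4A versus 8-10A numbers above are averages of this kind over the atoms being compared.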
