Thread: CASP5 Conference This Week

  1. #1

    CASP5 Conference This Week

    Well the moment we have all been waiting for is happening this week as the CASP5 conference is held. Can't wait to see the results and how we stand compared to everyone else. Hopefully Howard will have web access to at least give us some tidbits of the results before they get back and have a chance to post a full report.



    Is anyone from this board besides Howard and gang actually going to be there?

    Jeff.

  2. #2
    Fixer of Broken Things FoBoT's Avatar
    Join Date
    Dec 2001
    Location
    Holden MO
    Posts
    2,137
    do we know where the conference is being held?
    does the conf have some type of web site telling about what they are doing?
    Use the right tool for the right job!

  3. #3
    The CASP5 conference web site is here:
    http://predictioncenter.llnl.gov/casp5/Casp5.html

    It is being held at the Asilomar Conference Center in Pacific Grove near Monterey, California.

    Jeff.

  4. #4
    Junior Member
    Join Date
    Aug 2002
    Location
    Ohio
    Posts
    2
    Since today is the last day for the CASP5 conference, does that mean we will know how we did today?

    -Scott
    ---------------------------------
    Eschew Obfuscation

  5. #5
    Senior Member wirthi's Avatar
    Join Date
    Apr 2002
    Location
    Pasching.AT.EU
    Posts
    820
    You are right, Howard should get the information on how we did today. That doesn't mean he will tell us today; perhaps he wants to prepare the data before releasing it, but it should happen in the very near future.
    Engage!

  6. #6
    Fixer of Broken Things FoBoT's Avatar
    Join Date
    Dec 2001
    Location
    Holden MO
    Posts
    2,137
    the site linked above mentions
    Student Addendum Conference is Dec. 5-6;
    The Symposium on Bioinformatics is Dec. 7-8 (in Santa Cruz, CA).
    and Howard mentioned Dec. 8,
    so maybe he is staying for that other stuff

    but if we are lucky he will get online soon and throw us a bone!
    Use the right tool for the right job!

  7. #7
    I would imagine that the bioinformatics symposium is of interest to the folks at Hogue Bio-informatics and the SLRI.

    /me returns to anxiously awaiting CASP5 results.

  8. #8
    Junior Member Eaglechild's Avatar
    Join Date
    May 2002
    Location
    Washington State
    Posts
    14
    Still no word??

  9. #9
    Just did some web searching and came across the following relevant link: http://speedy.embl-heidelberg.de/casp5/

  10. #10
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    What number group did Distributed Folding fit under? I couldn't find a name that seemed to fit (nothing with Hogue or Distributed Folding).

  11. #11
    That server is giving me a "Forbidden" error so I can't access it. Maybe they didn't want people to find it and cut off access?

    Jeff.

  12. #12
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    You didn't miss much. Lots of numbers, no explanation.

  13. #13
    There are a lot of numbers for us to pore through in order to figure out exactly how we did (yes, it is not as straightforward as it sounds). We will post a complete 'report' as soon as we have gone through all the relevant numbers.

    While I can tell you we did not 'win', we appear to have done reasonably well considering our algorithm is still in its infancy compared to many of the other participants. Clearly brute force alone is not sufficient to predict protein structures, but in combination with a 'smart' algorithm we believe it can far exceed what anyone else will be able to do without DC.

    As we continue to test new enhancements to our current basic algorithm and scoring functions, we expect to see great improvements in our ability to sample structures and pick out low-energy ones. Anyhow, we will provide more details once we've analyzed the results.
    Howard Feldman

  14. #14
    I love 67607
    Join Date
    Dec 2002
    Location
    Istanbul
    Posts
    752
    Any news on the report?

  15. #15
    Member
    Join Date
    Apr 2002
    Location
    The Netherlands
    Posts
    47
    Originally posted by Brian the Fist
    Clearly brute force alone is not sufficient to predict protein structures but in combination with a 'smart' algorithm we believe it can far exceed what anyone else will be able to do without DC.
    Howard, by 'smart' algorithm, do you mean a 'self-learning' or 'adaptive' algorithm?
    (Try to keep the results if they're better than the previous results; i.e. 'evolution-like'?!)


  16. #16
    There's a lot of stuff that can be done in this project to make the algorithm "smart". Right now, it's pretty much guessing. Not to mention the fact that there's no organization: with 1500 of us and a random generator, I bet we're only doing about 5% unique work.

    But due to the type of project, I can't say I know how this is going to be done. My brain works logically, so I don't even pretend to understand the algorithms. Now, if it were brute-force cracking or something where you can literally say "OK, that's done, never do it again", then I could give a lot of logical-type help.

  17. #17
    Originally posted by Kileran
    Not to mention the fact that there's no organization. With 1500 of us and a random generator, I bet we're only doing about 5% unique work.
    This is really badly wrong! The fact that we can't easily get a perfect protein is due to the enormous number of possible configurations for a single protein. The random number generator used by the project ensures that we almost never upload duplicate units. This is not a significant problem. For a general explanation of the science behind the project, check out this link.
    A member of TSF http://teamstirfry.net/

  18. #18
    I'm sorry, perhaps you didn't understand what I meant.

    When this client starts, we don't download anything. All we do is process some work, upload it, and process some more.

    If a server is not coordinating our tasks, how can we be doing unique work? I understand that the project is INCREDIBLY big, but still: what stops my client from attempting the same structures as someone else?

    I know 5% was an exaggeration; it wasn't meant as an insult. But there is still going to be a lot of work going to waste while we use a random generator.

  19. #19
    That isn't necessarily correct at all.

    First of all, if they're using 64-bit randomly generated seeds, then we're talking about 2^64, or roughly 1.8 x 10^19, different values. That's well, well above the 10+ billion folds we're doing per project.

    I think Howard mentioned somewhere that only about 5% of the work units returned are duplicates, though probably some of that is people resubmitting work (I've done it a couple of times myself when copying the client, along with completed folds, to another computer).
    Team Anandtech DF!

  20. #20
    As you say, only about 5% of work is true duplicates, probably because people accidentally or on purpose upload more than once.
    Now Kileran, consider this, sometimes referred to as the Levinthal paradox.

    Each amino acid has 2 rotational degrees of freedom on the backbone, 360 degrees each. So suppose we sample in 10-degree increments, for a 100-amino-acid protein ('small'). That gives 36^200 as the size of conformational space. And this is not even accounting for the variation in protein sidechains! Thus it is extremely unlikely ANY two people are generating even remotely similar structures, unless they end up starting with the same random seed (which is 32 bits, but only chosen once at the start of the program). If we assume everyone does batches of 5000 at a time (which is approximately true), then 10 billion / 5000 = 2 million, which means 2 million out of the possible 4 billion random seeds are actually made use of - less than 0.1%!

    Now this is similar to the birthday problem: if you have 30 people in a room, what is the chance that 2 will have the same birthday? Higher than you think. Similarly, amongst the 2 million seeds, a few are bound to be duplicates (maybe up to 5%), but the majority are unique. And every random seed will lead to a completely different set of random structures due to the vastness of conformational space.
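    The arithmetic above is easy to check directly. Here's a small Python sketch using only the figures from the post (36 torsion choices per degree of freedom, 5000 structures per seed, a 32-bit seed space); the collision estimate uses the standard birthday-problem approximation n^2 / (2N) for the expected number of colliding pairs:

    ```python
    import math

    # Conformational space: 2 backbone torsions per residue, sampled
    # in 10-degree steps (36 choices each), for a 100-residue protein.
    conformations = 36 ** 200
    print("digits in conformational space size:", round(math.log10(conformations)))

    # Seeds actually used: 10 billion structures in batches of 5000.
    seeds_used = 10_000_000_000 // 5000   # 2 million
    seed_space = 2 ** 32                  # ~4.29 billion possible 32-bit seeds
    print("fraction of seed space used:", seeds_used / seed_space)

    # Birthday-problem estimate: expected colliding pairs among
    # n draws from N equally likely seeds is roughly n^2 / (2N).
    expected_collisions = seeds_used ** 2 / (2 * seed_space)
    print("expected duplicate seed pairs:", round(expected_collisions))
    ```

    Note that the expected seed collisions come out to only a few hundred out of 2 million (far below 5%), which fits the thread's suggestion that most duplicate uploads come from people resubmitting work rather than from seed clashes.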

    I hope this convinces you that you are wrong.

    As for the new method, I am busy coding it as we speak. The new infrastructure I'm putting in will allow us to try several different ideas, all sharing the common idea of iteration: i.e. make some structures locally, then take the best one(s) and do something with them to make more structures, and so on. Thus there will be a lot less uploading to the server, but occasional downloading as well. Most changes will be invisible to the user though. We'll keep you posted as it develops.
    Howard Feldman
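    The iterative scheme Howard sketches (generate a batch, keep the best, seed the next batch from it) can be illustrated in a few lines. This is only a toy sketch, not the actual client: `score`, `make_random_structure`, and `perturb` are hypothetical stand-ins for the real energy function and structure generator.

    ```python
    import random

    def score(structure):
        # Hypothetical stand-in for the client's energy/scoring function
        # (lower is better).
        return sum((x - 0.5) ** 2 for x in structure)

    def make_random_structure(rng, n=10):
        # Round-0 behaviour: a purely random structure.
        return [rng.random() for _ in range(n)]

    def perturb(structure, rng, step=0.05):
        # Small random change to a parent structure.
        return [x + rng.uniform(-step, step) for x in structure]

    def iterative_sampling(rounds=5, batch=100, seed=1):
        rng = random.Random(seed)
        # First batch: purely random, as the current client does.
        pool = [make_random_structure(rng) for _ in range(batch)]
        best = min(pool, key=score)
        for _ in range(rounds):
            # Later batches: generate new structures around the best so far.
            pool = [perturb(best, rng) for _ in range(batch)]
            candidate = min(pool, key=score)
            if score(candidate) < score(best):
                best = candidate
        return best

    print("final score:", score(iterative_sampling()))
    ```

    The point of the design is that only the best structure per round needs to move between client and server, which matches the "a lot less uploading, occasional downloading" description.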

  21. #21
    Howard,

    Why don't you do a hash on the handle of the user (or their ID) plus a randomly picked number (so as not to get duplicate folds for users with multiple computers) and set that up as the original seed for the random number generator? It should lower the number of potential conflicts.
    Team Anandtech DF!
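    The suggestion above could be sketched roughly as follows. This is a speculative illustration, not the client's actual seeding code; the function name `make_seed` and the choice of SHA-256 are assumptions, and the hash is truncated to 32 bits to match the seed width Howard mentioned:

    ```python
    import hashlib
    import os

    def make_seed(user_handle: str) -> int:
        # Mix the user's handle with per-machine randomness so two boxes
        # run by the same user still get different seeds.
        nonce = os.urandom(8)
        digest = hashlib.sha256(user_handle.encode() + nonce).digest()
        # Truncate the hash to the 32-bit seed width the client uses.
        return int.from_bytes(digest[:4], "big")

    print(make_seed("FoBoT"))
    ```

    Hashing spreads even similar handles uniformly over the seed space, though with a 32-bit seed the birthday-problem collision odds stay the same as with any other uniform choice.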

  22. #22
    Brian the Fist,

    You are correct; I am proven wrong. I knew this was a big project, but I am humbled to know just how big. I've only been on the project for 4 days or so, so I haven't gotten caught up on the reading that has been recommended. As I stated earlier, I meant no offence.

    So, let me start a new line of questioning: are we doing enough? If it is truly that big, then why do we stop at 10 billion? If there were more people on the project, would it not be useful to increase that number? Or would it be more useful to still go to 10 billion, but do it more times (updating to a new project more often)?

    Heh, I have a habit of running off at the mouth and asking 16 questions at once, so I'll break it down to simplify my intentions.

    1) Why do we stop at 10 billion? What math produced this number?

    2) Let's say we boomed overnight and an additional 40,000 people joined the project. Would we
    a) increase the size from 10 billion, or
    b) leave it at 10 billion and process more variants?


    I'm very interested in this project. I've been with distributed.net for years, but their lack of contact with their users over the past year has made me lose interest. I'm glad to see the devotion you guys have to keeping in contact with us.

    Sean

  23. #23
    Sean,

    To answer those questions, it would be best if you read (and understand) our recent papers, or at least the About -> Science section of the main website. There's plenty of info on the website, as well as at http://bioinfo.mshri.on.ca/trades, if you are interested in the science behind the project and our algorithm.

    Our current plan is to improve the sampling from 'random' structures to doing it more intelligently. We shouldn't need to sample 10 billion structures to get what we are getting if we do it a bit more intelligently. Massive computational resources will still be required, though.
    Howard Feldman

  24. #24
    Originally posted by Brian the Fist
    Sean,

    ...
    Our current plan is to improve the sampling from 'random' structures to doing it more intelligently. We shouldn't need to sample 10 billion structures to get what we are getting, if we do it a bit more intelligently. Massive computational resources will still be required though
    So if I understand correctly, there will be a new client which has some 'intelligence' in it to improve the chance of finding a smaller RMSD?
    Member of the Los Alcoholicos

  25. #25
    Originally posted by MarcyDarcy
    So if I understand correctly, there will be a new client which has some 'intelligence' in it to improve the chance of finding a smaller RMSD?
    As have been said: in some future client, it will download some crap which will then be combined with random junk to form an organized pile of shit which will then be transported on the info highway to Brian the naughty boy. That's what I caught, perhaps there's more.

    pmfp sends
    "There's only one bullet with your name on it, there's a thousand other ones that are addressed 'To Whom It May Concern'."
    -Tracy Paul Warrington

  26. #26
    Don't quote me on this (I don't feel like digging up an exact answer), but I think it has been said that the odds of finding a "better" structure increase in a fashion that is approximately logarithmic.

    Some math mumbo-jumbo like that. Anyway, to get significantly lower RMSD values than what we see at 10 billion samples, we would have to take a LOT more samples.

    A more detailed answer by Howard (Brian the Fist) is here on the forum somewhere and the science papers he linked to have a ton of info.

  27. #27
    dismembered Scoofy12's Avatar
    Join Date
    Apr 2002
    Location
    Between keyboard and chair
    Posts
    608
    In answer to your question regarding the number of structures to sample, you may want to check out this thread.

    In short, Dr. Hogue and Howard (et al.?) have shown that this particular method results in what is called an extreme value distribution, which has well-known properties. From this, they can predict *approximately* how many samples they need for a certain accuracy, and vice versa (this is for the case when we know the structure already, leaving out the scoring problem). Thus, once we have a certain number of samples, we should be able to predict how well we will do given x more samples. Generally, the law of diminishing returns applies liberally, as you might expect from the graphs.
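    The diminishing-returns behaviour described above is easy to see empirically. The sketch below uses a Gaussian as a stand-in score distribution (an assumption; as noted, the real best-of-n minima follow an extreme value distribution, but any distribution shows the same effect) and tracks the running best score as the sample count grows tenfold:

    ```python
    import random

    rng = random.Random(0)
    # One long stream of hypothetical RMSD-like scores (lower is better).
    samples = [rng.gauss(10.0, 2.0) for _ in range(100_000)]

    # Each tenfold increase in samples buys a shrinking improvement
    # in the best value seen so far: diminishing returns.
    for n in (100, 1_000, 10_000, 100_000):
        print(n, round(min(samples[:n]), 2))
    ```

    Because each row takes the minimum over a prefix of the same stream, the best value can only improve as n grows, and the per-decade gain visibly shrinks, which is why going far below the 10-billion-sample plateau would require vastly more samples.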

    Also check out the "educational" threads at the top of the DF forum. If the forum shows no threads, look below where it says "Showing threads 1 to n of x, sorted by [last post time] in [descending] order, from [last x days]" and change the last option box to "the beginning". (BTW, is there any way to make that the default?)
