4ICB - How Many Possible Structures?



Orion
04-15-2002, 09:56 PM
"Gene Machine" http://www.wired.com/wired/archive/9.07/blue_pr.html

This absorbing article, referenced by Dr Hogue in another thread, contains the following:

"If you have a short protein (say, 100 amino acids), and each link can be bent or twisted in just three distinct ways, then you have to multiply 3 by itself 100 times to get the number of possible shapes the chain might fold into. This is a big number - roughly speaking, the current age of the universe, squared. One hundred amino acids is a short protein.''

The article goes on to say that local structures and hydrophobics will cut this number down, but what remains is still stupefying.

4ICB, (http://www.imb-jena.de/cgi-bin/ImgLib.pl?CODE=4icb), has 76 residues. Would the total number of possibilities be 3 (or whatever) to the 76th power?

Brian the Fist
04-15-2002, 11:15 PM
That is correct, although the 3 is just an approximation, of course. Luckily, things we know about other protein structures can help reduce the search space, but it still gives you a feel for the enormity of the problem.
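As a rough illustration of the arithmetic discussed above, the counts can be worked out directly. This is only a sketch: the 3 states per residue is the article's simplification, and the function name is made up for the example.

```python
def conformation_count(n_residues, states_per_residue=3):
    """Naive count of chain shapes: one independent choice per residue.

    The default of 3 states per residue is the article's simplification,
    not a measured value.
    """
    return states_per_residue ** n_residues

for n in (76, 100):
    count = conformation_count(n)
    # The number of decimal digits minus one gives the order of magnitude.
    print(f"{n} residues: roughly 10^{len(str(count)) - 1} conformations")
```

For 76 residues this comes out around 10^36, and for 100 residues around 10^47, so dropping 24 residues shrinks the naive count by about eleven orders of magnitude.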

sklepp
04-16-2002, 01:54 AM
Just found this on the web and thought it was a funny coincidence: I did a search for WD-40 (the oil) to see if it reacts with the rubber and plastics on a car,
and found out that there is a protein called WD40 in maize.
As you see I am not a Biology major and thought this was a good thread for this.
You always learn something.... :rotfl:

SpongeBob SquarePants
04-19-2002, 12:08 PM
Ni!


:help:

I have posed this question on the Genome @Home forum before.

(Surprise, Surprise, Stefan never answered my question.) :spank:

We have worked on a 36, a 52?, and now a 76. I have seen a few 150's from the genome project. What is the "range" of AA's in nature?

Is cancer a "500"? Is that why it is so hard to cure? I don't need super specifics here; I would just like to know whether, if 100,000 folks did join, we could throw in a 150 AA for 10,000,000,000 just to slow things down and keep pace with the influx of users.

What is the highest AA you foresee throwing at us at a later date?

How many AA's are in CASP 5?

A curious SpongeBob

Ni!

ProteinCowboy
04-19-2002, 01:45 PM
Sponge Bob,

I'm more molecular biologist than computer scientist, so I'll try and clarify a few things... aa stands for amino acid; amino acids are the units that make up a protein. There are 20 possible aa's to make proteins out of. You can think of it as beads on a string: you've got 20 different beads with which you can make seemingly unlimited combinations of different lengths. To put things into the context of the genome, DNA contains the instructions for how to make the proteins: that is, which "beads" to string together.

And leave it to nature, that complicated gal, to make strings of all different lengths! Proteins can be small (only a handful of aa's) or massive (thousands of aa's). And then of course they have to fold in a particular way to function correctly. Keep in mind, too, that this is an oversimplified view. There are many other things going on that add layers of complexity to the whole process. The level of complexity in biological systems still amazes me!

Apologies if I've been too simple or too complicated.

ProteinCowboy

jkeating
04-19-2002, 02:24 PM
And leave it to nature, that complicated gal, to make strings of all different lengths! Proteins can be small (only a handful of aa's) or massive (thousands of aa's).

ok, but SpongeBob had a good question as I understood it. What is the "average" length of those strings? Do we know? Will we be getting a 5324AA to work on someday? :shocked:

Inquiring minds need to know!

ProteinCowboy
04-19-2002, 04:20 PM
Well, I sure don't know what the average aa length of all existing proteins is. You could find out the average length of all the known proteins by examining the lengths of all the entries in the public protein sequence databases. We also have to keep in mind that we don't know of all the proteins that exist. I suspect that what's in the databanks is probably a representative sample, though. Hmmm, maybe there is a massive population of tiny proteins that we biologists have not found yet: exception is the rule in biological science.

I suppose it's not improbable that we could eventually fold huge proteins... but I really have no idea what the upper aa chain limit would be for this algorithm. Is there one in theory? Or will hardware be the major limiting factor?

Oh, and in reference to Sponge Bob's cancer comment: Cancer is an immensely complex disease. Extremely variable. The probability of two people with the "same" cancer (e.g. colon) reacting differently to the same treatments is high, since every person's cancer can have different characteristics at the genetic level. So what helps some may do nothing for others. There will never be a single cure for cancer.
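The averaging idea above could be sketched as follows, assuming the database entries are available as FASTA-formatted text (a header line starting with ">" followed by one or more sequence lines). The function name and the two toy records are invented for illustration; they are not real database entries.

```python
def fasta_lengths(text):
    """Return the length of each sequence in FASTA-formatted text."""
    lengths = []
    current = None  # running length of the record being read, if any
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A header line closes the previous record and opens a new one.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths

# Toy input, made up for this sketch.
toy = """>toy_protein_1
MKSPEELKGI
FEKYA
>toy_protein_2
ACDEFGH
"""

lengths = fasta_lengths(toy)
print(lengths)                      # one length per record
print(sum(lengths) / len(lengths))  # mean length
```

Run over a real database dump, the same loop would also give the minimum and maximum lengths, which speaks to the "range" question earlier in the thread.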

ProteinCowboy

SpongeBob SquarePants
04-19-2002, 04:41 PM
Ni!


Proteins can be small (only a handful of aa's) or massive (thousands of aa's).

PC,

That is exactly what I wanted to know. Thousands! :cheers:

JK had it right: what is a "normal" number of beads on the necklace?

Where would I find the largest necklace? PDB? What do I look for when I am there? There does not seem to be a place where the bead number AA is stated. Do they use another term?

Mr Fist,

How many AA's are in CASP 5?

How long a necklace can we do as a project? Thousands?

Are we semi-randomly placing 20 different beads in the necklace each time or are we trial and erroring a fewer amount like 19 fixed and twenty beads in order?


Thanks,

My little porifera brain is getting tired.

SpongeBob SquarePants

Dutch Kow Suk!

Brian the Fist
04-19-2002, 05:15 PM
For a given protein, the sequence is fixed. For example, the protein we are doing right now is length 76 and has this sequence:

MKSPEELKGIFEKYAAKEGDPNQLSKEELKLLLQTEFPSLLKGPSTLDELFEELDKNGDGEVSFEEFQVLVKKISQ

Each letter is one 'bead'
There's 76 of them.
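The bead count is easy to check for yourself. A small sketch using the sequence quoted above; the variable names are arbitrary, and the 20-letter alphabet is the standard one-letter amino acid code.

```python
from collections import Counter

# The 76-residue sequence quoted in the post above.
sequence = (
    "MKSPEELKGIFEKYAAKEGDPNQLSKEELKLLLQTEFPSLLKGPSTLDELFEELDKNGDGEVSFEEFQVLVKKISQ"
)

# One-letter codes for the 20 standard amino acids.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")

print(len(sequence))                    # number of beads on the necklace
print(set(sequence) <= STANDARD_AA)     # every letter is a standard amino acid
print(Counter(sequence).most_common(3)) # the three most frequent beads
```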
CASP 5 targets are not yet announced but are typically anywhere from about 75 to 1000 AA long, with the majority around 100 or so. Some proteins can be as large as 3000-4000 AA and that is about the upper limit that has been observed.
Our algorithm scales as N log N with the number of AA; however, the conformational space grows exponentially, so we have to generate more and more structures to get something good as the protein gets longer.
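To get a feel for the gap between those two growth rates, here is a minimal sketch. It assumes the N log N figure is the cost of generating one structure (constant factors and the base of the logarithm are ignored) and reuses the 3-states-per-residue simplification from earlier in the thread; both function names are made up.

```python
import math

def per_structure_cost(n):
    """Relative cost of building one structure, taken as N log N."""
    return n * math.log2(n)

def search_space(n, states=3):
    """Naive size of the conformational space, states ** N."""
    return states ** n

for n in (36, 76, 150):
    magnitude = len(str(search_space(n))) - 1
    print(f"N={n}: cost per structure ~{per_structure_cost(n):.0f}, "
          f"search space ~10^{magnitude}")
```

Going from 76 to 150 residues roughly doubles the per-structure cost under this assumption, while the naive search space grows by a factor of 3^74.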

But all hope is not lost. Luckily, we know large proteins are always composed of what are called 'domains'. These are globular, independently folding units that take the same shape, more or less, with or without the rest of the protein present. Domains are usually in the range 50-200 AA, so a big 2000-residue protein might be made up of 10 or more domains. We can often predict where these domain boundaries are, and solve the structure 1 domain at a time, and then put them back together again, like a bunch of Lego pieces.
All of the proteins we have worked on so far are single-domain proteins. Some are in fact domains extracted from larger multi-domain proteins, such as 1SHG, the 62-residue SH3 domain we recently did.
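The "10 or more" estimate follows from the domain size range given above. A quick sketch of that arithmetic; the function name is invented, and it assumes the whole chain is covered by domains of a single size, which real proteins need not obey.

```python
def domain_count_range(protein_length, min_domain=50, max_domain=200):
    """Fewest and most domains that could tile a chain of this length.

    Uses ceiling division so a leftover stretch counts as one more domain.
    """
    fewest = -(-protein_length // max_domain)  # all domains at the large end
    most = protein_length // min_domain        # all domains at the small end
    return fewest, most

print(domain_count_range(2000))  # bounds for the 2000-residue example
```

For a 2000-residue chain this gives between 10 and 40 domains, each of which is back in the size range the thread has been discussing.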

Hope this doesn't squeeze your spongy head too hard!

Nofinger
04-19-2002, 07:07 PM
It's great to see that someone from the project takes the time to answer almost every question which is being tossed at them :cool: :thumbs:

Keep up the good work

Paratima
04-19-2002, 07:51 PM
Not only that, but I understood almost every word! :thumbs:

I'm still working on the phrases and sentences, and the paragraphs are downright incomprehensible. But I got the words just fine. :p Thanks, Howard.

Orion
04-19-2002, 08:57 PM
Here's a thread from the PDB that deals with this:

http://www.rcsb.org/pdb/lists/pdb-l/199904/msg00007.html

DF is giving my bookmarked Online Medical Dictionary (http://cancerweb.ncl.ac.uk/omd/ ) a real workout!

OT: Those who have protein viewers should check out 1gav, a real work of nature's art. It's the outer coat of a virus that infects bacteria, but it's beautiful!

Brian the Fist
04-19-2002, 09:46 PM
Just keep in mind that the PDB (referenced above) contains only proteins whose structures have been solved experimentally. There are many larger proteins, as I said up to 3000 and beyond, but their structures have not been solved yet; only their sequences are known.

I'm happy to educate/answer any of your scientific-type questions. It provides a good break from all the 'I forgot my handle's :p

ColinT
04-19-2002, 10:51 PM
Howard, I lost my password!

Oh, never mind :haddock:



:D