
Thread: This is the best we can do?!

  1. #1


    One second please.

    *Scott puts on his flame-proof fireman's outfit ... including independent oxygen supply.*

    Thanks for waiting.

    Well, I just looked at the graphics for all the proteins we've done so far and thought to myself:

    OH MY GOD, WE DID THAT BADLY?!

    From what the scientists have been saying here and what I've read elsewhere, the shape of a protein determines its function, and our aim is to make our simulation as close as possible to the real thing so it will do what the real one does. Too far off and the simulation is worthless ... or worse: if medical drugs are developed on the assumption that the simulation is the true structure, and the simulation is far off, those drugs could have little effect, or even harmful effects, on the actual protein. And at the protein level, even an angstrom (a tenth of a billionth of a meter) difference supposedly really does matter!

    When looking at the graphics of the protein structures we've already attempted to predict, I get the impression of a drunk trying to walk a drawn curvy line. Sorry, but those graphics rather depressed me about what we're trying to accomplish here. And this is with a KNOWN protein, one where we know the true structure and can calculate the RMSD. Now we're about to do unknown ones for CASP, and I cannot even imagine how far off the comparison graphic will be when we finish theirs.
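For anyone else wondering what the RMSD number actually measures, here is a minimal Python sketch (an illustration, not DF's code). It assumes the predicted and real structures have already been superimposed; a real comparison first finds the best-fitting rotation and translation (for example with the Kabsch algorithm), which this sketch skips. The coordinates are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z)
    points. Assumes the structures are ALREADY superimposed (no fitting here)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        # Squared distance between matching atoms in the two structures.
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom in the "prediction" is off by 3 Å in x and 4 Å in y,
# i.e. 5 Å away from where it should be.
native = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
predicted = [(x + 3.0, y + 4.0, z) for (x, y, z) in native]
print(rmsd(native, predicted))  # 5.0
```

So an RMSD of 2 Å means the atoms are, on average in this root-mean-square sense, about 2 Å from where they are in the real structure.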

    As for what I was expecting, I was expecting the prediction to stray just slightly outside the lines. For example, a third color, green, where segments exactly matched the real protein, with blue and yellow coming into play only where they didn't ... and that would be only a few tiny times. If that really did happen, I'd HIGHLY recommend adding this new color and explaining what it means. Perhaps even a rainbow scheme: green where they match perfectly, shifting further toward opposite ends of the color spectrum the further apart they get (deep red for the native structure and deep purple for the software-generated one).

    OK, now I'm fully expecting "love it or leave it", "it's better than nothing", and "Gosh, Scott, would you like a billion dollars?" replies to this post. Note the flame suit. However, while I expect and won't get really upset by those replies (well, especially not the last one), I would appreciate some good scientific assurance (in layman's terms) that we're not wasting our computers on this project. And did FAH do any of these proteins? If so, could you show how their graphics looked ... even a comparison graphic with DF's simulation in one color, FAH's in another color, and the real protein in a third? Perhaps the real protein in green, DF in yellow, and FAH in blue.
    Last edited by Scott Jensen; 05-29-2002 at 10:30 AM.

  2. #2
    I'm going out on a limb here - bioinformatics isn't my specialty.

    As I read it, 4 Å RMSD is where you start developing drugs with some confidence.

    The project is young. We're not designing drugs. Fact is, we have bupkis for computing power in the project. If the methodology proves effective, a drug company would drop a billion dollars on a hypercluster like that [sound of snapping fingers]. 10-year ROIs are not uncommon in that industry.

    As I see it, you have an experiment. Howard's team has an idea for how to go about this. They have an algo and a theory. So do the other related projects. Both the algo and the theory are promising. So are everyone else's.

    So they want to go from theory to hypothesis to proof. The only way to do that is through experiment. We're participating in that.

    None of the work we're doing is commercially applicable. It's theoretical. It's a necessary step - but the proverbial rubber is a long way from meeting the road.

    I go the opposite direction: I'm terribly impressed with how little drift I see between actual and derived. I'm seeing RMSD approaching commercial acceptability (albeit on very small proteins) with *very* little processing power.

    Ok - now it's time for Elena or Howard to step in and tell me where *I'm* full of it.

  3. #3
    Jodie, I think you are pretty much on track with your comments. Protein folding prediction is very much leading-edge stuff. CASP 5 will go a long way toward demonstrating how close this approach can come.

    From what I read before, it seems that the idea of what we are doing is more to really narrow the field, so that other methods can be used to finalize the result. If effective, this would be used more to create "candidates" that can then be pursued more efficiently by other approaches. Kind of a filtering process.

    One such mega-cluster, like the one you mentioned, is IBM's Blue Gene (I think), which is under development. I can't remember all the stats except for a few things like "Water cooled with a 1000 gallon reservoir tank in the basement" (fan noise would be intolerable with all the CPUs involved), and it would use something like 1.5 megawatts of power.

    If successful, it is possible that Distributed Folding could eventually collaborate with other such projects.

    Allegory: We are working on this massive haystack. It is huge, the size of several cities. Somewhere in it is a straw of hay hiding the perfect needle. Our job is to go through the haystack one straw at a time and pull out the most likely candidates for straws that might hide the perfect needle. These candidates will then go into a much smaller haystack, which can then be analyzed by other processes to find the perfect needle, or a suitably close match to it.
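The "big haystack to small haystack" idea can be sketched in a few lines of Python. This is a hypothetical illustration, not DF's code: `generate_candidate` stands in for building one random structure, the scores are made up, and it assumes lower scores are better.

```python
import heapq
import random


def generate_candidate(rng):
    """Hypothetical stand-in for building one random structure.
    A 'structure' here is just an ID plus a made-up score (lower = better)."""
    return {"id": rng.randrange(10**9), "score": rng.uniform(-100.0, 0.0)}


def narrow_the_field(n_samples, keep, seed=0):
    """Generate n_samples candidates and keep only the `keep` best-scoring ones.
    heapq.nsmallest never holds or sorts the whole haystack at once."""
    rng = random.Random(seed)
    haystack = (generate_candidate(rng) for _ in range(n_samples))
    return heapq.nsmallest(keep, haystack, key=lambda c: c["score"])


shortlist = narrow_the_field(n_samples=100_000, keep=10)
print([round(c["score"], 2) for c in shortlist])
```

The shortlist is the "much smaller haystack" that slower, more careful methods would then work on.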

  4. #4
    Some of the proteins were not very close (visually) to the 'real' structures, but they were almost all recognizably similar visually. We aren't trying to find the exact structure in microscopic detail (they do that in labs with x-ray crystallography or some other stuff I am not allowed to attempt to pronounce, spell, or understand).

    We are trying to find (and prove) a method for taking an unknown protein and 'predicting' its structure with a certain degree of accuracy. From what I have gathered, NONE of the protein structure prediction methods (Folding@home and DF included) should be expected to produce an EXACT result. Lab work would still need to be done to find the exact result (and even that might not technically be 'exact').

    Based upon my limited understanding of the science involved, all of the results (not every single one, but the best of each protein run) in Phase IA were good enough to be of use. For example, trying to sort/categorize/organize/identify the proteins discovered as a result of the Human Genome Project. You obviously can't do lab work on every single one of them simultaneously, so you either find ways to "sort through" them or you sit around waiting while the limited lab resources are used to find the structures of all of these unknown proteins.

    DF also appears to have a good apparatus for implementing very fast tests of both sampling and scoring methods as new methods are discovered/created/shared/etc.

    Basically, they have a platform that allows them to test sampling and scoring methods very quickly (a very large number of iterations, a very large "sample set" in a very short amount of time). As the sampling and scoring methods become more and more accurate (and robust), the amount of more direct science that they can do with the project should increase.

    These are all my (long-winded) guesses, btw, not anything I took from a paper or am quoting. (It would be a very bad paper, or a very dumb scientist, if that were the case.)

    =)

  5. #5
    Once again I am amazed and impressed by the technical and scientific understanding of you folks. Soon I won't have to write anything at all. You've pretty much got it summarized. If I can just add my 2 cents to allay Scott's concern...

    Keep in mind that these pictures are just representations of the actual proteins, in fact just the backbone (they also have side chains sticking out all along the backbone, which are hidden in the picture to make it easier to see). When you view them as space-filling images with all atoms present, it actually doesn't look too bad (but then it's very hard to distinguish which is which without 3-D glasses, hence the simplified representation). Proteins are extremely complex. The number of possible conformations is astronomical. We have sampled just 1 billion of them in each case, essentially at random, and the results are quite impressive considering that. We have basically illustrated the limits of random sampling for structure prediction. The theoretical number of conformations of even a small protein, say 60 residues long, is on par with the number of particles in the universe! Imagine searching for one particular particle in the whole universe... it would take a while.
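To get a feel for why the numbers explode, here is a back-of-the-envelope Python sketch in the spirit of Levinthal's paradox. It assumes, purely for illustration, that every residue independently takes one of a few discrete shapes; the values 3 and 10 are assumptions, not DF's model.

```python
def conformations(n_residues, states_per_residue):
    """Levinthal-style estimate: assume every residue independently adopts one of
    `states_per_residue` discrete shapes. A huge simplification, for scale only."""
    return states_per_residue ** n_residues


SAMPLES = 10**9  # about one billion structures sampled per protein

for k in (3, 10):
    total = conformations(60, k)
    print(f"{k} states per residue: {total:.2e} conformations; "
          f"one billion samples covers {SAMPLES / total:.1e} of them")
```

Even with only 3 shapes per residue, a billion samples is a vanishingly small fraction of a 60-residue protein's possibilities, which is why the sampling cannot be purely random if it is to find anything.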

    We are working on the hardest part of the protein folding problem, called 'ab initio': that is, we have no knowledge of the true structure and are essentially 'guessing' at it based only on our scoring functions. In the case of the first 5 proteins we have done here, we know what the best structures are for each protein because their structure is known. However, we did NOT use our knowledge of the structure to help us in any way during the sampling. So we expect similar results for novel proteins of comparable size.

    Another method of structure prediction is sometimes called 'homology modelling': this is used when your protein sequence closely matches that of another protein whose structure is already known. This occurs when two proteins are evolutionarily related. For example, insulin in a pig and a human differ at only one amino acid position (don't quote me on that), so one can infer that their structures are very similar: if the structure of one is known, the structure of the other can be 'predicted' by starting with the known structure and modifying it from there. This method is by no means trivial (I won't go into details) but generally produces very good models/predictions. The catch is, you need a similar protein of known structure. If none exists, you're out of luck and must resort to ab initio.
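The first question in homology modelling, "how closely does my sequence match one of known structure?", is often summarized as percent identity. A minimal Python sketch, assuming the two sequences have already been aligned (a real pipeline would align them first, and conventions for the denominator vary); the sequences below are made up, not real proteins.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two sequences that are ALREADY aligned to equal length.
    '-' marks a gap; columns containing a gap are left out of the comparison."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == "-" or b == "-":
            continue
        compared += 1
        identical += a == b  # True counts as 1
    return 100.0 * identical / compared if compared else 0.0


# Made-up toy sequences in one-letter amino acid codes, differing at one position.
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQT"))  # 90.0
```

A high identity to a protein of known structure suggests homology modelling is an option; with no such match you are back to ab initio.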

    As Raj has mentioned in another thread, the current 'king' of ab initio prediction, so to speak, is David Baker in Seattle, WA. I suggest looking at papers from his lab/his web site for details on his methods, if you are interested. Basically the state of the art in ab initio prediction is still quite crude though, I will freely admit.
    It hasn't been called the most difficult biological problem in history for nothing, after all.

    As for F@H, as far as I know the only protein we have in common is 1VII. I believe the best structure they obtained was 3.3 Å, but you will have to ask them if you want to see a structural alignment or the structure itself. Keep in mind that their goal is NOT structure prediction, though; it is to investigate the folding pathway of proteins, i.e. how a protein gets from unfolded to folded and what all the intermediate steps are.

    As a final point, our structures are sometimes referred to as 'unrefined'. That is, they are raw predictions straight out of the program. They can be subjected to energy minimization techniques such as molecular dynamics and simulated annealing, which could (and should) reduce the energy, and RMSD, further. Until now this was not necessary for our purposes, but we may indeed refine some of the CASP predictions. And we do intend to test new scoring functions and sampling methods, both during CASP and beyond.
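For the curious, here is what simulated annealing looks like in miniature. This Python sketch minimizes a made-up one-dimensional "energy" with several dips, not a protein force field, and the step size and cooling schedule are arbitrary choices. The key idea: early on, when the "temperature" is high, it sometimes accepts uphill moves so it can climb out of a poor dip; as it cools it becomes pickier and settles into a low spot.

```python
import math
import random


def energy(x):
    """Toy 1-D 'energy landscape' with several local minima (not a protein)."""
    return 0.1 * x * x + math.sin(3.0 * x)


def simulated_annealing(x0, steps=20_000, t_start=2.0, t_end=0.01, seed=0):
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for i in range(steps):
        # Geometric cooling schedule from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (i / (steps - 1))
        x_new = x + rng.gauss(0.0, 0.5)  # random nudge to the current "structure"
        e_new = energy(x_new)
        # Always accept downhill moves; accept uphill ones with the Metropolis
        # probability exp(-dE / T), which shrinks as the temperature drops.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e


print(simulated_annealing(x0=4.0))
```

Refining a real predicted structure works on thousands of atom coordinates with a physical energy function, but the accept/reject logic is the same idea.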

    You have all brought up some great points here!
    Howard Feldman

  6. #6
    Originally posted by Brian the Fist
    You have all brought up some great points here!
    OK, now if someone would care to explain in real simple layman terms what the following "good points" mean, I'd truly appreciate it. First up...

    "...the idea of what we are doing is more to really narrow the field, so that other methods can be used to finalize the result. If effective, this would be used more to create "candidates" that can then be more efficiently pursued from other approaches. Kind of a filtering process."

    OK, I don't get this at all. Narrow the field? Narrow it to what? Are we striking out into the unknown hoping to come across something useful OR are we taking a cue from someone to look in a certain lagoon for buried treasure? Are we at the start of the whole process or somewhere in the middle? If in the middle, where in the middle? The whole middle? Part of it? If part of it, which part? Early middle? Mid-middle? Late middle? By your statement, we're apparently not at the end of the process.

    And candidates for what?

    ---

    "If successful, it is possible that Distributed Folding could eventually collaborate with other such projects."

    How and in what way? How would it help IBM's Blue Gene? Would IBM even ask for DF's help? And how important would such assistance be to them? Will they come begging for our help or would we have a problem getting them to even return our phone calls?

    ---

    "We aren't trying to find the exact structure in microscopic detail (they do that in labs with x-ray crystallography or some other stuff I am not allowed to attempt to pronounce, spell, or understand)."

    What!? Why aren't we trying to find the exact structure? I would think that would be exactly the goal of Howard and Dr. Hogue.

    And cannot x-ray crystallography be highly automated so ... [Scott tries to do his best Carl Sagan impression] ... billions upon billions upon billions can be done? How long does x-ray crystallography take to do one protein? How expensive is it? Wouldn't it be better to put smart people like Howard and Dr. Hogue to work on making lightning-fast x-ray crystallography technology than this guesstimating software?

    ---

    "We are trying to find (and prove) a method for taking an unknown protein and 'predicting' its structure with a certain degree of accuracy."

    OK, this doesn't make sense. What do you mean by "unknown protein"? How can you take it if you don't know it? How can you know you predicted it right if you don't know it in the first place?

    ---

    "Based upon my limited understanding of the science involved, all of the results (not every single one, but the best of each protein run) in Phase IA were good enough to be of use."

    For what?

    ---

    "For example, trying to sort/categorize/organize/identify the proteins discovered as a result of the Human Genome Project. You obviously can't do lab work on every single one of them simultaneously, so you either find ways to "sort through" them or you sit around waiting while the limited lab resources are used to find the structures of all of these unknown proteins."

    But how do we know which ones are the best ones to devote our efforts to? Or are we simply taking ones willy-nilly and seeing if anything worthwhile comes out of it? If the second one, how would we even know if that simulated protein is worth anything or even what it does?

    ---

    "DF also appears to have a good apparatus for implementing very fast tests of both sampling and scoring methods as new methods are discovered/created/shared/etc.

    Basically, they have a platform that allows them to test sampling and scoring methods very quickly (a very large number of iterations, a very large "sample set" in a very short amount of time). As the sampling and scoring methods become more and more accurate (and robust) the amount of more direct science that they can do with the project should increase. "


    Ahhh! Perhaps there's something here. That being "very fast tests". Testing for what? For whom? Why? What value do these tests represent and mean to scientists and the biotech industry?

    ---

    "Keep in mind that these pictures are just representations of the actual proteins, in fact just the backbone (they also have side chains sticking out all along the backbone, which are hidden in the picture to make it easier to see). When you view them in spacefilling images with all atoms present, it actually doesn't look too bad (but then it's very hard to distinguish which is which without 3-D glasses - hence the simplified representation)."

    One, can we see these more detailed pictures?

    Two, could you set up some little slide show type thing on the website with a clicker button that would enable us to click back and forth between the real protein and its simulated prediction for us to see the differences ourselves?

    ---

    "We are working on the hardest part of the protein folding problem, called 'ab initio' - that is, we have no knowledge of the true structure and are essentially 'guessing' at it based only on our scoring functions."

    And this scoring system works how? Also, please remember you're explaining this to a moron.

    ---

    "So we expect similar results for novel proteins of comparable size."

    I assume "novel" means "unknown". But this just sounds odd as well. I mean, how can you fold something you don't know? It doesn't make sense. Do you understand what I'm not getting? Is it that you have a string of protein components you've been told make up a protein, but it hasn't been predicted/folded yet? Is that it? If so, how do they know these components make up a protein in the first place? Where and how did you get this list of protein parts without knowing its shape? See what I'm getting at? It's like someone comes to me with a dump truck full of vehicle parts and tells me that they'll make a complete working car.

    I ask "How do you know they'll make a car?"

    The delivery person says "Ummm. Well. Errrr."

    "And you expect me to create a car out of these parts? Parts which you've just told me you don't know how they fit together in the first place BUT you're sure they'll all fit together."

    "Precisely."

    "Right. You stand right there and don't move while I call the funny farm and ask if anyone's missing."

    ---

    "As Raj has mentioned in another thread, the current 'king' of ab initio prediction, so to speak, is David Baker in Seattle, WA. I suggest looking at papers from his lab/his web site for details on his methods, if you are interested."

    Marketer here. Not scientist. I really doubt I'd be able to understand his papers any more than I do yours. Thank god for this forum so I can at least ask questions to TRY to understand what's going on.

    ---

    "It hasn't been called the most difficult biological problem in history for nothing after all."

    Why? Nothing else in biology is this complex of a problem? Nothing? And who has said it was "the most difficult biological problem in history"? And, NO, I will not take Oprah's word for it.

    ---

    "As for F@H, as far as I know the only protein we have in common is 1VII. I believe the best structure they obtained was 3.3 Å..."

    With ours being 2.03 Å.

    "...but you will have to ask them if you want to see a structural alignment or the structure itself."

    Could someone do this and post it or a link to it here? Even put DF's picture next to it? Due to personal reasons, I really don't want to interact with FAH if at all possible.

    "Keep in mind that their goal is NOT structure prediction though, it is to investigate the folding pathway of proteins - i.e. how it gets from unfolded to folded and what all the intermediate steps are."

    What?! Isn't their end goal the same as yours? To present the true structure of the protein? If that wasn't their goal as well, their willy-nilly folding of protein components would be rather stupid and silly.

    ---

    "As a final point, our structures are sometimes referred to as 'unrefined'. That is, they are raw predictions straight out of the program. The[y] can be subjected to energy minimization techniques such as molecular dynamics and simulated annealing which could (and should) reduce the energy, and RMSD, further."

    Will someone please translate the above? Sorry, I only know English.

    "raw predictions"???

    "energy minimization techniques"???

    "molecular dynamics"???

    "simulated annealing"???

    "which could (and should) reduce the energy, and RMSD, further"???

    ---

    Whether it was a blessing or curse, having been raised by two teachers (father a psychology professor and mother a second-grade teacher), I've long taken the attitude that you're only an idiot if you don't know something and don't try to find out the answers. Or as my father always told me: The only dumb question is the one not asked.

    Then again, my father also told me that the more you know, the more you know how little you know. Hmmm. Since I already feel like I know almost nothing, maybe I should stop asking questions now. *insane laugh* Yeah, right! *more insane laughing*

  7. #7
    aka rebldomine
    Scott, I think the term you want for "fireman's outfit, with independent oxygen supply" is bunker gear and an SCBA (self-contained breathing apparatus)...

    Sorry, I'm currently working on a degree in fire science in college, enrolled in a fire academy and so on...

  8. #8
    Originally posted by Scott Jensen

    OK, now if someone would care to explain in real simple layman terms what the following "good points" mean, I'd truly appreciate it. First up...

    OK, I don't get this at all. Narrow the field?

    And candidates for what?


    Candidate structures - i.e. a structure that is likely to be close to correct.

    ---

    How and in what way? How would it help IBM's Blue Gene? Would IBM even ask for DF's help? And how important would such assistance be to them? Will they come begging for our help or would we have a problem getting them to even return our phone calls?

    The Blue Gene team is aware of our software and may indeed have use for it later on, when they are closer to completion.

    ---

    What!? Why aren't we trying to find the exact structure? I would think that would be exactly the goal of Howard and Dr. Hogue.

    And cannot x-ray crystallography be highly automated so ... [Scott tries to do his best Carl Sagan impression] ... billions upon billions upon billions can be done? How long does x-ray crystallography take to do one protein? How expensive is it? Wouldn't it be better to put smart people like Howard and Dr. Hogue to work on making lightning-fast x-ray crystallography technology than this guesstimating software?


    Well, this is a loaded question. Exact structure is an oxymoron in this case. Proteins are DYNAMIC molecules, constantly moving, rolling and vibrating around in solution. The best we can hope to do is find an approximate structure. Even then we can't expect a perfect match to the crystal structure. It is impossible, at present, to model reality well enough to ever expect an EXACT prediction. As for X-ray crystallography, the limiting step is making crystals, which can take from 3 months to 1 year or more for just one protein (protein dependent). There are indeed companies working on high-throughput crystallography, but the catch is, they can only do this on a certain fraction of proteins (maybe 5%) which crystallize easily. Other people are working on ways to speed up the crystallization process.
    One advantage of software is that in the future, when it works perfectly, we can design new proteins which don't yet exist and predict their structures, and thus their functions. Also, crystallizing a protein and getting its structure can be very expensive, costing tens of thousands of dollars in equipment, chemicals and labour. Also, many proteins can never be crystallized (or are extremely difficult to crystallize) and must be solved another way.

    ---

    "We are trying to find (and prove) a method for taking an unknown protein and 'predicting' its structure with a certain degree of accuracy."

    OK, this doesn't make sense. What do you mean by "unknown protein"? How can you take it if you don't know it? How can you know you predicted it right if you don't know it in the first place?


    That should probably mean protein of unknown structure, but known sequence. Our goal is to get to the structure from the sequence.

    ---

    "Based upon my limited understanding of the science involved, all of the results (not every single one, but the best of each protein run) in Phase IA were good enough to be of use."

    For what?


    For designing drugs, predicting function, finding homologs (similar evolutionarily related proteins), and lots of other cool stuff.

    ---

    "For example, trying to sort/categorize/organize/identify the proteins discovered as a result of the Human Genome Project. You obviously can't do lab work on every single one of them simultaneously, so you either find ways to "sort through" them or you sit around waiting while the limited lab resources are used to find the structures of all of these unknown proteins."

    But how do we know which ones are the best ones to devote our efforts to? Or are we simply taking ones willy-nilly and seeing if anything worthwhile comes out of it? If the second one, how would we even know if that simulated protein is worth anything or even what it does?


    There are methods for picking out 'interesting' proteins - usually ones that don't have any sequence similarity to known proteins, that have no known function, and that appear in many organisms. We can also predict domain boundaries and stuff like that but I won't go into that.

    ---

    "DF also appears to have a good apparatus for implementing very fast tests of both sampling and scoring methods as new methods are discovered/created/shared/etc.

    Basically, they have a platform that allows them to test sampling and scoring methods very quickly (a very large number of iterations, a very large "sample set" in a very short amount of time). As the sampling and scoring methods become more and more accurate (and robust) the amount of more direct science that they can do with the project should increase. "

    Ahhh! Perhaps there's something here. That being "very fast tests". Testing for what? For whom? Why? What value do these tests represent and mean to scientists and the biotech industry?


    We are not a commercial entity; we are a research institute. Our goal is to work on the protein folding problem. Whether or not this benefits the biotech industry is of no importance, although the industry would definitely benefit from being able to design drugs for proteins whose structures have not been, or cannot be, solved by other methods. We wish to test different scoring and sampling methods to optimize our approach and improve our predictions of protein structures.

    ---


    One, can we see these more detailed pictures?

    Two, could you set up some little slide show type thing on the website with a clicker button that would enable us to click back and forth between the real protein and its simulated prediction for us to see the differences ourselves?


    Not sure what you want with number two, but we'll add the actual structures to the results section so you can view them in Cn3D and see for yourself.

    ---


    And this scoring system works how? Also, please remember you're explaining this to a moron.


    It is a black box as far as you can see: in goes the structure, out comes the score. The present ones we are trying work by counting contact pairs in space. Basically, all residue pairs within a certain distance of each other in space are listed, and each pair gets a score depending on how likely those two residues are to be found close together. We just add up the scores for all pairs.
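Here is a rough Python sketch of that "list the close pairs, score each, add them up" idea. Everything specific in it is hypothetical: the pair-score table has only three made-up entries (a real contact potential covers all 210 residue-type pairs, with values derived from known structures), the 7 Å cutoff is an assumption, and skipping residues that are near neighbours along the chain is an extra assumption not stated above.

```python
import math
from itertools import combinations

# Hypothetical pair scores (lower = more favourable); made-up values.
PAIR_SCORE = {
    frozenset({"L"}): -1.0,      # Leu-Leu: hydrophobic pair, often buried together
    frozenset({"L", "K"}): 0.5,  # Leu-Lys
    frozenset({"K"}): 0.8,       # Lys-Lys: like charges, rarely close
}
CUTOFF = 7.0        # Å, hypothetical contact distance
MIN_SEPARATION = 3  # hypothetical: ignore residues this close along the chain


def contact_score(sequence, coords):
    """Sum the pair scores of all residue pairs closer than CUTOFF in space."""
    total = 0.0
    for i, j in combinations(range(len(sequence)), 2):
        if j - i < MIN_SEPARATION:
            continue  # chain neighbours are always close; they carry no information
        if math.dist(coords[i], coords[j]) <= CUTOFF:
            pair = frozenset({sequence[i], sequence[j]})
            total += PAIR_SCORE.get(pair, 0.0)  # unknown pairs score 0
    return total


# Toy 4-residue chain folded into a square, so residues 0 and 3 touch.
square = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
print(contact_score("LKKL", square))  # -1.0
```

Folding the chain so the two leucines touch earns a favourable score; a stretched-out chain with no contacts would score 0.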

    ---

    "So we expect similar results for novel proteins of comparable size."

    I assume "novel" means "unknown". But this just sounds odd as well. I mean, how can you fold something you don't know? It doesn't make sense. Do you understand what I'm not getting? Is it that you have a string of protein components you've been told make up a protein, but it hasn't been predicted/folded yet? Is that it? If so, how do they know these components make up a protein in the first place? Where and how did you get this list of protein parts without knowing its shape? See what I'm getting at? It's like someone comes to me with a dump truck full of vehicle parts and tells me that they'll make a complete working car.


    Proteins are like beads on a string. It is easy to get their sequence - the order of the beads on the string. But it is hard to get their 3-D shape - how that string folds up into a globular structure. Again, 'unknown' implies unknown structure but known sequence.

    ---

    "It hasn't been called the most difficult biological problem in history for nothing after all."

    Why? Nothing else in biology is this complex of a problem? Nothing? And who has said it was "the most difficult biological problem in history"? And, NO, I will not take Oprah's word for it.


    Ask a biologist "what's the most difficult problem in biology?" and see what they say.

    ---

    "Keep in mind that their goal is NOT structure prediction though, it is to investigate the folding pathway of proteins - i.e. how it gets from unfolded to folded and what all the intermediate steps are."

    What?! Isn't their end goal the same as yours? To present the true structure of the protein? If that wasn't their goal as well, their willy-nilly folding of protein components would be rather stupid and silly.


    Nope, I'm pretty sure their goal is to investigate folding pathways, not structure prediction. But again, I can't speak for them. Not sure what you mean by willy-nilly folding, though.

    ---

    "As a final point, our structures are sometimes referred to as 'unrefined'. That is, they are raw predictions straight out of the program. The[y] can be subjected to energy minimization techniques such as molecular dynamics and simulated annealing which could (and should) reduce the energy, and RMSD, further."

    Will someone please translate the above? Sorry, I only know English.


    "raw predictions"???

    Like it says, straight out of a single program, not refined. See below.

    "energy minimization techniques"???

    Minimize the energy of a protein using some technique.

    "molecular dynamics"???

    see F@H

    "simulated annealing"???

    see F@H

    "which could (and should) reduce the energy, and RMSD, further"???

    See energy minimization techniques above

    ---
    Howard Feldman

  9. #9

    Let's see if I can get this straight and simple without mangling it too badly:

    Distributed Folding: The ultimate goal is to be able to take a known string of amino acids (the beads) and "predict" how it will fold into a protein whose final form is not yet known. The method is being tested against known proteins for now, but in CASP 5 it will be tested against unknown ones. The advantage is that it will "speed up the process of discovery."

    Folding@Home: Takes known amino acid strings and known final proteins and tries to figure out the process of how they "fold" from one to the other: a step-by-step analysis of the process. Also key to their project is how some proteins "misfold" into shapes that do damage rather than help, which is the suspected cause of diseases such as Alzheimer's, mad cow disease, etc. It is like a book where you know the first paragraph of the prelude and the last paragraph of the conclusion, and are trying to fill in everything that happened in between.

  10. #10
    Scoofy12
    Here's (hopefully) a little bit more clarity on the few things I understand:

    "Narrowing the field?"
    Given the amino acid sequence that makes up any given protein, there are a LOT of different ways the amino acids (residues) could be oriented with respect to each other, i.e. a lot of possible structures. "The field" is the set of all the possible (or probable) structures that could be assumed by a given sequence. We, of course, are looking for the right one, or as close to right as we can get it. Our technique for generating structures starts with no knowledge of the true structure, and while it's true that random processes are used to generate it, it's not totally random. Remember the "random walk" we were talking about in the other thread? We start with 2 residues and add them one at a time. For each residue we add, we "point" it in some direction sorta-randomly with respect to the rest of the structure... but not totally randomly. Remember those "trajactory distributions"? We have a sort of probability map with the relative likelihood for each direction, and I expect the probabilities in our generation are weighted to correspond to this map. Back to my point... by narrowing the field, we mean we use our algorithm to pick out likely configurations or structures, which we can then "tweak" using other, possibly more accurate, methods. I see this as sort of the start of the process.
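
    To make the "random walk" idea concrete, here is a minimal sketch. It assumes a simple cubic lattice and a made-up probability map (`DIRECTION_WEIGHTS`); the real trajectory distributions used by the DF software are not described in this thread, so every number and name below is hypothetical. Each new residue is pointed in a direction chosen at random, but weighted by the map, and directions that would collide with the existing chain are skipped.

```python
import random

# Hypothetical probability map: relative likelihood of stepping in each of the
# six cubic-lattice directions. These weights are invented for illustration;
# the real DF distributions are derived from known protein structures.
DIRECTION_WEIGHTS = {
    (1, 0, 0): 0.30, (-1, 0, 0): 0.10,
    (0, 1, 0): 0.25, (0, -1, 0): 0.10,
    (0, 0, 1): 0.15, (0, 0, -1): 0.10,
}


def grow_chain(n_residues, seed=None, max_tries=100):
    """Grow a self-avoiding chain one residue at a time with a biased random walk."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        chain = [(0, 0, 0), (1, 0, 0)]  # start with 2 residues
        occupied = set(chain)
        while len(chain) < n_residues:
            x, y, z = chain[-1]
            # Keep only directions that do not bump into an existing residue.
            options = [(d, w) for d, w in DIRECTION_WEIGHTS.items()
                       if (x + d[0], y + d[1], z + d[2]) not in occupied]
            if not options:
                break  # trapped: discard this chain and start over
            dirs, weights = zip(*options)
            # "Sorta-random": the choice is weighted by the probability map.
            dx, dy, dz = rng.choices(dirs, weights=weights)[0]
            nxt = (x + dx, y + dy, z + dz)
            chain.append(nxt)
            occupied.add(nxt)
        if len(chain) == n_residues:
            return chain
    raise RuntimeError("could not grow a collision-free chain")
```

    Calling `grow_chain(20, seed=s)` for many different seeds gives a pile of candidate structures; a separate scoring step (not shown) would then pick out the promising ones, which is roughly what "narrowing the field" means above.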

    "and this scoring system works how?"
    I dunno but this is neato and im gonna look at it more

    "energy minimization"
    I think I have the general idea of this. It's sort of a potential energy... I dunno how much chemistry you remember, but you may recall that certain atoms, compounds, molecular structures, etc. are said to have higher energy than others. Think of it as a sort of potential energy if you want: a stack of books on a shelf, while it may be stable, has more energy as a system than a stack of books on the floor. Given the opportunity, the books up high will travel toward the floor (i.e. down), because that's a position of lower energy. Likewise, certain chemical configurations (positions of atoms and groups of atoms with respect to each other) have higher and lower energy. This is due to a lot of things, from things as simple as electrostatic forces (i.e. like charges repelling and opposite charges attracting) to (I think) things as strange as effects due to the Pauli exclusion principle. There are interactions between atoms, between groups of atoms, and between the structure and itself and its environment, and most of these interactions dictate a certain way that things just sorta "like" to arrange themselves. These "preferred" configurations are said to have lower energy (consequently, you can get energy out of a system by allowing it to go to a state of lower energy; this is why some chemical reactions give off heat). Anyway, the long and short of it is, you can calculate some of these interactions and get a value for the energy... and you know that structures with lower energy are more likely to occur in nature (just like there are a LOT more O2 molecules in the air than individual oxygen atoms, because O2 is a much lower energy state... however, you do still find O atoms). Sooo, we guess that lower-energy structures are more likely to be closer to "right."
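
    A toy illustration of energy minimization, assuming a Lennard-Jones pair potential between two atoms as a stand-in for a real protein energy function (which is far more complicated and is not described in this thread). The energy is high when the atoms are jammed together, rises toward zero when they are far apart, and has a single "preferred" distance in between. Steepest descent simply keeps nudging the distance downhill in energy until the force is essentially zero.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones pair energy: steep repulsion up close, weak attraction farther out."""
    sr6 = (sigma / r) ** 6
    return 4 * epsilon * (sr6 ** 2 - sr6)


def lj_force(r, epsilon=1.0, sigma=1.0):
    """Force along r, i.e. minus the derivative of lj_energy with respect to r."""
    sr6 = (sigma / r) ** 6
    return 24 * epsilon * (2 * sr6 ** 2 - sr6) / r


def minimize(r, step=0.01, tol=1e-10, max_iter=100_000):
    """Steepest descent: move r in the direction of the force until it vanishes."""
    for _ in range(max_iter):
        f = lj_force(r)
        if abs(f) < tol:
            break
        r += step * f  # following the force always goes downhill in energy
    return r
```

    Starting from `minimize(1.5)`, the distance settles at the known minimum of this potential, r = 2^(1/6)·σ ≈ 1.12σ, where the energy is -ε. A real protein has thousands of coordinates instead of one, and many local minima, which is why fancier methods are needed.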

    "molecular dynamics?"
    In a simple nutshell (am I even capable of that?), dynamics is the study of forces and their effect on motion (as opposed to kinematics, which is just the motion itself). So molecular dynamics looks at forces on the molecular level. It's all related, because all those forces contribute to the relative energies of the structures. If I'm not mistaken, F@H makes extensive use of molecular dynamics because they are most concerned with the motion of the atoms. They want to know HOW it folds, as opposed to WHAT it looks like after it's done.
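
    A minimal molecular dynamics sketch, assuming a single atom on a spring in one dimension (a hypothetical stand-in for one chemical bond; real MD codes track every atom in three dimensions with a full force field). The velocity Verlet scheme used here is one common way to step Newton's equations forward in time: compute the force, update position and velocity, repeat. Because no energy is added or removed, kinetic plus potential energy should stay nearly constant.

```python
K = 1.0     # hypothetical spring stiffness (the "bond")
MASS = 1.0  # hypothetical atom mass


def spring_force(x):
    """Hooke's law: the spring pulls the atom back toward x = 0."""
    return -K * x


def total_energy(x, v):
    """Kinetic energy plus spring potential energy."""
    return 0.5 * MASS * v * v + 0.5 * K * x * x


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate F = m a with the velocity Verlet scheme; returns (x, v) per step."""
    a = force(x) / mass
    traj = [(x, v)]
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # move using current velocity and acceleration
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # update velocity with the averaged acceleration
        a = a_new
        traj.append((x, v))
    return traj
```

    Running `velocity_verlet(1.0, 0.0, spring_force, MASS, 0.01, 1000)` shows the atom swinging back and forth through x = 0 while `total_energy` stays very close to its starting value of 0.5. Watching that motion, rather than only the end point, is the "HOW it folds" part.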

    "simulated annealing?"
    Hm, I dunno... Relevant definition of annealing from m-w.com: "to heat and then cool (nucleic acid) in order to separate strands and induce combination at lower temperature especially with complementary strands of a different species"... so it seems like this process might unfold the protein and then let it fold up again? Dunno, a quick search of F@H didn't find that word anywhere.

    anyway, hope that helps.

  11. #11
    All very interesting.

    This type of science is the FUN type of science. You can't directly view/capture/determine the item being studied (or its behavior) so you have to find unique/cunning/inventive ways of narrowing the possible options. Half scientific method, half Sherlock Holmes. =)

    The science involved in this project is truly multi-discipline in its nature. What have we covered so far? Statistics (probability, curves, RMSD, etc.). Chemistry (energy states, molecules, etc.). Biology (where to begin?). Mathematics. Computer science. Well, I suppose something like neural network programming incorporates about half of those elements anyways.

    Good Stuff. =)

    (even if I don't understand wtf it all means)

  12. #12
    Mac since '86
    Originally posted by MAD-ness
    All very interesting.

    This type of science is the FUN type of science. You can't directly view/capture/determine the item being studied (or its behavior) so you have to find unique/cunning/inventive ways of narrowing the possible options. Half scientific method, half Sherlock Holmes. =)
    I think that might be 1 part scientific method, 1 part Sherlock Holmes and 1 part Las Vegas craps.


  13. #13
    ME: And candidates for what?

    HOWARD: Candidate structures - i.e. a structure that is likely to be close to correct.


    But, again, candidates for what? Or is this some different definition thing within your field where the common definition for a word isn't the same as one used in your specialized industry? To me, "candidate" means it is up/running for something. What I'm wondering is what that "something" is.

    ---

    ME: How and in what way? How would it help IBM's Blue Gene? Would IBM even ask for DF's help? And how important would such assistance be to them? Will they come begging for our help or would we have a problem getting them to even return our phone calls?

    HOWARD: The Blue Gene team is aware of our software and may indeed have use for it later on when they are closer to completion.


    But in what way would they likely use it? Also, isn't Folding@Home closer to what they're doing than what DF is doing?

    ---

    ME: Why aren't we trying to find the exact structure? I would think that would be exactly the goal of Howard and Dr. Hogue.

    HOWARD: Well this is a loaded question. Exact structure is an oxymoron in this case. Proteins are DYNAMIC molecules, constantly moving and rolling and vibrating around in solution. The best we can hope to do is find an approximate structure. Even then we can't expect a perfect match to the crystal structure. It is impossible to model reality sufficiently, at present, to ever expect an EXACT prediction.


    So what is the RMSD score then? Isn't 0 Å a perfect match? Or are you saying that we will never see a 0 Å score?

    HOWARD: As for X-ray crystallography, the limiting step is making crystals, which can take 3 months to a year or more for just one protein (protein dependent). There are indeed companies working on high-throughput crystallography, but the catch is, they can only do this on a certain fraction of proteins (maybe 5%) which crystallize easily.

    Crystallize? How do you crystallize a protein? You mean make it solid as opposed to a more fluid structure? But you just said: "Proteins are DYNAMIC molecules, constantly moving and rolling and vibrating around in solution." So wouldn't crystallizing them give you a false ... or at least a less accurate idea of the protein?

    HOWARD: Also many proteins can never be crystallized (or are extremely difficult) and must be solved another way.

    Why can they never be?

    ---

    ME: And this scoring system works how? Also, please remember you're explaining this to a moron.

    HOWARD: It is a black box as far as you can see. In goes the structure, out comes the score.


    Is that to keep the scores honest?

    ---

    HOWARD: Proteins are like beads on a string. It is easy to get their sequence - the order of the beads on the string. But it is hard to get their 3-D shape - how that string folds up into a globular structure. Again, 'unknown' implies unknown structure but known sequence.

    So spotting a protein is easy, correct? But how do you get their sequence? Why is that also easy? Isn't there a concern that you straighten it out (or whatever you do to get the sequence) wrong and thus end up doing a simulation on a false protein? And since there are "billions" of proteins, how would you know if you got it wrong? It would seem it would be rather difficult finding another exact one since there are so many to choose from.

    ---

    HOWARD: Keep in mind that their [Stanford's Folding@Home's] goal is NOT structure prediction though, it is to investigate the folding pathway of proteins - i.e. how it gets from unfolded to folded and what all the intermediate steps are."

    ME: What?! Isn't their end goal the same as yours? To present the true structure of the protein. If that wasn't their goal as well, their willy-nilly folding of a protein's components would be rather stupid and silly.

    HOWARD: Nope, Im pretty sure their goal is to investigate folding pathways, not structure prediction.


    But, again, this doesn't make sense. How do they know they folded it right if their goal isn't to get the structure right? Wouldn't they need the fold to end up with the correct structure and if it didn't, it would mean they didn't take the correct pathways?

    HOWARD: Not sure what you mean by willy-nilly folding though.

    Willy-nilly is a derogatory term for randomly. More like "foolishly randomly" or "randomly without care".

    ---

    ME: "energy minimization techniques"???

    HOWARD: minimize the energy of a protein using some technique


    Errr? What? Hmmm. Let me take a stab in the dark and see if I hit anything. Are you saying that the less energy a protein needs to expend to completely fold a certain protein sequence is how nature likely (or exactly?) would fold said protein? Also, what do you mean by energy? Where is this energy coming from? Is the protein a living functional organism before or after it folds? Or does something else fold it for it? If something else, what is that?

    ---

    SCOOFY12: "the field" is the set of all the possible (or probable) structures that could be assumed by a given sequence.

    And, as someone (Howard?) has said, there are more of those than there are atoms in the universe, correct?

    SCOOFY12: Remember those "trajactory distributions"? We have a sort of probability map with the relative likelihood for each direction, and I expect the probabilities in our generation are weighted to correspond to this map.

    One, I don't remember "trajactory distributions". And it is "trajectory" and not "trajactory", right? I couldn't find "trajactory" in my common-usage dictionary, but that's not saying it doesn't exist within this specialized field. And I think it is better to ask than assume in this situation.

    And the "probability map with the relative likelihood for each direction" is based on what? The before-mentioned energy used as talked about earlier? Less energy used being the most likely route taken, right?

    And what do you mean by "our generation"? Do you mean our generation of protein structures? And thus "weighted to correspond to this map" meaning how much energy ours took and then comparing that against energy used up by other such predictions by other crunchers?

    ---

    ME: "and this scoring system works how?"

    SCOOFY12: I dunno but this is neato and im gonna look at it more


    Please post your findings here in this thread.

    ---

    SCOOFY12: "energy minimization"

    I think I have the general idea of this. It's sort of a potential energy... I dunno how much chemistry you remember, but you may recall that certain atoms, compounds, molecular structures, etc. are said to have higher energy than others.


    Last time I took a course in chemistry was in high school ... some 22 years ago. During college all my science courses were in astronomy ... which I aced every single one of. And before you ask, same case with biology. Given this poor foundation of fading knowledge...

    Where did this potential energy come from? I don't believe it came out of thin air so it had to come from somewhere. And how did these things get energized with it?

    SCOOFY12: the Pauli exclusion principle.

    ???

    ---

    ME: "simulated annealing???"

    SCOOFY12: Hm, I dunno... Relevant definition of annealing from m-w.com: "to heat and then cool (nucleic acid) in order to separate strands and induce combination at lower temperature especially with complementary strands of a different species"... so it seems like this process might unfold the protein and then let it fold up again? Dunno, a quick search of F@H didn't find that word anywhere.


    Anyone like to take a shot at this? Or is Scoofy12 close?

  14. #14
    Scott,

    I believe at this point, you will be better off reading a basic biochemistry/high school physics textbook. This will answer the majority of your questions. I will address the more relevant ones below:

    Originally posted by Scott Jensen
    ME: And candidates for what?

    HOWARD: Candidate structures - i.e. a structure that is likely to be close to correct.


    But, again, candidates for what? Or is this some different definition thing within your field where the common definition for a word isn't the same as one used in your specialized industry? To me, "candidate" means it is up/running for something. What I'm wondering is what that "something" is.


    They're running for the US senate of course.

    ---

    ME: How and in what way? How would it help IBM's Blue Gene? Would IBM even ask for DF's help? And how important would such assistance be to them? Will they come begging for our help or would we have a problem getting them to even return our phone calls?

    HOWARD: The Blue Gene team is aware of our software and may indeed have use for it later on when they are closer to completion.


    But in what way would they likely use it? Also, isn't Folding@Home closer to what they're doing than what DF is doing?

    I'd rather not discuss this here and now. IBM may have use for both projects on Blue Gene.

    ---

    ME: Why aren't we trying to find the exact structure? I would think that would be exactly the goal of Howard and Dr. Hogue.

    HOWARD: Well this is a loaded question. Exact structure is an oxymoron in this case. Proteins are DYNAMIC molecules, constantly moving and rolling and vibrating around in solution. The best we can hope to do is find an approximate structure. Even then we can't expect a perfect match to the crystal structure. It is impossible to model reality sufficiently, at present, to ever expect an EXACT prediction.


    So what is the RMSD score then? Isn't 0 Å a perfect match? Or are you saying that we will never see a 0 Å score?

    0 Å is a perfect match to an imperfect structure, as you find out yourself in the next questions.
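
    For readers wondering what the RMSD number actually measures, here is a minimal sketch. It assumes the predicted and native structures have already been optimally superimposed; real comparison tools first find the best rotation and translation (for example with the Kabsch algorithm), which this sketch deliberately skips. RMSD is the square root of the average squared distance between matching atoms, so identical coordinates give exactly 0 Å.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z) points.

    Assumes the two structures are already superimposed; no rotation or
    translation search is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty structures of the same length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

    Shifting every atom of a structure by 1 Å gives an RMSD of exactly 1.0, while comparing a structure with itself gives 0.0; a crystal structure is itself one snapshot of a moving molecule, which is why 0 Å only means "matches that snapshot".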

    HOWARD: As for X-ray crystallography, the limiting step is making crystals, which can take 3 months to a year or more for just one protein (protein dependent). There are indeed companies working on high-throughput crystallography, but the catch is, they can only do this on a certain fraction of proteins (maybe 5%) which crystallize easily.

    Crystallize? How do you crystallize a protein? You mean make it solid as opposed to a more fluid structure? But you just said: "Proteins are DYNAMIC molecules, constantly moving and rolling and vibrating around in solution." So wouldn't crystallizing them give you a false ... or at least a less accurate idea of the protein?

    You crystallize a protein the same way you crystallize salt or sugar: see supersaturated solutions in a chemistry textbook. Indeed, crystallization may give false structures/conclusions, but experience has shown this to be EXTREMELY rare.

    HOWARD: Also many proteins can never be crystallized (or are extremely difficult) and must be solved another way.

    Why can they never be?

    They just won't. A crystal requires a highly regular, repetitive pattern of the same molecule, by definition. Some proteins just won't do it. Imagine you have a bunch of circular tiles. Could you completely cover a floor with them, with no overlap? Of course not. But if the tiles were triangles or squares, you could...
    ---

    ME: And this scoring system works how? Also, please remember you're explaining this to a moron.

    HOWARD: It is a black box as far as you can see. In goes the structure, out comes the score.


    Is that to keep the scores honest?

    This is now explained briefly in the News section of the web site.
    ---

    HOWARD: Proteins are like beads on a string. It is easy to get their sequence - the order of the beads on the string. But it is hard to get their 3-D shape - how that string folds up into a globular structure. Again, 'unknown' implies unknown structure but known sequence.

    So spotting a protein is easy, correct? But how do you get their sequence? Why is that also easy? Isn't there a concern that you straighten it out (or whatever you do to get the sequence) wrong and thus end up doing a simulation on a false protein? And since there are "billions" of proteins, how would you know if you got it wrong? It would seem it would be rather difficult finding another exact one since there are so many to choose from.
    See a biochemistry textbook.
    ---

    HOWARD: Keep in mind that their [Stanford's Folding@Home's] goal is NOT structure prediction though, it is to investigate the folding pathway of proteins - i.e. how it gets from unfolded to folded and what all the intermediate steps are."

    ME: What?! Isn't their end goal the same as yours? To present the true structure of the protein. If that wasn't their goal as well, their willy-nilly folding of a protein's components would be rather stupid and silly.

    HOWARD: Nope, Im pretty sure their goal is to investigate folding pathways, not structure prediction.


    But, again, this doesn't make sense. How do they know they folded it right if their goal isn't to get the structure right? Wouldn't they need the fold to end up with the correct structure and if it didn't, it would mean they didn't take the correct pathways?

    They are not folding proteins. They are folding fragments, small pieces of structure, all 36 residues or fewer as far as I know. They wish to study the pathways of these protein fragments and then make the assumption that the same rules hold true for bigger, complete proteins.

    HOWARD: Not sure what you mean by willy-nilly folding though.

    Willy-nilly is a derogatory term for randomly. More like "foolishly randomly" or "randomly without care".

    ---

    ME: "energy minimization techniques"???

    HOWARD: minimize the energy of a protein using some technique


    Errr? What? Hmmm. Let me take a stab in the dark and see if I hit anything. Are you saying that the less energy a protein needs to expend to completely fold a certain protein sequence is how nature likely (or exactly?) would fold said protein? Also, what do you mean by energy? Where is this energy coming from? Is the protein a living functional organism before or after it folds? Or does something else fold it for it? If something else, what is that?
    See a physics or biochem book. Energy = kinetic energy + potential energy.
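
    As a worked example of that formula, using the stack-of-books picture from earlier in the thread with made-up numbers (a 2 kg stack of books on a 1.5 m shelf, g ≈ 9.81 m/s²):

```latex
\begin{aligned}
E &= K + U, \qquad K = \tfrac{1}{2} m v^{2}, \qquad U_{\text{grav}} = m g h \\
U &= (2\,\text{kg})(9.81\,\text{m/s}^{2})(1.5\,\text{m}) \approx 29.4\,\text{J}
\end{aligned}
```

    If the books fall, that potential energy turns into kinetic energy on the way down (and mostly into heat and sound on impact). The total is conserved; it just changes form, which is the same bookkeeping chemists apply to atoms and bonds.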
    ---

    SCOOFY12: "the field" is the set of all the possible (or probable) structures that could be assumed by a given sequence.

    And, as someone (Howard?) has said, there are more of those than there are atoms in the universe, correct?

    SCOOFY12: Remember those "trajactory distributions"? We have a sort of probability map with the relative likelihood for each direction, and I expect the probabilities in our generation are weighted to correspond to this map.

    One, I don't remember "trajactory distributions". And it is "trajectory" and not "trajactory", right? I couldn't find "trajactory" in my common-usage dictionary, but that's not saying it doesn't exist within this specialized field. And I think it is better to ask than assume in this situation.

    And the "probability map with the relative likelihood for each direction" is based on what? The before-mentioned energy used as talked about earlier? Less energy used being the most likely route taken, right?

    And what do you mean by "our generation"? Do you mean our generation of protein structures? And thus "weighted to correspond to this map" meaning how much energy ours took and then comparing that against energy used up by other such predictions by other crunchers?

    Too many questions here. Anyway: trajectory, yes. The probabilities are based on information from known protein structures only; no energy is involved.
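
    The point that the probabilities come purely from known structures can be illustrated with simple counting. This sketch assumes the steps observed in solved structures have already been sorted into a handful of hypothetical direction bins ("N", "E", "S", "W"); the actual representation DF uses is not given in this thread. No energy appears anywhere, only frequencies, with an optional pseudocount so that directions never seen in the data still get a small nonzero probability.

```python
from collections import Counter


def direction_probabilities(observed_steps, all_bins, pseudocount=0.0):
    """Estimate a probability map by counting steps seen in known structures.

    observed_steps: direction-bin labels taken from solved protein structures.
    all_bins: every bin the map should cover (unseen bins get only the pseudocount).
    Labels that are not in all_bins are ignored.
    """
    counts = Counter(observed_steps)
    total = sum(counts[b] + pseudocount for b in all_bins)
    if total == 0:
        raise ValueError("no observations and no pseudocount")
    return {b: (counts[b] + pseudocount) / total for b in all_bins}
```

    For example, three "N" steps and one "E" step give N = 0.75 and E = 0.25, with S and W at zero; adding a pseudocount of 1 smooths that to 0.5, 0.25, 0.125 and 0.125.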
    ---

    ME: "and this scoring system works how?"

    SCOOFY12: I dunno but this is neato and im gonna look at it more


    Please post your findings here in this thread.

    ---

    SCOOFY12: "energy minimization"

    I think I have the general idea of this. It's sort of a potential energy... I dunno how much chemistry you remember, but you may recall that certain atoms, compounds, molecular structures, etc. are said to have higher energy than others.


    Last time I took a course in chemistry was in high school ... some 22 years ago. During college all my science courses were in astronomy ... which I aced every single one of. And before you ask, same case with biology. Given this poor foundation of fading knowledge...

    Where did this potential energy come from? I don't believe it came out of thin air so it had to come from somewhere. And how did these things get energized with it?

    SCOOFY12: the Pauli exclusion principle.

    ???
    Physics textbook
    ---

    ME: "simulated annealing???"

    SCOOFY12: Hm, I dunno... Relevant definition of annealing from m-w.com: "to heat and then cool (nucleic acid) in order to separate strands and induce combination at lower temperature especially with complementary strands of a different species"... so it seems like this process might unfold the protein and then let it fold up again? Dunno, a quick search of F@H didn't find that word anywhere.


    Anyone like to take a shot at this? Or is Scoofy12 close?
    It is a complex method of minimization which generally works well at finding the global minimum of a surface (from computer science theory).
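
    A minimal simulated annealing sketch on a made-up one-dimensional "energy surface" (the function `energy` below is hypothetical, chosen because it has many local minima and a single global minimum at x = 0). The idea borrowed from metallurgy: at a high "temperature" the search accepts uphill moves fairly often, so it can climb out of local dips; as the temperature is slowly lowered, uphill moves become rare and the search settles into a deep minimum. This is the generic textbook method (Metropolis acceptance with geometric cooling), not DF's actual refinement code.

```python
import math
import random


def energy(x):
    """Toy 1-D energy surface: global minimum at x = 0, local minima near x = k*pi/3."""
    return x * x + 10 * math.sin(3 * x) ** 2


def simulated_annealing(x0, n_steps=20_000, t_start=10.0, t_end=1e-3,
                        step_size=0.5, seed=0):
    """Return the lowest-energy point (x, energy) visited during the anneal."""
    rng = random.Random(seed)
    cooling = (t_end / t_start) ** (1.0 / n_steps)  # geometric cooling factor per step
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    t = t_start
    for _ in range(n_steps):
        candidate = x + rng.gauss(0.0, step_size)
        e_new = energy(candidate)
        delta = e_new - e
        # Metropolis rule: always accept downhill moves; accept uphill moves
        # with probability exp(-delta / t), which shrinks as t cools.
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cooling
    return best_x, best_e
```

    Started at x = 4.0, inside a local dip, plain downhill minimization would stay stuck there; the annealed search ends up near x = 0. The price, as the next post points out, is a lot of energy evaluations.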
    Howard Feldman

  15. #15
    Ahhh. Back on comfy ground for me...

    Simulated Annealing and Adaptive Simulated Annealing (ASA) are optimization methods commonly used in tough statistical problems such as finance, market prediction, physics, neural network models, etc.

    We use an ASA derivative in analyzing temporal video quality over a particular compression algo with different scene constraints. Basically a Monte Carlo integration methodology...

    The upshot is you are statistically guaranteed a nearly 'perfect' solution. The downfalls are: It's slow. It's computationally expensive. It's oft overused when there are better methods 'cause it's "cool". It's slow. It's rather tough to tune to a specific problem and other curve-fit techniques are often easier. And it's slower than the second coming.

    Did I mention it's generally really slow?


    ----

    I appreciate your curiosity, and I don't want to be mean. But it seems like you're asking to pursue a grad course in math, biology, chemistry, physics, bioinformatics, and probably a few I've missed. That's cool and all, but you're lacking a few of the prerequisites to 'get it'.

    If you're THAT interested, you're going to need to do some homework. Hundreds of thousands of pages (millions?) have been written beating much of this to death. CliffsNotes are fine, but at least in my area of specialty (math, numerical recipes and algorithmic design), they simply aren't going to get you anywhere useful for understanding the very next question, which builds on the one you asked just before.

    I hope that doesn't come out tooo harsh.

    My interest personally has been piqued in no small part by your questions. I find myself brushing up on my organic chem, biochem, molecular dynamics and pursuing the relatively new (to me) field of bioinformatics. I thank you for that.

    I'm more than willing to share my reading list with you, but you're likely to get frustrated if you aren't comfortable with at least second-year chem and second-semester biology...

  16. #16
    Sorry - didn't see that Howard had replied. I still had your post on my screen from this morning - meaning to answer it when I returned home from work... Ooops!

  17. #17
    Originally posted by Jodie
    I appreciate your curiosity, and I don't want to be mean. But it seems like you're asking to pursue a grad course in math, biology, chemistry, physics, bioinformatics, and probably a few I've missed. That's cool and all, but you're lacking a few of the prerequisites to 'get it'.

    ...

    My interest personally has been piqued in no small part by your questions. I find myself brushing up on my organic chem, biochem, molecular dynamics and pursuing the relatively new (to me) field of bioinformatics. I thank you for that.

    I'm more than willing to share my reading list with you, but you're likely to get frustrated if you aren't comfortable with at least second-year chem and second-semester biology...
    When I was writing up the first post, I had a pretty good idea that the thread it would create would eventually land in the Educational section ... though I was surprised how fast it initially did. Because of this, I've tried to be a layman's advocate by thinking about what questions laymen might ask, so that when they read this thread, hopefully they'll come away better understanding what we're doing here and why. The first post was to pique people's interest ... even its title was designed that way. And I knew the first replies to that post would be filled with techno jargon and high-level/shorthand (to a layman) answers. My subsequent replies were simply to get simpler, more complete answers: layman definitions of the techno jargon, and explanations of some of the basic concepts and science underneath it all. I think the resulting thread has done a fairly nice job of that, and I think it is great that people have been willing to take the time to help in this regard.

    Now, yes, I could have looked up most of this on the web, but that wasn't the point. I knew that the vast majority of the people that would read this thread wouldn't do such web searching, so I've tried to get those-that-knew to give explanations here in this thread. I was even counting on fellow crunchers to pick up the ball when DF staff weren't interested ... as people did in this thread when Howard [a.k.a. Brian the Fist] wasn't up to giving some definitions and/or explanations. And that's no gripe about Howard either. It is great that someone who's part of the DF staff took the time to post answers. In all, this has been a very nice educational thread. There are only a few more of my questions that I hope other crunchers are willing to step up to the plate and try to answer ... as I think Howard's given what he's going to give. Depending on those answers (i.e., how well they explain themselves in layman's terms), I'll likely stop asking questions so this thread can return once more to the Educational section.

    Again, to all that have replied, I thank you for doing so. I've learned more and I hope others that read this thread feel likewise.

    Oh, and to those that read this in the Educational section, if you have a question about something in this thread, ask it! Not only will you then hopefully get an answer thus know, but others have probably thought of the same question but never took the initiative to ask it. For the only dumb question is the one not asked.

  18. #18
    dismembered Scoofy12
    Kinda along these lines, I would be more than happy to attempt to explain various basic ideas of chemistry, physics, computer science, and anything else I know enough about, if people have questions. I was going to do more of this, kinda like with the statistics, but I found that the sheer amount of stuff we've covered was a bit much for me to go and do it all. So it's up to you guys to be a bit more specific in your questions, and I'll see what I can do. Hopefully I haven't just bitten off more than I can chew.

  19. #19
    Junior Member BugTracer
    Join Date: May 2002 | Location: London, England | Posts: 1
    My first posting, but I've been folding and following with interest for several months. This has been a fascinating thread, but I can't help but think we haven't quite settled on the right balance between "explain all previous science from first principles" and "RTFM".

    It would be great to use the web as it was intended and not regurgitate any existing material, but just give references to the best online resource we know. I'll just offer the title of one book written for the layman with a scientific interest: Matt Ridley's 'Genome'. An excellent description of how we're moving from just reading the genetic code and how that defines protein sequences towards a decade (or more) of really understanding what it means. I'm sure you can find it on Amazon or your favourite alternative.

    Keep up the good work, all.

    Andy

  20. #20
    SCOOFY12: Kinda along these lines, I would be more than happy to attempt to explain various basic ideas of chemistry, physics, computer science, and anything else I know enough about, if people have questions.

    Thanks for offering.

    SCOOFY12: I was going to do more of this, kinda like with the statistics, but I found that the sheer amount of stuff we've covered was a bit much for me to go and do it all.

    Yes, it is a lot. It is for that reason that I only ask an opening question (or series of questions) in the first post and from then on simply try to ask questions that explain in simpler and simpler terms the usual techno-babble and complex answers that original post generated. Not trying to get the entire universe explained but simply things related to the original post. Doesn't mean though that others cannot stray from the line ... which I highly encourage.

    SCOOFY12: So it's up to you guys to be a bit more specific in your questions, and I'll see what I can do. Hopefully I haven't just bitten off more than I can chew.

    Well, as far as I'm concerned, if you can look through what I've asked so far and find anything that's yet to be given a simpler answer, that would be great. Your answers may trigger more questions or I may feel that's enough for now ... as far as my desire for explanation is concerned.

    ---

    BUGTRACER: "RTFM"

    ???

    BUGTRACER: It would be great to use the web as it was intended and not regurgitate any existing material, but just give references to the best online resource we know.

    Giving links to resources that would provide more in-depth information (both simpler and/or higher level) has never been discouraged and I'm all in favor of it. However, a combination of approaches would probably be best. Links for basic information and new text to explain how that material relates to the topic at hand ... or the other way around of text discussing the case in hand and then providing link(s) for those that would like to learn more about related topics. Giving titles to books is fine but links would be better since their access is immediate and just a click away thus more likely explored.

  21. #21
    Senior Member
    Join Date: Mar 2002
    Location: MI, U.S.
    Posts: 697
    Scott: I don't know if this has been posted or not (I kinda skipped a bit of the thread), but regarding how they figure out what the beads are, that's readily apparent from the DNA strands that created the proteins in the first place. You said you did stuff with biology before, but for other people reading this, there are 4 different "components" to DNA, and a sequence of three of these makes up one amino acid. So there are something like 64 different amino acids (actually there are less, but I don't remember why right now). If you take, oh, 110 amino acids or so and string them all together, you get a protein (actually a fairly short one AFAIK in nature, but long for a project like this -- the current CASP target is 182, but the previous proteins we were doing were 50-some, 80-some, generally less than 100). Obviously not all proteins are the same length, though.
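    As a rough sketch of the counting in the post above (assuming the usual one-letter names A, C, G and T for the four DNA bases), enumerating every ordered triplet of bases gives 4 x 4 x 4 = 64 possible codons. Note that 64 is the number of possible triplet combinations, not the number of distinct amino acids; a later post in this thread explains how those combinations map onto the 20 amino acids.

```python
from itertools import product

# The four DNA bases, i.e. the four "components" mentioned in the post.
BASES = "ACGT"

# A codon is an ordered run of three bases; product(..., repeat=3)
# yields every possible triplet, so there are 4 * 4 * 4 of them.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(codons))   # 64
print(codons[:4])    # ['AAA', 'AAC', 'AAG', 'AAT']
```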

    The Human Genome Project found the DNA sequence of every strand in the human body. So knowing that, we can predict the sequence of amino acids in every protein that our bodies could ever produce. But at that point, all we know is the sequence, not the actual structure, and we'd need the structure for things like medicines, etc. So that's where F@H, DF, Blue Gene, etc. come in -- they take the amino acid sequence (that was found from the DNA strand), and (a) figure out how, in nature, it folds (F@H), (b) skip that entirely and look at "what's the structure, as close as we can come?" (DF), or (c) do whatever it is that Blue Gene does (probably close to (b), but I'm not sure).

    Does that help any?

  22. #22
    JODIE: No one is trying to insult you here. No one is requiring you to type a single word in reply. If you feel one must have a certain level of education before you'll have a discourse with them, that's your requirement for such discourses, but it is rather inappropriate for you to state that people should self-censor based on their own perceived level of knowledge and some idea of the appropriate level of discussion. As for Howard, he has been selective in what he'll answer and what he'll suggest one read elsewhere. If others like Scoofy12 are willing to step in and answer the questions Howard doesn't have time for, that's great.

    Remember, Howard needs not just highly-educated people like you but laymen like me to make this dc project as productive as possible -- "productive" meaning as many crunchers as possible. Needless to say, there are a LOT more laymen like me than there are intellectuals like you. And the days when people never questioned authority, wisdom, or such are long gone ... so simply saying this is a worthwhile project to donate your computer's idle CPU cycles to isn't enough anymore. There are too many dc projects out there screaming for crunchers and a TON of dc projects coming online. In fact, I'm right now in negotiations with one currently-active dc project and one about-to-go-primetime dc project to be their marketer, and in talks with three others that are in various stages of development ... though I would have loved to be the marketer for this project, my offer was politely turned down.

    Threads like "The Science of Distributed Folding?", "Smallest RMSD structure: The King of the Hill", and this one help explain what this project is about. Such explanations serve two purposes: one, they explain what this project is trying to do, and thus why it is worthy of your computer's idle CPU cycles; and two, they make the currently-enrolled crunchers more informed about what's going on, and thus more connected to the project on an intellectual level, which makes it more likely they will stick with it and not drop it at the first sight of the next "cool" dc project. These threads are like FAQs, with the difference that the one who asks the question isn't also the one who answers it ... which means the one asking is always less educated in that particular area of knowledge than the one answering. Due to the cutting-edge science of this project in particular, the education gap is most assuredly massive.

    Now when someone asks for an explanation of a "basic concept" or such, one can give several different types of replies. Those being...

    One can do as I expect Scoofy12 to be doing and try to give the gist of the idea, and perhaps that's all the asker wanted. Those asking may continue asking more questions, but their questions will stop when they're comfortable with the answers given, that is, when the answers connect with what they feel they already understand.

    One can do as I expect BugTracer to be doing and give links to web articles that provide such "basic concept" explanations. Done properly, people like BugTracer would also type a reply that bridges the material at the link and this particular project.

    One can do as we've seen Howard do and simply tell one what general area (or which science textbook) one should explore if one wants that knowledge. Not the best reply, but it is better than nothing.

    One can simply not answer the question and if no one does, that question will simply die a silent death.

    One can tell the asker that they shouldn't ask that question in the first place, that to expect an answer is insulting, that it trivializes all the years one has put into one's education, that it is a waste of time for people to answer, that it wrongfully takes away manpower from the project, etc. This type of reply is ... well ... to put it mildly, rather inappropriate, and not a reply I would suggest anyone give. If you feel this way, I would instead suggest that you just let the question die a silent death. And if someone else does take the time to answer the question, it would be beyond inappropriate to then step into such discussions and say they shouldn't take place.

    Now I realize that you, Jodie, might simply have been drunk when you wrote that last post, as you mentioned that you drank "too many margaritas" before typing it. If you want to retract it, I'll retract this one. I understand that people make mistakes and don't want to go down in history for them. While I do see what I've typed above as worth keeping on record for those who might think along the same lines as you have, I do not want you to feel embarrassed by something you'd rather retract. If you do retract it, send me an email saying so and I'll then retract this one.

  23. #23
    I actually wanted to subtly edit it. There are a few things in there I would have said differently. When I discovered I couldn't edit it (30-minute time limit), I sent the admin an email requesting it be deleted so I could start the train of thought again - it derailed a few times.

    I guess I can sum this up more easily instead:

    Lazy people annoy me. Ask a special question of specialists. Read a book if you have a general question.

    The logical fallacy of your statement above is that if people are responding to questions in an unmoderated format and misanswering them, then the DF team will be compelled to clean up misnomers and confusion. That's a bad place to be.

  24. #24
    BWKAZ: ... regarding how they figure out what the beads are, that's readily apparent from the DNA strands that created the proteins in the first place. You said you did stuff with biology before...

    Yup, I worked (for free) for several years in a biochemistry lab helping a girlfriend get her Ph.D. in genetics. I can still feel the cold from having to work for hours with the French press down in the walk-in deep freezer in the basement.

    ...but for other people reading this, there are 4 different "components" to DNA...

    Yup, and I also remember the hours when I read off those sequences as she double-checked them on films and such. Lovely working in a lab where everyone wears a radiation counter on their lab coat collar ... NOT!

    ...and a sequence of three of these makes up one amino acid. So there are something like 64 different amino acids (actually there are less, but I don't remember why right now).

    *Scott fires up the Bat signal to get the attention of Scoofy12.*

    If you take, oh, 110 amino acids or so and string them all together, you get a protein (actually a fairly short one AFAIK in nature, but long for a project like this -- the current CASP target is 182, but the previous proteins we were doing were 50-some, 80-some, generally less than 100). Obviously not all proteins are the same length, though.

    What's the longest known protein? Howard, will we ever tackle it? If not, just how long a protein will we end up tackling? And, of course, if we're not going to eventually tackle the longest one, why not? Personally, I think it would be cool to tackle the longest known protein for no other reason than just doing it.

    The Human Genome Project found the DNA sequence of every strand in the human body. So knowing that, we can predict the sequence of amino acids in every protein that our bodies could ever produce.

    *whispers to Bwkaz so Jodie doesn't pull out any more hair* But how do we know this? So we know our entire DNA, but how do we know "every" protein that it will create?

    But at that point, all we know is the sequence...

    How do we know that such-and-such is a protein sequence? Are there markers on DNA that say something along the lines of: "From here to here is a protein"?

    Does that help any?

    Contrary to what you might think by the questions above, it did.

  25. #25
    Originally posted by Jodie
    Lazy people annoy me. Ask a special question of specialists. Read a book if you have a general question.

    The logical fallacy of your statement above is that if people are responding to questions in an unmoderated format and misanswering them, then the DF team will be compelled to clean up misnomers and confusion. That's a bad place to be.

    Gosh, the last thing we'd ever want to have happen is for misinformed people to learn they're misinformed and find out what the real information is.

    As for being lazy ... *sigh* ... I'm getting tired of the put-downs.

  26. #26
    Very interesting and informative thread. I would hate to see any of that overshadowed by the way the thread has progressed recently. I guess that thread with raj got a bit tense at a few points too.

    I personally don't care if people have conflict, but I am not in charge.

    Some of the science and math being discussed is so far over my head I can't even hear it zooming by. However, as I have read this thread (and the others here) the level of general scientific knowledge I possess has increased and in certain areas (biology, genetics, bio-informatics, computer science, statistics) I have gotten a lot of exposure to new concepts.

    I definitely think that, while not precisely on topic, a suggested reading list compiled by some of our more educated peers would be a great benefit to those of us willing to dig into the science behind this project a little bit further than we have so far.

  27. #27
    dismembered
    Join Date: Apr 2002
    Location: Between keyboard and chair
    Posts: 608
    Originally posted by Scott Jensen

    ...and a sequence of three of these makes up one amino acid. So there are something like 64 different amino acids (actually there are less, but I don't remember why right now).

    *Scott fires up the Bat signal to get the attention of Scoofy12.*

    *Scoofy12 fires up Google and gets cracking*

    Hm... a lot of neat things on Google, but not that specifically. I also consulted my gf, who is currently doing undergrad biomedical engineering research at the U of MN and knows more about this than I do... There are 20 AAs (amino acids) and, as someone mentioned, there are 64 possible combinations. As it turns out, all 64 combinations have a meaning: 61 of them correspond to AAs and the remaining 3 are "stop" signals, which means that most AAs have more than one sequence that maps to them. For example, AUU, AUC, and AUA all code for the same amino acid (isoleucine). Perhaps this is an early form of ECC (Error Correction Coding), eh?
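    A minimal sketch of the many-to-one mapping described above. The dictionary holds only a handful of entries copied from the standard genetic code (RNA alphabet, so U instead of T); a real table has all 64. It includes the three codons (UAA, UAG, UGA) that, in the standard code, signal "stop" rather than coding for an amino acid. The names CODON_TABLE, synonyms and translate are hypothetical, chosen for illustration, and not from any library.

```python
# Deliberately PARTIAL codon table (RNA alphabet) from the standard genetic code.
CODON_TABLE = {
    "AUU": "Ile", "AUC": "Ile", "AUA": "Ile",   # three codons, one amino acid
    "AUG": "Met",                               # also the usual "start" codon
    "UUU": "Phe", "UUC": "Phe",
    "GGU": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "UAA": "Stop", "UAG": "Stop", "UGA": "Stop",
}


def synonyms(amino_acid):
    """Return every codon in the (partial) table that encodes amino_acid."""
    return sorted(codon for codon, aa in CODON_TABLE.items() if aa == amino_acid)


def translate(mrna):
    """Read mRNA three bases at a time until a stop codon or the end.

    Raises KeyError for any codon missing from the partial table above.
    """
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "Stop":
            break
        protein.append(aa)
    return protein


print(synonyms("Ile"))                       # ['AUA', 'AUC', 'AUU']
print(translate("AUGAUUAUCAUAGGGUAA"))       # ['Met', 'Ile', 'Ile', 'Ile', 'Gly']
```

    Because AUU, AUC and AUA all give Ile, a change in the third base of those codons leaves the protein unchanged, which is the kind of redundancy behind the ECC comparison.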

  28. #28
    Scott, and others:

    While we appreciate that you are trying to assist in 'educating' the users of our forum, as a moderator I am going to have to step in here.
    This forum is for discussion of the Distributed Folding Project. As such, please limit your questions to relevant topics. Yes, you could argue that anything biological or physics related is relevant, but there are plenty of other places (USENET, other groups like this, etc.) which are designed for just such a purpose. Thus I'd request you redirect any such 'general knowledge' discussions to those places. You are welcome of course to post links here to discussions in other forums. Thank you for your co-operation.
    Howard Feldman
