
Thread: CASP 6 Strategy

  1. #1

    CASP 6 Strategy

    Hogue team CASP 6 strategy
    June 24, 2004

    The following is a summary of the approaches the Hogue team will be applying for CASP. For those interested, see the CASP 6 web site (http://predictioncenter.llnl.gov/casp6/Casp6.html). We are putting all our available resources into it this time around, for a full-force effort. Note that we have two teams: an automated server team, HOGUE-HOMTRAJ, and a ‘manual’ team, HOGUE-STEIPE.

    The HOGUE-HOMTRAJ team, consisting of Howard Feldman, Michel Dumontier, Kevin Snyder, and Christopher Hogue, is using our automated homology modelling server, located at http://homtraj.blueprint.org/, to make predictions for all targets where suitable templates can be found in the PDB. The server makes use of a program called SAM, developed by Kevin Karplus at UCSC (http://www.cse.ucsc.edu/research/compbio/sam.html), to find possible template structures in the PDB using a Hidden Markov Model approach. SAM also provides an alignment between the CASP target sequence and the template. Following this, our own method is applied to construct 3D models of the query sequence using information from the template(s), and the results are mailed back to CASP.

    Additionally, targets are forwarded to Armadillo, our domain prediction server developed by Michel Dumontier (http://armadillo.blueprint.org/) to participate in the domain prediction portion of CASP.

    The ‘manual’ prediction team consists of a group effort by a number of teams here at Blueprint, in addition to collaboration with Boris Steipe’s lab (http://biochemistry.med.utoronto.ca/steipe/) at the University of Toronto, the Distributed Folding community (http://www.distributedfolding.org/), and also involving a test of a scoring function developed by Dr. Brendan McConkey at University of Waterloo (http://www.science.uwaterloo.ca/biol.../mcconkey.html). Predictions will be made as follows:

    For those targets where HomTraj was able to find templates (structures with similar sequences, the “easy” part of protein structure prediction), we will manually tweak the alignments to see if we can improve upon the automated method. In addition, we may try some different approaches for choosing the best structure that we generate, including using a scoring function developed by Dr. McConkey.

    For the remaining targets which must be predicted with an ab initio approach, we will be incorporating Dr. Steipe’s protein motif library, in the form of protein fragments added to the Trajectory Distributions. Michael Brougham, a summer student, has been working on integrating these fragments with the Foldtraj algorithm. Briefly, these motifs consist of sequence patterns, 3-15 amino acids in length, which have been clustered based on 3D conformation. Building structures from these fragments, matched up by sequence pattern, will produce more protein-like structures. This will improve the results in generation 0 of our method, and they may also be used in future generations with different weights. Alternatively they may work sufficiently well that further generations are not required (like in Phase I). This remains to be determined after some testing is performed. We are able to build about 50 million structures per day on our own cluster while testing this new approach.
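
    To illustrate the idea of matching fragments to a target by sequence pattern, here is a minimal sketch in Python. It is not the Foldtraj integration or Dr. Steipe's actual library: the pattern syntax (`x` as a wildcard), the example patterns, and the cluster labels are all assumptions made up for this example. Only the 3-15 residue length range comes from the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical motif library: sequence pattern -> label of a 3D conformation cluster.
# 'x' matches any residue. Patterns and labels are invented for illustration only.
MOTIF_LIBRARY: Dict[str, str] = {
    "GxGxxG": "cluster_A",
    "NPxY": "cluster_B",
    "CxxC": "cluster_C",
}


def pattern_matches(pattern: str, window: str) -> bool:
    """Return True if every non-wildcard position of the pattern equals the residue."""
    return len(pattern) == len(window) and all(
        p == "x" or p == r for p, r in zip(pattern, window)
    )


def find_fragments(sequence: str, library: Dict[str, str]) -> List[Tuple[int, str, str]]:
    """Scan the sequence and return (start index, pattern, cluster label) for each hit."""
    hits = []
    for pattern, cluster in library.items():
        # The motifs described above are 3-15 residues long; skip anything else.
        if not 3 <= len(pattern) <= 15:
            continue
        width = len(pattern)
        for start in range(len(sequence) - width + 1):
            if pattern_matches(pattern, sequence[start:start + width]):
                hits.append((start, pattern, cluster))
    return sorted(hits)


if __name__ == "__main__":
    for start, pattern, cluster in find_fragments("MKCAACGLGSAGKT", MOTIF_LIBRARY):
        print(start, pattern, cluster)
```

    In a real pipeline each hit would presumably bias the conformational sampling at that position towards the matched cluster; how that weighting works is not covered by this sketch.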

    Additionally, Michael Matan of the Seqhound group here, Florence Wu (a summer student), and several of the BIND (http://www.bind.ca/) curators will be assisting us with the new function prediction category of CASP. This entails prediction of binding sites, binding partners, protein function, post-translational modifications, and so on. This may in turn assist 3D structure prediction efforts. For example, a predicted DNA-binding protein should have a DNA-binding motif somewhere on its surface.

    With all this going on, we had considered a number of possibilities for how the Distributed Folding Project (DFP) would play a role. We could:

    a) Take the small ab initio targets suitable for prediction with DFP, and run the current DFP algorithm on them (but without the native structure or RMSDs of course, which are unknown now)
    b) Revert to the Phase I algorithm, which has done better than Phase II in some cases and was much simpler
    c) Implement Steipe’s fragments into the Phase II algorithm and use that
    d) Shut down the DF project entirely to focus on manual CASP-6 predictions

    After much debate, we have decided to continue with the project, and go along with a combination of a) and b), and possibly c). That is, we will keep the present algorithm (which does NOT use RMSD anywhere in it for scoring or driving the generations – it is a true blind test), and run it on the dozen or so targets suitable for DFP (i.e. too hard to predict any other way, and relatively small). However we will also increase the generation zero size to about 50,000, to make it a bit more like phase I, increasing the initial sampling we do. The fragment integration will probably take too long for us to get working reliably and stably in time for the CASP targets, but if we are able to, they may be added later. Additionally, McConkey’s scoring function may be swapped in for crease energy (which we have been using until now) after we have done more testing with it. Elena Garderman will continue to make all changes to the software as needed.
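
    As a rough picture of what a larger generation zero means, the sketch below samples many random candidates, scores each with a fitness function that needs no native structure, and keeps the best few. It is only an illustration and not the DFP algorithm: the torsion-angle representation, `toy_fitness`, and the assumption that lower scores are better are all placeholders chosen for this example.

```python
import random
from typing import Callable, List, Tuple

# Placeholder representation: one torsion-like angle per residue.
Structure = List[float]


def toy_fitness(structure: Structure) -> float:
    """Stand-in score; NOT crease energy and NOT McConkey's scoring function."""
    return sum(angle * angle for angle in structure)


def generation_zero(
    n_structures: int,
    n_residues: int,
    keep: int,
    fitness: Callable[[Structure], float],
    seed: int = 0,
    lower_is_better: bool = True,  # assumption; the real score's direction may differ
) -> List[Tuple[float, Structure]]:
    """Sample random structures, score them blindly, and return the best `keep`."""
    rng = random.Random(seed)
    population = [
        [rng.uniform(-180.0, 180.0) for _ in range(n_residues)]
        for _ in range(n_structures)
    ]
    scored = sorted(
        ((fitness(s), s) for s in population),
        key=lambda pair: pair[0],
        reverse=not lower_is_better,
    )
    return scored[:keep]


if __name__ == "__main__":
    # About 50,000 samples, as mentioned above, on a hypothetical 60-residue target.
    best = generation_zero(50_000, 60, keep=10, fitness=toy_fitness)
    print([round(score, 1) for score, _ in best])
```

    The point of the larger initial sample is simply that more of the conformational space gets scored before any later generation narrows the search.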

    We will likely end the present protein early, and expect to start work on the first CASP target towards the end of the week of July 5. We will then proceed at a rate of one target per week, with targets ranging in size from about 50-150 residues. We realize that some users may not be used to, and may be unable to keep up with, this fast pace of changeovers, but we intend to make it as painless as possible, and it is unfortunately necessary due to the time constraints imposed by CASP. If you cannot keep up, we suggest you try a different DC project until CASP has ended in early September rather than wasting CPU cycles on structures we have already submitted to CASP.

    We are very enthusiastic and excited about this opportunity to test our newest ideas and methods, and hope to at least top our CASP 5 results, if not come out on top overall. We feel that the inclusion of the motif library, and the improvements we have made to our HomTraj server, will significantly improve our performance since CASP 5. We are also excited to see how we do with function prediction, which is included for the first time this year in CASP. With our BIND interaction database and other bioinformatics resources, we feel we have a distinct advantage in this category.
    Elena Garderman

  2. #2
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    Thanks for the update - it'll be interesting to see how DF performs on this

  3. #3
    Fixer of Broken Things FoBoT's Avatar
    Join Date
    Dec 2001
    Location
    Holden MO
    Posts
    2,137
    i have SASS (short attention span syndrome), cliff notes?
    Use the right tool for the right job!

  4. #4
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    I understood most of the words.
    HOME: A physical construct for keeping rain off your computers.

  5. #5
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    just read the 2nd to last paragraph - that's the interesting one to us DF'ers

  6. #6
    Clear information, Elena.
    I will try to keep my team members involved/informed with the CASP6 prediction. Let's hope DF will return good results and most of all...we will kick *ass of the BOINC project that participates in the prediction.
    Member of the Los Alcoholicos

  7. #7
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    If there is anything we can do to help, just let us know how and where.

  8. #8
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    I shall round up the troops at OCworkbench and await further orders. Time to smoke the opposition
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  9. #9
    Some questions/requests for the DF team

    When this CASP6 will take place.

    1) It's important that the autoupdate function, when set, works as intended and autoupdates the client. If I remember correctly, this wasn't the case during the last protein change.
    Furthermore, I think the handiest and cleverest way to avoid losing DF members and valuable CPU time is for the client to always autoupdate itself, even when not set; or to temporarily disable this setting and let the client always update, so that the latest CASP target is used.

    2) Will the credit system with the 100% and 50% reward after the protein/target update be used? If a target only lasts for one week, this reward system leaves room to waste potential power for the predictions.
    I guess this second question is related to the first one. If the client autoupdates itself, then this reward system could be eased/limited.
    Another wild thought: when the client works on a target for one week, and updating the client takes about 20 hours because autoupdate is not set, servers are busy, etc., maybe it's possible to set the DF server to accept 2 targets at the same time.

    Let me explain with an example: the targets run from Tuesday to Tuesday. Release the next target on Monday and let the server accept 2 targets at once. The official closing time for the target is Tuesday, and when the autoupdate function is set, the client will update as intended and no CPU time will be lost (and the reward system for uploading after an update could be limited/disabled).

    My intention with the above thoughts is that DF gets the most effort/results possible and nothing is wasted.
    Member of the Los Alcoholicos

  10. #10
    7G - OCW iggy's Avatar
    Join Date
    Aug 2003
    Location
    London, UK
    Posts
    156


    Just bring it on - it will be more interesting working with different protein sizes, and quick changeovers will keep us alert!

    Would like to know if the Daemon is going to be operational, though...

    Last edited by iggy; 06-25-2004 at 05:47 AM.

  11. #11
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    This will be great, because all the Teams will have a common enemy..the other Folding Projects...and Diseases. And all the while trying to gain an advantage over our regular rival Teams..Wheels Within Wheels
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  12. #12
    Thanks for the detailed update. It's information like this that keeps us all going.

    Jeff.

  13. #13
    Originally posted by MarcyDarcy
    Some questions/requests for the DF team

    When this CASP6 will take place.

    1) It's important that the autoupdate function, when set, works as intended and autoupdates the client. If I remember correctly, this wasn't the case during the last protein change.
    Furthermore, I think the handiest and cleverest way to avoid losing DF members and valuable CPU time is for the client to always autoupdate itself, even when not set; or to temporarily disable this setting and let the client always update, so that the latest CASP target is used.

    2) Will the credit system with the 100% and 50% reward after the protein/target update be used? If a target only lasts for one week, this reward system leaves room to waste potential power for the predictions.
    I guess this second question is related to the first one. If the client autoupdates itself, then this reward system could be eased/limited.
    Another wild thought: when the client works on a target for one week, and updating the client takes about 20 hours because autoupdate is not set, servers are busy, etc., maybe it's possible to set the DF server to accept 2 targets at the same time.

    My intention with the above thoughts is that DF gets the most effort/results possible and nothing is wasted.
    The rules will remain the same as they currently are in all respects. We are limited by our hardware (storage) in that we cannot collect more than one protein at a time, so unfortunately this cannot change. Auto-update should be working properly other than when we slip up (on rare occasion) here at our end. If it has never worked for you, it is most likely because you do not wait long enough after the update, or you are behind a caching (invisible) proxy which does not flush its cache frequently enough.

    For 3rd-party stats and fold monitor people, there should not be any significant changes in the way things work except RMSD will be replaced by Fitness score wherever it appears, and native.val will no longer be used during CASP (it can be there but it will be ignored).
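
    For stats and monitor authors wondering why RMSD has to go, a minimal sketch of the calculation makes the dependency clear: it cannot be evaluated without the native coordinates. This is the plain coordinate RMSD over matched atoms and assumes both structures are already superimposed; it is not necessarily how the client computes it, and the function and variable names are made up for this example.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def rmsd(predicted: List[Coord], native: List[Coord]) -> float:
    """Root-mean-square deviation between matched atoms of two superimposed structures."""
    if not predicted or len(predicted) != len(native):
        raise ValueError("structures must be non-empty and have the same number of atoms")
    total = sum(
        (px - nx) ** 2 + (py - ny) ** 2 + (pz - nz) ** 2
        for (px, py, pz), (nx, ny, nz) in zip(predicted, native)
    )
    return math.sqrt(total / len(predicted))


if __name__ == "__main__":
    native = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    shifted = [(x + 3.0, y + 4.0, z) for x, y, z in native]
    print(rmsd(shifted, native))
```

    Because `native` is unknown for a CASP target, any tool that ranks or plots structures has to fall back on a score computed from the predicted structure alone.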
    Howard Feldman

  14. #14
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    I'm polishing up the mirror.

    We will likely end the present protein early, and expect to start work on the first CASP target towards the end of the week of July 5.
    Ummm, does this mean that we'll be switching proteins at the actual ===> end <=== of each week, a practice that we have found over and over to be WAY less than satisfactory? Please say it isn't so.
    Last edited by Paratima; 06-25-2004 at 11:06 PM.
    HOME: A physical construct for keeping rain off your computers.

  15. #15
    Junior Member Digger's Avatar
    Join Date
    May 2004
    Location
    manchester, uk
    Posts
    7
    I appreciate the technical challenge/opportunity that CASP represents for your project, but it's unfortunate that it takes place over what is, for most, holiday season. Non-networked users will be unable to keep their clients updated during the rapid changeover period when they're busy trying to get from to !

    Would it be possible, rather than ending the current protein, to SUSPEND it during the CASP6 period, and resume sometime in September? That way, users unable to keep up with the client changes could elect to keep running the 145 protein and at least be assured that their cycles would eventually be useful to THIS DF project, rather than abandoning it for a few months (and possibly not come back).

  16. #16
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    RMSD score calculations reverted back to a special fitness function, as there will be no native structure to compare against during CASP
    I assume that smaller = better for this?

    How many proteins are you planning to do via DF - looking at the list, there are 11 viable (< 150) but 1 is cancelled and 2 expire this week - and any idea of a timetable? (It has been asked before I think)

  17. #17
    Just after the changeover I flushed some buffered results, but did not get credit for them (no errors, BTW). Isn't there any grace period whatsoever during CASP?
    Member of Dutch Power Cows
    (Number #1 overall winner)

  18. #18
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    Originally posted by Basman
    Just after the changeover I flushed some buffered results, but did not get credit for them (no errors, BTW). Isn't there any grace period whatsoever during CASP?
    I've gotten points for the gens that were buffering during the changeover - how many results did you have?

  19. #19
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Yeah, one of my Boxen auto updated to the new Protein, but 400K points have vanished without a trace and no errors. The Curse of Grumpy is alive and well.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  20. #20
    Originally posted by pfb
    I've gotten points for the gens that were buffering during the changeover - how many results did you have?
    Already found the results in the stats.

    Thanx 4 the help though
    Last edited by Basman; 07-06-2004 at 08:57 PM.
    Member of Dutch Power Cows
    (Number #1 overall winner)

  21. #21

    Question

    Sorry, but I haven't understood how we can tell when a changeover will happen during the CASP period.
    My teammates at PGRI asked me how we can determine that, because we haven't seen any announcement of it.

    Many thx in advance

  22. #22
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Jan,
    ...expect to start work on the first CASP target towards the end of the week of July 5. We will then proceed at a rate of one target per week, with targets ranging in size from about 50-150 residues.
    In other words, expect a change every Tuesday for quite a while. Hope this helps.
    HOME: A physical construct for keeping rain off your computers.

  23. #23
    Originally posted by Paratima
    Jan,
    In other words, expect a change every Tuesday for quite a while. Hope this helps.
    Thx a lot, Paratima
    I had understood this, but what I would like to know is whether there will be a precise announcement of each changeover, or whether we can assume Tuesday as the day of the change, as you said.

    Bye

  24. #24
    Yes, you can assume there is a new protein every Tuesday for the duration of CASP, unless we post news saying otherwise.
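
    For anyone who wants to script reminders, a small sketch like the following lists the Tuesdays in a date range. The start and end dates in the usage example are assumptions (the week of July 5 and "early September" are the only dates given in this thread), and any news post saying otherwise overrides the output.

```python
from datetime import date, timedelta
from typing import List

TUESDAY = 1  # date.weekday() counts Monday as 0


def changeover_dates(start: date, end: date) -> List[date]:
    """Return every Tuesday from `start` to `end`, inclusive."""
    days_ahead = (TUESDAY - start.weekday()) % 7
    current = start + timedelta(days=days_ahead)
    dates = []
    while current <= end:
        dates.append(current)
        current += timedelta(weeks=1)
    return dates


if __name__ == "__main__":
    # Assumed window: week of July 5, 2004 through a guessed end date in early September.
    for day in changeover_dates(date(2004, 7, 5), date(2004, 9, 7)):
        print(day.isoformat())
```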
    Elena Garderman

  25. #25
    Folding for Team JSI
    Join Date
    Apr 2004
    Location
    Midlands, UK
    Posts
    15
    Originally posted by Stardragon
    Yes, you can assume there is a new protein every Tuesday for the duration of CASP, unless we post news saying otherwise.
    Any chance of posting a link to the target details (like you did for the first one) in advance ?

    Just so we know what we're facing you understand


  26. #26
    We do not have a list of proteins we will be doing for CASP, as they continue to release new targets each week, which in turn changes our priorities of which ones to do. Also, some are 'hard' and some are 'easy'; we are only interested in the shorter, hard ones. We will try to let you know which target number we are doing each week as we release it, and give the link to the CASP page.
    Howard Feldman
