
Thread: Future of DF project?

  1. #1

    Future of DF project?

    Hello out there,
    I have been with DF since its initial phase, and even though I have not posted much here, if anything, for quite a while, I have followed the project's progress whenever possible. Maybe one or two things have escaped my attention, but I have the strong feeling that this project is resting, science-wise, and I write this post in the hope that it might lead to an awakening, albeit a rough one. I see a lot of cosmetic polishing of the DF website (yes, it does look nice), while the CASP5 results were "only" of average quality. I know that protein structure prediction is a very tough "science business", if not the toughest. However, I think it is time to stop the continuous "10 billion structures" runs and go back to improving the algorithm. The implementation of the genetic algorithm is surely a good step, but if I understand it correctly, the scoring function is the key to success. Wouldn't it then make sense to test successively more refined versions of the scoring function within an otherwise unchanged algorithm on the same protein (and later on other protein types, to map out the particular advantages and disadvantages of a given scoring function), and to reduce the number of generated structures by at least (!) 50% for more rapid progress? It is certainly nice to have a more stable and reliable server backend, but please, focus on the science wherever possible. All these computations cost a hell of a lot of electricity, so use it wisely. I hope for the best!

    Best regards,
    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  2. #2
    Probably why we are beta testing a new client next week.

  3. #3
    Originally posted by DocWardo:
    "Probably why we are beta testing a new client next week."
    A look at the news section rather gives the impression that this beta test is mostly about testing the new server backend, not a scientifically improved client algorithm.

    Michael.

  4. #4
    wirthi, Senior Member | Join Date: Apr 2002 | Location: Pasching.AT.EU | Posts: 820
    I see this like building a new car. Sure, the engine is the most important part (everybody likes a few extra horsepower), but if you want to drive the car, the tyres are just as important... What use is a high-tech engine lying on the floor without a car around it?

    Once the server/backend is as perfect as it should be, Howard can concentrate on the algorithm again. I'm sure that's what he's thinking about 24/7 anyway...

    Wirthi

  5. #5
    Member | Join Date: Oct 2002 | Location: southeastern North Carolina | Posts: 66

    Question along the same lines...

    There seems to be a total lack of discussion about what's really happening with these efforts, and about progress: is the process really getting smarter?

    Posts are almost completely centered on stats, changeovers, and gripes...

    I can appreciate that the science dudes and dudettes have their hands full, but I gotta believe that every once in a while there's something worthwhile to say, science-wise...

    I like the way my PCs are really blowing thru generations, but I'd really like better results (as reflected in lower RMSDs, if that's still the most relevant gauge), or a better trend that says my efforts are worthwhile...

    It seems like RMSD improvements are painfully slow; on my team, some serious crunchers are not even in contention for "best of show"...

    thinking they should be...
    " All that's necessary for the forces of evil to win in the world is for enough good men to do nothing."-
    Edmund Burke

    " Crunch Away! But, play nice .."

    --RagingSteveK's mom


  6. #6
    pointwood | 25/25Mbit is nearly enough :p | Join Date: Dec 2001 | Location: Denmark | Posts: 831
    From what I understand (which isn't much), they are working hard to make it generate better structures. That's why we are crunching already-known proteins again and again: so they can see what difference the latest changes made.
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  7. #7
    Yes, I'm sure Howard and crew are working on small tweaks to the algorithm to make things better. As you said, this is a tough and somewhat competitive field. There are probably some aspects that the research group is not comfortable releasing to the public yet, without documented results first! There are reasons it takes years to do the research for a PhD: it's not a product that can be assembled faster by working harder. You're limited by the outcomes of your experiments and by the laws of matter, energy, and nature that govern what you're looking at. So you could spend a ton of time fighting something that is impossible to do, or, as in the case of this project, spend a lot of time trying to perfectly mimic what nature does automatically. Some ideas will work great, others may fail miserably, but you have to follow your protocols to ensure that you get scientifically useful results.

    Also, regarding the lack of in-depth discussion: if you look around here or at most any other DF board, many users couldn't give a rat's behind about the actual science involved. They are in it for the points, for how well it runs on their farms and borg farms, and for which project their team is currently trying to pass another team in. Many others care about what project they are running and want to give their time to this project because they believe in the overall goal. And then there are the few who know enough about the science and want to know the details of how the project is being done.

    I have a PhD in chemistry, so I understand what we are looking at. I did molecular modeling of small molecules for several years, so I can follow the conversation. But I'm happy with the updates Howard has given, and if you go back and read carefully, he has told us about some of the tweaks they have made and explained why we are back to running the proteins we did in Phase I (to evaluate the improved effectiveness of the new algorithm).

  8. #8
    Senior Member | Join Date: Jul 2002 | Location: Kodiak, Alaska | Posts: 432
    During Phase I, we'd create 1,000 to 10,000 protein structures (default 5,000), then grade them and upload them all to the server. That got changed to grading them all and sending in just the best one. The client was given the option of using extra RAM, and a few other speed improvements were delivered. But if the client got to an RMSD of 8 Å with 1 billion protein structures, we'd need roughly 10 billion to get a 7 Å RMSD and 100 billion to get a 6 Å RMSD. The approach always landed in the middle of the CASP5 results.

    The first client in Phase II also graded by RMSD, and with a similar amount of work we chopped 2-3 Å off the results. A great job, since that means we were effectively doing 100 to 1,000 times the work we did in Phase I.
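    The scaling rule of thumb in the two paragraphs above can be written out as simple arithmetic. A minimal sketch, assuming the post's figures exactly (8 Å at 1 billion structures, and roughly ten times more sampling for each further 1 Å of RMSD); the function names are hypothetical, and the rule is the poster's extrapolation, not a measured law:

    ```python
    def structures_needed(target_rmsd: float, base_rmsd: float = 8.0,
                          base_count: float = 1e9, factor: float = 10.0) -> float:
        """Structures needed to reach target_rmsd (Angstrom), assuming each 1 A gain costs `factor` times more sampling."""
        return base_count * factor ** (base_rmsd - target_rmsd)


    def equivalent_speedup(rmsd_gain: float, factor: float = 10.0) -> float:
        """How many times more sampling an RMSD gain of `rmsd_gain` Angstrom is worth under the same assumption."""
        return factor ** rmsd_gain


    for target in (8.0, 7.0, 6.0):
        print(f"{target:.0f} A: ~{structures_needed(target):.0e} structures")

    # Chopping 2-3 A off the result corresponds to 100x to 1000x the Phase I sampling.
    print(equivalent_speedup(2), equivalent_speedup(3))
    ```

    This prints about 1e+09, 1e+10 and 1e+11 structures for 8, 7 and 6 Å, followed by 100.0 and 1000.0.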

    But the whole point of the client was to eventually create the right structure when we didn't know what the right structure was, so RMSD couldn't be used as the scoring function. We're running these trials to test the new scoring functions and slightly different approaches to improving the structure picked from the initial 10,000 random structures created in gen 0. We haven't come very close to the first client's ability to get low scores by testing against the known protein structure (RMSD). But they're looking at the results and trying a few things to get us back to the great results we got with the first Phase II client.

    My personal goal isn't JUST to get the top score, but to have my top score be 2 Å on a twisted protein structure that's 200+ residues long. But from the looks of things, I'm going to have to wait until we test a few different approaches and a few different scoring techniques before we get there. (We'll get there.)
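    RMSD, the yardstick used here, is the root-mean-square distance between corresponding atoms of a predicted structure and the experimentally solved one, which is why it can only be computed when the answer is already known. A minimal sketch, assuming both coordinate lists contain matching atoms (for example one C-alpha per residue) in the same order and have already been optimally superimposed; real RMSD tools also perform that superposition step (e.g. with the Kabsch algorithm), which is omitted here:

    ```python
    import math
    from typing import Sequence, Tuple

    Coord = Tuple[float, float, float]


    def rmsd(predicted: Sequence[Coord], native: Sequence[Coord]) -> float:
        """Root-mean-square deviation between two pre-aligned coordinate sets (in Angstrom)."""
        if len(predicted) != len(native) or not predicted:
            raise ValueError("coordinate sets must be non-empty and the same length")
        total = 0.0
        for (x1, y1, z1), (x2, y2, z2) in zip(predicted, native):
            # Squared distance between one predicted atom and its native counterpart.
            total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
        return math.sqrt(total / len(predicted))
    ```

    A blind prediction has no `native` coordinates to pass in, so the client needs a scoring function that judges a structure on its own; RMSD remains useful only as a check on proteins whose structures are already solved.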
    www.thegenomecollective.com
    Borging.. it's not just an addiction. It's...

  9. #9
    Senior Member | Join Date: Mar 2002 | Location: MI, U.S. | Posts: 697
    You need a good fold-finding algorithm, yeah, but you also need a good scoring function to know which folds are good!

    The latter part of this is what we've been working on lately, IIRC.
    "If you fail to adjust your notion of fairness to the reality of the Universe, you will probably not be happy."

    -- Originally posted by Paratima

  10. #10
    Again: I can't see a new scoring function anywhere. All I see is testing of a new server backend, into which a lot of effort has obviously been put. I just wish priorities were set a little more appropriately to serve the SCIENTIFIC progress of this project rather than what I have described above. The implementation of the genetic algorithm is already "old hat". And that is all I have seen change since CASP5, or have I overlooked something here?
    As soon as a new scoring function requires beta testing, I hereby volunteer to help out instantly. Currently, I consider this project stagnant.

    Michael.
    Last edited by Michael H.W. Weber; 02-07-2004 at 06:27 AM.

  11. #11
    We have had tweaks to the scoring function since its inception with Phase II. But you have to look at the timeline of an experiment. The experiment is NOT a single fold or even a single set of 250 generations; it's the whole target they are looking at, gathering data from many computers to see how the functions work. So it takes a month to do one experiment on one protein. Now, you don't just run one experiment on one substrate and then change all of your settings around for that one substrate, or you will end up with a strategy that will ONLY work for that one substrate. So we have to test the identical function on multiple proteins.

    We could get a folding system and a scoring system that gets us a 0.2 Å RMSD, but if it only works for a single protein, then it is completely useless.

    As for why we are testing the back end: one of the major problems we have with this project is the back end. It floods out, it prevents our clients from updating automatically, and it actually makes the users of the software FEAR an update. So the project loses users, and thus speed, at every update, or risks smashing its database engine to bits when all of the nonetters' and cached work units flow in after an update.

    So if the back end runs more smoothly and updates go more smoothly, I can see updates possibly coming more often and with less angst. Thus progress can pick up some speed.
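    The point about testing the identical function on several proteins can be illustrated with a small comparison harness. A sketch only: the scoring-function names, protein labels and RMSD values are made up for illustration, and ranking by mean best RMSD is one simple choice among many, not necessarily how the DF team evaluates its experiments:

    ```python
    from statistics import mean
    from typing import Dict, List

    # results[scoring_function][protein] = best RMSD (in Angstrom) reached in one experiment
    Results = Dict[str, Dict[str, float]]


    def rank_by_mean_rmsd(results: Results) -> List[str]:
        """Order scoring functions by mean best RMSD over the proteins all of them were tested on (lower is better)."""
        shared = set.intersection(*(set(per_protein) for per_protein in results.values()))
        if not shared:
            raise ValueError("no protein was tested with every scoring function")
        return sorted(results, key=lambda name: mean(results[name][p] for p in shared))


    # Made-up numbers: score_A is best on protein1 alone but worse across all three proteins.
    example: Results = {
        "score_A": {"protein1": 3.0, "protein2": 9.0, "protein3": 8.5},
        "score_B": {"protein1": 4.5, "protein2": 5.0, "protein3": 5.5},
    }
    print(rank_by_mean_rmsd(example))  # ['score_B', 'score_A']
    ```

    Judged on protein1 alone, score_A would win; averaged over all three proteins, score_B generalizes better, which is the trap of tuning to a single substrate.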

  12. #12
    Senior Member | Join Date: Apr 2002 | Location: Near Frankfurt, Germany | Posts: 106
    Quite interesting that Howard Feldman does not post an answer. Or is he just drinking coffee after getting his title?
    Where can I find a short overview of your efforts in Phase 2 so far?
    Last edited by Pascal; 02-07-2004 at 12:35 PM.

  13. #13
    Originally posted by DocWardo:
    "...as for why we are testing the back end: one of the major problems we have with this project is the back end."
    I beg to differ: I consider this NOT a MAJOR problem but rather a minor one. The MAJOR problems are the ones I have detailed above.

    Michael.

  14. #14
    Senior Member | Join Date: Mar 2002 | Location: MI, U.S. | Posts: 697
    Would it still be minor if they were running on a P-90 backend with a single 14.4 modem connection, so that only one client could ever upload at a time (well, closer to half a client)?

    Most everybody that's in this for the stats (which, whether it's a good thing or not, is most everybody period) considers it a major problem. Most everybody that has a large farm of machines crunching considers it a major problem when their large farms can't upload anything. And the people that aren't only in it for stats should consider it a major problem because of those other people. Or didn't you read this part of DocWardo's response?

    "So if the back end runs more smoothly and updates go more smoothly, I can see updates possibly coming more often and with less angst. Thus progress can pick up some speed."
    Granted, it's not the only thing we should be working on, but for the moment, it's the biggest obstacle to changing the other stuff (proteins, algorithms, scoring functions) more frequently. And there's only so much time in the day to work on things.

  15. #15
    Senior Member | Join Date: Jul 2002 | Location: Kodiak, Alaska | Posts: 432
    Uploading isn't a big problem? Members of other teams have gone from 20 full-time folders down to 2 because of problems such as uploading. If he's not getting stats for the work he's doing, what useful work is he doing? He stopped folding on all the machines at home after noticing a few of them posting "910 missing protein" error messages when he uploaded the proteins that day. Of course, he was also less than pleased with all the time he spent uploading the information every day from the 18 machines (uploading twice a day).

    Our Phase II clients are uploading (2-4x?) as often as they did during Phase I. And not improving the performance and reliability of the back end will create more and more unhappy ex-folders like the person I've mentioned. The more folders we can handle reliably, the faster we can run through the tests and get to an approach that produces much better (lower-RMSD) results.

    As for Howard drinking coffee: I don't think I've seen Howard post on these boards after about 5 pm PST on Fridays until Monday morning, except once when some of the servers stopped working on a Saturday.

  16. #16
    Originally posted by Michael H.W. Weber:
    "I can't see a new scoring function anywhere. All I see is testing of a new server backend, into which a lot of effort has obviously been put. I just wish priorities were set a little more appropriately to serve the SCIENTIFIC progress of this project rather than what I have described above. The implementation of the genetic algorithm is already 'old hat'. And that is all I have seen change since CASP5, or have I overlooked something here?"
    You are missing a lot. Read the news page to see all of the changes.

    12/04/03 - A test version of the client has been posted, using a slightly different algorithm. This went live a couple weeks later.

    1/15/03 - Energy function in development
    A new scoring function is currently undergoing rigorous testing. If we find it to be more effective than the current function, it will be incorporated into the next update

  17. #17
    Originally posted by bwkaz:
    "Would it still be minor if they were running on a P-90 backend with a single 14.4 modem connection, so that only one client could ever upload at a time (well, closer to half a client)?"
    Would / if / could: none of that is relevant here. Facts count, and the facts show that the situation you describe above is hopelessly exaggerated.

    Originally posted by bwkaz:
    "Most everybody that's in this for the stats (which, whether it's a good thing or not, is most everybody period) considers it a major problem."
    I am not the one to judge which reasons for contributing to DF are good or bad. However, THIS THREAD is NOT about stats at all, and I would like to see it kept on topic.

    Originally posted by tpdooley:
    "Uploading isn't a big problem? Members of other teams have gone from 20 full-time folders down to 2 because of problems such as uploading. If he's not getting stats for the work he's doing, what useful work is he doing? He stopped folding on all the machines at home after noticing a few of them posting '910 missing protein' error messages when he uploaded the proteins that day. Of course, he was also less than pleased with all the time he spent uploading the information every day from the 18 machines (uploading twice a day)."
    Well, our team has of course also encountered a few MINOR problems with uploading in the past, while I myself have never encountered one. This "910 missing protein" error: is it about uploading at all? I don't know the details of this one (I have never encountered it).

    Originally posted by Galuvian:
    "You are missing a lot. Read the news page to see all of the changes."
    I read it all long ago.

    Originally posted by Galuvian:
    "1/15/03 - Energy function in development
    A new scoring function is currently undergoing rigorous testing. If we find it to be more effective than the current function, it will be incorporated into the next update"
    This is the one I am waiting for, so where is it?

    Michael.

  18. #18
    Maybe one more remark. Most of the replies in this thread reflect exactly what I was complaining about: instead of focusing on supporting the scientific progress of this project, you spend your time on largely irrelevant problems. Just think about it for a minute.

    Michael.

  19. #19
    Senior Member | Join Date: Jul 2002 | Location: Kodiak, Alaska | Posts: 432
    Originally posted by Michael H.W. Weber:
    "Well, our team has of course also encountered a few MINOR problems with uploading in the past, while I myself have never encountered one. This '910 missing protein' error: is it about uploading at all? I don't know the details of this one (I have never encountered it).

    I read it all long ago.

    Michael."
    For those of us who have lost hundreds or thousands of generations (totally wasted effort, since everything uploaded between the first 910 error and the start of the next set of 250 generations is thrown away), this is a slightly important issue. Better handshaking helped with the issue but has not completely cured it. A brief explanation of the problem:
    1. Your system has at least one buffered generation to send to the DF servers.
    2. Your system calls home to the DF servers and asks if anyone is home.
    3. The DF servers tell your system that someone's home.
    4. Your system sends a package to the DF servers with a return receipt requested.
    5. The DF servers receive the package, check it out, and make sure that it was received and in good condition. If not, an error is reported to your system.
    6. The DF servers also check that they have received a valid copy of the generation prior to the one you're submitting now (gen 0 isn't checked for this). If, for some reason, a valid generation (200, for example) isn't stored in the database, gen 201 won't be stored either, and the DF servers report this error to your system.
    7. If everything is fine, the DF servers tell your system that the generation was fine and saved.
    8. For 6 and 7, the current generation is deleted, and the client moves on to the next generation, or creates another if none are queued to be sent.
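    Steps 6 to 8 above can be sketched as a toy model. Assumptions: in-memory storage, hypothetical class names and reply strings, and a simulated dropped connection; this is not the real DF client or server code. It shows the server-side ordering check from step 6, plus one way to avoid the failure described next in this post: the client never treats a generation as done unless it actually receives a reply, and because re-sending an already stored generation is harmless, a retry after a dropped connection costs nothing.

    ```python
    from collections import deque
    from typing import Deque, Dict, Optional, Set

    SAVED = "saved"
    MISSING_PREVIOUS = "error: missing previous generation"


    class ToyServer:
        """Hypothetical stand-in for the DF backend (in-memory, one set of generations per client)."""

        def __init__(self) -> None:
            self.stored: Dict[str, Set[int]] = {}

        def submit(self, client_id: str, gen: int) -> str:
            saved = self.stored.setdefault(client_id, set())
            # Step 6: every generation after gen 0 needs its predecessor already stored.
            if gen > 0 and gen - 1 not in saved:
                return MISSING_PREVIOUS
            # Step 7: accept and save. Re-sending an already stored generation is harmless.
            saved.add(gen)
            return SAVED


    class ToyClient:
        def __init__(self, client_id: str) -> None:
            self.client_id = client_id
            self.queue: Deque[int] = deque()  # step 1: buffered generations, oldest first

        def upload(self, server: ToyServer, reply_lost: bool = False) -> Optional[str]:
            """Send the oldest buffered generation; drop it only once a reply actually arrives."""
            if not self.queue:
                return None
            gen = self.queue[0]
            reply = server.submit(self.client_id, gen)
            if reply_lost:
                # Simulated dropped connection: no reply means no proof of success,
                # so the generation stays queued and is re-sent next time.
                return None
            # Step 8: after a definite answer (saved, or rejected in step 6), move on.
            self.queue.popleft()
            return reply


    server = ToyServer()
    client = ToyClient("my-box")
    client.queue.extend([0, 1, 2])
    client.upload(server, reply_lost=True)  # gen 0 is stored, but the client never hears back
    print(client.upload(server))            # re-sends gen 0 and prints: saved
    print(list(client.queue))               # [1, 2]
    ```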

    ---------
    For some reason (connection problems were supposed to be ruled out by the improved handshaking), your client sometimes ends up believing the DF servers have received and accepted the generation, so it deletes the current generation. Is it transmission problems (which have been worked on with better handshaking), the handshaking not being immune to the noise involved when an ISP drops your connection, or the back end not being reliable or speedy enough? Either way, fixing it is another of the important and necessary tasks to keep the project working properly.

    And it doesn't do the project much good to have a faster client that puts out better results if the back end can't handle the load properly..

  20. #20
    Senior Member | Join Date: Mar 2002 | Location: MI, U.S. | Posts: 697
    Originally posted by Michael H.W. Weber:
    "Would / if / could: none of that is relevant here. Facts count, and the facts show that the situation you describe above is hopelessly exaggerated."
    This was an attempt to get you to see that the backend does matter. If it were a P90 on a 14.4 connection, it would be crap. Obviously you missed that.

    I am not saying it's a P90 on a 14.4 connection. But the fact that it isn't doesn't change the fact that if it were, you would (I bet) be clamoring for an upgrade just like the rest of us. Or you'd be off on another project.

    Originally posted by Michael H.W. Weber:
    "I am not the one to judge which reasons for contributing to DF are good or bad. However, THIS THREAD is NOT about stats at all, and I would like to see it kept on topic."
    Obviously you missed my point here too. My point was simple: you don't think it's a major problem, but a lot of people do. Their reasons for thinking it's a major problem (they don't get stats for their results when the upload process breaks) are irrelevant.

  21. #21
    Originally posted by Michael H.W. Weber:
    "Maybe one more remark. Most of the replies in this thread reflect exactly what I was complaining about: instead of focusing on supporting the scientific progress of this project, you spend your time on largely irrelevant problems. Just think about it for a minute."
    You are trying to compare a pure research institution running on dedicated machines with distributed computing. You can't have the science without the stats. If the stats die and systems do not upload as the client users wish, then they will take their computing time to a project that does, and then you are left with no computers to run and test your new functions on.

  22. #22
    I don't deny that you need a functional backend. I don't deny that stats help keep people tuned in to the project. What I do deny, however, is that the project priorities are chosen appropriately.

    I say: science > backend > stats
    You say: backend = stats > science

    And the latter is exactly what I criticize here. Maybe I have explained it a bit better now.

    Michael.
    Last edited by Michael H.W. Weber; 02-08-2004 at 05:27 PM.

  23. #23
    That is exactly what I'm saying: backend = stats > science for this and all distributed computing projects.

    I liken it to my research in grad school. I have lots of reactions to run to contribute to science by creating new compounds and new chemical reactions. I could have all the great ideas in the world, but if I don't have flasks to run my reactions in, a stir plate to stir them, and a reliable lab book and pen to record my data with, then it doesn't matter how good my scientific ideas are, because they are useless.

    Without reliable basics you can't do science. Without flasks, I have nothing to run reactions in (i.e. no computers to distribute the computations to); without a stir plate, I have nothing to stir the reaction with (no stats to excite users into installing the client on computers); and even with both, my lab book and pen may not be reliable enough to record my data (i.e. an unstable back-end server may lose data packets that could be valuable to the science being conducted).

  24. #24
    Electric fence operator | Join Date: Dec 2003 | Location: Indianapolis, IN | Posts: 379
    I'll stay out of the general discussion on this, but it may be even more appropriate to say that, in a DC environment, backend = stats -------> science.

    Just my $.02
    "If angels have voices, then surely they must sound like Loreena McKennitt" - me 1/2/04, somewhere over Illinois

    Member of Free-DC

  25. #25
    If it is worth anything, I summed up and categorized the items on the news page.

    There were a total of 65 news items, among which:

    Science: 14
    Protein Switch: 16
    Client: 35

    Science items = protein switch items that mention more than a couple of words about the protein, any algorithm talk, or anything referring to a scientific article.

    Protein Switch = items that only talk about protein switch, or delays to the protein switch, but no real science talk.

    Client: items that deal with found bugs, new client etc. Basically any news items that don't mention algorithm or science stuff.

    If the news page is in any way correlated with what is going on with the DF project these days, then things are like this:

    Client/Backend > Protein Switch >= Science
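    The tally above can also be expressed as shares of all 65 news items. This is only arithmetic on the posted counts; the category assignment itself is the poster's own judgment:

    ```python
    # Counts taken from the tally above; the categorization itself is the poster's judgment.
    counts = {"Science": 14, "Protein Switch": 16, "Client": 35}
    total = sum(counts.values())  # 65 news items in all


    def share(category: str) -> float:
        """Percentage of all news items that fall into one category."""
        return 100 * counts[category] / total


    for category in counts:
        print(f"{category}: {counts[category]}/{total} = {share(category):.1f}%")
    ```

    This prints 21.5% for Science, 24.6% for Protein Switch and 53.8% for Client, so roughly half of all news items are about the client and backend.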

  26. #26
    I don't think you can separate the Client and the Protein Switch, though, since they are so closely related.

    Michael, we all get your message that science should be more important, but if it were, how many of the people running DF would still be doing so? Every time the client falls over, a protein switch breaks or the back end is unreachable, people quit DF. Given that the amount of "science" performed is directly related to the happiness of the users, making sure they're happy is vitally important.

    I'm sure you understand this so I don't really understand the point of your continued objections.

  27. #27
    That's actually a very good ratio you showed. The same algorithm has to be tried with several proteins to be able to get (somewhat) statistically valid data. Updates to the client are also necessary to keep production up and people happy, resulting in more science being able to get done.

    I sense that you feel that they're not making enough progress on the science front. The research they're doing is very practical; even if something looks like it should work out theoretically (or perhaps more accurately, heuristically), we still need to do large scale runs to verify it.

    I for one, am quite happy with the progress made by the DF team, and I think that they deserve our thanks for this project.
    Team Anandtech DF!

  28. #28
    Originally posted by m0ti:
    "...I think that they deserve our thanks for this project."
    Well, I would say it the other way around: we deserve their thanks for helping them out, because without our contribution this project would never be possible. And because of that, I think we can also ask for something. I asked for a little more emphasis on the science side, and that is all.

    Michael.

  29. #29
    There have been many excellent points raised in this discussion, and I agree with pretty much everything that has been said. To clarify, Michael, you seem to think we have been running the exact same program throughout all of Phase II. We are actually on the SIXTH release of the client since Phase II began. Each of these (or at least most of them) contained not only bugfixes, but changes to the algorithm and/or scoring function. Usually this is invisible to you, the user, as it should be. You should not need to know or care precisely how the algorithm is choosing structures at each step. In some cases you will notice, like when we changed from 50 to 100 structures per generation. These changes may seem subtle, but can potentially have a large impact on the results. As pointed out, each 'experiment' takes about 1 month to perform, so it takes time to test out new ideas. If we had more than one person working on this project, we might have time to test a lot more things a lot faster, but as we don't, we have had no choice but to focus on tidying up the back end as a top priority.

    So the answer is yes, stable DF system > science in this case. Without a stable backend and client, there will be no science (so to speak). Hopefully we are now reaching the point where 910 errors and missing filelists will never haunt us again, and we can focus our efforts and research on some brand new scoring methods, as well as flesh out the genetic algorithm I've had in my head for about a year now, and wade through the gigs of data you folks have been generating for us to glean the useful bits of knowledge from it and learn how to improve the results.
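    For readers wondering what "structures per generation" means in practice, here is a deliberately toy sketch of a generational best-of-N search. Everything in it is an assumption for illustration: a "structure" is just a list of numbers, `toy_score` and `perturb` are hypothetical stand-ins, and this is a simple hill climb, not the genetic algorithm mentioned in this thread (there is no population or crossover). The defaults mirror numbers mentioned in the thread (10,000 random structures in gen 0, 100 structures per generation, 250 generations); the usage line scales them down.

    ```python
    import random
    from typing import Callable, List

    Structure = List[float]


    def toy_score(s: Structure) -> float:
        """Hypothetical stand-in for a scoring function: lower is better."""
        return sum(x * x for x in s)


    def perturb(s: Structure, rng: random.Random, step: float = 0.1) -> Structure:
        """Hypothetical move: jitter every coordinate slightly."""
        return [x + rng.gauss(0.0, step) for x in s]


    def generational_search(
        score: Callable[[Structure], float],
        size: int = 10,
        gen0: int = 10_000,
        per_gen: int = 100,
        generations: int = 250,
        seed: int = 0,
    ) -> Structure:
        rng = random.Random(seed)
        # Generation 0: many random structures, keep the best-scoring one.
        best = min(([rng.uniform(-1, 1) for _ in range(size)] for _ in range(gen0)), key=score)
        for _ in range(generations):
            # Each later generation samples per_gen variants around the current best;
            # keeping the old best in the pool means the score can never get worse.
            candidates = [perturb(best, rng) for _ in range(per_gen)] + [best]
            best = min(candidates, key=score)
        return best


    result = generational_search(toy_score, gen0=1_000, per_gen=50, generations=50)
    print(round(toy_score(result), 3))
    ```

    Raising `per_gen` (say, from 50 to 100) gives each generation more chances to find an improvement, at the cost of more work per generation.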

    We look forward to the day when 100% of our time can be spent on doing the science - testing out new ideas and analyzing data - but for now this cannot be a reality.
    Howard Feldman

  30. #30
    Originally posted by Brian the Fist
    To clarify Michael, you seem to think we have been doing the exact same program thoughout all of phase II.
    No, I don't.

    Originally posted by Brian the Fist
    ...contained not only bugfixes, but changes to the algorithm and/or scoring function. Usually this is invisble to you, the user - as it should be. You should not need to know or care precisely how the algorithm is choosing structures at each step.
    This exactly is an incorrect asumption of yours: I DO CARE and I SHOULD CARE since it is us - the "front hog folders" who actually fund this project to a major extend. I would be happy if a detailed "roadmap" of algorithm changes covering the entire current project phase would be available soon somewhere on the DF website. There are enough people out here who are capable of understanding the details if summarized appropriately. What these people really don't care about is all this information on the server, the website, the stats, etc. I can just repeat what I have stated above: MORE of the scientific aspects, please.

    Originally posted by Brian the Fist
    ...and we can focus our efforts and research on some brand new scoring methods as well as flesh out the genetic algorithm I've had in my head for about a year now, as well as wade though the gig's of data you folks have been generating for us to glean the useful bits of knowledge from it and learn how to improve the results.
    This is exactly what I am waiting for.

    Originally posted by Brian the Fist
    We look forward to the day when 100% of our time can be spent on doing the science - testing out new ideas and analyzing data - but for now this cannot be a reality.
    Well, I hope that now that the new backend is in place, this reality will be established de facto.

    Michael.

  31. #31
    Member
    Join Date
    Oct 2002
    Location
    southeastern North Carolina
    Posts
    66
    Originally posted by Brian the Fist
    ... We are actually on the SIXTH release of the client since Phase II began. Each of these (or at least most of them) contained not only bugfixes, but changes to the algorithm and/or scoring function. Usually this is invisible to you, the user - as it should be. You should not need to know or care precisely how the algorithm is choosing structures at each step. In some cases you will notice, like when we changed from 50 to 100 strucs per generation. These changes may seem subtle, but can potentially have a large impact on the results. ..
    A few words now and again about this sort of thing would be appreciated by some of us, especially when no personal progress (per RMSD) has been noted for several million folds...
    And I do understand that you're not sitting around drinking coffee and twiddling aimlessly when there's no crisis clamoring for attention...

    Originally posted by Brian the Fist
    So the answer is yes, stable DF system > science in this case. Without a stable backend and client, there will be no science (so to speak). Hopefully we are now reaching a point where 910 errors and missing filelists will never haunt us again, and we can focus our efforts and research on some brand new scoring methods, flesh out the genetic algorithm I've had in my head for about a year now, and wade through the gigs of data you folks have been generating for us, to glean the useful bits of knowledge from it and learn how to improve the results.

    We look forward to the day when 100% of our time can be spent on doing the science - testing out new ideas and analyzing data - but for now this cannot be a reality.
    hope you'll be able to take the time to discuss ...
    " All that's necessary for the forces of evil to win in the world is for enough good men to do nothing."-
    Edmund Burke

    " Crunch Away! But, play nice .."

    --RagingSteveK's mom


  32. #32
    Being new to this project I was wondering:

    Do we produce any true hard scientific data, or is this a long-running beta test?

    No offence, I like the project; at least we get input from the DEV team.

    I have looked at a lot of other projects and this is the most user-friendly one I have found.

    I was with SETI for 5 years... time to come down to earth.

    Also, to Free-DC: great team.

    P.S. I did read the above posts... just a little slow... DOH
    P.P.S. Sorry, I spoke too soon when I saw the thread about the new beta client; I now see that it is a separate client. Sorry, as usual: RTFM.
    Last edited by rel279; 03-10-2004 at 02:10 PM.
    "Slowly Crunching Along"

  33. #33
    Junior Member
    Join Date
    Mar 2004
    Location
    Norway
    Posts
    12
    I have been participating in DF for well over a year now and have been more or less happy with the project. The software is great, and so is the aim of the project. I haven't even found any reason to register on the project's forum, although I have been reading it regularly. Over the last month or so I have had a growing concern about the future of DF, and when I saw this thread I thought to myself: now is the time to register.

    I really do miss information about the refinement of algorithms and project achievements, and I always have. I find this one of the weak points of DF.
    DF and F@H do seem like similar projects to the general public, but there is one big difference: F@H publishes the results of the project in Nature and journalists write about them, while DF still refers to CASP5 as its only result. To me, this is a big difference!

    Brian the Fist, I think you are wrong when you say: "... Usually this is invisible to you, the user - as it should be. You should not need to know or care precisely how the algorithm is choosing structures at each step".
    Participants in a project should not need to know, but they should be able to know. If you personally don't have the time to inform people, then someone else should; the project is lacking in information, and as a result DF gives the impression of going nowhere rather than going somewhere.
    Last edited by brage; 03-14-2004 at 02:08 PM.

  34. #34
    I agree with Michael's posts above. I was really excited to see DF at CASP, but really disappointed with their results. It was at best "average".
    Last edited by Raj; 03-14-2004 at 09:22 PM.

  35. #35
    Again, one major difference is that F@H has basically an entire research lab devoted to the project - maybe half a dozen full-time researchers devoted to it (not sure exactly how many Vijay's got working on it..). On the other hand, DFP, being a part of the Blueprint Initiative, is just a 'small' side-project, so-to-speak. If you have looked around at www.blueprint.org you will see our major product is actually the BIND protein interaction database.

    I would love to have half a dozen people working on this project to develop the science, but unfortunately it is not in the cards right now. With just one person (Stardragon) working actively on the project, as good as she may be, we have been mostly bogged down lately in tightening up the backend and such, as we've mentioned, and had to forsake scientific advancement in doing so.

    We are, however, actively seeking a PostDoc to join us and help out on the science side. We have plenty of ideas on how to improve the algorithm and hopefully lead to publishable results, but simply have not had the time to implement them yet. We intend to publish some of the present 'folding movie' work as well, but it will be a while until we have analyzed the mounds of data and distilled it into a paper. Rest assured that the details of the algorithm and results WILL be published, in one form or another; it's just a matter of time...
    Howard Feldman

  36. #36
    Junior Member
    Join Date
    Mar 2004
    Location
    Norway
    Posts
    12
    I am willing to give DF both my time and my CPU time as long as there is documented progress and not just statistics and an endless number of protein changes. If DF could use more CPU time, I think this is the way to go. Users who care about the refinement of algorithms (like myself) are in my opinion of great value to projects like this. Feed us some news and document some progress, and we will stay dedicated and loyal for years to come. We will even recruit new members and buy new machines just for you. Statistics are boring (over time); science is always fun! For the time being DF doesn't seem like much more than climbing a list, which is sad. Don't get me wrong, I do understand that DF has limited resources, but I am not asking for that much, just a note like "made adjustments to the folding algorithm" in the news section. I do not want these changes to be made invisible to me, as they are important to me and probably to other users too. Keep up the good work, and I look forward to new results from the DF project (which is one of my favorite DC projects).

  37. #37
    Member
    Join Date
    Apr 2003
    Location
    Germany
    Posts
    59
    Originally posted by brage
    I am willing to give DF both my time and my CPU time as long as there is documented progress and not just statistics and an endless number of protein changes. If DF could use more CPU time, I think this is the way to go. Users who care about the refinement of algorithms (like myself) are in my opinion of great value to projects like this. Feed us some news and document some progress, and we will stay dedicated and loyal for years to come. We will even recruit new members and buy new machines just for you. Statistics are boring (over time); science is always fun! For the time being DF doesn't seem like much more than climbing a list, which is sad. Don't get me wrong, I do understand that DF has limited resources, but I am not asking for that much, just a note like "made adjustments to the folding algorithm" in the news section. I do not want these changes to be made invisible to me, as they are important to me and probably to other users too. Keep up the good work, and I look forward to new results from the DF project (which is one of my favorite DC projects).
    Thumbs up!

  38. #38
    Member
    Join Date
    Oct 2002
    Location
    southeastern North Carolina
    Posts
    66
    Originally posted by brage
    I am willing to give DF both my time and my CPU time as long as there is documented progress and not just statistics and an endless number of protein changes. If DF could use more CPU time, I think this is the way to go. Users who care about the refinement of algorithms (like myself) are in my opinion of great value to projects like this. Feed us some news and document some progress, and we will stay dedicated and loyal for years to come. We will even recruit new members and buy new machines just for you. Statistics are boring (over time); science is always fun! For the time being DF doesn't seem like much more than climbing a list, which is sad. Don't get me wrong, I do understand that DF has limited resources, but I am not asking for that much, just a note like "made adjustments to the folding algorithm" in the news section. I do not want these changes to be made invisible to me, as they are important to me and probably to other users too. Keep up the good work, and I look forward to new results from the DF project (which is one of my favorite DC projects).
    Also in total agreement...
    Won't be building more boxes for a while, though... sorry.


  39. #39
    Senior Member
    Join Date
    Apr 2002
    Location
    Oosterhout, Netherlands
    Posts
    223
    Originally posted by brage
    I am willing to give DF both my time and my CPU time as long as there is documented progress and not just statistics and an endless number of protein changes. If DF could use more CPU time, I think this is the way to go. Users who care about the refinement of algorithms (like myself) are in my opinion of great value to projects like this.
    I hate to be the one to break it to you, but you're in the minority. All help is appreciated, but only a really small number of users are interested in that. And I think the organisation wants to please everybody, but the largest group of users first.
    Proud member of the Dutch Power Cows

  40. #40
    Junior Member
    Join Date
    Mar 2004
    Location
    Norway
    Posts
    12
    I disagree with you, [DPC]Mobster. On the contrary I think most people do care. After all I am not asking for much more than a note about the work being done. It's a small effort with tremendous result if done the right way. I'll try to explain myself:

    I think that people who hear about the opportunity to contribute to science, whether folding proteins, searching for radio signals from outer space, or whatever, do care about the progress the projects are making. After all, we hear about the project, decide to join, download the program, install it and run it for hundreds and hopefully thousands of hours. And it's a fact that most people run the screensaver version (at least that goes for SETI and probably also DF). I think the reason for that is so that they can monitor the progress and brag about the screensaver to friends and so on. But after a while you get used to the screensaver, you have told all of your friends, and most people start wondering: "What happens with the work I contribute? Is there a reason for me to do this, other than a cool screensaver?" This is where a good project never leaves the volunteer in doubt. Progress must be documented so that volunteers feel they are doing something useful as well. I do believe that lots of people "install and forget", but I also believe that they are the first to "uninstall and forget".

    Statistics are one way of making it clear that there is progress, but statistics aren't enough to make people stay with a project for many years. Personal contact with the people who run the project would be very bonding, but requires too much work. A project that wants to grow large (and by large I mean 100,000+ CPUs) has to respond to its volunteers.

    There are some key elements to attract users to a dc-project: statistics, good client with a really cool screensaver, forum and response from the developers on progress and other useful "inside information".

    DistributedFolding now has approximately 30,000 registered users and 1,454 active users. This is nothing! Something has to be done to make people stay. DF has statistics, a good client with a really cool screensaver and a good forum, but is lacking big time in response from the developers. Participating in DF feels like contributing to statistics, and no one is telling you anything else. I am about to quit the project; it feels like throwing CPU cycles into a black hole. It probably is more useful than that, but the results page is not convincing me otherwise; on the contrary, it strengthens my feeling with some old, mediocre CASP results. I want a reason to believe in DF. I am not giving away CPU time to just anyone. There are lots of other useful projects that document their progress, but then again they don't have that cool screensaver... but in the long run, screensavers don't matter, only the feeling of doing something useful.

