
Thread: DF runtime limitation

  1. #1

    DF runtime limitation

    It appears that DF produces a maximum of 250 file sets. With the increasing size of the proteins under examination, would it be possible to extend the number of allowed file sets, or ideally remove this limitation completely? With the current settings, DF cannot run offline on several of our machines for three weeks or so, for example when people are out on vacation.

    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  2. #2
    Senior Member wirthi's Avatar
    Join Date
    Apr 2002
    Location
    Pasching.AT.EU
    Posts
    820
    Hi Michael,

    It's true, that limitation can be a bit annoying. Still, you have to consider that running the client offline for such a long time is not the best idea, since you never know for sure when the next update is coming, so all your data could end up as trash.

    And pretending to other teams that you are no longer investing any computing time is neither good form, nor does it help the project.

    If you still want to do it (which I assume), you could write a batch file/shell script that does the following things every week or so (a rough sketch follows at the end of this post):

    * stop the client (delete foldtrajlite.lock)
    * copy filelist.txt and the data-files to a "backup"-directory
    * delete those files in the client directory
    * restart the client

    And be sure to use the "-s 10000" option to increase your buffer size to the maximum.

    This effectively gives you an unlimited buffer. When you return from "vacation", you can upload all the backed-up buffers manually or with another script.
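    For illustration, here is a minimal Python sketch of that rotation. Everything configurable in it is an assumption: the install directory, the client executable name, and the idea that the buffered data files are exactly the ones listed in filelist.txt. Adapt it to your own setup before trusting it with real results.

    [CODE]
    # Rough sketch of the rotation described above (NOT an official DF tool).
    # Assumptions: deleting foldtrajlite.lock stops the client, the finished
    # data files are the ones named in filelist.txt, and the client can be
    # restarted by launching its executable (name assumed) with "-s 10000".
    import os
    import shutil
    import subprocess
    import time

    CLIENT_DIR = r"C:\distribfold"           # assumed install location
    CLIENT_EXE = "foldtrajlite"              # assumed executable name
    BACKUP_ROOT = r"C:\distribfold-backup"

    def rotate_buffer():
        lock = os.path.join(CLIENT_DIR, "foldtrajlite.lock")
        filelist = os.path.join(CLIENT_DIR, "filelist.txt")

        # 1. Stop the client by removing its lock file, then give it time to exit.
        if os.path.exists(lock):
            os.remove(lock)
            time.sleep(60)

        # 2. Move filelist.txt and the data files it names into a dated backup dir.
        backup_dir = os.path.join(BACKUP_ROOT, time.strftime("%Y-%m-%d_%H%M"))
        os.makedirs(backup_dir, exist_ok=True)
        if os.path.exists(filelist):
            with open(filelist) as fl:
                for name in (line.strip() for line in fl):
                    src = os.path.join(CLIENT_DIR, name)
                    if name and os.path.exists(src):
                        shutil.move(src, os.path.join(backup_dir, name))
            shutil.move(filelist, os.path.join(backup_dir, "filelist.txt"))

        # 3. Restart the client with the maximum buffer size.
        subprocess.Popen([os.path.join(CLIENT_DIR, CLIENT_EXE), "-s", "10000"],
                         cwd=CLIENT_DIR)

    if __name__ == "__main__":
        rotate_buffer()
    [/CODE]

    Run it from a weekly scheduled task (or cron job) and upload the contents of the backup directories by hand once you are back.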

  3. #3
    Writing scripts, even simple ones like this, isn't an option for most users.

    That said, you can probably find someone to write it for you, if it is needed for limited use.

  4. #4
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    Was thinking about this and came back to this post...

    At 250 sets x 10000 structures/set = 2,500,000 structures ...

    2.5 million structures is a lot of structures to be sitting around in the first place. That's 1/400th of the total goal, in case anybody is wondering. If you miss the changeover, that's quite a number of structures gone forever that you'll never get credit for, nor will your team if you are on one.

    RS½
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  5. #5
    Hey guys, we just don't like limitations in general!

    Now honestly, we have an increasing number of offline folders - that's why I ask. All the other concerns don't really apply, because with the stats system and the fixed number of structures to be folded for the given proteins (which will keep increasing in size), it is quite easy to calculate whether or not the protein will be changed within a given time.

    The idea with the script is nice - but rather inconvenient. Therefore, again: can't this limitation be removed?

    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  6. #6
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    This is off-topic from the post, but noticing you were in Germany I was wondering if you ever played any of the Moorhuhn games, Michael?

    RS½
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  7. #7
    Originally posted by runestar½
    This is off-topic from the post, but noticing you were in Germany I was wondering if you ever played any of the Moorhuhn games, Michael?

    RS½
    Errrr, although it really is off-topic: Sure!

    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  8. #8
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    Was looking for someone fluent in German and English to translate some stuff into English so those of us over here can understand, heh...

    It's affectionately called Turkey Shoot over here since the birds resemble turkeys...

    Anyhoo, why are you (or your teammates) running so many structures offline? 2.5 million is quite a lot to be queuing up. Is this on one machine or multiple machines?

    I'm sure someone here can write you up some kind of script or batch file which you can then customize yourself.

    Inquiring minds want to know. =)

    TTFN.

    RS½
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  9. #9
    Senior Member wirthi's Avatar
    Join Date
    Apr 2002
    Location
    Pasching.AT.EU
    Posts
    820
    Anyhoo, why are you (or your teammates) running so many structures offline? 2.5 million is quite a lot to be queuing up. Is this on one machine or multiple machines?
    They're doing top-secret work

    What do you want to have translated? Just send a PM with the URL or the text and I'll translate it, if it's not too long.

  10. #10
    Originally posted by runestar½
    It's affectionately called Turkey Shoot over here since the birds resemble turkeys...

    Anyhoo, why are you (or your teammates) running so many structures offline? 2.5 million is quite a lot to be queuing up. Is this on one machine or multiple machines?
    1. Well, in fact - it is not a turkey (a turkey is much too big and looks different, too). It is a "Moorhuhn" - that's a small chicken-like bird (actually it is smaller than a "standard house" chicken) which is exceptionally difficult to hunt because of its incredible flying speed. Also, it rarely flies in a straight line (unlike in the game), usually stays down near the ground when flying, and is difficult to make out because of its brown color.

    2. Some of our people have quite a number of computers that are not connected to the internet. And others want to go on vacation for 3 weeks or so, but within this time the 2.5 million structure barrier is EASILY crossed (just take three of my TBird@1.1 GHz machines [and these I consider "old" systems] - they would be enough to do so: 80,000 structures/day * 3 computers * 14 days = 3,360,000 structures).

    3. No top-secrets over here at all. Just folding.


    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  11. #11
    I would like a finite limit on this, since the files DO take up disk space and who really wants 50,000 files sitting on their hard drive waiting to be uploaded? However, since the file sizes are smaller now, I can increase the limit. Would you care to suggest a reasonable new limit that would make you happy then?

    Your argument is invalid though - you multiplied by 3 computers - you can't do that; the ultimate limit would be 2.5 million PER CPU. Do you REALLY have a CPU that will crank out 2.5 million in the time you are away? Anyway, suggest a new limit and I'll consider it for the next update.
    Howard Feldman

  12. #12
    Fixer of Broken Things FoBoT's Avatar
    Join Date
    Dec 2001
    Location
    Holden MO
    Posts
    2,137
    I run a lot of non-net boxen, but I normally set -s 10000 to avoid this problem, and I am paranoid about losing structures, so I gather a couple of times a week.

    But, to answer the question, if it's going to be raised, I suggest:

    2^10

    as a nice round number

    That would raise the capacity by about 4X.

    If we take some posted benchmarks from an earlier FAST protein, the capacity of a FAST PC would be about 20 days, more on the average boxen.
    --------
    show my work


    From the benchmark below, let's assume that within the next six months, a faster CPU on a faster protein might be able to crank out 500,000 structures a day (probably much less).
    With 1024 sets x 10,000 structures/set = 10,240,000 structures, and 10,240,000 / 500,000 = about 20 days, which is pretty much 3 weeks (see the small sketch after the benchmark).


    Structures Per Day : 268410

    OS : Windows XP MHz: 1799
    CPU: AMD Athlon(TM) XP 2200+
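    To make that arithmetic explicit, a tiny Python sketch (every number in it is the hypothetical figure from the post above, not a measured value):

    [CODE]
    # Buffer-capacity arithmetic from the post above; all numbers are hypothetical.
    SETS_LIMIT = 2 ** 10            # suggested new limit: 1024 file sets
    STRUCTURES_PER_SET = 10000      # one upload "packet" (-s 10000)
    STRUCTURES_PER_DAY = 500000     # assumed fast CPU on a future fast protein

    capacity = SETS_LIMIT * STRUCTURES_PER_SET             # 10,240,000 structures
    days_offline = capacity / float(STRUCTURES_PER_DAY)    # about 20.5 days
    print("%d structures buffered = %.1f days offline" % (capacity, days_offline))
    [/CODE]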
    Use the right tool for the right job!

  13. #13
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359

    RE:Moorhuhn

    Originally posted by Michael H.W. Weber

    1. Well, in fact - it is not a turkey (a turkey is much too big and looks different, too). It is a "Moorhuhn" - that's a small chicken-like bird (actually it is smaller than a "standard house" chicken) which is exceptionally difficult to hunt because of its incredible flying speed. Also, it rarely flies in a straight line (unlike in the game), usually stays down near the ground when flying, and is difficult to make out because of its brown color.
    Yeah, but consider how many people in the U.S. know about the Moorhuhn? =) I just found out about it the other week when I finally decided to look it up. To us, the closest thing is a turkey...albeit a rather skinny turkey.... esp. with the red hanging thingie. =)

    Your mention of hunting makes me think of The Distinguished Gentleman, in which Eddie Murphy becomes a congressman. He visits the NRA and they tell him, "We use them [guns] for hunting." They promptly flush out a whole bunch of ducks and about 6 members proceed to light up the sky with fully automatic machine guns for about 30 seconds. One duck falls down... Eddie Murphy looks over at the duck... "Hmm...not a scratch, must have died of fright..."


    RuneStar½
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  14. #14
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    Originally posted by Brian the Fist
    Your argument is invalid though - you multiplied by 3 computers - you can't do that; the ultimate limit would be 2.5 million PER CPU. Do you REALLY have a CPU that will crank out 2.5 million in the time you are away? Anyway, suggest a new limit and I'll consider it for the next update.
    Howard and Michael,

    Well... you both got valid points, but... take a look at this.


    My Athlon XP 2000+ (at 2100+ right now) can crank out a full set of 10,000 in about an hour and a half with the extra memory option. Just to make the math nice and tidy, we'll say it's exactly 90 minutes running only DF, which works out to 16 sets a day.

    At 10,000 structures/set * 16 sets/day * 7 days/week * 1 week = 1,120,000 structures

    So if I were to disappear for two weeks (14 days) I would be okay at 2,240,000 as Howard says, although I'm getting awfully close to that limit.

    15 days = 2,400,000
    16 days = 2,560,000

    So clearly anything over 2 weeks in this example would be trouble. If, as Michael says, people disappear for 3 weeks on vacation, then they are going to hit the cap even if the client occasionally gets "stuck" on a structure.


    Michael, I just thought of an idea for when people are going to be away so long that they can't check the machine(s): why not run two instances of DF? This should act as an artificial slowdown. Theoretically you are still getting the same amount of work done, but at least DF doesn't stop when it reaches its cap.

    Alternatively, they could set the primary on a higher priority, and the second on a lower priority. Once the first one hits its cap, the second one will kick in.

    It's easy enough to set up two directories and then use two copies of dfGUI to set this up. You'd just need to figure out the proper priority numbers. No messy scripts or batch files to set up, run, test, or worry about whether they are working right. =) (A rough sketch of the idea follows below.)
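    Not dfGUI itself, just to illustrate the idea: a minimal Python sketch (Windows) that launches two client instances from separate directories at different priority classes. The executable name, directory layout and flags are assumptions; with dfGUI you would set the equivalent priorities from the GUI instead.

    [CODE]
    # Hypothetical sketch of the two-instance setup (Windows only).
    # Executable name and directories are assumptions; adjust to your install.
    import os
    import subprocess

    CLIENT_EXE = "foldtrajlite.exe"   # assumed client executable name
    INSTANCES = [
        (r"C:\df-primary",   subprocess.NORMAL_PRIORITY_CLASS),
        (r"C:\df-secondary", subprocess.IDLE_PRIORITY_CLASS),   # runs only on idle CPU
    ]

    for workdir, priority in INSTANCES:
        subprocess.Popen([os.path.join(workdir, CLIENT_EXE), "-s", "10000"],
                         cwd=workdir, creationflags=priority)
    [/CODE]

    With the second copy at idle priority it mostly just waits while the primary runs, and it takes over the CPU once the primary hits its cap and stops, which is the behaviour described above.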

    TTFN,

    RuneStar½
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  15. #15
    Originally posted by Brian the Fist
    I would like a finite limit on this, since the files DO take up disk space and who really wants 50,000 files sitting on their hard drive waiting to be uploaded?
    Well, we are simply crazy. Some of us would run the client offline for months if there wasn't the occasional protein change (in fact something that kept quite a number of people from participating during the CASP5 effort).

    Originally posted by Brian the Fist
    However, since the file sizes are smaller now, I can increase the limit. Would you care to suggest a reasonable new limit that would make you happy then?
    Well that would be great! How about setting a new limit to the old one multiplied by 10?

    Originally posted by Brian the Fist
    Your argument is invalid though - you multiplied by 3 computers - you can't do that; the ultimate limit would be 2.5 million PER CPU. Do you REALLY have a CPU that will crank out 2.5 million in the time you are away? Anyway, suggest a new limit and I'll consider it for the next update.
    Actually, it was only meant to demonstrate that you can easily produce a relevant number of structures. But just take the examples mentioned by others in this thread - it's not too difficult to reach the 2.5 million barrier if you intend to go on a vacation. By the way, at present I am waiting for the release of the new Epox nForce2 mobo, which will be equipped with 2x 256 MB CORSAIR 333 DDR RAM modules (CL2.0) and an XP2700+ (FSB-333). I will let you know how long it takes to complete 2.5 million structures as soon as I have assembled that little sucker (yes, after that I guess I'll have to eat dry bread only - for a few months or so...).

    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  16. #16
    Bottom of the Top Ten TheOtherZaphod's Avatar
    Join Date
    May 2002
    Location
    zone 5 west
    Posts
    100
    I have a farm that has a good network connection so I do not have this problem, but I do have a suggestion.

    Now that the size of the individual result sets has been decreased so dramatically, why not change the maximum uploadfreq from 10000 to a larger value? Use whatever factor the size of the results decreased by as the factor to increase the result set size. It is still a limit, but it seems fair enough; if our European friends want to take longer vacations than that, they should hire network babysitters.
    Don't Panic

  17. #17
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    Well, the project would die if people were running offline for months. What good is the project if people take forever to return results? Although I'm sure Howard appreciates those that run it without 24/7 connections, that's not the idea behind distributed computing.

    First of all, results need to be returned in a reasonable timeframe... months isn't going to do it. As far as I know, the majority of D.C. projects have a preset turnaround, or at least a timeframe for work to be returned. This project requires results to be returned within a reasonable time or else the project is useless. How are Howard and his team going to know how good their algorithms are if people return results months later?

    And second, the longer it takes to reach the goal, the longer it is going to take to move on to the next protein and any changes in the client. As much fun as it is, it is still a scientific project. The sooner we meet the goals, the sooner Howard and his team can study the results, see how well they are doing, and make any changes and tweaks as needed.


    Question though, why can't these machines submit results while they are gone? Are we talking about home or work?

    Best,

    RS½
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  18. #18
    Originally posted by runestar½
    Well, the project would die if people were running offline for months. What good is the project if people take forever to return results? Although I'm sure Howard appreciates those that run it without 24/7 connections, that's not the idea behind distributed computing.
    RS½
    While this may be true, there really isn't any difference for the project whether the majority of the protein structures are uploaded immediately after being crunched or shortly before a protein changeover occurs. Certainly, with the current rate of protein changeovers, it is possible for a farm which can only be accessed again in about a month to participate in the project. A higher limit would make the project more viable for people in this situation.
    A member of TSF http://teamstirfry.net/

  19. #19
    Ok, when the next update comes, the number of files to buffer (with -df) will be obscenely large, like 10000 maybe. I will not change the maximum upload 'packet' size of 10,000 structures though; I see no reason to increase this, and it IS important because we basically only get one actual protein structure file for every 10,000 (or whatever number you pick here). So if I made it bigger, we'd be collecting less and less data, and collecting data is what this project is all about (kind of.. you know what I mean).
    And it IS true that if you go on a long holiday, you may indeed come back to find that we've switched proteins, or even switched EXEs and experiments, and your data is no longer good. Generally a turnaround time of 1-2 weeks (i.e. you must connect and upload a minimum of once every 1-2 weeks) is not unreasonable for a scientific project. The nature of most useful algorithms requires communication both from client to server and from server to client in this way in order to produce useful results, as you may imagine.
    Howard Feldman

  20. #20
    Our team definitely does NOT intend to fold entirely offline - so don't worry. We have always taken care that the data is returned ASAP. What I was talking about is that there are circumstances under which the current file set limitation causes people to simply switch to another DC project - at least for the time they are away (vacation & stuff). It is clear that everybody investing the electricity will do his/her best to return the data in time. So again, it's simply about "removing a hurdle" when I ask for an increase in the maximum number of allowed file sets.

    Therefore, thanks a lot Howard - I will pass that info to our team.

    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  22. #22
    ^bump^

    Did this get implemented? 2.5 million structures isn't a lot considering the speed of the new client

    Ni!
    Oh, what sad times are these when passing ruffians can say Ni at will to old ladies..

  23. #23
    Michael,

    I take it that those computers are networked, though, right?

    I'm currently working on a generic solution for these kinds of problems in my free time.

    I'm planning on an early release in about 2 weeks' time, though it will be mainly for testing at that point.

    Planned functionality:

    Client-Side:
    - Gather files and upload them to the server (deletes the files after a successful upload)

    Server-Side:
    - output results to a single file (for easy transport to another location if the server doesn't have internet access)
    - upload results contained in file to DF

    The early release will have fairly minimal functionality: just settings for things like how often to upload files to the server and from the server to DF.

    Please note! This is intended for people folding on multiple computers with the same handle!

    Placing a client and server on the same machine is perfectly doable and will allow you an unlimited amount of storage (well, until you run out of HD space). Though if Howard has already implemented the increased buffer size of 10K, then that isn't really a problem. (A rough sketch of the client-side gather step is below.)
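    Purely as a hypothetical sketch of that client-side gather step (the paths, the outbox location, and the assumption that the files to collect are the ones listed in filelist.txt are all guesses; the actual tool may work quite differently):

    [CODE]
    # Hypothetical "gather" step: bundle the buffered result files into one
    # archive for transport, then delete the local copies. Run it only while
    # the client is stopped. Paths and the use of filelist.txt are assumptions.
    import os
    import tarfile
    import time

    CLIENT_DIR = r"C:\distribfold"
    OUTBOX = r"D:\df-outbox"          # e.g. a USB stick or network share

    def gather_results():
        filelist = os.path.join(CLIENT_DIR, "filelist.txt")
        if not os.path.exists(filelist):
            return
        with open(filelist) as fl:
            names = [line.strip() for line in fl if line.strip()]
        names.append("filelist.txt")

        archive = os.path.join(OUTBOX, "results_" + time.strftime("%Y%m%d%H%M") + ".tar.gz")
        with tarfile.open(archive, "w:gz") as tar:
            for name in names:
                path = os.path.join(CLIENT_DIR, name)
                if os.path.exists(path):
                    tar.add(path, arcname=name)

        # Only delete after the archive has been written successfully.
        for name in names:
            path = os.path.join(CLIENT_DIR, name)
            if os.path.exists(path):
                os.remove(path)

    if __name__ == "__main__":
        gather_results()
    [/CODE]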

    (Edit: Can't believe I'm helping the "competition"... J/K, whatever helps DF grow and makes it easier to use!)
    Last edited by m0ti; 12-11-2002 at 05:32 PM.
    Team Anandtech DF!

  24. #24
    Senior Member
    Join Date
    Apr 2002
    Location
    Santa Barbara CA
    Posts
    355
    Originally posted by KWSN Grim Reaper
    ^bump^

    Did this get implemented? 2.5 million structures isn't a lot concidering the speed of the new client

    Ni!
    from the whatsnew.txt:

    12/10/2002

    - Solaris 32-bit version now support SunOS 5.6 and up
    - Increased maximum buffer size when using -df option
    - Added support for sampling with less atom bump-checking

  25. #25
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    So what's the maximum buffer size now then, Howard?

    RS½

  26. #26
    Originally posted by Welnic
    from the whatsnew.txt:

    12/10/2002

    - Increased maximum buffer size when using -df option
    Ok - nice! And to what size exactly?

    @m0ti: No, they are not on a network. But still your efforts are surely useful!

    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  27. #27
    real big
    Howard Feldman

  28. #28
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    Originally posted by Brian the Fist
    real big
    Bigger than the Predator in Star Trek: Nemesis? =)

  29. #29
    Originally posted by Brian the Fist
    real big
    HOW big?

    Michael.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.
