Page 2 of 3
Results 41 to 80 of 98

Thread: Beta 7 (or algorithm # 3.5)

  1. #41
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Yes, this is the drawback that I have been looking at. A lot of computers are 1 GHz and below, a lot. It may discourage these people if they never see the end of a complete set. Some of these "slow" computers may actually be working on valuable low-energy proteins when changeover comes, so you have to ask whether we can afford for them to stop when they might produce a valuable result, albeit a week late. Maybe extend the period during which production value is halved after changeover, so they can finish the fold and still get points if they want to.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  2. #42
    Remember speed will depend on protein size and other factors, as well as on the final algorithm we choose. So let's not concern ourselves with slow computers just yet.

    As I've mentioned before, we have a good pile of P3-450 MHz machines locally, so I can assure you the minimum requirements will not exceed this, at least in the foreseeable future.

    It sounds like most people are OK but a few are having strange problems. For those with problems, please try to describe in as much detail as possible step-by-step what you did to encounter the error, and post the exact error message (from screen or error.log) if you haven't already.
    Howard Feldman

  3. #43
    Social Parasite
    Join Date
    Jul 2002
    Location
    Hill Country
    Posts
    94
    Running Linux. The "switchover" from b6 to b7 did NOT go smoothly.

    While running b6, I entered 'Q'. I then untarred b7 and entered 'foldit'. It told me "uploading 1/29" but then said (I think) "could not find structure from previous generation", and did NOT upload the twenty-odd generations queued up from b6. So I erased filelist.txt and all the .bz2 files, and started 'foldit' with a clean slate.

    Here are the lines from error.log:

    ========================[ Apr 20, 2003 10:16 AM ]========================
    ERROR: [000.000] {foldtrajlite2.c, line 3861} Cannot find structure from previous generation ; find it manually or delete filelist.txt to continue
    ERROR: [000.000] {foldtrajlite2.c, line 4019} Error during upload: Previous generation missing


    And here are the last bunch of lines that were in the b6 filelist.txt:

    ./fold_1_MyUserID_0_MyUserID_protein_183.log.bz2
    ./MyUserID_1_MyUserID_protein_183_0000005.val
    fold_0_MyUserId_55_protein.log.bz2
    MyUserID_0_protein_0000057.val
    CurrentStruc 0 63 125 0 0 57 52.457 -726.964 1058.155 0.274 5073051.500 0.850 1.500 250.000 -----------------------HHHHHH----HHHHHHH---------------- HHHHHHH----------------HHHH-------------
    d11cdbe4897ffefe9b4ec2a9096743b3

    mikus

  4. #44
    Social Parasite
    Join Date
    Jul 2002
    Location
    Hill Country
    Posts
    94
    A nit: while it was processing gen 0, the caption said: "1 gen buffered".

    When it finished gen 0, it did not upload anything. (The caption again said "1 gen buffered".)

    When it finished gen 1, it uploaded *one* fileset. (The caption now said "0 gen buffered".)

    mikus (using Linux)


    p.s.

    Somewhere along the line it managed to record a "could not connect" in error.log, even though I activated my relay-proxy while it was minimizing gen 0, and deactivated the relay-proxy after it finished uploading gen 1

  5. #45
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Just surfing along. Unexpected, unprovoked Reboot! Have no idea if related to folding...

    Win2K. AMD XP2100+ on MSI KT-7 with 256MB RAM. UPS. Error.log empty. Filelist.txt as follows:

    .\fold_0_MyUserID_0_MyUserID_protein_40.log.bz2
    .\MyUserID_0_MyUserID_protein_40_0000005.val
    .\fold_0_MyUserID_30_MyUserID_protein_41.log.bz2
    .\MyUserID_0_MyUserID_protein_41_0000039.val
    CurrentStruc 0 51 125 41 1 39 8.856 -2005.669 -547.685 -1326.325 92588192.000 1.500 2.800 1538.198 -HHHHHH------------------HHHH------HHHHHH---------------HHHH--------------------HHHH-HHHHH------
    bb1d4e278a080d1b0c8115329f30e9c7

    Deleted the .lock file & it seems to have recovered OK.
    Last edited by Paratima; 04-20-2003 at 05:49 PM.

  6. #46
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Arrrrrrgh!

    The RedHat Linux box just crashed with the first sig 11 it's had in months!

    Deleted filelist.txt. (sigh) Starting over. Had my best RMSD, too.

  7. #47
    Junior Member
    Join Date
    Apr 2003
    Location
    Budapest, Hungary
    Posts
    12


    -1? :shocked:
    I never used to be able to finish anything, but now I

  8. #48
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184
    Originally posted by Paratima
    Arrrrrrgh!

    The RedHat Linux box just crashed with the first sig 11 it's had in months!

    Deleted filelist.txt. (sigh) Starting over. Had my best RMSD, too.
    The client is supposed to checkpoint regularly so it can recover after crashes without losing much work. Did you try simply restarting the client? If so, what happened?

  9. #49
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Tried, but not successful:

    FATAL ERROR: [000.000] {foldtrajlite2.c, line 2952} Unable to find file UserID_0_UserID_protein_73_0000040.val; cannot continue - replace file and start again, or manually delete filelist.txt

    Got that, so restarted.

    As it happens, there was a ....0000040_min.val. Sorry, should have provided more detail earlier.
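
The failed restart above came down to filelist.txt naming a .val file that was no longer on disk. One way to spot that before restarting is to compare filelist.txt with the directory contents. This is only a minimal sketch: it assumes, based on the listings posted in this thread and nothing else, that lines ending in .val or .bz2 are paths relative to the client directory and that the remaining lines can be skipped. The function name `missing_from_filelist` is made up for this example and is not part of the client.

```python
from pathlib import Path


def missing_from_filelist(client_dir):
    """Return the files named in filelist.txt that are not on disk.

    Assumption (based only on the listings posted in this thread): lines
    ending in .val or .bz2 are paths relative to the client directory, and
    every other line (CurrentStruc, the trailing hash) is not a file name.
    """
    client_dir = Path(client_dir)
    missing = []
    for raw in (client_dir / "filelist.txt").read_text().splitlines():
        line = raw.strip()
        if not line.endswith((".val", ".bz2")):
            continue
        # Entries look like ".\name" on Windows and "./name" on Linux.
        name = line.replace("\\", "/")
        if name.startswith("./"):
            name = name[2:]
        if not (client_dir / name).exists():
            missing.append(name)
    return missing
```

An empty result would suggest the restart problem lies elsewhere; a non-empty one names the files to look for (or their `_min.val` siblings) before deleting filelist.txt.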

  10. #50
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    On my AXP 1800+ system, it tries to access the internet and complains about not being able to connect between 10pm-ish and 1am (due to a hub that cuts off access to the internet and ensures that certain folks go to bed on time).
    Most of the connection problems show up in both the slow machine's and the fast machine's error logs; they're just more frequent in the faster machine's. And before I got to see any errors during the day on the fast machine, it started having problems on Sunday morning.
    "This program has performed an illegal operation and will be shut down"..
    I restarted it, and it uploaded a few of the files that it was holding, and then popped up with the illegal op error again.

    Will see if I can identify what in the directory is corrupted tomorrow when it's quieter here..
    (Just put a clean copy of the client in a new directory, and started it from scratch..)

  11. #51
    Ol' retired IT geezer
    Join Date
    Feb 2003
    Location
    Scarborough
    Posts
    92

    Unhappy Processing Extremes

    I've been experiencing extreme differences of processing times for different generations. Initial times were 1.0 to 3.2 hours per generation, averaging about 1.5 hours and structural lax values of 20%, 25%, and 0%. But generation 22 required 8.4 hours and resulted in structural lax values of 75%, 100%, and 100% (dfGui.. thanks to Jeff). Now I'm screaming thru generations as low as 12 minutes with lax values still at 41%, 57%, and 100%. Computer is Celeron 600.

    Is this normal??? Anyone else have similar experiences?

    Ned

  12. #52
    This is normal behaviour for the beta... sometimes it gets into very "tight" spots, where there are many, many collisions. This can cause some generations to take up to several hours. In general though, not too far past these generations the client really zips through 'em.
    Team Anandtech DF!

  13. #53
    Senior Member
    Join Date
    Mar 2002
    Location
    MI, U.S.
    Posts
    697
    And Ned -- in addition to m0ti's post, when you do get a generation that's that tight, and the laxness values do get that high, it's normal for it to start screaming through the next few generations. It does this simply because the laxnesses are so high. Anything (in a relative sense) is permissible, so it's easy to find structures that work.

  14. #54
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    What is the procedure for restarting the Beta after a power failure? I thought it had checkpoints so you could restart without wiping the current work. A few OCworkbench members running the Beta have had power outages and failed to restart successfully, needing to wipe the directories. I am checking to see if they kept the error logs like I told them to.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  15. #55

    To save screens or not to save screens?

    So it seems to me that everyone present runs the text client when testing these beta programs.... Does anyone test the screensaver (besides myself)? Is it worth it to test the screensaver? Honestly, I feel a bit guilty that maybe every single spare cycle isn't being dedicated to Science... but that screensaver sure is cool. I could almost see it being worth it to soup up the screensaver maybe just a bit more, trading in a few cycles for an improvement to an already awesome screensaver that would make it all the rage amongst the little gamer kiddies with the high-powered performance machines. Fewer cycles per machine, but more machines....

    Additional question: the FAQ vaguely states that the screensaver and client can work together... can they work together on the SAME protein (for instance, the client working in the background and the screensaver taking over when it kicks in)?

  16. #56
    Senior Member
    Join Date
    Apr 2002
    Location
    Santa Barbara CA
    Posts
    355
    You should definitely test the screensaver. When Howard first came out with an improved screensaver, back before the beta client, he got rather grumpy because not that many people that hang out in this forum tested the screensaver. So if you feel any desire to run it, you should. Don't feel guilty, it is just the beta test and the screensaver has the potential to draw in a lot wider group than us hard core text client guys.

    Both clients use the same files. If you stopped the screensaver, copied the data files into a text client folder and started up the text client it would be happy. It also works the other direction. You could do this manually at the end of the day or write a batch file that would do it. The screensaver and the text client can't both run using the same data set at the same time though.
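
The batch file suggested above could also be written as a small script. The following is a rough sketch only. Which files count as "data files" is an assumption here: the patterns are taken from file names quoted elsewhere in this thread (filelist.txt, .val, .bz2, .trj), and the real set may differ. It also assumes the lock file name ends in ".lock", and refuses to copy while one is present, since the two clients cannot share a data set at the same time.

```python
import shutil
from pathlib import Path

# Assumption: these patterns cover the shared data files. They are taken
# from file names quoted in this thread, not from the client documentation.
DATA_PATTERNS = ("filelist.txt", "*.val", "*.bz2", "*.trj")


def copy_data_files(src_dir, dst_dir):
    """Copy the folding data files from one client folder to the other."""
    src, dst = Path(src_dir), Path(dst_dir)
    # A leftover *.lock file suggests a client may still be running there.
    for folder in (src, dst):
        if any(folder.glob("*.lock")):
            raise RuntimeError(f"lock file present in {folder}; stop the client first")
    copied = []
    for pattern in DATA_PATTERNS:
        for path in sorted(src.glob(pattern)):
            shutil.copy2(path, dst / path.name)
            copied.append(path.name)
    return copied
```

Both folder arguments are whatever directories the screensaver and the text client happen to be installed in; the function works in either direction.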

  17. #57
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Grumpy: See my post on page 2 of this thread concerning my non-success in restarting after a calamity. I was in a bit of a hurry, but if it happens again, I'll take more notes.

    Eriol: I think the FAQ is trying to say that the screen saver and the text client both produce the same end result, not that they would play well together on the same machine. Unfortunate wording, but there you go...two peoples divided by a common language, Churchill said.

  18. #58
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Hmmm, looks like Iron Bits is holding out with his 5.35. What did the last Beta get down to, I cannot remember.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  19. #59
    Junior Member
    Join Date
    Apr 2003
    Location
    bucharest
    Posts
    3
    Hi! It's my first post here!

    I had 2 power failures but nothing bad happened. For me it seems to be very stable.

  20. #60
    Junior Member
    Join Date
    Apr 2003
    Location
    Budapest, Hungary
    Posts
    12
    Originally posted by gymyforte
    Hi! It's my first post here!

    I had 2 power failures but nothing bad happened. For me it seems to be very stable.
    Welcome!

    This is my 1st error with this version (I think it is a common error):

    ========================[ Apr 23, 2003 1:27 PM ]========================
    ERROR: [000.000] {ncbi_http_connector.c, line 217} [HTTP] Error writing body at offset 12288
    ERROR: [000.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) exhausted, giving up
    ERROR: [010.003] {taskapi.c, line 1217} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    ERROR: [000.000] {foldtrajlite2.c, line 4019} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER

    ========================[ Apr 23, 2003 1:34 PM ]========================
    I never used to be able to finish anything, but now I
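
Since several posts in this thread paste error.log excerpts, a small parser can make them easier to compare across machines. This is a sketch under the assumption that every entry follows the layout seen in the excerpts here: a banner line with a timestamp, then lines of the form `ERROR: [code] {file, line N} message` (or `FATAL ERROR: ...`). The function name and field names are invented for the example.

```python
import re

# Timestamp banner, e.g. "=====[ Apr 23, 2003 1:27 PM ]====="
BANNER = re.compile(r"^=+\[ (?P<when>.+?) \]=+$")
# Entry, e.g. "ERROR: [000.000] {taskapi.c, line 1217} message"
ENTRY = re.compile(
    r"^(?P<level>FATAL ERROR|ERROR): \[(?P<code>\d+\.\d+)\] "
    r"\{(?P<source>[^,}]+), line (?P<line>\d+)\} (?P<message>.*)$"
)


def parse_error_log(text):
    """Return one dict per log entry, tagged with the latest banner time."""
    entries, when = [], None
    for raw in text.splitlines():
        line = raw.strip()
        banner = BANNER.match(line)
        if banner:
            when = banner.group("when")
            continue
        entry = ENTRY.match(line)
        if entry:
            record = entry.groupdict()
            record["line"] = int(record["line"])
            record["when"] = when
            entries.append(record)
    return entries
```

Lines that match neither pattern (blank lines, for instance) are skipped rather than treated as errors, because the full log format is not documented in this thread.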

  21. #61
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Text client stopped running on a Win2K box at Gen 217. The client is still in memory, just stopped.

    error.log:

    ========================[ Apr 22, 2003 10:38 PM ]========================
    ERROR: [001.001] {bbox.c, line 266} ..24..

    ERROR: [001.001] {bbox.c, line 268} .. HA ..

    ERROR: [001.001] {bbox.c, line 269} ..1..

    ERROR: [001.001] {bbox.c, line 270} .. CA ..

    FATAL ERROR: [004.001] {bbox.c, line 282} b-d node crash; node not inserted

    filelist.txt:

    .\fold_0_userid_15_userid_protein_216.log.bz2
    .\userid_0_userid_protein_216_0000024.val
    CurrentStruc 0 1 125 217 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 1.300 2.400 879.467 -HHHHHH------------------HHHH------HHHHHH---------------HHHH--------------------HHHH-HHHHH------
    83bcfbc743ab28532c3924dacbbac3d7

    Seems to have restarted OK. I'll watch it.
    Last edited by Paratima; 04-23-2003 at 10:12 PM.

  22. #62
    25/25Mbit is nearly enough :p pointwood's Avatar
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    In one of the first "beta threads", I asked about the possibility of adding a -benchmark switch to the client, but I never got an answer.

    What I am hoping for is an easy way to benchmark various CPUs and systems. The switch should just run a relatively short benchmark on a special benchmark work unit and then quit again. It should of course not submit any data to the server.

    Would that be possible?
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  23. #63
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Ok Howard, I have a good one for you. My RMS scores are getting bigger now that I have passed Gen 150. And bigger and bigger. At this rate they will be back at the Gen 0 figures. Is this meant to happen on 2 separate computers? I hate coincidence.

    :bs:
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  24. #64
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Similar problem as before, same W2K machine. This time the program disappeared entirely! dfGUI thought everything was OK. The only way I found it was by checking the task manager. Nothing new in error.log but a timestamp.

    filelist.txt:

    .\fold_0_userid_15_userid_protein_231.log.bz2
    .\userid_0_userid_protein_231_0000025.val
    .\fold_0_userid_0_userid_protein_232.log.bz2
    .\userid_0_userid_protein_232_0000007.val
    CurrentStruc 0 51 125 232 1 7 7.268 -805.002 927.147 19.114 7971384.000 1.800 3.400 3557.933 -HHHHHH------------------HHHH------HHHHHH---------------HHHH--------------------HHHH-HHHHH------
    49dc79cc61a65309cc7dd9a50afd1003

    progress.txt:

    Building structure 46 generation 232
    4 until next generation
    1 generations buffered
    Best Energy so far: 7.268

    Am I the only one having this problem? This ain't gonna fly, if I have to nursemaid every machine, every hour, just to find out if the program's still running!

  25. #65
    Junior Member
    Join Date
    Feb 2003
    Location
    Alphen aan den Rijn, NL
    Posts
    8

    Traj distr. error

    It seems the foldtrajlite program stopped running at the end of gen. 117 on my machine. Even though DFgui still thought it was running. I tried to restart it but now I get:

    [NULL_Caption] FATAL ERROR: [002.003] Unable to read trajectory distribution,
    please create a new one
    Hit Return

    Filelist.txt:

    .\fold_0_9ks53vel_0_9ks53vel_protein_116.log.bz2
    .\9ks53vel_0_9ks53vel_protein_116_0000004.val
    .\fold_0_9ks53vel_25_9ks53vel_protein_117.log.bz2
    .\9ks53vel_0_9ks53vel_protein_117_0000027.val
    CurrentStruc 0 51 125 117 1 27 7.178 -126.801 700.260 301.332 6334455.000 1.200 2.200 665.002 --------------------HHHHHHHHHHHH---HHHHHH---------------HHHHHH---------------HHHH---------------
    fdda4954fe6853fbe7f5f7c4208d34ff

    error.log:

    ========================[ Apr 24, 2003 2:44 PM ]========================
    ERROR: [001.001] {trajtools.c, line 3480} Unable to open trajectory distribution file 9ks53vel_protein_117.trj
    FATAL ERROR: [002.003] {foldtrajlite2.c, line 4375} Unable to read trajectory distribution, please create a new one

    Any ideas how to continue without deleting the filelist.txt and starting from scratch?

    Thanks in advance.

  26. #66
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Decided on a whim to run the beta on an old 400MHz plodder at work. Came in this morn to find that it had spent 20 hours (!) trying to get over the hump on a particularly sticky spot. On another whim, I stopped and restarted the client. The laxness levels immediately shot way up and it took off processing. My question is, why did I have to stop & restart the client manually for this to happen? Seems like there should be some automatic bail-out after a "reasonable" amount of time/number of attempts.

    So far, this "beta" is looking WAY too fragile to put into a production environment.

  27. #67
    Originally posted by Paratima
    Decided on a whim to run the beta on an old 400MHz plodder at work. Came in this morn to find that it had spent 20 hours (!) trying to get over the hump on a particularly sticky spot. On another whim, I stopped and restarted the client. The laxness levels immediately shot way up and it took off processing.
    A couple of my machines seem to have also gotten "perma-stuck", so I decided to stop the client and start it again. After that it carried on its merry way. Not sure what happened in either case.

    Jeff.

  28. #68
    Just so you know, very little changed between this beta and the previous couple, except in the algorithm itself. Any problems you are experiencing are not new ones. It sounds like the main problem is that some people are having trouble quitting and restarting. I'm not sure why, since this seemed to be fine up until now. I'll take a look and try to reproduce some of these things, but in the cases where only one of you or one machine is experiencing the problem, please verify that it is really the software and not the machine (after all, I know some of you do some pretty crazy things to your boxes to 'optimize' them).

    Pointwood: I saw your request for a benchmark. Right now we have more important things to focus on but we will consider adding something like this in the future.
    Howard Feldman

  29. #69
    Junior Member
    Join Date
    Feb 2003
    Location
    Alphen aan den Rijn, NL
    Posts
    8
    Hi Howard,

    I didn't change anything on my computer. I just installed Beta 7 into the folder. Haven't had any problems with Beta 6.
    It was running fine and when I came back about one hour later I had the problems I wrote about earlier.

    Sincerely,

    Leon Helmink

  30. #70
    25/25Mbit is nearly enough :p pointwood's Avatar
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    Dr. Fist - thanks for the reply. Yes, this is of course a low priority feature
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  31. #71
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184
    Originally posted by Grumpy
    Ok Howard, I have a good one for you. My RMS scores are getting bigger now that I have passed Gen 150. And bigger and bigger. At this rate they will be back at the Gen 0 figures. Is this meant to happen on 2 separate computers? I hate coincidence.

    :bs:
    It looks like samples for the next-generation structure are not being taken randomly about the current structure, but only from "one side". As long as samples from that side can lead to lower RMSD values, the structures can improve. But eventually the only path to lower RMSD is on the "other side", which isn't being sampled. Structures from the side that is being sampled lead away from the minima, and towards higher RMSD. Thus, each generation has a higher RMSD than the last, even though the program is selecting the lowest RMSD sample from the set of samples that it looks at.

    In other words, it looks like the sampling algorithm has a bug.
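
The effect described above can be reproduced with a toy model. To be clear, this is not the project's sampling code, and the diagnosis itself is only a hypothesis at this point in the thread. The sketch assumes a one-dimensional "structure" whose RMSD is simply its distance from zero, and a search that must always move to the best of its samples. With one-sided sampling the RMSD first falls, then climbs generation after generation once the minimum has been passed; with two-sided sampling it stays near the minimum.

```python
import random


def run(one_sided, generations=200, samples=10, start=-5.0, seed=1):
    """Toy 1-D search: the 'RMSD' of a position x is simply abs(x).

    Each generation draws `samples` candidates around the current position
    and moves to the one with the lowest RMSD, even if it is worse than the
    current position (there is no option to stay put).
    """
    rng = random.Random(seed)
    x = start
    history = [abs(x)]
    for _ in range(generations):
        if one_sided:
            steps = [rng.uniform(0.0, 1.0) for _ in range(samples)]
        else:
            steps = [rng.uniform(-1.0, 1.0) for _ in range(samples)]
        x = min((x + s for s in steps), key=abs)
        history.append(abs(x))
    return history


biased = run(one_sided=True)
fair = run(one_sided=False)
print(f"one-sided: best {min(biased):.2f}, final {biased[-1]:.2f}")
print(f"two-sided: best {min(fair):.2f}, final {fair[-1]:.2f}")
```

All the parameters (step size, sample count, starting point) are arbitrary choices for illustration; only the qualitative shape of the one-sided curve matters.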

  32. #72
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619

    client hangs

    When you start up the client using dfGUI, the lock file gets created, which fools dfGUI into thinking the client is running.
    What has actually happened is that foldtrajlite.exe starts up all right, but bombs out on not being able to reach the beta server, and is waiting for someone to press Enter in the DOS box, which is hidden.
    This leaves foldtrajlite.exe in memory and never deletes the .lock file, which tells dfGUI that it is still running...
    ========================[ Apr 25, 2003 1:50 PM ]========================
    ERROR: [000.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending connect to beta.distributedfolding.org:80 (Unknown)
    ERROR: [000.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to beta.distributedfolding.org:80 failed: Unknown
    ERROR: [000.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) exhausted, giving up
    ========================
    The above error does not terminate the DOS session nor delete the .lock file and gives the appearance all is well.
    If you delete the .lock file, dfGUI then says, client is not running, but, if you bring up task manager, you will see foldtrajlite.exe is still there, which means it is running, just not doing any work.
    Maybe dfGUI could monitor the .lock file, and the date/time of the progress.txt file, to further test whether it is actually working.
    It would also be nice if foldtrajlite.exe would exit all the way out and remove the .lock file.
    Or, at the very least, remove the .lock file first, when it encounters an error like this ...
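
The monitoring idea in this post can be sketched as a check that combines the lock file with the age of progress.txt. This is a hedged illustration, not how dfGUI works: it assumes the lock file name ends in ".lock" and that progress.txt is rewritten whenever the client makes progress. The idle threshold is a guess that would have to be tuned per machine, and when progress.txt is absent the check cannot tell anything beyond the lock file.

```python
import time
from pathlib import Path


def client_status(client_dir, max_idle_minutes=60, now=None):
    """Classify a client folder as not running, working, stalled or unknown.

    Assumptions: the lock file name ends in ".lock" and progress.txt is
    rewritten whenever the client makes progress. max_idle_minutes is a
    guess that has to be tuned to the machine speed and client settings.
    """
    folder = Path(client_dir)
    if not any(folder.glob("*.lock")):
        return "not running"
    progress = folder / "progress.txt"
    if not progress.exists():
        # Without progress.txt only the lock file is known, which is the
        # very signal that turned out to be unreliable.
        return "unknown"
    now = time.time() if now is None else now
    idle_minutes = (now - progress.stat().st_mtime) / 60
    return "working" if idle_minutes <= max_idle_minutes else "stalled"
```

A "stalled" result is only a hint to go and look: a slow machine stuck on a hard generation would look the same as a client waiting on a hidden prompt.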

  33. #73
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296

    Re: client hangs

    Originally posted by IronBits
    It would also be nice if foldtrajlite.exe would exit all the way out and remove the .lock file.
    Or, at the very least, remove the .lock file first, when it encounters an error like this ...
    I would be happy with either of these solutions!

  34. #74

    Re: client hangs

    Originally posted by IronBits
    When you start up the client using dfGUI, the lock file gets created, which fools dfGUI into thinking the client is running.
    What has actually happened is that foldtrajlite.exe starts up all right, but bombs out on not being able to reach the beta server, and is waiting for someone to press Enter in the DOS box, which is hidden.
    This leaves foldtrajlite.exe in memory and never deletes the .lock file, which tells dfGUI that it is still running...
    ========================[ Apr 25, 2003 1:50 PM ]========================
    ERROR: [000.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending connect to beta.distributedfolding.org:80 (Unknown)
    ERROR: [000.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to beta.distributedfolding.org:80 failed: Unknown
    ERROR: [000.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) exhausted, giving up
    ========================
    The above error does not terminate the DOS session nor delete the .lock file and gives the appearance all is well.
    If you delete the .lock file, dfGUI then says, client is not running, but, if you bring up task manager, you will see foldtrajlite.exe is still there, which means it is running, just not doing any work.
    Maybe dfGUI could monitor the .lock file, and the date/time of the progress.txt file, to further test whether it is actually working.
    It would also be nice if foldtrajlite.exe would exit all the way out and remove the .lock file.
    Or, at the very least, remove the .lock file first, when it encounters an error like this ...
    Hmm, now please clarify: is this while running in service mode or normal mode? Is it ONLY with dfGUI or is it without as well? It is something I can and should probably fix, but I wanna be clear on when exactly it is a problem. Well, I know when it's a problem: when you have no console output. So I want to know in what situations you don't have a console to output the error to.
    Howard Feldman

  35. #75
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    Hmm, now please clarify: is this while running in service mode or normal mode? Is it ONLY with dfGUI or is it without as well? It is something I can and should probably fix, but I wanna be clear on when exactly it is a problem. Well, I know when it's a problem: when you have no console output. So I want to know in what situations you don't have a console to output the error to.
    I don't run anything as a service. dfGUI allows you to hide that console output while running with the -qt switch.
    If I were running without it, I would be able to see the DOS box and the error, and hit Return to restart the client.
    Problem comes from the .lock file not being deleted before the console error message shows up, or any error, thus tricking dfGUI into thinking it's still running.
    (I asked to have dfGUI do a date/time stamp test to help in monitoring it more closely)
    On the other hand, or since I said that, or now that I have said that et al.
    I want you to know that the DFbeta client is running rock solid, when your servers are up of course.
    w2k on all but two mandrake 9.+ boxes (1 server 1 client).
    1 is an IIS Web/database Server, 3 are duals (careful not to reveal too many secrets ) a few are 'regular' user computers, where we get mail, play games etc. and the client is always running no matter what's going on, the rest are dedicated crunchers... all are running the command line client with dfGUI (except for the 2 *nix boxes of course) and we are using a myriad of mobo/video/hdd/memory combinations, all running the DFbeta client.
    Great job so far!

  36. #76
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Lol, a bug that makes the RMSD go higher. My other client reached approximately the same generation as the others, around the 140 region...up she went and still going up. 3 different clients doing this at the same point, I smell a rat.

    O, that would be me, senior Lab Rat ..

    My best RMSD Client produced 5.90 by gen 130...by Gen 140 it was above 6.00..now gen 235 and way beyond 9.00





    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  37. #77

    Re: client hangs

    Originally posted by IronBits
    Maybe dfGUI could monitor the .lock file, and the date/time of the progress.txt file, to further test whether it is actually working.
    I could probably add that feature but I don't know how useful it would be. I'm not sure what would happen if people choose -g 0 to disable the progress.txt file with the new beta (I guess it still doesn't create a progress.txt file)?

    Depending on what option the user has selected (-g x) and how fast their machine is, it could be 10, 20, 40 minutes before the progress.txt file gets updated. I'm not sure how to get dfGUI to figure out whether the progress.txt file hasn't updated because the client has stopped/crashed/never started, or because the client is just stuck somewhere on a protein and hasn't reached the magic # yet to update the progress.txt file.

    Ideally, once the foldtrajlite client got to a state where it knew it wasn't going to run any further, it would remove the .lock file. I can see it being useful to pause and wait for someone to hit Enter in case they launched it from a shortcut or batch file where the window would close and they would never see the message otherwise.

    Jeff.

  38. #78
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    Good points Digital Parasite - that's a sticky wicket alright.

    Howard, could you update the .lock file each time you update the progress.txt?
    If -g 0 is being used, then it would be disabled.

    Or another command line switch, -a 5 (a = active), that updates the .lock date/time stamp every 5 minutes.

    We can sure use something like that to monitor the client more closely.
    I would still like to see the .lock file removed before any abrupt client stoppage and before error messages are output to the DOS box.
    I would like to see the DOS box just go away, with the errormsg written to the log file.
    You could put the EXIT errorlevel in the error.log
    errorlevel=1 - BAD RAM
    errorlevel=2 - progress.txt file corrupt ~and whatever else you wanted to put in there
    errorlevel=3 - filelist.txt bad-renamed to filelist.tx1 (rotate thru 0-9)
    errorlevel=9 - IronBits is full of it today

    or/and, when an ERROR is encountered, remove .lock file and EXIT with the proper and pre-defined ERRORLEVEL.
    Then we can trap it in the foldit.bat file and actually attempt to do something about it to recover from the error.
    Puts the monkey back on us.

  39. #79
    Originally posted by IronBits
    Howard, could you update the .lock file each time you update the progress.txt?
    That would cause two problems. First, you would lose the ability to determine how long the DF client had been running (and when it started). And it wouldn't buy you anything because you are relying on when the progress.txt file is updated (and thus it could take 10, 40, 60 minutes) and you still wouldn't know how long to wait before you deem the client not running.

    Jeff.

  40. #80
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    ok, scratch the -a .lock update idea...
    Back to plan B.
    Remove the .lock file first on every error that stops the client
    Exit the DOS box, no screen output (user needs to check error.log)
    Exit with a real ERRORLEVEL so we can trap it
    if errorlevel=0 do nothing - all is well
    if errorlevel=1 do 'something'
    if errorlevel...
    if errorlevel=254 call Howard ASAP
    if errorlevel=255 do 'make lots-O-noise' you found an RMS =0
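
The trapping idea above can be sketched as a small wrapper around the client. The exit codes here are hypothetical: nothing in this thread says the client returns them, and the table below only mirrors the suggestions made in these posts. The function name `run_and_report` is likewise made up.

```python
import subprocess

# Hypothetical exit codes: the client discussed here does not define these.
# They only mirror the suggestions made in the posts above.
ACTIONS = {
    0: "all is well, nothing to do",
    1: "bad RAM reported, stop and test the memory",
    2: "progress.txt corrupt, delete it and restart",
    3: "filelist.txt bad, keep the renamed copy and restart",
}


def run_and_report(command):
    """Run the client once and translate its exit code into an action."""
    code = subprocess.run(command).returncode
    return code, ACTIONS.get(code, "unknown error, check error.log")
```

On Linux this might be called as `run_and_report(["./foldit"])`, using the foldit script mentioned earlier in the thread. It only becomes useful if the client exits with distinct codes instead of waiting for a key press.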
