Client stuck for over a day now



BuddhaMan
07-26-2003, 02:15 PM
On my only dual-proc machine, one of the clients has been totally stuck on a generation for over a day now. Here's some of the output from dfGUI, which shows the client restarting a lot with no progress being made.


7/25/2003 11:58:50 AM Protein Size: 96
Client Status: Appears to be running.
Current Generation: 44 Generations Buffered: 0
Structs Complete: 1 Structs Remaining: 49
Best Energy: 10000000.000 Client Run Time: 0:00:05:00
Time To Complete: Bench Run Time: 0:00:05:10
Prev Generation Time: 0:00:13:30 Avg Generation Time: 0:00:13:11
Structs/Day: -
# Restarts: 94


7/26/2003 10:38:51 AM Protein Size: 96
Client Status: Appears to be running.
Current Generation: 44 Generations Buffered: 0
Structs Complete: 1 Structs Remaining: 49
Best Energy: 10000000.000 Client Run Time: 0:00:02:45
Time To Complete: Bench Run Time: 0:00:02:30
Prev Generation Time: 0:00:13:30 Avg Generation Time: 0:00:13:11
Structs/Day: -
# Restarts: 344
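
For anyone wanting to put a number on "restarting a lot", here is a rough Python sketch that compares the two snapshots above. The timestamps, generation numbers and restart counts are copied by hand from the dfGUI output; the function name and the tuple layout are made up for illustration and are not part of dfGUI or the client.

```python
from datetime import datetime

# Values copied by hand from the two dfGUI snapshots:
# (timestamp, current generation, number of restarts)
snapshots = [
    ("7/25/2003 11:58:50 AM", 44, 94),
    ("7/26/2003 10:38:51 AM", 44, 344),
]

# Timestamp layout as it appears in the snapshots (assumed US-style, 12-hour clock).
FMT = "%m/%d/%Y %I:%M:%S %p"


def restart_rate(first, second):
    """Return (hours elapsed, generations advanced, restarts per hour)."""
    t0 = datetime.strptime(first[0], FMT)
    t1 = datetime.strptime(second[0], FMT)
    hours = (t1 - t0).total_seconds() / 3600
    gens_advanced = second[1] - first[1]
    restarts_per_hour = (second[2] - first[2]) / hours
    return hours, gens_advanced, restarts_per_hour


hours, gens, rate = restart_rate(*snapshots)
print(f"{hours:.1f} h elapsed, {gens} generations advanced, {rate:.1f} restarts/hour")
```

With the numbers above, that works out to 250 restarts in a bit under 23 hours with the generation counter still at 44.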

Progress.txt shows -1 generations buffered (???)

Building structure 1 generation 44
49 until next generation
-1 generations buffered
Best Energy so far: 10000000.000
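
To check this on several machines without opening each file by hand, something like the following Python sketch could read the progress text and point out a negative buffer count. The field layout is inferred only from the four lines shown above, so treat the regular expressions as assumptions; `parse_progress` is a hypothetical helper, not something shipped with the client.

```python
import re

# Sample text copied from the progress.txt contents shown above.
SAMPLE = """Building structure 1 generation 44
49 until next generation
-1 generations buffered
Best Energy so far: 10000000.000
"""


def parse_progress(text):
    """Pull the numeric fields out of progress.txt-style text.

    The patterns are guessed from one sample file and may not cover
    every message the client can write.
    """
    fields = {}
    m = re.search(r"Building structure (\d+) generation (\d+)", text)
    if m:
        fields["structure"] = int(m.group(1))
        fields["generation"] = int(m.group(2))
    m = re.search(r"(-?\d+) until next generation", text)
    if m:
        fields["until_next"] = int(m.group(1))
    m = re.search(r"(-?\d+) generations buffered", text)
    if m:
        fields["buffered"] = int(m.group(1))
    m = re.search(r"Best Energy so far:\s*(-?\d+(?:\.\d+)?)", text)
    if m:
        fields["best_energy"] = float(m.group(1))
    return fields


progress = parse_progress(SAMPLE)
if progress.get("buffered", 0) < 0:
    # A negative count is only flagged as unusual here; whether it is
    # actually an error is not something this sketch can tell.
    print(f"Generation {progress['generation']}: buffered = {progress['buffered']}")
```

To run it against a real file, replace `SAMPLE` with the contents of progress.txt read from the client directory.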

filelist.txt contains the following:

CurrentStruc 0 1 127 44 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 0.950 1.700 330.623 ------------------HHHHHHHHHHH------HHHHH----------------HHHHHH-----------------HHHH-------------
2d589eac609cf27732ca09af52a5a7e8


Looking at my other machines they all have some files listed at the beginning of filelist.txt

The following files are in my directory (no other result files present):

fwigarf2_0_fwigarf2_protein_40_0000011_min.val
fwigarf2_0_fwigarf2_protein_43_0000020_min.val.bz2
fwigarf2_protein_44.trj

<sigh> I just had the other client on this machine start over at gen 0 from gen 67. (Maybe because I was looking at its filelist.txt file, so the client couldn't write to it and started over.) Running the client on this machine isn't worth the constant hassle and electricity cost, so I'm considering removing it from the rotation.

Any ideas on the above "stuck" client are welcome. Thanks in advance.

BuddhaMan
07-26-2003, 04:18 PM
Never mind. Said machine is being moved to another DC project.

LagosAzul
07-27-2003, 10:46 AM
I am actually having the same problem.

------------------------------------------------------------
Distributed Folding Windows dfGUI v3.1 Benchmark

Current Generation: 109
Sample Size : 0 structures over 141 seconds.
Protein Size: 96AA

Structures Per Hour : -
Structures Per Day : -

OS : Windows XP MHz: 1695
CPU: Intel(R) Pentium(R) 4 CPU 1700MHz
Client Switches: -rt -g 1
------------------------------------------------------------

Building structure 1 generation 109
49 until next generation
0 generations buffered
Best Energy so far: 10000000.000


Basically, while calculating residue, it gets to 20 and just stops. It will try alternate conformations over and over, never making any gains. It seems like the fact that no gens are buffered is keeping it from advancing.

Any ideas about how to fix this problem?

Brian the Fist
07-28-2003, 12:09 AM
This is not a problem; this is intended behavior. You should observe the 'laxness' parameters increase slowly. Eventually it will get free. Remember, a watched pot never boils, or so they say.

tpdooley
07-28-2003, 12:48 AM
You're not running in quiet mode, LagosAzul. Under WinXP we've noticed about 30% of the CPU time spent on processes that disappear when in quiet mode. Thus, you end up with higher performance that way (10-15% more being produced?)
Even so, I had a machine that took over a day to get past a single generation during the beta, and another of the beta testers spent 2 days. And then it raced through to gen 250 and started over.
If you have several machines, then you can see the law of averages working a bit better. Sometimes the faster machines are much faster than the slow ones.. and sometimes the slower ones run faster than the fast machines. (Run more machines!!! ) :)

LagosAzul
07-28-2003, 07:50 AM
Thanks for the replies. I must have switched out of quiet mode when I was messing with settings trying to find a fix. I do usually run in quiet mode. Well, good to know that it was not a bug.

P.S. I'm trying to get 2 other machines going... I'm not that persuasive though ;)