Error crunching generation 251



AMD_is_logical
07-17-2003, 03:00 PM
I got an "unable to open trajectory" error while crunching generation 251 -- that's right, generation 251. :eek:

Here is the entry from the error.log file:

ERROR: [001.001] {trajtools.c, line 3480} Unable to open trajectory distribution file handle_protein_251.trj
FATAL ERROR: [002.003] {foldtrajlite2.c, line 4700} Unable to read trajectory distribution, please create a new one

And then the client just sat there, presumably waiting for someone to press Enter (there's no monitor or keyboard on these nodes).

Here is filelist.txt:

./fold_5_handle_0_handle_protein_249.log.bz2
./handle_5_handle_protein_249_0000001.val
./fold_5_handle_5_handle_protein_250.log.bz2
./handle_5_handle_protein_250_0000006.val
CurrentStruc 5 1 126 251 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 1.700 3.200 2690.317 ---------------------HHHHHHHHH-------HHHHH----------------HHHHH----------------HHHHH--HHHH------
db0ce216386579ade69f3485fee79457

This was on a linux node (single processor Athlon) with the following switches:

./foldtrajlite -f protein -n native -rt -if -qt -p0 -g5

I have a number of these nodes going 24/7, and other than this one glitch they've been trouble-free.

AMD_is_logical
02-24-2004, 04:49 PM
I still get hit with this bug every now and then. I've had it twice with the current protein.

If I copy in a .trj file from elsewhere I can get the client to crunch generation 251, but the server won't accept it. :p

Brian the Fist
02-25-2004, 12:21 PM
Could you possibly provide a detailed explanation of how to reproduce this since you seem to have it nailed down? Thanks in advance!

AMD_is_logical
02-26-2004, 12:54 AM
Originally posted by Brian the Fist
Could you possibly provide a detailed explanation of how to reproduce this since you seem to have it nailed down? Thanks in advance!

OK, I think I have a way to reproduce it. (I am using Linux, BTW.)

Crunch until near the end of generation 250 and stop before structure 100 is done. The filelist.txt file should now have a line like:

CurrentStruc 6 100 134 250 1 41 66.049 -1877.919 1703.723 -36.104 61679932.000 2.950 5.700 88562.391

Note that the second number is 100 and the fourth is 250, meaning that the client will start crunching structure 100 of generation 250 when it's restarted.
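To make the field layout concrete, here is a small Python sketch that pulls the two documented numbers out of a CurrentStruc line. It assumes the fields are whitespace-separated and interprets only the second (structure) and fourth (generation) numbers, as described above; the function name `parse_current_struc` is hypothetical and not part of the client.

```python
def parse_current_struc(line: str) -> dict:
    """Extract the structure and generation numbers from a CurrentStruc line.

    Only the 2nd and 4th fields are interpreted, following the description
    in this thread; all other fields are returned unlabeled.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "CurrentStruc":
        raise ValueError("not a CurrentStruc line")
    fields = tokens[1:]
    return {
        "structure": int(fields[1]),   # 2nd number: structure to crunch next
        "generation": int(fields[3]),  # 4th number: generation it belongs to
        "raw": fields,                 # everything else, meaning not described here
    }


line = "CurrentStruc 6 100 134 250 1 41 66.049 -1877.919 1703.723 -36.104 61679932.000 2.950 5.700 88562.391"
info = parse_current_struc(line)
print(info["structure"], info["generation"])  # prints "100 250"
```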

Now start the client with the flags -if -g0. It will generate structure 100, run the minimization for generation 250, and then start generating structures for generation 0 of the next set. With the client still running, open another terminal window and look at the CurrentStruc line of filelist.txt. What I got was:

CurrentStruc 6 1 134 251 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 2.900 5.600 77010.773

Notice that it calls for structure 1 of generation 251. These are bad values. If you kill -9 the client and restart it, it will try to do generation 251, but it won't be able to find the .trj file. (There seems to be a window of vulnerability where even a normal shutdown can cause the client to leave this bad filelist.txt file.)
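On headless nodes, a pre-restart check along these lines could catch the bad state before the client stalls. This is only a sketch: `find_bad_restart`, the `MAX_GENERATION = 250` cutoff, and the `handle_protein_<generation>.trj` file name are assumptions inferred from the posts and the error.log message above, not documented client behavior.

```python
import os
from typing import Optional

# Assumption: the thread implies a set ends at generation 250, so anything
# higher in filelist.txt is the bad state described above.
MAX_GENERATION = 250


def find_bad_restart(filelist_text: str, workdir: str = ".") -> Optional[str]:
    """Return a warning if filelist.txt points past the last generation and
    the matching .trj file is missing; otherwise return None."""
    generation = None
    for line in filelist_text.splitlines():
        if line.startswith("CurrentStruc"):
            fields = line.split()[1:]
            generation = int(fields[3])  # 4th number is the generation
            break
    if generation is None or generation <= MAX_GENERATION:
        return None
    # Hypothetical naming, copied from the "Unable to open trajectory" message.
    trj = os.path.join(workdir, f"handle_protein_{generation}.trj")
    if os.path.exists(trj):
        return None
    return f"CurrentStruc asks for generation {generation} but {trj} is missing"
```

A node script could run this against filelist.txt before relaunching the client and alert the operator instead of letting the client sit waiting for Enter.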

bwkaz
02-26-2004, 06:39 PM
Well, don't do that then! ("that" being kill -9 the client at the start of gen 0)

:p

(Yes, I did read your "window of vulnerability" comment, this is mostly joking. ;))

Brian the Fist
02-27-2004, 12:58 PM
Excellent! We will have a look and hopefully fix it now :thumbs: :thumbs:

AMD_is_logical
05-04-2004, 06:26 PM
I just had this happen again. Didn't it ever get fixed? Or didn't the fix get into the current client? :confused: