
[NULL_Caption] FATAL ERROR: [000.000] Maketrj reported an error, cannot continue



jonnyw
01-27-2004, 06:18 PM
Got the following message last week (didn't have time to post then):



[NULL_Caption] FATAL ERROR: [000.000] Maketrj reported an error, cannot continue
Hit Return


Anyway, when I try to upload I get:



uploading 1/584
data file checksum failed


Running it again, I just got:


[NULL_Caption] FATAL ERROR: [001.008] Cannot open rotamer library (rotlib.bin.bz2) - if the file is missing, please re-install the software
Hit Return


And running it once more gave my original message.



Any suggestions? This is really annoying cos there's 612,000 points buffered. :(

Have I lost them all? Is there anything I can do?

There doesn't appear to be a filelist.txt.tmp anywhere.




:bang: :trash: :bang:


oh and my error.log is attached

bwkaz
01-27-2004, 06:47 PM
Does it help to extract the DF client to some other directory, then copy the newly-created rotlib.bin.bz2 file over top of the one in your DF directory?

Actually, you might want to rename the current file instead of overwriting it, just in case.
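
The rename-then-copy idea can be sketched as a small Python helper. This is an illustration only: replace_with_backup and the ".bak" suffix are made up here, and the paths are whatever your own DF directory and fresh extraction happen to be.

```python
import shutil
from pathlib import Path

def replace_with_backup(fresh: Path, target: Path) -> Path:
    """Rename target to target + '.bak' (if it exists), then copy fresh into its place."""
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        # Keep the old file around; note this overwrites any earlier .bak.
        target.replace(backup)
    shutil.copy2(fresh, target)
    return backup
```

For example, replace_with_backup(Path("fresh_extract/rotlib.bin.bz2"), Path("rotlib.bin.bz2")), where "fresh_extract" stands for wherever the client was newly extracted.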

iggy
01-28-2004, 07:25 AM
In addition to what bwkaz said,

A *.min.val error can be fixed by copying another existing *.min.val file over the corrupted one.

If you have a *.trj file on another system, copy it over the current (corrupt) one and rename it accordingly.

jonnyw
01-28-2004, 08:19 AM
So if I set up the client from scratch and copy the *.trj and rotlib.bin.bz2 files over, that may sort it?

And if that doesn't work, I'm not sure how I could copy an existing *.min.val into the directory, cos I thought each one was unique to a particular generation in a particular client. Or am I wrong?

jonnyw
01-28-2004, 09:51 AM
w00t, never mind, just did as was said above, and now it's started uploading. Cheers guys :thumbs: :D

jonnyw
01-28-2004, 11:35 AM
Spoke too soon :( :trash:

It stopped with 449 buffered gens left to upload.

Now when I try to run it again I get "data file checksum failed".

It then tries to fold and comes up with



[NULL_Caption] FATAL ERROR: [000.000] Maketrj reported an error, cannot continue
Hit Return


again.

Error.log shows



========================[ Jan 28, 2004 4:28 PM ]========================
Starting foldtrajlite built Jan 12 2004
Wed Jan 28 16:28:48 2004 ERROR: [000.000] {foldtrajlite2.c, line 4785} File .\ks9ez3km_0_ks9ez3km_protein_202_0004030_min.val is corrupt, missing or has been tampered with; cannot continue - replace file and start again, or manually delete filelist.txt
Wed Jan 28 16:28:48 2004 ERROR: [000.000] {foldtrajlite2.c, line 4933} Error during upload: Data file checksum failed
Wed Jan 28 16:28:58 2004 FATAL ERROR: [000.000] {foldtrajlite2.c, line 3974} Maketrj reported an error, cannot continue


I've tried replacing the protein.trj and rotlib.bin.bz2 files again, but still get the same error.

Any ideas?

Could you please give more detail on your other solution, iggy?

Would that purge upload list thingy work here?

iggy
01-28-2004, 12:43 PM
Purgeuploadlist won't help in this case.

The file "*.protein_202_0004030_min.val" is the one the server is rejecting. Delete it - it is of no use. Now take another *.min.val file, make a copy of it, and rename the copy so that it has the same name as "*.protein_202_0004030_min.val" - double check that the name is correct (as stated in the error log). You can then try to upload - use the dfGUI Upload function, but be sure to unhide and unquiet the client beforehand, so that you can see what is happening.

This error might occur a few more times - it seems that your disk or memory was slightly faulty while these files were being written. No need to worry too much if that system has been working OK until now.

Repeat the procedure for every corrupted *.min.val file. It does not matter which file you copy and rename, as long as it was originally another *.min.val file.
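
For anyone who has to repeat this for several files, the steps above can be sketched in Python. This is only an illustration, not part of the DF client: the function names are made up, the regular expression assumes the error.log wording shown earlier in this thread, and the glob pattern assumes the files end in "_min.val" as in that log line. The sketch renames the bad file to "*.corrupt" instead of deleting it, so there is still a copy to fall back on.

```python
import re
import shutil
from pathlib import Path
from typing import List

# Assumption: matches the "File <name> is corrupt" wording from the error.log above.
CORRUPT_RE = re.compile(r"File (\S+_min\.val) is corrupt")

def corrupt_val_files(log_text: str) -> List[str]:
    """Return the bare file names that error.log reports as corrupt, without duplicates."""
    names: List[str] = []
    for match in CORRUPT_RE.finditer(log_text):
        # Drop any leading ".\" or "./" directory part.
        name = match.group(1).replace("\\", "/").split("/")[-1]
        if name not in names:
            names.append(name)
    return names

def replace_corrupt_val(folder: Path, corrupt_name: str) -> Path:
    """Copy any other *_min.val file in folder over the corrupt one; return the donor."""
    bad = folder / corrupt_name
    donors = sorted(p for p in folder.glob("*_min.val") if p.name != corrupt_name)
    if not donors:
        raise FileNotFoundError("no other *_min.val file to copy from")
    if bad.exists():
        # Keep the bad file under a new name rather than deleting it outright.
        bad.replace(bad.with_name(bad.name + ".corrupt"))
    shutil.copy2(donors[0], bad)
    return donors[0]
```

Typical use would be to read error.log, loop over corrupt_val_files(...) and call replace_corrupt_val(...) for each name, with the client stopped and the folding directory backed up first.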

Now for the *.trj error. A working client names this file your_handle_protein_current_generation-1.trj. Check filelist.txt to see which generation is currently being worked on, or look in your folding directory for the real name of that file (let's suppose the current generation is 201). Note the file's name, then delete it. Get another *.trj file from another working client, put it in your problem directory, and rename it to your_handle_protein_200.trj (hope you get the idea).

That should do the job and the client should start crunching again, without any points being lost :)

Be sure to make a backup beforehand. There is another error concerning filelist.txt.tmp - this should be easily fixed by rebooting the system.

If you still have problems, zip up the complete folding directory and let me know where to get it - I can look into it and let you know if I manage to do something about it.

Brian the Fist
01-28-2004, 05:46 PM
Based on the errors you are receiving, it sounds like you have either a damaged hard drive or damaged RAM (or both?!). I suggest you run a thorough memory test (like memtest86) and also run Scandisk, Norton Disk Doctor, etc. to check for bad sectors. These are the only reasons the rotlib.bin.bz2 file (which is only ever read, never written to, by the program) could get messed up. If you find defective hardware, the data you generated is likely corrupt too (but the server will determine that as you try to upload it).
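
As a quick extra check alongside the hardware tests, one can see whether rotlib.bin.bz2 still decompresses cleanly. The Python sketch below is an illustration only; bz2_is_readable is a made-up name and has nothing to do with the client itself. It reads the whole file through the standard bz2 module, which raises an error on a damaged or truncated stream. A pass only shows the compressed stream is intact, not that its contents are the right rotamer library, and on a machine with faulty RAM the result may differ from run to run, so it is no substitute for memtest86 or a disk scan.

```python
import bz2
from pathlib import Path

def bz2_is_readable(path: Path, chunk_size: int = 1 << 20) -> bool:
    """Return True if the whole file decompresses without error."""
    try:
        with bz2.open(path, "rb") as fh:
            # Read to the end in chunks; corrupt data raises OSError,
            # a truncated stream raises EOFError.
            while fh.read(chunk_size):
                pass
    except (OSError, EOFError, ValueError):
        return False
    return True
```

For example, bz2_is_readable(Path("rotlib.bin.bz2")) run from the DF directory; a False result would suggest replacing the file from a fresh client download.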

jonnyw
01-28-2004, 06:03 PM
Yep, it's the RAM alright.

It went around Xmas day. Rang up to arrange for some new RAM (under warranty), and am still waiting.

It's a real PITA cos the PC occasionally resets itself without warning, making a total cock-up of distributed folding.

Cheers for the help everyone. Am just running through uploading my last set of 114 generations.

Thanks for the advice about fixing the installation afterwards, but I think it will just be easier to re-install the client.

Thanks again all :thumbs:

:D :) :D