250 generations disappeared...



jonnyw
12-07-2003, 05:56 AM
I went to bed last night with the client at approx gen 240 (with generations 20 - 240 buffered), and woke up this morning to find my client with only 25 generations buffered :confused:

The client is set to run with net access disabled and to use extra RAM.

the latest error in the error log is:



========================[ Dec 7, 2003 2:56 AM ]========================
Starting foldtrajlite built Nov 3 2003
Sun Dec 07 08:38:23 2003 ERROR: [002.002] {buildit.c, line 256} Cannot set co-ordinates


Where have all my buffered generations gone?

They definitely weren't uploaded (I checked my stats and there's no big dump there).

I'm really not happy that I've left my PC on for nearly a week, only to lose all these points. :swear:

Are there any fixes?

The directory does hold a couple of hundred .log.bz2 and .val files, but they seem to be all over the place (i.e. not completely sequential from 23 to 245, with some gens missing).
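One way to see exactly which generations are missing from the directory is a short script. This is only a sketch: it assumes the generation number is the last run of digits in each file name before the suffix, which may not match the client's actual naming, so adjust `pattern` if needed. The function name `generation_gaps` is made up for this example.

```python
import re
from pathlib import Path


def generation_gaps(directory, suffix=".val", pattern=r"(\d+)(?!.*\d)"):
    """Return (present, missing) generation numbers inferred from file names.

    ASSUMPTION: the generation number is the last run of digits in the
    file name once the suffix is removed. Change `pattern` if it is not.
    """
    gens = set()
    for path in Path(directory).iterdir():
        if not path.name.endswith(suffix):
            continue
        stem = path.name[: -len(suffix)]
        match = re.search(pattern, stem)
        if match:
            gens.add(int(match.group(1)))
    if not gens:
        return [], []
    present = sorted(gens)
    # Every number between the lowest and highest seen that has no file.
    missing = [g for g in range(present[0], present[-1] + 1) if g not in gens]
    return present, missing
```

Calling `generation_gaps(r"C:\path\to\client")` on a copy of the directory (the path here is a placeholder) returns the generation numbers found and the gaps between the lowest and highest one.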

tpdooley
12-07-2003, 08:26 AM
Make a backup of the directory.
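If you would rather script the backup than copy the folder by hand, something like the following should do it. It is a minimal sketch: `backup_directory` is a made-up helper name, and it simply copies the whole folder to a timestamped sibling folder so that nothing in the original is touched.

```python
import shutil
import time
from pathlib import Path


def backup_directory(src, dest_parent=None):
    """Copy `src` to a new timestamped folder and return the new path."""
    src = Path(src).resolve()
    parent = Path(dest_parent) if dest_parent else src.parent
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = parent / f"{src.name}-backup-{stamp}"
    # copytree refuses to overwrite an existing folder, so an earlier
    # backup cannot be clobbered by accident.
    shutil.copytree(src, dest)
    return dest
```

Stop the client before running it, so no files change halfway through the copy.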

Then do a search for messages describing what to do with a missing filelist.txt file. (I remember others being told to look for filelist.tmp? ) If you're lucky, you should be able to upload most of the generations in the directory from the backup file; by renaming it filelist.txt (although you want to keep the current filelist.txt file to upload the rest of the files.)

You might want to download the November 20th foldtrajlite.exe file to get rid of the "Cannot set co-ordinates" errors.

And on Monday, Elena will hopefully give you directions on uploading the whole directory to them, in case it's something they want to look at.

Anyone have links to the discussions on tracking down the missing filelist tmp files?

Good luck..

Ned
12-07-2003, 08:28 AM
jonnyw

Sounds like you had a hardware error of some kind. Look back in your error.log and you'll probably find some file access error of some kind. Once that happens, that run is usually toast.

Why wait so long to do an upload???? With all the heat generated with the flat out computing, something is bound to happen... And hardware is hardware, it fails once in a while and this system does a lot of file accessing.

Ned

jonnyw
12-07-2003, 08:34 AM
There isn't a filelist.tmp in the directory :(

As for why I leave it so long before uploading, I've heard lots of reports of servers going down, and people's PCs being left on doing nothing, unable to continue folding until the server comes back up.

I didn't want to risk this happening :(


Any other ideas (directory has been backed up)?

iggy
12-07-2003, 09:46 AM
Happened to me only twice - for no apparent reason a no-net system (I buffer all the time) did the same thing - it went from about 200 buffered gens down to 10. When I tried to upload at least that, filelist.txt went missing and was nowhere to be found. Had to start from scratch.

You may wish to try the latest test client - in the past three days I didn't find anything unusual, and it seems to be more stable than before.

Check this topic (http://www.free-dc.org/forum/showthread.php?s=&threadid=4941) for Elena's post to see where to upload and how to contact them.

Someone should make these details a sticky...

tpdooley
12-07-2003, 06:24 PM
Here's something Elena posted on Dec 2nd 2003 in the Tech Support topic "filelist.txt ... erased":

----------
Check the directory for "filelist.txt.tmp" and if you find this file, simply rename it to filelist.txt.

Also, if you have the directory as it was at the point of the missing filelist, please zip/tar it up and upload it as anonymous to ftp.mshri.on.ca/incoming. If you are able to upload the zipped directory, please notify us by e-mailing trades@mshri.on.ca so we can take a closer look at what happened.


__________________
Elena Garderman

-----------

Miraculously, it touches on both topics.. :) Hope this helps..

(Howard's mentioned searching the whole computer for filelist.txt.tmp in the past - hopefully you'll be able to find the one that's missing from the night of the 240 gens.. that has an earlier date than the remaining 25 gens the next day.)
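For the whole-computer search, a small script can save some clicking. This is just a sketch under a couple of assumptions: `find_files` is a name invented here, and it looks for both spellings mentioned in this thread (`filelist.txt.tmp` and `filelist.tmp`). It returns the matches oldest first, so a copy dated before the generations vanished is easy to spot.

```python
import os


def find_files(root, names=("filelist.txt.tmp", "filelist.tmp")):
    """Walk `root` and return (mtime, path) pairs for matching file names."""
    wanted = {name.lower() for name in names}
    hits = []
    # onerror swallows folders we are not allowed to read instead of stopping.
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for filename in filenames:
            if filename.lower() in wanted:
                full_path = os.path.join(dirpath, filename)
                try:
                    modified = os.path.getmtime(full_path)
                except OSError:
                    continue  # file vanished or is unreadable
                hits.append((modified, full_path))
    return sorted(hits)  # oldest modification time first
```

For example, `find_files("C:\\")` would scan the whole C: drive; expect it to take a while on a full disk.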

jonnyw
12-07-2003, 06:53 PM
No sign of filelist.txt.tmp anywhere on the computer :(

Brian the Fist
12-08-2003, 10:41 AM
This is the first I have heard of this problem. Can anyone reproduce the problem step-by-step, or does it seem to be random? Can you describe more clearly whether you are using any 'front-ends', what your OSes are, whether you have installed as a service, what flags you are using, etc.?

jonnyw
12-08-2003, 10:58 AM
Running WinXP, P4 2.53, 512MB RAM.

Using dfGUI and dfMon (which only uses filelist.txt in read-only mode, and cannot send any commands to the client).

Client not running as a service. Set up as below:


http://www.khaos.plus.com/distribfold/config.gif


I definitely can't re-create it, as I went to bed meaning to upload in the morning, and when I woke up dfGUI was showing only 25 gens buffered.

I can zip the directory for you and upload it to your FTP if you want. It's ~16MB zipped.


P.S.

Just found this error, logged a few days earlier, in case it makes a difference:



========================[ Dec 5, 2003 9:00 AM ]========================
Starting foldtrajlite built Nov 3 2003
Sat Dec 06 00:01:12 2003 FATAL ERROR: [003.001] {foldtrajlite2.c, line 5484} Unable to fetch Biostruc


It definitely carried on folding after this, though.

Brian the Fist
12-09-2003, 10:23 AM
This error is on the Known Bugs page of the web site - please look there for an explanation. It is quite possible this is the cause of your problem. How could the program restart after a fatal error? Does dfGUI automatically restart it if an error occurs? The program does not restart on its own.

jonnyw
12-09-2003, 10:50 AM
Well, I don't remember emptying the temp directory... although I could have.

Are the points well and truly lost then, or is there any way to save them?


EDIT: I may have noticed the client was stopped and restarted it myself before I went to bed.

It had been a long night, so I don't really remember.

Ned
12-10-2003, 08:03 AM
Not sure if this is possible in all instances, but could you post a flag in filelist.txt, or in another file altogether, when you have a fatal error, to inhibit further processing of that set??? But still allow upload of usable data?

It seems that after some fatal errors the system can be restarted, but subsequent data created is not usable (wasted effort!).

Ned:)

Brian the Fist
12-10-2003, 10:38 AM
We have tried our best to make the client recover from all possible fatal errors. We cannot possibly test all the things that could go wrong, so we rely on you to tell us when things go wrong. Unfortunately, most of the problems are too vague for us to reproduce - often you cannot reproduce them either - and this makes it very difficult for us to fix. I'll say it again - if you can describe a reproducible way that filelist.txt gets messed up or data gets lost, we will be more than happy to fix it.

iggy
12-16-2003, 02:08 PM
Just had two systems with lost work: one had empty filelist.txt and progress.txt, and on the other the client said "...filelist.txt has been tampered with...". Two days of work disappeared... Both systems were running normally under WinXP; I just noticed that they were not producing anything. The systems didn't crash, there is plenty of HD space and memory, and of course there were no temporary files (or deleted ones) that I could recover from.

I'm sending both of the zipped directories - I'm not sure that will help, as there are no errors in error.log...

I prefer to buffer the work and send it when I have time rather than work online, but I still don't think these kinds of problems should occur, or that all the work should be lost unless there is an HD crash. The most that should be lost is the current work (the current generation), not the work that has already been done, as all of the *.val and *.bz2 files are available. Maybe there is another way of dealing with filelist.txt - I can't help as I'm not a programmer.
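For what it's worth, one common general technique for keeping an index file like this from ending up empty is to write the new contents to a temporary file in the same folder and then swap it into place in one step. The sketch below only illustrates that idea; it is not how the client actually handles filelist.txt (I have no idea how it does), and `write_atomically` is a made-up name.

```python
import os
import tempfile


def write_atomically(path, text):
    """Replace `path` with `text` so readers see the old or the new file,
    never a half-written one."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same folder so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk first
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach, a crash in the middle of a write leaves the previous filelist.txt intact, so at worst only the most recent generation would be missing from it.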