Thread: 250 generations disappeared...

  1. #1
    Senior Member
    Join Date
    Feb 2003
    Location
    wigan, uk
    Posts
    200

    250 generations disappeared...

    I went to bed last night with the client at approx gen 240 (with generations 20 - 240 buffered), and woke up this morning to find my client with only 25 generations buffered.

    The client is set to run with net access disabled and to use extra RAM.

    the latest error in the error log is:

    ========================[ Dec 7, 2003 2:56 AM ]========================
    Starting foldtrajlite built Nov 3 2003
    Sun Dec 07 08:38:23 2003 ERROR: [002.002] {buildit.c, line 256} Cannot set co-ordinates
    Where have all my buffered generations gone?

    They definitely weren't uploaded (I checked my stats and there's no big dump there).

    I'm really not happy that I've left my PC on for nearly a week only to lose all these points.

    Are there any fixes?

    The directory does hold a couple of hundred .log.bz2 and .val files but they seem to be all over the place (i.e. not completely sequential from 23 to 245, with some gens missing).
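    To see exactly which generations are missing, a short script can list the numbers present for each file type. This is only a sketch: it assumes the generation number is the last run of digits before the .val or .log.bz2 suffix, which is a guess about the naming scheme, and the function names are made up for illustration.

```python
import re
from pathlib import Path


def buffered_generations(directory, suffixes=(".log.bz2", ".val")):
    """Map each suffix to the sorted generation numbers found in directory.

    ASSUMPTION: the generation number is the last run of digits in the
    file name before the suffix. Adjust the regex if the names differ.
    """
    found = {suffix: set() for suffix in suffixes}
    for path in Path(directory).iterdir():
        for suffix in suffixes:
            if path.name.endswith(suffix):
                stem = path.name[: -len(suffix)]
                digits = re.findall(r"\d+", stem)
                if digits:
                    found[suffix].add(int(digits[-1]))
    return {suffix: sorted(numbers) for suffix, numbers in found.items()}


def missing(numbers, first, last):
    """Return the generation numbers in first..last that are absent."""
    present = set(numbers)
    return [n for n in range(first, last + 1) if n not in present]
```

    Run it against the backup copy of the directory, then call `missing(...)` with the first and last generation you expected to have buffered.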

  2. #2
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    Make a backup of the directory.

    Then search for messages describing what to do about a missing filelist.txt file. (I remember others being told to look for filelist.tmp?) If you're lucky, you should be able to upload most of the generations in the directory from that backup file by renaming it filelist.txt (although you'll want to keep the current filelist.txt file to upload the rest of the files).

    You might want to download the November 20th foldtrajlite.exe file to get rid of the "Cannot set co-ordinates" errors.

    And on Monday, Elena will hopefully give you directions on uploading the whole directory to them, in case it's something they want to look at.

    Anyone have links to the discussions on tracking down the missing filelist tmp files?

    Good luck..
    www.thegenomecollective.com
    Borging.. it's not just an addiction. It's...

  3. #3
    Ol' retired IT geezer
    Join Date
    Feb 2003
    Location
    Scarborough
    Posts
    92

    Don't Wait

    jonnyw

    Sounds like you had a hardware error of some kind. Look back in your error.log and you'll probably find a file access error. Once that happens, that run is usually toast.

    Why wait so long to do an upload? With all the heat generated by flat-out computing, something is bound to happen... And hardware is hardware; it fails once in a while, and this system does a lot of file accessing.

    Ned

  4. #4
    Senior Member
    Join Date
    Feb 2003
    Location
    wigan, uk
    Posts
    200
    There isn't a filelist.tmp in the directory.

    As for why I leave it so long before uploading, I've heard lots of reports of servers going down, and people's PCs being left on doing nothing, unable to continue folding while they wait for the server to come back up.

    I didn't want to risk this happening.


    Any other ideas (directory has been backed up)?

  5. #5
    7G - OCW iggy's Avatar
    Join Date
    Aug 2003
    Location
    London, UK
    Posts
    156
    Happened to me only twice - for no apparent reason a no-net system (I buffer all the time) did the same thing - from about 200 buffered gens it went down to 10. When trying to upload at least that, filelist.txt went missing, and was nowhere to be found. Had to start from scratch.

    You may wish to try the latest test client - in the past three days I didn't find anything unusual, and it seems to be more stable than before.

    Check this topic for Elena's post to see where to upload and how to contact them.

    Someone should put these details as sticky...
    Last edited by iggy; 12-07-2003 at 08:13 PM.

  6. #6
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    Here's something Elena posted on Dec 2nd 2003 in the Tech Support topic "filelist.txt ... erased":

    ----------
    Check the directory for "filelist.txt.tmp" and if you find this file, simply rename it to filelist.txt.

    Also, if you have the directory as it was at the point of the missing filelist, please zip/tar it up and upload it as anonymous to ftp.mshri.on.ca/incoming. If you are able to upload the zipped directory, please notify us by e-mailing trades@mshri.on.ca so we can take a closer look at what happened.


    __________________
    Elena Garderman

    -----------

    Miraculously, it touches on both topics.. Hope this helps..

    (Howard's mentioned searching the whole computer for filelist.txt.tmp in the past - hopefully you'll be able to find the one that's missing from the night of the 240 gens.. that has an earlier date than the remaining 25 gens the next day.)
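    For anyone who would rather script that search than rely on the OS search box, here is a minimal Python sketch. It walks a drive or folder, collects every file named filelist.txt.tmp, and sorts the hits by modification time so an older copy stands out. The function name `find_files` and the example root path are placeholders, not anything that ships with the client.

```python
import os
from datetime import datetime


def find_files(root, name="filelist.txt.tmp"):
    """Return (path, modified-time) pairs for every file called name under root.

    Unreadable folders are skipped silently; results are sorted oldest first.
    """
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for filename in filenames:
            # Compare case-insensitively, since Windows file names are.
            if filename.lower() == name.lower():
                path = os.path.join(dirpath, filename)
                try:
                    modified = datetime.fromtimestamp(os.path.getmtime(path))
                except OSError:
                    continue  # file vanished or is not accessible
                matches.append((path, modified))
    return sorted(matches, key=lambda match: match[1])


if __name__ == "__main__":
    # Example root only; point this at whichever drive holds the client.
    for path, modified in find_files("C:\\"):
        print(modified, path)
```

    If it prints nothing, there is no such file anywhere under that root.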

  7. #7
    Senior Member
    Join Date
    Feb 2003
    Location
    wigan, uk
    Posts
    200
    No sign of filelist.txt.tmp anywhere on the computer.

  8. #8
    This is the first I have heard of this problem. Can anyone reproduce the problem step by step, or does it seem to be random? Can you describe more clearly whether you are using any 'front-ends', what your OSes are, whether you have installed as a service, what flags you are using, etc.?
    Howard Feldman

  9. #9
    Senior Member
    Join Date
    Feb 2003
    Location
    wigan, uk
    Posts
    200
    Running WinXP, P4 2.53, 512MB RAM.

    Using dfGUI and dfMon (which only uses filelist.txt in read-only mode, and cannot send any commands to the client).

    Client not running as a service. Set up as below.





    I definitely can't re-create it as I went to bed meaning to upload in the morning, and when I woke up dfGUI was showing only 25 gens buffered.

    I can zip the directory for you and upload it to your FTP if you want. It's ~16MB zipped.


    P.S.

    Just found this error, logged a few days earlier, if it makes a difference:


    ========================[ Dec 5, 2003 9:00 AM ]========================
    Starting foldtrajlite built Nov 3 2003
    Sat Dec 06 00:01:12 2003 FATAL ERROR: [003.001] {foldtrajlite2.c, line 5484} Unable to fetch Biostruc

    Although it definitely carried on folding after this.

  10. #10
    This error is on the Known Bugs page of the web site - please look there for an explanation. It is quite possible this is the cause of your problem. How could the program restart after a fatal error? Does dfGUI automatically restart it if an error occurs? The program does not restart on its own.
    Howard Feldman

  11. #11
    Senior Member
    Join Date
    Feb 2003
    Location
    wigan, uk
    Posts
    200
    Well, I don't remember emptying the temp directory... although I could have.

    Are the points well and truly lost then, or is there any way to save them?


    EDIT: I may have noticed the client was stopped and restarted it myself before I went to bed.

    It had been a long night so I don't really remember.

  12. #12
    Ol' retired IT geezer
    Join Date
    Feb 2003
    Location
    Scarborough
    Posts
    92

    Fatal Error Processing

    Not sure if this is possible in all instances, but could you post a flag in filelist.txt, or in another file altogether, when you have a fatal error, to inhibit further processing of that set? But still allow upload of usable data?

    It seems that after some fatal errors the system can be restarted, but subsequent data created is not usable (wasted effort!).

    Ned

  13. #13
    We have tried our best to make the client recover from all possible fatal errors. We cannot possibly test all the things that could go wrong, so we rely on you to tell us when things go wrong. Unfortunately, most of the problem reports are too vague for us to reproduce - often you cannot reproduce them either - and this makes them very difficult for us to fix. I'll say it again - if you can describe a reproducible way that filelist.txt gets messed up or data gets lost, we will be more than happy to fix it.
    Howard Feldman

  14. #14
    7G - OCW iggy's Avatar
    Join Date
    Aug 2003
    Location
    London, UK
    Posts
    156
    Just had two systems with lost work: one had an empty filelist.txt and progress.txt; on the other the client said "...filelist.txt has been tampered with...". Two days of work disappeared... Both systems were running normally, WinXP; I just noticed that they were not producing anything. The systems didn't crash, there is plenty of HD space and memory, and of course there were no temporary files (or deleted ones) that I could recover from.

    I'm sending both of the zipped directories - I'm not sure that will help, as there are no errors in error.log...

    I prefer to buffer the work and send it when I have time rather than work online, but I still don't think these kinds of problems should occur, or that all the work should be lost unless there is an HD crash. The most that should be lost is the current work (current generation), not the work that has already been done, as all of the *.val and *.bz2 files are available. Maybe there is another way of dealing with filelist.txt - I can't help as I'm not a programmer.
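    One common way to keep a small index file such as filelist.txt from ending up empty or half-written is to write the new contents to a temporary file and then swap it into place in a single step. The Python sketch below only illustrates that general technique. It is not how foldtrajlite handles filelist.txt, and the function name `write_atomically` is made up for this example.

```python
import os
import tempfile


def write_atomically(path, text):
    """Replace path with text so readers see the old or the new file, never a partial one."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final rename
    # stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before swapping
        os.replace(tmp_path, path)  # swap the finished file into place
    except BaseException:
        # On any failure, leave the existing file untouched and clean up.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

    With this pattern, a crash in the middle of an update leaves the previous file in place and at worst a stray temporary file next to it.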
