Thread: RAM suddenly not up to snuff??

  1. #1
    Member
    Join Date
    Oct 2002
    Location
    southeastern North Carolina
    Posts
    66

    RAM suddenly not up to snuff??

    just started getting messages suggesting faulty RAM...
    RLEUnpack failed; size= 10926; should=400*251...

    The box is my DVD player; nothing else is ever used on it. How good does my RAM need to be, all of a sudden???
    The initial and the new RAM chip both give the same results;
    reinstalled [downloaded] the client to make sure I hadn't scrambled something.

    ... was watching a flick and crunching, with this result [it would let me upload but won't let me crunch]. Any ideas???
    " All that's necessary for the forces of evil to win in the world is for enough good men to do nothing."-
    Edmund Burke

    " Crunch Away! But, play nice .."

    --RagingSteveK's mom


  2. #2
    Member
    Join Date
    Oct 2002
    Location
    southeastern North Carolina
    Posts
    66

    just saw this..

    http://www.free-dc.org/forum/showthr...&highlight=RAM
    did a clean install, but not from generation 0-- will try that...

    [minutes later];; problem seems to be resolved with brand new 100% clean install, beginning with gen 0..
    Last edited by RaginSteveK; 12-24-2003 at 08:41 AM.


  3. #3
    I just got the same thing:
    Thu Jan 15 20:20:42 2004 FATAL ERROR: [023.024] {trajtools.c, line 2637} RLEUnPack failed, size=3, should=400*400 - likely this is caused by overclocked or faulty RAM chips, please test your RAM

    This is the text client on WinXP and I am not overclocking at all. The RAM passes the Prime95 torture tests AND memtest86.

    Howard: Is this an RLEUnPack problem on the .val.bz2 files?

    I tried re-installing a new fresh copy of the DF client over top of what I have now, didn't help. Once I deleted filelist.txt to force it to start at gen 0 again, the problem went away and now the DF client doesn't complain.

    I have kept all the relevant files if you are interested.

    Jeff.

  4. #4
    This error refers to the most recent .trj file being corrupt. It may even be size zero. Probably caused by a system crash/program kill at an inopportune time.
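
    A quick way to check for this is to look at the newest .trj file in the client folder and see whether it is empty. The sketch below is only an illustration: the function names are made up, and it assumes the .trj files sit directly in the client directory and that "most recent" means latest modification time.

```python
from pathlib import Path
from typing import Optional


def newest_trj(client_dir: str) -> Optional[Path]:
    """Return the most recently modified .trj file in client_dir, or None.

    Assumption: .trj files live directly in the client directory.
    """
    files = list(Path(client_dir).glob("*.trj"))
    if not files:
        return None
    return max(files, key=lambda p: p.stat().st_mtime)


def describe_trj(client_dir: str) -> str:
    """Report whether the newest .trj file is missing, empty, or non-empty."""
    trj = newest_trj(client_dir)
    if trj is None:
        return "no .trj file found"
    size = trj.stat().st_size
    if size == 0:
        return f"{trj.name} is empty (0 bytes)"
    return f"{trj.name} has {size} bytes"
```

    A non-empty file is not necessarily a healthy one; this only catches the missing or size-zero case.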
    Howard Feldman

  5. #5
    Junior Member Nanobot's Avatar
    Join Date
    Mar 2002
    Location
    Nottingham, UK
    Posts
    8
    Originally posted by Brian the Fist
    This error refers to the most recent .trj file being corrupt. It may even be size zero. Probably caused by a system crash/program kill at an inopportune time.
    I have had this twice on two different machines, both duals and in both cases the second service. I have not restarted the process after the second crash so I still have the files if you want them. There was no system failure or stopping of the program as the machines were only being used to run DC projects.

  6. #6
    Junior Member Nanobot's Avatar
    Join Date
    Mar 2002
    Location
    Nottingham, UK
    Posts
    8
    I think this may be due to a memory leak. I have been running the client on some machines which have multiple language support. On leaving the machines over the weekend I logged off. On returning on the Monday, both machines insisted that the logon should be in French, even though the main language the previous week was English. On logging on, the default language was English, but if I logged out of either machine it insisted that the default language was French, including the keyboard layout.

    HTH

  7. #7
    Originally posted by Nanobot
    I have had this twice on two different machines, both duals and in both cases the second service.
    for me the EXACT same thing, on a "dual" (p4 with HT) and always the 2nd client (or at least the same client dir).

  8. #8
    I have had this problem on the second instance of the application running on a Dual 2 GHz G5 Mac so the problem is cross-platform. I am not overclocked. I checked my RAM and it was good. And the crash occurred in the middle of a run.

  9. #9
    As has been mentioned before, this error means that the latest trajectory distribution file (.trj) is corrupted. It could be related to your RAM (faulty or being used up by other applications) as well as some sort of crash on your system.

    If you are running on a dual machine, make sure you have your temp directories separated as well. You can simply add it to the foldit script, one line per instance, as follows:
    set DFPTEMP=/distribfold1/TEMP
    set DFPTEMP=/distribfold2/TEMP
    where distribfold1 and distribfold2 are the installation directories. Make sure to create the different TEMP directories first.
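
    If you launch the clients from your own wrapper rather than editing the script, the same setup can be sketched roughly as below. This is a hypothetical helper, not part of the client: it assumes the client picks up DFPTEMP from its environment (as the set lines above suggest) and that each instance uses a TEMP folder inside its own installation directory.

```python
import os
from pathlib import Path
from typing import Dict


def temp_env_for(install_dir: str) -> Dict[str, str]:
    """Create <install_dir>/TEMP and return an environment with DFPTEMP set.

    Assumption: the client reads DFPTEMP from its environment.
    """
    temp_dir = Path(install_dir) / "TEMP"
    # Create the directory first, as the post advises.
    temp_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ)  # copy, so the parent environment is untouched
    env["DFPTEMP"] = str(temp_dir)
    return env
```

    Each returned environment would then be passed to whatever starts that instance, so the two clients never share a temp directory.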
    Elena Garderman

  10. #10
    "As has been mentioned before, this error means that the latest trajectory distribution file (.trj) is corrupted. It could be related to your RAM (faulty or being used up by other applications) as well as some sort of crash on your system.

    If you are running on a dual machine, make sure you have your temp directories separated as well."

    I'm running a dual G5 Mac with your PowerPC/Darwin distribution. As far as I know, it doesn't need temp directories, merely that the path to each instance be different and I've run two instances for several weeks without difficulty using that method.

    The failure of one of my instances occurred in mid-run. I checked the RAM and it was fine. I downloaded a complete new application from you, obtained a new work unit and the problem continued. I doubt it was corruption at my end. I'm from Toronto.

  11. #11
    Senior Member
    Join Date
    Oct 2003
    Location
    an Island off the coast of somewhere
    Posts
    540
    I had three clients crash over the weekend with the same error, on three different boxes.

    I have multiple clients running on W2K server - one for each CPU.
    Each client runs from its own folder, with its own TEMP space.

    These are very high-end servers, with excellent memory that has exhibited no other problems. The machines have had no crashes or other failures - the remaining clients on the boxes each kept crunching. They have at least 2GB of RAM in each box.

    I *seriously* doubt that this many people would start to have memory problems all at once. It would be a remarkable coincidence.

    I think the project needs to look elsewhere rather than blaming bad memory.


    willy1





    0-6 12-9 11-3 11-3 0-8 1

  12. #12
    Originally posted by Stardragon
    As has been mentioned before, this error means that the latest trajectory distribution file (.trj) is corrupted. It could be related to your RAM (faulty or being used up by other applications) as well as some sort of crash on your system.
    If the same client crashes over and over on all kinds of different platforms, OSes and (especially dual) machines, couldn't it just be that maybe, maybe there is something wrong with the client?

    There are numerous people with a dual setup, be it actually two CPUs or HT, who have to deal with random crashes of one of the clients, and yes, we did set up the temp dirs.

    I've posted a quite elaborate description of my (home) system and attached some files, and saved the complete dirs for weeks, but nobody wants them.

    So with all due respect, since this distributed computing is just a means to your end, could you maybe give these kinds of posts (from users who do this only for competition and science) a bit more serious attention instead of waving them away as problems with our hardware/system?

    Taking into consideration the server problems of the last few days, the least of your worries should be OUR hardware.

    So again, a quick summary of the error:
    • on dual setups
    • on Windows, Linux and MacOSX
    • always the same (2nd) client
    Last edited by Escrimador; 03-29-2004 at 04:53 PM.

  13. #13
    We will try to reproduce the error on our end. If you indeed have saved folders from this crash, please upload them to ftp.blueprint.org/incoming, and send a descriptive e-mail to trades@mshri.on.ca notifying us of your upload.

    Are there any errors appearing in the error log immediately before the RLE Unpack error?
    Elena Garderman

  14. #14
    Senior Member
    Join Date
    Oct 2003
    Location
    an Island off the coast of somewhere
    Posts
    540
    Thu Mar 25 17:05:16 2004 ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    Thu Mar 25 17:05:16 2004 ERROR: [000.000] {foldtrajlite2.c, line 4933} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
    Thu Mar 25 17:35:23 2004 ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    Thu Mar 25 18:02:53 2004 ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    Thu Mar 25 18:02:53 2004 ERROR: [000.000] {foldtrajlite2.c, line 4933} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
    Thu Mar 25 18:28:22 2004 ERROR: [000.000] {foldtrajlite2.c, line 4933} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
    Thu Mar 25 19:20:29 2004 ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    Thu Mar 25 19:25:29 2004 FATAL ERROR: [023.024] {trajtools.c, line 2637} RLEUnPack failed, size=5604, should=400*20 - likely this is caused by overclocked or faulty RAM chips, please test your RAM


    willy1

  15. #15
    Code:
    ========================[ Mar 23, 2004 12:22 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 23, 2004  9:44 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 25, 2004  9:50 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 26, 2004  9:31 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 26, 2004 10:49 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 26, 2004 11:26 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 29, 2004  5:24 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 29, 2004  5:33 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Mar 30, 2004 10:00 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    Tue Mar 30 12:41:43 2004 FATAL ERROR: [023.024] {trajtools.c, line 2637} 
    RLEUnPack failed, size=3, should=400*400 - likely this is caused by overclocked or 
    faulty RAM chips, please test your RAM
    
    ========================[ Mar 31, 2004 10:15 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    Wed Mar 31 22:15:29 2004 FATAL ERROR: [023.024] {trajtools.c, line 2637} 
    RLEUnPack failed, size=3, should=400*400 - likely this is caused by overclocked or 
    faulty RAM chips, please test your RAM
    
    ========================[ Mar 31, 2004 10:34 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Apr 1, 2004  9:53 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    Thu Apr 01 09:53:09 2004 FATAL ERROR: [023.024] {trajtools.c, line 2637} 
    RLEUnPack failed, size=3, should=400*400 - likely this is caused by overclocked or 
    faulty RAM chips, please test your RAM
    
    ========================[ Apr 2, 2004 10:45 AM ]========================
    Starting foldtrajlite built Jan 12 2004
    Fri Apr 02 10:45:24 2004 FATAL ERROR: [023.024] {trajtools.c, line 2637} 
    RLEUnPack failed, size=3, should=400*400 - likely this is caused by overclocked or 
    faulty RAM chips, please test your RAM

  16. #16
    and again on a P4-HT 2.4 GHz, not o/c, Win XP Pro

    Code:
    ========================[ Apr 8, 2004  4:41 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Apr 8, 2004  4:42 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Apr 8, 2004  4:46 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Apr 8, 2004  4:47 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Apr 8, 2004  4:48 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    
    ========================[ Apr 8, 2004  5:03 PM ]========================
    Starting foldtrajlite built Jan 12 2004
    Thu Apr 08 19:53:17 2004 FATAL ERROR: [023.024] {trajtools.c, line 2637}
    RLEUnPack failed, size=8470, should=400*162 - likely this is caused by overclocked 
    or faulty RAM chips, please test your RAM

  17. #17
    Any hope that the bad RAM problem on multiple cpus will be solved in the April 20 protein release?

  18. #18
    We have not solved this yet, sorry. The number in the error message is important. If it says size=3, should=400*400, this usually means the .trj file is missing or empty for some reason. If it is some other number, it almost certainly IS a result of faulty RAM. The .trj is there and readable, but is essentially failing a consistency check.
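
    To sort through an error log along those lines, something like the following could pull the numbers out of each RLEUnPack message. It is a rough sketch only: the function names and category labels are invented here, and it treats size=3 alone as the "missing or empty" marker, which is one reading of the rule above.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "RLEUnPack failed, size=3, should=400*400" as seen in the logs
# in this thread; also tolerates ";" separators and a space after "size=".
RLE_ERROR = re.compile(
    r"RLEUnPack failed[,;]\s*size=\s*(\d+)[,;]\s*should=(\d+)\*(\d+)",
    re.IGNORECASE,
)


def parse_rle_error(line: str) -> Optional[Tuple[int, int, int]]:
    """Return (size, should_a, should_b), or None if the line is not this error."""
    match = RLE_ERROR.search(line)
    if match is None:
        return None
    size, should_a, should_b = (int(g) for g in match.groups())
    return size, should_a, should_b


def classify_rle_error(line: str) -> Optional[str]:
    """Label the error according to the size rule described in the post."""
    parsed = parse_rle_error(line)
    if parsed is None:
        return None
    size = parsed[0]
    # Assumption: size=3 is the marker for a missing or empty .trj file.
    if size == 3:
        return "missing_or_empty_trj"
    return "failed_consistency_check"
```

    Running this over the logs posted earlier in the thread would put the size=3 entries in the first group and the size=5604 and size=8470 entries in the second.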

    I suspect the 2-CPU problems may be a result of mutexes and semaphores in the NCBI toolkit which we are using. We have been unable to pinpoint it just yet though; it is an extremely difficult problem to debug, as it occurs apparently at random and relatively infrequently. It's on our list of things to fix, of course!
    Howard Feldman
