
Thread: Beta 8 - release candidate

  1. #1

    Beta 8 - release candidate

    The next (and possibly final) beta is available at the usual location (see start of beta 7/6/5 threads). I have adjusted the algorithm one more time and have fixed some bugs with file uploading/tracking. To install, as usual, either overwrite an existing beta version, or download the 'full' client from the normal web site and unpack the beta overtop of that.

    I would request that while beta testing you do NOT use dfGUI please. Beta testing two pieces of related software simultaneously is just asking for trouble. I'd request that you refrain from testing dfGUI until we are sure all the major bugs from the client are worked out. As it is now, I am unable to distinguish true bugs from bugs caused (inadvertently of course) by dfGUI. It will be to everyone's benefit if we ensure the client is fully stable and robust to abuse before dfGUI can be made the same. Please do not report any bugs in the beta if you are using dfGUI. Please DO report bugs otherwise.

    There may still be some situations which result in an 'unable to find previous generation' type message, so let us know if this occurs. Also, when reporting a suspected bug, please post your full filelist.txt (x out your handle if you wish), your full error.log (at least the relevant part up until the end of the file) and as clear a description of the problem as possible. Just as important, please note all flags you have used in the foldit script, if any, as these often play an important role in the bugs (since I can't possibly test all combinations of flags).

    Thanks for your continued co-operation and I suspect this may well be the last change to the algorithm before we try it on a much larger scale (i.e. release it).
    Howard Feldman

  2. #2
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    my beta machines aren't running dfGUI. The machine that has the corrupted copy of the beta client is running win98se with the critical update program from MS - and after being informed that there was a new update from MS, I may have installed a "critical update" in the middle of the beta 7 test. It passed through beta 1-6 with no problems.
    my 5 week MCSE class on Exchange 2000 is over tonight.. so I'll finally get to take a look at the corrupted directory and see if I can help isolate what is corrupted.
    I mention this again since rsbriggs described something remotely similar. Perhaps others will also try testing whether beta client 8 has problems with MS products or critical updates being installed in the background?

  3. #3
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184
    I tried copying the beta8 files into the DF directories without removing the beta7 work. It didn't work, so I had to remove all work files.

    What I did and the errors I got were exactly the same as last time. See my first post in the beta7 thread for details.

  4. #4
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Linux version came right up & ran fast. (See cat,scalded.)

    Windows text client on W98, when run w/o -qt is so slow, it's painful to watch. So I stopped watching. It spends so much time updating the top line, that drawing the ASCII art takes forever. Running now under -qt (sans dfGUI) and it's MUCH faster.

  5. #5
    Senior Member
    Join Date
    Mar 2002
    Location
    MI, U.S.
    Posts
    697
    Hmm... that "calculating energy" phase still takes ~50% CPU in kernel time...

    I guess I just have to bug people some more until I hear "yeah, that happens to me too" enough.

    Anyway, yeah, other than that, beta 8 seems to be working fine. I did notice something with beta 7 today, though -- I'd been running it since it was released, nonstop, and it somehow managed to consume over 120MB of swap. Which caused issues when I was running Rune (an Unreal Tournament engine video game; yes, I have the Linux version ) on the same machine -- Rune kept getting killed by the kernel out-of-memory handler to free up RAM and/or swap space. Restarting the beta 7 client fixed the problem -- after I did that, the swap that it was using dropped to like 4MB. The RSS size, both before and after restarting, was about the same (~80MB, and yes, I'm running with -rt).

    It almost acted like it was a very small memory leak (though I could be completely wrong here...), where a few bytes every iteration got left allocated, and then got swapped out when enough of them accumulated. But maybe not -- and maybe beta 8 fixes it, we'll see. Anyway, if I don't say anything else about this problem, consider it fixed.

  6. #6
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Hey Howard, don't forget to clear the stats so's we can see how it's running!

  7. #7
    I'm seeing a strange bug now with beta8. I think I have seen this before but never really investigated it and someone else on my team also reported the same problem but he was running dfGUI at the time so we weren't sure if it was my bug or yours.

    Since I am not running dfGUI I guess you win.

    I have started 3 beta8 clients on XP, all running as services, all with the following service.cfg:
    service=2
    useram=1

    (Two are service=1 and one is service=2)

    It seems that the DF client is only updating progress.txt and filelist.txt at the beginning of each generation and never in between for two of those clients. The third client is working fine. The other strange thing is that filelist.txt is sometimes showing 2 generations and sometimes 1, but it is always reporting in progress.txt that it has 0 generations buffered. These are all files from the same client.

    progress.txt:
    Building structure 1 generation 50
    49 until next generation
    0 generations buffered
    Best Energy so far: 10000000.000

    filelist.txt:
    .\fold_0_XXXX_0_XXXX_protein_49.log.bz2
    .\XXXX_0_XXXX_protein_49_0000042.val
    CurrentStruc 0 1 126 50 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 1.000 1.800 380.218 ---HHHH---------------HHHHHHHH---HHHHHHHH-----E--------EHHHH-----------------------HHHHH--------
    7bdb29f982c7f6a349f6d1194b54da81

    =========================================

    progress.txt:
    Building structure 1 generation 51
    49 until next generation
    0 generations buffered
    Best Energy so far: 10000000.000

    filelist.txt:
    .\fold_0_XXXX_0_XXXX_protein_50.log.bz2
    .\XXXX_0_XXXX_protein_50_0000046.val
    .\fold_0_XXXX_0_XXXX_protein_51.log.bz2
    .\XXXX_0_XXXX_protein_51_0000050.val
    CurrentStruc 0 51 126 51 1 50 8.069 -2621.499 -979.139 -1739.432 156604336.000 1.150 2.100 578.264 ---HHHH---------------HHHHHHHH---HHHHHHHH-----E--------EHHHH-----------------------HHHHH--------
    7d947bc5e58ccfb529bce7ce2bf137e0

    =========================================

    progress.txt:
    Building structure 1 generation 52
    49 until next generation
    0 generations buffered
    Best Energy so far: 10000000.000

    filelist.txt:
    .\fold_0_XXXX_0_XXXX_protein_51.log.bz2
    .\XXXX_0_XXXX_protein_51_0000050.val
    CurrentStruc 0 1 126 52 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 1.100 2.000 502.838 ---HHHH---------------HHHHHHHH---HHHHHHHH-----E--------EHHHH-----------------------HHHHH--------
    20aa279398fe6e337c425053f72b518c

    =========================================

    The entire time throughout the generation those files never change, only when a new generation starts.
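
    For anyone who wants to watch these files from a script, here is a minimal Python sketch that parses the four-line progress.txt layout quoted above. It assumes the file always has exactly these four fields, which is inferred only from the samples in this post; the function name parse_progress is made up for illustration and is not part of the client.

```python
import re

# Field patterns inferred from the progress.txt samples in this thread
# (an assumption, not a documented format).
PATTERNS = {
    "structure": r"Building structure (\d+) generation \d+",
    "generation": r"Building structure \d+ generation (\d+)",
    "remaining": r"(-?\d+) until next generation",
    "buffered": r"(\d+) generations buffered",
    "best_energy": r"Best Energy so far: (-?\d+(?:\.\d+)?)",
}


def parse_progress(text):
    """Return the progress.txt fields as a dict of numbers."""
    result = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"missing field: {key}")
        value = match.group(1)
        # Best energy is a decimal; everything else is an integer counter.
        result[key] = float(value) if key == "best_energy" else int(value)
    return result


if __name__ == "__main__":
    sample = (
        "Building structure 1 generation 50\n"
        "49 until next generation\n"
        "0 generations buffered\n"
        "Best Energy so far: 10000000.000\n"
    )
    print(parse_progress(sample))
```

    Calling this every minute or so and comparing the "structure" value between calls would show whether the file is really only being rewritten at the start of a generation.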

    I have stopped one client, re-started it again with -g 1 and it now seems to be working fine. I haven't touched the other one yet in case you want me to do something with it.

    Jeff.
    Last edited by Digital Parasite; 05-01-2003 at 10:50 AM.

  8. #8
    Social Parasite
    Join Date
    Jul 2002
    Location
    Hill Country
    Posts
    94
    I interrupted beta 7 by typing 'Q'. I then typed 'foldit -u t'. The client refused to upload, saying "Missing previous something-or-other".

    So I deleted all the files from the previous generations. Then I could run beta 8.

  9. #9
    Ol' retired IT geezer
    Join Date
    Feb 2003
    Location
    Scarborough
    Posts
    92

    Unhappy Linux Text Client

    When I saw the Beta 8 notification, I retrieved and started two clients... One under Win'98 SE and one under Mandrake Linux. The windows client appears to be working and reporting fine. The Linux client appears to be working fine, BUT does not appear to be reporting since its Best Energy value is not being reported. Could I have a back level version? Its timestamp is 04/19/03 05:11 pm. (In both cases, I started with clean folders)

    Ned

  10. #10
    Ol' retired IT geezer
    Join Date
    Feb 2003
    Location
    Scarborough
    Posts
    92

    Unhappy Linux Client...

    Just redid it... April 30 timestamp now ... grrrr...

    Wasted time on an oversight...

    Ned

  11. #11
    Social Parasite
    Join Date
    Jul 2002
    Location
    Hill Country
    Posts
    94
    Having a hard time grasping how much "output" has been queued up, when the path to the server is not available:

    On the DF screen the "X gen. buffered" value appears to step when the .val file for the current generation is first written. (Meaning that after an upload, it quickly says "1 gen buffered" even though the actual upload of that gen would not happen until the final structure of that gen has been built.)

    And the filelist.txt file appears to start with the entry for the generation that was the __last__ to have been previously uploaded. (In other words, it contains the name of one more .bz2 file than there is actually on my hard disk.)

  12. #12
    Originally posted by Mikus
    Having a hard time grasping how much "output" has been queued up, when the path to the server is not available:

    On the DF screen the "X gen. buffered" value appears to step when the .val file for the current generation is first written. (Meaning that after an upload, it quickly says "1 gen buffered" even though the actual upload of that gen would not happen until the final structure of that gen has been built.)

    And the filelist.txt file appears to start with the entry for the generation that was the __last__ to have been previously uploaded. (In other words, it contains the name of one more .bz2 file than there is actually on my hard disk.)
    This is not new, but the format of the filelist.txt is more complicated than in the non-beta. But you shouldn't have to worry about it unless you are trying to write a front-end or something in which case you should just e-mail me for details on how it works.

    This added complication is part of the reason for all the related bugs when switching proteins etc. but I've just about got it straight I think.
    Howard Feldman

  13. #13
    Brian

    I have been running the beta on my W2K Server since beta 4 without any issues until now.

    Last night I downloaded beta 8 and started it. Several hours later I noticed it had an error message and stopped running. I restarted it and it began at gen 0 doing the 10000 initial structures. This morning I saw that it stopped again with the same error message. I checked the error log but there was nothing. I deleted the error log, filelist.txt and the BZ2 and VAL files and restarted it. It once again began with the initial 10000 structures. It stopped again. And again there is nothing in the error log. The error message is:

    The instruction at "0x0044937f" referenced memory at "0x0000009a". The memory could not be "read".

    Click on OK to terminate the program.

    Click on Cancel to debug the program.


    As far as I know, I have not changed anything on the box, other than changing the beta client.

    foldtrajlite.exe, protein.trj and readme.txt are all dated 4/30/2003.

    G

  14. #14
    Registered User
    Join Date
    Mar 2003
    Location
    The Netherlands
    Posts
    12
    Originally posted by Georgina
    Brian

    I have been running the beta on my W2K Server since beta 4 without any issues until now.

    Last night I downloaded beta 8 and started it. Several hours later I noticed it had an error message and stopped running. I restarted it and it began at gen 0 doing the 10000 initial structures. This morning I saw that it stopped again with the same error message. I checked the error log but there was nothing. I deleted the error log, filelist.txt and the BZ2 and VAL files and restarted it. It once again began with the initial 10000 structures. It stopped again. And again there is nothing in the error log. The error message is:

    The instruction at "0x0044937f" referenced memory at "0x0000009a". The memory could not be "read".

    Click on OK to terminate the program.

    Click on Cancel to debug the program.


    As far as I know, I have not changed anything on the box, other than changing the beta client.

    foldtrajlite.exe, protein.trj and readme.txt are all dated 4/30/2003.

    G
    Maybe you can try to run a program like memtest86, it may be your memory that is bad.

  15. #15
    I would agree here, it is likely a RAM issue resulting in random crashing.
    Howard Feldman

  16. #16
    OK

    I have another stick of DDR ram available. I'll change it and see what happens.

    G

  17. #17
    I'm Back

    Ok I have been having system crashes so this could be related to that...

    Here's what happened...

    After a crash I opened foldit (not using dfGUI, as requested) and encountered this error in the DOS window:

    [NULL_Caption] FATAL ERROR: [000.000] Upload list has been tampered with, please delete filelist.txt and try again
    Hit Return


    I did not touch the filelist! The filelist is now blank (no text whatsoever)...

    Here is a jpeg of the DF folder===

    http://www.transload.net/~slotype/TE...ror-folder.jpg

    Here is the error log... Note I have my internet connection off a lot till I get a router for this machine... I do not have the internet switch -i f in the foldit script===

    http://www.transload.net/~slotype/TESTS/error.log


    I started another client but would like to know if there is a way to rejuvenate this one? It would be good to know now, in case this happens again in the future...

    As always Thanks,
    Slo...

  18. #18
    Try deleting the physical file "filelist.txt" and see if that will let that copy of the client restart. During a crash or hang you may get what is known as a control character in the file. Its byte value has no printable ASCII representation, so it does not show up when you try to view the text file.

    It also could be that since "filelist.txt" exists but is blank, the beta client thinks there should be some info in there.

    So delete the file and try again.
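
    If you want to check for stray control characters before deleting the file, a small Python sketch along these lines would do it. It only looks at raw bytes and knows nothing about how the client validates filelist.txt; the function name find_control_chars is made up for illustration.

```python
def find_control_chars(data: bytes):
    """Return (offset, byte value) pairs for bytes a text viewer would not display."""
    # Tab, line feed and carriage return are normal in a text file.
    allowed = {0x09, 0x0A, 0x0D}
    return [
        (offset, value)
        for offset, value in enumerate(data)
        if (value < 0x20 and value not in allowed) or value == 0x7F
    ]


if __name__ == "__main__":
    import sys

    # Usage: python check_file.py filelist.txt
    with open(sys.argv[1], "rb") as handle:
        contents = handle.read()
    if not contents:
        print("file is empty")
    for offset, value in find_control_chars(contents):
        print(f"offset {offset}: byte 0x{value:02x}")
```

    An empty result on a non-empty file would suggest the problem lies elsewhere, for example in the checksum line at the end of the file.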

  19. #19

    Strange reboot problems

    I have been having some strange random reboot problems on one of my machines running DF lately. It seems to reboot on average once every day or two and it is never at the same time. I don't recall installing anything or changing any of my settings except running the latest DF beta. Since I haven't had this problem with any of the other DF betas, I wasn't blaming it on DF except that the last two times my machine has rebooted I have a strange progress.txt left over showing it is building structure 51 with -1 remaining:

    progress.txt:
    Building structure 51 generation 171
    -1 until next generation
    1 generations buffered
    Best Energy so far: 6.819


    filelist.txt:
    .\fold_0_XXXX_17_XXXX_protein_170.log.bz2
    .\XXXX_0_XXXX_protein_170_0000018.val
    .\fold_0_XXXX_39_XXXX_protein_171.log.bz2
    .\XXXX_0_XXXX_protein_171_0000040.val
    CurrentStruc 0 51 126 171 1 40 6.819 -2395.139 -271.886 -1373.846 110311368.000 1.200 2.200 665.002 -HHHHHHH----------HHHHH------------HHHHH---------HHHH---HHHH-------------------------HHHH-------
    213ae0f895b56680f4934196d7cb8cf3


    error.log:

    ========================[ Apr 30, 2003 5:58 PM ]========================

    ========================[ Apr 30, 2003 6:00 PM ]========================

    ========================[ Apr 30, 2003 8:43 PM ]========================

    ========================[ May 1, 2003 4:40 PM ]========================

    ========================[ May 1, 2003 4:51 PM ]========================

    ========================[ May 1, 2003 5:41 PM ]========================
    ERROR: [000.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to beta.distributedfolding.org:80 (Unknown) {errno=No such file or directory}
    ERROR: [000.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to beta.distributedfolding.org:80 failed: Unknown

    ========================[ May 2, 2003 2:30 PM ]========================

    ========================[ May 2, 2003 9:33 PM ]========================

    ========================[ May 2, 2003 11:20 PM ]========================

    ========================[ May 3, 2003 6:40 PM ]========================

    ========================[ May 4, 2003 9:25 AM ]========================

    ========================[ May 4, 2003 9:33 AM ]========================

    ========================[ May 4, 2003 9:37 AM ]========================

    ========================[ May 5, 2003 1:32 AM ]========================

    ========================[ May 5, 2003 7:12 AM ]========================


    After a little while the progress.txt file seems to work itself out as it goes on to the next generation:

    Building structure 2 generation 172
    48 until next generation
    1 generations buffered
    Best Energy so far: 6.852
    Last edited by Digital Parasite; 05-05-2003 at 07:30 AM.

  20. #20
    Originally posted by Slo
    I'm Back

    http://www.transload.net/~slotype/TESTS/error.log


    I started another client but would like to Know if there is a way to rejuvenate this one?? Would be good to Know now and if this happens in the future...

    As always Thanks,
    Slo...
    The BDRemove errors in your error.log indicate with 99.9999% certainty that your machine has faulty RAM. Please run www.memtest86.com or a similar program to verify this.
    Howard Feldman

  21. #21

    Re: Strange reboot problems

    Originally posted by Digital Parasite
    I have been having some strange random reboot problems on one of my machines running DF lately. It seems to reboot on average once every day or two and it is never at the same time. I don't recall installing anything or changing any of my settings except running the latest DF beta. Since I haven't had this problem with any of the other DF betas, I wasn't blaming it on DF except that the last two times my machine has rebooted I have a strange progress.txt left over showing it is building structure 51 with -1 remaining:

    progress.txt:
    Building structure 51 generation 171
    -1 until next generation
    1 generations buffered
    Best Energy so far: 6.819
    While I doubt it has anything to do with your machine rebooting, the above MAY be a bug; I will look into how it could possibly get to 51 there. It is possible this machine has defective RAM too, of course.
    Howard Feldman

  22. #22
    Senior Member
    Join Date
    Mar 2002
    Location
    MI, U.S.
    Posts
    697
    Originally posted by bwkaz
    I did notice something with beta 7 today, though -- I'd been running it since it was released, nonstop, and it somehow managed to consume over 120MB of swap. Which caused issues when I was running Rune (an Unreal Tournament engine video game; yes, I have the Linux version ) on the same machine -- Rune kept getting killed by the kernel out-of-memory handler to free up RAM and/or swap space. Restarting the beta 7 client fixed the problem -- after I did that, the swap that it was using dropped to like 4MB. The RSS size, both before and after restarting, was about the same (~80MB, and yes, I'm running with -rt).

    It almost acted like it was a very small memory leak (though I could be completely wrong here...), where a few bytes every iteration got left allocated, and then got swapped out when enough of them accumulated. But maybe not -- and maybe beta 8 fixes it, we'll see. Anyway, if I don't say anything else about this problem, consider it fixed.
    Still seeing this with beta 8, it happened today. Running Rune again (which, BTW, I just noticed, uses about 250MB of swap! Probably caches the entire level in swap or something), and it just suddenly got killed. DF was taking 80MB of RAM and 100MB of swap again; removing the lock file and restarting it reduced this to 80MB/4MB.

    Any ideas, Howard? Anything else you want me to look at?

    Oh, BTW, yes, this is Linux.

  23. #23

    Re: Re: Strange reboot problems

    Originally posted by Brian the Fist
    While I doubt it has anything to do with your machine rebooting, the above MAY be a bug; I will look into how it could possibly get to 51 there. It is possible this machine has defective RAM too, of course.
    I had checked my RAM when I first got the machine about a month ago but I will re-run the test again just to make sure.

    I know MS recently released a patch for the kernel to protect against some buffer overflows, it is possible that if a certain error happens the kernel might panic and cause a reboot now from stuff they changed (long shot, but just trying to think of what might be the cause).

    Jeff.

  24. #24
    Senior Member
    Join Date
    Jan 2003
    Location
    North Carolina
    Posts
    184

    Re: Re: Strange reboot problems

    Originally posted by Brian the Fist
    While I doubt it has anything to do with your machine rebooting, the above MAY be a bug; I will look into how it could possibly get to 51 there. It is possible this machine has defective RAM too, of course.
    I've seen the progress.txt say 51. It happened when I stopped the client during the minimize, then restarted it. I didn't see that this caused any problem, though.

    My guess is that the client was doing the minimize when the crash happened. I suspect that the system might be stressed slightly more during this step, so if a system was right on the very edge of being stable, it might get pushed over the edge by the minimize.

    I agree that running memtest86 is a good idea.

  25. #25
    Originally posted by bwkaz
    Still seeing this with beta 8, it happened today. Running Rune again (which, BTW, I just noticed, uses about 250MB of swap! Probably caches the entire level in swap or something), and it just suddenly got killed. DF was taking 80MB of RAM and 100MB of swap again; removing the lock file and restarting it reduced this to 80MB/4MB.

    Any ideas, Howard? Anything else you want me to look at?

    Oh, BTW, yes, this is Linux.
    How, exactly are you coming up with these numbers? top? Can I see a screenshot/screenscrape of your top when this happens?
    Howard Feldman

  26. #26

    Re: Re: Re: Strange reboot problems

    Originally posted by AMD_is_logical
    I've seen the progress.txt say 51. It happened when I stopped the client during the minimize, then restarted it. I didn't see that this caused any problem, though.

    My guess is that the client was doing the minimize when the crash happened. I suspect that the system might be stressed slightly more during this step, so if a system was right on the very edge of being stable, it might get pushed over the edge by the minimize.

    I agree that running memtest86 is a good idea.
    Yes, the 51 is 'normal' (I'll fix it though) if you start/stop during minimization. And yes, the minimizations use very different operations than the rest of the program so it could, for example, be an FPU problem or something that wouldn't show up during the rest of the program (maybe). Anyhow, if no one else has this problem, I'm assuming the problem is not in the code...
    Howard Feldman

  27. #27
    Senior Member
    Join Date
    Mar 2002
    Location
    MI, U.S.
    Posts
    697
    gkrellm2 and free both report fairly massive swap usage. Even more when Rune is loading (obviously that's not something you can look into, though ). pmap also showed a huge pool of memory being used by foldtrajlite (pmap runs through the /proc/<pid>/maps file and parses it to be more easily readable; source is available here if you think that might help).

    I've since restarted the client, so I can't post screenshots or whatever, but I think I've still got the pmap output stored in the Eterm buffer... hang on.

    08048000 (4260 KB) r-xp (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
    08471000 (1240 KB) rw-p (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
    085a7000 (51484 KB) rwxp (00:00 0)
    40000000 (84 KB) r-xp (03:05 214953) /lib/ld-2.2.5.so
    40015000 (4 KB) rw-p (03:05 214953) /lib/ld-2.2.5.so
    40016000 (4 KB) rw-p (00:00 0)
    40017000 (12 KB) r-xp (03:05 214948) /lib/libnss_dns-2.2.5.so
    4001a000 (4 KB) rw-p (03:05 214948) /lib/libnss_dns-2.2.5.so
    40022000 (4 KB) rw-p (00:00 0)
    40023000 (136 KB) r-xp (03:05 214943) /lib/libm-2.2.5.so
    40045000 (4 KB) rw-p (03:05 214943) /lib/libm-2.2.5.so
    40046000 (60 KB) r-xp (03:05 212322) /lib/libpthread-0.9.so
    40055000 (28 KB) rw-p (03:05 212322) /lib/libpthread-0.9.so
    4005c000 (236 KB) r-xp (03:05 212708) /lib/libncurses.so.5.2
    40097000 (36 KB) rw-p (03:05 212708) /lib/libncurses.so.5.2
    400a0000 (12 KB) rw-p (00:00 0)
    400a3000 (1148 KB) r-xp (03:05 214954) /lib/libc-2.2.5.so
    401c2000 (24 KB) rw-p (03:05 214954) /lib/libc-2.2.5.so
    401c8000 (16 KB) rw-p (00:00 0)
    401cc000 (20 KB) r-xp (03:05 389592) /usr/lib/libgpm.so.1.18.0
    401d1000 (4 KB) rw-p (03:05 389592) /usr/lib/libgpm.so.1.18.0
    401d2000 (36 KB) r-xp (03:05 213844) /lib/libnss_files-2.2.5.so
    401db000 (4 KB) rw-p (03:05 213844) /lib/libnss_files-2.2.5.so
    401e8000 (60 KB) r-xp (03:05 214947) /lib/libresolv-2.2.5.so
    401f7000 (4 KB) rw-p (03:05 214947) /lib/libresolv-2.2.5.so
    401f8000 (8 KB) rw-p (00:00 0)
    40200000 (2280 KB) rw-p (00:00 0)
    404a7000 (75408 KB) rw-p (00:00 0)
    44f00000 (200 KB) rw-p (00:00 0)
    44f32000 (824 KB) ---p (00:00 0)
    bff85000 (492 KB) rwxp (00:00 0)
    mapped: 138136 KB writable/private: 131260 KB shared: 0 KB
    And right after restarting:

    08048000 (4260 KB) r-xp (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
    08471000 (1240 KB) rw-p (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
    085a7000 (17932 KB) rwxp (00:00 0)
    40000000 (84 KB) r-xp (03:05 214953) /lib/ld-2.2.5.so
    40015000 (4 KB) rw-p (03:05 214953) /lib/ld-2.2.5.so
    40016000 (4 KB) rw-p (00:00 0)
    40017000 (12 KB) r-xp (03:05 214948) /lib/libnss_dns-2.2.5.so
    4001a000 (4 KB) rw-p (03:05 214948) /lib/libnss_dns-2.2.5.so
    40022000 (4 KB) rw-p (00:00 0)
    40023000 (136 KB) r-xp (03:05 214943) /lib/libm-2.2.5.so
    40045000 (4 KB) rw-p (03:05 214943) /lib/libm-2.2.5.so
    40046000 (60 KB) r-xp (03:05 212322) /lib/libpthread-0.9.so
    40055000 (28 KB) rw-p (03:05 212322) /lib/libpthread-0.9.so
    4005c000 (236 KB) r-xp (03:05 212708) /lib/libncurses.so.5.2
    40097000 (36 KB) rw-p (03:05 212708) /lib/libncurses.so.5.2
    400a0000 (12 KB) rw-p (00:00 0)
    400a3000 (1148 KB) r-xp (03:05 214954) /lib/libc-2.2.5.so
    401c2000 (24 KB) rw-p (03:05 214954) /lib/libc-2.2.5.so
    401c8000 (16 KB) rw-p (00:00 0)
    401cc000 (20 KB) r-xp (03:05 389592) /usr/lib/libgpm.so.1.18.0
    401d1000 (4 KB) rw-p (03:05 389592) /usr/lib/libgpm.so.1.18.0
    401d2000 (36 KB) r-xp (03:05 213844) /lib/libnss_files-2.2.5.so
    401db000 (4 KB) rw-p (03:05 213844) /lib/libnss_files-2.2.5.so
    401e8000 (60 KB) r-xp (03:05 214947) /lib/libresolv-2.2.5.so
    401f7000 (4 KB) rw-p (03:05 214947) /lib/libresolv-2.2.5.so
    401f8000 (2520 KB) rw-p (00:00 0)
    404a7000 (74152 KB) rw-p (00:00 0)
    bff85000 (492 KB) rwxp (00:00 0)
    mapped: 102536 KB writable/private: 96484 KB shared: 0 KB
    The important part is the first large allocation (with permissions rwxp) -- before I restarted the thing, it had a 50MB chunk, and right after a restart, that shrank to 17MB. It's at 19MB right now. Then again, right now, discounting cache, I've got 100MB of physical RAM free, so it's not likely to cause any problems for a while.
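
    To make that kind of before/after comparison less error-prone, here is a rough Python sketch that parses two snapshots in the layout pasted above and lists the mappings whose size changed. The line format is assumed from the output in this post only (other pmap versions print different columns), and the names parse_pmap and changed_sizes are made up for illustration.

```python
import re

# One mapping per line: start address, size in KB, permissions,
# (device inode), and an optional path. Layout assumed from this post.
LINE = re.compile(r"^([0-9a-f]+) \((\d+) KB\) (\S{4}) \(\S+ \d+\)\s*(.*)$")


def parse_pmap(text):
    """Return {start_address: (size_kb, perms, path)} for each mapping line."""
    mappings = {}
    for raw in text.splitlines():
        match = LINE.match(raw.strip())
        if match:  # the trailing "mapped: ..." summary line does not match
            start, size_kb, perms, path = match.groups()
            mappings[start] = (int(size_kb), perms, path)
    return mappings


def changed_sizes(before, after):
    """Return {start: (before_kb, after_kb)} for mappings in both snapshots whose size differs."""
    return {
        start: (before[start][0], after[start][0])
        for start in before
        if start in after and before[start][0] != after[start][0]
    }


if __name__ == "__main__":
    before = parse_pmap("085a7000 (51484 KB) rwxp (00:00 0)")
    after = parse_pmap("085a7000 (17932 KB) rwxp (00:00 0)")
    for start, (old_kb, new_kb) in changed_sizes(before, after).items():
        print(f"{start}: {old_kb} KB -> {new_kb} KB")
```

    Fed the two full snapshots above, this should single out 085a7000 (51484 KB to 17932 KB) along with 401f8000 and 404a7000, which also changed size across the restart.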

    I don't use top, though. Don't like it -- ps is much better IMHO.

    And note also that this isn't as bad as it had been last time I complained (right at the end of beta 7). There, pmap was showing a writable/private value of near 200MB, with (again) only 80 or so of it in physical RAM (according to the swap usage reported by free, with almost nothing else running -- and definitely not X).

    Edit: Hang on... why does that chunk have execute permission? That doesn't make any sense... I wonder if this is a bug in the system libraries, not DF... Hmm...
    Last edited by bwkaz; 05-05-2003 at 06:39 PM.

  28. #28

    Re: Re: Re: Re: Strange reboot problems

    Originally posted by Brian the Fist
    Yes, the 51 is 'normal' (I'll fix it though) if you start/stop during minimization. And yes, the minimizations use very different operations than the rest of the program so it could, for example, be an FPU problem or something that wouldn't show up during the rest of the program (maybe). Anyhow, if no one else has this problem, I'm assuming the problem is not in the code...
    Another one of my DF machines running XP just spontaneously rebooted. That one is a totally different beast having a different brand of processor, different MB, different type of RAM. It was in the middle of a generation.

    I have a feeling it might be that security patch that MS made available. Hopefully they will have a fix for the fix soon.

    Jeff.

  29. #29
    I just finished doing a full memtest86 and a Prime95 torture test and both passed with flying colours so it is not my RAM that is bad (which I was pretty sure since I had just done that a month ago).

    My guess is the new patch that MS recently released.

    Jeff.

  30. #30
    Interesting... one of my foldtrajlite.com clients just crashed. It was installed as a service. Nothing in error.log. I was able to capture this message from my VS.NET debugger when it crashed.

    service.cfg :
    service=1
    useram=1
    progress=1

    progress.txt :
    Building structure 31 generation 193
    19 until next generation
    1 generations buffered
    Best Energy so far: 7.152

    filelist.txt :
    .\fold_0_XXXX_40_XXXX_protein_192.log.bz2
    .\XXXX_0_XXXX_protein_192_0000041.val
    fold_0_XXXX_9_XXXX_protein_193.log.bz2
    XXXX_0_XXXX_protein_193_0000010.val
    CurrentStruc 0 31 126 193 1 10 7.152 -891.114 1010.339 -91.804 7071383.500 1.650 3.100 2339.397 -HHHHHHH----------HHHHH------------HHHHH---------HHHH---HHHH-------------------------HHHH-------
    51bc41daef8ca892be58f1b839019672


    We have ruled out the RAM being a problem. I have no idea what this crash is, first time I have ever seen it.

    Jeff.

    Here is the assembler dump if you can read it (the bold line is the one it crashed on):

    0044E870 push ebx
    0044E871 push esi
    0044E872 mov esi,dword ptr [esp+0Ch]
    0044E876 test esi,esi
    0044E878 push edi
    0044E879 je 0044E9B8
    0044E87F mov ebx,dword ptr [esp+14h]
    0044E883 movsx edi,bx
    0044E886 lea ecx,[esi+4]
    0044E889 mov dword ptr ds:[7214C8h],ecx
    0044E88F fld dword ptr [ecx+edi*4]
    0044E892 fcomp dword ptr [edi*4+7214B8h]
    0044E899 fnstsw ax
    0044E89B test ah,41h
    0044E89E jne 0044E8BD
    0044E8A0 mov eax,dword ptr [esi+edi*8+10h]
    0044E8A4 test eax,eax
    0044E8A6 mov dword ptr ds:[007214E4h],eax
    0044E8AB je 0044E8BD
    0044E8AD push ebx
    0044E8AE push eax
    0044E8AF call 0044E870
    0044E8B4 mov ecx,dword ptr ds:[7214C8h]
    0044E8BA add esp,8
    0044E8BD cmp bx,2
    0044E8C1 jge 0044E8D5
    0044E8C3 lea eax,[ebx+1]
    0044E8C6 push eax
    0044E8C7 push esi
    0044E8C8 call 0044E870
    0044E8CD add esp,8
    0044E8D0 jmp 0044E98F
    0044E8D5 fld dword ptr [ecx]
    0044E8D7 fcomp dword ptr ds:[7214D8h]
    0044E8DD fnstsw ax
    0044E8DF test ah,1
    0044E8E2 je 0044E995

  31. #31
    Originally posted by bwkaz

    And note also that this isn't as bad as it had been last time I complained (right at the end of beta 7). There, pmap was showing a writable/private value of near 200MB, with (again) only 80 or so of it in physical RAM (according to the swap usage reported by free, with almost nothing else running -- and definitely not X).


    Edit: Hang on... why does that chunk have execute permission? That doesn't make any sense... I wonder if this is a bug in the system libraries, not DF... Hmm...
    Sorry dude, but it's all Greek to me, as they say. I don't know pmap or the file format above, though I can vaguely guess what some of the columns are, and I have no idea what writable/private is. Since I do know top, if you want me to fix this, please make the problem occur again and send me the output from top. Also, please post the exact flags that are being used in your foldit script.
    Howard Feldman

  32. #32
    Originally posted by Digital Parasite
    Interesting... one of my foldtrajlite.com clients just crashed. It was installed as a service. Nothing in error.log. I was able to capture this message from my VS.NET debugger when it crashed.
    Alas, without any symbols it is fairly hopeless. You seem to be the only one out of the 50 or so testers who is having this trouble, though, so I'm still not sure what's going on. I do not suspect it has to do with any Microsoft patches, as I'm sure other people keep their computers up to date too.

    I could try sending you a debug version, but it still might not give you the symbols. Alternatively, I could give you an ErrorLogPrintf-riddled version of the code to track where the code is at all times, but this is generally painful and a last resort.
    Howard Feldman

  33. #33
    Originally posted by Brian the Fist
    Alas, without any symbols it is fairly hopeless. You seem to be the only one out of the 50 or so testers who is having this trouble, though, so I'm still not sure what's going on. I do not suspect it has to do with any Microsoft patches, as I'm sure other people keep their computers up to date too.

    I could try sending you a debug version, but it still might not give you the symbols. Alternatively, I could give you an ErrorLogPrintf-riddled version of the code to track where the code is at all times, but this is generally painful and a last resort.
    The actual crash was the first time it had happened to me, but if you want to send me a debug version I will run that, in case the debugger prints out the symbols so you can see where it happened.

    Jeff.

  34. #34
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    Originally posted by Brian the Fist
    You seem to be the only one out of the 50 or so testers who is having this trouble, though, so I'm still not sure what's going on. I do not suspect it has to do with any Microsoft patches, as I'm sure other people keep their computers up to date too.
    The problem with my axp1800+ system running the beta clients (the one that ended up corrupting itself) happened after I'd loaded one of the latest critical updates from MS for Win98. That was the first time I'd loaded a critical update since starting the beta clients with beta 1. So there's a possibility that the critical update process is causing, or has caused, problems.

  35. #35
    Senior Member
    Join Date
    Mar 2002
    Location
    MI, U.S.
    Posts
    697
    Originally posted by Brian the Fist
    Sorry dude, but it's all Greek to me, as they say. I don't know pmap or the file format above, though I can vaguely guess what some of the columns are, and I have no idea what writable/private is. Since I do know top, if you want me to fix this, please make the problem occur again and send me the output from top. Also, please post the exact flags that are being used in your foldit script.
    OK, will do if I see it again (and I just rebooted the whole system today, so it probably won't happen until sometime near the end of next week). The flags, for the moment, are just "-rt -g 5".

    Thanks!

  36. #36

    StatsMan DF beta stats available

    Initially only updated every 12 hours, but here they are:

    http://www.statsman.org/distfoldbetastats
    or
    http://www.statsman.org/distfoldbetastats/html

    Enjoy!
    StatsMan

  37. #37
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Thanks, STATSMAN! Great work as always!

  38. #38
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    I am with the "Microsoft stuffed something" camp. Their AMD updates tend to bugger things up big time; last time I had to uninstall them to get a couple of my PCs to run again. I have not loaded patches for 2 months and will not in the future unless it is desperately needed. If you have an AMD system, don't get the AMD CPU updates. They are very nasty.

    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  39. #39
    For testing, I stopped my DF clients on the machine that has been randomly rebooting. Just as I expected, the machine still reboots, so we know it isn't the DF client.

    I found this interesting article about the MS patch I have been talking about:
    http://support.microsoft.com/?kbid=819634

    It especially acts up if you have anti-virus software (and who doesn't these days?). I'm going to try uninstalling the patch and see if it makes a difference, since it does seem to be messing people's systems up to a certain extent.

    Jeff.

  40. #40
    Ok, so is it basically safe for me to ignore all the 'rebooting'/crashing problems mentioned in this thread, and we shall attribute them to Micro$oft? No one has had any trouble on LINUX other than the alleged memory leak?
    Howard Feldman
