Beta 8 - release candidate
Brian the Fist
04-30-2003, 04:58 PM
The next (and possibly final) beta is available at the usual location (see start of beta 7/6/5 threads). I have adjusted the algorithm one more time and have fixed some bugs with file uploading/tracking. To install, as usual, either overwrite an existing beta version, or download the 'full' client from the normal web site and unpack the beta overtop of that.
I would request that while beta testing you do NOT use dfGUI, please. Beta testing two pieces of related software simultaneously is just asking for trouble. I'd request that you refrain from testing dfGUI until we are sure all the major bugs in the client are worked out. As it is now, I am unable to distinguish true bugs from bugs caused (inadvertently, of course) by dfGUI. It will be to everyone's benefit if we ensure the client is fully stable and robust to abuse before dfGUI can be made the same. Please do not report any bugs in the beta if you are using dfGUI. Please DO report bugs otherwise.
There may still be some situations that result in an 'unable to find previous generation' type message, so let us know if this occurs. Also, when reporting a suspected bug, please post your full filelist.txt (x out your handle if you wish), your full error.log (at least the relevant part, up to the end of the file) and as clear a description of the problem as possible. Also important: please note all flags you have used in the foldit script, if any, as these often play an important role in the bugs (since I can't possibly test all combinations of flags).
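A minimal sketch of the two chores requested above, assuming that the timestamp separator lines seen in the error.log posted later in this thread mark the start of each run. The function names are hypothetical and not part of the client:

```python
import re

# Timestamp separators seen in error.log, e.g.
# "========================[ Apr 30, 2003 5:58 PM ]========================"
HEADER = re.compile(r"^=+\[ .+ \]=+$")


def last_log_sections(error_log: str, count: int = 3) -> str:
    """Return the last `count` timestamped sections of error.log, up to the end of the file."""
    lines = error_log.splitlines()
    starts = [i for i, line in enumerate(lines) if HEADER.match(line.strip())]
    if not starts:
        return error_log  # no separators found: send the whole file
    first = starts[-min(count, len(starts))]
    return "\n".join(lines[first:])


def mask_handle(filelist: str, handle: str) -> str:
    """X out every occurrence of the user's handle in filelist.txt."""
    if not handle:
        return filelist  # str.replace("") would insert X's between every character
    return filelist.replace(handle, "X" * len(handle))
```

For example, posting last_log_sections(open("error.log").read()) together with mask_handle(open("filelist.txt").read(), "yourhandle") covers the first two items on the list.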
Thanks for your continued co-operation and I suspect this may well be the last change to the algorithm before we try it on a much larger scale (i.e. release it).
tpdooley
04-30-2003, 05:34 PM
my beta machines aren't running dfGUI. The machine that has the corrupted copy of the beta client is running win98se with the critical update program from MS - and after being informed that there was a new update from MS, I may have installed a "critical update" in the middle of the beta 7 test. It passed through beta 1-6 with no problems.
My 5-week MCSE class on Exchange 2000 is over tonight, so I'll finally get to take a look at the corrupted directory and see if I can help isolate what is corrupted.
I'm mentioning this again since rsbriggs described something remotely similar. Perhaps others could also test whether beta client 8 has problems with MS products or with critical updates being installed in the background?
AMD_is_logical
04-30-2003, 06:00 PM
I tried copying the beta8 files into the DF directories without removing the beta7 work. It didn't work, so I had to remove all work files. :rolleyes:
What I did and the errors I got were exactly the same as last time. See my first post in the beta7 thread for details.
Paratima
04-30-2003, 06:09 PM
Linux version came right up & ran fast. (See cat,scalded.)
Windows text client on W98, when run w/o -qt is so slow, it's painful to watch. So I stopped watching. It spends so much time updating the top line, that drawing the ASCII art takes forever. Running now under -qt (sans dfGUI) and it's MUCH faster. :cool:
bwkaz
04-30-2003, 06:24 PM
Hmm... that "calculating energy" phase still takes ~50% CPU in kernel time...
I guess I just have to bug people some more until I hear "yeah, that happens to me too" enough. :D
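One way to check a kernel-time figure like this on Linux, as a sketch: /proc/&lt;pid&gt;/stat reports cumulative user time (utime, field 14) and kernel time (stime, field 15) in clock ticks. The helper names are illustrative. To isolate the "calculating energy" phase, take one sample as it starts and one as it ends, then diff them:

```python
def cpu_ticks(stat_line: str) -> tuple[int, int]:
    """Return (utime, stime) in clock ticks from the text of /proc/<pid>/stat."""
    # Field 2 is the command name in parentheses and may contain spaces,
    # so only split what follows the last ')'.
    fields = stat_line.rpartition(")")[2].split()
    # fields[0] is field 3 (state), so utime (14) and stime (15) sit at 11 and 12.
    return int(fields[11]), int(fields[12])


def kernel_share(before: str, after: str) -> float:
    """Fraction of CPU time spent in the kernel between two /proc/<pid>/stat samples."""
    u0, s0 = cpu_ticks(before)
    u1, s1 = cpu_ticks(after)
    total = (u1 - u0) + (s1 - s0)
    return (s1 - s0) / total if total else 0.0
```

Each sample is just open(f"/proc/{pid}/stat").read() for the foldtrajlite process. A result near 0.5 would match the ~50% observation.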
Anyway, yeah, other than that, beta 8 seems to be working fine. I did notice something with beta 7 today, though -- I'd been running it since it was released, nonstop, and it somehow managed to consume over 120MB of swap. Which caused issues when I was running Rune (an Unreal Tournament engine video game; yes, I have the Linux version :p) on the same machine -- Rune kept getting killed by the kernel out-of-memory handler to free up RAM and/or swap space. Restarting the beta 7 client fixed the problem -- after I did that, the swap that it was using dropped to like 4MB. The RSS size, both before and after restarting, was about the same (~80MB, and yes, I'm running with -rt).
It almost acted like it was a very small memory leak (though I could be completely wrong here...), where a few bytes every iteration got left allocated, and then got swapped out when enough of them accumulated. But maybe not -- and maybe beta 8 fixes it, we'll see. Anyway, if I don't say anything else about this problem, consider it fixed. :)
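A rough sketch for confirming or ruling out a slow leak like this one. Sample the Vm* lines of /proc/&lt;pid&gt;/status periodically and watch whether VmSize keeps climbing while VmRSS stays flat. A flat VmRSS would mean memory that is allocated but not resident, for example swapped out. The helper names are hypothetical, and steady growth is only a hint, not proof of a leak:

```python
def vm_stats(status_text: str) -> dict[str, int]:
    """Pull the Vm* lines (values in kB) out of /proc/<pid>/status."""
    stats = {}
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key.startswith("Vm") and value.split():
            stats[key] = int(value.split()[0])  # e.g. "VmRSS:\t   81920 kB"
    return stats


def growth_kb(samples: list[dict[str, int]], key: str = "VmSize") -> int:
    """Change in one Vm* value between the first and last sample."""
    return samples[-1][key] - samples[0][key]
```

Sampling once an hour over a few days, then comparing growth_kb(samples) with growth_kb(samples, "VmRSS"), would show whether the extra memory is accumulating outside physical RAM.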
Paratima
04-30-2003, 06:31 PM
Hey Howard, don't forget to clear the stats so's we can see how it's running!
Digital Parasite
05-01-2003, 08:01 AM
I'm seeing a strange bug now with beta8. I think I have seen this before but never really investigated it and someone else on my team also reported the same problem but he was running dfGUI at the time so we weren't sure if it was my bug or yours.
Since I am not running dfGUI I guess you win. ;)
I have started 3 beta8 clients on XP, all running as services, all with the following service.cfg:
service=2
useram=1
(Two are service=1 and one is service=2)
It seems that the DF client is only updating progress.txt and filelist.txt at the beginning of each generation and never in between for two of those clients. The third client is working fine. The other strange thing is that filelist.txt is sometimes showing 2 generations and sometimes 1, but it is always reporting in progress.txt that it has 0 generations buffered. These are all files from the same client.
progress.txt:
Building structure 1 generation 50
49 until next generation
0 generations buffered
Best Energy so far: 10000000.000
filelist.txt:
.\fold_0_XXXX_0_XXXX_protein_49.log.bz2
.\XXXX_0_XXXX_protein_49_0000042.val
CurrentStruc 0 1 126 50 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 1.000 1.800 380.218 ---HHHH---------------HHHHHHHH---HHHHHHHH-----E--------EHHHH-----------------------HHHHH--------
7bdb29f982c7f6a349f6d1194b54da81
=========================================
progress.txt:
Building structure 1 generation 51
49 until next generation
0 generations buffered
Best Energy so far: 10000000.000
filelist.txt:
.\fold_0_XXXX_0_XXXX_protein_50.log.bz2
.\XXXX_0_XXXX_protein_50_0000046.val
.\fold_0_XXXX_0_XXXX_protein_51.log.bz2
.\XXXX_0_XXXX_protein_51_0000050.val
CurrentStruc 0 51 126 51 1 50 8.069 -2621.499 -979.139 -1739.432 156604336.000 1.150 2.100 578.264 ---HHHH---------------HHHHHHHH---HHHHHHHH-----E--------EHHHH-----------------------HHHHH--------
7d947bc5e58ccfb529bce7ce2bf137e0
=========================================
progress.txt:
Building structure 1 generation 52
49 until next generation
0 generations buffered
Best Energy so far: 10000000.000
filelist.txt:
.\fold_0_XXXX_0_XXXX_protein_51.log.bz2
.\XXXX_0_XXXX_protein_51_0000050.val
CurrentStruc 0 1 126 52 1 0 10000000.000 10000000.000 -10000000.000 0.000 0.000 1.100 2.000 502.838 ---HHHH---------------HHHHHHHH---HHHHHHHH-----E--------EHHHH-----------------------HHHHH--------
20aa279398fe6e337c425053f72b518c
=========================================
The entire time throughout the generation those files never change, only when a new generation starts.
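For anyone scripting a check for this, here is a sketch that parses the four-line progress.txt layout shown above. If two parses taken minutes apart come back identical while the client is running, the file has not been refreshed. The layout is inferred from the samples in this thread rather than from a documented format, and parse_progress is a hypothetical name:

```python
import re

PROGRESS = re.compile(
    r"Building structure (-?\d+) generation (\d+)\s+"
    r"(-?\d+) until next generation\s+"
    r"(\d+) generations buffered\s+"
    r"Best Energy so far: (-?[\d.]+)"
)


def parse_progress(text: str) -> dict:
    """Parse the four-line progress.txt layout quoted in this thread."""
    m = PROGRESS.search(text)
    if m is None:
        raise ValueError("unrecognised progress.txt layout")
    structure, generation, remaining, buffered = (int(g) for g in m.groups()[:4])
    return {
        "structure": structure,
        "generation": generation,
        "remaining": remaining,
        "buffered": buffered,
        "best_energy": float(m.group(5)),
    }
```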
I have stopped one client, re-started it again with -g 1 and it now seems to be working fine. I haven't touched the other one yet in case you want me to do something with it.
Jeff.
Mikus
05-01-2003, 09:49 AM
I interrupted beta 7 by typing 'Q'. I then typed 'foldit -u t'. The client refused to upload, saying "Missing previous something-or-other".
So I deleted all the files from the previous generations. Then I could run beta 8.
When I saw the Beta 8 notification, I retrieved and started two clients, one under Win'98 SE and one under Mandrake Linux. The Windows client appears to be working and reporting fine. The Linux client appears to be working fine, BUT it does not appear to be reporting, since its Best Energy value is not being reported. Could I have an older version? Its timestamp is 04/19/03 05:11 pm. (In both cases, I started with clean folders.)
Ned
Just redid it... April 30 timestamp now ... grrrr...
Wasted time on an oversight...
Ned
Mikus
05-01-2003, 01:22 PM
Having a hard time grasping how much "output" has been queued up, when the path to the server is not available:
On the DF screen the "X gen. buffered" value appears to step when the .val file for the current generation is first written. (Meaning that after an upload, it quickly says "1 gen buffered" even though the actual upload of that gen would not happen until the final structure of that gen has been built.)
And the filelist.txt file appears to start with the entry for the generation that was the __last__ to have been previously uploaded. (In other words, it contains the name of one more .bz2 file than there is actually on my hard disk.)
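To see the mismatch described here, a sketch that lists the .bz2 names in filelist.txt that have no matching file in the client directory. It deliberately looks only at lines ending in .bz2, because (as Brian explains in the next post) the rest of the beta filelist.txt format is more involved. One extra name for the last-uploaded generation is expected behaviour, not a bug. The function name is hypothetical:

```python
import ntpath


def missing_bz2(filelist_text: str, present: set[str]) -> list[str]:
    """Return .bz2 names listed in filelist.txt that are not among `present` file names."""
    missing = []
    for line in filelist_text.splitlines():
        entry = line.strip()
        if not entry.endswith(".bz2"):
            continue  # skip .val, CurrentStruc and checksum lines
        name = ntpath.basename(entry)  # strips the ".\" prefix; also handles "/"
        if name not in present:
            missing.append(name)
    return missing
```

Typical use from the client directory: missing_bz2(open("filelist.txt").read(), set(os.listdir("."))).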
Brian the Fist
05-01-2003, 04:27 PM
Originally posted by Mikus
Having a hard time grasping how much "output" has been queued up, when the path to the server is not available:
On the DF screen the "X gen. buffered" value appears to step when the .val file for the current generation is first written. (Meaning that after an upload, it quickly says "1 gen buffered" even though the actual upload of that gen would not happen until the final structure of that gen has been built.)
And the filelist.txt file appears to start with the entry for the generation that was the __last__ to have been previously uploaded. (In other words, it contains the name of one more .bz2 file than there is actually on my hard disk.)
This is not new, but the format of the filelist.txt is more complicated than in the non-beta. But you shouldn't have to worry about it unless you are trying to write a front-end or something in which case you should just e-mail me for details on how it works.
This added complication is part of the reason for all the related bugs when switching proteins etc. but I've just about got it straight I think.
Georgina
05-01-2003, 10:55 PM
Brian
I have been running the beta on my W2K Server since beta 4 without any issues until now.
Last night I downloaded beta 8 and started it. Several hours later I noticed it had an error message and stopped running. I restarted it and it began at gen 0 doing the 10000 initial structures. This morning I saw that it stopped again with the same error message. I checked the error log but there was nothing. I deleted the error log, filelist.txt and the BZ2 and VAL files and restarted it. It once again began with the initial 10000 structures. It stopped again. And again there is nothing in the error log. The error message is:
The instruction at "0x0044937f" referenced memory at "0x0000009a". The memory could not be "read".
Click on OK to terminate the program.
Click on Cancel to debug the program.
As far as I know, I have not changed anything on the box, other than changing the beta client.
foldtrajlite.exe, protein.trj and readme.txt are all dated 4/30/2003.
G
arjanscholl
05-02-2003, 07:26 AM
Originally posted by Georgina
The instruction at "0x0044937f" referenced memory at "0x0000009a". The memory could not be "read".
Maybe you can try to run a program like memtest86, it may be your memory that is bad.
Brian the Fist
05-02-2003, 10:13 AM
I would agree here, it is likely a RAM issue resulting in random crashing.
Georgina
05-02-2003, 11:41 AM
OK
I have another stick of DDR ram available. I'll change it and see what happens.
G
I'm Back ;)
Ok I have been having system crashes so this could be related to that...
Here's what happened:
After a crash I opened foldit (not using dfGUI, as requested) and got this error in the DOS window:
[NULL_Caption] FATAL ERROR: [000.000] Upload list has been tampered with, please delete filelist.txt and try again
Hit Return
I did not touch the filelist ! The filelist is now blank "no text what so ever"...
Here is a JPEG of the DF folder:
http://www.transload.net/~slotype/TESTS/error-folder.jpg
Here is the error log. Note that I have my internet connection off a lot until I get a router for this machine, and I do not have the internet switch -i f in the foldit script:
http://www.transload.net/~slotype/TESTS/error.log
I started another client, but I would like to know if there is a way to rejuvenate this one. It would be good to know now, in case this happens in the future.
As always Thanks,
Slo...
PinHead
05-04-2003, 11:09 PM
Try deleting the physical file "filelist.txt" and see if that lets that copy of the client restart. During a crash or hang you may get what is known as a control character in the file. Its byte value has no printable ASCII character, so it does not show up when you try to view the text file.
It also could be that since "filelist.txt" exists but is blank, the beta client thinks there should be some info in there.
So delete the file and try again.
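Before deleting the file, you can check this theory with a quick sketch. Read filelist.txt in binary mode and report any control bytes other than tab, CR and LF, and also report whether the file is simply empty (as in Slo's case). The function names are hypothetical:

```python
def control_bytes(data: bytes) -> list[tuple[int, int]]:
    """Return (offset, value) for each control byte other than tab, LF and CR."""
    allowed = {0x09, 0x0A, 0x0D}
    return [(i, b) for i, b in enumerate(data)
            if (b < 0x20 or b == 0x7F) and b not in allowed]


def describe(data: bytes) -> str:
    """One-line verdict on the raw contents of filelist.txt."""
    if not data:
        return "empty file"
    bad = control_bytes(data)
    if bad:
        return "control bytes at " + ", ".join(f"{i}:0x{b:02x}" for i, b in bad)
    return "no control bytes"
```

Usage: print(describe(open("filelist.txt", "rb").read())).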
Digital Parasite
05-05-2003, 07:20 AM
I have been having some strange random reboot problems on one of my machines running DF lately. It reboots on average once every day or two, never at the same time. I don't recall installing anything or changing any of my settings except running the latest DF beta. Since I haven't had this problem with any of the other DF betas, I wasn't blaming DF, except that the last two times my machine rebooted, a strange progress.txt was left over showing it building structure 51 with -1 remaining:
progress.txt:
Building structure 51 generation 171
-1 until next generation
1 generations buffered
Best Energy so far: 6.819
filelist.txt:
.\fold_0_XXXX_17_XXXX_protein_170.log.bz2
.\XXXX_0_XXXX_protein_170_0000018.val
.\fold_0_XXXX_39_XXXX_protein_171.log.bz2
.\XXXX_0_XXXX_protein_171_0000040.val
CurrentStruc 0 51 126 171 1 40 6.819 -2395.139 -271.886 -1373.846 110311368.000 1.200 2.200 665.002 -HHHHHHH----------HHHHH------------HHHHH---------HHHH---HHHH-------------------------HHHH-------
213ae0f895b56680f4934196d7cb8cf3
error.log:
========================[ Apr 30, 2003 5:58 PM ]========================
========================[ Apr 30, 2003 6:00 PM ]========================
========================[ Apr 30, 2003 8:43 PM ]========================
========================[ May 1, 2003 4:40 PM ]========================
========================[ May 1, 2003 4:51 PM ]========================
========================[ May 1, 2003 5:41 PM ]========================
ERROR: [000.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to beta.distributedfolding.org:80 (Unknown) {errno=No such file or directory}
ERROR: [000.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to beta.distributedfolding.org:80 failed: Unknown
========================[ May 2, 2003 2:30 PM ]========================
========================[ May 2, 2003 9:33 PM ]========================
========================[ May 2, 2003 11:20 PM ]========================
========================[ May 3, 2003 6:40 PM ]========================
========================[ May 4, 2003 9:25 AM ]========================
========================[ May 4, 2003 9:33 AM ]========================
========================[ May 4, 2003 9:37 AM ]========================
========================[ May 5, 2003 1:32 AM ]========================
========================[ May 5, 2003 7:12 AM ]========================
After a little while the progress.txt file seems to work itself out as it goes on to the next generation:
Building structure 2 generation 172
48 until next generation
1 generations buffered
Best Energy so far: 6.852
Brian the Fist
05-05-2003, 11:04 AM
Originally posted by Slo
I'm Back ;)
http://www.transload.net/~slotype/TESTS/error.log
I started another client but would like to Know if there is a way to rejuvenate this one?? Would be good to Know now and if this happens in the future...
As always Thanks,
Slo...
The BDRemove errors in your error.log indicate with 99.9999% certainty that your machine has faulty RAM. Please run www.memtest86.com or a similar program to verify this.
Brian the Fist
05-05-2003, 11:06 AM
Originally posted by Digital Parasite
I have been having some strange random reboot problems on one of my machines running DF lately. It seems to reboot on average once every day or two and it is never at the same time. I don't recall installing anything or changing any of my settings except running the latest DF beta. Since I haven't had this problem with any of the other DF beta's, I wasn't blaming it on DF except that the last two times my machine has rebooted I have a strange progress.txt left over showing it is building structure 51 with -1 remaining:
progress.txt:
Building structure 51 generation 171
-1 until next generation
1 generations buffered
Best Energy so far: 6.819
While I doubt it has anything to do with your machine rebooting, the above MAY be a bug, I will look how it could possibly get to 51 there. It is possible this machine has defective RAM too of course.
bwkaz
05-05-2003, 11:19 AM
Originally posted by bwkaz
I did notice something with beta 7 today, though -- I'd been running it since it was released, nonstop, and it somehow managed to consume over 120MB of swap. Which caused issues when I was running Rune (an Unreal Tournament engine video game; yes, I have the Linux version :p) on the same machine -- Rune kept getting killed by the kernel out-of-memory handler to free up RAM and/or swap space. Restarting the beta 7 client fixed the problem -- after I did that, the swap that it was using dropped to like 4MB. The RSS size, both before and after restarting, was about the same (~80MB, and yes, I'm running with -rt).
It almost acted like it was a very small memory leak (though I could be completely wrong here...), where a few bytes every iteration got left allocated, and then got swapped out when enough of them accumulated. But maybe not -- and maybe beta 8 fixes it, we'll see. Anyway, if I don't say anything else about this problem, consider it fixed. :)
Still seeing this with beta 8, it happened today. Running Rune again (which, BTW, I just noticed, uses about 250MB of swap! :eek: Probably caches the entire level in swap or something), and it just suddenly got killed. DF was taking 80MB of RAM and 100MB of swap again; removing the lock file and restarting it reduced this to 80MB/4MB.
Any ideas, Howard? Anything else you want me to look at?
Oh, BTW, yes, this is Linux. ;)
Digital Parasite
05-05-2003, 12:10 PM
Originally posted by Brian the Fist
While I doubt it has anything to do with your machine rebooting, the above MAY be a bug, I will look how it could possibly get to 51 there. It is possible this machine has defective RAM too of course.
I had checked my RAM when I first got the machine about a month ago but I will re-run the test again just to make sure.
I know MS recently released a patch for the kernel to protect against some buffer overflows, it is possible that if a certain error happens the kernel might panic and cause a reboot now from stuff they changed (long shot, but just trying to think of what might be the cause).
Jeff.
AMD_is_logical
05-05-2003, 01:13 PM
Originally posted by Brian the Fist
While I doubt it has anything to do with your machine rebooting, the above MAY be a bug, I will look how it could possibly get to 51 there. It is possible this machine has defective RAM too of course.
I've seen the progress.txt say 51. It happened when I stopped the client during the minimize, then restarted it. I didn't see that this caused any problem, though.
My guess is that the client was doing the minimize when the crash happened. I suspect that the system might be stressed slightly more during this step, so if a system was right on the very edge of being stable, it might get pushed over the edge by the minimize.
I agree that running memtest86 is a good idea.
Brian the Fist
05-05-2003, 01:56 PM
Originally posted by bwkaz
Still seeing this with beta 8, it happened today. Running Rune again (which, BTW, I just noticed, uses about 250MB of swap! :eek: Probably caches the entire level in swap or something), and it just suddenly got killed. DF was taking 80MB of RAM and 100MB of swap again; removing the lock file and restarting it reduced this to 80MB/4MB.
Any ideas, Howard? Anything else you want me to look at?
Oh, BTW, yes, this is Linux. ;)
How exactly are you coming up with these numbers? top? Can I see a screenshot/screen scrape of your top output when this happens?
Brian the Fist
05-05-2003, 02:00 PM
Originally posted by AMD_is_logical
I've seen the progress.txt say 51. It happened when I stopped the client during the minimize, then restarted it. I didn't see that this caused any problem, though.
My guess is that the client was doing the minimize when the crash happened. I suspect that the system might be stressed slightly more during this step, so if a system was right on the very edge of being stable, it might get pushed over the edge by the minimize.
I agree that running memtest86 is a good idea.
Yes, the 51 is 'normal' (I'll fix it though) if you start/stop during minimization. And yes, the minimizations use very different operations than the rest of the program, so it could, for example, be an FPU problem or something that wouldn't show up during the rest of the program (maybe). Anyhow, if no one else has this problem, I'm assuming the problem is not in the code... :elephant:
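The -1 is simple arithmetic. In every progress.txt sample in this thread, the "until next generation" count equals 50 minus the structure number (1 → 49, 2 → 48, 31 → 19), so structure 51 yields -1. Below is a sketch of a sanity check a front-end could apply. The 50-per-generation constant is inferred from those samples, and it does not cover generation 0, which builds the 10000 initial structures:

```python
# Inferred from the progress.txt samples in this thread, not from the client source.
STRUCTURES_PER_GENERATION = 50


def expected_remaining(structure: int) -> int:
    """Count that progress.txt should show for a given structure number."""
    return STRUCTURES_PER_GENERATION - structure


def is_restart_artifact(structure: int, remaining: int) -> bool:
    """True for the 'structure 51, -1 until next generation' state described above."""
    return structure > STRUCTURES_PER_GENERATION and remaining == expected_remaining(structure)
```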
bwkaz
05-05-2003, 02:18 PM
gkrellm2 and free both report fairly massive swap usage. Even more when Rune is loading (obviously that's not something you can look into, though ;)). pmap also showed a huge pool of memory being used by foldtrajlite (pmap runs through the /proc/<pid>/maps file and parses it to be more easily readable; source is available here (http://web.hexapodia.org/~adi/pmap.c) if you think that might help).
I've since restarted the client, so I can't post screenshots or whatever, but I think I've still got the pmap output stored in the Eterm buffer... hang on.
08048000 (4260 KB) r-xp (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
08471000 (1240 KB) rw-p (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
085a7000 (51484 KB) rwxp (00:00 0)
40000000 (84 KB) r-xp (03:05 214953) /lib/ld-2.2.5.so
40015000 (4 KB) rw-p (03:05 214953) /lib/ld-2.2.5.so
40016000 (4 KB) rw-p (00:00 0)
40017000 (12 KB) r-xp (03:05 214948) /lib/libnss_dns-2.2.5.so
4001a000 (4 KB) rw-p (03:05 214948) /lib/libnss_dns-2.2.5.so
40022000 (4 KB) rw-p (00:00 0)
40023000 (136 KB) r-xp (03:05 214943) /lib/libm-2.2.5.so
40045000 (4 KB) rw-p (03:05 214943) /lib/libm-2.2.5.so
40046000 (60 KB) r-xp (03:05 212322) /lib/libpthread-0.9.so
40055000 (28 KB) rw-p (03:05 212322) /lib/libpthread-0.9.so
4005c000 (236 KB) r-xp (03:05 212708) /lib/libncurses.so.5.2
40097000 (36 KB) rw-p (03:05 212708) /lib/libncurses.so.5.2
400a0000 (12 KB) rw-p (00:00 0)
400a3000 (1148 KB) r-xp (03:05 214954) /lib/libc-2.2.5.so
401c2000 (24 KB) rw-p (03:05 214954) /lib/libc-2.2.5.so
401c8000 (16 KB) rw-p (00:00 0)
401cc000 (20 KB) r-xp (03:05 389592) /usr/lib/libgpm.so.1.18.0
401d1000 (4 KB) rw-p (03:05 389592) /usr/lib/libgpm.so.1.18.0
401d2000 (36 KB) r-xp (03:05 213844) /lib/libnss_files-2.2.5.so
401db000 (4 KB) rw-p (03:05 213844) /lib/libnss_files-2.2.5.so
401e8000 (60 KB) r-xp (03:05 214947) /lib/libresolv-2.2.5.so
401f7000 (4 KB) rw-p (03:05 214947) /lib/libresolv-2.2.5.so
401f8000 (8 KB) rw-p (00:00 0)
40200000 (2280 KB) rw-p (00:00 0)
404a7000 (75408 KB) rw-p (00:00 0)
44f00000 (200 KB) rw-p (00:00 0)
44f32000 (824 KB) ---p (00:00 0)
bff85000 (492 KB) rwxp (00:00 0)
mapped: 138136 KB writable/private: 131260 KB shared: 0 KB
And right after restarting:
08048000 (4260 KB) r-xp (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
08471000 (1240 KB) rw-p (03:47 142706) /home/bilbo/distribfold-icc-beta/foldtrajlite
085a7000 (17932 KB) rwxp (00:00 0)
40000000 (84 KB) r-xp (03:05 214953) /lib/ld-2.2.5.so
40015000 (4 KB) rw-p (03:05 214953) /lib/ld-2.2.5.so
40016000 (4 KB) rw-p (00:00 0)
40017000 (12 KB) r-xp (03:05 214948) /lib/libnss_dns-2.2.5.so
4001a000 (4 KB) rw-p (03:05 214948) /lib/libnss_dns-2.2.5.so
40022000 (4 KB) rw-p (00:00 0)
40023000 (136 KB) r-xp (03:05 214943) /lib/libm-2.2.5.so
40045000 (4 KB) rw-p (03:05 214943) /lib/libm-2.2.5.so
40046000 (60 KB) r-xp (03:05 212322) /lib/libpthread-0.9.so
40055000 (28 KB) rw-p (03:05 212322) /lib/libpthread-0.9.so
4005c000 (236 KB) r-xp (03:05 212708) /lib/libncurses.so.5.2
40097000 (36 KB) rw-p (03:05 212708) /lib/libncurses.so.5.2
400a0000 (12 KB) rw-p (00:00 0)
400a3000 (1148 KB) r-xp (03:05 214954) /lib/libc-2.2.5.so
401c2000 (24 KB) rw-p (03:05 214954) /lib/libc-2.2.5.so
401c8000 (16 KB) rw-p (00:00 0)
401cc000 (20 KB) r-xp (03:05 389592) /usr/lib/libgpm.so.1.18.0
401d1000 (4 KB) rw-p (03:05 389592) /usr/lib/libgpm.so.1.18.0
401d2000 (36 KB) r-xp (03:05 213844) /lib/libnss_files-2.2.5.so
401db000 (4 KB) rw-p (03:05 213844) /lib/libnss_files-2.2.5.so
401e8000 (60 KB) r-xp (03:05 214947) /lib/libresolv-2.2.5.so
401f7000 (4 KB) rw-p (03:05 214947) /lib/libresolv-2.2.5.so
401f8000 (2520 KB) rw-p (00:00 0)
404a7000 (74152 KB) rw-p (00:00 0)
bff85000 (492 KB) rwxp (00:00 0)
mapped: 102536 KB writable/private: 96484 KB shared: 0 KB
The important part is the first large allocation (with permissions rwxp) -- before I restarted the thing, it had a 50MB chunk, and right after a restart, that shrank to 17MB. It's at 19MB right now. Then again, right now, discounting cache, I've got 100MB of physical RAM free, so it's not likely to cause any problems for a while.
I don't use top, though. Don't like it -- ps is much better IMHO. ;)
And note also that this isn't as bad as it had been last time I complained (right at the end of beta 7). There, pmap was showing a writable/private value of near 200MB, with (again) only 80 or so of it in physical RAM (according to the swap usage reported by free, with almost nothing else running -- and definitely not X).
Edit: Hang on... why does that chunk have execute permission? That doesn't make any sense... I wonder if this is a bug in the system libraries, not DF... Hmm...
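For anyone else puzzling over the pmap output, each line has five parts:

- start address
- size
- permissions (r/w/x, then p for a private or s for a shared mapping)
- device and inode
- the backing file, if any

Lines with no file are anonymous memory such as the heap. Suppose "writable/private" is simply the total of mappings that are both writable and private. That reading reproduces both summary lines above: 131260 KB and 96484 KB writable/private, and 138136 KB and 102536 KB mapped. The sketch below totals and diffs two dumps. It keys mappings by start address, so any accidentally repeated line is counted once. The helper names are hypothetical:

```python
import re

# One pmap line: address, (size KB), permissions, (device inode), optional path.
LINE = re.compile(r"^([0-9a-f]+) \((\d+) KB\) (\S{4}) \([^)]*\)\s*(.*)$")


def parse_pmap(text: str) -> dict[str, tuple[int, str, str]]:
    """Map start address -> (size_kb, perms, path); the summary line is skipped."""
    maps = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            maps[m.group(1)] = (int(m.group(2)), m.group(3), m.group(4))
    return maps


def totals(maps: dict[str, tuple[int, str, str]]) -> tuple[int, int]:
    """Return (mapped, writable/private) in KB."""
    mapped = sum(size for size, _, _ in maps.values())
    # Assumption: writable/private = mappings that are writable ('w') and private ('p').
    writable_private = sum(size for size, perms, _ in maps.values()
                           if "w" in perms and perms.endswith("p"))
    return mapped, writable_private


def changed(before, after) -> dict[str, tuple[int, int]]:
    """Addresses whose size differs between two dumps (0 means absent)."""
    def size(maps, addr):
        return maps[addr][0] if addr in maps else 0
    return {a: (size(before, a), size(after, a))
            for a in sorted(set(before) | set(after))
            if size(before, a) != size(after, a)}
```

Run on the two dumps above, changed() would single out 085a7000 (51484 → 17932 KB), the anonymous rwxp region that shrank after the restart.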
Digital Parasite
05-05-2003, 06:15 PM
Originally posted by Brian the Fist
Yes the 51 is 'normal' (ill fix it though) if you start/stop during minimization. And yes, the minimizations use very different operations than the rest of the program so it could, for example, be an FPU problem or something that wouldn't show up during the rest of the program (maybe). Anyhow if no one else has this problem, I'm assuming the problem is not in the code... :elephant:
Another one of my DF machines running XP just spontaneously rebooted. That one is a totally different beast having a different brand of processor, different MB, different type of RAM. It was in the middle of a generation.
I have a feeling it might be that security patch that MS made available. Hopefully they will have a fix for the fix soon. :bonk:
Jeff.
Digital Parasite
05-06-2003, 07:18 AM
I just finished doing a full memtest86 and a Prime95 torture test and both passed with flying colours so it is not my RAM that is bad (which I was pretty sure since I had just done that a month ago).
My guess is the new patch that MS recently released.
Jeff.
Digital Parasite
05-06-2003, 09:16 AM
Interesting... one of my foldtrajlite.com clients just crashed. It was installed as a service. Nothing in error.log. I was able to capture this message from my VS.NET debugger when it crashed.
service.cfg :
service=1
useram=1
progress=1
progress.txt :
Building structure 31 generation 193
19 until next generation
1 generations buffered
Best Energy so far: 7.152
filelist.txt :
.\fold_0_XXXX_40_XXXX_protein_192.log.bz2
.\XXXX_0_XXXX_protein_192_0000041.val
fold_0_XXXX_9_XXXX_protein_193.log.bz2
XXXX_0_XXXX_protein_193_0000010.val
CurrentStruc 0 31 126 193 1 10 7.152 -891.114 1010.339 -91.804 7071383.500 1.650 3.100 2339.397 -HHHHHHH----------HHHHH------------HHHHH---------HHHH---HHHH-------------------------HHHH-------
51bc41daef8ca892be58f1b839019672
We have ruled out the RAM being a problem. I have no idea what this crash is, first time I have ever seen it.
Jeff.
Here is the assembler dump if you can read it (the bold line is the one where it crashed on):
0044E870 push ebx
0044E871 push esi
0044E872 mov esi,dword ptr [esp+0Ch]
0044E876 test esi,esi
0044E878 push edi
0044E879 je 0044E9B8
0044E87F mov ebx,dword ptr [esp+14h]
0044E883 movsx edi,bx
0044E886 lea ecx,[esi+4]
0044E889 mov dword ptr ds:[7214C8h],ecx
0044E88F fld dword ptr [ecx+edi*4]
0044E892 fcomp dword ptr [edi*4+7214B8h]
0044E899 fnstsw ax
0044E89B test ah,41h
0044E89E jne 0044E8BD
0044E8A0 mov eax,dword ptr [esi+edi*8+10h]
0044E8A4 test eax,eax
0044E8A6 mov dword ptr ds:[007214E4h],eax
0044E8AB je 0044E8BD
0044E8AD push ebx
0044E8AE push eax
0044E8AF call 0044E870
0044E8B4 mov ecx,dword ptr ds:[7214C8h]
0044E8BA add esp,8
0044E8BD cmp bx,2
0044E8C1 jge 0044E8D5
0044E8C3 lea eax,[ebx+1]
0044E8C6 push eax
0044E8C7 push esi
0044E8C8 call 0044E870
0044E8CD add esp,8
0044E8D0 jmp 0044E98F
0044E8D5 fld dword ptr [ecx]
0044E8D7 fcomp dword ptr ds:[7214D8h]
0044E8DD fnstsw ax
0044E8DF test ah,1
0044E8E2 je 0044E995
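For readers trying to follow the dump, the fcomp / fnstsw ax / test ah,... sequences are the usual x87 floating-point compare idiom:

- fcomp compares ST(0) with its operand and sets condition codes C0, C2 and C3 in the FPU status word.
- fnstsw ax copies that word to AX, which puts C0, C2 and C3 at bits 0x01, 0x04 and 0x40 of AH.
- "test ah,41h / jne" therefore branches when ST(0) is less than or equal to the operand, or when the compare is unordered because of a NaN.
- "test ah,1 / je" branches only when ST(0) is greater than or equal to the operand.

A small decoder, as an illustration only:

```python
# Where fnstsw ax leaves the x87 condition codes within AH.
C0, C2, C3 = 0x01, 0x04, 0x40


def fcom_result(ah: int) -> str:
    """Decode AH after fcom/fcomp, which compares ST(0) with the source operand."""
    codes = ah & (C0 | C2 | C3)  # ignore C1 and the stack-top bits
    if codes == C0 | C2 | C3:
        return "unordered"  # at least one operand was NaN
    if codes == C3:
        return "equal"
    if codes == C0:
        return "less"  # ST(0) < source
    if codes == 0:
        return "greater"  # ST(0) > source
    return "unexpected"


def jne_after_test_41h(ah: int) -> bool:
    """Models 'test ah,41h / jne': taken when C0 or C3 is set."""
    return bool(ah & (C0 | C3))
```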
Brian the Fist
05-06-2003, 02:09 PM
Originally posted by bwkaz
And note also that this isn't as bad as it had been last time I complained (right at the end of beta 7). There, pmap was showing a writable/private value of near 200MB, with (again) only 80 or so of it in physical RAM (according to the swap usage reported by free, with almost nothing else running -- and definitely not X).
Edit: Hang on... why does that chunk have execute permission? That doesn't make any sense... I wonder if this is a bug in the system libraries, not DF... Hmm...
Sorry dude, but it's all Greek to me, as they say. I don't know pmap or the file format above, though I can vaguely guess what some of the columns are, and I have no idea what writable/private is. Since I do know top, if you want me to fix this please make the problem occur again and send me the output from top. Also please post the exact flags that are being used in your foldit script.
Brian the Fist
05-06-2003, 02:13 PM
Originally posted by Digital Parasite
Interesting... one of my foldtrajlite.com clients just crashed. It was installed as a service. Nothing in error.log. I was able to capture this message from my VS.NET debugger when it crashed.
Alas, without any symbols it is fairly hopeless. You seem to be the only one out of the 50 or so testers who is having this trouble, so I'm still not sure what's going on. I do not suspect it has to do with any Microsoft patches, as I'm sure other people keep their computers up to date too.
I could try sending you a debug version but it still might not give you the symbols. I could alternatively give you an ErrorLogPrintf riddled version of the code to track where the code is at all times but this is generally painful and a last resort.
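The "ErrorLogPrintf riddled" approach amounts to breadcrumb logging. You write a marker at each step and force it to disk, so the last marker survives a crash and shows roughly where the code was. The sketch below is generic. The Breadcrumbs class and file name are hypothetical, not the client's ErrorLogPrintf. The fsync on every mark is what makes it reliable, and also what makes it slow, which is part of why it is a last resort:

```python
import os


class Breadcrumbs:
    """Append-only checkpoint log whose last line shows the last step reached."""

    def __init__(self, path: str):
        self.path = path

    def mark(self, where: str) -> None:
        with open(self.path, "a") as f:
            f.write(where + "\n")
            f.flush()             # push Python's buffer to the OS
            os.fsync(f.fileno())  # and the OS cache to disk, so it survives a crash

    def last(self):
        """Return the most recent marker, or None if nothing was written."""
        try:
            with open(self.path) as f:
                lines = [line.strip() for line in f if line.strip()]
        except FileNotFoundError:
            return None
        return lines[-1] if lines else None
```

After a crash, Breadcrumbs("trace.log").last() names the last checkpoint the program reached.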
Digital Parasite
05-06-2003, 03:10 PM
Originally posted by Brian the Fist
Alas, without any symbols it is fairly hopeless. You seem to be the only one out of the 50 or so testers that is having this trouble though so Im still not sure whats going on. I do not suspect it has to do with any Microsoft patches as Im sure other people keep their computers up to date too..
I could try sending you a debug version but it still might not give you the symbols. I could alternatively give you an ErrorLogPrintf riddled version of the code to track where the code is at all times but this is generally painful and a last resort.
The actual crash was the first time it had happened to me, but if you want to send me a debug version I will run it, in case the debugger prints out the symbols so you can see where it happened.
Jeff.
tpdooley
05-06-2003, 03:45 PM
Originally posted by Brian the Fist
You seem to be the only one out of the 50 or so testers that is having this trouble, so I'm still not sure what's going on. I do not suspect it has to do with any Microsoft patches, as I'm sure other people keep their computers up to date too.
The problem with my axp1800+ system running the beta clients (the one that ended up corrupting itself) happened after I'd loaded one of the latest critical updates from MS for Win98. It was the first time I'd loaded a critical update since starting the beta clients with beta 1, so there's a possibility that the critical update process is causing, or has caused, problems.
bwkaz
05-06-2003, 05:56 PM
Originally posted by Brian the Fist
Sorry dude, but it's all Greek to me, as they say. I don't know pmap or the file format above, though I can vaguely guess what some of the columns are, and I have no idea what writable/private is. Since I do know top, if you want me to fix this, please make the problem occur again and send me the output from top. Also please post the exact flags that are being used in your foldit script.
OK, will do if I see it again (and I just rebooted the whole system today, so it probably won't happen until sometime near the end of next week). The flags, for the moment, are just "-rt -g 5".
Thanks! :)
statsman
05-06-2003, 10:02 PM
Initially only updated every 12 hours, but here they are:
http://www.statsman.org/distfoldbetastats
or
http://www.statsman.org/distfoldbetastats/html
Enjoy!
Paratima
05-06-2003, 10:24 PM
Thanks, STATSMAN! Great work as always! :notworthy
Grumpy
05-07-2003, 02:11 AM
I am in the "Microsoft stuffed something up" camp... their AMD updates tend to bugger things up big time. Last time I had to uninstall them to get a couple of my PCs to run again :( I have not loaded patches for 2 months and will not in the future unless one is desperately needed. If you have an AMD system, don't get the AMD CPU updates... they are very nasty.
:swear:
Digital Parasite
05-07-2003, 08:20 AM
For testing, I stopped my DF clients on the machine that has been randomly rebooting. Just as I expected, the machine still reboots, so we know it isn't the DF client.
I found this interesting article about the MS patch I have been talking about:
http://support.microsoft.com/?kbid=819634
It especially acts up if you have anti-virus software (and who doesn't these days?). I'm going to try uninstalling it to see if it makes a difference, since it does seem to be messing up people's systems to a certain extent.
Jeff.
Brian the Fist
05-07-2003, 12:30 PM
OK, so is it basically safe for me to ignore all the rebooting/crashing problems mentioned in this thread and attribute them to Micro$oft? No one has had any trouble on Linux other than the alleged memory leak?
Mahavishnu
05-07-2003, 03:47 PM
I've been using Beta 8 for three days; more than 77,000 structures have been calculated so far, and everything is working very fast and stable.
My system:
Pentium 4 @ 2GHz w/ 512KB L2
512MB RDRAM PC800-45
HD Ultra ATA100 7200RPM
nVidia GeForce4 Ti 4600
Windows XP Service Pack 1 with all updates & latest drivers
F@H screensaver with "125MB Extra RAM option" enabled
Thank you so much!
Mahavishnu
From Brazil
Brian the Fist
05-07-2003, 05:02 PM
Please note we have had some discussions and decided to proceed with one more change to the algorithm, so I'll be releasing this "beta" tomorrow, hopefully. At the same time we can see if I've managed to fix all the problems relating to overwriting the old version with the new one. As a bonus, I've also added a benchmark option, as requested by pointwood. More on this tomorrow...
bwkaz
05-07-2003, 06:07 PM
Oooh, nifty!
*anxiously waits*
Side note: No, I haven't had any reboot issues (other than the recent self-inflicted one; gotta add hardware eventually ;)) on Linux.
Originally posted by Brian the Fist
The BDRemove errors in your error.log indicate with 99.9999% certainty that your machine has faulty RAM. Please run www.memtest86.com or a similar program to verify this.
Sorry for the late response... Yes, indeed it was a memory error. My memory was overvolted and very unstable...
Thanks,
Slo...
"Please note we have had some discussions and decided to proceed with one more change to the algorithm, so I'll be releasing this 'beta' tomorrow, hopefully. At the same time we can see if I've managed to fix all the problems relating to overwriting the old version with the new one. As a bonus, I've also added a benchmark option, as requested by pointwood. More on this tomorrow..."
----------------------------------------
Will automatic update work to upgrade to this new version of the beta? I remember that this function was not working earlier...
Ned
Brian the Fist
05-08-2003, 10:35 AM
No, auto-updates are not available for betas and will not be. Sorry. Of course, you don't have to test every beta either; the choice is up to you! If you have more than one machine, consider running the beta on one and the "normal" client on the rest. Anyway, we are almost done with the testing phase here.
Chaser
05-08-2003, 01:05 PM
sounds good :) :cheers:
pointwood
05-09-2003, 02:48 AM
Originally posted by Brian the Fist
As a bonus, I've also added a benchmark option as requested by pointwood. More on this tomorrow...
:notworthy :thumbs:
Thanks a lot, Howard!
tpdooley
05-09-2003, 05:34 AM
We're getting a little closer to the 4.51A best score for a Beta.. :)
Hopefully the next beta will push us further down the slope.. ;)