6 of 6 Linux boxes crash on new protein



PinHead
10-16-2002, 11:19 PM
2 Linux Mandrake 9 boxes get the sig11 error and display a crash notice.

2 Linux Mandrake 8.2 boxes just exit the client.

2 Linux Mandrake 9 boxes just exit the client.

From what I can tell, it seems to occur when trying to upload completed work.

All 6 boxes performed without a single crash or unexpected program exit during the previous protein.

There have been no changes to the boxes except for today's download.

Any thoughts?

MAD-ness
10-17-2002, 02:31 AM
- It has come to our attention that there is a relatively serious (but easily fixed) bug in the UNIX versions of the latest release (Windows is unaffected). If your /tmp (or TEMP) directory is NOT on the same hard drive partition as where you are running the client from, the client will crash when it tries to write out a log file (after 5000 strucs or when you hit 'Q' to exit) (It will say 'the program has crashed', etc. etc.). To correct for now, you can change your TMPDIR environment variable or location of your client so they are on the same partition. A fixed version with just the executable for the more common OSes will be posted as soon as they all get built. (For the less common OSes it will be fixed too, but you'll have to download the full package, that's all). Sorry for any inconvenience, but you will soon agree it is well worth the trouble as uploads of structure data to the server will now be about 25 times smaller/faster!

That should help with the sig11 crashes (and maybe the others, I don't know).

bwkaz
10-17-2002, 09:45 AM
I was going to post "Well, 0 of 2 Linux boxes here are having those problems" -- but it seems that that's because on both of them, /tmp is on the same partition as the client is installed. So that explains that one...

Out of curiosity, does anyone know why this might cause sig11's?

Brian the Fist
10-17-2002, 10:15 AM
Well, if anyone wants gory details on the bug, I was trying to use the 'rename' function to move a file from /tmp to the folding directory, without checking for success. Apparently rename does NOT work across filesystems in UNIX (but in Windows I guess it does work across drives). So it never moves the file and then crashes when it tries to do stuff with the non-existent file of course. Anyways, easily fixed by not using rename..
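
For anyone following along, here is a minimal sketch of what "checking for success" on rename could look like. This is not the client's actual code; try_rename is a hypothetical helper made up for illustration. It only relies on standard behaviour: rename() returns non-zero on failure and, on UNIX, sets errno to EXDEV when the two paths are on different filesystems.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not from the client source): attempt a rename and
 * report the reason if it fails. Returns 0 on success, -1 on failure,
 * leaving errno set to the error that rename() reported. */
int try_rename(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;

    int saved = errno; /* fprintf may overwrite errno, so keep a copy */
    if (saved == EXDEV)
        fprintf(stderr, "rename: %s and %s are on different filesystems\n",
                src, dst);
    else
        fprintf(stderr, "rename %s -> %s failed: %s\n",
                src, dst, strerror(saved));
    errno = saved;
    return -1;
}
```

With a check like this the caller finds out straight away that the file never moved, instead of crashing later when it tries to open a file that is not there.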

vsemaska
10-17-2002, 10:20 AM
Howard,

Thanks for the gory details. I was curious myself. Inquiring minds want to know. :D

Vic

Darkness Productions
10-17-2002, 02:43 PM
Howard - would it not be possible just to move the file instead of rename it across filesystems? Seems that would be easier/make more sense.


Originally posted by Brian the Fist
Well, if anyone wants gory details on the bug, I was trying to use the 'rename' function to move a file from /tmp to the folding directory, without checking for success. Apparently rename does NOT work across filesystems in UNIX (but in Windows I guess it does work across drives). So it never moves the file and then crashes when it tries to do stuff with the non-existent file of course. Anyways, easily fixed by not using rename..

Brian the Fist
10-17-2002, 03:06 PM
The problem has already been fixed so quit beating a dead horse and get on with your lives :D

Starfish
10-17-2002, 03:52 PM
Which lives? :confused: :D


Howard you're doing a great job! Cheers :cheers:

bwkaz
10-17-2002, 05:09 PM
Oh, right. rename basically creates a hardlink to the file in the new location, then deletes the link in the old location (at least, that's what I gather from reading its manpage). Hard links don't work across filesystems, so yeah.

Life? What is this "life" you speak of? :D

DP -- there is no such thing as a "move" syscall. The only syscall that can move files is rename. You could shell out and use the mv command, but that might be a bit much. You could also just copy the file(s) using stdio and then delete the old ones (this is, I guess, the workaround that's being used); that would work fine.
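
A rough sketch of that copy-then-delete idea, assuming nothing about how the real client does it: copy_file and move_file are hypothetical names, and the rename-first, fall-back-on-EXDEV ordering is just one reasonable way to arrange it.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper: copy src to dst using stdio.
 * Returns 0 on success, -1 on error (a partial dst is removed). */
static int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    char buf[8192];
    size_t n;
    int ok = 1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            ok = 0;
            break;
        }
    }
    if (ferror(in))
        ok = 0;

    fclose(in);
    if (fclose(out) != 0) /* buffered write errors can surface here */
        ok = 0;

    if (!ok) {
        remove(dst);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: move src to dst. Tries rename() first and falls
 * back to copy + delete only when the paths are on different filesystems.
 * Returns 0 on success, non-zero on failure. */
int move_file(const char *src, const char *dst)
{
    if (rename(src, dst) == 0)
        return 0;
    if (errno != EXDEV)
        return -1; /* some other failure; do not guess */
    if (copy_file(src, dst) != 0)
        return -1;
    return remove(src);
}
```

Note that a plain stdio copy like this does not carry over permissions or timestamps, and unlike rename it is not atomic, so a crash mid-copy can leave a partial file behind.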

Protoss
02-28-2003, 01:32 PM
I just downloaded the new Linux client (gcc) and after about 10000 structures I get this error message: ERROR: [001.001] {foldtrajlite.c, line 1126} Caught sig 11

What's the problem?

cu

Brian the Fist
02-28-2003, 05:32 PM
The program has crashed. It may just be a fluke. If it happens repeatedly, check if your system is overclocked, or test your RAM with www.memtest86.com

If the problem still persists even after this let us know.

Protoss
03-01-2003, 06:14 AM
The problem was the overclocked system! (although the Windows client worked fine)

Thank you for the quick answer.

cu