FreeBSD clients stopping randomly



erk
08-17-2003, 05:05 AM
This is the output from error.log. At the time there was no Internet outage at my end, and I did not send a signal 11 to the process; it did it all by itself on generation 93. This has happened twice in the last 24 hours, on two different machines. I would have thought the client would continue regardless of connectivity issues, as I presume all it was trying to do was upload a completed generation file.

ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Timeout) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Timeout
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Timeout) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Timeout
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Timeout) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Timeout
ERROR: [777.000] {ncbi_http_connector.c, line 101} [HTTP] Too many failed attempts, giving up
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
ERROR: [777.000] {ncbi_http_connector.c, line 101} [HTTP] Too many failed attempts, giving up
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to bioinfo.mshri.on.ca:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to bioinfo.mshri.on.ca:80 failed: Unknown
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to bioinfo.mshri.on.ca:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to bioinfo.mshri.on.ca:80 failed: Unknown
ERROR: [001.001] {foldtrajlite2.c, line 2018} Caught sig 11


Make that 4 machines in the last 24 hours.

Brian the Fist
08-17-2003, 12:58 PM
Sig 11 is a program crash. I'll try running it on our BSD machine for a while and see what happens. Can you try running it with quiet mode turned off and tell us what you see on the screen at the moment of the crash? What step is it at? Thanks.

erk
08-17-2003, 05:05 PM
Originally posted by Brian the Fist
Sig 11 is a program crash. I'll try running it on our BSD machine for a while and see what happens. Can you try running it with quiet mode turned off and tell us what you see on the screen at the moment of the crash? What step is it at? Thanks.

I will try, but the machines do not have screens and I don't know which one will do it next.

Brian the Fist
08-18-2003, 10:31 AM
I've run it on our FreeBSD box and it's up to gen 14 with no problems yet... I can't fix it unless I can reproduce the problem.

erk
08-18-2003, 05:09 PM
Originally posted by Brian the Fist
I've run it on our FreeBSD box and it's up to gen 14 with no problems yet... I can't fix it unless I can reproduce the problem.

I had 4 more boxes do it overnight, all FreeBSD 4.8-RELEASE. At a guess, it looks like they can't contact the DNS server for a moment when they want to upload, though I might be wrong; it's just that the first line in the error.log reports a failed gethostbyname() call.

I did try putting a bogus nameserver in /etc/resolv.conf, then stopping and starting the client. This created a similar problem and the client quit. This has only started happening since the updated client a few days ago, so something in the code must have changed. I do recall a similar gethostbyname() bug being reported back in November, something to do with glibc.


========================[ Aug 17, 2003 7:38 PM ]========================
ERROR: [777.000] {ncbi_socket.c, line 1173} [SOCK::s_Connect] Failed SOCK_gethostbyname(www.distributedfolding.org)
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
ERROR: [777.000] {ncbi_http_connector.c, line 101} [HTTP] Too many failed attempts, giving up
ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to bioinfo.mshri.on.ca:80 (Unknown) {errno=36,Operation now in progress}
ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to bioinfo.mshri.on.ca:80 failed: Unknown
ERROR: [001.001] {foldtrajlite2.c, line 2018} Caught sig 11
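
To illustrate the guess above: gethostbyname() returns NULL when the lookup fails, and code that uses the result without checking it will dereference a null pointer, which typically ends in a signal 11. Whether the client actually does this is not established anywhere in this thread. The sketch below only shows the defensive pattern, and `resolve_ipv4` is a hypothetical helper name.

```c
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper (not from the client): resolve host to an IPv4
 * address stored in *out.
 * Returns 0 on success, -1 if the lookup fails or yields no IPv4 address. */
int resolve_ipv4(const char *host, struct in_addr *out)
{
    struct hostent *he = gethostbyname(host);

    /* A failed lookup returns NULL: report it, never dereference it. */
    if (he == NULL || he->h_addrtype != AF_INET || he->h_addr_list[0] == NULL)
        return -1;

    memcpy(out, he->h_addr_list[0], sizeof *out);
    return 0;
}
```

With a check like this, a bogus nameserver in /etc/resolv.conf would produce a failed upload that can be retried later instead of a crash.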

Brian the Fist
08-19-2003, 10:59 AM
Is it possible for you to grant me telnet/ssh access to one of your boxes? If so, I could figure out what is wrong for certain, as I don't think that's it. You can send an e-mail to trades@mshri.on.ca to discuss further.