
Thread: FreeBSD clients stopping randomly

  1. #1

    FreeBSD clients stopping randomly

    This is the output from error.log. At the time there was no Internet outage at my end, and I did not send a signal 11 to the process; it crashed by itself on generation 93. This has happened twice in the last 24 hours, on two different machines. I would have thought the client would carry on regardless of connectivity issues, as I presume all it was trying to do was upload a completed generation file.

    ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
    ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
    ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
    ERROR: [010.003] {taskapi.c, line 1218} [ReadServerResponse] Timeout waiting for response, got 0 chars.
    ERROR: [000.000] {foldtrajlite2.c, line 4616} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Timeout) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Timeout
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Timeout) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Timeout
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Timeout) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Timeout
    ERROR: [777.000] {ncbi_http_connector.c, line 101} [HTTP] Too many failed attempts, giving up
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
    ERROR: [777.000] {ncbi_http_connector.c, line 101} [HTTP] Too many failed attempts, giving up
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to bioinfo.mshri.on.ca:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to bioinfo.mshri.on.ca:80 failed: Unknown
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to bioinfo.mshri.on.ca:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to bioinfo.mshri.on.ca:80 failed: Unknown
    ERROR: [001.001] {foldtrajlite2.c, line 2018} Caught sig 11


    Make that 4 machines in the last 24 hours.
    Last edited by erk; 08-17-2003 at 05:45 AM.

  2. #2
    Sig 11 is a program crash. I'll try running on our BSD machine for a while and see what happens. Can you try running it with quiet mode turned off and tell us what you see on the screen at the moment of the crash? What step is it at? Thanks.
    Howard Feldman

  3. #3
    Originally posted by Brian the Fist
    Sig 11 is a program crash. I'll try running on our BSD machine for a while and see what happens. Can you try running it with quiet mode turned off and tell us what you see on the screen at the moment of the crash? What step is it at? Thanks.
    I will try, but the machines do not have screens and I don't know which one will do it next.
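
    Since the boxes are headless, one way to see what led up to each crash is to scan error.log after the fact. Below is a minimal Python sketch, assuming every entry follows the `ERROR: [code] {file, line N} message` layout seen in the log above. The function name `context_before_crash` and the `keep` parameter are made up for illustration; nothing here comes from the client itself.

```python
import re

# Matches entries such as:
#   ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed ...
# The layout is inferred from the log excerpt in this thread (an assumption).
LINE_RE = re.compile(r"ERROR: \[(\d+\.\d+)\] \{([^,]+), line (\d+)\} (.*)")


def context_before_crash(lines, marker="Caught sig 11", keep=5):
    """For each crash marker, return up to `keep` parsed entries preceding it.

    Each entry is a tuple (source_file, source_line, message).
    Lines that do not look like log entries (date banners, blanks) are skipped.
    """
    parsed = []   # every entry seen so far
    crashes = []  # one list of preceding entries per crash found
    for raw in lines:
        match = LINE_RE.search(raw)
        if not match:
            continue
        _code, source, lineno, message = match.groups()
        message = message.strip()
        if marker in message:
            start = max(0, len(parsed) - keep)
            crashes.append(parsed[start:])
        parsed.append((source, int(lineno), message))
    return crashes
```

    Running it over the file on each machine, for example `context_before_crash(open("error.log"))`, gives one list per crash, so the boxes can be compared without watching a screen.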

  4. #4
    I've run it on our FreeBSD box and it's up to gen 14 with no problems yet... I can't fix it unless I can reproduce the problem.
    Howard Feldman

  5. #5
    Originally posted by Brian the Fist
    I've run it on our FreeBSD box and it's up to gen 14 with no problems yet... I can't fix it unless I can reproduce the problem.
    I had 4 more boxes do it overnight, all FreeBSD 4.8-RELEASE. At a guess, it looks like they can't reach the DNS server for a moment when they want to upload, though I might be wrong; it's just that the first line in the error.log reports a failed gethostbyname() call.

    I did try putting a bogus nameserver in /etc/resolv.conf, then stopping and starting the client; this created a similar problem and the client quit. This has only started happening since the updated client a few days ago, so something in the code must have changed. I do recall a similar gethostbyname() bug being reported back in November, something to do with glibc.
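
    To show the behaviour I was expecting (skip the upload and carry on when the name can't be resolved), here is a rough Python sketch. It is not the client's code, which is C and which I haven't seen; `resolve_or_none` and `upload_target` are hypothetical names, and the `resolver` parameter exists only so that a DNS failure can be simulated without touching /etc/resolv.conf.

```python
import socket


def resolve_or_none(host, resolver=socket.gethostbyname):
    """Return the IPv4 address for `host`, or None if the lookup fails."""
    try:
        return resolver(host)
    except OSError:
        # socket.gaierror and socket.herror are both OSError subclasses,
        # so a failed lookup ends up here instead of propagating.
        return None


def upload_target(hosts, resolver=socket.gethostbyname):
    """Pick the first host that resolves.

    Returns (host, address), or None when no host resolves, meaning
    "skip this upload and try again later" rather than a fatal error.
    """
    for host in hosts:
        address = resolve_or_none(host, resolver)
        if address is not None:
            return host, address
    return None
```

    With the two servers from the log, `upload_target(["www.distributedfolding.org", "bioinfo.mshri.on.ca"])` would return None while DNS is down, and the caller could keep folding and retry on a later generation.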


    ========================[ Aug 17, 2003 7:38 PM ]========================
    ERROR: [777.000] {ncbi_socket.c, line 1173} [SOCK::s_Connect] Failed SOCK_gethostbyname(www.distributedfolding.org)
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to www.distributedfolding.org:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to www.distributedfolding.org:80 failed: Unknown
    ERROR: [777.000] {ncbi_http_connector.c, line 101} [HTTP] Too many failed attempts, giving up
    ERROR: [777.000] {ncbi_socket.c, line 1258} [SOCK::s_Connect] Failed pending connect to bioinfo.mshri.on.ca:80 (Unknown) {errno=36,Operation now in progress}
    ERROR: [777.000] {ncbi_connutil.c, line 801} [URL_Connect] Socket connect to bioinfo.mshri.on.ca:80 failed: Unknown
    ERROR: [001.001] {foldtrajlite2.c, line 2018} Caught sig 11

    Last edited by erk; 08-18-2003 at 05:47 PM.

  6. #6
    Is it possible for you to grant me telnet/ssh access to one of your boxes? If so, I could figure out for certain what is wrong, as I don't think that's it. You can send an e-mail to trades@mshri.on.ca to discuss further.
    Howard Feldman
