Thread: server software upgraded

  1. #1

    server software upgraded

    All,

    We just upgraded the project server software to address the following issues:

    1 - the old version was dying quite a lot for unknown reasons related to its use of the MySQL client libraries... we're not exactly sure what was broken, but it seems to be fixed now.

    2 - it's now smarter about trying to prevent the point-reduction bug.

    Also the server is now 100% Java-based, a precursor to the upcoming v3 client (which, believe it or not, we haven't forgotten about, even though it's been a while since it's been talked about)...

    Post here if anyone notices any problems with the new server version.

    Note this upgrade does not affect the website.

  2. #2

    possibly related problem?

    Hi there
    I've been a member of this project for about 1.5 years (username: Maddog) and have had no problems whatsoever so far, having completed about 80 tests on 3-4 PCs of different specs.
    Today, though, I had my first problem, and I can't help wondering whether the server upgrade had something to do with it.
    Specifically, the pending tests page shows a test (ID: 327094) for 19249x2^6117746+1, assigned on May 11, as never reported and 0% complete. The problem is that this test was finished and submitted normally today, and I see no obvious errors in my logs, shown below:

    [Tue May 11 16:41:34 2004] got proth test from server (k=19249, n=6117746)
    **blahblahblah with only the occasional "server busy" errors of the old server**
    [Sun May 16 03:12:05 2004] n.high = 3713524 . 360 blocks left in test
    **one week gap here since I was away on a trip**
    [Sun May 23 19:43:00 2004] got k and n from cache
    [Sun May 23 19:43:00 2004] AMD Athlon(tm) XP 3000+ detected. Enabling cpu specific optimizations.
    [Sun May 23 19:43:01 2004] restarting proth test from cache (k=19249, n=6117746) [60.7%]
    **more blahblahblah without any "server busy" errors anymore after migrating to the new server**
    [Wed May 26 10:23:40 2004] could not resolve hostname -- block added to submit queue
    [Wed May 26 10:30:37 2004] residue: 519181D3850C7EA3
    [Wed May 26 10:30:37 2004] completed proth test(k=19249, n=6117746): result 3
    [Wed May 26 10:30:37 2004] connecting to server
    [Wed May 26 10:30:37 2004] couldn't report to server [can't connect], retry in 300 secs [error: 0]
    [Wed May 26 10:35:37 2004] connecting to server
    [Wed May 26 10:35:37 2004] couldn't report to server [can't connect], retry in 300 secs [error: 0]
    [Wed May 26 10:40:37 2004] connecting to server
    [Wed May 26 10:40:37 2004] couldn't report to server [can't connect], retry in 300 secs [error: 0]
    [Wed May 26 10:45:37 2004] connecting to server
    [Wed May 26 10:45:37 2004] couldn't report to server [can't connect], retry in 300 secs [error: 0]
    [Wed May 26 10:50:37 2004] connecting to server
    [Wed May 26 10:50:37 2004] couldn't report to server [can't connect], retry in 300 secs [error: 0]
    [Wed May 26 10:55:37 2004] connecting to server
    [Wed May 26 10:55:37 2004] couldn't report to server [can't connect], retry in 300 secs [error: 0]
    **PC was offline when it completed the test, but I connected it as soon as I got to it half an hour later**
    [Wed May 26 11:00:38 2004] connecting to server
    [Wed May 26 11:00:39 2004] logging into server
    [Wed May 26 11:00:46 2004] requesting a block
    [Wed May 26 11:00:48 2004] got proth test from server (k=24737, n=6224887)
    [Wed May 26 11:00:48 2004] AMD Athlon(tm) XP 3000+ detected. Enabling cpu specific optimizations.
    [Wed May 26 11:08:07 2004] resolving hostname
    [Wed May 26 11:08:11 2004] opening connection
    [Wed May 26 11:08:22 2004] logging into server
    [Wed May 26 11:08:25 2004] login successful
    [Wed May 26 11:08:32 2004] n.high = 6451 . 964 blocks left in test

    As far as I can tell, the test's progress was correctly being reported this morning at about 98% (I habitually check the pending tests page), but this evening it shows 0% and the test is still pending, as if I had been assigned this test but never worked on it at all. Still, my personal stats show one test completed for today. I'd hate to see a test as large as this one get lost for no apparent reason, so if anyone would like to investigate, I'd be most grateful. Thanks!

    Maddog

  3. #3
    The server does have a record of you finishing that test. But it also, for some reason, has a record of that same test being assigned to another one of your machines. The finished test was supposedly assigned to 62.103.251.65... the one that's 0% complete to 62.103.251.91 five minutes later.

    So obviously something strange happened, because the server would never assign the same test twice in five minutes, especially not to the same user. It got duplicated somewhere, and I'm not sure where. Do you happen to have a log file for that other machine that might shed some light on what's going on?

  4. #4
    Ok, I have to confess... I have no idea how or why the test got duplicated. But I think it's probably not a coincidence that the assignment times for the two tests were almost exactly five minutes apart on May 11 (with five minutes just happening to be the retry time on your client)... even so, I couldn't put together a scenario in my head that would have caused the duplication.

    I checked the database from the old machine (before the switchover on the 18th) and the test was duplicated there, too. So at least it's not a bug in the new Java server (at least, not one that didn't exist in the old server too)...

    I closed the 'extra' test in the database... if anything like this happens again please let us know!

  5. #5
    OK, I suppose this will remain somewhat of a mystery forever...
    There's nothing in the logs of my machines that seems to explain the double assignment, but I have come up with a scenario which, although *highly* unlikely, is the most plausible one I can see with the available info:
    These 62.103.xxx.xxx IPs are both from otenet.gr, my ISP (dialup). It *seems* possible to me that the client requested the test for the first time and THEN the phone line dropped BEFORE the server's answer came through to my machine (not completely unlikely with a slow line). I *might* have reconnected almost immediately (getting a different but similar dynamic IP), so the client asked once more for a test, and since the time between the two connections was minimal, I got the same test. Of course, for this to be valid, I have to assume the server waits for some kind of acknowledgement from the client that the test was successfully assigned before it proceeds to the next available test in the database (and in this case the acknowledgement never came because the line was dropped). If this is not the case, I am completely baffled myself...
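
    A minimal sketch of the scenario above, assuming a server that only advances to the next test once the client acknowledges the assignment. The class `AckBasedServer` and its methods are hypothetical names made up for illustration; nothing here is taken from the real server.

```python
# Hypothetical model of an "advance only on acknowledgement" server.
# This illustrates the guess in this post, not the project's real logic.
class AckBasedServer:
    def __init__(self, tests):
        self.queue = list(tests)   # unassigned (k, n) pairs, in order
        self.assignments = []      # (ip, k, n), one entry per request served

    def request_test(self, ip):
        # Hand out the head of the queue without removing it yet.
        k, n = self.queue[0]
        self.assignments.append((ip, k, n))
        return k, n

    def acknowledge(self, k, n):
        # Only an acknowledgement moves the server on to the next test.
        if self.queue and self.queue[0] == (k, n):
            self.queue.pop(0)


server = AckBasedServer([(19249, 6117746), (24737, 6224887)])

# First request: the reply is lost when the line drops, so no ack is sent.
server.request_test("62.103.251.65")

# Reconnect with a new dynamic IP and ask again; this time the ack arrives.
test = server.request_test("62.103.251.91")
server.acknowledge(*test)

for row in server.assignments:
    print(row)
```

    Under that assumption, both requests receive the same k/n pair and the server ends up with two assignment records for one test.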

    Anyway, this seems to be a rare thing; as I said already, I have run SoB flawlessly for about 1.5 years. The main reason I posted was to avoid letting a completed test linger in the database to be reassigned. This has been fixed, so all's well that ends well.
    Thanks!

  6. #6
    That's as good a scenario as I was able to come up with too, but I don't think it fits. For one thing, the server's log (separate from the data tables) only shows it making one assignment... the first one. The second one, while in the data tables, is not in the server log.

    Also, the server should never assign the same test twice within five minutes. It marks the k/n pair as assigned immediately after it decides which k/n to assign.

    Code:
    /* Record the new assignment in proth_tests. */
    client.Query("INSERT INTO proth_tests (assignTime, ipAddress, "
       "version, machine, userID, teamID, k, n) VALUES ("
       "?, ?, ?, ?, ?, NULLIF(?, 0), ?, ?)", params);

    /* snipped code that builds the parameter list for the next query */

    /* Mark the k/n pair as Pending, unless it has already been Tested. */
    client.Query("UPDATE proth_numbers SET status = IF(status = 'Tested', "
       "'Tested', 'Pending') WHERE k = ? AND n = ?", params);
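
    For anyone who wants to play with the idea, here is a rough stand-alone sketch of the same insert-then-update pair, using Python's built-in sqlite3 instead of MySQL. Assumptions: the schema is cut down to a few columns, the 'Untested' status and the `assign_next` helper are made up for illustration, the timestamps are placeholders, and the `IF(...)` is rewritten as a `CASE` because SQLite has no MySQL-style `IF()`. How the real server picks the next k/n is not shown in this thread.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE proth_numbers (k INTEGER, n INTEGER, status TEXT);
    CREATE TABLE proth_tests (assignTime TEXT, ipAddress TEXT, k INTEGER, n INTEGER);
    INSERT INTO proth_numbers VALUES (19249, 6117746, 'Untested');
    INSERT INTO proth_numbers VALUES (24737, 6224887, 'Untested');
""")


def assign_next(conn, ip, when):
    """Pick an unassigned k/n, record it, and mark it Pending in one transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT k, n FROM proth_numbers WHERE status = 'Untested' "
            "ORDER BY n LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        k, n = row
        conn.execute(
            "INSERT INTO proth_tests (assignTime, ipAddress, k, n) "
            "VALUES (?, ?, ?, ?)",
            (when, ip, k, n),
        )
        conn.execute(
            "UPDATE proth_numbers SET status = CASE WHEN status = 'Tested' "
            "THEN 'Tested' ELSE 'Pending' END WHERE k = ? AND n = ?",
            (k, n),
        )
        return k, n


first = assign_next(conn, "62.103.251.65", "t0")
second = assign_next(conn, "62.103.251.91", "t0 + 5 min")
print(first, second)
```

    In this sketch the second request gets a different k/n, because the first pair is already marked Pending by the time the second request looks for work.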
    And finally, the second test was marked "Late" by the server. That's only supposed to happen if a test expires, and then a client comes back and reports progress on it. In that case the server reopens the test, marks it Late, and it again becomes subject to the usual expiration rules.
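
    The "Late" rule just described can be modelled roughly like this. The `Assignment` class, the `expire` and `report_progress` functions, and the "Pending" and "Expired" status names are assumptions for illustration; only "Late" is a status name mentioned in this thread.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    k: int
    n: int
    status: str = "Pending"  # hypothetical name for an open assignment


def expire(a):
    # Open assignments (including reopened, Late ones) can expire.
    if a.status in ("Pending", "Late"):
        a.status = "Expired"  # hypothetical status name


def report_progress(a):
    # A progress report on an expired test reopens it and marks it Late.
    if a.status == "Expired":
        a.status = "Late"


a = Assignment(19249, 6117746)
report_progress(a)   # still open, so nothing changes
expire(a)            # the test times out
report_progress(a)   # the client comes back: reopened as Late
print(a.status)
```

    Under this model a test can only become Late after it has expired and then received a progress report, which is why a Late mark on the duplicate is surprising.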

    Definitely weird...
