
Thread: 910 error resolved

  1. #1

    910 error resolved

    We have resolved the issue that was resulting in 910 errors for a large number of folders. You will have to complete the current run of 250 gens if you are presently receiving a 910 error but after this you will not receive them anymore. If you have nothing buffered, you can just delete filelist.txt to force it to start over at gen. 0 without wasting time on getting to gen 250.

    It appears when we switched to FastCGI (to speed things up) it had a side effect of some peoples' tickets on rare occasion getting the same name (i.e. overwriting the ticket on the server). Thus your result would be lost, and the continuity of your folding movie would be destroyed.
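    A minimal sketch of how this kind of overwrite can happen. The post does not say how ticket names were actually generated, so the naming scheme below (a name derived only from the clock), the function names, and the folder names are all assumptions made for illustration. The point is only that any name that two simultaneous uploads can share will cause one ticket to overwrite the other.

```python
import uuid

def naive_ticket_name(now_seconds: int) -> str:
    # Hypothetical scheme: the name depends only on the current second,
    # so two uploads arriving in the same second get the same name.
    return f"ticket_{now_seconds}"

def unique_ticket_name(folder_id: str) -> str:
    # One possible fix: include the folder and a random UUID in the name.
    return f"ticket_{folder_id}_{uuid.uuid4().hex}"

# Two folders upload within the same second (timestamp is made up).
naive_store = {}
for folder in ("alice", "bob"):
    naive_store[naive_ticket_name(1083250000)] = folder  # second write overwrites the first

unique_store = {}
for folder in ("alice", "bob"):
    unique_store[unique_ticket_name(folder)] = folder

print(len(naive_store), len(unique_store))
```

    With the naive scheme only one ticket survives on the "server"; with unique names both do.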

    Thus although it had a few hiccups, we believe the ticketing system is now fully functional and behaving as it was originally engineered. For those of you who have been patient and stuck around while we resolve this, we thank you for your confidence in our abilities. For those of you who are gone, well, you're not reading this anyways I guess.

    From this point on then, other than as stated above, you should never receive a 910 error again. The only exception is if you upload duplicate data, but in this case you will stop getting 910 once the data you upload is no longer duplicate. E.g. if you upload gens 3, 4, 5 then delete your ticket before 5 is validated, you will upload 5 again, then 6, 7, etc. In this case you will get a 910 error when you upload gen 5 the second time, because it indeed cannot find the previous generation (gen 4), which has already been replaced by gen 5 as the latest gen.
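    The duplicate-upload case can be sketched as a simple continuity check. This is not the actual server code: the class name, the return value for success, and the idea that the server keeps only a "latest gen" number are assumptions based on the description above. Only the 910 code itself comes from the thread.

```python
OK = 0          # hypothetical success code
ERR_910 = 910   # error code named in the thread

class MovieServer:
    """Toy model: accepts a generation only if it directly follows the latest one."""

    def __init__(self, latest_gen: int):
        self.latest_gen = latest_gen

    def upload(self, gen: int) -> int:
        # The previous generation must be the current latest gen;
        # otherwise the upload cannot be appended to the movie.
        if gen != self.latest_gen + 1:
            return ERR_910
        self.latest_gen = gen
        return OK

server = MovieServer(latest_gen=2)
# Gens 3, 4, 5 upload, then 5 is uploaded a second time, then 6 and 7.
results = [server.upload(g) for g in (3, 4, 5, 5, 6, 7)]
print(results)
```

    In this model the second upload of gen 5 is the only one rejected, and uploads resume normally at gen 6.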

    Hope everything is cleared up now. Elena and I are going to take a long vacation now and catch up on our sleep :sleepy: :sleepy: :sleepy:
    Howard Feldman

  2. #2
    OK, I'll trust you with a single client's 500+ buffered gens. Thanks for the update. This is the kind of feedback we all want to see!

  3. #3
    That's all well and good IF IT'S FIXED, but what about the 1000s of gens lost through what you did yesterday? ... Are you going to go through manually and give credit for them where credit is due? ...

  4. #4
    Senior Member
    Join Date
    Oct 2003
    Location
    an Island off the coast of somewhere
    Posts
    540
    "It appears when we switched to FastCGI (to speed things up) it had a side effect of some peoples' tickets on rare occasion getting the same name (i.e. overwriting the ticket on the server)."
    I would hardly classify it as a rare occasion if virtually ALL of the results were rejected since midday yesterday.

    I certainly hope they will restore credit for it- otherwise I'm gone too with my 60+ GHz.

    I had almost 11 million points buffered - maybe 8000+ generations - that all went into the bit bucket with absolutely no credit.

    You cannot screw up the statistics this badly and expect people to stick around.
    Last edited by willy1; 04-29-2004 at 02:42 PM.






  5. #5
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    Thanks for the info...it appears to be accepting gens now without problem...

    Shame this protein has had a bad start with several teams moving elsewhere - but hopefully all is sorted now



    /edit - uploading is quick as well - must've just gotten rid of 100 gens without errors and scoring points in 90 seconds...

  6. #6
    I think it highly unlikely that any lost points can be re-awarded. Unfortunate, some would say tragic, but let's all just pretend that we only care about the science and that the points aren't really important. FOLD ON!

  7. #7
    Senior Member
    Join Date
    Oct 2003
    Location
    an Island off the coast of somewhere
    Posts
    540
    :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs: :bs:






  8. #8
    Originally posted by HaloJones
    I think it highly unlikely that any lost points can be re-awarded. Unfortunate, some would say tragic, but let's all just pretend that we only care about the science and that the points aren't really important. FOLD ON!
    Building up credit in heaven as it were.

  9. #9
    OK, I've got a client that was generating 910s and had 200 or so buffered. The client is uploading and stopped doing 910s when it hit the new set of gens!

  10. #10
    That's great news, now we can finally rip this protein to shreds

    * cheers for the DF crew *
    Dutch Power Cows, The Stampede that's heading for the top.

  11. #11
    Senior Member
    Join Date
    Oct 2003
    Location
    an Island off the coast of somewhere
    Posts
    540
    Can we get a straight answer about getting credit for the last 24 hours' downloads (a week's worth of crunching!) that got invalid 910 errors?






  12. #12
    Senior Member
    Join Date
    Jun 2003
    Location
    Windsor, England
    Posts
    950
    we get nothing, zero, zip, nada, 0, less than one.

    well, maybe.

  13. #13
    Ancient Programmer Paratima
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    I would wait for an official word on that. Howard has done it in the past... when there was enough data to reconstruct from. That might not be the case this time.
    HOME: A physical construct for keeping rain off your computers.

  14. #14
    Senior Member
    Join Date
    Jun 2003
    Location
    Windsor, England
    Posts
    950
    well if you read what Howard said in the start of this thread ->>

    Thus your result would be lost, and the continuity of your folding movie would be destroyed.

    I kind of thought he was letting us down lightly.

    the words that did it for me are

    1-lost
    and
    2-destroyed

  15. #15
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    They are all casualties of the war against disease
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  16. #16
    Ancient Programmer Paratima
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    "Lost" "Destroyed" Oh! Ooooops.
    HOME: A physical construct for keeping rain off your computers.

  17. #17

    well?

    Yeah but do people agree that the 910 problem is fixed?
    So far three of my machines are looking o.k.
    OCAU

  18. #18
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    had a checksum error which a purgeuploadlist sorted out - but been uploading fine since Howard posted (and dumped around 400k points in ~10 minutes)...

    seems fine here

  19. #19
    I'd like to thank the few people who actually tried to post helpful messages so we could figure out what the problem was. Unfortunately your posts were lost amongst the dozens of other, somewhat less constructive posts. What really gave us the clue was the 908's mixed in occasionally with 910 errors (which I observed in my own client error log this morning). As a few of you noted, this meant the server was obviously mixing up the tickets somehow. That made it clear to us that the ticket names were not always unique anymore after the change to the servers yesterday - that is why sometimes you would be OK but then once in a while you'd get a non-unique ticket and your work would then be borked for that set.

    We have ensured that the tickets are unique again which has remedied the situation. Sometimes you need a good night's sleep before the solution becomes clear.

    As for the matter of the uploaded data that received 910's, no, it did not have scientific value as mentioned above. We do however have sufficient logging that we can compute who uploaded what and how much, so we will go through the logs and add the appropriate points to peoples' totals (unless there are objections to this...). This will happen some time before the end of this protein run. We'll post a message when it has been done, probably in about 2 weeks' time.
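    A rough sketch of what "going through the logs" could look like. The real log format is not described in this thread, so the record layout (handle, status code, points), the function name, and the sample values below are entirely hypothetical; the sketch only illustrates summing the points of rejected uploads per folder.

```python
from collections import defaultdict

def credit_from_log(entries):
    """Sum the points of uploads that were rejected with a 910, per handle."""
    owed = defaultdict(int)
    for handle, status, points in entries:
        if status == 910:
            owed[handle] += points
    return dict(owed)

# Made-up sample records: (handle, status code, points for that upload).
sample_log = [
    ("alice", 910, 1200),
    ("bob", 0, 900),      # accepted upload, already credited
    ("alice", 910, 800),
    ("bob", 910, 500),
]

owed = credit_from_log(sample_log)
print(owed)
```

    A real pass would presumably also need to restrict the time window to the outage and exclude genuine duplicate uploads, which also produce 910s.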

    I do not understand why people have the impression that we said the recent problems are the folders' fault; we said no such thing that I am aware of. We have devoted most of the past 3 months to developing this new ticketing system. The main purpose was to reduce the load on the servers, improve uploading efficiency, and eliminate the evil 910 errors. These were not to advance the science, they were strictly to improve the performance of the program for YOU, the users. Granted, it didn't work as we intended immediately. The slowness of the ticket processing simply didn't show up in beta testing, and once it was clear we needed to speed it up, we implemented a fix in less than 2 days which greatly sped it up.

    Perhaps we should have tested this 'fix' a bit more before releasing it, but the problem of tickets overwriting would have only shown up with many users all uploading at once, so again it is something we could not have anticipated even though, now that we know what the error was, it was probably something we should have thought of. But hindsight is 20-20 as they say.

    Anyhow, what's done is done, and Elena's 3 months of hard work will hopefully now pay off as you enjoy smooth, fast, 910-free uploading from now on. Our servers are much happier now too, now that we have stopped thrashing them.

    Happy folding
    Howard Feldman

  20. #20
    Target Butt IronBits
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    Thank you Howard and Elena, and anyone else there that lent a hand to get this back on track!
    By far this was the hardest change over to date.
    I'm sure we all learned a little something from this one.


  21. #21
    Senior Member
    Join Date
    Jun 2003
    Location
    Windsor, England
    Posts
    950
    thanks for your work and input. About time you went home.

    see ya tomorrow


  22. #22
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    Thanks for the feedback - hopefully everything can now proceed without further problems on this protein...



    from the stats it seems uploads are definitely getting back to normal

  23. #23
    7G - OCW iggy
    Join Date
    Aug 2003
    Location
    London, UK
    Posts
    156
    Thanks for the explanation, hard work and fixes - I hope all should work as intended.

    But...
    "The only exception is if you upload duplicate data, but in this case you will stop getting 910 once the data you upload is not duplicate again."
    I thought ticketing system would prevent this from happening ever again...

  24. #24
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    I thought you could upload duplicate if you deleted the receipt.txt - hence the advice not to...

  25. #25
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    posted too soon

    starting to get 'Failed to query status for ticket [ticket id]' appearing in logs...I assume this is just because the servers are now getting under load and nothing major to worry about?

    /edit - never mind - seems to be a hiccup somewhere - all bar 4 clients have now uploaded and error-free...nice to see the system working as it says it should. Maybe now I'll finish dfMon, dfTag and the stats
    Last edited by pfb; 04-29-2004 at 06:24 PM.

  26. #26
    Senior Member
    Join Date
    Apr 2004
    Location
    Netherlands
    Posts
    109
    Originally posted by pfb
    posted too soon

    starting to get 'Failed to query status for ticket [ticket id]' appearing in logs...I assume this is just because the servers are now getting under load and nothing major to worry about?
    Happened to me too. I investigated, and discovered the receipt.txt was an empty file, which was obviously the problem. But at the next gen it uploaded, so solved!

  27. #27
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    yep - noticed that just after I posted...minor heart attack avoided (esp after the issues of the past week)

  28. #28
    Yes, it just means it can't read the status of your ticket (I think the error msg gave it away)
    This could be caused by lots of things: the server being hogged by other users, a crappy firewall (heh I got this just a few hours ago :P) or your connection having a little R&R.
    But it's nothing to worry about, as it will try to retrieve the status the next time it tries to upload. No data will be lost, as it won't upload until it gets an OK msg from the server
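    The behaviour described above (keep the buffer until the server answers OK, then try again on the next upload attempt) can be sketched roughly as follows. This is not the client's real code: the function names, the "OK" status string, and the generation labels are assumptions for illustration.

```python
def try_upload(query_status, send, buffered):
    """Make one upload attempt and return whatever is still buffered afterwards."""
    try:
        status = query_status()
    except ConnectionError:
        # Could not read the ticket status: keep everything, retry next time.
        return buffered
    if status != "OK":
        return buffered
    send(buffered)
    return []

# Fake server: the first status query fails, the second succeeds.
responses = iter([ConnectionError("server busy"), "OK"])

def fake_query():
    reply = next(responses)
    if isinstance(reply, Exception):
        raise reply
    return reply

sent = []
buffered = ["gen_251", "gen_252"]

buffered = try_upload(fake_query, sent.extend, buffered)  # status query fails
after_first_attempt = list(buffered)
buffered = try_upload(fake_query, sent.extend, buffered)  # status is OK, data goes out
print(after_first_attempt, sent, buffered)
```

    Because nothing is sent before the status check succeeds, a failed query costs only a delay.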

    @howard, sleep should be treasured now go and get some
    Dutch Power Cows, The Stampede that's heading for the top.

  29. #29
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    Originally posted by Hagar
    Yes, it just means it can't read the status of your ticket (I think the error msg gave it away)
    This could be caused by lots of things: the server being hogged by other users, a crappy firewall (heh I got this just a few hours ago :P) or your connection having a little R&R.
    But it's nothing to worry about, as it will try to retrieve the status the next time it tries to upload. No data will be lost, as it won't upload until it gets an OK msg from the server

    @howard, sleep should be treasured now go and get some
    that's what I thought - but the initial thought was 'not another problem'

  30. #30
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    Howard, quick question - will this be posted under the news section on the Blueprint site? I know some of my team look there and not here - maybe useful to post the first post there...?

  31. #31
    Originally posted by pfb
    Howard, quick question - will this be posted under the news section on the Blueprint site? I know some of my team look there and not here - maybe useful to post the first post there...?
    Yep, that's coming up.

    To clarify on duplicate data, it is OK to upload duplicate data (if you delete your receipt.txt for example) but you will not get points, and you will register a 910 error in your log. This does NOT mean your whole data set is borked. It simply means the duplicate data was not appended to your growing movie, as it did not match up with the expected next generation for your movie file. You will get exactly one 910 error for each duplicate generation uploaded, no more and no less. So in this case you should just ignore the 910 error. Unless you didn't know you were uploading duplicate data - in this case you may wish to try to figure out why you are uploading duplicate data (e.g. your best friend burned all your buffered sets to a CD while you were and uploaded them in his name).
    Howard Feldman

  32. #32
    Senior Member
    Join Date
    Feb 2003
    Location
    wigan, uk
    Posts
    200
    great to know we'll be getting our points back too (if no-one objects), and makes the problems of the last week (for me at least) seem less severe.

    Oh and cheers for keeping us updated too

  33. #33
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Have 6 Clients going smoother than butter on a glass chopping board. I think this is a good sign
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.
