
Thread: Server problems addressed and grace period extension

  1. #1

    Server problems addressed and grace period extension

    The source of the problem is simply an overwhelming number of user connections. Nothing was broken or malfunctioning, but both our web servers and the database were running at the maximum number of connections.

    This may be due to large numbers of buffered results, the faster speed of crunching through the new smaller protein, or more than likely a combination of both. Looking through the logs it seems that the connections are starting to level out to a nice balance, so hopefully most of the people have flushed their results, or at least enough have gone through to not constantly overload our resources.

    In the meantime, if you cannot connect, please wait a bit and try again later.

    As for the grace period for old results, it is extended so that the 24 and 48 hour rules will apply anew starting at 12:00 pm EST today (18/02/2004). Try not to flush enormous amounts of data at once, as that prevents other users from obtaining any connections. If at all possible, upload at some evenly balanced intervals.
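The request to "upload at some evenly balanced intervals" could be handled on the client side by spacing uploads with random jitter, so that many machines do not all hit the server at once. A minimal sketch (the function name and parameters are illustrative, not part of the real client):

```python
import random

def schedule_uploads(n_batches, base_interval_s=3600, jitter_s=600):
    """Return upload times (in seconds) spread over balanced intervals.

    Each upload is separated by roughly base_interval_s seconds, plus
    random jitter so that a fleet of clients does not synchronize and
    slam the server at the same moment.
    """
    times = []
    t = random.uniform(0, jitter_s)  # random initial offset per client
    for _ in range(n_batches):
        times.append(t)
        t += base_interval_s + random.uniform(-jitter_s, jitter_s)
    return times
```

With the defaults above, consecutive uploads land between 50 and 70 minutes apart, never all at once.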

    We will be keeping an eye on the resources, but feel free to let us know if any new errors start popping up.
    Elena Garderman

  2. #2
    Stats God in Training Darkness Productions's Avatar
    Join Date
    Dec 2001
    Location
    The land of dp!
    Posts
    4,164
    Do you all need more/bigger hardware for either of these, or is it just an OS configuration issue? I can't imagine how many simultaneous connections you all have, but I don't think it would cause a properly configured server setup to cave like that....

    Just a thought.

  3. #3
If it was ever doubted, the current situation cries out for a configurable upload interval: not every generation, but every ten, or even every fifty for a protein this small.

  4. #4
    cant stop buying hardware... rofn's Avatar
    Join Date
    Nov 2003
    Location
    Vienna, Austria, Europe ;)
    Posts
    136
i cry, too

this is my 3rd time begging for a personal proxy like the one dnet uses

or, like HaloJones says, a configurable upload # of gens
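The configurable upload the posters are asking for amounts to a client-side buffer that flushes every N generations. A minimal sketch, where the `send` callable stands in for the real client's upload step (not part of the actual software):

```python
class GenerationBuffer:
    """Buffer completed generations and flush every `flush_every` gens.

    `send` is whatever function actually uploads a list of results;
    it is a parameter here because the real client's upload call is
    not part of this sketch.
    """

    def __init__(self, flush_every, send):
        self.flush_every = flush_every
        self.send = send
        self.pending = []

    def add(self, result):
        self.pending.append(result)
        if len(self.pending) >= self.flush_every:
            self.flush()

    def flush(self):
        if self.pending:
            self.send(self.pending)  # one upload for the whole batch
            self.pending = []
```

Setting `flush_every=10` or `flush_every=50` gives exactly the "every ten or every fifty" behaviour asked for above.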

  5. #5
    Ol' retired IT geezer
    Join Date
    Feb 2003
    Location
    Scarborough
    Posts
    92

Multi-Generation Input

    The source of the problem is simply an overwhelming number of user connections
Sounds like you have reached the limit of your current design. You either
need faster hardware or a more efficient design. If your client gathered
the results from several successive generations into one data record to
send to the server, you would have the opportunity to insert their data into
the DB as one "logical unit of work", with a significant reduction in
resources used compared with your current method. If you need to change
your design to do that, so be it. You are currently rewriting the data
upload anyway...

Yes, it will take a little longer to process a connection. But you would have
far fewer connections....
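Ned's "logical unit of work" can be illustrated with a batched insert. The schema and the sqlite3 backend here are assumptions for the sketch (the project's real database is not named); the point is one connection and one commit per batch of generations, rather than one round trip per result:

```python
import sqlite3

def insert_generations(conn, user_id, generations):
    """Insert all of a client's buffered generations in one transaction.

    One connection, one commit - a single logical unit of work -
    instead of one connection and one commit per generation.
    """
    with conn:  # commits once on success, rolls back on error
        conn.executemany(
            "INSERT INTO results (user_id, gen, energy) VALUES (?, ?, ?)",
            [(user_id, g["gen"], g["energy"]) for g in generations],
        )
```

Per-row commits force the database to flush to disk for every result; batching amortizes that cost across the whole upload.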


    Ned

  6. #6
    With an upload every few minutes, the latencies in making the connection, uploading and breaking the connection rapidly add up. I can't babysit my farm. It needs to be able to upload on a time period which would not then change between proteins.

    This latest outage started at around 8am GMT and nothing is available at present. This cannot be user connection related.


    EDIT:

    I had a directory with 43 gens of the old protein in it. I wrote an upload batch file to keep trying "-u t" until all the files were gone. It took more than two hours to upload them. With 25 P4s running non-stop I got 36,336 points (not gens, points) uploaded in the last two hours!



    Last edited by HaloJones; 03-19-2004 at 11:32 AM.

  7. #7
The fault lies with members of the community, for not bothering to plan a more consistent upload.
Certainly one takes account of the worst case, but it doesn't make sense for them to expend resources to deal with the fact that the servers get slammed once a month because some people hoard results.

    It's extremely difficult to design capacity to handle that sort of scenario. If users would make it a policy to execute a predictable and consistent upload of results, they could maximize efficient use of resources on their end.

    In this case we shouldn't be irritated with them, but with ourselves.

  8. #8
I have not cached anything, although I agree others may have. I am deeply unhappy when I see that some have managed to get 2 million points through in a two-hour window when I have achieved 100,000 but have millions that can't get through. That's not from caching - these are points from the new protein that can't get through.

  9. #9
    Senior Member
    Join Date
    Oct 2003
    Location
    an Island off the coast of somewhere
    Posts
    540
    Originally posted by allenfinch
The fault lies with members of the community, for not bothering to plan a more consistent upload.
Certainly one takes account of the worst case, but it doesn't make sense for them to expend resources to deal with the fact that the servers get slammed once a month because some people hoard results.

    It's extremely difficult to design capacity to handle that sort of scenario. If users would make it a policy to execute a predictable and consistent upload of results, they could maximize efficient use of resources on their end.

    In this case we shouldn't be irritated with them, but with ourselves.
    The community doesn't necessarily choose to buffer every time a protein change occurs.

    The clients start buffering on their own when the server is taken down for hours for switching over to a new protein. Those buffered results are trying to upload at the same time the servers are trying to download new client or protein packages.

    There are some in the community who do choose to buffer during the protein change - either before getting the update, or after - simply to keep their production rate up instead of wasting cycles waiting for the server to respond or time out.

Those in the community who fold off-line have little choice but to try to upload all their cached work within the 24-hour period immediately following the update, or upload early and have the machines sit idle while waiting for the update - and perhaps have to visit the offline farm twice: once to harvest and upload, then again to install the update.

The project, by its design, forces these situations.

    willy1






  10. #10
    The Cruncher From Hell
    Join Date
    Dec 2001
    Location
    The Depths of Hell
    Posts
    140
    If you crunch offline before a protein switch, whether for a week or the whole protein run, you aren't benefitting the project. If you haven't uploaded all that you can before the switch, but instead choose to buffer until after, you really should find a different project.

  11. #11
    Member
    Join Date
    Oct 2002
    Location
    southeastern North Carolina
    Posts
    66
I'm on dial-up... I upload every morning and twice every evening.

AFTER downloading the new software, within a day I had hundreds of generations waiting on each of 3 machines.
Some of them I babysat for hours, connected,
and essentially NOTHING was getting through....
    " All that's necessary for the forces of evil to win in the world is for enough good men to do nothing."-
    Edmund Burke

    " Crunch Away! But, play nice .."

    --RagingSteveK's mom


  12. #12
We should be able to handle the load, even with everyone uploading at once. I am starting to think it may be wiser to optimize the database and the queries made to it - reduce the total number of connections happening. Unfortunately, I am not a database whiz by any means. And this would not be a minor change, so I'm just not sure.
    Howard Feldman

  13. #13
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
Things have changed dramatically since we used to upload once every one or two days...

Is it possible to send all the files for a generation together (zipped, or packed with an algorithm that has a really low processing hit for decompressing), if that would help in addition to the optimization of the database access? I.e., grab it all at once (reducing the handshaking threefold) and verify it all at once, instead of doing it once per file.
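Packing a generation's files into one compressed blob, as suggested here, might look like the following sketch (gzip decompression is cheap on the receiving side; the filenames and contents are purely illustrative):

```python
import io
import tarfile

def pack_generation(files):
    """Pack a generation's files into a single gzipped tar blob.

    files: dict of filename -> bytes. Sending one blob replaces
    several per-file handshakes, and the server decompresses it all
    in one go.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def unpack_generation(blob):
    """Reverse of pack_generation: one read yields all the files."""
    out = {}
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        for member in tar.getmembers():
            out[member.name] = tar.extractfile(member).read()
    return out
```

The server could then verify the whole generation after a single transfer instead of validating file by file.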
    www.thegenomecollective.com
    Borging.. it's not just an addiction. It's...
