Thread: Please Check the Server

  1. #1
    Senior Member
    Join Date
    May 2002
    Location
    New Jersey USA
    Posts
    115

    Please Check the Server

    The server seems to allow 1 or 2 data sets to upload, then times out. It has been in this state for about the past 4 hours. Other people have noted the problem in the 909 error thread, but possibly no one noticed.
    Thanks.

  2. #2
    I can confirm the problems. I can't upload the data. A lot of errors:
    You seem to be experiencing network problems, or may be using an invalid handle.

  3. #3
    I see no problem with the server other than it is handling a rather large load right now, probably due to a bunch of people trying to upload at once. If everyone stops trying to upload results and just lets it run normally it should be fine.
    Howard Feldman

  4. #4
    I understand the higher-than-normal load; some teams have been or are in the process of doing (or trying to do) quite massive flushes right now.
    The serious problems the server is having handling this right now raise the question of scalability: up to how many users does this project scale at the moment, and will the announced "new backend" greatly enhance scalability? Can you say whether bandwidth, the server, or something else is the bottleneck right now?
    Thanks,
    your efforts are much appreciated
    Jerome

  5. #5
    25/25Mbit is nearly enough :p pointwood
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  6. #6

    Re: jlandgr

    Originally posted by pointwood
    http://distributedfolding.org/soon.html

    Let's hope it arrives soon. Then doing flushes will hopefully result in some fun and not tedious babysitting like right now.
    And of course a larger userbase under "normal" circumstances when no flushing is going on.
    But if the whole large userbase starts flushing... Never mind...
    Jerome

  7. #7
    Might want to ask that people not engage in deliberate megaflushing. It's one thing if you have nonet clients that can't flush regularly, but if deliberate megaflushing is going to cause these problems with the server, then it shouldn't be done.

  8. #8
    I agree. We had assumed the server and the bandwidth had a bit more headroom than apparently exists at the moment. We will have to see how the new backend changes this. Until that is installed, yes, it probably is better not to do any more mega-flushes. The current ones from different teams (we are not alone) are underway and can't be stopped, but by tomorrow or within the next few hours everything will hopefully be back to normal.
    We had set the date for the flush to today and not tomorrow to lessen the load on the server, tomorrow being changeover time for everybody.
    I will certainly discuss the situation with my team mates,
    Jerome

  9. #9
    25/25Mbit is nearly enough :p pointwood
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    So, how much can we expect?
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  10. #10
    Originally posted by pointwood
    So, how much can we expect?
    Ah, great surprise
    Wait and see
    Jerome

  11. #11
    DPRGI Founder
    Join Date
    Dec 2001
    Location
    Europe - Italy - Padua
    Posts
    15
    Hello guys,

    No way to upload or download from Italy either; no network activity.

    bye

    Marzio

  12. #12
    Target Butt IronBits
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    I don't care for megaflushes, nor for team hoppers.
    I think the points should be fixed on the team, not the user.
    It's the user's computer that does the work, but once it gets uploaded, it belongs to the project and the team. But what do I know. Just some of the aggravations one must endure while doing DC, I suppose...

  13. #13
    25/25Mbit is nearly enough :p pointwood
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    I agree, IB. The points should stay with the team. I did ask Howard about that a long time ago (more or less when the project started, IIRC), but he rejected my proposal.
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  14. #14
    FWIW: I agree on points staying with the team. Don't know how this topic landed in this thread, but I agree nevertheless.
    A short update on the flush situation: we are getting seriously delayed by the slow network/server responses, so we will probably continue flushing (or should I say trickling) well into tomorrow/later today, depending on your timezone. I don't know what the situation is for the DPC and others who might be flushing (or who possibly haven't started yet); we'll just have to wait and see...
    Of course, this drawn-out dump is less impressive in graph spikes, as the curve gets wider. Oh well. All for the fun of working together as a team, anyway.
    Jerome

  15. #15
    Originally posted by Brian the Fist
    I see no problem with the server other than it is handling a rather large load right now, probably due to a bunch of people trying to upload at once. If everyone stops trying to upload results and just lets it run normally it should be fine.
    I am one of the bunch trying to upload and now I am stuck...what am I going to do? I don't see any answer given to this problem?

  16. #16
    Senior Member
    Join Date
    Apr 2002
    Location
    Oosterhout, Netherlands
    Posts
    223
    It took me 14 hours to upload approximately 1 million structures.
    And looking at statsman, I think some of us overestimated the bandwidth and capabilities of the dfolding server... Larger dumps all around are most likely the cause of these network problems.

    For those of you still trying to send results (and who want to receive 100% credit for it) and who have to go to work, create this batch file and start it.

    @echo off
    rem Keep re-running the client for as long as fold*.bz2 files are still present.
    :start
    .\foldtrajlite -f protein -n native -u t -df
    if exist fold*.bz2 goto start

    Now you don't have to manually restart the client...
    Good luck. I still have some 300k to send. :sleepy:
    Proud member of the Dutch Power Cows

  17. #17
    Junior Member PackSwede
    Join Date
    Jun 2002
    Location
    Umea, Sweden
    Posts
    14
    Thanks for that tip [DPC] Mobster.

    My only concern with that tip is that if all users who are trying to upload use that thing it is going to overload the servers even more

    /Thomas

  18. #18
    Member
    Join Date
    Apr 2002
    Location
    The Netherlands
    Posts
    47
    Originally posted by PackSwede
    Thanks for that tip [DPC] Mobster.

    My only concern with that tip is that if all users who are trying to upload use that thing it is going to overload the servers even more

    /Thomas
    That's my concern as well, but why not add a 'sleep' command?

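    @echo off
    rem Same loop, with a pause before each attempt to ease the load on the server.
    rem Note: sleep is not a built-in cmd command and may need to be installed separately.
    :start
    sleep 100
    .\foldtrajlite -f protein -n native -u t -df
    if exist fold*.bz2 goto start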
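
The batch loops above retry at a fixed interval, so many clients started around the same time can keep hitting the server in step. A common way to spread retries out is exponential backoff with random jitter. The following is only a minimal sketch of that idea in Python, not part of the Distributed Folding client: `try_upload` is a hypothetical callable standing in for one run of the client, and the default numbers (a 100-second base, taken from the `sleep 100` above and assumed to mean seconds) are illustrative assumptions.

```python
import random
import time


def backoff_delays(base=100.0, factor=2.0, cap=3600.0, attempts=6, rng=None):
    """Yield one randomized wait (in seconds) per retry attempt.

    "Full jitter": wait n is drawn uniformly from
    [0, min(cap, base * factor**n)]. All defaults are illustrative
    assumptions, not project settings.
    """
    rng = rng or random.Random()
    for n in range(attempts):
        ceiling = min(cap, base * factor ** n)
        yield rng.uniform(0.0, ceiling)


def upload_with_backoff(try_upload, sleep=time.sleep, **kwargs):
    """Call try_upload() until it returns True or the retries run out.

    try_upload is a hypothetical stand-in for one run of the client; it
    should return True once nothing is left to send. Extra keyword
    arguments are passed through to backoff_delays().
    """
    if try_upload():
        return True
    for delay in backoff_delays(**kwargs):
        sleep(delay)
        if try_upload():
            return True
    return False
```

Because the waits are random and grow after each failure, clients that fail together do not all come back together. Passing a no-op `sleep` (for example `sleep=lambda s: None`) allows a dry run without waiting.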

  19. #19
    There is a deadline to catch and I do not have the luxury of having 24/7 net access just to upload to their so called "no problem... If everyone stops trying to upload results and just lets it run normally it should be fine" server. Anyway, thanks for the above solution.

  20. #20
    Downsized Chinasaur
    Join Date
    Dec 2001
    Location
    WA Wine Country
    Posts
    1,847

    "mega flushing" to rack up a large number of one-time stats, no matter how cute or funny to the participants, only screws over the rest of us.

    It is, in fact, a type of DDoS (though not malicious in nature) and harms the rest of us who either dump regularly or just fold.

    If it's done just for laughs, it's in poor community spirit. We all know the backend is getting upgraded, so quit whining about it or abusing it.
    Agent Smith was right!: "I hate this place. This zoo. This prison. This reality, whatever you want to call it, I can't stand it any longer. It's the smell! If there is such a thing. I feel saturated by it. I can taste your stink and every time I do, I fear that I've somehow been infected by it."

  21. #21
    DPRGI Founder
    Join Date
    Dec 2001
    Location
    Europe - Italy - Padua
    Posts
    15
    Network overload can be a big problem for DC projects...

    I hope for a quick solution...

    The server is down at the moment.

    bye

  22. #22
    Member
    Join Date
    Apr 2002
    Location
    The Netherlands
    Posts
    47
    Originally posted by Chinasaur
    "mega flushing" to rack up a large number of one-time stats, no matter how cute or funny to the participants, only screws over the rest of us.
    I think the problem this time is that it's not just one flush...

    One normal flush is nothing more than the output of another top-3 team...

  23. #23
    Junior Member PackSwede
    Join Date
    Jun 2002
    Location
    Umea, Sweden
    Posts
    14
    "mega flushing" to rack up a large number of one-time stats, no matter how cute or funny to the participants, only screws over the rest of us.
    I agree completely. The problem for me is that I have to run sneakernetted, since many of the computers I run do not have access to the Internet.

    /Thomas

  24. #24
    As Chinasaur points out, what you folks are effectively doing is a DDoS on our server, though of course we have made ourselves open to it by designing the client to buffer data. Such attacks can bring down sites as large as Yahoo or Ebay with a few hundred computers launching the attacks, so you can hardly expect our measly server to withstand 1000 or so of your machines all uploading at once. I realize your intentions were all fun and games, but please remember the impact of such activities.

    As for the bottleneck, it is the server right now: it is at 100% CPU usage and the web server has all connections maxed out (i.e. no more allowed connections).

    The new backend (which will most likely go online on Monday, and stay online if it works) will be more scalable in that it will allow us to add web servers as we please to a load-balanced pool. These do the bulk of the work and require most of the CPU usage. There will still, of course, be just a single database server, so we will still be limited by how quickly it can insert data into that database. However, when we switch, we will also be storing significantly less data (the several terabytes we have collected so far have just been too much to handle). The main difference here is that instead of keeping data and stats on every protein structure uploaded, we will only keep it on the 'best' structure in each set of 5000 uploaded. When I say stats here, I am talking about energies and so on for our scientific evaluation (see the Results page on the web site); this will in no way affect user stats, etc.

    Regardless of how scalable the system becomes though, please remember we do NOT have the sort of server resources that a project like SETI might have and will likely never be able to handle deliberate 'mega-dumps' - so to speak. So please act responsibly and let the software work as it was intended to.
    Howard Feldman
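
The storage rule described in the post above (keep only the 'best' structure in each set of 5000 uploaded) can be illustrated with a short sketch. This is not the project's backend code. It assumes, hypothetically, that each record is a `(name, pseudo_energy)` tuple, that sets are formed from consecutive uploads, and that a lower pseudo-energy counts as better.

```python
from itertools import islice


def best_per_set(structures, set_size=5000, key=lambda s: s[1]):
    """Yield the lowest-scoring record from each consecutive run of set_size.

    structures: any iterable of records. By default a record is assumed to
    be a (name, pseudo_energy) tuple, and lower energy is treated as better
    (an assumption made for this sketch).
    """
    it = iter(structures)
    while True:
        chunk = list(islice(it, set_size))
        if not chunk:
            return
        # A final, shorter chunk still contributes its own best record.
        yield min(chunk, key=key)
```

Only one set is held in memory at a time, so the amount kept grows with the number of sets rather than with the number of structures uploaded.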

  25. #25
    So I cannot do the no net switch from now on if my machines do not have an Internet connection?

  26. #26
    Originally posted by Brian the Fist
    However, when we switch, we will also be storing significantly less data (the several terabytes we have collected so far has just been too much to handle) - the main difference here is that instead of keeping data and stats on every protein structure uploaded, we only keep it on the 'best' structure in each set of 5000 uploaded. When I say stats here, I am talking about energies and so on for our scientific evaluation (see Results page on the web site), this will in no way affect user stats, etc.
    Does this mean the result files we have to upload will be smaller? If not, why not? Maybe I've missed something, or this has been answered before, but what exactly is the use of sending a result that has no value? For example, my personal best energy is around 50.0000 at the moment. The best structure is well below that, making any work I've done useless. Why did I have to upload countless MBs of data? Wouldn't it be simpler to have the client check with the server, and if no work of value needs to be uploaded, then credit is given and the client can get on with what it needs to be doing, which is searching for a better energy?

    Ni!
    Oh, what sad times are these when passing ruffians can say Ni at will to old ladies..

  27. #27
    My comments are directed to people with large amounts of computers and to people who deliberately buffered large amounts of data to generate a 'spike' in their statistics. If you have a dial-up connection by all means buffer your data, just don't conspire to upload it at exactly the same time as the other 50 people on your team or whatever.
    Howard Feldman

  28. #28
    Grim Reaper:

    Remember pseudo-energy is just a 'best guess', if you will, at how good the structure is. We keep the best structure from everyone and analyze them further locally before deciding what to submit to CASP, so do not be fooled into thinking that the best pseudo-energy structure is truly the best. There are many other factors which play a role. The main function of the pseudo-energy is to get us a set of nice, compact, protein-like structures, a good pool to which we can then apply more powerful techniques to determine which are likely to be the best RMSD structures, closest to the true structure. So all your structures are important.

    One reason why you must always upload your results to the server is, quite frankly, a matter of trust. Someone could surely come up with a way to tell our server 'hey, my client just created 1 million structures, but they all suck, now give me credit'. Without seeing those structures, our server wouldn't know any better and you'd get the credit. Thus to avoid this sort of potential 'cheating', our server must see and verify all the data that is generated. More importantly, the BAD structures still give us useful information about the DISTRIBUTION of various properties, such as pseudo-energy. This allows us to know, for example, how good a pseudo-energy of 50 is on this protein - is it extremely unlikely or very likely? It also allows us to answer questions like 'how many structures would we have needed to sample to get a pseudo-energy of 40?' And so on. We have enough information on this latter question now, the distribution issue, so that's why we'll stop storing so much of the data but still require it to be uploaded and processed.
    Howard Feldman
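
The two distribution questions in the post above (how likely is a given pseudo-energy, and how many structures would need to be sampled to reach one) can be made concrete with an empirical estimate. The sketch below is not the project's analysis method. It assumes that lower pseudo-energy is better and that structures are independent draws from the same distribution, in which case the mean number of samples until the first success is 1/p.

```python
from bisect import bisect_right


def fraction_at_or_below(sorted_energies, threshold):
    """Empirical CDF: share of sampled energies that are <= threshold.

    sorted_energies must already be sorted in ascending order.
    """
    if not sorted_energies:
        raise ValueError("need at least one sample")
    return bisect_right(sorted_energies, threshold) / len(sorted_energies)


def expected_samples_needed(sorted_energies, threshold):
    """Mean number of independent samples until one is <= threshold.

    Uses the geometric-distribution mean 1/p. Returns None when no observed
    sample reaches the threshold, since the data then give no estimate.
    """
    p = fraction_at_or_below(sorted_energies, threshold)
    return None if p == 0 else 1.0 / p
```

For instance, with ten made-up energies of which exactly one is at or below 40, the estimate is 1 / 0.1 = 10 samples. This also shows why the poor structures matter for the estimate: they form the denominator.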

  29. #29
    Originally posted by Brian the Fist
    Someone could surely come up with a way to tell our server 'hey, my client just created 1 million structures, but they all suck, now give me credit'.
    Well yeah, if they could keep its attention for more than a few minutes. Sorry, couldn't resist. That answers my question; unfortunately, I cannot continue with the project. Not on dial-up, anyway.

    Ni
    Oh, what sad times are these when passing ruffians can say Ni at will to old ladies..

  30. #30
    Social Parasite
    Join Date
    Jul 2002
    Location
    Hill Country
    Posts
    94
    I have no idea what a megaflush is.

    I am just a single user; however, I am *not* on a permanent connection. Since I have to dial-up each time I want to connect to the server, my typical upload is of more than one fileset. In the last 36 hours, only __once__ did more than one fileset upload before my client timed out.

    Would it be possible to set up a server IP-address just for "megaflushes", so that the remaining individual users can still upload their results without having their clients time out?

  31. #31
    We at Team Endeavor wish to apologize for our part in the current problems. We will keep all of this in mind in the future, and act accordingly.
    Jerome

  32. #32
    When I try to start the Windows DF service, I get:
    Error 1067: The process terminated unexpectedly.

    When I run foldit.bat manually I get a message
    Server down for maintenance, try later.

    So I guess that Error 1067 is the equivalent of the server-down message, and I see Error 1067 because the message is not passed up through Windows?

    Best, Mark

  33. #33
    As I've suggested before, this project should NOT send out the update email until AFTER the period of time in which all machines will normally have done it automatically. Especially now that you're accepting (or at least giving credit for) work units past the cut-off number.

  34. #34
    Target Butt IronBits
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    Originally posted by zer0ne
    So I cannot do the no net switch from now on if my machines do not have an Internet connection?
    That's not what he said at all. Re-read more slowly.
    What he said was, in a nutshell...
    For those that have full-time network connections, STOP saving and dumping on purpose.

  35. #35
    25/25Mbit is nearly enough :p pointwood
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    Originally posted by Mikus
    I have no idea what a megaflush is.
    It's when a team (a lot of users) decides to stop uploading for a certain number of days and then, on a given date/time, starts uploading all the structures they have crunched. Depending on how long they decided to hold on to their structures, they will together have a serious amount of structures.

    In this case it looks like more than one team had decided to do this and it caused trouble.

    It is perfectly okay to do it if you don't have a permanent internet connection. It's not a problem that a single user like you does it once in a while.
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  36. #36
    Senior Member
    Join Date
    Jul 2002
    Location
    Kodiak, Alaska
    Posts
    432
    Uploading the data continuously (or daily, for those of us on dialup who can't keep our connection on 24/7) allows Howard and the rest of the team to judge when the average use of the servers requires an upgrade (which we're getting on Monday, if everything works out). On the other hand, large numbers of folks buffering the output of server farms, higher-end systems, etc. for a week and uploading at once cause the DF servers to lazily twiddle their thumbs for most of the week, and then get hit with a day of being asked to do 150% of what they're capable of handling.
    (Time to nix the comical attempt at personification of the DF servers.)
    Think of those poor, tortured servers...


    On a lighter topic: where do we go to find out the difference between this 156-residue protein and the 157-residue protein we just worked on, and why this smaller protein is so much slower? (Or is this just a new M$ feature of Windows that causes a protein with one less residue to run at 2/3 the speed?)
