What happened to the server after midnight?
tpdooley
02-26-2003, 03:28 AM
Howard posted around midnight (EST) that he'd put the correct clients for win & linux on the main server.
Any word on why www.distributedfolding.org isn't accessible (or wasn't, by the time this is answered tomorrow ;) ) and the clients can't upload, even though "tracert www.distributedfolding.org" shows that the server's IP address is present?
bwkaz
02-26-2003, 09:39 AM
I have no idea, but it's a really big PITA when the client requires a valid response from the main servers before it will start (unless it's no-netted). Of course, I would have let it just go, but it seemed to have locked up at one point -- when I noticed CPU usage was at 0, I checked the VC it was running on, and saw "Could not contact server. Will try again later", which is normally when it just starts crunching again. But it stayed that way for at least 5 minutes, so I killed it off and tried to restart.
And found out that because it couldn't contact the server, it couldn't start up. :(
I seem to remember there being a reason it needed to contact the server at startup, but what was it?
Oh, traceroute seems to work, but that's because traceroute uses ICMP or UDP packets on some (fairly) random port. The client, however, uses TCP packets on port 80. Apparently someone between us and the DF servers is dropping packets on port 80 -- at least, that's what it looks like. I tried telnetting to the IP that the client was trying to talk to (according to netstat), on port 80, and telnet just hung there. No response, no "connection refused", it was just acting like packets were being dropped.
I suppose it could have been a bunch of people dumping no-net results all at once, but holding up for 9 hours?
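The telnet check described above (connect over TCP to port 80 and see whether the attempt succeeds, is refused, or just hangs) can be sketched in Python. This is only an illustration: `probe_tcp` and its return labels are made-up names, and the 5-second timeout is an arbitrary assumption. A "filtered" result matches the "telnet just hung there" symptom, which is consistent with packets being dropped somewhere along the path or with a server too overloaded to answer.

```python
import socket

def probe_tcp(host, port=80, timeout=5.0):
    """Classify a TCP connection attempt (hypothetical helper, not part of any client).

    Returns one of:
      "open"     - the TCP handshake completed
      "refused"  - the host answered with a reset ("connection refused")
      "filtered" - no answer before the timeout (packets dropped or host overloaded)
      "error"    - anything else (DNS failure, network unreachable, ...)
    """
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "filtered"
    except OSError:
        return "error"
```

Calling `probe_tcp("www.distributedfolding.org")` would test the same port the client uses, whereas traceroute only shows that ICMP or UDP probes get through.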
IronBits
02-26-2003, 09:40 AM
Again, use the "-i f" switch (note the space) and it will continue to run...
AMD_is_logical
02-26-2003, 10:26 AM
Originally posted by IronBits
Again, use the "-i f" switch (note the space) and it will continue to run...
I always use "-if" with NO space. :p
bwkaz
02-26-2003, 10:46 AM
Originally posted by IronBits
Again, use the "-i f" switch (note the space) and it will continue to run...
Yeah, I know, that's what I'm doing now.
But I'm wondering again why it requires an ack from the server on startup... maybe I should try a forum search, duh.
Brian the Fist
02-26-2003, 10:51 AM
It appears our servers are somehow being overloaded with uploads. We will not be able to continue until this can be remedied somehow.
bwkaz
02-26-2003, 10:55 AM
I'm running both my systems nonetted for the moment, so the uploads aren't coming from me. ;)
That may be what you're going to have to end up doing: getting people to run nonet until the main server is available to handle uploads or something.
Or perhaps not. Do whatever you think best. :)
KWSN_Millennium2001Guy
02-26-2003, 11:40 AM
I have 120 machines on the sidelines now, not running anything distributed :eek:
DocWardo
02-26-2003, 11:58 AM
What will no-netting do except cause the servers to crash again when the current batch of no-net units is returned?
What's different about this changeover that has caused everything to go to hell? I have the client running as a service on WinXP/2k.
If I do get it running, the only way to stop it is by rebooting. After a bit the client stops processing, and I can't kill it (tried all ways) except by rebooting.
Then I can't get it to restart because it cannot connect to the servers.
HaloJones
02-26-2003, 11:59 AM
Originally posted by KWSN_Millennium2001Guy
I have 120 machines on the sidelines now, not running anything distributed :eek:
How're the withdrawal symptoms ;)
Jammy
02-26-2003, 12:42 PM
Howard... I took my three boxes off of DF in the interim... I hate not being able to see my own stats.
Please email us when the servers are back up.
Jammy
Originally posted by Brian the Fist
It appears our servers are somehow being overloaded with uploads. We will not be able to continue until this can be remedied somehow.
For those clients that haven't gotten the new protein update, will the 48 hour/half-score rule still be in place, or will it be extended due to the server problems? Just asking, as 4 of my PCs haven't been able to connect all day (earliest I could get to them was 9am)...
AMD_is_logical
02-26-2003, 01:31 PM
I would like to request that we be given more time to submit work on the old protein for full credit.
I have everything running with "-if" now, and it doesn't look like I'll be able to update and get all my work in by 4:00 today due to the server situation.
HaloJones
02-26-2003, 02:22 PM
Seconded on the extended credit period.
StrategyFreakAMD
02-26-2003, 03:03 PM
People should stop dumping at the end of every protein (except no-netters, of course). Is it really worth stalling everything for the stats? This has happened with other proteins, too. On other proteins, when it said "check back an hour later", it usually took several hours.
I second the longer credit period.
RipItUp
02-26-2003, 03:04 PM
thirded !
neevo
02-26-2003, 03:30 PM
Originally posted by RipItUp
thirded !
fourthded
HaloJones
02-26-2003, 03:59 PM
Server temporarily down due to technical difficulties. We hope to have service restored by 7pm EST.
So that's around midnight UK. My office machines are nonetting until tomorrow and I hope they will receive full credit.
KWSN_Millennium2001Guy
02-26-2003, 04:15 PM
My machines have to be doing something, the withdrawal symptoms were too much. I've begun installing the G@H .99 client on machines again. Arghhh!
Ni! :confused:
Brian the Fist
02-26-2003, 05:38 PM
Originally posted by bwkaz
I have no idea, but it's a really big PITA when the client requires a valid response from the main servers before it will start (unless it's no-netted). Of course, I would have let it just go, but it seemed to have locked up at one point -- when I noticed CPU usage was at 0, I checked the VC it was running on, and saw "Could not contact server. Will try again later", which is normally when it just starts crunching again. But it stayed that way for at least 5 minutes, so I killed it off and tried to restart.
And found out that because it couldn't contact the server, it couldn't start up. :(
I seem to remember there being a reason it needed to contact the server at startup, but what was it?
Oh, traceroute seems to work, but that's because traceroute uses ICMP or UDP packets on some (fairly) random port. The client, however, uses TCP packets on port 80. Apparently someone between us and the DF servers is dropping packets on port 80 -- at least, that's what it looks like. I tried telnetting to the IP that the client was trying to talk to (according to netstat), on port 80, and telnet just hung there. No response, no "connection refused", it was just acting like packets were being dropped.
I suppose it could have been a bunch of people dumping no-net results all at once, but holding up for 9 hours?
When it refuses to start, that is intentional. It means there is a serious problem with the server. If everyone queues up too much work, the server could get overloaded once it's back online.
As for the problem, we are not sure what happened exactly yet and are still investigating, but it appears the non-beta database has been somehow corrupted and must be restored from backup (at the time of the protein changeover, yesterday afternoon). If all goes well it should be functional again shortly. Try not to do any folding though until it is active again, or we will get swamped with uploads (or at least wait a bit after it is back up before you start uploading stuff, that will help us greatly). I will extend the credit for last protein to 2 days from when everything is back up.
P.S. The beta site should still be working fine now :D so in the meantime, you can help us find more bugs in the beta client...
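The request above to "wait a bit after it is back up" before uploading is essentially asking clients not to all hit the server at the same moment. One common way to spread load like that is randomized exponential backoff. The sketch below is purely illustrative and is not how the DF client actually behaves: `upload_delays`, the 60-second base and the one-hour cap are all assumptions picked for the example.

```python
import random

def upload_delays(attempts=5, base=60.0, cap=3600.0, rng=None):
    """Return randomized wait times in seconds for successive upload attempts.

    Hypothetical helper: each attempt waits a random time between 0 and
    min(cap, base * 2**attempt), so many clients restarting together
    do not all retry at the same instant.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)  # upper bound doubles, up to the cap
        delays.append(rng.uniform(0, ceiling))   # pick a random point below the bound
    return delays
```

A script wrapping the client could sleep for each delay in turn before retrying an upload. With only two machines the effect is small, but across hundreds of boxes coming back at once it smooths out the spike.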
bwkaz
02-26-2003, 06:06 PM
Originally posted by Brian the Fist
Try not to do any folding though until it is active again, or we will get swamped with uploads (or at least wait a bit after it is back up before you start uploading stuff, that will help us greatly).
If I wait until, oh, 9 or 10 pm, would that be long enough?
I've only got two machines, but no sense in pushing it.
As always, thanks! :)