The Server appears to be down.



reader50
01-30-2003, 02:44 AM
I cannot load anything, and cannot ping the server.

http://www.distributedfolding.org

Looks like it went down around 9 PM Pacific.

tpdooley
01-30-2003, 06:13 AM
From my cached units, and the fact that Dyy's stats were last updated at 5pm PST, I'd say it probably went down shortly after 5pm.

And now we have to wait until morning to find out what happened.. *sniff*

Insidious
01-30-2003, 06:23 AM
Hmmmm,

waiting until morning doesn't seem to have helped.... still down :(

Scoofy12
01-30-2003, 08:25 AM
I don't think it's just distributedfolding.org either. I can get here and to Dyy's stats, but not there, or slashdot or wired.com... I can get to cnn and aol but not zdnet.com. Something fishy is going on here...

AMD_is_logical
01-30-2003, 09:37 AM
Originally posted by Scoofy12
I don't think it's just distributedfolding.org either. I can get here and to Dyy's stats, but not there, or slashdot or wired.com... I can get to cnn and aol but not zdnet.com. Something fishy is going on here...

I just tried, and I can get to all those places with no problem. But I can't get to the DF servers, or even to the DF home page.

IronBits
01-30-2003, 10:02 AM
Howard hasn't posted this morning, so there must be more to it than just a server being down... more like a DoS attack or SQL slamming, or hosed routers. Sad, but I'm sure he will mend things as quickly as he can. Until then, think of the spike(s) you'll get on the STATS :D

C:\>tracert www.distributedfolding.org

Tracing route to distributedfolding.org [206.248.62.8]
over a maximum of 30 hops:
10 20 ms 30 ms 20 ms P2-0.a0.sntc.dc.broadwing.net [216.140.3.9]
11 20 ms 30 ms 20 ms P2-1.a0.hywr.broadwing.net [216.140.3.6]
12 20 ms 30 ms 20 ms P3-3.c0.hywr.broadwing.net [216.140.2.1]
13 90 ms 100 ms 100 ms P2-0.c0.wash.broadwing.net [216.140.16.2]
14 100 ms 100 ms 100 ms p2-0.a0.nwak.broadwing.net [216.140.8.194]
15 110 ms 110 ms 120 ms 65.89.249.106
16 110 ms 120 ms 110 ms 142.46.4.2
17 110 ms 120 ms 120 ms tor-tierb.onet.on.ca [130.185.4.158]
18 * * * Request timed out.
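
For anyone who wants to pull the last responding hop out of a trace like the one above without reading it by eye, here is a minimal sketch. It assumes Windows-style tracert lines as pasted in this post; the function name is made up for illustration, and only the last few hops from the post are included.

```python
# Last few hop lines copied from the trace above (earlier hops omitted here).
TRACE = """\
15 110 ms 110 ms 120 ms 65.89.249.106
16 110 ms 120 ms 110 ms 142.46.4.2
17 110 ms 120 ms 120 ms tor-tierb.onet.on.ca [130.185.4.158]
18 * * * Request timed out.
"""

def last_responding_hop(trace):
    """Return (hop_number, host) for the last hop that answered, or None."""
    last = None
    for line in trace.splitlines():
        parts = line.split()
        if not parts or not parts[0].isdigit():
            continue  # skip headers and blank lines
        if "ms" not in parts:
            continue  # no probe on this hop got a reply
        # Everything after the final "ms" token is the host name and/or address.
        idx = len(parts) - 1 - parts[::-1].index("ms")
        last = (int(parts[0]), " ".join(parts[idx + 1:]))
    return last

print(last_responding_hop(TRACE))
```

With the lines above, the last hop that answers is hop 17, so the trace stops one hop past tor-tierb.onet.on.ca.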

AMD_is_logical
01-30-2003, 12:11 PM
The servers seem to be back up. But what caused the downtime?

Dyyryath
01-30-2003, 12:38 PM
There still appears to be something pretty wonky with the stats. Unfortunately, I don't have time right now to try to dig into it. Hopefully Howard will get everything straightened out soon...he usually does. ;)

Brian the Fist
01-30-2003, 12:55 PM
There was an electrical fire in the hospital last night (not caused by us :) ), forcing a full power shutdown. All servers should be operational again now, and everything looks OK to me. If you notice any problems like what you mentioned about the stats, please give me a detailed report and I'll look into it. Anyway, everyone's clients should have happily buffered work and uploaded it now that the server is back on, so hopefully this didn't cause any real problems for anyone.

Welnic
01-30-2003, 01:03 PM
The team stats at DF seem to have just now returned to normal.

PinHead
01-30-2003, 01:28 PM
May want to check the server clock.

The stats page seems to be an hour ahead.
That may be why Dyyryath's stats are negative.

Brian the Fist
01-30-2003, 05:22 PM
You were correct, the clocks jumped ahead an hour (daylight saving time maybe??). Thanks for pointing that out. It should be correct now, though it may have messed up some external stats computations...

Dyyryath
01-30-2003, 05:39 PM
Didn't hurt mine. That negative number thing was a wholly different issue. Things are great now, thanks Howard! :thumbs:

Alpha_7
01-30-2003, 07:04 PM
Howard, just a quick question: has the period where the old protein still gets credit been extended? I was trying to dump all my no-net boxes, and it coincided with the outage. I was hoping the structures haven't been wasted???

Thanks,
Alpha_7

reader50
01-30-2003, 07:36 PM
My stats are fine too, they had no trouble from the break.

It appears that the server reset totals to pre-break levels at least six times during the first few hours, each time jumping back to the correct running totals on the next update. This didn't break my stats either, but produced some weird rate graphs. It happened recently enough that it might still be going on.

Example (click for larger version)
http://home.earthlink.net/~reader50/forum_posts/dfold_206_st.png (http://home.earthlink.net/~reader50/forum_posts/dfold_206_Graph_stBig.png)
Note: the graph was generated by totaling up member accounts, not from the listed total on the stats pages, though we do use that total in other places.

reader50
01-31-2003, 02:18 AM
Update: it's still happening; I've updated the sample graphs. The server is periodically showing member stats totals from just before the break. Always corrected on the next update.
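
For stats scripts that want to skip these reverted updates rather than graph them, a rough sketch follows. It assumes the totals should only ever go up, so any update that falls below the highest total seen so far is treated as a reversion. The labels and numbers are invented for illustration, not taken from the real stats.

```python
def find_regressions(samples):
    """samples: (label, total) pairs in time order.

    Returns (label, total, previous_high) for every update whose total
    is lower than the highest total seen before it.
    """
    regressions = []
    high = None
    for label, total in samples:
        if high is not None and total < high:
            regressions.append((label, total, high))
        else:
            high = total
    return regressions

# Hypothetical updates: the total keeps dropping back to a pre-break value.
updates = [("u0", 1000), ("u1", 1050), ("u2", 1000),
           ("u3", 1100), ("u4", 1000), ("u5", 1150)]
print(find_regressions(updates))
```

Flagged updates can then be dropped before computing rates, which avoids the large negative and positive spikes on either side of each reversion.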

Dyyryath
01-31-2003, 09:10 AM
Yup, I'm seeing it, too. Very strange...

Brian the Fist
01-31-2003, 09:18 AM
It's not THAT strange. Remember there's really more than one server, and you get a random one with each request. Assuming you're using the teampages.tar.gz thingy, you're only making one request, but it could come from a different physical machine each time. I'll go through the machines and see if this file is 'stale' on any of them...

[EDIT]

Yep, cron had died for some reason on one of them after they were rebooted. Not sure why, so if it happens again just let me know; otherwise I'll assume it was a fluke.
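
A minimal sketch of the kind of staleness check described above: given the last-modified time of the same file on each machine, flag any machine whose copy lags the newest one by more than some threshold. The server names, timestamps, and one-hour threshold are all assumptions for illustration, and how the timestamps get collected is not shown.

```python
from datetime import datetime, timedelta

def stale_servers(mtimes, max_lag=timedelta(hours=1)):
    """mtimes: dict of server name -> last-modified time of the file.

    Returns the sorted names of servers whose copy is older than the
    newest copy by more than max_lag.
    """
    newest = max(mtimes.values())
    return sorted(name for name, t in mtimes.items() if newest - t > max_lag)

# Hypothetical machines: one stopped regenerating the file before the outage.
example = {
    "server-a": datetime(2003, 1, 31, 9, 0),
    "server-b": datetime(2003, 1, 29, 17, 0),
    "server-c": datetime(2003, 1, 31, 9, 0),
}
print(stale_servers(example))
```

Comparing against the newest copy rather than the current time means the check does not depend on the checking machine's own clock being right.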

Dyyryath
01-31-2003, 09:27 AM
Yup, makes sense...