
FDCPS Severe Outrages thread



AMDave
11-11-2009, 10:13 AM
**** CONNECTION DETAILS IN THIS THREAD ARE NOT CURRENT ****
**** PLEASE REFER TO THE "How to get started" THREAD (http://www.free-dc.org/forum/showthread.php?18954-How-to-get-started) ****


COMPLETED - Planned outage 14-15 Nov

There will be a planned outage of the FDCPS project server at the end of this week (14-15 Nov).

The purpose of the outage is to relocate the project to a new server.

Preparations have been made to make this as simple for the participants as possible.

If your llrnet client config file is addressing the server as 'primesearch.free-dc.org' then you should not need to change anything.

In order to minimise any errors we have shortened the knpairs file on IB-7773.
The pairs that have been removed are ready for your clients on the new server.
IB-7773 will be run dry (ETA 13 NOV), allowing your clients to empty their cache and the server to capture the last of the results from that port.

At that point your client will become idle.

Just leave your client running and when the DNS change is complete, it will resume normally.

Once the project DNS change is implemented your client will begin picking up new work from the new server.

The project DNS change may take 24-48 hours depending on the rate of refresh of the DNS cache of your ISP.

Exceptions:
1) If the client config addresses the server by IP address, you should change it to use the project server name, 'primesearch.free-dc.org', instead.
(Refer to the "How to get started (http://www.free-dc.org/forum/showthread.php?t=18954)" thread)
2) If you are running the client under Unix/Linux and have added the project server to your hosts file, you will need to update your hosts file with the new IP address.
3) If you run a local DNS server / cache, you may wish to force a DNS update.


Any changes to this plan will be posted here.

Thank you for your patience during this change.

AMDave

AMDave
11-12-2009, 05:27 AM
The transition occurred a bit earlier than planned.
The DNS cascade was much faster than expected.

Judging by the transfer rates per participant per hour, it looks like all clients carried across without a hitch.

The stats export from the new server yesterday did not contain some data from the old server leading up to the transition. This resulted in a 'negative' stats update for some participants. However, the new server has now been updated with the data from the old server and the stats export has made up the difference, returning the stats to their normal, correct values.

Because of the unexpected speed of the DNS cascade, some pairs were left unprocessed on IB-7773. Those pairs have been made available on PCZ-7774 for clean-up.

Cheers to PCZ, Bok and Beyond for the help in the transition.

That pretty much concludes the transition.

Thank you for your patience.


A very special "Thank You" to Ironbits for the incentive, inspiration, ideas and infrastructure that got FDCPS to where it is.
He's a Champion! :thumbs:

Digital Parasite
11-12-2009, 12:27 PM
Yup, everything seems to have switched over pretty well, except that the new server still isn't handing out many primes... :cry:

Shish
11-13-2009, 08:19 AM
I'm only running an X58 with 8 instances, but I keep a small cache so nothing was even noticed. Congrats to the squad for a successful, not too? stressful move. All your work is appreciated, especially IB for getting me into it, even if he is on sabbatical or retired ;). Can't stay on holiday for ever bud, come back soon cos I actually miss you.....:thumbs::D:blush:
My lowly thanks to AMDave, and the usual crew of Bok, PCZ and Beyond for your tireless work on behalf of Free-DC and DC in general. Dunno where you guys find the time and energy cos I'm retired and I can't find any of either :eek:

AMDave
03-20-2011, 09:06 AM
/ed- Thread title updated to reflect new status. :thumbs: See thread posts below -ed/

The FDCPS Project is currently offline.
The outage was planned to relocate to an alternate host.
However, the new server is experiencing an unplanned connectivity issue that may take another day or two to resolve.
Further information will be posted when available.

AMDave
03-23-2011, 08:18 AM
Update - Progress is being made
The broadband connection has been restored.
Middleware upgrades and dynamic testing are in progress.
I'll post the next update when access is restored.

AMDave
03-24-2011, 10:06 AM
lots happening.
migrated the site from 32-bit arch to 64-bit arch to take full advantage of 8GB RAM
moved to a different and slightly faster OS distro
upgraded all middleware to current stable versions and latest patches
that forced more than a few code changes to the web site and server scripts
re-factored a lot of code while going through the spaghetti
retired ChartDirector and introduced ajax flot charts
most of the bugs are knocked out
located a port 80 block in the WAN that "is not supposed to exist" and worked around it
just a couple more issues (2, I think) to work through before we can resume this crunch-fest
after that, tuning and miscellaneous bug fixes will resume in the background
back soon ...

AMDave
03-31-2011, 08:48 AM
LAN and MAN testing completed.
WUs are being completed and processed correctly.
The server seems pretty stable over the last 6 days.
WAN testing commencing.

AMDave
03-31-2011, 09:51 AM
DNS changes completed and verified.

In the words of the great Duke Nukem, "Come get some!"

FDCPS is open for business.

AMDave
04-03-2011, 01:22 AM
The old address cannot be forwarded to an A-name address, and I cannot get a fixed IP address at short notice.

So, for the foreseeable future the FDCPS project will be on http://fdcps.no-ip.org/

You will need to change the hostname in your llr-clientconfig.txt file to "fdcps.no-ip.org" to get pairs from the ports.
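If you have several machines to update, the hostname change can be scripted. The Python below is only a sketch: it assumes llr-clientconfig.txt sets the server on a single line of the form server = "hostname" (check your own file first), and the helper names set_server and update_config_file are hypothetical, not part of LLRnet.

```python
import re

# Assumption: the client config sets the server on a line such as
#   server = "primesearch.free-dc.org"
# Lines that do not start with 'server' (e.g. '// comments') are left alone.
_SERVER_LINE = re.compile(r'^([ \t]*server[ \t]*=[ \t]*)"[^"\n]*"', re.MULTILINE)


def set_server(config_text: str, hostname: str) -> str:
    """Return config_text with the quoted server value replaced by hostname."""
    new_text, count = _SERVER_LINE.subn(
        lambda m: m.group(1) + '"' + hostname + '"', config_text
    )
    if count == 0:
        raise ValueError('no line of the form server = "..." was found')
    return new_text


def update_config_file(path: str, hostname: str = "fdcps.no-ip.org") -> None:
    """Hypothetical helper: rewrite the server entry of a config file in place."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(set_server(text, hostname))
```

Calling update_config_file("llr-clientconfig.txt") from the client's directory, probably best done with the client stopped, gives the same result as editing the line by hand. It raises an error rather than guessing if the expected server line is not there.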

Please post here if you have any issues.

AMDave
06-17-2011, 09:42 PM
The FDCPS server is running, but a planned network change did not go completely to plan yesterday, so it is not available yet.
The old DSL modem couldn't handle the higher rate any more and failed to hold sync.
The new router is working fine but it won't update the dynamic DNS automatically.
I should have the work-around completed today.
But fast?
The connection upgrade is completed and the connection speed is much higher (d/l 5x, u/l 1.2x), and the available usage ceiling is much higher too (4x).
So it was well worth the trouble of adding a DNS work-around.

AMDave
08-17-2011, 04:48 AM
There was a local unplanned power outage this morning that lasted for about 5 hours.
When the server came back up the clock was wrong and it performed the next roll-over before it was scheduled to.
I fixed the clock and re-tagged the extra set of files so they won't get overwritten when tonight's roll-over occurs on schedule.
Happily the only person inconvenienced was me ;)

AMDave
09-16-2011, 08:23 AM
In around 16 hours from now, there will be a planned FDCPS outrage of about 1 hour to complete a number of software upgrades and hardware maintenance.

AMDave
10-30-2011, 02:57 AM
An updated version of the FDCPS stats site (http://fdcps.no-ip.org/stats/index.php) has been deployed.

CSS3 changes completed
HTML5 changes completed (except for one single anchor I have left for later)
Page filtering fixed
Page sorting fixed
a bucket load of syntax errors have been tossed out in the gutter
Some admin security & page changes completed
Added the server status page so we can see ports that are current but not active
Added additional blocks to the drive progress chart
Tweaked the pie charts on the dashboard

Tested in:
Safari, Chrome, Chromium, Firefox, Epiphany, Arora, Midori, SeaMonkey, Opera, IE9 and elinks

If you spot any bugs, please call a pest controller. You really shouldn't let them breed in your house ;)

AMDave
10-30-2011, 06:37 AM
unplanned outage happening right now
maybe 1 hour
depends on this spectacular lightning storm

AMDave
10-30-2011, 10:22 AM
This extreme weather outage is over.

That is to say ... This extreme weather outage is now over someone else's house. :P

Beyond
11-01-2011, 06:43 PM
It has been a while (okay, a long time), but I'm easing back into it.

Thanks Dave for keeping everything up and running.

AMDave
11-02-2011, 06:14 AM
My pleasure.
No really.
I got 3 primes early on.
None since then, but that's ok because they were good ones :)
Great to see your post.

AMDave
11-13-2011, 01:36 AM
unplanned power outage took the server down for about 5 minutes.
back up and running. tests ok.

AMDave
11-21-2011, 03:36 PM
There will be a planned outage for several hours tomorrow while electrical work is carried out.

AMDave
11-22-2011, 10:56 PM
Planned outage concluded without incident.
Updates and patches applied before restart.
All good. :thumbs:

AMDave
11-30-2011, 03:11 AM
There will be a planned FDCPS outage for further mains electrical work to be carried out.
Planned start is 12 hours from now.
Planned end is 17 hours from now.

AMDave
11-30-2011, 10:34 PM
FDCPS planned outage completed on schedule.

AMDave
05-23-2012, 10:34 AM
Full DR plan execution and tests for FDCPS completed successfully on 2012-05-20.
Because DR procedures are not 'good' unless you keep verifying that they work. :idea:

About a week ago, during a heat spike lasting a few hours, the FDCPS server's PSU fan bearings made some bad noises at max RPM, so it was time to check on a few things.
I made a few updates to the rebuild document during the DR test, but there is one more item outstanding, so I'll commit that back into the project tomorrow.
The project server runs well in a KVM cluster so I'm planning to migrate it to a KVM guest back on the same hardware soon-ish to get all the benefits that brings.
I also verified that the implementation works with Nginx (very snappy btw).
I'll add that config to the doco for the commit tomorrow.

It went so well I am still looking for the "Gotcha!" :confused:

AMDave
06-01-2012, 11:28 PM
The FDCPS site and ports may be unavailable for a few minutes in 3 hours' time [16:30 AEST / 06:30 UTC] while I migrate it to another machine to allow the server maintenance and OS upgrade.

AMDave
06-03-2012, 05:20 AM
...It went so well I am still looking for the "Gotcha!" :confused:
FOUND IT!

That went badly.
The network bridge to the VM version of the server failed.
The version-to-version OS upgrade on the server box also failed: the server network configuration broke between 10.10 and 11.10, so the box lost all network connectivity and failed to boot, which left the next OS version upgrade a dead-stick.

I am now doing a clean install on the server box: ubuntu-12.04-alternate-amd64

Having tested the DR plans, I have a tried and tested recovery path, so it's just a matter of time.
Once the OS is running I'll have to rebuild the databases, web and mail servers etc.
Not confident of getting it all done tonight, but it's on the way.

In spite of copious testing, sometimes things that can go wrong still do go wrong.

Damn you to the infernal depths of hell, Murphy!

AMDave
06-03-2012, 07:54 AM
back on line.
time for some r&r

AMDave
06-11-2012, 02:20 AM
DR process re-tested and updated.
More steps added. Added appendices for choice of Apache / Nginx config
Site now running under Nginx.
Still running on bare metal.
LXC / KVM version planned.

AMDave
09-29-2012, 06:55 PM
FDCPS server - planned outage 1 hour
FROM 09:00 AEST
TO 10:00 AEST
For OS upgrade and security updates.

AMDave
09-29-2012, 07:43 PM
Planned outage completed successfully.

AMDave
10-08-2012, 06:02 AM
FDCPS outage:
ISP maintenance activity is scheduled affecting local, national and international access.
It appears to have started several hours early so the window is now much longer.
Expect problems accessing the server for the next 9-10 hours.

AMDave
12-18-2012, 02:06 AM
unplanned outage happening right now
maybe 4 hours
large lightning storms - http://info.energex.com.au/lightningtracker/extern_7765.gif

AMDave
12-18-2012, 06:40 AM
FDCPS is back online.
Threw in a new case fan as one of them was an ex-parrot (http://www.davidpbrown.co.uk/jokes/monty-python-parrot.html).

AMDave
10-29-2013, 03:47 AM
5 hour unplanned outage due to electrical storm activity
server is now back to normal operations

AMDave
11-02-2013, 02:11 AM
The FDCPS server will be intermittently unavailable for up to the next 3 hours during planned maintenance.

AMDave
11-02-2013, 07:24 AM
planned outage is complete.
it took a little bit longer to tidy up some config items.
The server is a little bit quicker at everything now.

AMDave
11-23-2013, 06:23 AM
FDCPS server is down due to an unplanned lightning strike which has upset more than just a few electrons.
Tomorrow I'll migrate the services to another machine, then see if I can recover the hardware.
ITMT - apologies for the outage.

AMDave
11-23-2013, 11:05 PM
FDCPS server restored.
Sorry about the down time.

AMDave
02-20-2014, 10:38 PM
Unplanned power outage in progress.
Large area is in blackout. No ETF.

AMDave
07-01-2014, 08:41 AM
Unplanned outage of FDCPS project.
The server is not out.
The site is not out.
The comms are not out.
Microsoft has hijacked NO-IP's 23 free domains and DNS, with the permission of a US judge, to 'filter' out a few bad hosts, and in (predictably) failing to do so appropriately has created a denial of service for pretty much everyone with a no-ip dynamic DNS.
ETF - unknown. Somewhere around the same time as someone important gets the judge out of bed to revoke the order probably.
Hosting a site from the southern hemisphere on an OS that has nothing to do with M$ does not keep you out of the M$ blast zone.
I object to a US judge deeming that they can sign off on M$ hijacking an international highway to identify and stop the defective 'vehicles' that they built and sold, instead of demanding they issue a recall.
This is no April Fools' Day.

No-IP - https://www.noip.com/blog/2014/06/30/ips-formal-statement-microsoft-takedown/
SlashDot - http://yro.slashdot.org/story/14/07/01/0025220/microsoft-takes-down-no-ipcom-domains?utm_source=rss1.0moreanon&utm_medium=feed
ArsTechnica - http://arstechnica.com/security/2014/06/millions-of-dymanic-dns-users-suffer-after-microsoft-seizes-no-ip-domains/

The offender - http://blogs.technet.com/b/microsoft_blog/archive/2014/06/30/microsoft-takes-on-global-cybercrime-epidemic-in-tenth-malware-disruption.aspx

The fail:
"In the meantime, NO-IP / Vitalwerks have published their answer online:
Apparently, the Microsoft infrastructure is not able to handle the billions of queries from our customers. Millions of innocent users are experiencing outages to their services because of Microsoft's attempt to remediate hostnames associated with a few bad actors."

Hopefully normal service will resume shortly.

Telnaior
07-02-2014, 01:52 PM
Well that's just a hassle. Thanks Microsoft! Hopefully things get back up soon, if nothing else I kind of like watching the LLR program run.

AMDave
07-03-2014, 06:39 AM
in green or amber is my preference :)

DNS appears to be back again for the moment.

AMDave
07-22-2014, 06:20 AM
FDCPS planned outage: 24:00hrs 2014-08-07 AEST - 2 to 5 days.
WHAT: There will be a planned outage of the FDCPS server for relocation of the hardware
WHEN: from 24:00hrs 2014-08-07 AEST.
DURATION: The planned relocation window is 5 days; the target is 2 days. The shortest possible turnaround on the critical path is 24 hours, depending on network build tasks, lunar and astral alignments, the lucky number 16, the market price of Longjing in Zhejiang Province and whether the roaring forties happen to be blowing at the time.

AMDave
08-19-2014, 09:23 AM
FDCPS server is back online.
It was a super-moon. There were no lucky numbers so the pool jackpotted. The price of tea in China didn't budge. The roaring forties were stuck at twenty. Even the westerlies turned up a week late.
That was 11 long and frustrating days.
But we are back in business.

AMDave
08-28-2014, 04:24 PM
FDCPS server is down due to a PSU failure. I'll either replace it or migrate the websites & databases to another machine tomorrow.
Apologies for the delay

AMDave
08-31-2014, 02:16 AM
The server is up and running on alternate hardware until the server PSU is fixed or replaced.

AMDave
11-29-2014, 07:38 PM
The cause of the stats issue in the database that occurred on 13-Nov-2014 has been located & repaired.
The stats extracts will reflect the correct scores on the next upload to Free-DC.

AMDave
07-14-2015, 05:27 AM
FDCPS unplanned outage resolved.
The FDCPS project server was off-line for the last 20 hours due to an unplanned ISP PPPoE failure.
I was getting nowhere with the ISP's technical support line, so I vetoed our ISP, removed their customer-side hardware and replaced it with my "go-to" Cisco equipment.
The PPPoE connection to the ISP still fails but I was able to restore the connection via PPPoA for the time being.
My apologies for the outage.