Stats are down



Bok
02-16-2012, 12:17 AM
I'm getting drive errors on one of the SSDs in the database server. Going to remotely unmount it and fsck it and see if that fixes it.. :(

Not been having much luck with hardware recently.

LAURENU2
02-16-2012, 12:56 AM
Not been having much luck with hardware recently.
:umm:Do you need a sister site haha I know I'm bad :spank:

Bok
02-16-2012, 01:25 AM
Think I need a miracle :( It's not responding so it's going to be down for a while. I do have one spare SSD which I'll plug in over the weekend when I get home and see what can be done.

LAURENU2
02-16-2012, 01:49 AM
Does the server use RAID so there is a backup copy?

Bok
02-16-2012, 01:54 AM
RAID slows it down too much and it's not that great with SSDs anyway.

I do my own custom replication instead. Backups are on separate drives. This is the web-facing database anyway, which is read only, so it's 'mostly' duplicated from the main one. Means it gets LOTS of writes though and I guess it's finally taken its toll.

LAURENU2
02-16-2012, 02:29 AM
Have you looked into the PCIe drives at all? They're faster and have more capacity

Jim1900
02-17-2012, 09:52 PM
Means it gets LOTS of writes though and I guess it's finally taken its toll.
A write-caching program can cut down on writes by an order of magnitude or more. I have used both FancyCache (Romex Software) and PerfectCache 5.0 (Raxco), and they both give very good results for limiting the writes to my SSD when I run the CEP2 project on World Community Grid. Writes will typically go from 80 GB/day down to less than 1 GB/day, but that depends a lot on the statistics of what you are doing of course, and the size of the DRAM cache you set.

FancyCache is currently in Beta (free 90 day trial), and PerfectCache 5.0 is $80 (30 day free trial), but well worth it if you have the right situation.
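
As a rough illustration of why that kind of reduction matters, here is a small Python sketch of the arithmetic. Only the 80 GB/day and 1 GB/day figures come from the post above; the 73 TB endurance value is a made-up placeholder chosen to give round numbers, not a spec for any real drive, and the function names are just for illustration.

```python
def write_reduction_factor(before_gb_per_day, after_gb_per_day):
    """How many times fewer writes reach the SSD once caching is in place."""
    return before_gb_per_day / after_gb_per_day


def years_to_reach(endurance_tb, gb_per_day):
    """Years needed to write `endurance_tb` terabytes at a steady daily rate.

    Uses 1 TB = 1000 GB and a 365-day year; real wear also depends on
    write amplification, which this ignores.
    """
    return endurance_tb * 1000 / gb_per_day / 365


if __name__ == "__main__":
    HYPOTHETICAL_ENDURANCE_TB = 73  # placeholder, not a real drive rating
    print("reduction factor:", write_reduction_factor(80, 1))
    print("years at 80 GB/day:", years_to_reach(HYPOTHETICAL_ENDURANCE_TB, 80))
    print("years at 1 GB/day:", years_to_reach(HYPOTHETICAL_ENDURANCE_TB, 1))
```

Under those assumptions the same drive would take roughly 80 times longer to reach its write budget, which is the whole point of the caching.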

LAURENU2
02-18-2012, 01:52 PM
Think I need a miracle :( It's not responding so it's going to be down for a while. I do have one spare SSD which I'll plug in over the weekend when I get home and see what can be done.
:umm:Any Update yet :Pokes:

Bok
02-18-2012, 04:08 PM
I only got home a few hours ago after travelling overnight, afraid to look at it right now when I'm still so tired. I *think* statstool is gone unless I can recover the drive though, as well as a few other things, possibly historical milestones too. It looks like the backups were failing for the last 10 days or so and I didn't get warned. I only keep the last 5 days. :(
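
A minimal sketch of the kind of freshness check that could flag silently failing backups, assuming the backups land as files in a single directory. The directory path, the 26-hour threshold, and the function names are all hypothetical, and how the warning actually gets delivered (mail, cron output, etc.) is left out.

```python
import time
from pathlib import Path


def newest_backup_age_hours(backup_dir, now=None):
    """Age in hours of the newest file in backup_dir, or None if there is none."""
    now = time.time() if now is None else now
    directory = Path(backup_dir)
    if not directory.is_dir():
        return None
    mtimes = [p.stat().st_mtime for p in directory.iterdir() if p.is_file()]
    if not mtimes:
        return None
    return (now - max(mtimes)) / 3600.0


def backup_is_stale(backup_dir, max_age_hours=26, now=None):
    """True when no backup exists or the newest one is older than the limit."""
    age = newest_backup_age_hours(backup_dir, now)
    return age is None or age > max_age_hours


if __name__ == "__main__":
    BACKUP_DIR = "/backups/db"  # hypothetical location
    if backup_is_stale(BACKUP_DIR):
        print("WARNING: no recent backup found in", BACKUP_DIR)
    else:
        print("Backups look current")
```

Run daily, a check like this would complain on the first missed backup rather than after the 5-day retention window has already rolled over.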

LAURENU2
02-18-2012, 06:07 PM
Looks like it got your Sig too
Well if you need off line storage
I have over 20 Terabytes of open space here

Zydor
02-18-2012, 09:42 PM
Bok

When you get her going again, if you don't use Diskeeper, go to Diskeeper - Hyperfast (http://www.diskeeper.com/business/diskeeper/hyperfast/).

I have used Diskeeper for years, and it's excellent: massive defrag and disc I/O savings. However, with the onset of SSDs they brought out Hyperfast, and the difference that's made to my SSDs is astonishing, not only in performance but, arguably more important, in disc life, as it prevents a huge number of disc writes/IO. As SSD lifespan is hugely sensitive to the number of disc writes, Hyperfast hits the button superbly.

I recently crunched some WUs that are very heavy on disc writes - this particular WU type creates literally thousands of directories as part of its routine. During my time crunching them, Hyperfast pulled out an average of 600,000 saved disc I/Os per day, compared to a WU that is not disc dependent, where Hyperfast pulled only 25,000 or so disc writes per day. It works brilliantly, and frankly is a must-have for server SSDs - sprint, don't walk, and get it together with the main Diskeeper program for the servers.

Regards
Zy

gopher_yarrowzoo
02-19-2012, 10:14 AM
hmm dang :( I guess you aren't meant to leave the house for more than a day... :(
Let me know if there is anything I can help with...

LAURENU2
02-19-2012, 12:57 PM
:idea: I bet gopher broke it trying to install my long range Radar
come on confess :whip:

gopher_yarrowzoo
02-19-2012, 01:33 PM
Hey Lauren not me....
The Long range radar is on my radar just got a few real life things to get done first my dear friend:).

Bok
02-19-2012, 03:24 PM
Bok

When you get her going again, if you don't use Diskeeper, go to Diskeeper - Hyperfast (http://www.diskeeper.com/business/diskeeper/hyperfast/).

I have used Diskeeper for years, and it's excellent: massive defrag and disc I/O savings. However, with the onset of SSDs they brought out Hyperfast, and the difference that's made to my SSDs is astonishing, not only in performance but, arguably more important, in disc life, as it prevents a huge number of disc writes/IO. As SSD lifespan is hugely sensitive to the number of disc writes, Hyperfast hits the button superbly.

I recently crunched some WUs that are very heavy on disc writes - this particular WU type creates literally thousands of directories as part of its routine. During my time crunching them, Hyperfast pulled out an average of 600,000 saved disc I/Os per day, compared to a WU that is not disc dependent, where Hyperfast pulled only 25,000 or so disc writes per day. It works brilliantly, and frankly is a must-have for server SSDs - sprint, don't walk, and get it together with the main Diskeeper program for the servers.

Regards
Zy

It's all on Linux, so products like this wouldn't be of any use. RAID 10 arrays might work, but multiple SSDs with good controllers start getting expensive. I really should have been checking the backups a bit better. I plan to pull the server apart tomorrow and see what can be done. At worst I'll plug in the extra SSD I have. (Have to pull apart my Windows machine for that, but I'll just use my laptop in the meantime).

Bok
02-19-2012, 05:56 PM
Good news, looks like I've recovered all of the data as far as I can tell. A few tables gave me errors but they are replicated versions anyway so no big deal. Running tests which will take some time but I think it will be back up no later than noon tomorrow. I'll have to put the drives in correctly as they are just loose right now.

I'm going to look into getting 2 external drives (500Gb or so would be fine) and running parallel backups to them.

gopher_yarrowzoo
02-19-2012, 06:49 PM
:) Good job man, thank goodness for that....

vaughan
02-20-2012, 06:54 AM
Hey bok, good luck with the re-build and recovery process. Take your time and get it right. Now is a good opportunity for that housekeeping on the server that always got postponed :)

ChertseyAl
02-20-2012, 01:55 PM
Hurrah! Back in business :) Just in time for me to see a milestone I've been after for *ages* :)

Not had much luck with hardware here either. Lost 2 HDs in the last few weeks, and now another machine is randomly freezing and failing to find an OS. Running out of bits to cannibalise from old machines :(

Anyway, hadn't realised how much I relied on Free-DC. Might have to make another donation soon :)

Al.

LAURENU2
02-20-2012, 02:24 PM
Hurrah! Back in business :)

Anyway, hadn't realised how much I relied on Free-DC. Might have to make another donation soon :)
Al.

I think everyone should realize how much they use Free-DC and how much they missed Free-DC Stats when they were down.
:idea: It MIGHT Make them WANT to give a little back to Free-DC. :Pokes:
I am very SURE Bok is not too happy about having to strip down other working PCs just to get the Stats UP again
..:allhail:

ChertseyAl
02-20-2012, 03:19 PM
I think everyone should realize how much they use Free-DC

Ah, just put some small change in the pot. Would be more, but I'm running out of HDs here and need to order some new ones. BOINC can seriously ruin your hardware :(

Anywoo, just pleased that the best stats site is back again. And Free-DC really is the best. By a Loooooooong way. Just sayin' like ;)

Al.

STE\/E
02-21-2012, 03:48 AM
Stats on the Main Page are totally Boked errr Borked lol this morning ... :coffee:

Bok
02-21-2012, 07:20 AM
:( Amazingly, when I woke up this morning, a 2nd SSD was exhibiting exactly the same behavior as the last one.

It's not the new one I put in, but surely this can't be a coincidence? It more likely points to a motherboard problem. I do not know what to do at this stage. I have one more SSD I could cannibalize from another machine, but my head tells me the machine itself is the problem now. I'll plug the first failed SSD into another Linux machine and see if it exhibits any errors at all.

gopher_yarrowzoo
02-21-2012, 08:46 AM
Yeah Bok, Have to say that is pointing at a H/ware controller error - shame you don't got an SATA PCI card ;) see if it's the controller or the whole mobo...

LAURENU2
02-21-2012, 09:15 AM
Can you boot to DOS off a floppy and run scandisk?

Bok
02-21-2012, 10:17 AM
Linux has its own version, fsck, which is running now, but if it's a controller that's causing it, fsck won't know.

ChertseyAl
02-21-2012, 12:36 PM
I do not know what to do at this stage

I guess it's none of the obvious things like the PSU being underrated, or going on the blink and over/under voltaging the drives?

Don't know if the RAM could be causing strange problems. Have had a few oddities over the years that were RAM related.

If it helps, I could ship you my old Sinclair ZX81 and/or my Nascom-1 ;)

Al.

LAURENU2
02-21-2012, 01:46 PM
Bok, these are SATA drives, right?
On a quad here that has a SATA SSD, it would just stop every 2 to 3 days
It rebooted OK with a scandisk
I found the drive controller chip and glued an old VGA chip cooler to it
and it has not crashed since
Find your drive/SATA chip and put your finger on it; if it feels hot, try this
Normally it is right next to the SATA ports

gopher_yarrowzoo
02-21-2012, 06:29 PM
Hmm I should check mine, i get a click every so often from a drive but I got a fan on 'em ...

Bok
02-21-2012, 09:29 PM
fsck was totally clean, and the stats have been running just fine for the last 8 hours or so. If they are ok still in the morning, I'll re-open the webside connection.

LAURENU2
02-21-2012, 11:48 PM
fsck was totally clean, and the stats have been running just fine for the last 8 hours or so. If they are ok still in the morning, I'll re-open the webside connection.
Did you do the finger test? Was it hot or not?
And what is the size of the SSD you are using?

Bok
02-22-2012, 06:57 AM
It felt just fine Lauren, these are all 120Gb SSD's. Still running just fine, so I'll open it up soon.

ChertseyAl
02-25-2012, 08:27 AM
Looks like stats are down again :eek:

Al.

Bok
02-25-2012, 09:29 AM
yup. Drive not responding again. I'm going to tear down my windows machine and use that.

LAURENU2
02-26-2012, 02:19 AM
What do you think will make this go away

P.S.
Without the need to tear apart your network

gopher_yarrowzoo
02-26-2012, 12:10 PM
Hmm, a PC ice box, nice water cooling, sealed box to prevent dust ingress, with Gigabit ethernet too, with plenty of RAM and some method of minimizing writing to the drives too...

LAURENU2
02-26-2012, 04:17 PM
Would something like this work better than SATA?
http://www.microcenter.com/single_product_results.phtml?product_id=0371589
It has a big conch
The pciE has a larger pipe if I am not wrong
I have a new Iceberg W/cooling system I never installed
And or a pirzo system for working at -0 F

P.S.
this looks cool too, a Hybrid
http://www.microcenter.com/single_product_results.phtml?product_id=0374294

Bok
02-26-2012, 06:33 PM
What do you think will make this go away

P.S.
Without the need to tear apart your network

I think it ranges from trying out a vga cooler like you mentioned on the SATA controller chip to replacing the mobo (which would mean a new CPU too).

I intend to try the first option as soon as I get a chance, given the drive came back up just fine. Otherwise I'll sacrifice my windows machine which has a very good mobo+cpu

Bok
02-26-2012, 06:34 PM
Would something like this work better than SATA?
http://www.microcenter.com/single_product_results.phtml?product_id=0371589
It has a big conch
The pciE has a larger pipe if I am not wrong
I have a new Iceberg W/cooling system I never installed
And or a pirzo system for working at -0 F

P.S.
this looks cool too, a Hybrid
http://www.microcenter.com/single_product_results.phtml?product_id=0374294

I've not heard good things about those PCIe SSD's yet.

LAURENU2
02-26-2012, 08:20 PM
I think it ranges from trying out a vga cooler like you mentioned on the SATA controller chip to replacing the mobo (which would mean a new CPU too).

I intend to try the first option as soon as I get a chance, given the drive came back up just fine. Otherwise I'll sacrifice my windows machine which has a very good mobo+cpu

Most of my MBs have some kind of cooler on the chip, and like I said, adding the heat-sink solved it (can't read error)
If you've got an old dead MB, take the VGA cooler off and use that. But use some heat sink paste in the middle and a dab or ball
of silicone at the 4 corners to lock it to the MB. Let it cure laying down for 6 hrs and you're good to go

The silicone will not harm or short the MB

LAURENU2
02-26-2012, 10:35 PM
I've not heard good things about those PCIe SSD's yet.

OK try these
http://www.ocztechnology.com/ocz-revodrive-3-pci-express-ssd.html

http://www.guru3d.com/article/ocz-revodrive-120gb-review/7

Now I'm starting to want one haha

Bok
02-27-2012, 07:01 AM
That last review is from 2010. I didn't notice until I saw it up against an OCZ Vertex II and was curious why they didn't compare to the Vertex III instead, which has similar performance at ~550MB/s though is a lot cheaper.

LAURENU2
02-27-2012, 08:54 AM
Well, with a run of 343 years between failures it seems like a bargain

Jim1900
02-27-2012, 11:29 AM
I didn't notice until I saw it up against an OCZ Vertex II and was curious why they didn't compare to the Vertex III instead, which has similar performance at ~550MB/s though is a lot cheaper.
Is the Vertex 3 the one that is not responding? There was a known problem with "panic lock" with the Vertex 2 drives (Sandforce controller), and it may have carried over to some extent to the Vertex 3.
http://www.youtube.com/watch?v=S0CJ0l1BUGI

I have a Vertex 2 that has never locked up in operation, but if I change video cards on the motherboard, the drive is not recognized on the next reboot and I have to go into the BIOS to find it again. So they are definitely temperamental, but work well on the right motherboard.

stinger608
02-27-2012, 01:34 PM
Hey Bok, I posted a message in your profile page regarding a possible solution to the Free-DC site man.

Bok
02-27-2012, 09:03 PM
Is the Vertex 3 the one that is not responding? There was a known problem with "panic lock" with the Vertex 2 drives (Sandforce controller), and it may have carried over to some extent to the Vertex 3.
http://www.youtube.com/watch?v=S0CJ0l1BUGI

I have a Vertex 2 that has never locked up in operation, but if I change video cards on the motherboard, the drive is not recognized on the next reboot and I have to go into the BIOS to find it again. So they are definitely temperamental, but work well on the right motherboard.

Actually no, it was one of the older ones and I don't think it was a problem with the SSD itself given it comes back up just fine. Convinced it's the SATA controller. I spent some time trying to get a VGA cooler on it today but didn't have much luck. See below for further outcome.


Hey Bok, I posted a message in your profile page regarding a possible solution to the Free-DC site man.

I appreciate the offer, but this is the database server. It's the main reason I gave up on a hosted dedicated server many years ago. It *NEEDS* to be a dedicated server. Whilst raw speed doesn't matter that much it certainly helps so a decent core i7 is a must (current one is a core i7-920). Memory is also a necessity to keep the database running well. Current server has 12Gb (which was the max available at the time I built it). It does not run the webpages at all, those are on a separate server. It does all the downloads of stats files, parsing of the xml, mysql updates, ranking etc etc in a schema running on a dedicated SSD. It then replicates all of this into a separate schema running on a different SSD. And it does this a lot.

As an update for all. I tore apart my windows machine today and replaced the components of the database server with it. Was kind of hoping the OS (CentOS 6.2) would come back up without too much effort but alas it did not and I had to re-install it. Same went for the transplant back to my windows server but then I was expecting that one...

I'm typing on the windows box right now, so that went fairly smooth.

The database server is now running a Core i7-2600K with 16Gb RAM (and it will handle 32Gb). And it's running the SSDs on SATA III fully now, which is good. I have most of the packages I need installed, just a few more to go. So tomorrow if all goes well, I'll be pointing the site back here once more and fingers crossed it all works well. I have no spares of anything now, which is a little disconcerting as it's the first time in years :) But I'll deal with that in the coming weeks. I've had about $100 in donations in the last week or so; that will go towards another spare SSD and then the 8Gb RAM chips I think. I did pick up a 1Tb external drive which I have hooked up for secondary backups, maybe get one more of those.

Scribe
02-28-2012, 01:11 AM
Talking of donations, how can I donate?

Bok
02-28-2012, 01:22 AM
Until I open the site up again, you can go to www.free-dc.org/paypal.php

Appreciate it.

Another update before I hit the sack. Stats are actually running right now and do appear to be quite a bit faster, mostly. One part is definitely slower and I'm pretty sure that's down to a mysql config I need to correct. Going to let them run overnight and check everything out first thing. After a few more tests then I'll open it up again.. Fingers crossed!

Scribe
02-28-2012, 01:45 AM
OK Phil, fifty bucks on its way.:thumbs:

Alan

LAURENU2
02-28-2012, 09:36 AM
OK Phil, fifty bucks on its way.:thumbs:
Alan

Come on people Give Every Month like I do For The BEST STATS In The World
Between bandwidth and parts it costs a lot for Bok to serve up stats for you
and you know it takes a lot of his time to do this that he could spend with his Kids

Bok
02-28-2012, 10:35 AM
They should be up now..

LAURENU2
02-28-2012, 03:24 PM
They should be up now..
Yes Yes Yes they are

LAURENU2
03-01-2012, 12:51 AM
OK now that we are back on track again
Bok Tell us what you need to keep this from happening again
I know of a few Stat junkies that were about to end it all
and you saved them just in the nick of time
All Hail Bok
:allhail:

Bok
03-01-2012, 01:30 AM
Full backup redundancy :) But that's not going to happen!

Realistically, it's hard to tell. If a mobo fails, then it needs replacing, simple as that. I do need to beef up my backup plans a bit. They are about 95% good, but again need some redundancy.

I'm working on a list of components I need though, not necessarily for prevention but also to beef things up a little.

List in decreasing priority

1. 32Gb Ram for the DB server.
2. 1x120Gb SSD. I'm going to put a dedicated SSD in the webserver and replicate the database to that, this way if the db server goes down, the website will continue (though data will remain static until fixed). I've been testing the replication a bit today and it's fast.
3. 2x120Gb SSD's as backups in the database server.
4. External 1Tb or 1.5Tb drive for extra backups.
5. PSU replacements
6. Another gigabit switch dedicated between the boxes, as they all have 2 gigabit ports and I could route traffic directly between them.
7. Redundant webserver (rsync to it daily and failover if needed)
8. Redundant database server (much more difficult, possibly in a master-master mysql config)

1+2 are high priorities for me (I'd shift some of the memory from the current db server into the webserver after getting 1), 3 is pretty much up there too. 4 would be useful but not critical at this stage, same as 5+6. 7+8 are not too practical, just dreams :)

I've got about $410 in donations sitting in my paypal account right now, which will just about pay for 1+2 (1 = $239, 2 = $139 at the low end to $199 for the Intel 520).
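
The sums above can be checked with a few lines of Python. The prices and the donation total are the ones quoted in the post; the names are only for illustration.

```python
DONATIONS = 410          # approximate PayPal balance from the post
RAM_32GB = 239           # item 1
SSD_LOW, SSD_HIGH = 139, 199  # item 2: low end vs. the Intel 520


def shortfall(budget, *costs):
    """Amount still missing after spending `budget` on `costs` (0 if covered)."""
    return max(0, sum(costs) - budget)


if __name__ == "__main__":
    print("low-end total:", RAM_32GB + SSD_LOW,
          "shortfall:", shortfall(DONATIONS, RAM_32GB, SSD_LOW))
    print("high-end total:", RAM_32GB + SSD_HIGH,
          "shortfall:", shortfall(DONATIONS, RAM_32GB, SSD_HIGH))
```

So the donations cover items 1+2 with the cheaper SSD ($378) and fall a little short with the Intel 520 ($438), which matches "just about".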

So, all in all looking pretty good I think. It's definitely quite a bit faster. Had a nasty mysql misconfiguration that I've just fixed which was causing some of the threads to time out in the stats updates. The timeout was only set to the minimum 30 seconds, and considering some of the updates take 100+ seconds, it was breaking the connection in the script. So there might be some data oddities; they'll fix themselves in the next few cycles though.