

New Database Server

Core i7 2600K @ 4GHz
16GB DDR3
2x 128GB Intel 510 SSDs (databases)

The intention is to make some much-needed changes to some of the underlying tables, and therefore to the scripts that parse and create the data. The main reason for this is the way I create the stats.

I have two databases, each on a different SSD. The first database is where I parse the XML files into and do all the manipulation. The second database is the web-facing one; I don't want to do major inserts/updates there as it just ends up slowing down response times.

Once the data is generated and updated I do some custom replication from one database to the other, the very last step being a drop and rename of each table, which is virtually instantaneous. It's a lot of data to copy though, up to 40GB or so at a time, so the speed of the drives is very important.
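The swap at the end of the replication step can be sketched like this, using SQLite (and an invented two-column schema) in place of MySQL; in MySQL itself a single RENAME TABLE statement can swap both tables atomically.

```python
import sqlite3

# Minimal sketch of the final swap step, using SQLite in place of MySQL.
# Table and column names are illustrative, not the real schema.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# "Live" table the web pages read from.
cur.execute("CREATE TABLE boinc_user (id INTEGER PRIMARY KEY, credit REAL)")
cur.execute("INSERT INTO boinc_user VALUES (1, 100.0)")

# Freshly replicated copy, built off to the side.
cur.execute("CREATE TABLE boinc_user_new (id INTEGER PRIMARY KEY, credit REAL)")
cur.execute("INSERT INTO boinc_user_new VALUES (1, 250.0)")

# The swap: drop the old table and rename the new one into place.
# (MySQL can do this with one atomic RENAME TABLE; here a transaction
# keeps readers from seeing the intermediate state.)
with con:
    cur.execute("DROP TABLE boinc_user")
    cur.execute("ALTER TABLE boinc_user_new RENAME TO boinc_user")

print(cur.execute("SELECT credit FROM boinc_user WHERE id = 1").fetchone()[0])
```

However large the copy is, the swap itself only touches the catalog, which is why that last step stays near-instantaneous.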

But there were a few places with room for improvement.

1. Faster processor. Check. It's certainly faster than the Core i7 920 in the current DB server.
2. More and faster memory. Check: 16GB rather than the 12GB in the current server.
3. Faster SSD drives. These Intel 510s are SATA III and have sequential write speeds of 210MB+ per second or so.
4. Splitting up the tables so that we don't need to copy over as much data each time. This is what is taking the time. Let me explain it further.

Let's take the user table for BOINC, called boinc_user (unsurprisingly).

Its primary key is project code and id (integer), and it currently contains fields such as the nickname (which can change), country, URL, team id, project ranks 1 through 28, team ranks 1 through 28, total credit, total credit at the end of yesterday, total credit at the end of 2 days ago... through total credit at the end of 28 days ago, etc. (about 150 fields in total). Now, not many of these fields actually change during the day: all of the ranking fields except the current one, and all of the metric fields except the current one, only change once during the overnight processing. So it made more sense to split the table in two, with the same primary key. For info, this brings the total size of the boinc_user table down from 4GB to under 2GB. Hosts comes down from 11GB to under 4GB.
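A cut-down sketch of that split, again with SQLite and invented column names rather than the real 150-field schema: the fields that change all day stay in the main table, and the once-a-night history fields move to a companion table sharing the same primary key.

```python
import sqlite3

# Hypothetical cut-down schema illustrating the hot/cold split.
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Hot table: fields that change throughout the day.
cur.execute("""CREATE TABLE boinc_user (
    proj TEXT, id INTEGER, nick TEXT,
    total_credit REAL, cur_rank INTEGER,
    PRIMARY KEY (proj, id))""")

# Cold table: the day-by-day history, only written overnight.
cur.execute("""CREATE TABLE boinc_user_daily (
    proj TEXT, id INTEGER,
    credit_day1 REAL, credit_day2 REAL,  -- ...through day 28 in reality
    PRIMARY KEY (proj, id))""")

cur.execute("INSERT INTO boinc_user VALUES ('sah', 1, 'fred', 1000.0, 42)")
cur.execute("INSERT INTO boinc_user_daily VALUES ('sah', 1, 990.0, 975.0)")

# Intraday updates now touch only the small hot table; the big
# history table is untouched until the overnight run.
cur.execute("UPDATE boinc_user SET total_credit = 1010.0 "
            "WHERE proj = 'sah' AND id = 1")
```

It's the hot table that gets replicated repeatedly, which is where the copy-volume saving comes from.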

So far, I've split up the user/team/host, combined user and combined team tables, and have modified all of the Perl scripts to cater for this. They've been running for a week or so and the speed gain is considerable. I've only just started amending the PHP code to join the tables together though. A lot of places where it accesses the data split off from the main tables have to be replaced with joins. It won't take that long though.
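The change on the PHP side amounts to replacing single wide-row reads with a join on the shared key; a sketch with SQLite and hypothetical columns:

```python
import sqlite3

# Illustrative schema only: a hot table plus its history companion.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE boinc_user (proj TEXT, id INTEGER, nick TEXT, "
            "total_credit REAL, PRIMARY KEY (proj, id))")
cur.execute("CREATE TABLE boinc_user_daily (proj TEXT, id INTEGER, "
            "credit_yesterday REAL, PRIMARY KEY (proj, id))")
cur.execute("INSERT INTO boinc_user VALUES ('sah', 1, 'fred', 1000.0)")
cur.execute("INSERT INTO boinc_user_daily VALUES ('sah', 1, 990.0)")

# Where a page used to read one wide row, it now joins the two
# tables on the shared primary key.
row = cur.execute("""
    SELECT u.nick, u.total_credit, d.credit_yesterday
    FROM boinc_user u
    JOIN boinc_user_daily d ON d.proj = u.proj AND d.id = u.id
    WHERE u.proj = 'sah' AND u.id = 1""").fetchone()
print(row)  # ('fred', 1000.0, 990.0)
```

Since both tables share the primary key, the join is an index lookup and adds very little to page response time.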

Whilst doing this I've made some other table changes where I'd just plain got some of the data characteristics wrong and never had a chance to correct them.

I'm also going to rename PS3 to GPU; it's been like that for far too long.

Still to do: bring in the non-BOINC tables, as these are basically copies of the BOINC tables; it's the parsing which is different. Timing is key here.

I'm going to buy another SSD, likely the OCZ Vertex 3, which now that its firmware is up to 2.09 is apparently no longer suffering from the bluescreens. It has a faster write speed than the Intels, so I'll use it for the web-facing database and then move that Intel 510 to hold /tmp and /home. This should provide a twofold improvement. First, faster writing during the replication. Secondly, during parsing of the XML files I write the data out into /tmp in a MySQL fast loadfile format; putting these files onto an SSD will be much better.
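Writing the parsed rows out as a tab-separated load file might look roughly like this (a Python sketch with invented rows; the exact LOAD DATA INFILE options depend on the table definition):

```python
import csv
import os
import tempfile

# Invented sample rows standing in for the parsed XML data.
rows = [
    ("sah", 1, "fred", 1000.0),
    ("sah", 2, "barney", 500.0),
]

# Tab-separated, newline-terminated: the default format that
# MySQL's LOAD DATA INFILE expects.
path = os.path.join(tempfile.gettempdir(), "boinc_user.load")
with open(path, "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)

# The bulk-load step on the MySQL side would then be, roughly:
#   LOAD DATA INFILE '/tmp/boinc_user.load' INTO TABLE boinc_user;
```

Since these files are written and then immediately re-read by the bulk load, keeping /tmp on an SSD speeds up both halves of the step.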

If anyone really wants to see the new ones you can go to, but as I say most of the PHP needs fixing. You can compare the statsrun times though:


or particularly for something like Einstein@Home


Note the two 100-second runs of E@H where I switched off host processing for a test on the 2nd/3rd.