Thread: Future of Free-DC stats

  1. #1
    Administrator Bok's Avatar
    Join Date
    Oct 2003
    Location
    Wake Forest, North Carolina, United States
    Posts
    24,503
    Blog Entries
    13
    Quote Originally Posted by Jeff17
    Bok,

    Before making suggestions I need to understand your DB setup better.

    You said you have two databases dcfree and stats; are these in one or two mysql servers on the host?
    Why are you using the temp and regular table setup?
    - website performance
    - consistency, need to update multiple tables separately but they all need to be updated before used
    - backup
    - update performance
    After you update the tables is the data static until the next update?
    How big is this data?
    - total mysql db size GB
    - stats db size [30GB?]
    - largest table GB
    - largest table rows
    - largest table web users can cause updates to
    Which storage engine are you using?
    What collation are you using?
    Do you know your level of db reads and writes?

    Jeff
    Both databases are under the same MySQL instance on one machine, residing on different SSDs.
    I use the temp and regular table setup just so I can replicate the raw data across databases, then drop and rename the tables. Yes, it's for consistency. Backups are done nightly to external drives.
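
    For reference, that drop-and-rename swap is a single atomic step in MySQL. A minimal sketch, using placeholder table names rather than the real schema:

        -- Rebuild the fresh copy alongside the live table, then swap it in atomically.
        -- 'users', 'users_tmp' and 'users_old' are placeholder names, not the real tables.
        RENAME TABLE users TO users_old,
                     users_tmp TO users;
        DROP TABLE users_old;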

    Website performance is another big question and is causing me a lot of grief, but it has nothing to do with the database; it's likely all down to too much JavaScript (I could use any help on that too!)

    Data is mostly static after an update, but updates happen frequently.
    Total size of data in each is ~70GB, though that's hard to pin down as there are also static tables like historic milestones and movement data.
    The largest table would be the historic milestones at around 25M rows. I used to have weekly data on hosts that got close to 1B rows, but scrapped that.
    Row size on hosts is about 4K.
    Web users can only affect a few minor tables.
    I'm doing this all on MyISAM in order to allow the raw data replication.
    Massive amounts of reads and writes. For instance, for SETI@home one update will involve close to 5M writes.

    It's all pretty well optimized, but the replication is just killing the drives. I've now had 4 drives fail, but they have all been the drive that gets replicated TO. The one that does all of the calculations has not failed yet in almost 4 years. I have replaced it once, but just to move to a SATA III setup.

    So, this new way of doing it will alleviate it totally. And I'm pretty far into recoding for it now.

    I do all of the same updates and calculations in one database (let's call it stats1), then switch a variable so that the webpages read all the data from that database.
    The next update run will do the updates and calculations in the second database (stats2) and then switch the webpages to read from that one.
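
    As an illustration (not the actual code), the switch could be as simple as a one-row control table that the web pages read on each load; the table and column names here are placeholders:

        -- Hypothetical one-row control table telling the web pages which copy is live.
        CREATE TABLE IF NOT EXISTS dcfree.active_stats (dbname VARCHAR(16) NOT NULL);
        INSERT INTO dcfree.active_stats VALUES ('stats1');

        -- When the stats2 pass finishes its updates and calculations, flip the pointer...
        UPDATE dcfree.active_stats SET dbname = 'stats2';
        -- ...and back to 'stats1' after the next pass completes there.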

    I've moved all static(ish) tables into a 3rd database on another separate drive, so there is recoding to point to those instead.

    My biggest hurdle is the end-of-day calculations and rollover. I need to make sure that both databases are consistent before doing that rollover in both of them, so that the dailies match in both. Otherwise, as the site oscillates between the two during the day, there could be discrepancies. Updating less frequently will mitigate this, and I'm planning on a 2-hour window where I don't download any new files.

    I still intend to update fairly frequently, probably just hourly checks.

    I may switch all the tables over to InnoDB now that I'm not doing the raw replication.
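
    The engine change itself is just an ALTER per table, though it rebuilds the whole table, so the 25M-row ones will take a while. A sketch, with a placeholder table name:

        -- Rebuilds the table in the new engine; expect it to take a while on large tables.
        ALTER TABLE historic_milestones ENGINE = InnoDB;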

    The non-BOINC projects are a little more challenging in that data is not really downloaded from all of them in the form of XML files; it's scraped. But I'm not that worried about those. I'll get the BOINC ones working first, then do something for them.

  2. #2
    Registered User Fire$torm's Avatar
    Join Date
    Jan 2010
    Location
    U.S.A. Eastcoast
    Posts
    25
    Hi Bok,

    Like some of the other posters, I believe RAID is the way to go. My twist on this theme is to use a separate RAID box built around a CPU with a low TDP (to reduce power consumption).

    My suggestion for a RAID box
    OS: FreeNAS (Link)

    CPU: AMD A8-5500 Trinity 3.2GHz Quad-Core (Link)

    MB: (The following support (x6) SATA III + (x1) eSATA, Gigabit Ethernet, USB 3.0)
    *ASUS F2A85-M/CSM (Link)
    *ASRock FM2A85X Extreme4-M (Link)
    Note:
    *eSATA port would be used for boot drive.

    HDD: (x6) WD Scorpio Black WD7500BPKT 750GB 2.5" (Link)
    Notes:
    *The WD SB series are the most robust 2.5" HDDs that I know of at their price point.
    *The 2.5" form factor allows for better airflow, helping to keep them cool.
    *A way to improve read/write throughput and reduce the latency of an HDD is formatting it to <= 50% capacity. This is called short stroking.
    *Using RAID 10 will give you a total capacity of 1,125GB (6 x 750GB, halved for mirroring and halved again by short stroking) (Link) and performance near/at SATA III (single-disk) levels.

    Case: Cooler Master HAF 912 (Link)
    *Best airflow performance for a small mid-tower at its price point.

    Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin

  3. #3
    Junior Member
    Join Date
    Oct 2006
    Location
    Big Rock, TN
    Posts
    3
    Bok,
    To cut the writes to any specific drive, couldn't you just add more drives to the mix: stats3, stats4, ...?

  4. #4
    Administrator Bok's Avatar
    Join Date
    Oct 2003
    Location
    Wake Forest, North Carolina, United States
    Posts
    24,503
    Blog Entries
    13
    Adding more drives would mean I'd have to split up tables and do even more coding - it could get messy. However, I have added a 3rd database.

    Here is the current progress.

    I've re-organized the tables and now have one database holding the 'static-ish' tables, whilst the other two are virtual clones of each other.

    I've redone the perl and shell scripts so they now flip-flop between these two databases when doing updates. Seems to be working well. Prior to starting any update, they pass the other database name over to the webserver so it knows to serve from the one not being updated. Conversely, after updates, they do the same thing so that the latest data is always shown.

    Not quite finished with all the custom scripts I have but the major ones are running.

    On the web side, I've started making the changes necessary to ensure the static tables are accessed correctly. As long as tonight's overnight run works OK, I should be in a position to switch them back on sometime tomorrow. Some things won't work, but I'll be able to fix them up fairly quickly over the coming days.

    Hang tight!

  5. #5
    Junior Member
    Join Date
    Feb 2010
    Location
    Raleigh, North Carolina, United States
    Posts
    25
    Our big server (holding the db for an active web application) is about the same size: ~140GB total db space, 200+ tables, several tables with 10-70M rows and the largest at almost 200M rows. We are about 4-to-1 select vs insert/update on the database (see later). We only have two tables that get big push updates, 300K and 1.3M rows, and that is just once a day.

    When we were performance testing two years ago, we looked at MyISAM vs InnoDB, SSD, SSD RAID, standard disk RAID and increasing memory. We saw the biggest performance gain by switching to InnoDB and getting row-level locking, mainly because we have PHP sessions in the db (~20K rows but constant read/write/update) and the 200M-row table (an activity log that gets lots of inserts and large reads when reports are run). Neither SSD option gave us much performance gain over standard disk RAID (especially for the price), and frequently the SSD RAID was slower during our once-a-day update. Increasing the memory and then increasing innodb_buffer_pool_size was a much better option for the money than the SSDs. We have 96GB of RAM in the server and the buffer set to 68GB. Because of this large buffer almost half our reads are cache hits, saving a huge db and disk load; before the memory change our db read vs write ratio was about 7-to-1. The disks are 4 x 7200rpm 2.5in 250GB in a RAID 10.

    I think your change to swapping the db in the web config, instead of copying the tables, should help stop killing the SSDs. I would watch the disk stats to make sure the reads from the web queries are not also somehow helping to kill the SSDs, since in your previous setup all the web queries were also hitting the SSD that keeps dying.

    Next I would work on changing to InnoDB. Switching to InnoDB will give you some performance gains on the web queries and should give a lot on the updates (row vs table locking). However, before making the change, be sure to review the InnoDB configuration settings, especially those related to the various buffers; there is a whole separate set of config values for InnoDB, and if you leave them at defaults you will kill the server. Set the buffer pool as large as you can; with your machine, 16GB is probably about your max.
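
    A rough sketch of the settings usually worth checking first; the values below are assumptions for a machine of this size, not tested numbers, and the two my.cnf entries need a server restart on older MySQL:

        -- In my.cnf:
        --   innodb_buffer_pool_size = 16G   -- the big one; as much RAM as you can spare
        --   innodb_log_file_size    = 1G    -- bigger redo logs help heavy bulk updates
        -- These two can be changed at runtime:
        SET GLOBAL innodb_file_per_table = ON;          -- one tablespace per table, easier to reclaim space
        SET GLOBAL innodb_flush_log_at_trx_commit = 2;  -- relax durability a little for bulk stats loads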

    Long term, I think you would be better off getting more memory in the server and running larger buffers; you could then probably go back to standard disks in a RAID to save some money. It would be great if you had enough memory for a 48GB or 64GB buffer.

    I use the temp table swap trick in a few places also. It is a nice way to update data without impacting the user and then suddenly all the updates are there. In mine I have the temp table in the same db to start with so no copies.

  6. #6
    Now while my expertise in database is restricted to the fact I know they exist and are run on computers...

    I'd just like to say thank you, Bok. BOINCing is half the fun without Free-DC and knowing what my rigs are doing. Please don't give up and while it looks like you found one or more better solutions for hassle-free running, please don't hesitate to ask us for donations. On many other places I have become sceptical about use of donations, but Free-DC always is a no-brainer to me as I know how hard you are working on that.

    Thank you so much, Bok.

  7. #7
    Quote Originally Posted by Mad Matt
    Now while my expertise in database is restricted to the fact I know they exist and are run on computers...

    I'd just like to say thank you, Bok. BOINCing is half the fun without Free-DC and knowing what my rigs are doing. Please don't give up and while it looks like you found one or more better solutions for hassle-free running, please don't hesitate to ask us for donations. On many other places I have become sceptical about use of donations, but Free-DC always is a no-brainer to me as I know how hard you are working on that.

    Thank you so much, Bok.
    I don't post here much, if ever. However, I am a fan of Free-dc and would like to help, as best I can. That would be by sending you a portion of the not-so-large paycheck I get, once in a while. If there is something I do know about DB's, it costs money. Let me help out a little. I know some folks at TPU feel the same way. You have enough to do, let us help defray some costs!

  8. #8
    Quote Originally Posted by Mad Matt
    Now while my expertise in database is restricted to the fact I know they exist and are run on computers...

    I'd just like to say thank you, Bok. BOINCing is half the fun without Free-DC and knowing what my rigs are doing. Please don't give up and while it looks like you found one or more better solutions for hassle-free running, please don't hesitate to ask us for donations. On many other places I have become sceptical about use of donations, but Free-DC always is a no-brainer to me as I know how hard you are working on that.

    Thank you so much, Bok.
    +1

  9. #9
    Registered User
    Join Date
    Jan 2013
    Location
    Santa Ana, CA
    Posts
    2
    Quote Originally Posted by Mad Matt
    Now while my expertise in database is restricted to the fact I know they exist and are run on computers...

    I'd just like to say thank you, Bok. BOINCing is half the fun without Free-DC and knowing what my rigs are doing. Please don't give up and while it looks like you found one or more better solutions for hassle-free running, please don't hesitate to ask us for donations. On many other places I have become sceptical about use of donations, but Free-DC always is a no-brainer to me as I know how hard you are working on that.

    Thank you so much, Bok.
    Well said, Mad Matt. I'll second that.
