
Thread: Future of Free-DC stats

  1. #1
    Administrator Bok's Avatar
    Join Date
    Oct 2003
    Location
    Wake Forest, North Carolina, United States
    Posts
    24,473
    Blog Entries
    13

    Future of Free-DC stats

    I'm going to post a few ideas in this thread. Please feel free to post your own ideas.

  2. #2
    I have mentioned this before, but there are two ways to prevent writes from killing your SSD:
    (1) If the size of the write files can fit in main memory, use a Ramdisk.
    http://www.romexsoftware.com/en-us/primo-ramdisk/index.html

    (2) If the main memory is not big enough, use a write cache (as large as you have memory for) with a long latency.
    An hour will greatly reduce writes to disk in most cases. I used a 24-hour latency and reduced them by 99%, but it depends on the application.
    http://www.romexsoftware.com/en-us/f...che/index.html

    Linux has a comparable feature using a virtual RAM drive, but I don't know the details of that; I am sure plenty of other people around here can help.
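
    The write-cache idea in (2) can be sketched in miniature. The class below is only a toy illustration (the names and structure are mine, not any real product's API): dirty blocks sit in memory, repeated writes to the same block overwrite each other for free, and data only reaches the disk once the latency window expires.

```python
import time

class DelayedWriteCache:
    """Toy write-back cache: coalesce writes, flush only after `latency` seconds."""

    def __init__(self, latency, clock=time.monotonic):
        self.latency = latency
        self.clock = clock      # injectable clock, handy for testing
        self.dirty = {}         # block_id -> (data, time of first write)
        self.disk_writes = 0    # count of writes that actually reached "disk"

    def write(self, block_id, data):
        # Rewriting a still-dirty block replaces it in memory at no disk cost.
        first = self.dirty.get(block_id, (None, self.clock()))[1]
        self.dirty[block_id] = (data, first)
        self.flush_expired()

    def flush_expired(self):
        now = self.clock()
        expired = [b for b, (_, t) in self.dirty.items() if now - t >= self.latency]
        for block_id in expired:
            del self.dirty[block_id]
            self.disk_writes += 1  # the only point where data hits the SSD
```

    With a long latency, a block rewritten thousands of times inside the window still costs a single physical write, which is how a 24-hour latency can cut disk writes by the ~99% mentioned above.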

  3. #3
    Administrator Bok's Avatar
    Good ideas of course - this is all on Linux and I certainly could use ramdisks (or MySQL MEMORY tables). I would love to fit the stats tables into RAM, but without removing host data and potentially freeing up some other types of data I would need at least 64GB of RAM, more likely 128GB, which would require some high-end hardware.

    I currently have 32Gb on the DB server which is the max for that particular motherboard.

    Here is what I've posted in our private thread.

    I have TWO separate databases on two distinct SSD drives.

    All updates and calculations are done in the dcfree database, in tables named like temp_boinc_user, temp_boinc_team, etc.

    Second database (stats) contains the web facing data with tables named boinc_user, boinc_team etc etc.

    Once all updates are done in a single iteration, I copy the database 'files' from dcfree to stats (outside of mysql), so stats then contains both boinc_user and temp_boinc_user. I then drop boinc_user and rename temp_boinc_user-> boinc_user.

    This operation is pretty much instantaneous so anyone doing an update on the webpages never tries to access a table that is locked for update.
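
    The drop-and-rename swap can be demonstrated with SQLite standing in for MySQL (the table names match the thread; the columns are made up for illustration). In MySQL the same effect is available atomically in a single statement: RENAME TABLE boinc_user TO old_boinc_user, temp_boinc_user TO boinc_user.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Live table the web pages read, plus the freshly rebuilt temp table.
cur.execute("CREATE TABLE boinc_user (id INTEGER, credit INTEGER)")
cur.execute("INSERT INTO boinc_user VALUES (1, 100)")
cur.execute("CREATE TABLE temp_boinc_user (id INTEGER, credit INTEGER)")
cur.execute("INSERT INTO temp_boinc_user VALUES (1, 150)")

# The swap: drop the stale live table and promote the temp table in its place.
cur.execute("DROP TABLE boinc_user")
cur.execute("ALTER TABLE temp_boinc_user RENAME TO boinc_user")
conn.commit()

print(cur.execute("SELECT credit FROM boinc_user").fetchone()[0])  # -> 150
```

    Readers querying boinc_user before the swap see the old data and afterwards the new; the window where the name is missing is tiny, which is what makes the operation feel instantaneous.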

    The problem, though, is that I write 30Gb of data to the 'stats' drive 20 times per day. That, on top of the reads, just wears them down.

    I could lower the frequency considerably.

    I'd love to have the whole 'stats' database in memory personally, but I would likely need 128GB of RAM to be able to do that (totally new hardware) and it would take some rewriting. (Or remove the hosts part.)

    Probably other options, but I'm open to suggestions.

    *EDIT* This one was an Intel SSD as well, which are supposed to be the most robust. It lasted only 5 months.

  4. #4
    Bok, I have a Supermicro H8SGL motherboard that has not worked for the last year. I am going to RMA it right now. When it comes back I could donate it along with a 6172 processor and heatsink. You would still need $500 to $1,000 worth of RAM, but I imagine we could raise that through donations.

  5. #5
    I have posted a thread at TeAm AnandTech to collect ideas, solutions, and opinions.

  6. #6
    Administrator Bok's Avatar
    I've had one idea from AMDave that seems very feasible. Never considered it before but it should be doable with some small changes and would eliminate the writes.

    Instead of replicating the data from one drive to another, 'oscillate' the database connection from the webserver.

    i.e.

    still have two distinct databases on distinct drives as I currently have, but make them identical for all intents and purposes (I'd actually have a 3rd for static data which I've been considering anyway).

    1. Run the stats update on db1, while the webserver points to db2.
    2. On completion, change the webserver's pointer to db1.
    3. Run the stats update on db2 (making sure the changes include those from the prior update as well as any new ones); the webserver still points to db1.
    4. On completion, change the webserver's pointer to db2.
    5. Rinse and repeat.

    It will mean I'll be duplicating some updates, but that's actually fairly minor.

    Part 3 is the tricky part but is doable.

    Might need a 3rd SSD in there perhaps.

    Still need to think this out a little more.
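
    The five steps can be mocked up in a few lines. This is only a toy (plain lists stand in for the two databases), but it shows the key detail in step 3: the standby database must receive the batches it missed while it was live, plus the new one, before the pointer flips.

```python
class Oscillator:
    """Toy model of the two-database 'oscillation' scheme."""

    def __init__(self):
        self.dbs = {"db1": [], "db2": []}  # stand-ins for the two databases
        self.live = "db2"                  # the one the webserver reads
        self.pending = []                  # last batch, not yet applied to both

    @property
    def standby(self):
        return "db1" if self.live == "db2" else "db2"

    def run_update(self, batch):
        # Steps 1 and 3: update the standby db, including anything it missed.
        target = self.standby
        self.dbs[target].extend(self.pending + batch)
        # Steps 2 and 4: flip the webserver pointer to the freshly updated db.
        self.live = target
        # Remember this batch so the other db catches up next round.
        self.pending = batch
```

    After each run the webserver always reads the newer copy, and each database eventually sees every batch, at the cost of applying each batch twice - the duplicated updates mentioned above.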

  7. #7
    Administrator Bok's Avatar
    Quote Originally Posted by Rudy Toody View Post
    I have posted a thread at TeAm AnandTech to collect ideas, solutions, and opinions.
    Thanks Rudy, I do appreciate it. I'm not looking for donations at the moment, at least until I have a plan - more looking for ideas. See my previous post for the best one I have so far.

  8. #8
    Registered User
    Join Date
    Jan 2013
    Location
    Santa Ana, CA
    Posts
    2
    Quote Originally Posted by Bok View Post
    Good ideas of course - this is all on linux and I certainly could use Ramdisks (or mysql memory tables). I would love to fit the stats tables into Ram but without removing host data and potentially freeing up some other types of data I would need at least 64Gb Ram and more likely 128Gb which would require some high end hardware.

    [...]

    *EDIT* this one was an Intel SSD as well which are supposed to be the most robust. Lasted only 5 months.
    Hi,

    I know that I may just be another cruncher here, but I work in the computer industry dealing with this type of hardware and software (DBs). In my professional opinion, at this time, SSDs are NOT the way to go for any kind of DB that will be used continually (as in most web-based DBs). The problem is that TRIM, and even the extra provisioning on higher-end SSDs, cannot keep up with the writes and deletes, so at some point, though the OS shows disk space available, the SSD actually has nowhere to put data that should fit on the drive. Sometimes this results in loss of partitions (possible to recover from this) and, worst case, especially in DBs, loss of data, depending on what was lost... TRIM and garbage collection need some time to perform their duties, even on the high-reliability drives from Intel, OWC, and Samsung.

    I have equipment that runs nearly 24 hours a day (though not a DB), and I found only two drive brands that worked without failures. Crucial (Micron), Toshiba, SanDisk, Kingston, OCZ, Intel and Plextor do not cut the 24/7/365 test in my experience at all; they last 3-6 weeks and poop out if they have not been given at least several hours a week of no drive activity to do their garbage collection and TRIM. The only drive of reputation I did not try was the Seagate - they are too new in the market to even spend time on. I have, and do, run Samsung 840s and OWC drives, and they have been good for over 6 months without a hiccup. Now, this is not to suggest they could take the continuous pounding of being DB volumes.

    My suggestion is to save the money on SSDs and get some quality hard drives, perhaps some hybrids, which would still offer some performance but will still write to the platters if/when the cache is full or not optimized. Put the money into memory for the motherboard discussed earlier in the thread and, as Jim1900 suggested (and I'm sure you would like as well), keep your main tables in memory, not swap files. If you need to ask for donations, so be it.

    Another thing I would consider is reducing the frequency of updates from 20 per day to 12, or even as few as 4 (every 2-6 hours). I do not think that anyone requires updates that frequent, do they? Just throwing some ideas out there.

    Best wishes,
    Phil

  9. #9

    Possible ideas

    Quote Originally Posted by Bok View Post
    Good ideas of course - this is all on linux and I certainly could use Ramdisks (or mysql memory tables). I would love to fit the stats tables into Ram but without removing host data and potentially freeing up some other types of data I would need at least 64Gb Ram and more likely 128Gb which would require some high end hardware.

    [...]

    *EDIT* this one was an Intel SSD as well which are supposed to be the most robust. Lasted only 5 months.
    Make sure swap space is not on the SSD, if it still is! The system hits swap heavily during database/large-file operations. If the database will not easily fit in memory, it has to use swap space even more. The system's RAM buffer can grow over time to fill the available 'free' memory and then start pushing into swap space.

    Windows and Unix date from before large files were commonly encountered, so they lean hard on swap in Unix and the page file (aka virtual memory) in Windows. That's one reason machines have more memory now than years ago!

    On Linux you can just mount a separate drive and change a couple of settings in the mount table to use it as swap space; on Windows, the page file has to be moved to another drive through the system settings.

    Frequently used files are kept in the RAM buffer, and less frequently accessed pages get pushed out to swap space.

    Example: mysqld and the part of the file it is actively working on stay in the RAM buffer, while the rest of the database gets paged in from swap space.

    It is a VERY simple picture, but the general idea of how things are done follows this process.

  10. #10
    Administrator Bok's Avatar
    Quote Originally Posted by kmanley57 View Post
    Make sure swap space is not on the SSD if it still is! The system uses swap TONS during Database/large file operations. If the database will not easily fit in memory it has to use the swap space even more. Your system drive ram buffer can grow over time to fill available 'free' memory, then start using swap space.

    [...]

    It is a VERY simple example, but the idea of how things are done follows this process.
    Yup, swap is on a separate drive, and I also have separate MySQL tmp space in a ramdisk for the one (!) major SQL query I do for the combined user data, which goes to a temp table. (No way around it at all that I can find, and I've posted to the mysql lists to no avail too - not that this one is too big a deal.)

  11. #11

    Give it all up!?!

    It's Clank [MM]

    First, what would you do with all that time on your hands? Second, you could collapse BOINC. It's YOUR site that keeps everyone going. Stats, stats, stats. Let me know if you need anything. Patrick

  12. #12
    Junior Member
    Join Date
    Feb 2010
    Location
    Raleigh, North Carolina, United States
    Posts
    25
    Bok,

    Before making suggestions I need to understand your DB setup better.

    You said you have two databases dcfree and stats; are these in one or two mysql servers on the host?
    Why are you using the temp and regular table setup?
    - website performance
    - consistency, need to update multiple tables separately but they all need to be updated before used
    - backup
    - update performance
    After you update the tables is the data static until the next update?
    How big is this data?
    - total mysql db size GB
    - stats db size [30GB?]
    - largest table GB
    - largest table rows
    - largest table web users can cause updates to
    Which storage engine are you using?
    What collation are you using?
    Do you know your level of db reads and writes?

    Jeff

  13. #13
    You might look into the Intel X25-E SSDs. These have SLC flash, and the 64GB version is rated for around 2 petabytes of writes. I have seen these go on eBay with low writes for around $100 apiece. In general, the enterprise-grade SSDs come overprovisioned from the factory.

    If you have sufficient memory, eliminate the swap file altogether. I never use swap files any more since memory costs very little. Have syslogd and other logging daemons write to non-SSD storage; you could mount a hard-disk partition at /var/log, for example, or mount /var/log from a remote system over NFS. With a high-traffic website, I can see log files growing rapidly.

    You can also check the number of writes that your Intel SSD has sustained using smartctl.

    smartctl -a /dev/sda (replace sda with the appropriate device name matching the SSD that you want to check)

    Intel SSDs have a SMART attribute called Host_Writes_32MiB. Multiply the returned raw value by 32 to get the number of mebibytes written to the SSD. SSDs with MLC flash are fairly limited in writes, and I can see a database like the one hosting your site wearing the flash out fairly quickly. Almost all consumer-grade SSDs have MLC or TLC flash.
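
    A small script can do the conversion from smartctl's output. The attribute line below is illustrative (the column layout varies between smartctl versions), but the arithmetic is the same: each raw unit is 32 MiB.

```python
import re

def host_writes_gib(smartctl_output):
    """Pull the Host_Writes_32MiB raw value out of `smartctl -a` output
    and convert it to GiB (raw value * 32 MiB per unit / 1024)."""
    m = re.search(r"Host_Writes_32MiB.*?(\d+)\s*$", smartctl_output, re.MULTILINE)
    if m is None:
        raise ValueError("Host_Writes_32MiB attribute not found")
    return int(m.group(1)) * 32 / 1024

# Illustrative attribute line; the raw value is the last column.
sample = ("241 Host_Writes_32MiB       0x0032   100   100   000    "
          "Old_age   Always       -       1048576")
print(host_writes_gib(sample))  # 1048576 units * 32 MiB = 32768.0 GiB
```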

    If you have a RAID array, you may also want to setup a hot spare.

    Jeroen

  14. #14
    Administrator Bok's Avatar
    Quote Originally Posted by Jeff17 View Post
    Bok,

    Before making suggestions I need to understand your DB setup better.

    [...]

    Both databases are under the same mysql instance on one machine, residing on different SSDs.
    I use the temp and regular table setup just to allow replicating the raw data across databases, then dropping and renaming the tables. Yes, for consistency. Backups are done nightly to external drives.

    Website performance is another big question and is causing me a lot of grief, but it's nothing to do with the database, likely all due to too much javascript (I could use any help on that too!)

    Data is mostly static after an update, but updates happen frequently.
    Total size of data on each is ~70Gb; it's hard to pin down exactly, as there are also static tables like historic milestones and movement data.
    The largest table would be the historic milestones, at around 25M rows. I used to have weekly data on hosts that got close to 1B rows, but scrapped that.
    Row size on hosts is about 4k.
    Web users can only affect a few minor tables.
    I'm doing this all on MyISAM in order to allow the raw data replication.
    Massive amounts of reads and writes. For instance, for Seti@Home one update will involve close to 5M writes.

    It's all pretty well optimized, but the replication is just killing the drives. I've now had 4 drives fail, but they have all been the drive that gets replicated TO. The one that does all of the calculations has not failed yet in almost 4 years. I have replaced it once, but just to move to a SATAIII setup.

    So, this new way of doing it will alleviate it totally. And I'm pretty far into recoding for it now.

    I do all of the same updates and calculations in one database (let's call it stats1), then switch a variable so that the webpages read all the data from that database.
    Next update run will do the updates and calculations in the second database (stats2) and then switch the webpages to read from that one.

    I've moved all static(ish) tables into a 3rd database on another separate drive, so there is recoding to point to those instead.

    My biggest hurdle is the end-of-day calculations and rollover. I need to make sure that both databases are consistent before doing the rollover in both of them, so that the dailies match in both. Otherwise, during the day's oscillation, there could be discrepancies. Updating less frequently will mitigate this, and I'm planning on a 2-hour window where I don't download any new files.
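
    One way to guard that rollover, sketched here as a toy (the digest function and row shapes are made up; in MySQL, CHECKSUM TABLE could play a similar role): compare a cheap digest of the daily totals in both databases and only roll over once they agree.

```python
import hashlib

def table_digest(rows):
    """Order-independent digest of rows, for comparing the two copies."""
    h = hashlib.sha256()
    for row in sorted(rows):
        h.update(repr(row).encode())
    return h.hexdigest()

def safe_rollover(db1_rows, db2_rows, do_rollover):
    # Run the end-of-day rollover only when both databases agree.
    if table_digest(db1_rows) != table_digest(db2_rows):
        return False  # still diverged; wait for the next oscillation to catch up
    do_rollover()
    return True
```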

    I still intend to update fairly frequently, probably just hourly checks.

    I may switch all the tables over to InnoDB now that I'm not doing the raw replication.

    The non-BOINC projects are a little more challenging in that data is not really downloaded from all of them in the form of XML files; it's scraped. But I'm not that worried about those. I'll get the BOINC ones working first, then do something for them.

  15. #15
    Registered User Fire$torm's Avatar
    Join Date
    Jan 2010
    Location
    U.S.A. Eastcoast
    Posts
    25
    Hi Bok,

    Like some of the other posters, I believe RAID is the way to go. My twist on this theme is to use a separate RAID box built around a low-TDP CPU (to reduce power consumption).

    My suggestion for a RAID box
    OS: FreeNAS (Link)

    CPU: AMD A8-5500 Trinity 3.2GHz Quad-Core (Link)

    MB: (The following support (x6) SATA III + (x1) eSATA, Gigabit Ethernet, USB 3.0)
    *ASUS F2A85-M/CSM (Link)
    *ASRock FM2A85X Extreme4-M (Link)
    Note:
    *eSATA port would be used for boot drive.

    HDD: (x6) WD Scorpio Black WD7500BPKT 750GB 2.5" (Link)
    Notes:
    *The WD SB series are the most robust 2.5" HDDs that I know of at their price point.
    *The 2.5" form factor allows for better airflow, helping to keep them cool.
    *A way to improve read/write and reduce latency of an HDD, is formatting it to <= 50% capacity. This is called Short Stroking.
    *Using RAID 10 will give you a total capacity of 1,125GB (Link) and give performance near/at SATA III (single disk) levels.

    Case: Cooler Master HAF 912 (Link)
    *Best airflow performance for a small mid-tower at its price point.
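
    As a quick sanity check on the 1,125GB figure: six 750GB drives in RAID 10 mirror in pairs, leaving three drives' worth of usable space, and short-stroking each drive to 50% halves that again.

```python
drives = 6
capacity_gb = 750
short_stroke = 0.5  # format to 50% of capacity (short stroking)

raid10_usable = drives // 2 * capacity_gb  # RAID 10: half the drives are mirrors
total = raid10_usable * short_stroke
print(total)  # -> 1125.0 GB
```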

    Those who would give up essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety. - Benjamin Franklin

  16. #16
    Junior Member
    Join Date
    Oct 2006
    Location
    Big Rock, TN
    Posts
    3
    Bok,
    To cut your writes to any specific drive couldn't you just add more drives to the mix, stats3, stats4,...?
