Results 1 to 12 of 12

Thread: Stats rewrite

  1. #1
    Administrator Bok's Avatar
    Join Date
    Oct 2003
    Location
    Wake Forest, North Carolina, United States
    Posts
    24,472
    Blog Entries
    13

    Stats rewrite

    I'll try to keep some updates on the rewrite...

    Today I rewrote the XML parser using XML:Parser:SAX which greatly reduces the memory requirements by not holding the whole XML document in memory. Seems fairly quick too. I'm monitoring it running Einstein stats right now (parsing host stats) and it doesn't look too bad...

    [freedc@statsbox test]$ cat /proc/3498/status
    Name: a_gather2.pl
    State: S (sleeping)
    SleepAVG: 98%
    Tgid: 3498
    Pid: 3498
    PPid: 409
    TracerPid: 0
    Uid: 501 501 501 501
    Gid: 501 501 501 501
    FDSize: 256
    Groups: 501
    VmPeak: 138600 kB
    VmSize: 138596 kB
    VmLck: 0 kB
    VmHWM: 19376 kB
    VmRSS: 19372 kB
    VmData: 16608 kB
    VmStk: 84 kB
    VmExe: 12 kB
    VmLib: 7688 kB
    VmPTE: 252 kB
    StaBrk: 00604000 kB
    Brk: 01591000 kB
    StaStk: 7fff913fde60 kB
    Threads: 1
    SigQ: 0/16379
    SigPnd: 0000000000000000
    ShdPnd: 0000000000000000
    SigBlk: 0000000000000000
    SigIgn: 0000000000001086
    SigCgt: 0000000180000000
    CapInh: 0000000000000000
    CapPrm: 0000000000000000
    CapEff: 0000000000000000
    Cpus_allowed: 7fffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
    Mems_allowed: 00000000,00000001

    I've also started splitting the script into components. This is a bit more tricky the way I have the DB set up as I've got to synchronize them, but it shouldn't be that difficult..

    I already have the single scripts that purely do user project ranking and team project ranking

    All on the dev box right now only..

    If people are interested in this level of detail, I'm mroe than happy to post more

    Bok

  2. #2
    =>Team Joker<= LAURENU2's Avatar
    Join Date
    Dec 2004
    Location
    Chicago IL USA
    Posts
    5,478
    Blog Entries
    1
    All them 00000000 make my eyes go
    I am sure I can speak for all of us and say
    We really appreciate all that you do for us Bok

  3. #3
    Yes, indeed. Although I have no idea what all the numbers mean, I'm just happy there's someone like Bok who KNOWS what they mean and gives us some fantastic stats.

  4. #4
    Administrator Bok's Avatar
    Join Date
    Oct 2003
    Location
    Wake Forest, North Carolina, United States
    Posts
    24,472
    Blog Entries
    13
    Ok, part 2...

    I re-implemented again the gathering script using XML::Twig this time and for a change, inserting the data into a temporary table (more on that later)

    Testing on Einstein hosts (~ 373000) took 20mins to insert all the data. Not sure if I can make that any faster.

    Now to run ranking on the hosts...this is where I've used advice that came from Zain Upton over at BoincSynergy quite some time ago (thanks Zain!)

    Let's make this simpler. I have a table that contains amongst other things

    id, score, projrank

    I create a new temp table containing id + rank (which is an auto-incrementing field meaning as you insert, mysql starts at 1 and incremements).

    I then perform the following steps

    select curtime();
    alter table hosts add index forstats (score desc,id);
    insert into temp (id) select id from hosts order by score desc,id;
    update hosts,temp set hosts.projrank =temp.rank where hosts.id=temp.id;
    select curtime();

    And the results are... (index was already created hence the error)

    +-----------+
    | curtime() |
    +-----------+
    | 11:25:36 |
    +-----------+
    1 row in set (0.00 sec)

    ERROR 1061 (42000): Duplicate key name 'forstats'
    Query OK, 373261 rows affected (5.50 sec)
    Records: 373261 Duplicates: 0 Warnings: 0

    Query OK, 296975 rows affected (3 min 55.44 sec)
    Rows matched: 373261 Changed: 296975 Warnings: 0

    +-----------+
    | curtime() |
    +-----------+
    | 11:29:37 |
    +-----------+
    1 row in set (0.00 sec)


    4 mins to rank them isn't too bad.. Current stats script probably would take 10 times that using arrays in perl



    Bok

  5. #5
    almost retired the-mk's Avatar
    Join Date
    Jan 2003
    Location
    KI/OOE/Austria
    Posts
    1,921
    Blog Entries
    6
    cool looking SQL statements

    the-mk

  6. #6
    Administrator Bok's Avatar
    Join Date
    Oct 2003
    Location
    Wake Forest, North Carolina, United States
    Posts
    24,472
    Blog Entries
    13
    Some more investigation on doing project ranking.. Playing about with mysql

    This is actually on a table of users in Rosetta, all run on a dual xeon 3.6

    mysql> select count(*) from tempuser2;
    +----------+
    | count(*) |
    +----------+
    | 138626 |
    +----------+
    1 row in set (0.00 sec)


    Show the top 10 via score.. no ranks...

    mysql> select projrank,name,metric1 from tempuser2 order by metric1 desc limit 0,10;
    +----------+----------------------+-----------------+
    | projrank | name | metric1 |
    +----------+----------------------+-----------------+
    | 0 | XS_DDTUNG | 26074289.110836 |
    | 0 | PY 222 | 19296317.008225 |
    | 0 | UW-Madison CAE | 12952821.188437 |
    | 0 | lunarcom | 9870414.385457 |
    | 0 | Housing and Food Ser | 8993245.2656 |
    | 0 | Keith E. Laidig | 8424687.856815 |
    | 0 | TeAm Enterprise | 7903186.796393 |
    | 0 | lwh1 | 7256810.819434 |
    | 0 | TCU Computer Science | 5758233.912814 |
    | 0 | Oldguy | 5320341.380613 |
    +----------+----------------------+-----------------+

    10 rows in set (0.00 sec)

    Then a wee bit of mysql magic...

    mysql> set @rownum = 0;
    Query OK, 0 rows affected (0.00 sec)

    mysql> update tempuser2, (select@rownum := @rownum+1 AS newrank, id as user_id from tempuser2 order by metric1 desc) r set tempuser2.projrank = newrank where user_id = id;
    Query OK, 138626 rows affected (8.33 sec)
    Rows matched: 138626 Changed: 138626 Warnings: 0

    mysql> select projrank,name,metric1 from tempuser2 order by metric1 desc limit 0,10;
    +----------+----------------------+-----------------+
    | projrank | name | metric1 |
    +----------+----------------------+-----------------+
    | 1 | XS_DDTUNG | 26074289.110836 |
    | 2 | PY 222 | 19296317.008225 |
    | 3 | UW-Madison CAE | 12952821.188437 |
    | 4 | lunarcom | 9870414.385457 |
    | 5 | Housing and Food Ser | 8993245.2656 |
    | 6 | Keith E. Laidig | 8424687.856815 |
    | 7 | TeAm Enterprise | 7903186.796393 |
    | 8 | lwh1 | 7256810.819434 |
    | 9 | TCU Computer Science | 5758233.912814 |
    | 10 | Oldguy | 5320341.380613 |
    +----------+----------------------+-----------------+
    10 rows in set (0.00 sec)


    mysql>

    Seems just a little faster.....

    Now if only I could get the xml parsing to run faster..... still this is MUCH better than before.

    Now I've just got to code in

    1. Updates to actual tables from temp.
    2. movements
    3. milestones

    And of course I have a new machine to play with at our server provider which is somewhat more powerful....

    Bok

  7. #7
    Hehe, glad you like your new "toy" !

  8. #8
    =>Team Joker<= LAURENU2's Avatar
    Join Date
    Dec 2004
    Location
    Chicago IL USA
    Posts
    5,478
    Blog Entries
    1
    Now if you could write a script to pic the winning Lotto Number for me
    I would buy you a Rack of thouse 16 core U2's and your vary own T1 trunk to feed it
    That would give you lots of play room

  9. #9
    Administrator Bok's Avatar
    Join Date
    Oct 2003
    Location
    Wake Forest, North Carolina, United States
    Posts
    24,472
    Blog Entries
    13
    Update:

    I've had some time to refine the new perl script some more this week..

    All the ranking for teams,users and hosts is done.
    Users within teams is done
    Team movement is all done
    Milestones are done

    Seti takes around 1hr 40 mins to complete including hosts, which is a lot less than the current one. If I don't do hosts it takes 20 mins

    So, it's pretty close to being ready. I've got a couple more changes to make which I'm doing now.

    BUT, there are some db changes which mean I have to make some front end changes to the php to get it working in the current site. One of the main ones being that the teams in BOINC projects actually have their boinc id as the primary index rather than the name - that was a long overdue change.

    I'll start running them in parallel on a new server this week and make the changes there. All going well, I expect to migrate the forums over to the new server in around 3-4 weeks time and the rewritten stats will go live then.

    Magnav0x is making some php changes for me too, which we'll hopefully get in on the new version. He's promised me that the new charts will be cool....

    Bok

  10. #10
    =>Team Joker<= LAURENU2's Avatar
    Join Date
    Dec 2004
    Location
    Chicago IL USA
    Posts
    5,478
    Blog Entries
    1
    A Thanks to you Bok and your coding team, for all the tireless effort put in this project
    THANK YOU

  11. #11
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    Awesome news Bok!
    Thank you very much for all the TIME it takes you away from your family and friends to do this for us!!!

  12. #12
    Great news Bok!

    Thx for all you work man, it's really appreciated!

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •