PDA

View Full Version : Stats rewrite



Bok
05-22-2007, 08:16 PM
I'll try to keep some updates on the rewrite...

Today I rewrote the XML parser using XML:Parser:SAX which greatly reduces the memory requirements by not holding the whole XML document in memory. Seems fairly quick too. I'm monitoring it running Einstein stats right now (parsing host stats) and it doesn't look too bad...

[freedc@statsbox test]$ cat /proc/3498/status
Name: a_gather2.pl
State: S (sleeping)
SleepAVG: 98%
Tgid: 3498
Pid: 3498
PPid: 409
TracerPid: 0
Uid: 501 501 501 501
Gid: 501 501 501 501
FDSize: 256
Groups: 501
VmPeak: 138600 kB
VmSize: 138596 kB
VmLck: 0 kB
VmHWM: 19376 kB
VmRSS: 19372 kB
VmData: 16608 kB
VmStk: 84 kB
VmExe: 12 kB
VmLib: 7688 kB
VmPTE: 252 kB
StaBrk: 00604000 kB
Brk: 01591000 kB
StaStk: 7fff913fde60 kB
Threads: 1
SigQ: 0/16379
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000001086
SigCgt: 0000000180000000
CapInh: 0000000000000000
CapPrm: 0000000000000000
CapEff: 0000000000000000
Cpus_allowed: 7fffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
Mems_allowed: 00000000,00000001

I've also started splitting the script into components. This is a bit more tricky the way I have the DB set up as I've got to synchronize them, but it shouldn't be that difficult..

I already have the single scripts that purely do user project ranking and team project ranking

All on the dev box right now only..

If people are interested in this level of detail, I'm mroe than happy to post more :)

Bok

LAURENU2
05-22-2007, 10:33 PM
All them 00000000 make my eyes go :looney:
I am sure I can speak for all of us and say
We really appreciate all that you do for us Bok

BlackMountainCow
05-23-2007, 02:28 AM
Yes, indeed. Although I have no idea what all the numbers mean, I'm just happy there's someone like Bok who KNOWS what they mean and gives us some fantastic stats. :thumbs:

Bok
05-24-2007, 11:38 AM
Ok, part 2...

I re-implemented again the gathering script using XML::Twig this time and for a change, inserting the data into a temporary table (more on that later)

Testing on Einstein hosts (~ 373000) took 20mins to insert all the data. Not sure if I can make that any faster.

Now to run ranking on the hosts...this is where I've used advice that came from Zain Upton over at BoincSynergy quite some time ago (thanks Zain!)

Let's make this simpler. I have a table that contains amongst other things

id, score, projrank

I create a new temp table containing id + rank (which is an auto-incrementing field meaning as you insert, mysql starts at 1 and incremements).

I then perform the following steps

select curtime();
alter table hosts add index forstats (score desc,id);
insert into temp (id) select id from hosts order by score desc,id;
update hosts,temp set hosts.projrank =temp.rank where hosts.id=temp.id;
select curtime();

And the results are... (index was already created hence the error)

+-----------+
| curtime() |
+-----------+
| 11:25:36 |
+-----------+
1 row in set (0.00 sec)

ERROR 1061 (42000): Duplicate key name 'forstats'
Query OK, 373261 rows affected (5.50 sec)
Records: 373261 Duplicates: 0 Warnings: 0

Query OK, 296975 rows affected (3 min 55.44 sec)
Rows matched: 373261 Changed: 296975 Warnings: 0

+-----------+
| curtime() |
+-----------+
| 11:29:37 |
+-----------+
1 row in set (0.00 sec)

4 mins to rank them isn't too bad.. Current stats script probably would take 10 times that using arrays in perl

:)

Bok :cheers:

the-mk
05-24-2007, 02:58 PM
cool looking SQL statements :D

:thumbs: :cheers:

Bok
05-30-2007, 04:03 PM
Some more investigation on doing project ranking.. Playing about with mysql :)

This is actually on a table of users in Rosetta, all run on a dual xeon 3.6

mysql> select count(*) from tempuser2;
+----------+
| count(*) |
+----------+
| 138626 |
+----------+
1 row in set (0.00 sec)

Show the top 10 via score.. no ranks...

mysql> select projrank,name,metric1 from tempuser2 order by metric1 desc limit 0,10;
+----------+----------------------+-----------------+
| projrank | name | metric1 |
+----------+----------------------+-----------------+
| 0 | XS_DDTUNG | 26074289.110836 |
| 0 | PY 222 | 19296317.008225 |
| 0 | UW-Madison CAE | 12952821.188437 |
| 0 | lunarcom | 9870414.385457 |
| 0 | Housing and Food Ser | 8993245.2656 |
| 0 | Keith E. Laidig | 8424687.856815 |
| 0 | TeAm Enterprise | 7903186.796393 |
| 0 | lwh1 | 7256810.819434 |
| 0 | TCU Computer Science | 5758233.912814 |
| 0 | Oldguy | 5320341.380613 |
+----------+----------------------+-----------------+
10 rows in set (0.00 sec)

Then a wee bit of mysql magic...

mysql> set @rownum = 0;
Query OK, 0 rows affected (0.00 sec)

mysql> update tempuser2, (select@rownum := @rownum+1 AS newrank, id as user_id from tempuser2 order by metric1 desc) r set tempuser2.projrank = newrank where user_id = id;
Query OK, 138626 rows affected (8.33 sec)
Rows matched: 138626 Changed: 138626 Warnings: 0

mysql> select projrank,name,metric1 from tempuser2 order by metric1 desc limit 0,10;
+----------+----------------------+-----------------+
| projrank | name | metric1 |
+----------+----------------------+-----------------+
| 1 | XS_DDTUNG | 26074289.110836 |
| 2 | PY 222 | 19296317.008225 |
| 3 | UW-Madison CAE | 12952821.188437 |
| 4 | lunarcom | 9870414.385457 |
| 5 | Housing and Food Ser | 8993245.2656 |
| 6 | Keith E. Laidig | 8424687.856815 |
| 7 | TeAm Enterprise | 7903186.796393 |
| 8 | lwh1 | 7256810.819434 |
| 9 | TCU Computer Science | 5758233.912814 |
| 10 | Oldguy | 5320341.380613 |
+----------+----------------------+-----------------+
10 rows in set (0.00 sec)

mysql>

Seems just a little faster..... :thumbs:

Now if only I could get the xml parsing to run faster..... still this is MUCH better than before.

Now I've just got to code in

1. Updates to actual tables from temp.
2. movements
3. milestones

And of course I have a new machine to play with at our server provider which is somewhat more powerful....

Bok

BlackMountainCow
06-05-2007, 05:28 PM
Hehe, glad you like your new "toy" ! :thumbs:

LAURENU2
06-05-2007, 06:47 PM
:idea: Now if you could write a script to pic the winning Lotto Number for me:lmao:
I would buy you a Rack of thouse 16 core U2's and your vary own T1 trunk:rock: to feed it:bouncy:
That would give you lots of play room

Bok
06-21-2007, 11:02 PM
Update:

I've had some time to refine the new perl script some more this week..

All the ranking for teams,users and hosts is done.
Users within teams is done
Team movement is all done
Milestones are done

Seti takes around 1hr 40 mins to complete including hosts, which is a lot less than the current one. If I don't do hosts it takes 20 mins :)

So, it's pretty close to being ready. I've got a couple more changes to make which I'm doing now.

BUT, there are some db changes which mean I have to make some front end changes to the php to get it working in the current site. One of the main ones being that the teams in BOINC projects actually have their boinc id as the primary index rather than the name - that was a long overdue change.

I'll start running them in parallel on a new server this week and make the changes there. All going well, I expect to migrate the forums over to the new server in around 3-4 weeks time and the rewritten stats will go live then.

Magnav0x is making some php changes for me too, which we'll hopefully get in on the new version. He's promised me that the new charts will be cool.... :)

Bok

LAURENU2
06-22-2007, 01:07 AM
A Thanks to you Bok and your coding team, for all the tireless effort put in this project
THANK YOU

IronBits
06-22-2007, 06:49 PM
Awesome news Bok! :rock:
Thank you very much for all the TIME it takes you away from your family and friends to do this for us!!! :hifi:

BlackMountainCow
06-23-2007, 04:57 PM
:rock: Great news Bok! :clap:

Thx for all you work man, it's really appreciated! :cheers: