View Full Version : Bug fix for Stats



gopher_yarrowzoo
02-06-2005, 09:05 AM
Okay Bok, here is one for you - just an idea outline.
Pseudo Psychotic Coding:
say you grab stats from http://www.xyz.com/stats/tarball.z or whatever
$filename = "http://www.xyz.com/stats/tarball.z";
if (!file_exists($filename)) {
    error('### tarball fails - file not available');
} else if (filesize($filename) == 0) {
    error('### tarball fails - size = 0');
} else {
    delete(old_stats);                 // delete old stats, i.e. generated 2x hrs ago
    rename(current_stats, old_stats);  // copy last known good stats to backup file
    $filestream = open($filename, "r");
    read($filestream to current_stats);
    compare(current_stats, old_stats);
    if (old_stats are better) {
        delete(current_stats);
        copy(old_stats, current_stats);
    }
}
// it's safe to render page as it will have working stats
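
The outline above is pseudocode, so here is a minimal runnable sketch of the same idea in Python. It assumes the download has already happened and the bytes are passed in. The function name `install_stats` and the two file paths are hypothetical, and the "are the old stats better" comparison is reduced to a plain non-empty check.

```python
import os
import shutil
import tempfile
from typing import Optional


def install_stats(new_data: Optional[bytes], current_path: str, backup_path: str) -> bool:
    """Install freshly downloaded stats, keeping the last good copy.

    Returns True if new_data replaced the current stats, False if the
    existing file was left in place.
    """
    # A failed (None) or empty download never touches the files on disk.
    if not new_data:
        return False

    # Keep the last known good stats as a backup before replacing them.
    if os.path.exists(current_path):
        shutil.copyfile(current_path, backup_path)

    # Write to a temporary file first, then swap it in, so a page render
    # never reads a half-written stats file.
    directory = os.path.dirname(os.path.abspath(current_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "wb") as tmp:
        tmp.write(new_data)
    os.replace(tmp_path, current_path)
    return True
```

Whatever the function returns, the file at `current_path` is either the new stats or the previous good set, so the page can be rendered in both cases.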

Bok
02-06-2005, 09:19 AM
What does this fix exactly ???

I don't use files, it's all DB driven, but essentially I do something like this, though more convoluted.

I think it works already, but correct me if I'm wrong?

Bok:bonk:

IronBits
02-06-2005, 11:37 AM
Originally posted by gopher_yarrowzoo
// it's safe to render page as it will have working stats

All that is done in memory. However, the file that is read in has a date stamp in its first line, so all that is necessary is to compare that date stamp ;)
The file is read in a line at a time.
If the date stamp is later than the last date,
suck it all into memory and stuff it in the database where it belongs.
User comes along and clicks on their 'handle': read from database, present stats.
Almost zero I/O and no static pages generated. :D
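
A rough sketch of that date-stamp check, in Python. The thread does not say what the stamp looks like, so the `YYYY-MM-DD HH:MM:SS` format is an assumption, and `read_if_newer` is a hypothetical name.

```python
from datetime import datetime
from typing import Iterable, List, Optional, Tuple

STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed format of the first line


def read_if_newer(
    lines: Iterable[str], last_stamp: Optional[datetime]
) -> Optional[Tuple[datetime, List[str]]]:
    """Return (stamp, rows) if the file is newer than last_stamp, else None."""
    it = iter(lines)
    first = next(it, None)
    if first is None:
        return None  # empty file
    try:
        stamp = datetime.strptime(first.strip(), STAMP_FORMAT)
    except ValueError:
        return None  # an error page or garbage instead of a date stamp
    if last_stamp is not None and stamp <= last_stamp:
        return None  # nothing new, so the database is left alone
    # Only now are the remaining lines pulled into memory.
    return stamp, [line.rstrip("\n") for line in it]
```

Only a non-`None` result would go on to be loaded into the database; a blank file, an error page, or an unchanged stamp all fall through without touching it.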

What you suggested is what happens on the old Seti stats here
http://dbestern.com/cgi-bin/seti/setistats.cgi ;)

magnav0x
02-06-2005, 11:52 AM
Yeah, outputting static HTML pages has too much overhead

Darkness Productions
02-07-2005, 09:42 AM
It does? The only way that is true is if there are pages that don't get viewed very often... Otherwise, the DB server gets hammered, and causes more overhead

Bok
02-07-2005, 09:50 AM
Both cases are true I've found, depending on the situation.

For something like D2OL if I created a static html page for every team out there, probably only about 5% or less would get hit with any regularity.

As for dynamic pages (like I do now), sure the DB is doing a lot more work, but more often than not, the sql statements are the same so these are already cached in memory.

Bok

gopher_yarrowzoo
02-07-2005, 04:46 PM
Bok, what it's all about is basically this: you read the stats into the DB from somewhere.
I know when the D2OL servers went down your stats pages didn't show the last good result, which is what I was trying to explain in the psycho code...
See what I'm getting at now?
Check to see if the data is new before updating the database for that project.
That way you've always got a last good set to fall back on - make it clear that the stats are OLD though lol

Bok
02-07-2005, 04:55 PM
I didn't realise my stats showed old data when the d2ol stats went down. (Recently???) They shouldn't as I do check for this. I do it in a different way that's all.

There can be different cases. If the page you are pulling is not there / blank / or a standard error page is shown instead. I've seen 'em all!!

What I do (and it's slightly inefficient, but should work in all scenarios) is

update all users to set them to active_this_run='N',

read users in, loading the updated details into temp columns and setting each one to active_this_run='Y'. Accumulate total.

if (total != known good total from last run)
update all users with active_this_run='Y' with temp numbers and set active='Y' for these
else
no change, clear out temp columns etc etc

The php pages only check for those users who are active='Y' they don't look at active_this_run. This minimises any down time on the pages too. The architecture also helps in identifying team movement etc etc
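
For anyone wanting to try the steps above, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, the column `tmp_points`, and the names `make_db` and `apply_run` are all assumptions for illustration; the real schema is not shown in this thread.

```python
import sqlite3
from typing import Iterable, Tuple


def make_db() -> sqlite3.Connection:
    """Create an in-memory table with live and temp columns (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users ("
        " name TEXT PRIMARY KEY,"
        " points INTEGER DEFAULT 0,"
        " active TEXT DEFAULT 'N',"
        " active_this_run TEXT DEFAULT 'N',"
        " tmp_points INTEGER)"
    )
    return conn


def apply_run(conn: sqlite3.Connection,
              pulled: Iterable[Tuple[str, int]],
              last_good_total: int) -> int:
    """Stage one stats pull in temp columns, then promote it if it changed."""
    cur = conn.cursor()
    # Step 1: nobody has been seen in this run yet.
    cur.execute("UPDATE users SET active_this_run = 'N', tmp_points = NULL")
    # Step 2: stage each pulled user in the temp column and add up the total.
    total = 0
    for name, points in pulled:
        cur.execute("INSERT OR IGNORE INTO users (name) VALUES (?)", (name,))
        cur.execute(
            "UPDATE users SET tmp_points = ?, active_this_run = 'Y' WHERE name = ?",
            (points, name),
        )
        total += points
    # Step 3: promote staged numbers only when the total differs from last run.
    # An empty pull flags no rows, so this UPDATE changes nothing in that case.
    if total != last_good_total:
        cur.execute(
            "UPDATE users SET points = tmp_points, active = 'Y' "
            "WHERE active_this_run = 'Y'"
        )
    # Clear out the temp column either way.
    cur.execute("UPDATE users SET tmp_points = NULL")
    conn.commit()
    return total
```

The pages would only ever select on `active = 'Y'` and `points`, so they keep showing the previous numbers while a run is being staged.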


Oversimplified here, but it works..

Bok

pfb
02-07-2005, 06:21 PM
On the stats I've done I've gone for a similar method to gopher_yarrowzoo - if timestamp on data is newer than timestamp of data in DB then parse...as 90% of the storage I have is time based (at all levels - so someone not updating would be treated as a no-stats update) this way (including team movement under DF) suits my method...and means when a stats pull doesn't occur, the data isn't updated as the timestamp (i.e. null) isn't newer than the DB data (unless there is a :swear: moment with the stats data - had a few with DF and a lot, lot more with F@H)...

As for the static vs dynamic...I tried the static (for the main pages) early on in my DF stats and found it was more expensive - in CPU and time resources - it would take ~3-5 mins to build the main team pages whereas dynamic was around 1-2 seconds...never really done static since then unless the data itself is static...

my 2p's worth :cheers:

gopher_yarrowzoo
02-08-2005, 04:58 PM
No Bok, it doesn't pull the old stats - it just pulls nothing if the page goes down - page goes blank - empty table..

magnav0x
02-08-2005, 06:33 PM
I know what gopher is talking about. Anyhow, some sites don't have last-updated times on them, and if you are pulling straight from a website with no update times it's a bit erm...impossible, especially if they are using dynamic PHP: not only do you have no update time in the webpage itself, but you can't even stat the file to see its last modified time, because it's dynamically being recreated all the time.

However, if you are pulling from a webpage that is static HTML, I know you can stat the last modified time with Perl (not possible with PHP).

The only other way is to check each entry against its previous status to see if its points have changed. The problem of it updating when there is no data is a pitfall of using table shifts, which is what I discovered when I first started doing 3rd party stats. The only truly safe way to do it is to do everything timestamp based in the database for each entry. That way even if you don't get data on a stats run you have data from previous runs to use.
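
One way the timestamp-per-entry idea could look, sketched with Python's built-in sqlite3 module. The `history` table and the helper names are hypothetical, and run times are assumed to be ISO 8601 strings so that they sort correctly as text.

```python
import sqlite3
from typing import Iterable, Optional, Tuple


def make_history_db() -> sqlite3.Connection:
    """One row per user per stats run; rows are only ever added."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history ("
        " name TEXT NOT NULL,"
        " run_time TEXT NOT NULL,"
        " points INTEGER NOT NULL,"
        " PRIMARY KEY (name, run_time))"
    )
    return conn


def record_run(conn: sqlite3.Connection, run_time: str,
               pulled: Iterable[Tuple[str, int]]) -> None:
    """Store whatever a run returned; a failed run simply stores no rows."""
    conn.executemany(
        "INSERT INTO history (name, run_time, points) VALUES (?, ?, ?)",
        [(name, run_time, points) for name, points in pulled],
    )
    conn.commit()


def latest_points(conn: sqlite3.Connection, name: str) -> Optional[Tuple[str, int]]:
    """Return (run_time, points) from the most recent run that had this user."""
    return conn.execute(
        "SELECT run_time, points FROM history "
        "WHERE name = ? ORDER BY run_time DESC LIMIT 1",
        (name,),
    ).fetchone()
```

Because nothing is overwritten, a run that returns no data leaves the earlier rows as the latest ones, and the stored `run_time` makes it easy to show how old the displayed stats are.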