
View Full Version : [FIXED] stats glitch?



shifted
12-14-2002, 08:52 PM
I'm confused by something. If you look at my production graphs for the past 24 hours, and indeed several days, they show I'm making over 250k:

http://www.seventeenorbust.com/stats/graphs/user24.mhtml?userID=632
http://www.seventeenorbust.com/stats/graphs/userOverall.mhtml?userID=632
(Sorry that the images are fuzzy, but the image is transparent, and the forum rules don't allow me to post html to make a background to fix it)

However, if you look at my stats page (http://www.seventeenorbust.com/stats/users/user.mhtml?userID=632), it shows my last day's rate to be 152.920 K. This number has been steadily decreasing ever since we started testing multiple k's, but I didn't mention it, as several of my machines weren't running stably at the time (first the SB client kept crashing, and then NFS problems for the first week of December).

Am I just reading things wrong?

kugano
12-14-2002, 10:27 PM
Shifted:

I just looked at your stats page, and everything seems fine? Your crunch rate is shown as 270.495 K-cEMs/sec (24-hour) and 206.115 K-cEMs/sec (overall), which looks about right judging from the graphs (I also verified this from our database information).

Are you looking at the rates shown in the "Rankings" tables? If so, you have to take into account the fact that those rankings are only computed every 24 hours (at 1:02 AM Eastern time, or 1:04 AM for team rankings). So the "24 hour rate" shown in your Rankings table right now was your rate between 1:02 AM on Friday and 1:01:59 AM this morning, not your most up-to-date 24-hour rate.

Sorry if this confused you!

*adds to list of questions to answer on the to-be-rewritten FAQ*

Halon50
12-14-2002, 10:50 PM
I've been watching my own 24-hour data rate graph and the "Last Day's Rate" leader board, and am experiencing the same thing. My "Last Day's Rate" hasn't changed for several days now (it stays at a flat 342.159K), while my graph has pretty consistently averaged around 450K or so.

I was going to wait for one more update before posting, but Shifted beat me to it. ;)

shifted
12-15-2002, 12:08 AM
Oops! I meant to say under "Neighbors: Last Day's Rate", it looks like this:


Neighbors: Total Production

Rank Username Rate (cEMs/sec) Acct. Age
52 daylight (profile) 155.545 K 2.9 days
53 Emporor (profile) 154.237 K 10.1 days
54 shifted (profile) 152.920 K 28.8 days
55 haase (profile) 145.612 K 15.4 days
56 Rob (profile) 145.212 K 5.8 days


But in the Numeric Statistics, it shows 275.03 K cEMs/sec. There is also a difference between the Numeric Statistics overall work rate (206.46 K cEMs/sec) and the Neighbors: Overall rate (200.539 K). My production levels are at 275, and have been about that much for several days, but the tables below don't update :/

Maybe I'm just fretting because I've dropped from #9 to #54 lol

smh
12-15-2002, 06:00 AM
Neighbors: Total Production

I see this consistently throughout the site.

Shouldn't Neighbors be spelled Neighbours? Or is that the difference between British English and US English? :confused:

Confused, as a non-native English speaker

shifted
12-15-2002, 06:33 AM
Originally posted by smh
I see this consistently throughout the site.

Shouldn't Neighbors be spelled Neighbours? Or is that the difference between British English and US English? :confused:

Confused, as a non-native English speaker

Yeah. Some time ago, the Americans got lazy and started dropping or changing letters to make words shorter. I don't know why--perhaps they were slow typists. Personally, I prefer "proper" English spelling over American for two reasons: one, as a Canadian, I feel threatened by American culture and tend to avoid Americanisms; and two, English spelling and grammar are more globally accepted.

However, there is no "true" or "right" way to spell. The only time spelling alterations annoy me is when they lead to confusion. This is often the case with homonyms combined with total laziness, such as: it's and its, you're and your, and they're, their, and there. I find it totally appalling that many native speakers can't be bothered to memorise these and get them confused, especially since they are such basic and common words. Non-native speakers of English manage to learn them, so it can't be too difficult.

Language is in a constant state of gentle flux. Some of it is good, some bad. I actually find it a fascinating topic, and would have taken a degree in it if the career opportunities weren't so few.

Jwb52z
12-15-2002, 05:45 PM
You're being irrational if you somehow feel threatened by American culture. Canada is just about the last place ANYONE wants to invade.

Mystwalker
12-15-2002, 05:52 PM
Haven't you seen South Park - The Movie? :D

shifted
12-15-2002, 07:30 PM
Originally posted by Jwb52z
You're being irrational if you somehow feel threatened by American culture. Canada is just about the last place ANYONE wants to invade.

LOL

Jwb52z
12-15-2002, 09:09 PM
What's so funny? You're threatened by American culture and you think what I said was funny?

kugano
12-15-2002, 09:50 PM
I'll calmly skip over the flames here (and suggest everyone else does the same) and make this comment about the ranking stats:

They're fixed. They weren't truly broken in the first place (they really were updating every night, and have been since the switchover), but they have been made 'better' which should alleviate everyone's concerns. Read on for an explanation.

The rankings are computed every night at 1 AM. However, since they're computed using the last 24 hours of data, some data is missing -- work that you're currently doing doesn't get included until your client makes a progress report. So if you had made a progress report 30 minutes before the ranking update, that 30 minutes of work since the last progress report wasn't getting included (since our server didn't know yet that you had done it). This was "fair," since the same omission of work was done for every user, but was misleading.

So the solution: the rankings still recompute at 1 AM, but instead of taking the last 24 hours into consideration, they take the 24 hour period from midnight until midnight. Since the computation is done at 1 AM, that gives an hour time span for any progress reports to come in from working clients. So, although not perfect (it's possible to go more than 1 hour without a progress report), it's much better.
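The midnight-to-midnight scheme can be sketched in a few lines. This is an illustrative reconstruction, not the project's actual server code (which is Perl); the function name is made up:

```python
from datetime import datetime, timedelta

def ranking_window(run_time):
    """24-hour window covered by a nightly ranking update.

    The update runs shortly after 1 AM but covers the previous
    midnight-to-midnight day, leaving roughly an hour of grace
    for late progress reports to arrive.
    """
    end = run_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return end - timedelta(days=1), end

# An update run at 1:02 AM on Dec 15 ranks work done Dec 14, 00:00-24:00.
start, end = ranking_window(datetime(2002, 12, 15, 1, 2))
```

The one-hour gap between the window's end and the computation time is what gives working clients a chance to report in before their last hour of work is counted.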

Hope this helps, and thanks for pointing it out. :)

Halon50
12-15-2002, 10:41 PM
Aha! Thanks for the catch and quick fix! :cheers:

shifted
12-15-2002, 10:57 PM
Ahh! That explains it. Thanks for the fix! :)

prokaryote
12-16-2002, 03:52 AM
Originally posted by shifted
Yeah. Sometime ago, the Americans got lazy and started dropping or changing letters to make words shorter. I don't know why--perhaps they were slow typers...


Actually, I heard that it was done intentionally at the time of the Revolutionary War, as another way to further separate "America" from "Britain".


Probably the colonists were feeling threatened by British culture and rule, so they avoided Britishisms (is that a word?)... Turnabout is fair play!?!? :D :D

Halon50
12-16-2002, 11:39 AM
The "Last Day's Rate" still appears FUBARed. The graph shows an average of around 500 K, and the "Last Day's Rate" chart displays 384.415 K.

Maybe it's the other way around? The chart could be FUBARed and the Rate rankings fine? But if that were the case, my "Overall Rate" would be going down as well.

dfamily
12-17-2002, 02:00 AM
Now I think I'm confused...
I snagged the textstats at 23:00 my time, just before the update, and then again shortly after. If I'm reading them right, my total production went *down* from 751 to 748? (The update shows me at 746, and the current textstats at 749.)

12/16/02 23:00
User 728 dfamily 1038519135 98 391 26 9 751715816376 52813502320.0222 1038519162 1040101151

12/17/02 00:28
User 728 dfamily 1038519135 98 391 25 9 748909777727.25 46212933652.9868 1038519162 1040106116

Yes, I'm 100% positive they are in the correct order! :)
and actually now that I glance at a few others, they seem to be in the same boat...am I crazy?

df.

ps: I assume the hour shift noted in the earlier message would have something to do with it and am trying hard to mentally apply it to the numbers but still can't seem to make it work...:confused:

Angus
12-17-2002, 07:25 PM
I'll add my own situation to the mix...

My User stats Lifetime graph shows a consistent trend easily averaging at or above 250K cEM/s for the last 5 or 6 days, and the 24 hr. Rate graph close to 250K , but the Neighbors:Last Day Ranking shows the rate at a very low 211.344K.

Let's see what the next update brings tonight.....

Angus
12-18-2002, 03:58 AM
I took two screen shots of my 24-hour history graph. One, at 1:15 EST, shows a plateau at 230K until 20:00, then a drop to 200K until 00:00, then a drop to 150K until the snapshot was taken.

At 3:34 EST, the same graph shows the 230K plateau ending at 14:00, going to 275K until 23:00, dropping to 260K until 02:00, then down to about 175K.

HOW IS THIS POSSIBLE?

2 hours after the first graph, my production mysteriously increased almost 50K per hour over the 14:00 to 23:00 time frame - long after that time has passed.

I am using the same 5 CPUs all day, every day, always connected, so how can the production stats jump around so wildly, and change half a day in the past?

I can't see that those stats are anything close to reality. Maybe someone can explain this, or fix the danged graphs and charts.

smh
12-18-2002, 08:35 AM
I am using the same 5 CPUs all day, every day, always connected, so how can the production stats jump around so wildly, and change half a day in the past?

There were some lower N's recycled the other day. They give you a lower cEMs rate. When the tests were finished you probably got much higher N's so the cEMs rate went up.

Angus
12-18-2002, 10:40 AM
SMH - you missed the point - the graphs are changing retroactively! On the 1 AM graph, the rates were lower for the previous day than on the 3 AM graph.

This seems to be consistent, day by day, and it drives the rate used in the rankings artificially low. There is no way my rate for yesterday was 222K avg - it never dropped that low at any one point in the graph, so it can't average that low.

MDFaunce
12-18-2002, 12:44 PM
Here's the way I understand it:

Let's say you have a computer that takes 4 hours to do a block.

At noon it completes a block and updates the stats. Over the next 3 hours and 59 minutes, your stats from noon to 3:59 show 0.

At 4pm, it completes a block and updates the stats. Now the stats server updates your progress and your stats from noon to 4 show your work.

The stats are updated after the fact.

Obviously, with multiple computers on the same account, varying block sizes, and varying performance (your computer may actually do something other than SoB sometimes), this is a best guess; it just levels the performance of the block over the time it took to return the block (i.e., if you ran for 3:50, then stopped the client for 4 days, then ran the last 10 minutes, it would show the block as being computed continuously, but slowly, over the entire period).

I've got a Pentium II/233 that takes a while to do a block, so my last 24 hours graph is always wrong. Personally I'd rather see a last week or last 5 days or last 3 days graph where the stats had a little time to "settle in." For slower computers or intermittently running or very busy computers, 24 hours isn't that long.
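The levelling Mike describes can be sketched as a toy model (names are invented for illustration; this is not the project's code):

```python
def spread_credit(report_times, work_per_block):
    """Spread each returned block's credit evenly over the interval
    since the previous report, as described above.

    report_times: timestamps in seconds, starting with the assignment
    time, then one entry per returned block.
    Returns (start, end, rate) segments in cEMs/sec.
    """
    return [
        (prev, curr, work_per_block / (curr - prev))
        for prev, curr in zip(report_times, report_times[1:])
    ]

# A 250M-cEM block returned after one hour credits ~69.4K cEMs/sec for
# that hour; the same block returned after a 4-day pause is credited as
# a slow trickle spread over the whole 4 days.
segments = spread_credit([0, 3600], 250e6)
paused = spread_credit([0, 4 * 86400], 250e6)
```

Note that the model has no way to tell a slow machine from a fast machine that was switched off for most of the interval, which is exactly the ambiguity discussed in this thread.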

Mike

smh
12-18-2002, 02:22 PM
It might also be that a client wasn't able to connect to the server and only updated after the last block. That makes it even worse.

The right part of the graph is always too low if the client crunches 24/7, because the last blocks are not in there yet. This is very obvious in the general project stats.

It can also happen that the graph is too high. This happens when a (fast) computer is not on 24/7.

When my home P4 starts a new test, the first few blocks will show much higher than the rest of the graph, because when I switch it off or let it do something else, the time for which the block is reserved keeps on going.

Hmm, I don't think anybody understands what I'm saying, so let's say at 0:00 I start a new test. A block takes 1 hour, so my rate after submitting the first block is (250M/3600) 69.5K cEMs. After the second block at 2:00 it's still the same, 69.5K. Now I switch off my computer until 8:00. If I look at the graph at 8:30, it will show a spike between 0:00 and 2:00 and a flat line between 2:00 and 8:30. Then at 9:00, when I submit the next block, the spike gets averaged out over the complete period, because all the server knows is that it took me 9 hours to do 3 blocks (750M/32,400 ≈ 23K cEMs).
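To make the averaging concrete, here is that example recomputed (nine hours is 32,400 seconds; an illustrative sketch, not project code):

```python
def average_rate(total_work, elapsed_seconds):
    """The rate the server infers when all it knows is the total work
    returned and the total time elapsed since the test was assigned."""
    return total_work / elapsed_seconds

BLOCK = 250e6  # 250M cEMs per block, as in the example above

# Two blocks in the first two hours: ~69.5K cEMs/sec.
early = average_rate(2 * BLOCK, 2 * 3600)

# Three blocks over nine wall-clock hours (machine off from 2:00 to
# 8:00): the early spike averages down to ~23K cEMs/sec.
overall = average_rate(3 * BLOCK, 9 * 3600)
```

The server never sees the six-hour gap, so the nine-hour average is all it can report.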

Angus
12-18-2002, 05:22 PM
Well, if that's what it's doing, that's a pretty funky way of recording work.

It should be recording each hour what is returned that hour.


I think the project folks should explain a whole lot more about how these stats work/don't work.

kugano
12-18-2002, 07:18 PM
Angus:

I have explained the inner workings of the stats engines thoroughly in several of my previous posts. Use the forum's search feature to look them up and read them.

Angus
12-18-2002, 07:52 PM
Can you find such a thread? I can't. I found 10 threads for 'kugano', any date, in the SOB forum - all recent.

I'm not a DC newbie, or some inexperienced kid. I'm a 50+ year old computer professional with 30 years in the business, and those stats are the most counter-intuitive, confusing mess I have seen.

Why does past history change? It's a simple question......

I either returned x amount of work in that time frame or I didn't. Once it's past, it's done. I can't return work in the past.

kugano
12-18-2002, 11:02 PM
Angus,

Please read the following threads:

http://www.free-dc.org/forum/showthread.php?s=&threadid=2163
(in which my 2nd post is the most relevant)

http://www.free-dc.org/forum/showthread.php?s=&threadid=2144
(in which my 4th post is the most relevant)

I will be happy to try to answer any questions not covered in the aforementioned posts.

I also must ask you to restrict your comments on this forum to constructive criticism and open discussion only. I am open to suggestions and ideas, and to the very real possibility that our system and our statistics could be better. My personal goal, as I'm sure is Louie's, is to make this project as good as possible. That is, in fact, the primary reason this forum exists. However, please make an effort to present your ideas in a more positive, level-headed fashion or I will ignore them. Rudeness and snide comments serve only to damage your credibility and to pollute this forum.

MathGuy
12-18-2002, 11:59 PM
At the risk of getting everyone involved hacked off at me instead of each other, let me put in my $.02 here... there are fundamentally two ways of reporting progress: time-driven (at regular time increments, the client reports to the server how much work was done in that increment) and work-driven (when a work unit is done, the client reports completion to the server).

When we see a lovely graph of time (x-axis) vs. work (y-axis), we tend to think of the first model, since we are all taught to read the x-axis as the independent variable.

However, this project uses the second model...a more accurate (but seriously goofy looking) graph would be one in which work unit rectangles of equal width and height proportional to the speed of that work unit are stacked next to one another, with some indication given of the number of work units still pending.

It seems to me that the fundamental confusion lies in the fact that work-driven stats are being reported in a time-driven graph...both of these decisions make sense, but they do lead to the possibility of confusion - it IS possible for these stats to change retroactively since a completed work unit affects the speed all the way back to the point at which that work unit was assigned.

As to how to "fix" this, I see three possibilities:

1) ignore it...it's just an artifact of the way things are
2) use smaller "logical blocks" for statistical purposes and only average over them (not a whole Proth unit)
3) in conjunction with 2) have the client report its progress on shutdown and don't average beyond a shutdown event, either

Well, didn't mean to write a novel...hope this helps somebody...

kugano
12-19-2002, 12:41 AM
You're completely right, and I like the alternate perspective. I'll try to present, once and for all, precisely how our rate statistics work, from top to bottom. I sincerely hope this will be sufficient to clear up any confusion.

Our database contains a table called "proth_tests." This table stores the user to whom the test was assigned, the time it was assigned, the state of the test (e.g. pending, complete, dropped, etc.), and, if pending, the last-known progress from the client (e.g. 47% complete) and the time of that report. (Actually, a whole lot more is stored, but for the purposes of this discussion the other data doesn't matter.)

For tests that have been completed, therefore, the rate statistic is based on two things: the amount of work involved in the test (a formula based mainly on n size), and the amount of time it took the client to do it (time of the final progress report minus time of the initial assignment).

For tests that are still pending, the rate is based on two similar things: the amount of work involved in the test multiplied by the percentage complete (e.g., a 50%-complete test would be credited as half the work of the full test), and the amount of time the client has been crunching (time of the last progress report minus time of initial assignment).

The last situation is where the confusion lies. When a client makes a progress report, like "I'm 62% done with this test", the block server makes the following changes to the database:

1. The "last report time" field is updated to the current time.
2. The "amount complete" field is set to 62%.

This, of course, destroys any information about any previous progress reports. So if your client gets to 10% after a few minutes of crunching, but then you turn it off for the night and start it back up again the next day, a few minutes later when it makes its second progress report at 20%, the "proth test" entry gets overwritten with the new report. The upshot of this is that the database has no record of the fact that your client did less work during the night it was turned off; it only knows that your client did 20% of the test between the assignment time and the time of the last report (in this case, morning the next day).
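The pending-test rule described here amounts to the following sketch (field names are illustrative, not the actual proth_tests columns):

```python
def pending_rate(test_work, pct_complete, assigned_at, last_report_at):
    """Credited rate for a still-pending test: work done so far divided
    by total wall-clock time since assignment. Idle periods are
    invisible, because only the latest progress report is kept.
    """
    return test_work * pct_complete / (last_report_at - assigned_at)

# Say a client reaches 20% of a 2.5G-cEM test in a couple of hours of
# actual crunching, but the machine was off overnight, so 16 wall-clock
# hours have passed since assignment. The credited rate is far below
# the machine's true speed.
rate = pending_rate(test_work=2.5e9, pct_complete=0.20,
                    assigned_at=0, last_report_at=16 * 3600)
```

Each new progress report overwrites the previous one, so the denominator always stretches back to the assignment time; this is why a night with the machine off drags down the whole test's apparent rate.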

In anticipation of the obvious question "so why not store each progress report as a separate entry, so that no data is lost and the rate is more accurate?" I give the following answers:

1. In fact, we used to do it this way. But in order to have any sort of accuracy, the client would have to make, say, 10 progress reports for every test. With more than 50,000 Proth test entries (some of which, of course, were dropped and aren't included in the "Proth tests done" statistic on the website), that amounts to over half a million rows in the database! Performing complicated queries on a table approaching a million rows requires computing power that, quite simply, our database server does not have. And it would only get worse with time.

2. There would still be some inaccuracies. What if you turn your machine off between two progress reports? All the server will know is that the segment of work between those two reports was much slower than the rest, but it will still be averaged out!

In fact, you can generalize the entire situation and look at it from a signal processing standpoint. The graph generators can be seen as DACs, converting a digital signal (histories of progress reports, or discrete samples of the client's rate at fixed time intervals) into an analog signal (a pretty graph). In order to get the graph to show the client's work rate with 100% accuracy, you simply need lots and lots and lots of digital samples (progress reports). But as has already been explained, this is completely infeasible. There are, of course, algorithms to encode such data more efficiently (e.g. adaptive sampling), but this is WAY beyond the scope of a Perl script to generate graphs, especially since the graphs are not intended to represent scientific data, but instead general visualizations of a user's rate over time.

I hope this puts to rest any confusion that may still exist. Thanks to MathGuy for his less rigorous but informative and concise explanation.

Angus
12-19-2002, 02:30 AM
If a CPU is calculating constantly at approximately the same rate, 24x7, and always connected to the net, the 'progress report' rate that is calculated should be the same rate all the way through the work unit. If 5 CPUs are doing this, the cumulative rate still should be constant, but higher. The CPUs being used are not changing their speed, nor being at all affected by other work.

Under those conditions (constant crunching, no shut-offs or lost connections) I still fail to see how a graph can show a solid 300K rate for 24 hours up to 11PM, then suddenly at stats ranking time (1 AM), the same graph shows 275K for 14 hours, dropping to 230K for the last 10 hours of the 24 hour period. Where did all the production go? The clients still reported progress, are still crunching at their same speed, and are still reporting. Do the stats NOT include all the checkpointed progress reports for work units not completed when you do the 'daily rate' snapshot calculation?

shifted
12-19-2002, 02:42 AM
Originally posted by Angus
If a CPU is calculating constantly at approximately the same rate, 24x7, and always connected to the net, the 'progress report' rate that is calculated should be the same rate all the way through the work unit. If 5 CPUs are doing this, the cumulative rate still should be constant, but higher. The CPUs being used are not changing their speed, nor being at all affected by other work.

Under those conditions (constant crunching, no shut-offs or lost connections) I still fail to see how a graph can show a solid 300K rate for 24 hours up to 11PM, then suddenly at stats ranking time (1 AM), the same graph shows 275K for 14 hours, dropping to 230K for the last 10 hours of the 24 hour period. Where did all the production go? The clients still reported progress, are still crunching at their same speed, and are still reporting. Do the stats NOT include all the checkpointed progress reports for work units not completed when you do the 'daily rate' snapshot calculation?

Recently, the expiry time was reduced to five days, thus flooding the work queue with old work units. The older work units, having lower n values, produce fewer cEM/s than higher n work units, and so you saw a decrease in production every time one of your machines got one of the lower n values.

Angus
12-19-2002, 02:42 AM
I like MathGuy's solutions 2 and 3:



As to how to "fix" this, I see three possibilities:
1) ignore it...it's just an artifact of the way things are
2) use smaller "logical blocks" for statistical purposes and only average over them (not a whole Proth unit)
3) in conjunction with 2) have the client report its progress on shutdown and don't average beyond a shutdown event, either


Why couldn't 2 and 3 be implemented on the most recent 24 hour basis? Then when the stats have been calculated, store the results in summary, clean out the detail statistics, then start accumulating the next day's details. That should keep the database rows to a manageable level.

Option 1 is not good.

Angus
12-19-2002, 02:46 AM
Shifted:

Are you saying that different work units will cause the CPU to speed up or slow down???

The RATE that the CPU calculates should not change - the total amount of time to complete a work unit may change....


I can't say that I've ever seen the cEM/s rate vary significantly on a given CPU.

I was pointing out that for the period of 3 PM to midnight on 12/18, while looking at the graphs periodically throughout that time period, the rate showed as 300K. Then on the 1 AM graph, the same time frame (3 PM to midnight 12/18) showed a rate of 230K.

shifted
12-19-2002, 02:52 AM
Originally posted by Angus
Are you saying that different work units will cause the CPU to speed up or slow down???

No, that's ludicrous. I'm saying different n values result in a different cEM/s rate. It has to do with the way the algorithm works.


The RATE that the CPU calculates should not change - the total amount of time to complete a work unit may change....

Obviously.


I can't say that I've ever seen the cEM/s rate vary significantly on a given CPU.

Then what is this drop in the cEM/s graphs? It's a significant change in the cEM/s rate.

Angus
12-19-2002, 03:02 AM
Then what is this drop in the cEM/s graphs? It's a significant change in the cEM/s rate.


THAT is what I'm trying to get them to explain.

None of the explanations so far fit my situation - no stopping, no gaps in reporting, nothing that would lead the stats engine to decide that my rate has suddenly slowed down.

And now, at 2AM, the whole rate chart picture for my last 24 hours has completely changed again.

Every time you check the user stats, under the 'Numeric Statistics' section, in the 'Last Day' column, there is a 'Work Rate' number. That number is ALWAYS higher, at any time during the day, than the resulting 'Rankings' Last Day's Rate number. I watched it hover at 300K all day today and tonight, then at 1 AM my calculated 'Last Day's Rate' for the rankings was 239K. That just doesn't add up.

smh
12-19-2002, 04:07 AM
I can't say that I've ever seen the cEM/s rate vary significantly on a given CPU.

There really is; of course, it's more obvious when the difference in n's gets larger.

Also, the rate the client reports is the average rate since you started the client. So when you get a larger n to test, the rate will slowly go up. But if the client has already been running for a couple of days, it adjusts very slowly.

Just exit the client and restart it. Let it run a couple of minutes on an otherwise idle system to get an accurate rate. Do the same with a new test and you'll see the difference.

The big dip in the main graph between 20 and 27 November is mainly the result of switching to the new k's, which had not been tested as far as the n's well above 2M before.

The rise after 27-11 was a result of the newly found prime.

dudlio
12-19-2002, 06:58 AM
>It seems to me that the fundamental confusion lies in the fact
>that work-driven stats are being reported in a time-driven
>graph...

omg thank you for crystallizing what it is that's been throwing me off.

>But as has already been explained, [100% accuracy] is
>completely infeasible.

Not at all. I figure 24-100 samples would make a nice daily graph. For 1500 users, that's a maximum of 150,000 rewritable rows. For a lifetime graph, you only need the daily average. Assuming 1500 users run the project for a year, that's half a million permanent rows.

The difference seems to be whether you reassemble rate data from work entries (proth_tests table) or store rate data uncompressed. I think you should generally use a db to store data uncompressed, since it's the manipulations that are expensive.

In this case, the queries would be very simple:

User: select sum(rate) from sb_hourly_rates where user_id=1;
System: select hour, sum(rate) from sb_hourly_rates group by hour;
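dudlio's scheme is easy to mock up with an in-memory database. The table and column names here are his hypothetical ones, not the project's actual schema, and note that the per-hour query also needs `hour` in its select list:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sb_hourly_rates (user_id INTEGER, hour INTEGER, rate REAL)"
)
# One row per user per hour, stored "uncompressed" as proposed.
conn.executemany(
    "INSERT INTO sb_hourly_rates VALUES (?, ?, ?)",
    [(1, 0, 250.0), (1, 1, 260.0), (2, 0, 100.0), (2, 1, 110.0)],
)

# Per-user lifetime total:
user_total = conn.execute(
    "SELECT SUM(rate) FROM sb_hourly_rates WHERE user_id = 1"
).fetchone()[0]

# Project-wide rate for each hour:
by_hour = conn.execute(
    "SELECT hour, SUM(rate) FROM sb_hourly_rates GROUP BY hour"
).fetchall()
```

The trade-off, as the thread notes, is row count versus query complexity: storing samples directly makes the queries trivial at the cost of one row per user per hour.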

nuutti
12-19-2002, 07:55 AM
To Angus:

I don't know if you know this, but the cEM formula the project is using does not work very well (the formula that calculates how many cEMs are in one full test). I mean that if you are using the same PC, and you are crunching units with n around 500,000, you will get a much lower cEM/s number than when you are crunching units with n around 1,500,000. This is a known bug, and it affects everyone. Because we are getting bigger n values every day, our cEM/s rate will increase all the time. When you get a smaller one (recycled), your rate will drop. I think this bug is on the to-do list, but Louie is quite busy with other features, so it might take some time before it is fixed. I think we should use a modified version of the GIMPS formula for corrected work amounts.

Yours,

Nuutti

dfamily
12-19-2002, 10:44 AM
I have read all the posts about the calculations of the RATES and GRAPHS and forgive me but am still confused about one little thing...

Total Production and work done... NOT daily rates and graphs and all that - my total score. Is this number going to fluctuate downward at the 1 a.m. update? I don't need to know why it does, just whether it does, so I can quit worrying about it... thanks...

Angus
12-19-2002, 11:10 AM
This is the graph I'm talking about. How can it be like this for 24 hours, and result in a 239K daily rate?


And, as I've said before, all CPUs are crunching all the time, with virtually no other load.

And Nuutti, I am NOT going to run around starting and stopping the client on all of them every time a work unit changes!! That is sheer foolishness, and if the client requires it, it is in serious need of help.

Does the displayed cEM/s rate in the client window have any basis in reality? I'm not sure if that is the number that Nuutti implied is not calculated correctly.

dfamily
12-19-2002, 12:24 PM
Is this number going to fluctuate downward at the 1 a.m. update?

Why, yes it can... think of it this way... during the day, the Numeric Statistics:Work Done:Overall reports the total INCLUDING work units in progress. At update time, the work units in progress are ignored and the Neighbors:Total Production reports the total EXCLUDING work units in progress. It only includes work units that have been completed...

*----------------------------------
Magpie, this is for you...
Assuming stats file updated every 15 min...

Interim report 12:45 reports x complete units + score for partial units to 12:30 to 12:45

Daily Update 01:00 x complete units only does NOT report for partial between 12:45 and 01:00

Interim report 01:15 x complete units + score for partial units from 01:00 to 01:15

So you won't see the quarter before the update, and if the first quarter after doesn't report as many as you had in the quarter before, it looks like less... bingo?
*-----------------------------------

Sorry all, I'm cute but slow but I think I *finally* understand.

patrick.

SOB Projection List (http://www.adeatherage.com/sob/sobproj.htm)

Angus
12-19-2002, 12:57 PM
during the day, the Numeric Statistics:Work Done:Overall reports the total INCLUDING work units in progress. At update time, the work units in progress are ignored and the Neighbors:Total Production reports the total EXCLUDING work units in progress. It only includes work units that have been completed...


Well, that explains a LOT. Why couldn't someone have said that clearly days ago? It might have saved us all a lot of bandwidth. :bang:


I certainly don't agree with that method, however. If you're going to show incomplete work in the ongoing 24-hour graph and charts, then use it for the daily ranking update, or else leave it out completely.

No wonder none of the numbers tie together!!!

smh
12-19-2002, 02:48 PM
Hmm, that might explain why my daily rate at the moment is only 11K cEMs while my 24/7 PC crunches at a constant 25K cEMs.

OTOH, I didn't complete a test today, so if I understand this correctly, tomorrow my daily rate should be 0 cEMs??

Really confused now.

But I'm not really in here for the stats (although I don't want to miss them), but more for the project itself. :crazy:

Mystwalker
12-19-2002, 05:37 PM
Uhm, the stats are not the project?!? :confused: ;)

Ok, so let me get serious again:
As far as I understand it, your score will drop somewhat, but it's still > 0 - as you almost certainly did at least 1 block of the current test. So the server has a log that you already finished x blocks.
Let's say:
You completed 10 blocks in the first day (and this day is yesterday). So your cEMs/sec for the last 24 hours would be 2,500 McEMs (10 blocks * 250 McEMs/block) / 86,400 seconds (one day) = roundabout 29 KcEMs/sec.

When the PC does 10 blocks again the next day (assuming the test consists of more than 20 blocks), it will be 5,000 McEMs / 172,800 sec = 29 KcEMs/sec again.

But if you shut down the PC the whole day, it would be 2,500 McEMs / 172,800 sec = 14.5 KcEMs/sec - for both days, but as only the last 24 hours count in this statistic...
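Mystwalker's arithmetic above can be reproduced in a few lines. This is a minimal Python sketch; the 250 McEMs/block figure is taken from the post, and the function name is invented:

```python
# rate = work done / elapsed wall-clock time, as in the worked example above.
MCEMS_PER_BLOCK = 250      # McEMs credited per block (figure from the post)
SECONDS_PER_DAY = 86_400

def rate_kcems(blocks_done, days_elapsed):
    """Average rate in KcEMs/sec over the elapsed period."""
    mcems = blocks_done * MCEMS_PER_BLOCK
    return mcems * 1000 / (days_elapsed * SECONDS_PER_DAY)

print(round(rate_kcems(10, 1)))      # ~29 KcEMs/sec after day one
print(round(rate_kcems(20, 2)))      # ~29 KcEMs/sec if the pace holds
print(round(rate_kcems(10, 2), 1))   # ~14.5 KcEMs/sec if the PC idles a day
```

The key point is that idle time stays in the denominator, so a day of downtime halves the displayed rate even though the completed work is unchanged.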


Did I get it right now?!? :confused:

kugano:
Your explanation improved my understanding somewhat. Thanks a lot! :cheers:

MDFaunce
12-19-2002, 05:51 PM
Work done is kind of like clocking out at the end of the day.

You don't get paid based on how much work you say you are going to do. You get paid based on how much work you do. They record this by spreading the amount of work you do over the time it took you to do the work. They don't know how long it's going to take you to do a block -- the amount of time it takes to do a block varies widely based on many, many variables -- so they can't credit it until you turn it in. So, you get credit for doing a block when you turn your work in. And your credit is spread evenly over the time since the last block was turned in.

Every time you turn in a block, the stats are updated. This update says "I did this block and I have been working on it (not necessarily continuously mind you) since I turned in the last block."
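The "credit spread since the last block" rule can be sketched like this. A hypothetical Python sketch; the timestamps and cEM value are invented for illustration:

```python
# Sketch of the crediting rule described above: each completed block's
# credit is spread evenly over the time since the previous block was
# turned in, including any idle time in between.

def rate_since_last(block_cems, turned_in, previous_turn_in):
    """cEMs/sec credited for one block: work / elapsed wall time (seconds)."""
    elapsed = turned_in - previous_turn_in   # includes time the PC sat idle
    return block_cems / elapsed

# A block worth 250,000,000 cEMs turned in 2 hours (7,200 s) after the last:
print(rate_since_last(250_000_000, 7_200, 0))        # ~34,722 cEMs/sec

# The same block turned in a full day later looks much slower:
print(rate_since_last(250_000_000, 93_600, 7_200))   # ~2,894 cEMs/sec
```

This matches the analogy: you are paid only when you clock out, and the pay rate is your work divided by everything on the clock since the last time you did.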

smh
12-19-2002, 06:12 PM
But my rate for the last day was 11.494K, while my PC which is on 24/7 does at least 25K. I also have a laptop and my P4 which report a couple of blocks a day.

The daily graph always showed a rate of at least 30K.
Even my lifetime average is 17.297K, and that includes the 4 months i wasn't running the client at all.

Angus
12-19-2002, 06:15 PM
The way the 'Last Day Rate' stat looks, it appears that dfamily's theory is closest. You don't get ANY credit for completed blocks if the entire work unit is not completed at the 1 AM stats run. That would explain why the daily rate in that stat is so drastically lower.

Now, if kugano could confirm that....

I wonder if it's an oversight, or intentional?

Mystwalker
12-19-2002, 06:37 PM
One other question to throw in:

Since the introduction of the "Production By Country" table (thx again for that little feature), the best countries (except the US) had ~ 1 McEMs/sec. But now, Germany weighs in at 5.61 McEMs/sec and the UK almost achieved 3! Did you change the distribution method?

Just pure interest...

nuutti
12-19-2002, 07:01 PM
The real work unit is one Proth test = a full test. A block is just an artificial subdivision. Starting and stopping the client does not matter. The project uses a formula for how many cEMs are in one Proth test and divides this amount by the time used. Because the formula estimates incorrectly how many cEMs are in one full test, your rate will increase all the time (the bug makes the cEMs per test increase faster than they really do). I guess the same formula is used in the server and the client.

Yours,

Nuutti

Angus
12-19-2002, 07:49 PM
Originally posted by Mystwalker
Uhm, the stats are not the project?!?


Lest anyone miss this point, stats ARE the project.

Without stats, you won't get the big teams' participation. If the stats are borked, or skimpy, or non-existent, or too hard to figure out, or not parsable for external stats engines, the project will most likely wither. (not suggesting that SoB currently falls into any of these categories)

Just my opinion, from years of participating in DC projects.

Want to start one of the 'reasons I do this project' polls?

:crazy:

Chinasaur
12-19-2002, 07:54 PM
/me wonders...hmmmm.

/me thinks "if stats are borked and Ars is here for stats....would Ars go away if stats are borked? and TeamBeOS could take #1?"

/me votes for borked stats all the time!!!!!!!!!!!!

/me doesn't think stats are borked.

/me wishes for BeOS GUI client :)

:cheers:


===========
Posted with NetPositive under BeOS.
"Yeah. THAT dead OS ;)"

shifted
12-19-2002, 08:07 PM
Originally posted by Chinasaur
/me wonders...hmmmm.

/me thinks "if stats are borked and Ars is here for stats....would Ars go away if stats are borked? and TeamBeOS could take #1?"

/me votes for borked stats all the time!!!!!!!!!!!!

/me doesn't think stats are borked.

/me wishes for BeOS GUI client :)

:cheers:

LOL...

/me was thinking the same thing

/me was hoping they'd all go away so me would be back in the top ten

kugano
12-19-2002, 11:03 PM
Angus: No, the 1 AM ranking update does give credit for "blocks" (unfinished tests).

Mystwalker: Actually, yes, the method of computing countries was completely changed this morning. Instead of using unreliable reverse DNS lookups, the script now queries the IANA databases (many thanks to Vato for pointing out that I can actually do this :cheers: ). The new numbers are now far, far more accurate (although not 100% perfect). Sorry I didn't post about this earlier.

Angus, smh, et al.: You're right, I'm now myself convinced something is wrong with the ranking updates. I have no idea what; I just ran it manually before making this post and it came up with correct results for every user that I checked. But before I ran the manual update it most certainly showed incorrect numbers. I'll be around for the update tonight to see if I can put a finger on exactly what's going on.

Everyone else: Go see "The Two Towers." It's worth it.

shifted
12-19-2002, 11:26 PM
Originally posted by kugano
Angus, smh, et al.: You're right, I'm now myself convinced something is wrong with the ranking updates. I have no idea what; I just ran it manually before making this post and it came up with correct results for every user that I checked. But before I ran the manual update it most certainly showed incorrect numbers. I'll be around for the update tonight to see if I can put a finger on exactly what's going on.

Yep, that manual update fixed my stats also. Hmm... Good luck on debugging it.

Also, check out the stats for team shifted (http://www.seventeenorbust.com/stats/teams/team.mhtml?teamID=101). It consists entirely of me, and it's exhibiting the exact same anomalies as the user stats were. I suspect it's a common bug.

kugano
12-20-2002, 01:05 AM
Having just watched tonight's update, I think I know what was happening (but not why):

On a whim, a few minutes before the update I removed a script from the scheduled maintenance list. It was a simple script designed to go and recompute the cEM amounts for each test (just to be sure the server didn't 'forget' to calculate cEMs or calculate them wrong or something). Really that script shouldn't be necessary, but it's been in the nightly maintenance since the new system went online just to guard against any programming errors in the new server code.

Anyway, although I can't prove it (and a quick glance at the script in question didn't reveal any glaring errors), I'm betting this script was to blame. It runs 2 minutes before the ranking updates, and is the *only* difference I could find between the automated ranking updates and my manual ones that would explain why my manual ones worked fine and the automatic ones generated incorrect results.

So, check your rankings now. Let me know if they're more correct.

Angus
12-20-2002, 01:08 AM
LOOKS GOOD !!! :thumbs:

:notworthy


Thanks for taking the time to get into it for us.

kugano
12-20-2002, 01:13 AM
Closer examination of the script reveals that I was right. There was a (very sneaky) math error in the way it calculated cEMs. (Not to worry; the error only affected unfinished tests, and thus the server [correctly] recalculated all the bad cEM values.)

How ironic that a script designed to guard against server math errors instead has math errors itself, while the server is fine. *Sigh.*

shifted
12-20-2002, 03:02 AM
Originally posted by kugano
How ironic, that a script designed to guard against server math errors instead has math errors itself while the server is fine. *Sigh.*

That's the same reason why we have multiple levels of government: in case one level fails to screw up, the next will! :D

Halon50
12-20-2002, 05:37 PM
Very cool catch. Thanks for the fix! :cheers: