
Thread: Something wrong ???

  1. #1

    Something wrong ???

    http://stats.zerothelement.com/cgi-b...ed.pl?Id=24782

    It seems like a huge drop in production!!! Even for the biggest folders.

  2. #2
    Senior Member | Join Date: Oct 2003 | Location: an Island off the coast of somewhere | Posts: 540
    Yup.

    Looks like something broke - just in time for the weekend.

  3. #3
    Senior Member | Join Date: Jul 2002 | Location: Kodiak, Alaska | Posts: 432
    Logging on at www.distributedfolding.org shows that the scores match there, so the problem lies with the DF site. A wonderful new addition of divide-by-16 on our upload scores?
    www.thegenomecollective.com
    Borging.. it's not just an addiction. It's...

  4. #4
    Apparently this protein was too darn fast.

  5. #5
    Member | Join Date: Jul 2003 | Location: Home of the 2010 Olympics | Posts: 92
    Snail's-pace crunching, points not working, memory leaking..... Christ! Did I mention uploads going missing all week!!!!!

  6. #6
    Look on the bright side people, at least it never gets dull around here.

  7. #7
    OCworkbench Stats Ho | Join Date: Jan 2003 | Posts: 519

    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  8. #8
    Perhaps it's only counting structures instead of the points allocated to each generation.

    It always seems to happen on a weekend.
    Crunching for OCAU

  9. #9
    All our guys are having the same problem........
    Too many computers, too little time......

  10. #10
    OCworkbench Stats Ho | Join Date: Jan 2003 | Posts: 519
    OMG, I recommend going OFFLINE NOW! If the work is lost it will really cheese us off; better to be safe and buffering than sorry.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  11. #11
    Senior Member | Join Date: Mar 2002 | Location: MI, U.S. | Posts: 697
    Originally posted by N.V.M.
    snail's pace crunching, points not working, memory leaking
    Umm... new executables were posted on the 20th that were supposed to have fixed the memory leak.

    I don't know if they actually do that or not, though -- it hasn't run for long enough yet.
    "If you fail to adjust your notion of fairness to the reality of the Universe, you will probably not be happy."

    -- Originally posted by Paratima

  12. #12
    Administrator | Dyyryath | Join Date: Dec 2001 | Location: North Carolina | Posts: 1,850
    The Linux version of the new executable appears to have fixed my memory problems. I've been running it for two days and it hasn't climbed above about 92 MB. Before, it would have been up around 150 or 160 MB by now.

    As for the stats, I'm not sure what's happening. My system is accurately reflecting what it gets from the official site, so I guess we'll have to wait for Howard's input on Monday...
    "So utterly at variance is destiny with all the little plans of men." - H.G. Wells

  13. #13
    Fixer of Broken Things | FoBoT | Join Date: Dec 2001 | Location: Holden MO | Posts: 2,137
    My gut feeling is that it's just the stats server again, similar to when it broke 3-4 weeks ago, whenever that was.

    If the structs aren't piling up on the client side, then it's most likely just the stats server that's borked.

    Paging Dr. Howard, paging Dr. Howard, please report to the stats server.
    Use the right tool for the right job!

  14. #14
    Boy-o-boy, peeps were complaining about low points from the new client; wait till they see this.....
    Too many computers, too little time......

  15. #15
    Member | Join Date: Jul 2003 | Location: Home of the 2010 Olympics | Posts: 92
    fo shizzle, suxors!

  16. #16
    OCworkbench Stats Ho | Join Date: Jan 2003 | Posts: 519
    Wise man say that when two buckets fall from window, one with water the other camel waste, you know which one will always land on your head.

    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  17. #17
    Junior Member | Nanobot | Join Date: Mar 2002 | Location: Nottingham, UK | Posts: 8
    So we have a stats problem. Can anyone name a DC project that has never had a stats problem?

  18. #18
    Originally posted by FoBoT
    my gut feel is that its just the stats server again, similar to when it broke 3-4 weeks ago, whenever it was
    But it couldn't possibly be the stats server. We were told no more than 3 weeks ago that it could be fixed and DID NOT need to be replaced with donations from the masses...

  19. #19
    MgKnight, not exactly.

    Originally posted by Stardragon
    Yes, we do talk. When I said we are ok for hardware, I meant to convey the fact that we have machines with which to replace the malfunctioning node. Despite that, the machines that we have will fail eventually, as is the case with the stats machine. Sorry for the confusion.
    Maybe that time has come. Every DC project I know has stats problems from time to time, so it ain't that strange or that big a deal.

    Stats aren't everything man

    Member of the Los Alcoholicos

  20. #20
    Free-DC Semi-retire | gopher_yarrowzoo | Join Date: Mar 2002 | Location: Knoxville, TN | Posts: 3,985
    Hmm, good job I haven't done a dump in a while; I'm saving up about 200 structs just now :P
    Semi-retired from Free-DC...
    I have some time to help.....
    I need a new laptop,but who needs a laptop when you have a phone...
    Now to remember my old computer specs..


  21. #21
    The stats server appears to be working fine. The date on the stats is from 1 hr ago. I snooped around and don't see anything amiss. Can anyone verify that they are not getting credit for uploaded work right now (and check your own error.log for problems first)? Or maybe people just slowed down uploading??
    Last edited by Brian the Fist; 11-23-2003 at 11:35 AM.
    Howard Feldman

  22. #22
    Ok, now I see what you mean, everyone is being credited approx 1/20th of what they should be. If somebody would send me an e-mail saying this, I could solve the problem a lot faster than the vague indications that 'something' is wrong here. Anyhow, I have corrected it (problem on our side, not yours). And before you ask, no, there is no way to easily give you back the 'extra' points that were not credited since we don't track production over time, however, again, keep in mind the problem affected everyone, and was only for a little over 24 hrs so only minimal harm was done stats-wise. In terms of the actual data (for those who care about the science), everything is fine, this was solely a stats issue.
    Howard Feldman

  23. #23
    Ancient Haggis Hound | Angus | Join Date: Jan 2002 | Location: Seattle/Norfolk Island | Posts: 828
    Pretty obvious, I'd say...

    Are you saying that this forum is NOT the place to report problems?

  24. #24
    What about simply multiplying the posted stats by 20? Those who went nonet as soon as this was noticed can now dump and gain full credit. The rest of us have lost major numbers.

    I care about the science or I would do a different project, but the stats are important too. This was 36 hours, more or less, not 24. On a weekend I usually produce in excess of 100K per update, and I count 19 updates where I got around 4-5K. So what should have been around 1.9M has come to about 100K. I think that's a problem. With a slow protein, you will get massive fallout if you don't do something about this.
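    To make the arithmetic above concrete, here is a small Python sketch using the poster's own rough figures: 19 stats updates, about 100K expected and 4-5K actually credited per update. These are one folder's estimates, not official numbers, and the `shortfall` helper is a made-up name for illustration.

```python
# Rough figures from the post above (one folder's estimates, not official data).
UPDATES = 19
EXPECTED_PER_UPDATE = 100_000
CREDITED_PER_UPDATE_RANGE = (4_000, 5_000)


def shortfall(updates, expected, credited):
    """Return (expected_total, credited_total, missing) for the outage window."""
    expected_total = updates * expected
    credited_total = updates * credited
    return expected_total, credited_total, expected_total - credited_total


for credited in CREDITED_PER_UPDATE_RANGE:
    exp_total, got_total, missing = shortfall(UPDATES, EXPECTED_PER_UPDATE, credited)
    fraction = got_total / exp_total
    print(f"{credited:,}/update: expected {exp_total:,}, got {got_total:,}, "
          f"missing {missing:,} (about 1/{round(1 / fraction)} of normal)")
```

    With these inputs the shortfall is roughly 1.8M points, and the credited fraction falls between about 1/25 and 1/20, in line with the 1/20 and 1/25 figures mentioned elsewhere in the thread.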

  25. #25
    Originally posted by Brian the Fist
    Ok, now I see what you mean, everyone is being credited approx 1/20th of what they should be. If somebody would send me an e-mail saying this, I could solve the problem a lot faster than the vague indications that 'something' is wrong here. Anyhow, I have corrected it (problem on our side, not yours). And before you ask, no, there is no way to easily give you back the 'extra' points that were not credited since we don't track production over time, however, again, keep in mind the problem affected everyone, and was only for a little over 24 hrs so only minimal harm was done stats-wise. In terms of the actual data (for those who care about the science), everything is fine, this was solely a stats issue.
    The above statement gives me great cause for concern. One of the big pluses about this project has been the approachability of the people running it, and the fact that they listened and acted upon the feedback from their supporters.

    Those that run the clients 24/7 are donating time & money in no small measure, and the attitude (OK there was a problem - it affected everybody, so we're not going to try and rectify it) is a very poor one IMHO.

    The science was unaffected - that is good of course, but the statement "this was solely a stats issue" suggests to me that the stats are now not worth any effort, and that is just not true.

    Another project tried to run roughshod over the people who crunched for them, ignoring requests for reliable statistics, and we all know what happened to that one.

    I sincerely hope that this is a one-off, and that in future a little more consideration will be given to those who pay for the science (namely the thousands of people who donate their computer time).
    The only good brew is a homemade brew

  26. #26
    Senior Member | Join Date: Oct 2003 | Location: an Island off the coast of somewhere | Posts: 540
    It's somewhat difficult to believe that the project doesn't track production rates at all, whether for stats purposes, or to simply measure the overall health of the project. I would think that they would want to know that suddenly the uploads fell to a trickle, or that the system was not accumulating credit for work done.

    Wouldn't it be normal practice to log uploads to the server and retain the uploaded work for a period of time, so that if something did go wrong, the work could be recreated from the logs? Or is the record of when and who submitted the work simply tossed into the dustbin once the insert into the database is done? That's not a very good way to ensure the system can be recovered.

  27. #27
    Senior Member | Join Date: Jul 2002 | Location: Kodiak, Alaska | Posts: 432
    Howard took care of the problem over the weekend; and we should be thankful he took the time to take care of one of the weekend problems that all in IT dread.

    If the project doesn't keep hourly stats records, perhaps one of the trusted stats sites that keeps track of all the scores could produce a database of each person's score before the problem started and their score when the problem was cured; the difference would be the basis for the scaling.
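    A minimal sketch of that idea, assuming the outage credited every upload at a uniform 1/20 of its normal value (Howard only said "approx 1/20th", so the factor is a guess) and assuming a stats site can supply two cumulative snapshots. The snapshot format and the `lost_points` name are hypothetical. Note that if scoring really depends on generation, as argued later in the thread, a uniform factor may be off.

```python
# Assumption: uploads during the outage were credited at 1/20 of normal.
# Howard described it only as "approx 1/20th", so this factor is a guess.
UNDER_CREDIT_FACTOR = 20


def lost_points(before, after, factor=UNDER_CREDIT_FACTOR):
    """Estimate each folder's missing points from two cumulative snapshots.

    before: {folder: score} taken when the problem started (hypothetical input)
    after:  {folder: score} taken when the problem was fixed
    Points credited in between were 1/factor of real output, so the
    missing amount is credited * (factor - 1).
    """
    missing = {}
    for folder, score_after in after.items():
        credited = score_after - before.get(folder, 0)
        missing[folder] = credited * (factor - 1)
    return missing


before = {"folder_a": 1_000_000, "folder_b": 250_000}
after = {"folder_a": 1_005_000, "folder_b": 250_000}
print(lost_points(before, after))
```

    Here `folder_a` uploaded during the outage and would be owed 95,000 points, while `folder_b` buffered, uploaded nothing in the window, and is owed nothing; its buffered work gets full credit whenever it is dumped after the fix.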
    www.thegenomecollective.com
    Borging.. it's not just an addiction. It's...

  28. #28
    They do keep production stats; I'm sure Howard has a cron job somewhere that takes a snapshot of the cumulative total and does some quick and dirty math just for his benefit. If not, maybe he should, but even if he did, it's the bloody weekend and he was not there to baby-sit it.

    But keeping track of production rates is a far cry from having a journaled transaction system. Consider the enormity of what you are talking about. How many generations are uploaded, each tracking upload time, generation #, handle, etc.? Not only would this mean storing probably triple what their servers already handle for stats tracking (and let's not forget that the stats server "dies" occasionally due to overload), but think of how long it would take to reconstruct all of that.

    Stats are great; people love to say "I'm doing XXXX better than you." But it is not like Free-DC didn't get credit while ARS did. We all got "screwed" over the same. Please, before you guys all go on and on about Howard not giving a crap, remember that they are in this for the science. Their MAIN concern is that the science continues and that it remains uncorrupted. Sure, that means he has to keep the volunteers happy; otherwise he won't have any computer time to do the science with. I for one am glad that Howard is not going to retask valuable computer resources to fix something that really does not make a difference to anyone's stats. So we all just slowed down for a couple of days, but we ALL slowed down at the same rate; it didn't change anyone's standing beyond what would have happened had this error not occurred. I would rather have it all back online with reliable uploads than have the server crash while it's trying to rebuild useless filler data whose only effect would be to bump everyone up by the same relative amount.

    Please get down off your soapboxen, guys. Every time we have a little glitch, everyone gets p.o.'d at Howard for ignoring us and trying to run roughshod over our "needs". I thought you were MEN! (Okay, and the tough ladies out there.) Take it all as a challenge: can you keep your boxen running with an upload problem? Are you tech enough to write a script to convert your farm to no-net in the shortest possible time? Think of it as a challenge; without challenge it gets pretty boring watching a counter creep upward. If that's all I wanted, I'd watch the click meter on my copy machine at work! Lighten up, guys. Many of you treat this like a video game: the one with the highest points wins. What you win, who knows, but if that's all it is, then definitely lighten up, 'cause it's only a game to you!

    That's just my opinion, I am probably wrong. But then it's my wrong opinion to make. I'll get off my own soap box now.

  29. #29
    Senior Member | Join Date: Oct 2003 | Location: an Island off the coast of somewhere | Posts: 540
    Without responding to all the "Let's be MEN" rhetoric, how about the backend of this project?

    I understood it was hosted on an HP9000/HP-UX box, probably on Informix (cheaper) or Oracle. Both of those RDBMSes provide logged databases, with the capability of saving transaction logs ("logical logs" in Informix) to tape or disk. This makes it possible to restore to a known good point and roll forward all the transactions, this time with the stats interface in proper working order.

    If the failure occurred further upstream (at the upload server, which is doubtful since the science is OK), before the results got into the database, I would think that server would need to keep logs, or to hold on to the live data until the insert into the backend database succeeds and the data can be purged.
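    As a rough illustration of "log the uploads, then replay them", here is an application-level sketch in Python. It is not how Informix or Oracle logical logs work; it is just an append-only journal of upload records that can be re-scored later. The record fields (handle, generation, structure count) and the scoring function are hypothetical, since the real DF upload format and points formula are not described in this thread.

```python
import json
from collections import defaultdict


def journal_line(handle, generation, structures):
    """Serialize one upload as a JSON line for an append-only journal.

    The fields are hypothetical; the real DF upload format is not
    described in this thread.
    """
    return json.dumps({"handle": handle, "generation": generation,
                       "structures": structures}) + "\n"


def replay(lines, score_fn):
    """Rebuild every handle's total by re-scoring each journaled upload.

    `lines` can be an open journal file (iterating it yields lines) or any
    iterable of strings; `score_fn(generation, structures)` is whatever
    scoring rule is considered correct at replay time.
    """
    totals = defaultdict(int)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        rec = json.loads(line)
        totals[rec["handle"]] += score_fn(rec["generation"], rec["structures"])
    return dict(totals)
```

    The upload server would append one `journal_line(...)` per accepted upload; after a scoring bug is fixed, opening the journal and calling `replay(f, corrected_score)` recomputes everyone's totals. The catch, as DocWardo points out, is that the journal grows with every single upload, so storage and replay time are the real costs.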

    I sincerely doubt Howard is the only person running this project. How hard is it to assign rotating "duty" to the team members (including those grad students!) to keep an eye on the system, or even to have a cron job that emails a team member when some metric goes wonky? After all, this is a 24x7 project, not something that goes to sleep on the weekend.
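    Here is a minimal sketch of the kind of check such a cron job could run every hour, assuming it has a list of hourly snapshots of the project-wide cumulative points total. The input format, the 24-hour window, and the 0.25 threshold are all assumptions chosen for illustration, and the emailing step is left out.

```python
from statistics import median


def production_dropped(hourly_totals, window=24, threshold=0.25):
    """Return True if the latest hourly gain looks abnormally low.

    hourly_totals: cumulative project-wide points, one sample per hour,
    oldest first (hypothetical input, e.g. appended by an hourly cron job).
    The latest hourly gain is compared with the median gain over the
    previous `window` hours; a gain below `threshold` times that median
    counts as a drop.
    """
    if len(hourly_totals) < window + 2:
        return False  # not enough history to judge yet
    gains = [later - earlier for earlier, later in zip(hourly_totals, hourly_totals[1:])]
    baseline = median(gains[-(window + 1):-1])  # the `window` gains before the latest
    latest = gains[-1]
    return baseline > 0 and latest < threshold * baseline
```

    If credit suddenly ran at about 1/20 of normal, the first hourly gain after the drop would sit far below 0.25 times the median and the check would return True, at which point the job would mail whoever is on duty. A median baseline was picked so that one unusually big dump in the previous day doesn't skew the comparison much.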

  30. #30
    has been eaten by a grue.
    Join Date
    Jun 2003
    Location
    Detroit, MI
    Posts
    384
    I agree with Doc. Howard was wonderful to fix it on the weekend. I know my boxen don't run 24-7, and I get paid to make computers work for a living. There is no point in getting our knickers in knots here.

    Then again, I did get my stats fix by deciding to run ChessBrain along with DF, so I had something to check and to cackle about. They play well together, and CB takes very little, if anything, from DF. It is a decent way to deal with a slow protein.

  31. #31
    So we all just slowed down for a couple of, but we ALL slowed down the same rate, it didn’t' change anyone's status beyond what would have happened had this error not occurred.
    Unfortunately, that's not true. I climbed two places in my team rankings. I obsessively check the stats and I'm paranoid - so while my teammates were getting 1/25 credit, I was buffering.

    It would be nice if everyone were involved out of altruism and scientific curiosity, but that just is not the reality. Part of Howard's job is managing project resources, including the volunteers. Reading these forums, I get the impression that he doesn't take the human aspect seriously, which is a shame since waste and misuse will limit DF's effectiveness. A show of good faith in repairing the points snafu would do wonders for our morale.

    If Howard doesn't keep enough records to reconstruct the missing points, perhaps one of the fine stats sites could provide the data? All that's really needed is how many points were posted for each active folder during the drought.

  32. #32
    Anyone who has been folding through a couple of client changes by now should, and must, be aware that the stats, the points, the client, the upload server, the who-knows-what-is-next, go BELLY-UP on a regular, random basis... At the first sign of trouble, you must take evasive action on your own, be it by buffering, by going "teamless" when your team disappears, or whatever..... When my one online folder uploaded 41 points instead of the usual 1500-2500 points on Friday evening, that was it for uploads until I got an answer. My 15 machines buffered all weekend... And the answer, which came sooner than I thought it would, was just what we all should have feared, but should have been aware of: POINTS LOST FOREVER....

    So what are we gonna do at the next sign of trouble, which is looming in the very near future???? Whatever it takes to protect our points score... While this project is very worthwhile, us troops out here in the boondocks need something to spur us on, to keep folding, pay our sky-high electric bills, and invest in new folders. And all there is out here is the fun of the competition and accumulating points...

    Fold on.............!
    Too many computers, too little time......

  33. #33
    Senior Member | Join Date: Oct 2003 | Location: an Island off the coast of somewhere | Posts: 540
    Originally posted by QIbHom
    I agree with Doc. Howard was wonderful to fix it on the weekend. I know my boxen don't run 24-7, and I get paid to make computers work for a living. There is no point in getting our knickers in knots here.
    It indeed shows dedication to appear onsite on a Sunday to rectify a problem.

    I was merely trying to point out some obvious shortcomings that should be overcome to prevent future data loss.

  34. #34
    OCworkbench Stats Ho | Join Date: Jan 2003 | Posts: 519
    Well, for those locked in battle for places in the stats, it sure made it interesting. Those buffering got a huge advantage over those constantly connected for a 30-hour period.

    Never a dull moment, folks. I am still unable to fathom how the stats got messed up, though. How did the points get altered? Was there an actual formula to it, and if so, how did that formula suddenly activate? :bs:
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  35. #35
    Indeed, Grumpy. If it is apparently so easy to divide the scores by 20, it appears that the argument against having different points for different proteins is invalid.

    1) Is there a formula being used for points? If so, what is it?
    2) Why did it change?

    Those who had access to their machines were able to buffer and so will get full points. Those that couldn't, got screwed. The impact wasn't universal and so a lot of people will be very ticked off, me included. An uncaring, blasé attitude from the admins is not going to help.

  36. #36
    OCworkbench Stats Ho | Join Date: Jan 2003 | Posts: 519
    I was just about to dump 500,000 points when I looked, and decided not to until things were right, thank goodness. Imagine the content of this post if I had got 25,000 points instead. It definitely would have had to be deleted by the moderator.
    I am not a Stats Ho, it is just more satisfying to see that my numbers are better than yours.

  37. #37
    It's not like you need any more bad news, is it?

  38. #38
    Originally posted by DocWardo
    They do keep production stats, I'm sure somewhere Howard has a cron job that takes a snapshot of the cumulative total and does some quick and dirty math just for his benefit. If not maybe he should, but even if he did, it's the bloody weekend and he was not there to baby-sit it.
    No, but there are plenty of grad students who could.



    We all got "screwed” over the same.
    That is just not true. If this project did not allow caching, then yes, but those who noticed there was a problem and immediately went -nonet have an unfair advantage over those who for whatever reason could not.



    That's just my opinion, I am probably wrong. But then it's my wrong opinion to make. I'll get off my own soap box now.
    Me too. All I was trying to point out was that this was the first time Howard seemed not to care about the stats (unlike a certain other project down in California...), and that I was hoping it wouldn't set a trend.

    If I've offended anyone then I apologise, and no matter what happens it will NOT stop me from running DF - the Cause is worth the effort.
    The only good brew is a homemade brew

  39. #39
    Originally posted by RandomCritterz
    If Howard doesn't keep enough records to reconstruct the missing points, perhaps one of the fine stats sites could provide the data? All that's really needed is how many points were posted for each active folder during the drought.
    But stats sites do not (and cannot) track the generation of EACH work unit uploaded! That is necessary to accurately recompute the points for each person.

    So, oh well, you get a jump on other people today. Congrats, you just played a fine hand of the game, and today you get to be one of the people who slow down the upload server for the rest of us.

    As to why the points calculation needs to use the generation number, look back in the forum archives. Basically, it's to encourage people to let these things run to full completion, and to give more credit for a full set of 250 generations of data (with a much more refined end product, at least that is/was the prediction) than to someone who does only the first, say, 100 and then dumps, because the early generations are cruder simulations and thus should "go faster".

    As for an uncaring attitude? Yeah, he's so uncaring that he didn't come in on the weekend to fix it. Nope, doesn't care about any of our concerns, so he didn't include timestamps in the error logs with the last executable update. Need I continue the list? What Howard said was NOT uncaring. All he did was accurately state that it was not possible (because the necessary data is not tracked at the level needed to do the job accurately), and even if it were, what kind of computing power would he need to redirect to recompute everything? Sorry, I seem to have jumped back on my soapbox again. All he said was that he was able to correct the problem and that he would not be able to recreate the stats that were lost. It's not like he said, "I could recreate the list, but I have a hockey game I want to go to, so screw you all."

    Ah just forget it.

    /me jumps down off the soap box and kicks it to the side of the alley.

    I just remembered that those who want to be p.o.'d will be, no matter what is said, so I'm just wasting space. Maybe it's just that time of year again when students start complaining about half points on 100-point exams. This sounds the same to me, and after a few weeks it would get on your nerves as well.

  40. #40
    I sit in the middle as far as the impact of the 36 hour stats reduction was concerned.

    Half of my machines I switched over to no-net within 10 hours of the problem starting and the other half I got to at around 22 hours. As a result I was able to upload around 350,000 earlier today.

    That has to be an advantage over other people.
    Crunching for OCAU
