Thread: Fair stats?

  1. #1
    25/25Mbit is nearly enough :p pointwood
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831

    Fair stats?

    It may be too late, and I'm not one who cares a lot about this, but I'm bringing it up because there has lately been some talk about the stats and the fact that they give a big advantage to the older teams/participants.

    The reason is that the older teams have been making "easy points" when crunching the former, faster proteins. People just joining now will have a really hard time climbing up the stats considering how slow this new protein is compared to the previous proteins.

    Yes, I know it's the science that matters here, especially now with the CASP5 competition, but let's face it: for a lot of participants in projects like this, it's the stats that really matter; that's what motivates them. The fact that the stats system isn't fair can easily make people angry and make them prefer another project (and there are lots of them today...).

    As I said, I don't know whether anything should be done about it (ideally it would have been decided when the project started), or even whether it is possible at all to create some kind of system that makes it fairer. Something like: if the new protein is 3 times slower than the last one, then each generated structure should count 3x in the stats.
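
To make the 3x idea concrete, here is a minimal sketch in Python. The function name `weighted_points` and the `slowdown` factor are made up for illustration; this is not how the project actually scores anything, just one way the proposal could be read.

```python
def weighted_points(structures: int, slowdown: float) -> float:
    """Credit for a batch of structures, scaled by relative protein speed.

    `slowdown` is a hypothetical per-protein factor: 1.0 for the baseline
    protein, 3.0 for a protein that takes 3 times longer per structure.
    """
    if structures < 0:
        raise ValueError("structures must be non-negative")
    if slowdown <= 0:
        raise ValueError("slowdown must be positive")
    return structures * slowdown


# Same machine, same crunching time: 3000 structures on the old protein
# versus 1000 structures on one that is 3 times slower.
old_credit = weighted_points(3000, 1.0)
new_credit = weighted_points(1000, 3.0)
print(old_credit, new_credit)
```

With a weight like this, the same hardware running for the same time earns the same credit on either protein, which is the whole point of the proposal.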

    I believe it would be a good thing to discuss this now instead of later. Maybe the answer is that it is not possible, but then we at least know that, and the reasons why, when new participants ask about it.

    EDIT: I moved this to a new thread instead of hijacking an existing one.
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  2. #2
    This has already been discussed earlier. The length of the protein is NOT necessarily an indication of how long it will take. And the proteins will NOT keep getting longer and longer indefinitely. They will vary in size, especially for CASP. 180 is on the border of what is currently "worth trying" with the present approach. Anything bigger would produce such terrible structures that they wouldn't be worth submitting to CASP. Of course we may not need to solve those proteins by brute force always..

    Anyways, it will all balance out in the end and the scoring will not be touched in that respect.
    Howard Feldman

  3. #3
    The early adopters (of which I'm not one) tested the system, tolerated the bugs, and were a big part of where it is today (one assumes from reading the archives).

    Why shouldn't they have an advantage?

    The reason I despise the word "fair" [shiver] is the age-old question:

    Fair to whom? Fair to the newcomers or fair to the early adopters? You can't have it both ways.

    I'm not an early adopter. Stats are significant to me. Ergo - I added 3x the number of machines and therefore climb just as efficiently up the ladder.

    As we said when I was married to a race-car driver:

    Racing is a Gold Card sport. Speed==Money. How fast do you want to go?

  4. #4
    The stats system in a project like this should not be based on how many proteins you fold, but on how many resources you contribute. I believe that GIMPS has solved this in an excellent way. In their stats system, which I am sure some/many/most of you are familiar with, the stats unit is one Pentium 90 CPU year.

    Measured in calibrated P5 90Mhz, 32.98 MFLOP units: 25658999 FPO / 0.778s using 256k FFT

    as taken from their website http://mersenne.org/primenet/, third paragraph.
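
The calibration line quoted above can be turned into a small conversion, shown here as a rough Python sketch. The constant and function names are made up for illustration, and the 365-day year is an assumption; the quoted text does not say how GIMPS defines a year.

```python
# Calibration quoted above: 25658999 floating-point operations in 0.778 s
# on a P5 90 MHz, which works out to about 32.98 MFLOPS.
P90_FLOPS = 25_658_999 / 0.778

# Assumption: a CPU year is 365 days of continuous running.
SECONDS_PER_YEAR = 365 * 24 * 3600


def p90_cpu_years(floating_point_ops: float) -> float:
    """Convert a count of floating-point operations into P90 CPU years."""
    return floating_point_ops / (P90_FLOPS * SECONDS_PER_YEAR)


print(round(P90_FLOPS / 1e6, 2))  # MFLOPS of the reference machine
print(p90_cpu_years(1e15))        # credit for 10^15 operations
```

The appeal of a unit like this is that credit tracks the work done rather than the number of work units returned, so a slow work unit simply earns more.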

    Just think what it would look like if your GIMPS stats were based only on how many exponents you had completed! That would favor the good old boys who were there from the start. In distributed folding the effect would be even stronger, considering how fast the transition from 30-something (before my time) through 76 to 182 has gone.

    The project leaders really need to understand that there are many stat-hoes (is there any kind of automated censorship on this forum? I understand it is not uncommon to censor harsh language in foreign countries) in the DC-world, and they want a fair stats-system. They also love a very detailed stats system, where you can compare yourself to other users in many ways, like the system used by SETI@home.

    These are the different criteria by which you can compare yourself to others in the SETI@home project:

    Countries
    Domains (.com, .edu, etc.)
    CPU types (Pentium, SPARC, etc.)
    Operating Systems (Windows, UNIX, etc.)
    Platforms (combination of CPU and OS).
    Location (work, home, school)

    Just something to consider for Howard and the other administrators.

  5. #5
    As long as everyone is working on the same protein I see no reason to weight the results. I can see where the opposing viewpoint is coming from, but if you are outproducing a team, you are outproducing no matter the size of the work unit. It was an issue in G@H because you could grab a small unit and run it indefinitely for a greater advantage over other crunchers. That problem does not exist in this situation. It's completely level between all teams. That's my take anyway.


  6. #6
    Originally posted by Jodie
    Ergo - I added 3x the number of machines and therefore climb just as efficiently up the ladder.
    3x wouldn't be so bad except some of us got spoiled on a speedy little 36 residue critter that was almost 4x faster than the 76 we just ended. That means you would need 12x the machines to climb as fast as some were earlier. It is disappointing to see production drop from 1.2mil to 60K with the same hardware. (My older processors seem to really hate this new monster. They seem to have slowed disproportionately)
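
Spelling out the arithmetic above as a quick Python sketch. The variable names are made up, "almost 4x" is rounded to 4, and the two production figures are taken at face value from the post without assuming a time period.

```python
# Relative slowdowns quoted in the thread.
slowdown_36_to_76 = 4       # "almost 4x faster than the 76", rounded
slowdown_76_to_current = 3  # the 3x figure from the quoted post

# Machines needed now to climb as fast as on the 36-residue protein.
machines_needed = slowdown_36_to_76 * slowdown_76_to_current

# Production drop reported for the same hardware.
production_drop = 1_200_000 / 60_000

print(machines_needed, production_drop)
```

The reported drop comes out larger than the 12x estimate, which would fit the remark that older processors slowed down disproportionately.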

    Even if I didn't have a bit of a lead over other people I wouldn't want a change in the way accounts get credited. The method of tabulation was set at the beginning and things shouldn't change mid way for that reason.

    I do, however, think that separate "total" and "this structure" statistics would be a decent way to track things going forward but I won't hold my breath to see it implemented officially.

  7. #7
    Everyone has their favorite.
    I am a newcomer to DF, but
    it is like Jodie said:
    they were the beta testers, and they deserve to be high in the stats.
    Proud member of The Genome Collective

  8. #8
    As a wise person once said,

    One can please all of the people some of the time, and some of the people all of the time, but you can't please all of the people all of the time.

    As long as no one is raging mad, we're happy basically.
    Howard Feldman

  9. #9
    Junior Member
    Join Date
    May 2002
    Location
    Sherbrooke, Québec
    Posts
    27
    I hope the next one will be smaller, because this one takes like 3 times longer to do and it gives the same *** score.

  10. #10
    Well, if the proteins are going to stay smaller than 180 residues for some time, and they are going to vary in size, it should even out. I read in another thread about a proposal for a "dual stats system": a total and a per-protein part. I think it's a good idea.

    "some of the people can be all right part of the time, all of the people can be part right some of the time, but all of the people can't be all right all of the time! I think Abraham Lincoln said that. I'll let you be in my dream if I can be in your's! I said that."

    Bob Dylan, Talking world war III blues, The freewheelin' Bob Dylan, 1963.

  11. #11
    S-MDC Project DATA
    Join Date
    Mar 2002
    Location
    Scotland
    Posts
    86
    Hi ppl

    There is no way to supply stats that will please ppl, as everyone has his/her own idea of what is fair. Existing and new members have signed on for various reasons, but the main reason is to assist DistributedFolding in making medical research history through the work accomplished by all crunchers, small and large alike. Knowing we are doing this for the eventual benefit of the human race should give us all the ego kick we need to continue and give of our systems' best to Howard and the DF Team for one of the best-run DC projects on the net.

  12. #12
    25/25Mbit is nearly enough :p pointwood
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831

    Thanks!

    I'm of the same opinion - if anything should have been changed, it should have been done from the start.

    Anyway, now I/we have an answer to that question, which was what I wanted
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  13. #13
    In my view it shows poor thought by the originators in regards to stats.

    It's like a farmer paying his workers for each bale of hay. He should be paying them by the weight. Then when the bales get heavier, he pays more.

    Forget about the past and early adopters, it's the future that matters. Every DC project I have been involved in has been terrible at trying to get the stats even! This one is no exception.

    It would at least be a start to pay by the number of AA's and not the number of proteins.
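
A minimal Python sketch of what paying by the number of AAs could look like. The function name `residue_points` and the structure counts are made up for illustration; the lengths 36 and 182 are protein sizes mentioned earlier in the thread. As pointed out in post #2, length is not necessarily an indication of how long a protein takes, so this is only a rough proxy for effort.

```python
from typing import Mapping


def residue_points(structure_counts: Mapping[int, int]) -> int:
    """Score = total amino acids generated, not number of structures.

    `structure_counts` maps protein length (in residues) to the number
    of structures a user generated for a protein of that length.
    """
    return sum(length * count for length, count in structure_counts.items())


# Hypothetical user: 100 structures each of a 36-residue and a
# 182-residue protein.
print(residue_points({36: 100}))
print(residue_points({182: 100}))
print(residue_points({36: 100, 182: 100}))
```

Under this scheme a structure of the longer protein is worth about five times one of the shorter, instead of counting the same.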


    Regards

    An


    ( sorry, I can't give you my full name as I've changed the way I write it so that unless you increase your computer power by 500% you won't see what you used to see )


