Page 5 of 5
Results 161 to 180 of 180

Thread: Input wanted on planned new algoritm from users perspective

  1. #161
    My last message was composed BEFORE Brian's "ad nauseam" message. Sorry.

    I'm sorry, but I must point this out.

    Zaphod said:

    Just about the top 80 contributors are needed to get to 50%.
    Correct me if I'm wrong, I'm surely no mathmagician, but does that mean that 1/28th of the TOP CURRENT CONTRIBUTORS contributed 14/28ths of last week's production?

    That seems like a lot, per capita?

    FBK

  2. #162
    Bottom of the Top Ten TheOtherZaphod's Avatar
    Join Date
    May 2002
    Location
    zone 5 west
    Posts
    100
    LOL, just how many machines do you think those 80 people have involved in the project? I would guess that there is the equivalent of over 1000 full-time machines in that group. You ran a "small" farm yourself; how many boxes was that?
    Don't Panic

  3. #163
    Exactly. Certain folks such as yourself, and to a lesser extent myself, run farms, either at home or at work.

    Accepting your stats, I come to a startlingly different conclusion.

    You are surprised that the top producers, numerically, produce a relatively small amount of the total production.

    I think that the large producers, such as yourself and others, produce not only big numbers, but also a disproportionately large percentage of total production.

    Sorry for that long sentence.

    My percentage figures, based on your figures, are as follows:


    The top five contributors produced over 10% of last week's total.

    * The top 0.2 percent (1/5 of 1 percent; 5 users out of 2308) of "ACTIVE" contributors produced 10% of last week's total!

    * The top 0.5 percent (1/2 of 1 percent; 12 users out of 2308) of "ACTIVE" contributors produced 20% of last week's total.

    * The top 1.3 percent (30 users out of 2308) of "ACTIVE" contributors produced 33% of last week's total.

    * The top 3.5 percent (80 users out of 2308) of "ACTIVE" contributors produced 50% of last week's total.

    * The top 8.7 percent (200 users out of 2308) of "ACTIVE" contributors produced 66% of last week's total.
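
For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the user counts and production shares quoted above; the names `ACTIVE`, `groups` and `share_of_users` are made up for this illustration.

```python
# Share of active contributors in each group, from the counts quoted above.
ACTIVE = 2308  # active contributors last week

# (number of top users, their share of last week's production in percent)
groups = [(5, 10), (12, 20), (30, 33), (80, 50), (200, 66)]


def share_of_users(count, total=ACTIVE):
    """Return count as a percentage of total, rounded to two decimals."""
    return round(100 * count / total, 2)


for count, production in groups:
    print(f"top {count:>3} users = {share_of_users(count):5.2f}% of active users "
          f"-> {production}% of production")
```

The rounding is to two decimals, so the printed figures may differ slightly from the one-decimal values in the list.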


    FBK

  4. #164
    Oops, sorry for the double post. I really must get out and post more often.
    Last edited by FBK; 01-22-2003 at 10:44 PM.

  5. #165
    Could someone summarize what's going on with the new algorithm and the stats system? Thanx.

    It's good to see that the project managers want to hear from the users and are making the project user-oriented.

    Cheers!

  6. #166
    reader50...thanks, I promise to say nothing but nice things about Macs for the rest of the week, I'll even drop into our graphic design shop and say it to them (Mac loyalists)

    Howard, ultimately I believe that you will do what you feel is best for the project. I am glad that I don't have to be the one to make the decision. I'm sure we will all learn to live with it over time, no matter what the decision.
    ExtremeDC...Do YOU Have What it Takes?

  7. #167
    Originally posted by Brian the Fist
    It is not out of the question that WE could host 3rd party stats on our servers, provided you were willing to give us your code of course, and provided it didn't eat up a lot of our bandwidth which is all that we are short on. We have no shortage of CPU or disk space really
    Our current strategic plan does not call for tracking all members, or all teams. We just want to show off what cool stats our team has, as in "wow, let's join them!" This strategy does not require all teams to be tracked. Our motivation is team competition, and the offer would not fit. If special references to a specific team were removed, our team would gain nothing. If such references were left, the project would be playing team favorites on officially hosted pages.

    However, I'd like to comment on the offer in general. Like Dyyryath says, it's incredibly generous. It is also risky, and I would recommend against doing such a thing.

    First, there are security problems. My code for tracking Distributed Folding is approaching 20K lines. It could reasonably reach 30K lines before it has all of the currently planned features. Putting that much foreign code on a project server ... my code would not go looking for passwords or emails, but how could you be sure? Howard would have to sift through all that code, if only to protect the other project servers. It could take weeks to be sure about all of it, assuming Howard worked on nothing else during that time.

    When I have a bugfix ready, or a new feature, he would have to review the foreign files again. Bug fixes or feature upgrades often involve modifications to dozens of site files. At the very least, I'd have to nag him each time to replace certain files. He could give an outside party (me) access to one of the project's servers and skip the file checking/nagging, but that would be even worse. If the donated server were separated from the project servers, then it would require additional bandwidth to reach the source data.

    Any codebase that does serious 3rd party stats unattended is going to be large; it has to be, to deal with so many things that can go wrong. It will also be large so it can present lots of neat data to the stats connoisseur. P.S. I like this term much better than "stats-ho".

    Second, it is generally assumed that 3rd party stats relieve bandwidth demands on the project. Users increasingly visit the 3rd party stats instead of the basic project pages. Basically, the cooler stats pages are, the more hits they are going to draw. Rather plain project stats pages are all we need for 3rd party stats; the magic comes from analysis of data over time. And plain project stats pages help send people over to those private stats servers, with their own bandwidth. I expect that having heavy stats available on the project servers will result in more people finding those pages: more visits, more page hits per visit, and generally larger pages being served up.

    Third, there is another good reason to have plain pages on the server. Our pages contain a multitude of links leading back into the page. Column sort links, links to the next member's Personal page, alternate team links, page modification links, etc. Plain project pages have none of these things.

    Several days ago, a Harvester hit our server and got stuck in the Free-DC dFold section. Personal pages, to be specific. It followed every link. It tried and Tried and TRIED to find emails for Free-DC members. After about 40 hours, we firewalled it off, but it had certainly saturated the box or our upload bandwidth during some of that time. 7K hits and 295 MB of page files, and it did not find even a single email for all that trouble. The only email in our pages is the team contact link, and that one is ASCII-encrypted; current harvesters cannot read it.
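
A minimal sketch of that kind of obfuscation, assuming the encoding mentioned above means writing each character of the address as a decimal HTML character reference (the actual method used on the Free-DC pages is not described here). The function name `encode_email` and the address `team@example.org` are placeholders for illustration only.

```python
import html


def encode_email(address):
    """Encode every character as a decimal HTML character reference."""
    return "".join(f"&#{ord(ch)};" for ch in address)


# Hypothetical address; a browser renders the references as normal text,
# while a harvester scanning the raw page for "name@host" sees no match.
encoded = encode_email("team@example.org")
print(encoded)

# Decoding the references gives the original address back.
print(html.unescape(encoded))
```

This is obscurity rather than encryption: any harvester that decodes character references before searching would still find the address.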

    Today's plain project stats pages would be loaded once, and left alone.

    We will be passing on the offer at this time, but I'd recommend great caution in allowing anyone to use it. Seems like it brings too many problems, along with tons of bandwidth demands.
    Get Addicted

  8. #168
    Administrator Dyyryath's Avatar
    Join Date
    Dec 2001
    Location
    North Carolina
    Posts
    1,850
    Originally posted by reader50
    It tried and Tried and TRIED to find emails for Free-DC members. After about 40 hours, we firewalled it off, but it had certainly saturated the box or our upload bandwidth during some of that time.
    Well, we're a very popular bunch of people, you know.

    All joking aside, reader50 has brought up some valid points. He's probably right that hosting 3rd party stats isn't a good idea, but it sure was a cool thing for them to offer.

    Actions like those just keep me firmly rooted here at DF.

  9. #169
    Please correct me if I'M wrong, but no attempt was ever made to equalize stats since the beginning of the DF project. I.e., any given machine, running since the inception of DF, might have folded 4,500 folds per hour on one target, 11,000 folds per hour on the next target, and 3,200 folds per hour on another target.
    And at the risk of furthering the "ad nauseam", the above quote from FBK is the heart of this issue. The project has never tried to have a consistent scoring approach. Unlike Stanford, which has different scores for each protein, DF gives a point per structure. Some are quick, some are slow. When the new client came in that re-analysed already analysed proteins and was therefore much faster, did anyone complain that it was unfair? Not that I saw.
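
To put numbers on that, here is a quick sketch using the three example rates from FBK's quote and the one-point-per-structure rule. The function names are made up for illustration, and the rates are FBK's hypothetical figures, not measured data.

```python
# Folds per hour for one machine on three different targets (FBK's example).
folds_per_hour = [4500, 11000, 3200]


def points_per_hour(rate, points_per_structure=1):
    """Points earned per hour when every structure is worth the same."""
    return rate * points_per_structure


def spread(rates):
    """Ratio between the fastest and slowest target for the same machine."""
    return max(rates) / min(rates)


print([points_per_hour(r) for r in folds_per_hour])
print(round(spread(folds_per_hour), 2))
```

With these figures the same machine earns more than three times as many points per hour on the fastest target as on the slowest, without any change in the hardware.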

    If the new method awards twice as many points as now, it won't be any different from what has gone before. The scoring method has never been "fair" so leave it as it is, please.

  10. #170
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    I am with you, HaloJones: fewer than 30% of the people in OCworkbench who responded said they wanted the stats set to 0. Just leave it as it is and see how it goes; if people think it is too out of whack, then we will see what can be done. But no one has ever complained seriously before...

  11. #171
    25/25Mbit is nearly enough :p pointwood's Avatar
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    Well, there is a difference between DF and F@H in that we all crunch on the same protein all the time. On F@H you can get several different types that take various lengths of time to crunch.

    If I had to choose, I think I would reset the stats, mainly because what I find fun is the race and competition, and currently we are so far ahead that we just have no competition. A reset would give all other teams a big opportunity to ramp up and try to compete with us.
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  12. #172
    Stats, stats, stats - I read a lot about stats.

    The more important question is: with the new DF algorithm, will results be uploadable from any computer, regardless of where they were calculated? If not, many members of our team will have to say good-bye to this project.

    Now that the CASP5 results have become available, one can conclude that this project has produced moderately good results. Moreover, as I suspected above, the announced changes in the algorithm are a direct result of this CASP5 feedback. Hence, the current algorithm has proven insufficient to continue with. Consequently, I would like to ask approximately how long it will take until the new algorithm is in place, as I don't see much reason to continue computing with this one.

    All the best,
    Michael.

    P.S.: Don't get me wrong. Without the past efforts in supporting this project we would never have learned how to improve it. I am looking forward to the new approach hoping that the "upload problem" will be solved.
    http://www.rechenkraft.net - Germany's largest distributed computing community

    - - - - - - - - - -
    RNAs are nanomachines or nanomachine building blocks. Examples: The ribosome, RNase P, the cellular protein secretion machinery and the spliceosome.

  13. #173
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    Howard,

    Can we run two stats engines? Say the main stats zero Phase I and count Phase II only, but the old stats stay available. A secondary stats engine, donated by someone, would do the combo of the two phases. And to help offload the extra bandwidth, could the secondary stats engine be mirrored at sites where people are willing to contribute some space and bandwidth to host it?


    One question though, before anybody comments on that... I haven't seen anybody really ask this. From what you are saying, it seems the scoring system will be completely different for Phase II. It seems that it will be point-value based, but Phase I is structure based. So how do you mix those two? 6 oranges + 3 apples = 9 fruit? Does mixing them really mean anything?

    Personally I am in favor of only counting Phase II work once we begin that... but my question remains regardless of that. Is no one else seeing this? Am I just totally missing something here?

    Best,

    RuneStar½

    P.S. I'm running the screensaver on both my systems, Howard. =)
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  14. #174
    OCworkbench Stats Ho
    Join Date
    Jan 2003
    Posts
    519
    Agreed, to lose the ability to upload remote computers' output would be a hard blow. Contrary to all reports, money is easy to find for research; it is the human resources that are hard to find. Money cannot magically produce people who can do what the folding community provides. We may be a bunch of cranky pants, but we are productive little cranky pants.

    I just can't wait to see the new client in action, and I'll just double my medication if they zero the stats.

  15. #175
    Senior Member
    Join Date
    Apr 2002
    Location
    Santa Barbara CA
    Posts
    355

    Re: Input wanted on planned new algoritm from users perspective

    Originally posted by Brian the Fist

    < snip >

    - as a side effect, each machine will need to be uniquely identified so it will not be easy to, say, generate on one machine and then upload from another - this is necessary to avoid other nasty potential problems but should not affect people who use proxy servers or firewalls, only physically moving around data will cause trouble.
    - you will at most be able to buffer 50 generations (about 2 days work (maybe)) but this could change

    I'll add more stuff as I think of it. Comments and complaints are welcome. Keep in mind some concessions will be necessary to get this new more complicated algorithm to work, which may include alienating some users with special needs.

    EDIT:

    Actually, I think we can do it without those last two points, so scratch those off

    So Howard, after all the discussion of those last two points, I guess the next time you edit you could just scratch them off yourself instead of telling us to.

  16. #176
    25/25Mbit is nearly enough :p pointwood's Avatar
    Join Date
    Dec 2001
    Location
    Denmark
    Posts
    831
    Originally posted by FBK
    As an "Old User", I think that I would prefer my stats to be worth less, as opposed to my old stats being worth nothing at all.
    I don't think your stats suddenly become worth nothing, but I know what you mean. I personally think your stats are worth just as much as they were before. Actually, to me and the rest of Team Stir Fry, I think they are worth more, since if Howard decides to reset the stats, we can forever claim to have "won" the first phase. I don't care that much about that though, since it's not about who has crunched the most; it's about who got the best structure.
    Please correct me if I'M wrong, but no attempt was ever made to equalize stats since the beginning of the DF project. I.e., any given machine, running since the inception of DF, might have folded 4,500 folds per hour on one target, 11,000 folds per hour on the next target, and 3,200 folds per hour on another target.
    Good point
    Pointwood
    Jabber ID: pointwood@jabber.shd.dk
    irc.arstechnica.com, #distributed

  17. #177
    Release All Zigs!
    Join Date
    Aug 2002
    Location
    So. Cal., U.S.A.
    Posts
    359
    Well, there has been no point value assigned to the stats. It's just been a raw number. Then again, we've primarily been brute forcing the structures, so there hasn't been a whole lot of point to any kind of value system.

    RS½
    The SETI TechDesk
    http://egroups.com/group/SETI_techdesk
    ~Your source for astronomy news and resources~

  18. #178
    Originally posted by m0ti
    ... In the meantime, there's always dfDetect, which is a little Win32 utility I made (at teammates' request) that does some worthwhile stuff:

    - works for DF installed as a service (or multiple services) or CLI.
    - make sure DF is always running
    - run DF in completely hidden mode (useful for Win9x users)
    - restart DF after X minutes have passed
    - stop DF/keep DF from running while program X is running (great for corporate farmers with CAD machines/gamers).
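
The rules in that list can be read as a small decision table. The sketch below is not dfDetect's code (which is not shown in this thread); it is only one hypothetical way to express "keep DF running, restart it after X minutes, and keep it stopped while program X runs". The function name, its parameters and the action strings are all assumptions made for illustration.

```python
def next_action(df_running, blocker_running, minutes_up, restart_after=None):
    """Decide what a watchdog should do on one polling pass.

    df_running      -- True if the DF client is currently running
    blocker_running -- True if the program that must not share the CPU is running
    minutes_up      -- minutes since the DF client was last started
    restart_after   -- restart interval in minutes, or None to never restart
    """
    if blocker_running:
        # The blocking program always wins: stop DF or keep it stopped.
        return "stop" if df_running else "wait"
    if not df_running:
        return "start"
    if restart_after is not None and minutes_up >= restart_after:
        return "restart"
    return "none"


print(next_action(df_running=False, blocker_running=False, minutes_up=0))
print(next_action(df_running=True, blocker_running=True, minutes_up=5))
print(next_action(df_running=True, blocker_running=False, minutes_up=90, restart_after=60))
```

Actually detecting and controlling processes or Windows services is left out on purpose; only the ordering of the rules is shown.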

    In any case, I'll probably be back doing the dfQ thing soon (I hope). February perhaps will see a release (I was originally hoping for January, but life got in the way).
    m0ti,

    I tried to download dfDetect, but I got an error message saying that the server was unavailable.

    I would also be very interested in dfQ... I've got access to about 30 PCs that don't have Internet access... I sure would like to keep them busy on a great project.

  19. #179
    Apparently the worm that took out large tracts of the net yesterday affected the server there, too (well, actually the proxy through which all traffic flows to and from that server).


    It's now also available here. Rename to zip.

    Ok, I'll go out on a limb here and promise to have some usable version of dfQ by March 1st. This version will more than likely be in Java, though it will be little trouble to port over to C++ and run natively.
    Team Anandtech DF!

  20. #180
    Senior Member
    Join Date
    Jan 2002
    Location
    England, near Europe
    Posts
    211
    Originally posted by m0ti

    Ok, I'll go out on a limb here and promise to have some usable version of dfQ by March 1st. This version will more than likely be in Java, though it will be little trouble to port over to C++ and run natively.
    I'd also be very interested in dfQ.
    Train hard, fight easy

