
Thread: Attn: stats-mongers - old proteins

  1. #1

    Attn: stats-mongers - old proteins

    We are now prepared to accept uploads of previous proteins for credit after an update. However, we wish to discourage people from staying on a smaller protein because it is faster - if we switch from a small to a big protein, for example. What if we only gave half credit for the previous protein, but you could upload it until the next protein change? I.e. you could always upload the current protein, or the previous protein for half credit. Anything older would still be refused. Only the current protein would be kept/ranked for energy/RMSD purposes, of course.

    I want your comments on this and if acceptable I can implement it for the next release.
    Howard Feldman

  2. #2
    Senior Member KWSN_Millennium2001Guy's Avatar
    Join Date
    Mar 2002
    Location
    Worked 2 years in Aliso Viejo, CA
    Posts
    205
    Your proposal is acceptable... but I read it as a change being made only as a concession to the stat-lovers.

    It would be even more useful if you could accept the old results AND make use of them for the SCIENCE aspect as well. If you are only accepting the workunits to bump up a person's stats, it is kind of meaningless. It would be better if you could continue to accept the "best" proteins submitted, to add to the pool of "good candidates" for the analysis phase of the project.

    Just my .02 pence.

    Ni!

  3. #3
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    You need a definite cut-off for the science side of things.

    I would rather see full credit for 24 hours after a protein switch, use those units, and deep-six any arriving after that. That way, everyone should be willing to let their boxen roll over naturally. That would allow plenty of time for even the slowest machines on the slowest dial-ups and we won't get the huge traffic jams in the download aisle.

    ...and Happy Birthday!

  4. #4
    Psycho Penguin dnar's Avatar
    Join Date
    Dec 2001
    Location
    Perth, Australia
    Posts
    111
    I am sure the many dial-up and other users that are not online during the updates would appreciate this, Howard. I also agree, you need to specify a reasonable cut-off period.

    It's a good proposition!

    BTW: I still think it stinks that you can move your stats around, but that's another issue...

  5. #5
    Ok, so everyone is OK with the half-credit part? And a limit of 48 hours after the changeover should be fine, after which ONLY the current protein will be accepted? Any strong objections to that? I will have to work out a way to track the time accurately.
    Howard Feldman

  6. #6
    I would say:

    a) full credit for the first 6 hours, so that even normal users can turn on their PCs and modems at normal times. We don't all live in the States, you know.

    b) half credit for 24 or 48 hours after the switch.

  7. #7
    Keeper of the Fridge PY 222's Avatar
    Join Date
    Jul 2002
    Location
    San Jose, CA
    Posts
    2,706
    I would agree with stappel, but would like to extend the time: 24 hours for full credit, and half credit from 24 to 48 hours. The cutoff would be 48 hours after the initial update of the client.

    That will be sufficient time for everyone, even those that are not in the States.

  8. #8
    I stopped DF for the CASP season because the time difference between the UK and Canada meant that my boxes always lost most of the night after the changeover. (The changeover usually finishes after the end of the working day here, and I can't manually upgrade - I switch on the systems automatically from a single copy of DF around 9pm UK time, some time after I have left for the evening. Having thirty-plus PCs all switching on and then trying to update at the same time killed my company leased line on changeover day and left them mostly spinning their wheels. If I ran them off-net with a morning upload, they wasted the entire night after a changeover.)

    Personally, I would prefer full credit for the old protein for 24 hours after a change (all my boxes would have uploaded by then), but I would also want to know that the results were used (and useful) for the science - after all, that is really what the project is about!

    Ni!

  9. #9
    dismembered Scoofy12's Avatar
    Join Date
    Apr 2002
    Location
    Between keyboard and chair
    Posts
    608
    I think if you allowed full credit for 24 hours after the changeover, this should be ample time for boxen to change over by themselves regardless of their location, and you shouldn't need any half-credit period. Do you think this would be enough to discourage people from hanging on to a fast protein? I guess then there are modem users to think about... Can they fairly be expected to connect sometime within 24 hours of a changeover? I guess you could still do a half-credit period for a few days, but what if the old protein is twice as fast as the new one? Has this happened?
    As for large farms killing bandwidth, I still think it would be a neat idea to be able to get the update from some sort of local proxy. Maybe you could have the client check wherever it normally does for updates; if there was an update, the server could give the client the filename of a separate file (signed and all) that the client would look for, by default on the DF site, but alternately at a user-configurable location. It should be secure enough to be workable because of the signed-update architecture. You can do that during all that free time you must have in your cushy job :bs:

  10. #10
    Ancient Programmer Paratima's Avatar
    Join Date
    Dec 2001
    Location
    West Central Florida
    Posts
    3,296
    Originally posted by Scoofy12
    I think if you allowed full credit for 24 hours after the changeover, this should be ample time for boxen to change over by themselves regardless of their location, and you shouldn't need any half-credit period.
    I agree. This should be plenty!

    Edit: Oops! I voted twice. Sorry.

  11. #11
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    That would be damn good... 24hrs, full credit.

  12. #12
    Psycho Penguin dnar's Avatar
    Join Date
    Dec 2001
    Location
    Perth, Australia
    Posts
    111
    Originally posted by IronBits
    That would be damn good... 24hrs, full credit.
    Damn fine idea, 24 hours full credit.

    There has to be a way to spread the auto-updating machines out more over time; I see an ongoing problem where too many clients attempt to download the new client (automatically) at once.

  13. #13
    24 hours full credit would immensely help those in other parts of the world and would greatly encourage people to let the auto-update run its natural course.

    Also, 24 hours is not enough to make "cheating" and staying on a faster protein intentionally worthwhile.

    a) if they guess wrong, they get ZERO credit, and...

    b) they would have to go to all the effort that people currently go to trying to get full credit during a changeover, and I think most people here will tell you that it isn't worth it.

    The impact of having 24 hours leeway during a changeover would probably be pretty large in terms of network traffic and congestion as well.

    We should not have to sit at home to make sure we don't lose tens of thousands of credits during a changeover (I am talking about structures generated before the changeover or between the changeover and the completion of the current data set on that client machine). That is all that really matters to most of us.

    24 hours with full credit should be enough time for foreigners to not get screwed and for even slow computers to finish one data set. This 'feels' fair to me, at least.

  14. #14
    Fixer of Broken Things FoBoT's Avatar
    Join Date
    Dec 2001
    Location
    Holden MO
    Posts
    2,137
    Originally posted by MAD-ness
    24 hours full credit would immensely help those in other parts of the world and would greatly encourage people to let the auto-update run its natural course.

    I agree - 24 hours to get changed over with no penalty; after that you are cut off, no half credit.
    Use the right tool for the right job!

  15. #15
    Junior Member
    Join Date
    Mar 2002
    Location
    Grand Rapids, MI
    Posts
    15
    I agree with what everybody else seems to be saying - 24 hours after the update to get full credit, then cut it off. Also, it would make me feel better if you could get some use out of the late-returned proteins.

  16. #16
    Junior Member
    Join Date
    May 2002
    Location
    Germany
    Posts
    5
    On this matter.......

    We had to take 99% of our machines out of this program because, being sick, we lost double the work units we had done so far. All the other machines are now running SETI.

    All points have to be counted in a fair project.

    Perhaps Brian can appreciate how much power he lost with us alone, and also that the German T-DSL line only allows one connection at a time to send data out. If another unit tries to upload while the line is busy, the whole net stops and all the completed units are lost. Look how many of the 11,000 users are finished because of this and other reasons, and gone for good.

    It seems to us that the project is counting on enough people signing up anyway and running their PCs to death, so it does not really matter.

    Count all the work done, regardless of time, and bring this project up to a standard that is fair to all users.

    The day may come when you are counting on your own, if you keep treating members as fools. Sorry if this sounds harsh, but after losing millions of points, you see the result.

    Cheers

  17. #17
    Senior Member
    Join Date
    May 2002
    Location
    New Jersey USA
    Posts
    115
    My .02 cents
    24-hour full-credit cutoff, IF the changeover(s) are not on or near a weekend. This lets people using work systems with manual uploads/updates avoid making a special trip to the office.

  18. #18
    Ok, thanks, I get the idea. So you will have 24 hrs to get full credit for the previous protein, and up until 48 hrs you will get half credit (just in case). This will take effect AFTER the next protein update (so the current protein will still be invalid as soon as we change over).

    I would like to try to address the problem of people with lots of computers that only run the software nightly. If you start it on all your machines with a cron script or something, all at the same time, they will all download at the same time and wastefully clog the network. Having an intermediate local server to put the update on would be an interesting idea, but can anyone propose exactly how this might work with minimum hassle?
    Howard Feldman

  19. #19
    A caching proxy, perhaps?

  20. #20
    Fixer of Broken Things FoBoT's Avatar
    Join Date
    Dec 2001
    Location
    Holden MO
    Posts
    2,137
    You are saying set up a proxy server system like RC5 and SETI have, I believe.

    I haven't done either; IronBits from Free-DC is running an RC5 proxy, perhaps he can comment.
    Use the right tool for the right job!

  21. #21
    Bottom of the Top Ten TheOtherZaphod's Avatar
    Join Date
    May 2002
    Location
    zone 5 west
    Posts
    100
    Adding my $.04:

    First, thank you for your efforts to resolve this issue.

    Second, under the category of what I personally am looking for: all my machines are connected full time. All I really need is for the server to accept the results for the previous protein when it hands out the new one. Since you have been tailoring the number of structures between uploads to the size of the bugger, all I really need is about 6 hours or so to accommodate my slowest machine (a PII/450). I would greatly prefer full credit to half, but a grace period of 6 to 8 hours would be adequate, and 24 would be extremely generous. In fact I would prefer that it be kept on the short side, to discourage people from delaying their switchover when the target size increases.

    There it is, and again, thanks.
    Don't Panic

  22. #22
    dismembered Scoofy12's Avatar
    Join Date
    Apr 2002
    Location
    Between keyboard and chair
    Posts
    608
    24 hours should be plenty of time for any computer with an always-on or on-demand connection to run its course and update automatically.
    Here's a suggestion for modem users: since during CASP season we know when updates will be, why not add a feature that could schedule an upload? Most modems are, or can be configured to, dial when some application requests a net connection. Alternately, this could even be added to DFgui rather than the client itself, since the ability to manually upload is already there.

    Originally posted by Scoofy12

    As for large farms killing bandwidth, I still think it would be a neat idea to be able to get the update from some sort of local proxy. Maybe you could have the client check wherever it normally does for updates; if there was an update, the server could give the client the filename of a separate file (signed and all) that the client would look for, by default on the DF site, but alternately at a user-configurable location. It should be secure enough to be workable because of the signed-update architecture. You can do that during all that free time you must have in your cushy job :bs:

    Perhaps to elaborate on this a bit: for security reasons, I think the master location checked by the client to see whether any updates are available should remain the same - the distributedfolding.org servers. However, there could be a proxy server. This proxy server could, periodically or manually, check for updates. If one is found, it could download the update from the master, or even from another proxy. Then, when clients find that there is an update, rather than downloading it from df.org, they could instead check a (user-configurable) proxy location and download the (signed) update, thus conserving bandwidth over any kind of shared line.
    Likewise, they could maybe even upload WUs to this proxy, which could then queue them for sending to DF, either manually or at a scheduled time. This wouldn't be very difficult, especially considering that there are no unique WUs to download to clients.
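    Purely as an illustration of the proxy idea above, here is a minimal Python sketch of what the mirror side might look like. The URLs, file paths and polling interval are all made up for the example - the real update location and signature format are whatever the DF client actually uses:

        # Hypothetical proxy-side mirror: keep a local copy of the latest signed
        # update package so the rest of the farm downloads from the LAN instead
        # of from distributedfolding.org. All names here are placeholders.
        import shutil
        import time
        import urllib.request

        MASTER_URL = "http://www.distributedfolding.org/updates/latest.tar.gz"  # placeholder
        LOCAL_COPY = "/var/www/html/df/latest.tar.gz"  # inside the local web root

        def mirror_once():
            # A real version would compare timestamps first instead of
            # re-downloading every time; clients still verify the package
            # signature themselves, so a bad mirror copy is simply rejected.
            with urllib.request.urlopen(MASTER_URL) as remote, open(LOCAL_COPY, "wb") as out:
                shutil.copyfileobj(remote, out)

        while True:
            try:
                mirror_once()
            except OSError:
                pass          # nothing published yet, or master unreachable
            time.sleep(1800)  # poll every half hour; a manual run works too

    Queueing finished WUs on the same box for a scheduled upload would work the same way in reverse, but is left out of the sketch.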


    "It takes a certain special kind of ego to quote oneself"
    -My friend Justin from high school

  23. #23
    Target Butt IronBits's Avatar
    Join Date
    Dec 2001
    Location
    Morrisville, NC
    Posts
    8,619
    I will defer any comments on how to set up a proxy server to Dyyryath, no matter how busy he is.
    I'm limited to using a proxy server package that someone else has already written.
    I can help ya get it running, but I have no idea how it 'really' works.

  24. #24
    I see a few ways to implement a proxy. One would be a 'peer-to-peer' methodology - publish the client out to one of the popular peer-to-peer networks, then use one of the APIs for interfacing with them. This is probably the most engineering-intensive, but it practically guarantees smoothing out of spiky bandwidth.


    Another would be a central file drop/cache, which is what I do for the machines that I have set up not to autoupdate. Effectively, I automagically wget the download page every five minutes around the proposed change time and check the date/time in the "* Most recent client version released _____, 2002" string. If it's later than what I've cached, the script downloads the new client and touches a file on an NFS mount. A simple cron script on each node checks the date/time on the touched file once a minute; if it has changed within the sample period, the node removes its .lock file, pulls down the new client, unpacks it and runs it... I've found this to be far more reliable than the autoupdate, which seems to puke about 1/3 of the time for me.
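    For what it's worth, here is a rough Python equivalent of that wget/cron watcher, just to make the flow concrete. The download page URL and the NFS flag file path are placeholders, and the string being matched is the one quoted above:

        # Poll the download page, watch the "Most recent client version released"
        # line, and touch a flag file on an NFS share when it changes. Each node's
        # cron job watches the flag file's mtime and re-fetches the client.
        import re
        import time
        import urllib.request
        from pathlib import Path

        DOWNLOAD_PAGE = "http://www.distributedfolding.org/download.html"  # placeholder
        FLAG_FILE = Path("/mnt/nfs/df/new_client_available")               # shared via NFS

        last_seen = None
        while True:
            page = urllib.request.urlopen(DOWNLOAD_PAGE).read().decode("latin-1", "replace")
            m = re.search(r"Most recent client version released\s+([^,<]+),\s*2002", page)
            if m and m.group(1).strip() != last_seen:
                last_seen = m.group(1).strip()
                FLAG_FILE.touch()   # first hit after startup also fires; harmless here
            time.sleep(300)         # every five minutes around the expected change time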

    Finally, there is a true caching proxy - but those have their own inherent issues and I'll let someone else flesh that out.

  25. #25
    Here is the method I've decided upon for now, which seems logical to me. I wrote a 'daemon'-like program (it will be built for all the supported OSes) which checks our server for updates every 5 minutes. When a new version is detected, it downloads it for whichever OSes you specify in the config file and places it in a directory you specify. This local machine must be running a web server, and the files must go somewhere in the web server directory tree.

    Then on your local client, you can make a local config with the URL for your local web server. If present, then when an update is available the client will download the update from the local URL and not our server. If it fails to get it from the local URL (perhaps because the daemon hasn't downloaded it yet), it will keep trying every couple of minutes until it succeeds; it will never contact our server directly for the update (as long as the local config file is present).
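    Just to illustrate the client-side behaviour described above (the config file name and URL format here are my own guesses, not the real client's):

        # Sketch: if a local config naming a mirror URL exists, fetch the update
        # from there, retrying every couple of minutes, and never fall back to
        # the main server. The downloaded package would still be signature-checked.
        import time
        import urllib.request
        from pathlib import Path

        LOCAL_CONF = Path("local_update.conf")   # hypothetical config file name

        def fetch_update():
            if not LOCAL_CONF.exists():
                return None                      # no mirror configured; use the normal path
            mirror_url = LOCAL_CONF.read_text().strip()
            while True:
                try:
                    return urllib.request.urlopen(mirror_url).read()
                except OSError:
                    time.sleep(120)              # mirror may not have the file yet; retry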

    I hope to have this in place in time for Tuesday's update. Any comments or ideas are welcome until then, especially from Millenium Guy or Jodie or DATA, who have big farms.
    Howard Feldman

  26. #26
    I'm guessing that both the daemon and the client will still do the signature checking before accepting the new version, correct?

    Jeff.

  27. #27
    Thanks Brian!

    Let us know when you're ready to start testing it. I can test on Win2k, Irix, Solaris and Linux (2.4.x).

    My hack seems to work ok, but I'll happily run an 'official' methodology...

  28. #28
    I am not knowledgeable enough to comment on the methodology, but the concept is one that is very good (and needed).

    I know that TSF lost one of our main crunchers because he was tired of managing weekly updates on a home farm consisting of what I imagine must be 20+ computers (he has 25 GHz, so I am guessing about 20 machines), and I know that having the 4 or 5 machines I sometimes run here at home autoupdate is a pain, because it clogs my puny DSL for quite a while doing multiple downloads.

    Anyway, this is one further step toward making the project friendlier to those who have larger networks of computers running DC clients.

  29. #29
    Psycho Penguin dnar's Avatar
    Join Date
    Dec 2001
    Location
    Perth, Australia
    Posts
    111
    Originally posted by Brian the Fist
    This local machine must be running a web server, and the files must go somewhere in the web server directory tree.
    This is great. One external download from the project for an entire farm! Nice.

  30. #30
    Senior Member KWSN_Millennium2001Guy's Avatar
    Join Date
    Mar 2002
    Location
    Worked 2 years in Aliso Viejo, CA
    Posts
    205
    Originally posted by Brian the Fist
    Here is the method I've decided upon for now, which seems logical to me. I wrote a 'daemon'-like program (it will be built for all the supported OSes) which checks our server for updates every 5 minutes. When a new version is detected, it downloads it for whichever OSes you specify in the config file and places it in a directory you specify. This local machine must be running a web server, and the files must go somewhere in the web server directory tree.

    Then on your local client, you can make a local config with the URL for your local web server. If present, then when an update is available the client will download the update from the local URL and not our server. If it fails to get it from the local URL (perhaps because the daemon hasn't downloaded it yet), it will keep trying every couple of minutes until it succeeds; it will never contact our server directly for the update (as long as the local config file is present).

    I hope to have this in place in time for Tuesday's update. Any comments or ideas are welcome until then, especially from Millenium Guy or Jodie or DATA, who have big farms.
    It would be great if it could be a fileshare instead of a webserver, but in my case either would be fine. Upon thinking about this, probably anybody with a farm could spare a little space on an existing webserver. I like the concept and will be glad to test it when you have a version ready.

  31. #31
    Psycho Penguin dnar's Avatar
    Join Date
    Dec 2001
    Location
    Perth, Australia
    Posts
    111
    Originally posted by KWSN_Millennium2001Guy


    It would be great if it could be a fileshare instead of a webserver, but in my case either would be fine. Upon thinking about this, probably anybody with a farm could spare a little space on an existing webserver. I like the concept and will be glad to test it when you have a version ready.
    An HTTP server would be best, I feel - much less setting up required; for starters you don't need to handle mounts on each node. Then again, not everyone has an HTTP server running on a system.

    For me though, I already have Apache running on my server, so it's all too easy.

    Howard: I feel the 5-minute check period for the daemon may be a little short... How about making it configurable?

  32. #32
    Ok, I think I have a 'beta' of the daemon ready and functional.
    You'll have to suffer through Tuesday's update without it, but for the NEXT update you should be able to make use of it, assuming there aren't any serious bugs.
    It will be added to the Download page on the main site, along with instructions on its use.

    And I think if anyone has a compute farm, they're bound to have at least one machine with a web server running (or could install one just for this purpose - Apache is pretty trivial to set up, even on Windoze).
    Howard Feldman

  33. #33
    I managed to bumble my way through an Apache install a couple of weeks ago on a system running WinXP. It wasn't that hard.

    Now, configuring everything else to work with Apache and installing all the other junk that you end up wanting most of the time is a different story, but actually installing Apache was pretty painless if I remember correctly.

  34. #34
    The Cruncher From Hell
    Join Date
    Dec 2001
    Location
    The Depths of Hell
    Posts
    140
    Hmm, sounds great.
    My main farm won't be able to use it, as there is no way Apache can make it there, but for home it will be most helpful.

    This really rocks.
    BTW, unless it barely uses bandwidth, could the app not hit the server every 5 minutes? That's a bit excessive, especially since changes won't happen very often once CASP is over.

    --Scott

  35. #35
    "hitting" the server should be a matter of sub-1k-byte of transfer.

    Of course, none of us have gotten to sniff it yet - but my imagination suggest a request-file-doesn't-exist-ack. That's only a few hundred bytes.


    GET /filename HTTP/1.1
    Host: distributedfolding.org

    HTTP/1.1 404 Not Found

    Or along those lines... Real bits don't come into play until a file is actually there for download.

  36. #36
    Junior Member
    Join Date
    Jul 2002
    Location
    Northfield, MN
    Posts
    7


    About the whole 100% for the first 24 hours and 50% until 48 hours...

    I know that this has probably been discussed before, but why not go with a weighted stats system? I'd recommend giving a stats point for every 100 residues calculated instead of a point for every structure examined.

    If you were to implement this, you would not have to worry about people scumming for stats with each protein switch. It would also be a lot easier for teams to track their progress and improvements in the project. It seems like it would be really easy to implement, too, as long as your servers have access to the number of residues associated with each protein. The conversion time from structures to stats units would be trivial (a matter of 2 multiplications).
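    As a concrete example of the arithmetic (purely illustrative):

        # One stats point per 100 residues folded, instead of one per structure.
        def weighted_points(structures_done, residues_in_protein):
            return structures_done * residues_in_protein / 100.0

        # 500 structures of a 60-residue protein and 150 structures of a
        # 200-residue protein are then worth the same: 300 points each.
        print(weighted_points(500, 60))   # 300.0
        print(weighted_points(150, 200))  # 300.0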


    Crunch Something

  37. #37
    Psycho Penguin dnar's Avatar
    Join Date
    Dec 2001
    Location
    Perth, Australia
    Posts
    111
    Another solution to the server congestion during changeover is for the next update to be issued progressively over a number of days, and then activated when required.....

    It makes more sense to have the next update on our systems several days beforehand, issued at the server's leisure. The client could then be changed over at our end in response to a server-issued command.

    Just an idea....

    BTW: For large farms, one download from the project server for the entire farm still makes sense.

  38. #38
    Originally posted by Brian the Fist
    ...This local machine must be running a web server, and the files must go somewhere in the web server directory tree...

    ...Any comments or ideas are welcome until then...
    Sorry, I'm a bit late on this one - we got hit by the Frethem virus and another, more destructive, unidentified one using the same propagation mechanism, and it has taken days to clean up (six systems needed a reformat).

    The single local update idea is really good, but the web server version is not workable here as all the web servers are effectively off-site and locked down.

    Is there any chance of providing simple file access rather than web access? There are no suitable servers on-site that could be used as a web server, but there are three fully available file servers (one of these already controls the scheduling of the 30-odd DC machines, so it would be ideal for doing the updating).

    In the meantime I've put in an order for a new T1 link, so the bandwidth congestion at this end should be lessened - combined with the 24 hour grace period I think I might be able to live with it...

  39. #39
    Do you have a windows box on the desktop someplace?

    There are trivially tiny web server-ettes that will live on a desktop quite happily...

  40. #40
    Bottom of the Top Ten TheOtherZaphod's Avatar
    Join Date
    May 2002
    Location
    zone 5 west
    Posts
    100
    Ok, I'll bite...for instance?
    Don't Panic
