Input wanted on planned new algorithm from users' perspective

**frozenchosen** · 01-16-2003, 03:24 PM

I like zeroing out the stats. Make a page that tells us who the winners were for phase 1 so that we can all go and pay our respects and admire their fine accomplishments. Maybe even make it a hall of fame with statues and stuff. Then let's get on with the new stuff. The thing that is most interesting about this project is that we are assisting in validating NEW algorithms that are becoming more sophisticated. I like the idea that my contribution, small as it is, is unique and is not being duplicated by five or six other people around the world.

**Brian the Fist** · 01-16-2003, 03:32 PM

As a side benefit, I can avoid any Int4 overflows I haven't already caught as well!

Ok, well sounds almost unanimous that zeroing the stats is a good idea, but also keeping a page with final Phase I stats available for viewing (just top 10, or all the final team pages too??).

Keep in mind this won't be happening for a while still though. I'd expect it to be ready for beta testing in a couple weeks at the earliest.

**Aegion** · 01-16-2003, 03:38 PM

Originally posted by Brian the Fist
As a side benefit, I can avoid any Int4 overflows I haven't already caught as well!
Ok, well sounds almost unanimous that zeroing the stats is a good idea, but also keeping a page with final Phase I stats available for viewing (just top 10, or all the final team pages too??).

Keep in mind this won't be happening for a while still though. I'd expect it to be ready for beta testing in a couple weeks at the earliest.

I would strongly advise making it everyone. The final rankings will be an important piece of information for many people.

**AMD_is_logical** · 01-16-2003, 03:39 PM

On sneakernetting:
It looks like what you really want are complete sets of 50 generations. Could you make it so that a script could easily identify completed sets on a no-netting client, and then move the files for those sets out of the client's directory? Right now I need to shut down the client (by deleting the .lock file), because one of the .bz2 files is in use when the client is running.

With the current client, I can just remove all work and upload it later. With the new client I will only want to remove completed sets, so I will need a way for my script to remove only those sets, and not the partly completed set that the client is working on.
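To make the request concrete, here is a rough sketch of the kind of script this would allow. It assumes a purely hypothetical naming scheme (`set_<id>_gen_<n>.bz2`) and that a set counts as complete once generations 1 through 50 are all present; the real client's file layout may look nothing like this.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Hypothetical naming scheme: set_<id>_gen_<n>.bz2 (NOT the real client's layout).
NAME_RE = re.compile(r"^set_(\d+)_gen_(\d+)\.bz2$")
GENS_PER_SET = 50


def find_completed_sets(filenames, gens_per_set=GENS_PER_SET):
    """Return sorted ids of sets that have every generation 1..gens_per_set."""
    seen = defaultdict(set)
    for name in filenames:
        m = NAME_RE.match(name)
        if m:
            seen[int(m.group(1))].add(int(m.group(2)))
    wanted = set(range(1, gens_per_set + 1))
    return sorted(sid for sid, gens in seen.items() if wanted <= gens)


def move_completed_sets(client_dir, outbox_dir, gens_per_set=GENS_PER_SET):
    """Move files of completed sets to outbox_dir; leave partial sets alone."""
    client_dir, outbox_dir = Path(client_dir), Path(outbox_dir)
    outbox_dir.mkdir(parents=True, exist_ok=True)
    names = [p.name for p in client_dir.iterdir() if p.is_file()]
    done = set(find_completed_sets(names, gens_per_set))
    moved = []
    for name in sorted(names):
        m = NAME_RE.match(name)
        if m and int(m.group(1)) in done:
            shutil.move(str(client_dir / name), str(outbox_dir / name))
            moved.append(name)
    return moved
```

The partly completed set stays where it is, so the client could keep running while the finished work is carried off for upload.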

Also, the client should be able to handle power failures. Currently the client tends to leave an orphaned .bz2 file sitting around.

On stats: I vote for keeping the current stats. I think, however, that the new client should be normalized so as to produce somewhat higher numbers (for a given amount of CPU-hours) than the old client did. This would reduce the lead of the older teams, (in terms of CPU-hours), but in a positive way (Hey, look at the great production I'm getting), rather than in a negative way (What the ... all those months of hard work ... GONE).

Also, for a new team, stomping one's way up the ladder is where the fun is. Resetting the stats ruins that fun.

**vsemaska** · 01-16-2003, 03:48 PM

Originally posted by MAD-ness
It seems obvious that those with a lead would be the ones with the greatest motivation to remain the status quo.

I don't see why. If they have the CPU horsepower to pull out ahead, they'll just do it again when the stats are zeroed.

Vic

**tpdooley** · 01-16-2003, 03:54 PM

When I started folding in June, I joined a team with about 30% inactive accounts. After running through many of the inactive accounts, and those that only folded for a few hours a day - I noticed that I needed more horsepower to catch up to those at the top. And I'll have a higher score than the person currently in first place on that team within a month or two.
For some of us, having a long list of those to overcome in the score race is a wonderful challenge and really got us hooked.

**m0ti** · 01-16-2003, 04:00 PM

Not being much of a stats-ho myself (and I'm the one putting out all the stats for my team. What I lack in terms of statslust I more than make up for in team spirit!), it isn't too much of an issue for me.

However, if the stats are reset, make sure to maintain the stats for everybody (perhaps nicely organized ala Dyyryath's or statsman, etc) so that nobody feels that their contribution to phase I of the project wasn't "important" enough to warrant it appearing in the stats for the phase.

**Kibosh** · 01-16-2003, 05:09 PM

I have two thoughts on the whole process.

first, if you don't count for the first 5000 structures I am going to lose out a lot more than some people since I run a fleet of K6-2s that take a good long time per structure. (unless of course they are generated so fast it doesn't matter). If it takes multiple hours on a P3 or P4 though then my K6-2s will be severely handicapped since they will take much longer to get to the "counting stage." And yes, I am a stats whore... if the project doesn't have stats then I don't run it.

second, I think the stats should be zeroed or you will get people upset. If you declare that the project is ending (be sure to give a date far enough in advance that people can ramp up and try to "win") and then declare a winner that seems like the best option. That way, nobody can get pissed off since the stats changed. You can keep the usernames and teams and everything but start a new project (DF2 or some such)...

Just my thoughts.

Kibosh
aka SphincterLord

**FoBoT** · 01-16-2003, 06:08 PM

Originally posted by Aegion
I would strongly advise making it everyone. The final rankings will be an important piece of information for many people.

yes, it needs to include everyone that participated.

remember how many times the little guy has had the best RMS? that is one of the things about this project that is so great, truly ANYONE can be the one that finds the needle in the haystack, and with DF , you can actually see this in action

**Dyyryath** · 01-16-2003, 06:19 PM

I don't mind the thought of resetting the stats. It sounds like the project is legitimately moving into a different 'phase' anyway.

The new algorithm sounds good, though I probably won't have anything constructive to add about the scoring until I see it in action. That said, I'd LOVE to beta test a new client. I've got all kinds of different machines/architectures to test on as well.

The benchmark idea mentioned above is also a good one.

I think the addition of a 'current protein' total to the project stats is going to be a good thing. reader50 over at Team MacNN has already been trying to track this and it's really a pretty neat feature.

I considered trying to track it myself, but decided that even the small amount of inaccuracy caused by the overlap time was annoying enough to me that I didn't want to bother with it.

Will you be presenting the format for the new stats output when you have people running the beta client? It'd be nice for those of us building third party stats to have a little lead in time with the new output before you officially move people to the new client.

**cygnussphere** · 01-16-2003, 08:25 PM

Looks like it's time for an opinion poll?

**Spankin Partier** · 01-16-2003, 08:31 PM

As a member of a fairly new team that's still rising through the ranks (currently ranked 18th), I'd like to see the stats remain in place. By resetting the stats, the passing that occurs currently will stop, as current production will dictate the new positions. Our team has traveled through various DC projects (Seti, Genome, and Folding) and always celebrated the passing of a team. We've also enjoyed the challenge of the occasional upcoming team that threatens our position. By resetting the stats, things will become very static. But if they are not reset, I would hope that the new points per CPU hour will be approximately equal to the original points per CPU hour. This wasn't considered when the Seti project switched to ver 3 of their client. A lot of members were lost because of that.

On a different note, most of my farm is located at work. I have these machines set up to run only after hours. If I stop a client after say the 20th generation, will it continue again on the same generation?

Thanks,
Have Fun!

**prokaryote** · 01-16-2003, 08:43 PM

If you're going to zero the stats for phase II of the project, may want to consider the team jumping issue as well then? Or not.

**lemonsqzz** · 01-16-2003, 09:23 PM

Yeah.. you can strip me of my billions..

I'm still rich in friends here... It'll be fun to do all over again... and let's try to clear out the zero-ever producers from the system...

I think the reward should be based on CPU time somehow.. then everybody is equal... points scaled to the speed of the CPU that crunched them.. more for the slower ones since that is more painful to run.. I'll go with the flow though..

**Insidious** · 01-16-2003, 09:32 PM

I think DF Phase II is a great idea. I would like to suggest you keep the Phase I results intact and viewable.

I think there might be a feeling of futility created if all those months of work just vanished. The obvious thought would be, so is Phase II worthless also?

So you have one more hat in the ring for zero points to begin the next DF phase.

-Sid

**Spankinmonkee** · 01-16-2003, 09:49 PM

I'm open for the good of the project as long as it will let me continue to nonet..if not Spankies toast as I run all my systems @home here and all but one are full time noneters

Also will there be any change as far as Cpu load stress over the current algorithm

Spankie

**Tawcan** · 01-16-2003, 09:53 PM

First of all, I got to say this project sure is user oriented. I like it.

(You know what I mean if you've run Folding/Genome before).

Could someone explain how exactly the generation 50 works? From what I've read it seems like you need to keep systems running for a long time in order to reach generation 50?

If so, it doesn't seem to be fair to those who aren't crunching 24/7. I like the resume idea, but does this mean you resume the generation work as well?

So basically you crunch a protein but you can get different generations?

As for nonet. It would be nice to have similar nonet ability as the current client. It's easier for those of us who can't connect to the net all the time or have limited access to internet.

As for stats, either way works for me. I'm on the same team as Logical and Spankin Partier. We're a pretty new team and we're currently #6 production I think. Most of our fun came from spankin errr passing teams. Like SP mentioned, if the stats reset it might not be as fun for our team. Resetting the stats would be good in the sense everyone starts fresh and new teams will have a chance to pass some older teams.

If the stats were to stay I would like the scoring system to be approximately the same as now in terms of CPU power vs time.

**jaydee116** · 01-16-2003, 10:01 PM

I am also a part of the Killer Barbarian Frogs. I started DF about a month ago and have been climbing in my own team ranks and we have been climbing in the overall teams pretty steadily. Still I am not opposed to starting fresh. It all has to end sometime. We can make this round a learning one and see what happens so we don't make the same mistakes next time. If we do keep going with the current stats our team can enjoy the battles upcoming, but still we will reach our peak sooner or later. Not zeroing is just putting it off a little. Anyone who thinks their progress would be wasted by starting the stats over should remember what the project is really about. I joined DF because I felt SETI was becoming quite useless with the redundant units being so high. If I ever feel this project is a waste of my resources I will move on. Stats just makes it more fun.

**MAD-ness** · 01-17-2003, 03:02 AM

It is good to see some of the Killer Barbarian Frogs posting here on the forum.

Welcome guys.

I think that it is VERY important to have a stats system that is accurate, consistent and reflects the processing power contributed.

Projects that do goofy stuff (like uh, I think it might be UD) or reward based on some sort of 'curve' are a major turn off for many of the more hard-core DC people.

If people want to compare different CPU architectures, they can do the math themselves, comparing the units/time ratios. Trying to build this type of stuff into the stats makes things overly complicated and less accurate, IMO.

**m0ti** · 01-17-2003, 03:39 AM

I agree with MAD-ness.

Stats should be kept respective of the number of folds produced and not the amount of CPU time utilized. That would be a separate stat I think people would be interested in (and how long on average per fold... sort of like SETI and their total crunching time, average time per WU thing).

Actually, after giving it some more thought I think that ending off phase I, saving all the results to be viewed in a convenient way, and resetting for phase II could be a very good idea. I think very few people will leave DF as a result, particularly due to the close attention paid to what the users want, and a lot of the users are voting to move on to phase II in this thread. Plus, it can be a boost for recruitment: come in and join up while the project's starting and you can climb high and fast. Should be of particular interest to anybody who's remotely interested in stats (everybody likes to be in the top X overall ASAP).

Thanks for the time and effort to listen to your community, Howard!

**Michael H.W. Weber** · 01-17-2003, 04:53 AM

1. Our team will have a big problem if it would no longer be possible to move the results from one computer to another for upload. Similar to FoBot, we have quite a couple of machines (if not most) that don't have a network connection at all. I believe that this project will run into general problems if upload is bound to the computer on which the results have been generated (and participation in this project is comparably low, anyway).

2. I don't think that it would be a nice idea to delete the present stats (if this is considered at all). This would look as if we never contributed to this project. I have no problem with creation of a new stats system as suggested - although getting zero credit for initial generation is NOT acceptable (I pay the electricity bill and I want to see where the money has gone). However, the old stats must at least be kept somewhere for the record.

3. Any (preliminary) news on the CASP results? I wonder why such drastic changes of the algorithm are undertaken (not satisfied with the current results?).

Michael.

**Vato** · 01-17-2003, 05:12 AM

Originally posted by Brian the Fist
Ok, well sounds almost unanimous that zeroing the stats is a good idea, but also keeping a page with final Phase I stats available for viewing (just top 10, or all the final team pages too??).

Keep a snapshot of the position and points for every team and user.
Everyones contribution is worthwhile, and what's the cost of it?
It certainly avoids the project looking like NEO.

**HaloJones** · 01-17-2003, 06:06 AM

As a nearly top-tenner, my immediate reaction is :shocked:

If we start again, my production will put me around the #10 slot from where I will never deviate. Wow, that'll be exciting. Maybe I could turn off all the systems every couple of weeks to let #11 pass me and then blast past again. Yup, that'll be just grand.

I have goals at the moment. I want to pass you Brian and some others up there. I want to try to fight off the guys coming up behind me. If you zero the stats, the only movement will be from people who like to hoard and dump.

Every protein has been different. Some people have only crunched for DF on the fast proteins and skipped the slow ones; personally, I have crunched every protein irrespective of its speed. So the scoring of the new algorithm will be different - why is this time so different that we have to have new stats? Why wasn't this done for each new protein then?!

Score the new system in a similar way to any of the proteins we've been working on and no-one should have any reason to feel that they will be any less able to catch others than before. New users already see huge mountains ahead of them but don't demand that the people who've helped make this project should be forced to start over.

You can easily see from the benchmark threads here how many structures a range of machines can create per hour/day. Time the new algorithm over a day and work out how many points each "generation" should get. So we're no longer getting points per structure! So what! We could be getting points that equate to what has gone before.
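As a back-of-the-envelope illustration of that calculation (the numbers are made up, and `points_per_generation` is just a name picked for this sketch, not anything in the project's scoring code):

```python
def points_per_generation(old_structures_per_day, new_generations_per_day):
    """Points to award per generation so a day's work scores the same as before."""
    if new_generations_per_day <= 0:
        raise ValueError("new_generations_per_day must be positive")
    return old_structures_per_day / new_generations_per_day


# Made-up benchmark: a machine that produced 12,000 structures/day with the
# old client and finishes 8 generations/day with the new one.
print(points_per_generation(12_000, 8))  # 1500.0
```

In practice you would average the ratio over a range of benchmarked machines, since the old and new algorithms will not scale identically across CPUs.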

Keep the current stats!

**TheOtherPhil** · 01-17-2003, 06:23 AM

I don't think it will be that big of a problem....with your production Halo, you will reach number 10 pretty soon anyway and will be in the same position that you are objecting to. This is not just a protein change but a substantial change to the client, code and the entire project. This is indeed a different "phase" of the project and I believe that the stats should be zeroed.

**FoBoT** · 01-17-2003, 07:11 AM

Originally posted by Michael H.W. Weber
3. Any (preliminary) news on the CASP results? I wonder why such drastic changes of the algorithm are undertaken (not satisfied with the current results?).

Michael.

i will look for the quote from howard (or he can jump back in here and repeat it himself), but howard posted that changes are needed to refine the process to be more competitive (ie get better results) with the other processes that turned in results to the CASP thingy (ok, i admit it, the science is all way over my head)

any way, i will go look for the quote, hold on

here we go, i think you can get the idea from these 3 quotes

We will post a complete 'report' as soon as we have gone through all the relevant numbers. While I can tell you we did not 'win', we appear to have done reasonably well considering our algorithm is still in its infancy compared to many of the other participants. Clearly brute force alone is not sufficient to predict protein structures but in combination with a 'smart' algorithm we believe it can far exceed what anyone else will be able to do without DC. As we continue to test new enhancements to our current basic algorithm and scoring functions, we expect to see great improvements in our ability to sample structures and pick out low energy ones.

As for the new method, I am busy coding it now as we speak, the new infrastructure I'm putting in will allow us to try several different ideas, all sharing the common idea of iteration. i.e. make some structures locally, then take the best one(s) and do something with it to then make more structures, etc. Thus there will be a lot less uploading to the server but occasional downloading as well. Most changes will be invisible to the user though. We'll keep you posted as it develops.

Our current plan is to improve the sampling from 'random' structures to doing it more intelligently. We shouldn't need to sample 10 billion structures to get what we are getting, if we do it a bit more intelligently. Massive computational resources will still be required though

it seems to me (in my simple way) he is saying that improvements need to be made to get better results
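For anyone who thinks better in code, the "iteration" idea in the second quote can be pictured with a toy loop like the one below. This is only an illustration of the general pattern (generate a batch, keep the best, generate the next batch from it); the energy function, the step size and every name here are invented for the sketch and have nothing to do with the actual DF client.

```python
import random


def iterate(energy, start, generations=50, per_generation=100, step=0.5, seed=0):
    """Toy generate-and-select loop: lower energy is better."""
    rng = random.Random(seed)
    best = start
    history = [energy(best)]
    for _ in range(generations):
        # Make a batch of new candidates near the current best one.
        candidates = [best + rng.uniform(-step, step) for _ in range(per_generation)]
        candidates.append(best)  # keep the current best so energy never gets worse
        best = min(candidates, key=energy)
        history.append(energy(best))
    return best, history


# Stand-in "energy" with its minimum at x = 3 (purely illustrative).
best, history = iterate(lambda x: (x - 3.0) ** 2, start=0.0)
```

Compared with sampling every candidate at random from scratch, each generation starts from the best result so far, which is the sense in which far fewer samples should be needed.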

for those involved from the beginning (about 1 year ago), howard has always intimated that this was a work in progress and that the science may dictate changes, after all what is the goal? to figure out a better way to do this folding stuff, right?

**HaloJones** · 01-17-2003, 07:32 AM

Originally posted by TheOtherPhil
I don't think it will be that big of a problem....with your production Halo, you will reach number 10 pretty soon anyway and will be in the same position that you are objecting to. This is not just a protein change but a substantial change to the client, code and the entire project. This is indeed a different "phase" of the project and I believe that the stats should be zeroed.

Maybe I will reach #10 but maybe those coming up behind will beat me to it. The point is that the stats can change, there are people competing.

You zero the stats and within a week, everyone will have taken their place in the ladder and there will be no competition! Stats aren't everything BUT they provide interest and competition. If we put everyone back to zero the only competition will come from new joiners.

I realise I am in the minority here and will probably lose but I really don't think you all realise the inertia that will happen.

**jkeating** · 01-17-2003, 09:28 AM

I'm not a top 10er - a top 100er last time I looked, however i've been with the project from the early stages... I'll add my voice in here and say "ok to restart the stats"...

**Brian the Fist** · 01-17-2003, 09:34 AM

It's great to see so many first timers posting here, your input is especially appreciated (get tired of hearing from the same people all the time...)

A few people wondered about this so to clarify - whenever you stop the client and restart it later, it will continue exactly where it left off, on the same structure number and the same generation. You do not have to complete 50 generations before you upload, it'll upload once per generation. You will still be able to copy files to upload from a different machine, the exact files will just be a little different (instructions will be given in the readme).

I'll try to get some graphical summaries of our CASP performance posted today or next week on the Results pages.

The argument for zeroing the stats has nothing to do with the fact that the algorithm/scoring is changing, it has to do with the fact that we are beginning a new phase of the project, and can give some newcomers a chance to get a decent ranking. It is a convenient point at which to 'level the playing field' so to speak.

I will not erase 0-production users though, because who knows why they're at 0? Maybe they're buffering oodles of work and just preparing to upload it all? Or are having trouble getting through a firewall? Or whatever. My goal in the 'official' stats is to deliver to you all the raw information exactly as we have it. Then thanks to all your 3rd party stats dudes, you can filter out whatever you don't like and make it more presentable.

I think that addresses most of the issues mentioned here.

**bwkaz** · 01-17-2003, 09:35 AM

Halo -- the question that jumps to my mind is, why isn't that "inertia" effect happening now? Especially with all the people that have been at the top for however long?

Or do I just not get what you're saying? (possible...)

You're saying "you zero the stats and within a week there will be no competition", but how does that make any sense when you're saying earlier that "stats can change"?

If this sounds overly confrontational, sorry, it isn't meant to, I'm wondering why you think that.

**thezo** · 01-17-2003, 11:27 AM

To add my $.02:
I think looking at this next phase as a whole new challenge will be good for the project. I am excited about a new competition with zeroed out stats and that we may get better results. I admit that I don't fully understand the changes as of yet, mostly because I don't totally understand the underlying science of the whole project.

I don't think the idea of not being able to move the cached results from one machine to another for upload is the best choice. From what I understand - this is a common practice and removing this option will imo drive some producers away. This isn't an issue for me, so I could be wrong.

**Grumpy** · 01-17-2003, 11:32 AM

As a fully qualified Stats Ho for OCworkbench, I am somewhat saddened by the impending loss of our current work. Nevertheless, the Project is why we are here, and as long as we have a new stats system that allows for internal and external competition similar to the existing one, I see no problem. Our Team thrives on the competition, and benefits from the Benchmarking aspect also. The loss of competition would have an impact on output, as to how much I cannot hazard a guess.

**scruff35** · 01-17-2003, 11:34 AM

I'm new to DF. I have been with the AMDMB Killer Barbarian Frogs for a month now. I'm also just like jaydee116, I just switched from AMDMB's Seti team. I was there for almost 2 years. Now I'm with the DF project for the same reasons. I really like this project and will continue crunching for it no matter what. I would like to say that I'm for the reset of the stats, with the older stats still active for viewing for all that have contributed. Just want to end this on a positive note. I think the ones that stick around after the stats are zeroed will be the ones that care deeply about this project. And hey, wouldn't it be fun passing everybody again, for the big/small crunchers out there?

**FoBoT** · 01-17-2003, 11:39 AM

Originally posted by Brian the Fist
(get tired of hearing from the same people all the time...)

now you've gone and hurt my feelings

**FoBoT** · 01-17-2003, 11:42 AM

Originally posted by Grumpy
As a fully qualified Stats Ho for OCworkbench,

where do i get my certificate?

**bguinto1** · 01-17-2003, 02:20 PM

When are you targeting this new algorithm? Will it be after this current protein is finished? Also, are you intending for the current client to automatically update to this new algorithm or will we need to download a new application?

Thanks.

**PY 222** · 01-17-2003, 02:26 PM

Just putting in my vote.

I vote for a reset of the current stats but keep all the records and display it under Phase 1 completed or something like that.

It would be good as I might be able to show my grandchildren one fine day about what I did when I was with DF and what position I was in

Heck, it might just motivate them to run DF as well, if everyone is still around

**vsemaska** · 01-17-2003, 03:12 PM

Originally posted by bguinto1
When are you targeting this new algorithm? Will it be after this current protein is finished? Also, are you intending for the current client to automatically update to this new algorithm or will we need to download a new application?

Thanks.

Howard said there'll be a beta test phase before its full release. So I assume that there'll be 1 or 2 more proteins that'll use the current algorithm.

Vic

**tpdooley** · 01-17-2003, 03:48 PM

Halo:
I've been here since June, and production rates have never seemed to be constant for the members of OcWorkbench, or for the DF community at large. (Brian's 500 mil hasn't increased dramatically in the 6 months I've been here, for example - it looks like he was allowed more time on the local MegaWomp during the start of the project than he is now). The leading folder at OcW when I started had 25% of the folds for the whole team; and was running a horde of machines that made catching up with him with my 1 system look impossible. I've since passed him when I setup a host of machines to increase my production rate - and he's now dropped down to 1? machine.
Between losing hardware, losing interest in the project, or gaining hardware/increased interest in the project - everyone's production rates seem to change over time. And they'll most likely continue to vary in the future.

**HaloJones** · 01-17-2003, 04:31 PM

The "inertia" isn't happening now because so many people started (and gave up) at different times. I'm closing in on ZaphodB directly above me and have just waved by Michel as he passed me while bguinto1 sped past a little while ago. These changes are because we started at different times.

Now picture the new system.

Lemonsqzz immediately is #1. No-one can or will ever threaten that place.
bguinto1 is immediately #2. He will get further behind Lemonsqzz every update.
Michel is immediately #3. He will get further behind bguinto1 every update.
etc. etc.

Unless one of them gives up or another one ramps up massively, the top n crunchers will be rigidly in place. At the bottom, it will be different as people have fewer machines and perhaps don't run 24/7. Upgrading to a better processor or running overnight can change a smaller producer's output significantly enabling relative acceleration. When you have 10s of computers running 24/7, it is much harder to impact output - particularly over the short-term.

For this reason, the top 100 or so are likely to be boringly static.

Whatever, since no-one else objects to a reset, my argument is moot. I reluctantly withdraw my objections and will attempt to put in as much effort to DF.II as I have to DF.I.

**Grumpy** · 01-17-2003, 08:17 PM

University of Imaho - Statistics Without Morals 101
