
What you really need are 2 servers



Chris Wolfe
06-01-2002, 01:04 AM
Why?

So we could finish uploading our structures to one while the new protein starts on the other. The client updates itself, so bandwidth wouldn't double or even increase. Couldn't the client get its upload info from a server.txt file that's part of an update?

Why?

I went on vacation and before I left I checked the thermometer. 3 billion of 10 billion structures generated. Cool, I put all my machines -df -i f and go on my vacation. I come back and find that we have changed proteins midstream and now I have hundreds of thousands of structures generated that are useless.

That's why! :swear: :bang: :haddock: :spank:

Brian the Fist
06-01-2002, 11:16 AM
In an ideal world, yes, that is how it would work.
As for the sudden change, it was due to the start of CASP, which is a time-critical event (runs May-August). You might not be able to completely trust the thermometer throughout the rest of CASP, but we will always post a warning on the News page and this forum several days in advance of changing proteins. This is because we don't know when new CASP proteins will be released, so we have to go with what they give us. Sorry for any inconvenience.

1fast6
06-01-2002, 12:46 PM
I agree with Chris...

this is the one thorn in this project... that the admins maintain and defend a process that knowingly and willingly discards what we consider good work, even though you may not consider it as such... on this matter, our position is defensible - there is no difference in the data returned before the changeover and the data returned after, except your decision not to accept it... your process expects us to be sitting at our computers (all of them), monitoring your website, and to execute an exchange at the moment a changeover is announced or accept that work will be discarded... this is an indefensible position...

your responsiveness to our concerns and requests up to this point is commendable and unparalleled in my experience with DC projects (and I have been involved in many) and has bought you considerable "good will" with the participants that has allowed us to overlook and accept this situation while the project is in its early developmental stages...

but know that in the long run, there is very little else that will sour participants' enthusiasm more than knowing that the project is throwing data away... that this result is designed and intended in this project makes it even more untenable...

if I volunteer for a charity event, and I'm asked to dig a hole, I'll do it... if I check back in and then am told to fill it in, I will be disheartened and suspicious - but I will do it... but, if I'm asked to do it over and over and the explanation given is "this is how we run this event", then I look for another charity, no matter how nice the organizers are or how good the cause...

I would ask that you review your changeover process to find a way to accept work (or at least give credit) on both old and new proteins for an acceptable period of time...

Brian the Fist
06-01-2002, 10:37 PM
As I just said, and have said in the past, our current server and network design does not permit what you have requested. Remember we are not a big company with lots of $$$. We have limited resources and must make do with the hardware and security measures that we already have in place.

Anyhow, I don't see why you have to sit by your computer waiting for an update - that's what auto-update is for. True, you might leave it running for weeks on end without connecting to the 'net and then connect some time later to find you're too late and we've switched proteins. This is unfortunate, but there's no practical way for us to deal with this at the moment. If this latter case applies to you, all we can recommend is to make sure you upload your data, say, twice a week to minimize any potential waste.

We are also considering setting up a mailing list you could sign up for, in which we would e-mail you a day or two before we change proteins just to give you a heads up. If enough people are interested we can set such a thing up so let us know.

Eaglechild
06-01-2002, 11:43 PM
Brian,

I will restrain my reaction to the comments of some of the more immature and unreasonable persons here. Personally I think you and all the distributed folding staff are doing a heck of a job considering the resources you have on hand. When possible, you have never hesitated to try to accommodate the requests of the participants. The system is not perfect, but then no system is. Still, this is the smoothest running and most responsive DC project on the net. Thanks! As far as the changeover goes, there was plenty of advance notice for those who were paying attention. Life isn't perfect, but you guys have sure tried to make it as smooth as reasonably possible for the majority of us. I am sure if you can find a way to do it better you will.

Thanks for all the hard work, the fun, and for putting up with all the abuse.
:cheers:

pointwood
06-02-2002, 10:23 AM
A mailing list would be great!

Getting a reminder as early as possible about a protein change would be nice.

Another idea: What about having an estimated date on the front page and on the main stats page of the website for the next protein date?

You have the data for how many structures get generated each day and what the goal is. Calculating an estimated date for the next protein should be easy :)

I know this will not work for CASP5 or the latest change, but otherwise...
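
The estimate described above can be sketched in a few lines. This is only an illustration: the function name and the daily rate are made up, and the 3 billion of 10 billion figures are simply the thermometer reading mentioned earlier in this thread. It also assumes the daily rate stays constant, which it won't during something like CASP.

```python
import math
from datetime import date, timedelta

def estimate_changeover(today, generated, goal, per_day):
    """Estimate the date the structure goal is reached (hypothetical helper).

    today     -- date of the current thermometer reading
    generated -- structures generated so far
    goal      -- target number of structures for this protein
    per_day   -- assumed average structures generated per day
    """
    if per_day <= 0:
        raise ValueError("per_day must be positive")
    remaining = max(goal - generated, 0)
    # Round up: a partly used day still pushes the changeover to that day.
    days_left = math.ceil(remaining / per_day)
    return today + timedelta(days=days_left)

# Example with an assumed rate of 350 million structures per day.
eta = estimate_changeover(date(2002, 6, 2), 3_000_000_000, 10_000_000_000, 350_000_000)
print(eta.isoformat())
```

Averaging the rate over the last few days rather than taking a single day's number would probably give a steadier estimate.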

Scott Jensen
06-02-2002, 11:20 AM
First, I'm not having the problems the others are having since I have mine connected to the net 24/7. But curiosity begs me to ask...

Why are the results of old proteins thrown out? Cannot your server tell apart the work units of the old protein from the work units of the new one, then simply put late results for the old protein in the file belonging to the old protein and the work units of the new protein in that one's file? This way there's no loss of work done on the old one. After all, one of those late work units might have a score of 0A.
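
To show what I mean, here is a rough sketch of the sorting step. It is purely hypothetical: I have no idea how the real server stores results, and the record layout (a "protein" field on each uploaded result) is my own assumption.

```python
from collections import defaultdict

def group_results_by_protein(results):
    """Group uploaded results by the protein they were generated for.

    Each result is assumed to be a dict with a "protein" key; late
    results for an old protein simply land in that protein's list.
    """
    grouped = defaultdict(list)
    for result in results:
        grouped[result["protein"]].append(result)
    return dict(grouped)

# Example: two late results for the old protein, one for the new one.
uploads = [
    {"protein": "old", "id": 1},
    {"protein": "new", "id": 2},
    {"protein": "old", "id": 3},
]
for protein, items in group_results_by_protein(uploads).items():
    print(protein, [item["id"] for item in items])
```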

Also, what kind (type, model, etc.) would the second server need to be and how much do they go for these days? Would there be any other related costs of having two servers and if so, what would they be? Lastly, would having two servers serve any other useful purpose for this project?

Brian the Fist
06-02-2002, 11:54 AM
We will not disclose our current network topology or how many servers we have access to. We have been extremely lucky so far to not have experienced any hack attempts or cheaters. The more popular the project gets, though, the more likely we'll draw the attention of such ungainly folks. The less we reveal on this topic the better. Suffice it to say that it's not plausible with our current organization. And even if it were, it wouldn't matter scientifically. The first 5 experiments were to sample 1 billion structures and see what we get, no more and no less. It is important that all the sample sizes of those different proteins are the same so we can make fair comparisons between them. But this is a moot point like I said :spank:

Now can someone suggest some good, free mailing list software, as we've never set one up before and are not sure where to look.
TIA

TheOtherZaphod
06-02-2002, 01:35 PM
The university I work for uses Listserv from L-soft. It is good, but not free. A quick google rendered this option: http://www.cgi-factory.com/maillist/index.shtml which is free. It requires a Unix host with some basic tools, but I'd guess you have access to the required configuration. You know, it is possible that I could get permission to host the list. If you want me to look into it, drop me a line.

By the way, while I agree with Eaglechild that your job isn't an easy one, and that you are doing passing well at it, I do see some room for improvement.

Deriving a more equitable stats unit and smoothing out the changeovers are definitely things to think about/work at. The whole DC model is that people all over the world give you resources for free in exchange for fun and a sense of community and well-being. You need to make it as easy and fun as possible to attract and retain their resources.

Between hardware and power I easily spend a few thousand dollars a year on my DC hobby/habit (If you tell my wife I will deny everything), and that doesn't even begin to take the time spent into account. If stats and competition were all that mattered I'd probably still be running SETI. You have a nice little project here, and I'm having a good time getting it set up, tuned, and tweaked. But we both know that if it got ugly or boring, people like me would be out the door and on to something new in a heartbeat.


My .02

Scoofy12
06-03-2002, 07:02 PM
Another good (and free) mailing list I have used is Mailman http://www.gnu.org/software/mailman/mailman.html . It has an easy web-based interface for both list creators and list users and is mature and well-written in my experience. I remember you saying most of your development was done in linux, so without revealing anything unknown about your network topology, I'd guess that it's a safe bet that you have access to linux servers... and this is a good one for linux :)

1fast6
06-03-2002, 08:40 PM
"the comments of some of the more immature and unreasonable persons here."
I am not used to a project that throws work away, which occurs even when using "auto-update"... I simply stated my position and rationale and asked for a review... I don't think that makes me immature or unreasonable...


"Personally I think you and all the distributed folding staff are doing a heck of a job"
I concur, wholeheartedly. My apologies to the project admins if I came off sounding a bit harsh...

as I said previously...

"your responsiveness to our concerns and requests up to this point is commendable and unparalleled in my experience with DC projects"