Emergency Free DC down to 39 actives



Fozzie
04-24-2004, 05:28 AM
Sirens and klaxons

I know it was a nightmare changeover but guys and gals this is the lowest number of actives I have ever seen from our DF crew.

Looks like all is sorted, if anyone that fancies this protein wishes to fire up a boxen or 2.

It usually is the slow ones that help us regain our lead over the rival hordes.

PCZ
04-24-2004, 08:00 AM
What do you mean sorted ?

Anteraan
04-24-2004, 08:30 AM
I'm not shown as active because I can't upload any of my 131 AA gens. My boxen are all still (sort of) working on my backlog of buffered 58 AA gens. Rest assured I am crunching 131's though.

This is not at all sorted -- that's the problem. I suspect there are others in the same boat with me.

Richard Clyne
04-24-2004, 08:41 AM
Fozzie, where the f**k do you get "looks like all is sorted" from? I have been crunching non stop, and so far how many points have I got for this new protein? Answer = Nil, Nothing, Zip, F**k all. Take your pick. As for the previous protein's points that were buffered during the server problems just before the changeover, I have received zero credit for them.

We are told all is well. Well I say to Howard, "get a grip of your project, people's goodwill only stretches so far".

Paratima
04-24-2004, 08:47 AM
Being the prudent sort that I am, I dumped all my backlog before the switch. :D

After the dust settled yesterday, I downloaded all-new packages from DF & did clean installs. Just checked on the downstairs XP2100+, which is probably typical: working on gen 54, gens buffered 46. No sign of the infamous "ticket.txt", just plain old DF, running thru 100 steps, then twiddling its electronic thumbs for 5 minutes or so, till it times out & goes on to the next. No errors, btw.
:sleepy:

Until the backlog gets "sorted", I'm gonna keep a few boxen on DF just to watch the progress. The majority stay on other jobs where they can get hot & sweaty and do real work.

Just as a subnote, I just checked my mirror stats and it looks as though the download came off correctly this time. I served up 4.6 GB on Thursday and 5.10 on Friday. So at least SOMETHING worked correctly. ;)

Tamari
04-24-2004, 08:51 AM
restarted with fresh clients, my boxen seem to be doing well now

Fozzie
04-24-2004, 09:01 AM
:blush:

Sorry guys, as I am uploading buffered work and all is going swimmingly, I thought everyone else would be OK too.

Guess I was wrong.

:bonk:

Got 4-5 mill to upload so I'll just get along and do that, and come back when I'll have my foot back outta my mouth.

Tamari
04-24-2004, 09:05 AM
I dont have any buffered work, and my clients are producing 100% new work. Check out my sneakers graph.

Richard Clyne
04-24-2004, 09:07 AM
Originally posted by Paratima
Being the prudent sort that I am, I dumped all my backlog before the switch. :D



It's okay for some. I did not have a backlog until the server went down just before the changeover.

I just want the buffered work off my computers and I can go try some other project, but I can't even get that. There could come a time when I say enough is enough and cut my losses.

Moogie
04-24-2004, 09:08 AM
It's just frustrating. It took a long while for anything to upload, and not much has to this point. I'm hoping things will pick up soon, but I'm not really holding my breath.

Tamari
04-24-2004, 09:11 AM
Who's guru? Him and I seem to be the only ones producing decent numbers. Reset those clients folks. Start fresh! Join the party :|party|:

Moogie
04-24-2004, 09:14 AM
Originally posted by Tamari
Who's guru? Him and I seem to be the only ones producing decent numbers. Reset those clients folks. Start fresh! Join the party :|party|:

Uh...I hate to break this to you but I already have done that. I'm trying to join but thus far, I've been denied entrance to the party.

Tamari
04-24-2004, 09:29 AM
Sorry, I should say install fresh versions of the old client, not the new one... Just woke up :sleepy: This is what I did last night and the numbers are flying in today. Your results may vary. Last update I was #2 in all of DF!, not including guru which I suspect is uploading cached work now, but even if so I'm #3:elephant:

MerePeer
04-24-2004, 09:39 AM
Me:
-- Completely fresh last night.
-- No errors in error.log.
-- Some points getting in, but does not reflect work completed.
-- 63% generations buffered across boxen.
-- Until confidence (and point award mechanism) rises I'm running DF as background priority to other projects.
-- Fumble(s) by project mgmt + super long protein + constant buffering = good time to apply M$ security patches, play with Linux installs, check out other projects.

-- Free-DC is the place to be.
:cool:

Richard Clyne
04-24-2004, 09:59 AM
MerePeer,

Can't argue with you. Seems sensible action to me.

Anteraan
04-24-2004, 10:05 AM
Originally posted by Paratima
Being the prudent sort that I am, I dumped all my backlog before the switch. :D
I would have loved to do that, but with the delay, I was forced to be out of town from Tues. PM to Fri. AM. :(

Originally posted by Paratima
No sign of the infamous "ticket.txt", just plain old DF, running thru 100 steps, then twiddling its electronic thumbs for 5 minutes or so, till it times out & goes on to the next. No errors, btw.
:sleepy:
Look for "receipt.txt". That's the infamous one. :) Good to hear the mirror worked well.

devzero
04-24-2004, 10:05 AM
Most of my machines are not intel based and the new client blows up with a sig11 with -qt.
None of them have consoles.

Cant hardly wait til the fixed client comes and I can run around and manually update all the machines.

FoBoT
04-24-2004, 10:26 AM
i won't try to talk people into doing something they don't want to (the "Free" part of Free-DC still applies)

if it is too frustrating , then take a break while this gets sorted out or stick a second project on to take up most of your load or something

howard has the personality such that he doesn't mind winging it. he is going to let this go as is this week (he posted as such in the readme thread) and see what happens

our lead isn't going anywhere as long as the system is clogged up, so those that are frustrated shouldn't feel bad about doing something else or whatever for a while


however, i would recommend anyone with clients that are still crunching the 58 protein to use the "Ghost" workaround to force all of that old work up, or just delete it all (along with the entire directory) and start over

i know some have posted that they have done fresh installs and are still getting errors in the log, the only thing i can recommend is a totally new install, put it in a directory with a different name, don't overwrite the old directory, maybe something isn't in the new install or something

the first rule of trouble shooting is to try something, if you don't try making a change, the problem is likely to never go away

i'll just say good luck to all, i understand the frustration, i have DF on so many boxen i really don't know, around 60 , and i would have liked to check them all/fix all of them yesterday before leaving for the weekend, but it just wasn't possible. it will all sort itself out somehow

i don't know how to help with the 908 or 910 errors other than to start over, but if you aren't getting errors, we'll just have to wait for the backlog to clear

when i setup my laptop after getting home from work last night, it was working on gen. 12 with 12 gens. buffered. i left it connected all night with -qf and this morning it is currently doing gen. 50 with only 20 gens. buffered. so during the night it made up ground and got some work uploaded.

if you look at the stats, the #'s for the current protein are going up more with each update, guru RK'ed me during the night and is near 4 million. if i could find the boxen that are returning work and look at them i would but its the weekend, i gotta do stuff around the house. i'll check in here throughout the day, but i urge my teammates to relax and each do what you need to to get through this without getting too many feelings hurt

have a nice day! :)

Moogie
04-24-2004, 10:45 AM
Perhaps I'll try a fresh install and see what happens then. The only gens buffered are 131's, and I've got no error messages.

I'll have to see what happens over the weekend.

Welnic
04-24-2004, 10:49 AM
Originally posted by devzero
Most of my machines are not intel based and the new client blows up with a sig11 with -qt.
None of them have consoles.

Cant hardly wait til the fixed client comes and I can run around and manually update all the machines.

You could try running with >/dev/null instead of the -qt switch.
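For anyone scripting this across many headless boxes, here is a minimal sketch of the same idea in Python: start a process with no terminal attached and throw all of its output away. The `launch_quiet` helper is hypothetical, and the command you pass is up to you. Nothing here is taken from the DF client's documentation, and whether the client tolerates running this way is not something this thread confirms.

```python
import subprocess
import sys


def launch_quiet(command, workdir=None):
    """Start `command` with stdin/stdout/stderr all pointed at the null device.

    `command` is a list such as ["./some_client"]; the real client's name and
    switches are deliberately not assumed here.
    """
    return subprocess.Popen(
        command,
        cwd=workdir,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # POSIX: run in a new session, detached from the terminal
    )


if __name__ == "__main__":
    # Stand-in command for demonstration: a tiny Python one-liner whose output is discarded.
    proc = launch_quiet([sys.executable, "-c", "print('nobody will see this')"])
    print("exit code:", proc.wait())
```

This is roughly what `>/dev/null` does in a shell script, with stdin and stderr silenced too. If the client insists on a real console, neither approach will help.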

FoBoT
04-24-2004, 10:50 AM
if you got the 131 and no errors, don't reinstall, it is working correctly

only reinstall if you are still doing 58 or if you are getting errors in the log (908 or 910)

if you are getting 910 errors, it might be due to not deleting EVERYTHING/killing the whole directory

if the slow upload is bothering you and you have no errors, then add a second project or something that will take some/most of the cpu load to reduce your production rate on DF
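With a lot of boxen, the "any 908 or 910 errors in the log?" check above could be scripted. This is only a sketch: the thread names the error numbers but not how they look in the log, so it simply searches for 908 or 910 as standalone numbers. The helper names and the log path you pass in are assumptions. It does not cover the other reinstall condition (still crunching the 58 protein).

```python
import re
from pathlib import Path

# Error numbers mentioned in this thread; the exact log wording is unknown.
ERROR_CODES = ("908", "910")


def find_error_lines(log_text, codes=ERROR_CODES):
    """Return the log lines that contain any of `codes` as a standalone number."""
    pattern = re.compile(r"\b(" + "|".join(re.escape(c) for c in codes) + r")\b")
    return [line for line in log_text.splitlines() if pattern.search(line)]


def has_reinstall_errors(log_path):
    """True if the log file exists and mentions one of the error codes."""
    path = Path(log_path)
    if not path.exists():
        return False  # no log at all means no recorded errors
    return bool(find_error_lines(path.read_text(errors="replace")))
```

A match only means "go look at this box"; a stray 908 in some unrelated number field would also trigger it.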

Moogie
04-24-2004, 10:55 AM
Ok..I'll just wait around for a bit. I can't say it's really bothering me...just frustrating a bit but I can certainly deal with it. It sounds as if it will get through eventually.

Thanks for your help FoBoT.

gopher_yarrowzoo
04-24-2004, 11:39 AM
Sorry Im out... totally, no more.. I've had it, since I don't run any other projects sorry guys my lousy 7Ghz wouldn't help much anyway...

FoBoT
04-24-2004, 11:42 AM
this is a radical change from the instant gratification that we are all used to

we'll have to wait and see how the technical side goes, but from a "PR" standpoint, howard should have prepared his volunteers better for such a radical change in the behaviour of the system (assuming he knew it would be like this)

when a system's behaviour changes so much, it is natural to assume it is broken

if it is working correctly, but badly, then howard will have to address that, there really isn't anything for us on the client side to do

willy1
04-24-2004, 11:42 AM
I just found 8 W2K server boxes that finished gen 0 and just stopped. foldtrajlite was still running, with 0% CPU. These were fresh installs just before I left work yesterday. No errors in the log.

Just flat quit crunching.

Maybe I should do the same.

Moogie
04-24-2004, 12:23 PM
Originally posted by FoBoT
this is a radical change from the instant gratification that we are all used to

we'll have to wait and see how the technical side goes, but from a "PR" standpoint, howard should have prepared his volunteers better for such a radical change in the behaviour of the system (assuming he knew it would be like this)

when a system's behaviour changes so much, it is natural to assume it is broken

if it is working correctly, but badly, then howard will have to address that, there really isn't anything for us on the client side to do

That's very true. I'm hoping that perhaps, Howard and company will learn something from this (assuming he knew this time). I think that most of us, if we were pre-warned, would be a lot less unhappy.

Moogie
04-24-2004, 12:25 PM
Originally posted by gopher_yarrowzoo
Sorry Im out... totally, no more.. I've had it, since I don't run any other projects sorry guys my lousy 7Ghz wouldn't help much anyway...

Sorry to hear that gopher but, falling into the way of FDC, you are "free" to make that choice :)

Any amount of power is useful, no matter how small. I hope that one day you will see your way to come back.

Richard Clyne
04-24-2004, 12:32 PM
Originally posted by gopher_yarrowzoo
Sorry Im out... totally, no more.. I've had it, since I don't run any other projects sorry guys my lousy 7Ghz wouldn't help much anyway...

Sorry to hear this, but 7Ghz can be very useful in many other projects.

devzero
04-24-2004, 01:19 PM
Originally posted by Welnic
You could try running with >/dev/null instead of the -qt switch.

I was trying to avoid having to make another 'trip' to each machine to alter the script. I know I will have to visit each soon to update the binary.

And I tried using > dev/null but the client just exits. execing from a shell still gets a signal 15.
Won't run from cron. So unless I keep a terminal open to all the machines....

Tamari
04-24-2004, 01:28 PM
I was hoping someone could decode this for me. I know you all have more knowledge about these files. In my progress.txt I get:

Building structure 21 generation 102
179 until next generation
1 generations buffered
Best Energy so far: 15.722

So notice how there are 200 structures per generation? Everyone else seems to have 100 per. Is this why all my clients are doing fine? What am I missing here?
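The 200 comes from adding the structure being built to the number still to go (21 + 179). A rough sketch of pulling that number out of a progress.txt automatically is below. It assumes the file always uses the wording shown above, and the function name is made up for illustration.

```python
import re


def structures_per_generation(progress_text):
    """Estimate structures per generation from progress.txt-style text.

    Adds the structure currently being built to the count remaining,
    the same sum used in the post (21 + 179). Returns None if the
    expected lines are not found.
    """
    building = re.search(r"Building structure (\d+) generation (\d+)", progress_text)
    remaining = re.search(r"(\d+) until next generation", progress_text)
    if building is None or remaining is None:
        return None
    return int(building.group(1)) + int(remaining.group(1))


sample = """Building structure 21 generation 102
179 until next generation
1 generations buffered
Best Energy so far: 15.722"""

print(structures_per_generation(sample))  # 21 + 179 = 200
```

A result of 200 versus 100 is a quick way to tell which client a given box is running, going by the numbers reported in this thread.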

Fozzie
04-24-2004, 01:40 PM
then that is the second client they put out to slow the influx of completed gens to the server and possibly get better RMSDs.

Nothing to worry about, but keep an eye on when the deadline for full credit ends. I think many of the Cows are still crunching the old protein too.

FoBoT
04-24-2004, 01:54 PM
Tamari , you are running the old client , it was set to do 200 structures per generation

you need to blow that up/delete it all and start over with a fresh installation

the new foldtrajlite.exe file is dated 4/22 i believe and is right at 3 mb (for windows)

make sure the native.val file is also new
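If you have a pile of machines to check, the date-and-size test could be scripted along these lines. This is a sketch only: the 4/22 date and the roughly 3 MB size are the from-memory figures above, the size tolerance is an arbitrary guess, and a file's modification time can change depending on how it was downloaded or unpacked.

```python
import datetime
import os


def describe_file(path):
    """Return (modification date, size in MB) for a file."""
    info = os.stat(path)
    modified = datetime.date.fromtimestamp(info.st_mtime)
    size_mb = info.st_size / (1024 * 1024)
    return modified, size_mb


def looks_like_new_client(path,
                          expected_date=datetime.date(2004, 4, 22),
                          min_mb=2.5, max_mb=3.5):
    """Rough check against the figures quoted in the thread.

    expected_date and the MB range are assumptions, not official values.
    """
    modified, size_mb = describe_file(path)
    return modified == expected_date and min_mb <= size_mb <= max_mb
```

Treat a "False" as a reason to look closer, not proof of an old client. The log message about doing the old protein is a better sign.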

Tamari
04-24-2004, 03:47 PM
roger that. I thought the old client would automatically update itself, or is that just the proteins? I dont know if I want to spend another 4 hours installing a new client on 100+ machines though.

I dont complain much, but from what I've seen in the past few weeks is that this software sucks...

Perhaps we could all use this client until the new one is fixed/forced upon us? At least we can gain a little against the cows for once :crazy:


on an unrelated note, i think this is the first emoticon that ever actually made me laugh out loud :drink:

MerePeer
04-24-2004, 03:55 PM
Originally posted by Tamari
...use this [58] client until ... gain a little against the cows

This may be the best idea yet! Perhaps that's where those big dumps are coming from without having points accum under "current protein". :eek:

Anteraan
04-24-2004, 04:15 PM
Well, Howard did post deadlines to the full credit/half credit value of 58AA proteins, and the full credit deadline passed about 4 hours ago, IIRC (Check out his "Readme" thread/post for the exact times). That said, I hope I can get the 1000+ gens of 58 AA out of my last box before I lose it all, so to speak. After all the crap involved with this changeover, I almost feel lucky to have gotten the big flush that I did out of my other two boxen.

Richard Clyne
04-24-2004, 07:04 PM
Originally posted by Anteraan
I hope I can get the 1000+ gens of 58 AA out of my last box before I lose it all, so to speak. After all the crap involved with this changeover, I almost feel lucky to have gotten the big flush that I did out of my other two boxen.

Best of luck to you.

I just managed to clear my backlog of 58 AAs before the revised 24hr deadline, and I guess some of the 131 AAs as well. According to the stats I have been given full credit for all uploaded work - so far.

rofn
04-25-2004, 01:08 AM
@willy

i have exactly the same problem on some of my linux boxen....client just gets to sleep and doesn't do anything...the .lock file is gone too then but the processes still show up....

btw. i'm still crunching, even my points are goin nowhere...dunno but i uploaded my old work on the 58er on the day of the changeover with the no update trick but i got almost no points for it....was about 350k should be over 1mill...but anyway...who cares for old points these times...if only the new one would get through
:bang: :bang: :bang: :bang:

Dyyryath
04-25-2004, 01:26 AM
I added about 10 new clients tonight. Some on Windows XP, some on Linux. Each was a completely new install. We'll see how things go for the next day or so. If everything looks good, I'll add another 10 or so later in the week...

GHOST
04-25-2004, 06:52 AM
Originally posted by Tamari
roger that. I thought the old client would automatically update itself, or is that just the proteins? I dont know if I want to spend another 4 hours installing a new client on 100+ machines though.

check this file

Non-interactive Auto-update
---------------------------
For your security, you will be asked for confirmation before any updates
are downloaded and installed on your computer. All updates are digitally
signed and so it is fairly safe to always allow digitally signed updates
to be installed. A malicious user would have to compromise the private
encryption key in order to "spoof" an update and make it appear to come
from us. Thus you have the option of allowing the client to automatically
accept and install digitally signed updates (and automatically refuse
unsigned ones). By default this feature is disabled and you must give
your consent for downloads to begin, because we feel the choice should be
that of the user. To enable this feature and allow automatic updating
without your intervention required, simply create a text file in the same
directory as the client, called "autoupdate.cfg" with the digit 1 on a
single line. If you change your mind, simply remove the file, and it will
revert back to its default behaviour the next time you run it.
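
For 100+ machines, creating that file by hand gets old fast. A minimal sketch of doing it from a script is below. It follows the readme text above literally (a file named autoupdate.cfg containing the digit 1 on a single line, in the client's directory). The function names and the directory you pass in are assumptions for illustration.

```python
from pathlib import Path

CFG_NAME = "autoupdate.cfg"  # file name as given in the readme excerpt


def enable_autoupdate(client_dir):
    """Create autoupdate.cfg containing the digit 1 on a single line."""
    cfg = Path(client_dir) / CFG_NAME
    cfg.write_text("1\n")
    return cfg


def disable_autoupdate(client_dir):
    """Remove autoupdate.cfg so the client reverts to asking for consent."""
    cfg = Path(client_dir) / CFG_NAME
    if cfg.exists():
        cfg.unlink()
```

Per the readme, the change takes effect the next time the client runs, so a restart would presumably be needed after creating or removing the file.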

willy1
04-25-2004, 09:55 AM
Last 12 hours - not a single point credited - from 25+ boxes crunching away on the 131 protein.

It's well and truly broke now.

Dyyryath
04-25-2004, 10:02 AM
Yup, none of my new clients has managed to turn in any work at all over the last 10 hours. I'm shutting them down until this mess gets sorted...

MerePeer
04-25-2004, 10:07 AM
How funny would it be if they (again) wiped out all activity and decided to revert to the previous no-ticket client which we know would work perfectly with this slow protein? Good news: open road again. Bad news: lost weekend activity.

I think next time they might want to pick a holiday weekend + take the week after on vacation!! :bath:

Tamari
04-26-2004, 11:25 AM
ok, well its monday morning, and according to my dcmon, I have over 60 clients "running" with 1 or 0 gens buffered (so they are uploading). But according to sneakers I have 0 points last update. All these clients are auto-update so I dont know whats going on.

Paratima
04-26-2004, 11:27 AM
They may be running the old "58" protein. Autoupdate only seems to work when the stars are correctly aligned. :cool:

FoBoT
04-26-2004, 11:42 AM
i found several boxen running the old protein, my log file had the "you are doing the old protein" message

i wiped them all out to start fresh and added a second DC project/client to slow them all down (except the really slow boxen, 600 Mhz and under)

PCZ
04-26-2004, 12:07 PM
The ticketing system was supposed to help with the uploads not slow everything to a crawl.
This protein is a slow one imagine what it would be like with another 58.

As for slowing down you can't get any slower than stopped, which is exactly what I have done.

As far as I am concerned Howard has screwed up for the last time.

gopher_yarrowzoo
04-26-2004, 04:49 PM
Well If you really, Really Need me back I'll come back but if It p*sses me off, im gonna go to D2OL or something Chinasaur gave me an invite to join as it were, currently have 2Ghz on Lifemapper, 2.2 Off and the 2.6 running purely as web connection..

FoBoT
04-26-2004, 05:24 PM
with all the teams backed up, there is no need for anyone to have much Ghz on DF

i have only my slower boxen and one group of faster PC's that i don't have time to switch and/or need a totally hidden/service client that i don't have time to fool with right now

there is no need for us/anyone to have massive Ghz when things are so backed up

if it starts flowing fast again, we can move boxen back (if people choose)

no need to sweat :geezer:

Paratima
04-26-2004, 05:27 PM
What ^ he said, G-Y. It's not quite ready for prime time, yet. The DFers are back on the job and (finally) have acknowledged that there's a problem.

Once they get us over the present hump, and I believe they will, then there will be a public spanking :spank: and we'll all feel better. Also, at that point, we'll probably need a bunch of boxen to keep the Cows off.

Watch this space! ;)

Chinasaur
04-26-2004, 05:34 PM
So WTF IS the Official Party Line on what has happened to DF?

Anteraan
04-26-2004, 06:08 PM
..."the 'fiasco' that was the changeover was self-inflicted", as in by the users. I guess that's as "official" as it gets.

It's still a bunch of :bs: and he deserves a good :slap: for that one, although I don't think it would be at all useful for me to put that in a response to him. The "bottom" line -- When you say things that make you look like an :moon: people tend to figure it out on their own.

Moogie
04-26-2004, 06:37 PM
Elena has posted about it as well. She's a bit better, for the most part, with her responses. I don't mean that she always answers the questions asked, but she does a better job of it and is quick to come on and admit if she is wrong. Maybe she should be Howard's PR person. :)

Paratima
04-26-2004, 06:46 PM
Ummm, near as I can tell, the OPL is something like this:
-------------------------------------------------------------------------
Yepper, it's an experimental project. We're not even sure that the underlying premise is valid, but hey, that's why we're doing it. :eek: Well, in response to y'all's input, we decided to try this here "ticketing" approach to keep more boxen working more of the time.

And it's almost working. We have to tune the servers and maybe even fire a couple of them, but that's really all that's needed.

And btw, y'all need to keep a bit cooler about the whole thing, know what I mean? :D
-------------------------------------------------------------------------
I think that's about it, for the time being.

Paratima
04-26-2004, 06:51 PM
Update: I suspect Howard is getting a bit frazzled.


And you took a whole 3 minutes to look in the thread, and read the whole FAQ, right? We have tried to answer people's most pressing questions for now while we continue to test and check if everything is working properly. When will everything work? It is working now. When will it be faster? When we figure out what the bottleneck is and fix it, which is ASAP

The boy seriously needs a trip to IB's fridge! :fridge:

MerePeer
04-26-2004, 07:02 PM
I think this is what we have been waiting to see:


Originally posted by Brian the Fist
The servers are processing the tickets slower than they should, indeed. We are going to get it fixed but it will take a few days to isolate and correct the problem. This is a complex system and not trivial to 'debug'. It sounds to me like the only real problem people are having is that it is buffering more than it is uploading. Once we fix the backend this will allow the buffered stuff to come through.

So no we will not leave it as is, do not worry. As for all the posts, we cannot possibly answer everyone's questions, it has taken me several hours just to read a day's postings. Please see the FAQ we added on the web site (and in this forum) to answer some of your questions and we will add to this if we feel it is necessary. Otherwise just hold tight and let us look at and fix the speed issues. If anyone has issues other than the buffering, let us know, preferably in a new thread with a constructive description of the problem.

We apologize for the inconvenience this may cause for off-line folders but we believe the new system will work much better once all the kinks are ironed out. Hopefully you will bear with us until then, or at least come back later when it is.

Moogie
04-26-2004, 07:03 PM
Originally posted by Paratima
Ummm, near as I can tell, the OPL is something like this:
-------------------------------------------------------------------------
Yepper, it's an experimental project. We're not even sure that the underlying premise is valid, but hey, that's why we're doing it. :eek: Well, in response to y'all's input, we decided to try this here "ticketing" approach to keep more boxen working more of the time.

And it's almost working. We have to tune the servers and maybe even fire a couple of them, but that's really all that's needed.

And btw, y'all need to keep a bit cooler about the whole thing, know what I mean? :D
-------------------------------------------------------------------------
I think that's about it, for the time being.

LOL Paratima! Sounds like this came from someone in my part of the country.

As for Howard...I've learned to take him for what he is. He can be abrasive, sarcastic and abrupt..and those are his finer qualities.

I'm sure most of us have worked with someone like this before (or at least know someone). I tend to turn a deaf ear to the "nasty" parts, and fine tune his message to read some of the stuff that might be helpful. Albeit, he hasn't given us very much good stuff to go on.

But that's the way it is. As a good friend told me the other day, just as I was about to lose my mind, "It is a hobby...if it is stressing you out, take a time out, and come back when things are settled if you so desire. It shouldn't consume your life." Or some such thing like that.

Speaking for myself ONLY (and I am a small cruncher), I'm willing to wait it out a bit. If things don't turn around, I'll most likely go to a different project until they do. DF has always been one of my favorites, and still is, despite the troubles we've been encountering.

Chinasaur
04-26-2004, 07:33 PM
Really?..I figured it must be one of these reasons Howard has always given for DF being fs@#ed up...

1. Are you overclocking?
2. Your RAM is bad.
3. Your HD is out of space
4. I've never heard of this before.
5. Try using /temp
6. Must be your connection
7. Delete all your work and start over. Your filelist.txt is corrupt.
8. Read the FAQ
9. Nothing wrong with the programs I include with DF
10. "Hey! I'm only one person!"

Moogie
04-26-2004, 07:40 PM
Originally posted by Chinasaur
Really?..I figured it must be one of these reasons Howard has always given for DF being fs@#ed up...

1. Are you overclocking?
2. Your RAM is bad.
3. Your HD is out of space
4. I've never heard of this before.
5. Try using /temp
6. Must be your connection
7. Delete all your work and start over. Your filelist.txt is corrupt.
8. Read the FAQ
9. Nothing wrong with the programs I include with DF
10. "Hey! I'm only one person!"

He's used a few of those Chinasaur...but I think they may be standard responses for him, until he figures out what is going on.

Chinasaur
04-26-2004, 07:44 PM
"I have no @#$%ing clue" would work just as well.

Or the truth....which might be the same....

Moogie
04-26-2004, 07:48 PM
Originally posted by Chinasaur
"I have no @#$%ing clue" would work just as well.

Or the truth....which might be the same....

Could be..won't argue with that one...but it also could be that he just doesn't know what the problem is..which he actually did say "this time" (he's not too good about that normally).

Paratima
04-26-2004, 07:59 PM
I'm with Miz Moogie Whatshername on this one. :D Dr. Howard came out & admitted he's got a problem & sez, "Hang on. I'll fix 'er." I can almost picture him rolling up his sleeves and reaching for his bit-wrench. That's good enough for me.

I still remember the bad old days of crunching Genomes At Home and Folding There, Too. Try to get an admission of ANYTHING out of Stonewall Vijay!

Basman
04-26-2004, 09:47 PM
Originally posted by Anteraan
I'm not shown as active because I can't upload any of my 131 AA gens. My boxen are all still (sort of) working on my backlog of buffered 58 AA gens. Rest assured I am crunching 131's though.

This is not at all sorted -- that's the problem. I suspect there are others in the same boat with me.

Far enough for me, I am taking my boxes off this CRAP!!!

How do you mean, the new system will be better :confused: Even during the 58 gen all went pretty well :bang:

Illegal generations all the time on every box, I don't have the time and also don't want to babysit everything all the time, making backups, flushing backups and so on and so on :bang: :bang: :bang: :bang: :bang:

And millions of points down the drain :bang: :bang: :bang: I've had it!!

Basman
04-26-2004, 10:20 PM
I just got a new message in my error.txt saying: "Please delete the F*c*e* up new ticketing system and start over again" :D