Who took all the work



PCZ
09-15-2005, 02:57 AM
Hope it's only temporary

Angus
09-15-2005, 03:22 AM
This is not a good time for this to happen...

Project folks very out-of-town (http://szdg.lpds.sztaki.hu/szdg/forum_thread.php?id=80)

PCZ
09-15-2005, 04:34 AM
Hope they can do something before they leave.

Bok
09-15-2005, 08:06 AM
:swear: :bang:

Luckily I've got all clients attached to LHC and Einstein too..

Bok

Mustard
09-15-2005, 11:25 AM
I guess I probably really shouldn't say much.............. after all, it is a BOINC project.................. :(


Oh well, guess I'll take a detour on seti for a while. Like grab 3 days worth of work so I don't run out for 3 days. Now whether or not it will be working so I can upload the results is a horse of a different color.

black_civic55
09-15-2005, 03:48 PM
hey lexx why not LHC? we're in 10th and seems like a helpful project.

Mustard
09-15-2005, 04:34 PM
LHC doesn't like my systems. And I'm messing with recompiled boinc and clients, which is why seti.


Okay, got the LHC set up instead of the seti on 10 or so systems. Also trying that shoft@home for the AMD_Users team.

PCZ
09-16-2005, 04:21 AM
Sztaki is back :)

They fixed it when they arrived in the states.

Bok
09-16-2005, 07:40 AM
Ok,

switched my preferences back

sztaki = 500
lhc = 5
einstein = 1

It will take a while to get through the backlog I think.
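For a rough idea of what those shares mean, here is a minimal sketch that turns the three numbers above into fractions of CPU time. It assumes shares translate proportionally into CPU time over the long run; deadlines can skew this in the short term. The function name is made up for this example.

```python
def share_fractions(shares):
    """Convert resource shares into fractions of total CPU time."""
    total = sum(shares.values())
    return {project: share / total for project, share in shares.items()}


# Shares as posted above.
shares = {"sztaki": 500, "lhc": 5, "einstein": 1}

for project, fraction in share_fractions(shares).items():
    print(f"{project}: {fraction:.1%}")
# sztaki: 98.8%
# lhc: 1.0%
# einstein: 0.2%
```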

Bok

PCZ
09-16-2005, 08:36 AM
Bok

You will probably end up in shortest deadline first mode.
The LHC and Einstein WU's will run into LTD to get finished on time.

Bok
09-16-2005, 08:40 AM
I know,

it can't be helped, I think it's only fair in this case, with only a 0.5 day cache though it's a short turnaround.

I'm slowly getting adjusted to the way boinc does things, tweaking my settings etc.

When sztaki ran out yesterday, I found I had to change its preferences down, as boinc is not intelligent enough to know that it has to get enough wu's for the other projects. It would not download enough for lhc or einstein to keep going, so you have to keep on top of those a little.

Bok

PCZ
09-16-2005, 08:57 AM
I agree it is a bit tricky managing boinc.

A few times this morning I have seen messages from Sztaki that the project has no work, but retrying a few mins later it downloads work.

I guess the work generator is struggling a bit to keep up with folks filling their caches again.

Bok
09-16-2005, 09:15 AM
I've seen that too.

As an aside, for a lot of my linux boxes I just use this script to make sure it's at least being utilized..
A6401 ~ # ./up.sh
.5
09:21:02 up 22 days, 15:22, 2 users, load average: 2.00, 2.00, 2.00
LINUX .9
2:07am up 28 days, 17:40, 1 user, load average: 1.00, 1.00, 1.00
.27
09:05:12 up 23 days, 20:56, 1 user, load average: 1.00, 1.00, 1.00
.28
13:42:04 up 54 days, 18:46, 0 users, load average: 0.99, 0.98, 0.99
.29
9:12am up 28 days, 17:24, 1 user, load average: 1.00, 1.00, 0.92
.30
13:47:52 up 7 days, 18:50, 1 user, load average: 2.02, 2.01, 1.93
.31
03:42:58 up 30 days, 10:16, 0 users, load average: 1.00, 1.00, 1.00
.32
04:12:41 up 28 days, 17:16, 2 users, load average: 1.00, 1.00, 1.00
.34
08:19:47 up 95 days, 20:52, 0 users, load average: 1.00, 1.00, 1.00
.35
14:03:23 up 5 days, 9:44, 1 user, load average: 1.01, 1.00, 1.00
.36
04:09:59 up 5 days, 17:45, 0 users, load average: 0.95, 0.98, 0.99
.37
11:22:08 up 30 days, 12:49, 0 users, load average: 1.00, 1.00, 1.00
.39
00:02:00 up 51 days, 20:05, 2 users, load average: 1.00, 1.00, 1.00
.40
09:09:47 up 148 days, 15:11, 2 users, load average: 0.99, 0.98, 0.99
.42
01:10:56 up 32 days, 18:13, 0 users, load average: 1.00, 1.00, 1.00
.43
09:12:51 up 114 days, 16:57, 0 users, load average: 1.00, 1.00, 1.00
.46
09:13:04 up 148 days, 16:56, 1 user, load average: 0.00, 0.00, 0.00
.47
09:10:17 up 77 days, 17:28, 0 users, load average: 1.00, 1.00, 1.00
.48
08:26:05 up 30 days, 10:56, 0 users, load average: 1.00, 0.99, 0.91
.49
05:10:08 up 30 days, 11:00, 0 users, load average: 1.24, 1.05, 1.02
.59
06:13:04 up 16 days, 19:37, 0 users, load average: 1.91, 1.95, 1.90
Xeon1 .60
03:20:37 up 22 days, 14:31, 1 user, load average: 2.99, 2.97, 2.95
Xeon2 .62
09:15:34 up 100 days, 14:15, 0 users, load average: 3.99, 3.97, 3.84

Then I jump in if I see load average a lot less than 1.
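The eyeballing step could be automated along these lines. This is only a sketch: it assumes you already have one line of `uptime` output per host (how up.sh collects them is not shown here), and the 0.9 threshold and the function names are arbitrary choices for the example.

```python
import re

# Matches the three load figures at the end of a line of `uptime` output.
LOAD_RE = re.compile(r"load average:\s*([\d.]+),\s*([\d.]+),\s*([\d.]+)")


def parse_load_average(uptime_line):
    """Return the (1, 5, 15)-minute load averages, or None if not found."""
    match = LOAD_RE.search(uptime_line)
    if match is None:
        return None
    return tuple(float(value) for value in match.groups())


def idle_hosts(report, threshold=0.9):
    """Return labels of hosts whose 1-minute load is below the threshold.

    `report` maps a host label to one line of uptime output. Hosts whose
    line cannot be parsed are listed too, since they also need a look.
    """
    idle = []
    for label, line in report.items():
        loads = parse_load_average(line)
        if loads is None or loads[0] < threshold:
            idle.append(label)
    return idle


# Three lines taken from the output above.
sample = {
    ".43": "09:12:51 up 114 days, 16:57, 0 users, load average: 1.00, 1.00, 1.00",
    ".46": "09:13:04 up 148 days, 16:56, 1 user, load average: 0.00, 0.00, 0.00",
    ".47": "09:10:17 up 77 days, 17:28, 0 users, load average: 1.00, 1.00, 1.00",
}
print(idle_hosts(sample))  # ['.46']
```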

I'd be interested to know what other people use? I used to have a full blown monitoring solution (Big Brother) running, but it's just too much considering what I need to know.

Bok

PCZ
09-16-2005, 11:12 AM
Bok

I use BB to monitor all my nodes, just ICMP polls though; no agents are set up.

To monitor and manage the clients I am using boincview.
I can thoroughly recommend boincview to anyone running boinc on a farm.

Bok
09-16-2005, 11:24 AM
ICMP will only show that they are up though, not that they are under load, right?

BB used to be real nice, I had a load of custom scripts running on the clients.

you do boincview over nfs right ?

Bok

Mustard
09-16-2005, 11:41 AM
I use my finger.......... and push the little buttons on my KVM switches. :)

Bok
09-16-2005, 11:43 AM
mmmm,

boincview is very nice, using the GUI_RPC function, no shares or anything and it's working nice with my linux clients..

:woot:

Now to set them all up!!

Bok :cheers:

PCZ
09-16-2005, 11:47 AM
No not over NFS.

I start the boinc clients up with the remote control switch.

./boinc -allow_remote_gui_rpc

The boinc client then listens on an IP port.
So no file sharing needs to be set up.

boincview runs on a windows PC and can monitor any of my nodes that it can reach via IP.

You set boincview up to use network access rather than file sharing.
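If a node does not show up, a quick way to tell whether its client is listening at all is to try a plain TCP connection. The sketch below does only that; it does not speak the GUI RPC protocol. The port is deliberately left as a parameter, since it depends on the client version, and the function names are made up for this example.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


def check_nodes(nodes, port, timeout=2.0):
    """Map each node address to whether the given port accepts connections."""
    return {node: port_is_open(node, port, timeout) for node in nodes}
```

Usage would be something like `check_nodes(["192.168.0.5", "192.168.0.9"], port=...)`, with your own addresses and whatever port your clients listen on.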

Mustard
09-16-2005, 12:39 PM
Since this is about who took all the work and boinc operation.........

Using the linux boinc client 4.19. Assume that you are loaded up on seti for a day or so. To run those out without contacting the server to download any more work units, what do you do? Or is it even possible?

PCZ
09-16-2005, 02:03 PM
I do it with boincview.

third button in from the right. 'disable work requests for the selected project'



Bok you may not have found this yet.
To view STD and LTD right click on the grey projects tab and select them.

Also the default polling interval is 5 secs, that is way too short if you are monitoring a farm.
Boincview will freeze up if you use this setting with a lot of clients.

I set it at somewhere between 60 and 90, randomising a bit so that boincview doesn't try to poll all the clients at the same time.
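The same idea in script form, for anyone polling clients with their own tooling: give each host its own interval drawn from the 60 to 90 second range so the polls drift apart. This is a sketch of the idea only, not how boincview does it internally, and the host labels are placeholders.

```python
import random


def jittered_intervals(hosts, low=60.0, high=90.0, seed=None):
    """Assign each host a polling interval in seconds between low and high.

    Different intervals keep the polls from lining up on the same instant.
    Pass a seed only if you want repeatable values.
    """
    rng = random.Random(seed)
    return {host: rng.uniform(low, high) for host in hosts}


# Placeholder host labels.
intervals = jittered_intervals([".27", ".28", ".29", ".30"])
for host, seconds in intervals.items():
    print(f"{host}: poll every {seconds:.0f}s")
```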

Bok
09-16-2005, 02:24 PM
So what exactly is long and short term debt in this case ?

An indicator for when you are going to go into EDF mode ?

Bok

p.s. got all my boxen hooked up to boincview now..

p.p.s X2 #2 arrived an hour ago, need to go get a PSU and it should be running tonight :)

PCZ
09-16-2005, 02:33 PM
STD is not a problem, LTD is.

A project that has built up too much LTD will stop crunching and will not download new work.

It usually happens because boinc has been running in shortest deadline first mode
and a project has had more than its share of CPU time.
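To make "more than its share" concrete, here is a simplified bookkeeping sketch. It is not the boinc client's actual debt calculation; it only compares the CPU hours each project got against what its resource share would entitle it to. The hours used are invented for the example and the function name is made up.

```python
def cpu_time_surplus(shares, cpu_hours_used):
    """Return, per project, CPU hours used minus its fair share of the total.

    A positive value means the project ran more than its resource share
    entitles it to; a negative value means it is owed time.
    """
    total_share = sum(shares.values())
    total_hours = sum(cpu_hours_used.values())
    surplus = {}
    for project, share in shares.items():
        fair_hours = total_hours * share / total_share
        surplus[project] = cpu_hours_used.get(project, 0.0) - fair_hours
    return surplus


# Shares from earlier in the thread; the hours are hypothetical, as if
# short-deadline work had pushed lhc and einstein ahead for a day.
shares = {"sztaki": 500, "lhc": 5, "einstein": 1}
used = {"sztaki": 6.0, "lhc": 12.0, "einstein": 6.0}

for project, hours in cpu_time_surplus(shares, used).items():
    print(f"{project}: {hours:+.2f} h")
```

In this made-up case lhc and einstein come out well ahead of their shares, which is the kind of imbalance described above.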