Problems with "DEFAULT_*"-results?



the-mk
12-20-2005, 01:44 PM
Hi,

just looked into my BoincView and I saw two "DEFAULT_*"-results in my work list.

They have been running for some hours now and don't have much % done...

Anyone ever seen something like that?

Marky-UK
12-20-2005, 01:56 PM
I haven't seen that, but I also don't tend to watch the Result column - never enough room on the screen!

I have had several R@H WUs that failed after 10-20 seconds in the last 24 hours though, mostly on Intel boxes - not sure if I've had any fail on the AMD boxes...

PY 222
12-20-2005, 02:01 PM
Look at this post by David Kim:

http://boinc.bakerlab.org/rosetta/forum_thread.php?id=719#6908

ABORT ABORT!!!!

the-mk
12-20-2005, 02:11 PM
Thanks for the info!

I'm glad that I aborted this one... that cost me another hundred points :rolleyes:

PY 222
12-20-2005, 02:14 PM
I think I may have a few of these but I don't even know where to start looking. :rotfl:

The bad side of having too many puters.

pfb
12-20-2005, 02:25 PM
Thanks for that - been monitoring those forums for the "problem" with some WUs and the 4.81 client but didn't see that...my slowest had 2 of them queued...

PY 222
12-20-2005, 02:33 PM
Btw, does anyone know how to abort a workunit in Linux?

I've tried the BOINC Wiki but am unable to find anything.

Bok
12-20-2005, 03:09 PM
Thanks for that, I had 10 of them sitting in various queues.

PY, the only way I know to abort in Linux is via BoincView. All I did here was load it up, sort by result name, and abort all the DEFAULT ones. You really should be using it...

Bok
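
For anyone without BoincView, the same "sort by result name, abort the DEFAULT ones" idea can be sketched in Python. This is only a sketch under assumptions: it builds, but does not run, command lines for the `boinccmd --task URL task_name abort` form documented for current BOINC clients; clients from the era of this thread may use a different tool name or option. The project URL is taken from the links in this thread, and the queue contents in the usage lines are hypothetical.

```python
from typing import Iterable, List

# Assumed project URL, taken from the forum links in this thread.
PROJECT_URL = "http://boinc.bakerlab.org/rosetta/"


def abort_commands(result_names: Iterable[str],
                   project_url: str = PROJECT_URL) -> List[List[str]]:
    """Build (but do not run) one abort command per DEFAULT_* result."""
    commands = []
    # Sort by result name, as described in the post above.
    for name in sorted(result_names):
        if name.startswith("DEFAULT_"):
            # Assumed CLI form: boinccmd --task URL task_name abort
            commands.append(["boinccmd", "--task", project_url, name, "abort"])
    return commands


if __name__ == "__main__":
    # Hypothetical queue; replace with the result names on your own host.
    queue = ["MORE_FRAGS_1abc_210_7_0", "DEFAULT_2reb_219_4151_0"]
    for cmd in abort_commands(queue):
        print(" ".join(cmd))
```

Printing the commands first, rather than executing them, leaves room to check the list by hand before aborting anything.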

PY 222
12-20-2005, 03:44 PM
I don't think BoincView will work well in my environment as there are far too many machines to configure.

I think I'll just ride this one out. According to the R@H people, this workunit is actually still valid, but because of its size we are not able to get it done by the deadline set.

We'll see how it pans out.

LAURENU2
12-20-2005, 11:48 PM
I am also getting WUs like these:
No_Rand_WTS
BARCOAD_FRAG
INCREAS_CYCLES
MORE_FRAG
NO_MORE_RELAXED_CYCLES

What's up, is this a virus? :confused: I see a lot of my nodes with the WU clock stopped.
I have to reboot to get it going.
Anybody know what is happening? Is the sky falling?

Bok
12-21-2005, 12:27 AM
I think David mentioned some of the new names they are giving the WUs in this post (http://boinc.bakerlab.org/rosetta/forum_thread.php?id=684#6631). They should be OK, I think...

Bok

the-mk
12-21-2005, 12:43 AM
I've got some with "MORE_FRAGS_*", and "NO_RANDOM_WTS_*" in my queue...

Are those bad too?

As far as I can see, they behave pretty normally.

MerePeer
12-21-2005, 06:46 AM
The only ones that are bad begin with DEFAULT and have _205 in them; the DEFAULT_*_206 and greater are fine. I had two; spotted and aborted them via boincview.

http://boinc.bakerlab.org/rosetta/rah_technical_news.php
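
The rule above (DEFAULT with _205 is bad, _206 and greater is fine) could be checked with a small Python helper. This is a rough sketch, not anything from the project: it assumes result names are laid out as DEFAULT_<target>_<batch>_<number>_<attempt>, which is a guess based on names like DEFAULT_2reb_219_4151_0 reported in this thread, and the function name is made up for illustration.

```python
def is_known_bad(result_name: str) -> bool:
    """Return True for DEFAULT_* results from batch 205, per the rule above."""
    fields = result_name.split("_")
    # Assumed layout: DEFAULT_<target>_<batch>_<number>_<attempt>.
    # The batch number is taken to be the third underscore-separated field.
    if len(fields) < 3 or fields[0] != "DEFAULT":
        return False
    return fields[2] == "205"
```

Comparing the batch field exactly, rather than searching for "_205" anywhere in the name, avoids matching something like _2050 by accident.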

PCZ
12-21-2005, 07:42 AM
I haven't got any of the known bad ones.

However, I am seeing lots of computation errors.
Something is wrong with the latest batches of WUs.

Will have to pull the plug if it gets any worse.

pfb
12-21-2005, 07:54 AM
Hidden away in that link is this:


Another problem has been identified with some new work units which is causing a 0xc0000005 UNHANDLED EXCEPTION error. This is a weird bug that appears to be dependent on the random number seed and we are currently looking into its cause. A short-term fix of using the computer clock to generate the seed (as has been done in previous runs) is in place.

Around 1/3 of my WUs have this atm...

Marky-UK
12-21-2005, 07:59 AM
I've not been too bothered about the WUs that error out lately as they're failing after about 10 seconds, so I'm not wasting much time. But I've just had 6 fail in a row and now the PC has run out of work - BOINC won't download anymore either, something to do with the last RPC being too recent. :(

PCZ
12-21-2005, 10:28 AM
Seems to be getting worse now.

Some of my PCs can't get any more work from Rosetta.
Maximum quota exceeded.

Pretty soon they will all be idle.

pfb
12-21-2005, 11:39 AM
Getting worse for me as well - out of ~80 'completed' WUs, ~70 are coming back with that error...

/edit:


2am Seattle time, and I've found the source of the problem for the quick crashing jobs. It's amazing how distributed computing puts ones code to the test.

David Kim's work-around should make things okay until we fix the code.

Unfortunately, I think the bad work units will have to error out to be removed from the queue. Again, we appreciate your patience.

Sounds like we just have to ride it out...

Marky-UK
12-21-2005, 04:23 PM
There's a detailed explanation from one of the project scientists here (http://boinc.bakerlab.org/rosetta/forum_thread.php?id=726#7056).

LAURENU2
12-22-2005, 02:09 PM
I tried to get help over in the Rosetta forum http://boinc.bakerlab.org/rosetta/forum_thread.php?id=680#7137
and got nothing but back lip and being accused by Bill Michael of doing this project just for points.
And in this one http://boinc.bakerlab.org/rosetta/forum_thread.php?id=726#7210 I said I would uninstall their software and wished them luck.
So I will be shutting down my nodes :swear:

Exci
12-22-2005, 02:20 PM
Lauren,
Bill isn't part of the project... he was just granted forum moderator rights because of the time he spends replying to folks. I'm sure he's just a bit flustered because he's trying to help with the fallout from the bad WUs.

As for the bad WUs... all of these problems are related to a new application they're developing on the back end to allow individual scientists to submit jobs rather than everything going through David Kim (who has put aside his own research to get this going).

The bad WUs don't really hurt anything, unless you're on dial-up. Sure, your queue might empty out, but BOINC is designed to handle that by getting work from other projects (or in my case, waiting for more work from the project, since I don't have other projects attached*). If you have one of the WUs that hang, you can abort it or let it crash itself... the scientists have said they will grant credit for them when they get back from holiday.

I think this is a great project... having a few WUs error out isn't a big deal (imo). The science of this project is to create something that will help scientists better study proteins... these bad WUs accomplish the same goal, and the fixes will only make the project better. I'm not saying beta testing on a live system is the best way to go... but the sky isn't falling by any means.

*I'm going to work with them to get their testing procedures in order to try and prevent this from happening again. The going theory is to have a separate BOINC instance that would allow a couple hundred units to be run before going public.

Cheers, and happy holidays everyone.

-Ethan

Bok
12-22-2005, 02:23 PM
Thanks Exci,

was just going to post a similar response.

Lauren, give it a little time. At least they are actively responding about fixing most things. I believe the main people involved are on vacation now, as it's a university, so replies might be slower. As Exci said, Bill Michaels is just a moderator on the board, nothing else...

Bok

PY 222
12-22-2005, 02:33 PM
Lauren, I've read all the posts in the R@H forums and all I want to say is...

PLEASE DON'T GO

Who is going to :Pokes: me on?

Who is going to :bonk: me when I get low production?

Who is going to :whip: me when I lose direction?

You.

So, take a break for a while until they settle this and come back in full force next year. :cheers:

Shish
12-22-2005, 02:45 PM
Come on Lauren....don't let some twerp upset you. I'm sure you're bigger than that.
I've had a lot of erroring out units recently and aborted a couple of default ones. Thought it was the computer or summat. One default unit was running for eighteen hours with very little progress.
Now I'm also getting default 2 units.
AND I'M BLOODY SOBER AND DRUG FREE FROM TODAY
Latest scan says I need a new spine if not a new body.....guess I'll have to whistle for that one but I'm still first on the list for a full body transplant :banana: :cheers:

HAPPY XMAS everyone :umm: I think :looney: :rotfl:

Keep on plooping them in for the best team in DC and that's not the little town in the District of Columbia I'm on about.....

PCZ
12-22-2005, 03:50 PM
Lauren, don't worry about Bill, he is just frustrated because he doesn't have a big farm to play with :rotfl:

On NT and Linux there are process timers which applications use.
On the 9x series the RTC has to be used.

Does your PC clock freeze?

LAURENU2
12-22-2005, 04:15 PM
PY, taking a break was my plan when I first posted over at the Rosetta forum, before their moderator started implying I was some sort of fool. Now I do not want to do work for them at all.

Don't worry, PY, I can still :Pokes: :bonk: and :whip: you even if I am not at Rosetta.

LAURENU2
12-22-2005, 04:18 PM
No, PCZ, nothing on my nodes freezes. Everything works just fine except the Rosetta WU clock. Nothing else is running except the OS.
Like the helpless help desk said:

EDIT:: I just double-checked something. I know I said I'd stop helping, but... Windows ME is not supported by Rosetta. Seems it doesn't report CPU times back to the application correctly.

That's why I uninstalled the BOINC Rosetta@home client on my network.
I will not pay power bills to get insulted. I have a wife for that :slap:

Petey
12-23-2005, 07:40 AM
I've been less than impressed with some of the posts on the Rosetta forums.

While I can expect the occasional 'off colour' post from a fellow member (and there have certainly been quite a few at Rosetta), I was a little surprised at the emotion in the reply to Lauren - especially from a forum moderator.

All I can say is thank god for the Free-DC forums - some other forums could learn a lot from us!!

Happy holidays everyone!!

:cheers:

Petey

the-mk
12-24-2005, 02:21 AM
Now I found another one taking very long to compute:

DEFAULT_2reb_219_4151_0

this one has been running for about an hour now and is still at 1%...

I think I'll abort that one...

LAURENU2
12-28-2005, 01:26 AM
Petey, the emotion in that moderator's reply surprised me too. All the mods I have seen keep a cool head (like Bok does). Their job is to keep the peace and keep the forum clean, so to speak.

:rock: Yes, we do have fun here :umm: and we don't fight. That's the best plus :hifi: