Repeated "Error -3"



Halon50
11-27-2002, 06:49 PM
A few of my machines were stuck in infinite loops early this morning, which made me think that the server was down again, so I switched back to SETI.

I've restarted the farm on SB, and one machine came up with the "unable to report block" error -3 code shortly after starting. I don't have time to check any of the other machines' logs right now, but I'll attach them if I find any others with this error.

I've attached the relevant section of the log with the problem, as well as the workunit itself.

res0r9lm
11-27-2002, 07:02 PM
Yeah, I believe it was a server problem of some sort. I had the same problem this morning. Checking my logs, I had 2 machines down from 7:00am to 7:45am. All I did was leave them running and they did eventually receive work.

Halon50
11-27-2002, 07:52 PM
I found a second machine with a k=24737 doing the same thing; cleared its buffer and got it working on something else.

I will email or post the workunit and logfile if you need it.

Alien88
11-27-2002, 10:30 PM
There was a problem earlier this morning with blocks and that is why you were getting that. As soon as it was noticed it was fixed and we apologize for the problem.

Halon50
11-28-2002, 12:17 AM
Ok thanks, but please note that I had to manually discard the block by changing username in order to get out of the Error -3 cycle.

On another note, I just went around and physically checked the status of my clients, and found about 4 of them stuck in the middle of processing blocks. Each showed varying percentage complete on the total test, but around 65% on the block progress bar, and 99:999:etc (whatever the highest time amount is) on the time left indicator.

I went ahead and restarted these clients, but this little quirk really irks me because I can't detect this problem by checking the logs from my main machine - I have to physically check the client. I don't have a nice VNC setup like I want (yet), so this means I have to go around and physically check each machine by hand.

At any rate, if these clients lock again, I'll grab a screenshot from each and send them your way.

Halon50
11-28-2002, 12:19 AM
Oh and have a great Thanksgiving weekend, guys! (Assuming you're in the US and not Canada... :rolleyes: )

shifted
11-28-2002, 01:00 AM
Originally posted by Halon50
Oh and have a great Thanksgiving weekend, guys! (Assuming you're in the US and not Canada... :rolleyes: )

Yeah, we canadians don't procrastinate and get things done on time ;)

Halon50
11-28-2002, 03:03 PM
Attached are screenshots from 3 different machines taken a few minutes ago. Restarting the client (exit then run again) corrected the problem in each case.

I'm off to gorge myself on that wonderful sleepy-time stuff now! :cheers:

Halon50
11-29-2002, 02:12 AM
I just went around and restarted clients on 5 machines that were showing the 9930 blah blah hours remaining on the current block.

All are running Windows (of different flavors).

All were stuck in varying percentages on the current block, between 40% and 99.5%.

Most were running on slower CPUs, but then the majority of my farm is slower CPUs, so this is statistically insignificant.

I did not think to grab screenshots of these clients. Hey it's turkey day. I'm really sleepy.

One of the machines was my dual P3-450. Both clients running on it were frozen.

All clients are running version 1.00. All have differing k values.

I can't think of any further details.

Halon50
11-29-2002, 08:20 PM
I have screenies of 4 more clients with the "stuck time" attached.

It is interesting to note that the cEMs displayed on each client are normal, although they were decreasing at a very slow rate (until I exited/restarted the client). I am wondering if this is related to a quirk I noticed on previous clients:

While running the client (v0.97 I believe) on some slower machines, the client would sometimes "freeze up" for a few seconds. The cEMs would stay the same, but the "Time remaining" counter would not decrement for several seconds. The progress bar would also not move until the lockup ended.

This happened rarely, but often enough so that I could catch it happening on several different machines with minimal monitoring. Anyway, I hope this helps track down what's going on.

Halon50
11-29-2002, 08:23 PM
One other thing, the progress bar for the "last block" does not correctly display percentage complete since you changed the client over to fixed block counts. For these smaller workunits, sometimes the "last block" will complete at less than 10%, and the workunit will be sent to the server at that point.

jjjjL
11-30-2002, 07:16 AM
i had actually never noticed the exact time error you were talking about until i came home and noticed it on my parents' (slow) computer.

are you sure it was really hung just because the completion time was messed up? on the computers i've noticed it on, the display does seem to be incorrect as far as eta but it is still running fine. restarting the client does fix it but i think it's only a display error, not a real stopping. since it's a slow computer, it's obviously harder to watch and see if the % is still moving but if you see this problem again, and wait awhile, it will keep moving.

i think the reason this happens is because of the % bar mis-sizing on the last block of a test which you have also noticed. basically, i know both these things happen but i also know that they are only display issues and they only occur on slow computers so fixing them is a low priority right now.

-Louie

MAD-ness
12-03-2002, 03:20 AM
I have been getting this a lot on one of the machines I run SB on. Restarting the client doesn't seem to help.

If I have time tomorrow, I will play with it and see when the error happens. It is on WinXP and the 1.0.0 client, I think, BTW.

Stricker
12-03-2002, 04:33 AM
my linux (1.0.2) box w/
k= 24737
n=727591
residue = EFD8A1ABB1DAC3E9
won't report because of error -3

ltd
12-03-2002, 03:07 PM
On a win2k system with client 1.0.0

Out of the log file.

[Tue Dec 03 19:17:39 2002] requesting a block
[Tue Dec 03 19:17:45 2002] got proth test from server (k=54767, n=57895)
[Tue Dec 03 19:18:33 2002] residue: 60F5DEAD06F9E27F
[Tue Dec 03 19:18:33 2002] completed proth test(k=54767, n=57895): result 3
[Tue Dec 03 19:18:33 2002] connecting to server
[Tue Dec 03 19:18:34 2002] logging into server
[Tue Dec 03 19:18:34 2002] couldn't report to server [report denied], retry in 100 secs [error: -3]
[Tue Dec 03 19:20:15 2002] connecting to server
[Tue Dec 03 19:20:17 2002] logging into server
[Tue Dec 03 19:20:20 2002] couldn't report to server [report denied], retry in 100 secs [error: -3]
[Tue Dec 03 19:21:30 2002] got k and n from cache
[Tue Dec 03 19:22:17 2002] residue: 60F5DEAD06F9E27F
[Tue Dec 03 19:22:17 2002] completed proth test(k=54767, n=57895): result 3
[Tue Dec 03 19:22:17 2002] connecting to server
[Tue Dec 03 19:22:23 2002] logging into server
[Tue Dec 03 19:22:32 2002] couldn't report to server [report denied], retry in 100 secs [error: -3]
[Tue Dec 03 19:23:40 2002] cache cleared

Hope this helps.
What looks very strange to me is the very low "n" value

Lars
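
For anyone who wants to spot this retry loop without reading each log by hand, here is a minimal Python sketch. It assumes log lines shaped exactly like the excerpt above; the function names and the threshold of 2 are hypothetical placeholders, not anything the client itself provides, and other client versions may write different messages.

```python
import re
from collections import Counter

# Patterns copied from the log excerpt in this thread (client 1.0.0 on win2k).
# Assumption: other client versions use the same wording.
TEST_RE = re.compile(r"completed proth test\(k=(\d+), n=(\d+)\)")
DENIED_RE = re.compile(r"couldn't report to server \[report denied\].*\[error: -3\]")


def count_denied_reports(lines):
    """Count error -3 report denials per (k, n) test."""
    denied = Counter()
    current = None  # the most recently completed test, if any
    for line in lines:
        match = TEST_RE.search(line)
        if match:
            current = (int(match.group(1)), int(match.group(2)))
        elif DENIED_RE.search(line) and current is not None:
            # Attribute the denial to the test completed just before it.
            denied[current] += 1
    return denied


def stuck_tests(lines, threshold=2):
    """Return the (k, n) pairs denied at least `threshold` times (hypothetical cutoff)."""
    counts = count_denied_reports(lines)
    return sorted(test for test, n_denied in counts.items() if n_denied >= threshold)
```

Feeding it the lines of a client log (for example `stuck_tests(open(path))`, where `path` is wherever your log lives) gives a list of tests that keep getting denied, which are the candidates for a cache clear.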

jjjjL
12-03-2002, 07:07 PM
i cleared the record of about 20 blocks below n = 750000 because they were almost a week old and I needed them completed so they were reassigned.

The next server (should be installed soon) will save this data correctly but for now, clear your cache. sorry for the trouble.


-Louie

MAD-ness
12-04-2002, 02:26 AM
How do I clear the cache?

I deleted the file with the n value as part of the file name, as well as my log file, but the client keeps "retrieving the k and n from cache". I am currently searching the registry for entries that might contain this.

I really don't feel like re-installing the client just to get around this error. Fixing the problem and/or having more helpful directions for clearing the cache would be appreciated.

Halon50
12-04-2002, 02:29 AM
I did it by changing my username, clicking "no", then changing it back and clicking "yes" to retrieve a new workunit.

Mystwalker
12-04-2002, 07:37 AM
The keys are HKLM\SOFTWARE\LhDn\sob\cache. But changing name works, too, of course. :)

MAD-ness
12-04-2002, 04:11 PM
The name change thing did the trick.

Thanks for the registry key info also.

I hadn't had time to dig for it yet.

smh
12-04-2002, 04:54 PM
Originally posted by jjjjL
i cleared the record of about 20 blocks below n = 750000 because they were almost a week old and I needed them completed so they were reassigned.

It's always good to see some progress. Looks like out of the twenty only 3 or 4 (very hard to see in the small graphs) haven't returned yet.

I know this has been asked before, and I don't know if it's possible, but can you set different expiration times on different K/N's?

There is no option to unreserve a test, and even a slow client can easily do one block in a day. Of course, the PC might be off for a couple of days, but reporting one block a week should be doable.

For K=27653 you can see in the graph that every time after expiration of the lowest remaining numbers a lot of tests are completed. Looks like N's around 2,8M are expiring now and are getting reassigned.

But every round there are a few exponents that will expire again. Even with only 56 pending tests, my guess is that by the end of the year 5 to 10 numbers still need to be checked if the expiration time stays at two weeks.

Mystwalker
12-04-2002, 05:28 PM
As n values increase, so do the cEM/s. That means it takes less time to get through a block. Plus, the block count per n increases, so PCs that get excluded would soon take several months for an n anyway.

To make it short: I think it should be possible to decrease the expiration time, too.

Theoretically, there's no problem with the expiration, as there are infinite values to check, but in order to lessen holes in the statistic, it is a wise move IMHO.

But the final decision is of course yours, Louie. :D

Stricker
12-04-2002, 05:44 PM
Also, after removing the cache, the block is still listed under "Currently pending blocks".
I have 4 pending blocks even though I have 3 computers. In my earlier post in this thread I said which k/n it was and what its residue was.