Significant Downtime



deranged128[OCAU]
09-07-2004, 07:28 AM
I'm new to this project (I joined a few days ago) and have a question about a significant amount of processing time I lost today.

I have DHE installed on 6 PCs. Five are dedicated to the project, and the other runs it part-time because it is my main browsing/gaming setup. For the first 1.5 days production was quite high, but today it has dropped dramatically. On checking Task Manager, I found that each machine had been idle for between 8 and 11 of the past 16 hours.

What would cause them to be idle for so long? Java was still running on each machine, but at varying levels of CPU utilisation (72% to 98%) and with varying CPU time. On one machine the CPU time was still zero :eek:

They are running as services, so I'm not sure where to look for clues about what is happening.

I'm not happy about having my machines sit idle (they are all my personal PCs), and if this is going to be a recurring problem with the project I may start looking elsewhere.

Can anyone give me some pointers?

prokaryote
09-08-2004, 01:30 AM
The only time I've noticed machines sitting idle is when a restart was attempted while the data server was down. The client then polls the server about every hour until it can connect again. If it is already crunching when it fails to return a result, it should keep on crunching.
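To make the restart-while-down behaviour concrete, here is a minimal Python sketch of a client that polls once an hour until the server answers. It is only an illustration: `wait_for_server`, `try_connect` and the one-hour constant are hypothetical names chosen for this example, not the actual DHE client code (which runs on Java).

```python
import time
from typing import Callable, Optional

# Hypothetical sketch, not the real DHE client: retry roughly once an hour.
RETRY_INTERVAL_S = 60 * 60


def wait_for_server(try_connect: Callable[[], bool],
                    sleep: Callable[[float], None] = time.sleep,
                    interval_s: float = RETRY_INTERVAL_S,
                    max_attempts: Optional[int] = None) -> int:
    """Poll until try_connect() succeeds; return how many attempts it took."""
    attempts = 0
    while True:
        attempts += 1
        if try_connect():
            return attempts
        if max_attempts is not None and attempts >= max_attempts:
            raise TimeoutError(f"server still unreachable after {attempts} attempts")
        sleep(interval_s)  # the machine sits idle here; nothing is crunched


if __name__ == "__main__":
    replies = iter([False, False, True])  # server comes back on the 3rd poll
    slept = []
    n = wait_for_server(lambda: next(replies), sleep=slept.append)
    print(n, slept)  # 3 [3600, 3600]
```

Passing `sleep` in as a parameter is just a testing convenience; it shows why a restart during an outage can cost up to an hour of idle time per failed poll.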

As for the service part, I run the text client so I don't have much experience with the service portion of the code.

Most server problems seem to happen over weekends, when no one is around to restart or reset it. You may want to consider running a backup DC project in the background just in case. Hope this helps; maybe Miguel has some insights?

prok

em99010pepe
09-08-2004, 03:19 AM
Originally posted by prokaryote
If it is currently crunching and it can't return a result, it should still keep on crunching.
prok

The computer keeps crunching without sending results. I had that problem once and lost 14 hours of work. :mad:

Carlos

michaelgarvie
09-08-2004, 05:34 AM
Hi Pepe,

Welcome to dhep! :|party|:

This could have something to do with the version of Java you have installed.

Could you please install the Manual Installation with GUI (not the service)? The instructions are at http://www.informatics.sussex.ac.uk/users/mmg20/dhe/download.php#winnt

This should give some feedback about the problem.

Cheers:cheers:
Miguel

em99010pepe
09-08-2004, 05:43 AM
miguelgarvie,

I was crunching DHEP under Win Me.

I use IBM Java because it is 30% quicker than Sun Java.

Please read your PM.

Cheers,

Carlos

deranged128[OCAU]
09-08-2004, 06:37 AM
Miguel, does this mean I should also run the GUI client rather than the service client?

Thinking back, I can recall a time when I could not get onto the DHE website. Would a long period of not being able to talk to the server cause the application to shut down? How often must the application talk to the server?

I also had a problem in Linux earlier today. I will post details tomorrow.

em99010pepe
09-08-2004, 06:42 AM
When I had that problem I was using the GUI version.
I think the program talks to the server every 1000 generations; I read this somewhere...

Carlos

michaelgarvie
09-08-2004, 07:03 AM
The GUI client provides a bit more feedback. But if your client is working OK, the service is less intrusive.

By the way, everyone should be using the Sun JVM. The server VM is FASTER than IBM's VM by far. See how to use the server VM here:

http://www.informatics.sussex.ac.uk/users/mmg20/dhe/faq.php#fastest

[OCAU]Googlybear
09-08-2004, 02:02 PM
Hi Baz,

I have noticed the same problems you have experienced. The client is running fine in service mode, but it is not uploading results. I have to restart the service to get things moving again.

Can anyone answer the following:
When the client can't contact the server, for whatever reason, and continues crunching, does it keep buffering results to be uploaded?
What happens to the results it is working on while it is not uploading?
And is there any speed difference between the GUI, text and service versions?

Also, is there a webpage that explains why, when and how the client works? Forgive me if I have missed it.

I have only noticed problems when my stats stop updating on the website.

thanks

Googs

prokaryote
09-08-2004, 07:17 PM
Originally posted by em99010pepe
The computer keeps crunching without sending results. I had that problem once. Lost 14 hours of work.:mad:

Carlos

If it couldn't connect while it was crunching and you restart before it can dump the results, then I think you'll lose those results. I usually just let it crunch until the client reconnects on its own without a restart. I don't know how service mode is different or whether it has a bug; I just use the GUI/text client.

deranged128[OCAU]
09-08-2004, 07:37 PM
Originally posted by prokaryote
If it couldn't connect while it was crunching and you restart before it can dump the results, then I think you'll lose those results. I usually just let it crunch until the client reconnects on its own without a restart. I don't know how service mode is different or whether it has a bug; I just use the GUI/text client.
So if you have to turn a computer off or reboot for some reason while the connection is down, you stand to lose a significant amount of processed work.

Is there a way of knowing, in either service or GUI mode, whether you have work waiting to be uploaded? Can the client be altered to save the work completed up to that point in the event of a shutdown?

Another question: how do you shut down the GUI client?

Googs, problems continue for me too. I found another machine (XP2400 @ 2.2GHz) sitting idle this morning; it hadn't crunched anything for around 12 hours :( I'll probably have to start easing the overclocks soon too, as it's finally starting to warm up after winter :D

I'm thinking of switching my dedicated machines over to the GUI client to see if that is any better.

Barry

michaelgarvie
09-14-2004, 04:22 PM
Two things.

The bug causing idle clients has been fixed.

IBM's JVM may be slightly faster than Sun's Server VM. It's worth a try.

maharius
10-13-2004, 05:53 AM
Is there a way of knowing, in either service or gui mode, whether you have work waiting to be uploaded?
Hi,

With all the recent outages, it might come in handy to have an answer to the above...

must... get... more... points

michaelgarvie
10-13-2004, 10:10 AM
Is there a way of knowing, in either service or gui mode, whether you have work waiting to be uploaded?

Easy answer: YES
How: if you are running in GUI mode and the best individual in your population is better than the one displayed in the current goal stats (http://www.informatics.sussex.ac.uk/users/mmg20/dhe/statsGoal.php), then your island is hosting a circuit better than any seen so far, probably the best solution to the problem ever evolved.
What to do:
1) Keep the client going until the server restarts.
2) Copy the genotype (<g>sdjkskjdfhkjsdfh</g>) and email it to me or post it in this forum.

Complicated answer: It's not so simple. Your island may host genetic material that is not extremely fit at the moment but holds the key to finding a whole array of good solutions.
What to do: Contact me ASAP when the server goes down, and try to keep your client running until it comes back!
However: it's all a stochastic process in the end, so after losing and regaining the connection you may immediately find a great solution anyway.

Important:
If you have lost connection with the server, you have an individual fitter than the one in the current goal stats (http://www.informatics.sussex.ac.uk/users/mmg20/dhe/statsGoal.php), and you have to stop the client, then please copy the genotype (<g>sdjkskjdfhkjsdfh</g>) and email it to me or post it in this forum. You will be credited with the relevant stats.
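The advice above boils down to a small decision rule. The Python sketch below encodes it, assuming (hypothetically) that a larger fitness value means a fitter circuit; `Individual` and `offline_action` are illustrative names, not part of the DHE client. The "stop" branch reflects prokaryote's earlier point that unsubmitted work may be lost.

```python
from dataclasses import dataclass


@dataclass
class Individual:
    genotype: str   # the text that goes between <g> and </g>
    fitness: float  # ASSUMPTION: larger means fitter


def offline_action(connected: bool, best: Individual,
                   goal_fitness: float, must_stop: bool) -> str:
    """Hypothetical helper encoding the advice in the post above."""
    if connected:
        return "keep crunching"  # results go up with the normal submissions
    if not must_stop:
        return "keep crunching until the server is back"
    if best.fitness > goal_fitness:
        # Beats the current goal stats: copy it so it can be emailed or posted.
        return f"email or post <g>{best.genotype}</g>"
    return "stop (unsubmitted work may be lost)"
```

For example, `offline_action(False, Individual("abc", 0.9), 0.5, True)` returns `"email or post <g>abc</g>"`.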

PS: workload packets are submitted every i minutes, where i is the configurable interaction time.
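As a rough illustration of what the interaction time means for a shutdown, here is a Python sketch that lists submission times and how much work has accumulated since the last packet. It assumes, hypothetically, that the first packet goes out i minutes after the client starts and that the server is reachable; the function names are made up for this example.

```python
from typing import List


def submission_times(run_minutes: int, i: int) -> List[int]:
    """Minutes at which a packet is sent, assuming one every i minutes from start."""
    if i <= 0:
        raise ValueError("interaction time must be positive")
    return list(range(i, run_minutes + 1, i))


def minutes_since_last_submission(t: int, i: int) -> int:
    """Work (in minutes) done since the last packet, if the client stops at minute t."""
    if i <= 0:
        raise ValueError("interaction time must be positive")
    return t % i


print(submission_times(60, 15))               # [15, 30, 45, 60]
print(minutes_since_last_submission(50, 15))  # 5
```

With the server reachable, a shorter interaction time means less unsubmitted work at any given moment.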