Running client in short bursts



aasmunds
06-20-2006, 09:44 AM
I have access to a linux cluster where I can run SoB on idle CPUs, but to ensure I don't block real use of the cluster I can only run the client for a short time (~10-20 min) before killing it and putting it back into the job queue.

Q: Does the client save its progress continuously, or only at intervals, e.g. every five minutes? How many seconds of work do I lose when I kill the client?
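For context, the wrapper script I submit is roughly the following (the client binary name, paths, and queue name are placeholders for whatever your setup uses):

#!/bin/sh
# Hypothetical wrapper: let the client work for ~20 minutes, kill it,
# then put this script back into the idle queue.
HOST=$(hostname)
cd /shared/sob/$HOST        # placeholder: per-node working directory
./sb &                      # placeholder name for the client binary
PID=$!
sleep 1200                  # ~20 minutes of work
kill $PID                   # anything since the last checkpoint is lost
qsub -q idle run_sob.sh     # placeholder queue and script names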

vjs
06-20-2006, 10:28 AM
Couldn't you just leave the program installed and running at the lowest priority? Wouldn't that work just the same?

How many CPUs are we talking about anyway???

If you're looking for 10-20 minute tasks, you should consider the factoring project (P-1) or, even better, sieving. You could set it up with a very small range (20 minutes' worth), keep everything on the remote share, and the whole task completes in 20 minutes; the client exits and you're done. It should even be possible to run the client over a network or mapped drive...

aasmunds
06-20-2006, 03:10 PM
Couldn't you just leave the program installed and running at the lowest priority? Wouldn't that work just the same?

No go. I'm running in idle time on a scientific computing cluster running Linux. Programs are run via a queue system. Since there's only one idle queue, and not separate idle-real-science and idle-silly-stuff queues, I can't run long jobs. Idle jobs get killed when priority jobs are submitted, but not when new idle jobs are submitted. And since most of the stuff that runs in the idle queue is real science, I have to run short sessions to avoid hogging CPUs. I can set the waiting priority so that other idle jobs waiting to run start before mine, but a running idle job is never killed to start another idle job.


How many CPUs are we talking about anyway???

0 most of the time, and a few hours with ~100 CPUs occasionally during holidays and weekends.


If you're looking for 10-20 minute tasks, you should consider the factoring project (P-1) or, even better, sieving. You could set it up with a very small range (20 minutes' worth), keep everything on the remote share, and the whole task completes in 20 minutes; the client exits and you're done.
Sounds interesting. How?


It should even be possible to run the client over a network or mapped drive...

I run from a shared network drive to make it easier to shuffle files around.

Mystwalker
06-20-2006, 05:55 PM
What cluster system is used?
I did a similar thing with Condor approx. 2 years ago.

wblipp
06-20-2006, 11:19 PM
I don't know which SoB tasks match that time slice. It looks like a good match for running a single GMP-ECM curve at B1=11M on composites in the vicinity of 300 digits; for smaller B1 or shorter composites you might do several curves in a time slice. You could shuffle files around, but the manual bookkeeping might become a nuisance. Is there TCP/IP access from the compute nodes to a machine where you could run an ECM server (a very low-usage process that mostly waits for work requests and work results)? If so, you could custom-compile a version of ecmclient that stops after one task, and you'd have a very slick system that processes ECM curves and keeps the books automatically: copy three small files to the compute node and start one program.

Unfortunately, I don't think SOB has any ECM tasks running in this range. (If I'm wrong, I'm sure somebody will post a correction.)
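For a sense of scale, a single curve at that level with GMP-ECM looks like this (the composite shown is only a stand-in for a real ~300-digit candidate):

# One curve at B1=11e6; GMP-ECM reads the number from stdin.
# 2^1061-1 is just a stand-in; substitute the actual composite.
echo "2^1061-1" | ecm -c 1 11e6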

aasmunds
06-20-2006, 11:30 PM
What cluster system is used?
A Beowulf-type cluster with the Sun Grid Engine queue system.
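The submission script boils down to something like this (the queue name and chunk script are placeholders; h_rt is SGE's hard runtime limit, so the scheduler kills the job for me):

#!/bin/sh
#$ -q idle              # placeholder queue name
#$ -l h_rt=0:20:0       # hard runtime limit: SGE kills the job after 20 min
#$ -cwd                 # run in the submission directory
./do_one_chunk.sh       # placeholder for one 20-minute unit of work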

aasmunds
06-20-2006, 11:32 PM
I've solved the file-shuffling problem, so I'm just wondering how much work is lost each time the client is killed. One minute? Five minutes?

znedelchev
06-21-2006, 03:27 AM
The SB 2.50 Windows client writes the cache file every 10 minutes.
So maybe the correct answer is "between 0 and 10 minutes" ;)
BUT!!! I have had cases where stopping a machine with the SB client by switching off the power corrupted the cache file! I don't know; maybe killing the client under Linux while SB is writing the cache file is safer...

hhh
06-21-2006, 07:51 AM
Running the standard SoB client sounds like nonsense to me: you would have 100 tests, each sliced down to perhaps an hour per day, and a single test already takes a month when run 24 h/day.
Why don't you run 100 sieve clients, each from the command line:

./sieve x x+0.1G
./sieve x+0.1G x+0.2G
...

You see what I mean? x is the starting point of your range (you have to do the math beforehand, of course), and the 0.1G has to be replaced by whatever corresponds to 20 minutes.
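For instance, a small shell loop can spit out the 100 command lines (the start value and step here are placeholders; pick them so one step is ~20 minutes of work):

#!/bin/sh
# Emit 100 sieve commands covering [X, X+10G) in 0.1G steps.
X=800000000000           # placeholder range start (800G)
STEP=100000000           # 0.1G
i=0
while [ $i -lt 100 ]; do
    LO=$((X + i * STEP))
    HI=$((LO + STEP))
    echo "./sieve $LO $HI"
    i=$((i + 1))
done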

Somebody else might explain better than me. H.

aasmunds
06-21-2006, 08:01 AM
Ah, yes, one of those undocumented side projects. I'll consider it if I can find a Linux client.

vjs
06-21-2006, 10:00 AM
A lot of really good comments and suggestions here...

Znedelchev is totally correct. IMHO, at a few minutes a "spurt", PRP testing using the main client won't work well at all: the potential for errors and the time required to complete a test would be enormous. I wouldn't go as far as "nonsense", but it's certainly a bad idea in practice.

hhh has already hooked you up with what to do. Sieving is really the way to go: the memory requirements are fairly small (well under 100 MB), and you can set the time for each run by the size of the range sieved; each range may not be large, but 100 small ranges add up quickly. The best part: I don't see a reason why it can't be run over a decent network with nothing left behind on the individual node.

The sieve client needs:
- the exe
- a dat file (the data it matches factors against)
- somewhere to write factors
- somewhere to write a log

Basically the client can be started from the command line with switches: one switch specifies where to look for the dat file, another the range to sieve, and I believe the log file can be written anywhere.

The best part is that Joe_O and Chuck wrote the sieve client, so we have the source. I'm not sure whether it can be ported or whether they already have a Linux version.

If you have a Windows box, I suggest you head over to the sieve subforum and try to figure out how the client works from the Windows/DOS command line first. This will give you a better idea of how to implement it on your cluster. I'll forward this thread to Joe later today if he doesn't pop in first.
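The shape of an invocation would be something like the line below; the switch names are made up for illustration only (hhh points out below that -h lists the real ones):

# Hypothetical switches: -d for the dat file, -f for factor output;
# run the client with -h to see the actual options.
./proth_sieve -d /share/SoB.dat -f /share/factors.txt 800000000000 800100000000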

hhh
06-21-2006, 11:24 AM
Sure, there is a Linux version.

http://www.mklasson.com/proth_sieve.php

This is not the latest and fastest version of the program, but it works, and it isn't slow. The Linux version of jjsieve isn't out yet, I think.
Use the -h switch to get help on the switches.
All the information the client needs can be put on the command line.
I suggest you do high-p, SoB-only ranges, as lost ranges will then not cost many factors.
Have fun figuring it out a bit.
Yours H.

vjs
06-21-2006, 03:00 PM
Of course, hhh, thanks. Sometimes I forget about the original versions; I've been through so many betas of the current one.

There are also some sieve benchmarks in the subforum somewhere to give you an estimate of speed. But I'm sure that if you post the exact processor model, you can get a good estimate here.

Check out the sievecalc file; it will give you an idea of what size of range to choose, but I think you should be able to do 1 G per processor every 20 minutes.
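To put rough numbers on that (all figures are examples, assuming the 1 G per 20 minutes estimate holds):

# Back-of-envelope range coverage for one cluster window.
CPUS=100; MINUTES=180; G_PER_20MIN=1
echo "$((CPUS * MINUTES / 20 * G_PER_20MIN))G"   # -> 900G in a 3-hour window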

jasong
06-21-2006, 05:29 PM
If I may bust in for a moment:

JJSieve can handle individual ranges significantly smaller than a billion. If I may be bold, I'd recommend ranges of 10 million or less (0.01 billion); 20-minute jobs would probably anger a lot of people who want to use the computers for their own work. It would work especially well if a new tiny range could follow every completed one, so there would be less "wasted" time.

Just my opinion.

Matt
06-21-2006, 07:34 PM
You can use JJsieve on Linux with Wine, and it's still faster than proth_sieve. I run several clients via Wine on FreeBSD and it works like a charm.

Joe O
06-21-2006, 10:32 PM
If I may bust in for a moment:

JJSieve can handle individual ranges significantly smaller than a billion. If I may be bold, I'd recommend ranges of 10 million or less (0.01 billion); 20-minute jobs would probably anger a lot of people who want to use the computers for their own work. It would work especially well if a new tiny range could follow every completed one, so there would be less "wasted" time.

Just my opinion.
Although jjsieve will handle small ranges, the efficiency drops off very sharply for ranges below 67'108'860. In fact, 134'217'720 would be another good range size, as would 268'235'245 or 536'870'910. Range sizes in between these numbers will have varying efficiency, possibly as low as 50%. These numbers may change in future releases.
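Combining that with vjs's ~1 G per 20 minutes estimate above gives a feel for the job length at the smallest efficient size (a rough sketch, not a measurement):

# Minutes per job for a 67'108'860-wide range at ~1G per 20 minutes.
echo "67108860 / 1000000000 * 20" | bc -l    # -> ~1.34 minutes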

aasmunds
07-23-2006, 08:57 AM
I've solved the file-shuffling problem, so I'm just wondering how much work is lost each time the client is killed. One minute? Five minutes?

Looks like the Linux client writes results to disk every two minutes or so, which means up to 10% waste in 20-minute runs, and less for longer runs.

My current system of one to two hundred parallel runs whenever the cluster is available seems to work nicely.