Thoughts on sieve client wrapper



Greenbank
07-06-2005, 07:08 AM
I've been toying around with a wrapper program for proth_sieve (on Linux).

Apologies if this is similar to sobistrator, I don't do sieving on Windows so I've never seen it.

I had multiple machines running proth_sieve and wanted to keep track of them all without logging into them and wading through the logfiles.

My current wrapper program does the following:-

Checks the status of SoBStatus.dat (finds pmin,pmax and the last pmin value in the logfile).

If all is ok (i.e. there is no 'Done' message) it starts the appropriate proth_sieve client.

It then monitors the SoBStatus.dat file keeping track of the last line.

Every 30 seconds (this is configurable) it checks for new factors in all of the fact*.txt files. Once a day is a more realistic setting.

If it finds anything new (factors or a later line in SoBStatus.dat) it uploads it to a central server. I can then view the status of all of my sievers on one webpage (via some cgi-bin stuff).
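In outline the monitoring loop is nothing clever. Here's a minimal Python sketch of the idea (not the actual code; report() is a stand-in for the comms layer):

import glob, time

POLL_SECONDS = 30        # configurable; once a day is more realistic

seen_factors = set()     # factor lines already reported
last_status = ''         # last line of SoBStatus.dat we saw

def report(kind, line):
    # stand-in for the comms layer (proprietary TCP today, HTTP soon)
    print(kind + ':', line)

def poll_once():
    global last_status
    # any new factors in the fact*.txt files?
    for path in glob.glob('fact*.txt'):
        for line in open(path):
            line = line.strip()
            if line and line not in seen_factors:
                seen_factors.add(line)
                report('factor', line)
    # has SoBStatus.dat gained a new last line?
    lines = open('SoBStatus.dat').read().splitlines()
    if lines and lines[-1] != last_status:
        last_status = lines[-1]
        report('status', last_status)

while True:
    poll_once()
    time.sleep(POLL_SECONDS)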

I would need to move the comms layer from a proprietary protocol over TCP to HTTP GET/POST, but that shouldn't take too long.
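For the curious, the HTTP version needn't be much more than this (a sketch; the URL and field names here are invented):

import urllib.parse, urllib.request

def post_update(url, fields):
    # urlencode the fields and POST them (urlopen does a POST when data is given)
    data = urllib.parse.urlencode(fields).encode('ascii')
    with urllib.request.urlopen(url, data) as resp:
        return resp.read().decode()

# hypothetical endpoint and fields, just to show the shape of it
post_update('http://example.com/sieve/update.cgi',
            {'user': 'Greenbank', 'pcurr': '882137000000000', 'factors': ''})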

Reporting back by HTTP GET/POST would mean it would be relatively easy to integrate with both Matt's reservation site and the factor submission pages.

This is an important point. I'm not suggesting a new website or central server. I would only do this if it is going to integrate into the existing systems.

It also takes note of the size of the SoB.dat file, which can be checked against a central server to tell the user that there is a new, updated SoB.dat and that they should go and get it. It would also help identify the other .dat files (the 1M-20M ones) that are being used.

I'm redeveloping it in stages:-

1. Parsing SoBStatus.dat file giving a simple report to stdout.
2. Parsing fact*.txt files for number of factors in report.
(The above is similar to Matt's php script)
3. Checking file size of SoB.dat
4. Monitoring both of the above files
5. Uploading status to webserver
6. Uploading factors to webserver
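For a flavour of stages 1 and 2, something along these lines (a sketch only; I'm assuming pmin/pmax appear as 'pmin=NNN'/'pmax=NNN' in SoBStatus.dat and one factor per line in the fact*.txt files):

import glob, re

def parse_sobstatus(path='SoBStatus.dat'):
    # Stage 1: pull pmin, pmax and the latest pmin (progress) out of the file.
    pmin = pmax = last_pmin = None
    for line in open(path):
        m = re.search(r'pmin=(\d+)', line)
        if m:
            if pmin is None:
                pmin = int(m.group(1))    # first occurrence: start of range
            last_pmin = int(m.group(1))   # latest occurrence: progress so far
        m = re.search(r'pmax=(\d+)', line)
        if m:
            pmax = int(m.group(1))
    return pmin, pmax, last_pmin

def count_factors():
    # Stage 2: count non-blank lines across all fact*.txt files.
    return sum(1 for path in glob.glob('fact*.txt')
                 for line in open(path) if line.strip())

pmin, pmax, cur = parse_sobstatus()
print('range %s..%s, at %s, %d factors' % (pmin, pmax, cur, count_factors()))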

Future stages (open for discussion):-

a) Running the proth_sieve client directly.
b) Logging in to the webpage before uploading factors (would be required).
c) Logging in to the webpage before uploading status (would be nice but not 100% necessary)

Further on from this is the possibility of automatic range assignment, but given that a standard 1T range takes most people a month, it would probably be more work than necessary.

Benefits:-

1. Automatic submission of factors relatively soon after they are found.
2. Progress updates are submitted automatically and frequently (once a day).

Source code would not be released but I would share it with the appropriate people to get it compiled for the other OSs that proth_sieve is available for.

hhh
07-06-2005, 07:25 AM
If you want to see Sobistrator, you can run it on a Windows machine without starting to sieve.
http://www.teamprimerib.com/vjs/
So you can see what it does. H.

Mystwalker
07-06-2005, 11:54 AM
Well, some time ago, I also started work on a sieve client wrapper (in the days of Sobsieve) and the corresponding server.

It basically works, and I already adapted it for proth_sieve some months back. I also added some logging (client-side only so far) and configuration files, and I started on a new way to extract information: by capturing proth_sieve's screen output (well, before it goes to the screen), I think I can get all the information I need. :)

It's not that feature-rich, but it allows automated sieve range distribution (reservation and completion) and factor submission (incl. storing in a DB).
I found a lot of nice features that could be implemented, but I'm totally lacking the time to do this.
I'm currently considering releasing it to a small community (or the public) for further work on it.

It's written in Java, as proth_sieve exists for multiple OSes - and networking code in Java is as easy as taking candy from a baby (not that I've tried the latter so far ;) ).

Matt
07-07-2005, 06:39 AM
I did consider doing this myself but I'm no good at any programming except PHP. I am very keen to support this and will obviously help in any way with the Sieve Reservation site and integrating it with your wrapper.

Greenbank
07-12-2005, 02:39 PM
This is what I just posted over at:- http://www.rieselsieve.com/forum/viewtopic.php?t=461

I'm not part of Riesel Sieve but I am part of SeventeenOrBust and we're using proth_sieve over there (as I'm sure you know).

I'd be interested in sharing ideas and even helping out on the coding front.

I was going to implement something similar for SoB and use a domain I have as the server part of this.



IMPORTANT NOTE (especially for Matt!):

I should point out that I used my own domain purely for testing, to try assigning sub-ranges within my assigned range to different machines under my control.

I expect the final version to speak to Matt's reservation site AND www.seventeenorbust.com/sieve for factor submission.

The only thing it will never be able to do is email factrange.txt to factrange@yahoo.com but I'm sure we can come up with something for this.



The client would just have to speak simple HTTP and do GETs and POSTs.
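i.e. on the client side nothing more exotic than this (sketch; URL, parameters and reply format are all invented for illustration):

import urllib.parse, urllib.request

def get_range(server, user, password):
    # ask the server for our reserved range with a plain GET
    qs = urllib.parse.urlencode({'user': user, 'pass': password,
                                 'action': 'getrange'})
    with urllib.request.urlopen(server + '?' + qs) as resp:
        # reply could be as simple as "pmin=882000000000000 pmax=883000000000000"
        return resp.read().decode()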

I'll see what I can knock up in the next few days and get back to you.

The way I was going to do it:-

Users connect to the site and reserve a large chunk.

( Over at SoB we get ~ 440 kp/s on an Athlon XP 2400+ (2GHz). At this speed a 1000G (1T) chunk takes about a month. )
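( For the arithmetic: 1000G is 10^12 p-values and 440 kp/s is 4.4*10^5 p per second, so 10^12 / 4.4*10^5 is about 2.27 million seconds, or roughly 26 days. )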

I'm not too sure of the speed of Riesel as you have many more k's and different bounds for n so I'll just carry on quoting SoB figures.

So, a user goes to the site with a normal browser, registers or logs-in and reserves a 1T range.

They then configure their client to connect to the site (with the username and password) and let the client go. The client will connect, get the range details and start sieving. Every hour or so it will connect back to the server and provide an update (size of SoB.dat used, pmin, pmax, pcurr, min/max/avg kp/s and any new entries in the fact*.txt files).
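The hourly update could be as small as a single POST of something like this (field names invented; the values would come from parsing the files as described above):

import os

update = {
    'datsize': os.path.getsize('SoB.dat'),   # which SoB.dat is in use
    'pmin':    882000000000000,              # reserved range start
    'pmax':    883000000000000,              # reserved range end
    'pcurr':   882137000000000,              # progress so far
    'kps_min': 430, 'kps_max': 450, 'kps_avg': 440,
    'factors': '',                           # any new fact*.txt lines
}
# POST `update` to the server, e.g. with urllib as sketched earlier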

If the client runs out of work to do (i.e. the user has not reserved a new range on the site) it will be assigned a small range as a stopgap measure, roughly a day's work (so 30G for SoB). That way if someone forgets to reserve a new range it will continue to be productive to the cause. People could even run the client in this mode permanently, never reserving their own ranges.

The 'ConnectOnlyAtStartup' mode would allow people to use the system if they have dialup or an otherwise limited connection.

It would only connect to the site and provide an update upon startup. They would do the following:-

o Dialup...
o Register on site or log-in. Reserve a range. Configure client with 'ConnectOnlyAtStartup=y'
o Run client
o client would then grab range and attempt no further connection
o User is free to close dialup connection

Whenever they want to update the progress they can dialup, stop the client and restart it. It will check the status of the files, connect to the server, update the server with status (or even update everything and grab a new range) and then attempt no further connection. User would then be able to disconnect their dialup.
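So a dialup user's config could be as simple as this (ConnectOnlyAtStartup comes from the description above; the other keys are invented for the example):

Server=http://example.com/sieve/
Username=someuser
Password=secret
ConnectOnlyAtStartup=y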

Completely offline users (i.e. sneaker-netters) would be able to register and reserve ranges and mark them as being processed completely offline. They would have no requirement to run the client (there'd be no point anyway!). They're free to split up their range as they see fit if they are processing it on multiple machines.

There would be a form on the site to upload the same details (.dat file used, pmin, pmax, pcurr, min/max/avg kp/s, fact*.txt files)

There are a couple of other considerations (relevant to SoB at least) that I'm thinking about how to solve:-
o Factor submission to the SoB servers (the client should be able to handle this)
o Range/reservation management: the ability to abandon a range, mark it as lost (due to a crash), recover a range from the last time it was updated to the site, etc.
o Admin functions (retrieving factors, etc.)
o Online stats (min/max/avg kp/s for 1 day, 1 week, 1 year etc.; current k,n pairs still to test below maxn, etc.)

Anyway, enough for now.

OverlordQ
07-12-2005, 06:15 PM
You could do it in Perl pretty easily, since there are about a million and one different modules that could interface with the sieve submission and the beta reservation site.

Greenbank
07-13-2005, 04:31 AM
My problem with Perl is that it can be hairy to implement on the Windows platform.

I do have access to P2EXE (a Perl-to-native-.EXE compiler) but unfortunately it is licensed to the company, so I can't use it for non-company things.

Matt
07-13-2005, 05:28 AM
Presumably it would still only be command line on Windows though, and would need yet another wrapper to make a Windows GUI.

Mystwalker
07-13-2005, 06:45 AM
Implementing it Model-View-Controller style would greatly ease adding a GUI later on - either via another wrapper or by changing the source...

My Java wrapper had an AWT GUI on the server side at the beginning (wasn't my idea, though). I changed it to a CLI, which took half an hour. With a flexible output behaviour, there should be no problem adding a GUI again. The logging mechanism is a step in that direction, as it already centralizes log file and screen output...
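In pseudo-code terms (my wrapper is Java, but the idea is language-agnostic, so here's a little Python illustration of it):

class View:
    def show(self, text):
        raise NotImplementedError

class CliView(View):
    def show(self, text):
        print(text)      # the current CLI just writes to the screen

def report_progress(view, pcurr, kps):
    # the controller only ever talks to the View interface,
    # so a GUI view could be dropped in without touching this code
    view.show('at p=%d, %.0f kp/s' % (pcurr, kps))

report_progress(CliView(), 882137000000000, 440.0)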

Greenbank
07-13-2005, 08:01 AM
I had several reasons for doing this:

1. I like to help out. This would benefit the project and, if written correctly, would help out the Riesel Sieve project too.

2. I have several machines that do proth_sieving and the admin/coordination is a nightmare.

Right now I'm waiting for one machine to finish the last tiny part of my range (a 1T range starting at 882T). As I got access to various machines I diced up my range and spread it about. I've had up to 6 machines working on this range (about 2600 kp/sec in total). One was my home laptop (no internet connection) and so there was lots of to-ing and fro-ing with paper and USB flash drives. I was also scp'ing files from various machines so I could see how it was going with Sobistrator.

So in 59 minutes I will have completed my range. I'll then have to grab all of the fact*.txt files from each of the machines, confirm that each of them has finished its allotted range (looking at stat.txt and/or SoBStatus.dat) and make damn sure I haven't missed out part of the range. Then I have to collate all of the fact*.txt files, submit them, and email factrange.txt off.

A nasty load of admin on my part.

The ideal goal would be to just set the machines off and let them go and get more and more chunks, much like the SoB sbclient does for PRP testing, without having to check up on them, fiddle with them, dice up ranges, work out range sizes for various different kp/sec machines, etc.

I want to make sure the automated wrapper provides two main options:-

1. Pick a chunk from a range pre-reserved by the user (on the site).

Say I reserve a 1T range from 882T to 883T and I want this to be processed by my 6 machines. The server would dice this up into, say, 50G chunks. Each machine would receive a separate chunk and process it as normal. (Don't worry, I plan to include an option on the client for the user to override the minimum chunk size.) There's a sketch of the dicing after point 2 below.

2. Just do some work for the project by processing whatever the server decides to give them. This may be filling gaps, or it may be putting, say, 10% of the work towards resieving low p with high n.
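The dicing mentioned in option 1 is trivial on the server side. A sketch, using the 50G default:

def dice(pmin, pmax, chunk=50 * 10**9):
    # split [pmin, pmax) into chunk-sized pieces; the last may be shorter
    chunks = []
    p = pmin
    while p < pmax:
        chunks.append((p, min(p + chunk, pmax)))
        p += chunk
    return chunks

print(len(dice(882 * 10**12, 883 * 10**12)))   # 1T in 50G pieces -> 20 chunks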

I'd see most of the GUI side being handled by the server. It's the one place that will have all of the info and would allow the users to see how their reserved chunk is being processed.

By constantly updating the site (every hour or so) we minimise the risk of losing work. The client would upload factors as they are found, and also tell the server what pmin has progressed to. If the user's machine crashes, or they abandon the project for no good reason (losers!) then we'll have a reasonably up-to-date status of where they were so someone else can carry on.

Users would be able to login to the site and see how each of their machines are processing. How much of their range has been sieved, by which machines, at what speeds, etc.

We can gather statistics on avg kp/sec and use them to estimate when we'll reach p=2^50 (the proth_sieve limit, about 1126T).

Lots of other useful stuff follows from having a central repository.

Does all of this sound reasonable?

Mystwalker
07-13-2005, 08:29 AM
Originally posted by Greenbank
I'll then have to grab all of the fact*.txt files from each of the machines, confirm that each of them has finished its allotted range (looking at stat.txt and/or SoBStatus.dat) and make damn sure I haven't missed out part of the range. Then I have to collate all of the fact*.txt files, submit them, and email factrange.txt off.

A nasty load of admin on my part.

For the time being (read: before sieving gets automated), I'd make intense use of the nextrange.txt feature if I were you...
You can e.g. reserve two big chunks. One for current processing, the other for reserve.
Now, you roughly split the range across your machines (if you're 100G off, it's no problem), so that they have work for at least two weeks, maybe even a month.
The same holds for the reserve range: split it similarly.

Once in a while, you can check whether the nextrange has begun. If yes, take the found factors and put in a new nextrange worth some weeks of work.
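Checking whether the nextrange has begun could even be scripted: if the latest pmin in SoBStatus.dat is at or past the end of the current range, the client must have moved on. A sketch, under the same 'pmin=NNN' format assumption used earlier in the thread:

import re

def nextrange_begun(current_pmax, path='SoBStatus.dat'):
    last = None
    for line in open(path):
        m = re.search(r'pmin=(\d+)', line)
        if m:
            last = int(m.group(1))   # latest pmin = progress so far
    return last is not None and last >= current_pmax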

I'm confident this should decrease the needed administrative effort.

Of course, automation is certainly the better way. :thumbs: