COMING THRU
Sorry Excaliber, hope that didn't hurt too much
US-Distributed did about 4.5 times more particles than we did last week, securing a nice lead, and moving fast!
I'll help out a bit for a while to see if we can't get some more support to catch back up with US-Distributed.
Lol.
Didn't hurt too much.
Memo to self: Need to get more boxen running dpad...
How long does it take to do 100k of results on an Athlon XP2400? That's the amount before an autosend, yes?
100 results, not 100k, but the stats are based on MPTS (how long it takes for each result to complete). On an Athlon XP 1800, I think it takes a few hours to do 100 results.
Well, it's been running for about 12 hrs, and if I've interpreted the results file correctly, there seem to be about 30 results there.
So it looks like another 24 hrs before it sends something.
You could also do a manual send with the manualsend script in the folder.
As the design gets better, each result takes longer (a better design means more particles retained, which means more to compute). But it also means that the MPTS goes up accordingly.
Here's a thought. I've been working on my own 'breed' of design for the project. Talking to the project manager, the problem they run into is inbreeding. Since a lot of people use the same best100, only one or two designs get naturally selected. This causes massive inbreeding, while another, better solution may have already been crushed.
I'm running without updating from that list. I'm making slow gains, at about 1.46% after two days or so. I was thinking: what if the whole Free-DC team worked on this 'breed'? A team breed, if you will. None of us would update with the best100 or the new sample100 file that's auto-downloaded (in v4.33, it can be turned off).
Here's the results.dat that I've been using. I'm sure one of us can set up a script or something similar to automate the process.
http://intheory.ath.cx/results.zip
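The merge step itself could be something like this (just a rough sketch, not a finished script -- the filenames are placeholders, and it assumes one result per line, deduping exact duplicate lines):

<?php
// merge.php -- hypothetical helper: fold a downloaded team strain into the
// local results.dat, dropping exact duplicate lines. Filenames are made up.

$local = file('results.dat', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$team  = file('team_results.dat', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

// Union of both files; array_unique keeps the first copy of each line.
$merged = array_unique(array_merge($local, $team));

file_put_contents('results.dat', implode("\n", $merged) . "\n");
echo count($merged) . " unique results written.\n";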
Thanks.
I'm playing with a similar idea. I just started on this project last week, and I have 3 boxes running "pure" strains (no best100 file). Then yesterday I set up a fourth box using the merged results.dat of the other three boxes. I was thinking of letting it run, and periodically restarting it with a new merger of the top results of the other 3.
That's the general thing I was thinking of.
Except if we have one pure team strain, we can advance it a lot faster than individually.
Hmmm....
How about this: I could set one box running the team strain, 2 boxes running a pure strain, and one running a merger of those 2 strains and the team strain? Best results get kicked back into the team strain. Any benefits/penalties to such a setup? I haven't really run this long enough to know if I'm thinking in the right direction or not...
Hmmm... I'm still fairly new myself.
As to the setup: the two boxes that run individual strains will start to lag behind. After a while, the team strain will pass them up on muon yield (assuming there is more collective computing power on the team strain).
And here's where I'm not sure. I don't know how the client chooses from the results.dat for the next generation. It might choose randomly, or only take the best out of the results.dat. If it's the latter, the two individual strains would never be picked after they are passed up, making them useless to the team strain.
So, not sure. I'll go ask on their forums too.
How 'bout this setup (it means more work): you have one box that is a 'gateway' to your other two/three boxen. The gateway box is nearly always up to date and using the team strain. The other boxen periodically (once a week? two weeks?) take the team strain and start working on it individually in isolation, merging their results with the team strain but not using it. This way they can stay up to date on muon yield, but have time to try a different path before being re-incorporated into the team strain.
Hope this makes sense.
Say the team strain is at 2%. The individual boxen (let's call them developers, or dev boxen) grab the merged team strain and isolate themselves. For the next week or so, they work on that strain, updating the team. But they don't use the team strain, as that would contaminate their progress and introduce inbreeding.
At the end of the period, they grab the latest team strain and start the process over.
This would allow for the most optimal breeding program while reducing inbreeding. Since each box has a period of time to work on the current strain individually, a new and better strain can be found and incorporated. But the periodic update from the team strain allows it to stay up to date. The best of both worlds. Inbreeding will still occur, but not nearly as badly as before.
Whatcha think?
If everyone did this (each box updating the team strain daily, but only grabbing once a week or so), it could do some interesting things.
Wooo- lots to think about.....
1. I don't know if it's practical to update the team strain daily -- as the return gets better, the runtime is stretching out, and it may get difficult for the low-end boxes to finish a run in the 24 hr period. I think a weekly synchronization is better.
2. I roughly know what the client is doing to produce the generations (a toy code sketch follows this list). It either:
(A) Mutates -- randomly changes one or more parameters
(B) Crosses over -- uses part of result a and part of result b to produce a new configuration
(C) Interpolates -- interpolates between results a and b for the new configuration
3. I agree that the two pure strains in my setup should lag behind; but they may not -- that is the whole point of the pure strains: to catch a really good result that the main strain misses because of the inbreeding problem. In the setup I described, my box running the team strain is the "gateway" box, and the one running the merged team/pure strains is a "dev box" -- they are probably the only two that would produce results that would ever make it into the team strain. We are pretty much saying the same thing; I am just holding on to the two pure strains out of (a) hope and (b) curiosity.
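In code terms, I picture the three operators roughly like this (a toy sketch only -- NOT the client's actual code; a "result" here is just an array of numeric parameters):

<?php
// Toy versions of the three operators -- illustrative only, not the client's
// actual algorithm. Assumes each result has at least two parameters.

function mutate(array $a): array {
    $i = array_rand($a);                     // pick one parameter at random
    $a[$i] += mt_rand(-100, 100) / 100.0;    // nudge it by a random amount
    return $a;
}

function crossover(array $a, array $b): array {
    $cut = mt_rand(1, count($a) - 1);        // single random cut point
    return array_merge(array_slice($a, 0, $cut), array_slice($b, $cut));
}

function interpolate(array $a, array $b): array {
    $t = mt_rand(0, 100) / 100.0;            // blend factor between 0 and 1
    $c = [];
    foreach ($a as $i => $v) {
        $c[$i] = $v + $t * ($b[$i] - $v);    // linear blend per parameter
    }
    return $c;
}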
Ahh... I see. The two boxes that are running pure may lag behind, but that's OK. The point is to find an oddity otherwise missed.
The dev box is the 'normal' box that many might run. I agree about synching to the team strain: every week or so would be sufficient. That can be left up to the boxen operators. Longer synching periods would produce more varied results, like a mini pure strain.
I'm working on a PHP script right now that lets you upload a file. It will then (still working on this part) append the results to a results.dat that is held server-side. It will also let you download the server's results.dat strain.
I'll let you know when it's done.
Sounds good -- I was wondering how we'd handle the mechanics of this. I'm off to the office to grab your "team strain" file and implement the setup I described.
This ought to be interesting
Good Grief!! Look at all this activity
Originally posted:
How long does it take to do 100k of results on an Athlon XP2400?
On my Athlon XP2600, it seems to vary from 8 to 14 hours.
I didn't realize there was so much going on with this project. I just added the clients to a handful of machines, and let 'em rip.
What is this 'best100' or 'sample100' file you're talking about? I can't see that any of mine have downloaded any new files. I'm running 4.32g on all the boxes.
willy1
4.32g won't auto-download the best100 file. There are best-results files posted on the website that people can d'l to "jumpstart" their clients to produce good yields -- but if everyone uses the same start point, it leads to the inbreeding of results we're trying to avoid. If you d'l the new client, you need to set the "download sample results file" line to 0 if you don't want to use those files.
@excaliber: OK, I have one box running from your results.dat file, and one box running a results.dat made from the top 100 of your file and the top 100 of each of my pure strains... now we'll see what happens
So, is it a good thing or bad thing to use the downloaded file? Good for starting off with better results, bad because of inbreeding.
I've just been looking around the site for this 'best100' file, and poking around in their forum as well. Haven't found it yet, although I've seen it mentioned in the forum.
Also, looking at my results.dat file (1.7MB) from the box that's been running this the longest, how do you tell what a 'good' result is? Are they in any particular order other than chronological?
Back when I did it (4.0, when there were 125 users total) there was an uber-result file. Of course, it was around 3.5Mb, and there were only a dozen variables, whereas now I think there are 50-something variables...
But anywho, here is what I have to say on a team strain file, how they figure out the next run, and other stuff:
1. The Team Strain File -- A great concept. Ars used to do one, but with a team that no one cares about, that got sidetracked. For a file like this, the more results the better. The problem with the best100 file was that there are too few results, leading the "intelligent" run-selector engine to come up with the same run to do on many machines.
I have an idea for a setup that, while it would be way above my technical ability, would be the best solution for all to get rid of inbreeding while still churning out high-value result files:
1. Get a database together with all of the result files. I'm sure there is a way to have a program read a result file and add entries to a database (I have an Oracle 9 CD set for Linux if anyone wants me to make a copy. I'll have to find a couple of blank DVD(+ or - I forget)R's to copy it, but I can for whoever wants it). If this could be semi-automated, it would be better -- like having a page for uploading result files, which would submit them to the back-end database. At this point, we should also have a dupe checker, so people can just dump their result file every week without worrying about dumping the same result by accident. Another option for this is to put a line (or something) that would indicate the end of the results that have been taken from the database.
1b. That is how we could put files in it. The way to get them out would be another web interface. Have it list the number of results, possibly splitting them into three categories (x<1%, 1%<x<2%, 2%<x), and allow people to choose how many of each they want. Then, the grabber program/script would take RANDOM results from each area, eliminating the problem of inbreeding (a rough sketch of this follows the list).
2. I don't know much about the back end of choosing a new run anymore, but I know it used to be that the first run was completely random, while each run after that used all of the previous data to get an idea of where the higher results would be. Not quite as sure as I used to be, but it used to be a good idea to have 250-1000 results in a file to get a better result. It might have changed.
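To make 1b concrete, the grabber could be as simple as this (a sketch only -- the database, table, and column names are all made up here):

<?php
// Sketch of the random grabber from 1b. Table and column names are invented.

$db = new PDO('mysql:host=localhost;dbname=dpad', 'user', 'pass');

function grabRandom(PDO $db, float $min, float $max, int $count): array {
    $stmt = $db->prepare(
        'SELECT data FROM results
          WHERE yield >= ? AND yield < ?
          ORDER BY RAND() LIMIT ' . (int)$count);  // random picks per band
    $stmt->execute([$min, $max]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

// e.g. 50 from x<1%, 100 from 1%<x<2%, 100 from 2%<x
$lines = array_merge(
    grabRandom($db, 0.0, 1.0, 50),
    grabRandom($db, 1.0, 2.0, 100),
    grabRandom($db, 2.0, 100.0, 100));

echo implode("\n", $lines), "\n";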
Finally, you might see me after the SETI gauntlet ends at Ars.
Don't know if I helped or not, but that is my $.02,
magicfan241
The sampleresults file was automatically downloaded when I started the clients.
It won't be used unless I rename it or merge it into my results.dat file.
I hope this is the case.
I downloaded the best500 file and renamed that to results.dat, so maybe that won't be too bad for the inbreeding.
@willy1:
It's a toss-up, really -- the best-results strain needs a certain amount of work done on it to continue developing the most promising strain. Or you can strike off in a completely new direction and hope to find a better design, which will then become part of the best100 file. "Best" is determined by % yield, which is the number right before the (XX.X Mpts) value in the results file (I think). I use MuonMonitor to look at the results file, even though it's an old app. See http://free-dc.org/forum/showthread....&threadid=4702
Interesting. I'll start to take a crack at it. I'll do it in PHP with a MySQL database. When you upload a results.dat, it will first take a hash of the data. This allows checking for duplicates (if two hashes are the same, then the results are identical).
Next, it will take that data and slot it into a table cell, and place the muon yield in another cell. Easy as pie. Upload, and the script takes care of the rest.
Retrieving a new results.dat wouldn't be that difficult either. Just specify the percentage of each muon yield, and it will dynamically compile a results.dat for you.
Till then, here's my quick script for a team strain:
http://intheory.ath.cx/dpad.php
Currently, it doesn't check for duplicates, but I'm doing that manually with MuonMonitor (it has a duplicate check built in). When you upload, it merges/appends with the current one. I'll get rid of dupes about once a day, or whenever I remember. Dupes don't hurt the client; they just make the file bigger to download.
Have fun, I'll start working on the next one.
Pheww!
Done, go check it out. It's got my current results.dat in there.
Still at http://intheory.ath.cx/dpad.php
When you upload, it places the file in a temp file that is timestamped. Next, it loops through each line and checks whether its hash is in the database or not. If it is not, it adds the hash, the data, the yield (which is pulled from the line, for reference) and the date.
On download, selecting all gives you the entire database. The other option allows you to ask for a certain amount, and randomly selects them from the database.
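The upload loop boils down to something like this (a simplified sketch of the idea, not the script's literal source -- the table layout and the 'results' form field name are placeholders, and it assumes a UNIQUE index on the hash column so INSERT IGNORE skips duplicates):

<?php
// Simplified sketch of the upload path. Table name, columns, and the
// 'results' form field are placeholders; hash is assumed UNIQUE so
// INSERT IGNORE silently skips duplicate lines.

$db = new PDO('mysql:host=localhost;dbname=dpad', 'user', 'pass');
$insert = $db->prepare(
    'INSERT IGNORE INTO results (hash, data, yield, added)
     VALUES (?, ?, ?, NOW())');

$lines = file($_FILES['results']['tmp_name'], FILE_IGNORE_NEW_LINES);
foreach ($lines as $line) {
    if ($line === '') continue;
    // Guess: pull the first decimal number on the line as the yield; adjust
    // to wherever the yield actually sits in the results.dat format.
    preg_match('/(\d+\.\d+)/', $line, $m);
    $insert->execute([md5($line), $line, $m[1] ?? 0]);
}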
Let me know if anything odd happens. I'm going to move this to a more permanent host after I work on some more features/stats.
PCZ, if it automatically downloaded and is using the sample best100 file, then it will take a lot longer to finish each run than if you were starting the results.dat file from scratch. If you use the sample results.dat file, the MPTS per particle could be as high as 400 MPTs. If you start from scratch, they will start relatively low (well less than 50 MPTs each).
BTW excaliber, I wouldn't mind helping out with the development of the php/sql system. Also, if we need somewhere to host it, I can help out there too.
Sounds good, I may take you up on your offer for the hosting and development. I've got some more stats and such I'd like to incorporate first.
You can take snippets from my php file I posted in another topic if any of it would be useful, but it looks like you've already had to write it all from scratch anyhow.
Saturday morning denseness here:
Originally posted by PCZ
The sampleresults file was automatically downloaded when I started the clients. It won't be used unless I rename it or merge it into my results.dat file. I hope this is the case. I downloaded the best500 file and renamed that to results.dat, so maybe that won't be too bad for the inbreeding.
I can't find these files on the MUON site you all are talking about. Links?
excaliber - uploaded 1700+ results from this box.
Oh yes, another note on development. It would probably be a good idea to add an option to specify what kind of file you are uploading (i.e. Linux (solenoidsonly) or Windows (solenoidsto15cm)). And of course it would store them in a separate table. Then users would also have the option of downloading either a results.dat file that is solenoidsonly or solenoidsto15cm. I have a solenoidsonly results.dat file that is a bit over 1MB that I would like to upload, but I don't want to mix it with your results, because if a Windows user downloads it, the solenoidsonly-optimized results would not run on the Windows client unless they have the solenoidsonly lattice. All in all, it's not worth it for Windows users to get the solenoidsonly lattice, because the solenoidsto15cm is much more efficient.
Willy1, it's better (suggested) that you don't use the top500 results file from the site anyhow. Just let your client continue on with its own strains. It will slowly evolve.
Magnavox: Ahh, didn't think about that. I'll add it when I get some time. Shouldn't be hard at all, just a few option buttons and another table. Same code.
Thanks
EDIT:
Found a minor name bug, so if you uploaded already, please do so again. It works now; I tested it to make sure.
Heh, I didn't know that the Linux version was doing something different from the Windows version.
Well, the original 4.3 series of the client used a different optimization (solenoidsonly). Thing is, no one has ported the client to Linux since 4.3. Autosend/manualsend don't work under Linux -- they don't for me, anyhow. I had to copy the solenoidsonly lattice from my Linux box to the lattice folder in Windows (of the 4.32 client) so I could manually send the results that way. It's a pain in the butt. I haven't dumped my Linux box results in a while; the results.txt file is just over 1MB, and I think it has about 30,000 MPTs in it. The results.dat file is a bit over 2MB.
Take a look here: http://www.technosapiens.cc/muonresults.php
As you can see, the optimization on the Linux clients is not very great...
Another nice addition to the script would be dumping downloads to a file, so people don't need to copy and paste (though copy and paste would be better for merging their current results.dat with the ones in the database -- then again, if they upload their results.dat to the database, that won't be a problem... in fact, it would encourage people to do so). Have the script dump the contents of the database to a temp file named FreeDCResults.dat or something along those lines. Then use a header redirect to point at the temp file just created, so they can directly download the file. And of course, delete the file upon completion of the script. Optionally, maybe it would be good to implement on-the-fly compression.
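In rough code, the dump-and-redirect could look like this (a sketch only -- the temp path and table name are made up):

<?php
// Sketch of dump-to-temp-file plus header redirect. Paths/table are invented.

$db   = new PDO('mysql:host=localhost;dbname=dpad', 'user', 'pass');
$rows = $db->query('SELECT data FROM results')->fetchAll(PDO::FETCH_COLUMN);

$tmp = 'tmp/FreeDCResults_' . time() . '.dat';      // timestamped temp file
file_put_contents($tmp, implode("\n", $rows) . "\n");

// Point the browser straight at the file (a full URL may be needed here).
header('Location: ' . $tmp);
exit;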
Yes, I've thought of that. Though I'm not sure how to force the browser to show a Save As prompt using PHP.
Another problem: when they run the download script, it creates the temp file. But the script ends and is not around to delete the file.
I guess I could put it into a loop and have it wait 5 minutes or so before deleting the file.
Alternatively, I could have it call another script that checks every temp file's last-modified date. Anything past an hour old gets deleted. That sounds better, and easier.
Yes, I may look into adding dynamic zip compression (so a zip is downloaded instead).
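For the Save As part, the usual trick is a Content-Disposition header -- sending the data that way means no redirect is needed, and the last-modified cleanup can run at the top of the script. A rough sketch, with the table name and temp path being placeholders:

<?php
// Sketch: forced download via Content-Disposition headers, plus mtime-based
// cleanup of stale temp files. Table name and temp path are placeholders.

// Delete temp files whose last-modified time is over an hour old.
foreach (glob('tmp/FreeDCResults_*.dat') as $old) {
    if (time() - filemtime($old) > 3600) {
        unlink($old);
    }
}

$db   = new PDO('mysql:host=localhost;dbname=dpad', 'user', 'pass');
$data = implode("\n",
    $db->query('SELECT data FROM results')->fetchAll(PDO::FETCH_COLUMN)) . "\n";

header('Content-Type: application/octet-stream');
header('Content-Disposition: attachment; filename="FreeDCResults.dat"');
header('Content-Length: ' . strlen($data));
echo $data;   // browser shows a Save As prompt instead of rendering the text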
I added a few more stats, so you know what the max and min yields are, and when the last update was.
Done. It compiles a results.dat dynamically and stores it in a temp file. You can still copy/paste or save-as under the random download. For downloading the entire strain, it only allows you to save (so it doesn't have to download the text to place in the textbox first -- bad for modems).
Every time the index page is visited, it runs a check for temp files older than 10 minutes and deletes those. As the results.dat grows, I may need to increase that so modems have a chance to download before the file gets deleted.
I'm looking into compression next. There's lots of empty space that could be compressed.
EDIT:
So, here are the two options I see now that we have a team strain.
Each person has at least one box that is synched with the team strain with some regularity. This means they contribute, and then use the team strain results.dat for computation.
Any additional boxes may do this or run 'pure' strains. They do not update their results.dat with the team strain, but still merge their results with the team's.
Just like rshepard said.
Great work, buddy. If you like, drop me a copy of the scripts and I can add some stuff as well. Then we could eventually merge our work together into the completed script. Hell, we could start our own SourceForge project. If you don't mind co-op code being added, e-mail me a copy at magnav0x@ezracing.net and I'll do a couple of additions in my spare time. Well, off to work I go...
Well, with the 4.33 client, you can now set it to download a sample100 file every x days, and you can tell it where to download from. An idea would be to have these scripts create a new sample100 file from our strain every day, so that we could just auto-download it.
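Something like this, run daily from cron, could write the file the clients point at (a sketch -- the table, columns, and output filename are all guesses):

<?php
// Daily cron sketch: write the top 100 team-strain results to a file that
// 4.33 clients could auto-download. Table/columns/filename are guesses.

$db   = new PDO('mysql:host=localhost;dbname=dpad', 'user', 'pass');
$rows = $db->query('SELECT data FROM results ORDER BY yield DESC LIMIT 100')
           ->fetchAll(PDO::FETCH_COLUMN);

file_put_contents('sample100.txt', implode("\n", $rows) . "\n");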
Just another thought, and wow, that page is cool...
magicfan241
Hmmm... I didn't know it allowed downloads from a different source.
If you'd like, I could have it generate a team strain file once a day, containing all the results. Or would a lesser number be better, like randomly picking half the max results in the database?
Thanks for the input.
magnav0x: Don't mind at all. I'll send you an email later today. Be warned, the code is kinda messy -- it was done in a single day, after all. There's not much indentation, and there are no comments.
Do you have a permanent host we can use? If so, would it by chance have an FTP account I could use as well (not full rights, just rights to the DPAD directory)? That way I can continue to mess with the script as well.
SourceForge... why not? :P
How many results should we keep when uploading our files? Do we want to send the whole file, or would it be better to trim it to the top 100 or so results?
I've been uploading my entire results.dat. Stephen (the project admin) said this over at their forums:
The results are ordered by rank, and then the ones used for generating new designs are selected according to a distribution that is heavily weighted towards the top end of the range. Removing, say, the bottom half won't have too much effect. In general I've tried to design it so that culling isn't really necessary, as it ignores the lower results _most_ of the time naturally, but will sometimes try them for some variety.
So I figured it can't hurt to upload the entire thing.
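A "heavily weighted towards the top end" pick might look something like this toy sketch (the geometric bias is just my guess at the shape of the distribution -- the real client's isn't published here):

<?php
// Toy illustration of a rank-weighted pick: each result, from best to worst,
// gets a fixed chance of being chosen, so early (better) ranks dominate.
// $ranked is assumed to be an array of result lines, best first.

function pickWeighted(array $ranked, float $p = 0.1): string {
    foreach ($ranked as $result) {
        if (mt_rand() / mt_getrandmax() < $p) {
            return $result;                  // usually fires within the top few
        }
    }
    return end($ranked);                     // rare fallback: the last result
}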