
Thread: Team Strain quality thoughts

  1. #1
    Senior Member wirthi's Avatar
    Join Date
    Apr 2002
    Location
    Pasching.AT.EU
    Posts
    820

    Team Strain quality thoughts

    Hi,

I don't think the way the team strain is currently handled is of very high quality. The main reason: the diversity is low. Take a look at a freshly downloaded results.dat from the team strain with viewresults.exe. Almost all results look similar; they are all concentrated at one point in the 3D map.

That's because only the top 1000 results are included, and they all cluster around that point.

One reasonable thing to do would be the following: include some results with lower muon yield in the team strain download. For example: if you download a file with 1000 results, include the top 800 plus 200 random results of lower quality. At the very least, make this an option in the download script.

If we don't do that, we risk heading towards a dead end (a "local maximum").
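The download-script option suggested above could be sketched like this (a minimal sketch in Python; treating a result as a dict with a "yield" field is my assumption for illustration, not the real results.dat format):

```python
import random

def select_for_download(results, n_top=800, n_random=200):
    """Keep the top n_top results by muon yield, plus n_random
    results sampled at random from the lower-quality remainder,
    so some genetic diversity survives the cut."""
    ranked = sorted(results, key=lambda r: r["yield"], reverse=True)
    top = ranked[:n_top]
    rest = ranked[n_top:]
    extras = random.sample(rest, min(n_random, len(rest)))
    return top + extras
```

The random picks keep some genes from outside the current peak in circulation instead of culling everything below rank 1000.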

A simple workaround would be: don't replace your results.dat with the one you download, but append to it; a program that sorts out duplicates would be helpful (there was a "MuonMonitor" from bananeweizen of team Rechenkraft, but I don't know if it still works with the current client).
    Engage!

  2. #2
    Stats Developer magnav0x's Avatar
    Join Date
    Mar 2002
    Location
    Dallas, TX
    Posts
    1,747
I'm not active in the muon project at the moment, so I'm not going to throw any opinions into the ring. I only limited the results.dat to 1000 to help with download size and overall user wait. If you guys want to switch back to the full strain download, I'll happily switch back to that.
    Warning this Post is Rated "M" for Mature

    -Contains Harsh Language
    -L337 HaX0r W3RD2!
    -Partial Nudity

    I haven't lost my mind; it's backed up on tape drive somewhere.

  3. #3
First, MuonMonitor does work with the current release; I am using it and another program called muon1datautil for culling and merging the files.

As for the diversity question: the client will generate Random setups, along with the Crossover, Mutate, and Interpolate setups that are based on the results.dat file.
So you get a certain amount of low-end results to work with from there. Additionally, if you are running more than one client, you can keep the team strain on one and an independent strain on another, and periodically merge the independent results into your local team strain. I do this with my 4 clients: periodically I clip the best 100-200 from the three independent lines, download the best 200 from the team strain, merge all 4 files, cull the duplicates, and voila, my new local team strain.
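The merge-and-cull step of that workflow can be sketched as follows (a simplification that assumes each result occupies one line of the file and duplicates are byte-identical lines; MuonMonitor and muon1datautil work with more knowledge of the actual format):

```python
def merge_results(paths, out_path):
    """Concatenate several results.dat-style files, dropping exact
    duplicate lines and keeping first-occurrence order."""
    seen = set()
    merged = []
    for path in paths:
        with open(path) as f:
            for line in f:
                line = line.rstrip("\n")
                if line and line not in seen:
                    seen.add(line)
                    merged.append(line)
    with open(out_path, "w") as f:
        f.write("\n".join(merged) + "\n")
```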

  4. #4
Senior Member wirthi
    Originally posted by rshepard
    First, MuonMonitor does work with the current release- I am using it and another program called muon1datautil for culling/merging the files.
    Good to hear that, I'll have to download it from rechenkraft again ...

    Originally posted by rshepard As for the diversity question-- The client will generate Random setups, along with the Crossover, Mutate, and Interpolate setups that are based on the results.dat file.
    So you get a certain amount of low-end results to work with from there. Additionally, if you are running more than one client, you can keep the team strain on one and an independent strain on another, and periodically merge the independent results into your local team strain. I do this with my 4 clients; periodically I clip the best 100-200 from the three independent lines, d/l the best 200 from the team strain, merge all 4 files, cull the duplicates, and voila- my new local team strain.
Yeah, I know that, but:

First, the random results usually lead to very, very bad results compared to those currently in the strain. It's very unlikely that a random result, or a result derived from a newly computed random one, will overtake the good results we have now.

Second, of course there are these 3 types of mutations (+ random). But they're only applied to those 1000 similar (but of course good) results. As I said, we could already be at a dead end. What if a good result can't be reached without taking a detour via a bad result? Those bad results do get computed and could lead to results even better than the ones we have now; but by being "killed" by this script, they get no chance to "live" long enough to be productive.


Example: Assume those 1000 results are "dead end" results. The only way to get better results is to mutate one of them into a worse result, and then mutate that result again, possibly leading to one better than the 1000 you started with.

Now, on my computer there are 1000 results. I upload daily. I compute 100 results every day (dunno, just a guess). Some of them will be very bad random results => useless for now. Some will be mutated ones that are dead-end results themselves. The rest will be mutated ones that could possibly lead to better ones; assume those are a third of mine (= 33). It's quite unlikely that, among the 1100 I have at the end of the day, one of those 33 is chosen AND mutated the right way so that a better result is found. Most likely the better (dead-end) results are taken; that's how the algorithm works, it picks better ones with higher probability. At the end of the day, those 33 are likely to be killed by the team strain script because it thinks they are bad results (they are, but they could lead to good ones).
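The selection behaviour described here ("it takes better ones with higher probability") is classic fitness-proportionate (roulette-wheel) selection. A minimal sketch, again using a hypothetical "yield" field as the fitness score:

```python
import random

def roulette_pick(results):
    """Pick one result with probability proportional to its yield:
    better results win most of the time, but low-yield results
    still get an occasional chance."""
    total = sum(r["yield"] for r in results)
    threshold = random.uniform(0, total)
    acc = 0.0
    for r in results:
        acc += r["yield"]
        if acc >= threshold:
            return r
    return results[-1]  # guard against floating-point rounding
```

With this scheme, a borderline result that might lead somewhere new is rarely chosen, which is exactly why culling it from the file ends its line for good.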

Of course this is all theory; there is no real dead end. But from some regions it's just easier to reach better results. And according to Stephen Brooks, bigger populations are generally better than smaller ones.

Ok, knowing that, I'll do as I've written: I will use the team strain, but I won't throw away my own results.


Whoa, racked up 5000 keys typing this ... our TKC and Project Orca teams will love this

  5. #5
Senior Member wirthi
    Originally posted by magnav0x
    I'm not active in the muon project at the moment so I'm not going to throw any opinions in the ring. I only limited the result.dat to 1000 to help with download and overall user wait. If you guys want to switch back to full download then I'll happily switch back to that on full strain download.
No, it's OK how it is. There is no use in downloading the same results again and again.

As I said, I'll just avoid deleting my results.dat and instead append the downloaded one.

If you've got too much free time (I assume you don't), you could make a function to download only the new results: all results that were uploaded during the last X days.
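Such an incremental download could be as simple as filtering on an upload timestamp server-side (sketch only; the "uploaded" epoch-seconds field is an assumption about what the server stores per result):

```python
import time

def recent_results(records, days):
    """Return only the records uploaded within the last `days` days,
    assuming each record carries an 'uploaded' epoch timestamp."""
    cutoff = time.time() - days * 86400
    return [r for r in records if r["uploaded"] >= cutoff]
```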

    Wirthi

  6. #6
    Here are my thoughts on the issue.

Yes, I know we are limiting our horizon, if you will, by selecting the top 1000 at most. I feel there is good reason for that. The team strain is about creating a new strain separate from the overall project, to see if we come up with something better and different. Chances are we will, and it could turn out to perform better than the overall project's.

We all started with absolutely nothing, and now we are at 4.xx%, meaning we have refined that nothing to here. Sure, it would be great to keep all the data collected. But as the amount of data goes up, the rate of change drops off much faster. Which means that to get any yield remotely useful to the project, we would need the computing power of the whole project, not just our team.

So we are left with two options: try to get the most random and unrelated results by using the entire data set, or continue to refine one set of data to a better yield.

I'd rather refine one set and get something that is production quality (at the yield of the project). Once we are around that yield, our results will be filtered into the project's top100. In my mind, we have succeeded if we do that, as we have introduced another strain that the project would not otherwise have (who else is doing this? Very few are working independently, let alone as a whole team).

    At that point, if we wanted, we could start again and try to find a new strain, etc.

  7. #7
Stats Developer magnav0x
Actually, selecting all new results from the past X days wouldn't be that hard. Maybe this is something we'll look into doing.

  8. #8
Senior Member wirthi
    Originally posted by excaliber
    Here are my thoughts on the issue.

Yes, I know we are limiting our horizon, if you will, by selecting the top 1000 at most. I feel there is good reason for that. The team strain is about creating a new strain separate from the overall project, to see if we come up with something better and different. Chances are we will, and it could turn out to perform better than the overall project's.

We all started with absolutely nothing, and now we are at 4.xx%, meaning we have refined that nothing to here. Sure, it would be great to keep all the data collected. But as the amount of data goes up, the rate of change drops off much faster. Which means that to get any yield remotely useful to the project, we would need the computing power of the whole project, not just our team.

So we are left with two options: try to get the most random and unrelated results by using the entire data set, or continue to refine one set of data to a better yield.

I'd rather refine one set and get something that is production quality (at the yield of the project). Once we are around that yield, our results will be filtered into the project's top100. In my mind, we have succeeded if we do that, as we have introduced another strain that the project would not otherwise have (who else is doing this? Very few are working independently, let alone as a whole team).

    At that point, if we wanted, we could start again and try to find a new strain, etc.
You're right in everything you've written, but it has nothing to do with my concerns.

I didn't say that the team strain is a bad idea (in fact, it's a good idea to form a separate pool of genes), nor that the data we've found is useless.

I just meant we could make it even better by using a larger pool of possibilities (making our island bigger). That can't be done by including more results (for obvious reasons: too much data to download), but, as I wrote, by diversifying the results more (taking in some lower-quality results).

My main concern is that by limiting results.dat too much, our team strain reaches a dead end and doesn't find a way out of it. It's unlikely, but possible.

Just my 2 cents
