View Full Version : In the Quest for a low A



PinHead
11-16-2002, 08:29 PM
Is it better to leave the client running for as long as possible?

Or is it better to restart every now and then to restart the random numbers?

I guess the answer lies in whether the program is purely random in folding or if it picks an initial value and subsequent values are derived from that initial value.

Anyone have any info, links or input?

Aegion
11-16-2002, 09:19 PM
Originally posted by PinHead
Is it better to leave the client running for as long as possible?

Or is it better to restart every now and then to restart the random numbers?

I guess the answer lies in whether the program is purely random in folding or if it picks an initial value and subsequent values are derived from that initial value.

Anyone have any info, links or input?
It's completely randomized with each new structure crunched, so as far as protein structure prediction is concerned there is no benefit to restarting the client unless something like a crash has occurred.

FoBoT
11-16-2002, 09:41 PM
how about putting one of these inside? will that help?

http://www.dribbleglass.com/backgrounds/rabbit-foot.gif

PinHead
11-16-2002, 09:44 PM
Would you happen to know how it is randomized?

Is it randomized by a number that is picked up after the upload of 5000 units?

Is it a random number that is generated based on the previous value and the current random number?


I know that they are crazy questions, but I am trying to understand how multiple computers that are not connected to the internet are not duplicating each other's work!

Or for that matter, how multiple computers connected to the internet are not duplicating work. Files on the computer don't seem to get updated with a new seed value after an upload? And yet low A's seem to show up on the same days in the top 10 list.

My experience with programming has shown that random numbers are not random for lengths of time, and that is what has piqued my curiosity. Not out to bust anyone's chops, I just want to understand how the client that I am running works!

Any thought or input is greatly appreciated!

PinHead
11-16-2002, 09:49 PM
Originally posted by FoBoT
how about putting one of these inside? will that help?

http://www.dribbleglass.com/backgrounds/rabbit-foot.gif

:rotfl: With my luck they would definitely help. When I place my order can I substitute a cpu fan for the key chain?:rotfl:

bwkaz
11-17-2002, 09:00 AM
The random number generator gets seeded when the client starts up, and not again after that, AFAIK. Of course, I don't have the code here, but I believe that's what happens from past descriptions. ;)

The seed is a combination of the current time (in seconds, I think, though I could be wrong) and the client's process ID, which is unique among the processes running on any single machine. The only way two different computers' random number generators could be seeded the same is if they were both started at exactly the same time, and the process IDs on both were the same. That is possible on e.g. Linux, if you boot up all machines in a cluster at the same time, then run a script that connects to each of them and starts the client. Linux assigns PIDs sequentially, so identically booted machines can hand the client the same PID, and if you don't sleep for a second in between client startups, some clients also get started at the same "time" as others. I think I remember someone having trouble with duplicated work for exactly that reason.

The mechanics of the random number generator are such that if you don't know the seed, every number has an equal chance of getting generated. If you know the seed, it's a deterministic algorithm, so you can predict the numbers, but you're cheating. ;)

Also, I'm not quite sure on why it works this way, but if you seed two different RNGs with different values, no matter how close they are, the probability of them ever generating the same number at the same time is extremely small. So if you can guarantee unique time/PID combinations on each machine, you can guarantee different seeds, which means you can come extremely close to guaranteeing different sequences of numbers.

If I forgot anything, I'm sure others will jump in.

Edit: oh yeah, the random number generator re-seeds itself whenever you ask for a random number. The seed it uses is the number it just generated.
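
To make the time/PID seeding described above concrete, here is a minimal sketch. The mixing formula in make_seed is hypothetical (the thread does not say how the client combines the two values), and Python's built-in random module stands in for the client's own generator. It only shows that an identical (time, PID) pair gives an identical sequence, while a one-second difference gives a different one.

```python
import os
import random
import time


def make_seed(now_seconds: int, pid: int) -> int:
    """Hypothetical way to mix start time and PID into one seed."""
    # Shifting the time left keeps it from overlapping PIDs below 2**32.
    return (now_seconds << 32) | pid


def first_values(seed: int, n: int = 5) -> list:
    """First n numbers from a generator started with the given seed."""
    rng = random.Random(seed)  # stand-in generator, not the client's
    return [rng.random() for _ in range(n)]


# Seed for the current process, as a client might compute it at startup.
my_seed = make_seed(int(time.time()), os.getpid())

# Two cluster nodes started in the same second with the same PID collide.
node_a = make_seed(1037500000, 412)
node_b = make_seed(1037500000, 412)
# Sleeping one second between startups is enough to separate them.
node_c = make_seed(1037500001, 412)

print(first_values(node_a) == first_values(node_b))
print(first_values(node_a) == first_values(node_c))
```

The first comparison is the homogeneous-cluster case: same seed, same numbers, duplicated work. The second is what a one-second sleep between startups buys you.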

Brian the Fist
11-17-2002, 11:09 AM
The initial seed is indeed a combo of PID and current time (in seconds). The random number generator is seeded only when the program starts. There is indeed some duplicate data - about 5-10% of the data we receive is duplicate. However, it is unclear whether this is from people trying to upload the same files from different machines (copying it over) or from duplicate random seeds, which as mentioned most likely would occur on a homogeneous cluster. We have found that even using the exact same seed on different OS/hardware results in different final results, most likely because somewhere along the way, a floating point computation results in a slightly different result, leading the program down a new path.
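
A small illustration of how a floating point difference can send a program down a new path. This is not taken from the client: the numbers, the threshold, and the pick_branch helper are made up for the example. It relies only on the fact that floating point addition is not associative, so a compiler or CPU that evaluates the same sum in a different order can get a slightly different value.

```python
# The same three numbers, summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

print(left_to_right == right_to_left)  # False with IEEE 754 doubles


def pick_branch(score: float, threshold: float = 0.6) -> str:
    """Hypothetical decision point: a tiny difference flips the outcome."""
    return "accept" if score <= threshold else "reject"


print(pick_branch(left_to_right))
print(pick_branch(right_to_left))
```

Once one comparison goes the other way, the two runs consume their random numbers differently from then on, even though the seed was the same.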

as for the generator itself (not coded by me):

Additive random number generator

Modelled after "Algorithm A" in Knuth, D. E. (1981). The art of computer programming, volume 2, page 27.

which is a relatively complex one, though there are better ones.
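
For anyone without the book at hand, here is a rough sketch of an additive generator of this kind, using the recurrence X[n] = (X[n-24] + X[n-55]) mod m from Knuth. This is not the client's code. The 32-bit modulus and the way the 55 starting words are filled from the seed (a simple linear congruential step) are assumptions made for the example.

```python
from collections import deque


class AdditiveGenerator:
    """Additive generator: X[n] = (X[n-24] + X[n-55]) mod 2**32."""

    SHORT_LAG = 24
    LONG_LAG = 55
    MODULUS = 2 ** 32  # assumed word size

    def __init__(self, seed: int) -> None:
        # Fill the 55-word state from the seed with a linear congruential
        # generator. This fill step is an assumption, not from the thread.
        state = []
        x = seed % self.MODULUS
        for _ in range(self.LONG_LAG):
            x = (1664525 * x + 1013904223) % self.MODULUS
            state.append(x)
        # The starting words must not all be even, so force one to be odd.
        state[0] |= 1
        self._state = deque(state, maxlen=self.LONG_LAG)

    def next(self) -> int:
        # _state[0] holds X[n-55] and _state[-24] holds X[n-24].
        value = (self._state[-self.SHORT_LAG] + self._state[0]) % self.MODULUS
        self._state.append(value)  # maxlen discards the oldest word
        return value


gen = AdditiveGenerator(seed=12345)
sample = [gen.next() for _ in range(5)]
print(sample)
```

Note that the seed is only used to fill the starting state. After that, each new number depends on two much older numbers, not on the one just generated.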

PinHead
11-17-2002, 03:32 PM
So on a single machine with a fast processor, would there ever come a time that you would be better off to restart the client? To start it down a different road.

As I don't know the size of value generated or what it is used for, is it possible to exhaust that space after a few million folds? Resulting in repeated work.

Thanks to all for the input so far.

PinHead
11-17-2002, 06:05 PM
Maybe the answer to this question will satisfy most of my curiosity:

Is the random number used just at the beginning of the fold or is a new random number used at each step (129 currently) of the folding process?

Brian the Fist
11-17-2002, 11:11 PM
Thousands of random numbers are chosen for EVERY structure that is built, yes. However, if you look up in the Knuth book, that number generation method has a very large period so that it is not likely it would 'repeat' if you left your computer running for several years even.
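
A back-of-the-envelope check on the period. Knuth gives the period of the additive generator with lags 24 and 55 modulo 2^e as 2^(e-1) * (2^55 - 1). The 32-bit word size and the rate of one million random numbers per second below are assumptions picked for the estimate, not figures from the client.

```python
WORD_BITS = 32   # assumed word size e
LONG_LAG = 55    # longer lag of the additive generator

# Period for modulus 2**e with lags (24, 55): 2**(e-1) * (2**55 - 1).
period = 2 ** (WORD_BITS - 1) * (2 ** LONG_LAG - 1)

DRAWS_PER_SECOND = 1_000_000            # assumed, deliberately generous
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years_to_wrap = period / DRAWS_PER_SECOND / SECONDS_PER_YEAR

print(f"period is about {period:.3e} numbers")
print(f"years before the sequence repeats: about {years_to_wrap:.2e}")
```

Under those assumptions the sequence would take on the order of 10^12 years to wrap around, so exhausting it on one machine is not a practical worry.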

PinHead
11-18-2002, 07:51 AM
Thanks to all for the input!:cool:

I think I have a grasp now for how difficult it would be to duplicate work.

MAD-ness
11-18-2002, 02:16 PM
Damn you Howard.

I think this is the last straw. I am finally going to get that 3 volume series by Knuth that I have been eyeing for years.

I can feel my bank account decreasing by ~$130 already. ;)

Pascal
11-25-2002, 04:17 PM
Howard, tell me one thing.
Will there be any change in the used RNGs after Dec, 8th?

I've already read, that the client will work faster. Will we still have 5 - 10 percent duplicated data?

Just tell me something more, perhaps also about the coming protein, would be nice to know.. ;)

KWSN_Millennium2001Guy
11-25-2002, 04:57 PM
Originally posted by Pascal
Just tell me something more, perhaps also about the coming protein, would be nice to know.. ;)

If I read the DF news correctly, the next protein will be the same one we are doing now, with a different calculation engine.

http://www.distributedfolding.org/news.html

Ni! :)

RaginSteveK
12-09-2002, 06:40 AM
The text at DF says the new mechanism should be faster, with lower RMS [closer to something real? for an arbitrary length], but with sloppier structures. Meaning simply sloppier graphically?? Or else I don't understand..

not that my understanding is critical to anything here, but you know-- we all have our illusions ....:confused: ;)

[ haven't improved my personal RMS since practically day 1 of these longer-to-generate protein simulations .. a little disconcerting ]

Brian the Fist
12-09-2002, 10:31 AM
sloppier means sloppier geometrically and sterically - they may have some slightly overlapping atoms, etc. which can be corrected for later on once the good structures are identified.

Paratima
12-09-2002, 11:34 AM
Before anyone asks... :D

steric, sterical: Of or relating to the spatial arrangement of atoms in a molecule.

MAD-ness
12-10-2002, 01:29 AM
Paratima: thanks. :)

Microbe
12-17-2002, 01:37 AM
Brian gets a 2.18?

A little bit of insider folding???

:cheers: :|party|: :thumbs: ;) :crazy:

m0ti
12-17-2002, 01:56 AM
2.18?????

Whoa!

Definitely looks like they're doing something right.

I think that's the best RMS we've ever had in the project for ANY protein.

bwkaz
12-17-2002, 12:34 PM
Assuming it's valid. There have been some 0A structures before, I think, that were invalid results...

m0ti
12-17-2002, 01:18 PM
He's down to a 1.93 now.

/dancing: "How low can he go... how low can he go..."

Brian the Fist
12-17-2002, 01:54 PM
False alarm, folks. Been testing the new Windows screensaver. Apparently it still has some bugs that need to be worked out...

m0ti
12-17-2002, 02:37 PM
Sorry to hear that it's a false alarm... was about 90% sure it was going to be one anyway. Oh well. /shrugs.