Scott Jensen
05-03-2002, 02:46 AM
To dramatically cut down on server load, and thus also on the number, power, and expense of servers for DF, why not have our client programs contact the server only when they have a structure with a better (smaller) RMSD than the current smallest one?
In this scenario, the client program crunches away as it currently does, but either after each structure it constructs or after some batch of structures (even 5,000), it compares their RMSD scores against the current smallest RMSD for that protein. If none of the structures it has created beats the smallest RMSD structure (or RMSD score) that it has on file, it simply trashes all of them, starts creating new ones, and doesn't bother the DF server. If it has one that beats the title holder on file, it contacts the server and reports it, trashes the rest, and goes back to trying to create one with an even smaller RMSD. If the structure it reports beats the one it has on file but not the actual current smallest on the server (e.g., because another client has reported a better one in the meantime), the server tells the client what the current smallest RMSD structure (or possibly just the RMSD score) is, so the client doesn't bother it again until it has a better one. If the client does beat the current smallest RMSD structure, the volunteer gets credit and their name goes up in lights on the smallest structure board.
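A rough Python sketch of that report-only-when-better logic, just to make the flow concrete. The `Server` and `Client` classes and their method names are made up for illustration; nothing here reflects how the real DF client or server talk to each other.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Server:
    """Stand-in for the DF server (hypothetical; no real protocol implied)."""
    best_rmsd: float
    contacts: int = 0  # how many times any client has contacted us

    def report(self, rmsd: float) -> float:
        """Accept a candidate RMSD and return the current smallest RMSD."""
        self.contacts += 1
        if rmsd < self.best_rmsd:
            self.best_rmsd = rmsd  # new title holder
        return self.best_rmsd


@dataclass
class Client:
    known_best: float  # smallest RMSD this client has on file

    def process_batch(self, rmsds: Iterable[float], server: Server) -> Optional[float]:
        """Report the batch's best RMSD only if it beats the value on file."""
        best = min(rmsds, default=None)
        if best is None or best >= self.known_best:
            return None  # trash the batch, do not bother the server
        # The server's reply refreshes the value on file, even when
        # another client has already reported something better.
        self.known_best = server.report(best)
        return best
```

With a starting value of 5.5 on both sides, a batch of `[7.1, 6.3, 5.9]` never touches the server, while a batch containing 5.2 triggers exactly one contact.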
To cut down on server load even further, when a new protein is selected, simply start the clients off with a preset RMSD threshold: some number that Dr. Hogue is sure will be surpassed, but only after a good deal of time (folds). For example, an RMSD of, say, 5.5 for a protein like the one we're currently working on. This way there's no initial flood of clients submitting larger-RMSD structures that are very likely to be surpassed anyway.
To keep a client from never moving on to the next protein because it never beats the smallest RMSD it has on file, have the client "ping" the server ... hmmm ... once a week? Or once every 100,000 folds? If once a week, the client would tell the server at each ping-in how many folds it has done since the last one, so that count could be added to the folder's stats, and it would pick up the newest version of the client program if there is one. If once every 100K folds, even less information would need to be given to the server, so less server time and load ... though the client would still receive any updated version available at that ping-in. At each ping-in, the server would send back the current smallest RMSD score and, when a protein is getting close to done, could instruct the client to ping in more often (a shorter time span or fewer folds) so that when roll-over to the next protein occurs, the client isn't still working on the old one for too long.
Hmmm. Then again, to reduce server traffic, perhaps the server should NOT tell the client to ping in sooner when the current protein is nearing its folds goal. The logic is that there really wouldn't be any harm in letting the client continue: it might just come up with a still smaller RMSD structure (which would be great), the volunteer would still get credit for the folds done (even though they exceed that protein's fold goal), no special instructions would need to be written for the server and client to shorten the ping interval, and there would be less of a flood of clients pinging in to get the next protein.
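Here is a small Python sketch of the ping-in bookkeeping. I'm assuming, purely for illustration, that the two options are combined so the client pings when either limit is reached; the `PingSchedule` class and its field names are hypothetical, and the 100,000-fold and one-week limits are just the numbers floated above.

```python
from dataclasses import dataclass

WEEK_SECONDS = 7 * 24 * 3600


@dataclass
class PingSchedule:
    max_folds: int = 100_000            # ping after this many folds...
    max_seconds: float = WEEK_SECONDS   # ...or after this much time
    folds_since_ping: int = 0
    last_ping: float = 0.0              # timestamp of the last ping-in

    def record_folds(self, n: int) -> None:
        """Count folds completed since the last ping-in."""
        self.folds_since_ping += n

    def due(self, now: float) -> bool:
        """True once either the fold limit or the time limit is reached."""
        return (self.folds_since_ping >= self.max_folds
                or now - self.last_ping >= self.max_seconds)

    def ping(self, now: float) -> int:
        """Reset the counters and return the fold count to send to the server."""
        folds = self.folds_since_ping
        self.folds_since_ping = 0
        self.last_ping = now
        return folds
```

If the server did want clients to check in sooner near roll-over, it would only need to send back smaller `max_folds` or `max_seconds` values; leaving them alone gives the simpler no-special-instructions variant.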
If the above were done, the DF webmaster should probably replace the "Best structures generated to date" roster with a "Milestones in structure creation" (MISC) roster. MISC would chronicle each folder who beat the then-current smallest RMSD structure. It would give not only the folder's name, organization name, team name, and "Best RMS in A" score, but also the date they achieved it. The first score on this milestone chart could be a sort of "Dr. Hogue's RMSD Challenge": the threshold that ships with each new protein and that a client would need to beat before reporting in a structure.
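A minimal Python sketch of how such a milestone roster could be kept, assuming each row only needs the fields listed above. The `Milestone` class, the `add_if_milestone` function, and the sample names in the usage note are all invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Milestone:
    folder: str        # volunteer's name
    organization: str
    team: str
    rmsd: float        # the "Best RMS in A" score
    achieved: date     # when the milestone was reached


def add_if_milestone(board: List[Milestone], entry: Milestone) -> bool:
    """Append entry only if it beats the most recent milestone on the board."""
    if board and entry.rmsd >= board[-1].rmsd:
        return False  # not a new record, so it never appears on MISC
    board.append(entry)
    return True
```

Seeding the board with a first row such as `Milestone("Dr. Hogue's RMSD Challenge", "", "", 5.5, date(2002, 5, 3))` gives every later entry something to beat, and the rows stay in chronological order with strictly decreasing RMSD.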
The one thing I can see killing this idea is if the RMSD scoring program is so big that it would take up too much space on people's computers. If it isn't, I don't see any other problem with it. Or am I missing something ... again?