Beta 2 now available - beta testers please update

**m0ti** · 02-26-2003, 12:20 AM

WinXP Pro,
CLI Client default switches:

ERROR: [777.000] {ncbi_http_connector.c, line 217} [HTTP] Error writing body at offset 8192

ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) exhausted, giving up
ERROR: [000.000] {foldtrajlite2.c, line 3618} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
ERROR: [777.000] {ncbi_http_connector.c, line 217} [HTTP] Error writing body at offset 8192
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) exhausted, giving up
ERROR: [000.000] {foldtrajlite2.c, line 3618} Error during upload: NO RESPONSE FROM SERVER - WILL TRY AGAIN LATER
FATAL ERROR: [012.002] {trajtools.c, line 2377} Attempt to insert duplicate residue number into database

The fatal error occurred during Trajectory Distribution.

**m0ti** · 02-26-2003, 12:23 AM

Originally posted by Brian the Fist
Sounds like a full disk (or no permission to write). It cannot recover from that sort of error for obvious reasons. (Probably your /tmp partition). This error is directly from a failed fwrite (i.e. not all elements were written) and so is pretty straight-forward.

Disk is not full (all partitions have free space) and all permissions are there. This happened after leaving the client on the entire day.

perhaps it couldn't upload to the server for a while so the buffer on disk got full?

Or is the buffer now infinitely large?

**m0ti** · 02-26-2003, 12:52 AM

Originally posted by Brian the Fist
The point system SHOULD be in place right now. Please test this for us too. You should, I believe, get 5000 points for gen 0 (but this may be changed to 200), and for gen. x you should get 200*sqrt(x) points (ok, whip out those calculators). If this is NOT the case please let me know and I'll check it out.

Aside from this, I've gone through the whole thread and identified 7 bugs and 7 features (including stuff Chris and I have decided to add) which I will now fix and/or shove into the next beta, which I should hopefully have ready later this week. Any further betas after this will likely be to play with parameters like size of and number of generations to optimize those a bit more but I think you've all done a really great job at nailing all the bugs and even potential bugs. You found some things I really didn't expect with such a relatively small testing group (under 100 of you anyways).

Unless you find another new bug or have an important suggestion/feature to add which hasn't been already approximately mentioned in this thread, let's pause it here for now and I will get these changes done ASAP. With the next beta I may also release the screensaver, and hopefully a few of you will be willing to test that out as well just to make sure there's nothing quirky specific to it (but remember it's all really the same code so most things should work the same in the screensaver as the text client in general).

Thanks All!

I think there's a definite advantage to the Gen 0 group for sure. Gen 0 certainly doesn't take more time than any of the other generations (if anything it takes less), but even gen 50 is only worth 200 * sqrt(50) = 1414.

Perhaps gen 0 should be aggressively scaled down? I'm thinking like a 100:1 ratio; make it worth 50 points. Basically, ensure that the higher generations are attractive enough to make people not want to stick around at gen 0.

**tpdooley** · 02-26-2003, 06:39 AM

for the final release.. it'll be gen0=5k, and later generations will be 5k*sqrt(gen)?

**FoBoT** · 02-26-2003, 10:54 AM

this is why starting with new stats for phase II would be an advantage

if you continue to add to the old stats, you have two problems to deal with

1- making the stats of phase II comparable to phase I
2- making the inter generational stats of phase II structured in a manner to encourage people to run it through a complete sequence that most benefits the science vs. manually manipulating the process to squeeze out a few extra points (and thus the science suffers from this manipulation)

doing both of these simultaneously will be challenging.
if you eliminate #1, you can concentrate on #2, which i think is more relevant to the current active participants

as m0ti points out, the points awarded for the generations need to be weighted to give incentive to letting the process run through its "natural" course. if this isn't the case, people will come up with a way (scripts, .bat files, 3rd party apps, etc.) to start/stop the client in a manner that is advantageous to gaining more points, not more scientific data

have a nice day!

**AMD_is_logical** · 02-26-2003, 12:14 PM

Originally posted by FoBoT
this is why starting with new stats for phase II would be an advantage

if you continue to add to the old stats, you have two problems to deal with

1- making the stats of phase II comparable to phase I
2- making the inter generational stats of phase II structured in a manner to encourage people to run it through a complete sequence that most benefits the science vs. manually manipulating the process to squeeze out a few extra points (and thus the science suffers from this manipulation)

doing both of these simultaneously will be challenging.

I disagree. I don't see any problem here. First adjust the relative scores for the various generations to accomplish (2), then scale everything to accomplish (1) using the assumption that most people will be running through all 50 generations.

For each new protein the overall scaling could be adjusted to give comparable credit for a given amount of CPU time.

**Brian the Fist** · 02-26-2003, 05:26 PM

Let's not start arguing about stats again now. Somebody just please confirm whether you are being credited the proper amounts, as stated in the formula I gave earlier. This formula may change once the beta becomes non-beta.

**mighty** · 02-26-2003, 06:59 PM

Originally posted by Brian the Fist
Let's not start arguing about stats again now. Somebody just please confirm whether you are being credited the proper amounts, as stated in the formula I gave earlier. This formula may change once the beta becomes non-beta.

I don't think the points are awarded as described. I just uploaded 87 generations and tried to update my stat-page multiple times during this upload. I kept getting points in rather round and neat intervals, like 200, 400 or 800 per generation, but if it's 200*sqrt(X) then there should be some not-so-round numbers in between; gen. 50, for example, should be 1414.

Of course this could easily be explained if you're doing some rounding up or down...

**AMD_is_logical** · 02-26-2003, 08:08 PM

Originally posted by Brian the Fist
Let's not start arguing about stats again now. Somebody just please confirm whether you are being credited the proper amounts, as stated in the formula I gave earlier. This formula may change once the beta becomes non-beta.

I made a new account, and got the following:

gen 0 - 5000
gen 1 - 5200
gen 2 - 5400
gen 3 - 5600

So the 0'th gen gives 5000, and the rest give 200 each, with no sign of a sqrt(x).

Also, all numbers on the stats page seem to be multiples of 200.
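
A minimal sketch of the check described above, for anyone who wants to compare their own stats page against the stated formula. It assumes the stats page shows a running total and that the formula is exactly as quoted (5000 for gen 0, 200*sqrt(x) for gen x); the function names are illustrative and not part of the client or server.

```python
import math

GEN0_POINTS = 5000.0   # stated award for generation 0
PER_GEN_BASE = 200.0   # stated multiplier for generation x >= 1


def stated_points(gen: int) -> float:
    """Points for a single generation under the formula as quoted in the thread."""
    if gen == 0:
        return GEN0_POINTS
    return PER_GEN_BASE * math.sqrt(gen)


def expected_running_total(last_gen: int) -> float:
    """Running total after uploading generations 0..last_gen."""
    return sum(stated_points(g) for g in range(last_gen + 1))


if __name__ == "__main__":
    # Running totals reported in the post above.
    observed = {0: 5000, 1: 5200, 2: 5400, 3: 5600}
    for gen, seen in observed.items():
        expected = expected_running_total(gen)
        print(f"gen {gen}: observed {seen}, expected {expected:.1f}")
```

Under the stated formula the totals should stop being multiples of 200 from generation 2 onward, which is the discrepancy reported here.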

**Insidious** · 02-26-2003, 08:17 PM

wouldn't it make more sense to make it 200 * x instead of using the square root of x?

I mean, the square-root of 50 is only about 7 or so.
Stat whores will VERY quickly realize 7 gen 0 calculations take
MUCH less time than 50 generations.

-Sid

**Scotttheking** · 02-26-2003, 08:37 PM

When is the OSX Beta coming?

**Aegion** · 02-26-2003, 08:39 PM

Ok, I believe I have now discovered another bug, and I'm trying to figure out which type it is. Does anyone know if a val file should be produced when quitting halfway through the first structure generated in a set? I quit in this situation, and when I restarted, the structure got stuck in the first structure of a generation set for over an hour. When I checked the files, the val file listed in the filelist is missing.

**AMD_is_logical** · 02-26-2003, 09:41 PM

Originally posted by Insidious
wouldn't it make more sense to make it 200 * x instead of using the square root of x?

I mean, the square-root of 50 is only about 7 or so.
Stat whores will VERY quickly realize 7 gen 0 calculations take
MUCH less time than 50 generations.

-Sid

You're overlooking two things. First, the amount you get for the 0'th generation will be reduced until it gives fewer stat points per CPU cycle than other generations.

Second, the 200 * sqrt(50) is for generation 50, NOT for all 50 generations. By the time you reach generation 50 you will have already gotten points for each and every generation from 0 to 49.
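
A rough sketch of the second point, assuming the formula as stated earlier in the thread (200*sqrt(x) for each generation x >= 1). The `gen0_points` parameter is there only to try the alternative gen 0 values floated in this thread (5000, 200, 50); none of these names come from the actual scoring code.

```python
import math


def points_for_generation(gen: int, gen0_points: float = 5000.0) -> float:
    """Points for one generation: a flat award for gen 0, 200*sqrt(gen) otherwise."""
    if gen == 0:
        return gen0_points
    return 200.0 * math.sqrt(gen)


def total_through(last_gen: int, gen0_points: float = 5000.0) -> float:
    """Sum of the per-generation awards for generations 0..last_gen."""
    return sum(points_for_generation(g, gen0_points) for g in range(last_gen + 1))


if __name__ == "__main__":
    # Award for generation 50 alone (the 1414 figure quoted in the thread).
    print("gen 50 alone:", round(points_for_generation(50)))
    # Total for a full run, under different gen 0 awards.
    for gen0 in (5000, 200, 50):
        print(f"gen0={gen0}: total through gen 50 =", round(total_through(50, gen0)))
```

The total for a complete run is the sum over all generations, so it is far larger than the single gen 50 award; how attractive gen 0 looks by comparison depends almost entirely on the value chosen for `gen0_points`.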

**Brian the Fist** · 02-26-2003, 09:47 PM

Thanks for pointing out the scoring bug, I suspected it was in error.

Any other comments or questions about scoring for the beta will be officially ignored. You will just have to trust us to make it fair.

**Brian the Fist** · 02-26-2003, 09:48 PM

Originally posted by Aegion
Ok, I believe I have now discovered another bug, and I'm trying to figure out which type it is. Does anyone know if a val file should be produced when quitting halfway through the first structure generated in a set? I quit in this situation, and when I restarted, the structure got stuck in the first structure of a generation set for over an hour. When I checked the files, the val file listed in the filelist is missing.

If it has not completed one structure in a generation, there should be no .val file. The .val IS the structure after all. No structure, no .val. Partial structures are not (and cannot be) saved.

**Aegion** · 02-26-2003, 09:51 PM

Originally posted by Brian the Fist
If it has not completed one structure in a generation, there should be no .val file. The .val IS the structure after all. No structure, no .val. Partial structures are not (and cannot be) saved.

In that case, I've definitely found an instance I can duplicate where, during the crunching of a single structure, it gets stuck at the same point for at least an hour. Should I email the pertinent files for you to examine?

**Brian the Fist** · 02-26-2003, 10:54 PM

Originally posted by Aegion
In that case, I've definitely found an instance I can duplicate where, during the crunching of a single structure, it gets stuck at the same point for at least an hour. Should I email the pertinent files for you to examine?

No, but please clarify exactly what you are talking about. What options (flags) did you run with, and exactly what did you observe that appears to be wrong. How do you know it gets stuck at the same point? What generation is it at? And getting stuck is not a bug, remember? What is it that you think is wrong here exactly...

**Aegion** · 02-26-2003, 11:14 PM

Originally posted by Brian the Fist
No, but please clarify exactly what you are talking about. What options (flags) did you run with, and exactly what did you observe that appears to be wrong. How do you know it gets stuck at the same point? What generation is it at? And getting stuck is not a bug, remember? What is it that you think is wrong here exactly...

I'm watching, for starters. Just to be clear, it is getting stuck at 72-73 on #1 generation 28 for my set. When I let it run, it definitely does not eventually move forward, but stays stuck in the exact same place for at least an hour. (It does occasionally vary the number slightly but always remains stuck; it appears it might get stuck in the high 60s instead of the 70s sometimes.) I'm running an Athlon 2000+ system without anything else CPU-intensive running, so speed's not the issue here. I have been able to verify it does NOT eventually move to #2 generation 28 after running the structure for over an hour in each case. It stays stuck at exactly the same position. It's possible that after several hours it might move to the next structure, but it definitely takes a vastly longer time than it should. This same behavior occurs when I use the q option to cause the client to close and then reload it.

I'm running it on a Windows XP system. My settings are .\foldtrajlite -f protein -n native -qf -df -it -rt

My filelist displays the following:
.\fold_1_7vshcwgg_0_7vshcwgg_protein_27.log.bz2
.\7vshcwgg_1_7vshcwgg_protein_27_0000005.val
CurrentStruc 1 1 123 28 1 0 10000000.000

edit: The structure does try outright resetting from time to time, but it always gets stuck at the same point.

update: The structure did finally move on to the next one after a couple of hours. I can still send the file with the structure to you so that you can examine its behavior, since I backed it up in a separate location. The current structures are also unfortunately exhibiting similar behavior.

**Brian the Fist** · 02-27-2003, 01:03 PM

This is normal behaviour Aegion, nothing wrong here. You may not like it getting stuck for so long, but it can happen. I may fiddle with this a bit still before the final release but it will never go away completely. In the long run everything will balance out though.

**Aegion** · 02-27-2003, 01:09 PM

Originally posted by Brian the Fist
This is normal behaviour Aegion, nothing wrong here. You may not like it getting stuck for so long, but it can happen. I may fiddle with this a bit still before the final release but it will never go away completely. In the long run everything will balance out though.

Ok, I do have to wonder if it is somewhat counterproductive to allow the client to expend so much time on a single structure.

**Brian the Fist** · 02-27-2003, 06:07 PM

Originally posted by Aegion
Ok, I do have to wonder if it is somewhat counterproductive to allow the client to expend so much time on a single structure.

That is a matter for study and research, which we have performed, and not something which can just be decided on a whim, or even intuition, unfortunately. You'll just have to trust that we know what we are doing.

**m0ti** · 02-27-2003, 06:42 PM

Originally posted by Brian the Fist
That is a matter for study and research, which we have performed, and not something which can just be decided on a whim, or even intuition, unfortunately. You'll just have to trust that we know what we are doing.

I think that looking at the results the beta has produced so far justifies the time spent per fold. There may be more efficient ways of balancing things, but I'm sure that Howard and Dr. Hogue have taken a good look at it; after all we're after top-notch structures in possible narrow valleys, which are very compact... it can take a lot of folding time to get to them.

Just to point out how good a job the new algorithm has done:

we've got less than 100 users and we've done some 15 Million folds in about a week. The entire top 10 folds are better than the best fold we found in 11 Billion folds under the old algorithm in 3 weeks. Yes, the new algorithm is slower, but it produces much higher quality folds. I'm eagerly awaiting the release of the final beta and the algorithm then going into general use.

**Aegion** · 02-27-2003, 06:49 PM

Originally posted by m0ti
I think that looking at the results the beta has produced so far justifies the time spent per fold. There may be more efficient ways of balancing things, but I'm sure that Howard and Dr. Hogue have taken a good look at it; after all we're after top-notch structures in possible narrow valleys, which are very compact... it can take a lot of folding time to get to them.

Just to point out how good a job the new algorithm has done:

we've got less than 100 users and we've done some 15 Million folds in about a week. The entire top 10 folds are better than the best fold we found in 11 Billion folds under the old algorithm in 3 weeks. Yes, the new algorithm is slower, but it produces much higher quality folds. I'm eagerly awaiting the release of the final beta and the algorithm then going into general use.

I certainly was not questioning the new algorithm in general, just a specific aspect of it. I definitely recognize its overall potential and how it is improved over the old one. If Howard has carefully researched my issue and determined that the issue should remain as is, I'll take his word for it. I was simply bringing up a possible area of concern.

**m0ti** · 02-28-2003, 10:54 AM

I got that write error again (this time at generation 43 - again during Trajectory distribution). This is highly annoying since the current line is completely lost. I had produced a very good fold (6.24 RMS) and would have liked to continue to generation 50 with it.

Any chance of doing a backup of the needed files before doing energy minimization and trajectory distribution? That way, in case of an error, it can resume by trying the energy min and traj distribution again instead of resetting to gen 0.

Again; this is not a write permission problem, and this is not a disk-space problem. Interestingly enough, this ONLY occurs during Trajectory Distribution.
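
The backup-and-retry idea suggested above could look roughly like the sketch below. This is only an illustration of the general pattern, not how the client works: `run_with_snapshot`, the `.bak` directory name, and the `step` callback are all hypothetical, and the real client is written in C with its own file layout.

```python
import shutil
from pathlib import Path


def run_with_snapshot(work_dir: Path, step):
    """Copy work_dir aside, run step(work_dir), and restore the copy if step fails.

    Hypothetical helper: 'step' stands in for a risky phase such as
    energy minimization or trajectory distribution.
    """
    backup = work_dir.with_name(work_dir.name + ".bak")
    if backup.exists():
        shutil.rmtree(backup)          # discard a stale snapshot from an earlier run
    shutil.copytree(work_dir, backup)  # snapshot the state before the risky step
    try:
        result = step(work_dir)
    except Exception:
        # Roll back to the snapshot so the caller can retry from the same state.
        shutil.rmtree(work_dir, ignore_errors=True)
        shutil.copytree(backup, work_dir, dirs_exist_ok=True)
        shutil.rmtree(backup, ignore_errors=True)
        raise
    shutil.rmtree(backup, ignore_errors=True)  # step succeeded; snapshot no longer needed
    return result
```

A caller would catch the re-raised exception and try the step again from the restored state. Note that the snapshot itself needs disk space, so this would not help if the underlying problem is a full partition.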

**Brian the Fist** · 02-28-2003, 01:09 PM

Originally posted by m0ti

we've got less than 100 users and we've done some 15 Million folds in about a week. The entire top 10 folds are better than the best fold we found in 11 Billion folds under the old algorithm in 3 weeks. Yes, the new algorithm is slower, but it produces much higher quality folds. I'm eagerly awaiting the release of the final beta and the algorithm then going into general use.

Actually, remember the server is counting 5000 for gen. 0 and 200 for gen 1+ when in reality you are submitting only 500 and 20 respectively. Thus although it says we've made 15 million or whatever, it is actually 1/10th of this. So we're another 10 times better than you thought.

**Brian the Fist** · 02-28-2003, 01:11 PM

Originally posted by m0ti
I got that write error again (this time at generation 43 - again during Trajectory distribution). This is highly annoying since the current line is completely lost. I had produced a very good fold (6.24 RMS) and would have liked to continue to generation 50 with it.

Any chance of doing a backup of the needed files before doing energy minimization and trajectory distribution? That way, in case of an error, it can resume by trying the energy min and traj distribution again instead of resetting to gen 0.

Again; this is not a write permission problem, and this is not a disk-space problem. Interestingly enough, this ONLY occurs during Trajectory Distribution.

Has ANY other beta-tester received this same error (File Write Error)?? I am still not convinced it is a bug and I am 99.999% certain it is a problem with writing to your TEMP dir/partition. Maybe it is NFS mounted or something weird like that if it is not full. If not, please give me a detailed description of your OS, your partitions/drive schemes, which filesystems are remote, and the value of your TEMP environment variable or any other temp directory indicators and the free space on each of them.
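
For testers gathering the information requested above, a small sketch that lists the usual temp-directory environment variables and the free space behind each. Which directory the client actually writes to is an assumption here; the script simply reports the common candidates (`TEMP`, `TMP`, `TMPDIR`) plus Python's own default.

```python
import os
import shutil
import tempfile


def temp_dir_report():
    """Return (label, path, free_bytes) for each candidate temp directory.

    free_bytes is None when the path does not exist or is not a directory.
    """
    candidates = {}
    for var in ("TEMP", "TMP", "TMPDIR"):
        value = os.environ.get(var)
        if value:
            candidates[var] = value
    # Python's own choice of temp directory, as a fallback candidate.
    candidates["tempfile.gettempdir()"] = tempfile.gettempdir()

    report = []
    for label, path in candidates.items():
        free = shutil.disk_usage(path).free if os.path.isdir(path) else None
        report.append((label, path, free))
    return report


if __name__ == "__main__":
    for label, path, free in temp_dir_report():
        if free is None:
            print(f"{label}: {path} (missing or not a directory)")
        else:
            print(f"{label}: {path} ({free / 1_000_000:.1f} MB free)")
```

This does not say whether a directory is on a remote (e.g. NFS) filesystem; that still has to be checked by hand.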

**TEN-Catdaddy63** · 02-28-2003, 01:50 PM

I've been running the BETA as a service on a C1200, 256mb PC133 machine with Windows 2000. Installed last Friday evening and it has completed 2 full generations and is working on gen 19 of number three. I have this machine set to useram=0 and have had absolutely no problems. I may try useram=1 over the weekend and see if that causes any issues. Nice job so far Howard, looking very good!

**KWSN_Millennium2001Guy** · 02-28-2003, 05:28 PM

I was experiencing the same write error on one machine. It turned out that there were only 4 or 5 megabytes free on the drive where the temp directory was located because the user had set IE to cache 10 gigs of history.

I deleted the IE file cache and the machine began running again.

Ni!

**Brian the Fist** · 02-28-2003, 05:57 PM

Beta 2 has now been 'terminated'. Please see the thread on Beta 3 to continue beta testing. Thanks.
