Next test - Beta 5 [Archive]

Brian the Fist

03-25-2003, 02:12 PM

I've updated the beta files again; they are in the same place. Aside from fixing a few bugs you've still managed to find, I've modified the algorithms as per some of our discussions from beta 4. (Let me know if the CPU is still being taken up during energy minimization like it was for some people before.) It should now complete 250 generations in under a week on most machines, but we'll have to try it to find out. We may make it give up even faster when it gets stuck, depending on how this goes. Also, AMD_is_logical's trick will no longer work.

Lastly, we've changed it a bit in an attempt to keep helices from unfolding, which we experienced with beta 4. The fix applied may or may not work, so if this fails we have a backup plan (for test # 6) which will almost certainly do the job. The unfolding is what was preventing us from getting very close to the correct structure (below 5A).

We intend to run beta 5 for about 1 week and then try the other method for keeping helices after that for another week or so. We are nearing the final release of the new approach however, just a few more tests to tweak the algorithm before we unleash it. Thanks again to all the beta testers for your input and help.

I am going to wipe the beta stats shortly, so please do not start running the new beta until you see the stats have been reset on the beta web site (or else you'll have to start over again anyways).

m0ti

03-25-2003, 03:40 PM

Is the link up?

I don't seem to be having any success downloading it!

Welnic

03-25-2003, 03:43 PM

Originally posted by m0ti
Is the link up?

I don't seem to be having any success downloading it!

I had no trouble grabbing the linux client.

m0ti

03-25-2003, 03:48 PM

I've always had problems downloading from df.org for some reason.

Any chance of someone hosting the windows CLI at another site?

Digital Parasite

03-25-2003, 03:53 PM

For those of you who don't want to dig:

Linux:
ftp://ftp.mshri.on.ca/pub/distribfold/download/distribfold-beta-linux-i386.tar.gz

Windows text client:
ftp://ftp.mshri.on.ca/pub/distribfold/download/distribfold-beta-win9x.zip

Windows screensaver:
ftp://ftp.mshri.on.ca/pub/distribfold/download/distribfoldss-beta-win9x.zip

m0ti

03-25-2003, 04:03 PM

Just figured out what the problem was; for some reason my server stopped servicing ftp requests properly. Did a restart and it was all ok.

Hmm, maybe in the future they could also be accessible by http? Not that I expect special treatment or anything. :D

arjanscholl

03-25-2003, 04:18 PM

Hello there, just a question for the maker of the DF gui, don't know if anyone else knows the answer. There are 3 nice bars in the latest beta version of the gui, with the words 'Structure laxness levels' above it. But what the hell do these bars represent? :D

Digital Parasite

03-25-2003, 04:38 PM

Originally posted by arjanscholl
Hello there, just a question for the maker of the DF gui, don't know if anyone else knows the answer. There are 3 nice bars in the latest beta version of the gui, with the words 'Structure laxness levels' above it. But what the hell do these bars represent? :D

Surprisingly enough, they represent the laxness levels of the structure being built. ;)

In the current beta, structures can get "stuck" more easily than in the previous algorithm of the client. If this happens too many times, the software is being too strict on how the protein is folded and the 3 parameters that control this are relaxed. The higher the bar graphs are in the GUI, the more relaxed the settings are for folding the structure. Howard, feel free to correct me if what I explained is not correct.

Jeff.

bwkaz

03-25-2003, 06:29 PM

I don't know how much this will screw up dfGUI for Windows (Jeff, care to comment? if it screws your code up then it's probably not a good idea, since so many more people use the Windows version), but the Linux version could really use a feature with regard to the progress updates.

Just before starting energy minimization (which appears to be running at normal priority for me, on Linux at least), could you do one final write of the progress.txt (and filelist.txt might be a good idea too) file, using the same format as now, just something like:

building structure 50 generation 5
0 until next generation
x generations buffered
Best Energy so far: x.xxx or similar?

The benchmark info is about all that would need this. Right now, seeing as the progress.txt file resets the structure count at each new generation (as opposed to before, where the only time it reset was when you stopped the client), my old benchmark code was very confused. The fix was ugly, though, and requires that I try to figure out when a generation is over. If I miss, the bench data goes negative.

Which wouldn't be a problem either, if the client kept on schedule with its progress.txt updates. I'm using the default (perhaps that's part of the problem?) -g value, and on half the generations, partway through, the client writes progress.txt after doing 3 or 4 structs instead of 5. Perhaps that's a bug, but I'd rather have dfGUI work regardless of what the user's -g setting was. A final write of progress.txt would accomplish that.

Obviously that'd be something for the next beta (or whatever).

AMD_is_logical

03-25-2003, 10:50 PM

I checked this forum tonight and saw that beta5 was out. I then discovered that my nodes had been crunching with beta4a for several hours after the stats reset, and that work from two of the nodes was being accepted by the server. That's why I already have a gen 75 6.22A and a gen 40 6.68A structure on the stats page.

Sorry about that. :D

I've now removed all old work from my nodes and installed beta5.

Digital Parasite

03-26-2003, 06:59 AM

I have been running beta5 overnight and it sure is much faster. I am already at generation 41 on one CPU and 36 on the other. My average generation time is 20 minutes where it used to be over an hour for beta4.

bwkaz: Updating the progress.txt file/filelist.txt won't screw anything up in dfGUI. What I am doing is storing the current generation # (I also need that for timing the previous and average generation times). When I read the progress.txt file, if the generation I read is different from the one I have stored in memory, I know it is a new generation and I can restart the benchmark.
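
The generation check described above can be sketched roughly as follows. This is only a minimal Python illustration, not dfGUI's actual code: it assumes progress.txt contains the four lines quoted elsewhere in this thread, and the names Progress, parse_progress and GenerationTracker are made up for the example.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Progress:
    structure: int      # "Building structure N"
    generation: int     # "generation N"
    until_next: int     # "N until next generation"
    buffered: int       # "N generations buffered"
    best_energy: float  # "Best Energy so far: X"


# Assumed layout, copied from the progress.txt samples posted in this thread.
_PATTERN = re.compile(
    r"building structure\s+(\d+)\s+generation\s+(\d+)\s+"
    r"(\d+)\s+until next generation\s+"
    r"(\d+)\s+generations buffered\s+"
    r"best energy so far:\s*(-?\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_progress(text: str) -> Optional[Progress]:
    """Parse the contents of progress.txt; return None if the layout differs."""
    m = _PATTERN.search(text)
    if m is None:
        return None
    s, g, u, b, e = m.groups()
    return Progress(int(s), int(g), int(u), int(b), float(e))


class GenerationTracker:
    """Remembers the last generation seen so the benchmark can be restarted."""

    def __init__(self) -> None:
        self.generation: Optional[int] = None

    def update(self, p: Progress) -> bool:
        """Return True when p belongs to a different generation than the last call."""
        if p.generation != self.generation:
            self.generation = p.generation
            return True  # caller restarts its benchmark timer here
        return False
```

Because the check compares generation numbers rather than counting structures, it should not matter how many structures the client finished between two writes of progress.txt.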

Jeff.

bwkaz

03-26-2003, 09:43 AM

Originally posted by Digital Parasite
I have been running beta5 overnight and it sure is much faster. I am already at generation 41 on one CPU and 36 on the other. My average generation time is 20 minutes where it used to be over an hour for beta4.

Yes, decidedly faster. I'm at gen 36 now, where after a day of crunching beta 4, I was at like gen 3 on the same machine. Although it was sharing the CPU with the non-beta client at the time (and isn't now), so that probably has a bit to do with it.

Originally posted by Digital Parasite
bwkaz: Updating the progress.txt file/filelist.txt won't screw anything up in dfGUI. What I am doing is storing the current generation # (I also need that for timing the previous and average generation times). When I read the progress.txt file, if the generation I read is different from the one I have stored in memory, I know it is a new generation and I can restart the benchmark.

OK, cool. Now that I think about it, I'm going to have to do the same thing anyway, so that I can get the generation time as well (haven't had time to work on getting the latest Windows features ported over -- the vertical progress bars are taking all of it, since Qt has no such thing as a "vertical progress bar").

So there is a fallback if for whatever reason it's decided that another write of progress/filelist is a bad idea. Good. :)

Brian the Fist

03-26-2003, 12:11 PM

You can now make your own plots, like those computed for the top 10 on the beta. I have posted the software package 'AnalyzeMovie'. Go to http://bioinfo.mshri.on.ca/trades/ and scroll down near the bottom to get it. Read the enclosed readme for details on how it works and how to use it. You can log in and download your 'best movie' and then run this program on it to generate the graphs and so on for it. Be warned that for a big movie (250 generations) it may use a fair bit of RAM and may take five minutes or more if you've got a slow (say 500 MHz) computer!

Have fun!

Digital Parasite

03-26-2003, 12:47 PM

Hi Howard,

I tried the Analyze Movie (Win2k) program but I couldn't get it to work.

I downloaded my best structure movie and the native structure and use this as my command line:
analyzeMovie -f besstruct.val -g native.val

But it doesn't load and I get this error:
[NULL_Caption] FATAL ERROR: [067:001] FindPath failed in LoadDict
Hit Return

Then a dialog box opens and says "Abrupt: code = 1".

Any idea what I am doing wrong? The other options appear to be optional and have default settings.

Jeff.

shortfinal

03-26-2003, 01:02 PM

Howard,

I downloaded the Windows Beta5 text client and tried it while watching the thread priorities with TaskInfo. The priorities are now correct during minimizing/Trajectory Dist. I also tried changing the priority (-p option) and the thread priorities changed accordingly.

Shortfinal

Guff®

03-26-2003, 07:21 PM

This one is looking really good.
I have no problems with it on any systems so far, fingers crossed.
The "Stuck-O-Meter" is a nice addition, allowing users to see that it's trying to work itself out of a jam.
As we say while standing around the grill for the steaks to cook, "Mine's done!" :)

Brian the Roman

03-27-2003, 06:41 AM

Howard;
I understand that once we go live we will likely use crease energy to choose the best conformation of each gen. I was thinking it would be interesting to do this but still calculate the rmsd (when possible) and graph it. That way we'd get a better understanding of the relationship between them.

ms

Digital Parasite

03-27-2003, 07:22 AM

It is interesting to see how different two clients act because of the random sampling at the beginning and how the folds proceed. With beta5, I started two clients from scratch on my Dual MP-2600+ machine, both have been running for the exact same amount of time and now after 1.5 days one is on generation 122 and has a best RMSD of 7.118 and the other is only at generation 92 but has a best RMSD of 6.389.

Jeff.

Brian the Roman

03-27-2003, 07:38 AM

Howard;
question on how the list of the top 10 best structures works. It used to be only one entry per userid would ever show up, and then, I understand, you changed it to show all the best no matter who did them. I can see that that is working somewhat since I can see Guff has multiple entries in the top 10. However, I currently have one entry in the top 10 at 5.96. An hour ago I was also in the top 10 but it was showing my earlier 6.02 fold. Why isn't that fold showing up too, when the worst structure in the top 10 now is 6.11?

ms

Digital Parasite

03-27-2003, 07:57 AM

Don't forget everyone, if you want a version of dfGUI that works with the current DF beta client, you can download it from here:
http://gilchrist.ca/jeff/dfGUI/dfGUIv22beta.zip

It is still v2.2beta2 so if you already have that version, there is nothing new, I'm just reposting the link in this beta5 thread.

Jeff.

pointwood

03-27-2003, 08:24 AM

I can't download either :(

EDIT: now it works.

Brian the Fist

03-27-2003, 10:31 AM

Originally posted by Digital Parasite
Hi Howard,

I tried the Analyze Movie (Win2k) program but I couldn't get it to work.

I downloaded my best structure movie and the native structure and use this as my command line:
analyzeMovie -f besstruct.val -g native.val

But it doesn't load and I get this error:
[NULL_Caption] FATAL ERROR: [067:001] FindPath failed in LoadDict
Hit Return

Then a dialog box opens and says "Abrupt: code = 1".

Any idea what I am doing wrong? The other options appear to be optional and have default settings.

Jeff.

Do you have 'bstdt.val' in the same directory as the executable, and are you running it from a DOS prompt in the directory you unzipped it to?

Digital Parasite

03-27-2003, 10:35 AM

Originally posted by Brian the Fist
Do you have 'bstdt.val' in the same directory as the executable, and are you running it from a DOS prompt in the directory you unzipped it to?

Yes, and yes. All the files (including the beststruct.val and native.val) are in the same directory and I am running it from a DOS prompt in that directory.

Jeff.

Brian the Fist

03-27-2003, 10:39 AM

Originally posted by Brian the Roman
Howard;
question on how the list of the top 10 best structures works. It used to be only one entry per userid would ever show up, and then, I understand, you changed it to show all the best no matter who did them. I can see that that is working somewhat since I can see Guff has multiple entries in the top 10. However, I currently have one entry in the top 10 at 5.96. An hour ago I was also in the top 10 but it was showing my earlier 6.02 fold. Why isn't that fold showing up too, when the worst structure in the top 10 now is 6.11?

ms
The stats look right to me. Remember they are only updated once per hour now, whereas when you log in it is real-time. I see you on the top ten now with a 5.86...

Brian the Fist

03-27-2003, 11:27 AM

Originally posted by Digital Parasite
Yes, and yes. All the files (including the beststruct.val and native.val) are in the same directory and I am running it from a DOS prompt in that directory.

Jeff.
I see now it is a bug in the NCBI toolkit. Anyways, the easiest way to fix it is to run it with '.\analyzeMovie etc. etc.' - put the .\ in front of the command. I'll add this to the documentation until the bug is more properly fixed.

Digital Parasite

03-27-2003, 11:58 AM

Originally posted by Brian the Fist
I see now it is a bug in the NCBI toolkit. Anyways, the easiest way to fix it is to run it with '.\analyzeMovie etc. etc.' - put the .\ in front of the command. I'll add this to the documentation until the bug is more properly fixed.

Everything seems to be working fine when I use '.\' in front. Thanks.

Jeff.

Brian the Roman

03-27-2003, 03:18 PM

Originally posted by Brian the Fist
The stats look right to me. Remember they are only updated once per hour now, whereas when you log in it is real-time. I see you on the top ten now with a 5.86...

The point that I was trying to make is that one of my top 10 entries disappeared after a better one was added even though the original entry was still in the top 10 overall. I didn't think it should disappear.

As it happens, however, I now have 2 in the top ten but they're from different clients. It looks to me like only one entry per client is logged in the top 10 even if a single client generated many that should be in the list. I didn't think that you could tell the clients apart...

ms

AMD_is_logical

03-27-2003, 04:54 PM

Originally posted by Brian the Roman
As it happens, however, I now have 2 in the top ten but they're from different clients. It looks to me like only one entry per client is logged in the top 10 even if a single client generated many that should be in the list. I didn't think that you could tell the clients apart...

A particular set of generations gets only one entry. Your client would need to finish all 250 generations and start a new set before it could get another entry.

pointwood

03-28-2003, 07:37 AM

I haven't followed the beta tests that closely, so I apologize if this has been discussed before. From progress.txt - is this normal:

Building structure 1 generation 10
49 until next generation
0 generations buffered
Best Energy so far: 10000000.000

It seems like nothing new is happening.

Brian the Roman

03-28-2003, 08:19 AM

Looks to me like you're simply currently working on the first structure of the generation. Wait a bit and it will probably have moved on.

ms

Digital Parasite

03-28-2003, 09:09 AM

Originally posted by pointwood
It seems like nothing new is happening.

I find that the start of each new generation seems to take a lot of time before it gets to the second structure, but once it gets to the second, the rest of the structures in that generation go fairly quickly.

Jeff.

AMD_is_logical

03-28-2003, 02:00 PM

Although others are crunching faster, I find that I am crunching a LOT slower than when I was able to use my clock trick. Also, back then, I had half a dozen structures below 5.2A from just 4 nodes. I'm nowhere near that now.

The 5A wall looks as solid as ever.

Secondary structure still seems to evaporate. Perhaps the use of RMSD for scoring is actually hostile to such structure. I think we will need to test it using energy as a scoring function before we can evaluate how well secondary structure is being encouraged by the beta5 changes.

tpdooley

03-29-2003, 08:11 AM

In a little over a week with the last beta, our best score was 4.95A.. and we're already down to 5.10A.
If 4 machines running at 30-60x the speed of the opposition couldn't get a better score than the 40 other participants in the beta testing.. I'd think it wasn't the route to take...

Georgina

03-29-2003, 06:04 PM

Originally posted by tpdooley
In a little over a week with the last beta, our best score was 4.95A.. and we're already down to 5.10A.

Make that 4.86 :|party|:

G

arjanscholl

03-30-2003, 05:31 AM

Originally posted by Georgina
Make that 4.86 :|party|:

G

or a 4.62 ;)

bwkaz

03-30-2003, 08:20 AM

I see a 4.51, myself...

So much for that wall at 5, I guess. :p

Brian the Roman

03-30-2003, 10:04 AM

I hope I'm not beating a dead horse here, but...

I'm finding that some generations or conformations take MUCH longer than others. I haven't watched long enough to determine if it is every conformation in a specific gen or if it's only a couple.

When I noticed that I didn't seem to be making progress for several hours I went and looked at the client. I saw the "in a tight spot" message and it went up to just over 200K conformations. Then it went back to normal work but immediately did the tight spot routine again. This happened about 6 or 7 times while I was watching (about 15 or 20 minutes). During this entire period the residue # it was trying to place stayed in the 69 to 72 range, so it's not like it was making its way through. To spend 1/2 an hour on a single conformation seems a bit excessive, particularly since the end result was significantly worse than my best so far anyway...

Does the effort limiting process still need a bit of tweaking?

ms

Mikus

03-31-2003, 09:24 AM

Originally posted by Brian the Roman
Does the effort limiting process still need a bit of tweaking?

Looked at the display last evening, then again this morning -- the client seemed to not have gotten much further.

So looked at the timestamps of the accumulated .bz2 files, and saw that there was one generation that had taken 7 hours 51 minutes !! (This is on an AMD machine running around 900+ MHz)

mikus

Brian the Roman

03-31-2003, 09:51 AM

One of my clients, an Athlon XP1900 is averaging about 4 hours per generation. By my calculations that indicates 250 gens will take over 41 days.
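
As a quick check of that estimate, here is a small Python sketch. The function name is made up for illustration; the 250 generations per run and the per-generation times are the figures quoted in this thread.

```python
def days_for_run(hours_per_generation: float, generations: int = 250) -> float:
    """Estimated wall-clock days for a full run at a steady pace per generation."""
    return hours_per_generation * generations / 24.0


# 4 hours per generation, as reported for the XP1900 above
slow = days_for_run(4.0)
# 20 minutes per generation, as reported earlier in the thread for a faster client
fast = days_for_run(20.0 / 60.0)

print(f"{slow:.1f} days at 4 h/gen, {fast:.1f} days at 20 min/gen")
```

At 4 hours per generation this comes to 1000 hours, a little under 42 days, which is where the "over 41 days" figure comes from.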

ms

Digital Parasite

03-31-2003, 11:05 AM

The speed of the client seems to vary widely. I started two beta5 clients at the same time, each on an AMD MP-2600+ processor. They both were active for the same amount of time but one of them finished all 250 generations on Saturday and the other one is still on generation 240 and is running much slower.

The one that is already finished generated a respectable 5.24 RMSD, but the one that is taking much longer to run is still in the 6.x range.

Jeff.

Georgina

03-31-2003, 06:30 PM

Originally posted by Digital Parasite
... They both were active for the same amount of time but one of them finished all 250 generations on Saturday and the other one is still on generation 240 and is running much slower...

Jeff.

I started Beta 5 on my Win2K server (with VERY light load) the same day it was released. It has a 1800+ AMD. As of today it is only on gen. 90

:shocked:

I guess it is running one of the slower versions !! A 2600+ shouldn't be that much faster.

G

bwkaz

03-31-2003, 07:44 PM

Yes, that does seem slow. On my XP1800+, which started the day that beta 5 started as well, it's on generation 149.

But then, I run Linux, too, so that might be part of it. I assume you're running the text client, right? Is it running quiet?

Georgina

03-31-2003, 10:42 PM

Originally posted by bwkaz
Yes, that does seem slow. On my XP1800+, which started the day that beta 5 started as well, it's on generation 149.

But then, I run Linux, too, so that might be part of it. I assume you're running the text client, right? Is it running quiet?

Yes, I am running the text client in default mode, whatever that is.

G

Scoofy12

03-31-2003, 11:07 PM

just fyi, default mode is not quiet. anyone have good data on how much difference that really makes?

Georgina

03-31-2003, 11:19 PM

Originally posted by Scoofy12
just fyi, default mode is not quiet...

Ok, what am I supposed to hear? :rolleyes:

My pain in the a$* coworker got me running this thing. Now that you mention it, I remember him saying something about switches - I guess I'm just having a blond moment :o

How do I get this thing running faster?

G

Paratima

03-31-2003, 11:52 PM

Originally posted by Georgina
How do I get this thing running faster?

What all the world wants to know... :p

IronBits

04-01-2003, 12:31 AM

-rt - uses about 125mb of ram, but runs almost twice as fast!
-qt - quiet mode, no display output... 10%? increase in performance.
AMD - fastest processor you can afford :) (better than Intel performance in most projects)
DDR memory - heard it seems to give another significant boost...
Faster processor speed better than slower processor with higher FSB.
That should get a good debate going ;) :D
Unknown if all of the above is applicable to the beta client. YMMV

tpdooley

04-01-2003, 04:40 AM

I'm getting the error [NULL_Caption] ERROR: [001.023] Invalid PMMD given to GetRMSD.
Hit Return.
(repeat until annoyed.. ;) then kill the dos window.)

.\analyzemovie -f c:\temp\best2.val -g c:\temp\native.val
(renamed beststruc.val so it was easier to type).

I was just trying to see what generation I ended up on since the machine is at work, and I'm at home.
--------
Add my system to the list of those that ended up having difficulty getting to generation 250.
It's an Athlon axp 1800+ with 256megs of pc133 sdr running win98se.
For Beta 4, I spent a few days running on a 600Mhz Athlon, and then switched to the Athlon axp 1800+ and got to generation 250 and got to generation 80? of the next batch, before the beta 5 client was released.
(Ended up getting a 6.04 this time and a 6.02 last time.. so my scores haven't seemed to change.)
------------------

Brian the Fist

04-01-2003, 12:27 PM

Originally posted by tpdooley
I'm getting the error [NULL_Caption] ERROR: [001.023] Invalid PMMD given to GetRMSD.
Hit Return.
(repeat until annoyed.. ;) then kill the dos window.)

.\analyzemovie -f c:\temp\best2.val -g c:\temp\native.val
(renamed beststruc.val so it was easier to type).

I was just trying to see what generation I ended up on since the machine is at work, and I'm at home.
------------------

The most likely problem here is your native.val is not the same protein as that in best2.val. Please ensure you get the two files from the same place (i.e. from the BETA server).

Brian the Fist

04-01-2003, 12:39 PM

For those who seem to think their client is running slower than 'normal' can a few of you post the 'currentstruc' line from your filelist.txt? Specify which are from 'slow' computers and which are from 'fast ones'.

Because the algorithm currently is highly dependent on the initial gen. 0 structure chosen, it is possible that the wide variance in time is mostly due to how 'lucky' you get in gen. 0. However it is not intended to work this way, all equal machines should finish 250 generations within say at most 20% variance. If I see the filelist.txt I may be able to see what is going on to cause this.

On the other hand, we have been running the beta on 'a lot' of identical machines, and they currently range from generation 44-91, with the majority being at about 60-70. This is about what I expect. So is it not possible that some of us are being just a 'wee bit' paranoid? :haddock: :bs:

Digital Parasite

04-01-2003, 02:49 PM

Originally posted by Brian the Fist
On the other hand, we have been running the beta on 'a lot' of identical machines, and they currently range from generation 44-91, with the majority being at about 60-70. This is about what I expect.

If clients running on identical hardware for the same amount of time, some being 50 generations apart from others is what you expect then I am not seeing anything different myself.

But the client on generation 91 is obviously processing structures faster than the one that is only on generation 44. If you are saying the clients shouldn't be more than 20% off, your own example is way over that. The client on gen 91 is 100% ahead of the client on 44.

Jeff.

Brian the Fist

04-01-2003, 02:55 PM

Originally posted by Digital Parasite
If clients running on identical hardware for the same amount of time, some being 50 generations apart from others is what you expect then I am not seeing anything different myself.

But the client on generation 91 is obviously processing structures faster than the one that is only on generation 44. If you are saying the clients shouldn't be more than 20% off, your own example is way over that. The client on gen 91 is 100% ahead of the client on 44.

Jeff.
Actually no, it's not. If you consider the 'normal distribution', in a cluster of our size, you expect a few nodes to be 3-4 standard deviations from the mean, while the majority will lie within 1 standard deviation of the mean (20%). Like I also said, most are between 60-70 and thus if you take a mean of 65, that means most are within 5 gens. of each other, or well less than 10%. So probably about 90% of the machines are within about 20% of gen. 65 (I didn't count exactly). If you haven't taken stats recently, just trust me on this ;)
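
For anyone who wants to check this kind of claim on their own numbers, here is a rough Python sketch. It is not taken from the project's tools; the function names are made up, and the sample list is hypothetical apart from its two endpoints (44 and 91), which are the extremes mentioned above.

```python
import math


def normal_fraction_within(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def share_within_pct(values, pct: float) -> float:
    """Share of the sample lying within pct percent of the sample mean."""
    mean = sum(values) / len(values)
    tolerance = mean * pct / 100.0
    inside = sum(1 for v in values if abs(v - mean) <= tolerance)
    return inside / len(values)


# Hypothetical generation counts for six identical machines (not real cluster data)
generations = [44, 60, 65, 65, 70, 91]

print(f"within 1 sd (normal): {normal_fraction_within(1):.3f}")
print(f"within 3 sd (normal): {normal_fraction_within(3):.4f}")
print(f"sample share within 20% of mean: {share_within_pct(generations, 20):.2f}")
```

For a true normal distribution roughly 68% of values fall within one standard deviation and about 99.7% within three, so a couple of far-off nodes in a large cluster would not be unusual.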

Brian the Roman

04-01-2003, 03:32 PM

Howard;
how long do you expect 250 gens to take, on average, for your p3-450s?

Both my machines are way faster than that (xp1800, xp1900, both quiet mode and extra ram) and after 6.5 days on beta5 I'm at gen 230 and 148, respectively. Are these roughly performing as you'd expect? At the fast end, slow end or in the middle? If these machines are not on the slow end then I have more questions surrounding ROI, but I'll wait for your answer before raising them.

If these are both on the slow end of the spectrum of what you'd expect, it would be interesting to see how many other people of the 40 participants are also on the slow end.

Digital Parasite

04-01-2003, 03:56 PM

Originally posted by Brian the Fist
Actually no, it's not. If you consider the 'normal distribution', in a cluster of our size, you expect a few nodes to be 3-4 standard deviations from the mean, while the majority will lie within 1 standard deviation of the mean (20%). Like I also said, most are between 60-70 and thus if you take a mean of 65, that means most are within 5 gens. of each other, or well less than 10%.

Oh, you meant of all your machines the majority shouldn't be more than 20% off. I thought you meant that taking any 2 machines of the same speed, they shouldn't be more than 20% off. In my sample size of 2 machines, I was in fact correct. ;)

So is SARS affecting your building at all? Are you actually in the hospital complex at all or in a different building completely so you have less to worry about?

Jeff.

Mikus

04-01-2003, 05:21 PM

My DF machine is on a local LAN. From time to time, I dial out on another machine and activate a proxy, to provide a path to the DF server.

Meaning most of the time that the DF machine wants to send results, it has no connection. This is what the "old client" writes to ERROR.LOG:
ERROR: [000.000] {foldtrajlite.c, line 3691} Error during upload: NO RESPONSE
ERROR: [000.000] {foldtrajlite.c, line 1195} Tue Apr 1 08:17:37 2003
Unable to check server status

Contrast that with what the "beta client" writes to ERROR.LOG [truncated] :
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 1445} Tue Apr 1 08:13:14 2003
Unable to check server status
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 3882} Error during upload: NO RESPONSE

This verbiage makes it difficult to scan ERROR.LOG to see if any other kinds of errors are being reported. Let me suggest an optional "less verbose" mode that suppresses "could not connect socket" messages.
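
Until something like that exists in the client, the log can be filtered after the fact. The following is a minimal Python sketch, not a client feature: it assumes the toolkit messages always look like the lines above (a source file named ncbi_*.c inside the braces), and the names NCBI_LINE and filter_log are made up for the example.

```python
import re

# Matches lines such as:
#   ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
# Assumption: toolkit messages are recognisable by an ncbi_*.c source file name.
NCBI_LINE = re.compile(r"^ERROR: \[\d{3}\.\d{3}\] \{ncbi_\w+\.c, line \d+\}")


def filter_log(lines):
    """Return the log lines with the NCBI toolkit socket/HTTP messages removed."""
    return [line for line in lines if not NCBI_LINE.match(line)]


if __name__ == "__main__":
    # ERROR.LOG is the file name used in the post above; adjust the path as needed.
    with open("ERROR.LOG", encoding="latin-1") as handle:
        for line in filter_log(handle.read().splitlines()):
            print(line)
```

Lines written by the client itself (foldtrajlite2.c) and their continuation lines, such as "Unable to check server status", are kept.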

mikus

Brian the Fist

04-01-2003, 05:45 PM

Originally posted by Mikus
My DF machine is on a local LAN. From time to time, I dial out on another machine and activate a proxy, to provide a path to the DF server.

Meaning most of the time that the DF machine wants to send results, it has no connection. This is what the "old client" writes to ERROR.LOG:
ERROR: [000.000] {foldtrajlite.c, line 3691} Error during upload: NO RESPONSE
ERROR: [000.000] {foldtrajlite.c, line 1195} Tue Apr 1 08:17:37 2003
Unable to check server status

Contrast that with what the "beta client" writes to ERROR.LOG [truncated] :
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 1445} Tue Apr 1 08:13:14 2003
Unable to check server status
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 3882} Error during upload: NO RESPONSE

This verbiage makes it difficult to scan ERROR.LOG to see if any other kinds of errors are being reported. Let me suggest an optional "less verbose" mode that suppresses "could not connect socket" messages.

mikus

If you are keen, you may note that the verbose messages come from the NCBI toolkit part of the code (hence the ncbi*.c), which is the part we did not write. Nevertheless, we do have some limited control over the errors it outputs. I am not certain what changed to cause this, though; I think those verbose messages were actually always there. It's not a prime concern, but if you bug me enough times and I have a bit of time, I should be able to suppress those (mostly useless) error messages without modifying the NCBI toolkit itself.

Brian the Fist

04-01-2003, 05:51 PM

Originally posted by Digital Parasite

So is SARS affecting your building at all? Are you actually in the hospital complex at all or in a different building completely so you have less to worry about?

Jeff.
Yes, it is very affected. Our entire hospital, as with all Toronto hospitals, is majorly affected, to the point that we will not be showing up at work any time soon (but I can still work from home of course :) ). No research staff or students have been infected with SARS yet as far as I know, only nurses/hospital workers in direct contact with patients. But nevertheless, in order to contain the virus, anyone entering a hospital must fill out a questionnaire, have their temperature taken, wear a mask at all times while inside the building, etc. (which is why we won't be showing up there). We hope the insanity and paranoia will die down in a couple of weeks...

I should be able to proceed with the protein updates and beta updates without going in though, I just need a bit of time to get it organized here, hence the delay in switching proteins.

tpdooley

04-01-2003, 10:44 PM

CurrentStruc 1 35 123 37 1 19 6.493 -3037.588 -1312.773 -1503.734 173163056.000 1.250 2.300 764.752

Which means that I'm working on generation 37, and I've got to generation 250 once, if I'm reading the stats right.
This is an Athlon xp 1800+ running for 7 days, (quiet, use extra ram), win98se, 256Megs pc133 sdram.

The early generations (first 40-60 generations?) seem to go by really quickly - but several of us seem to get bogged down between the early generations and generation 250. Our overall best score is much better - but I mentioned this in reference to the comment made at the beginning of this thread that slower systems shouldn't have any problem getting to generation 250 in a week.

(and yes, I did grab the native.val out of the wrong directory.. whoops. ;)

Good to hear that you're not part of the quarantined group.
:(

bwkaz

04-02-2003, 10:10 PM

I was about to report this as a bug, until I realized what the problem really was.

I'm working on a new Linux dfGUI version (for anyone interested: it's using Gtk 2 now, since IMHO Gtk is ten times prettier than Qt, even if C GUI programming is hard ;)), and no matter what I was doing to start the client, it would print out "checking for newer versions", then "This program has crashed. If you suspect a bug, please contact us at [email protected]. Note: some work may be lost."

It would do this no matter how I was starting it up, as long as I was doing it in a terminal window (and not from a real console). The ncurses init function hadn't even gotten called when it crashed, so I was pretty much at a loss what it could be.

Until I realized that every time I had been starting it from the console, I'd been doing it as root (with my init script, which only root has execute permission for). So I switched to root in the Eterm window I'd been using, and voila, it worked.

So it appeared to be a permissions thing. Quitting the client and chmod go+w filelist.txt let all users start it without a crash -- however, I notice that the error.log file is still mode 755 and owned by root, so I don't know what'll happen if it ever tries writing to that file. progress.txt won't have this problem, since it gets recreated every time the client starts up, owned by whoever owns the client process.

Anyway, this isn't a bug report, it's my stupidity. So if anyone running a Linux client is wondering why it just crashes right at startup with that message, check permissions on filelist.txt. Brian, could you perhaps put a slightly more informative error message in if that happens, as well? I know, it's my stupidity for not checking that kind of thing, but the error of "this program has crashed" wasn't helping, either. ;)

Brian the Fist

04-03-2003, 01:05 PM

Originally posted by bwkaz

Anyway, this isn't a bug report, it's my stupidity. So if anyone running a Linux client is wondering why it just crashes right at startup with that message, check permissions on filelist.txt. Brian, could you perhaps put a slightly more informative error message in if that happens, as well? I know, it's my stupidity for not checking that kind of thing, but the error of "this program has crashed" wasn't helping, either. ;)
It's clearly just a matter of me not checking for a NULL pointer after I open a file. I could put a check in, but I'd probably have to do it everywhere I open a file as any of the files could have incorrect permissions at some time or other. I do check in some cases I think but there'll always be some that get missed.
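
One way to avoid repeating the check at every call site is to route file opens through a single helper. This is only a sketch in C under stated assumptions: open_or_report is a hypothetical name, not a function in the client, and the message relies on fopen setting errno on failure, which POSIX specifies but the C standard itself does not guarantee.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: opens a file and, on failure, reports which file
 * could not be opened and why, instead of handing back a bare NULL that
 * the caller might forget to test. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL) {
        /* With wrong permissions this typically prints "Permission denied". */
        fprintf(stderr, "Cannot open %s (mode \"%s\"): %s\n",
                path, mode, strerror(errno));
    }
    return fp;
}
```

The caller still has to test the return value before using it, but the user at least sees the file name and the reason rather than only "This program has crashed".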

bwkaz

04-03-2003, 01:37 PM

If you don't want to, then that's your choice, obviously. Whatever -- I know what it was now. ;)

Mikus

04-03-2003, 08:05 PM

Earlier, I had the experience of a single generation taking 7 hours 51 minutes. Now, at this moment, seems like every time I look at the client display it is in "Minimizing energy of best ..." mode. This is the _same_ client run, with the same hardware and software and generation_0 start point, except it is significantly further (in terms of generations completed). Now the run is "knocking off" many generations in less than six minutes apiece (those six minutes INCLUDE the time spent "Minimizing").

I'm hoping that the "scoring" takes into consideration an 80-to-1 ratio in the amount of effort that participants might expend on a single generation.

mikus

shortfinal

04-03-2003, 09:21 PM

I had a similar experience. On my PIII 733 MHz PC, generation 116 took a little over 6 hours to complete. After that, subsequent generations averaged about 20 minutes.

Shortfinal

Digital Parasite

04-04-2003, 05:55 AM

It seems the beta.distributed.org site is down, is anyone else experiencing that? I have about 46 generations buffered on one machine and 14 on another.

Howard, here is a filelist.txt line for you, my values seem to be very high on this client:
CurrentStruc 1 51 123 79 1 47 6.264 -3557.425 -1004.781 -2668.209 371461504.000 2.350 4.500 16552.963

Jeff.

Brian the Fist

04-04-2003, 10:02 AM

Beta site is back awake again now.
Ok, it is clear that plan A did not work to keep our initial helices, so it is on to plan B now. This one will definitely do the trick and, hopefully, get us all the way down to some 2 or 3A structures as well. Expect 'beta 6' to be coming later today. This will hopefully be the last change in the algorithm.

In response to some of the generations taking several hours - yes, remember, it is random. Some may take a long time and some may not. This is to ensure the integrity of the structures. It tries to make them as good quality as possible and then slowly relaxes this constraint until it gets something that works. However, as you pointed out, after this it whips through them very fast, because once it finds the 'laxness level' that works for the current run, it hovers around there and proceeds much faster. Thus you need to consider the overall time and effort to get to generation 250 (or whatever) and not how long any one given generation takes. This is why the scoring is biased towards later generations - if you get over the initial hurdles you are rewarded later on (kind of like the piece of cheese at the end of the mouse maze :p )
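
The "relax until something works, then stay near that level" behaviour can be illustrated with a toy model. This is not the client's algorithm, whose acceptance test is not described in this thread; the Relaxer struct, the run_generation function and the `needed` threshold are all invented for illustration, and the real process is random whereas this sketch is deterministic.

```c
/* Toy model of a constraint that is relaxed step by step.
 * All names here are hypothetical. */
typedef struct {
    double laxness;   /* current laxness level, 0 = strictest */
    double step;      /* how much to relax after a failed attempt */
} Relaxer;

/* Runs one generation and returns the number of attempts it took.
 * An attempt "works" once laxness has reached `needed`, a stand-in
 * for whatever the real client requires of a structure. */
int run_generation(Relaxer *r, double needed)
{
    int attempts = 1;
    while (r->laxness < needed) {   /* attempt failed: relax and retry */
        r->laxness += r->step;
        attempts++;
    }
    /* laxness is kept, so the next generation starts where this one ended */
    return attempts;
}
```

Starting from a Relaxer with laxness 0.0 and step 0.25, a first call with needed = 2.0 takes nine attempts, while every later call with the same threshold takes one. That is the same pattern as one very long generation followed by many quick ones.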