Next test - Beta 5

**Georgina** · 03-31-2003, 06:30 PM

Originally posted by Digital Parasite
... They both were active for the same amount of time but one of them finished all 250 generations on Saturday and the other one is still on generation 240 and is running much slower...

Jeff.

I started Beta 5 on my Win2K server (with VERY light load) the same day it was released. It has a 1800+ AMD. As of today it is only on gen. 90

:shocked:

I guess it is running one of the slower versions !! A 2600+ shouldn't be that much faster.

G

**bwkaz** · 03-31-2003, 07:44 PM

Yes, that does seem slow. On my XP1800+, which started the day that beta 5 started as well, it's on generation 149.

But then, I run Linux, too, so that might be part of it. I assume you're running the text client, right? Is it running quiet?

**Georgina** · 03-31-2003, 10:42 PM

Originally posted by bwkaz
Yes, that does seem slow. On my XP1800+, which started the day that beta 5 started as well, it's on generation 149.

But then, I run Linux, too, so that might be part of it. I assume you're running the text client, right? Is it running quiet?

Yes, I am running the text client in default mode, whatever that is.

G

**Scoofy12** · 03-31-2003, 11:07 PM

just fyi, default mode is not quiet. anyone have good data on how much difference that really makes?

**Georgina** · 03-31-2003, 11:19 PM

Originally posted by Scoofy12
just fyi, default mode is not quiet...

Ok, what am I supposed to hear?

My pain in the a$* coworker got me running this thing. Now that you mention it, I remember him saying something about switches - I guess I'm just having a blond moment

How do I get this thing running faster?

G

**Paratima** · 03-31-2003, 11:52 PM

Originally posted by Georgina
How do I get this thing running faster?

What all the world wants to know...

**IronBits** · 04-01-2003, 12:31 AM

-rt - uses about 125mb of ram, but runs almost twice as fast!
-qt - quiet mode, no display output... 10%? increase in performance.
AMD - fastest processor you can afford (better than Intel performance in most projects)
DDR memory - heard it seems to give another significant boost...
Faster processor speed better than slower processor with higher FSB.
That should get a good debate going

Unknown if all of the above is applicable to the beta client. YMMV

**tpdooley** · 04-01-2003, 04:40 AM

I'm getting the error [NULL_Caption] ERROR: [001.023] Invalid PMMD given to GetRMSD.
Hit Return.
(repeat until annoyed.. then kill the dos window.)

.\analyzemovie -f c:\temp\best2.val -g c:\temp\native.val
(renamed beststruc.val so it was easier to type).

I was just trying to see what generation I ended up on since the machine is at work, and I'm at home.
--------
Add my system to the list of those that ended up having difficulty getting to generation 250.
It's an Athlon axp 1800+ with 256megs of pc133 sdr running win98se.
For Beta 4, I spent a few days running on a 600Mhz Athlon, and then switched to the Athlon axp 1800+ and got to generation 250 and got to generation 80? of the next batch, before the beta 5 client was released.
(Ended up getting a 6.04 this time and a 6.02 last time.. so my scores don't seem to have changed.)
------------------

**Brian the Fist** · 04-01-2003, 12:27 PM

Originally posted by tpdooley
I'm getting the error [NULL_Caption] ERROR: [001.023] Invalid PMMD given to GetRMSD.
Hit Return.
(repeat until annoyed.. then kill the dos window.)

.\analyzemovie -f c:\temp\best2.val -g c:\temp\native.val
(renamed beststruc.val so it was easier to type).

I was just trying to see what generation I ended up on since the machine is at work, and I'm at home.
------------------

The most likely problem here is your native.val is not the same protein as that in best2.val. Please ensure you get the two files from the same place (i.e. from the BETA server).

**Brian the Fist** · 04-01-2003, 12:39 PM

For those who seem to think their client is running slower than 'normal' can a few of you post the 'currentstruc' line from your filelist.txt? Specify which are from 'slow' computers and which are from 'fast ones'.

Because the algorithm currently is highly dependent on the initial gen. 0 structure chosen, it is possible that the wide variance in time is mostly due to how 'lucky' you get in gen. 0. However, it is not intended to work this way; all equal machines should finish 250 generations within, say, at most 20% variance. If I see the filelist.txt I may be able to see what is going on to cause this.

On the other hand, we have been running the beta on 'a lot' of identical machines, and they currently range from generation 44-91, with the majority being at about 60-70. This is about what I expect. So is it not possible that some of us are being just a 'wee bit' paranoid?

:bs:

**Digital Parasite** · 04-01-2003, 02:49 PM

Originally posted by Brian the Fist
On the other hand, we have been running the beta on 'a lot' of identical machines, and they currently range from generation 44-91, with the majority being at about 60-70. This is about what I expect.

If clients running on identical hardware for the same amount of time, some being 50 generations apart from others is what you expect then I am not seeing anything different myself.

But the client on generation 91 is obviously processing structures faster than the one that is only on generation 44. If you are saying the clients shouldn't be more than 20% off, your own example is way over that. The client on gen 91 is 100% ahead of the client on 44.

Jeff.
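The percentages being argued over can be checked with a few lines of arithmetic. This is only a sketch: the generation numbers 91, 44 and the rough mean of 65 come from the posts above, while the function names are made up for illustration.

```python
def percent_ahead(fast_gen: int, slow_gen: int) -> float:
    """How far ahead fast_gen is, as a percentage of slow_gen."""
    return (fast_gen - slow_gen) / slow_gen * 100.0


def percent_from_mean(gen: int, mean_gen: float) -> float:
    """Signed distance of gen from mean_gen, as a percentage of the mean."""
    return (gen - mean_gen) / mean_gen * 100.0


# Fastest vs. slowest machine quoted for the identical cluster
print(f"gen 91 vs gen 44: {percent_ahead(91, 44):.1f}% ahead")

# Same two machines measured against the rough mean of 65
print(f"gen 91 vs mean 65: {percent_from_mean(91, 65):+.1f}%")
print(f"gen 44 vs mean 65: {percent_from_mean(44, 65):+.1f}%")
```

Comparing two machines to each other and comparing each machine to the mean give quite different percentages, which is where the two readings of "20%" part ways.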

**Brian the Fist** · 04-01-2003, 02:55 PM

Originally posted by Digital Parasite
If clients running on identical hardware for the same amount of time, some being 50 generations apart from others is what you expect then I am not seeing anything different myself.

But the client on generation 91 is obviously processing structures faster than the one that is only on generation 44. If you are saying the clients shouldn't be more than 20% off, your own example is way over that. The client on gen 91 is 100% ahead of the client on 44.

Jeff.

Actually, no, it's not. If you consider the 'normal distribution', in a cluster of our size, you expect a few nodes to be 3-4 standard deviations from the mean, while the majority will lie within 1 standard deviation of the mean (20%). Like I also said, most are between 60-70 and thus if you take a mean of 65, that means most are within 5 gens. of each other, or well less than 10%. So probably about 90% of the machines are within about 20% of gen. 65 (I didn't count exactly). If you haven't taken stats recently, just trust me on this.
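For anyone who wants to check the normal-distribution figures rather than take them on trust, here is a small sketch using only the standard library. It assumes generation counts really are normally distributed, which the thread does not establish, and the cluster size of 100 is a placeholder because the actual number of machines is not stated.

```python
import math


def fraction_within(k: float) -> float:
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


def expected_outliers(n_nodes: int, k: float) -> float:
    """Expected number of nodes more than k standard deviations from the mean."""
    return n_nodes * (1.0 - fraction_within(k))


for k in (1, 2, 3, 4):
    print(f"within {k} sd: {fraction_within(k):.4%}")

# Hypothetical cluster size; the real one is only described as 'a lot'.
N_NODES = 100
print(f"expected nodes beyond 3 sd out of {N_NODES}: {expected_outliers(N_NODES, 3):.2f}")
```

Under that assumption roughly 68% of machines fall within one standard deviation, and a node beyond three standard deviations is expected only about once in every 370 machines.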

**Brian the Roman** · 04-01-2003, 03:32 PM

Howard;
how long do you expect 250 gens to take, on average, for your p3-450s?

Both my machines are way faster than that (xp1800, xp1900, both quiet mode and extra ram) and after 6.5 days on beta5 I'm at gen 230 and 148, respectively. Are these roughly performing as you'd expect? At the fast end, slow end or in the middle? If these machines are not on the slow end then I have more questions surrounding ROI, but I'll wait for your answer before raising them.

If these are both on the slow end of the spectrum of what you'd expect, it would be interesting to see how many other people of the 40 participants are also on the slow end.

**Digital Parasite** · 04-01-2003, 03:56 PM

Originally posted by Brian the Fist
Actually, no, it's not. If you consider the 'normal distribution', in a cluster of our size, you expect a few nodes to be 3-4 standard deviations from the mean, while the majority will lie within 1 standard deviation of the mean (20%). Like I also said, most are between 60-70 and thus if you take a mean of 65, that means most are within 5 gens. of each other, or well less than 10%.

Oh, you meant of all your machines the majority shouldn't be more than 20% off. I thought you meant that taking any 2 machines of the same speed, they shouldn't be more than 20% off. In my sample size of 2 machines, I was in fact correct.

So is SARS affecting your building at all? Are you actually in the hospital complex at all or in a different building completely so you have less to worry about?

Jeff.

**Mikus** · 04-01-2003, 05:21 PM

My DF machine is on a local LAN. From time to time, I dial out on another machine and activate a proxy, to provide a path to the DF server.

Meaning most of the time that the DF machine wants to send results, it has no connection. This is what the "old client" writes to ERROR.LOG :
ERROR: [000.000] {foldtrajlite.c, line 3691} Error during upload: NO RESPONSE
ERROR: [000.000] {foldtrajlite.c, line 1195} Tue Apr 1 08:17:37 2003
Unable to check server status

Contrast that with what the "beta client" writes to ERROR.LOG [truncated] :
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 1445} Tue Apr 1 08:13:14 2003
Unable to check server status
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 3882} Error during upload: NO RESPONSE

This verbiage makes it difficult to scan ERROR.LOG to see if any other kinds of errors are being reported. Let me suggest an optional "less verbose" mode that suppresses "could not connect socket" messages.

mikus

**Brian the Fist** · 04-01-2003, 05:45 PM

Originally posted by Mikus
My DF machine is on a local LAN. From time to time, I dial out on another machine and activate a proxy, to provide a path to the DF server.

Meaning most of the time that the DF machine wants to send results, it has no connection. This is what the "old client" writes to ERROR.LOG :
ERROR: [000.000] {foldtrajlite.c, line 3691} Error during upload: NO RESPONSE
ERROR: [000.000] {foldtrajlite.c, line 1195} Tue Apr 1 08:17:37 2003
Unable to check server status

Contrast that with what the "beta client" writes to ERROR.LOG [truncated] :
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 1445} Tue Apr 1 08:13:14 2003
Unable to check server status
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c
ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to
ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex
ERROR: [000.000] {foldtrajlite2.c, line 3882} Error during upload: NO RESPONSE

This verbiage makes it difficult to scan ERROR.LOG to see if any other kinds of errors are being reported. Let me suggest an optional "less verbose" mode that suppresses "could not connect socket" messages.

mikus

If you are keen, you may note that the verbose messages come from the NCBI toolkit part of the code (hence the ncbi*.c) which is the part we did not write. Nevertheless we do have some limited control over the errors it outputs. I am not certain what changed to cause this, though; I think those verbose messages were actually always there. It's not a prime concern, but if you bug me enough times and I have a bit of time I should be able to suppress those (mostly useless) error messages without modifying the NCBI toolkit itself.
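Until the client itself can suppress those messages, the log can be filtered after the fact. The sketch below is not part of the client; it simply assumes that every NCBI toolkit message names an `ncbi_*.c` source file inside braces, as the excerpt above suggests. The sample lines are copied (still truncated) from that excerpt.

```python
from typing import Iterable, Iterator

# Assumption: NCBI toolkit messages always contain "{ncbi_" (e.g. "{ncbi_socket.c, ...").
NOISY_MARKER = "{ncbi_"


def drop_ncbi_noise(lines: Iterable[str]) -> Iterator[str]:
    """Yield only the log lines that do not come from the NCBI toolkit."""
    for line in lines:
        if NOISY_MARKER not in line:
            yield line


sample = [
    "ERROR: [777.000] {ncbi_socket.c, line 910} [SOCK::s_Connect] Failed pending c",
    "ERROR: [777.000] {ncbi_connutil.c, line 526} [URL_Connect] Socket connect to",
    "ERROR: [777.000] {ncbi_http_connector.c, line 117} [HTTP] Retry attempt(s) ex",
    "ERROR: [000.000] {foldtrajlite2.c, line 1445} Tue Apr 1 08:13:14 2003",
    "Unable to check server status",
]

kept = list(drop_ncbi_noise(sample))
for line in kept:
    print(line)
```

To run it on a real log, pass an open file object for ERROR.LOG in place of `sample`; the original file is left untouched.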

**Brian the Fist** · 04-01-2003, 05:51 PM

Originally posted by Digital Parasite

So is SARS affecting your building at all? Are you actually in the hospital complex at all or in a different building completely so you have less to worry about?

Jeff.

Yes, it is very affected. Our entire hospital, as with all Toronto hospitals, is majorly affected, to the point that we will not be showing up at work any time soon (but I can still work from home of course). No research staff or students have been infected with SARS yet as far as I know, only nurses/hospital workers in direct contact with patients. But nevertheless, in order to contain the virus, anyone entering a hospital must fill out a questionnaire, have their temperature taken, wear a mask at all times while inside the building, etc. (which is why we won't be showing up there). We hope the insanity and paranoia will die down in a couple of weeks...

I should be able to proceed with the protein updates and beta updates without going in though, I just need a bit of time to get it organized here, hence the delay in switching proteins.

**tpdooley** · 04-01-2003, 10:44 PM

CurrentStruc 1 35 123 37 1 19 6.493 -3037.588 -1312.773 -1503.734 173163056.000 1.250 2.300 764.752

Which means that I'm working on generation 37, and I've got to generation 250 once, if I'm reading the stats right.
This is an Athlon xp 1800+ running for 7 days, (quiet, use extra ram), win98se, 256Megs pc133 sdram.
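Reading the generation off a `CurrentStruc` line can be scripted. This is a guess at the layout, not documented behaviour: the only field interpreted here is the fourth number, which is assumed to be the current generation because it is 37 in the line above and that matches the reading given. The meaning of the other fields is not stated in the thread, so they are returned uninterpreted.

```python
def parse_currentstruc(line: str) -> dict:
    """Split a CurrentStruc line from filelist.txt into its raw fields."""
    fields = line.split()
    if not fields or fields[0] != "CurrentStruc":
        raise ValueError("not a CurrentStruc line")
    values = fields[1:]
    return {
        # Assumption: the 4th value is the generation currently being worked on.
        "generation": int(values[3]),
        # Everything else is kept as raw strings; their meaning is unknown here.
        "raw": values,
    }


line = ("CurrentStruc 1 35 123 37 1 19 6.493 -3037.588 -1312.773 "
        "-1503.734 173163056.000 1.250 2.300 764.752")
info = parse_currentstruc(line)
print("generation:", info["generation"])
print("number of fields:", len(info["raw"]))
```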

The early generations (first 40-60 generations?) seem to go by really quickly - but several of us seem to get bogged down between the early generations and generation 250. Our overall best score is much better - but I mentioned this in reference to the comment made at the beginning of this thread that slower systems shouldn't have any problem getting to generation 250 in a week.

(and yes, I did grab the native.val out of the wrong directory.. whoops.)

Good to hear that you're not part of the quarantined group.

**bwkaz** · 04-02-2003, 10:10 PM

I was about to report this as a bug, until I realized what the problem really was.

I'm working on a new Linux dfGUI version (for anyone interested: it's using Gtk 2 now, since IMHO Gtk is ten times prettier than Qt, even if C GUI programming is hard), and no matter what I was doing to start the client, it would print out "checking for newer versions", then "This program has crashed. If you suspect a bug, please contact us at trades@mshri.on.ca. Note: some work may be lost."

It would do this no matter how I was starting it up, as long as I was doing it in a terminal window (and not from a real console). The ncurses init function hadn't even gotten called when it crashed, so I was pretty much at a loss what it could be.

Until I realized that every time I had been starting it from the console, I'd been doing it as root (with my init script, which only root has execute permission for). So I switched to root in the Eterm window I'd been using, and voila, it worked.

So it appeared to be a permissions thing. Quitting the client and chmod go+w filelist.txt let all users start it without a crash -- however, I notice that the error.log file is still mode 755 and owned by root, so I don't know what'll happen if it ever tries writing to that file. progress.txt won't have this problem, since it gets recreated every time the client starts up, owned by whoever owns the client process.

Anyway, this isn't a bug report, it's my stupidity. So if anyone running a Linux client is wondering why it just crashes right at startup with that message, check permissions on filelist.txt. Brian, could you perhaps put a slightly more informative error message in if that happens, as well? I know, it's my stupidity for not checking that kind of thing, but the error of "this program has crashed" wasn't helping, either.
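A quick pre-flight check along these lines can catch the problem before the client is started. It is only a sketch: the two file names come from the post above, the function name is invented, and whether a missing file actually matters to the client is an assumption (the client may simply create it).

```python
import os

# File names mentioned in the post; the client may touch others as well.
CLIENT_FILES = ("filelist.txt", "error.log")


def unwritable_files(directory: str, names=CLIENT_FILES) -> list:
    """Describe each named file that is missing or not writable by the current user."""
    problems = []
    for name in names:
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            problems.append(f"{name}: missing")
        elif not os.access(path, os.W_OK):
            problems.append(f"{name}: not writable by the current user")
    return problems


# Check the current directory; run this from the client's directory.
for problem in unwritable_files("."):
    print(problem)
```

Run it as the same user that will start the client, since root passes the writability check regardless of the file mode.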

**Brian the Fist** · 04-03-2003, 01:05 PM

Originally posted by bwkaz

Anyway, this isn't a bug report, it's my stupidity. So if anyone running a Linux client is wondering why it just crashes right at startup with that message, check permissions on filelist.txt. Brian, could you perhaps put a slightly more informative error message in if that happens, as well? I know, it's my stupidity for not checking that kind of thing, but the error of "this program has crashed" wasn't helping, either.

It's clearly just a matter of me not checking for a NULL pointer after I open a file. I could put a check in, but I'd probably have to do it everywhere I open a file as any of the files could have incorrect permissions at some time or other. I do check in some cases I think but there'll always be some that get missed.

**bwkaz** · 04-03-2003, 01:37 PM

If you don't want to, then that's your choice, obviously. Whatever -- I know what it was now.

**Mikus** · 04-03-2003, 08:05 PM

Earlier, I had the experience of a single generation taking 7 hours 51 minutes. Now, at this moment, seems like every time I look at the client display it is in "Minimizing energy of best ..." mode. This is the _same_ client run, with the same hardware and software and generation_0 start point, except it is significantly further (in terms of generations completed). Now the run is "knocking off" many generations in less than six minutes apiece (those six minutes INCLUDE the time spent "Minimizing").

I'm hoping that the "scoring" takes into consideration an 80-to-1 ratio in the amount of effort that participants might expend on a single generation.

mikus

**shortfinal** · 04-03-2003, 09:21 PM

I had a similar experience. On my PIII 733 MHZ PC generation 116 took a little over 6 hours to complete. After that subsequent generations averaged about 20 minutes.

Shortfinal

**Digital Parasite** · 04-04-2003, 05:55 AM

It seems the beta.distributed.org site is down, is anyone else experiencing that? I have about 46 generations buffered on one machine and 14 on another.

Howard, here is a filelist.txt line for you, my values seem to be very high on this client:
CurrentStruc 1 51 123 79 1 47 6.264 -3557.425 -1004.781 -2668.209 371461504.000 2.350 4.500 16552.963

Jeff.

**Brian the Fist** · 04-04-2003, 10:02 AM

Beta site is back awake again now.
Ok, it is clear that plan A did not work to keep our initial helices, so it is on to plan B now. This one will definitely do the trick and, hopefully, get us all the way down to some 2 or 3A structures as well. Expect 'beta 6' to be coming later today. This will hopefully be the last change in the algorithm.

In response to some of the generations taking several hours - yes, remember, it is random. Some may take a long time and some may not. This is to ensure the integrity of the structures. It tries to make them as good quality as possible and then slowly relaxes this constraint until it gets something that works. However, as you pointed out, after this it whips through them very fast, because once it finds the 'laxness level' that works for the current run, it hovers around there and proceeds much faster. Thus you need to consider the overall time and effort to get to generation 250 (or whatever) and not how long any one given generation takes. This is why the scoring is biased towards later generations - if you get over the initial hurdles you are rewarded later on (kind of like the piece of cheese at the end of the mouse maze).
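The "relax until something works, then hover there" behaviour can be illustrated with a toy model. To be clear, this is not the client's algorithm: the numeric laxness scale, the fixed threshold `required_laxness`, the step size and the cost per level are all invented for illustration. In a real run the workable level presumably shifts, which would explain a slow generation turning up mid-run rather than at the start.

```python
def attempts_per_generation(required_laxness: int, n_generations: int,
                            step: int = 1, tries_per_level: int = 10) -> list:
    """Toy model: count attempts spent per generation while relaxing a constraint.

    A generation succeeds only once laxness >= required_laxness (hypothetical).
    After each success the constraint is tightened by one step, so the next
    generation starts just below the level that last worked.
    """
    laxness = 0
    costs = []
    for _ in range(n_generations):
        attempts = 0
        while True:
            attempts += tries_per_level      # try this laxness level
            if laxness >= required_laxness:  # good enough: generation done
                break
            laxness += step                  # otherwise relax a little
        costs.append(attempts)
        laxness = max(0, laxness - step)     # hover near the working level
    return costs


costs = attempts_per_generation(required_laxness=50, n_generations=5)
print(costs)
print("slowest / fastest generation:", max(costs) / min(costs))
```

Even in this crude form, one generation absorbs almost all of the search cost and the rest are cheap, so the total time to reach a given generation is a fairer measure of effort than the time spent on any single one.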

**FoBoT** · 04-04-2003, 12:01 PM

**Brian the Fist** · 04-04-2003, 10:01 PM

very cute

**Brian the Roman** · 04-04-2003, 10:10 PM

The beta stats seem to have been reset. Is this for the new and improved beta6?

ms
