There seems to be a dfGUI problem



rsbriggs
06-28-2003, 09:11 PM
I've pretty much verified to my satisfaction that there is a dfGUI problem when running 8 copies on a 4 * Xeon * HT system.

I ran for nearly 12 hours without using dfGUI, and didn't see so much as a single hiccup. When I brought dfGUI up to force an upload on the 5th client, the 7th client immediately aborted and wouldn't restart due to an RLE run length error "is 3, should be 400*400."

I then used dfGUI to upload the 4th client, and the 8th client finished its generation and died. I attempted to restart it, and it reported a database corruption error.

Now I still can't be certain that I don't have machine problems on this particular box, but I found it very suspicious that things ran for 12 hours just fine, until I brought up dfGUI.

Oh, and something else very interesting I found. If I bring up dfGUI to shut down a client (I normally run them invisibly), toggle from invisible to visible, and then stop the client by pressing Q in the window, that works fine. However, what is really interesting is that if I toggle the dfGUI configuration back to invisible, then back to visible, ANOTHER client window becomes visible. If I stop it by pressing Q in the window, then exit dfGUI and try to start both of them back up, one of them generally complains about a bad file and refuses to run...

===bob briggs

bwkaz
06-29-2003, 01:52 PM
You do have 8 different directories for your DF clients, right? You're not doing something like running two of them from one directory?

I don't see how dfGUI could affect them like that (it only reads information from protein.trj, progress.txt, and filelist.txt, none of the database files, and it only writes to the foldit[.bat if your OS uses it] script). However, I won't say it's not, because obviously something is happening to your clients. :)

Don't suppose you can read C++ well enough to figure out what might be going on, can you?

rsbriggs
06-29-2003, 02:46 PM
Yes, they run out of separate directories, DF-CPU1 through DF-CPU8. I don't quite understand it myself, but there really is something odd going on with multi-processors. I've verified this sequence:

1.) All clients were started by running foldit, not using dfGUI. All 8 console windows are minimized to the taskbar.

2.) I open one of the directories, and bring up the copy of dfGUI in that directory and check the configuration. It shows the correct path, and shows that the client is running.

3.) I use the dfGUI "upload" button to stop the client and upload buffered generations. When done, one of the console windows minimized to the taskbar disappears.

4.) On the same running instance of dfGUI, I go to the configuration screen, and toggle the radio buttons from visible to invisible, then back to visible - it restores some other client window.

5.) If you again press the upload button, you will end up with at least one dead client due to "hosed" files. Sometimes, if you just close dfGUI at that point, you'll end up with one of the clients dead at the end of its current generation. Sometimes this also leaves a phantom copy of foldtrajlite running, not associated with any of the minimized windows - memory usage about 27k, and consuming no CPU. If you let the running clients continue, at least one of them will die later, due to hosed up files.

Quite odd, and only happening on my 8x box, not my 4x box. I just don't run dfGUI on that box, and everything is OK. I DO run dfGUI on the 4x box, and have never had the slightest problem.

No - don't read C++ (for windows) well at all. I'm an old UNIX and 'C' guy (Solaris, AIX, OSF, Tru64, a couple years worth of Linux) that's been working with C# and .Net over the last year or so. Never did anything with classic 'C' or C++ windows programming, COM, or MFC... (By OLD, I mean that I've been working with Unix since it escaped from Bell labs in about 1975 - first used it on a PDP-11, as I recall. Been working with C since BEFORE the Kernighan, Plauger, and Ritchie book was available.)


<shameless self plug>

Anyone out there looking to hire a C# programmer (with one+ years worth of C# experience), and LOTS of UNIX/'C' experience to back it up??? Have resume, will submit. Location of job not important, will relocate as necessary....

</shameless self plug>

Digital Parasite
06-29-2003, 04:30 PM
Originally posted by rsbriggs
3.) I use the dfGUI "upload" button to stop the client and upload buffered generations. When done, one of the console windows minimized to the taskbar disappears.

4.) On the same running instance of dfGUI, I go to the configuration screen, and toggle the radio buttons from visible to invisible, then back to visible - it restores some other client window.

dfGUI wasn't designed for multi-CPU systems, so the Visible/Hidden code doesn't really work properly when running multiple copies. I believe the code will hide the first DF client window it finds and might unhide all of them. There is no way to differentiate between all the DF client command line windows, so it doesn't know which one you really want to deal with.

What you might be seeing is that if dfGUI is configured for the client to be "Hidden" when you open dfGUI, it will find the first DF Client window it can and make it hidden. Or if you had it set to "Visible" and you load dfGUI, it will probably find a DF Client window and make it visible.
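To make the ambiguity concrete, here is a small model of the problem, not dfGUI's actual code. The `ConsoleWindow` class, the "DF client" title, and the PIDs are all made up for illustration. It only shows why a lookup by title lands on the first match, while a lookup keyed on something unique (here a process ID) does not.

```python
from dataclasses import dataclass


@dataclass
class ConsoleWindow:
    title: str            # text shown in the title bar (hypothetical)
    pid: int              # process that owns the window (hypothetical)
    visible: bool = True


def first_by_title(windows, title):
    """Return the first window whose title matches, or None."""
    return next((w for w in windows if w.title == title), None)


def by_pid(windows, pid):
    """Return the window owned by a specific process, or None."""
    return next((w for w in windows if w.pid == pid), None)


# Eight clients whose console windows all carry the same title.
windows = [ConsoleWindow("DF client", pid=1000 + i) for i in range(8)]

# A GUI that manages the 5th client hides "its" window by title,
# but the search stops at the first match: the 1st client's window.
first_by_title(windows, "DF client").visible = False
hidden_pids = [w.pid for w in windows if not w.visible]

# A lookup keyed on the owning process reaches the intended window.
by_pid(windows, 1004).visible = False
```

After the title lookup, `hidden_pids` holds only the first client's PID, which matches the "some other client window" behaviour described in this thread.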

Using the "Upload" button deletes the .lock file, which causes the DF client itself to shut down "gracefully", and dfGUI waits for the .lock and progress.txt files to be completely deleted. It then launches the DF client with the upload-only switch. After that is finished, if your client was stopped when you clicked Upload, nothing else will happen. If your client was running, it will restart your DF client using the configuration you have in the GUI. I don't see how that would corrupt your installation.
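The sequence described above can be sketched roughly as follows. This is an illustration under assumptions, not dfGUI's source: the lock file name is taken from later in this thread, the timeout and polling interval are arbitrary, and `run_upload_only` and `restart_client` are hypothetical caller-supplied callables standing in for however the client is actually launched (the real upload switch is not given here, so it is not guessed).

```python
import os
import time

LOCK_NAME = "foldtrajlite.lock"    # lock file name mentioned in this thread
PROGRESS_NAME = "progress.txt"


def stop_and_wait(client_dir, timeout=60.0, poll=0.1):
    """Delete the lock file to request a shutdown, then wait until both
    the lock file and progress.txt are gone.

    Returns True once both files are gone, False if the timeout expires.
    """
    lock = os.path.join(client_dir, LOCK_NAME)
    progress = os.path.join(client_dir, PROGRESS_NAME)
    try:
        os.remove(lock)
    except FileNotFoundError:
        pass                       # no lock file: nothing to signal
    deadline = time.monotonic() + timeout
    while os.path.exists(lock) or os.path.exists(progress):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    return True


def upload(client_dir, run_upload_only, restart_client, was_running):
    """Stop the client if needed, upload, and restart only if it was running.

    run_upload_only and restart_client are hypothetical callables that
    take the client directory as their only argument.
    """
    if was_running and not stop_and_wait(client_dir):
        raise TimeoutError("client did not shut down in time")
    run_upload_only(client_dir)
    if was_running:
        restart_client(client_dir)
```

Passing the client directory explicitly to each step, rather than relying on the current working directory, keeps every action tied to one client.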

As I said before, dfGUI wasn't designed to work with multiple copies of itself running at the same time. One thing it does do is change the current working directory so it can jump between the dfGUI directory and the DF client directory if they happen to be different. If you have 8 copies of dfGUI running, it is possible that one of the GUIs could switch directories to its DF client and then, before it had a chance to run the upload command, another copy of dfGUI could switch to its own DF client directory (to update its progress, say). The first dfGUI would then launch the upload command for that client instead of the one you intended.

The only way I can see files getting corrupted is if dfGUI tried to launch foldit.bat in the same directory more than once before the DF client had a chance to create the foldtrajlite.lock file. The DF client itself will prevent another copy from running in the current directory if the .lock file is present, but there is a small window before the file gets created during which a second copy can still start. I have never tried it, so I don't know whether that causes corrupt files.
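For illustration, one common way to close that kind of window is to create the lock file atomically, so that "check whether it exists" and "create it" happen in a single step. This is a generic sketch, not how the DF client handles its lock; the function name and the choice to write the PID into the file are assumptions.

```python
import os


def try_acquire_lock(path):
    """Create the lock file only if it does not already exist.

    O_CREAT | O_EXCL makes the existence check and the creation one
    atomic operation, so two starters cannot both succeed.
    Returns True if this caller created the lock, False otherwise.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    with os.fdopen(fd, "w") as f:
        f.write(str(os.getpid()))   # record the owner (illustrative only)
    return True
```

A launcher would call `try_acquire_lock` first and start the client only when it returns True, so a second launch in the same directory is refused even if it arrives immediately after the first.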

I guess the bottom line is that it is not safe to run multiple copies of dfGUI at the same time. I actually run 2 instances of dfGUI at the same time on one of my SMP boxes, but it has just 2 CPUs and so far I haven't run into any problems. It could be that with 8 CPUs there are enough copies of dfGUI running that it causes problems. I'm not exactly sure *what* is causing your problem; I'm just giving suggestions on things that might go wrong.

Jeff.

rsbriggs
06-29-2003, 04:55 PM
Ahh - that explains many things. :idea: I won't try to use dfGUI on my 8x box then. (It works wonderfully on my boxen with 2 processors, and 2*HT = 4 processors).

I figured you were having us navigate to the foldtrajlite.exe file during config for a specific reason. Under Unix, if you know the filename of an executable file, you can find out the Process IDs (PIDs) of any instances of it running.
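As a rough sketch of that Unix-side lookup: on Linux, each process has a /proc/<pid>/exe symlink pointing at its executable, so the PIDs can be collected by scanning /proc. This is Linux-specific (other Unix systems expose the information differently), and the function name and `proc_root` parameter are invented here for illustration.

```python
import os


def pids_running(executable, proc_root="/proc"):
    """Return the sorted PIDs whose /proc/<pid>/exe link points at
    the given executable file."""
    target = os.path.realpath(executable)
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue                # skip non-process entries such as "self"
        try:
            exe = os.readlink(os.path.join(proc_root, name, "exe"))
        except OSError:
            continue                # process exited, or permission denied
        if exe == target:
            pids.append(int(name))
    return sorted(pids)
```

Because the match is on the full path, eight copies installed in eight separate directories would each resolve to their own PID, with no renaming of the executable needed.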

I'm guessing that you can't use that same mechanism in Windows (or maybe it isn't compatible with all versions of Windows) to find out the right PID, or the console window that the specified PID is running in. Too bad there really isn't a good mechanism for renaming foldtrajlite.exe (i.e. when running 8 copies, foldtrajlite1.exe - foldtrajlite8.exe). If nothing else, it would really help figure out what-is-what when viewing the task list....

Anyway, no biggie - now that I know running 8 copies of dfGUI at the same time doesn't work quite right on my box, I'll just avoid doing it. :(

I LOVE the new 3.1 version. :thumbs: :|party|: GREAT job !!! :notworthy :notworthy

Digital Parasite
06-29-2003, 05:19 PM
Originally dfGUI was just a simple program to help people start and stop the client since editing and launching .bat files is not trivial for novice computer users. I never figured it would grow into what it is today. :cheers:

At the time when I was coding the section to switch working directories and then run commands from there, I never thought about people running multiple copies. Looking back now, that doesn't seem like a good way to do things so I will sit down and see if I can come up with a better way to handle things so people can run multiple copies.

I will put that on my ToDo list.

Jeff.

Digital Parasite
06-30-2003, 01:20 PM
I just did some testing and found that each instance of dfGUI has its own "working directory" and not some universal one. So having multiple copies of dfGUI running does not cause problems for the changing directories part.
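The same per-process behaviour is easy to check from a script. The sketch below is only a demonstration of the general principle, not dfGUI code: it starts a child process that changes its own directory and confirms that the parent's working directory is untouched. The helper name is made up.

```python
import os
import subprocess
import sys
import tempfile


def child_cwd_after_chdir(target_dir):
    """Run a child process that changes into target_dir and reports
    the working directory it ends up in."""
    code = "import os, sys; os.chdir(sys.argv[1]); print(os.getcwd())"
    result = subprocess.run(
        [sys.executable, "-c", code, target_dir],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


before = os.getcwd()
with tempfile.TemporaryDirectory() as tmp:
    child = child_cwd_after_chdir(tmp)
after = os.getcwd()
# The child moved into tmp; the parent stayed where it was.
```

When launching a helper from a script like this, passing `cwd=` to `subprocess.run` avoids changing the launcher's own directory at all.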

There is still a problem with hiding and unhiding multiple DF client windows.

I will have to take a look at the UPLOAD button functionality. I haven't changed any of the Upload code for a long time so haven't really tested it much with the new Phase II client. Maybe the Phase II client acts a bit differently. I will check it out to see if I can find any problems.

rsbriggs, are you only having problems on your 8 CPU box with dfGUI when you click on UPLOAD? If you don't do any "action" buttons but just let dfGUI monitor your client, do you have any problems?

Jeff.

rsbriggs
06-30-2003, 01:48 PM
Not too certain, since I was generally using dfGUI to start / stop / upload, but as I recall, monitoring was OK; taking ANY action wasn't.

I'd be glad to run a test on the 8x box if you like.

Current state - all clients started via the foldit batch file, and are set up to upload automatically. I'll start 8x dfGUIs, one in each directory, and just let them monitor, to see if I encounter any problems....

(I like to check the graphs from time to time, so I hate to run without it.)

8 monitor copies being started......


===bob

rsbriggs
06-30-2003, 03:15 PM
8 copies monitoring for an hour now, with no problems.... I'll report in another 4 hours or so.

I suspect that this works, just so long as I only use them to watch.. :)

:thumbs: