Thread: DCMonitor and DFMonitor oddities

  1. #1
    Member
    Join Date
    Jul 2002
    Location
    Down the road from Mr. Fist :D
    Posts
    76

    DCMonitor and DFMonitor oddities

    Been using DCMonitor for a while now and not too long before the last changeover it stopped monitoring three Linux clients properly for some reason.

    Until then it always told me whether the client was running (I'm assuming both these programs only look for the lock file to see if it's running). Nothing whatsoever has changed on any of the Linux clients. "Voyager" is the main machine, running NFS exports for the other two machines, which run diskless. The paths from the Windows machine that runs DF and DC Monitor are \\voyager\nfs\exports\Picard\distribfold, \\voyager\nfs\exports\Kirk\distribfold and \\voyager\DC\distribfold (the last one is the local client on that machine).

    No permissions problem as I've still got full access to those directories and other shared ones on the same machine. (Using SAMBA for Windows access)

    Now for some reason they report the client as always being "stopped" ... Here are the URLs of the HTML pages, which are updated once every 5 minutes for DCMonitor and every 10 minutes for DFMonitor:

    DC Monitor: http://sigs.teampicard.com/DCMonitor/
    DF Monitor: http://sigs.teampicard.com/DFMonitor/

    You'll see that Voyager, Kirk and Picard are all either "Stalled" or "Stopped" (don't worry about Vinculum, as that's a machine the new client always seems to crash on ...), but I know for a fact they aren't: I've turned off quiet mode on the client and can see it going through the processes and everything, yet the monitors still report that the client isn't running. I've even gone in and run a ps -e and a top to see what was going on, and there are no problems there either. My output is about what it should be as well.

    So I'm REALLY confused on this one now ...
    Last edited by ^7_of_9; 10-30-2003 at 12:44 PM.

  2. #2
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982

    DFMonitor ...

    I can't answer for DCMonitor but for DFMonitor:

    (I'm assuming both these programs only look for the lock file to see if it's running)
    DFMonitor doesn't - it looks at progress.txt only. If the file is there, DF is running; if it's there but older than the "stalled" setting (in minutes), it hasn't been updated and DF is stalled; and if it's not there, DF is stopped.
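
    For anyone following along, here is a rough sketch of the check as described above. It is not DFMonitor's actual code: the function name client_status and the now parameter are made up for illustration, and the 10-minute threshold is just the default mentioned in this thread.

```python
import os
import time

STALLED_MINUTES = 10  # assumed: the default "stalled" setting mentioned in the thread


def client_status(client_dir, stalled_minutes=STALLED_MINUTES, now=None):
    """Classify one client directory as 'stopped', 'stalled' or 'running'.

    Hypothetical helper: it only mirrors the rule described in the post,
    using the existence and modified time of progress.txt.
    """
    progress = os.path.join(client_dir, "progress.txt")
    try:
        mtime = os.path.getmtime(progress)
    except OSError:
        # No progress.txt (or the share is unreachable): treated as stopped.
        return "stopped"
    now = time.time() if now is None else now
    age_minutes = (now - mtime) / 60.0
    # File exists but has not been touched within the threshold: stalled.
    return "stalled" if age_minutes > stalled_minutes else "running"
```

    Pointing it at one of the shares, e.g. client_status(r"\\voyager\DC\distribfold"), would show whether the file age alone is enough to explain a "stalled" or "stopped" result.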

    Do you know what the last modified date/time of those 3 progress.txt files is? And what do you have the stalled setting as? Is there a .lock file for each client?

    I have noticed stalled being flagged a bit more often with this protein with the default setting of 10 mins - but would be a bit concerned if the age of progress.txt is a couple of hours...

    It is a bit odd that the client is running (have you checked progress.txt to confirm this?) but the utils are saying stopped/stalled...if you had lost network access to them DFMon would say the clients had stopped...

    /me getting a bit confused over what is happening as well

    /edit - just remembered... had a similar issue with DFMon and my Linux client (shared via Samba), where Windows was removing an hour from the modified time, which meant it showed as 'stalled' - restarting Samba on the Linux box fixed it.
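
    To put numbers on that one-hour shift, here is a small worked example. The helper names are invented for illustration, the 10-minute threshold is the default mentioned above, and the 2-minute "true" age is just a sample value.

```python
STALLED_MINUTES = 10  # assumed: default "stalled" setting


def apparent_age_minutes(true_age_minutes, skew_hours):
    """Age a monitor would compute if the reported modified time is skew_hours too early."""
    return true_age_minutes + 60 * skew_hours


def looks_stalled(age_minutes, stalled_minutes=STALLED_MINUTES):
    """True when the file appears older than the stalled threshold."""
    return age_minutes > stalled_minutes


fresh = 2  # minutes since the client really wrote progress.txt (sample value)
print(looks_stalled(apparent_age_minutes(fresh, skew_hours=0)))  # False
print(looks_stalled(apparent_age_minutes(fresh, skew_hours=1)))  # True
```

    So a file written 2 minutes ago looks 62 minutes old once an hour is taken off its modified time, which is well past any sensible stalled setting.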
    Last edited by pfb; 10-30-2003 at 01:24 PM.

  3. #3
    Member
    Join Date
    Jul 2002
    Location
    Down the road from Mr. Fist :D
    Posts
    76
    Thanks for the quick reply.

    I'll have to check the Date/Timestamp on the files through Windows and also the machines themselves to see if that's the problem there. I know it's not a network not reachable thing as I can still get to their individual files over the network. You might have a good idea on the Samba issue there, I'll restart Samba tonight as well and see if that's the problem.

    I don't think that it's the issue with the new protein being so large either as it started last week as well when it was on the 64 one. For the life of me I can't think of ANYTHING that happened around that time to cause everything to go out of kilter at all.

    Also, I just realised that DCMonitor probably looks at the progress.txt file as well, to give info on generations and so on.

  4. #4
    Boinc'ing away
    Join Date
    Aug 2002
    Location
    London, UK
    Posts
    982
    With the Samba thing - you didn't have a time change last week? My problem occurred due to going back to GMT from BST...

  5. #5
    Member
    Join Date
    Jul 2002
    Location
    Down the road from Mr. Fist :D
    Posts
    76
    Originally posted by pfb
    With the Samba thing - you didn't have a time change last week? My problem occurred due to going back to GMT from BST...
    AWESOME! I'm EST here and TOTALLY forgot about that (I'm so used to my server machine acting as my time server that I forgot the Linux machines don't use it as a time server). Great news that it's not something I screwed up and is most likely only the "Fall Back" time change!

  6. #6
    Member
    Join Date
    Jul 2002
    Location
    Down the road from Mr. Fist :D
    Posts
    76
    Well, the time trick wasn't it, as the Linux machines successfully updated to the proper time by themselves.

    I SSH'd into the main Linux machine and restarted Samba ... BOOM! back up and showing as running now.

    Thanks for the help.
