
Thread: Monitoring Remote Clients

  1. #1

    Monitoring Remote Clients

    So I'm sure all the farmers and cluster owners have their own preferred method of monitoring their clients.

I generally have mosixview or mosixgui running on my cluster controller and can see 50 machines that way. I know that KDFold et al. can watch the progress.

Problem is - all of these steal processing power and bandwidth, and none of them work well over a WAN. Since my system is offsite, I'd like to watch it from 10 miles away now and again.

My non-clustered machines have a simple script that runs as a cron job: it does an uptime and pushes the output to a web server where I can read it. Not optimal - but it got me thinking.

Being an old Sun hacker, I was wondering if rup still appears in modern unices.

    So I logged into a linux machine and sure enough there it was!

    You need portmap and rpc.rstatd running on each machine. Minimal processor utilization.

    Then I wrote a simple script that runs as a cron job on my node controller:

It just runs rup against each node. I named the script dfuptime.
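A minimal sketch of what such a script could look like (the node names and output path here are hypothetical, not the poster's actual setup):

```shell
#!/bin/sh
# dfuptime - collect uptime/load from each node via rup.
# NODES is an example list; substitute your own machines.
NODES="node01 node02 node03"

dfuptime() {
    for host in $NODES; do
        rup "$host"    # requires portmap and rpc.rstatd on each node
    done
}

# collect everything into one file for a later job to pick up:
# dfuptime > /tmp/dfuptime.out
```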

    That script spits out something that looks like:

Code:
    up   6 days,  6:42,    load average: 0.99 0.97 0.95
    up   6 days,  6:42,    load average: 0.99 0.97 0.98
    up   6 days,  6:42,    load average: 0.99 0.97 0.96
    up   6 days,  6:42,    load average: 1.00 0.99 0.95
    up   6 days,  6:42,    load average: 0.99 0.97 0.95
    up   6 days,  6:41,    load average: 0.99 0.97 0.94
    up   6 days,  6:42,    load average: 0.99 0.97 0.94
I run it as a cron job every 20 minutes.

Then a second cron job, every 22 minutes (to give the first one time to finish), takes the output from that script and pipes it into sendmail, addressed to the mobile address on my palmtop. Every 25 minutes or so I get an email that shows the uptime of all my machines.
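One reading of that 20/22-minute schedule as crontab entries (paths, the exact minutes, and the address are made-up examples; the mail job is simply staggered behind the collector):

```shell
# crontab on the node controller (edit with: crontab -e)
# every 20 minutes: collect the rup output
0,20,40 * * * *  /usr/local/bin/dfuptime > /tmp/dfuptime.out 2>&1
# two minutes later: pipe the results into sendmail
2,22,42 * * * *  ( echo "To: pager@example.com"; echo "Subject: DF uptime"; echo; cat /tmp/dfuptime.out ) | /usr/sbin/sendmail -t
```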

They're dedicated to DF, so DF should be running all the time and be nearly the only thing running. That means I should see a processor load between 0.95 and 1.00 or so. If I see between 0.00 and 0.10, I know that the client has crashed and can deal with it.

The next step will be an awk/sed or Perl script that filters out the load averages above 0.80 and emails me only when one has crashed. When that happens, I can email my cell phone instead of my palmtop and have notification within 20 minutes or so of a client crashing.
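A rough sketch of that filter in awk (the 0.80 threshold is from above; the field handling assumes rup's "load average: X Y Z" output format, and the address is hypothetical):

```shell
#!/bin/sh
# check_load - print only the lines whose 1-minute load average has
# fallen below 0.80, i.e. the clients that have probably crashed.
check_load() {
    awk -F'load average: ' '{
        split($2, a, " ")             # a[1] = 1-minute load average
        if (a[1] + 0 < 0.80) print    # + 0 forces numeric comparison
    }'
}

# example wiring:
# check_load < /tmp/dfuptime.out | mail -s "DF client down" cell@example.com
```

Mailing only when the filter actually produces output (e.g. testing the file is non-empty first) would keep the cell phone quiet while everything is healthy.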

Benefit here is that this works on both clustered and non-clustered unices. Unless there's an r* series for Windows, you're outta luck. (I think Cygnus' Cygwin may have such.)

    BIG WARNING however:

The r* commands are inherently dangerous to your network security. I'm using them here in push-only mode behind a *very* strong firewall. Use them at your own risk! You've been WARNED. Possible side effects include...

    Anyway - anyone else have a cleaner way?

  2. #2
Scoofy12
Join Date: Apr 2002
Location: Between keyboard and chair
Maybe not a cleaner way, but a more secure one.
Instead of rup or r*-anything, use ssh. To script it non-interactively, you can set up a public-key login: the public key goes in the ~/.ssh/authorized_keys2 file on your crunchers (you ARE using ssh protocol version 2, aren't you? there's a good boy), and the private key goes in ~/.ssh/id_rsa on your server (you ARE using RSA and not DSA keys, aren't you?). Then instead of rup <ip> you can say ssh <ip> uptime
This will ssh to the IP of choice, log in with the public key, and execute the uptime command. If you were good at grep/awk/sed (it's all voodoo magic to me) you could even do a ps aux or something like that and parse out more detailed info. You'll want to run the script interactively at least once to be sure that all the ssh host keys get cached and whatnot. I used this kind of setup in a script to start and stop the folding on a Linux farm.
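A sketch of that setup, with hypothetical host names: generate one key pair on the controller, push the public half to each cruncher, then polling needs no password.

```shell
#!/bin/sh
# one-time, on the controller:
#   ssh-keygen -t rsa              # writes ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
#   scp ~/.ssh/id_rsa.pub node01:~/.ssh/authorized_keys2
# (repeat the scp for each cruncher; run each ssh once interactively
#  so the host keys get cached)

# after that, polling is non-interactive:
poll_nodes() {
    for host in "$@"; do
        ssh "$host" uptime
    done
}

# poll_nodes node01 node02 node03
```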

  3. #3
    I'll concur that ssh is more secure.

That said - generating keyfiles for 70 machines is a PITA I don't need. When I absolutely have to, I ssh into them, typing my password, which is kept ludicrously short.

As I said in my post - they're on a private network behind serious firewalling that allows no unrequested inbound traffic, and even then DF is proxied twice just to move traffic. The DF systems are the only things on that LAN, and the only things that talk to that T-1, period. The risk posed by rpc.rstatd in an environment like that is negligible. If your controller gets owned, the ssh keyfile is owned and the rest of the machines are owned with it. In this case, rpc.rstatd may be *more* secure rather than less, as it doesn't allow arbitrary command execution the way sshd would.

    Thanks for the debate!

    P.S. it's "good girl" not "good boy"

  4. #4
    I find this all very intriguing, but my brain just started to ooze out of my ears so I am going to have to politely ask that you two stop immediately.

  5. #5
    We aim to please - but wouldn't a couple of corks be a longer-term solution?

  6. #6
Scoofy12
    Originally posted by Jodie

    P.S. it's "good girl" not "good boy"
Doh! I knew that, of course (i.e. I know that spelling is generally the female one)... however, I have a male friend named Jodie, and when I first met him I kept incorrectly spelling it "Jody", which AFAIK can be either male or female. So now I'm conditioned the other way. Anyway, my apologies.

Oh, and just to get the last parting shot (I know, I know... I'm sorry): you don't actually have to generate keys on all the machines; you can copy the same public key to each of them. AND to prevent the "if-the-host-gets-owned-you're-screwed" problem, you can use ssh-agent rather than a passwordless key. It lets you type the passphrase for the key once, then keeps the decrypted key in memory for subsequent ssh commands (it dies when the shell that called it does). Good balance between convenience and security. All that said, it sounds as though you're pretty secure anyway.
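The ssh-agent dance, sketched as a typical interactive session (the host name is hypothetical; the key path is the default one mentioned above):

```shell
# start an agent for this shell and load the key once;
# ssh-add prompts for the passphrase, then the agent holds the
# decrypted key in memory for later ssh/scp commands:
eval "$(ssh-agent -s)"
ssh-add ~/.ssh/id_rsa

# these now go through without a passphrase prompt:
ssh node01 uptime
scp results.txt node01:/tmp/

# the agent exits when this shell does, or kill it explicitly:
ssh-agent -k
```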

For those of you not into all this, here's all you really need to remember about remote access in general: telnet and r* BAD (insecure); ssh and scp GOOD (secure).

  7. #7
    What about those of us who are 'into it,' but just lost as hell?

    That is a rhetorical question, btw.


  9. #9
Dyyryath (Administrator)
Join Date: Dec 2001
Location: North Carolina
    Come on now, Scoofy, Jodie already explained that she's female.

    You HAD to know she'd get the last word in...

