## Remote System Monitoring ##

Steven St.Laurent

Many tools have been designed to check, monitor and inform the administrator or user on the status of his or her systems. These products, while very good in their own right, tend to require a good deal of configuration for specific situations and in many cases do not quite fit my personal needs. I like simple, I like low load, and I like fast! A large monitoring program might be worth using on a dedicated machine, but I wanted something for a loaded server.

"Why is this of any importance to me?" you might ask. Well, I could find no documentation on using existing tools to monitor systems, and learning something new is fun. You might find this useful somewhere down the road. I will not give you a ready-to-run C or Perl script; rather, this is just a good outline, useful for developing your own applications. None of this is magic, and many might already be using it. I also did not invent this; I just present it as part of my work towards a good monitoring solution.

First off, I needed a set of design goals for this first part. Mind you, this is just one part of a monitoring system. My goals (yours might vary) were:

  o Simple design which could be incorporated into Perl or C scripts
  o Uses existing tools available for most/all Unix machines
  o Interoperability across operating systems: Solaris, DU, Linux, FreeBSD, etc.
  o High fault tolerance

With these goals in mind I decided to use something we were already familiar with: uptime. Uptime is typically used from the command line, for example:

  $ uptime
  8:30AM  up 14 days, 20:40, 7 users, load averages: 0.03, 0.05, 0.06

As we can see, this is all we really need from a remote monitoring position, or at least all I need to start. I can clearly see total time up, number of users, and load. From this one line I can easily write a Perl script which responds to conditions, emailing or paging me if load gets too high or the machine reboots. The problem with running such a script on the monitored machine itself is fault tolerance.
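
Setting fault tolerance aside for a moment, the parsing step itself is small. Here is a minimal sketch of it, written in Python rather than the Perl or C mentioned above, purely for brevity. The names UPTIME_RE and parse_uptime are hypothetical and not part of any existing tool, and the pattern assumes output shaped like the sample line above; the exact format varies a little between systems.

```python
import re

# Hypothetical pattern (an assumption, not from the article): matches lines such as
#   8:30AM  up 14 days, 20:40, 7 users, load averages: 0.03, 0.05, 0.06
# and also the "load average:" spelling that some systems print.
UPTIME_RE = re.compile(
    r"up\s+(?P<up>.+?),\s+(?P<users>\d+)\s+users?,\s+"
    r"load averages?:\s+(?P<l1>[\d.]+),?\s+(?P<l5>[\d.]+),?\s+(?P<l15>[\d.]+)"
)


def parse_uptime(line):
    """Return a dict with time up, user count and load, or None if the line does not match."""
    match = UPTIME_RE.search(line)
    if match is None:
        return None
    return {
        "up": match.group("up"),            # kept as text, e.g. "14 days, 20:40"
        "users": int(match.group("users")),
        "load": (                           # 1, 5 and 15 minute load averages
            float(match.group("l1")),
            float(match.group("l5")),
            float(match.group("l15")),
        ),
    }
```

A script could then compare the first load figure against whatever limit suits the machine and send mail or a page when it is exceeded. The limit is a local choice, not something the article prescribes.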
If the monitored machine reboots, a script running on it starts anew; if the machine loses its network interface, the script cannot communicate; and if load gets high enough, it might not be able to respond at all. Running the monitoring script from a separate machine is the best solution: with several error conditions taken into account, you can detect problems quickly and efficiently.

Here is how I set up my machines. You might choose otherwise, but the scheme is the same.

First determine which machine will be the monitor. For my situation, I decided my workstation would be best for the time being. If redundancy were required I would use two machines, with the second machine monitoring the first and taking over if the first machine stopped responding.

These steps assume you have root on the machines in question or can convince the admin to do them for you.

1) Install tcp-wrappers.
As I write this, the current version for FreeBSD appears to be tcp_wrappers_7.6. Install it from the ports collection if you like. You will want to install this on all machines which will be monitored.

2) Edit /etc/services.
You will need to add the following line to your services file:

  uptime          333/tcp    #Monitor Uptime

Inetd needs to know which services exist and which do not. I use port 333 since it seems to be unassigned, and it's not a good idea to conflict with other services.

3) Edit /etc/inetd.conf.
Add the following:

  uptime  stream  tcp  nowait  nobody  /usr/local/libexec/tcpd  /usr/bin/uptime

I'm not going to get into a long discussion about the pros and cons of using inetd. Security-wise it can be lacking unless you know what you are doing. Since we installed tcp-wrappers, I can feel safe offering limited services through inetd. In this case, let's just leave telnet and our new entry. If you have not already done so, comment out all other services you are not using. Also, consider using tcp-wrappers for other services as well.

4) Secure the system.
With inetd running, it's now time to secure the system by editing the /etc/hosts.allow and /etc/hosts.deny files. These are checked by tcpd, the tcp-wrappers program that inetd runs for our entry, to see if the service is allowed to a particular client. If you do not already use these files, DO SO. They are vital for a secure system, especially when running inetd. Here is an example from my hosts.allow:

  uptime: 10.0.0.14, 10.0.0.25

You might have other entries here. Here I've allowed uptime (port 333 from services) access from just two machines, 10.0.0.14 (mine) and 10.0.0.25 (a backup machine just in case). In hosts.deny I have:

  ALL:ALL

I only want my machines to be able to access this information.

5) Test it out.
From an allowed host, telnet to port 333. You should see something like this:

  $ telnet 10.0.0.1 333
  Trying 10.0.0.1...
  Connected to 10.0.0.1.
  Escape character is '^]'.
  9:16AM  up 60 days, 19:42, 20 users, load averages: 0.00, 0.03, 0.00
  Connection closed by foreign host.
  $

If you have problems, try restarting inetd. Using 'killall -HUP inetd' should work fine. If inetd is not running, start it! If you still have problems and inetd is running, try 'tail -f /var/log/messages' to see if it is generating any errors.

You can do almost anything with this tool. For my project, it gives me the ability to monitor and track load. While it is not the best solution, I can run this with a database backend and store the uptime information. Running every 5 minutes, I can easily monitor gross load figures. I also have the ability to monitor the machine by watching for a response. If a certain machine does not respond to an uptime request, I can set it to email me.

In conjunction with other tools you can quickly create a very robust monitoring system. This is not a panacea and does not pretend to perform as well as or better than existing products; it is a different way, though. You can do various other tricks through the same method. If you imagine it, you can probably do it.
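
The polling side, run from the monitoring machine, can be sketched in a few lines. This is only an illustration under stated assumptions, written in Python for brevity: the function name fetch_uptime and the 10 second timeout are hypothetical, and port 333 is simply the port chosen in /etc/services above.

```python
import socket


def fetch_uptime(host, port=333, timeout=10.0):
    """Connect to the remote uptime service and return the line it sends.

    Returns None when the host gives no usable answer: connection refused,
    timeout, network error, or a connection that closes without sending
    anything (which is what a client denied by tcp-wrappers would see).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:          # the remote side closed the connection
                    break
                chunks.append(data)
    except OSError:                   # covers refusals, timeouts and resets
        return None
    text = b"".join(chunks).decode("ascii", errors="replace").strip()
    return text or None
```

Called every 5 minutes from cron, for instance as fetch_uptime("10.0.0.1"), a result of None is the "machine did not respond" condition that would trigger an email, and any other result is the uptime line to store or check for load.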

Best of all, the learning experience is far more valuable than you can imagine.

- Steven

$Id: sysmon.txt,v 1.1 2000/02/16 08:07:52 jim Exp $