Using NRPE To Monitor Remote Services
by Rick on · Posted in Howto
This whitepaper is a continuation of the previous article, Nagios Howto: Notification Escalations, EventHandlers & Remote Service Monitoring With NRPE.
As previously mentioned, our focus assumes the use of Linux and a working Nagios installation. I highly suggest you go back and read the previous Nagios article, as it contains important information that we will be building upon as we move into the second part of this whitepaper.
Thank you for rejoining if you have already read the first Crucial Nagios whitepaper.
As you have likely seen, the Nagios docs leave a bit to be desired when it comes to information on the NRPE plugin. In its simplest form, the NRPE plugin allows you to monitor any number of remote network devices and services using a single Nagios installation. However, when we combine EventHandlers with NRPE, we then have the ability to repair our remote servers, giving us self-healing servers. For now, we will focus our attention on NRPE and walk through the steps to properly configure your NRPE daemon.
Download NRPE Plugin
The NRPE source code and the default plugins are available from the Nagios website. You will need to download the NRPE plugin and any other plugins to the remote machine that you intend to monitor:
cd /usr/src
wget http://umn.dl.sourceforge.net/sourceforge/nagiosplug/nagios-plugins-1.4.6.tar.gz
tar zxvf nagios-plugins-1.4.6.tar.gz
cd nagios-plugins-1.4.6
The instructions above will download and extract the Nagios plugins, and then change into that directory.
Build The Source Code
We now need to build the source code. This step needs to be done on each remote system that you plan to monitor. Follow the steps below to build the default plugin set:
./configure --prefix=/usr/local/nagios
make
make install
We now have /usr/local/nagios/libexec/ which contains the default plugin set.
At this time we need to download and install the NRPE daemon and plugin. The steps below detail the commands needed for execution:
cd /usr/src/
wget http://internap.dl.sourceforge.net/sourceforge/nagios/nrpe-2.7.tar.gz
tar zxvf nrpe-2.7.tar.gz
cd nrpe-2.7
./configure
make all
Move Things Around
Now we need to manually move the files into place:
cp src/nrpe /usr/local/nagios/libexec/
cp src/check_nrpe /usr/local/nagios/libexec/
cp sample-config/nrpe.cfg /usr/local/nagios/libexec/
We now have our executables in place and are ready to begin configuring the NRPE daemon on the remote system.
Configuration
The sample configuration file we copied above is a well documented file. You should take the time to read this file and familiarize yourself with the configuration options that we will be setting below. Open the nrpe.cfg file in the Nagios libexec directory in your favorite editor.
We are going to leave some default settings and change a few settings for our needs. Set the following configuration options as follows:
pid_file=/var/run/nrpe.pid
server_port=5666
# Set this if you want to nail NRPE to a specific IP address
# server_address=192.168.1.1
nrpe_user=nagios
nrpe_group=nagios
# Set this to the remote Nagios installation IP address
allowed_hosts=127.0.0.1
dont_blame_nrpe=0
# command_prefix=/usr/bin/sudo
# Set this to 1 for logging in syslog
debug=0
command_timeout=60
connection_timeout=300
# allow_weak_random_seed=1
That's it for the configuration of NRPE.
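After editing, it can be handy to confirm what value a setting actually ended up with. The following is a minimal sketch, not part of NRPE itself: nrpe_setting is a hypothetical helper name, and it assumes the simple key=value layout shown above.

```shell
# Hypothetical helper: print the value of one key from an nrpe.cfg-style file.
# Commented-out lines are ignored; if a key appears twice, the last one wins.
nrpe_setting() {
    local file="$1" key="$2"
    awk -F= -v key="$key" '
        /^[[:space:]]*#/ { next }                      # skip comment lines
        $1 == key { value = substr($0, length(key) + 2) }
        END { if (value != "") print value }
    ' "$file"
}

# Example: nrpe_setting /usr/local/nagios/libexec/nrpe.cfg allowed_hosts
```

If the example prints 127.0.0.1 on a remote machine, the allowed_hosts line still needs to be pointed at your Nagios server.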
Commands
We now need to look at the available commands to NRPE. If you scroll to the bottom of the nrpe.cfg file you will see the default commands. The commands are structured like so:
command[check_disk1]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hda1
Command names are completely arbitrary and can be created on the fly, e.g.:
command[check_disk2]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p /dev/hdb1
The format is very simple: check_disk1 is the command name, /usr/local/nagios/libexec/check_disk is the plugin it runs, and -w 20 -c 10 -p /dev/hda1 are the arguments. I used this particular command because the disk check is the one command that you may need to alter immediately for effective use. At the end of the command we see the path of the disk device to check, /dev/hda1. You may not have this drive configuration, so you will need to replace that with the path to your local disk setup. An easy way to figure this out is to issue the command df -h and use the entry returned for home, as this is the primary usage space for most.
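The df lookup described above can be sketched as a small shell function. This is only an illustration: nrpe_disk_command is a hypothetical name, the thresholds are copied from the example command, and the device is whatever df reports for the directory you pass in.

```shell
# Hypothetical helper: print an nrpe.cfg command line for the filesystem
# that holds a given directory. Thresholds are copied from the example above.
nrpe_disk_command() {
    local name="$1" dir="$2" device
    # df -P prints one POSIX-format line per filesystem; field 1 is the device.
    device=$(df -P "$dir" | awk 'NR == 2 { print $1 }')
    printf 'command[%s]=/usr/local/nagios/libexec/check_disk -w 20 -c 10 -p %s\n' \
        "$name" "$device"
}

# Example: nrpe_disk_command check_disk_home /home
```

The printed line can then be pasted into nrpe.cfg on the remote machine.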
System Setup
At this point, we have completed configuring NRPE and we need to set up the system to accommodate Nagios.
First we need to set up permissions for the Nagios user.
adduser nagios
chown -R nagios:nagios /usr/local/nagios/
We've set up our Nagios user and changed the ownership of all the files under the nagios/ dir.
Now we need to edit the file /etc/services and add the following line:
nrpe 5666/tcp # NRPE
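If you script this step, it helps to make it safe to run more than once. Here is a minimal sketch under that assumption; ensure_line is a hypothetical helper, and it only detects an exact copy of the line, so an existing nrpe entry with different spacing would not be noticed.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present, so the step can be repeated safely.
ensure_line() {
    local file="$1" line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Example (run as root): ensure_line /etc/services 'nrpe 5666/tcp # NRPE'
```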
Now, we need to tell our inetd or xinetd about NRPE. Create a file in /etc/xinetd.d/ called nrpe, and add the following to that file:
# default: on
# description: NRPE
service nrpe
{
flags = REUSE
socket_type = stream
wait = no
user = nagios
server = /usr/local/nagios/libexec/nrpe
server_args = -c /usr/local/nagios/libexec/nrpe.cfg --inetd
log_on_failure += USERID
disable = no
# Change this to your primary Nagios server
only_from = 127.0.0.1
}
This describes to the "super server" the various options necessary to launch the NRPE daemon when our remote Nagios monitoring system connects.
Now, open the /etc/hosts.allow file and add an entry for the IP address of your remote monitoring server. If you have a firewall, you will also want to configure it so that you allow remote connections from the IP address of your remote monitoring system to port 5666.
Restart your xinetd daemon to reload the configuration changes:
/etc/init.d/xinetd reload
Let's test it out real quick to make sure nothing has gone wrong so far. From your remote monitoring server issue the following command:
telnet ip.address.of.remote.nrpe 5666
If the connection immediately closes you've got a problem and something isn't right. If the socket opens and you are met with the following:
Escape character is '^]'.
then you're ready to move on. If you've got problems at this point, go back through each of the steps above and check for any errors in configuration. If you set debug=1 in your nrpe.cfg, you can also view your syslog file for failure information.
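If telnet is not installed on the monitoring server, the same reachability test can be approximated in the shell. This is a sketch, not a full replacement: port_open is a hypothetical helper that relies on bash's /dev/tcp redirection and the coreutils timeout command, and it only tells you that the port accepts a connection, not that NRPE answers on it.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within 5 seconds. Uses bash's /dev/tcp redirection and coreutils timeout.
port_open() {
    local host="$1" port="$2"
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example: port_open ip.address.of.remote.nrpe 5666 && echo "port 5666 reachable"
```

Once check_nrpe is installed on the monitoring server, running it with only -H and the remote address is a stronger test, since it should report the NRPE version when the daemon accepts the connection.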
Add New Host
We are now ready to add our new host to our primary Nagios installation. This is very straightforward and should only take a moment.
Back on the primary Nagios installation server we need to edit our hosts.cfg configuration file. The file is located in /usr/local/nagios/etc/hosts.cfg. This may change depending on your installation and organization of configuration files. Read the first part of this whitepaper for organization advice.
In the hosts.cfg file, add your new host object:
define host{
use generic-host
#Hostname of remote system
host_name host.domain.com
# A friendly name for this server
alias Friendly name
# Remote host IP address
address 127.0.0.1
check_command check-host-alive
max_check_attempts 10
notification_interval 30
notification_period 24x7
notification_options d,r
# Your defined contact group name
contact_groups admins
}
At this time our hosts.cfg file contains two host objects: the localhost, which is running the Nagios application, and our remote host, which we will be monitoring.
We now want to add the service objects to our services.cfg file located in the same directory. Add the following single service to your services.cfg file:
define service{
use generic-service
# Hostname of remote system
host_name host.domain.com
service_description Primary Disk Usage
is_volatile 0
check_period 24x7
max_check_attempts 3
normal_check_interval 5
retry_check_interval 1
# Change to your contact group
contact_groups admins
notification_options w,u,c,r
notification_interval 10
notification_period 24x7
check_command check_nrpe!check_disk1
}
You can view the Nagios documentation for the full details on each of these object configuration options. You will likely want to adjust the values shown above to suit your monitoring environment. For now, we will take a closer look at that last line, the check_command option that calls check_nrpe.
check_nrpe
When monitoring remote services, the check_command names check_nrpe, followed by a ! and the name of the command to run on the remote machine. This means that we need a copy of check_nrpe on our Nagios server. Simply follow the directions above to download, build, and install the check_nrpe plugin and the nrpe daemon. Once you have installed these on the primary Nagios server, we can proceed.
Now that nrpe is installed on the primary Nagios server, and our new host and its service are configured, we can reload the Nagios service:
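To make the ! syntax concrete, here is a rough sketch of how a check_command value becomes a command line. It rests on an assumption this article does not show: a check_nrpe command object on the Nagios server whose command_line is $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$, with $USER1$ pointing at /usr/local/nagios/libexec. The function name and the 192.0.2.10 address are placeholders.

```shell
# Sketch of how "check_nrpe!check_disk1" turns into a command line.
# Assumes a command object whose command_line is
#   $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
# with $USER1$ set to /usr/local/nagios/libexec.
expand_check_command() {
    local check_command="$1" host_address="$2" name arg1
    # Nagios separates the command name from its arguments with "!".
    name=${check_command%%!*}
    arg1=${check_command#*!}
    printf '/usr/local/nagios/libexec/%s -H %s -c %s\n' "$name" "$host_address" "$arg1"
}

expand_check_command 'check_nrpe!check_disk1' 192.0.2.10
```

Running the printed command by hand on the Nagios server is a quick way to see what a check would return before Nagios schedules it.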
/etc/init.d/nagios reload
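It is worth verifying the configuration before reloading, since a typo in hosts.cfg or services.cfg can stop Nagios from starting. The wrapper below is a hypothetical sketch; the nagios -v check in the example assumes the default source-install paths, which may differ on your system.

```shell
# Hypothetical wrapper: run a verification command and reload only if it succeeds.
# Both commands are passed in as strings so the paths can match your installation.
reload_if_valid() {
    local verify_cmd="$1" reload_cmd="$2"
    if sh -c "$verify_cmd" >/dev/null; then
        sh -c "$reload_cmd"
    else
        echo "configuration check failed; not reloading" >&2
        return 1
    fi
}

# Example, assuming the default source-install paths:
# reload_if_valid '/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg' \
#                 '/etc/init.d/nagios reload'
```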
Web Interface
With the configuration read, you should now be able to access the web interface of Nagios. Under the Service Detail link you should see both the new remote host and the service we have set up to monitor. It will likely show as Pending at this time, as the service has not been checked yet.
According to our service definition above, this service will be checked once every five minutes. If all has gone well, we should see green in less than five minutes, which confirms proper installation and configuration of NRPE. On failure, the service will go into a Soft State for two additional minutes. Once a Hard Failure state is reached, you will see red, and you should be able to check your Nagios log file in nagios/var/nagios.log for further information.
There are a lot of moving parts with this project, so it is best to focus on a single server and a single service. Once you have one service properly configured, it is a short step to configure the next. Simply copy the service object created above and change the command name after check_nrpe! in the check_command line.
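The two additional minutes come straight from the service definition: max_check_attempts is 3 and retry_check_interval is 1. A small sketch of that arithmetic, assuming the default interval length where one interval unit equals one minute (the function name is hypothetical):

```shell
# Minutes from the first failed check until the HARD state, given the
# service settings above (max_check_attempts and retry_check_interval).
minutes_to_hard_state() {
    local max_attempts="$1" retry_interval="$2"
    # The first failure is attempt 1; each remaining attempt waits one retry interval.
    echo $(( (max_attempts - 1) * retry_interval ))
}

minutes_to_hard_state 3 1   # prints 2
```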
What You Can Do
Many things can be monitored with NRPE that cannot be monitored remotely by Nagios without NRPE. These include:
- Disk space
- Zombie processes
- Number of shell users
- Total processes
- Load average
And anything else that doesn't run as a public service on the server.
Obviously, the advantage of remotely monitoring these server objects from a central location is that a problem may be identified much more quickly. This, combined with the previous whitepaper's escalation procedures, provides an effective response tool for reactively monitoring remote servers.
Remember, any of the commands that you have in the nagios/libexec folder are available to NRPE. To run these commands on the remote server, you simply need to set up the command in the nrpe.cfg file on the remote server. Here is an example using check_load:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
The -w and -c options set the Warn and Critical thresholds for the 1-, 5-, and 15-minute load averages, in that order.
Follow the steps above to add the new service, check_load, to your services.cfg file. Reload Nagios and that's that.
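To see how those thresholds translate into a result, here is a simplified approximation in shell. It is not check_load itself: load_state is a hypothetical function, the strictly-greater-than comparison is an assumption, and the thresholds are hard-coded from the example. The exit codes follow the usual Nagios plugin convention of 0 for OK, 1 for WARNING, and 2 for CRITICAL.

```shell
# Simplified approximation of threshold evaluation, not check_load itself.
# Arguments: the 1-, 5- and 15-minute load averages.
# Thresholds mirror the example: -w 15,10,5 -c 30,25,20.
load_state() {
    awk -v l1="$1" -v l5="$2" -v l15="$3" 'BEGIN {
        if (l1 > 30 || l5 > 25 || l15 > 20) { print "CRITICAL"; exit 2 }
        if (l1 > 15 || l5 > 10 || l15 > 5)  { print "WARNING";  exit 1 }
        print "OK"; exit 0
    }'
}

# Example on Linux: load_state $(cut -d' ' -f1-3 /proc/loadavg)
```

With these values, a server has to be very busy before it alerts, so you may want lower thresholds for lightly loaded machines.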
EventHandler
In the next whitepaper, we will change our focus to the Nagios EventHandler. I will demonstrate to you how to repair problems that Nagios encounters before even contacting a single human. At Crucial Web Hosting we make extensive use of the EventHandler object in Nagios and we credit it for a very happy support team. Using the EventHandler objects we can diagnose and repair common problems that occur on local and remote servers in a matter of minutes and seconds as opposed to hours and days.
We will be performing root tasks using the sudo method and we will create a simple custom EventHandler on a remote server thus demonstrating how you can roll your own Nagios plugins.
In the next whitepaper you will learn how to make your servers heal themselves with no human interaction!
If you missed the first whitepaper in the series I am writing, you can access it here. I look forward to your questions and comments.