When checking processes on a remote Linux server with Nagios or Icinga, you will most likely use NRPE to call the built-in nagios-plugin ‘check_procs’ on the remote host.
This plugin, along with all the other nagios-plugins, is typically installed into /usr/lib/nagios/plugins/ on the remote host's file system.
But you don’t call the plugin directly. Instead, you use the check_nrpe plugin on the monitoring server to execute a command defined in /etc/nagios/nrpe.cfg on the remote host.
By default, the file contains sample command definitions on or around line 196 as shown below:
# The following examples use hardcoded command arguments...
command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10
command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20
command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1
command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z
command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200
# The following examples allow user-supplied arguments and can
# only be used if the NRPE daemon was compiled with support for
# command arguments *AND* the dont_blame_nrpe directive in this
# config file is set to '1'. This poses a potential security risk, so
# make sure you read the SECURITY file before doing this.
#command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$
#command[check_load]=/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$
#command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
#command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$
Looking at the ‘#command[check_procs]…’ definition on the last line, note that it takes only three arguments by default. That would be fine if we only cared about the overall number of processes in a given state (-s filters the count to processes in that state).
But what if I want to monitor, say, only the number of Apache processes in a sleeping state? I need to filter on process name AND state. According to the plugin's documentation, the -a option restricts the count to processes whose argument string contains a given string, which is enough to pick out a process by name. So we rewrite the command definition to add another parameter:
command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -a $ARG3$ -s $ARG4$
Now the first parameter, -w, is the warning range, used here as the lower threshold (alert if the count is less than this). The second, -c, is the critical range, used here as the upper threshold (alert if the count is more than this). The third, -a, is the process name filter (only count processes that match this string), and the fourth, -s, is the state filter (zombie, sleeping, etc.).
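The range strings passed to -w and -c can look odd at first. As a rough illustration only, here is a simplified Python model of the usual plugin threshold-range convention. It is not the plugin's own code, the function names are made up for this sketch, and it ignores the inverted ‘@’ form. A value raises an alert when it falls outside the range:

```python
import math


def parse_range(spec):
    """Return (start, end) for a 'start:end' style threshold string.

    Simplified model: an empty start means 0, '~' means negative
    infinity, and an empty end means positive infinity.
    """
    if ":" in spec:
        start_s, end_s = spec.split(":", 1)
    else:
        # A bare number such as '10' is treated as the range 0..10.
        start_s, end_s = "", spec
    if start_s == "~":
        start = -math.inf
    elif start_s == "":
        start = 0.0
    else:
        start = float(start_s)
    end = math.inf if end_s == "" else float(end_s)
    return start, end


def alerts(spec, value):
    """True if value lies outside the range described by spec."""
    start, end = parse_range(spec)
    return value < start or value > end


# '1:' alerts when the count drops below 1; ':40' alerts above 40.
for spec, count in [("1:", 0), ("1:", 3), (":40", 40), (":40", 41)]:
    print(spec, count, alerts(spec, count))
```

So ‘1:’ behaves as a lower bound and ‘:40’ as an upper bound, which is why the colon sits on different sides of the number.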
Now on the server side, we can define a command to make NRPE trigger this definition like this:
# 'check_remote_procs' command definition
define command{
        command_name check_remote_procs
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c check_procs -a $ARG1$ $ARG2$ $ARG3$ $ARG4$
        }
And finally we can define the actual service check for the host like this:
# Check remote node apache proxy process
service_description httpd Process
check_command check_remote_procs!1:!:40!httpd!S
Note the ‘check_command’ line, which says to run the local ‘check_remote_procs’ command. That in turn causes the server's ‘check_nrpe’ plugin to be executed against the remote host and to call the remote ‘check_procs’ command (the ‘-c check_procs’ part), passing $ARG1$ as ‘1:’ (alert if fewer than 1), $ARG2$ as ‘:40’ (alert if more than 40), $ARG3$ as ‘httpd’ (the name of the Apache process) and $ARG4$ as ‘S’ (the process state for sleeping).
You can also adapt this to search all running processes for zombies by using ‘*’ for the third parameter and ‘Z’ for the fourth, and adjusting the counts to 1: and :1 (note that the colon changes position between the two!).
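To make the chain of substitutions easier to follow, here is a small Python sketch that mimics how the arguments travel from the service's check_command, through the server-side command_line, to the remote command in nrpe.cfg. It only does plain string replacement and is not how Nagios or NRPE are implemented. The check_command string is reconstructed from the argument values described above, and the expand helper is a made-up name for this illustration:

```python
def expand(template, args):
    """Replace $ARG1$, $ARG2$, ... with the corresponding entries of args."""
    for i, value in enumerate(args, start=1):
        template = template.replace(f"$ARG{i}$", value)
    return template


# Command name followed by '!'-separated arguments (assumed form).
check_command = "check_remote_procs!1:!:40!httpd!S"
name, *args = check_command.split("!")

# Server-side command_line and remote nrpe.cfg definition from the text.
server_side = ("$USER1$/check_nrpe -H $HOSTADDRESS$ -c check_procs "
               "-a $ARG1$ $ARG2$ $ARG3$ $ARG4$")
remote_side = ("/usr/lib/nagios/plugins/check_procs "
               "-w $ARG1$ -c $ARG2$ -a $ARG3$ -s $ARG4$")

# $USER1$ and $HOSTADDRESS$ are left alone; Nagios fills those in itself.
server_cmd = expand(server_side, args)
remote_cmd = expand(remote_side, args)

print(name)
print(server_cmd)
print(remote_cmd)
```

The last line printed is the command the remote host ends up running: check_procs with -w 1: -c :40 -a httpd -s S.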
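Another thing I change in the default nrpe.cfg on the remote host is to comment out the load and disk commands that use hardcoded values (‘check_load’ and ‘check_hda1’) and uncomment the ‘check_load’ and ‘check_disk’ versions below them that take arguments (far more useful). As the comments in the file point out, argument-taking commands only work if the NRPE daemon was compiled with support for command arguments and dont_blame_nrpe is set to 1, which carries a potential security risk.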