Posts Tagged ‘nagios’

Using Nagios NRPE To Monitor Windows Services Via WMI Part 2…….

Friday, September 30th, 2011

Have realised my first attempt at using NRPE to monitor Windows services via WMI is in fact badly thought out and badly done. This is what happens when companies want everything yesterday and rush things :o(

Having thought about it, the following has come to mind:

The service string to check should not be hard coded into the script. Otherwise we would need x1 script per service to check (i.e. lots !). The service string should be a variable that we can pass to the script as an argument at run time.

And, we can only check one service at a time with this script. Therefore, placing the service name into an array is whaaaayyy overkill. Will simply replace the array with a single string variable.

This in mind, here’s the revised version of the check script

strComputer = "."
'display name of the service to check, passed in as the first argument
strService = Wscript.Arguments.Item(0)
'connect using standard moniker
Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
'get an array containing all services
Set objItems = objWMIService.ExecQuery ("Select * from Win32_Service")
'for each service compare its display name to the one we are looking for
For each objService in ObjItems
	'if we get a service display name match
	If objService.DisplayName = strService Then
		'display the current service along with its current state
		'wscript.echo "service name = " & objService.DisplayName & " currently :: " & objService.State
		If objService.State = "Running" Then
		'If the service is running return exit code 0 = ok
			Wscript.Echo "SERVICE STATUS: OK"
			Wscript.Quit(0)
		Else
		'otherwise return non 0 = error = fire alert hopefully
			Wscript.Echo "SERVICE STATUS: Critical"
			Wscript.Quit(2)
		End if
	End if
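Next
'nothing matched the name we were given, so return exit code 3 = unknown instead of falling through with 0
Wscript.Echo "SERVICE STATUS: Unknown - no service found with display name " & strService
Wscript.Quit(3)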
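
The loop above pulls back every service on the box and compares names one at a time. A possible alternative, sketched here rather than taken from the original script, is to let WMI do the filtering with a WHERE clause. The argument check, the escaping of quotes and the extra text in the output lines are my own additions. As far as I know WQL compares strings without regard to case, so this version is a little more forgiving about the name than the = comparison in the loop.

```vbscript
Option Explicit
Dim strComputer, strService, strQuery, objWMIService, objItems, objService

' Nagios plugin exit codes
Const NAGIOS_OK = 0
Const NAGIOS_CRITICAL = 2
Const NAGIOS_UNKNOWN = 3

strComputer = "."

' bail out early if no service display name was passed in
If WScript.Arguments.Count < 1 Then
	WScript.Echo "SERVICE STATUS: Unknown - no service name supplied"
	WScript.Quit(NAGIOS_UNKNOWN)
End If
strService = WScript.Arguments.Item(0)

' connect using standard moniker, same as the original script
Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")

' build the query; backslashes and single quotes in the name are escaped for WQL
strQuery = "Select * from Win32_Service Where DisplayName = '" & _
	Replace(Replace(strService, "\", "\\"), "'", "\'") & "'"
Set objItems = objWMIService.ExecQuery(strQuery)

' an empty result set means the display name did not match anything
If objItems.Count = 0 Then
	WScript.Echo "SERVICE STATUS: Unknown - no service named " & strService
	WScript.Quit(NAGIOS_UNKNOWN)
End If

' report on the first (normally only) match
For Each objService In objItems
	If objService.State = "Running" Then
		WScript.Echo "SERVICE STATUS: OK - " & objService.DisplayName & " is " & objService.State
		WScript.Quit(NAGIOS_OK)
	Else
		WScript.Echo "SERVICE STATUS: Critical - " & objService.DisplayName & " is " & objService.State
		WScript.Quit(NAGIOS_CRITICAL)
	End If
Next
```

It is called exactly the same way from nrpe.cfg, with the service display name as the first argument.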

And the command to add to the nrpe.cfg file will now need a parameter adding to the end like so (note the quote marks “” around the $ARG1$ parameter. This is in case our variable has spaces in it !!).

command[check_windows_service]=cscript.exe //T:30 //NoLogo "C:\Program Files (x86)\NRPE_NT\libexec\check_windows_service.vbs" "$ARG1$"
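
One thing that may catch you out: stock NRPE refuses command arguments unless it is told to accept them. Assuming your NRPE_NT build reads the same dont_blame_nrpe directive that the Unix NRPE daemon uses (check the comments in your own nrpe.cfg, I have not verified this for every version), the relevant part of the file would look something like this.

```cfg
# allow check_nrpe to pass arguments through to commands (0 = refuse them)
dont_blame_nrpe=1

command[check_windows_service]=cscript.exe //T:30 //NoLogo "C:\Program Files (x86)\NRPE_NT\libexec\check_windows_service.vbs" "$ARG1$"
```

Accepting arguments from the network has security implications, so only the Nagios server should be allowed to talk to the agent.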

The commands.cfg file will need a command definition in it like this

# 'check_windows_service' command definition (using NRPE)
define command{
	command_name	check_windows_service
	command_line	$USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -c check_windows_service -a $ARG1$
}

And finally, in services.cfg, a service check section using the command, like this

define service{
        service_description     Check Windows Awesome Service
        servicegroups           cust-windows
        host_name               windows_server_1
        check_command           check_windows_service!"Some Windows Service"
        use                     generic-service
}

But we can now use the same script to check other services like this

define service{
        service_description     Check Windows Awesome Service
        servicegroups           cust-windows
        host_name               windows_server_1
        check_command           check_windows_service!"Some Windows Service"
        use                     generic-service
}

define service{
        service_description     Check Windows Spooler Service
        servicegroups           cust-windows
        host_name               windows_server_1
        check_command           check_windows_service!"Print Spooler"
        use                     generic-service
}

Second time’s a charm. At least I got to go back and correct my horrible (but technically working) mistake !

Next stop, monitoring for running processes by their executable name in the process list…….

doh !

Using Nagios NRPE To Monitor Windows Services Via WMI…….

Wednesday, September 28th, 2011

If you are setting up Nagios from scratch, install the NSClient++ agent on your Windows servers and get the increased flexibility that it offers. My predecessor at my current work place has only installed the NRPE addon (the same guy who installed the core datacentre router with a duplex mismatch….that made my first week fun), which means I can’t use much of the cool check_nt stuff to monitor services and processes :o(

I needed a way to tell if a service had stopped on a Windows server, but I could only use NRPE. First stop, a script to check the status of a given service.


strComputer = "."
'list services to monitor, comma separated, inside quotes
arrServices = Array("Awesome Service")
For each strService in arrServices
	'connect using standard moniker
	Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
	'get an array containing all services
	Set objItems = objWMIService.ExecQuery ("Select * from Win32_Service")
	'for each service compare its display name to the current one we are looking for
	For each objService in ObjItems
		'if we get a service display name match
		If objService.DisplayName = strService Then
			'display the current service along with its current state
			'wscript.echo "service name = " & objService.DisplayName & " currently :: " & objService.State
			If objService.State = "Running" Then
			'If the service is running say so
				Wscript.Echo "SERVICE running"
			Else
			'otherwise it must not be running
				Wscript.Echo "SERVICE not running"
			End if
		End if
	Next
Next

This script binds to WMI, searches for a service called Awesome Service and then echoes a statement to say if it’s running or not. Perfect, but Nagios can’t use this quite yet. We need the script to send some data back to the NRPE engine for this to work.

The Nagios plug-in dev guide tells you most of what you need to know, in this case we need to pass return codes back, which is covered here.

So the finished version now looks like this


strComputer = "."
'list services to monitor, comma separated, inside quotes
arrServices = Array("Awesome Service")
For each strService in arrServices
	'connect using standard moniker
	Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
	'get an array containing all services
	Set objItems = objWMIService.ExecQuery ("Select * from Win32_Service")
	'for each service compare its display name to the current one we are looking for
	For each objService in ObjItems
		'if we get a service display name match
		If objService.DisplayName = strService Then
			'display the current service along with its current state
			'wscript.echo "service name = " & objService.DisplayName & " currently :: " & objService.State
			If objService.State = "Running" Then
			'If the service is running return exit code 0 = ok
				Wscript.Echo "SERVICE STATUS: OK"
				Wscript.Quit(0)
			Else
			'otherwise return non 0 = error = fire alert hopefully
				Wscript.Echo "SERVICE STATUS: Critical"
				Wscript.Quit(2)
			End if
		End if
	Next
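Next
'no service matched, so return exit code 3 = unknown instead of falling through with 0
Wscript.Echo "SERVICE STATUS: Unknown - service not found"
Wscript.Quit(3)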

So if the service is running, we exit with return code 0 using Wscript.Quit(0). But if it’s not, we exit with a non 0 return code. I need an alert to fire an SMS, so I have used Wscript.Quit(2) for critical, but if you only want a warning you can use Wscript.Quit(1).
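
If you want both warning and critical from the same script, one way is to map the service state to a return code in one place. This is only a sketch: the function name NagiosCodeForState is made up, and treating the pending states as a warning is my own choice, not something the plug-in guide dictates. The state strings are the ones Win32_Service reports in its State property.

```vbscript
' Map a Win32_Service.State string to a Nagios exit code (hypothetical helper).
Function NagiosCodeForState(strState)
	Select Case strState
		Case "Running"
			NagiosCodeForState = 0	' OK
		Case "Start Pending", "Continue Pending"
			NagiosCodeForState = 1	' WARNING - on its way up, but not there yet
		Case Else
			NagiosCodeForState = 2	' CRITICAL - stopped, paused, stopping etc.
	End Select
End Function

' quick demonstration with a few State values
Dim strState
For Each strState In Array("Running", "Start Pending", "Stopped")
	WScript.Echo strState & " -> " & NagiosCodeForState(strState)
Next
```

Inside the check script the If/Else block would then shrink to an echo of the state followed by Wscript.Quit(NagiosCodeForState(objService.State)).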

Save the file in the NRPE scripts location (mine are located at C:\Program Files\NRPE_NT\libexec\)

Final piece of the puzzle is to add the actual command to run the script to the NRPE config file. Mine is located at ‘C:\Program Files\NRPE_NT\bin\nrpe.cfg’, but yours may vary.

At the end of the file is a list of demo commands; we just need to add in


command[check_awesome_service]=cscript.exe //T:30 //NoLogo "C:\Program Files\NRPE_NT\libexec\check_awesome_service.vbs"

Now add a command definition to the Nagios commands.cfg


# 'check_awesome_service' command definition (using nrpe)
define command{
        command_name    check_awesome_service
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -c check_awesome_service
        }

And finally, in my Nagios services.cfg file, a service definition that includes the command and the hosts to run this against


define service{
        host_name               windows_server_1
        service_description     Windows Awesome Service
        servicegroups           cust-windows
        check_command           check_awesome_service
        use                     generic-service
}

And that should be it. You need to restart Nagios to include the new commands and service definitions (and restart the NRPE service on the Windows server so it rereads nrpe.cfg). Then test the monitor by stopping and then starting the service in question.

The next step would be to replace the service name in the .vbs script file with a variable. Then you can reuse the same script to monitor different services by passing the service name from Nagios to NRPE as a variable from the config file. :oD

Make Nagios Web Interface Read-Only…..

Friday, March 5th, 2010

Even though we’re not a massive company (less than 50 butts on seats) we do have quite a bit of kit in an environment that is growing ever more complex.

To help we use Nagios to monitor key systems and services and to alert us via email when issues arise (and hopefully we can correct them before the masses notice)

My boss decided he wanted to share our Nagios screens with others (well, his boss) and so I installed a workstation with x2 flat screens lofted up on high so they could be seen from a distance.

But, I had a slight snag. We use authentication on Nagios and the account used for viewing the web console had enough permissions to be able to execute the host commands listed on the right hand side of the interface (shown below)

This meant that should any passer by wish to, they could click the url link to say, turn off a check that was failing (not that any of our users would do such a thing !). So I needed a way to make the web interface either not display those links or be read-only for those links, essentially prevent people from altering the configuration.

Peeking through the config files for Nagios, it seems my predecessor had the same idea at some point, but had not quite managed to pull it off. Inside the cgi.cfg file (which was located at /usr/local/nagios/etc/cgi.cfg) are the following lines


default_user_name=
authorized_for_system_information=
authorized_for_configuration_information=
authorized_for_system_commands=
authorized_for_all_services=
authorized_for_all_hosts=
authorized_for_all_service_commands=
authorized_for_all_host_commands=

The ones of interest are :

authorized_for_all_services=

authorized_for_all_hosts=

By adding a user to these x2 lines *only*, the urls on the pages for running commands and viewing/modifying the config do not work and give a permissions error

You will also need to add the name you add to those x2 lines to the /usr/local/nagios/etc/htpasswd file

Now, even though you can still see the command urls on the pages, you get this if you try to click them

nagios says no

So, how far had my predecessor gotten ? Well, something I take for granted that I guess he did not know, the list of names supplied should be comma separated with no space between them
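
To make that concrete, here is roughly how the lines in cgi.cfg could end up. The account names (nagiosadmin, ops1 and wallboard) are made up for the example, so swap in your own. The read-only account is the one that only appears on the last two lines.

```cfg
# full-access accounts keep everything (names here are made up)
authorized_for_system_information=nagiosadmin,ops1
authorized_for_configuration_information=nagiosadmin,ops1
authorized_for_system_commands=nagiosadmin,ops1
authorized_for_all_service_commands=nagiosadmin,ops1
authorized_for_all_host_commands=nagiosadmin,ops1

# the wallboard account appears on these two lines only, comma separated, no spaces
authorized_for_all_services=nagiosadmin,ops1,wallboard
authorized_for_all_hosts=nagiosadmin,ops1,wallboard
```

The wallboard account can then see every host and service, but any of the command links will give it the permissions error.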

Easy when you know how :o)