Posts Tagged ‘wmi’

Using Nagios NRPE To Monitor Windows Services Via WMI Part 2…….

Friday, September 30th, 2011

Have realised my first attempt at using NRPE to monitor Windows services via WMI is in fact badly thought out and badly done. This is what happens when companies want everything yesterday and rush things :o(

Having thought about it, the following has come to mind:

The service string to check should not be hard coded into the script. Otherwise we would need x1 script per service to check (i.e. lots !). The service string should be a variable that we can pass to the script as an argument at run time.

And, we can only check one service at a time with this script. Therefore, placing the service name into an array is whaaaayyy overkill. Will simply replace the array with a single string variable.

This in mind, here’s the revised version of the check script

strComputer = "."
'display name of the service to check, passed in as the first argument
If Wscript.Arguments.Count < 1 Then
	'no service name supplied, return exit code 3 = unknown
	Wscript.Echo "SERVICE STATUS: Unknown - no service name supplied"
	Wscript.Quit(3)
End If
strService = Wscript.Arguments.Item(0)
'connect using standard moniker
Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
'get a collection containing all services
Set objItems = objWMIService.ExecQuery ("Select * from Win32_Service")
'for each service compare its display name to the one we are looking for
For each objService in objItems
	'if we get a service display name match
	If objService.DisplayName = strService Then
		'display the current service along with its current state
		'wscript.echo "service name = " & objService.DisplayName & " currently :: " & objService.State
		If objService.State = "Running" Then
			'if the service is running return exit code 0 = ok
			Wscript.Echo "SERVICE STATUS: OK"
			Wscript.Quit(0)
		Else
			'otherwise return exit code 2 = critical = fire alert hopefully
			Wscript.Echo "SERVICE STATUS: Critical"
			Wscript.Quit(2)
		End if
	End if
Next
'no service matched the display name, so return exit code 3 = unknown
'(without this the script would fall off the end with exit code 0 and no output)
Wscript.Echo "SERVICE STATUS: Unknown - service not found"
Wscript.Quit(3)

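And the command to add to the nrpe.cfg file will now need a parameter adding to the end like so (note the quote marks “” around the $ARG1$ parameter. This is in case our variable has spaces in it !!). NRPE only passes arguments through if it has been told to accept them, which in a stock nrpe.cfg is the dont_blame_nrpe=1 setting, so check that too.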

command[check_windows_service]=cscript.exe //T:30 //NoLogo "C:\Program Files (x86)\NRPE_NT\libexec\check_windows_service.vbs" "$ARG1$"

The commands.cfg file will need a command definition in it like this

# 'check_windows_service' command definition (using NRPE)
define command{
	command_name	check_windows_service
	command_line	$USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -c check_windows_service -a $ARG1$
}

And finally, in services.cfg, a service check section using the command, like this

define service{
        service_description     Check Windows Awesome Service
        servicegroups           cust-windows
        host_name               windows_server_1
        check_command           check_windows_service!"Some Windows Service"
        use                     generic-service
}

But we can now use the same script to check other services like this

define service{
        service_description     Check Windows Awesome Service
        servicegroups           cust-windows
        host_name               windows_server_1
        check_command           check_windows_service!"Some Windows Service"
        use                     generic-service
}

define service{
        service_description     Check Windows Spooler Service
        servicegroups           cust-windows
        host_name               windows_server_1
        check_command           check_windows_service!"Print Spooler"
        use                     generic-service
}
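
The quote marks in the check_command lines matter because Nagios hands the expanded command line to a shell, and the shell splits on spaces. A rough way to see the effect is the Python sketch below. It is only an illustration: `expand_command` is a made-up helper that substitutes just the two macros used here, the `$USER1$/` prefix is left off, the address is a documentation placeholder, and `shlex.split` only approximates what a POSIX shell does.

```python
import shlex

# Simplified copy of the command_line from the command definition
COMMAND_LINE = "check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -c check_windows_service -a $ARG1$"


def expand_command(command_line, host, arg1):
    """Substitute the two macros used in this post (hypothetical helper,
    real Nagios macro expansion handles many more)."""
    return command_line.replace("$HOSTADDRESS$", host).replace("$ARG1$", arg1)


# With the quotes from services.cfg the display name stays in one piece
quoted = shlex.split(expand_command(COMMAND_LINE, "192.0.2.10", '"Print Spooler"'))

# Without them the name is split and check_nrpe sees two separate arguments
unquoted = shlex.split(expand_command(COMMAND_LINE, "192.0.2.10", "Print Spooler"))

print(quoted[-1:])
print(unquoted[-2:])
```

With the quotes, the last argument is the whole display name. Without them it arrives as two words, and the script would only ever see the first one in Wscript.Arguments.Item(0).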

Second time’s a charm. At least I got to go back and correct my horrible (but technically working) mistake !

Next stop, monitoring for running processes by their executable name in the process list…….

doh !

Using Nagios NRPE To Monitor Windows Services Via WMI…….

Wednesday, September 28th, 2011

If you are setting up Nagios from scratch, install the NSClient++ agent on your Windows servers and get the increased flexibility that it offers. My predecessor at my current work place has only installed the NRPE addon (the same guy who installed the core datacentre router with a duplex mismatch….that made my first week fun), which means I can’t use much of the cool check_nt stuff to monitor services and processes :o(

I needed a way to tell if a service had stopped on a Windows server, but I could only use NRPE. First stop, a script to check the status of a given service.


strComputer = "."
'list services to monitor, comma separated, inside quotes
arrServices = Array("Awesome Service")
For each strService in arrServices
	'connect using standard moniker
	Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
	'get a collection containing all services
	Set objItems = objWMIService.ExecQuery ("Select * from Win32_Service")
	'for each service compare its display name to the current one we are looking for
	For each objService in objItems
		'if we get a service display name match
		If objService.DisplayName = strService Then
			'display the current service along with its current state
			'wscript.echo "service name = " & objService.DisplayName & " currently :: " & objService.State
			If objService.State = "Running" Then
				'if the service is running say so
				Wscript.Echo "SERVICE running"
			Else
				'otherwise it must not be running
				Wscript.Echo "SERVICE not running"
			End if
		End if
	Next
Next

This script binds to WMI, searches for a service called Awesome Service and then echoes a statement to say if it’s running or not. Perfect, but Nagios can’t use this quite yet. We need the script to send some data back to the NRPE engine for this to work.

The Nagios plug-in dev guide tells you most of what you need to know. In this case we need to pass return codes back, which is covered in its section on plugin return codes.

So the finished version now looks like this


strComputer = "."
'list services to monitor, comma separated, inside quotes
arrServices = Array("Awesome Service")
For each strService in arrServices
	'connect using standard moniker
	Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
	'get a collection containing all services
	Set objItems = objWMIService.ExecQuery ("Select * from Win32_Service")
	'for each service compare its display name to the current one we are looking for
	For each objService in objItems
		'if we get a service display name match
		If objService.DisplayName = strService Then
			'display the current service along with its current state
			'wscript.echo "service name = " & objService.DisplayName & " currently :: " & objService.State
			If objService.State = "Running" Then
				'if the service is running return exit code 0 = ok
				Wscript.Echo "SERVICE STATUS: OK"
				Wscript.Quit(0)
			Else
				'otherwise return exit code 2 = critical = fire alert hopefully
				Wscript.Echo "SERVICE STATUS: Critical"
				Wscript.Quit(2)
			End if
		End if
	Next
Next

So if the service is running, we exit with return code 0 using Wscript.Quit(0). But if it’s not, we exit with a non-zero return code. I need an alert to fire an SMS, so I have used Wscript.Quit(2) for critical, but if you only want a warning you can use Wscript.Quit(1).
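
For reference, the return-code convention boils down to a small lookup. The sketch below is in Python rather than VBScript, purely so it can be run anywhere. The function name `nagios_result` is made up for illustration, and treating "no matching service" as UNKNOWN (3) is my assumption about sensible behaviour, not something the script above does.

```python
# Nagios plugin return codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def nagios_result(state, not_running_code=CRITICAL):
    """Map a service state string to (exit_code, message).

    `state` is the Win32_Service State value, or None when no service
    matched the name (a convention invented for this sketch).
    `not_running_code` chooses between CRITICAL and WARNING for a
    service that exists but is not running.
    """
    if state is None:
        return UNKNOWN, "SERVICE STATUS: Unknown"
    if state == "Running":
        return OK, "SERVICE STATUS: OK"
    if not_running_code == WARNING:
        return WARNING, "SERVICE STATUS: Warning"
    return CRITICAL, "SERVICE STATUS: Critical"


code, message = nagios_result("Stopped")
print(code, message)
```

The first line of output is what shows up in the Nagios status column, and the exit code decides the state, so both need to be set before the script quits.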

Save the file in the NRPE scripts location (mine are located at C:\Program Files\NRPE_NT\libexec\).

The final piece of the puzzle is to add the actual command to run the script to the NRPE config file. Mine is located at ‘C:\Program Files\NRPE_NT\bin\nrpe.cfg’, but yours may vary.

At the end of the file is a list of demo commands. We just need to add in


command[check_awesome_service]=cscript.exe //T:30 //NoLogo "C:\Program Files\NRPE_NT\libexec\check_awesome_service.vbs"

Now add a command definition to the Nagios commands.cfg


# 'check_awesome_service' command definition (using nrpe)
define command{
        command_name    check_awesome_service
        command_line    $USER1$/check_nrpe -H $HOSTADDRESS$ -t 60 -p 5666 -c check_awesome_service
        }

And finally, in my Nagios services.cfg file, a service definition that includes the command and the hosts to run this against


define service{
        host_name               windows_server_1
        service_description     Windows Awesome Service
        servicegroups           cust-windows
        check_command           check_awesome_service
        use                     generic-service
}

And that should be it. You need to restart Nagios to include the new commands and service definitions. And then test the monitor by stopping and then starting the service in question.

The next step would be to replace the service name in the .vbs script file with a variable. Then you can reuse the same script to monitor different services by passing the service name from Nagios to NRPE as a variable from the config file. :oD

WMI Restart Windows Services

Wednesday, June 10th, 2009

Ok

So the title for this post isn’t smart or quippy, and there is a very good reason for this. I needed a script that could restart a Windows service or services. I couldn’t find any good ones :o(

I searched using various combinations of the words “wmi, windows, services, restart, start, stop”. While I found lots of scripts, they all lacked a certain resiliency that I like in my automation solutions. Essentially they all went something like this (the WMI has been translated into an English procedure so everyone can understand)

  • connect to windows using wmi
  • find all the services
  • select the one that we are interested in using a for/next loop
  • send it a stop signal
  • wait for some random amount of time (between 1 and 2 mins say !)
  • send the same service a start signal
  • move on to the next service
  • exit the script when we have restarted all the services we want to do

Anyone see the problem ? How long do you give a Windows service to stop successfully ? Or start for that matter ! These scripts all seemed to wait for a minute or two, and then proceed with the assumption everything happened ok. At some point, that kind of thinking with software will bite you in the ass. Here was what I was looking for

  • connect to windows using wmi
  • find all the services
  • select the one that we are interested in
  • check its current state (running or stopped)
  • if the service is stopped, send it a start signal
  • check every 10 seconds for 5 mins that the status has switched to running
  • if the service does not go into a running state after 5 mins, email an smtp address advising the service is misbehaving and then exit the script
  • if the service is already in a running state, send it a stop signal
  • check every 10 seconds for 5 mins that the status has switched to stopped
  • if the service does not go into a stopped state after 5 mins, email an smtp address advising the service is misbehaving and then exit the script
  • if the service does go into a stopped state within 5 mins, run the section of code for starting a service
  • again, monitor the service to make sure it does restart; if it does not for any reason, send a warning email
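
Stripped of the WMI plumbing, the steps above come down to "ask, then poll with a deadline, then complain if the deadline passes". Here is a minimal sketch of that shape in Python (not VBScript, just so it can be run without a Windows box). The names `wait_for_state` and `restart_service`, the `notify_failure` callback, and the idea of a service object with `state()`, `stop()` and `start()` methods are all invented for the illustration.

```python
import time


def wait_for_state(get_state, wanted, interval=10, attempts=30, sleep=time.sleep):
    """Poll get_state() until it returns `wanted`.

    Checks up to `attempts` times, sleeping `interval` seconds between
    checks (30 checks at 10 second gaps is roughly 5 minutes).
    Returns True if the state was reached, False if we gave up.
    """
    for attempt in range(attempts):
        if get_state() == wanted:
            return True
        if attempt < attempts - 1:
            sleep(interval)
    return False


def restart_service(service, notify_failure, **wait_args):
    """Stop the service if it is running, then start it.

    `service` is any object offering state(), stop() and start()
    (a hypothetical interface). `notify_failure` is called with a short
    reason when a state change does not happen in time.
    """
    if service.state() == "Running":
        service.stop()
        if not wait_for_state(service.state, "Stopped", **wait_args):
            notify_failure("did not stop")
            return False
    service.start()
    if not wait_for_state(service.state, "Running", **wait_args):
        notify_failure("did not start")
        return False
    return True
```

Passing `sleep` in as a parameter is only there so the loop can be exercised without actually waiting five minutes.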

Using this method no assumptions are made about the running state of the service, or its response to being told to stop/start. Worst case scenario, it fails to do what it is told and you get an email warning you that you need to intervene manually. At least the failure is known about and can be managed.

The code for this is shown below. Feel free to copy and adapt to suit your own purpose(s) :oD

'needs to be run with administrator privileges in order to work ! we are doing stuff to services after all !!
'the script gives each service 5 mins to change its state. if this has not occurred within that time
'the script sends a failure email and exits

'define the computer name and the services we want to restart. use "." for local host
'the service names are based on their display names, not their short form/function names !
'define the counter used to determine when 5 mins has elapsed
strComputer = "."
arrServices = Array("Kaspersky Administration Server", "Kaspersky Anti-Virus", "Kaspersky Anti-Virus Script Interceptor Dispatcher", "Kaspersky Lab Cisco NAC Posture Validation Server", "Kaspersky Network Agent")
Dim Count

'loop through each service
For each strService in arrServices
	'connect using standard moniker
	Set objWMIService = GetObject("winmgmts:" & "{impersonationLevel=impersonate}!\\" & strComputer & "\root\cimv2")
	'get a collection containing all services
	Set objItems = objWMIService.ExecQuery ("Select * from Win32_Service")
	'for each service compare its display name to the current one we are looking for
	For each objService in objItems
		'if we get a service display name match
		If objService.DisplayName = strService Then
			'display the current service along with its current state
			wscript.echo "service name = " & objService.DisplayName & " currently :: " & objService.State
			'if it is currently running, attempt to stop it
			If objService.State = "Running" Then
				wscript.echo ""
				wscript.echo "stopping service..."
				wscript.echo ""
				objService.StopService()
				'wait for 10 seconds, then refresh our view of the current object state
				wscript.sleep 10000
				objService.Refresh_
				'if the service is still not in a stopped state, repeatedly re-check the object status every 10 seconds
				'we also check how many times we have already checked and exit if it is greater than 29 (30*10seconds = 5mins)
				'initialise counter
				Count = 1
				'start checking comparison loop for 'stopped' condition
				'we need to update the objService.State view using objService.Refresh_ for each iteration to make sure we are seeing the
				'current state of the service
				While objService.State <> "Stopped"
					objService.Refresh_
					'for testing/debugging on the console, tell the user what is going on
					'this will not show up when the script is run as a scheduled job
					wscript.echo ""
					wscript.echo "waiting for service to Stop :: current count = " & Count
					wscript.echo ""
					'wait 10 seconds then increase the counter by 1
					wscript.sleep 10000
					Count = Count + 1
					'if we have reached 30 attempts then bow out and send an email advising manual intervention
					If Count > 29 then
						SendFailedMsg
						wscript.echo "service has taken too long to respond. aborting script"
						wscript.quit
					End if
					'otherwise we have not reached 30, go round again
				Wend
				'once the service has stopped, let us know
				wscript.echo "service is now " & objService.State
				'now attempt to restart the service, making sure it is definitely stopped first
				If objService.State = "Stopped" Then
					wscript.echo ""
					wscript.echo "attempting to restart service " & objService.DisplayName
					wscript.echo ""
					objService.StartService()
					'wait 10 seconds, then refresh our view of the current object state
					wscript.sleep 10000
					objService.Refresh_
					'if the service is not in a running state, repeatedly re-check the object status every 10 seconds
					'we also check how many times we have already checked and exit if it is greater than 29 (30*10seconds = 5mins)
					'initialise counter
					Count = 1
					'start checking comparison loop for 'running' condition
					'we need to update the objService.State view using objService.Refresh_ for each iteration to make sure we are seeing the
					'current state of the service
					While objService.State <> "Running"
						objService.Refresh_
						'for testing/debugging on the console, tell the user what is going on
						'this will not show up when the script is run as a scheduled job
						wscript.echo ""
						wscript.echo "waiting for service to Start"
						wscript.echo ""
						'wait 10 seconds then increase the counter by 1
						wscript.sleep 10000
						Count = Count + 1
						'if we have reached 30 attempts then bow out and send an email advising manual intervention
						If Count > 29 then
							SendFailedMsg
							wscript.echo "service has taken too long to respond. aborting script"
							wscript.quit
						End if
						'otherwise we have not reached 30, go round again
					Wend
					'once the service has started, let us know
					wscript.echo ""
					wscript.echo "service is now " & objService.State
				'close the "definitely stopped" check here, so that the Else below pairs with the
				'"Running" test at the top and a service that was already stopped still gets started
				End If
			Else
				'otherwise the service must already be stopped for some reason ? check first, and attempt to start it
				If objService.State = "Stopped" Then
					wscript.echo ""
					wscript.echo "attempting to restart service " & objService.DisplayName
					wscript.echo ""
					objService.StartService()
					'wait 10 seconds, then refresh our view of the current object state
					wscript.sleep 10000
					objService.Refresh_
					'if the service is not in a running state, repeatedly re-check the object status every 10 seconds
					'we also check how many times we have already checked and exit if it is greater than 29 (30*10seconds = 5mins)
					'initialise counter
					Count = 1
					'start checking comparison loop for 'running' condition
					'we need to update the objService.State view using objService.Refresh_ for each iteration to make sure we are seeing the
					'current state of the service
					While objService.State <> "Running"
						objService.Refresh_
						'for testing/debugging on the console, tell the user what is going on
						'this will not show up when the script is run as a scheduled job
						wscript.echo ""
						wscript.echo "waiting for service to Start"
						wscript.echo ""
						'wait 10 seconds then increase the counter by 1
						wscript.sleep 10000
						Count = Count + 1
						'if we have reached 30 attempts then bow out and send an email advising manual intervention
						If Count > 29 then
							SendFailedMsg
							wscript.echo "service has taken too long to respond. aborting script"
							wscript.quit
						End if
						'otherwise we have not reached 30, go round again
					Wend
					'once the service has started, let us know
					wscript.echo "service is now " & objService.State
					wscript.echo ""
				End If
			End If
		End If
	Next
Next

SendSucessMsg

Sub SendFailedMsg()
    Set objEmail = CreateObject("CDO.Message")
    objEmail.From = "email@yourcompany.com"
    objEmail.To = "email@yourcompany.com"
    objEmail.Subject = "KAV Recycle failed on "
    objEmail.Textbody = "KAV services recycle failed on . Please check services manually"
    objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
    objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "yourmailserver.company.com"
    objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
    objEmail.Configuration.Fields.Update
    objEmail.Send
End Sub

Sub SendSucessMsg()
    Set objEmail = CreateObject("CDO.Message")
    objEmail.From = "email@yourcompany.com"
    objEmail.To = "email@yourcompany.com"
    objEmail.Subject = "KAV Recycle succeeded on . Hooray !!"
    objEmail.Textbody = "KAV services recycle completed OK on :o)"
    objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
    objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpserver") = "yourmailserver.company.com"
    objEmail.Configuration.Fields.Item ("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
    objEmail.Configuration.Fields.Update
    objEmail.Send
End Sub

Once I got this script to run in an admin-enabled DOS prompt window, the next step was to run it as a job via the job scheduler under Windows. You need to run the job as the SYSTEM user, and tick the box to ‘run with highest available permissions’ in order for this to work. Running as a scheduled job, there is no console to display the output for the job, but you can have the services panel loaded and keep refreshing the view to see your services’ status changing as the script runs through them.

The overall result is that you get an email advising of a successful recycle of your services, or a failure one with a note to check what’s going on.

Enjoy :oD