The Poormans M3 Monitoring – Powershell and REST

Today’s post is about poor man’s M3 monitoring. I want to demonstrate that, with existing tools, you can quickly and easily get email notifications if any of your grid applications go offline. You don’t need to install anything for this to work in a current Windows environment.

Maintaining my systems’ uptime has always been extremely important to me. Downtime has been extremely rare, and my servers would typically sit at 1+ years of uptime (I primarily used to run Linux/NetWare servers). I often recount the tale of a local ISP who rang me about an internet problem when I had just begun investigating where the fault was. I hold this as the gold standard – notifying the userbase of a problem and that it is being addressed BEFORE the flood of calls. The only way to achieve this is with proactive monitoring.

At IFL I built a fairly comprehensive monitoring system (piecemeal, as it was done on a $0 budget) that graphed utilisation and alerted me to unusual patterns, or quickly alerted me when core systems went down. In the event that something did go down, I could quickly jump in, take a look and often fix the issue before my staff knew there was a problem.

I would just like to add some emphasis to the ‘unusual patterns’ comment: establishing a baseline of your system (how it performs normally, and its performance patterns when people aren’t complaining) is one of the best possible things you can do. Tools like RRDTool can be automated to gather SNMP data and graph it like the image below, which represents the traffic to one of my switches. I can see long-term baselines, such as traffic over a year, and I have more granular graphs updated every minute. Often, unexplained deviations from your baseline can be indications of impending doom^wproblems.

But I digress.

If you’ve spent some time perusing the InfoCenter documentation, which you’ve hopefully installed on your LCM server, then you’ll be aware of the Grid Script Client (Infor ION Grid 11.1.x Product Documentation -> Infor ION Grid Administration Guide -> Managing the Grid -> Managing the Grid Programatically). It allows you to extract some basic status information from a cmd prompt and to start and shut down grids/grid applications (I have a separate post on that coming soon). Alternatively, you can use the nice lightweight REST services.

In the same document I mention above, there is a section entitled “Programmatically Operating on the Grid using REST” which gives you some useful links to retrieve information. If you set everything up, you can also shut down grid nodes and applications, just like the scripting client.

Given some conversations that I have been having over the last 10 months, I thought it would be handy to demonstrate just how quickly and easily you can set up a monitoring system which will alert you to a system-down event via email. The goal was to use only tools which are natively available on Windows: no client software, no copying of files – just a script.

So, we are going to need PowerShell 3 (which is installed by default on Windows 8 and Windows Server 2012 and above) and a script. You’ll also need to allow PowerShell to run unsigned scripts. You can install PowerShell 3 on previous versions of Windows, but we’re all trying to keep up to date…aren’t we… 🙂
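As a rough sketch of allowing unsigned scripts (the scope and policy value are assumptions on my part; pick whatever your own security policy permits), you can inspect the current execution policy and relax it for your own account:

```powershell
# Show the execution policy currently set at each scope
Get-ExecutionPolicy -List

# Assumption: RemoteSigned, for the current user only.
# Locally written scripts may then run unsigned.
Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUser
```

RemoteSigned still asks for a signature on scripts downloaded from the internet, so it is a narrower change than Unrestricted.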

It’s worth pointing out here that I very rarely touch PowerShell, and I threw this script together in under an hour with a lot of help from Google – so the time investment is pretty minimal.

I am going to look at the status call which gives me the state of the various grid applications as per the InfoCenter documentation:

In my case the server is going to be http://ifbenp.indfish.co.nz:16201/grid/status which if we point a browser at it, looks something like this:

Not very helpful!

PowerShell 3.0 has a cmdlet designed for REST services, Invoke-RestMethod, so I call

$restResults = Invoke-RestMethod -Uri http://ifbenp.indfish.co.nz:16201/grid/status -Method Get;

We are storing the data we get into the $restResults variable. One of the nice things about these queries is that we don’t need to be authenticated, as we are only looking at data!

If you look closely at the data we got from the REST service you will notice that we have a hierarchy

applications -> applicationStatus

The applicationStatus has many attributes, and it is these attributes that we are interested in, specifically the “offline” attribute:

foreach($node in $restResults.applications.applicationStatus)
{
	if($node.offline -eq 'true')
	{
		$outputString = $outputString + $node.applicationName + " is offline`n`n"
	}
}

Here I am looping through each applicationStatus from our $restResults and saving it to $node. Then I am checking the $node.offline attribute to see if it is true. If it is true then I add a line naming the application to the output string.
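To try this logic without a live grid, the same loop can be wrapped in a function and fed a hand-written sample. This is a minimal sketch: the function name Get-OfflineReport and the sample XML are made up for illustration. The element and attribute names follow the script in this post, but a real response carries more attributes and may be shaped differently.

```powershell
# Hypothetical helper (not part of the original script): builds the alert text
# from a status document shaped like applications -> applicationStatus.
function Get-OfflineReport {
    param(
        [Parameter(Mandatory = $true)]
        [xml]$Status
    )

    # Collect one line per application whose offline attribute is 'true'
    $lines = @(
        foreach ($app in $Status.applications.applicationStatus) {
            if ($app.offline -eq 'true') {
                "$($app.applicationName) is offline"
            }
        }
    )

    # An empty list joins to an empty string, i.e. nothing to email
    return ($lines -join "`n`n")
}

# Made-up sample data standing in for the /grid/status response
$sample = @'
<applications>
  <applicationStatus applicationName="SYSTEM" offline="false" />
  <applicationStatus applicationName="M3BE" offline="false" />
  <applicationStatus applicationName="MDP" offline="true" />
</applications>
'@

$report = Get-OfflineReport -Status $sample
Write-Host $report
```

Invoke-RestMethod hands back an XML document when the service returns XML, so its result can be passed to the function in place of the sample string.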

Then I am going to build an email message which I will relay off one of my internal email servers. I will only email if we actually have something in our output string (where its length is greater than 0).

if($outputString.Length -gt 0)
{
	# create the message object
	$msgMessageObject = new-object Net.Mail.MailMessage

	# create our SMTP client object with the relay server as an argument
	$smtpClient = new-object Net.Mail.SmtpClient($smtpRelayServer)

	# I am going to use the date in the subject of my email
	$strDateNow = [System.DateTime]::Now.ToString("yyyyMMdd HH:mm:ss")

	# set the from address to a valid from address for my relay server
	$msgMessageObject.From = "potatoit@potatoit.potatoit"
	# set the recipient address
	$msgMessageObject.To.Add("potatoit@potatoit.potatoit")
	# set the subject
	$msgMessageObject.subject = $strDateNow + " - Grid Status"
	# set the body of the email
	$msgMessageObject.body = $outputString

	# actually send the email
	$smtpClient.Send($msgMessageObject)
}

What we can do with this is use the Windows scheduler to fire this script every 5 minutes. Now we have a warning for when any of the grid applications go offline.
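For reference, a scheduled task along these lines can be created with schtasks. This is a sketch only: the task name and the script path C:\Scripts\GridStatus.ps1 are placeholders I have made up, and you will probably want the task to run under a suitable service account.

```powershell
# Assumed task name and script path; adjust for your environment.
# /SC MINUTE /MO 5 runs the task every 5 minutes.
schtasks /Create /TN "Grid Status Check" /SC MINUTE /MO 5 `
    /TR "powershell.exe -NoProfile -ExecutionPolicy Bypass -File C:\Scripts\GridStatus.ps1"
```

Passing -ExecutionPolicy Bypass on the command line applies to that one invocation only, which avoids changing the machine-wide policy just for this task.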

If you really go the whole hog, then you could even set up the script so it tries to start the offline application by calling the appropriate REST service, or so it will store the results of the queries for the application and only send reminder notifications every hour – it all comes down to the time and effort vs. the nuisance factor.
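The hourly reminder idea could be sketched with a small state file that records when the last alert went out. Everything named here (Test-ShouldNotify, the state file location, the one hour default) is an assumption for illustration, and the sketch does not clear the state once the applications come back online.

```powershell
# Hypothetical helper: returns $true (and records the time) if an alert should
# be sent now, or $false if one was already sent within the minimum interval.
function Test-ShouldNotify {
    param(
        [Parameter(Mandatory = $true)]
        [string]$StateFile,
        [timespan]$MinimumInterval = [timespan]::FromHours(1),
        [datetime]$Now = (Get-Date)
    )

    if (Test-Path -LiteralPath $StateFile) {
        # The file holds the time of the last alert in round-trip ('o') format
        $text = (Get-Content -LiteralPath $StateFile -Raw).Trim()
        $last = [datetime]::Parse($text, [cultureinfo]::InvariantCulture, [System.Globalization.DateTimeStyles]::RoundtripKind)
        if (($Now - $last) -lt $MinimumInterval) {
            return $false
        }
    }

    Set-Content -LiteralPath $StateFile -Value ($Now.ToString('o'))
    return $true
}

# Assumed location for the state file
$stateFile = Join-Path ([System.IO.Path]::GetTempPath()) 'grid-last-alert.txt'
$outputString = "MDP is offline`n`n"   # stand-in for the result of the real check

if ($outputString.Length -gt 0 -and (Test-ShouldNotify -StateFile $stateFile)) {
    Write-Host 'Alert would be sent now'
}
```

In the full script, the same condition would guard the block that builds and sends the email.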

If you look at the /grid/nodes interface, then there is information about memory and CPU usage, so equally you could create a script which sent you a notification if CPU was above 80%, or memory utilisation was getting high.
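I have not shown the /grid/nodes output here, so the sketch below works on made-up objects: the property names (nodeName, cpuPercent) and the values are hypothetical stand-ins for whatever attributes your grid actually returns. Only the threshold check itself is the point.

```powershell
# Hypothetical helper: returns the items whose named property exceeds a threshold.
function Get-NodesOverThreshold {
    param(
        [Parameter(Mandatory = $true)]
        [object[]]$Nodes,
        [Parameter(Mandatory = $true)]
        [string]$Property,
        [double]$Threshold = 80
    )

    # XML attribute values arrive as strings, so cast before comparing
    $Nodes | Where-Object { [double]($_.$Property) -gt $Threshold }
}

# Made-up sample data; real property names will differ
$sampleNodes = @(
    [pscustomobject]@{ nodeName = 'M3Interactive'; cpuPercent = '12' }
    [pscustomobject]@{ nodeName = 'M3Batch';       cpuPercent = '91.5' }
)

$outputString = ''
foreach ($node in (Get-NodesOverThreshold -Nodes $sampleNodes -Property 'cpuPercent')) {
    $outputString += "$($node.nodeName) CPU is at $($node.cpuPercent)%`n`n"
}
Write-Host $outputString
```

The resulting string can be emailed in the same way as the offline report.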

There is a wealth of information available from the Grid which you can query in an automated fashion, and with a fairly minimal amount of effort you can achieve some pretty nifty things.

To test the script, I’ll shut down MDP.

I run my script, and I get the following error message emailed to me.

If multiple services were offline, it would list them. If no services are offline, I won’t get an email. If the Grid REST service is unavailable, we should get an exception, which my script also captures and emails.

It’s worth pointing out that this is a proof of concept – so you need to test the scenarios properly and tweak for your environment.

And here it is, the script in its unwashed glory.


# this is the address of my internal SMTP server
# through which we will send our email
$smtpRelayServer = "mail.potatoit.potatoit"

#this is the URL to the REST service
$strURL = "http://ifbenp.indfish.co.nz:16201/grid/status"

$outputString = "";

try
{
	# retrieve the status from the grid's REST interface
	$restResults = Invoke-RestMethod -Uri $strURL -Method Get;

	# loop through each of the applicationStatus entries
	foreach($node in $restResults.applications.applicationStatus)
	{
		# look for an offline state
		if($node.offline -eq 'true')
		{
			# save it to a string so we can email the issue
			$outputString = $outputString + $node.applicationName + " is offline`n`n"
		}
	}
}
catch
{
	$outputString = "Failed to retrieve information from the grid ($strURL): $($_.Exception.GetType().FullName)`n`n $($_.Exception.Message)"
}
# only send an email if the length of our string is greater than 0
if($outputString.Length -gt 0)
{
	# create the message object
	$msgMessageObject = new-object Net.Mail.MailMessage

	# create our SMTP client object with the relay server as an argument
	$smtpClient = new-object Net.Mail.SmtpClient($smtpRelayServer)

	# I am going to use the date in the subject of my email
	$strDateNow = [System.DateTime]::Now.ToString("yyyyMMdd HH:mm:ss")

	# set the from address to a valid from address for my relay server
	$msgMessageObject.From = "potatoit@potatoit.potatoit"
	# set the recipient address
	$msgMessageObject.To.Add("potatoit@potatoit.potatoit")
	# set the subject
	$msgMessageObject.subject = $strDateNow + " - Grid Status"
	# set the body of the email
	$msgMessageObject.body = $outputString

	# actually send the email
	$smtpClient.Send($msgMessageObject)
}
This entry was posted in M3 / MoveX, Misc, Monitoring.

21 Responses to The Poormans M3 Monitoring – Powershell and REST

  1. Pingback: The Poormans M3 Monitoring Part 2 – the AUTOJOBS! | Potato IT

  2. Scott, thank you for the idea. Coincidentally, I was recently asked to implement a monitor for M3. When Jonathan Amiran mentioned the page /monitor in your blog post part 2 (thank you Jonathan), I did some digging and found the XML page http://host:22107/grid/application/M3BE_15.1_DEV/status
    I wrote a PowerShell script in a Windows Scheduled Task that checks that there is at least one M3 interactive node that is not offline, that is not shutting down, and that is running.
    More specifically, it ensures that the following XPath returns at least one node: //ns:NodeStatus[@nodeName='M3Interactive' and @offline='false' and @isShuttingDown='false' and @modulesRunning > 0] and it sends an email notification otherwise. /Thibaud

    • potatoit says:

      Very nice!

      For the sake of completeness in the InfoCenter documentation -> Infor ION Grid 11.1.x Product Documentation -> Infor ION Grid Administration Guide -> Monitoring the Grid -> Monitoring Tools -> Monitoring the State of the Grid from a Web Browser
      lists several pages you can use to look at the state of the grid.
      http(s)://:/grid/info.html
      http(s)://:/grid/status.html
      http(s)://:/grid/hosts.html
      http(s)://:/grid/nodes.html
      http(s)://:/grid/ports.html
      You can remove the .html from each of these and get script-parsable content, e.g.
      http(s)://:/grid/info
      http(s)://:/grid/status
      http(s)://:/grid/hosts
      etc

      And the link to the article with the comment from Jonathan
      https://potatoit.wordpress.com/2015/07/18/the-poormans-m3-monitoring-part-2-the-autojobs/

      Cheers,
      Scott

    • And for the sake of completeness:

      My scheduled script is not resilient against Byzantine failures as it depends on the following:
      1) The host operating system (Windows) must be working correctly.
      2) The Windows Scheduled Task must be working correctly.
      3) The specified Grid host must be working correctly (the Infor Grid runs on a distributed set of hosts, and the specified host could be down even though an alternative host could be up, consequently the script would provide a false positive).
      4) The SMTP server must be working correctly.

      Also, the script cannot be installed more than once – for example on two hosts – or the recipients would get double email notifications.

      A foolproof solution would rely on a consensus protocol such as Paxos that works on a distributed network of unreliable parts to agree on a result, but that’s overkill for my requirement.

      An even better solution would be for Infor Product Development to come up with an official solution.

      /Thibaud

      • I forgot to mention that conditions 1, 2, 4 could cause false negatives, i.e. failure to detect that M3 is down. So a foolproof solution would have to prevent false positives and false negatives.

      • I forgot another false positive. The condition expression in my script looks for at least one M3 interactive node. But M3 could be healthy even without M3 interactive nodes. For instance, right after M3 BE restarts, M3 waits for a user to start a program and does not yet have interactive nodes, in which case my script would erroneously detect that M3 is down. So I have to refine my condition. Any suggestions?

      • potatoit says:

        Monitoring the interactive jobs starts to get messy – you can end up with the interactive subsystem in a stale state where it won’t accept any further connections, new connections get pushed to a new instance of the interactive subsystem. The stale instance will terminate once all of the sessions in that subsystem finish which could be many hours.

        Personally, I’d be inclined to only error if I have an interactive job which has high CPU for an extended period of time.

      • Yes, if one interactive subsystem is stale or busy, M3 would create a new instance of the interactive subsystem, and in that case my XPath expression would return more than one XML node so it would consider that M3 is up, which is true, good.

        However, I just ran into the following situation today. It’s the weekend and the interactive subsystems went offline; I think they went idle after no users were working on them, but I’m not sure. In any case, my XPath expression found no results and erroneously triggered a false alarm. I simply started H5 Client, which was sufficient to wake the subsystems up, and my XPath expression found results again. I have to take this idle weekend thing into account. Any suggestions?

      • potatoit says:

        This is the problem. If you end up with weekends, long weekends or public holidays then the interactive subsystem will shut down.

        In a previous version of M3 I had set the minimum number of jobs for one of the subsystems to 1, so if it shut down it would autospawn a new subsystem, but certain subsystems don’t do that. You could try that and it may work now, but I doubt it. IIRC there is a setting which indicates how long it is before a subsystem goes stale, but I’m not sure of the maximum for it and, to be honest, I’d prefer M3 to shut it down on a regular basis.

        You can try scripting the H5 client to run, launch a program and then shutdown, or you could even use the URI on Smart Office to launch a script on start up which launches a program and then that program launches a script to shutdown Smart Office. Very messy. Only other way would be for a user maintainable list of days that the interactive jobs could be down.

        This is one where I think you just end up spending a lot of effort for little gain. If the interactive doesn’t spawn then chances are that we have a catastrophic issue which will affect other systems too (ie. memory exhaustion, JVM hosed, M3 reconfigured).
        Have you seen scenarios that would indicate otherwise?

        Cheers,
        Scott

  3. Maybe I should just check that the /monitor or /status page responds without going into the details inside the page.

  4. Billy Willoughby says:

    I’m not sure if my version would help, but I made a few changes to the script you can try out. Thanks for sharing!


    # this is the address of my internal SMTP server
    # through which we will send our email
    $smtpRelayServer = "mail.relay.server"
    $smtpRelayServerPort = "26"
    $smtpRelayServerUser = "customer"
    $smtpRelayServerPassword = "customerpw"

    $mailTo = "myemail@mydomain.com"
    $mailFrom = "Customer@streamservedev.com"

    #this is the URL to the REST service
    $strURL = "http://be.dev.server:32107/grid/status";
    $strName = "ASC181";

    $messageHead = "This is an automated message, please do not reply.This message is generated by a PowerShell script to check the status of the M3 BE running on " + $strName + ".";

    $outputString = "";

    Write-Host "`n`nConnecting to "$strName

    try
    {
    # retrieve the status from the grids REST interface
    $restResults = Invoke-RestMethod -Uri $strURL -Method Get;

    # loop through each of the applicationStatus entries

    foreach($node in $restResults.applications.applicationStatus)
    {
    #Write-Host "Checking " $node.applicationName "..."
    # look for an offline state
    if($node.offline -eq 'true')
    {
    # save it to a string so we can email the issue
    $outputString = $outputString + $node.applicationName + " is offline"
    $outputString = $outputString.Trim() + "";
    }
    else
    {
    #Comment out this line if you only want to send errors.
    $linkURL = $strURL -replace "status", "application/";
    if($node.applicationName -eq "SYSTEM")
    {
    $linkURL = $strURL -replace "status", "hosts.html";
    $outputString = $outputString + "" + $node.applicationName + " is online/" + $node.globalState + "`tWarnings:" + $node.logWarningCount + " Errors:" + $node.logErrorCount + "";
    }
    else
    {
    $linkURL = $linkURL + $node.applicationName + "/status.html";
    $outputString = $outputString + "" + $node.applicationName + " is online/" + $node.globalState + "`tWarnings:" + $node.logWarningCount + " Errors:" + $node.logErrorCount + "";
    }

    $outputString = $outputString.Trim();
    Write-Host $node.applicationName "is online. (Warnings:"$node.logWarningCount" Errors:"$node.logErrorCount")";
    if($node.applicationName -ne 'SYSTEM')
    {
    $innerURL = $strURL -replace "status", "application/"
    $innerURL = $innerURL + $node.applicationName + "/status"
    Write-Debug $innerURL;

    $innerRestResults = Invoke-RestMethod -Uri $innerURL -Method Get;
    $outputString = $outputString + "";
    foreach($innerNode in $innerRestResults.ApplicationStatus.NodeStatus)
    {
    #Write-Debug "`t"$innerNode.hostName"/"$innerNode.nodeName "Running:"$innerNode.modulesRunning" Stopped:"$innerNode.modulesStopped" Starting:" $innerNode.modulesStarting" Stopping:"$innerNode.modulesStopping;
    $outputString = $outputString + ""+$innerNode.hostName+"/"+$innerNode.nodeName+"Running:"+$innerNode.modulesRunning+"Stopped:"+$innerNode.modulesStopped+"Starting:"+$innerNode.modulesStarting+"Stopping:"+$innerNode.modulesStopping+ "";

    }
    $outputString = $outputString + "";
    }
    }
    }
    }
    catch
    {
    $outputString = "Failed to retrieve information from the grid ($strURL): $($_.Exception.GetType().FullName) $($_.Exception.Message)"
    }
    # only send an email if the length of our string is greater than 0
    if($outputString.Length -gt 0)
    {
    # create the message object
    $msgMessageObject = new-object Net.Mail.MailMessage
    #Message Head
    $outputString = $messageHead + $outputString

    # create our SMTP client object with the relay server as an argument
    $smtpClient = new-object Net.Mail.SmtpClient($smtpRelayServer,$smtpRelayServerPort)
    $SMTPClient.Credentials = New-Object System.Net.NetworkCredential( $smtpRelayServerUser , $smtpRelayServerPassword );

    # I am going to use the date in the subject of my email
    $strDateNow = [System.DateTime]::Now.ToString("yyyyMMdd HH:mm:ss")

    # set the from address to a valid from address for my relay server
    $msgMessageObject.From = $mailFrom
    # set the recipient address
    $msgMessageObject.To.Add($mailTo)
    # set the subject
    $msgMessageObject.subject = $strName + " Grid Status - " + $strDateNow
    # set the body of the email
    $msgMessageObject.IsBodyHtml = 'true';
    $msgMessageObject.body = "" + $outputString + "";

    Write-Host "Attempting to send email using "$smtpRelayServer"..."
    # actually send the email
    $smtpClient.Send($msgMessageObject)
    #Write-Host $outputString;
    }
    else
    {
    Write-Host "Error: Nothing to send"
    }

    Write-Host "Fini"

    • potatoit says:

      Fantastic, thanks Billy. IDs changed as requested 🙂

    • Thanks Billy. Your script checks for attribute offline=true, but that would also trigger false alarms if M3 is just idle because no users are currently using M3, wouldn’t it?

      • Billy Willoughby says:

        In the case I needed the script for, it would be used to send an email after the back up had completed. The backup shuts down the Grid, and when complete, brings it back up. A call was being made to Smart Office to reload the services so the first group in the morning didn’t have to wait for everything to start up. (I don’t remember the exact call, someone else wrote it) By default I’m sending all the status values in an email for someone to review in the morning, but a false positive is possible if the services had not started by the previously mentioned call.

  5. UPDATE: M3 13.3 now includes Notification of events in foundation (Foundation Info Reader).

    From the M3 Core Infrastructure and Technology Release Notes:
    “The M3 Foundation Information Reader is a standalone Grid application bundled with the M3 Core Technology. The application is a consumer of the system information xml produced by M3 Foundation’s System Information Provider feature.
    The Info Reader has its own Grid Management pages from where it’s possible to create and configure rules. As an example the application is able to send an email if a job is found looping. The application is shipped with one default rule; if a job is using more than 40 percent CPU and has an activity level value that is higher than 999 over the course of 5 minutes the job will be logged and an email sent, if the email functionality has been configured.
    Apart from this rule an arbitrary number of rules can be added to the system. The user can, for example, be notified when a critical auto job stops running or when a new item has been logged to the news log. The system provides a web based view over which jobs are currently monitored by the system. The web view can be found by following the link “Open web view” in the application management pages.
    The email settings are configured in the “General settings” as well as the session provider username and password needed to connect to the Foundation system information provider. The link “Node settings” provides the ability to add arbitrary rules. To fully understand these settings the user is recommended to read the document “M3 Foundation System Information Provider Technical Reference Guide”.”

  6. Pingback: the Poormans M3 Benchmarking and Monitoring | Potato IT

  7. Scott et al., have you seen this?
    The Halcyon Guide to Infor M3 (Movex) Templates, with Donnie MacColl
    https://www.halcyonsoftware.com/guides/Guide-Infor-M3-Templates.pdf

    • potatoit says:

      No, I hadn’t seen this, thank you.

      I was aware of Halcyon, though I wasn’t aware of exactly what it monitored.

      Cheers,
      Scott

      • Scott. Yes, and I like the extensive and detailed collection of rules. We can implement them using your techniques. I looked up Donnie to propose he writes a blog post about it, but according to his LinkedIn he’s not at Halcyon anymore. If someone from Halcyon reads this, please contact me (or another M3 blog) to come share more information and screenshots about your product.

  8. FYI – You forgot the tag “Monitoring” on this post, and it took me a while to find it back.
