M3 Business Engine and Grid Monitoring

A wee while ago I posted about some monitoring scripts that I wrote which also provide trending – but I was pretty slack and never got around to posting them. Here they are. 🙂

This is a little light on the details but it should at least provide people with some ideas on how to achieve some fairly comprehensive monitoring for free.  There is a lot that can be done to automate the deployment of this – but it’s a project for another day and there is a lot in this post as it is.

Note: I’ve only used these scripts on M3 13.2 and 13.3.

If you do intend to use these scripts and modify or enhance them, then I ask that you share the enhancements so everyone can benefit.  I’ll be looking at setting up a GitHub page to make it easier.

We have a combination of Perl and PowerShell scripts, and for convenience I’ve created a Linux VM which has an FTP server, rrdtool and the Perl packages that the scripts use. SUSE provide some neat tools which allow you to build a JeOS low-footprint Linux distro.

You can retrieve the VM from http://susestudio.com by searching for M3 Monitoring.

The default username is root and the password is linux (something you should change when you install :-)). It’s also set up for DHCP, again something that you may want to change.

Then use the scripts from this post, which I drop in the /root directory.

The Scripts

The Perl and bash scripts will need to be made executable; this can be done by issuing the command:

chmod +x <script name>
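
If you have dropped several scripts into one directory, something along these lines would save running chmod once per file. It is only a sketch: the make_scripts_executable name is made up for illustration, and it assumes the scripts end in .pl or .sh.

```shell
#!/bin/sh
# Sketch only: make_scripts_executable is a made-up helper name, and the
# .pl/.sh extensions are an assumption about how the scripts are named.
make_scripts_executable() {
    dir="$1"
    for script in "$dir"/*.pl "$dir"/*.sh; do
        # An unmatched pattern is left as literal text, so check the file exists.
        if [ -f "$script" ]; then
            chmod +x "$script"
        fi
    done
}

# Example: the scripts in this post are dropped in /root
# make_scripts_executable /root
```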

monitorM3.pl

monitorM3.pl <business engine hostname> <monitor port> <grid port> <configuration file path>
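
As a rough idea of how this gets called in practice, the sketch below prints a crontab line that runs the check every five minutes. The host name and ports are borrowed from the example in the script's own usage message; the /root/monitorM3.conf path, the five minute interval and the build_monitor_command helper are assumptions for illustration only, so adjust them to suit.

```shell
#!/bin/sh
# Sketch only: build_monitor_command is a hypothetical helper and the
# config file path is an assumption, not something the script mandates.
build_monitor_command() {
    host="$1"
    monitor_port="$2"
    grid_port="$3"
    config_file="$4"
    echo "/root/monitorM3.pl $host $monitor_port $grid_port $config_file"
}

# Print a crontab entry that runs the check every five minutes.
echo "*/5 * * * * $(build_monitor_command ifbenp 16308 16301 /root/monitorM3.conf)"
```

The printed line can be pasted into the crontab of whichever user runs the monitoring.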

This script is the workhorse – it will read from the grid and from the monitor port of the M3 Business Engine and log most of the counters into an rrdtool archive. It will also check for excessive changes in the interactive jobs (to flag looping interactive jobs) and send an email if this continues over a predefined number of checks. Equally, it will look for excessive CPU usage over a number of checks and send email notifications.

It will check that the autojob count is as expected; if not, it will send an email.  Likewise, if a grid application is offline it will send a notification email.

It is smart enough not to spam you on every check – rather it will only send notification emails every x number of checks.
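
The idea behind the throttling can be sketched in a few lines of shell. This is not the Perl from the script, just an illustration of the same counting approach: a state file holds the number of consecutive failed checks, an email goes out on the first failure and then again whenever the count divides evenly by the interval. The function names and the state file handling here are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: increment_file and should_notify are illustrative names,
# not functions taken from monitorM3.pl.

# Increment the counter stored in a state file and print the new value.
increment_file() {
    state_file="$1"
    count=0
    if [ -f "$state_file" ]; then
        count=$(cat "$state_file")
    fi
    count=$((count + 1))
    echo "$count" > "$state_file"
    echo "$count"
}

# Print "send" on the first failed check and on every Nth failed check
# after that, otherwise print "skip".
should_notify() {
    count="$1"
    every="$2"
    if [ "$count" -eq 1 ] || [ $((count % every)) -eq 0 ]; then
        echo "send"
    else
        echo "skip"
    fi
}
```

With a check every 5 minutes and an interval of 12, this would mean one email when the problem first appears and a reminder roughly every hour while it persists.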

#!/usr/bin/perl
#
#	Name:	monitorM3.pl
#	Arguments:
#			monitorM3.pl <M3BE Hostname> <monitor counter port> <grid port> <config file>
#	Description:
#			This script checks certain subsystems and sends alerts if a subsystem
#			doesn't meet certain criteria.  It will also log many counters in to a
#			rrdtool rra database so we can graph the data.
#			It does this by retrieving specific grid .xml files and parsing the content
#
#	**** Configuration that needs to be changed ****
#		$emailFrom		- the from address of the notification emails
#		$emailTo		- where we will send notification emails
#		$userName		- username to log in to the grid
#		$passWord		- password to log in to the grid
#		$autojobCount	- this can be different depending on how a customer's environment is set up.  It should reflect the number of autojobs expected to be running
#		$smtpRelayServer	- the mail relay server (blank if we can send directly)
#
#	**** Control Variables ****
#	these variables control how the script operates
#		$checkFile		- when this file exists in the $stateVariablePath/<M3BE Hostname>/<monitor counter port> directory, we suppress monitoring.  This is useful if the environment is down for planned maintenance so we don't get down notifications
#
#	By:		Scott Campbell (scott.campbell@potatoit.kiwi)
#
#	History:
#		2015		SAC	- initial cut completed
#		20160112	SAC	- Corrected issue where the autojob notifications wouldn't fire when the autojobs were down
#						- we are smarter with our email alerts, we don't delete our don't spam me file, when we
#							want to send a 'still down' notification.  We now see if the increment count evenly divides
#							in to the $checksBetweenEmails.  This way we can keep track of the total number of checks we have been down
#		20160118	SAC	- retrieveXMLFromServer updated so it will include a content header to allow us to retrieve the nodes under
#							13.3
#						- added support for mail relay servers
#		20160119	SAC	- added base code for configuration file
#		20160122	SAC	- /monitor under 13.3 has a slightly different structure for the ThreadPools->System, we now allow for it
#		20160201	SAC	- cleaned up the reading of the parameter file so it now supports comments with a #
#		20160210	SAC	- emails not being sent out, declared the %mail variable in the wrong place
#		20160211	SAC	- we check for excessive change counts on the user interactive subsystem
#						- we log all the counters now
#		20160212	SAC	- archives now have LAST and MIN added

use RRDs;
use File::Path qw(make_path remove_tree);
use LWP;
use XML::XPath;
use XML::XPath::XMLParser;
use Mail::Sendmail;
use DateTime;
use DateTime::Format::DateParse;
use Scalar::Util qw(looks_like_number);
use File::Basename;

########
###
### System Specific configuration settings
###
########

###
# Email settings
###

# the from email address of the notifications
my $emailFrom = 'M3Monitoring@potatoit.kiwi';
# where we will send the emails to
my $emailTo = 'monitoring@potatoit.kiwi';

# this is the number of failed checks before we send another
# email.  If we check every 5 minutes, then if we set the check
# to 12, then we will get a notification every 60 minutes (5 x 12)
my $checksBetweenEmails = 12;

###
# user/password to log in to the monitor
###
my $userName = '';
my $passWord = '';

###
# SMTP Relay Server
#	can be blank to send directly
###
my $smtpRelayServer = '';

###
# thresholds
###

# expected autojob count
my $autojobCount = 54;

########################################
###
### Nothing below this line should need to be changed
###
########################################

# check the interactive jobs for a sustained change of >= this value
my $interactiveExcessiveChangeThreadshold = 100;
# we need to have an excessive change count for this number of checks before
# we trigger a warning email
my $interactiveExcessiveChangeThreadsholdCount = 20;

# we will check count times before sending a warning email
my $excessiveCPUUsageCount = 5;
# if we have high cpu on a thread, we will increment a counter
# once that counter exceeds excessiveCPUUsageCount we will send
# a warning email
my $excessiveCPUThreshold = 20;

# how many checks we should make of excessive busy
# threads in the autojobs before we trigger an email
my $maxAutojobWaitCount = 5;

###
# paths and control files
###

# this is where we will store our
# monitoring variables
my $stateVariablePath = '/var/lib/jbcmon/state';

# define location of rrdtool databases
my $rrdDatabasePath = '/var/lib/jbcmon/server';

# where we will output the graph images
my $imagePath = '/srv/www/htdocs/server';

# don't bother checking the monitoring
my $checkFile = 'stopMonitoring';

# control file to prevent emails too frequently
my $stopSpammingWithM3BEEmailsFile = 'dontSpamMeM3BE';
# control file to prevent emails too frequently
my $stopSpammingWithGridEmailsFile = 'dontSpamMeGrid';

# where we shall store our monitoring variables
my $monitoringM3BEPath = '/monitor';
# this is the path to the general grid status xml file
# which allows us to check the status of the different applications
my $monitoringGridApplicationsPath = '/grid/status';

# this provides us with some data for monitoring the nodes themselves
my $monitoringGridNodesPath = '/grid/nodes';

# the prefix for excessive changes state files
my $excessiveChangePrefix = 'xchg_';

my @autojobJVMArray;
my @interactiveJVMArray;
my @batchJVMArray;
my @mijobsJVMArray;

my $graphOnly = 0;
my $debug = 0;

########
###
### Start of code
###
########

my $gridName = "";

my $num_args = $#ARGV + 1;

my $browser = LWP::UserAgent->new;

if($num_args < 3)
{
	print "\nUsage: monitorM3.pl <server> <monitor port> <grid port> [<config file> [debug]]\n\teg. monitorM3.pl ifbenp 16308 16301\n";
	die;
}
else
{
	my $server_url = $ARGV[0];
	my $monitorPort = $ARGV[1];
	my $gridPort = $ARGV[2];

	if($num_args > 3)
	{
		# read the configuration file
		readConfigFile($ARGV[3]);

		my $argPosition = 0;
		foreach(@ARGV)
		{
			if($argPosition > 3)
			{
				if($_ eq 'debug')
				{
					$debug = 1;
				}
				#elsif($_ eq 'graphOnly')
				#{
				#	$graphOnly = 1;
				#}
			}
			$argPosition++;
		}
	}

	$stateVariablePath = "$stateVariablePath/$server_url/$monitorPort";
	$rrdDatabasePath = "$rrdDatabasePath/$server_url/$monitorPort";
	$imagePath = "$imagePath/$server_url/$monitorPort";

	debugInfo("State Path: $stateVariablePath\n");
	debugInfo("RRD Path: $rrdDatabasePath\n");
	debugInfo("Images Path: $imagePath\n\n");

	# check to see if we have our state directory, if not create it
	if(! -d $stateVariablePath)
	{
		make_path($stateVariablePath) or die "Error creating directory: $stateVariablePath";
	}

	# check to see if we have our rrd database directory, if not create it
	if(! -d $rrdDatabasePath)
	{
		make_path($rrdDatabasePath) or die "Error creating directory: $rrdDatabasePath";
	}
	# check to see if we have our image directory, if not create it
	if(! -d $imagePath)
	{
		make_path($imagePath) or die "Error creating directory: $imagePath";
	}

	if(-e "$stateVariablePath/$checkFile")
	{
		debugInfo("\n...suppressed check...\n");
		$checkFileIncrement = incrementFile("$stateVariablePath/$checkFile");

		if($checkFileIncrement < 2)
		{
			sendEmailCheckIfShouldBeSent($emailFrom, $emailTo, "$server_url checking suppressed", "") or warn "Cant send mail: $Mail::Sendmail::error";
		}
	}
	else
	{
		my $emailApplicationBody = "";

		# check the different grid applications state

		debugInfo("RetrieveXML From Server: http://$server_url:$gridPort$monitoringGridApplicationsPath\n");
		my $rawStatus = retrieveXMLFromServer("http://$server_url:$gridPort", $monitoringGridApplicationsPath, "0");

		if(length($rawStatus) > 0)
		{
			my $xpathStatus = XML::XPath->new($rawStatus);

			my $applications = $xpathStatus->findnodes("//applications");

			if($applications)
			{
				foreach my $currentApplicationsNode ($applications->get_nodelist)
				{
					$gridName = $currentApplicationsNode->getAttribute('gridName');
					debugInfo(" Grid: $gridName\n");
				}
			}

			my $applicationStatus = $xpathStatus->findnodes("//applicationStatus");

			foreach my $currentApplicationNode ($applicationStatus->get_nodelist)
			{
				my $applicationName = $currentApplicationNode->getAttribute('applicationName');
				my $applicationOfflineStatus = $currentApplicationNode->getAttribute('offline');

				if($applicationOfflineStatus eq "true")
				{
					$emailApplicationBody = "$emailApplicationBody\n $applicationName is offline!\n";
					debugInfo("\t:-( $applicationName is offline\n");
				}
				else
				{
					debugInfo("\t:-) $applicationName is online\n");
				}
			}
		}
		else
		{
			$emailApplicationBody = "Failed to retrieve the state of the Grid applications $server_url:$gridPort/$monitoringGridApplicationsPath";

			debugInfo(" Failed to retrieve the state of the Grid applications $server_url:$gridPort/$monitoringGridApplicationsPath\n");
		}

		debugInfo("RetrieveXML From Server: http://$server_url:$gridPort$monitoringGridNodesPath\n");
		#
		my $rawNodes = retrieveXMLFromServer("http://$server_url:$gridPort", $monitoringGridNodesPath, "0");
		if(length($rawNodes) > 0)
		{
			my $xpathStatus = XML::XPath->new($rawNodes);

			my $nodes = $xpathStatus->findnodes("//NodeStatus");

			open my $nodeFD, '>', "$rrdDatabasePath/nodes";

			foreach my $currentNode ($nodes->get_nodelist)
			{
				my $nodeName = $currentNode->getAttribute('nodeName');
				if(length($nodeName) > 0)
				{
					my $nodeCPUPercentage = $currentNode->getAttribute('cpuPercent');
					my $nodeLogErrCount = $currentNode->getAttribute('logErrCount');
					my $nodeLogWarnCount = $currentNode->getAttribute('logWarnCount');
					my $nodeLogSysErrCount = $currentNode->getAttribute('logSysErrCount');
					my $nodeLogSysWarnCount = $currentNode->getAttribute('logSysWarnCount');
					my $nodeMemoryMax = $currentNode->getAttribute('memoryMax');
					my $nodeMemoryUsed = $currentNode->getAttribute('memoryUsed');
					my $nodehostName = $currentNode->getAttribute('hostName');

					updateRRDFile("$rrdDatabasePath/nodesStatus_$nodehostName\_$nodeName\_CPUPercentage", $timeStamp, "", $nodeCPUPercentage, "GAUGE", "CPU Usage");
					updateRRDFile("$rrdDatabasePath/nodesStatus_$nodehostName\_$nodeName\_MemoryMax", $timeStamp, "", $nodeMemoryMax, "GAUGE", "Memory Max");
					updateRRDFile("$rrdDatabasePath/nodesStatus_$nodehostName\_$nodeName\_MemoryUsed", $timeStamp, "", $nodeMemoryUsed, "GAUGE", "Memory Used");

					updateRRDFile("$rrdDatabasePath/nodesStatus_$nodehostName\_$nodeName\_LogErrCount", $timeStamp, "", $nodeLogErrCount, "COUNTER", "Log Error Count");
					updateRRDFile("$rrdDatabasePath/nodesStatus_$nodehostName\_$nodeName\_LogWarnCount", $timeStamp, "", $nodeLogWarnCount, "COUNTER", "Log Warning Count");
					updateRRDFile("$rrdDatabasePath/nodesStatus_$nodehostName\_$nodeName\_LogSysErrCount", $timeStamp, "", $nodeLogSysErrCount, "COUNTER", "Log System Error Count");
					updateRRDFile("$rrdDatabasePath/nodesStatus_$nodehostName\_$nodeName\_LogSysWarnCount", $timeStamp, "", $nodeLogSysWarnCount, "COUNTER", "Log System Warn Count");

					print $nodeFD "$nodehostName\_$nodeName\n";

					debugInfo("\t$nodeName statistics logged\n");
				}
			}
			close($nodeFD);
		}
		else
		{
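			# append rather than overwrite, so we keep any offline applications already reported
			$emailApplicationBody = "$emailApplicationBody\nFailed to retrieve the state of the Grid nodes $server_url:$gridPort$monitoringGridNodesPath\n";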
		}

		debugInfo("RetrieveXML From Server: http://$server_url:$monitorPort$monitoringM3BEPath\n");
		my $rawM3BEContent = retrieveXMLFromServer("http://$server_url:$monitorPort", $monitoringM3BEPath, "1");

		if(length($rawM3BEContent) > 0)
		{
			my $xpathM3BEContent = XML::XPath->new($rawM3BEContent);

			my $categories = $xpathM3BEContent->findnodes("//category");

			foreach my $currentM3BENode ($categories->get_nodelist)
			{
				my $name = $currentM3BENode->getAttribute('name');

				if($name eq "autojobs" || $name eq "interactivejobs" || $name eq "batch" || $name eq "mijobs")
				{
					processM3BEJob($currentM3BENode);
				}
				elsif($name eq "jvms")
				{
					my $timeStamp = $currentM3BENode->getAttribute('timestamp');
					my $parsedTime = $timeStamp;

					$parsedTime =~ tr/T/ /;

					my $dt = DateTime::Format::DateParse->parse_datetime($parsedTime);

					my $jvms = $currentM3BENode->findnodes("./jvm");

					foreach my $currentJVM ($jvms->get_nodelist)
					{
						processJVMs($currentJVM, $dt->strftime('%b %d %Y %H:%M'), $server_url);
					}
				}
			}
			# now we will process the counters - we want to process all of the other jobs first to get the JVM IDs
			foreach my $currentM3BENode ($categories->get_nodelist)
			{
				my $name = $currentM3BENode->getAttribute('name');

				my $timeStamp = $currentM3BENode->getAttribute('timestamp');
				my $parsedTime = $timeStamp;

				$parsedTime =~ tr/T/ /;

				my $dt = DateTime::Format::DateParse->parse_datetime($parsedTime);

				if($name eq "counters")
				{
					my $subsystemCounters = $currentM3BENode->findnodes("./subsystem");
					foreach my $currentSubsystem ($subsystemCounters->get_nodelist)
					{
						my $subsystemAddressPort = $currentSubsystem->getAttribute('addressPort');
						my $subsystemType;

						# \Q...\E stops characters such as '.' in the address being treated as regex metacharacters
						if (grep(/^\Q$subsystemAddressPort\E$/, @autojobJVMArray))
						{
							$subsystemType = "auto";
						}
						elsif (grep(/^\Q$subsystemAddressPort\E$/, @interactiveJVMArray))
						{
							$subsystemType = "interactive";
						}
						elsif (grep(/^\Q$subsystemAddressPort\E$/, @batchJVMArray))
						{
							$subsystemType = "batch";
						}
						elsif (grep(/^\Q$subsystemAddressPort\E$/, @mijobsJVMArray))
						{
							$subsystemType = "mi";
						}
						else
						{
							$subsystemType = "unknown";
						}
						my $counters = $currentSubsystem->findnodes("./counter");

						foreach my $currentCounter ($counters->get_nodelist)
						{
							#
							processCounters($currentCounter, $dt->strftime('%b %d %Y %H:%M'), $subsystemType);
						}
					}
				}
			}
		}
		else
		{
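			# append rather than overwrite, so we keep any earlier grid errors already reported
			$emailApplicationBody = "$emailApplicationBody\nFailed to retrieve the monitoring data $server_url:$monitorPort$monitoringM3BEPath\n";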
		}

		if(length($emailApplicationBody) > 0)
		{
			my $shouldWeSendEMail = 1;

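			# only the first failed check sends an email straight away; after that
			# the control file exists and we fall back to the periodic reminder
			if(-e "$stateVariablePath/$stopSpammingWithGridEmailsFile")
			{
				$shouldWeSendEMail = 0;
			}

			my $spamIncrementCount = incrementFile("$stateVariablePath/$stopSpammingWithGridEmailsFile");

			# send a 'still down' reminder every $checksBetweenEmails failed checks
			if($shouldWeSendEMail == 0 && ($spamIncrementCount % $checksBetweenEmails) == 0)
			{
				$shouldWeSendEMail = 1;
			}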

			if($shouldWeSendEMail == 1)
			{
				my $currentDT = DateTime->now->set_time_zone('local');
				my $currentDate = $currentDT->ymd;
				my $currentTime = $currentDT->hms;

				$emailApplicationBody = "Some grid applications are offline.\n$emailApplicationBody.\nChecks with error $spamIncrementCount";
				sendEmailCheckIfShouldBeSent($emailFrom, $emailTo, "$gridName - Grid Application Monitoring $currentDate $currentTime Error" , $emailApplicationBody);
			}
		}
		else
		{
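			if(-e "$stateVariablePath/$stopSpammingWithGridEmailsFile")
			{
				# the date/time variables above are scoped to the error branch, so work them out again here
				my $currentDT = DateTime->now->set_time_zone('local');
				my $currentDate = $currentDT->ymd;
				my $currentTime = $currentDT->hms;

				sendEmailAlways($emailFrom, $emailTo, "$gridName - Grid Application Monitoring $currentDate $currentTime Error cleared", "") or warn "Cant send mail: $Mail::Sendmail::error";
			}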

			clearFile("$stateVariablePath/$stopSpammingWithGridEmailsFile");
		}

	}

	cleanUpOrphanedExcessChangeFiles();
}

# Process the counters that we get from the /monitor xml file.
# We will write the statistics to the rrd archive
# Args:
#	$_[0]	=	counter element to process
#	$_[1]	=	timestamp
#	$_[2]	=	type of subsystem
sub processCounters
{
	my $currentCounter = $_[0];
	my $timeStamp = $_[1];
	my $subsystemType = $_[2];

	my $name = $currentCounter->getAttribute('name');
	my $value = $currentCounter->getAttribute('value');

	if($name =~ /Node Counters->BusyMonitor->Memory Usage Threshold$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_BusyMonitor_MemoryUsageThreshold", $timeStamp, "ndcputotals", $value, "GAUGE", "BusyMonitor Memory Usage Threshold");
	}
	elsif($name =~ /Node Counters->BusyMonitor->Memory Usage$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_BusyMonitor_MemoryUsage", $timeStamp, "ndcputotals", $value, "GAUGE", "BusyMonitor Memory Usage");
	}
	elsif($name =~ /Node Counters->CPU Usage->Total$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_CPUUsage_Totals", $timeStamp, "ndcputotals", $value, "GAUGE", "CPU Usage Total");
	}
	elsif($name =~ /Node Counters->JDBC Configuration Connection Pool->Released and closed$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_JDBCConfigurationConnectionPool_Releasedandclosed", $timeStamp, "ndcputotals", $value, "COUNTER", "JDBC Configuration Connection Pool Released and closed");
	}
	elsif($name =~ /Node Counters->Memory->Max Heap$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_Memory_MaxHeap", $timeStamp, "maxheap", $value, "GAUGE", "Max Heap");
	}
	elsif($name =~ /Node Counters->Memory->Peak Heap$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_Memory_PeakHeap", $timeStamp, "peakheap", $value, "GAUGE", "Peak Heap");
	}
	elsif($name =~ /Node Counters->Memory->Used Heap$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_Memory_UsedHeap", $timeStamp, "usedheap", $value, "GAUGE", "Used Heap");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Avg Requests Time$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Avg Requests Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_AvgReqTime", $timeStamp, "avgreqtm", $value, "GAUGE", "Avg Requests Time");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Created Threads$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Created Threads$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_NodeCounter_ThreadPools_Created Threads", $timeStamp, "ndcputotals", $value, "COUNTER", "Thread Pools Created Threads");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Live Threads$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Live Threads$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_LiveThreads", $timeStamp, "live", $value, "GAUGE", "Live Threads");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Queued Threads$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Queued Threads$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_QueuedThreads", $timeStamp, "queued", $value, "GAUGE", "Queued Threads");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Total Denied$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Total Denied$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_TotalDenied", $timeStamp, "denied", $value, "COUNTER", "Total Denied");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Total Queued Time$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Total Queued Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_TotalQueuedTime", $timeStamp, "queued", $value, "COUNTER", "Total Queued Time");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Total Queued$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Total Queued$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_TotalQueued", $timeStamp, "queued", $value, "GAUGE", "Total Queued");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Total Requests Time$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Total Requests Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_TotalRequestTime", $timeStamp, "requesttm", $value, "GAUGE", "Total Req. Time");
	}
	elsif($name =~ /Node Counters->ThreadPools->SYSTEM->Total Requests$/ || $name =~ /Node Counters->ThreadPools->SYSTEM\/SYSTEM->Total Requests$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_ThreadPool_System_TotalRequest", $timeStamp, "requests", $value, "COUNTER", "Total Requests");
	}
	elsif($name =~ /Subsystem:Complex Pool Read-only->Lookups$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\SubsystemCompexPoolReadOnlyLookups", $timeStamp, "requests", $value, "COUNTER", "Subsystem Complex Pool Readonly lookups");
	}
	elsif($name =~ /Subsystem:Complex Pool Read-only->Open connections$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\SubsystemCompexPoolReadOnlyOpenConnections", $timeStamp, "requests", $value, "GAUGE", "Subsystem Complex Pool Readonly Open Connections");
	}
	elsif($name =~ /Subsystem:DB Handle Pool->DBHandlePool psCount$/)
	{
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Avg Requests Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_AvgReqTime", $timeStamp, "requesttm", $value, "GAUGE", "Avg. Req Time");
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Queued Threads$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_QueuedThreads", $timeStamp, "queued", $value, "GAUGE", "Queued Threads");
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Running Threads$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_RunningThreads", $timeStamp, "queued", $value, "GAUGE", "Running Threads");
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Total Denied$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_TotalDenied", $timeStamp, "denied", $value, "COUNTER", "Total Denied");
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Total Queued Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_TotalQueuedTime", $timeStamp, "qtime", $value, "COUNTER", "Total Queued Time");
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Total Queued$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_TotalQueued", $timeStamp, "totalq", $value, "GAUGE", "Total Queued");
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Total Requests Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_TotalReqTime", $timeStamp, "totalreq", $value, "COUNTER", "Total Requests Time");
	}
	elsif($name =~ /Subsystem:ProxyRequests:Subsystem->Total Requests$/)
	{
		#print "$subsystemType, value=$value\n";
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_ProxyRequests_TotalRequests", $timeStamp, "totalreq", $value, "COUNTER", "Total Requests");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Average Delivery Count$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_AvgDelCount", $timeStamp, "deltime", $value, "GAUGE", "Avg. Delivery Count");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Average Delivery Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_AvgDelTime", $timeStamp, "deltime", $value, "GAUGE", "Avg. Delivery Time");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Average Event Size$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_AvgEventSize", $timeStamp, "deltime", $value, "GAUGE", "Avg. Event Size");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Duplicates Discarded$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_DiplicatesDiscarded", $timeStamp, "deltime", $value, "COUNTER", "Duplicates Discarded");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Events Received$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_EventsRec", $timeStamp, "eventsrec", $value, "COUNTER", "Events Recd.");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Failed Delivery Attempts$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_FailedDelAttempts", $timeStamp, "faileddel", $value, "COUNTER", "Failed Del Attempts");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Moving Average Delivery Count$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_MvAverageDeliveryCount", $timeStamp, "buffiltm", $value, "GAUGE", "Mov. Avg. Delivery Count");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Moving Average Delivery Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_MvAverageDeliveryTime", $timeStamp, "buffiltm", $value, "GAUGE", "Mov. Avg. Delivery Time");
	}
	elsif($name =~ /Subsystem:Publisher:M3->Rejected Delivery Attempts$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_RejectedDelAttempts", $timeStamp, "faileddel", $value, "COUNTER", "Rejected Del Attempts");
	}
	elsif($name =~ /Subsystem:Publisher->Average Buffer Fill Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_AverageBufferFillTime", $timeStamp, "buffiltm", $value, "GAUGE", "Avg. Buffer Fill Time");
	}
	elsif($name =~ /Subsystem:Publisher->Average Commit Duration$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_AverageCommitDuration", $timeStamp, "comdur", $value, "GAUGE", "Avg. Commit Duration");
	}
	elsif($name =~ /Subsystem:Publisher->Average Transaction Size$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_MoveAverageBufferTransSize", $timeStamp, "comdur", $value, "GAUGE", "Avg. Transaction Size");
	}
	elsif($name =~ /Subsystem:Publisher->Moving Average Buffer Fill Time$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_MoveAverageBufferFillTime", $timeStamp, "comdur", $value, "GAUGE", "Mov Avg. Buffer Fill Time");
	}
	elsif($name =~ /Subsystem:Publisher->Moving Average Commit Duration$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_MoveAverageCommitDuration", $timeStamp, "comdur", $value, "GAUGE", "Mov Avg. Commit Duration");
	}
	elsif($name =~ /Subsystem:Publisher->Moving Average Event Size$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_MoveAverageEventSize", $timeStamp, "comdur", $value, "GAUGE", "Mov Avg. Event Size");
	}
	elsif($name =~ /Subsystem:Publisher->Moving Average Transaction Size$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_MoveAverageTransactionSize", $timeStamp, "comdur", $value, "GAUGE", "Mov Avg. Transaction Size");
	}
	elsif($name =~ /Subsystem:Publisher->Rejections$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_Publisher_Rejections", $timeStamp, "comdur", $value, "COUNTER", "Rejections");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.approxSize$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_ApproxSize", $timeStamp, "apprsize", $value, "GAUGE", "Size");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.hitRate$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_HitRate", $timeStamp, "hitrate", $value, "COUNTER", "Hit Rate");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.keyCount$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_Keycount", $timeStamp, "apprsize", $value, "GAUGE", "Key count");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.readRequests$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_ReadRequests", $timeStamp, "readreq", $value, "COUNTER", "Read Req.");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.receivedInvalidates$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_RecInvalidates", $timeStamp, "invalid", $value, "COUNTER", "Invalidates");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.receivedResets$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_RecResets", $timeStamp, "resets", $value, "COUNTER", "Resets");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.recordCount$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_RecordCount", $timeStamp, "reccount", $value, "COUNTER", "Record count");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.sentChanges$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_SentChanges", $timeStamp, "changes", $value, "COUNTER", "Sent Changes");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.sizeReductionAverage$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_SizeReductionAverage", $timeStamp, "apprsize", $value, "GAUGE", "Size Reduction Average");
	}
	elsif($name =~ /Subsystem:Smart Cache->SmartCache.sizeReductionCount$/)
	{
		updateRRDFile("$rrdDatabasePath/$subsystemType\_subsystem_SmartCache_SizeReductionCount", $timeStamp, "changes", $value, "COUNTER", "Size Reduction Count");
	}
}

# Args:
#	$_[0]	= the JVM node
#	$_[1]	= timestamp of this record
#	$_[2]	= the name of the server we queried
sub processJVMs
{
	my $currentJVM = $_[0];
	my $timeStamp = $_[1];
	my $computerName = $_[2];

	my $type = $currentJVM->getAttribute('type');
	my $jvmID = $currentJVM->getAttribute('JVMId');
	my $heapSizeKB = $currentJVM->getAttribute('heapSizeKB');
	my $threads = $currentJVM->getAttribute('threads');
	my $jobCount = $currentJVM->getAttribute('jobCount');
	my $status = $currentJVM->getAttribute('status');

	if($status eq 'normal')
	{
		if($type eq 'autojobs')
		{
			push @autojobJVMArray, $jvmID;
			my $count = scalar @autojobJVMArray;
			my $value = $autojobJVMArray[-1];
			# print "\t\t ** Autojob array count ** $count, Value = $value\n";
		}
		elsif($type eq 'interactivejobs')
		{
			push @interactiveJVMArray, $jvmID;
		}
		elsif($type eq 'batchjobs')
		{
			push @batchJVMArray, $jvmID;
		}
		elsif($type eq 'mijobs')
		{
			push @mijobsJVMArray, $jvmID;
		}

		my $baseDatabasePath = "$rrdDatabasePath/";

		updateRRDFile("$baseDatabasePath$type\_heapSizeKB", $timeStamp, "HeapSizeKB", $heapSizeKB, "GAUGE", "Heap Size KB");
		updateRRDFile("$baseDatabasePath$type\_Threads", $timeStamp, "Threads", $threads, "GAUGE", "Threads");
		updateRRDFile("$baseDatabasePath$type\_JobCount", $timeStamp, "JobCount", $jobCount, "GAUGE", "Job Count");
	}
}

# processM3BEJob
#	process the jobs from the /monitor xml file
# Args:
#	$_[0]	= category node with job nodes
sub processM3BEJob
{
	my $jobCategory = $_[0];

	my $categoryName = $jobCategory->getAttribute('name');
	my $categoryJobCount = $jobCategory->getAttribute('count');
	my $categoryTotalCPUUsage = $jobCategory->getAttribute('totCpuUsage');

	debugInfo("\t processM3BEJob() from /monitor $categoryName count: $categoryJobCount, cpu: $categoryTotalCPUUsage, expected Autojob Count: $autojobCount\n");

	my $emailBody;
	my $emailSubject;
	my $emailAutojobMessage;
	my $emailCPUMessage;

	if(!(($categoryName eq 'autojobs') && ($categoryJobCount < ($autojobCount))))
	{
		my $xpathJobContent = XML::XPath->new(context => $jobCategory);
		my $jobs = $xpathJobContent->findnodes("./job");
		$emailSubject = 'M3 Monitoring Alert';

		foreach my $currentJob ($jobs->get_nodelist)
		{
			# current jobs variables extracted
			my $name = $currentJob->getAttribute('name');
			my $cpu = $currentJob->getAttribute('cpuUsage');
			my $status = $currentJob->getAttribute('status');
			my $owner = $currentJob->getAttribute('owner');
			my $type = $currentJob->getAttribute('type');
			my $activityLevel = $currentJob->getAttribute('activityLevel');
			my $threadID = $currentJob->getAttribute('threadId');
			my $jvmID = $currentJob->getAttribute('JVMId');

			# the \_ stops Perl reading the underscore as part of the variable name
			my $filenamePrefix = "$type\_";

			# check for excessive CPU usage
			if($cpu > $excessiveCPUThreshold)
			{
				my $incrementCPUCount = incrementFile("$stateVariablePath/$filenamePrefix$name\_$threadID");
				if($incrementCPUCount >= $excessiveCPUUsageCount)
				{
					$emailBody = "$emailBody$name (Type: $type, Thread ID: $threadID) has exceeded CPU of $excessiveCPUThreshold for the last $excessiveCPUUsageCount checks\n";
					clearFile("$stateVariablePath/$filenamePrefix$name\_$threadID");
				}
			}
			else
			{
				clearFile("$stateVariablePath/$filenamePrefix$name\_$threadID");
			}

			# check for type specific issues
			if($type eq 'auto')
			{
				if($status =~ m/(SLEEP|WAIT)/)
				{
					clearFile("$stateVariablePath/$filenamePrefix$name");
				}
				else
				{
					# possibly not ok
					#print " -- doing some work, this is normally ok unless we are like this for a long time -- ";
					my $incrementCount = incrementFile("$stateVariablePath/$filenamePrefix$name");
					if($incrementCount >= $maxAutojobWaitCount)
					{
						# add the heading once, the first time an autojob is flagged
						if(!length($emailAutojobMessage))
						{
							$emailAutojobMessage = "Autojob Warnings\n";
						}
						$emailBody = "$emailBody$name is busy\n";
					}
				}
			}
			elsif($type eq 'interactive')
			{
				my($checkResult, $checkIncrementCount) = checkExcessiveChanges("$stateVariablePath/$excessiveChangePrefix$filenamePrefix$name$owner$threadID", $interactiveExcessiveChangeThreadshold, $interactiveExcessiveChangeThreadsholdCount, $activityLevel);

				if($checkResult > 0)
				{
					if($checkResult == 2)
					{
						# we should email
						$emailBody = "$emailBody$name for $owner with threadID $threadID has been busy for $checkIncrementCount checks\n";
					}
				}
				else
				{
					clearFile("$stateVariablePath/$excessiveChangePrefix$filenamePrefix$name$owner$threadID");
				}
			}
			elsif($type eq 'mi')
			{
				# print "$name, $owner, $cpu\n";
			}
			elsif($type eq 'batch')
			{
				# print "$name, $owner, $cpu\n";
			}
		}
	}
	else
	{
		$emailSubject = "Error job count incorrect";
		$emailBody = "Error, autojob count was $categoryJobCount, it should have been $autojobCount";
	}

	if(length($emailBody) > 0)
	{
		my $shouldWeSendEMail = 1;

		# a marker file for this category means we have already emailed about it
		if(-e "$stateVariablePath/$stopSpammingWithM3BEEmailsFile$categoryName")
		{
			$shouldWeSendEMail = 0;
		}

		my $spamIncrementCount = incrementFile("$stateVariablePath/$stopSpammingWithM3BEEmailsFile$categoryName");

		# while the problem persists, only re-send every $checksBetweenEmails checks
		if($shouldWeSendEMail == 0)
		{
			if(($spamIncrementCount % $checksBetweenEmails) == 0)
			{
				$shouldWeSendEMail = 1;
			}
		}

		if($shouldWeSendEMail == 1)
		{
			my $currentDT = DateTime->now->set_time_zone('local');
			my $currentDate = $currentDT->ymd;
			my $currentTime = $currentDT->hms;

			$emailBody = "$emailAutojobMessage\n$emailBody.\nChecks with error $spamIncrementCount";
			$emailSubject = "$gridName - $emailSubject $currentDate $currentTime $categoryName";

			sendEmailCheckIfShouldBeSent($emailFrom, $emailTo, $emailSubject, $emailBody);
		}
	}
	else
	{
		if(-e "$stateVariablePath/$stopSpammingWithM3BEEmailsFile$categoryName")
		{
			sendEmailAlways($emailFrom, $emailTo, "$gridName - Autojob Error cleared $categoryName", $emailBody) or warn "Can't send mail: $Mail::Sendmail::error";
		}
		clearFile("$stateVariablePath/$stopSpammingWithM3BEEmailsFile$categoryName");
	}
}

# Check to see if we have exceeded our tolerances, increment our file if needs be
# and return our current increment count
#
# Args:
#	$_[0]	= counter file path
#	$_[1]	= excessive threshold
#	$_[2]	= excessive threshold count
#	$_[3]	= current value
#
# Returns: (result, incrementCount)
#	0	= everything is ok
#	1	= threshold exceeded
#	2	= threshold exceeded and should notify
sub checkExcessive
{
	my $filePath = $_[0];
	my $excessiveThreshold = $_[1];
	my $excessiveThresholdCount = $_[2];
	my $current = $_[3];
	my $result = 0;
	my $incrementCount = 0;	# declared here so it is still in scope for the return

	if($current > $excessiveThreshold)
	{
		$incrementCount = incrementFile($filePath);
		if($incrementCount >= $excessiveThresholdCount)
		{
			$result = 1;
			# notify unless the count is an exact multiple of the threshold count
			if($incrementCount % $excessiveThresholdCount)
			{
				$result = 2;
			}
		}
	}
	else
	{
		clearFile($filePath);
	}
	return($result, $incrementCount);
}

# check to see if we have exceeded our CPU tolerances
# Args:
#	$_[0]	= counter file path
#	$_[1]	= excessive CPU threshold
#	$_[2]	= excessive CPU threshold count
#	$_[3]	= current CPU
#
# Returns: (result, incrementCount)
#	0	= everything is ok
#	1	= threshold exceeded
#	2	= threshold exceeded and should notify
sub checkExcessiveCPU
{
	return(checkExcessive($_[0], $_[1], $_[2], $_[3]));
}

# check to see if we have exceeded our change tolerances
# Args:
#	$_[0]	= counter file path
#	$_[1]	= excessive Change threshold
#	$_[2]	= excessive Change threshold count
#	$_[3]	= current Change
#
# Returns: (result, incrementCount)
#	0	= everything is ok
#	1	= threshold exceeded
#	2	= threshold exceeded and should notify
sub checkExcessiveChanges
{
	return(checkExcessive($_[0], $_[1], $_[2], $_[3]));
}

# Args:
#	$_[0]	= string to output
sub debugInfo
{
	if($debug == 1)
	{
		print "$_[0]";
	}
}

# Retrieve the XML data from a URL
# Args:
#	$_[0] = server url
#	$_[1] = monitor path
#	$_[2] = require authentication? true/false
sub retrieveXMLFromServer
{
	#my $request = HTTP::Request->new(GET => "$_[0]$_[1]");

	# this gets around an issue where we weren't getting any data from the nodes
	my $request = HTTP::Request->new(GET => "$_[0]$_[1]", HTTP::Headers->new('Accept' => 'application/xml'));

	if($_[2] eq "1")
	{
		$request->authorization_basic($userName, $passWord);
	}

	my $response = $browser->request($request);

	if($response->is_error)
	{
		print "\nError: $_[0]$_[1]", $response->error_as_HTML, "\n";
	}
	else
	{
		#print "Returned: \n", $response->content, "\n";
		return $response->content;
	}
}

sub shouldEmailBeSent
{
	my $result = 1;

	#if(-e "$stateVariablePath/$stopSpammingWithM3BEEmailsFile")
	#{
	#	$result = 0;
	#}
	#if(-e "$stateVariablePath/$stopSpammingWithGridEmailsFile")
	#{
	#	$result = 0;
	#}

	return($result);
}

# Check to see if we should send emails
# Args
#	$_[0]	= from
#	$_[1]	= to
#	$_[2]	= subject
#	$_[3]	= body
sub sendEmailCheckIfShouldBeSent
{
	if(1 == shouldEmailBeSent())
	{
		my %mail;
		if(length($smtpRelayServer) > 0)
		{
			%mail = (From => $_[0], To => $_[1], Subject => $_[2], Message => $_[3], Smtp => $smtpRelayServer);
		}
		else
		{
			%mail = (From => $_[0], To => $_[1], Subject => $_[2], Message => $_[3]);
		}

		sendmail(%mail) or warn "Can't send mail: $Mail::Sendmail::error";
	}
	else
	{
		print "Send email error suppressed\n";
	}
}

# Args
#	$_[0]	= from
#	$_[1]	= to
#	$_[2]	= subject
#	$_[3]	= body
sub sendEmailAlways
{
	my %mail;
	if(length($smtpRelayServer) > 0)
	{
		%mail = (From => $_[0], To => $_[1], Subject => $_[2], Message => $_[3], Smtp => $smtpRelayServer);
	}
	else
	{
		%mail = (From => $_[0], To => $_[1], Subject => $_[2], Message => $_[3]);
	}

	sendmail(%mail);
}

# Args:
#	$_[0]	= filename
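sub incrementFile
{
	my $number = 0;

	# read the current count; a missing file simply means we start from zero
	if(open(my $in, '<', $_[0]))
	{
		local $/;	# slurp the file without changing $/ for the rest of the script
		$number = <$in>;
		close($in);
		$number = 0 unless(defined($number));
	}

	open(my $out, '>', $_[0]) or return();

	$number++;

	print $out $number;
	close($out);

	return($number);
}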

sub clearFile
{
	open(FILE, "+<$_[0]") or return();
	close(FILE);
	unlink $_[0];
}

# Args:
#	$_[0] = file path (including the .rrd extension)
# 	$_[1] = DS definition
#	$_[2] = type (GAUGE, COUNTER)
sub createGaugeRRDFile
{
	if(! -d $rrdDatabasePath)
	{
		make_path ($rrdDatabasePath) or die "Error creating directory: $rrdDatabasePath";
	}

	if(! -e $_[0])
	{
		RRDs::create "$_[0]",
			"-s 60",
			"DS:value:$_[2]:120:0:U",
			"RRA:MAX:0.5:1:525600",
			"RRA:AVERAGE:0.5:1:525600",
			"RRA:LAST:0.5:1:525600",
			"RRA:MIN:0.5:1:525600"
			or warn "createGaugeRRDFile() failed to create file $_[0] " . RRDs::error() . "\n";
	}
}

# $_[0]	= filename
# $_[1]	= Time
# $_[2]	= DS Name
# $_[3]	= Value
# $_[4] = type (GAUGE, COUNTER)
# $_[5] = legend name
#
sub updateRRDFile
{
	my $path = "$_[0].rrd";

	chomp($_[3]);
	my $updateValue = "N:$_[3]";
	#print "\t\tUpdate: $updateValue\n";

	if(!$graphOnly)
	{
		if(! -e $path)
		{
			createGaugeRRDFile($path, $_[2], $_[4]);
		}
	}
	if(looks_like_number($_[3]))
	{
		if(!$graphOnly)
		{
			RRDs::update "$path",
				"-t", "value",
				$updateValue;
		}
		my $error = RRDs::error;

		if($error)
		{
			#my($file, $dir, $ext) = fileparse($_[0]);
			#print "updateRRDFile() failed to update file $file, with $updateValue\n\tError: $error\n";
		}
		#else
		#{
		#	if($graphOnly)
		#	{
		#		CreateGraph($path, 'day', $_[2], $_[2], $_[5]);
		#	}
		#}
	}
}

# creates graph
# inputs: $_[0]: name of rrd
#	$_[1]: interval (ie, day, week, month, year)
#	$_[2]: interface description
#	$_[3]: DS Name
#	$_[4]: legend name
#
# sub CreateGraph
# {
	# my($file, $dir, $ext) = fileparse($_[0]);
	# my $dt = DateTime->now( time_zone => 'local' );

	# my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	# RRDs::graph "$imagePath/$file-$_[1].png",
		# "-s -1$_[1]",
		# "--lazy",
		# "-h", "150", "-w", "600",
		# "-l 0",
		# "-a", "PNG",
		# "DEF:gen=$_[0]:value:AVERAGE",
		# "AREA:gen#7272D0:Avg $_[4]",
		# "LINE1:gen#1E1EFF",
		# "GPRINT:gen:MAX:  Max\\: %5.1lf %s",
		# "GPRINT:gen:AVERAGE: Avg\\: %5.1lf %S",
		# "GPRINT:gen:LAST: Current\\: %5.1lf %S\\n",
		# "COMMENT:Generated $now",
		# "HRULE:0#000000";
	# if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }
# }

# Remove any orphaned excessive count files
sub cleanUpOrphanedExcessChangeFiles
{
	my @files = <"$stateVariablePath/$excessiveChangePrefix*">;
	foreach my $file (@files)
	{
		# the glob already returns the full path, so use it as-is
		my $daysSinceLastChange = int(-M $file);
		if($daysSinceLastChange >= 1)
		{
			debugInfo("cleanUpOrphanedExcessChangeFiles() about to delete: $file it was last modified $daysSinceLastChange days ago\n");
			unlink $file;
		}
	}
}

# Read the configuration file and set the variables to the appropriate values
# Args:
# 	$_[0]	= Configuration file path
sub readConfigFile
{
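	# without a readable configuration file we keep the defaults set in the script
	open(my $cfg, '<', $_[0]) or do { warn "Unable to open configuration file $_[0]: $!\n"; return(); };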

	while(my $currentLine = <$cfg>)
	{
		chomp($currentLine);

		# remove anything after the first #
		my ($commentsRemoved) = ($currentLine =~ /([^#]*)/);

		# split the string on the =
		# the limit of 2 keeps any further = characters (e.g. in a password) in the value
		my ($parameter, $value) = split /=/, $commentsRemoved, 2;

		if('emailFrom' eq $parameter)
		{
			if(length($value) > 0)
			{
				$emailFrom = $value;
			}
		}
		elsif('emailTo' eq $parameter)
		{
			if(length($value) > 0)
			{
				$emailTo = $value;
			}
		}
		elsif('checksBetweenEmails' eq $parameter)
		{
			if(length($value) > 0)
			{
				$checksBetweenEmails = $value;
			}
		}
		elsif('userName' eq $parameter)
		{
			if(length($value) > 0)
			{
				$userName = $value;
			}
		}
		elsif('passWord' eq $parameter)
		{
			if(length($value) > 0)
			{
				$passWord = $value;
			}
		}
		elsif('smtpRelayServer' eq $parameter)
		{
			if(length($value) > 0)
			{
				$smtpRelayServer = $value;
			}
		}
		elsif('autojobCount' eq $parameter)
		{
			if(length($value) > 0)
			{
				$autojobCount = $value;
			}
		}
		elsif('excessiveCPUUsageCount' eq $parameter)
		{
			if(length($value) > 0)
			{
				$excessiveCPUUsageCount = $value;
			}
		}
		elsif('excessiveCPUThreshold' eq $parameter)
		{
			if(length($value) > 0)
			{
				$excessiveCPUThreshold = $value;
			}
		}
		elsif('maxAutojobWaitCount' eq $parameter)
		{
			if(length($value) > 0)
			{
				$maxAutojobWaitCount = $value;
			}
		}
		elsif('monitoringM3BEPath' eq $parameter)
		{
			if(length($value) > 0)
			{
				$monitoringM3BEPath = $value;
			}
		}
		elsif('monitoringGridApplicationsPath' eq $parameter)
		{
			if(length($value) > 0)
			{
				$monitoringGridApplicationsPath = $value;
			}
		}
		elsif('monitoringGridNodesPath' eq $parameter)
		{
			if(length($value) > 0)
			{
				$monitoringGridNodesPath = $value;
			}
		}
		elsif('interactiveExcessiveChangeThreadshold' eq $parameter)
		{
			if(length($value) > 0)
			{
				$interactiveExcessiveChangeThreadshold = $value;
			}
		}
		elsif('interactiveExcessiveChangeThreadsholdCount' eq $parameter)
		{
			if(length($value) > 0)
			{
				$interactiveExcessiveChangeThreadsholdCount = $value;
			}
		}
	}

	close($cfg);
}
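
The alerting in the script above hinges on small counter files: each check that finds a problem increments a file, a healthy check removes it, and an email goes out once the count reaches a limit. Here is a minimal standalone sketch of that pattern. The helper names (`bump_counter`, `reset_counter`, `should_alert`), the temporary directory and the limits are assumptions for illustration and are not part of monitorM3.pl.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: read the count stored in $path (0 if the file is
# missing), add one, write it back and return the new count.
sub bump_counter
{
	my ($path) = @_;
	my $count = 0;

	if(open(my $in, '<', $path))
	{
		local $/;	# slurp mode, restored when this block ends
		my $content = <$in>;
		close($in);
		$count = $1 if(defined($content) && $content =~ /^(\d+)/);
	}

	$count++;

	open(my $out, '>', $path) or die "Cannot write $path: $!";
	print $out $count;
	close($out);

	return($count);
}

# Hypothetical helper: a healthy check removes the counter file.
sub reset_counter
{
	my ($path) = @_;
	unlink($path) if(-e $path);
}

# Hypothetical helper: alert once the count reaches $limit, then again
# every $repeat checks after that.
sub should_alert
{
	my ($count, $limit, $repeat) = @_;
	return(0) if($count < $limit);
	return((($count - $limit) % $repeat) == 0 ? 1 : 0);
}

my $dir = tempdir(CLEANUP => 1);
my $counterFile = "$dir/interactive_MMS001_1234";	# illustrative file name

foreach my $check (1 .. 8)
{
	my $count = bump_counter($counterFile);
	my $action = should_alert($count, 3, 4) ? "send email" : "stay quiet";
	print "check $check: count $count, $action\n";
}

reset_counter($counterFile);
```

With a limit of 3 and a repeat of 4, the loop reports "send email" on the third and seventh checks only, which is the kind of throttling that keeps a persistent problem from flooding your inbox.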

monitor.cfg

This is the configuration file for the monitorM3.pl script. Any value set here overrides the default defined in the script itself; a parameter that is commented out or left empty keeps the script's default (see the script comments for more details).


# configuration file for the monitorM3.pl script
emailFrom=m3monitorserver@potatoit.kiwi
emailTo=admin@potatoit.kiwi
#checksBetweenEmails=
userName=m3srvadm
passWord=p@55w0rd
smtpRelayServer=mail.monitor.potato.kiwi
autojobCount=54
#excessiveCPUUsageCount=
#excessiveCPUThreshold=
#maxAutojobWaitCount=
#monitoringM3BEPath=
#monitoringGridApplicationsPath=
#monitoringGridNodesPath=
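
Each line of the file is `parameter=value`, and anything after a `#` is discarded, so a commented-out parameter leaves the built-in default in place. Below is a minimal sketch of that parsing. It assumes a hypothetical `parse_config_lines` helper and a hash of settings rather than the script's individual variables, and it also trims whitespace, which the script's own parser does not do.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: parse "parameter=value" lines into a hash.
# Only parameters named in %$defaults are accepted; empty values and
# commented-out lines leave the default untouched.
sub parse_config_lines
{
	my ($defaults, @lines) = @_;
	my %settings = %$defaults;

	foreach my $line (@lines)
	{
		chomp($line);
		$line =~ s/#.*//;	# drop comments

		# split on the first = only, so values may contain = themselves
		my ($parameter, $value) = split(/=/, $line, 2);
		next unless(defined($parameter) && defined($value));

		$parameter =~ s/^\s+|\s+$//g;
		$value =~ s/^\s+|\s+$//g;

		next unless(exists($settings{$parameter}) && length($value) > 0);
		$settings{$parameter} = $value;
	}

	return(%settings);
}

# Illustrative defaults, not the values shipped in monitorM3.pl
my %defaults = (emailTo => 'root@localhost', autojobCount => 0, checksBetweenEmails => 10);

my %settings = parse_config_lines(\%defaults,
	"emailTo=admin\@potatoit.kiwi\n",
	"#checksBetweenEmails=\n",
	"autojobCount=54   # expected number of autojobs\n");

foreach my $parameter (sort keys %settings)
{
	print "$parameter = $settings{$parameter}\n";
}
```

Here `emailTo` and `autojobCount` take the values from the file, while `checksBetweenEmails` stays at its default of 10 because its line is commented out.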

monitorM3Graph.pl

monitorM3Graph.pl <business engine hostname> <monitor port> <database server hostname> <time frame>

This does the bulk of our graphing: it reads the rrdtool archives and uses those values to generate the graphs.
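
Several of the graph routines in this script plot one series above the axis and mirror another below it by multiplying it by -1 in a CDEF, while the legend still prints the real values. The sketch below builds those `RRDs::graph` arguments with a helper. The `series_args` function, the paths and the legends are assumptions for illustration; the code only builds and prints the argument list, so it runs without rrdtool installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the DEF, optional CDEF, AREA and GPRINT
# arguments for one series. With mirror set, the series is drawn below
# the axis while the GPRINT lines still report the unmirrored values.
sub series_args
{
	my (%series) = @_;
	my @args = ("DEF:$series{name}=$series{rrd}:value:AVERAGE");
	my $plotted = $series{name};

	if($series{mirror})
	{
		$plotted = "$series{name}_neg";
		push(@args, "CDEF:$plotted=$series{name},-1,*");
	}

	push(@args,
		"AREA:$plotted#$series{colour}:$series{legend}",
		"GPRINT:$series{name}:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:$series{name}:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:$series{name}:LAST: Cur\\: \\t%5.1lf %S\\n");

	return(@args);
}

my $rrdPath = '/var/lib/jbcmon/server/dbserver';	# illustrative path

my @graphArgs = (
	series_args(name => 'ReadSec', rrd => "$rrdPath/D_AvgDisksecPerRead.rrd",
		colour => '72D072', legend => 'Avg sec/Read '),
	series_args(name => 'WriteSec', rrd => "$rrdPath/D_AvgDisksecPerWrite.rrd",
		colour => 'D07272', legend => 'Avg sec/Write', mirror => 1),
);

print "$_\n" foreach(@graphArgs);

# The list could then follow the fixed options, for example:
# RRDs::graph "$imagePath/DServerSummary-day.png", "-s -1day", @graphArgs;
```

Building the arguments per series keeps each DEF next to the lines that draw it, which may make it easier to add or remove a counter than editing one long argument list.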

#!/usr/bin/perl
#	Name:	monitorM3Graph.pl
#	Arguments:
#			monitorM3Graph.pl <M3BE Hostname> <monitor counter port> <db server hostname> <timeframe (day week month year)> [delete images first (true)]
#	Description:
# 			Generate summary graphs relating to M3
#
#	By:		Scott Campbell (scott.campbell@potatoit.kiwi)
#
#	$ARGV[0]: m3be
#	$ARGV[1]: the path to the subsystem (beneath the server)
#	$ARGV[2]: database server
#	$ARGV[3]: timeframe (eg. day, week, month, year)
#	$ARGV[4]: optional, 'true' deletes the existing database server graphs before regenerating them
#
#	History:
#		20160121	SAC	- Dynamically generate the summary HTML files if they don't already exist
#

use RRDs;
use File::Path qw(make_path remove_tree);
use LWP;
use XML::XPath;
use XML::XPath::XMLParser;
use Mail::Sendmail;
use DateTime;
use DateTime::Format::DateParse;
use Scalar::Util qw(looks_like_number);
use File::Basename;

# define location of rrdtool databases
my $rrdDatabasePath = '/var/lib/jbcmon/server';

# where we will output the graph images
my $imagePath = '/srv/www/htdocs/server';

#$rrdDatabasePath = "$rrdDatabasePath/$ARGV[0]";
#$imagePath = "$imagePath/$ARGV[0]";

### Server stats paths
# where we store the RRD archive for the subsystems
$rrdServerStatsPath = "$rrdDatabasePath/$ARGV[2]";

# this is the path for the subsystem images
$imageServerStatsPath = "$imagePath/$ARGV[2]";

### Subsystem paths - determined
# where we store the RRD archive for the subsystems
$rrdSubSystemPath = "$rrdDatabasePath/$ARGV[0]/$ARGV[1]";

# this is the path for the subsystem images
$imageSubSystemPath = "$imagePath/$ARGV[0]/$ARGV[1]";

$deleteImgFirst = 0;

$timeFrame = $ARGV[3];

if(@ARGV > 4)
{
	#print "Delete Image First $ARGV[2]\n";
	if($ARGV[4] eq 'true')
	{
		$deleteImgFirst = 1;
		print "Images will be deleted before graphing\n";
	}
}

# job graphs
createJobSummary("JobSummary", $timeFrame, "", "", "", "Auto", "auto");
createJobSummary("JobSummary", $timeFrame, "", "", "", "Interactive", "interactive");
createJobSummary("JobSummary", $timeFrame, "", "", "", "Batch", "batch");
createJobSummary("JobSummary", $timeFrame, "", "", "", "Mi", "mi");

if (! -e "$imageSubSystemPath/JobSummary-$timeFrame.html")
{
	open my $summaryFile, ">", "$imageSubSystemPath/JobSummary-$timeFrame.html";

	print $summaryFile "<HTML>
	 <HEAD>
	  <TITLE>Job Summary ($timeFrame)</TITLE>
	  <meta http-equiv=\"refresh\" content=\"60\">
	 </HEAD>
	 <BODY>
<table cellspacing='0' border='0' cellpadding='2.5' style='font-family:Arial; font-size:small; border:1px solid black;'>
<tr>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>Auto ($timeFrame)</b></div>
<img src='./autoJobSummary-$timeFrame.png'/></td>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>Interactive ($timeFrame)</b></div>
<img src='./interactiveJobSummary-$timeFrame.png'/></td>
</tr>
<tr>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>Batch ($timeFrame)</b></div>
<img src='./batchJobSummary-$timeFrame.png'/></td>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>MI ($timeFrame)</b></div>
<img src='./miJobSummary-$timeFrame.png'/></td>
</tr>
</table>
</BODY>
	 </HTML>
	  ";

	close $summaryFile;
}

# perf time graphs
createTimeSummary("TimeSummary", $timeFrame, "", "", "", "Auto", "auto");
createTimeSummary("TimeSummary", $timeFrame, "", "", "", "Interactive", "interactive");
createTimeSummary("TimeSummary", $timeFrame, "", "", "", "Batch", "batch");
createTimeSummary("TimeSummary", $timeFrame, "", "", "", "MI", "mi");

if (! -e "$imageSubSystemPath/TimeSummary-$timeFrame.html")
{
	open my $summaryFile, ">", "$imageSubSystemPath/TimeSummary-$timeFrame.html";

	print $summaryFile "<HTML>
	 <HEAD>
	  <TITLE>Time Summary ($timeFrame)</TITLE>
	  <meta http-equiv=\"refresh\" content=\"60\">
	 </HEAD>
	 <BODY>
<table cellspacing='0' border='0' cellpadding='2.5' style='font-family:Arial; font-size:small; border:1px solid black;'>
<tr>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>Auto ($timeFrame)</b></div>
<img src='./autoTimeSummary-$timeFrame.png'/></td>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>Interactive ($timeFrame)</b></div>
<img src='./interactiveTimeSummary-$timeFrame.png'/></td>
</tr>
<tr>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>Batch ($timeFrame)</b></div>
<img src='./batchTimeSummary-$timeFrame.png'/></td>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>MI ($timeFrame)</b></div>
<img src='./miTimeSummary-$timeFrame.png'/></td>
</tr>
</table>
</BODY>
	 </HTML>
	  ";

	close $summaryFile;
}

#####
##	Database server
#####

# graph the db server disks
createDiskSummary("ServerSummary", $timeFrame, "", "", "", "", "D");
createDiskSummary("ServerSummary", $timeFrame, "", "", "", "", "E");
createDiskSummary("ServerSummary", $timeFrame, "", "", "", "", "F");

if (! -e "$imageServerStatsPath/ServerSummary-$timeFrame.html")
{
	open my $summaryFile, ">", "$imageServerStatsPath/ServerSummary-$timeFrame.html";

	print $summaryFile "<HTML>
	 <HEAD>
	  <TITLE>Server Summary ($timeFrame)</TITLE>
	  <meta http-equiv=\"refresh\" content=\"60\">
	 </HEAD>
	 <BODY>
<table cellspacing='0' border='0' cellpadding='2.5' style='font-family:Arial; font-size:small; border:1px solid black;'>
<tr>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>DB Server Summary ($timeFrame)</b></div>
<img src='./ServerSummary-$timeFrame.png'/></td>
<td></td>
</tr>
<tr>
<td style='height: 100%'>
<div align='top' style='background-color:#4f81bd;  color:White; height: 100%;'><b>DB Page Fault Summary ($timeFrame)</b></div>
<img align='top' src='./DBPageFaultSummary-$timeFrame.png'/></td>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>DB D Drive ($timeFrame)</b></div>
<img src='./DServerSummary-$timeFrame.png'/></td>
</tr>
<tr>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>DB E Drive ($timeFrame)</b></div>
<img src='./EServerSummary-$timeFrame.png'/></td>
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>DB F Drive ($timeFrame)</b></div>
<img src='./FServerSummary-$timeFrame.png'/></td>
</tr>
</table>
</BODY>
	 </HTML>
	  ";

	close $summaryFile;
}

# graph the db server general stats
createServerSummary("ServerSummary", $timeFrame, "", "", "", "", "");
createDBServerPageFaultsSummary("DBPageFaultSummary", $timeFrame, "", "", "", "", "");

# creates graph
# inputs: $_[0]: image file name
#	$_[1]: interval (ie, day, week, month, year)
#	$_[2]: interface description
#	$_[3]: DS Name
#	$_[4]: legend name
#	$_[5]: type for title
#	$_[6]: file prefix
sub createJobSummary
{
	my $file = "$_[6]$_[0]";

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	if(! -d $imageSubSystemPath)
	{
		make_path ($imageSubSystemPath) or die "Error creating directory: $imageSubSystemPath";
	}	

	RRDs::graph "$imageSubSystemPath/$file-$_[1].png",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		"-t", "Consolidated $_[5] Job Summary",
		"--right-axis", "0.0000001:0",
		#"-v %",
		"DEF:memMaxHP=$rrdSubSystemPath/$_[6]_NodeCounter_Memory_MaxHeap.rrd:value:AVERAGE",
		"DEF:memPekHP=$rrdSubSystemPath/$_[6]_NodeCounter_Memory_PeakHeap.rrd:value:AVERAGE",
		"DEF:memUskHP=$rrdSubSystemPath/$_[6]_NodeCounter_Memory_UsedHeap.rrd:value:AVERAGE",
		"DEF:jobs=$rrdSubSystemPath/$_[6]jobs_JobCount.rrd:value:AVERAGE",
		"DEF:threads=$rrdSubSystemPath/$_[6]jobs_Threads.rrd:value:AVERAGE",
		"DEF:cpu=$rrdSubSystemPath/$_[6]_NodeCounter_CPUUsage_Totals.rrd:value:AVERAGE",
 		"CDEF:jobs_act=jobs,-1000000,*",
		"CDEF:threads_act=threads,-1000000,*",
		"CDEF:cpu_act=cpu,-1000000,*",
		"AREA:memMaxHP#D07272:Max Heap    ",
		"LINE1:memMaxHP#FF1E1E",
		"GPRINT:memMaxHP:MAX: Max\\:\\t%5.1lf %s\\l",
		"AREA:memPekHP#A06262:Peak Heap  ",
		"LINE1:memPekHP#AF1E1E",
		"GPRINT:memPekHP:MAX:  Max\\: \\t%5.1lf %s",
		"GPRINT:memPekHP:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:memPekHP:LAST: Cur\\: \\t%5.1lf %S\\n",
		"AREA:memUskHP#905252:Used Heap  ",
		"LINE1:memUskHP#9F1E1E",
		"GPRINT:memUskHP:MAX:  Max\\: \\t%5.1lf %s",
		"GPRINT:memUskHP:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:memUskHP:LAST: Cur\\: \\t%5.1lf %S\\n",
		"AREA:threads_act#6262A0:Avg Threads",
		"LINE1:threads_act#1E1EAF",
		"GPRINT:threads:MAX:  Max\\: \\t%5.1lf %s",
		"GPRINT:threads:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:threads:LAST: Cur\\: \\t%5.1lf %S\\n",
		"AREA:jobs_act#7272D0:Avg Jobs   ",
		"LINE1:jobs_act#1E1EFF",
		"GPRINT:jobs:MAX:  Max\\: \\t%5.1lf %s",
		"GPRINT:jobs:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:jobs:LAST: Cur\\: \\t%5.1lf %S\\n",
		"LINE1:cpu_act#101010:Total CPU  ",
		"GPRINT:cpu:MAX:  Max\\: \\t%5.1lf %s",
		"GPRINT:cpu:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:cpu:LAST: Cur\\: \\t%5.1lf %S\\n",
		"COMMENT:Generated $now",
		"HRULE:0#000000";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }

}

# creates graph
# inputs: $_[0]: image file name
#	$_[1]: interval (ie, day, week, month, year)
#	$_[2]: interface description
#	$_[3]: DS Name
#	$_[4]: legend name
#	$_[5]: type for title
#	$_[6]: file prefix
sub createTimeSummary
{
	my $file = "$_[6]$_[0]";

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	if(! -d $imageSubSystemPath)
	{
		make_path ($imageSubSystemPath) or die "Error creating directory: $imageSubSystemPath";
	}	

	RRDs::graph "$imageSubSystemPath/$file-$_[1].png",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		"-t", "Consolidated $_[5] Time Summary",
		"DEF:thrdplReqTm=$rrdSubSystemPath/$_[6]_ThreadPool_System_AvgReqTime.rrd:value:AVERAGE",
		"DEF:proxyReqTm=$rrdSubSystemPath/$_[6]_subsystem_ProxyRequests_AvgReqTime.rrd:value:AVERAGE",
		"DEF:pubBufFlTm=$rrdSubSystemPath/$_[6]_subsystem_Publisher_AverageBufferFillTime.rrd:value:AVERAGE",
		"DEF:pubComDur=$rrdSubSystemPath/$_[6]_subsystem_Publisher_AverageCommitDuration.rrd:value:AVERAGE",

		"CDEF:pubComDur_neg=pubComDur,-1,*",

		"AREA:proxyReqTm#72D072:Proxy Avg. Req. Time     ",
		"GPRINT:proxyReqTm:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:proxyReqTm:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:proxyReqTm:LAST: Cur\\: \\t%5.1lf %S\\n",

		"AREA:thrdplReqTm#D07272:Thread Pool Avg. Req Time",
		"GPRINT:thrdplReqTm:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:thrdplReqTm:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:thrdplReqTm:LAST: Cur\\: \\t%5.1lf %S\\n",

		"AREA:pubBufFlTm#7272D0:Pub. Buf. Fill. Avg.     ",
		"GPRINT:pubBufFlTm:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:pubBufFlTm:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:pubBufFlTm:LAST: Cur\\: \\t%5.1lf %S\\n",

		"AREA:pubComDur_neg#6262A0:Pub. Commit Dur.         ",
		"GPRINT:pubComDur:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:pubComDur:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:pubComDur:LAST: Cur\\: \\t%5.1lf %S\\n",		

		"LINE1:thrdplReqTm#B05252:",
		"LINE1:proxyReqTm#52B052:",
		"LINE1:pubBufFlTm#5252B0:",
		"LINE1:pubComDur_neg#424280:",

		"COMMENT:Generated $now",
		"HRULE:0#000000";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }

}

# creates graph
# inputs: $_[0]: image file name
#	$_[1]: interval (ie, day, week, month, year)
#	$_[2]: interface description
#	$_[3]: DS Name
#	$_[4]: legend name
#	$_[5]: type for title
#	$_[6]: file prefix
sub createDiskSummary
{
	my $file = "$_[6]$_[0]";

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	if(! -d $imageServerStatsPath)
	{
		make_path ($imageServerStatsPath) or die "Error creating directory: $imageServerStatsPath";
	}	

	RRDs::graph "$imageServerStatsPath/$file-$_[1].png",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		"-t", "Disk Activity Volume $_[6]",
		#"--right-axis", "0.0001:0",
		#"-v %",
		"DEF:ReadSec=$rrdServerStatsPath/$_[6]_AvgDisksecPerRead.rrd:value:AVERAGE",
		"DEF:WriteSec=$rrdServerStatsPath/$_[6]_AvgDisksecPerWrite.rrd:value:AVERAGE",
		"DEF:DskQLen=$rrdServerStatsPath/$_[6]_CurrentDiskQueueLength.rrd:value:AVERAGE",

		"CDEF:WriteSec_neg=WriteSec,-1,*",

		"AREA:ReadSec#72D072:Avg sec/Read     ",
		"LINE1:ReadSec#92F092:",
		"GPRINT:ReadSec:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:ReadSec:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:ReadSec:LAST: Cur\\: \\t%5.1lf %S\\n",

		"AREA:WriteSec_neg#D07272:Avg sec/Write    ",
		"LINE1:WriteSec_neg#92F092:",
		"GPRINT:WriteSec:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:WriteSec:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:WriteSec:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:DskQLen#101010:Disk Queue Length",
		"GPRINT:DskQLen:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:DskQLen:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:DskQLen:LAST: Cur\\: \\t%5.1lf %S\\n",

		"COMMENT:Generated $now",
		"HRULE:0#000000",
		"-X 0";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }
}

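# creates the database server page fault graph
# inputs: $_[0]: image file name
#	$_[1]: interval (ie, day, week, month, year)
#	$_[6]: file prefix
sub createDBServerPageFaultsSummary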
{

	my $file = "$_[6]$_[0]";

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	if(! -d $imageServerStatsPath)
	{
		make_path ($imageServerStatsPath) or die "Error creating directory: $imageServerStatsPath";
	}

	if($deleteImgFirst > 0)
	{
		unlink "$imageServerStatsPath/$file-$_[1].png";
	}		

	RRDs::graph "$imageServerStatsPath/$file-$_[1].png",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		"-t", "DB Server Page Faults",
		#"--right-axis", "0.0001:0",
		#"-v %",
		"DEF:cacheFaults=$rrdServerStatsPath/CacheFaultsSec.rrd:value:AVERAGE",
		"DEF:MemPgReads=$rrdServerStatsPath/MemoryPageReadsSec.rrd:value:AVERAGE",
		"DEF:PageFaults=$rrdServerStatsPath/PageFaultsSec.rrd:value:AVERAGE",

		"LINE1:cacheFaults#0000FF:Cache Faults/sec      :STACK",
		"AREA:cacheFaults#0000BF::STACK",
		"GPRINT:cacheFaults:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:cacheFaults:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:cacheFaults:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:MemPgReads#00FF00:Memory Page Reads/sec :STACK",
		"AREA:MemPgReads#00BF00::STACK",
		"GPRINT:MemPgReads:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:MemPgReads:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:MemPgReads:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:PageFaults#FF0000:Page Faults/sec       :STACK",
		"AREA:PageFaults#BF0000::STACK",
		"GPRINT:PageFaults:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:PageFaults:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:PageFaults:LAST: Cur\\: \\t%5.1lf %S\\n",

		"COMMENT:Generated $now",
		"HRULE:0#000000";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }
}

# creates graph
# inputs: $_[0]: image file name
#	$_[1]: interval (ie, day, week, month, year)
#	$_[2]: interface description
#	$_[3]: DS Name
#	$_[4]: legend name
#	$_[5]: type for title
#	$_[6]: file prefix
sub createServerSummary
{

	my $file = "$_[6]$_[0]";

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	if(! -d $imageServerStatsPath)
	{
		make_path ($imageServerStatsPath) or die "Error creating directory: $imageServerStatsPath";
	}

	if($deleteImgFirst > 0)
	{
		unlink "$imageServerStatsPath/$file-$_[1].png";
	}		

	RRDs::graph "$imageServerStatsPath/$file-$_[1].png",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		"-t", "SQL Server Stats",
		#"--right-axis", "0.0001:0",
		#"-v %",
		"DEF:cacheFaults=$rrdServerStatsPath/CacheFaultsSec.rrd:value:AVERAGE",
		"DEF:MemPgReads=$rrdServerStatsPath/MemoryPageReadsSec.rrd:value:AVERAGE",
		"DEF:PageFaults=$rrdServerStatsPath/PageFaultsSec.rrd:value:AVERAGE",
		"DEF:Pages=$rrdServerStatsPath/PageSec.rrd:value:AVERAGE",
		"DEF:TotalCPU=$rrdServerStatsPath/_total_currentProcessorUtilisation.rrd:value:AVERAGE",
		"DEF:ProcQLen=$rrdServerStatsPath/systemProcessorQueueLength.rrd:value:AVERAGE",

		"DEF:UsrConn=$rrdServerStatsPath/sqlServerGeneralUserConnections.rrd:value:AVERAGE",
		"DEF:MemGrants=$rrdServerStatsPath/sqlServerMemoryManagerMemoryGrantsPending.rrd:value:AVERAGE",
		"DEF:BtchReq=$rrdServerStatsPath/sqlServerSQLStatisticsBatchRequestsSec.rrd:value:AVERAGE",
		"DEF:Compilations=$rrdServerStatsPath/sqlServerSQLStatisticsCompilationsSec.rrd:value:AVERAGE",
		"DEF:Recompilations=$rrdServerStatsPath/sqlServerSQLStatisticsRecompilationsSec.rrd:value:AVERAGE",
		"DEF:RecompMax=$rrdServerStatsPath/sqlServerSQLStatisticsRecompilationsSec.rrd:value:MAX",

		"CDEF:cacheFaults_neg=cacheFaults,-1,*",
		"CDEF:MemPgReads_neg=MemPgReads,-1,*",
		"CDEF:PageFaults_neg=PageFaults,-1,*",
		"CDEF:Pages_neg=Pages,-1,*",
		"CDEF:TotalCPU_neg=TotalCPU,-1,*",
		"CDEF:ProcQLen_neg=ProcQLen,-1,*",

		"LINE1:UsrConn#0000FF:User Connections      ",
		"GPRINT:UsrConn:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:UsrConn:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:UsrConn:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:MemGrants#008FDF:Memory Grants pending :dashes=on",
		"GPRINT:MemGrants:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:MemGrants:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:MemGrants:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:BtchReq#00AFBF:Batch Requests/sec    ",
		"GPRINT:BtchReq:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:BtchReq:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:BtchReq:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:Compilations#00CF9F:Compilations/sec      ",
		"GPRINT:Compilations:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:Compilations:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:Compilations:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:Recompilations#00FF7F:Recompilations/sec    ",
		"GPRINT:RecompMax:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:Recompilations:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:Recompilations:LAST: Cur\\: \\t%5.1lf %S\\n",

		####
		#
		"LINE1:cacheFaults_neg#F07272:Cache Faults/sec      ",
		"GPRINT:cacheFaults:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:cacheFaults:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:cacheFaults:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:MemPgReads_neg#E06262:Memory Page Read/sec  ",
		"GPRINT:MemPgReads:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:MemPgReads:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:MemPgReads:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:PageFaults_neg#D06262:Page Faults/sec       ",
		"GPRINT:PageFaults:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:PageFaults:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:PageFaults:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:Pages_neg#C05252:Pages                 ",
		"GPRINT:Pages:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:Pages:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:Pages:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:TotalCPU_neg#FF0000:Total CPU             ",
		"GPRINT:TotalCPU:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:TotalCPU:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:TotalCPU:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:ProcQLen_neg#FF5050:Processor Queue       ",
		"GPRINT:ProcQLen:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:ProcQLen:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:ProcQLen:LAST: Cur\\: \\t%5.1lf %S\\n",		

		"COMMENT:Generated $now",
		"HRULE:0#000000";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }

}

monitorM3GraphNodes.pl

monitorM3GraphNodes.pl <business engine hostname>/<monitor port> <time frame>

This script does the graphing for our grid nodes.

#!/usr/bin/perl
#
# Graph the grid node statistics collected from M3
#
#	monitorM3GraphNodes.pl <path> <timeframe> [deleteimages] [debug]
#
# Args:
#	$ARGV[0]	=	server/port (eg. ifbepd/16008)
#	$ARGV[1]	=	timeframe (eg. day, week, month, year)
#	deleteimages	= delete the images first
#	debug		= turn on debugging messages

use RRDs;
use File::Path qw(make_path remove_tree);
use LWP;
use XML::XPath;
use XML::XPath::XMLParser;
use Mail::Sendmail;
use DateTime;
use DateTime::Format::DateParse;
use Scalar::Util qw(looks_like_number);
use File::Basename;

# define location of rrdtool databases
my $rrdDatabasePath = '/var/lib/jbcmon/server';

# where we will output the graph images
my $imagePath = '/srv/www/htdocs/server';

$rrdDatabasePath = "$rrdDatabasePath/$ARGV[0]";
$imagePath = "$imagePath/$ARGV[0]";

# delete the images before we generate the graph
$deleteImgFirst = 0;

$debug = 0;

if(@ARGV > 2)
{
	my $argPosition = 0;
	foreach(@ARGV)
	{
		# the optional flags start at index 2, after <path> and <timeframe>
		if($argPosition >= 2)
		{
			if($_ eq 'debug')
			{
				$debug = 1;
			}
			elsif($_ eq 'deleteimages')
			{
				$deleteImgFirst = 1;
				print "Images will be deleted before graphing first\n";
			}
		}
		$argPosition++;
	}
}

$timeFrame = $ARGV[1];

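open my $nodeFD, '<', "$rrdDatabasePath/nodes" or die "Cannot open $rrdDatabasePath/nodes: $!";

# the summary pages are written before the first graph, so make sure the directory is there
if(! -d $imagePath)
{
	make_path ($imagePath) or die "Error creating directory: $imagePath";
}

open my $summaryFile, ">", "$imagePath/summary-nodesStatus.html" or die "Cannot write $imagePath/summary-nodesStatus.html: $!";
open my $summaryFileLog, ">", "$imagePath/summary-nodesStatusLog.html" or die "Cannot write $imagePath/summary-nodesStatusLog.html: $!";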

print $summaryFile "<HTML>
 <HEAD>
  <TITLE>Grid Nodes Health Summary</TITLE>
  <meta http-equiv=\"refresh\" content=\"60\">
 </HEAD>
 <BODY>
<table cellspacing='0' border='0' cellpadding='2.5' style='font-family:Arial; font-size:small; border:1px solid black;'>
  ";

print $summaryFileLog "<HTML>
 <HEAD>
  <TITLE>Grid Nodes Health Log Summary</TITLE>
  <meta http-equiv=\"refresh\" content=\"60\">
 </HEAD>
 <BODY>
<table cellspacing='0' border='0' cellpadding='2.5' style='font-family:Arial; font-size:small; border:1px solid black;'>
  ";  

my $count = 0;

while(my $currentLine = <$nodeFD>)
{
	if($count == 0)
	{
		# open a new row in both summary pages
		print $summaryFile "
<tr>";
		print $summaryFileLog "
<tr>";
	}

	chomp ($currentLine);
	createNodeSummary($currentLine, $timeFrame, "", "", "", "", "nodesStatus");

	my $fileName = generatePNGName("$currentLine", $timeFrame, "nodesStatus");
	my $fileNameLog = generatePNGName("$currentLine", "log_$timeFrame", "nodesStatus");

	print $summaryFile "
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>$currentLine (24hrs)</b></div>
<img src='./$fileName'/></td>
";

	print $summaryFileLog "
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>$currentLine (24hrs)</b></div>
<img src='./$fileNameLog'/></td>
";

	if($count == 1)
	{
		print $summaryFile "</tr>
";
		print $summaryFileLog "</tr>
";
		$count = 0;
	}
	else
	{
		$count = 1;
	}
}
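# close any half-filled row, then finish both documents
foreach my $fh ($summaryFile, $summaryFileLog)
{
	print $fh "</tr>\n" if $count == 1;
	print $fh "</table>\n </BODY>\n</HTML>\n";
}
close $summaryFile;
close $summaryFileLog;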

close $nodeFD;

# $_[0]: image file name
# $_[1]: interval (ie, day, week, month, year)
# $_[2]: file prefix
sub generatePNGName
{
	my $file = "$_[2]_$_[0]-$_[1].png";

	return($file);
}

# creates graph
# inputs: $_[0]: image file name
#	$_[1]: interval (ie, day, week, month, year)
#	$_[2]: interface description
#	$_[3]: DS Name
#	$_[4]: legend name
#	$_[5]: type for title
#	$_[6]: file prefix
sub createNodeSummary
{
	my $fileName = generatePNGName($_[0], $_[1], $_[6]);
	my $fileNameLog = generatePNGName($_[0], "log_$_[1]", $_[6]);

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	if(! -d $imagePath)
	{
		make_path ($imagePath) or die "Error creating directory: $imagePath";
	}	

	if($deleteImgFirst > 0)
	{
		unlink "$imagePath/$fileName";
		unlink "$imagePath/$fileNameLog";

	}

	RRDs::graph "$imagePath/$fileName",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		"DEF:CPUUtil=$rrdDatabasePath/$_[6]_$_[0]_CPUPercentage.rrd:value:AVERAGE",
		"DEF:MemMax=$rrdDatabasePath/$_[6]_$_[0]_MemoryMax.rrd:value:AVERAGE",
		"DEF:MemUsed=$rrdDatabasePath/$_[6]_$_[0]_MemoryUsed.rrd:value:AVERAGE",

		"CDEF:CPUUtil_neg=CPUUtil,-1,*",

		"AREA:MemMax#72D072:Memory Max          ",
		"LINE1:MemMax#92F092:",
		"GPRINT:MemMax:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:MemMax:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:MemMax:LAST: Cur\\: \\t%5.1lf %S\\n",

		"AREA:MemUsed#A06262:Memory Used         ",
		"LINE1:MemUsed#B07272:",
		"GPRINT:MemUsed:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:MemUsed:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:MemUsed:LAST: Cur\\: \\t%5.1lf %S\\n",

		"LINE1:CPUUtil_neg#101010:CPU                 ",
		"GPRINT:CPUUtil:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:CPUUtil:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:CPUUtil:LAST: Cur\\: \\t%5.1lf %S\\n",

		"COMMENT:Generated $now",
		"HRULE:0#000000";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }

	RRDs::graph "$imagePath/$fileNameLog",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		"-X", "0",

		"DEF:LogErr=$rrdDatabasePath/$_[6]_$_[0]_LogErrCount.rrd:value:AVERAGE",
		"DEF:LogSysErr=$rrdDatabasePath/$_[6]_$_[0]_LogSysErrCount.rrd:value:AVERAGE",
		"DEF:LogSysWrn=$rrdDatabasePath/$_[6]_$_[0]_LogSysWarnCount.rrd:value:AVERAGE",
		"DEF:LogWrn=$rrdDatabasePath/$_[6]_$_[0]_LogWarnCount.rrd:value:AVERAGE",

		"LINE1:LogErr#F08282:Log Error           ",
		"GPRINT:LogErr:MAX:\\tMax\\: \\t%5.0lf %s",
		"GPRINT:LogErr:AVERAGE: Avg\\: \\t%5.0lf %s",
		"GPRINT:LogErr:LAST: Cur\\: \\t%5.0lf %s\\n",

		"LINE1:LogSysErr#F04242:Log System Error    ",
		"GPRINT:LogSysErr:MAX:\\tMax\\: \\t%5.0lf %s",
		"GPRINT:LogSysErr:AVERAGE: Avg\\: \\t%5.0lf %s",
		"GPRINT:LogSysErr:LAST: Cur\\: \\t%5.0lf %s\\n",

		"LINE1:LogWrn#8282F0:Log Warnings        ",
		"GPRINT:LogWrn:MAX:\\tMax\\: \\t%5.0lg %s",
		"GPRINT:LogWrn:AVERAGE: Avg\\: \\t%5.0lg %s",
		"GPRINT:LogWrn:LAST: Cur\\: \\t%5.0lg %s\\n",

		"LINE1:LogSysWrn#4242F0:Log System Warnings ",
		"GPRINT:LogSysWrn:MAX:\\tMax\\: \\t%5.0lf %s",
		"GPRINT:LogSysWrn:AVERAGE: Avg\\: \\t%5.0lf %s",
		"GPRINT:LogSysWrn:LAST: Cur\\: \\t%5.0lf %s\\n",

		"COMMENT:Generated $now",
		"HRULE:0#000000";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }
}

generateGraphs.pl

generateGraphs.pl <business engine hostname>/<monitor port> <time frame>

Generic graphing.

#!/usr/bin/perl
#
# Graph every rrd archive found in a directory (one graph per archive)

use RRDs;
use File::Path qw(make_path remove_tree);
use LWP;
use XML::XPath;
use XML::XPath::XMLParser;
use Mail::Sendmail;
use DateTime;
use DateTime::Format::DateParse;
use Scalar::Util qw(looks_like_number);
use File::Basename;

# define location of rrdtool databases
my $rrdDatabasePath = '/var/lib/jbcmon/server';

# where we will output the graph images
my $imagePath = '/srv/www/htdocs/server';

#	$ARGV[0]: relative path (from the server directory) for the rrdDatabase
#	$ARGV[1]: timeframe (eg. day, week, month, year)
#	$ARGV[2]: optional, 'true' to delete the existing images before graphing

$rrdDatabasePath = "$rrdDatabasePath/$ARGV[0]";
$imagePath = "$imagePath/$ARGV[0]";

$deleteImgFirst = 0;

if(@ARGV > 2)
{
	#print "Delete Image First $ARGV[2]\n";
	if($ARGV[2] eq 'true')
	{
		$deleteImgFirst = 1;
		print "Images will be deleted before graphing first\n";
	}
}

$timeFrame = $ARGV[1];

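# make sure the output directory exists before we write the summary page
if(! -d $imagePath)
{
	make_path ($imagePath) or die "Error creating directory: $imagePath";
}

open my $summaryFile, ">", "$imagePath/general.html" or die "Cannot write $imagePath/general.html: $!";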

print $summaryFile "<HTML>
 <HEAD>
  <TITLE>- All Graphs -</TITLE>
  <meta http-equiv=\"refresh\" content=\"300\">
 </HEAD>
 <BODY>
<table cellspacing='0' border='0' cellpadding='2.5' style='font-family:Arial; font-size:small; border:1px solid black;'>
  ";

my $count = 0;

opendir (DIR, $rrdDatabasePath) or die $!;

@files = grep(/\.rrd$/, readdir(DIR));

foreach my $file (@files)
# ( sort { $a <=> $b } readdir DIR )
{
	my $fullPath = "$rrdDatabasePath/$file";

	if(-f "$fullPath")
	{
		if($count == 0)
		{
			print $summaryFile "
<tr>";
		}
		createSummary($file, $timeFrame);

		print $summaryFile "
<td>
<div align='center' style='background-color:#4f81bd;  color:White;'><b>$file (24hrs)</b></div>
<img src='./$file.png'/></td>
";

		if($count == 1)
		{
			print $summaryFile "</tr>
";
			$count = 0;
		}
		else
		{
			$count = 1;
		}
	}
}

closedir(DIR);

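# close any half-filled row, then finish the document
print $summaryFile "</tr>\n" if $count == 1;
print $summaryFile "</table>\n </BODY>\n</HTML>\n";
close $summaryFile;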

# creates graph
# inputs:
#	$_[0]: filename
#	$_[1]: interval (ie, day, week, month, year)
sub createSummary
{
	my $fullPath = "$rrdDatabasePath/$_[0]";

	#my $file = "$_[6]_$_[0]";

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	if(! -d $imagePath)
	{
		make_path ($imagePath) or die "Error creating directory: $imagePath";
	}	

	if($deleteImgFirst > 0)
	{
		unlink "$imagePath/$_[0].png";
	}	

	# print "Image Path: $imagePath/$fileName\n";

	RRDs::graph "$imagePath/$_[0].png",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		#"-t", "Disk Activity Volume $_[6]",
		#"--right-axis", "0.0001:0",
		#"-v %",
		"DEF:CPUUtil=$rrdDatabasePath/$_[0]:value:AVERAGE",

		"AREA:CPUUtil#72D072:          ",
		"LINE1:CPUUtil#92F092:",
		"GPRINT:CPUUtil:MAX:\\tMax\\: \\t%5.1lf %s",
		"GPRINT:CPUUtil:AVERAGE: Avg\\: \\t%5.1lf %S",
		"GPRINT:CPUUtil:LAST: Cur\\: \\t%5.1lf %S\\n",

		"COMMENT:Generated $now",
		"HRULE:0#000000";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }
}

perfData.ps1

This is a PowerShell script that we run from a Windows server, as a user that has the rights to query the performance counters on the Windows server hosting SQL Server. It gathers the stats and then FTPs them to our monitoring server. (The username and password for the FTP server are set near the bottom of the script.)

In my example, D: E: F: refer to the drives that the database is on – we are extracting the perf counters for the drives.

#
#	Retrieve some interesting performance statistics that we will use for graphing
# 	Drop an output file in the current directory
#
# -server <name of the machine we will query>
# -ftpout <server we will ftp the data to>
# -drives <comma delimited list of the drives we want to check>
param(
	[string]$server,
	[string]$ftpout,
	[string]$drives
)

# we want to ignore the errors
# ("Ignore" is only valid for the -ErrorAction parameter, not for this variable)
$ErrorActionPreference = "SilentlyContinue"

[string[]] $disksToMonitor = $drives.split(',')

$computerName = $server
# "ifdbnp"
#$ENV:Computername

# query the disks on the server we were asked to monitor, not on the local machine
$logicalDisks = Get-WmiObject -ComputerName $computerName -Query "select Name, AvgDisksecPerRead, AvgDisksecPerWrite, CurrentDiskQueueLength, PercentFreeSpace from Win32_perfformatteddata_perfdisk_LogicalDisk" | where { $disksToMonitor -contains $_.Name }

# $physicalDisks = Get-WmiObject -Query "select Name, AvgDisksecPerRead, AvgDisksecPerWrite from Win32_perfformatteddata_perfdisk_PhysicalDisk" | where { $disksToMonitor -contains $_.Name }

$procs = (Get-Counter "\\$computerName\Processor(*)\% Processor Time").CounterSamples

$memPageFaultsSec = (Get-Counter "\\$computerName\Memory\Page Faults/sec").CounterSamples
$memPageSec = (Get-Counter "\\$computerName\Memory\Pages/sec").CounterSamples

$memCacheFaultsSec = (Get-Counter "\\$computerName\Memory\Cache Faults/sec").CounterSamples
$memMemoryPageReadsSec = (Get-Counter "\\$computerName\Memory\Page Reads/sec").CounterSamples

$memAvailableBytes = (Get-Counter "\\$computerName\Memory\Available Bytes").CounterSamples
$memCommittedBytes = (Get-Counter "\\$computerName\Memory\Committed Bytes").CounterSamples
$memAvailableMBytes = (Get-Counter "\\$computerName\Memory\Available MBytes").CounterSamples

# http://www.brentozar.com/archive/2006/12/dba-101-using-perfmon-for-sql-performance-tuning/
$pagingFileUsage = (Get-Counter "\\$computerName\Paging File(_total)\% Usage").CounterSamples

$systemProcessorQueueLength = (Get-Counter "\\$computerName\System\Processor Queue Length").CounterSamples

# don't stop the script execution if we don't have these counters on the server, if SQL Server isn't installed, they won't be there
$sqlServerGeneralUserConnections = (Get-Counter "\\$computerName\SQLServer:General Statistics\User Connections").CounterSamples
$sqlServerMemoryManagerMemoryGrantsPending = (Get-Counter "\\$computerName\SQLServer:Memory Manager\Memory Grants Pending").CounterSamples
$sqlServerSQLStatisticsBatchRequestsSec = (Get-Counter "\\$computerName\SQLServer:SQL Statistics\Batch Requests/sec").CounterSamples
$sqlServerSQLStatisticsRecompilationsSec = (Get-Counter "\\$computerName\SQLServer:SQL Statistics\SQL Re-compilations/sec").CounterSamples
$sqlServerSQLStatisticsCompilationsSec = (Get-Counter "\\$computerName\SQLServer:SQL Statistics\SQL Compilations/sec").CounterSamples

#
$sqlServerBuffManagerBufferCacheHitRatio = (Get-Counter "\\$computerName\SQLServer:Buffer Manager\Buffer Cache Hit Ratio").CounterSamples
$sqlServerMemManagerTotalServerMemory = (Get-Counter "\\$computerName\SQLServer:Memory Manager\Total Server Memory (KB)").CounterSamples
$sqlServerBuffManagerPageLifeExpectancy = (Get-Counter "\\$computerName\SQLServer:Buffer Manager\Page Life Expectancy").CounterSamples
$sqlServerBuffManagerPageReadsSec = (Get-Counter "\\$computerName\SQLServer:Buffer Manager\Page reads/sec").CounterSamples

$timeStamp = (Get-Date -Format "yyyyMMddHHmm")

# now we do a bit of a butchery to get the xml out
# should probably look at better ways to handle this in the future

$perfObject = "<?xml version=""1.0""?><PerfSample>"
$perfObject = $perfObject + "<Timestamp>" + (Get-Date -Format "yyyy-MM-dd HH:mm") + "</Timestamp>"
$perfObject = $perfObject + "<ComputerName>" + $computerName  + "</ComputerName>"

$perfObject = $perfObject + "<Disks>"
foreach($disk in $logicalDisks)
{
	$perfObject = $perfObject + "<Disk>"

	$perfObject = $perfObject + "<Name>" + $disk.Name + "</Name>"
	$perfObject = $perfObject + "<AvgDisksecPerRead>" + $disk.AvgDisksecPerRead + "</AvgDisksecPerRead>"
	$perfObject = $perfObject + "<CurrentDiskQueueLength>" + $disk.CurrentDiskQueueLength + "</CurrentDiskQueueLength>"
	$perfObject = $perfObject + "<AvgDisksecPerWrite>" + $disk.AvgDisksecPerWrite + "</AvgDisksecPerWrite>"
	$perfObject = $perfObject + "<PercentFreeSpace>" + $disk.PercentFreeSpace + "</PercentFreeSpace>"

	$perfObject = $perfObject + "</Disk>"
}
$perfObject = $perfObject + "</Disks>"

$perfObject = $perfObject + "<Processors>"
foreach($processor in $procs)
{
	$perfObject = $perfObject + "<Processor>"

	$perfObject = $perfObject + "<Name>" + $processor.InstanceName + "</Name>"
	$perfObject = $perfObject + "<Utilisation>" + $processor.CookedValue + "</Utilisation>"

	$perfObject = $perfObject + "</Processor>"
}
$perfObject = $perfObject + "</Processors>"

$perfObject = $perfObject + "<SQLCounters>"

$perfObject = $perfObject + "<sqlServerGeneralUserConnections>" + $sqlServerGeneralUserConnections.CookedValue + "</sqlServerGeneralUserConnections>"
$perfObject = $perfObject + "<sqlServerMemoryManagerMemoryGrantsPending>" + $sqlServerMemoryManagerMemoryGrantsPending.CookedValue + "</sqlServerMemoryManagerMemoryGrantsPending>"
$perfObject = $perfObject + "<sqlServerSQLStatisticsBatchRequestsSec>" + $sqlServerSQLStatisticsBatchRequestsSec.CookedValue + "</sqlServerSQLStatisticsBatchRequestsSec>"
$perfObject = $perfObject + "<sqlServerSQLStatisticsRecompilationsSec>" + $sqlServerSQLStatisticsRecompilationsSec.CookedValue + "</sqlServerSQLStatisticsRecompilationsSec>"
$perfObject = $perfObject + "<sqlServerSQLStatisticsCompilationsSec>" + $sqlServerSQLStatisticsCompilationsSec.CookedValue + "</sqlServerSQLStatisticsCompilationsSec>"

$perfObject = $perfObject + "<sqlServerBuffManagerBufferCacheHitRatio>" + $sqlServerBuffManagerBufferCacheHitRatio.CookedValue + "</sqlServerBuffManagerBufferCacheHitRatio>"
$perfObject = $perfObject + "<sqlServerMemManagerTotalServerMemory>" + $sqlServerMemManagerTotalServerMemory.CookedValue + "</sqlServerMemManagerTotalServerMemory>"
$perfObject = $perfObject + "<sqlServerBuffManagerPageLifeExpectancy>" + $sqlServerBuffManagerPageLifeExpectancy.CookedValue + "</sqlServerBuffManagerPageLifeExpectancy>"
$perfObject = $perfObject + "<sqlServerBuffManagerPageReadsSec>" + $sqlServerBuffManagerPageReadsSec.CookedValue + "</sqlServerBuffManagerPageReadsSec>"

$perfObject = $perfObject + "</SQLCounters>"

$perfObject = $perfObject + "<Memory>"
$perfObject = $perfObject + "<PageFaultsSec>" + $memPageFaultsSec.CookedValue  + "</PageFaultsSec>"
$perfObject = $perfObject + "<CacheFaultsSec>" + $memCacheFaultsSec.CookedValue  + "</CacheFaultsSec>"
$perfObject = $perfObject + "<MemoryPageReadsSec>" + $memMemoryPageReadsSec.CookedValue  + "</MemoryPageReadsSec>"
$perfObject = $perfObject + "<PageSec>" + $memPageSec.CookedValue  + "</PageSec>"
$perfObject = $perfObject + "<CommittedBytes>" + $memCommittedBytes.CookedValue  + "</CommittedBytes>"
$perfObject = $perfObject + "<AvailableMBytes>" + $memAvailableMBytes.CookedValue  + "</AvailableMBytes>"
$perfObject = $perfObject + "<AvailableBytes>" + $memAvailableBytes.CookedValue  + "</AvailableBytes>"
$perfObject = $perfObject + "</Memory>"

$perfObject = $perfObject + "<pagingFileUsage>" + $pagingFileUsage.CookedValue  + "</pagingFileUsage>"
$perfObject = $perfObject + "<systemProcessorQueueLength>" + $systemProcessorQueueLength.CookedValue  + "</systemProcessorQueueLength>"

$perfObject = $perfObject + "</PerfSample>"

$outFileName = "$timeStamp-$computerName-Perf.xml"

# write the file next to the script, which is where the upload below reads it from
$perfObject > "$PSScriptRoot\$outFileName"

if($ftpout.length -gt 0)
{
	# http://stackoverflow.com/questions/1867385/upload-files-with-ftp-using-powershell
	# http://www.unixmen.com/how-to-setup-ftp-server-on-opensuse-42-1/

	$srv = $ftpout

	$user = "monitorupload"
	$password = "P@55w0rd"

	#write-host "Filename: $outFileName" 

	$ftp = "ftp://$ftpout/$outFileName"

	#write-host "$PSScriptRoot\$outFileName"

	$ftp = [System.Net.FtpWebRequest]::Create("ftp://$ftpout/$outFileName")
	$ftp = [System.Net.FtpWebRequest]$ftp
	$ftp.Method = [System.Net.WebRequestMethods+Ftp]::UploadFile
	$ftp.Credentials = new-object System.Net.NetworkCredential($user,$password)
	$ftp.UseBinary = $true
	$ftp.UsePassive = $true
	# read in the file to upload as a byte array
	$content = [System.IO.File]::ReadAllBytes("$PSScriptRoot\$outFileName")
	$ftp.ContentLength = $content.Length
	#write-host "Content Length: " + $content.Length
	# get the request stream, and write the bytes into it
	$rs = $ftp.GetRequestStream()
	$rs.Write($content, 0, $content.Length)
	# be sure to clean up after ourselves
	$rs.Close()
	$rs.Dispose()

	Remove-Item "$PSScriptRoot\$outFileName"
}

monitorProcWindowsXML.pl

This is a bit of a special script and is used in conjunction with perfData.ps1: it reads the XML data that perfData.ps1 has uploaded via FTP and pushes it into rrdtool archives that we can use for graphing.

#!/usr/bin/perl
#
# based upon  http://martybugs.net/linux/rrdtool/traffic.cgi
#
# rrd_interfaces.pl

use RRDs;
use LWP;
use File::Path qw(make_path remove_tree);
use XML::XPath;
use XML::XPath::XMLParser;
use DateTime;
use DateTime::Format::DateParse;
use File::Basename;

use Scalar::Util qw(looks_like_number);

# define location of rrdtool databases
my $rrdDatabasePath = '/var/lib/jbcmon/server';
# define location of images
my $imgPath = '/srv/www/htdocs/jbcmon/server';

my $graphOnly = 0;

####
# Start
####

$fileRawContent = readXMLFile($ARGV[0]);

#print "Paths: $ARGV[0]\n";

my $xpathContent = XML::XPath->new($fileRawContent);

my $perfSamples = $xpathContent->findnodes("//PerfSample");

foreach my $currentNode ($perfSamples->get_nodelist)
{
	my $timeStamp = $currentNode->findnodes('./Timestamp');
	my $computerName = $currentNode->findnodes('./ComputerName');

	my $baseDatabasePath = "$rrdDatabasePath/$computerName";

	#print "My Database Path = $baseDatabasePath\n";

	my $dt = DateTime::Format::DateParse->parse_datetime($timeStamp);

	$timeStamp = $dt->strftime('%b %d %Y %H:%M');

	my $currentDisks = $currentNode->findnodes("./Disks");

	my $diskCount = $currentDisks->size;
	#print "Disk Count: $diskCount\n";

	foreach my $disks ($currentDisks->get_nodelist)
	{
		my $currentDisks = $disks->findnodes("./Disk");

		foreach my $currentDisk ($currentDisks->get_nodelist)
		{
			#my $xpathcurrentDisk = XML::XPath->new(context => $currentDisk);
			#print "\t$xpathcurrentDisk\n";

			my $currentDiskName = $currentDisk->findnodes('./Name');
			#print "\tCurrent Disk Name: $currentDiskName\n";
			# stringify the node set before editing it (D: becomes D_)
			my $safeDiskName = "$currentDiskName";
			$safeDiskName =~ tr/:/_/;
			#print "\tCurrent safe Disk Name: $safeDiskName\n";

			my $AvgDisksecPerRead = $currentDisk->findnodes('./AvgDisksecPerRead');

			updateRRDFile("$baseDatabasePath/${safeDiskName}AvgDisksecPerRead", $timeStamp, "AvgDisksecPerRead", $AvgDisksecPerRead, "GAUGE", "Average Disk Seconds per Read");

			my $CurrentDiskQueueLength = $currentDisk->findnodes('./CurrentDiskQueueLength');

			updateRRDFile("$baseDatabasePath/${safeDiskName}CurrentDiskQueueLength", $timeStamp, "CurDiskQueueLen", $CurrentDiskQueueLength, "GAUGE", "Current Disk Queue Length");

			my $AvgDisksecPerWrite = $currentDisk->findnodes('./AvgDisksecPerWrite');

			updateRRDFile("$baseDatabasePath/${safeDiskName}AvgDisksecPerWrite", $timeStamp, "AvgDisksecPerWrite", $AvgDisksecPerWrite, "GAUGE", "Average Disk Seconds per Write");

			my $PercentFreeSpace = $currentDisk->findnodes('./PercentFreeSpace');

			updateRRDFile("$baseDatabasePath/${safeDiskName}PercentFreeSpace", $timeStamp, "PercentFreeSpace", $PercentFreeSpace, "GAUGE", "Percentage Free Space");
		}
	}

	my $currentProcessors = $currentNode->findnodes("./Processors");

	my $procCount = $currentProcessors->size;
	#print "Proc Count: $procCount\n";	

	foreach my $proc ($currentProcessors->get_nodelist)
	{
		my $currentProcessors = $proc->findnodes("./Processor");

		foreach my $currentProcessor ($currentProcessors->get_nodelist)
		{
			my $currentProcessorName = $currentProcessor->findnodes('./Name');
			my $currentProcessorUtilisation = $currentProcessor->findnodes('./Utilisation');

			#print "$baseDatabasePath\_$currentProcessorName\_currentProcessorUtilisation\n";

			updateRRDFile("$baseDatabasePath/$currentProcessorName\_currentProcessorUtilisation", $timeStamp, "curProcUtil", $currentProcessorUtilisation, "GAUGE", "Current Processor Utilisation");
		}
	}

	# the memory counters sit inside the <Memory> element written by perfData.ps1
	my $PageFaultsSec = $currentNode->findnodes("./Memory/PageFaultsSec");

	#print "BasePath: $baseDatabasePath\_PageFaultsSec\n";

	updateRRDFile("$baseDatabasePath/PageFaultsSec", $timeStamp, "PageFaultsSec", $PageFaultsSec, "GAUGE", "Page Faults per Second");

	my $CacheFaultsSec = $currentNode->findnodes("./Memory/CacheFaultsSec");

	updateRRDFile("$baseDatabasePath/CacheFaultsSec", $timeStamp, "CacheFaultsSec", $CacheFaultsSec, "GAUGE", "Cache Faults per Second");

	my $MemoryPageReadsSec = $currentNode->findnodes("./Memory/MemoryPageReadsSec");

	updateRRDFile("$baseDatabasePath/MemoryPageReadsSec", $timeStamp, "MemoryPageReadsSec", $MemoryPageReadsSec, "GAUGE", "Memory Page Reads per Second");

	my $PageSec = $currentNode->findnodes("./Memory/PageSec");

	updateRRDFile("$baseDatabasePath/PageSec", $timeStamp, "PageSec", $PageSec, "GAUGE", "Pages per Second");

	#my $CommittedBytes = $currentNode->findnodes("./Memory/CommittedBytes");
	my $AvailableMBytes = $currentNode->findnodes("./Memory/AvailableMBytes");

	updateRRDFile("$baseDatabasePath/AvailableMBytes", $timeStamp, "AvailableMBytes", $AvailableMBytes, "GAUGE", "Available MBytes");
	#my $AvailableBytes = $currentNode->findnodes("./Memory/AvailableBytes");

	my $pagingFileUsage = $currentNode->findnodes("./pagingFileUsage");
	updateRRDFile("$baseDatabasePath/pagingFileUsage", $timeStamp, "pagingFileUsage", $pagingFileUsage, "GAUGE", "Paging File Usage");
	my $systemProcessorQueueLength = $currentNode->findnodes("./systemProcessorQueueLength");
	updateRRDFile("$baseDatabasePath/systemProcessorQueueLength", $timeStamp, "ProcQueueLen", $systemProcessorQueueLength, "GAUGE", "Processor Queue Length");

	# sql counters
	my $sqlServerGeneralUserConnections = $currentNode->findnodes("./SQLCounters/sqlServerGeneralUserConnections");
	#print "SQL User Connections: $sqlServerGeneralUserConnections\n";
	#print "$baseDatabasePath\_sqlServerGeneralUserConnections\n";
	updateRRDFile("$baseDatabasePath/sqlServerGeneralUserConnections", $timeStamp, "sqlUserConn", $sqlServerGeneralUserConnections, "GAUGE", "SQL User Connection");

	my $sqlServerMemoryManagerMemoryGrantsPending = $currentNode->findnodes("./SQLCounters/sqlServerMemoryManagerMemoryGrantsPending");
	updateRRDFile("$baseDatabasePath/sqlServerMemoryManagerMemoryGrantsPending", $timeStamp, "sqlMemGrantsPend", $sqlServerMemoryManagerMemoryGrantsPending, "GAUGE", "SQL Memory Grants Pending");

	my $sqlServerSQLStatisticsBatchRequestsSec = $currentNode->findnodes("./SQLCounters/sqlServerSQLStatisticsBatchRequestsSec");
	updateRRDFile("$baseDatabasePath/sqlServerSQLStatisticsBatchRequestsSec", $timeStamp, "sqlBatchReqSec", $sqlServerSQLStatisticsBatchRequestsSec, "GAUGE", "SQL Batch Requests per Second");

	my $sqlServerSQLStatisticsRecompilationsSec = $currentNode->findnodes("./SQLCounters/sqlServerSQLStatisticsRecompilationsSec");
	updateRRDFile("$baseDatabasePath/sqlServerSQLStatisticsRecompilationsSec", $timeStamp, "sqlRecompSec", $sqlServerSQLStatisticsRecompilationsSec, "GAUGE", "SQL Recompilations per Second");

	my $sqlServerSQLStatisticsCompilationsSec = $currentNode->findnodes("./SQLCounters/sqlServerSQLStatisticsCompilationsSec");
	updateRRDFile("$baseDatabasePath/sqlServerSQLStatisticsCompilationsSec", $timeStamp, "sqlCompSec", $sqlServerSQLStatisticsCompilationsSec, "GAUGE", "SQL Compilations per Second");

	my $sqlServerBuffManagerBufferCacheHitRatio = $currentNode->findnodes("./SQLCounters/sqlServerBuffManagerBufferCacheHitRatio");
	updateRRDFile("$baseDatabasePath/sqlServerBuffManagerBufferCacheHitRatio", $timeStamp, "sqlBufCacheHit", $sqlServerBuffManagerBufferCacheHitRatio, "GAUGE", "SQL Buffer Cache Hit Ratio");

	my $sqlServerMemManagerTotalServerMemory = $currentNode->findnodes("./SQLCounters/sqlServerMemManagerTotalServerMemory");
	updateRRDFile("$baseDatabasePath/sqlServerMemManagerTotalServerMemory", $timeStamp, "sqlTotalSrvMem", $sqlServerMemManagerTotalServerMemory, "GAUGE", "SQL Total Server Memory");

	my $sqlServerBuffManagerPageLifeExpectancy = $currentNode->findnodes("./SQLCounters/sqlServerBuffManagerPageLifeExpectancy");
	updateRRDFile("$baseDatabasePath/sqlServerBuffManagerPageLifeExpectancy", $timeStamp, "sqlPageLifeExp", $sqlServerBuffManagerPageLifeExpectancy, "GAUGE", "SQL Buffer Page Life Expectancy");

	my $sqlServerBuffManagerPageReadsSec = $currentNode->findnodes("./SQLCounters/sqlServerBuffManagerPageReadsSec");
	updateRRDFile("$baseDatabasePath/sqlServerBuffManagerPageReadsSec", $timeStamp, "sqlBufPageReads", $sqlServerBuffManagerPageReadsSec, "GAUGE", "SQL Buffer Page Reads per Second");

}

sub readXMLFile
{
  local $/ = undef;
  open FILE, $_[0] or warn "Couldn't open file: $!";
  binmode FILE;
  $string = <FILE>;
  close FILE;

  return $string
}

# Args:
#	$_[0] = file path
# 	$_[1] = DS definition
#	$_[2] = type (GAUGE, COUNTER)
sub createGaugeRRDFile
{
	# make sure the per-server directory exists ($_[0] already ends in .rrd)
	(my $dir = $_[0]) =~ s{/[^/]*$}{};
	if(! -d $dir)
	{
		make_path ($dir) or die "Error creating directory: $dir";
	}

	# one sample every 5 minutes, 525600 rows kept in each archive
	if(! -e $_[0])
	{
		#print "About to create rrd file $_[0]\n";
		#print " $_[1]\n";
		RRDs::create "$_[0]",
			"-s 300",
			#"DS:value:$_[2]:300:0:1000000000000",
			"DS:value:$_[2]:600:0:U",
			"RRA:MAX:0.5:1:525600",
			"RRA:AVERAGE:0.5:1:525600";

		# RRDs reports failures through RRDs::error rather than a return value
		my $error = RRDs::error;
		warn "createGaugeRRDFile() failed to create file $_[0]: $error\n" if $error;
	}
}

# $_[0]	= filename
# $_[1]	= Time
# $_[2]	= DS Name
# $_[3]	= Value
# $_[4] = type (GAUGE, COUNTER)
# $_[5] = legend name
#
sub updateRRDFile
{
	my $path = "$_[0].rrd";

	chomp($_[3]);
	my $updateValue = "N:$_[3]";
	#print "\t\tUpdate: $updateValue\n";

	if(!$graphOnly)
	{
		if(! -e $path)
		{
			createGaugeRRDFile($path, $_[2], $_[4]);
		}
	}

	#print "\t\t number: $_[3]\n";
	#if(looks_like_number($_[3]))
	#{
		#print "\t\tUpdate looks like a number: $_[3]\n";
		if(!$graphOnly)
		{
			#print "\t\tAbout to update file $path\n";
			RRDs::update "$path",
				"-t", "value",
				$updateValue;
		}
		my $error = RRDs::error;

		if($error)
		{
			#my($file, $dir, $ext) = fileparse($_[0]);
			#print "updateRRDFile() failed to update file $file, with $updateValue\n\tError: $error\n";
		}
		else
		{
			if($graphOnly)
			{
				CreateGraph($path, 'day', $_[2], $_[2], $_[5]);
			}

		}
	#}

}

# creates graph
# inputs: $_[0]: name of rrd
#	$_[1]: interval (ie, day, week, month, year)
#	$_[2]: interface description
#	$_[3]: DS Name
#	$_[4]: legend name
sub CreateGraph
{
	my($file, $dir, $ext) = fileparse($_[0]);

	my $dt = DateTime->now( time_zone => 'local' );

	my $now = $dt->strftime('%Y-%m-%d %H\:%M');

	RRDs::graph "$imgPath/$file-$_[1].png",
		"-s -1$_[1]",
		"--lazy",
		"-h", "150", "-w", "600",
		"-l 0",
		"-a", "PNG",
		#"-P",
		#"-v %",
		"DEF:gen=$_[0]:value:AVERAGE",
		#"DEF:genmax=$_[0]:value:MAX",
 		#"CDEF:mgmtcpu_act=mgmtcpu,100,/",
		#"AREA:genmax#D07272:Max $_[4]",
		#"LINE1:genmax#FF1E1E",
		"AREA:gen#7272D0:Avg $_[4]",
		"LINE1:gen#1E1EFF",
		"GPRINT:gen:MAX:  Max\\: %5.1lf %s",
		"GPRINT:gen:AVERAGE: Avg\\: %5.1lf %S",
		"GPRINT:gen:LAST: Current\\: %5.1lf %S\\n",
		"COMMENT:Generated $now",
		"HRULE:0#000000",
		"-X 0";
	if ($ERROR = RRDs::error) { print "$0: unable to generate $_[0] $_[1] traffic graph: $ERROR\n"; }
}
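
perfData.ps1 uploads one file per sample, named <timestamp>-<computer>-Perf.xml, and monitorProcWindowsXML.pl takes the path of a single file as its first argument. A small wrapper along these lines could be run from cron to feed it whatever has arrived. Treat it as a sketch only: the /srv/ftp inbox (the upload user's home directory in setupFTP.sh), the /root script location, the function name and the choice to delete each file after a successful run are all assumptions.

```shell
#!/bin/bash
# Sketch only: hand every uploaded perfData.ps1 sample to monitorProcWindowsXML.pl.
# INBOX and PARSER are assumptions - adjust them to your install.
INBOX="${INBOX:-/srv/ftp}"
PARSER="${PARSER:-/root/monitorProcWindowsXML.pl}"

process_perf_files() {
	local inbox="$1" parser="$2" file
	# the glob sorts by name, and the yyyyMMddHHmm prefix makes that oldest first
	for file in "$inbox"/*-Perf.xml; do
		# the pattern stays literal when nothing matches, so skip it
		[ -e "$file" ] || continue
		# only remove the sample once the parser has exited cleanly
		if "$parser" "$file"; then
			rm -f -- "$file"
		else
			echo "failed to process $file" >&2
		fi
	done
}

process_perf_files "$INBOX" "$PARSER"
```

Because the Perl script stores each value against the current time (N:) rather than the timestamp in the file, a wrapper like this would need to run at least as often as perfData.ps1 does.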

setupFTP.sh

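This script sets up the FTP server: it starts and enables vsftpd (and Apache), then creates the upload user that perfData.ps1 logs in as.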

#!/bin/bash

systemctl start vsftpd
systemctl enable vsftpd
systemctl start apache2
systemctl enable apache2

mkdir -p /srv/ftp

groupadd ftp-users

useradd -g ftp-users -d /srv/ftp/ monitorupload

echo "monitorupload:P@55w0rd" | chpasswd

chmod 750 /srv/ftp/
chown monitorupload:ftp-users /srv/ftp/

systemctl restart vsftpd

Scheduling the Linux Scripts

We schedule the scripts from /etc/crontab

An example is below
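
A minimal sketch of what such entries could look like, written as a small script that prints them for review. The host ifbepd and monitor port 16008 are borrowed from the example in the monitorM3GraphNodes.pl header; the grid port, the configuration file path, the /root script location and the five-minute schedule (chosen to match the 300 second step of the rrd archives) are assumptions to adapt.

```shell
#!/bin/bash
# Sketch only: prints /etc/crontab style entries for the monitoring scripts.
# Every value below is a placeholder or an assumption - adjust before use.
BE_HOST="${BE_HOST:-ifbepd}"              # business engine hostname (example value)
MONITOR_PORT="${MONITOR_PORT:-16008}"     # monitor port (example value)
GRID_PORT="${GRID_PORT:-16001}"           # grid port (hypothetical)
CONFIG="${CONFIG:-/root/monitorM3.conf}"  # configuration file (hypothetical path)
SCRIPTS="${SCRIPTS:-/root}"               # where the scripts were dropped

cron_entries() {
	cat <<EOF
# m   h dom mon dow user command
*/5   *  *   *   *  root $SCRIPTS/monitorM3.pl $BE_HOST $MONITOR_PORT $GRID_PORT $CONFIG
*/5   *  *   *   *  root $SCRIPTS/monitorM3GraphNodes.pl $BE_HOST/$MONITOR_PORT day
*/5   *  *   *   *  root $SCRIPTS/generateGraphs.pl $BE_HOST/$MONITOR_PORT day
EOF
}

# review the output, then add the lines to /etc/crontab by hand
cron_entries
```

The sixth field (root) is the user column that /etc/crontab requires; a per-user crontab would leave it out.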

File Locations

rrd archives

/var/lib/jbcmon/server/<hostname>/<port>

Testing states (used for the counts of errors)

/var/lib/jbcmon/state/<hostname>/<port>

Images and Webpages

/srv/htdocs/server/<hostname>/<port>

/srv/htdocs/server/<db server>
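
When poking around on the VM it can help to expand these templates for a particular server. The helper below is a hypothetical sketch, not one of the scripts, and the host name and port in the usage line are placeholders.

```shell
#!/bin/bash
# Print the per-server directories listed above for one host and port.
jbcmon_paths() {
    local host="$1" port="$2"
    echo "rrd /var/lib/jbcmon/server/$host/$port"
    echo "state /var/lib/jbcmon/state/$host/$port"
    echo "web /srv/htdocs/server/$host/$port"
}

# Hypothetical example values
jbcmon_paths m3be.example.com 16008
```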

Example Graphs

In some instances I’ve provided .html files; creating your own to display the information you find useful is very easy.  Equally, it’s pretty easy to create your own graphs, and RRDTool is fantastic for doing so.

Database Server performance information.  We log numerous perf counters from the Windows Server that hosts our SQL Server.  The graphed counters could probably be done better, but I haven’t got around to revisiting them…

M3 Subsystems

This is the most useful of the graphs – we are looking at the M3 subsystems – we can see the memory usage, jobs, threads and CPU.  The blank sections are where the subsystem shuts down due to inactivity.

Same graphs over a month

Grid Component logs

JVM memory usage & CPU

This shows us the JVM memory (max and used) against the CPU, which is handy for spotting a run-away process or a grid component that doesn’t have enough memory allocated.

Performance Counters from the Grid

There are many performance counters and there is little to no documentation on them. As I already have the data, I figured I might as well log it, and why not graph it 🙂

Some M3BE Performance Counters –

I’ve had difficulty getting information on what some of the counters mean, so these graphs may not make a huge amount of sense.

In Closing

As mentioned at the beginning, the details are pretty light – and as I get time I’ll be looking at creating an install script which prompts you for server details and builds a config file and cron file to make it easier to get up and running.

This entry was posted in M3 / MoveX, Monitoring.

2 Responses to M3 Business Engine and Grid Monitoring

  1. Very nice to visually see the historic data

  2. Anthony says:

    This looks fantastic! Can’t wait to give it a spin!

    Thank you very much for sharing all of this great work!

    Quite busy in the coming days but hope to get this going very soon.
