High CPU Usage per Chassis and Blade-checkpoint-secureplatform

Vendor: checkpoint

OS: secureplatform

Description:
Indeni will trigger an issue when CPU usage, measured per chassis and blade, is high.

Remediation Steps:
Determine the cause of the high CPU usage on the listed CPUs.

How does this work?
CPU counters are read from /proc/stat twice, 5 seconds apart, and the usage over that interval is calculated from the difference between the two samples.
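
As an illustration of that calculation, here is a minimal sketch for a single "cpu" line sampled twice. The helper name cpu_usage_pct and the counter values are made up for the example and are not part of the indeni script. Idle time is taken as idle + iowait (fields 5 and 6), mirroring the parser shown later in this post.

```shell
#!/bin/sh
# Sketch: CPU usage between two samples of one /proc/stat "cpu" line.
# cpu_usage_pct is a hypothetical helper name, not part of the indeni script.
cpu_usage_pct() {
    # $1 = first sample line, $2 = second sample line
    printf '%s\n%s\n' "$1" "$2" | awk '
    {
        total = 0
        # Sum every counter field; field 1 is the "cpu" label and is skipped.
        for (i = 2; i <= NF; i++) total += $i
        idle = $5 + $6                      # idle + iowait
        if (NR == 1) { t1 = total; i1 = idle; next }
        dTotal = total - t1                 # time elapsed between samples
        dIdle  = idle - i1                  # idle time within that interval
        if (dTotal <= 0) { print "n/a"; exit }   # counters did not advance
        printf "%.1f\n", (dTotal - dIdle) / dTotal * 100
    }'
}

# Example with made-up counter values (not captured from a real gateway):
# total grows by 500, idle + iowait grows by 200, so 300 / 500 is busy.
cpu_usage_pct "cpu  1000 0 500 8000 500 0 0 0" \
              "cpu  1200 0 600 8150 550 0 0 0"    # prints 60.0
```

The percentage is the share of the elapsed counter time that was not idle, which is why both samples are needed: a single read of /proc/stat only gives totals since boot.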

Why is this important?
High CPU usage can cause traffic to be dropped and indicates a performance problem.

Without Indeni how would you find this?
An administrator could log in and manually check CPU usage. It is also visible in SmartView Monitor.
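
One way such a manual check might look is sketched below. It assumes a POSIX shell and awk are available on the device, which this post does not confirm. The helper name per_cpu_usage is hypothetical. It reads two /proc/stat snapshots separated by an "END" line, which is the same stream shape the script's SSH command produces, and prints one usage percentage per CPU line.

```shell
#!/bin/sh
# Sketch of a manual check: per-CPU usage from two /proc/stat snapshots
# separated by an "END" line. per_cpu_usage is a hypothetical helper name
# used only for illustration; it is not part of the indeni script.
per_cpu_usage() {
    awk '
    /^END/ { second = 1; next }          # marker between the two snapshots
    /^cpu/ {
        total = 0
        for (i = 2; i <= NF; i++) total += $i
        idle = $5 + $6                   # idle + iowait
        if (!second) {                   # first snapshot: store only
            t[$1] = total; d[$1] = idle
            next
        }
        dTotal = total - t[$1]
        dIdle  = idle - d[$1]
        if (dTotal > 0)                  # skip CPUs whose counters did not move
            printf "%s %.1f\n", $1, (dTotal - dIdle) / dTotal * 100
    }'
}

# Live use (assumption: run on the device itself, 5 seconds between samples):
#   { cat /proc/stat; echo END; sleep 5; cat /proc/stat; } | per_cpu_usage
```

The "cpu" line without a number is the aggregate of all CPUs; the numbered lines ("cpu0", "cpu1", ...) are the individual CPUs.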

chkp-secureplatform-proc-stat

#! META
name: chkp-secureplatform-proc-stat
description: displays CPU usage
type: monitoring
monitoring_interval: 1 minute
includes_resource_data: true
requires:
    vendor: checkpoint
    os.name: secureplatform

#! COMMENTS
cpu-usage:
    why: |
        High CPU could cause traffic to be dropped, and would indicate a performance problem.
    how: |
        CPU counters are read from /proc/stat twice, 5 seconds apart, and the usage over that interval is calculated from the difference between the two samples.
    without-indeni: |
        An administrator could log in and manually check CPU usage. It is also visible in SmartView Monitor.
    can-with-snmp: true
    can-with-syslog: false
    vendor-provided-management: |
        CPU usage can be checked with CLI commands "top" and "cpview". It can also be viewed in SmartView Monitor.

#! REMOTE::SSH
${nice-path} -n 15 cat /proc/stat && echo "END" && sleep 5 && ${nice-path} -n 15 cat /proc/stat

#! PARSER::AWK

# SecurePlatform does not have commands such as "mpstat" and does not allow top to run in batch mode.
# The only available option is to parse /proc/stat
# Counters in /proc/stat are cumulative since the last boot, so the output must be collected at least twice,
# with a predetermined interval in between, and the difference between the two samples is then used.
# Info taken from: https://github.com/Leo-G/DevopsWiki/wiki/How-Linux-CPU-Usage-Time-and-Percentage-is-calculated

BEGIN {
	runCount = 1
	cputags["resource-metric"] = "true"
}

# cpu  17528 282 42758 379482349 10786 3427 56029 0
/^cpu/ {
	# If the name is "cpu" without a number, the line covers all CPUs combined (the average usage)
	if ($1 == "cpu") {
		
		# Calculate total CPU time since boot by adding all counters
		split($0, splitArr, " ")
		if (runCount == 1) {
			# Data is collected for the first time, and only stored
			for (id in splitArr) {
				cpuAvgTotal = cpuAvgTotal + splitArr[id]
			}
		} else {
			# Data is collected a second time, and compared with the first data.
			for (id in splitArr) {
				cpuAvgTotalTmp = cpuAvgTotalTmp + splitArr[id]
			}
			cpuAvgTotal = cpuAvgTotalTmp - cpuAvgTotal
		}
		
		
		# Calculate idle CPU time since boot
		if (runCount == 1) {
			# Data is collected for the first time, and only stored
			cpuAvgIdle = $5 + $6
		} else {
			# Data is collected a second time, and compared with the first data.
			cpuAvgIdleTmp = $5 + $6
			cpuAvgIdle = cpuAvgIdleTmp - cpuAvgIdle
		}
		
		
		# Calculate CPU usage time since boot
		if (runCount == 1) {
			# Data is collected for the first time, and only stored
			cpuAvgUsage = cpuAvgTotal - cpuAvgIdle
		} else {
			# Data is collected a second time, and compared with the first data.
			cpuAvgUsageTmp = cpuAvgTotalTmp - cpuAvgIdleTmp
			cpuAvgUsage = cpuAvgUsageTmp - cpuAvgUsage
		}
		
		
		# Calculate CPU usage in percentage
		if (runCount == 2 && cpuAvgTotal > 0) {
			# Second run: compare data. Skipped if the counters did not advance, to avoid dividing by zero.
			cpuAvgUsagePercent = (cpuAvgUsage / cpuAvgTotal) * 100
			cputags["cpu-is-avg"] = "true"
			cputags["cpu-id"] = "all-average"
			writeDoubleMetricWithLiveConfig("cpu-usage", cputags, "gauge", "60", cpuAvgUsagePercent, "CPU", "percentage", "cpu-id")
		}
	} else {
		# If the name includes a number (for example "cpu0"), the line is for a specific CPU.
		
		# Get CPU ID
		split($1, splitArr, "u")
		cpuId = splitArr[2]
		
		
		# Calculate total CPU time since boot by adding all counters
		split($0, splitArr, " ")
		if (runCount == 1) {
			# Data is collected for the first time, and only stored
			for (id in splitArr) {
				cpuTotal[cpuId] = cpuTotal[cpuId] + splitArr[id]
			}
		} else {
			for (id in splitArr) {
				# Data is collected a second time, and compared with the first data.
				cpuTotalTmp[cpuId] = cpuTotalTmp[cpuId] + splitArr[id]
			}
			cpuTotal[cpuId] = cpuTotalTmp[cpuId] - cpuTotal[cpuId]
		}
		
		
		# Calculate idle CPU time since boot
		if (runCount == 1) {
			# Data is collected for the first time, and only stored
			cpuIdle[cpuId] = $5 + $6
		} else {
			# Data is collected a second time, and compared with the first data.
			cpuIdleTmp[cpuId] = $5 + $6
			cpuIdle[cpuId] = cpuIdleTmp[cpuId] - cpuIdle[cpuId]
		}
		
	
		
		# Calculate CPU usage time since boot
		if (runCount == 1) {
			# Data is collected for the first time, and only stored
			cpuUsage[cpuId] = cpuTotal[cpuId] - cpuIdle[cpuId]
		} else {
			# Data is collected a second time, and compared with the first data.
			cpuUsageTmp[cpuId] = cpuTotalTmp[cpuId] - cpuIdleTmp[cpuId]
			cpuUsage[cpuId] = cpuUsageTmp[cpuId] - cpuUsage[cpuId]
		}
		
		
		# Calculate CPU usage in percentage
		if (runCount == 2 && cpuTotal[cpuId] > 0) {
			cpuUsagePercent[cpuId] = (cpuUsage[cpuId] / cpuTotal[cpuId]) * 100
			cputags["cpu-is-avg"] = "false"
			cputags["cpu-id"] = cpuId
			writeDoubleMetricWithLiveConfig("cpu-usage", cputags, "gauge", "60", cpuUsagePercent[cpuId], "CPU", "percentage", "cpu-id")
		}
		
	}
}

/^END/ {
	# Count number of runs.
	runCount++
}

high_per_chassis_blade_cpu_usage

Rule source: https://bitbucket.org/indeni/indeni-knowledge/src/master/rules/sync_core_rules/HighPerChassisBladeCpuUsageRule.scala
