Identifying whether the Indeni-server is picking up custom metric values


I created an .IND script that parses for a custom metric, "cpu-soft-lockup-state", and I have seen the indeni-collector correctly report the metric in my test environment:


Command chkp-asg-var-log-messages ran as monitoring for device CP-TEST1(192.168.197.30), returned 1 metrics: cpu-soft-lockup-state (1)
INFO  [2017-11-22 14:20:43,435] com.indeni.server.rules.manager.factory.FileSystemRuleFactory: Loading rule from file "/usr/share/indeni/rules/test.rule"
INFO  [2017-11-22 14:20:56,314] com.indeni.server.rules.manager.factory.FileSystemRuleFactory: Successfully loaded rule "CpuSoftLockup" from file "/usr/share/indeni/rules/test.rule"

----------------------------


Here is what the IND script looks like:

#! META
name: chkp-asg-var-log-messages
description: Retrieve soft lock-up from var/log/messages
type: monitoring
monitoring_interval: 2 minute
requires:
    vendor: checkpoint
    asg: true

#! REMOTE::SSH
awk '$0>=from' from="$(date +%b" "%d" "%H:%M:%S -d -10min)" /var/log/messages | grep "soft lockup"

#! PARSER::AWK
BEGIN {
    logEntries = 0
}

# Jun 20 13:02:18 [remote host] kernel: BUG: soft lockup - CPU#3 stuck for 10s! [cphaconf:12533]
/BUG: soft lockup/ {
    logEntries++
    lockups[logEntries, "entries"] = $0
}

END {
    writeComplexMetricObjectArray("cpu-softlockups", tags, lockups)
}

---------------


Sample Scala Rule


package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.data.conditions.{Equals => DataEquals, Not => DataNot}
import com.indeni.ruleengine.expressions.conditions.{Equals => RuleEquals, Not => RuleNot, Or => RuleOr}
import com.indeni.ruleengine.expressions.data.SnapshotExpression
import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library._

/**
*
*/
case class CrossVendorCpuSoftLockup(context: RuleContext) extends MultiSnapshotValueCheckTemplateRule(context,
  ruleName = "CpuSoftLockup",
  ruleFriendlyName = "All Devices: CPU Soft Lockup",
  ruleDescription = "Many devices that rely on Linux kernels can experience CPU soft lockups. It can be critical to identify them early enough before all the cores are seized.",
  metricName = "cpu-softlockups",
  alertItemsHeader = "Affected Cores",
  alertDescription = "The following device had core(s) experience lockup. Please be aware that if all cores lockup, the device may fail. Indeni tracks if the cores have lockedup in the last 10 minutes",
  baseRemediationText = "Please review the process that is causing the lockup for the core. Indeni will autoresolve the alert over the next 20 minutes if the cores are unlocked.",
  complexCondition = RuleNot(RuleEquals(RuleHelper.createEmptyComplexArrayConstantExpression(), SnapshotExpression("cpu-softlockups").asMulti().mostRecent().noneable)))()


------------------------




I can't tell from the logs whether the rule is picking up the new metrics from the TSDB, and I can't seem to trigger the alert in the GUI. How can I find out whether the indeni-server is picking up the metric "cpu-soft-lockup-state"?

You can always check the metric DB: https://indeni.atlassian.net/wiki/spaces/IKP/pages/76742659/Testing+on+a+live+indeni+server


But first things first: the IND script needs improvement. For starters, try putting writeDoubleMetric in an END block; otherwise writeDoubleMetric is executed for each line of input, which might be a problem.
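A minimal sketch of that pattern, runnable outside Indeni: the pattern block only records that a lockup line was seen, and the metric is emitted once from END. The writeDoubleMetricStub function here is a hypothetical stand-in that just prints a name and a value; the real writeDoubleMetric helper is supplied by the Indeni parser and its arguments should be taken from the Indeni documentation. The sample log lines are also made up for illustration.

```shell
#!/bin/sh
# Made-up sample input: two soft-lockup lines and one unrelated line.
sample_log() {
cat <<'EOF'
Jun 20 13:02:18 host kernel: BUG: soft lockup - CPU#3 stuck for 10s! [cphaconf:12533]
Jun 20 13:02:19 host sshd[411]: Accepted password for admin
Jun 20 13:02:28 host kernel: BUG: soft lockup - CPU#1 stuck for 10s! [cphaconf:12533]
EOF
}

# Read log lines on stdin and emit the state metric exactly once.
parse_lockups() {
awk '
# Hypothetical stand-in for the Indeni helper: prints instead of writing a metric.
function writeDoubleMetricStub(name, value) {
    print name " " value
}
BEGIN { found = 0 }
# Only remember that a lockup was seen; do not write the metric here.
/BUG: soft lockup/ { found = 1 }
# Runs once after all input has been read, so the metric is written once.
END { writeDoubleMetricStub("cpu-soft-lockup-state", found) }
'
}

sample_log | parse_lockups
```

With this layout the number of matching lines does not change how many times the metric is written, only its value.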

This is the same command as in the wiki, but without a start time and for all devices. It can be helpful if you want to see whether the metric was recorded for any device at all.

curl -G -k -u "admin:admin123!" "https://localhost:9009/api/v1/metrics" --data-urlencode "query=(im.name==cpu-soft-lockup-state)" | sed "s/tags/\ntags/g"
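
The script and rule posted in the question use the name cpu-softlockups, while the collector log line reports cpu-soft-lockup-state, so it may be worth checking both names. The sketch below only prints the curl command for each name (a dry run) so it can be reviewed before being run on the Indeni server; the helper function names are hypothetical, and the URL and credentials are simply the ones from the command above.

```shell
#!/bin/sh
# Build the query expression for one metric name.
metric_query() {
    printf '(im.name==%s)' "$1"
}

# Print (do not run) the curl command that checks one metric name.
show_metric_check() {
    printf 'curl -G -k -u "admin:admin123!" "https://localhost:9009/api/v1/metrics" --data-urlencode "query=%s"\n' "$(metric_query "$1")"
}

# Both names that appear in this thread.
for name in cpu-soft-lockup-state cpu-softlockups; do
    show_metric_check "$name"
done
```

If one name returns data and the other does not, that points at a mismatch between the name the script writes and the name the rule reads.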