Critical process(es) down-checkpoint-gaia

error
gaia
health-checks
checkpoint

Vendor: checkpoint

OS: gaia

Description:
Many devices have critical processes, usually daemons, that must be up for certain functions to work. Indeni will alert if any of these goes down.

Remediation Steps:
Review the cause for the processes being down.
Check if “cpstop” was run.

chkp-mgmt-mdsstat-mds

#! META
name: chkp-mgmt-mdsstat-mds
description: Monitor CMA processes
type: monitoring
monitoring_interval: 1 minutes
requires:
    vendor: checkpoint
    os.name: gaia
    role-management: true
    mds: true

#! COMMENTS
process-state:
    skip-documentation: true

#! REMOTE::SSH
COLUMNS=150 && export COLUMNS && ${nice-path} -n 15 mdsstat

#! PARSER::AWK

# Reads the status of the critical processes for each CMA. "cpwd_admin list" alone does not cover this, because stopping a CMA removes its processes from that list.

BEGIN {
	# Input is divided on pipe
	FS = "|"
}




#| CMA |lab-CP-MGMT-MDM-VS1_Management_Server                                         | 192.168.197.34  | up 5814    | up 5739  | up 5546  | up 8450  |
#| MDS |                                       -                                      | 192.168.197.33  | down       | up 5840  | up 5837  | up 8375  |
/^\|\s+(MDS|CMA)/ {
	# Remove old tags
	delete tags

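	# With FS = "|", $1 is the empty text before the first pipe and $2 is the row type (MDS or CMA)
	vsName = trim($3)
	vsIp = trim($4)

	# Process status columns, in the order shown in the sample rows above
	fwm = $5
	fwd = $6
	cpd = $7
	cpca = $8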

	# Set VS tags if this is the CMA, but do not set them if this is the MDS
	if (trim($2) != "MDS") {
		tags["vs.ip"] = vsIp
		tags["vs.name"] = vsName
	}

	# FWM
	if (fwm ~ "up") {
		fwmStatus = 1
	} else {
		fwmStatus = 0
	}
	tags["description"] = "FWM"
	writeDoubleMetric("process-state", tags, "gauge", 60, fwmStatus)

	# FWD
	if (fwd ~ "up") {
		fwdStatus = 1
	} else {
		fwdStatus = 0
	}
	tags["description"] = "FWD"
	writeDoubleMetric("process-state", tags, "gauge", 60, fwdStatus)

	# CPD
	if (cpd ~ "up") {
		cpdStatus = 1
	} else {
		cpdStatus = 0
	}
	tags["description"] = "CPD"
	writeDoubleMetric("process-state", tags, "gauge", 60, cpdStatus)

	# CPCA
	if (cpca ~ "up") {
		cpcaStatus = 1
	} else {
		cpcaStatus = 0
	}
	tags["description"] = "CPCA"
	writeDoubleMetric("process-state", tags, "gauge", 60, cpcaStatus)
}
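
The parsing logic above can be tried outside Indeni with a small standalone sketch. It is written in Python rather than AWK because `writeDoubleMetric` and `trim` are supplied by the Indeni runtime and are not available in a plain AWK interpreter. The function name `parse_mdsstat` and the `(tags, value)` return shape are assumptions made for illustration; the sample rows are the two from the parser comments, with the padding shortened.

```python
import re

# Status columns that follow the row type, name and IP, in the order the AWK parser reads them.
PROCESSES = ("FWM", "FWD", "CPD", "CPCA")

# Same row filter as the AWK pattern: a leading pipe, whitespace, then MDS or CMA.
ROW = re.compile(r"^\|\s+(MDS|CMA)")


def parse_mdsstat(output):
    """Return one (tags, value) pair per process for every MDS/CMA row (hypothetical helper)."""
    metrics = []
    for line in output.splitlines():
        if not ROW.match(line):
            continue
        # Splitting on "|" leaves an empty first element, which corresponds to AWK's $1.
        fields = [f.strip() for f in line.split("|")]
        row_type, name, ip = fields[1], fields[2], fields[3]
        for offset, process in enumerate(PROCESSES):
            tags = {"description": process}
            # Only CMA rows carry the vs.* tags; the MDS row does not.
            if row_type != "MDS":
                tags["vs.ip"] = ip
                tags["vs.name"] = name
            state = fields[4 + offset]
            # Any status containing "up" counts as 1, everything else as 0.
            metrics.append((tags, 1.0 if "up" in state else 0.0))
    return metrics


SAMPLE = """\
| CMA |lab-CP-MGMT-MDM-VS1_Management_Server | 192.168.197.34  | up 5814    | up 5739  | up 5546  | up 8450  |
| MDS |                  -                    | 192.168.197.33  | down       | up 5840  | up 5837  | up 8375  |
"""

for tags, value in parse_mdsstat(SAMPLE):
    print(tags, value)
```

Each input row yields four samples, one per process, which matches the four `writeDoubleMetric` calls per row in the parser.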

cross_vendor_critical_process_down_novsx

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.common.data.conditions.Equals
import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library._
import com.indeni.server.rules.library.templates.StateDownTemplateRule

case class cross_vendor_critical_process_down_novsx() extends StateDownTemplateRule(
  ruleName = "cross_vendor_critical_process_down_novsx",
  ruleFriendlyName = "All Devices: Critical process(es) down",
  ruleDescription = "Many devices have critical processes, usually daemons, that must be up for certain functions to work. Indeni will alert if any of these goes down.",
  metricName = "process-state",
  applicableMetricTag = "process-name",
  descriptionMetricTag = "description",
  alertItemsHeader = "Processes Affected",
  descriptionStringFormat = "${scope(\"description\")}",
  alertDescription = "One or more processes that are critical to the operation of this device are down.",
  baseRemediationText = "Review the cause for the processes being down.",
  metaCondition = !Equals("vsx", "true"))(
  ConditionalRemediationSteps.VENDOR_CP -> "Check if \"cpstop\" was run.",
  ConditionalRemediationSteps.OS_NXOS ->
    """|
      |1. Use the "show processes cpu" NX-OS command in order to show the CPU usage at the process level.
      |2. Use the "show process cpu detail <pid> " NX-OS command to find out the CPU usage for all threads that belong to a specific process ID (PID).
      |3. Use the "show system internal sysmgr service pid <pid> " NX-OS command in order to display additional details, such as restart time, crash status, and current state, on the process/service by PID.
      |4. Run the "show system internal processes cpu" NX-OS command which is equivalent to the top command in Linux and provides an ongoing look at processor activity in real time""".stripMargin,
  ConditionalRemediationSteps.VENDOR_FORTINET ->
    """
      |1. Login via ssh to the Fortinet firewall and run the FortiOS command "diagnose sys top [refresh_time_sec] [number_of_lines]"
        |>>> to get the Process-id, State, CPU & Memory utilization per process. Press <shift-P> to sort by CPU usage or <shift-M> to sort by memory usage.
      |2. Login via ssh to the Fortinet firewall and run the FortiOS command "diagnose sys top-summary '-h' " to get the command options and receive additional
        |>>> info per process. A sample command could be "diagnose sys top-summary '-s mem -i 60 -n 10' ". If the value in the FDS (File Descriptors)
        |>>> column keeps increasing, it might indicate a memory leak problem.
      |3. Review the state of each process provided by the above commands. The normal states are S (Sleeping) and R (Running).
        |>>> The abnormal states are Z (Zombie) and D (Do not Disturb).
      |4. Try to restart the problematic process by running the command "diag sys kill 11 <process-Id>". The <process-Id> can be found with the aforementioned commands.
      |5. Check the logs for any reasons why the process stops or can't restart.
      |6. If the problem persists, contact Fortinet Technical support at https://support.fortinet.com/ for further assistance.""".stripMargin.replaceAll("\n>>>", "")

)
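
As a rough illustration of what the rule expresses, the sketch below filters `process-state` samples for a value of 0 and lists their `description` tags under the "Processes Affected" header. It is not the `StateDownTemplateRule` implementation, which is not shown in this post; the function names, the alert dictionary shape, and the way device tags are passed in are all assumptions.

```python
def rule_applies(device_tags):
    # Mirrors metaCondition = !Equals("vsx", "true"): the rule skips VSX devices.
    return device_tags.get("vsx") != "true"


def processes_down(samples):
    """Return the description tag of every sample whose state is 0 (down)."""
    return [tags["description"] for tags, value in samples if value == 0]


def build_alert(device_tags, samples):
    """Return a hypothetical alert dict, or None when there is nothing to report."""
    if not rule_applies(device_tags):
        return None
    down = processes_down(samples)
    if not down:
        return None
    return {
        "headline": "All Devices: Critical process(es) down",
        "Processes Affected": down,
    }


# Samples shaped like the parser output for the MDS row in the script comments.
SAMPLES = [
    ({"description": "FWM"}, 0.0),
    ({"description": "FWD"}, 1.0),
    ({"description": "CPD"}, 1.0),
    ({"description": "CPCA"}, 1.0),
]

print(build_alert({"vendor": "checkpoint"}, SAMPLES))
```

With these samples only FWM is reported, since it is the only process whose state is 0.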

