Critical process(es) down (per VS)-checkpoint-False

error
health-checks
false
checkpoint

Vendor: checkpoint

OS: False

Description:
Many devices have critical processes, usually daemons, that must be up for certain functions to work. indeni will alert if any of these goes down.

Remediation Steps:
Review the cause for the processes being down.
Check if “cpstop” was run.

How does this work?
The status of all important processes is retrieved using the built-in Check Point “cpwd_admin list” command. Descriptions are added, based on information from Check Point KB: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk97638
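
As a rough illustration of the parsing step described above, here is a minimal Python sketch. It is not the Indeni implementation (the actual parser is the AWK script in chkp-process-state-fw); the function name `parse_process_states` and the FWD and CPSM sample rows are hypothetical, made up for the example. It assumes the header line starts with APP and that STAT is a single whitespace-separated token, and it maps "E" to 1.0 and "T" to 0.0, skipping CPSM on a standby management server.

```python
# Illustrative sample only: the CPD row mirrors the example in the script's
# comments; the FWD and CPSM rows are invented for this sketch.
SAMPLE_OUTPUT = """\
Active status: standby
APP        PID    STAT  #START  START_TIME             MON  COMMAND
CPD        18259  E     1       [08:25:14] 18/9/2016   Y    cpd
FWD        0      T     1       [08:25:20] 18/9/2016   N    fwd
CPSM       0      T     1       [08:25:30] 18/9/2016   N    cpstat_monitor
"""


def parse_process_states(output):
    """Return {process name: 1.0 if running ("E"), 0.0 if terminated ("T")}."""
    standby = False
    stat_index = None
    states = {}
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.startswith("Active status:"):
            # The last token is "active", "standby" or "-".
            standby = fields[-1] == "standby"
            continue
        if fields[0] == "APP" and "STAT" in fields:
            # Header line: remember where the STAT column sits
            # (it shifts by one when a CTX column is present).
            stat_index = fields.index("STAT")
            continue
        if stat_index is None or len(fields) <= stat_index:
            continue
        name, stat = fields[0], fields[stat_index]
        if name == "CPSM" and standby:
            # CPSM is expected to be down on a standby management server.
            continue
        if stat in ("E", "T"):
            states[name] = 1.0 if stat == "E" else 0.0
    return states


if __name__ == "__main__":
    print(parse_process_states(SAMPLE_OUTPUT))
```

With the sample above, CPSM is left out because the management status is standby; changing the first line to "Active status: active" would report it as down (0.0) alongside FWD.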

Why is this important?
The device functionality is dependent on software processes. It is vital for the operation of the device that these processes are running at all times.

Without Indeni how would you find this?
An administrator could log in and run the command manually.

chkp-process-state-fw

#! META
name: chkp-process-state-fw
description: Checking the state of important processes.
type: monitoring
monitoring_interval: 1 minute
requires:
    vendor: "checkpoint"
    role-firewall: "true"

#! COMMENTS
process-state:
    why: |
        The device functionality is dependent on software processes. It is vital for the operation of the device that these processes are running at all times. 
    how: |
        The status of all important processes is retrieved using the built-in Check Point "cpwd_admin list" command. Descriptions are added, based on information from Check Point KB: https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk97638
    without-indeni: |
        An administrator could log in and run the command manually.
    can-with-snmp: false
    can-with-syslog: false
    vendor-provided-management: |
        Listing the status of important processes is only available from the command line.

#! REMOTE::SSH
${nice-path} -n 15 cpstat mg | grep "Active status:";${nice-path} -n 15 cpwd_admin list

#! PARSER::AWK

BEGIN {
    # Define descriptions for known processes. Descriptions from SK97638
    processArr["FWD"] = "Logging and spawning child processes (eg vpnd)"
    processArr["CPD"] = "Generic process for many Check Point services such as installing and fetching policy and online updates"
    processArr["CPVIEWD"] = "CPView utility daemon"
    processArr["FWM"] = "Communication between SmartConsole applications and Security Management Server" 
    processArr["STPR"] = "Status collection of ROBO Gateways - SmartLSM / SmartProvisioning status proxy. This process runs only on Security Management Server / Domain Management Servers that are activated for Large Scale Management / SmartProvisioning"
    processArr["SVR"] = "Controller for the SmartReporter product. Traffic is sent via SSL"
    processArr["CPSEAD"] = "Responsible for Correlation Unit functionality"
    processArr["CPWMD"] = "Check Point Web Management Daemon - back-end for Management Portal / SmartPortal"
    processArr["CPHTTPD"] = "HTTP Server for Management Portal (SmartPortal) and for OS WebUI"
    processArr["DASERVICE"] = "Check Point Upgrade Service Engine (CPUSE) - former 'Gaia Software Updates' service"
    processArr["CPSM"] = "Process is responsible for collecting and sending information to SmartView Monitor"    
    processArr["HISTORYD"] = "CPView Utility History daemon"
    processArr["MPDAEMON"] = "Platform Portal / Multi Portal. mpdaemon process is responsible for starting these web servers"
    processArr["CI_CLEANUP"] = "Shell script (from $FWDIR/bin/) that periodically deletes various old temporary Anti-Virus files"
    processArr["CIHS"] = "HTTP Server for Content Inspection"
    processArr["cposd"] = "SMB-specific daemon responsible for OS Networking operations"
    processArr["RTDB"] = "Real Time database daemon"
    processArr["SFWD"] = "Logging, Policy installation, VPN negotiation, Identity Awareness enforcement, UserCheck enforcement, etc"
    processArr["CPHAMCSET"] = "Clustering daemon. Responsible for opening sockets on the NICs in order to allow them to pass multicast traffic (CCP) to the machine"

}

#Active status: active
#Active status: standby
#Active status: -
/^Active status:/{
    managementStandby = ($NF == "standby")
    next
}

#APP        PID    STAT  #START  START_TIME             MON  COMMAND
#APP        CTX        PID    STAT  #START  START_TIME             MON  COMMAND
/^APP\s+(PID|CTX)/ {
    # Parse the line into a column array.
    getColumns(trim($0), "[ ]{1,}", columns)
    next
}

#CPD        18259  E     1       [08:25:14] 18/9/2016   Y    cpd
/[0-9]/ {

    gsub(/\'/, "", $0)
    
    # Use getColData to parse out the data for the specific column from the current line. The current line will be
    # split according to the same separator we've passed in the getColumns function (it's stored in the "columns" variable).
    # If the column cannot be found, the result of getColData is null (not "null").
    
    processName = getColData(trim($0), columns, "APP")

    # The CPSM process is always down on a standby management machine, so skip it there
    # instead of reporting it as a down process.

    if (processName == "CPSM" && managementStandby){
        next
    }

    stat = getColData(trim($0), columns, "STAT")
    
    # E stands for "existing" which means the process is running
    # T stands for terminated, which means that the process is not running
    # We don't want to write metrics for states other than E or T

    if (stat ~ /^(E|T)$/) {

        tags["process-name"] = processName

        state = (stat == "E")

        if (processName in processArr) {
            tags["description"] = processArr[processName]
        } else {
            tags["description"] = "N/A"
        }
        
        writeDoubleMetric("process-state", tags, "gauge", 60, state)

    }
}

cross_vendor_critical_process_down_vsx

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.{ConditionalRemediationSteps, StateDownTemplateRule}
import com.indeni.apidata.time.TimeSpan

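/**
  * Template-based rule that alerts when the "process-state" metric reports
  * a critical process as down, listing the affected processes by "process-name".
  */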
case class cross_vendor_critical_process_down_vsx() extends StateDownTemplateRule(
  ruleName = "cross_vendor_critical_process_down_vsx",
  ruleFriendlyName = "All Devices: Critical process(es) down (per VS)",
  ruleDescription = "Many devices have critical processes, usually daemons, that must be up for certain functions to work. indeni will alert if any of these goes down.",
  metricName = "process-state",
  applicableMetricTag = "process-name",
  descriptionMetricTag = "vs.name",
  alertItemsHeader = "Processes Affected",
  alertDescription = "One or more processes that are critical to the operation of this device are down.",
  baseRemediationText = "Review the cause for the processes being down.")(
  ConditionalRemediationSteps.VENDOR_CP -> "Check if \"cpstop\" was run.",
  ConditionalRemediationSteps.OS_NXOS ->
    """|
      |1. Use the "show processes cpu" NX-OS command in order to show the CPU usage at the process level.
      |2. Use the "show process cpu detail <pid>" NX-OS command to find out the CPU usage for all threads that belong to a specific process ID (PID).
      |3. Use the "show system internal sysmgr service pid <pid>" NX-OS command in order to display additional details, such as restart time, crash status, and current state, on the process/service by PID.
      |4. Run the "show system internal processes cpu" NX-OS command which is equivalent to the top command in Linux and provides an ongoing look at processor activity in real time.""".stripMargin
)