Critical process(es) down (per VS)-f5-False

error
health-checks
false
f5
0

Vendor: f5

OS: False

Description:
Many devices have critical processes, usually daemons, that must be up for certain functions to work. indeni will alert if any of these goes down.

Remediation Steps:
Review the cause for the processes being down.

How does this work?
This script logs into the device through SSH and uses TMSH to extract the process state of each vCMP guest.
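As a rough illustration, the HA-status part of that extraction can be sketched in Python. This is a minimal sketch, not the shipped parser (which is written in AWK). It assumes each row holds the guest name, feature, process key, action and a final yes/no column, following the sample row quoted in the parser's comments. `PROCESS_INFO` and `parse_ha_status` are hypothetical names, and the description table is trimmed down.

```python
import re

# Hypothetical, trimmed-down copy of the parser's processInfo table.
PROCESS_INFO = {
    "tmm": "Known as the Traffic Management Microkernel. Manages switch traffic",
    "cn-crypto": "Controls SSL and compression hardware acceleration",
    "traffic-group": "Contains floating objects used by the active member of a cluster",
}


def parse_ha_status(output):
    """Turn 'Vcmp::Guest HA Status' rows into process-state samples (1 = up, 0 = down)."""
    samples = []
    section = ""
    for line in output.splitlines():
        if line.startswith("Vcmp::"):
            # A header starts a new section; only the HA status section is parsed here.
            section = "haStatus" if line.startswith("Vcmp::Guest HA Status") else ""
            continue
        fields = line.split()
        if section != "haStatus" or len(fields) < 3 or fields[-1] not in ("yes", "no"):
            continue
        guest, feature, process = fields[0], fields[1], fields[2]
        key = process
        if re.search(r"^tmm|crypto|traffic-group", key):
            # Strip the instance suffix: tmm1 -> tmm, traffic-group-1 -> traffic-group.
            key = re.sub(r"[0-9-]+$", "", key)
        samples.append({
            "vs.name": guest,
            "process-name": process,
            # Unknown processes fall back to the feature column, as in the AWK parser.
            "description": PROCESS_INFO.get(key, feature),
            # Same mapping as the parser: a final "no" -> 1, anything else -> 0.
            "value": 1 if fields[-1] == "no" else 0,
        })
    return samples


# Hypothetical output, modeled on the sample row in the parser's comments.
SAMPLE = """Vcmp::Guest HA Status
mylb1.domain.local    compression-failsafe  tmm0             none    no
mylb1.domain.local    unknown-feature       foo1             none    yes
Vcmp::Guest Module Provision
mylb1.domain.local       afm     none
"""

for sample in parse_ha_status(SAMPLE):
    print(sample["vs.name"], sample["process-name"], sample["value"])
```

Here `tmm0` is reported as 1 and the unknown `foo1` row as 0, with its feature column used as the description. The module row is ignored because it sits outside the HA status section.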

Why is this important?
Each device runs certain processes that are critical to its stable operation. On F5 units, these processes are responsible for the management layer. One example is the watchdog service, which ensures that the system reboots in the event of a lockup. A process being down may indicate a critical failure.

Without Indeni how would you find this?
An administrator could track this manually by logging into the device via SSH, entering TMSH and executing the command “show vcmp health”.

f5-tmsh-show-vcmp-health

#! META
name: f5-tmsh-show-vcmp-health
description: Extract the process-state and provisioned modules from guests
type: monitoring
monitoring_interval: 10 minutes
requires:
    vendor: "f5"
    product: "load-balancer"
    linux-based: "true"
    vsx: "true"
    shell: "bash"

#! COMMENTS
process-state:
    why: |
        Each device runs certain processes that are critical to its stable operation. On F5 units, these processes are responsible for the management layer. One example is the watchdog service, which ensures that the system reboots in the event of a lockup. A process being down may indicate a critical failure.
    how: |
        This script logs into the device through SSH and uses TMSH to extract the process state of each vCMP guest.
    without-indeni: |
        An administrator could track this manually by logging into the device via SSH, entering TMSH and executing the command "show vcmp health".
    can-with-snmp: false
    can-with-syslog: false
features-enabled:
    why: |
        This information is useful for troubleshooting, for example to correlate high resource usage with the enabled modules.
    how: |
        This script logs into the device through SSH and extracts the enabled modules via TMSH.
    without-indeni: |
        An administrator could track this manually by logging into the device via SSH, entering TMSH and executing the command "show vcmp health".
    can-with-snmp: false
    can-with-syslog: false

#! REMOTE::SSH
tmsh -q -c "show vcmp health"

#! PARSER::AWK

BEGIN {

    #Reset the features index
    iFeatures = 0

    #Declare the module dictionary for a more friendly feature name
    moduleDictionary["mgmt"] = "Management"
    moduleDictionary["cgnat"] = "Carrier Grade NAT"
    moduleDictionary["ltm"] = "Local Traffic Manager"
    moduleDictionary["asm"] = "Application Security Manager"
    moduleDictionary["lc"] = "Link Controller"
    moduleDictionary["apm"] = "Access Policy Manager"
    moduleDictionary["avr"] = "Application Visibility and Reporting"
    moduleDictionary["afm"] = "Advanced Firewall Module"
    moduleDictionary["aam"] = "Application Acceleration Manager"
    moduleDictionary["swg"] = "Secure Web Gateway"

    #Predefined "human friendly" descriptions for process-state
    processInfo["bcm56xxd"] = "Controls the BIG-IP switch hardware"
    processInfo["bigd"] = "Controls health monitoring"
    processInfo["bigdb"] = "Provides initial bigdb database values to the MCPD service and persists any database changes to the BigDB.dat file"
    processInfo["chmand"] = "The chassis manager daemon implements the following HAL capabilities: platform identification synchronization with SCCP/AOM and device discovery and chassis sensor monitoring and chassis configuration (management & serial interfaces)"
    processInfo["mcpd"] = "Known as the Master Control Program. Controls messaging and configuration"
    processInfo["mysqlhad"] = "MySQL service used for AVR"
    processInfo["snmpd"] = "Provides Simple Network Management Protocol (SNMP) functions. Also includes the two subagents rmondsnmpd and tmsnmpd"
    processInfo["sod"] = "Controls failover for redundant systems"
    processInfo["tmm"] = "Known as the Traffic Management Microkernel. Manages switch traffic"
    processInfo["clusterd"] = "The clusterd process manages blade clustering for VIPRION systems"
    processInfo["cn-crypto"] = "Controls SSL and compression hardware acceleration"
    processInfo["qa-crypto"] = "Controls SSL and compression hardware acceleration"
    processInfo["cbrd"] = "The XML content based routing daemon provides document parsing functionality for the XML profile"
    processInfo["lind"] = "The lind process manages software installation/volume creation tasks"
    processInfo["named"] = "The named process is the DNS server daemon"
    processInfo["scriptd"] = "The scriptd process runs application template implementation scripts when an application service is created or updated"
    processInfo["tmrouted"] = "The routing table management daemon updates the TMM routing table based on the kernel routing table"
    processInfo["wccpd"] = "Web Cache Communication Protocol (WCCP) process in BIG-IP AAM. The wccpd can be stopped if the WCCP feature is not in use. If wccpd is disabled the WCCP feature will not function"
    processInfo["vxland"] = "The vxland daemon manages multicast sockets and routing for IGMP protocol activity"
    processInfo["watchdog"] = "The watchdog process ensures that the BIG-IP system will reboot in the event of a system lock-up prompting a failover"
    processInfo["vcmpd"] = "The vcmpd process performs most of the work to create and manage guests as well as configure the virtual network"
    processInfo["traffic-group"] = "Contains floating objects used by the active member of a cluster"

}

#Reset the section so we don't accidentally pick up rows from the other ones
#Vcmp::
/^Vcmp::$/{
    section = ""
}

#Vcmp::Guest HA Status
/^Vcmp::Guest\sHA\sStatus/{
    section = "haStatus"
    next
}

#Vcmp::Guest Module Provision
/^Vcmp::Guest\sModule\sProvision/{
    section = "moduleProvision"
    next
}

#mylb1.domain.local    compression-failsafe  tmm0             none                          no
/(no|yes)$/{

    if(section == "haStatus"){

        description = $3

        #mylb1.domain.local    compression-failsafe  tmm0             none                          no
        if(match(description, /(^tmm|crypto|traffic-group)/)){
            #Remove the numeric instance suffix so the name matches a processInfo key. Examples: tmm1 -> tmm, traffic-group-1 -> traffic-group
            sub(/[0-9\-]+$/, "", description)
        }

        #Make sure that the process exists in the processInfo array
        if(description in processInfo){
            description = processInfo[description]
        } else {
            #For unknown processes, use the features column
            description = $2
        }

        #A final column of "no" is reported as 1 (up); any other value as 0 (down)
        if($NF == "no"){
            state = 1
        } else {
            state = 0
        }

        processTags["vs.name"] = $1
        processTags["process-name"] = $3
        processTags["description"] = description

        writeDoubleMetric("process-state", processTags, "gauge", 600, state)
        
        next

    }

}

#mylb1.domain.local       afm     none
/(none|nominal|dedicated|minimum|custom|small|medium|large|disabled|enabled)$/{

    #Compare (==), not assign, so only rows from the Module Provision section are used
    if(section == "moduleProvision"){

        if(!match($3, /(none|disabled)/)){

            #Look up the module in the moduleDictionary. If it does not exist, resort to the original value
            if($2 in moduleDictionary){
                name = "F5 " moduleDictionary[$2]
            } else {
                name = "F5 " $2
            }

            iFeatures++
            featuresArray[iFeatures, "vs.name"] = $1
            featuresArray[iFeatures, "name"] = name
        }

        next

    }

}

END {
    writeComplexMetricObjectArray("features-enabled", null, featuresArray)
}
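The module-provision branch of the parser amounts to a section check, a filter on the provisioning level and a dictionary lookup. The Python sketch below models that logic. It is not the shipped parser, and the sample rows are hypothetical, modeled on the `afm none` comment line. `MODULE_NAMES`, `LEVELS` and `parse_module_provision` are illustrative names, and the dictionary is trimmed down.

```python
# Hypothetical, trimmed-down copy of the parser's moduleDictionary.
MODULE_NAMES = {
    "ltm": "Local Traffic Manager",
    "asm": "Application Security Manager",
    "afm": "Advanced Firewall Module",
}

# Provisioning levels the parser recognises at the end of a row.
LEVELS = {"none", "nominal", "dedicated", "minimum", "custom",
          "small", "medium", "large", "disabled", "enabled"}


def parse_module_provision(output):
    """Return one features-enabled entry per provisioned module on each guest."""
    features = []
    in_section = False
    for line in output.splitlines():
        if line.startswith("Vcmp::"):
            # Only rows below the Module Provision header are considered.
            in_section = line.startswith("Vcmp::Guest Module Provision")
            continue
        fields = line.split()
        if not in_section or len(fields) < 3 or fields[-1] not in LEVELS:
            continue
        guest, module, level = fields[0], fields[1], fields[2]
        if level in ("none", "disabled"):
            continue  # not provisioned, so not reported
        # Fall back to the raw module name when it has no friendly name.
        features.append({"vs.name": guest, "name": "F5 " + MODULE_NAMES.get(module, module)})
    return features


SAMPLE = """Vcmp::Guest HA Status
mylb1.domain.local    compression-failsafe  tmm0   none   no
Vcmp::Guest Module Provision
mylb1.domain.local       ltm     nominal
mylb1.domain.local       afm     none
mylb1.domain.local       gtm     dedicated
"""

print(parse_module_provision(SAMPLE))
```

With a strict section check, a row elsewhere in the output that happens to end in a level keyword is never reported as an enabled feature. Unknown modules such as `gtm` keep their raw name.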

cross_vendor_critical_process_down_vsx

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.{ConditionalRemediationSteps, StateDownTemplateRule}
import com.indeni.apidata.time.TimeSpan

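/**
  * Alerts when one or more critical processes, reported through the "process-state" metric, are down.
  * Each alert item is named by the "process-name" tag and described by the "vs.name" tag.
  */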
case class cross_vendor_critical_process_down_vsx(context: RuleContext) extends StateDownTemplateRule(context,
  ruleName = "cross_vendor_critical_process_down_vsx",
  ruleFriendlyName = "All Devices: Critical process(es) down (per VS)",
  ruleDescription = "Many devices have critical processes, usually daemons, that must be up for certain functions to work. indeni will alert if any of these goes down.",
  metricName = "process-state",
  applicableMetricTag = "process-name",
  descriptionMetricTag = "vs.name",
  alertItemsHeader = "Processes Affected",
  alertDescription = "One or more processes that are critical to the operation of this device are down.",
  baseRemediationText = "Review the cause for the processes being down.")(
  ConditionalRemediationSteps.VENDOR_CP -> "Check if \"cpstop\" was run.",
  ConditionalRemediationSteps.OS_NXOS ->
    """|
      |1. Use the "show processes cpu" NX-OS command in order to show the CPU usage at the process level.
      |2. Use the "show process cpu detail <pid>" NX-OS command to find out the CPU usage for all threads that belong to a specific process ID (PID).
      |3. Use the "show system internal sysmgr service pid <pid>" NX-OS command in order to display additional details, such as restart time, crash status, and current state, on the process/service by PID.
      |4. Run the "show system internal processes cpu" NX-OS command which is equivalent to the top command in Linux and provides an ongoing look at processor activity in real time.""".stripMargin
)
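Conceptually, the rule watches the latest `process-state` value for each guest and process and lists every process whose value is 0. The sketch below is a hypothetical Python model of that evaluation, not the `StateDownTemplateRule` implementation. It assumes 0 means down, which matches the parser: it writes 0 unless the last column is "no". It builds alert items from the `process-name` and `vs.name` tags as configured in the rule. It ignores details such as how long a state must persist before alerting. `AlertItem` and `evaluate` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertItem:
    name: str         # value of the "process-name" tag (applicableMetricTag)
    description: str  # value of the "vs.name" tag (descriptionMetricTag)


def evaluate(samples):
    """Return one alert item per (guest, process) whose latest process-state is 0."""
    latest = {}
    for s in samples:  # assumed to be ordered oldest to newest
        latest[(s["vs.name"], s["process-name"])] = s["value"]
    return [
        AlertItem(name=process, description=vs)
        for (vs, process), value in sorted(latest.items())
        if value == 0
    ]


samples = [
    {"vs.name": "guest1", "process-name": "tmm0", "value": 1},
    {"vs.name": "guest2", "process-name": "bigd", "value": 0},
    {"vs.name": "guest1", "process-name": "tmm0", "value": 0},
    {"vs.name": "guest2", "process-name": "bigd", "value": 1},
]
print(evaluate(samples))
```

Keying on the (guest, process) pair keeps a down `tmm0` on one guest from being masked by a healthy `tmm0` on another. Only the most recent sample for each pair decides whether it is listed.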