Blade(s) down-checkpoint-False

error
health-checks
false
checkpoint

Vendor: checkpoint

OS: False

Description:
Indeni will alert when one or more blades in a chassis are down.

Remediation Steps:
Review the cause for the blades being down.
If the blade was not stopped intentionally (admin down), check that it was not physically disconnected.

How does this work?
Indeni uses the built-in Check Point “asg stat -v” command to retrieve the current blade state.
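
The check described above can be sketched in a few lines of Python. This is only an illustration and not Indeni's implementation: the function name `parse_blade_states` is hypothetical, and the sample text is copied from the example lines quoted in the parser comments further down, so real `asg stat -v` output may be laid out differently. As in the parser, a blade counts as healthy only when its state is `UP` and its process column reads `Enforcing Security`.

```python
import re

# Sample lines taken from the comments in the AWK parser in this document.
# Real "asg stat -v" output may contain more columns or different spacing.
SAMPLE = """\
| Chassis 1                     ACTIVE                                         |
| 1  (local)     UP             Enforcing Security        10Feb17 19:37        |
| 3              DOWN           Inactive                  NA                   |
"""


def parse_blade_states(output):
    """Return a list of (chassis, blade, state, process, ok) tuples.

    Hypothetical helper for illustration only.
    """
    results = []
    chassis = None
    for line in output.splitlines():
        # "| Chassis 1 ..." introduces the blades of one chassis.
        if re.match(r"^\| Chassis", line):
            chassis = line.split()[2]
            continue
        # Blade rows start with "| <digit>" and contain UP or DOWN.
        if re.match(r"^\| [0-9].*(UP|DOWN)", line):
            # Drop "(local)" so the column positions are stable.
            line = line.replace("(local)", "")
            fields = line.split()
            blade, state = fields[1], fields[2]
            # The process column can contain a single space, so split
            # the row on runs of two or more spaces instead.
            columns = re.split(r" {2,}", line)
            process = columns[2]
            ok = state == "UP" and process == "Enforcing Security"
            results.append((chassis, blade, state, process, ok))
    return results


if __name__ == "__main__":
    for row in parse_blade_states(SAMPLE):
        print(row)
```

Splitting on two or more spaces is what keeps a two-word status such as `Enforcing Security` in one column, while whitespace splitting is enough for the blade number and the UP/DOWN state.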

Why is this important?
A down blade in the security group can mean a loss of redundancy and reduced performance.

Without Indeni how would you find this?
An administrator could log in to the device and run the command manually.

chkp-asg-stat-v

#! META
name: chkp-asg-stat-v
description: Retrieve status data
type: monitoring
monitoring_interval: 5 minute
requires:
    vendor: checkpoint
    asg: true

#! COMMENTS
blade-state:
    why: |
        A down blade in the security group can mean a loss of redundancy and reduced performance.
    how: |
        Indeni uses the built-in Check Point "asg stat -v" command to retrieve the current blade state.
    without-indeni: |
        An administrator could log in to the device and run the command manually.
    can-with-snmp: false
    can-with-syslog: false
    vendor-provided-management: |
        Listing the blade state is only available from the command line interface.

blade-state-live-config:
    skip-documentation: true

cluster-preemption-enabled:
    skip-documentation: true

cluster-member-active:
    skip-documentation: true

cluster-member-states:
    skip-documentation: true

cluster-state:
    skip-documentation: true

cluster-state-live-config:
    skip-documentation: true

#! REMOTE::SSH
${nice-path} -n 15 asg stat -v

#! PARSER::AWK

BEGIN {
    cluster_state = 1
    tags["name"] = "ASG"
    cluster_state_description = "UP / Enforcing Security"
}

#| Chassis 1                     ACTIVE                                         |
/^\| Chassis/ {
    chassis = $3
}


#| 1  (local)     UP             Enforcing Security        10Feb17 19:37        |
#| 3              DOWN           Inactive                  NA                   |
/^\| [0-9].*(UP|DOWN)/ {

    # Remove (local) to not mess up counting columns
    gsub(/\(local\)/, "", $0)

    state = $3
    blade = $2

    # To catch a status with a space in between we need to split on two spaces or more.
    split($0, split_arr, /[ ]{2,}/)
    process = split_arr[3]

    # The blade needs to be up, and active, aka "Enforcing Security" to be considered ok.
    if (state == "UP" && process == "Enforcing Security") {
        blade_state = 1
    } else {
        blade_state = 0
        cluster_state = 0
        cluster_state_description = state " / " process
    }
    blade_state_description = state " / " process

    # Use a separate tag array for per-blade metrics so that the cluster-level
    # tags["name"] = "ASG" set in BEGIN is not overwritten before END runs.
    blade_tags["name"] = "chassis: " chassis " blade: " blade

    writeDoubleMetric("blade-state", blade_tags, "gauge", "300", blade_state)
    writeComplexMetricStringWithLiveConfig("blade-state-live-config", blade_tags, blade_state_description, "Blade Status")

    # asg is always active if blade is ok
    writeDoubleMetric("cluster-member-active", blade_tags, "gauge", "300", blade_state)

    cluster_member_states_index++
    cluster_member_states[cluster_member_states_index, "state-description"] = process
    cluster_member_states[cluster_member_states_index, "name"] = "chassis: " chassis " blade: " blade
}



#| Chassis HA mode:               Active Up                                     |
#| Chassis Mode                | Active Up                                      |
/Chassis HA mode|Chassis Mode/ {
    ha_mode = $(NF-2) " " $(NF-1)
    if (ha_mode == "Active Up") {
        cluster_preempt = 0
    } else {
        cluster_preempt = 1
    }
    writeDoubleMetric("cluster-preemption-enabled", null, "gauge", 300, cluster_preempt)
}

END {
    writeComplexMetricStringWithLiveConfig("cluster-state-live-config", tags, cluster_state_description, "Cluster State")
    writeDoubleMetric("cluster-state", tags, "gauge", "300", cluster_state)
    writeComplexMetricObjectArray("cluster-member-states", null, cluster_member_states)
}

chassis_blade_down

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.ConditionalRemediationSteps
import com.indeni.server.rules.library.templates.StateDownTemplateRule

/**
  * Alerts when the "blade-state" metric reports one or more chassis blades as down.
  */
case class chassis_blade_down() extends StateDownTemplateRule(
  ruleName = "chassis_blade_down",
  ruleFriendlyName = "Chassis Devices: Blade(s) down",
  ruleDescription = "Indeni will alert when one or more blades in a chassis are down.",
  metricName = "blade-state",
  applicableMetricTag = "name",
  alertItemsHeader = "Blades Affected",
  alertDescription = "One or more blades in this chassis are down.",
  baseRemediationText = "Review the cause for the blades being down.")(
  ConditionalRemediationSteps.VENDOR_CP -> "If the blade was not stopped intentionally (admin down), check that it was not physically disconnected.",
  ConditionalRemediationSteps.OS_NXOS ->
    """|
      |Most module-related failures (such as the module not coming up, the module getting reloaded, and so on) can be analyzed by looking at the logs stored on the switch. Use the following CLI commands to identify the problem:
      |•show system reset-reason module
      |•show version
      |•show logging
      |•show module internal exception-log
      |•show module internal event-history module
      |•show module internal event-history errors
      |•show platform internal event-history errors
      |•show platform internal event-history module
      |Further details can be found in the following Cisco troubleshooting guide:
      |https://www.cisco.com/en/US/products/ps5989/prod_troubleshooting_guide_chapter09186a008067a0ef.html""".stripMargin
)