Blade(s) down-paloaltonetworks-panos

Vendor: paloaltonetworks

OS: panos

Description:
Indeni will alert if one or more blades in a chassis are down.

Remediation Steps:
Review the cause for the blades being down.

How does this work?
This script logs into the Palo Alto Networks device using SSH and retrieves the output of the “show log system subtype equal hw direction equal backward csv-output equal yes opaque contains Slot receive_time in last-hour” command. The output covers the past hour and, because of the filters, contains only the log entries related to this issue. The “csv-output” option is used because CSV is easier to parse than the regular output, which is delimited by spaces and tabs. A log entry reading “Slot # is up” is the desired state and is assigned a value of 1; anything else is assigned a value of 0.
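The parsing logic described above can be sketched in Python. This is only an illustration, not the AWK parser that ships with the script. The sample rows and their column layout are hypothetical, and the choice to keep only the first entry seen per slot is an assumption based on the newest-first ordering requested by "direction equal backward".

```python
import csv
import io
import re

SLOT_RE = re.compile(r"Slot\s+(\d+)")
SLOT_UP_RE = re.compile(r"Slot\s+\d+\s+is up")


def parse_blade_states(csv_text):
    """Map each slot name to 1 ("Slot # is up") or 0 (anything else)."""
    states = {}
    for row in csv.reader(io.StringIO(csv_text)):
        for field in row:
            match = SLOT_RE.search(field)
            if match is None:
                continue
            name = "Slot " + match.group(1)
            # Assumption: logs arrive newest-first, so the first entry
            # seen for a slot reflects its most recent state.
            if name not in states:
                states[name] = 1 if SLOT_UP_RE.search(field) else 0
            break
    return states


# Hypothetical sample rows; the real column layout and messages may differ.
SAMPLE = (
    "2024/01/01 10:05:00,hw,critical,Slot 2 is down\n"
    "2024/01/01 10:03:00,hw,informational,Slot 1 is up\n"
    "2024/01/01 10:01:00,hw,informational,Slot 2 is up\n"
)

print(parse_blade_states(SAMPLE))  # {'Slot 2': 0, 'Slot 1': 1}
```

Scanning every field for the "Slot" pattern, rather than relying on a fixed column index, keeps the sketch independent of the exact CSV layout.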

Why is this important?
Dataplane restarts can cause network outages, so it is important to detect this type of failure immediately and address it as soon as possible. The failure can be caused by hardware or software, and customers who receive this alert need to engage vendor support to investigate the root cause of the restart.

Without Indeni how would you find this?
An administrator could physically check the LED lights for alarm status, or review the system logs from the GUI or CLI.

panos-show-log-system-subtype-equal-hw

name: panos-show-log-system-subtype-equal-hw
description: Query system logs for any slot failure
type: monitoring
monitoring_interval: 5 minutes
requires:
    vendor: paloaltonetworks
    os.name: panos
    product: firewall
comments:
    blade-state:
        why: |
            Dataplane restarts can cause network outages, so it is important to detect this type of failure immediately and address it as soon as possible. The failure can be caused by hardware or software, and customers who receive this alert need to engage vendor support to investigate the root cause of the restart.
        how: |
            This script logs into the Palo Alto Networks device using SSH and retrieves the output of the "show log system subtype equal hw direction equal backward csv-output equal yes opaque contains Slot receive_time in last-hour" command. The output covers the past hour and, because of the filters, contains only the log entries related to this issue. The "csv-output" option is used because CSV is easier to parse than the regular output, which is delimited by spaces and tabs. A log entry reading "Slot # is up" is the desired state and is assigned a value of 1; anything else is assigned a value of 0.
        can-with-snmp: false
        can-with-syslog: true
steps:
-   run:
        type: SSH
        command: show log system subtype equal hw direction equal backward csv-output
            equal yes opaque contains Slot receive_time in last-hour
    parse:
        type: AWK
        file: show-log-system-subtype-eq-hw.parser.1.awk

chassis_blade_down

// Deprecation warning : Scala template-based rules are deprecated. Please use YAML format rules instead.

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.templates.StateDownTemplateRule
import com.indeni.server.rules.RemediationStepCondition

/**
  * Alerts when the "blade-state" metric reports that one or more blades in a chassis are down.
  */
case class chassis_blade_down() extends StateDownTemplateRule(
  ruleName = "chassis_blade_down",
  ruleFriendlyName = "Chassis Devices: Blade(s) down",
  ruleDescription = "Indeni will alert if one or more blades in a chassis are down.",
  metricName = "blade-state",
  applicableMetricTag = "name",
  alertItemsHeader = "Blades Affected",
  alertDescription = "One or more blades in this chassis are down.",
  baseRemediationText = "Review the cause for the blades being down.")(
  RemediationStepCondition.VENDOR_CP -> "If the blade was not stopped intentionally (admin down), check to see it wasn't disconnected physically.",
  RemediationStepCondition.VENDOR_CISCO ->
    """|
      |Most of the module related failures (such as the module not coming up, the module getting reloaded, and so on) can be analyzed by looking at the logs stored on the switch. Use the following CLI commands to identify the problem:
      |•show system reset-reason module
      |•show version
      |•show logging
      |•show module internal exception-log
      |•show module internal event-history module
      |•show module internal event-history errors
      |•show platform internal event-history errors
      |•show platform internal event-history module
      |Further details can be found in the following Cisco troubleshooting guide:
      |https://www.cisco.com/en/US/products/ps5989/prod_troubleshooting_guide_chapter09186a008067a0ef.html""".stripMargin
)
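As a rough illustration of what a state-down rule of this kind does with the "blade-state" metric, the Python sketch below lists every blade whose value is not 1 and builds an alert from the texts in the rule definition. It is not the implementation of StateDownTemplateRule. The function names and the dictionary keys are hypothetical and chosen only for this example.

```python
def blades_down(blade_states):
    """Return the sorted names of blades whose blade-state value is not 1."""
    return sorted(name for name, value in blade_states.items() if value != 1)


def build_alert(blade_states):
    """Return an alert dict when any blade is down, otherwise None."""
    affected = blades_down(blade_states)
    if not affected:
        return None
    return {
        # Texts copied from the rule definition; the keys are hypothetical.
        "headline": "Chassis Devices: Blade(s) down",
        "description": "One or more blades in this chassis are down.",
        "items_header": "Blades Affected",
        "items": affected,
    }


alert = build_alert({"Slot 1": 1, "Slot 2": 0})
print(alert["items"])  # ['Slot 2']
```

When every blade reports 1, build_alert returns None and no alert is raised.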