Blade(s) down-f5-all

Vendor: f5

OS: all

Description:
Indeni will alert when one or more blades in a chassis are down.

Remediation Steps:
Review the cause for the blades being down.

How does this work?
This script uses the F5 iControl API to retrieve the state of the blades.
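The script reads the iControl REST endpoint `/mgmt/tm/sys/hardware`, which is listed in the script definition's `steps` section. The sketch below builds that request in Scala with the JDK's `java.net.http` client. The helper name `hardwareRequest` is hypothetical, and the sketch assumes HTTP basic authentication against the management address. It illustrates the call and is not Indeni's collector code.

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper: builds (but does not send) a GET request for the
// iControl REST hardware endpoint that this script polls.
// Assumption: basic authentication is used on the management interface.
def hardwareRequest(host: String, user: String, password: String): HttpRequest = {
  // Basic auth header value: base64("user:password").
  val credentials = s"$user:$password".getBytes(StandardCharsets.UTF_8)
  val token = Base64.getEncoder.encodeToString(credentials)
  HttpRequest
    .newBuilder(URI.create(s"https://$host/mgmt/tm/sys/hardware"))
    .header("Authorization", s"Basic $token")
    .header("Accept", "application/json")
    .GET()
    .build()
}
```

To send it, pass the request to `HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString())`. Devices with self-signed management certificates would also need a custom `SSLContext`, which is not shown here.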

Why is this important?
A blade that is not powered up could indicate a hardware issue. This could result in reduced performance or, in the worst case, system downtime.

Without Indeni how would you find this?
An administrator can check the status of the blades by entering TMSH and running “show sys hardware”.

f5-rest-mgmt-tm-sys-hardware

name: f5-rest-mgmt-tm-sys-hardware
description: Get hardware status metrics
type: monitoring
monitoring_interval: 5 minutes
requires:
    vendor: f5
    product: load-balancer
    rest-api: 'true'
comments:
    hardware-element-status:
        why: |
            A critical aspect to track on a given device is the health of the hardware components. A power supply which stopped working or a dead fan can spell trouble down the line.
        how: |
            This script uses the F5 iControl REST API to retrieve the health of the power components in a chassis.
        can-with-snmp: true
        can-with-syslog: false
    hardware-eos-date:
        why: |
            Ensuring the hardware being used is always within the vendor's list of supported models is critical. Otherwise, during a critical issue, the vendor may decline to provide technical support. Indeni tracks the official list from F5 and updates this script to match.
        how: |
            This script uses the F5 iControl API to retrieve the current hardware model (the equivalent of running "show sys hardware" in TMSH), then uses the F5 documentation at https://support.f5.com/csp/article/K4309 to determine the correct end-of-support date for that model.
        can-with-snmp: false
        can-with-syslog: false
    serial-numbers:
        why: |
            Capture the device's serial number. This makes inventory tracking and opening support cases with the vendor easier.
        how: |
            This script uses the F5 iControl API to retrieve the serial number.
        can-with-snmp: false
        can-with-syslog: false
    blade-state:
        why: |
            A blade that is not powered up could indicate a hardware issue. This could result in reduced performance or, in the worst case, system downtime.
        how: |
            This script uses the F5 iControl API to retrieve the state of the blades.
        can-with-snmp: true
        can-with-syslog: false
    model:
        why: |
            Two or more devices which operate as part of a single cluster must be running on the same hardware.
        how: |
            This script uses the F5 REST API to retrieve the hardware model of the device. Indeni then compares the result with the results of the same script on the other members of the cluster.
        can-with-snmp: false
        can-with-syslog: false
steps:
-   run:
        type: HTTP
        command: /mgmt/tm/sys/hardware
    parse:
        type: JSON
        file: rest-mgmt-tm-sys-hardware.parser.1.json.yaml
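The `comments` section above describes two checks built on the collected hardware model. The first compares the model across cluster members. The second maps the model to an end-of-support date taken from F5 article K4309. The sketch below shows only the comparison and the date arithmetic. `DeviceHardware`, the function names, and the `model-A`/`model-B` values are hypothetical, and the model-to-date table from K4309 is not reproduced.

```scala
import java.time.LocalDate
import java.time.temporal.ChronoUnit

// Hypothetical per-device record holding the model reported by the script.
final case class DeviceHardware(hostname: String, model: String)

// The set of distinct hardware models reported across one cluster.
def distinctModels(cluster: Seq[DeviceHardware]): Set[String] =
  cluster.map(_.model).toSet

// True when all cluster members report the same model. An empty or
// single-member cluster is trivially consistent.
def modelsMatch(cluster: Seq[DeviceHardware]): Boolean =
  distinctModels(cluster).size <= 1

// Days from `today` until the end-of-support date already looked up for the
// model; negative once that date has passed. The lookup itself is not shown.
def daysUntilEndOfSupport(endOfSupport: LocalDate, today: LocalDate): Long =
  ChronoUnit.DAYS.between(today, endOfSupport)
```

In this sketch, a mismatch produces a `distinctModels` result with more than one entry, which an alert could list directly.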

chassis_blade_down

// Deprecation warning: Scala template-based rules are deprecated. Please use YAML format rules instead.

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.templates.StateDownTemplateRule
import com.indeni.server.rules.RemediationStepCondition

/**
  * Alerts when the "blade-state" metric reports one or more chassis blades as down.
  */
case class chassis_blade_down() extends StateDownTemplateRule(
  ruleName = "chassis_blade_down",
  ruleFriendlyName = "Chassis Devices: Blade(s) down",
  ruleDescription = "Indeni will alert when one or more blades in a chassis are down.",
  metricName = "blade-state",
  applicableMetricTag = "name",
  alertItemsHeader = "Blades Affected",
  alertDescription = "One or more blades in this chassis are down.",
  baseRemediationText = "Review the cause for the blades being down.")(
  RemediationStepCondition.VENDOR_CP -> "If the blade was not stopped intentionally (admin down), check that it has not been physically disconnected.",
  RemediationStepCondition.VENDOR_CISCO ->
    """|
      |Most module-related failures (such as a module not coming up or a module being reloaded) can be analyzed by reviewing the logs stored on the switch. Use the following CLI commands to identify the problem:
      |• show system reset-reason module
      |• show version
      |• show logging
      |• show module internal exception-log
      |• show module internal event-history module
      |• show module internal event-history errors
      |• show platform internal event-history errors
      |• show platform internal event-history module
      |Further details can be found in the following Cisco troubleshooting guide:
      |https://www.cisco.com/en/US/products/ps5989/prod_troubleshooting_guide_chapter09186a008067a0ef.html""".stripMargin
)
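The rule above delegates its logic to `StateDownTemplateRule`, whose internals are not shown here. The sketch below approximates the decision it configures. It assumes one current `blade-state` sample per blade, keyed by the `name` tag, with `1.0` meaning up and `0.0` meaning down. It reuses the rule's header and description strings. `BladeState`, `BladeAlert`, and `evaluateBlades` are hypothetical names, and this is not the template's actual implementation.

```scala
// Hypothetical model of one "blade-state" sample. `name` mirrors the rule's
// applicableMetricTag. Assumption: 1.0 = up, 0.0 = down.
final case class BladeState(name: String, value: Double)

// Hypothetical alert payload using the rule's header and description text.
final case class BladeAlert(header: String, description: String, items: Seq[String])

// Returns an alert listing every blade reported as down (sorted, no
// duplicates), or None when no blade is down.
def evaluateBlades(samples: Seq[BladeState]): Option[BladeAlert] = {
  val down = samples.filter(_.value == 0.0).map(_.name).distinct.sorted
  if (down.isEmpty) None
  else Some(BladeAlert("Blades Affected", "One or more blades in this chassis are down.", down))
}
```

A single alert that lists all affected blades matches the rule's `alertItemsHeader`, so one notification covers the whole chassis.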