Blade(s) down-f5-False

error
health-checks
false
f5

Vendor: f5

OS: False

Description:
Indeni will alert if one or more blades in a chassis are down.

Remediation Steps:
Review the cause for the blades being down.

How does this work?
This script uses the F5 iControl API to retrieve the state of the blades.
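As a rough illustration of that retrieval step, the sketch below walks an already-parsed response from `/mgmt/tm/sys/hardware` and turns each slot's status into a 1/0 state, the way the script's `blade-state` metric does. The nesting of the payload is inferred from the JSONPath in the script; the sample data, the function name `blade_states`, and the `powered-down` status string are assumptions made for illustration only.

```python
from typing import Any, Dict

# Key taken from the JSONPath used by the monitoring script.
SLOT_STATUS_KEY = "https://localhost/mgmt/tm/sys/hardware/slot-status-index"


def blade_states(hardware: Dict[str, Any]) -> Dict[str, float]:
    """Map "blade-<slot>" to 1.0 (powered-up) or 0.0 (any other status)."""
    slots = (
        hardware.get("entries", {})
        .get(SLOT_STATUS_KEY, {})
        .get("nestedStats", {})
        .get("entries", {})
    )
    states: Dict[str, float] = {}
    for slot_entry in slots.values():
        fields = slot_entry.get("nestedStats", {}).get("entries", {})
        slot = fields.get("slot", {}).get("value")
        status = fields.get("status", {}).get("description")
        if slot is None:
            continue  # skip entries that carry no slot number
        states[f"blade-{slot}"] = 1.0 if status == "powered-up" else 0.0
    return states


# Hypothetical response fragment; real devices return many more fields.
sample = {
    "entries": {
        SLOT_STATUS_KEY: {
            "nestedStats": {
                "entries": {
                    SLOT_STATUS_KEY + "/1": {
                        "nestedStats": {
                            "entries": {
                                "slot": {"value": 1},
                                "status": {"description": "powered-up"},
                            }
                        }
                    },
                    SLOT_STATUS_KEY + "/2": {
                        "nestedStats": {
                            "entries": {
                                "slot": {"value": 2},
                                # "powered-down" is a made-up non-healthy status
                                "status": {"description": "powered-down"},
                            }
                        }
                    },
                }
            }
        }
    }
}

print(blade_states(sample))
```

Any status other than `powered-up` is treated as down, which mirrors the comparison in the script's transform.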

Why is this important?
A blade that is not powered up could indicate a hardware issue. This could result in reduced performance or, in the worst case, system downtime.

Without Indeni how would you find this?
An administrator can check the status of the blades by entering TMSH and running “show sys hardware”.

f5-rest-mgmt-tm-sys-hardware

#! META
name: f5-rest-mgmt-tm-sys-hardware
description: Get hardware status metrics
type: monitoring
monitoring_interval: 5 minutes
requires:
    vendor: "f5"
    product: "load-balancer"
    rest-api: "true"

#! COMMENTS
hardware-element-status:
    why: |
        A critical aspect to track on a given device is the health of the hardware components. A power supply which stopped working or a dead fan can spell trouble down the line.
    how: |
        This alert uses the F5 iControl REST API to retrieve the health of the temperature sensors, fans and power supplies in a chassis.
    without-indeni: |
        An administrator would be able to extract this information by logging into the device through SSH, entering TMSH and executing the command "show sys hardware". The output would then show the status of each hardware element.
    can-with-snmp: true
    can-with-syslog: false
hardware-eos-date:
    why: |
        Ensuring the hardware being used is always within the vendor's list of supported models is critical. Otherwise, during a critical issue, the vendor may decline to provide technical support. Indeni tracks the official list from F5 and updates this script to match.
    how: |
        This script uses the F5 iControl API to retrieve the current hardware model (the equivalent of running "show sys hardware" in TMSH), and based on the model and the F5 documentation at https://support.f5.com/csp/article/K4309 the correct end of support date is used.
    without-indeni: |
        Manual tracking by an administrator is usually the only method for knowing when a given device may be nearing its end of support and is in need of replacement.
    can-with-snmp: false
    can-with-syslog: false
serial-numbers:
    skip-documentation: true
blade-state:
    why: |
        A blade that is not powered up could indicate a hardware issue. This could result in reduced performance or, in the worst case, system downtime.
    how: |
        This script uses the F5 iControl API to retrieve the state of the blades.
    without-indeni: |
        An administrator can check the status of the blades by entering TMSH and running "show sys hardware".
    can-with-snmp: true
    can-with-syslog: false
model:
    why: |
        Two or more devices which operate as part of a single cluster must be running on the same hardware.
    how: |
        This script uses the F5 REST API to retrieve the hardware model of the device. Indeni then compares the result to the same script run on other members of the same cluster.
    without-indeni: |
        Manual tracking by an administrator is usually the only method for knowing when two devices are not running on the same hardware.
    can-with-snmp: false
    can-with-syslog: false

#! REMOTE::HTTP
url: /mgmt/tm/sys/hardware
protocol: HTTPS

#! PARSER::JSON

_metrics:
    -   #Collecting metrics pertaining to blade temperature
        _groups:
            "$.entries.https://localhost/mgmt/tm/sys/hardware/blade-temperature-status-index.nestedStats.entries.*.nestedStats.entries":
                _tags:
                    "im.name":
                        _constant: "hardware-element-status"
                    "im.dstype.displaytype":
                        _constant: "state"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Hardware Elements"
                    "im.identity-tags":
                        _constant: "name"
                _temp:
                    "temperatureIndex":
                        _value: "index.value"
                    "lowTemperatureLimit":
                        _value: "loLimit.value"
                    "highTemperatureLimit":
                        _value: "hiLimit.value"
                    "currentTemperature":
                        _value: "temperature.value"
                    "slot":
                        _value: "slot.value"
                    "temperatureLocation":
                        _value: "location.description"
        _transform:
            _tags:
                "name": |
                    {
                        #Concatenate meta data and use as name tag
                        slot = temp("slot")
                        temperatureIndex = temp("temperatureIndex")
                        temperatureLocation = temp("temperatureLocation")

                        name = "Temperature measurement - Slot: " slot " Index: " temperatureIndex " Location: " temperatureLocation

                        print name
                    }
            _value.double: |
                {
                    #Adding 0 forces a numeric comparison rather than a string comparison
                    lowTemperatureLimit = temp("lowTemperatureLimit") + 0
                    currentTemperature = temp("currentTemperature") + 0
                    highTemperatureLimit = temp("highTemperatureLimit") + 0

                    #Verify that the temperature is within the defined range
                    if(currentTemperature < highTemperatureLimit && currentTemperature > lowTemperatureLimit ){
                        print "1"
                    } else {
                        print "0"
                    }
                }
    -   #Collecting metrics pertaining to chassis temperature
        _groups:
            "$.entries.https://localhost/mgmt/tm/sys/hardware/chassis-temperature-status-index.nestedStats.entries.*.nestedStats.entries":
                _tags:
                    "im.name":
                        _constant: "hardware-element-status"
                    "im.dstype.displaytype":
                        _constant: "state"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Hardware Elements"
                    "im.identity-tags":
                        _constant: "name"
                _temp:
                    "temperatureIndex":
                        _value: "index.value"
                    "lowTemperatureLimit":
                        _value: "loLimit.value"
                    "highTemperatureLimit":
                        _value: "hiLimit.value"
                    "currentTemperature":
                        _value: "temperature.value"
                    "temperatureLocation":
                        _value: "location.description"
        _transform:
            _tags:
                "name": |
                    {
                        #Concatenate meta data and use as name tag
                        temperatureIndex = temp("temperatureIndex")
                        temperatureLocation = temp("temperatureLocation")

                        name = "Temperature measurement - Index: " temperatureIndex " Location: " temperatureLocation

                        print name
                    }
            _value.double: |
                {
                    #Adding 0 forces a numeric comparison rather than a string comparison
                    lowTemperatureLimit = temp("lowTemperatureLimit") + 0
                    currentTemperature = temp("currentTemperature") + 0
                    highTemperatureLimit = temp("highTemperatureLimit") + 0

                    #Verify that the temperature is within the defined range
                    if(currentTemperature < highTemperatureLimit && currentTemperature > lowTemperatureLimit ){
                        print "1"
                    } else {
                        print "0"
                    }
                }
    -   #Collecting metrics pertaining to chassis fans
        _groups:
            "$.entries.https://localhost/mgmt/tm/sys/hardware/chassis-fan-status-index.nestedStats.entries.*.nestedStats.entries":
                _tags:
                    "im.name":
                        _constant: "hardware-element-status"
                    "im.dstype.displaytype":
                        _constant: "state"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Hardware Elements"
                    "im.identity-tags":
                        _constant: "name"
                _temp:
                    "index":
                        _value: "index.value"
                    "status":
                        _value: "status.description"
        _transform:
            _tags:
                "name": |
                    {
                        name = "fan-" temp("index")
                        print name
                    }
            _value.double: |
                {
                    if(temp("status") == "up") { print "1" } else { print "0" }
                }
    -   #Collecting metrics pertaining to power supplies
        _groups:
            "$.entries.https://localhost/mgmt/tm/sys/hardware/chassis-power-supply-status-index.nestedStats.entries.*.nestedStats.entries[?(@.status.description != 'not-present')]":
                _tags:
                    "im.name":
                        _constant: "hardware-element-status"
                    "im.dstype.displaytype":
                        _constant: "state"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Hardware Elements"
                    "im.identity-tags":
                        _constant: "name"
                _temp:
                    "index":
                        _value: "index.value"
                    "status":
                        _value: "status.description"
        _transform:
            _tags:
                "name": |
                    {
                        name = "psu-" temp("index")
                        print name
                    }
            _value.double: |
                {
                    if(temp("status") == "up") { print "1" } else { print "0" }
                }
    -   #Collecting metrics pertaining to blade state
        _groups:
            "$.entries.https://localhost/mgmt/tm/sys/hardware/slot-status-index.nestedStats.entries.*.nestedStats.entries":
                _tags:
                    "im.name":
                        _constant: "blade-state"
                    "im.dstype.displaytype":
                        _constant: "state"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Blades"
                    "im.identity-tags":
                        _constant: "name"
                _temp:
                    "slot":
                        _value: "slot.value"
                    "status":
                        _value: "status.description"
        _transform:
            _tags:
                "name": |
                    {
                        name = "blade-" temp("slot")
                        print name
                    }
            _value.double: |
                {
                    if(temp("status") == "powered-up") { print "1" } else { print "0" }
                }
    -   #Collecting metrics pertaining to hardware end of support date
        _groups:
            "$.entries.https://localhost/mgmt/tm/sys/hardware/system-info.nestedStats.entries.*.nestedStats.entries[?(@.platform.description in ['D35','C114','C36','C102','C62','C100','C103','C106','D101','D43','D63','D68','D104','D84','D88','D106','D107','E101','E102','J100','J101','A109','A100','PB100','A105','A107','PB200','A111','D38','D46','D39','D45','D44','D50','D51c','D51f'])]":
                _tags:
                    "im.name":
                        _constant: "hardware-eos-date"
                    "live-config":
                        _constant: "true"
                    "display-name":
                        _constant: "Hardware End of Support"
                    "im.dstype.displayType":
                        _constant: "date"
                    "serial":
                        _value: "hostBoardSerialNum.description"
                _temp:
                    "platform":
                        _value: "platform.description"
        _transform:
            _value.double: |
                {
                    #This array contains entries with "No date set" but they are not in the query above
                    #Just here to show that they were in the table on F5, but did not have a date set yet.
                    #When adding a platform that has a date to this list you must update BOTH the dictionary below and the query above

                    EndOfPlatformTechnicalSupport["D35"] = "2014-10-01"
                    EndOfPlatformTechnicalSupport["C114"] = "2024-01-31"
                    EndOfPlatformTechnicalSupport["C36"] = "2016-07-31"
                    EndOfPlatformTechnicalSupport["C102"] = "2021-10-01"
                    EndOfPlatformTechnicalSupport["C112"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["C62"] = "2016-07-31"
                    EndOfPlatformTechnicalSupport["C100"] = "2016-07-31"
                    EndOfPlatformTechnicalSupport["C103"] = "2021-10-01"
                    EndOfPlatformTechnicalSupport["C106"] = "2022-02-01"
                    EndOfPlatformTechnicalSupport["C113"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["D101"] = "2017-06-01"
                    EndOfPlatformTechnicalSupport["D43"] = "2016-07-01"
                    EndOfPlatformTechnicalSupport["C109"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["D63"] = "2016-12-31"
                    EndOfPlatformTechnicalSupport["D68"] = "2016-12-01"
                    EndOfPlatformTechnicalSupport["D104"] = "2022-02-01"
                    EndOfPlatformTechnicalSupport["D110"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["D84"] = "2017-07-01"
                    EndOfPlatformTechnicalSupport["D88"] = "2017-07-01"
                    EndOfPlatformTechnicalSupport["D106"] = "2022-02-01"
                    EndOfPlatformTechnicalSupport["D107"] = "2022-02-01"
                    EndOfPlatformTechnicalSupport["D113"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["D112"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["E101"] = "2023-04-01"
                    EndOfPlatformTechnicalSupport["E102"] = "2023-04-01"
                    EndOfPlatformTechnicalSupport["F100"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["F101"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["D114"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["J100"] = "2021-04-01"
                    EndOfPlatformTechnicalSupport["J101"] = "2021-07-01"
                    EndOfPlatformTechnicalSupport["J102"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["S100"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["101"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["A109"] = "2022-10-01"
                    EndOfPlatformTechnicalSupport["A113"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["A112"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["A100"] = "2019-06-30"
                    EndOfPlatformTechnicalSupport["PB100"] = "2019-06-30"
                    EndOfPlatformTechnicalSupport["A105"] = "2019-06-30"
                    EndOfPlatformTechnicalSupport["A107"] = "2021-04-01"
                    EndOfPlatformTechnicalSupport["PB200"] = "2021-04-01"
                    EndOfPlatformTechnicalSupport["A111"] = "2021-07-01"
                    EndOfPlatformTechnicalSupport["A108"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["A110"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["A114"] = "No Date Set"
                    EndOfPlatformTechnicalSupport["D38"] = "2017-03-01"
                    EndOfPlatformTechnicalSupport["D46"] = "2016-07-30"
                    EndOfPlatformTechnicalSupport["D39"] = "2012-12-31"
                    EndOfPlatformTechnicalSupport["D45"] = "2008-06-30"
                    EndOfPlatformTechnicalSupport["D44"] = "2013-03-31"
                    EndOfPlatformTechnicalSupport["D50"] = "2008-06-30"
                    EndOfPlatformTechnicalSupport["D51c"] = "2013-03-31"
                    EndOfPlatformTechnicalSupport["D51f"] = "2013-03-31"
                    EndOfPlatformTechnicalSupport["C119"] = "No Date Set"

                    platform = temp("platform")

                    split(EndOfPlatformTechnicalSupport[platform], dateArr, /-/)
                    secondsSinceEpoch = date(dateArr[1], dateArr[2], dateArr[3])
                    print secondsSinceEpoch

                }
    -   #Collecting the chassis serial number
        _groups:
            "$.entries.https://localhost/mgmt/tm/sys/hardware/system-info.nestedStats.entries.*.nestedStats.entries":
                _tags:
                    "im.name":
                        _constant: "serial-numbers"
                _value.complex:
                    "name":
                        _constant: "chassis"
                    "serialnumber":
                        _value: "bigipChassisSerialNum.description"
        #  "serialnumber" : "chs000000s",
        #  "name" : "chassis"
        _value: complex-array
    -   #Collecting the hardware model
        _tags:
            "im.name":
                _constant: "model"
        _temp:
            "model":
                _value: "$.entries.https://localhost/mgmt/tm/sys/hardware/platform.nestedStats.entries.*.nestedStats.entries.marketingName.description"
        _transform:
            _value.complex:
                value: |
                    {
                        model = temp("model")
                        sub(/BIG-IP /, "", model)
                        print model
                    }
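
The end-of-support lookup behind the `hardware-eos-date` metric in the script above can be sketched outside the Indeni parser roughly as follows. Only three table entries are copied here. The function name `eos_epoch_seconds` is hypothetical, and the sketch assumes the date is interpreted as midnight UTC, which the script's `date()` helper may or may not do.

```python
from datetime import datetime, timezone
from typing import Optional

# A few entries copied from the table in the script; the full list lives there.
END_OF_SUPPORT = {
    "C102": "2021-10-01",
    "D35": "2014-10-01",
    "C112": "No Date Set",
}


def eos_epoch_seconds(platform: str) -> Optional[int]:
    """Return the end-of-support date as seconds since the epoch, or None."""
    raw = END_OF_SUPPORT.get(platform)
    if raw is None:
        return None  # unknown platform ID
    try:
        day = datetime.strptime(raw, "%Y-%m-%d")
    except ValueError:
        return None  # entries such as "No Date Set" carry no usable date
    # Assumption: treat the date as midnight UTC.
    return int(day.replace(tzinfo=timezone.utc).timestamp())


print(eos_epoch_seconds("C102"))  # 1633046400
print(eos_epoch_seconds("C112"))  # None
```

The script itself avoids the "No Date Set" case by filtering platform IDs in the JSONPath query. The explicit `None` return here is just one way to make the same guard visible in code.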

chassis_blade_down

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.{ConditionalRemediationSteps, StateDownTemplateRule}

/**
  * Alerts when the "blade-state" metric reports that one or more blades in a chassis are down.
  */
case class chassis_blade_down() extends StateDownTemplateRule(
  ruleName = "chassis_blade_down",
  ruleFriendlyName = "Chassis Devices: Blade(s) down",
  ruleDescription = "Indeni will alert if one or more blades in a chassis are down.",
  metricName = "blade-state",
  applicableMetricTag = "name",
  alertItemsHeader = "Blades Affected",
  alertDescription = "One or more blades in this chassis are down.",
  baseRemediationText = "Review the cause for the blades being down.")(
  ConditionalRemediationSteps.VENDOR_CP -> "If the blade was not stopped intentionally (admin down), check that it wasn't physically disconnected.",
  ConditionalRemediationSteps.OS_NXOS ->
    """|
      |Most of the module related failures (such as the module not coming up, the module getting reloaded, and so on) can be analyzed by looking at the logs stored on the switch. Use the following CLI commands to identify the problem:
      |• show system reset-reason module
      |• show version
      |• show logging
      |• show module internal exception-log
      |• show module internal event-history module
      |• show module internal event-history errors
      |• show platform internal event-history errors
      |• show platform internal event-history module
      |Further details can be found in the following Cisco troubleshooting guide:
      |https://www.cisco.com/en/US/products/ps5989/prod_troubleshooting_guide_chapter09186a008067a0ef.html""".stripMargin
)