Indeni will alert if one or more blades in a chassis are down.
Review the cause for the blades being down.
How does this work?
This script uses the F5 iControl API to retrieve the state of the blades.
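As a rough illustration of that retrieval, the sketch below builds (but does not send) a GET request for the hardware endpoint used by the script, /mgmt/tm/sys/hardware. The object and method names are hypothetical, and HTTP Basic authentication is an assumption: the script itself does not state how it authenticates. It relies only on the JDK's java.net.http classes (Java 11 or later).

```scala
import java.net.URI
import java.net.http.HttpRequest
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper, not part of the Indeni script.
object HardwareRequestSketch {
  // Path taken from the "command" field of the collection script.
  val HardwarePath = "/mgmt/tm/sys/hardware"

  // Builds a GET request for the hardware endpoint without sending it.
  // Assumption: the device accepts HTTP Basic authentication.
  def hardwareRequest(host: String, user: String, password: String): HttpRequest = {
    val credentials = Base64.getEncoder.encodeToString(
      s"$user:$password".getBytes(StandardCharsets.UTF_8))
    HttpRequest.newBuilder(URI.create(s"https://$host$HardwarePath"))
      .header("Authorization", s"Basic $credentials")
      .header("Accept", "application/json")
      .GET()
      .build()
  }
}
```

The request could then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). A device with a self-signed management certificate would likely need extra trust configuration first.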
Why is this important?
A blade that is not powered up could indicate a hardware issue. This could result in reduced performance or, in the worst case, system downtime.
Without Indeni how would you find this?
An administrator can check the status of the blades by entering TMSH and running “show sys hardware”.
name: f5-rest-mgmt-tm-sys-hardware
description: Get hardware status metrics
type: monitoring
monitoring_interval: 5 minutes
requires:
    vendor: f5
    product: load-balancer
    rest-api: 'true'
comments:
    hardware-element-status:
        why: |
            A critical aspect to track on a given device is the health of the hardware components. A power supply which stopped working or a dead fan can spell trouble down the line.
        how: |
            This alert uses the F5 iControl REST API to retrieve the health of the power components in a chassis.
        can-with-snmp: true
        can-with-syslog: false
    hardware-eos-date:
        why: |
            Ensuring the hardware being used is always within the vendor's list of supported models is critical. Otherwise, during a critical issue, the vendor may decline to provide technical support. indeni tracks the official list from F5 and updates this script to match.
        how: |
            This script uses the F5 iControl API to retrieve the current hardware model (the equivalent of running "show sys hardware" in TMSH), and based on the model and the F5 documentation at https://support.f5.com/csp/article/K4309 the correct end of support date is used.
        can-with-snmp: false
        can-with-syslog: false
    serial-numbers:
        why: |
            Capture the device's serial number. This makes inventory tracking and opening support cases with the vendor easier.
        how: |
            This script uses the F5 iControl API to retrieve the serial number.
        can-with-snmp: false
        can-with-syslog: false
    blade-state:
        why: |
            A blade that is not powered up could indicate a hardware issue. This could result in reduced performance, or in worst case system downtime.
        how: |
            This script uses the F5 iControl API to retrieve the state of the blades.
        can-with-snmp: true
        can-with-syslog: false
    model:
        why: |
            Two or more devices which operate as part of a single cluster must be running on the same hardware.
        how: |
            This script uses the F5 REST API to retrieve the hardware model of the device. Indeni then compares the result to the same script run on other members of the same cluster.
        can-with-snmp: false
        can-with-syslog: false
steps:
-   run:
        type: HTTP
        command: /mgmt/tm/sys/hardware
    parse:
        type: JSON
        file: rest-mgmt-tm-sys-hardware.parser.1.json.yaml
// Deprecation warning : Scala template-based rules are deprecated. Please use YAML format rules instead.

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.templates.StateDownTemplateRule
import com.indeni.server.rules.RemediationStepCondition

/**
  *
  */
case class chassis_blade_down() extends StateDownTemplateRule(
  ruleName = "chassis_blade_down",
  ruleFriendlyName = "Chassis Devices: Blade(s) down",
  ruleDescription = "Indeni will alert one or more blades in a chassis is down.",
  metricName = "blade-state",
  applicableMetricTag = "name",
  alertItemsHeader = "Blades Affected",
  alertDescription = "One or more blades in this chassis are down.",
  baseRemediationText = "Review the cause for the blades being down.")(
  RemediationStepCondition.VENDOR_CP -> "If the blade was not stopped intentionally (admin down), check to see it wasn't disconnected physically.",
  RemediationStepCondition.VENDOR_CISCO ->
    """|
       |Most of the module related failures (such as the module not coming up, the module getting reloaded, and so on) can be analyzed by looking at the logs stored on the switch. Use the following CLI commands to identify the problem:
       |•show system reset-reason module
       |•show version
       |•show logging
       |•show module internal exception-log
       |•show module internal event-history module
       |•show module internal event-history errors
       |•show platform internal event-history errors
       |•show platform internal event-history module
       |Further details can be found to the next CISCO troubleshooting guide:
       |https://www.cisco.com/en/US/products/ps5989/prod_troubleshooting_guide_chapter09186a008067a0ef.html""".stripMargin
)
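
The rule above delegates its logic to StateDownTemplateRule, whose implementation is not shown here. The following is a minimal sketch of what a state-down check over the "blade-state" metric might look like. It assumes a value of 1.0 means the blade is up and any other value means it is down, and it uses hypothetical names (BladeStateSketch, BladeState, downBlades, alertText) that do not come from the Indeni code base. Only the alert strings are taken from the rule.

```scala
// Hypothetical illustration; not the StateDownTemplateRule implementation.
object BladeStateSketch {
  // One "blade-state" sample; the "name" tag identifies the blade.
  final case class BladeState(name: String, value: Double)

  // Assumption: 1.0 means up, any other value means down.
  def downBlades(samples: Seq[BladeState]): Seq[String] =
    samples.filter(_.value != 1.0).map(_.name).sorted

  // Returns the alert text built from the rule's strings,
  // or None when every blade is up.
  def alertText(samples: Seq[BladeState]): Option[String] = {
    val down = downBlades(samples)
    if (down.isEmpty) None
    else Some(
      "One or more blades in this chassis are down.\n" +
        "Blades Affected: " + down.mkString(", "))
  }
}
```

Grouping by the "name" tag mirrors applicableMetricTag = "name" in the rule, so each affected blade is listed individually under the "Blades Affected" header.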