Hardware element down-f5-all
Alert if any hardware elements are not operating correctly.
Troubleshoot the hardware element as soon as possible.
How does this work?
This alert uses the F5 iControl REST API to retrieve the health of the power components in a chassis.
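As a rough illustration of that retrieval, the sketch below walks a response from /mgmt/tm/sys/hardware and maps each power supply to an up/down value. The payload shape (entries, then nestedStats, then entries), the "chassis-power-supply-status-index" key, and the 1.0/0.0 encoding are assumptions made for this example, and the sample data is hand-written rather than captured from a device.

```python
# Hypothetical key for the power supply group in the response (assumption).
BASE = "https://localhost/mgmt/tm/sys/hardware/chassis-power-supply-status-index"


def _member(index, status):
    """Build one illustrative power supply entry in the assumed nested shape."""
    return {
        "nestedStats": {
            "entries": {
                "index": {"value": index},
                "status": {"description": status},
            }
        }
    }


# Hand-written sample payload, not real device output.
SAMPLE = {
    "entries": {
        BASE: {
            "nestedStats": {
                "entries": {
                    f"{BASE}/1": _member(1, "up"),
                    f"{BASE}/2": _member(2, "down"),
                }
            }
        }
    }
}


def power_supply_states(payload):
    """Return {element name: 1.0 if up else 0.0} for each power supply entry."""
    states = {}
    for group_url, group in payload.get("entries", {}).items():
        # Skip groups that do not describe power supplies.
        if "chassis-power-supply-status-index" not in group_url:
            continue
        members = group.get("nestedStats", {}).get("entries", {})
        for member in members.values():
            fields = member.get("nestedStats", {}).get("entries", {})
            index = fields.get("index", {}).get("value")
            status = fields.get("status", {}).get("description", "")
            states[f"Power supply {index}"] = 1.0 if status == "up" else 0.0
    return states


print(power_supply_states(SAMPLE))
```

Anything other than "up" is treated as down here, which is a deliberately conservative choice for the sketch.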
Why is this important?
A critical aspect to track on any device is the health of its hardware components. A power supply that has stopped working or a dead fan can spell trouble down the line.
Without Indeni, how would you find this?
An administrator could extract this information by logging in to the device over SSH, entering TMSH, and running the command “show sys hardware”. The output then shows the status of each hardware element.
name: f5-rest-mgmt-tm-sys-hardware
description: Get hardware status metrics
type: monitoring
monitoring_interval: 5 minutes
requires:
    vendor: f5
    product: load-balancer
    rest-api: 'true'
comments:
    hardware-element-status:
        why: |
            A critical aspect to track on a given device is the health of the hardware components. A power supply which stopped working or a dead fan can spell trouble down the line.
        how: |
            This alert uses the F5 iControl REST API to retrieve the health of the power components in a chassis.
        without-indeni: |
            An administrator would be able to extract this information by logging into the device through SSH, entering TMSH and executing the command "show sys hardware". The output would then show the status of each hardware element.
        can-with-snmp: true
        can-with-syslog: false
    hardware-eos-date:
        why: |
            Ensuring the hardware being used is always within the vendor's list of supported models is critical. Otherwise, during a critical issue, the vendor may decline to provide technical support. indeni tracks the official list from F5 and updates this script to match.
        how: |
            This script uses the F5 iControl API to retrieve the current hardware model (the equivalent of running "show sys hardware" in TMSH), and based on the model and the F5 documentation at https://support.f5.com/csp/article/K4309 the correct end of support date is used.
        without-indeni: |
            Manual tracking by an administrator is usually the only method for knowing when a given device may be nearing its end of support and is in need of replacement.
        can-with-snmp: false
        can-with-syslog: false
    serial-numbers:
        skip-documentation: true
    blade-state:
        why: |
            A blade that is not powered up could indicate a hardware issue. This could result in reduced performance, or in worst case system downtime.
        how: |
            This script uses the F5 iControl API to retrieve the state of the blades.
        without-indeni: |
            An administrator can check the status of the blades by entering TMSH and running "show sys hardware".
        can-with-snmp: true
        can-with-syslog: false
    model:
        why: |
            Two or more devices which operate as part of a single cluster must be running on the same hardware.
        how: |
            This script uses the F5 REST API to retrieve the hardware model of the device. Indeni then compares the result to the same script run on other members of the same cluster.
        without-indeni: |
            Manual tracking by an administrator is usually the only method for knowing when two devices are not running on the same hardware.
        can-with-snmp: false
        can-with-syslog: false
steps:
-   run:
        type: HTTP
        command: /mgmt/tm/sys/hardware
    parse:
        type: JSON
        file: rest-mgmt-tm-sys-hardware.parser.1.json.yaml
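The cluster comparison described for the model metric can be pictured with a small sketch. This is not Indeni's implementation; the function name, the device names, and the model strings are hypothetical, and treating the most common model as the reference is an assumption made only for illustration.

```python
from collections import Counter


def mismatched_members(models_by_device):
    """Return devices whose model differs from the cluster's most common model.

    On a tie, the model reported first in the input is used as the reference.
    """
    if not models_by_device:
        return []
    counts = Counter(models_by_device.values())
    reference_model, _ = counts.most_common(1)[0]
    return sorted(
        device
        for device, model in models_by_device.items()
        if model != reference_model
    )


# Hypothetical cluster: two members agree, one differs.
cluster = {"lb-a": "model-A", "lb-b": "model-A", "lb-c": "model-B"}
print(mismatched_members(cluster))
```

A real comparison would also need to decide what to do when a member reports no model at all; the sketch simply treats a missing value as a mismatch if it is passed in as None.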
// Deprecation warning : Scala template-based rules are deprecated. Please use YAML format rules instead.

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.templates.StateDownTemplateRule
import com.indeni.server.rules.RemediationStepCondition

/**
  * Alerts when any "hardware-element-status" metric, itemized by its "name" tag,
  * reports that a hardware element is not operating correctly.
  */
case class cross_vendor_hardware_element_status() extends StateDownTemplateRule(
  ruleName = "cross_vendor_hardware_element_status",
  ruleFriendlyName = "All Devices: Hardware element down",
  ruleDescription = "Alert if any hardware elements are not operating correctly.",
  metricName = "hardware-element-status",
  applicableMetricTag = "name",
  alertItemsHeader = "Hardware Elements Affected",
  alertDescription = "The hardware elements listed below are not operating correctly.",
  baseRemediationText = "Troubleshoot the hardware element as soon as possible.")(
  // Vendor-specific remediation steps.
  RemediationStepCondition.VENDOR_CISCO ->
    """
      |While the port may be in up status, the link quality might be degraded and outside the threshold levels. Check the following to troubleshoot this issue.
      |1. Run the “show interface transceiver detailed” NX-OS command to display information about the transceivers connected to a specific interface. The output also provides information about the Cisco SFP Product ID (PID). NOTE: If 3rd party SFPs are in use, an Indeni alert is possible because the current light signal differs from the recommended min/max thresholds defined by Cisco.
      |2. Use the “show interface transceiver calibrations” NX-OS command to display calibration information for the transceiver interfaces.
      |3. Consider enabling DOM (if supported). Digital Optical Monitoring, or DOM, is an industry-wide standard that gives access to an SFP's real-time operating parameters such as Tx power, Rx power etc. More details can be found below: https://www.cisco.com/c/en/us/td/docs/interfaces_modules/transceiver_modules/compatibility/matrix/DOM_matrix.html
      |4. Cisco has published official specifications (Rx, Tx power level etc) per transceiver category and can be found at the following link: https://www.cisco.com/c/en/us/products/interfaces-modules/transceiver-modules/index.""".stripMargin,
  RemediationStepCondition.VENDOR_FORTINET ->
    """
      |1. Login via ssh to the Fortinet firewall and run the FortiOS command "exec sensor list" to review the status of the hardware components and temperature thresholds. When the flag in the command output is set to 0, the component is working correctly, and when the flag is set to 1, the component has a problem. The FortiOS command "execute sensor detail" will show extra information such as the low/high thresholds. More details can be found here: http://kb.fortinet.com/kb/viewContent.do?externalId=FD36793&sliceId=1
      |2. Consider running the Fortinet hardware diagnostics commands. While they do not detect all hardware malfunctions, tests for the most common hardware problems are performed. More details can be found here:
      |- http://kb.fortinet.com/kb/viewContent.do?externalId=FD39581&sliceId=1
      |- http://kb.fortinet.com/kb/documentLink.do?externalID=FD34745
      |3. It is recommended that any failed fan or power supply unit should be replaced immediately.
      |4. The cooling system for the devices should be installed to avoid overheating.
      |5. If the problem persists, contact Fortinet Technical support at https://support.fortinet.com/ for further assistance.""".stripMargin
)
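To make the rule's behaviour easier to picture, here is a minimal sketch of a "state down" check in Python. It is not the StateDownTemplateRule implementation: the function name, the 1.0 (up) / 0.0 (down) encoding, the abbreviated vendor texts, and the choice to append vendor steps after the base remediation text are all assumptions for illustration.

```python
BASE_REMEDIATION = "Troubleshoot the hardware element as soon as possible."

# Abbreviated placeholders standing in for the vendor-specific steps (assumption).
VENDOR_REMEDIATION = {
    "cisco": "Check transceiver details and calibrations.",
    "fortinet": "Review sensor status and run hardware diagnostics.",
}


def evaluate(vendor, element_states):
    """Return an alert dict if any element is down, otherwise None.

    element_states maps the metric's "name" tag to 1.0 (up) or 0.0 (down).
    """
    down = sorted(name for name, state in element_states.items() if state == 0.0)
    if not down:
        return None
    remediation = [BASE_REMEDIATION]
    vendor_steps = VENDOR_REMEDIATION.get(vendor)
    if vendor_steps:
        remediation.append(vendor_steps)
    return {
        "headline": "All Devices: Hardware element down",
        "items_header": "Hardware Elements Affected",
        "items": down,
        "remediation": remediation,
    }


# Hypothetical device with one failed power supply.
alert = evaluate("f5", {"Power supply 1": 1.0, "Power supply 2": 0.0})
print(alert)
```

For a vendor without specific steps, such as the hypothetical "f5" key above, only the base remediation text is returned.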