Hardware element down-checkpoint-ipso

Vendor: checkpoint

OS: ipso

Description:
Alert if any hardware elements are not operating correctly.

Remediation Steps:
Troubleshoot the hardware element as soon as possible.

How does this work?
Use the clish command “show sysenv all” to list the health of each hardware element.

Why is this important?
It is not uncommon for hardware components to fail inside a device without the device itself failing. In such an event they need to be replaced quickly.

Without Indeni how would you find this?
An administrator could log in and manually run the command.
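For a quicker manual check, the output can be filtered so that only problem rows remain. The sketch below is a minimal example under stated assumptions: data rows contain at least one digit, and the status appears as its own whitespace-delimited word. It treats Good, OK and Normal as healthy, the same values the chkp-ipso-show_sysenv parser accepts. The function name `unhealthy_rows` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of Indeni or Check Point): print the data rows
# of "show sysenv all" output whose status is not Good, OK or Normal.
# Assumptions: data rows contain at least one digit, and the status is a
# separate whitespace-delimited word on the row.
unhealthy_rows() {
    awk '
        /[0-9]/ {
            healthy = 0
            # Scan every field for one of the healthy status words.
            for (i = 1; i <= NF; i++) {
                if ($i == "Good" || $i == "OK" || $i == "Normal") {
                    healthy = 1
                }
            }
            # Print the row unchanged only if no healthy status was found.
            if (!healthy) print
        }
    '
}

# Example (on the appliance):
#   clish -c "show sysenv all" | unhealthy_rows
```

Header lines that happen to contain digits would also be printed, so treat the result as a starting point for inspection rather than a verdict.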

chkp-ipso-show_sysenv

#! META
name: chkp-ipso-show_sysenv
description: List the status of hardware elements
type: monitoring
monitoring_interval: 5 minutes
requires:
    vendor: checkpoint
    os.name: ipso
    asg:
        neq: true

#! COMMENTS
hardware-element-status:
    why: |
        It is not uncommon for hardware components to fail inside a device without the device itself failing. In such an event they need to be replaced quickly.
    how: |
        Use clish "show sysenv all" command to list hardware health.
    without-indeni: |
        An administrator could log in and manually run the command.
    can-with-snmp: false
    can-with-syslog: false
    vendor-provided-management: |
        Listing hardware health is only available from the command line interface and WebUI.

#! REMOTE::SSH
stty rows 80 ; /usr/bin/nice -n 15 clish -c "show sysenv all"

#! PARSER::AWK

# The following two sections have been added by request of Dan Shouky
# https://indeni.atlassian.net/browse/IKP-1221

# Unfortunately, the following code is duplicated in many .ind scripts.
# If you change something in the following two sections, please find all
# of the other instances of this code and make the change there also.

#Could not acquire the config lock
/Could not acquire the config lock/ {
	if (NR == 1) {
		next
	}
}

#CLINFR0829  Unable to get user permissions
#CLINFR0819  User: johndoe denied access via CLI
#CLINFR0599  Failed to build ACLs
/(CLINFR0829\s+Unable to get user permissions|CLINFR0819\s+User: .+ denied access via CLI|CLINFR0599\s+Failed to build ACLs)/ {
	exit
}

# The command returns several tables, each with its own set of columns. Every header row
# (one containing "Location" or "ID") resets the column map, so the data rows that follow
# are parsed against that table's columns.

/(Location|ID)/ {
	delete columns
	getColumns(trim($0), "[ \t]+", columns)
}

# 1       SYS_FAN1  Normal  109            108           155  
# PS-A  Yes      n/a     n/a     OK      0  
# 1       SYSTEM    Good    40             75          1  
# 1       3.3V      Good    3.300    3.266     -0.034    3.096     3.487     
/[0-9]/ {
	name = getColData(trim($0), columns, "Location")
	if (name == "") {   # an unset awk variable such as "null" also equals "", so one test suffices
		name = getColData(trim($0), columns, "ID")   # The power supply status table is different
	}

	statusName = getColData(trim($0), columns, "Status")

	if (statusName == "Good" || statusName == "OK" || statusName == "Normal") {
		status = 1
	} else {
		status = 0
	}
	hwTags["name"] = name
	writeDoubleMetricWithLiveConfig("hardware-element-status", hwTags, "gauge", 300, status, "Hardware Status", "state", "name")
}
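The helpers `getColumns`, `getColData` and `writeDoubleMetricWithLiveConfig` come from Indeni's parser environment, and their exact behavior is not documented here. As an approximation only, the same header-driven approach can be sketched in portable awk. This sketch assumes each header word maps to the field in the same position (which breaks for multi-word headers), skips header rows with `next`, and prints `name status` pairs instead of writing metrics. The function name `parse_sysenv` is hypothetical.

```shell
#!/bin/sh
# Portable-awk approximation of the chkp-ipso-show_sysenv parser.
# Assumptions (not taken from Indeni): header words map 1:1 to fields by
# position, and the output is "name status" (1 = healthy, 0 = not healthy).
parse_sysenv() {
    awk '
        # Header row: remember which field number holds each column name.
        /(Location|ID)/ {
            split("", col)              # portable way to clear an array
            for (i = 1; i <= NF; i++) col[$i] = i
            next                        # do not treat the header as data
        }
        # Data row: look up the element name and status by column name.
        /[0-9]/ {
            name = ("Location" in col) ? $(col["Location"]) : ""
            if (name == "" && ("ID" in col)) name = $(col["ID"])   # power supply table
            s = ("Status" in col) ? $(col["Status"]) : ""
            status = (s == "Good" || s == "OK" || s == "Normal") ? 1 : 0
            print name, status
        }
    '
}

# Example (on the appliance):
#   clish -c "show sysenv all" | parse_sysenv
```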


cross_vendor_hardware_element_status

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.ConditionalRemediationSteps
import com.indeni.server.rules.library.templates.StateDownTemplateRule

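/**
  * Alerts when the "hardware-element-status" metric reports any hardware element
  * (identified by its "name" tag) as down.
  */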
case class cross_vendor_hardware_element_status() extends StateDownTemplateRule(
  ruleName = "cross_vendor_hardware_element_status",
  ruleFriendlyName = "All Devices: Hardware element down",
  ruleDescription = "Alert if any hardware elements are not operating correctly.",
  metricName = "hardware-element-status",
  applicableMetricTag = "name",
  alertItemsHeader = "Hardware Elements Affected",
  alertDescription = "The hardware elements listed below are not operating correctly.",
  baseRemediationText = "Troubleshoot the hardware element as soon as possible.")(
  ConditionalRemediationSteps.OS_NXOS ->
    """|Although the port may be up, the link quality might be degraded, with readings outside the threshold levels. Check the following to troubleshoot this issue:
       |1. Run the NX-OS command “show interface transceiver detailed” to display information about the transceivers connected to a specific interface. The output also includes the Cisco SFP Product ID (PID). NOTE: If third-party SFPs are used, Indeni may raise an alert because the measured light signal differs from the recommended min/max thresholds defined by Cisco.
       |2. Run the NX-OS command “show interface transceiver calibrations” to display calibration information for the transceiver interfaces.
       |3. Consider enabling DOM (if supported). Digital Optical Monitoring (DOM) is an industry-wide standard that defines how an SFP exposes real-time operating parameters such as Tx power and Rx power. More details can be found here: https://www.cisco.com/c/en/us/td/docs/interfaces_modules/transceiver_modules/compatibility/matrix/DOM_matrix.html
       |4. Cisco publishes official specifications (Rx and Tx power levels, etc.) for each transceiver category at the following link:
       |https://www.cisco.com/c/en/us/products/interfaces-modules/transceiver-modules/index""".stripMargin,
  ConditionalRemediationSteps.VENDOR_FORTINET ->
    """
      |1. Log in to the Fortinet firewall via SSH and run the FortiOS command "exec sensor list" to review the status of the hardware components and temperature
      |>>> thresholds. In the command output, a flag of 0 means the component is working correctly and a flag of 1 means the component has a problem.
      |>>> The FortiOS command "execute sensor detail" shows additional information such as the low/high thresholds. More details can be found here:
      |>>> http://kb.fortinet.com/kb/viewContent.do?externalId=FD36793&sliceId=1
      |2. Consider running the Fortinet hardware diagnostics commands. They do not detect every hardware malfunction, but they test for the most common hardware
      |>>> problems. More details can be found here:
      |- http://kb.fortinet.com/kb/viewContent.do?externalId=FD39581&sliceId=1
      |- http://kb.fortinet.com/kb/documentLink.do?externalID=FD34745
      |3. Replace any failed fan or power supply unit immediately.
      |4. Make sure the devices have adequate cooling to prevent overheating.
      |5. If the problem persists, contact Fortinet Technical support at https://support.fortinet.com/ for further assistance.""".stripMargin.replaceAll("\n>>>", "")
)
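StateDownTemplateRule is part of Indeni's rule library and its implementation is not shown here. As a rough illustration only, assuming the rule treats a `hardware-element-status` value of 0 as down and lists affected items by their `name` tag, the resulting alert can be pictured with the sketch below. The function name `summarize_down` is hypothetical; the header and description strings are copied from the rule parameters above.

```shell
#!/bin/sh
# Rough illustration only; this is NOT Indeni's StateDownTemplateRule.
# Assumptions: input is one "name status" pair per line, and a status of 0
# means the element is down. Prints nothing when every element is up.
summarize_down() {
    awk '
        # Collect the name tag of every element whose state is 0 (down).
        NF >= 2 && $2 == 0 { down[++n] = $1 }
        END {
            if (n > 0) {
                print "The hardware elements listed below are not operating correctly."
                print "Hardware Elements Affected:"
                for (i = 1; i <= n; i++) print "- " down[i]
            }
        }
    '
}
```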