Cluster down-paloaltonetworks-panos
Vendor: paloaltonetworks
OS: panos
Description:
Indeni will alert if a cluster is down or any of the members are inoperable.
Remediation Steps:
Determine why one or more members are down or inoperable.
Log into the device over SSH and run “less mp-log ha-agent.log” for more information.
How does this work?
This script uses the Palo Alto Networks API to retrieve the high availability (HA) status of the cluster, specifically the states of the local member and its peer.
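A minimal Python sketch of the same idea follows. It builds the XML API request from the `command` in the script's `steps` section and reads the local and peer states from the response. The response shape is an abbreviated assumption: the element names `local-info`, `peer-info` and `state` follow typical firewall output of `show high-availability all`, and Panorama may report different state strings. The helper names (`build_ha_url`, `parse_ha_states`, `cluster_is_up`) and the set of "healthy" states are hypothetical and should be checked against your own devices.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Abbreviated, ASSUMED shape of a `show high-availability all` response.
SAMPLE_RESPONSE = """
<response status="success">
  <result>
    <enabled>yes</enabled>
    <group>
      <local-info><state>active</state></local-info>
      <peer-info><state>passive</state></peer-info>
    </group>
  </result>
</response>
"""

# Hypothetical set of states treated as healthy for an active/passive pair.
HEALTHY_STATES = {"active", "passive"}


def build_ha_url(host: str, api_key: str) -> str:
    """Build the XML API URL for the operational command the script runs."""
    cmd = "<show><high-availability><all></all></high-availability></show>"
    # urlencode percent-encodes the XML command and the key for the query string.
    query = urlencode({"type": "op", "cmd": cmd, "key": api_key})
    return f"https://{host}/api?{query}"


def parse_ha_states(xml_text: str) -> tuple[str, str]:
    """Return (local_state, peer_state); missing elements become 'unknown'."""
    root = ET.fromstring(xml_text)
    local = root.findtext("./result/group/local-info/state", default="unknown")
    peer = root.findtext("./result/group/peer-info/state", default="unknown")
    return local.strip(), peer.strip()


def cluster_is_up(local: str, peer: str) -> bool:
    """Treat the cluster as up only when both members report a healthy state."""
    return local in HEALTHY_STATES and peer in HEALTHY_STATES


local, peer = parse_ha_states(SAMPLE_RESPONSE)
print(local, peer, cluster_is_up(local, peer))  # → active passive True
```

A peer reporting a state such as `non-functional` or `suspended`, or a missing `peer-info` element, makes `cluster_is_up` return `False` in this sketch.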
Why is this important?
Tracking the state of a cluster is important. If a cluster that used to be healthy no longer is, something has changed. Sometimes the change is due to maintenance work and was anticipated; in other cases it points to a failure in one of the cluster members or in another component of the network.
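The reasoning above amounts to alerting on a change rather than on a single reading: a cluster that was healthy and no longer is marks the interesting event. The sketch below assumes a hypothetical `ClusterWatch` class that receives one boolean per polling cycle (the script polls every 5 minutes). It illustrates edge-triggered detection only and is not Indeni's rule engine.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterWatch:
    """Hypothetical tracker that reports healthy/unhealthy transitions."""
    last_healthy: Optional[bool] = None  # None until the first sample arrives

    def update(self, healthy: bool) -> Optional[str]:
        """Record one sample and return an event name when the state changes."""
        previous = self.last_healthy
        self.last_healthy = healthy
        if previous is True and not healthy:
            return "cluster-down"       # was healthy, no longer is
        if previous is False and healthy:
            return "cluster-recovered"  # e.g. maintenance finished
        return None                     # first sample, or no change


watch = ClusterWatch()
events = [watch.update(s) for s in [True, True, False, False, True]]
print(events)  # → [None, None, 'cluster-down', None, 'cluster-recovered']
```

Because the sketch only reacts to transitions, a cluster that is already down at the first sample produces no event; a production rule would likely also alert on that case.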
Without Indeni how would you find this?
The status of high availability is visible in the web interface, as a widget on the main screen.
panos-show-high-availability-all-monitoring-panorama
name: panos-show-high-availability-all-monitoring-panorama
description: Track health of HA
type: monitoring
monitoring_interval: 5 minute
requires:
  vendor: paloaltonetworks
  os.name: panos
  high-availability: 'true'
  product: panorama
comments:
  cluster-member-active:
    why: |
      Tracking the state of a cluster member is important. If a cluster member which used to be the active member of the cluster no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the firewall or another component in the network.
    how: |
      This script uses the Palo Alto Networks API to retrieve the status of the high availability function of the firewall and specifically retrieves the local member's state.
    can-with-snmp: true
    can-with-syslog: true
  cluster-state:
    why: |
      Tracking the state of a cluster is important. If a cluster which used to be healthy no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the members of the cluster or another component in the network.
    how: |
      This script uses the Palo Alto Networks API to retrieve the status of the high availability function of the cluster and specifically retrieves the local member's and peer's states.
    can-with-snmp: true
    can-with-syslog: true
  cluster-preemption-enabled:
    why: |
      Preemption is a clustering feature that makes the primary member of the cluster always try to become the active member. The trouble is that if a preemption-enabled active member has a critical failure and reboots, the cluster fails over to the secondary and then immediately fails back to the primary once the reboot completes. If the underlying problem persists, the primary can crash again and the cycle can repeat in a loop. Palo Alto Networks firewalls have a means of dealing with this ( https://live.paloaltonetworks.com/t5/Learning-Articles/Understanding-Preemption-with-the-Configured-Device-Priority-in/ta-p/53398 ), but it is generally a good idea not to enable preemption.
    how: |
      This script uses the Palo Alto Networks API to retrieve the status of the high availability function of this cluster member and specifically the preemption setting.
    can-with-snmp: true
    can-with-syslog: true
  cluster-config-synced:
    why: |
      Normally two Palo Alto Networks firewalls in a cluster work together to keep their configurations synchronized. Sometimes, due to connectivity or other issues, the configuration sync may be lost. In the event of a failover, the secondary member takes over while running a different configuration from the primary (the original active member). This can result in service disruption.
    how: |
      This script uses the Palo Alto Networks API to retrieve the status of the high availability function of this cluster and specifically the status of the config synchronization.
    can-with-snmp: true
    can-with-syslog: true
  device-is-passive:
    why: |
      This metric describes whether this device is a passive member. Port-down alerts should not be triggered for a passive device.
    how: |
      This script uses the Palo Alto Networks API to retrieve the active/passive state of the device.
    can-with-snmp: true
    can-with-syslog: true
steps:
- run:
    type: HTTP
    command: /api?type=op&cmd=<show><high-availability><all></all></high-availability></show>&key=${api-key}
  parse:
    type: XML
    file: show-high-availability-all-monitoring-panorama.parser.1.xml.yaml
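The `comments` section lists five metrics that all come from the single `show high-availability all` call. The actual mapping lives in the referenced `show-high-availability-all-monitoring-panorama.parser.1.xml.yaml` parser, whose contents are not shown here, so the following is only a rough Python sketch of how such a response could be turned into those metric names. It assumes the element names `local-info/state`, `local-info/preemptive`, `peer-info/state` and `running-sync` (typical of firewall output; Panorama output may differ) and encodes each metric as 1.0 or 0.0. The function name `extract_metrics`, the healthy-state set, and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated, ASSUMED response shape; real output contains many more fields.
SAMPLE_RESPONSE = """<response status="success"><result><group>
  <local-info><state>passive</state><preemptive>no</preemptive></local-info>
  <peer-info><state>active</state></peer-info>
  <running-sync>synchronized</running-sync>
</group></result></response>"""

# Hypothetical healthy states for an active/passive pair.
HEALTHY_STATES = {"active", "passive"}


def extract_metrics(xml_text: str) -> dict[str, float]:
    """Map an HA status response to the metric names listed under `comments`."""
    group = ET.fromstring(xml_text).find("./result/group")
    if group is None:
        raise ValueError("no HA group in response")

    def text(path: str) -> str:
        # Missing elements are treated as empty strings.
        return (group.findtext(path) or "").strip().lower()

    local = text("local-info/state")
    peer = text("peer-info/state")
    return {
        "cluster-member-active": float(local == "active"),
        "cluster-state": float(local in HEALTHY_STATES and peer in HEALTHY_STATES),
        "cluster-preemption-enabled": float(text("local-info/preemptive") == "yes"),
        "cluster-config-synced": float(text("running-sync") == "synchronized"),
        "device-is-passive": float(local == "passive"),
    }


for name, value in extract_metrics(SAMPLE_RESPONSE).items():
    print(name, value)
```

Parsing the response once and deriving every metric from the same tree mirrors the script's design of a single HTTP step feeding several metrics.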
cross_vendor_cluster_down_vsx
Rule source: https://bitbucket.org/indeni/indeni-knowledge/src/master/rules/templatebased/crossvendor/cross_vendor_cluster_down_vsx.scala