HA heartbeat link does not have at least one more operational redundant link-fortinet-FortiOS

discobot · July 24, 2019, 8:28pm


Vendor: fortinet

OS: FortiOS

Description:
The number of operational heartbeat links is less than two, so there is no heartbeat redundancy. If heartbeat communication fails, every cluster member assumes it is the primary unit. The result is multiple devices on the network with the same IP and MAC addresses (a condition referred to as split brain), and communication is disrupted until heartbeat communication is re-established.

Remediation Steps:

1. Configure and connect redundant heartbeat interfaces so that if one heartbeat interface fails or becomes disconnected, HA heartbeat traffic can continue to be transmitted over the backup heartbeat interface.
2. By default, most FortiGate models have two interfaces configured as heartbeat interfaces. Selecting more heartbeat interfaces increases reliability. You can select up to 8 heartbeat interfaces; this limit only applies to FortiGates with more than 8 physical interfaces.
3. It is recommended to isolate heartbeat interfaces from user networks: heartbeat packets contain sensitive cluster configuration information and can consume a considerable amount of network bandwidth.
4. More information can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_failoverHeartbeat.htm
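The remediation steps amount to declaring at least two heartbeat interfaces in the HA configuration. The following is a minimal Python sketch, not an official tool, that renders a FortiOS `config system ha` snippet using the `set hbdev` syntax, which takes alternating interface-name/priority pairs. The helper name `build_hbdev_config`, the interface names `port3`/`port4` and the priority `50` are illustrative assumptions; check the CLI reference for your FortiOS release before applying anything generated this way.

```python
# Hypothetical helper: renders a FortiOS "config system ha" snippet that
# declares redundant heartbeat interfaces. Interface names and priorities
# passed in are illustrative assumptions, not values from a real deployment.

MAX_HEARTBEAT_INTERFACES = 8  # upper limit quoted in the remediation steps
MIN_REDUNDANT_INTERFACES = 2  # fewer than two means no heartbeat redundancy


def build_hbdev_config(interfaces: dict[str, int]) -> str:
    """Return CLI lines setting hbdev to the given {interface: priority} pairs."""
    if len(interfaces) < MIN_REDUNDANT_INTERFACES:
        raise ValueError("configure at least two heartbeat interfaces for redundancy")
    if len(interfaces) > MAX_HEARTBEAT_INTERFACES:
        raise ValueError("at most 8 heartbeat interfaces can be selected")
    # hbdev takes alternating "interface-name" priority pairs on a single line.
    pairs = " ".join(f'"{name}" {priority}' for name, priority in interfaces.items())
    return "\n".join(["config system ha", f"    set hbdev {pairs}", "end"])


if __name__ == "__main__":
    print(build_hbdev_config({"port3": 50, "port4": 50}))
```

Rejecting a single interface up front mirrors the condition this rule alerts on, so a generated configuration never starts out without heartbeat redundancy.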

How does this work?
The script runs the FortiOS command “get system ha status” to retrieve HA heartbeat interface information.
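To make the data collection step concrete, here is a rough Python sketch of pulling per-member heartbeat interface states out of the command output. The actual collector is an AWK parser that is not shown in this post, so this is a stand-in. The `HBDEV stats:` layout in `SAMPLE_OUTPUT`, including the serial number `FG100D3G13800001` and the byte counters, is an assumed example; the real layout varies between FortiOS releases.

```python
import re

# Assumed shape of the "HBDEV stats:" block in "get system ha status" output.
# The serial number and counters are made up for illustration.
SAMPLE_OUTPUT = """\
HBDEV stats:
        FG100D3G13800001(updated 1 seconds ago):
                port3: physical/1000auto, up, rx-bytes/packets/dropped/errors=123/4/0/0, tx=567/8/0/0
                port4: physical/00, down, rx-bytes/packets/dropped/errors=0/0/0/0, tx=0/0/0/0
MONDEV stats:
        FG100D3G13800001(updated 1 seconds ago):
                port1: physical/1000auto, up, rx-bytes/packets/dropped/errors=1/1/0/0, tx=1/1/0/0
"""

# A member header such as "FG100D3G13800001(updated 1 seconds ago):".
DEVICE_RE = re.compile(r"^\s*(\S+?)\(updated .*\):\s*$")
# An interface line: "<name>: <link info>, <up|down>, ...".
IFACE_RE = re.compile(r"^\s*([\w.-]+):\s*[^,]*,\s*(up|down)\b")


def parse_hbdev_status(output: str) -> dict[str, dict[str, str]]:
    """Map each cluster member serial to {heartbeat interface: 'up' | 'down'}."""
    result: dict[str, dict[str, str]] = {}
    in_hbdev = False
    device = None
    for line in output.splitlines():
        if line.strip() == "HBDEV stats:":
            in_hbdev = True
            continue
        if in_hbdev and line and not line[0].isspace():
            # Any new top-level section (e.g. "MONDEV stats:") ends the block.
            break
        if not in_hbdev:
            continue
        m = DEVICE_RE.match(line)
        if m:
            device = m.group(1)
            result[device] = {}
            continue
        m = IFACE_RE.match(line)
        if m and device is not None:
            result[device][m.group(1)] = m.group(2)
    return result


if __name__ == "__main__":
    print(parse_hbdev_status(SAMPLE_OUTPUT))
```

Stopping at the next unindented section header keeps monitored interfaces (the `MONDEV stats:` block) from being counted as heartbeat links.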

Why is this important?
This metric captures whether the HA heartbeat is keeping cluster units communicating with each other. The vendor strongly recommends configuring at least two heartbeat interfaces (two links) on each FortiGate firewall. If all the heartbeat interfaces go down, the result is a severe network outage. This metric checks whether each firewall has at least two operational HA heartbeat interfaces. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_failoverHeartbeat.htm
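The check described above reduces to counting operational heartbeat links per firewall and flagging any member with fewer than two. Here is a minimal Python sketch of that logic. The input shape `{member: {interface: status}}` and the member names `FGT-A`/`FGT-B` are assumptions for illustration. The actual rule, `FortinetHaHeartbeatOperStatusRule`, is written in Scala and its source is not reproduced here.

```python
# Minimal sketch of the redundancy check: a firewall passes only if at least
# two of its heartbeat interfaces are operational ("up"). The input shape
# {member name: {interface: status}} is an assumption, not Indeni's data model.

MIN_OPERATIONAL_LINKS = 2


def heartbeat_redundancy_ok(statuses: dict[str, str]) -> bool:
    """True when two or more heartbeat interfaces report 'up'."""
    up_count = sum(1 for status in statuses.values() if status == "up")
    return up_count >= MIN_OPERATIONAL_LINKS


def members_without_redundancy(cluster: dict[str, dict[str, str]]) -> list[str]:
    """Return the cluster members that would trigger this alert, sorted by name."""
    return sorted(member for member, statuses in cluster.items()
                  if not heartbeat_redundancy_ok(statuses))


if __name__ == "__main__":
    cluster = {
        "FGT-A": {"port3": "up", "port4": "up"},
        "FGT-B": {"port3": "up", "port4": "down"},
    }
    print(members_without_redundancy(cluster))  # ['FGT-B']
```

Evaluating each member separately matters because one unit can lose a heartbeat link (for example, a bad cable on its side) while its peer still reports both links as up.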

Without Indeni how would you find this?
An administrator can run the FortiOS command “get system ha status” over an SSH connection to retrieve the same information.

fortios-get-system-ha-status

name: fortios-get-system-ha-status
description: FortiGate Cluster High Availability
type: monitoring
monitoring_interval: 5 minutes
requires:
    vendor: fortinet
    os.name: FortiOS
    product: firewall
    high-availability: true
comments:
    ha-health-status:
        why: |
            Indicates whether all cluster units are operating normally (OK) or whether a problem was detected with the cluster. For example, a message similar to ERROR <serial-number> is lost @ <date> <time> appears if one of the subordinate units leaves the cluster. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve High Availability status information.
        can-with-snmp: true
        can-with-syslog: true
    ha-health-mode:
        why: |
            This metric collects and displays the HA mode of the cluster, for example, HA A-P or HA A-A. This metric should be the same for all the members of the cluster. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve HA mode information.
        can-with-snmp: true
        can-with-syslog: false
    ha-group-id:
        why: |
            This metric captures the configured group ID of the cluster, which should be the same on all members of the cluster. A misconfigured group ID can cause HA problems or dropped traffic. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve group ID information.
        can-with-snmp: true
        can-with-syslog: false
    ha-debug-status:
        why: |
            This metric captures the HA debug status of the cluster. HA debugging should be left disabled on all members of the cluster to avoid additional hardware resource consumption. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve debug status information.
        can-with-snmp: false
        can-with-syslog: false
    ha-cluster-uptime:
        why: |
            This metric captures the number of days, hours, minutes, and seconds that the cluster has been operating. Any unexpectedly low uptime should be investigated and troubleshot. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve uptime information.
        can-with-snmp: true
        can-with-syslog: false
    ha-session-pickup:
        why: |
            This metric captures the status of session pickup (enabled or disabled). When session-pickup is enabled, the FGCP synchronizes the primary unit's TCP session table to all cluster units. As soon as a new TCP session is added to the primary unit session table, that session is synchronized to all cluster units. This synchronization happens as quickly as possible to keep the session tables synchronized. If the primary unit fails, the new primary unit uses its synchronized session table to resume all TCP sessions that were being processed by the former primary unit with only minimal interruption. Under ideal conditions all TCP sessions should be resumed. This is not guaranteed though and under less than ideal conditions some TCP sessions may need to be restarted. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve HA session pickup status information.
        can-with-snmp: false
        can-with-syslog: false
    ha-session-override:
        why: |
            This metric captures the status of the override option for the current cluster unit (enable or disable). When override is disabled, a cluster may not always renegotiate when an event occurs that affects primary unit selection. For example, when override is disabled, a cluster will not renegotiate when you change a cluster unit's device priority or when you add a new unit to the cluster. This is true even if the added unit has a higher device priority than any other unit in the cluster. Also, when override is disabled, a cluster does not negotiate if the new unit added to the cluster has a failed or disconnected monitored interface. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve HA override status information.
        can-with-snmp: false
        can-with-syslog: false
    ha-config-sync-status:
        why: |
            This metric shows whether the configurations of all cluster units are synchronized. An unsynchronized configuration can cause a service outage on switchover and should be investigated. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve HA configuration sync status information.
        can-with-snmp: false
        can-with-syslog: false
    ha-heartbeat-link-status:
        why: |
            This metric shows the status of each cluster unit's heartbeat interfaces. If all of the heartbeat interfaces go down, the network impact is severe. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve HA heartbeat link status information.
        can-with-snmp: true
        can-with-syslog: true
    ha-heartbeat-total-bytes:
        why: |
            This metric captures how much data the heartbeat interfaces have processed. If none of the heartbeat interfaces are sending or receiving packets, this must be troubleshot, since it may have a severe network impact. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve the total number of bytes received via the heartbeat interfaces.
        can-with-snmp: false
        can-with-syslog: false
    mon-interface-link-status:
        why: |
            This metric shows the status of each of the cluster's monitored interfaces. If a monitored interface on the primary unit fails, the cluster renegotiates to select a new primary unit using the primary unit selection process. Because the cluster unit with the failed monitored interface has the lowest monitor priority, a different cluster unit becomes the primary unit. The new primary unit should have fewer link failures. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve HA monitored interface status information.
        can-with-snmp: true
        can-with-syslog: true
    mon-interface-total-bytes:
        why: |
            This metric shows how much data the interfaces monitored by the cluster have processed. If none of the monitored interfaces are sending or receiving packets, this must be troubleshot, since it may have a severe network impact. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_operating.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve the total number of bytes received via the monitored interfaces.
        can-with-snmp: false
        can-with-syslog: false
    ha-heartbeat-int-oper-status-over1:
        why: |
            This metric captures whether the HA heartbeat is keeping cluster units communicating with each other. The vendor strongly recommends configuring at least two heartbeat interfaces (two links) on each FortiGate firewall. If all the heartbeat interfaces go down, the result is a severe network outage. This metric checks whether each firewall has at least two operational HA heartbeat interfaces. More details can be found here: https://help.fortinet.com/fos50hlp/54/Content/FortiOS/fortigate-high-availability-52/HA_failoverHeartbeat.htm
        how: |
            The script runs the FortiOS command "get system ha status" to retrieve HA heartbeat interfaces information.
        can-with-snmp: false
        can-with-syslog: false
    model:
        why: |
            Two or more devices that operate as part of a single cluster must run on the same hardware model.
        how: |
            This script logs into the devices to retrieve the hardware model of the device. Indeni then compares the result to the same script run on other members of the same cluster.
        can-with-snmp: false
        can-with-syslog: false
steps:
-   run:
        type: SSH
        command: get system ha status
    parse:
        type: AWK
        file: get_system_ha_status.parser.1.awk
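The script hands the command output to an AWK parser (`get_system_ha_status.parser.1.awk`) that is not shown in this post. As a rough Python stand-in, and not that parser, the sketch below maps top-level `Label: value` lines to several of the metric names declared above and converts the cluster uptime to seconds. The header labels in `FIELD_TO_METRIC` and the contents of `SAMPLE_STATUS` are assumptions based on typical FortiOS 5.x output and may differ on other releases.

```python
import re

# Hypothetical mapping from header labels in "get system ha status" output to
# the metric names declared in the script above. Label spellings are assumed.
FIELD_TO_METRIC = {
    "HA Health Status": "ha-health-status",
    "Model": "model",
    "Mode": "ha-health-mode",
    "Group": "ha-group-id",
    "Debug": "ha-debug-status",
    "Cluster Uptime": "ha-cluster-uptime",
    "ses_pickup": "ha-session-pickup",
    "override": "ha-session-override",
}

# Illustrative output; values are made up.
SAMPLE_STATUS = """\
HA Health Status: OK
Model: FortiGate-100D
Mode: HA A-P
Group: 0
Debug: 0
Cluster Uptime: 2 days 3:04:05
ses_pickup: enable
override: disable
"""

UPTIME_RE = re.compile(r"(\d+) days? (\d+):(\d+):(\d+)")


def uptime_seconds(text: str) -> int:
    """Convert e.g. '2 days 3:04:05' into a number of seconds."""
    m = UPTIME_RE.search(text)
    if m is None:
        raise ValueError(f"unrecognized uptime: {text!r}")
    days, hours, minutes, seconds = (int(g) for g in m.groups())
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_ha_header(output: str) -> dict[str, object]:
    """Extract top-level 'Label: value' lines into {metric name: value}."""
    metrics: dict[str, object] = {}
    for line in output.splitlines():
        if not line or line[0].isspace() or ":" not in line:
            continue  # skip blank lines and indented per-member detail lines
        label, value = (part.strip() for part in line.split(":", 1))
        metric = FIELD_TO_METRIC.get(label)
        if metric == "ha-cluster-uptime":
            metrics[metric] = uptime_seconds(value)
        elif metric is not None:
            metrics[metric] = value
    return metrics


if __name__ == "__main__":
    print(parse_ha_header(SAMPLE_STATUS))
```

Splitting only on the first colon keeps values that themselves contain colons, such as the uptime or an `ERROR <serial-number> is lost @ <date> <time>` health message, intact. Comparing the resulting `ha-health-mode`, `ha-group-id` and `model` values across members is how the "should be the same for all members" checks above could be expressed.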

FortinetHaHeartbeatOperStatusRule

https://bitbucket.org/indeni/indeni-knowledge/src/master/rules/templatebased/fortinet/FortinetHaHeartbeatOperStatusRule.scala