Cluster has preemption enabled (f5)

Vendor: f5

OS: False

Description:
Preemption is generally a bad idea in clustering, although sometimes it is the default setting. indeni will trigger an issue if it’s on.

Remediation Steps:
It is generally best to have preemption disabled. Instead, once this device returns from a crash, you can conduct the failover manually.
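
For reference, here is a minimal sketch of how auto failback could be switched off through the same iControl REST API used by the collection script further down. The host, credentials and traffic group name are hypothetical, the autoFailbackEnabled field and its string value are taken from that script, and the sketch assumes the BIG-IP's certificate is trusted by the JVM.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets
import java.util.Base64

object DisableAutoFailback extends App {
  // Hypothetical management address, credentials and traffic group -- replace with your own.
  val host = "bigip.example.com"
  val trafficGroup = "~Common~traffic-group-1"
  val auth = Base64.getEncoder.encodeToString("admin:secret".getBytes(StandardCharsets.UTF_8))

  // PATCH the traffic group so it no longer preempts (auto failback off).
  val request = HttpRequest.newBuilder()
    .uri(URI.create(s"https://$host/mgmt/tm/cm/traffic-group/$trafficGroup"))
    .header("Authorization", s"Basic $auth")
    .header("Content-Type", "application/json")
    .method("PATCH", HttpRequest.BodyPublishers.ofString("""{"autoFailbackEnabled":"false"}"""))
    .build()

  // Note: a BIG-IP with a self-signed certificate needs an SSLContext configured to trust it.
  val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
  println(s"${response.statusCode()} ${response.body()}")
}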

How does this work?
This script uses the F5 iControl API to retrieve the traffic group configuration to determine if auto failback is enabled or not.
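
For illustration, here is a minimal standalone sketch of that call in Scala. The host and credentials are hypothetical, it assumes the BIG-IP's certificate is trusted by the JVM, and it simply prints the raw JSON response.

import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}
import java.nio.charset.StandardCharsets
import java.util.Base64

object TrafficGroupQuery extends App {
  // Hypothetical management address and credentials -- replace with your own.
  val host = "bigip.example.com"
  val auth = Base64.getEncoder.encodeToString("admin:secret".getBytes(StandardCharsets.UTF_8))

  // Same endpoint and $select filter as the collection script further down.
  val request = HttpRequest.newBuilder()
    .uri(URI.create(s"https://$host/mgmt/tm/cm/traffic-group?$$select=fullPath,autoFailbackEnabled"))
    .header("Authorization", s"Basic $auth")
    .GET()
    .build()

  // Each returned item reports its fullPath and whether auto failback (preemption) is enabled.
  val response = HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
  println(response.body())
}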

Why is this important?
Preemption, or auto failback as F5 calls it, is a clustering function that makes a designated primary member always strive to be the active member. The trouble is that if the active member with preemption enabled has a critical failure and reboots, the cluster fails over to the secondary and then immediately fails back to the primary once the reboot completes. This can result in another crash, and the process repeats again and again in a loop. It is generally a good idea not to have the preemption feature enabled.

Without Indeni how would you find this?
An administrator would have to log in to their devices via the web interface and verify that the auto failback option is not set. This could also be detected the hard way during an upgrade of a redundant pair, if a recently upgraded node takes over prematurely.

f5-rest-mgmt-tm-cm-traffic-group

#! META
name: f5-rest-mgmt-tm-cm-traffic-group
description: Check if auto-failback-enabled has been set to true and set cluster-preemption-enabled if it has
type: monitoring
monitoring_interval: 60 minutes
requires:
    vendor: "f5"
    product: "load-balancer"
    rest-api: "true"

#! COMMENTS
cluster-preemption-enabled:
    why: |
        Preemption, or auto failback as F5 calls it, is a clustering function that makes a designated primary member always strive to be the active member. The trouble is that if the active member with preemption enabled has a critical failure and reboots, the cluster fails over to the secondary and then immediately fails back to the primary once the reboot completes. This can result in another crash, and the process repeats again and again in a loop. It is generally a good idea not to have the preemption feature enabled.
    how: |
        This script uses the F5 iControl API to retrieve the traffic group configuration to determine if auto failback is enabled or not.
    without-indeni: |
        An administrator would have to log in to their devices via the web interface and verify that the auto failback option is not set. This could also be detected the hard way during an upgrade of a redundant pair, if a recently upgraded node takes over prematurely.
    can-with-snmp: false
    can-with-syslog: false

#! REMOTE::HTTP
url: /mgmt/tm/cm/traffic-group?$select=fullPath,autoFailbackEnabled
protocol: HTTPS

#! PARSER::JSON

_metrics:
    -
        _groups:
            "$.items[0:]":
                _temp:
                    #Count number of instances of autoFailbackEnabled where it's set to true
                    #This is used later on in the transform section
                    "autoFailbackCount":
                        _count: "[?(@.autoFailbackEnabled == 'true')]"
                _tags:
                    #Metric name
                    "im.name":
                        _constant: "cluster-preemption"
                    "im.dstype.displaytype":
                        _constant: "state"
                    #A tag called name with value of the name of the traffic group in
                    "name":
                        _value: fullPath
        _transform:
            #Check the number of instances of autoFailbackEnabled that were found. If none were found, set the value of the metric to 0
            _value.double: |
                {
                    if (temp("autoFailbackCount") == 1) { print "1" } else { print "0" }
                }
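
To make the parser's behavior concrete, here is a small standalone Scala sketch of the same decision logic over a hypothetical response. The field names come from the $select filter above; the sample values are invented.

object ParserSketch extends App {
  // Hypothetical traffic-group items, shaped like the REST response filtered by $select above.
  case class TrafficGroup(fullPath: String, autoFailbackEnabled: String)

  val items = Seq(
    TrafficGroup("/Common/traffic-group-1", "true"),
    TrafficGroup("/Common/traffic-group-local-only", "false")
  )

  // One "cluster-preemption-enabled" metric per traffic group:
  // 1.0 when auto failback (preemption) is on, 0.0 otherwise.
  items.foreach { tg =>
    val value = if (tg.autoFailbackEnabled == "true") 1.0 else 0.0
    println(s"""cluster-preemption-enabled name="${tg.fullPath}" value=$value""")
  }
}

The rule below then raises an issue when the latest value of this metric equals 1.0.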

cross_vendor_cluster_preempt

package com.indeni.server.rules.library

import com.indeni.apidata.time.TimeSpan
import com.indeni.ruleengine.expressions.conditions.Equals
import com.indeni.ruleengine.expressions.core._
import com.indeni.ruleengine.expressions.data.{SelectTagsExpression, SelectTimeSeriesExpression, TimeSeriesExpression}
import com.indeni.server.common.data.conditions.True
import com.indeni.server.rules.library.core.PerDeviceRule
import com.indeni.server.rules.{RuleContext, _}
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity


case class ClusterPreemptionEnabledRule() extends PerDeviceRule with RuleHelper {

  override val metadata: RuleMetadata = RuleMetadata.builder("cross_vendor_cluster_preempt", "Clustered Devices: Cluster has preemption enabled",
    "Preemption is generally a bad idea in clustering, although sometimes it is the default setting. indeni will trigger an issue if it's on.",
    AlertSeverity.WARN).interval(TimeSpan.fromMinutes(5)).build()


  override def expressionTree(context: RuleContext): StatusTreeExpression = {
    val inUseValue = TimeSeriesExpression[Double]("cluster-preemption-enabled").last

    StatusTreeExpression(
      // Which objects to pull (normally, devices)
      SelectTagsExpression(context.metaDao, Set(DeviceKey), True),

      StatusTreeExpression(
        // The time-series we check the test condition against:
        SelectTimeSeriesExpression[Double](context.tsDao, Set("cluster-preemption-enabled"), denseOnly = false),

        // The condition which, if true, we have an issue. Checked against the time-series we've collected
        Equals(
          inUseValue,
          ConstantExpression[Option[Double]](Some(1.0)))
      ).withoutInfo().asCondition()

      // Details of the alert itself
    ).withRootInfo(
      getHeadline(),
      ConstantExpression("This cluster member has preemption enabled. This means that it will have priority over other cluster members. If this device reboots or crashes, it'll try to assume priority in the cluster when it finishes its boot process. This may result in it crashing again, and causing a preemption loop."),
      ConditionalRemediationSteps("It is generally best to have preemption disabled. Instead, once this device returns from a crash, you can conduct the failover manually.",
        ConditionalRemediationSteps.VENDOR_PANOS ->
          """|Palo Alto Networks firewalls have a special way of handling preemption loops, review the following article:
             |<a target="_blank" href="https://live.paloaltonetworks.com/t5/Learning-Articles/Understanding-Preemption-with-the-Configured-Device-Priority-in/ta-p/53398">Understanding Preemption with the Configured Device Priority in HA Active/Passive Mode</a>.""".stripMargin,
        ConditionalRemediationSteps.OS_NXOS ->
          """|FHRP preemption and delays features are not required. The vPC will forward traffic as soon as the links become available. Once a device recovers from a crash or reboot, you can conduct the failover manually.
             |Cisco recommends:
             |1. Configuring the FHRP with the default settings and without preempt when using vPC.
             |2. Make the vPC primary switch the FHRP active switch. This is not intended to improve performance or stability. It does make one switch responsible for the control plane traffic. This is a little easier on the administrator while troubleshooting.""".stripMargin,
        ConditionalRemediationSteps.VENDOR_JUNIPER ->
          """1. Generally, it is recommended to have preemption disabled. Instead, once this device returns from a crash, you can conduct the failover manually.
            |2. If preemption is added to a redundancy group configuration, the device with the high priority in the group can initiate a failover to become a master.
            |3. On the device command line interface execute "request chassis cluster failover node"  or  "request chassis cluster failover redundancy-group"  commands to override the priority setting and preemption.
            |4. Review the following article on Juniper TechLibrary for more information: <a target="_blank" href="https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/request-chassis-cluster-failover-node.html">Operational Commands: request chassis cluster failover node</a>""".stripMargin
      )
    )
  }
}