Cluster configuration not synced-juniper-junos

Vendor: juniper

OS: junos

Description:
For devices that support full configuration synchronization, indeni will trigger an issue if the configuration is out of sync.

Remediation Steps:
Log into the device and synchronize the configuration across the cluster.
1. Run the “show chassis cluster information configuration-synchronization” command to review the configuration synchronization status of the chassis cluster (Junos OS Release 12.1X47-D10 or later).
2. Check the activation and last sync status if these options are enabled.
3. Check the link connectivity.
4. Check the cluster configuration for synchronization.
5. Review this article on Juniper TechLibrary: Operational Commands (https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-chassis-cluster-information-detail-config-sync.html)
6. Contact Juniper Networks Technical Assistance Center (JTAC) if further assistance is required.

How does this work?
The script runs the “show chassis cluster information configuration-synchronization” command via SSH and retrieves the configuration synchronization status of the local node.
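To try the parsing logic outside indeni, the sketch below reimplements it in portable awk wrapped in a shell function. It is an illustration under assumptions, not the script indeni ships: `parse_config_sync` is a hypothetical name, the sample input is abbreviated to the line shapes shown in the parser's comments (real output contains more lines), and plain `print` stands in for indeni's `writeDoubleMetric`.

```shell
# Hypothetical helper (not part of indeni): reads the combined output of the
# two SSH commands on stdin and prints "<local-node> <1|0>" (1 = synced).
parse_config_sync() {
  awk '
    # A syntax error means the command is unsupported; emit nothing.
    /^errors?: +syntax +error/ { unsupported = 1 }

    # Section headers such as "node0:". The first one comes from
    # "show chassis hardware node local | match node" and names the local node.
    /^node[01]:/ {
      current = substr($1, 1, 5)
      if (local_node == "") local_node = current
      next
    }

    # Keep the last sync result of each node, ignoring surrounding whitespace.
    /Last sync result:/ {
      result = $0
      sub(/^.*Last sync result:[ \t]*/, "", result)
      sub(/[ \t]+$/, "", result)
      synced[current] = (result == "Succeeded" || result == "Not needed") ? 1 : 0
    }

    END {
      if (!unsupported && (local_node in synced))
        print local_node, synced[local_node]
    }
  '
}

# Usage with an abbreviated, illustrative sample where node1 is the local node.
printf '%s\n' \
  'node1:' \
  'node0:' \
  '        Last sync result: Succeeded' \
  'node1:' \
  '        Last sync result: Not needed' | parse_config_sync
# prints "node1 1"
```

Tracking the node name explicitly, rather than counting how often `node0` appears, is a design choice of this sketch only; it keeps the result tied to the section it was read from.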

Why is this important?
If configuration synchronization fails, the cluster may misbehave when a failover occurs. For example, an interface that should be enabled may remain disabled, or the latest configuration may not be applied to the new active node.

Without Indeni how would you find this?
An administrator could log on to the device to manually run this command to get configuration synchronization status.

junos-show-chassis-cluster-information-configuration-synchronization

#! META
name: junos-show-chassis-cluster-information-configuration-synchronization
description: Get chassis cluster configuration synchronization status
type: monitoring
monitoring_interval: 10 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: true

#! COMMENTS
cluster-config-synced:
    why: |
        If configuration synchronization fails, the cluster may misbehave when a failover occurs. For example, an interface that should be enabled may remain disabled, or the latest configuration may not be applied to the new active node.
    how: |
        The script runs the "show chassis cluster information configuration-synchronization" command via SSH and retrieves the configuration synchronization status.
    without-indeni: |
        An administrator could log on to the device to manually run this command to get configuration synchronization status.
    can-with-snmp:
    can-with-syslog:
    vendor-provided-management: |
        The configuration synchronization status can be retrieved via the command line.

#! REMOTE::SSH
show chassis hardware node local | match node
show chassis cluster information configuration-synchronization

#! PARSER::AWK
BEGIN {
    node0 = 0
    node1 = 0
    feature_supported = 1
    node_sync_idx = 0
}

#error: syntax error, expecting <command>: informationconfiguration-synchronization
#Returned when the firmware is below 12.1X47 and does not support the command
/^errors?:\s+syntax\s+error/ {
    feature_supported = 0
}

#node0:
/^node0/ {
    node0++ 
}

#        Last sync result: Succeeded
#        Last sync result: Not needed 
/(Last sync result:)/ {
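    split($0, get_status, ": ")
    # Trim the value so trailing whitespace in the CLI output does not break the comparison
    sync_result = trim(get_status[2])
    if ( sync_result == "Succeeded" || sync_result == "Not needed" ){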
        SyncStatus = 1
    } else {
        SyncStatus = 0
    }
    node_sync_status[node_sync_idx] = SyncStatus 
    node_sync_idx++
}

END {
    if ( feature_supported == 1 ) {
        if ( node0 == 2) {
            node_sync_idx = 0
            cluster_node["node"] = "node0"
        } else {
            node_sync_idx = 1
            cluster_node["node"] = "node1"
        }
        writeDoubleMetric("cluster-config-synced", cluster_node, "gauge", 60, node_sync_status[node_sync_idx]) 
    }
}

junos-show-chassis-cluster-status

#! META
name: junos-show-chassis-cluster-status
description: Collect Junos chassis cluster status
type: monitoring
monitoring_interval: 1 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: true

#! COMMENTS
cluster-member-active:
    why: |
        Tracking the state of a cluster member is important. If a cluster member which used to be the active member of the cluster no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the firewall or another component in the network.
    how: |
        This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
    without-indeni: |
        The administrator has to run the "show chassis cluster status" command on the device to find out whether the cluster member is active.
    can-with-snmp: true
    can-with-syslog: true
cluster-state:
    why: |
        Tracking the state of a cluster is important. If a cluster which used to be healthy no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the members of the cluster or another component in the network.
    how: |
        This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
    without-indeni: |
        The administrator has to run the "show chassis cluster status" command on the device to find out whether any redundancy group has no node in the primary state.
    can-with-snmp: true
    can-with-syslog: true
cluster-preemption-enabled:
    why: |
        Preemption is a function in clustering which sets a primary member of the cluster to always strive to be the active member. The trouble with this is that if the active member that is set with preemption on has a critical failure and reboots, the cluster will fail over to the secondary and then immediately fail over back to the primary when it completes the reboot. This can result in another crash and the process would happen again and again in a loop.
    how: |
        This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
    without-indeni: |
        The administrator has to run the "show chassis cluster status" command on the device to find out whether preemption is enabled, and whether it is configured correctly when one of the nodes is expected to always be the primary node.
    can-with-snmp: false
    can-with-syslog: false

#! REMOTE::SSH
show chassis hardware node local | match node
show chassis cluster status

#! PARSER::AWK
BEGIN {
    RG = 0
}

#Node   Priority Status         Preempt Manual   Monitor-failures
/^Node.*Priority/ {
    getColumns(trim($0), "[ \t]+", columns)
}

#Redundancy group: 0 , Failover count: 1
/^Redundancy group/ {
    regroup = $3
    group_state[regroup] = 0
    group_preempt[regroup] = 0
    RG = 1
    node_idx = 0
    cluster_tags["name"] = "redundancy group "regroup
}

#node0  1        primary        no      no       None           
/^node.*/ {
    if (RG == 0) {
       node_local = $1
       if (node_local ~ /node0/){
           myself = 0
       } else {
           myself = 1
       }
    } else {
        node = $getColId(columns, "Node")
        if (node == "node0") {
            node_idx = 0
        }else {
            node_idx = 1
        }

        statusDesc = $getColId(columns, "Status")
        monitor_failures = $getColId(columns, "Monitor-failures")

        if ( node_idx == myself ) {
            if ((statusDesc == "primary" || statusDesc == "secondary") && monitor_failures == "None") {
                node_status[node_idx] = 1
            } else {
                node_status[node_idx] = 0
            }
            writeDoubleMetricWithLiveConfig("cluster-member-active", cluster_tags, "gauge", "60", node_status[myself], "Cluster Member Active", "state", "name")
        }
        node_idx++

        if (statusDesc == "primary" && monitor_failures == "None") {
            # either of nodes is primary, the state for this redundancy group is up
            group_state[regroup] = 1 
        }

        preempt = $getColId(columns, "Preempt")
        if (preempt == "yes") {
            group_preempt[regroup] = 1
        }
    }
}

END {
        for (regroup in group_state) {
            cluster_tags["name"] = "redundancy group "regroup
            writeDoubleMetricWithLiveConfig("cluster-state", cluster_tags, "gauge", "60", group_state[regroup], "Cluster State", "state", "name")
            writeDoubleMetricWithLiveConfig("cluster-preemption-enabled", cluster_tags, "gauge", "60", group_preempt[regroup], "Cluster Preemption Enabled", "boolean", "name")
        }
}
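The per-group logic of the parser above can be tried outside indeni with the following sketch in portable awk. It rests on assumptions: `summarize_cluster_status` is a hypothetical name, the sample rows are illustrative and follow the line shapes in the parser's comments, the column lookup replaces indeni's `getColumns`/`getColId` helpers, and `printf` stands in for `writeDoubleMetricWithLiveConfig`. The local-node metric (`cluster-member-active`) is left out to keep the sketch short.

```shell
# Hypothetical helper (not part of indeni): reads "show chassis cluster status"
# output on stdin and prints one summary line per redundancy group.
summarize_cluster_status() {
  awk '
    # Header row: remember the position of each column by name.
    /^Node[ \t]+Priority/ {
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }

    # "Redundancy group: 0 , Failover count: 1" starts a new group.
    /^Redundancy group/ {
      rg = $3
      if (!(rg in up)) order[++n] = rg
      up[rg] = 0
      preempt[rg] = 0
      in_group = 1
      next
    }

    # Node rows inside a group: the group is up if any node is a healthy
    # primary, and preemption is on if any node reports "yes".
    /^node[01][ \t]/ && in_group {
      if ($(col["Status"]) == "primary" && $(col["Monitor-failures"]) == "None") up[rg] = 1
      if ($(col["Preempt"]) == "yes") preempt[rg] = 1
    }

    END {
      for (i = 1; i <= n; i++)
        printf "rg=%s state=%d preempt=%d\n", order[i], up[order[i]], preempt[order[i]]
    }
  '
}

# Usage with illustrative sample rows.
printf '%s\n' \
  'Node   Priority Status         Preempt Manual   Monitor-failures' \
  '' \
  'Redundancy group: 0 , Failover count: 1' \
  'node0  1        primary        no      no       None' \
  'node1  1        secondary      no      no       None' \
  '' \
  'Redundancy group: 1 , Failover count: 1' \
  'node0  254      primary        yes     no       None' \
  'node1  1        secondary      yes     no       None' | summarize_cluster_status
# prints:
# rg=0 state=1 preempt=0
# rg=1 state=1 preempt=1
```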

cluster_config_unsynced

package com.indeni.server.rules.library

import com.indeni.ruleengine.expressions.conditions.{And, EndsWithRepetition, Equals}
import com.indeni.ruleengine.expressions.core._
import com.indeni.ruleengine.expressions.data.{SelectTagsExpression, SelectTimeSeriesExpression, TimeSeriesExpression}
import com.indeni.server.common.data.conditions.True
import com.indeni.server.rules.library.core.PerDeviceRule
import com.indeni.server.rules.{RuleContext, _}
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity


case class ClusterConfigNotSyncedRule() extends PerDeviceRule with RuleHelper {

  override val metadata: RuleMetadata = RuleMetadata.builder("cluster_config_unsynced", "Clustered Devices: Cluster configuration not synced",
    "For devices that support full configuration synchronization, indeni will trigger an issue if the configuration is out of sync.", AlertSeverity.ERROR).build()

  override def expressionTree(context: RuleContext): StatusTreeExpression = {
    val tsToTestAgainst = TimeSeriesExpression[Double]("cluster-config-synced")
    val activeMemberValue = TimeSeriesExpression[Double]("cluster-member-active").last

    StatusTreeExpression(
      // Which objects to pull (normally, devices)
      SelectTagsExpression(context.metaDao, Set(DeviceKey), True),

      StatusTreeExpression(
        // The time-series we check the test condition against:
        SelectTimeSeriesExpression[Double](context.tsDao, Set("cluster-config-synced", "cluster-member-active"), denseOnly = false),

        // The condition which, if true, we have an issue. Checked against the time-series we've collected
        And(
          EndsWithRepetition(tsToTestAgainst, ConstantExpression(0.0), 3),
          Equals(activeMemberValue, ConstantExpression[Option[Double]](Some(1.0)))
        )
      ).withoutInfo().asCondition()
    ).withRootInfo(
      getHeadline(),
      ConstantExpression("The configuration has been changed on this device but has not yet been synced to the other members of the cluster. This may result in unexpected behavior of the other cluster members should this member go down."),
      ConditionalRemediationSteps("Log into the device and synchronize the configuration across the cluster.",
        ConditionalRemediationSteps.OS_NXOS ->
          """|1. Login to the device to review the FHRP configuration across the vPC cluster if it is configured.
             |2. Execute the "show hsrp brief" command to check the HSRP state and configuration to the cluster.
             |3. Execute the “show vrrp detail” command to check the VRRP state and configuration to the cluster.
             |4. Log into the device and synchronize the configuration across the vPC peer switches by reviewing the “show run vpc” command output from both peers.
             |5. Execute the “show vpc consistency-parameters” command and review the output. Ensure that type 1 & 2 vPC consistency parameters match. If they do not match, then vPC is suspended. Items that are type 2 do not have to match on both Nexus 5000 switches for the vPC to be operational.
             |6. Check that there are no unsaved configuration changes by running the “show running-config diff” NX-OS command.
             |7. Log into both peers and save the configuration with the "copy running-config startup-config" NX-OS command.""".stripMargin,
        ConditionalRemediationSteps.VENDOR_JUNIPER ->
          """|1. Run "show chassis cluster information configuration-synchronization" command to review configuration synchronization status of a chassis cluster (Junos OS Release 12.1X47-D10 or later).
             |2. Check the activation and last sync status if these options are enabled.
             |3. Check the link connectivity.
             |4. Check the cluster configuration for synchronization.
             |5. Review this article on Juniper TechLibrary: <a target="_blank" href="https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-chassis-cluster-information-detail-config-sync.html">Operational Commands</a>
             |6. Contact Juniper Networks Technical Assistance Center (JTAC) if further assistance is required.""".stripMargin
      )
    )
  }
}
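The condition in the rule above can be read as: raise the issue when the last three `cluster-config-synced` samples are all 0 and the most recent `cluster-member-active` value is 1. The sketch below mimics that check on plain numbers. It assumes that `EndsWithRepetition(ts, 0.0, 3)` means "the series ends with three consecutive 0.0 samples", which is inferred from the name rather than from indeni documentation; `should_alert` is a hypothetical helper, not part of the rule engine.

```shell
# Hypothetical helper: $1 is a space-separated series of cluster-config-synced
# samples (oldest first), $2 is the latest cluster-member-active value.
should_alert() {
  awk -v series="$1" -v active="$2" 'BEGIN {
    n = split(series, s, " ")

    # The series must end with three samples equal to 0.
    unsynced_run = (n >= 3)
    for (i = n - 2; i <= n && unsynced_run; i++)
      if (s[i] + 0 != 0) unsynced_run = 0

    # Only alert for a member whose latest cluster-member-active value is 1.
    verdict = (unsynced_run && active + 0 == 1) ? "alert" : "ok"
    print verdict
  }'
}

should_alert "1 1 0 0 0" 1   # prints "alert"
should_alert "1 0 0 1 0" 1   # prints "ok": the trailing run of zeros is broken
should_alert "0 0 0" 0       # prints "ok": cluster-member-active is not 1
```

Requiring three consecutive samples means a single failed sync reading does not raise the issue; with the script's 10-minute monitoring interval, the condition takes roughly half an hour of unsynced readings to become true.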

