Cluster configuration not synced-juniper-junos

discobot · November 22, 2018, 1:42pm


Vendor: juniper

OS: junos

Description:
For devices that support full configuration synchronization, indeni will trigger an issue if the configuration is out of sync.

Remediation Steps:
Log into the device and synchronize the configuration across the cluster.
1. Run “show chassis cluster information configuration-synchronization” command to review configuration synchronization status of a chassis cluster (Junos OS Release 12.1X47-D10 or later).
2. Check the activation and last sync status if these options are enabled.
3. Check the link connectivity.
4. Check the cluster configuration for synchronization.
5. Review this article on Juniper TechLibrary: Operational Commands (https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-chassis-cluster-information-detail-config-sync.html)
6. Contact Juniper Networks Technical Assistance Center (JTAC) if further assistance is required.

How does this work?
The script runs the “show chassis cluster information configuration-synchronization” command via SSH and retrieves the configuration synchronization status.
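The actual parser for this script is an AWK file that is not shown in this post. As a rough illustration of the idea, here is a minimal Python sketch that reads the command output and derives a single synced/not-synced value. The sample text is only modeled on the field names Juniper documents for this command, and the "Failed" value, the set of results treated as "synced", and the function names are all assumptions made for illustration.

```python
import re
from typing import Dict

# Illustrative text only: modeled on documented field names. The exact layout
# on a given Junos release, and the "Failed" value, are assumptions.
SAMPLE_OUTPUT = """\
node0:
--------------------------------------------------------------------------
Configuration Synchronization:
    Status:
        Activation status: Enabled
        Last sync operation: Auto-Sync
        Last sync result: Not needed

node1:
--------------------------------------------------------------------------
Configuration Synchronization:
    Status:
        Activation status: Enabled
        Last sync operation: Auto-Sync
        Last sync result: Failed
"""

# Assumption: these results mean the node's configuration is in sync.
SYNCED_RESULTS = {"succeeded", "not needed"}


def parse_sync_results(output: str) -> Dict[str, str]:
    """Map each node name to its 'Last sync result' value."""
    results: Dict[str, str] = {}
    node = None
    for line in output.splitlines():
        stripped = line.strip()
        node_match = re.fullmatch(r"(node\d+):", stripped)
        if node_match:
            node = node_match.group(1)
        elif node and stripped.startswith("Last sync result:"):
            results[node] = stripped.split(":", 1)[1].strip()
    return results


def cluster_config_synced(output: str) -> float:
    """Return 1.0 if every node reports a synced result, else 0.0."""
    results = parse_sync_results(output)
    if not results:
        raise ValueError("no 'Last sync result' lines found in output")
    all_ok = all(r.lower() in SYNCED_RESULTS for r in results.values())
    return 1.0 if all_ok else 0.0
```

Calling `cluster_config_synced(SAMPLE_OUTPUT)` on the sample above yields `0.0`, because node1 reports a result outside the assumed "synced" set. The 1.0/0.0 convention mirrors how the rule further down compares the cluster-config-synced metric against `0.0`.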

Why is this important?
Failed configuration synchronization causes misbehavior when a cluster failover occurs. For example, an interface that should be enabled stays disabled, or the latest configuration is not applied to the new active node.

Without Indeni how would you find this?
An administrator could log in to the device and run this command manually to check the configuration synchronization status.

junos-show-chassis-cluster-information-configuration-synchronization

name: junos-show-chassis-cluster-information-configuration-synchronization
description: Get chassis cluster configuration synchronization status
type: monitoring
monitoring_interval: 10 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: true
comments:
    cluster-config-synced:
        why: |
            Failed configuration synchronization causes misbehavior when a cluster failover occurs. For example, an interface that should be enabled stays disabled, or the latest configuration is not applied to the new active node.
        how: |
            The script runs the "show chassis cluster information configuration-synchronization" command via SSH and retrieves the configuration synchronization status.
        can-with-snmp: null
        can-with-syslog: null
steps:
-   run:
        type: SSH
        file: show-chassis-cluster-information-configuration-synchronization.remote.1.bash
    parse:
        type: AWK
        file: show-chassis-cluster-information-configuration-synchronization.parser.1.awk

junos-show-chassis-cluster-status

name: junos-show-chassis-cluster-status
description: JUNOS collect clustering status
type: monitoring
monitoring_interval: 1 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: true
comments:
    cluster-member-active:
        why: |
            Tracking the state of a cluster member is important. If a cluster member which used to be the active member of the cluster no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the firewall or another component in the network.
        how: |
            This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
        can-with-snmp: true
        can-with-syslog: true
    cluster-state:
        why: |
            Tracking the state of a cluster is important. If a cluster which used to be healthy no longer is, it may be the result of an issue. In some cases, it is due to maintenance work (and so was anticipated), but in others it may be due to a failure in the members of the cluster or another component in the network.
        how: |
            This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
        can-with-snmp: true
        can-with-syslog: true
    cluster-preemption-enabled:
        why: |
            Preemption makes the designated primary member of a cluster always try to become the active member. The risk is that if a primary member with preemption enabled suffers a critical failure and reboots, the cluster fails over to the secondary and then immediately fails back to the primary once its reboot completes. If the primary crashes again, the cycle repeats, producing a failover loop.
        how: |
            This script logs into the Juniper JUNOS-based device using SSH and retrieves the output of the "show chassis cluster status" command. The output includes the status of all redundancy groups across the cluster.
        can-with-snmp: false
        can-with-syslog: false
steps:
-   run:
        type: SSH
        file: show-chassis-cluster-status.remote.1.bash
    parse:
        type: AWK
        file: show-chassis-cluster-status.parser.1.awk
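The AWK parser referenced above is not included in this post. To illustrate how the three metrics could be derived from "show chassis cluster status", here is a hedged Python sketch. The sample output follows the commonly documented column layout, but real output varies by Junos release. The function names and the mapping of "primary" to an active value of 1.0 are assumptions, not the actual indeni implementation.

```python
import re
from typing import Dict, List, Tuple

# Illustrative output; the column layout is an assumption and differs
# between Junos releases.
SAMPLE_STATUS = """\
Cluster ID: 1
Node   Priority Status         Preempt Manual   Monitor-failures

Redundancy group: 0 , Failover count: 1
node0  200      primary        no      no       None
node1  1        secondary      no      no       None

Redundancy group: 1 , Failover count: 3
node0  101      primary        yes     no       None
node1  1        secondary      yes     no       None
"""

GROUP_RE = re.compile(r"Redundancy group:\s*(\d+)")
NODE_RE = re.compile(r"(node\d+)\s+(\d+)\s+(\S+)\s+(yes|no)\s+(yes|no)\b")


def parse_cluster_status(output: str) -> List[Dict[str, object]]:
    """Return one row per node per redundancy group."""
    rows: List[Dict[str, object]] = []
    group = None
    for line in output.splitlines():
        stripped = line.strip()
        group_match = GROUP_RE.match(stripped)
        if group_match:
            group = int(group_match.group(1))
            continue
        node_match = NODE_RE.match(stripped)
        if node_match and group is not None:
            rows.append({
                "group": group,
                "node": node_match.group(1),
                "priority": int(node_match.group(2)),
                "status": node_match.group(3),
                "preempt": node_match.group(4) == "yes",
            })
    return rows


def to_metrics(rows: List[Dict[str, object]]) -> Dict[Tuple[int, str], Dict[str, float]]:
    """Hypothetical mapping of parsed rows to the metric names used above."""
    metrics: Dict[Tuple[int, str], Dict[str, float]] = {}
    for row in rows:
        metrics[(row["group"], row["node"])] = {
            "cluster-member-active": 1.0 if row["status"] == "primary" else 0.0,
            "cluster-preemption-enabled": 1.0 if row["preempt"] else 0.0,
        }
    return metrics
```

With the sample above, redundancy group 1 has preemption set to "yes" on both nodes, so `to_metrics(parse_cluster_status(SAMPLE_STATUS))` reports cluster-preemption-enabled as 1.0 for that group, which is the situation the preemption comment warns about.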

cluster_config_unsynced

package com.indeni.server.rules.library.core
import com.indeni.ruleengine.expressions.conditions.{And, EndsWithRepetition, Equals}
import com.indeni.ruleengine.expressions.core.{ConstantExpression, StatusTreeExpression}
import com.indeni.ruleengine.expressions.data.{SelectTagsExpression, SelectTimeSeriesExpression, TimeSeriesExpression}
import com.indeni.server.common.data.conditions.True
import com.indeni.server.rules.library.{ConditionalRemediationSteps, PerDeviceRule, RuleHelper}
import com.indeni.server.rules.{DeviceCategory, DeviceKey, RemediationStepCondition, RuleCategory, RuleContext, RuleMetadata}
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity

case class ClusterConfigNotSyncedRule() extends PerDeviceRule with RuleHelper {

  override val metadata: RuleMetadata = RuleMetadata.builder("cluster_config_unsynced", "Cluster configuration not synced",
    "For devices that support full configuration synchronization, indeni will trigger an issue if the configuration is out of sync.", AlertSeverity.ERROR,
    categories = Set(RuleCategory.HighAvailability), deviceCategory = DeviceCategory.ClusteredDevices).build()

  override def expressionTree(context: RuleContext): StatusTreeExpression = {
    val tsToTestAgainst = TimeSeriesExpression[Double]("cluster-config-synced")
    val activeMemberValue = TimeSeriesExpression[Double]("cluster-member-active").last

    StatusTreeExpression(
      // Which objects to pull (normally, devices)
      SelectTagsExpression(context.metaDao, Set(DeviceKey), True),

      StatusTreeExpression(
        // The time-series we check the test condition against:
        SelectTimeSeriesExpression[Double](context.tsDao, Set("cluster-config-synced", "cluster-member-active"), denseOnly = false),

        // The condition which, if true, we have an issue. Checked against the time-series we've collected
        And(
          EndsWithRepetition(tsToTestAgainst, ConstantExpression(0.0), 3),
          Equals(activeMemberValue, ConstantExpression[Option[Double]](Some(1.0)))
        )
      ).withoutInfo().asCondition()
    ).withRootInfo(
      getHeadline(),
      ConstantExpression("The configuration has been changed on this device, but has not yet been synced to other members of the cluster. This may result in unexpected behavior from other cluster members should this member go down."),
      ConditionalRemediationSteps("Log into the device and synchronize the configuration across the cluster.",
        RemediationStepCondition.VENDOR_CISCO ->
          """|1. Log in to the device and review the FHRP configuration across the vPC cluster, if it is configured.
             |2. Execute the "show hsrp brief" command to check the HSRP state and configuration across the cluster.
             |3. Execute the “show vrrp detail” command to check the VRRP state and configuration across the cluster.
             |4. Compare the “show run vpc” command output from both vPC peer switches and synchronize the configuration where it differs.
             |5. Execute the “show vpc consistency-parameters” command and review the output. Ensure that the type 1 and type 2 vPC consistency parameters match. If type 1 parameters do not match, the vPC is suspended. Type 2 parameters do not have to match on both Nexus 5000 switches for the vPC to be operational.
             |6. Check that there are no unsaved configuration changes by running the “show running-config diff” NX-OS command.
             |7. Log in to both peers and save the configuration with the "copy running-config startup-config" NX-OS command.""".stripMargin,
        RemediationStepCondition.VENDOR_JUNIPER ->
          """|1. Run "show chassis cluster information configuration-synchronization" command to review configuration synchronization status of a chassis cluster (Junos OS Release 12.1X47-D10 or later).
             |2. Check the activation and last sync status if these options are enabled.
             |3. Check the link connectivity.
             |4. Check the cluster configuration for synchronization.
             |5. Review this article on Juniper TechLibrary: <a target="_blank" href="https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-chassis-cluster-information-detail-config-sync.html">Operational Commands</a>
             |6. Contact Juniper Networks Technical Assistance Center (JTAC) if further assistance is required.""".stripMargin
      )
    )
  }
}
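The rule's condition can be hard to read through the expression-tree wrappers, so here is a small standalone Python sketch of the logic it appears to express: alert only when the last three cluster-config-synced samples are all 0.0 and the most recent cluster-member-active sample is 1.0. The semantics of `EndsWithRepetition` are inferred from its name and arguments, and the Python function names are hypothetical.

```python
from typing import Optional, Sequence


def ends_with_repetition(series: Sequence[float], value: float, count: int) -> bool:
    """True if the last `count` samples all equal `value` (assumed semantics)."""
    if count <= 0 or len(series) < count:
        return False
    return all(sample == value for sample in series[-count:])


def config_unsynced_alert(synced: Sequence[float], active: Sequence[float]) -> bool:
    """Mirror of the And(...) condition: unsynced three times in a row on the active member."""
    last_active: Optional[float] = active[-1] if active else None
    return ends_with_repetition(synced, 0.0, 3) and last_active == 1.0
```

For example, `config_unsynced_alert([1.0, 0.0, 0.0, 0.0], [1.0])` is true, while a series ending in only two zeros, or one collected from a passive member, does not alert. Requiring three consecutive samples presumably avoids alerting during the brief window after a commit while synchronization is still in progress.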

Kyle · January 11, 2019, 6:23pm

Kyle · July 24, 2019, 10:08pm
