High Memory Usage per Chassis and Blade-juniper-junos

error
junos
health-checks
juniper

Vendor: juniper

OS: junos

Description:
Alert when Memory usage is high

Remediation Steps:
Review the load on this device to see if the memory utilization is valid.
1. On the device command line interface execute “show chassis routing-engine” command to check overall routing engine memory usage.
2. Run “show system processes extensive” command to review the memory allocation status for processes.
3. Identify the processes which are consuming too much memory.
4. Consider turning off some processes which are not vital to ensure bigger memory space allocation for other sessions and processes.
5. Review the following article on Juniper tech support site: Checking Memory Status.

How does this work?
This script and others use the CLI over SSH to retrieve the current status of multiple different memory elements.

Why is this important?
The various memory components of a Juniper JUNOS device are important to track to ensure smooth operation. This includes the routing engine’s (RE) memory as well as the various data plane elements.

Without Indeni how would you find this?
Some of the memory elements’ status is accessible over SNMP, but many of the memory elements in the data plane are solely accessible over SSH. An administrator would need to write their own scripts to collect this information.

junos-show-chassis-routing-engine

#! META
name: junos-show-chassis-routing-engine
description: JUNOS get routing engine stats (CPU/mem)
type: monitoring
includes_resource_data: true
monitoring_interval: 1 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: 
        neq: true

#! COMMENTS
cpu-usage:
    why: |
        The control plane and data plane CPU utilization of a Juniper JUNOS device is important to track to ensure smooth operation. High CPU utilization in the control plane may impact the management interface, while high CPU utilization in the data plane may impact traffic handling.
    how: |
        This script and others use the CLI over SSH to retrieve the current status of multiple different CPU elements.
    without-indeni: |
        CPU utilization information at both the control and data plane levels is available via SNMP and can be monitored using an SNMP-based tool. An administrator can then define thresholds against this.
    can-with-snmp: true
    can-with-syslog: false
memory-usage:
    why: |
        The various memory components of a Juniper JUNOS device are important to track to ensure smooth operation. This includes the routing engine's (RE) memory as well as the various data plane elements.
    how: |
        This script and others use the CLI over SSH to retrieve the current status of multiple different memory elements.
    without-indeni: |
        Some of the memory elements' status is accessible over SNMP, but many of the memory elements in the data plane are solely accessible over SSH. An administrator would need to write their own scripts to collect this information.
    can-with-snmp: false
    can-with-syslog: false

#! REMOTE::SSH
show chassis routing-engine | display xml

#! PARSER::XML
_vars:
    root: /rpc-reply//route-engine-information[1]
_metrics:
    -
        _tags:
            "im.name":
                _constant: "cpu-usage"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "CPU Usage"
            "im.dstype.displayType":
                _constant: "percentage"
            "cpu-id":
                _constant: "RE"
            "cpu-is-avg":
                _constant: "false"
            "resource-metric":
                _constant: "true"
            "im.identity-tags":
                _constant: "cpu-id"
        _temp:
            "cpu_idle":
                _text: ${root}/route-engine/cpu-idle
        _transform:
            _value.double: |
                {
                    # cpu-idle is the idle percentage, so usage is its complement
                    cpu_usage = 100 - temp("cpu_idle")
                    print cpu_usage
                }

# CONTROL PLANE MEMORY
    -
        _tags:
            "im.name":
                _constant: "memory-total-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Total"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Control Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "cp_total_mem":
                _text: ${root}/route-engine/memory-control-plane
        _transform:
            _value.double: |
                {
                    cp_total_memory = temp("cp_total_mem") * 1024
                    print cp_total_memory
                }
    -
        _tags:
            "im.name":
                _constant: "memory-free-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Free"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Control Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "cp_total_mem":
                _text: ${root}/route-engine/memory-control-plane
            "cp_used_mem":
                _text: ${root}/route-engine/memory-control-plane-used
        _transform:
            _value.double: |
                {
                    cp_free_memory = (temp("cp_total_mem") - temp("cp_used_mem")) * 1024
                    print cp_free_memory 
                }
    -
        _tags:
            "im.name":
                _constant: "memory-usage"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Usage"
            "im.dstype.displayType":
                _constant: "percentage"
            "name":
                _constant: "Control Plane"
            "resource-metric":
                _constant: "true"
            "im.identity-tags":
                _constant: "name"
        _temp:
            "cp_mem_usage":
                _text: ${root}/route-engine/memory-control-plane-util
        _transform:
            _value.double: |
                {
                    print temp("cp_mem_usage")
                }

# DATA PLANE MEMORY
    -
        _tags:
            "im.name":
                _constant: "memory-total-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Total"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Data Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "dp_total_mem":
                _text: ${root}/route-engine/memory-data-plane
        _transform:
            _value.double: |
                {
                    dp_total_memory = temp("dp_total_mem") * 1024
                    print dp_total_memory
                }
    -
        _tags:
            "im.name":
                _constant: "memory-free-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Free"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Data Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "dp_total_mem":
                _text: ${root}/route-engine/memory-data-plane
            "dp_used_mem":
                _text: ${root}/route-engine/memory-data-plane-used
        _transform:
            _value.double: |
                {
                    dp_free_memory = (temp("dp_total_mem") - temp("dp_used_mem")) * 1024
                    print dp_free_memory
                }
    -
        _tags:
            "im.name":
                _constant: "memory-usage"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Usage"
            "im.dstype.displayType":
                _constant: "percentage"
            "name":
                _constant: "Data Plane"
            "im.identity-tags":
                _constant: "name"
        _temp:
            "dp_mem_usage":
                _text: ${root}/route-engine/memory-data-plane-util
        _transform:
            _value.double: |
                {
                    print temp("dp_mem_usage")
                }
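
To check the parser's arithmetic outside Indeni, the same extraction can be sketched in Python. This is only an illustration under stated assumptions: the sample XML is hypothetical (its values are invented, and real "show chassis routing-engine | display xml" output carries XML namespaces and many more fields), and the function name routing_engine_metrics is not part of the script.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample of "show chassis routing-engine | display xml".
# Values are invented for illustration; namespaces are omitted for brevity.
SAMPLE_XML = """
<rpc-reply>
  <route-engine-information>
    <route-engine>
      <memory-control-plane>1072</memory-control-plane>
      <memory-control-plane-used>600</memory-control-plane-used>
      <memory-control-plane-util>56</memory-control-plane-util>
      <memory-data-plane>976</memory-data-plane>
      <memory-data-plane-used>498</memory-data-plane-used>
      <memory-data-plane-util>51</memory-data-plane-util>
      <cpu-idle>87</cpu-idle>
    </route-engine>
  </route-engine-information>
</rpc-reply>
"""


def routing_engine_metrics(xml_text):
    """Return the metrics the parser emits, keyed by (metric name, identity tag)."""
    root = ET.fromstring(xml_text)
    engine = root.find(".//route-engine-information/route-engine")

    def number(tag):
        return float(engine.findtext(tag))

    # CPU usage is the complement of the reported idle percentage.
    metrics = {("cpu-usage", "RE"): 100 - number("cpu-idle")}

    for plane, prefix in (("Control Plane", "memory-control-plane"),
                          ("Data Plane", "memory-data-plane")):
        total = number(prefix)
        used = number(prefix + "-used")
        # The script multiplies by 1024 before reporting kilobytes,
        # i.e. it treats the device values as megabytes.
        metrics[("memory-total-kbytes", plane)] = total * 1024
        metrics[("memory-free-kbytes", plane)] = (total - used) * 1024
        metrics[("memory-usage", plane)] = number(prefix + "-util")
    return metrics
```

With the sample values above, the control plane free memory works out to (1072 - 600) * 1024 = 483328 kilobytes, and the memory-usage metric is passed through unchanged from the device's own utilization field.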

high_per_chassis_blade_memory_usage

package com.indeni.server.rules.library

import com.indeni.ruleengine.Scope.{Scope, ScopeValueHelper}
import com.indeni.ruleengine.expressions.Expression
import com.indeni.ruleengine.expressions.conditions.GreaterThanOrEqual
import com.indeni.ruleengine.expressions.core.{StatusTreeExpression, _}
import com.indeni.ruleengine.expressions.data.{SelectTagsExpression, SelectTimeSeriesExpression, TimeSeriesExpression}
import com.indeni.ruleengine.expressions.math.AverageExpression
import com.indeni.ruleengine.expressions.scope.ScopableExpression
import com.indeni.server.common.ParameterValue
import com.indeni.server.common.data.conditions.{Equals, True}
import com.indeni.server.params.ParameterDefinition
import com.indeni.server.params.ParameterDefinition.UIType
import com.indeni.server.rules.RuleCategory.RuleCategory
import com.indeni.server.rules._
import com.indeni.server.rules.config.expressions.DynamicParameterExpression
import com.indeni.server.rules.library.core.PerDeviceRule
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity


/**
  * Created by amir on 04/02/2016.
  */
case class HighPerChassisBladeMemoryUsageRule() extends PerDeviceRule {

  private val highThresholdParameterName: String = "High_Threshold_of_Memory_Usage"

  private val highThresholdParameter = new ParameterDefinition(highThresholdParameterName,
    "",
    "High Threshold of Memory Usage",
    "The memory usage percentage at or above which an issue is triggered.",
    UIType.DOUBLE,
    new ParameterValue((85.0).asInstanceOf[Object])
  )

  override val metadata: RuleMetadata =
    RuleMetadata.builder(
      "high_per_chassis_blade_memory_usage",
      "High Memory Usage per Chassis and Blade",
      "Alert when Memory usage is high",
      AlertSeverity.ERROR,
      Set(RuleCategory.HealthChecks)).configParameter(highThresholdParameter).build()

  override def expressionTree(context: RuleContext): StatusTreeExpression = {

    val usagePercentage = AverageExpression(TimeSeriesExpression[Double]("memory-usage"))
    val usagePercentageThreshold = DynamicParameterExpression.withConstantDefault(highThresholdParameter.getName, highThresholdParameter.getDefaultValue.asDouble.toDouble).noneable
    val isUsagePercentageAboveThreshold = GreaterThanOrEqual(usagePercentage, usagePercentageThreshold)

    // Per-blade description and headline shown under "Problematic Blades"
    val memoryUsageFailDescription = new ScopableExpression[String] {
      override protected def evalWithScope(time: Long, scope: Scope): String =
        "Memory usage (" + usagePercentage.eval(time) + "%) above threshold (" + usagePercentageThreshold.eval(time) + "%) " +
          "for chassis: " + scope.getVisible("Chassis").get + ", blade: " + scope.getVisible("Blade").get

      override def args: Set[Expression[_]] = Set(usagePercentage, usagePercentageThreshold)
    }
    val memoryUsageFailHeadline = new ScopableExpression[String] {
      override protected def evalWithScope(time: Long, scope: Scope): String = "chassis: " + scope.getVisible("Chassis").get + ", blade: " + scope.getVisible("Blade").get

      override def args: Set[Expression[_]] = Set()
    }
    val tsQuery = SelectTimeSeriesExpression[Double](context.tsDao, Set("memory-usage"), denseOnly = false)
    val forTsCondition = StatusTreeExpression(tsQuery, isUsagePercentageAboveThreshold).withSecondaryInfo(
      memoryUsageFailHeadline, memoryUsageFailDescription, title = "Problematic Blades"
    ).asCondition()

    val chassisBladeQuery = SelectTagsExpression(context.tsDao, Set("Chassis", "Blade"), True)
    val highMemoryUsagePerDevicePerChassisBladeLogic = StatusTreeExpression(chassisBladeQuery, forTsCondition)
      .withoutInfo().asCondition()

    val headline = ConstantExpression("High memory usage")
    val description = ConstantExpression("The memory usage in the operating system is higher than the high threshold.")
    val remediation = ConditionalRemediationSteps("Review the load on this device to see if the memory utilization is valid.",
      ConditionalRemediationSteps.OS_NXOS ->
        """1. In Indeni, check the memory utilization history graph for this device and review the pattern. Correlate any change to the pattern with any configuration change.
          |2. The output of the following NX-OS commands can show whether the platform memory utilization is normal or unexpected:
          | a. "show system resources"
          | b. "show processes memory"
          |3. For more information, please review: <a target="_blank" href="http://docwiki.cisco.com/wiki/Cisco_Nexus_7000_Series_NX-OS_Troubleshooting_Guide_--_Troubleshooting_Memory">Troubleshooting Guide For High Memory Utilization</a>.""".stripMargin,
      ConditionalRemediationSteps.VENDOR_JUNIPER ->
        """|1. On the device command line interface execute "show chassis routing-engine" command to check overall routing engine memory usage.
           |2. Run "show system processes extensive" command to review the memory allocation status for processes.
           |3. Identify the processes which are consuming too much memory.
           |4. Consider turning off some processes which are not vital to ensure bigger memory space allocation for other sessions and processes.
           |5. Review the following article on Juniper tech support site: <a target="_blank" href="https://www.juniper.net/documentation/en_US/release-independent/nce/topics/task/operational/security-policy-memory-testing.html">Checking Memory Status</a>.""".stripMargin
    )

    val devicesFilter = Equals("model", "CheckPoint61k")
    val devicesQuery = SelectTagsExpression(context.metaDao, Set(DeviceKey), devicesFilter)

    StatusTreeExpression(devicesQuery, highMemoryUsagePerDevicePerChassisBladeLogic).withRootInfo(
      headline, description, remediation
    )
  }
}
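
The core condition of the rule (average memory-usage at or above the threshold, default 85.0) can be sketched in plain Python. This is an illustration rather than Indeni's rule engine: the function names and the sample lists are hypothetical, and because the source does not show how the engine windows the time series, the sketch simply averages whatever samples it is given.

```python
from statistics import mean

# Default value of the High_Threshold_of_Memory_Usage parameter in the rule.
DEFAULT_HIGH_THRESHOLD = 85.0


def is_memory_usage_high(samples, threshold=DEFAULT_HIGH_THRESHOLD):
    """Return True when the average of the samples is at or above the threshold."""
    if not samples:
        # Assumption for this sketch: no data means no alert.
        return False
    return mean(samples) >= threshold


def describe(samples, chassis, blade, threshold=DEFAULT_HIGH_THRESHOLD):
    """Build a description similar in shape to the rule's per-blade message."""
    return ("Memory usage (" + str(mean(samples)) + "%) above threshold ("
            + str(threshold) + "%) for chassis: " + chassis + ", blade: " + blade)
```

Note that the comparison is GreaterThanOrEqual in the rule, so an average exactly equal to the threshold also triggers the issue; the sketch mirrors that with `>=`.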