High memory usage-juniper-junos

Vendor: juniper

OS: junos

Description:
Indeni will alert if the memory utilization of a device is above a high threshold. If the device has multiple memory elements, each is inspected separately and alerted on individually.

Remediation Steps:
Determine the cause for the high memory usage of the listed elements.

How does this work?
This script and others use the CLI over SSH to retrieve the current status of multiple different memory elements.

Why is this important?
The various memory components of a Juniper JUNOS device are important to track to ensure smooth operation. This includes the routing engine’s memory element (RE) as well as the various data plane elements.

Without Indeni how would you find this?
The status of some memory elements is accessible over SNMP, but many data plane memory elements are accessible only over SSH. An administrator would need to write their own scripts to collect this information.
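
As a rough illustration of the kind of script an administrator would otherwise have to write, the Python sketch below parses the XML output of "show chassis routing-engine | display xml". The element names are taken from the XPaths used by the script further down; the sample reply, its values, and the helper names (local_name, read_routing_engine) are assumptions made for illustration. Real Junos output carries XML namespaces and many more fields.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified reply. Values are invented for illustration;
# element names follow the XPaths used by the monitoring script.
SAMPLE_REPLY = """
<rpc-reply>
  <route-engine-information>
    <route-engine>
      <memory-control-plane>1072</memory-control-plane>
      <memory-control-plane-used>590</memory-control-plane-used>
      <memory-control-plane-util>55</memory-control-plane-util>
      <memory-data-plane>976</memory-data-plane>
      <memory-data-plane-used>644</memory-data-plane-used>
      <memory-data-plane-util>66</memory-data-plane-util>
      <cpu-idle>87</cpu-idle>
    </route-engine>
  </route-engine-information>
</rpc-reply>
"""


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def read_routing_engine(xml_text: str) -> dict:
    """Return the numeric child fields of the first <route-engine> element."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        if local_name(element.tag) == "route-engine":
            fields = {}
            for child in element:
                text = (child.text or "").strip()
                # Keep only plain non-negative numbers such as "55" or "55.5".
                if text.replace(".", "", 1).isdigit():
                    fields[local_name(child.tag)] = float(text)
            return fields
    raise ValueError("no <route-engine> element found")


fields = read_routing_engine(SAMPLE_REPLY)
print(fields["memory-control-plane-util"], fields["memory-data-plane-util"])
```

Matching on the local tag name is one way to tolerate the namespace that real replies include; collecting the reply over SSH is left out of the sketch.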

junos-show-chassis-routing-engine

#! META
name: junos-show-chassis-routing-engine
description: JUNOS get routing engine stats (CPU/mem)
type: monitoring
includes_resource_data: true
monitoring_interval: 1 minute
requires:
    vendor: juniper
    os.name: junos
    product: firewall
    high-availability: 
        neq: true

#! COMMENTS
cpu-usage:
    why: |
        The control and data plane CPU utilization of a Juniper JUNOS device is important to track to ensure smooth operation. High CPU utilization in the control plane may impact the management interface, while high CPU utilization in the data plane may impact traffic handling.
    how: |
        This script and others use the CLI over SSH to retrieve the current status of multiple different CPU elements.
    without-indeni: |
        CPU utilization information at both the control and data plane levels is available via SNMP and can be monitored using an SNMP-based tool. An administrator can then define thresholds against this.
    can-with-snmp: true
    can-with-syslog: false
memory-usage:
    why: |
        The various memory components of a Juniper JUNOS device are important to track to ensure smooth operation. This includes the routing engine's memory element (RE) as well as the various data plane elements.
    how: |
        This script and others use the CLI over SSH to retrieve the current status of multiple different memory elements.
    without-indeni: |
        The status of some memory elements is accessible over SNMP, but many data plane memory elements are accessible only over SSH. An administrator would need to write their own scripts to collect this information.
    can-with-snmp: false
    can-with-syslog: false

#! REMOTE::SSH
show chassis routing-engine | display xml

#! PARSER::XML
_vars:
    root: /rpc-reply//route-engine-information[1]
_metrics:
    -
        _tags:
            "im.name":
                _constant: "cpu-usage"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "CPU Usage"
            "im.dstype.displayType":
                _constant: "percentage"
            "cpu-id":
                _constant: "RE"
            "cpu-is-avg":
                _constant: "false"
            "resource-metric":
                _constant: "true"
            "im.identity-tags":
                _constant: "cpu-id"
        _temp:
            "cpu_idle":
                _text: ${root}/route-engine/cpu-idle
        _transform:
            _value.double: |
                {
                    cpu_usage = 100 - temp("cpu_idle")
                    print cpu_usage
                }

# CONTROL PLANE MEMORY
    -
        _tags:
            "im.name":
                _constant: "memory-total-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Total"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Control Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "cp_total_mem":
                _text: ${root}/route-engine/memory-control-plane
        _transform:
            _value.double: |
                {
                    cp_total_memory = temp("cp_total_mem") * 1024
                    print cp_total_memory
                }
    -
        _tags:
            "im.name":
                _constant: "memory-free-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Free"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Control Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "cp_total_mem":
                _text: ${root}/route-engine/memory-control-plane
            "cp_used_mem":
                _text: ${root}/route-engine/memory-control-plane-used
        _transform:
            _value.double: |
                {
                    cp_free_memory = (temp("cp_total_mem") - temp("cp_used_mem")) * 1024
                    print cp_free_memory 
                }
    -
        _tags:
            "im.name":
                _constant: "memory-usage"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Usage"
            "im.dstype.displayType":
                _constant: "percentage"
            "name":
                _constant: "Control Plane"
            "resource-metric":
                _constant: "true"
            "im.identity-tags":
                _constant: "name"
        _temp:
            "cp_mem_usage":
                _text: ${root}/route-engine/memory-control-plane-util
        _transform:
            _value.double: |
                {
                    print temp("cp_mem_usage")
                }

# DATA PLANE MEMORY
    -
        _tags:
            "im.name":
                _constant: "memory-total-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Total"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Data Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "dp_total_mem":
                _text: ${root}/route-engine/memory-data-plane
        _transform:
            _value.double: |
                {
                    dp_total_memory = temp("dp_total_mem") * 1024
                    print dp_total_memory
                }
    -
        _tags:
            "im.name":
                _constant: "memory-free-kbytes"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Free"
            "im.dstype.displayType":
                _constant: "kilobytes"
            "name":
                _constant: "Data Plane"
            "im.identity-tags":
                _constant: "name"

        _temp:
            "dp_total_mem":
                _text: ${root}/route-engine/memory-data-plane
            "dp_used_mem":
                _text: ${root}/route-engine/memory-data-plane-used
        _transform:
            _value.double: |
                {
                    dp_free_memory = (temp("dp_total_mem") - temp("dp_used_mem")) * 1024
                    print dp_free_memory
                }
    -
        _tags:
            "im.name":
                _constant: "memory-usage"
            "live-config":
                _constant: "true"
            "display-name":
                _constant: "Memory Usage"
            "im.dstype.displayType":
                _constant: "percentage"
            "name":
                _constant: "Data Plane"
            "im.identity-tags":
                _constant: "name"
        _temp:
            "dp_mem_usage":
                _text: ${root}/route-engine/memory-data-plane-util
        _transform:
            _value.double: |
                {
                    print temp("dp_mem_usage")
                }
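
The arithmetic in the transforms above can be restated as a small Python sketch. It assumes the raw memory values in the XML are reported in megabytes, which the multiplication by 1024 into the "kbytes" metrics suggests but the script does not state. The function names and the input numbers are hypothetical.

```python
def plane_memory_metrics(total_mb: float, used_mb: float, util_percent: float) -> dict:
    """Mirror the three memory metrics the parser emits for one plane."""
    return {
        # Total memory, converted from (assumed) megabytes to kilobytes.
        "memory-total-kbytes": total_mb * 1024,
        # Free memory is derived as total minus used, then converted.
        "memory-free-kbytes": (total_mb - used_mb) * 1024,
        # Utilization is passed through as reported by the device.
        "memory-usage": util_percent,
    }


def cpu_usage(cpu_idle_percent: float) -> float:
    """CPU usage is the complement of the reported idle percentage."""
    return 100 - cpu_idle_percent


control_plane = plane_memory_metrics(total_mb=1072, used_mb=590, util_percent=55)
print(control_plane)
print(cpu_usage(87))
```

The same function applies to the data plane by passing the memory-data-plane values instead.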

cross_vendor_high_memory_usage

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.{ConditionalRemediationSteps, NearingCapacityWithItemsTemplateRule}

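/**
  * Alerts when the "memory-usage" metric of any memory element (identified by the "name" tag)
  * is above the threshold, and attaches vendor-specific remediation steps where available.
  */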
case class cross_vendor_high_memory_usage() extends NearingCapacityWithItemsTemplateRule(
  ruleName = "cross_vendor_high_memory_usage",
  ruleFriendlyName = "All Devices: High memory usage",
  ruleDescription = "Indeni will alert if the memory utilization of a device is above a high threshold. If the device has multiple memory elements, each is inspected separately and alerted on individually.",
  usageMetricName = "memory-usage",
  applicableMetricTag = "name",
  threshold = 92.0,
  alertDescription = "Some memory elements are nearing their maximum capacity.",
  alertItemDescriptionFormat = "Current memory utilization is: %.0f%%",
  baseRemediationText = "Determine the cause for the high memory usage of the listed elements.",
  alertItemsHeader = "Memory Elements Affected",
  itemsToIgnore = Set("^vCMP host - (swap|linux).*".r, "^PA firewall management plane.*".r))(
  ConditionalRemediationSteps.VENDOR_CP ->
    """
      |Consider reading https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk33781#MEMORY
      |
      |Note: In trying to understand this alert, you can use the Linux "free" command to view memory utilization. The output of this command can be confusing. To be sure you correctly understand the output, see: https://serverfault.com/questions/85470/meaning-of-the-buffers-cache-line-in-the-output-of-free
      |
      |Also note that Linux has recently changed the "free" output format to make it more intuitive. This change was made in procps 3.3.10 ("procps" is the group of utilities which includes "free"). Use "ps -V" in expert mode to see your version of procps. See also: https://askubuntu.com/questions/770108/what-do-the-changes-in-free-output-from-14-04-to-16-04-mean""".stripMargin,
  ConditionalRemediationSteps.VENDOR_PANOS -> "Consider opening a support ticket with Palo Alto Networks.",
  ConditionalRemediationSteps.VENDOR_CISCO -> "Review http://docwiki.cisco.com/wiki/Cisco_Nexus_7000_Series_NX-OS_Troubleshooting_Guide_--_Troubleshooting_Memory",
  ConditionalRemediationSteps.OS_NXOS ->
    """|
      |1. In Indeni, check the memory utilization history graph for this device and review the pattern. Correlate any change in the pattern with any configuration change.
      |2. The output of the following NX-OS commands can indicate whether the platform memory utilization is normal or unexpected:
      |• show system resources
      |• show processes memory
      |3. For more information, review the following troubleshooting guide for high memory utilization: http://docwiki.cisco.com/wiki/Cisco_Nexus_7000_Series_NX-OS_Troubleshooting_Guide_--_Troubleshooting_Memory""".stripMargin,
  ConditionalRemediationSteps.VENDOR_FORTINET ->
    """
      |1. Log in via HTTPS to the Fortinet firewall and go to System > Dashboard > Status. Look at the system resources widget to review the current memory utilization graph.
      |2. Log in via SSH to the Fortinet firewall and run the FortiOS command "diagnose hardware sysinfo memory", which provides information about current memory usage.
      |3. Check if the unit is dealing with high traffic volume or with connection pool limits.
      |4. Check if the Fortinet firewall is in "conserve mode" state by running the FortiOS command "diagnose hardware sysinfo conserve". For more information review the following Fortinet guides:
      |- http://kb.fortinet.com/kb/viewContent.do?externalId=FD33103
      |- http://kb.fortinet.com/kb/viewContent.do?externalId=11076
      |5. If the problem persists, contact Fortinet Technical support at https://support.fortinet.com/ for further assistance.""".stripMargin,
  ConditionalRemediationSteps.VENDOR_BLUECOAT ->
    """
      |1. Log in via HTTPS to the ProxySG and go to Statistics > System > Resources > Memory use. Review the current memory utilization graph.
      |2. Log in via SSH to the ProxySG and run the command "show resources", which provides information about current memory usage.
      |3. Check if the unit is dealing with high traffic volume.
      |4. Check the ICAP service maximum number of connections. For more information review the following Bluecoat guides:
      |- https://origin-symwisedownload.symantec.com/resources/webguides/proxysg/certification/sg_firststeps_webguide/Content/Troubleshooting/Malware%20Prevention/troubleshoot_sg_unresponsive.htm
      |- https://origin-symwisedownload.symantec.com/resources/webguides/contentanalysis/13/system_webguide/Content/Topics/Tasks/Stats_Mem.htm
      |5. If the problem persists, contact Symantec Technical support at https://support.symantec.com for further assistance.
    """.stripMargin
)
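
The check this rule configures can be approximated in a few lines of Python. This is a sketch of the idea only, not the NearingCapacityWithItemsTemplateRule implementation: the strict "greater than" comparison, the way ignore patterns are matched, and the function name elements_over_threshold are assumptions. The threshold, the ignore patterns, and the item message format are copied from the rule above.

```python
import re

THRESHOLD = 92.0  # from the rule definition

# Ignore patterns copied from itemsToIgnore in the rule definition.
ITEMS_TO_IGNORE = [
    re.compile(r"^vCMP host - (swap|linux).*"),
    re.compile(r"^PA firewall management plane.*"),
]


def elements_over_threshold(usage_by_name: dict, threshold: float = THRESHOLD) -> list:
    """Return one alert line per memory element whose usage is above threshold."""
    alerts = []
    for name, usage in sorted(usage_by_name.items()):
        # Skip elements excluded by the ignore list (assumed prefix match).
        if any(pattern.match(name) for pattern in ITEMS_TO_IGNORE):
            continue
        # Assumption: "above" means strictly greater than the threshold.
        if usage > threshold:
            alerts.append(f"{name}: Current memory utilization is: {usage:.0f}%")
    return alerts


# Hypothetical "memory-usage" samples keyed by the "name" tag.
sample = {"Control Plane": 95.0, "Data Plane": 66.0, "vCMP host - swap": 99.0}
alerts = elements_over_threshold(sample)
print(alerts)
```

Each element is evaluated on its own, which matches the rule description: a device with a healthy data plane still alerts if its control plane alone is above the threshold.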