High Mount Space per Chassis and Blade-juniper-junos

error
health-checks
junos
juniper

Vendor: juniper

OS: junos

Description:
Alert when disk usage is high.

Remediation Steps:
Review the contents of the mount points to see what can be deleted or moved and attempt to identify whether there’s a specific cause for this.
1. Routinely clean up unused files with the “request system storage cleanup” command.
2. Remove debug files once debugging is complete.
3. Configure the device to send logs to remote log servers.
4. Review the following article on the Juniper tech support site: Operational Commands: show system storage partitions (View SRX Series), https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-system-storage-partitions.html
5. If the problem persists, contact the Juniper Networks Technical Assistance Center (JTAC).

How does this work?
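This script logs in to the Juniper JUNOS-based device over SSH and runs two commands. The first, “show chassis hardware node local | match node”, identifies the local node when the device is a chassis cluster member. The second, “show system storage detail”, reports the size, used space, available space, and capacity of each mounted file system.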
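To illustrate how one row of that output becomes the three metrics, the Scala sketch below splits a row on whitespace and reads the columns from the end of the line. This mirrors the AWK parser further down, where $NF is the mount point, $(NF-1) the capacity percentage, and so on. MountUsage, StorageLineParser and the exclusion set are hypothetical names for illustration only. The exclusion set is an assumed list modelled on the parser's "uninteresting" mount points, and the sample row is the one quoted in the parser's own comment. The AWK pattern accepts any line containing a number followed by %; this sketch is stricter and only accepts rows whose capacity column ends in %.

```scala
// Hypothetical record for one parsed row; field names are illustrative only.
final case class MountUsage(mount: String, totalKb: Long, usedKb: Long, availKb: Long, usagePct: Double)

object StorageLineParser {
  // Assumed exclusion list, modelled on the AWK parser's "uninteresting" mount points.
  val uninterestingMounts: Set[String] =
    Set("/dev", "/jail/dev", "/junos", "/junos/cf/dev", "/junos/dev", "/proc")

  // Reads columns from the end of the row, as the AWK parser does:
  // last = mount point, -2 = capacity, -3 = avail, -4 = used, -5 = total.
  def parse(line: String): Option[MountUsage] = {
    val f = line.trim.split("\\s+")
    val n = f.length
    // Header rows and short rows have no "NN%" capacity column, so they are skipped.
    if (n < 5 || !f(n - 2).endsWith("%")) None
    else if (uninterestingMounts.contains(f(n - 1))) None
    else
      try Some(MountUsage(f(n - 1), f(n - 5).toLong, f(n - 4).toLong, f(n - 3).toLong,
                          f(n - 2).stripSuffix("%").toDouble))
      catch { case _: NumberFormatException => None }
  }
}

// Sample row taken from the comment in the parser.
println(StorageLineParser.parse("/dev/sda1  295561     24017    256284   9% /boot"))
// prints Some(MountUsage(/boot,295561,24017,256284,9.0))
```

Rows for excluded mount points such as /dev return None, just as the parser skips them before writing any metric.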

Why is this important?
Disk space usage needs to be monitored because a full disk prevents any more data from being written to it. Compressing and moving data off a disk that is already 100% full is time-consuming, so it is important to address the issue early.

Without Indeni how would you find this?
An administrator could log in and manually check disk space usage. Vendors generally provide tools that give access to this information.

junos-show-system-storage-detail

#! META
name: junos-show-system-storage-detail
description: Retrieve the storage status of JUNOS devices
type: monitoring
monitoring_interval: 10 minute
requires:
    vendor: juniper
    os.name: junos

#! COMMENTS
disk-usage-percentage:
    why: |
        Disk space usage needs to be monitored because a full disk prevents any more data from being written to it. Compressing and moving data off a disk that is already 100% full is time-consuming, so it is important to address the issue early.
    how: |
        This script logs in to the Juniper JUNOS-based device over SSH and retrieves the output of the "show system storage detail" command. The output includes the device's storage utilization.
    without-indeni: |
        An administrator could log in and manually check disk space usage. Vendors generally provide tools that give access to this information.
    can-with-snmp: true
    can-with-syslog: false
    vendor-provided-management: |
        This is accessible from the command line interface or vendor-provided tools, as well as SNMP.

disk-used-kbytes:
    why: |
        Shows how much of the partition, in kilobytes, is in use. If the file system fills up, data that should be written to disk can be lost.
    how: |
        This script logs in to the Juniper JUNOS-based device over SSH and retrieves the output of the "show system storage detail" command. The output includes the device's storage utilization.
    without-indeni: |
        An administrator could log in and manually check disk space usage. Vendors generally provide tools that give access to this information.
    can-with-snmp: true
    can-with-syslog: false
    vendor-provided-management: |
        This is accessible from the command line interface or vendor-provided tools, as well as SNMP.

disk-total-kbytes:
    why: |
        Shows the total partition size, in kilobytes.
    how: |
        This script logs in to the Juniper JUNOS-based device over SSH and retrieves the output of the "show system storage detail" command. The output includes the device's storage utilization.
    without-indeni: |
        An administrator could log in and manually check disk space usage. Vendors generally provide tools that give access to this information.
    can-with-snmp: true
    can-with-syslog: false
    vendor-provided-management: |
        This is accessible from the command line interface or vendor-provided tools, as well as SNMP.

#! REMOTE::SSH
show chassis hardware node local | match node
show system storage detail

#! PARSER::AWK
BEGIN {
    node0 = 0
    node1 = 0
    cluster = 0
    # List of uninteresting mount points for JUNOS specifically:
    uninterestingmounts["/dev"] = "true"
    uninterestingmounts["/jail/dev"] = "true"
    uninterestingmounts["/junos"] = "true"
    uninterestingmounts["/junos/cf/dev"] = "true"
    uninterestingmounts["/junos/dev"] = "true"
    uninterestingmounts["/proc"] = "true"
}

#node0:
/^node0/ {
    node0++ 
    cluster = 1
}

#node1:
/^node1/ {
    node1++
    cluster = 1
    if (node0 == 2) {
        node0 = 1
    }
}

#/dev/sda1  295561     24017    256284   9% /boot
/[0-9]+%/ {
    mount = trim($NF)

    if (cluster == 0 || node0 == 2 || node1 == 2) {
        if (!uninterestingmounts[mount]) {
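            usage = $(NF-1)        # Capacity column, e.g. "9%"
            sub(/%/, "", usage)
            available = $(NF-2)    # Avail column (parsed but not reported)
            used = $(NF-3)         # Used column, reported in kilobytes
            total = $(NF-4)        # Total size column, reported in kilobytes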

            mounttags["file-system"] = mount

            writeDoubleMetricWithLiveConfig("disk-usage-percentage", mounttags, "gauge", "60", usage, "Mount Points - Usage", "percentage", "file-system")
            writeDoubleMetricWithLiveConfig("disk-used-kbytes", mounttags, "gauge", "60", used, "Mount Points - Used", "kbytes", "file-system")
            writeDoubleMetricWithLiveConfig("disk-total-kbytes", mounttags, "gauge", "60", total, "Mount Points - Total", "kbytes", "file-system")
        }  
    }
}
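Judging by the parser's logic, "show system storage detail" on a chassis cluster lists each node's file systems under a "node0:" or "node1:" header. The node0/node1 counters keep only the local node's rows. The first command prints the local node's name once, so that node's counter reaches 2 exactly when its own section of the storage output begins. The Scala sketch below replays the same counter logic over a list of lines so the selection can be checked in isolation. ClusterRowFilter and the sample lines are hypothetical, and the exact layout of real cluster output may differ.

```scala
object ClusterRowFilter {
  // Same idea as the parser's data-row pattern: a number followed by "%".
  private val Percent = "[0-9]+%".r

  // Replays the parser's node0/node1 counters and returns the rows it would report.
  def rowsToKeep(lines: Seq[String]): Seq[String] = {
    var node0 = 0
    var node1 = 0
    var cluster = false
    val kept = Seq.newBuilder[String]
    for (line <- lines) {
      if (line.startsWith("node0")) { node0 += 1; cluster = true }
      if (line.startsWith("node1")) {
        node1 += 1
        cluster = true
        if (node0 == 2) node0 = 1 // the local node0 section has ended
      }
      val isDataRow = Percent.findFirstIn(line).isDefined
      // Standalone devices keep every row; clusters keep only the local node's rows.
      if (isDataRow && (!cluster || node0 == 2 || node1 == 2)) kept += line
    }
    kept.result()
  }
}

// Hypothetical cluster output in which the local node is node1.
val clusterSample = Seq("node1:", "node0:", "/dev/a 10 5 5 50% /", "node1:", "/dev/b 10 9 1 90% /")
println(ClusterRowFilter.rowsToKeep(clusterSample))
```

With node1 as the local node, only the "/dev/b" row survives. If the first line were "node0:" instead, only the "/dev/a" row would be kept.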

high_per_chassis_blade_mount_space

package com.indeni.server.rules.library

import com.indeni.ruleengine.Scope.{Scope, ScopeValueHelper}
import com.indeni.ruleengine.expressions.Expression
import com.indeni.ruleengine.expressions.conditions.{ConditionHelper, GreaterThanOrEqual}
import com.indeni.ruleengine.expressions.core.{StatusTreeExpression, _}
import com.indeni.ruleengine.expressions.data._
import com.indeni.ruleengine.expressions.math.AverageExpression
import com.indeni.ruleengine.expressions.scope.{ScopableExpression, ScopeValueExpression}
import com.indeni.server.common.ParameterValue
import com.indeni.server.common.data.conditions.{Equals, True}
import com.indeni.server.params.ParameterDefinition
import com.indeni.server.params.ParameterDefinition.UIType
import com.indeni.server.rules._
import com.indeni.server.rules.config.expressions.DynamicParameterExpression
import com.indeni.server.rules.library.core.PerDeviceRule
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity


/**
  * Created by amir on 04/02/2016.
  */
case class HighPerChassisBladeMountSpaceUsageRule() extends PerDeviceRule {

  private val remediation = ConditionalRemediationSteps("Review the contents of the mount points to see what can be deleted or moved and attempt to identify whether there's a specific cause for this.",
    ConditionalRemediationSteps.OS_NXOS ->
      """|1. Run the "show system internal flash" NX-OS command to display the file system utilization (the output is similar to df -hT included in the memory alerts log).
         |Note: When reviewing this output, the value of "none" in the Filesystem column means that it is a tmpfs type, so the files exist in RAM only.
         |2. Determine what types of files are filling the partition and where they came from (cores/debugs/etc).
         |3. Delete the files to reduce disk utilization, but try to determine what type of files are taking up the space and what process left them in tmpfs.
         |4. Use the following commands to further troubleshoot the issue:
         | a. "show system internal dir <full directory path>" command lists all the files and sizes for the specified path (hidden command)
         | b. "filesys delete <full file path>" command deletes a specific file (hidden command)
         |Note: Use caution when using this command. You cannot recover a deleted file.""".stripMargin,
    ConditionalRemediationSteps.VENDOR_JUNIPER ->
      """|1. Routinely clean up unused files with the "request system storage cleanup" command.
         |2. Remove debug files once debugging is complete.
         |3. Configure the device to send logs to remote log servers.
         |4. Review the following article on the Juniper tech support site: <a target="_blank" href="https://www.juniper.net/documentation/en_US/junos/topics/reference/command-summary/show-system-storage-partitions.html">Operational Commands: show system storage partitions (View SRX Series)</a>.
         |5. If the problem persists, contact the Juniper Networks Technical Assistance Center (JTAC).""".stripMargin
  )

  private val excludeDisks = Set("/dev", "/mnt/cdrom", "/proc", "/dev/shm")

  private val highThresholdParameterName: String = "High_Threshold_of_Space_Usage"

  private val highThresholdParameter = new ParameterDefinition(highThresholdParameterName,
    "",
    "High Threshold of Space Usage",
    "What is the threshold for the mount point's disk usage for which once it is crossed an issue will be triggered.",
    UIType.DOUBLE,
    new ParameterValue((80.0).asInstanceOf[Object])
  )


  override val metadata: RuleMetadata = RuleMetadata.builder("high_per_chassis_blade_mount_space", "High Mount Space per Chassis and Blade", "Alert when Disk usage is high", AlertSeverity.ERROR).configParameter(highThresholdParameter).build()

  override def expressionTree(context: RuleContext): StatusTreeExpression = {

    val usagePercentage = AverageExpression(TimeSeriesExpression[Double]("disk-usage-percentage"))
    val usagePercentageThreshold = DynamicParameterExpression.withConstantDefault(highThresholdParameter.getName, highThresholdParameter.getDefaultValue.asDouble.toDouble).noneable
    val isUsagePercentageAboveThreshold = GreaterThanOrEqual(usagePercentage, usagePercentageThreshold)

    val shouldCheckDisk = ScopeValueExpression("file-system").visible().isIn(excludeDisks).not

    val isDiskWithIssue = com.indeni.ruleengine.expressions.conditions.And(isUsagePercentageAboveThreshold, shouldCheckDisk)

    val mountSpaceFailDescription = new ScopableExpression[String] {
      override protected def evalWithScope(time: Long, scope: Scope): String =
        "Storage usage (" + "%.2f".format(usagePercentage.eval(time).get) + "%) above threshold (" + "%.2f".format(usagePercentageThreshold.eval(time).get) + "%) " +
          "for chassis: " + scope.getVisible("Chassis").get + ", blade: " + scope.getVisible("Blade").get + ", mount point: " + scope.getVisible("file-system").get

      override def args: Set[Expression[_]] = Set(usagePercentage, usagePercentageThreshold)
    }
    val mountSpaceFailHeadline = new ScopableExpression[String] {
      override protected def evalWithScope(time: Long, scope: Scope): String = "chassis: " + scope.getVisible("Chassis").get + ", blade: " + scope.getVisible("Blade").get + ", mount point: " + scope.getVisible("file-system").get

      override def args: Set[Expression[_]] = Set()
    }

    val tsQuery = SelectTimeSeriesExpression[Double](context.tsDao, Set("disk-usage-percentage"))
    val forTsCondition = StatusTreeExpression(tsQuery, isDiskWithIssue).withSecondaryInfo(
      mountSpaceFailHeadline, mountSpaceFailDescription, title = "Problematic Mount Points"
    ).asCondition()

    val disksQuery = SelectTagsExpression(context.tsDao, Set("Chassis", "Blade", "file-system"), True)
    val highMountSpacePerDevicePerDiskLogic = StatusTreeExpression(disksQuery, forTsCondition).withoutInfo().asCondition()

    val headline = ConstantExpression("High storage usage has been measured")
    val description = ConstantExpression("Some mounts/drives have reached a high level of storage use. This may result in system failure in the near future.")

    val devicesFilter = Equals("model", "CheckPoint61k")
    val devicesQuery = SelectTagsExpression(context.metaDao, Set(DeviceKey), devicesFilter)

    StatusTreeExpression(devicesQuery, highMountSpacePerDevicePerDiskLogic).withRootInfo(
      headline, description, remediation
    )
  }
}
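The core decision in expressionTree can be paraphrased in plain Scala. For each (Chassis, Blade, file-system) combination, the rule averages the disk-usage-percentage samples and skips mount points listed in excludeDisks. It raises an issue when the average is greater than or equal to the High_Threshold_of_Space_Usage parameter (default 80.0). The sketch below does not use the Indeni rule-engine API. MountSpaceCheck and evaluate are hypothetical names, and the averaging window is whatever the rule engine supplies (it is not modelled here). formatLocal with Locale.ROOT is a choice made here so the two-decimal output does not depend on the JVM's default locale. The rule's device filter (model tag equal to "CheckPoint61k") is also left out of the sketch.

```scala
import java.util.Locale

object MountSpaceCheck {
  // Default threshold and exclusion list copied from HighPerChassisBladeMountSpaceUsageRule.
  val DefaultThreshold: Double = 80.0
  val excludeDisks: Set[String] = Set("/dev", "/mnt/cdrom", "/proc", "/dev/shm")

  // Two-decimal formatting, pinned to Locale.ROOT for stable output.
  private def pct(x: Double): String = "%.2f".formatLocal(Locale.ROOT, x)

  // Returns the issue description when the average usage reaches the threshold.
  def evaluate(chassis: String, blade: String, mount: String, samples: Seq[Double],
               threshold: Double = DefaultThreshold): Option[String] =
    if (samples.isEmpty || excludeDisks.contains(mount)) None
    else {
      val avg = samples.sum / samples.size
      if (avg >= threshold)
        Some(s"Storage usage (${pct(avg)}%) above threshold (${pct(threshold)}%) " +
          s"for chassis: $chassis, blade: $blade, mount point: $mount")
      else None
    }
}

println(MountSpaceCheck.evaluate("1", "2", "/var", Seq(78.0, 84.0, 90.0)))
// prints Some(Storage usage (84.00%) above threshold (80.00%) for chassis: 1, blade: 2, mount point: /var)
```

Because the comparison is greater-than-or-equal, an average of exactly 80.0 already triggers the issue with the default threshold.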