Core dump files found-juniper-junos


Vendor: juniper

OS: junos

Description:
A core dump is created when a process crashes. Indeni will alert when a core dump file is created.

Remediation Steps:
The list of core dumps is retrieved by logging into the shell over SSH and retrieving the details of files found in /var/log/dump/usermode/, /var/tmp/*core* or /var/crash. Investigate the core dump files. If the cause is not clear, open a case with vendor support and send them the file.
1. Run the “show system core-dumps” command to locate system core files. Core dump files are usually located in “/var/log/dump/usermode/”, “/var/tmp/*core*” or “/var/crash”.
2. Check for core dumps that were created at the time of a failover.
3. If a core dump was created at the time of a failover, consider uploading the file to the Juniper Networks Technical Assistance Center (JTAC) FTP server for further analysis and opening a case with JTAC.
4. Review the following articles on Juniper TechLibrary for more information:
   Syslog message: Core dumped (KB18867)
   How to get a core-dump off the router and to the Juniper FTP server (KB26963)

How does this work?
This script periodically runs the “show system core-dumps” command to check whether the system has generated any new core files.
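As a rough illustration of this check, the sketch below picks the core file paths out of the command output. Python is used only for illustration; the actual script uses the AWK parser shown later on this page. The function name `core_file_paths` is hypothetical, and the error line in the sample is an assumption about how an empty directory is reported.

```python
from typing import List


def core_file_paths(cli_output: str) -> List[str]:
    """Return the path of every regular file listed in the command output."""
    paths = []
    for line in cli_output.splitlines():
        # Regular files appear as ls-style entries whose mode string starts with "-r".
        if line.startswith("-r"):
            paths.append(line.split()[-1])
    return paths


# The file entry is the sample line from the parser comment on this page.
# The first line is an assumed example of a directory without core files.
SAMPLE = """\
/var/crash/*core*: No such file or directory
-rw-r--r--  1 root  wheel          0 May 21 16:58 /var/tmp/vmcore.0
"""

print(core_file_paths(SAMPLE))  # ['/var/tmp/vmcore.0']
```

Lines that are not file entries are ignored, so a device without core dumps yields an empty list.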

Why is this important?
When a process or the system crashes, it usually generates core files for later analysis. These files help identify the root cause of the crash.

Without Indeni how would you find this?
An administrator would have to log in to the device and run the “show system core-dumps” command to locate the core files.

junos-show-system-core-dumps

#! META
name: junos-show-system-core-dumps
description: check whether the SRX device has core dumps 
type: monitoring
monitoring_interval: 10 minute
requires:
     vendor: juniper
     os.name: junos
     product: firewall

#! COMMENTS
core-dumps:
    why: |
        When a process or the system crashes, it usually generates core files for later analysis. These files help identify the root cause of the crash.
    how: |
        This script periodically runs the "show system core-dumps" command to check whether the system has generated any new core files.
    without-indeni: |
        An administrator would have to log in to the device and run the "show system core-dumps" command to locate the core files.
    can-with-snmp: false
    can-with-syslog: false
    vendor-provided-management: |
        An administrator has to log in to the device to check whether a crash has produced any core files. Because this check is manual, it delays root-cause analysis.
total-core-dumps:
    why: |
        Under normal operation, there should be no core files. More than one core file may indicate that crashes are recurring or that several processes are affected.
    how: |
        This script periodically runs the "show system core-dumps" command and counts the core files it lists.
    without-indeni: |
        An administrator would have to log in to the device and run the "show system core-dumps" command to locate the core files.
    can-with-snmp: false
    can-with-syslog: false
    vendor-provided-management: |
        An administrator has to log in to the device to check whether a crash has produced any core files. Because this check is manual, it delays root-cause analysis.

#! REMOTE::SSH
show system uptime | match Current
show chassis hardware node local | match node
show system core-dumps 

#! PARSER::AWK
BEGIN {
    node0 = 0
    cluster = 0
    node_idx = 0
    file_idx = 1
    node_core_files[0] = 0
    node_core_files[1] = 0
    month["Jan"] = 1
    month["Feb"] = 2
    month["Mar"] = 3
    month["Apr"] = 4
    month["May"] = 5
    month["Jun"] = 6
    month["Jul"] = 7
    month["Aug"] = 8
    month["Sep"] = 9
    month["Oct"] = 10
    month["Nov"] = 11
    month["Dec"] = 12
}

#Current time: 2017-06-10 06:29:10 UTC
/^Current/ {
    cur_date = $3
    split(cur_date, YMD, "-")
    created_year = YMD[1]
}

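#node0:
# On a cluster, "node0:" heads the node0 section of the core-dump listing.
# It is expected a second time, from "show chassis hardware node local",
# when the local node is node0.
/^node0/ {
    node0++
    cluster = 1
}

#node1:
/^node1/ {
    # Switch to the node1 list only once a "node0:" line has been seen.
    # A "node1:" line seen before that comes from the chassis hardware output.
    if ( node0 > 0 ) {
        node_idx++
        file_idx = 1
    }
    cluster = 1
}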

#-rw-r--r--  1 root  wheel          0 May 21 16:58 /var/tmp/vmcore.0
/^(-r)/ {
    core_file = $NF
    created_month = month[$(NF-3)]
    created_date  = $(NF-2)
    time_or_year  = $(NF-1)
    # The listing shows HH:MM for recent files and the year for older ones.
    # The file's year is kept in its own variable so that a year taken from
    # an old file does not carry over to the entries that follow it.
    if ( time_or_year !~ /:/ ) {
        file_year = time_or_year
        created_hour = 0
        created_minute = 0
    } else {
        # created_year holds the year of the device clock ("Current time")
        file_year = created_year
        split(time_or_year, HM, ":")
        created_hour = HM[1]
        created_minute = HM[2]
    }

    created_time = datetime(file_year, created_month, created_date, created_hour, created_minute, 0)
    # Entries listed without a directory are placed under /var/crash/corefiles/
    if ( core_file !~ /\// ) {
        core_file = "/var/crash/corefiles/" core_file
    }
    if ( node_idx == 0 ) {
        node0_core_files[file_idx,"path"] = core_file
        node0_core_files[file_idx,"created"] = created_time
    } else {
        node1_core_files[file_idx,"path"] = core_file
        node1_core_files[file_idx,"created"] = created_time
    }
    file_idx++
}

END {
    # file_idx is reset when the node1 section starts, so on a cluster this
    # test reflects the last section that was parsed.
    if ( file_idx > 1) {
        if ( cluster == 1 ) {
            # Report only the core files of the local node.
            if ( node0 == 2 ) {
                node_idx = 0
                cluster_node["node"] = "node0"
                writeComplexMetricObjectArray( "core-dumps", null, node0_core_files )
            } else {
                node_idx = 1
                cluster_node["node"] = "node1"
                writeComplexMetricObjectArray( "core-dumps", null, node1_core_files )
            }
        } else {
            node_idx = 0
            cluster_node["node"] = "standalone"
            writeComplexMetricObjectArray( "core-dumps", null, node0_core_files )
        }
    }
}
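
The date handling in the parser above can be sketched outside AWK as follows. This is a minimal Python illustration, not the script's own code: `created_at` and `MONTHS` are hypothetical names, and the previous-year adjustment for entries that would otherwise fall after the device clock is an assumption that the parser above does not attempt.

```python
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}


def created_at(month: str, day: str, time_or_year: str, now: datetime) -> datetime:
    """Resolve the date columns of one ls-style entry to a full timestamp."""
    m, d = MONTHS[month], int(day)
    if ":" not in time_or_year:
        # Older entries show the year and no time of day.
        return datetime(int(time_or_year), m, d)
    hour, minute = (int(part) for part in time_or_year.split(":"))
    year = now.year
    # Assumption: an entry shown with a time of day is recent, so one that
    # would fall after the device clock belongs to the previous year.
    if (m, d, hour, minute) > (now.month, now.day, now.hour, now.minute):
        year -= 1
    return datetime(year, m, d, hour, minute)


# Device clock taken from the "Current time" sample in the parser comments.
now = datetime(2017, 6, 10, 6, 29, 10)
print(created_at("May", "21", "16:58", now))  # 2017-05-21 16:58:00
print(created_at("Dec", "30", "23:10", now))  # 2016-12-30 23:10:00
print(created_at("Nov", "3", "2015", now))    # 2015-11-03 00:00:00
```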



cross_vendor_core_dump_created

package com.indeni.server.rules.library.crossvendor

import com.indeni.ruleengine.expressions.conditions.And
import com.indeni.ruleengine.expressions.core.{StatusTreeExpression, _}
import com.indeni.ruleengine.expressions.data._
import com.indeni.ruleengine.expressions.utility.IsEmptyExpression.IsEmptyExpressionHelper
import com.indeni.server.common.data.conditions.True
import com.indeni.server.rules._
import com.indeni.server.rules.library.core.PerDeviceRule
import com.indeni.server.rules.library.{ConditionalRemediationSteps, RuleHelper}
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity

import scala.language.reflectiveCalls

case class CrossVendorCoreDumpCreatedRule(context: RuleContext) extends PerDeviceRule with RuleHelper {

  override val metadata: RuleMetadata = RuleMetadata.builder("cross_vendor_core_dump_created", "All Devices: Core dump files found",
    "A core dump is created when a process crashes. Indeni will alert when a core dump file is created.", AlertSeverity.ERROR)
    .build()

  override def expressionTree: StatusTreeExpression = {
    val currentValue = SnapshotExpression("core-dumps").asMulti().mostRecent().value()
    val previousValue = SnapshotExpression("core-dumps").asMulti().middle().optionValue()

    StatusTreeExpression(
      // Which objects to pull (normally, devices)
      SelectTagsExpression(context.metaDao, Set(DeviceKey), True),

      // What constitutes an issue
      StatusTreeExpression(
        // The time-series we check the test condition against:
        SelectSnapshotsExpression(context.snapshotsDao, Set("core-dumps")).multi(),

        // The condition which, if true, we have an issue. Checked against the time-series we've collected
        And(
          currentValue.nonEmpty,
          previousValue.isNot(currentValue)
        )

      ).withoutInfo().asCondition()

      // Details of the alert itself
    ).withRootInfo(
      getHeadline(),
      ConstantExpression("A core dump file has been created. This happens when a process crashes, and the core dump can contain information on why this happened. This usually means that there is an issue that should be investigated."),
      ConditionalRemediationSteps("The list of core dumps is retrieved by logging into the shell over SSH and retrieving the details of files found in /var/log/dump/usermode/, /var/tmp/*core* or /var/crash. Investigate the core dump files. If the issue is not clear, open up a case with vendor support and send them the file.",
        ConditionalRemediationSteps.VENDOR_F5 ->
          """|Login to your device with SSH and run "ls /var/core". Investigate the core dump files.
             |If the cause of the issue is not clear, open up a case with F5 and follow the instructions on this page in order to provide them with the information needed:
             |<a target="_blank" href="https://support.f5.com/csp/article/K10062">Article: K10062 on AskF5</a>.""".stripMargin,
        ConditionalRemediationSteps.VENDOR_JUNIPER ->
          """|1. Run "show system core-dumps" command to locate system core files. The core dump files usually are located in "/var/log/dump/usermode/", "/var/tmp/*core*" or "/var/crash" directories.
             |2. Check for core-dumps that were created at the time of a failover.
             |3. If there is a core-dump created at the time of failover, consider uploading the file to the Juniper Networks Technical Assistance Center (JTAC) FTP server for further analysis and opening a case with the JTAC.
             |4. Review the following articles on Juniper TechLibrary for more information: <a target="_blank" href="https://kb.juniper.net/InfoCenter/index?page=content&id=KB18867">Syslog message: Core dumped</a>
             |<a target="_blank" href="https://kb.juniper.net/InfoCenter/index?page=content&id=KB26963">How to get a core-dump off the router and to the Juniper FTP server</a>""".stripMargin)
    )
  }
}
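
The alert condition in the rule above (the current snapshot is non-empty and differs from the earlier one) can be modeled in a few lines. This is a simplified Python sketch, not the rule engine's actual evaluation: `should_alert` is a hypothetical name, the "created" value is made up for the example, and treating a missing earlier snapshot as "different" is an assumption.

```python
from typing import List, Optional


def should_alert(current: List[dict], previous: Optional[List[dict]]) -> bool:
    """True when core dumps are listed and the list has changed."""
    if not current:
        # No core dumps on the device: nothing to alert on.
        return False
    # Assumption: a missing earlier snapshot (None) counts as a change.
    return previous != current


# Hypothetical snapshot entry using the two fields the parser writes.
dump = {"path": "/var/tmp/vmcore.0", "created": "1495385880"}

print(should_alert([dump], []))      # True: a core dump has appeared
print(should_alert([dump], [dump]))  # False: nothing changed
print(should_alert([], [dump]))      # False: no core dumps present
```

Comparing whole snapshots means that a removed or replaced core file also triggers the condition, as long as at least one file is still listed.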