Device restarted (uptime low)-fortinet-FortiOS

health-checks
critical
fortios
fortinet

Vendor: fortinet

OS: FortiOS

Description:
Indeni will alert when a device has restarted.

Remediation Steps:
Determine why the device was restarted.

  1. Note the time of the system reboot.
  2. Review the log messages, focusing on error messages generated at least 5 minutes before the reboot, especially if the reboot was unexpected.
  3. Check whether a scheduled restart is configured, to confirm that this was an irregular restart:
     - config system global
     - get | grep restart
     - end
  4. Log in via SSH to the Fortinet firewall and review the crash log in a readable format by using the FortiOS command “diagnose debug crashlog read”.
  5. Contact Fortinet Technical Support at https://support.fortinet.com/ for further assistance.

How does this work?
Indeni uses the built-in Fortinet “get system performance status” command to retrieve the current device up-time.
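
As a rough illustration of that retrieval, the sketch below converts the Uptime line of the command output into milliseconds with awk. The function name uptime_ms and the sample input are assumptions made for this example; the field positions assume the "Uptime: D days,  H hours,  M minutes" layout shown in the parser comments further down this page.

```shell
# Hypothetical helper (not part of Indeni): reads "get system performance status"
# output on stdin and prints the uptime in milliseconds.
uptime_ms() {
    awk '/^Uptime:/ {
        # $2 = days, $4 = hours, $6 = minutes
        seconds = $2 * 86400 + $4 * 3600 + $6 * 60
        # %.0f avoids integer overflow in awk builds with 32-bit %d
        printf "%.0f\n", seconds * 1000
    }'
}

# Sample line written for this example, not captured from a real device
echo "Uptime: 3 days,  6 hours,  10 minutes" | uptime_ms
```

With the sample line, 3 days, 6 hours, and 10 minutes add up to 281400 seconds, so the function prints 281400000.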

Why is this important?
Capture the uptime of the device. If the uptime is lower than the previous sample, the device must have reloaded.
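
The sample-to-sample comparison described above can be sketched as follows. This is only an illustration of the idea, not the Indeni rule itself: the cross_vendor_uptime_low rule listed later on this page uses a fixed 60-minute threshold instead. The function name detect_restart and the sample values are hypothetical.

```shell
# Hypothetical helper: reads uptime samples in milliseconds, one per line,
# and reports every sample that is lower than the one before it.
detect_restart() {
    awk 'NR > 1 && $1 + 0 < prev {
             printf "sample %d: uptime dropped from %.0f to %.0f ms\n", NR, prev, $1
         }
         { prev = $1 + 0 }'
}

# Three made-up samples; the third is lower than the second
printf '%s\n' 281400000 281460000 60000 | detect_restart
```

For the three samples shown, only the third line is reported, because 60000 is lower than the preceding 281460000.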

Without Indeni how would you find this?
An administrator could log in and manually run the command via the CLI, check the system resources widget via the GUI, enable SNMP, or use Fortinet FortiAnalyzer.

fortios-get-system-performance-status

#! META
name: fortios-get-system-performance-status
description: Performance metrics based on "get system performance status" command on Fortinet firewall
type: monitoring
monitoring_interval: 1 minute
includes_resource_data: true
requires:
    vendor: "fortinet"
    os.name: "FortiOS"
    product: "firewall"
    vdom_enabled: "false"

#! COMMENTS
memory-usage:
    why: |
        If the firewall memory becomes fully utilized, performance may be impacted and traffic may be dropped, and in extreme cases the firewall could crash. It is critical to monitor the memory usage and handle the issue prior to resource exhaustion.
    how: |
        Indeni uses the built-in Fortinet "get system performance status" command to retrieve the device memory utilization.
    without-indeni: |
        An administrator could log in and manually run the command via the CLI, check the system resources widget via the GUI, enable SNMP, configure a syslog server for a log message every 5 minutes containing the utilization, or use Fortinet FortiAnalyzer.
    can-with-snmp: true
    can-with-syslog: true

cpu-usage:
    why: |
        If the firewall CPU becomes fully utilized, performance may be impacted and traffic may be dropped, and in extreme cases the firewall could crash. It is critical to monitor the CPU usage and handle the issue prior to resource exhaustion.
    how: |
        Indeni uses the built-in Fortinet "get system performance status" command to retrieve the device CPU utilization.
    without-indeni: |
        An administrator could log in and manually run the command via the CLI, check the system resources widget via the GUI, enable SNMP, configure a syslog server for a log message every 5 minutes containing the utilization, or use Fortinet FortiAnalyzer.
    can-with-snmp: true
    can-with-syslog: true

uptime-milliseconds:
    why: |
        Capture the uptime of the device. If the uptime is lower than the previous sample, the device must have reloaded.
    how: |
        Indeni uses the built-in Fortinet "get system performance status" command to retrieve the current device up-time.
    without-indeni: |
        An administrator could log in and manually run the command via the CLI, check the system resources widget via the GUI, enable SNMP, or use Fortinet FortiAnalyzer.
    can-with-snmp: true
    can-with-syslog: false

memory-free-kbytes:
    skip-documentation: true
memory-total-kbytes:
    skip-documentation: true
memory-used-kbytes:
    skip-documentation: true

#! REMOTE::SSH
get system performance status

#! PARSER::AWK

function writeCpuUsageMetric(id, cpuIdleAmount, cpuIsAverage) {
    sub(/%/, "", cpuIdleAmount)

    tags_cpu["cpu-id"] = id
    tags_cpu["cpu-is-avg"] = cpuIsAverage
    tags_cpu["resource-metric"] = "true"
    writeDoubleMetricWithLiveConfig("cpu-usage", tags_cpu, "gauge", 0, 100 - cpuIdleAmount, "CPU Usage", "percentage", "cpu-id")
}

# v5.4
#Memory states: 66% used
/^Memory states:/ {
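    # Strip the "%" sign rather than taking a fixed two characters, so that
    # single-digit values and 100% are parsed correctly.
    memory_usage = $3
    sub(/%/, "", memory_usage)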

    # the following "RAM" tag value does NOT surface in the UI. It's here just to satisfy the
    # requirements of the rule -- for some reason, we need to have this tag _with_ a value for things
    # to function properly.

    tags_memory["name"] = "RAM"
    tags_memory["resource-metric"] = "true"
    writeDoubleMetricWithLiveConfig("memory-usage", tags_memory, "gauge", 0, memory_usage, "Memory Usage", "percentage", "")
}

# v5.6
#Memory: 1019996k total, 354312k used (34%), 665684k free (66%), 1616k buffers
/^Memory:/ {
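    # $6 looks like "(34%),". Keep only the numeric part so that single-digit
    # values and 100% are parsed correctly.
    percent_memory_usage = $6
    gsub(/[^0-9.]/, "", percent_memory_usage)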
    free = substr($7, 1, length($7) - 1)
    total = substr($2, 1, length($2) - 1)
    used = substr($4, 1, length($4) - 1)

    tags_memory["name"] = "Memory: Free"
    writeDoubleMetricWithLiveConfig("memory-free-kbytes", tags_memory, "gauge", "60", free, "Memory Usage", "kilobytes", "name")

    tags_memory["name"] = "Memory: Total"
    writeDoubleMetricWithLiveConfig("memory-total-kbytes", tags_memory, "gauge", "60", total, "Memory Usage", "kilobytes", "name")

    tags_memory["name"] = "Memory: Used"
    writeDoubleMetricWithLiveConfig("memory-used-kbytes", tags_memory, "gauge", "60", used, "Memory Usage", "kilobytes", "name")

    tags_memory["name"] = "Memory Usage"
    tags_memory["resource-metric"] = "true"
    writeDoubleMetricWithLiveConfig("memory-usage", tags_memory, "gauge", 0, percent_memory_usage, "Memory Usage", "percentage", "name")
}

# This section handles the "per core" metrics for CPU usage
# v5.4
#CPU0 states: 2% user 4% system 0% nice 94% idle
# v5.6
#CPU1 states: 6% user 8% system 0% nice 86% idle 0% iowait 0% irq 0% softirq
/^CPU[0-9]+ states:/ {
    writeCpuUsageMetric("Per Core - " $1, $9, "false")
}

# "CPU states:" shows the average CPU usage across all CPU cores
# v5.4
#CPU states: 8% user 10% system 0% nice 82% idle
# v5.6
#CPU states: 4% user 6% system 0% nice 90% idle 0% iowait 0% irq 0% softirq
/^CPU states:/ {
    writeCpuUsageMetric("Average - CPU", $9, "true")
}

#Uptime: 3 days,  6 hours,  10 minutes
/^Uptime:/ {
    days = $2
    hours = $4
    minutes = $6
    uptime_in_seconds = days * 86400 + hours * 3600 + minutes * 60
    # Display the uptime in Overview - Live Config (the metric value is in milliseconds)
    writeDoubleMetricWithLiveConfig("uptime-milliseconds", null, "gauge", 0, (uptime_in_seconds*1000), "Device Uptime", "duration", "")
}



cross_vendor_uptime_low

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.apidata.time.TimeSpan
import com.indeni.apidata.time.TimeSpan.TimePeriod
import com.indeni.server.common.data.conditions.Equals
import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library._
import com.indeni.server.rules.library.templates.TimeIntervalThresholdOnDoubleMetricTemplateRule
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity

case class cross_vendor_uptime_low() extends TimeIntervalThresholdOnDoubleMetricTemplateRule(
  ruleName = "cross_vendor_uptime_low",
  ruleFriendlyName = "All Devices (Non-VSX): Device restarted (uptime low)",
  ruleDescription = "Indeni will alert when a device has restarted.",
  severity = AlertSeverity.CRITICAL,
  metricName = "uptime-milliseconds",
  threshold = TimeSpan.fromMinutes(60),
  metricUnits = TimePeriod.MILLISECOND,
  thresholdDirection = ThresholdDirection.BELOW,
  alertDescriptionFormat = "The current uptime is %.0f seconds which seems to indicate the device has restarted.",
  alertDescriptionValueUnits = TimePeriod.SECOND,
  baseRemediationText = "Determine why the device was restarted.",
  metaCondition = !Equals("vsx", "true")
)(
  ConditionalRemediationSteps.OS_NXOS ->
    """|
       |1. Use the "show version" or "show system reset-reason" NX-OS commands to display the reason for the reload.
       |2. Use the "show cores" command to determine if a core file was recorded during the unexpected reboot.
       |3. Run the "show process log" command to display the processes and if a core was created.
       |4. Use the "show logging" command to review the events that happened close to the time of the reboot.""".stripMargin,
  ConditionalRemediationSteps.VENDOR_FORTINET ->
    """
      |1. Note the time of the system reboot.
      |2. Review the log messages, focusing on error messages generated at least 5 minutes before the reboot, especially if the reboot was unexpected.
      |3. Check whether a scheduled restart is configured, to confirm that this was an irregular restart:
      |   - config system global
      |   - get | grep restart
      |   - end
      |4. Log in via SSH to the Fortinet firewall and review the crash log in a readable format by using the FortiOS command “diagnose debug crashlog read”.
      |5. Contact Fortinet Technical Support at https://support.fortinet.com/ for further assistance.""".stripMargin
)
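
To make the units in the rule above concrete: the metric arrives in milliseconds, the threshold is 60 minutes (3,600,000 ms), and the alert text reports seconds. The sketch below mimics that check with awk. It is an illustration only and makes no claim about how TimeIntervalThresholdOnDoubleMetricTemplateRule is implemented, since its internals are not shown here. The function name uptime_alert is hypothetical.

```shell
# Hypothetical helper: takes an uptime in milliseconds as its first argument
# and prints the alert text when the value is below the 60-minute threshold.
uptime_alert() {
    awk -v ms="$1" 'BEGIN {
        threshold_ms = 60 * 60 * 1000   # 60 minutes expressed in milliseconds
        if (ms + 0 < threshold_ms)
            printf "The current uptime is %.0f seconds which seems to indicate the device has restarted.\n", ms / 1000
    }'
}

uptime_alert 720000      # 12 minutes of uptime: prints the alert text
uptime_alert 281400000   # roughly 3 days of uptime: prints nothing
```

The first call is below the threshold and reports 720 seconds; the second is well above it and produces no output.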