Device uptime too high-fortinet-FortiOS

error
ongoing-maintenance
fortios
fortinet

Vendor: fortinet

OS: FortiOS

Description:
Indeni will alert when a device’s uptime is too high.

Remediation Steps:
Upgrade the device. You may also change the alert’s threshold, or disable the alert completely, if not needed.

How does this work?
Indeni uses the built-in Fortinet “get system performance status” command to retrieve the current device uptime.

Why is this important?
Indeni captures the uptime of the device on every sample. If the uptime is lower than in the previous sample, the device must have rebooted in between.
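
The comparison between consecutive uptime samples can be sketched as follows. This is a minimal illustration and not Indeni's implementation; the function name `device_rebooted` and the sample values are hypothetical.

```python
from typing import Optional


def device_rebooted(previous_uptime_ms: Optional[float], current_uptime_ms: float) -> bool:
    """Return True when the uptime went backwards between two samples."""
    if previous_uptime_ms is None:
        # First sample: there is nothing to compare against yet.
        return False
    return current_uptime_ms < previous_uptime_ms


# Hypothetical samples taken one minute apart; the last one shows a reset.
samples_ms = [600_000, 660_000, 60_000]
previous = None
for current in samples_ms:
    if device_rebooted(previous, current):
        print(f"uptime dropped from {previous} ms to {current} ms: device rebooted")
    previous = current
```

A strict "less than" comparison is used so that two identical readings (the command reports uptime with one-minute resolution) are not mistaken for a reboot.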

Without Indeni how would you find this?
An administrator could log in and manually run the command via the CLI, check the system resources widget in the GUI, enable SNMP, or use Fortinet FortiAnalyzer.

fortios-get-system-performance-status

#! META
name: fortios-get-system-performance-status
description: Performance metrics based on "get system performance status" command on Fortinet firewall
type: monitoring
monitoring_interval: 1 minute
includes_resource_data: true
requires:
    vendor: "fortinet"
    os.name: "FortiOS"
    product: "firewall"
    vdom_enabled: "false"

#! COMMENTS
memory-usage:
    why: |
        If the firewall memory becomes fully utilized, performance may be impacted and traffic may be dropped, and in extreme cases the firewall could crash. It is critical to monitor the memory usage and handle the issue prior to resource exhaustion.
    how: |
        Indeni uses the built-in Fortinet "get system performance status" command to retrieve the device memory utilization.
    without-indeni: |
        An administrator could log in and manually run the command via the CLI, check the system resources widget in the GUI, enable SNMP, configure a syslog server to receive a log message containing the utilization every 5 minutes, or use Fortinet FortiAnalyzer.
    can-with-snmp: true
    can-with-syslog: true

cpu-usage:
    why: |
        If the firewall CPU becomes fully utilized, performance may be impacted and traffic may be dropped, and in extreme cases the firewall could crash. It is critical to monitor the CPU usage and handle the issue prior to resource exhaustion.
    how: |
        Indeni uses the built-in Fortinet "get system performance status" command to retrieve the device CPU utilization.
    without-indeni: |
        An administrator could log in and manually run the command via the CLI, check the system resources widget in the GUI, enable SNMP, configure a syslog server to receive a log message containing the utilization every 5 minutes, or use Fortinet FortiAnalyzer.
    can-with-snmp: true
    can-with-syslog: true

uptime-milliseconds:
    why: |
        Indeni captures the uptime of the device on every sample. If the uptime is lower than in the previous sample, the device must have rebooted in between.
    how: |
        Indeni uses the built-in Fortinet "get system performance status" command to retrieve the current device uptime.
    without-indeni: |
        An administrator could log in and manually run the command via the CLI, check the system resources widget in the GUI, enable SNMP, or use Fortinet FortiAnalyzer.
    can-with-snmp: true
    can-with-syslog: false

memory-free-kbytes:
    skip-documentation: true
memory-total-kbytes:
    skip-documentation: true
memory-used-kbytes:
    skip-documentation: true

#! REMOTE::SSH
get system performance status

#! PARSER::AWK

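# Converts the idle percentage reported by the device into a usage percentage
# (100 - idle) and writes it as a "cpu-usage" metric tagged with the CPU id.
function writeCpuUsageMetric(id, cpuIdleAmount, cpuIsAverage) {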
    sub(/%/, "", cpuIdleAmount)

    tags_cpu["cpu-id"] = id
    tags_cpu["cpu-is-avg"] = cpuIsAverage
    tags_cpu["resource-metric"] = "true"
    writeDoubleMetricWithLiveConfig("cpu-usage", tags_cpu, "gauge", 0, 100 - cpuIdleAmount, "CPU Usage", "percentage", "cpu-id")
}

# v5.4
#Memory states: 66% used
/^Memory states:/ {
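    # Strip the "%" sign instead of taking a fixed two-character substring, so that
    # single-digit and 100% readings are parsed correctly.
    memory_usage = $3
    sub(/%/, "", memory_usage)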

    # the following "RAM" tag value does NOT surface in the UI. It's here just to satisfy the
    # requirements of the rule -- for some reason, we need to have this tag _with_ a value for things
    # to function properly.

    tags_memory["name"] = "RAM"
    tags_memory["resource-metric"] = "true"
    writeDoubleMetricWithLiveConfig("memory-usage", tags_memory, "gauge", 0, memory_usage, "Memory Usage", "percentage", "")
}

# v5.6
#Memory: 1019996k total, 354312k used (34%), 665684k free (66%), 1616k buffers
/^Memory:/ {
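    # $6 looks like "(34%),". Keep only the digits instead of taking a fixed
    # two-character substring, so that single-digit and 100% readings are parsed correctly.
    percent_memory_usage = $6
    gsub(/[^0-9]/, "", percent_memory_usage)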
    free = substr($7, 1, length($7) - 1)
    total = substr($2, 1, length($2) - 1)
    used = substr($4, 1, length($4) - 1)

    tags_memory["name"] = "Memory: Free"
    writeDoubleMetricWithLiveConfig("memory-free-kbytes", tags_memory, "gauge", "60", free, "Memory Usage", "kilobytes", "name")

    tags_memory["name"] = "Memory: Total"
    writeDoubleMetricWithLiveConfig("memory-total-kbytes", tags_memory, "gauge", "60", total, "Memory Usage", "kilobytes", "name")

    tags_memory["name"] = "Memory: Used"
    writeDoubleMetricWithLiveConfig("memory-used-kbytes", tags_memory, "gauge", "60", used, "Memory Usage", "kilobytes", "name")

    tags_memory["name"] = "Memory Usage"
    tags_memory["resource-metric"] = "true"
    writeDoubleMetricWithLiveConfig("memory-usage", tags_memory, "gauge", 0, percent_memory_usage, "Memory Usage", "percentage", "name")
}

# This section handles the "per core" metrics for CPU usage
# v5.4
#CPU0 states: 2% user 4% system 0% nice 94% idle
# v5.6
#CPU1 states: 6% user 8% system 0% nice 86% idle 0% iowait 0% irq 0% softirq
/^CPU[0-9]+ states:/ {
    writeCpuUsageMetric("Per Core - " $1, $9, "false")
}

# "CPU states:" shows the average CPU usage across all CPU cores
# v5.4
#CPU states: 8% user 10% system 0% nice 82% idle
# v5.6
#CPU states: 4% user 6% system 0% nice 90% idle 0% iowait 0% irq 0% softirq
/^CPU states:/ {
    writeCpuUsageMetric("Average - CPU", $9, "true")
}

#Uptime: 3 days,  6 hours,  10 minutes
/^Uptime:/ {
    days = $2
    hours = $4
    minutes = $6
    uptime_in_seconds = days * 86400 + hours * 3600 + minutes * 60
    # Display the uptime in Overview - Live Config (the metric value is written in milliseconds)
    writeDoubleMetricWithLiveConfig("uptime-milliseconds", null, "gauge", 0, (uptime_in_seconds*1000), "Device Uptime", "duration", "")
}
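
For readers who want to check the uptime arithmetic outside of Indeni, the conversion done by the "Uptime:" rule above can be sketched in Python. This is only an illustration of the same calculation; the function name `uptime_milliseconds` is hypothetical, and accepting the singular forms ("1 day", "1 hour") is an assumption about the device output that the sample line does not confirm.

```python
import re

# Matches lines such as "Uptime: 3 days,  6 hours,  10 minutes".
# Accepting singular units is an assumption, not something the sample output shows.
UPTIME_RE = re.compile(r"^Uptime:\s*(\d+)\s+days?,\s*(\d+)\s+hours?,\s*(\d+)\s+minutes?")


def uptime_milliseconds(line: str) -> int:
    """Convert an "Uptime:" line into milliseconds, as the AWK rule does."""
    match = UPTIME_RE.match(line)
    if match is None:
        raise ValueError(f"not an uptime line: {line!r}")
    days, hours, minutes = (int(group) for group in match.groups())
    seconds = days * 86400 + hours * 3600 + minutes * 60
    return seconds * 1000


print(uptime_milliseconds("Uptime: 3 days,  6 hours,  10 minutes"))  # 281400000
```

The sample line works out to 3 * 86400 + 6 * 3600 + 10 * 60 = 281400 seconds, which is 281400000 milliseconds.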



cross_vendor_uptime_high

package com.indeni.server.rules.library.templatebased.crossvendor

import com.indeni.apidata.time.TimeSpan
import com.indeni.apidata.time.TimeSpan.TimePeriod
import com.indeni.server.rules.RuleContext
import com.indeni.server.rules.library.{ConditionalRemediationSteps, ThresholdDirection, TimeIntervalThresholdOnDoubleMetricTemplateRule}
import com.indeni.server.sensor.models.managementprocess.alerts.dto.AlertSeverity

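/**
  * Alerts when a device's "uptime-milliseconds" metric is above the threshold,
  * which defaults to 10 years (365 * 10 days).
  */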
case class cross_vendor_uptime_high() extends TimeIntervalThresholdOnDoubleMetricTemplateRule(
  ruleName = "cross_vendor_uptime_high",
  ruleFriendlyName = "All Devices: Device uptime too high",
  ruleDescription = "Indeni will alert when a device's uptime is too high",
  severity = AlertSeverity.ERROR,
  metricName = "uptime-milliseconds",
  metricUnits = TimePeriod.MILLISECOND,
  threshold = TimeSpan.fromDays(365 * 10),
  thresholdDirection = ThresholdDirection.ABOVE,
  alertDescriptionFormat = "The current uptime is %.0f seconds. This alert identifies when a device has been up for a very long time and may need an upgrade.",
  alertDescriptionValueUnits = TimePeriod.SECOND,
  baseRemediationText = "Upgrade the device. You may also change the alert's threshold, or disable the alert completely, if not needed.")(
  ConditionalRemediationSteps.OS_NXOS ->
    """|
       |1. Use the "show version" NX-OS command to display the current system uptime.
       |2. Run the "show system reset-reason" command to check the reason for the last reboot of the device.
       |3. Check if the installed NX-OS version is supported and review it for software bugs.""".stripMargin
)
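
The threshold comparison that this rule configures can be approximated as below. This is a rough sketch under stated assumptions, not the behavior of `TimeIntervalThresholdOnDoubleMetricTemplateRule` itself: the names `DEFAULT_THRESHOLD_MS` and `uptime_alert` are hypothetical, and treating `ThresholdDirection.ABOVE` as a strict "greater than" is an assumption.

```python
from typing import Optional

MS_PER_DAY = 24 * 60 * 60 * 1000
# Mirrors threshold = TimeSpan.fromDays(365 * 10) from the rule, expressed in milliseconds.
DEFAULT_THRESHOLD_MS = 365 * 10 * MS_PER_DAY


def uptime_alert(uptime_ms: float, threshold_ms: float = DEFAULT_THRESHOLD_MS) -> Optional[str]:
    """Return the alert text when the uptime is above the threshold, else None."""
    # Assumption: ABOVE is treated as a strict comparison.
    if uptime_ms > threshold_ms:
        # The metric is stored in milliseconds, while the alert text reports seconds.
        return "The current uptime is %.0f seconds." % (uptime_ms / 1000)
    return None


print(uptime_alert(281_400_000))                   # well below 10 years: None
print(uptime_alert(DEFAULT_THRESHOLD_MS + 1000))   # one second over the threshold
```

With the default of 3650 days, the threshold equals 315360000 seconds, so a FortiOS device would only trigger this alert after roughly ten years without a reboot unless the threshold is lowered.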