Categorizing alerts

With so many alerts, I want to be able to group them logically.


This is related to Yoni’s post “Some rule tuning


Here are the categories I’m considering:

1) System resources - e.g. high CPU, high memory, critical process(es) down, etc.

2) Device manageability - e.g. no syslog server configured, debug mode enabled, license usage limit approaching, etc.

3) Network/connections - e.g. interfaces down, packet drops, etc.

4) Security best practices - e.g. Telnet enabled, SNMPv2 used, etc.

5) Device/vendor specific best practices - e.g. disable console logging, configure at least one syslog server, etc.

6) High availability/clustering

7) Routing protocols


We can then assign a default severity to each category.

  • Best practices are probably information.
  • Device manageability is warning.
  • The rest is error.
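
To make the proposal concrete, here is a minimal Python sketch of the category-to-default-severity mapping described above. All names in it (`Severity`, `Category`, `default_severity`) are hypothetical and only for illustration; they are not taken from any Indeni API.

```python
from enum import Enum


class Severity(Enum):
    INFO = "information"
    WARNING = "warning"
    ERROR = "error"


class Category(Enum):
    SYSTEM_RESOURCES = "System resources"
    DEVICE_MANAGEABILITY = "Device manageability"
    NETWORK_CONNECTIONS = "Network/connections"
    SECURITY_BEST_PRACTICES = "Security best practices"
    VENDOR_BEST_PRACTICES = "Device/vendor specific best practices"
    HIGH_AVAILABILITY = "High availability/clustering"
    ROUTING_PROTOCOLS = "Routing protocols"


# Only the exceptions are listed; every other category falls back to ERROR,
# mirroring "the rest is error" in the proposal.
_OVERRIDES = {
    Category.SECURITY_BEST_PRACTICES: Severity.INFO,
    Category.VENDOR_BEST_PRACTICES: Severity.INFO,
    Category.DEVICE_MANAGEABILITY: Severity.WARNING,
}


def default_severity(category: Category) -> Severity:
    """Return the default severity for a category (hypothetical helper)."""
    return _OVERRIDES.get(category, Severity.ERROR)
```

Keeping only the exceptions in the lookup means a newly added category starts at error until someone decides otherwise, which may or may not be the behavior we want.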


Any other category suggestions?


Thoughts on the default severity per category?


Great idea. What do you think about following the well-known logging severity level pattern? How about the following? It could be updated later with more categories.


Alert severity level

Level keyword | Level | Description                      | Indeni alert categories
--------------|-------|----------------------------------|------------------------
emergencies   | 0     | System unstable                  | Environmental & system issues, e.g. PS failure, high CPU usage
alerts        | 1     | Immediate action needed          | Layer 3 & Layer 2 issues, e.g. BGP down, STP recalculations
              |       |                                  | Interface-related issues, e.g. link down, dropped packets
critical      | 2     | Critical conditions              | Management-related issues, such as license expiration, no config saved, debug enabled, etc.
errors        | 3     | Error conditions                 | Failures against network security best practices, e.g. SNMPv2, no NTP, etc.
              |       |                                  | Mismatches against cluster
warnings      | 4     | Warning conditions               | Default (everything that does not belong to any other category)
notifications | 5     | Normal but significant condition | Failures against configuration best practices
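
As a rough illustration of the table above, the levels could be modeled as an integer enum where a lower number means a more severe alert. This is only a sketch: the category keys (`environmental_system`, `layer2_layer3`, and so on) and the `level_for` helper are made-up names, not existing Indeni identifiers.

```python
from enum import IntEnum


class AlertLevel(IntEnum):
    # Lower value = more severe, as in the usual logging severity scale.
    EMERGENCIES = 0
    ALERTS = 1
    CRITICAL = 2
    ERRORS = 3
    WARNINGS = 4
    NOTIFICATIONS = 5


# Hypothetical category keys, one per row of the proposed table.
CATEGORY_LEVEL = {
    "environmental_system": AlertLevel.EMERGENCIES,
    "layer2_layer3": AlertLevel.ALERTS,
    "interfaces": AlertLevel.ALERTS,
    "management": AlertLevel.CRITICAL,
    "security_best_practices": AlertLevel.ERRORS,
    "cluster_mismatch": AlertLevel.ERRORS,
    "configuration_best_practices": AlertLevel.NOTIFICATIONS,
}


def level_for(category: str) -> AlertLevel:
    """Return the level for a category; unknown categories default to WARNINGS."""
    return CATEGORY_LEVEL.get(category, AlertLevel.WARNINGS)


def most_severe(categories):
    """Return the most severe (numerically lowest) level among the categories."""
    return min(level_for(c) for c in categories)
```

For example, `most_severe(["interfaces", "management"])` picks the `alerts` level, since 1 is lower than 2. Using the warnings level as the fallback matches the "Default" row, so new categories can be added to the mapping later without changing the lookup.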