Rule to alert based on prior entry

Brad_Spilde · December 28, 2017, 10:17pm

How could we go about alerting when a disk that was previously Present now reports as Missing? The system reflects a RAID status of Good even when a disk is removed. It reports the disk as Missing instead of Failed. Missing is OK if there was only one disk initially. Not every system is equipped with RAID.

If it switches from Present to Missing, we need to alert that "Disk RAID status for Disk id X switched to Missing" or "Disk id X switched from Present to Failed". Basically, alert on any drive status change or RAID status change.

Overall RAID status Good

--------------------------------------------------------------------------------

Drive status

Disk id 1 Present (INTEL SSDSC2BB24)

Disk id 2 Missing

--------------------------------------------------------------------------------
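To make the request concrete, here is a minimal sketch of the comparison being asked for, written in Python. It assumes the drive status is available as text in the format shown above and that an earlier snapshot exists somewhere. The function names `parse_drive_status` and `status_changes` and the `previous` snapshot are hypothetical, not part of any existing rule or script.

```python
import re

# Matches lines such as "Disk id 1 Present (INTEL SSDSC2BB24)" or "Disk id 2 Missing".
DISK_LINE = re.compile(r"^Disk id (\d+)\s+(\w+)")

SAMPLE_OUTPUT = """Overall RAID status Good
Drive status
Disk id 1 Present (INTEL SSDSC2BB24)
Disk id 2 Missing
"""


def parse_drive_status(output):
    """Return {disk_id: status} for every 'Disk id' line in the output."""
    statuses = {}
    for line in output.splitlines():
        match = DISK_LINE.match(line.strip())
        if match:
            statuses[int(match.group(1))] = match.group(2)
    return statuses


def status_changes(previous, current):
    """Describe each disk present in both snapshots whose status differs."""
    messages = []
    for disk_id in sorted(previous.keys() & current.keys()):
        old, new = previous[disk_id], current[disk_id]
        if old != new:
            messages.append(f"Disk id {disk_id} switched from {old} to {new}")
    return messages


current = parse_drive_status(SAMPLE_OUTPUT)
previous = {1: "Present", 2: "Present"}  # hypothetical earlier snapshot
print(status_changes(previous, current))
# prints ['Disk id 2 switched from Present to Missing']
```

A disk that was Missing in both snapshots produces no message, which covers the single-disk systems mentioned above.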

Johnathan_Browall_No · December 29, 2017, 2:08pm

RAID status is currently stored in the metric called "hardware-element-status". It should be possible to change the alert rule so that it alerts if an element that was previously there goes missing (I think).

The problem, at least for Check Point, is that we determine if the disk is ok or not in the collection script.

This is how it works for Check Point.

run command: raid_diagnostic

For each volume, write a metric with the health of it.

It is deemed ok if it has status: ONLINE, OPTIMAL or MISSING.

So if a volume goes from ONLINE to MISSING, the script would write the same metric value, and the alert rule would not be able to detect any change.
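The effect can be illustrated with a small Python sketch. This is not the actual collection script. The function name `volume_health` and the 1.0/0.0 encoding are assumptions made for illustration; only the three "ok" statuses come from the description above.

```python
# Statuses that the collection script treats as healthy, per the description above.
OK_STATUSES = {"ONLINE", "OPTIMAL", "MISSING"}


def volume_health(raw_status):
    """Collapse a raw volume status into one health value.

    Assumed encoding for this sketch: 1.0 means ok, 0.0 means not ok.
    """
    return 1.0 if raw_status.strip().upper() in OK_STATUSES else 0.0


# ONLINE and MISSING collapse to the same value, so a rule that only sees
# the metric cannot tell that a transition happened.
assert volume_health("ONLINE") == volume_health("MISSING")
```

Writing the raw status string alongside the health value would be one way to let a rule see the transition, but whether the metric can carry that is not something this thread confirms.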

The collection script has no knowledge of what it found on its previous run. It only lives in the now, collecting the data and writing the metrics.
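For contrast, this is roughly what a stateful check would need: somewhere to keep the last observed statuses between runs. The sketch below uses a local JSON file purely as an illustration. The file path, the function name `record_and_diff`, and the idea that a script is allowed to write to disk are all assumptions, not something the collection framework is known to support.

```python
import json
from pathlib import Path


def record_and_diff(state_path, current):
    """Compare current statuses with the last saved run, then save the current ones.

    `current` maps an element name (string) to its status (string).
    Returns {name: (old_status, new_status)} for elements whose status changed.
    """
    path = Path(state_path)
    # First run: no state file yet, so there is nothing to compare against.
    previous = json.loads(path.read_text()) if path.exists() else {}
    changed = {
        name: (previous[name], status)
        for name, status in current.items()
        if name in previous and previous[name] != status
    }
    path.write_text(json.dumps(current))
    return changed
```

Called once with `{"2": "Present"}` and again later with `{"2": "Missing"}`, the second call would return `{"2": ("Present", "Missing")}`, which is the transition the original question wants to alert on.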

It's interesting, because the reason we consider MISSING as ok in this script is not the same as the one you describe. Our reason is https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk104580 which is basically a bug in Check Point (or Linux).