Rules Visibility

r808 · June 19, 2018, 6:38pm

Hi All,

I’m working on requirements for improved visibility of rules for Q3.

The issues we are trying to resolve include:

- Don't know what rules are in each release/build
- Don't know what has changed in a rule
- Don't know what the rules are really doing
- What metrics are involved
- What is the alert criteria/algorithm
- Don't know what the remediation steps are
- The only way to find out is to look at the code or wait for the rule to alert
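To make the first two issues concrete, here is a minimal sketch of comparing the rule catalogs of two builds. The `RuleInfo` and `CatalogDiff` types and the idea of a per-build rule list are assumptions for illustration only; nothing here is taken from the Indeni codebase.

```scala
// Hypothetical model of one rule as listed in a build; not an Indeni type.
final case class RuleInfo(name: String, version: Int, description: String)

// Names of rules that were added, removed, or changed between two builds.
final case class CatalogDiff(added: Set[String], removed: Set[String], changed: Set[String])

object RuleCatalog {
  // Compare two builds' rule lists, keyed by rule name.
  def diff(oldBuild: Seq[RuleInfo], newBuild: Seq[RuleInfo]): CatalogDiff = {
    val oldByName = oldBuild.map(r => r.name -> r).toMap
    val newByName = newBuild.map(r => r.name -> r).toMap
    val common = oldByName.keySet.intersect(newByName.keySet)
    CatalogDiff(
      added = newByName.keySet.diff(oldByName.keySet),
      removed = oldByName.keySet.diff(newByName.keySet),
      // A rule counts as changed if any of its recorded fields differ.
      changed = common.filter(n => oldByName(n) != newByName(n))
    )
  }
}
```

A report like this per release would answer "what rules are in this build" and "what changed" without reading the code.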

This feature will be tailored to IKE developers rather than general users.

Let me know if you have any other issues or use cases that you think this feature can help resolve. I'd also be interested in hearing any ideas on how you would like to see this information conveyed.

Thanks,
Robert

Vasileios_Bouloukos · June 19, 2018, 9:40pm

Please have a look at the following thread in the community. You will find useful info for your task.

Hawkeye_Parker · June 21, 2018, 8:13pm

Hi Robert,

Great to hear you are working on this.
First: What’s the difference between “Don’t know what the rules are really doing” and “What is the alert criteria/algorithm”? Those sound the same to me. Also, can you give an example of what you’re thinking about here? I’m assuming it would be something pretty simple like: “If this metric is above this threshold, then the rule will generate an alert.” Something like that?

Other thoughts (I think your requirements capture most of this, just putting it in my own words):

I would prioritize the following:
- How to find the actual source code for a given rule on a given server. I.e., correlate the rule that’s running on a server with the rule source. E.g., this is how I can do it for .ind scripts: https://indeni.atlassian.net/wiki/spaces/IKP/pages/433946662/Build+Information
- Some way to easily/quickly system test a given .ind against the rules on a given server; i.e., without having to tweak the script, deploy it, restart the server, wait for the alert to come up in the UI. I.e., quickly answer the question: “Will this .ind script generate an alert?” using command line tools (like command-runner).
- If this test fails, good error messaging around the cause of the failure.
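A rough sketch of what such an offline check could look like, using the simple "metric above threshold" example from earlier in this post. The `Metric`, `ThresholdRule`, and `RuleCheck` names are hypothetical and are not part of command-runner or any Indeni API; the point is only that the check returns either the items that would alert or a clear reason why the rule could not be evaluated.

```scala
// Hypothetical types for illustration; not Indeni APIs.
final case class Metric(name: String, tags: Map[String, String], value: Double)
final case class ThresholdRule(metricName: String, requiredTag: String, threshold: Double)

object RuleCheck {
  // Right(items that would alert) or Left(reason the rule could not be evaluated).
  def wouldAlert(rule: ThresholdRule, metrics: Seq[Metric]): Either[String, Seq[String]] = {
    val candidates = metrics.filter(_.name == rule.metricName)
    if (candidates.isEmpty)
      Left(s"no metric named '${rule.metricName}' in script output")
    else if (candidates.exists(m => !m.tags.contains(rule.requiredTag)))
      Left(s"metric '${rule.metricName}' is missing tag '${rule.requiredTag}'")
    else
      // Every candidate has the required tag here, so the lookup is safe.
      Right(candidates.filter(_.value > rule.threshold).map(_.tags(rule.requiredTag)))
  }
}
```

An empty `Right` would mean "evaluated fine, nothing would alert", which is a different answer from a `Left` explaining that the script output does not match what the rule expects.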

After we have these basic tools in place, then, yes, it would be very nice to have a higher-level abstraction of a rule that we could look at to make it easier to understand how to satisfy the conditions of the rule. In terms of priority, I would want:

1. For a given rule/alert, what exact conditions would cause the alert to resolve? (Currently this often seems like a real mystery to me).
2. What is the alert criteria/algorithm (maybe #1 is a part of this?)
3. What is the type of the metric data that we need to pass? Double, complex string, complex object, etc.?
4. What metrics are involved (important: including metric tags)
5. What the remediation steps are
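As a sketch, the five items above could be captured in one small record per rule and printed as plain text. All type and field names here are made up for illustration, and the three metric types simply mirror the examples in item 3; this is not an existing Indeni data model.

```scala
// Hypothetical metric types, mirroring "Double, complex string, complex object".
sealed trait MetricType
case object DoubleMetric extends MetricType
case object ComplexString extends MetricType
case object ComplexObject extends MetricType

// One metric a rule consumes, including the tags it depends on.
final case class MetricRequirement(name: String, tags: Seq[String], metricType: MetricType)

// One record per rule, ordered by the priorities listed above.
final case class RuleSummary(
  ruleName: String,
  resolveCondition: String,
  alertCondition: String,
  metrics: Seq[MetricRequirement],
  remediation: String
)

object RuleSummaryText {
  // Render the summary as plain text, independent of any UI.
  def render(summary: RuleSummary): String = {
    val metricLines = summary.metrics.map { m =>
      s"  ${m.name} [tags: ${m.tags.mkString(", ")}] type: ${m.metricType}"
    }
    val header = Seq(
      s"Rule: ${summary.ruleName}",
      s"Resolves when: ${summary.resolveCondition}",
      s"Alerts when: ${summary.alertCondition}",
      "Metrics:"
    )
    ((header ++ metricLines) :+ s"Remediation: ${summary.remediation}").mkString("\n")
  }
}
```

Plain-text output keeps the tool decoupled from the UI, so it could be produced by a command-line utility.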

It’s pretty easy to look at the code and figure out which metrics are involved and what the remediation steps are. Actually trying to figure out the code logic for the rule is, of course, much more time-consuming.

In terms of the way this is all presented in some kind of tool/application, I’m not sure, but I know I care less about how it’s presented than I do about just getting the information. I.e., I’d rather have it sooner than have it pretty. Also, part of me thinks it would be better to have it decoupled from the Indeni UI, just so that problems there don’t affect the tool.
Hawkeye

r808 · June 24, 2018, 4:59am

Hi Hawkeye and Vasilis,
Thank you for the feedback.
The “Don’t know what the rules are really doing” and “What is the alert criteria/algorithm” items are similar and overlap. Between the two, I want to capture the parameters of the rule, when it will be executed (the interval?), its purpose (the why), the remediation steps, and the actual algorithm that causes the rule to trigger. One piece of feedback I got is that you can see the rules under the Indeni Rules tab, but you don’t know what will cause them to alert, or which algorithm and metrics are involved. In addition, you don’t know anything about the remediation steps unless the rule actually alerts.

For example, for CrossVendorPowerSupply, we want to convey all the parameters plus something like “If power_supply_inserted is up, then the rule will generate a warning.” However, how the algorithms get conveyed is still up in the air. I think it is easy to put into words for a templated rule, but non-templated ones could get complex. It may be that we just show the rule itself in the short term.
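For templated rules, the sentence could be generated from the template parameters. A minimal sketch, assuming a hypothetical `TemplateParams` record (the field names are illustrative and are not the actual template API), with the triggering state passed in as a parameter:

```scala
// Hypothetical parameters of a templated rule; not the Indeni template API.
final case class TemplateParams(metricName: String, triggerState: String, severity: String)

object RuleSentence {
  // Fill a fixed sentence pattern from the template parameters.
  def describe(p: TemplateParams): String =
    s"If ${p.metricName} is ${p.triggerState}, then the rule will generate a ${p.severity}."
}
```

Each template would need only one such pattern, whereas a non-templated rule has no fixed shape to fill in, which is why those are harder to describe automatically.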

FYI, we are also working on another set of requirements that will try to simplify the development and debugging of rules, so there won’t be a need for dedicated rules developers.

case class CrossVendorPowerSupply(context: RuleContext) extends StateDownTemplateRule(context,
  severity = AlertSeverity.WARN,
  ruleName = "CrossVendorPowerSupply",
  ruleFriendlyName = "Devices with multiple PSU: Some Power Supply Unit (PSU) slot is empty or PSU is down",
  ruleDescription = "Indeni will alert if some PSU slot is empty.",
  metricName = "power-supply-inserted",
  applicableMetricTag = "name",
  alertItemsHeader = "The following PSU(s) not installed or off line",
  alertDescription = "It is best practice to install multiple PSU with separate power sources whenever possible. Indeni will alert if a device has multiple PSU slots but some slot has no PSU installed or the PSU is off line.",
  baseRemediationText = "No redundant power supply has been detected. It is best practice to run dual power supplies on two separate Power Distribution Units, separate UPS backup, and circuit breakers. This provides the greatest level of redundancy for power access.")(
  ConditionalRemediationSteps.VENDOR_PANOS -> "Run the CLI command \"show system environmentals power-supply\" for more details."
)