Great to hear you are working on this.
First: what’s the difference between “Don’t know what the rules are really doing” and “What is the alert criteria/algorithm”? Those sound the same to me. Also, can you give an example of what you’re thinking of here? I’m assuming it would be something pretty simple, like: “If this metric is above this threshold, the rule will generate an alert.” Something like that?
Other thoughts (I think your requirements capture most of this; I’m just putting it in my own words):
- I would prioritize the following:
  - How to find the actual source code for a given rule on a given server, i.e., correlate the rule that’s running on a server with the rule source. For example, this is how I can do it for .ind scripts: https://indeni.atlassian.net/wiki/spaces/IKP/pages/433946662/Build+Information
  - Some way to easily and quickly system test a given .ind script against the rules on a given server, without having to tweak the script, deploy it, restart the server, and wait for the alert to come up in the UI. In other words, quickly answer the question “Will this .ind script generate an alert?” using command-line tools (like command-runner).
    - If the test fails, clear error messages explaining the cause of the failure.
Once these basic tools are in place, then yes, it would be very nice to have a higher-level abstraction of a rule that we could look at to understand more easily how to satisfy the rule’s conditions. In terms of priority, I would want:
1. For a given rule/alert, what exact conditions would cause the alert to resolve? (Currently this often seems like a real mystery to me.)
2. What is the alert criteria/algorithm? (Maybe #1 is part of this?)
3. What type of metric data do we need to pass? Double, complex string, complex object, etc.?
4. What metrics are involved? (Important: including metric tags.)
5. What are the remediation steps?
It’s pretty easy to look at the code and figure out which metrics are involved and what the remediation steps are. Actually working out the rule’s logic from the code is, of course, much more time-consuming.
As for how all this is presented in some kind of tool/application, I’m not sure, but I care less about the presentation than about just getting the information. I.e., I’d rather have it sooner than have it pretty. Also, part of me thinks it would be better to keep it decoupled from the Indeni UI, so that problems there don’t affect the tool.