Palo Alto cluster interface monitoring headaches

Patrik_Jonsson · February 13, 2018, 7:08am

We've run into an issue when monitoring Palo Alto clusters. When the firewalls are clustered, the passive node lists the "floating" interfaces as down.

So ideally we don't want to alert for interface statuses on the passive devices.

I was thinking of writing a multi-step script where the first step checks whether the device is active and the second checks the interface states. If the dynamic variable from the first step is "Passive", we'd report all the interfaces on that device as up.

The issue, though, is that the script is already a multi-step script, and afaik I can't carry over two dynamic variables, right?

Current script looks like this:

1. List all interfaces
2. For each interface from step 1, get a bunch of metrics

I don't think this is possible, but could we do:

1. Check if the device is active or passive
2. List all interfaces
3. For each interface from step 2, get a bunch of metrics, but always write "up" if the result from step 1 is passive.

Another option could be to modify the rules to only alert on interface states if the device is active.

Any other ideas?

Eyal_Roth · February 15, 2018, 11:11am

Well, you can actually use as many dynamic variables as you'd like; the trick is that the "end result" (which is used in the next step) is a Cartesian product of their values.

Let's take a hypothetical (and not so realistic) example of a command whose first step lists all the CPUs (4) into the "cpu" variable and all the interfaces (3) into the "nic" variable.

We have these values extracted from the first step:

cpu = [1, 2, 3, 4], nic = [eth0, eth1, eth2]

What happens next is a Cartesian product between those two sets of values, so we'd get 12 (4 times 3) maps of dynamic variables:

- {cpu: 1, nic: eth0}, {cpu: 1, nic: eth1}, {cpu: 1, nic: eth2}

- {cpu: 2, nic: eth0}, {cpu: 2, nic: eth1}, {cpu: 2, nic: eth2}

- {cpu: 3, nic: eth0}, {cpu: 3, nic: eth1}, {cpu: 3, nic: eth2}

- {cpu: 4, nic: eth0}, {cpu: 4, nic: eth1}, {cpu: 4, nic: eth2}
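
To make the expansion above concrete, here is a minimal sketch in plain Python. It is only an illustration of the Cartesian product, not indeni's actual implementation; the function name `expand_dynamic_variables` and the dictionary layout are assumptions made for this example.

```python
from itertools import product


def expand_dynamic_variables(variables):
    """Return one map per combination of the variables' values.

    `variables` maps each dynamic variable name to the list of values
    extracted by the first step (hypothetical layout, for illustration only).
    """
    names = list(variables)
    value_lists = [variables[name] for name in names]
    # product() yields every combination, varying the last variable fastest.
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


step_one = {"cpu": [1, 2, 3, 4], "nic": ["eth0", "eth1", "eth2"]}
maps = expand_dynamic_variables(step_one)
print(len(maps))  # 4 CPUs times 3 interfaces
for variable_map in maps:
    print(variable_map)
```

A variable that holds a single value adds no extra combinations, so pairing it with a list of N values still yields N maps.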

This behavior doesn't always fit the use case at hand, and we might change it if the need arises, but in the case you presented it seems as if it could be useful. In the case of the PA clusters, each device may have multiple interfaces, but it can only have one status at a time (either active or passive).

The sets of values generated by the first step of the command would look something like so:

nic = [eth0, eth1, eth2], is_active = [1]

The "is_active" variable will always have only one value (either 1 or 0), so the Cartesian product of the two sets results in as many variable maps as there are interfaces (3):

- {nic: eth0, is_active: 1}, {nic: eth1, is_active: 1}, {nic: eth2, is_active: 1}

That being said, I believe that this decision -- whether to act on a situation in which an interface is down -- is more appropriately made by a rule than by a collection command / ind script. The collection commands should focus on collecting the data -- by means of remote operations and the conversion / parsing of their output into indeni's data types -- while the rules should carry the decision-making logic.

In other words, I believe it's better if we just write a separate command to collect the (snapshot / complex) metric of whether a device is active or passive, and let the rule decide whether to alert on a device's interfaces or not.
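
As a rough sketch of that split, the decision could look like the following once both metrics have been collected separately. This is plain Python for illustration only, not the indeni rule API; the function name `interfaces_to_alert`, the boolean `is_active`, and the `"up"` / `"down"` state strings are all assumptions.

```python
def interfaces_to_alert(is_active, interface_states):
    """Return the names of down interfaces, but only for an active device.

    `is_active` stands in for the separately collected active/passive metric,
    and `interface_states` maps interface name to "up" or "down"
    (both are hypothetical shapes chosen for this example).
    """
    if not is_active:
        # A passive cluster member reports floating interfaces as down,
        # so the rule stays silent for it.
        return []
    return [name for name, state in interface_states.items() if state == "down"]


states = {"eth0": "up", "eth1": "down", "eth2": "down"}
print(interfaces_to_alert(True, states))   # active device: down interfaces are reported
print(interfaces_to_alert(False, states))  # passive device: nothing is reported
```

The collection side stays untouched here: the interface states are reported as seen on the device, and only the rule takes the cluster role into account.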