"Expensive" IND scripts

Yesterday, after we saw that the Indeni machine had high CPU usage (this machine has 16 cores), we spent some time profiling the Indeni collector at one of our customers that runs Check Point devices. Once we were able to set up the profiling tools, we quickly saw that the most CPU-expensive operations were IND scripts that parse a huge amount of data (200 MB).

The short-term solution was to disable some of these IND scripts and to increase the interval (and the rule interval) for the others:

  • Disabled netobj_objects-clustersC.ind and log-server-connection.ind by adding do-not-run:"true" to the requires section.
  • Increased the interval from 5 min to 15 min for vpn-check-tunnels-novsx.ind and vpn-check-tunnels-vsx.ind.

Once we applied the changes, CPU consumption dropped from 90%-100% to 5%-30%, with spikes to 75% at the 15-minute interval.

Besides the top four expensive IND scripts, we have additional IND scripts that also consume a lot of CPU.

The question to you, knowledge experts, is: what can we do to ensure that existing and newly added IND scripts don't choke the CPU, both in terms of the input size the parser needs to handle and the performance of the parser itself?

Your thoughts are welcome!

A fix for these scripts is in the works, which should hopefully resolve the issue.

However, it is interesting to think about how to detect this earlier. One part of the problem is that our lab environment does not have as many objects as customer environments. The other is that we do not have any procedure to check how heavy a new script is on the Indeni server.

This brings me back to a suggestion I had in one of my IKE meetings. I don’t know if this is helpful in your scenario at all, but it is definitely related.

My suggestion was that, to remain scalable on a large install base, we should take better advantage of the requirements section of the IND scripts. For example, if the device runs PAN-OS and a script looks for something that only exists on hardware models, don’t run that script against a VM model. This could cut back drastically on processing and parsing time, right?

Another thing I recently discovered is that on a PAN-OS device, show system status returns a very large amount of data and is already being polled for things like debug status. We may not need a separate script looking specifically at power supply status if that data is already included in the response from the debug poll.

Is there anything on Check Point that might be duplicating effort like that?

Here are a few things I have thought of that we can use to reduce the risk to both Indeni and the device itself.

Test against a larger set of data

In one of my recent scripts I had a command which could potentially be a bit demanding. To test it, I generated a configuration roughly the size of one of our production environments and tested against that.

If the device has any kind of API or command line, generating objects with scripts is better than nothing. Sure, it takes some time, but ensuring quality could be worth it.
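
As a rough illustration of generating test input with a script, the sketch below prints a configurable number of synthetic object lines. The function name generate_objects and the line format are made up for this example; a real test would mirror the actual command output that the parser under test expects.

```sh
# Hypothetical generator: prints N synthetic "object" lines to stdout.
# The line format is invented for illustration only.
generate_objects() {
  awk -v n="$1" 'BEGIN {
    for (i = 1; i <= n; i++) {
      # Spread the counter over the last three octets of a 10.x.y.z address.
      printf "object host_%d ip 10.%d.%d.%d\n", i, int(i / 65536) % 256, int(i / 256) % 256, i % 256
    }
  }'
}

# Example: 100000 lines, counted here instead of being written to disk.
generate_objects 100000 | wc -l
```

The output could be redirected to a file and fed to the awk parser being tested, with the run wrapped in the shell's time command to get a first impression of how the parser scales.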

Consider which endpoints to use

With F5, one can either list or show configuration. list returns the actual configuration, and show returns the status of the configured objects. In some cases either of these would work, but list uses much less CPU.

Don't transfer as much data between the device and Indeni

  • What Johnathan is doing with his scripts is a good approach when you have access to bash: parse the data on the device and only send what you need.
  • Is there an alternate file with less data that you can parse?
  • Some APIs have the option to filter the data. F5, for example, uses part of the OData protocol, which lets users pass $select in order to discard information that they don't need.
  • When parsing large files into a data structure, try to minimize the keys/values if you can. An example could be using "h: indeni.com" instead of "hostname: indeni.com".
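
To make the $select point concrete, here is a minimal sketch that builds an F5 iControl REST URL requesting only two fields of each pool. The helper name f5_pool_url, the host name, and the chosen fields are assumptions for illustration; the curl line is left as a comment because it needs a reachable device and real credentials.

```sh
# Hypothetical helper: builds a URL that asks the device to return only
# the listed fields instead of the full pool objects.
f5_pool_url() {
  host="$1"
  fields="$2"
  # Single quotes keep "$select" literal so the shell does not expand it.
  printf 'https://%s/mgmt/tm/ltm/pool?$select=%s\n' "$host" "$fields"
}

f5_pool_url "f5.example.com" "name,monitor"

# A request could then look like this (placeholder credentials):
# curl -sk -u "admin:PASSWORD" "$(f5_pool_url f5.example.com name,monitor)"
```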

Use next in the awk scripts

This was an eye-opener for me (credit Vasilis!). Using "next" makes awk stop checking the remaining rules for the current line and proceed to the next line. Doing this in conjunction with putting the most frequent matches at the top of the script saves time (and cycles).
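
A small sketch of the idea, assuming made-up "tunnel" and "peer" line prefixes and a hypothetical count_lines wrapper: the catch-all rule at the bottom only sees lines that no earlier rule handled, which shows that "next" really skips the rest of the rules.

```sh
# Hypothetical parser fragment; the "tunnel"/"peer" line prefixes are invented.
count_lines() {
  awk '
    # Most frequent line type first; "next" skips every rule below for this line.
    /^tunnel/ { tunnels++; next }
    /^peer/   { peers++; next }
    # Only lines that no earlier rule handled reach this catch-all.
    { other++ }
    END { printf "tunnels=%d peers=%d other=%d\n", tunnels, peers, other }
  '
}

printf 'tunnel a\ntunnel b\npeer x\nheader\n' | count_lines
# prints: tunnels=2 peers=1 other=1
```

Without the two "next" statements, every line would also fall through to the catch-all rule, and other would end up equal to the total number of lines.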

Test your regular expressions

Recently I was notified that one of my scripts had an abomination in it in the form of a silly regular expression. It was too complex and slow.

There are online tools for testing and measuring regular expressions that are really useful, for example https://regex101.com/

Consider which functions you use in the AWK scripts

Do you really need gsub, or could sub work as well? Could those nested loops be avoided? :)
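
As a sketch of the gsub versus sub question: when the pattern is anchored, at most one match per line is possible, so sub produces the same result as gsub. The function names and sample lines below are invented, and whether the difference is measurable depends on the awk implementation and the input, so it is worth timing on realistic data.

```sh
# Both functions trim leading whitespace. The pattern is anchored with ^,
# so at most one match per line is possible and sub() is enough.
trim_with_gsub() { awk '{ gsub(/^[ \t]+/, ""); print }'; }
trim_with_sub()  { awk '{ sub(/^[ \t]+/, "");  print }'; }

printf '   eth0 up\n\teth1 down\n' | trim_with_sub
```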

I think it would be helpful to be able to run multiple commands on a device so that you can parse as much data locally on the device being monitored without having to pull it to the server. Of course, you would have to be careful not to put undue stress on the device being monitored.

Very interesting. Could someone provide instructions on how to change the polling interval of an interrogation script, please?

Do I have to copy it to a certain directory and edit it? I assume that changes made in its current directory could get overwritten in the future?