Check Point - trigger an issue when a process exceeds a defined memory threshold

1. What is the device type?

Product - Security Management, Multi-Domain Management, Security Gateway, VSX, ClusterXL, Cluster - 3rd party
Version - R77.20, R77.30, R80.10

2. What is the issue that you want to automate the detection/triage of?

sk111880 - Memory leak in CPD daemon can possibly fail policy push

3. How can the issue be diagnosed?

Accept a process name and a memory threshold from the customer (CPD for this specific SK).
Track the process's memory usage using top or ps.
Alternatively, trigger on a sustained upward trend in the process's memory usage.
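
A minimal Python sketch of the sampling step described above. It assumes a procps-style ps that accepts "-e -o rss= -o comm=" and a Python 3.7+ interpreter on the monitoring host; neither is confirmed by this request. The function names are hypothetical. Using resident size (RSS) rather than virtual size (VSZ) is also an assumption, since the request does not say which figure should be compared against the threshold.

```python
import subprocess
from typing import Dict


def parse_ps_output(text: str) -> Dict[str, int]:
    """Return the largest RSS (KiB) seen per command name.

    Expects lines of the form "<rss_kib> <command>", as printed by
    `ps -e -o rss= -o comm=`. Malformed or blank lines are skipped.
    """
    largest: Dict[str, int] = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        rss_kib, name = int(parts[0]), parts[1].strip()
        # Keep the maximum so several instances of one name are not added up.
        largest[name] = max(largest.get(name, 0), rss_kib)
    return largest


def sample_process_memory() -> Dict[str, int]:
    """Run ps once and return per-command RSS in KiB (assumed ps options)."""
    result = subprocess.run(
        ["ps", "-e", "-o", "rss=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return parse_ps_output(result.stdout)
```

Calling sample_process_memory().get("cpd") would give the current figure for CPD, or None if no process with that name is running.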

4. What are the Remediation Steps?

Install the hotfix (HF) described in sk111880:
https://supportcenter.checkpoint.com/supportcenter/portal?eventSubmit_doGoviewsolutiondetails=&solutionid=sk111880

5. Are there known ways to reproduce this issue?

Hi Shouky,

Just wanted to let you know we got this request. I’ll be responding in the next few days with some more questions.

Thanks,
Hawkeye Parker

Hi Shouky,

Thank you for this feature request, and sorry it has taken me so long to get back to you. A few questions:

Regarding "3. How can the issue be diagnosed?":

  • Is it enough to monitor just the CPD process? Or, ideally, would you like this feature to be available to any Check Point process?
  • You wrote: “Accept process name and memory threshold from the customer…”. I think I understand, but just to be very clear: you want the customer to be able to set an arbitrary, user-defined “amount of memory used threshold” for a process name; if the process memory usage is above this threshold, we would alert the customer. Is that correct?
  • Even if the customer can set a user-defined threshold, we will need a default. sk111880 mentions “~2Gb” for CPD. By default, I think we would want to alert before CPD actually crashes, so I think 1.75 GB (for CPD) could work. Does that sound reasonable to you? Do you have any data about this? I.e., do you know (or can you check) what “normal” memory usage is for CPD when under heavy load? We always want to avoid false positives, so any real-world data would be very helpful.
  • Note that, for processes other than CPD, we can’t easily know what the default threshold should be. This may make it difficult to support this feature for processes other than CPD.
  • You wrote: “Another option, trigger on the continued upward trend in the process memory usage.” This could be difficult, because processes also consume memory when they are working properly. Theoretically, we could create an algorithm to try to guess whether a process is leaking, but, at first glance, I think this approach could be prone to false positives. This could be more reason for us to focus this request only on CPD. Please let me know if you have other thoughts on this topic.
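
To make the trend option concrete, here is a rough Python sketch of one possible heuristic: a least-squares slope over equally spaced samples, combined with the fraction of samples that rose. The function names and all default values (12 samples, 1 KiB per sample, 80% rising) are hypothetical placeholders, not tuned figures. It is meant only to show how many parameters such a check needs, which is where the false-positive risk comes from.

```python
from typing import Sequence


def slope_per_sample(samples: Sequence[float]) -> float:
    """Least-squares slope of memory usage over equally spaced samples."""
    n = len(samples)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2.0
    mean_y = sum(samples) / n
    num = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(samples))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def looks_like_leak(
    samples: Sequence[float],
    min_samples: int = 12,               # hypothetical: ignore short histories
    min_slope: float = 1.0,              # hypothetical: KiB gained per sample
    min_rising_fraction: float = 0.8,    # hypothetical: share of rising steps
) -> bool:
    """Flag a history only if it rises both on average and almost every step."""
    if len(samples) < min_samples:
        return False
    rising = sum(1 for a, b in zip(samples, samples[1:]) if b > a)
    rising_fraction = rising / (len(samples) - 1)
    return (slope_per_sample(samples) >= min_slope
            and rising_fraction >= min_rising_fraction)
```

A process that legitimately grows while warming up a cache would also satisfy both conditions for a while, so a heuristic like this would still need a long observation window or an absolute threshold alongside it.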

Thanks,
Hawkeye

We can hardcode a specific process name, but I suggest that we try to provide a solution for the general case by allowing the customer to provide the relevant process name.

The threshold should also be provided by the customer; we can set the default to a relatively high value so that it does not trigger by mistake.

Checking for a simple threshold should be fine.
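
As an illustration of the simple check, here is a minimal Python sketch with a customer-supplied process name and an optional customer-supplied threshold. The names MemoryRule, check_rule and DEFAULT_THRESHOLD_KIB are hypothetical. The default of 1.75 GiB only reuses the figure floated earlier in this thread for CPD as a placeholder; the actual default is still undecided.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Placeholder default (1.75 GiB in KiB); an assumption, not an agreed value.
DEFAULT_THRESHOLD_KIB = 1835008


@dataclass(frozen=True)
class MemoryRule:
    """One customer-defined rule: which process to watch and its limit."""
    process_name: str
    threshold_kib: int = DEFAULT_THRESHOLD_KIB


def check_rule(rule: MemoryRule, usage_kib: Mapping[str, int]) -> Optional[str]:
    """Return an alert message if the process is above its threshold.

    usage_kib maps process names to current memory usage in KiB.
    Returns None if the process is absent or within the threshold.
    """
    used = usage_kib.get(rule.process_name)
    if used is None or used <= rule.threshold_kib:
        return None
    return (f"{rule.process_name} uses {used} KiB, "
            f"above threshold {rule.threshold_kib} KiB")
```

For example, check_rule(MemoryRule("cpd"), {"cpd": 1900000}) returns an alert message, while a reading of 500000 KiB returns None.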