Identify a significant spike in connection counts within the last hour

1. What is the device type?

CheckPoint GAiA (Any platform)

2. What is the issue that you want to automate the detection/triage of?

Identify when the connection count spikes significantly (e.g., from 15% to 60% of the connection limit) within the last hour.

3. How can the issue be diagnosed?

Regularly track the connection count as a percentage of the connection limit (current/limit) and identify when that percentage increases significantly.
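
A minimal sketch of this check in Python, assuming one sample per minute and a one-hour window. The function names, the sample readings, and the 45-point threshold (taken from the 15% -> 60% example above) are illustrative assumptions, not an existing Indeni or Check Point API.

```python
from collections import deque

WINDOW_SAMPLES = 60   # assumption: one sample per minute, one-hour window
SPIKE_POINTS = 45.0   # assumption: 15% -> 60% is a rise of 45 percentage points


def usage_percent(current, limit):
    """Return the connection count as a percentage of the connection limit."""
    if limit <= 0:
        raise ValueError("connection limit must be positive")
    return 100.0 * current / limit


def is_spike(samples, threshold=SPIKE_POINTS):
    """True if the newest sample exceeds the window minimum by >= threshold points."""
    if len(samples) < 2:
        return False
    return samples[-1] - min(samples) >= threshold


# Hypothetical readings: (current connections, connection limit)
window = deque(maxlen=WINDOW_SAMPLES)
for current, limit in [(3750, 25000), (4000, 25000), (15000, 25000)]:
    window.append(usage_percent(current, limit))

print(is_spike(window))
```

Comparing the newest sample against the window minimum (rather than against the previous sample) catches a rise that is spread over several checks within the hour.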

4. What are the Remediation Steps?

Investigate whether the connection growth was expected/organic. If not, check whether there is a significant number of incomplete sessions.

5. Are there known ways to reproduce this issue?

N/A

Hi Charles,

Thanks for the request! A few questions:

This basically makes sense, but we need to translate it into a generic condition. What do you think is the approximate growth rate at which we should alert? A ‘spike’ implies a significant increase over a short period of time. What are the boundaries? E.g., any increase of more than 20 percentage points (10% -> 30%, 20% -> 40%) over 5 minutes? Or…?

We can check the connection count percent usage as often as every minute. Once we detect a growth rate that qualifies as a spike, we need to decide whether to alert right away or to wait a few minutes first. A really short spike may be OK, and we don’t want to alert unnecessarily. WDYT? Should we alert as soon as we detect it, or after x minutes?
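
To make the "alert now or after x minutes" choice concrete, here is a rough sketch in Python, assuming one spike check per minute. `should_alert` and `required_checks` are hypothetical names: a value of 1 means "alert as soon as we detect", and a larger value means "alert only after the spike has persisted for that many consecutive checks".

```python
def should_alert(spike_flags, required_checks=3):
    """True once the newest `required_checks` checks all flagged a spike.

    spike_flags: per-minute booleans, oldest first (assumed one check per minute).
    """
    if required_checks < 1:
        raise ValueError("required_checks must be at least 1")
    recent = spike_flags[-required_checks:]
    return len(recent) == required_checks and all(recent)


# Hypothetical history: the spike has persisted for two checks so far.
history = [False, True, True]
print(should_alert(history, required_checks=1))  # alert immediately
print(should_alert(history, required_checks=3))  # keep waiting
```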

We also need to determine when the alert should be resolved. The simple way would be: once the growth rate falls back below the alert threshold (i.e., the spike stops spiking), we resolve the alert and move it into cooldown. Does that sound ok?
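
One way to picture the resolve-then-cooldown flow is as a small state machine, sketched below in Python. The state names, `COOLDOWN_CHECKS`, and the choice to re-alert if a new spike appears during cooldown are all assumptions for discussion, not existing product behavior.

```python
from enum import Enum

COOLDOWN_CHECKS = 2  # assumption: cooldown lasts two one-minute checks


class State(Enum):
    OK = "ok"
    ALERTING = "alerting"
    COOLDOWN = "cooldown"


def next_state(state, spiking, cooldown_left):
    """Return (new_state, new_cooldown_left) after one check."""
    if spiking:
        # Assumption: a new spike always (re)raises the alert, even in cooldown.
        return State.ALERTING, 0
    if state is State.ALERTING:
        # The spike stopped: resolve the alert and enter cooldown.
        return State.COOLDOWN, COOLDOWN_CHECKS
    if state is State.COOLDOWN:
        cooldown_left -= 1
        if cooldown_left > 0:
            return State.COOLDOWN, cooldown_left
    return State.OK, 0


state, left = State.OK, 0
trace = []
for spiking in [True, True, False, False, False]:
    state, left = next_state(state, spiking, left)
    trace.append(state.value)

print(trace)  # ['alerting', 'alerting', 'cooldown', 'cooldown', 'ok']
```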

Can you give any more detail on the remediation steps? How would we know if the growth was “expected/organic”? We can research this, but any ideas/information you have would be helpful.

tx,
Hawkeye Parker
Indeni Project Manager