Basic questions about debugging IND scripts on an indeni server

Hawkeye_Parker · September 18, 2018, 2:14am

We’re having an especially hard time debugging IND scripts for this release, so I just wanted to confirm a few assumptions, and ask a few questions.

I don’t think complex objects and strings get written to the time series db – correct? Is there any way to query for these like we can via the metrics API with doubles?
After a server/collector restart, it seems to take somewhere between 5-10 minutes for some metrics to get written. Why is this? More generally, how does indeni decide the “order” to run scripts, especially at startup? Presumably not all scripts can run at once, so how does indeni decide which script to run, and when?
Aside from device suspension due to high CPU/memory, etc., is there any way that a script could get ‘crowded out’ by either too many scripts and/or scripts that have a really long run time? Is there any situation in which indeni might decide to ‘bypass’ a given script?

Thanks,
Hawkeye

Hawkeye_Parker · September 19, 2018, 3:57pm

Anyone? This information could really help us in debugging the current release candidate.

Ulrica_de_Fort-Menar · September 19, 2018, 8:44pm

@Alon_Ashkenazi @matanc can anyone help please

Eyal_Roth · September 21, 2018, 3:30pm

Snapshot (complex) metrics are written to the same in-memory DB to which the time-series metrics are written. There is currently no way of querying them, unless they are tagged as live-config (in which case the last value amongst the three will appear in the device information). This sounds like a valuable feature; I’ll make sure to open a ticket for it (if it isn’t already open).
As you already suspected, not all scripts run at once. In fact, most of the scripts run sequentially (not concurrently), and that is for two main reasons: (a) we keep a limited number of open connections to the device and (b) we do not want to overload the device with commands. The order in which the monitoring commands run is completely arbitrary. Would you like to suggest a meaningful way in which we should order the execution of monitoring commands?
There is indeed a possibility that a command might “starve” the connection to a device by taking too long to run, and thus prevent other commands from running. We currently have only a very basic mechanism to protect against that, and we are definitely thinking of how to tackle this better in the future.
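
To make the starvation scenario concrete, here is a toy Python model of commands sharing one connection and running one after another. It is only an illustrative sketch, not indeni's scheduler: the `Command` class, the `simulate` function, the script names, the 300-second window and the 30-second timeout are all made-up assumptions, and time is simulated rather than measured.

```python
from dataclasses import dataclass


@dataclass
class Command:
    name: str
    run_time: float  # simulated seconds the command occupies the connection


def simulate(commands, window, timeout=None):
    """Run commands one at a time over a single (hypothetical) connection.

    Returns the names of the commands that got to start within `window`
    simulated seconds. If `timeout` is set, a command is cut off after
    that many seconds instead of holding the connection for its full run.
    """
    clock = 0.0
    started = []
    for cmd in commands:
        if clock >= window:
            break  # out of time: every remaining command is starved
        started.append(cmd.name)
        cost = cmd.run_time if timeout is None else min(cmd.run_time, timeout)
        clock += cost
    return started


# Hypothetical workload: one very slow script ahead of two quick ones.
commands = [
    Command("slow-script", 600.0),
    Command("cpu-usage", 2.0),
    Command("memory-usage", 2.0),
]

no_guard = simulate(commands, window=300.0)
with_guard = simulate(commands, window=300.0, timeout=30.0)

print(no_guard)    # ['slow-script']
print(with_guard)  # ['slow-script', 'cpu-usage', 'memory-usage']
```

In this model the slow script alone uses up the whole window unless something cuts it off, which is the kind of crowding out asked about above. A per-command timeout is just one possible guard, picked here for illustration.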

(apparently numbered lists do not work here)

Eyal_Roth · September 21, 2018, 3:38pm

Added the feature request for querying snapshots as ticket IS-3546.

Hawkeye_Parker · September 21, 2018, 11:39pm

This is very helpful, Eyal: thank you for such a thorough response! Yes, we would love to be able to see complex metrics (thanks for creating that ticket).

In terms of the order of monitoring commands, I honestly haven’t thought it through. The resource metrics are critical to monitor – maybe those are already prioritized in some way. Otherwise, off the top of my head, I can’t think of any generic solution, I mean, one that’s not vendor specific; on the surface, it seems like you would have to add some kind of priority setting in the META section of the script. Honestly, I really don’t know.
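
For what it's worth, here is a rough Python sketch of what a per-script priority setting might look like in practice. It is purely hypothetical: the `priority` key, the default of 100, the script names and the `order_by_priority` helper are all invented for illustration, and nothing here reflects an existing META field or how indeni actually orders commands.

```python
import heapq

DEFAULT_PRIORITY = 100  # assumed default for scripts that declare no priority


def order_by_priority(scripts):
    """Order script names by a hypothetical META 'priority' (lower runs first).

    `scripts` is a list of (name, meta_dict) pairs. Ties keep their input
    order because the original index is used as a secondary sort key.
    """
    heap = [
        (meta.get("priority", DEFAULT_PRIORITY), index, name)
        for index, (name, meta) in enumerate(scripts)
    ]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


# Hypothetical scripts: resource metrics get the lowest number.
scripts = [
    ("vendor-routes", {}),
    ("cpu-usage", {"priority": 1}),
    ("license-check", {"priority": 200}),
    ("memory-usage", {"priority": 1}),
]

ordered = order_by_priority(scripts)
print(ordered)  # ['cpu-usage', 'memory-usage', 'vendor-routes', 'license-check']
```

The idea is only that resource metrics could be given a low number so they are picked first after a restart, while everything else falls back to a default.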

Eyal_Roth · September 23, 2018, 3:01pm

It might be worth exploring a prioritization of monitoring commands in the future, but for the moment we aim to run all of them in a reasonable time without consuming too many resources (both on our end and on the device’s).

Hawkeye_Parker · September 24, 2018, 7:57pm

Added to the wiki:
https://indeni.atlassian.net/wiki/spaces/IKP/pages/76742659/Administration+and+Testing+on+a+Live+indeni+Server