Originally published at: Expert Review of Runbook Templates | Indeni
Infrastructure is changing so rapidly that it is difficult to keep documentation up to date. To improve incident response times and reduce errors in the troubleshooting process, it is critical to have operating steps documented. Before you can gather the information, you need a solid template as a starting point. What background information is important to include in a runbook? What is a must-have vs. a nice-to-have? We asked our community of certified IT professionals to review free runbook templates. Check out what they said:
Templates
- THWACK member
- Skelton Thatcher
- Indeni
Template #1 by THWACK
What I like about it:
- Tells you in plain English what the issue is:
  - Description of the problem
  - What the symptoms are
  - What the recovery process is
- Provides links to review the issue in the related operations tool dashboard
What’s missing:
- How was the issue uncovered? What commands did the tool use?
- How severe is the issue?
- What could the issue be related to?
Template #2 by Skelton Thatcher Consulting
![|700x496](upload://vWRpdDjJeEEfubrSgogM7QqCYuH.png)
What I like about it:
Provides background and contextual information about the affected system or service:
- Background
  - What is the system or service?
  - What part of the business is impacted?
  - What are the expectations for availability and performance, and what are our SLAs?
  - Expected traffic and load
  - Required resources
  - Security and access control
  - How security is validated on an ongoing basis
  - How system configuration is managed
  - Which parts of the system are backed up
- Tools
  - What tools are available to help operate the system?
  - What significant metrics will be generated?
  - How does the system report its own health?
  - Does it perform routine and sanity checks?
- Contextual
  - What are the contributing applications, daemons, services, and middleware?
  - Infrastructure and network design: what servers, containers, schedulers, devices, VLANs, firewalls, etc. are needed?
  - Differences between Production/Live and other environments
  - Restore procedures
  - Operational instructions: deployment, batch processing
  - How to perform maintenance tasks such as patching, daylight saving time changes, data clear-down, and log rotation
  - Failover and recovery procedures: what needs to happen when parts of the system are failed over to standby systems, and what needs to happen during recovery?
What’s missing:
- When there is an issue, what commands were used by those tools to identify it?
Template #3 by Indeni
![|1080x709](upload://12ZBwdbgYwZjksY5HYIPOZFU3PL.png)
What our community likes about it:
- Tells you in plain English what the issue is:
  - Description of the problem
  - What the symptoms are
  - What the recovery or remediation process is
- Provides visibility into the commands that are used:
  - What metrics are inspected
  - What rules or thresholds caused the notification to be generated
- Tells you how else you could have found the problem
- Written in collaboration between engineering, IT operations, and a subject matter expert from the Indeni Crowd Community
- Scripts are continuously updated
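The visibility the community values here (which command collected the data, which metric it inspects, and which threshold triggered the notification) can be sketched as a small data structure. This is a minimal illustration only: `RunbookEntry`, its fields, the `show memory` command string, and the 90% threshold are hypothetical placeholders, not Indeni's actual runbook format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunbookEntry:
    """A hypothetical runbook entry: what is checked, how, and what to do about it."""
    title: str        # plain-language name of the issue
    command: str      # command used to collect the metric (placeholder)
    metric: str       # metric the rule inspects
    threshold: float  # a value above this raises a notification
    remediation: str  # human-readable recovery steps

    def check(self, value: float) -> Optional[str]:
        """Return a readable alert if the value breaches the threshold, else None."""
        if value > self.threshold:
            return (
                f"{self.title}: {self.metric} is {value:g}, "
                f"above the threshold of {self.threshold:g} "
                f"(collected with '{self.command}'). Remediation: {self.remediation}"
            )
        return None


entry = RunbookEntry(
    title="High memory usage",
    command="show memory",  # hypothetical device command
    metric="memory_used_percent",
    threshold=90.0,
    remediation="Identify the top memory consumers and restart the affected service.",
)

print(entry.check(95.5))  # alert text naming the metric, threshold, and command
print(entry.check(42.0))  # None: below the threshold
```

Keeping the command and threshold next to the remediation text means the alert explains not only what to do, but also why it fired and how the data was gathered.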
In Summary
Great runbook templates must include three things:
- Written in collaboration between the subject matter expert and IT operations
- Written for humans and machines:
  - Readable summaries of an issue that has occurred or is about to occur
  - Simple instructions to resolve the problem
  - Visibility into the commands used, so that they can be:
    - Edited by an individual
    - Automated by a machine
- Continuously kept up to date
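One way to picture a runbook that serves both humans and machines is to keep it as structured data and render it two ways: a readable summary for an engineer, and a command list for an automation tool. This is a minimal sketch under assumed field names (`issue`, `summary`, `steps`, `command`); the `df` and `du` commands are generic Linux examples, not taken from any of the templates reviewed above.

```python
import json

# A hypothetical runbook kept as plain data so both people and tools can use it.
# Field names and the example issue are illustrative placeholders.
runbook = {
    "issue": "Disk almost full",
    "summary": "The /var partition is above 85% usage and logs may stop being written.",
    "steps": [
        {"description": "Check partition usage", "command": "df -h /var"},
        {"description": "Find the largest log files", "command": "du -sh /var/log/*"},
    ],
}


def to_human(rb: dict) -> str:
    """Render a readable summary followed by numbered steps."""
    lines = [f"{rb['issue']}: {rb['summary']}"]
    for i, step in enumerate(rb["steps"], start=1):
        lines.append(f"{i}. {step['description']}: run `{step['command']}`")
    return "\n".join(lines)


def to_machine(rb: dict) -> str:
    """Serialize only the issue name and ordered commands as JSON for automation."""
    return json.dumps({"issue": rb["issue"], "commands": [s["command"] for s in rb["steps"]]})


print(to_human(runbook))
print(to_machine(runbook))
```

Because both renderings come from the same source, an edit by an individual to a step's command is picked up automatically by the machine-readable output, which is one way to keep the two views from drifting apart.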
Interested in automating runbook tasks?
Download Indeni and connect up to 5 devices for free when you engage in the Indeni Crowd. If you found this article helpful, please share it with your social networks. If you have feedback or other best practices you use, please comment below. Thanks!