Runbooks

Operational runbooks for responding to incidents, diagnosing problems, and running recurring maintenance.

On call? Start with the cross-repo view.

The cross-repo Runbooks index aggregates every runbook across all service repos in one place — start there when you don't know which repo owns the problem.

Library-native vs. repo runbooks

There are two homes for runbooks, by ownership:

  • This section (docs/runbooks/) — Library-authored, cross-cutting runbooks that aren't owned by a single service repo (org-wide incident process, on-call basics, procedures spanning multiple systems).
  • Service repos — runbooks tied to one service live with that service's code and are surfaced automatically in the cross-repo Runbooks index. To change one, edit it in the owning repo.

When in doubt, put it in the owning repo. Only write it here if it genuinely spans systems or describes org-level process.

Categories

  • 🚨 Incidents — incident response and recovery.
  • 🔍 Diagnostics — how to inspect systems and find root cause.
  • 🔧 Maintenance — recurring, planned operational tasks.

Adding a runbook

Copy templates/runbook-template.md (at the repo root) into the matching category, fill in the owner and last-reviewed metadata, and keep it copy-pasteable: concrete commands, expected output, and failure modes.
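If you create runbooks often, the copy step can be scripted. The helper below is a minimal sketch, not an official tool. It assumes each category is a subfolder of docs/runbooks/ named incidents, diagnostics, or maintenance. Those folder names and the function name new_runbook are hypothetical, so check the actual layout under docs/runbooks/ before using it.

```python
import shutil
from pathlib import Path

# Hypothetical category folder names under docs/runbooks/; adjust to the real layout.
CATEGORIES = {"incidents", "diagnostics", "maintenance"}


def new_runbook(repo_root: Path, category: str, slug: str) -> Path:
    """Copy templates/runbook-template.md to docs/runbooks/<category>/<slug>.md.

    Refuses unknown categories and never overwrites an existing runbook.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    template = repo_root / "templates" / "runbook-template.md"
    dest = repo_root / "docs" / "runbooks" / category / f"{slug}.md"
    if dest.exists():
        raise FileExistsError(dest)
    dest.parent.mkdir(parents=True, exist_ok=True)  # create the category folder if missing
    shutil.copyfile(template, dest)
    return dest
```

For example, new_runbook(Path("."), "incidents", "db-failover") creates docs/runbooks/incidents/db-failover.md from the template. The owner and last-reviewed metadata still have to be filled in by hand.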