Diagnostics

Guides for inspecting systems and finding root causes: where to look, which commands and queries to run, and what healthy and unhealthy signals look like.

Diagnostics differ from incident runbooks: they don't assume something is on fire. They're the reusable "how do I see what's going on in X" guides that incident and maintenance runbooks link to.

A good diagnostics guide covers:

Access — how to reach the system (dashboards, logs, a shell, a DB client).
Key signals — the handful of metrics/log lines that actually matter, with expected ranges.
Drill-downs — common questions ("is tenant X throttled?", "why is ingest lagging?") and the exact query/command to answer each.
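
As an illustration of the "key signals with expected ranges" item, a guide can pair each signal with its healthy range so that a reading can be classified mechanically. The sketch below is a minimal example under stated assumptions: the signal names (`ingest_lag_seconds`, `error_rate_percent`) and their ranges are hypothetical placeholders, not values taken from any real system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    """A key signal and the range considered healthy (all values hypothetical)."""
    name: str
    low: float
    high: float

    def is_healthy(self, value: float) -> bool:
        # Healthy means the reading falls inside the inclusive expected range.
        return self.low <= value <= self.high


# Hypothetical signals; replace with the metrics that matter for the system.
SIGNALS = [
    Signal("ingest_lag_seconds", 0.0, 30.0),
    Signal("error_rate_percent", 0.0, 1.0),
]


def check(readings: dict[str, float]) -> dict[str, str]:
    """Classify each known signal as healthy, unhealthy, or missing."""
    report = {}
    for signal in SIGNALS:
        if signal.name not in readings:
            report[signal.name] = "missing"
        elif signal.is_healthy(readings[signal.name]):
            report[signal.name] = "healthy"
        else:
            report[signal.name] = "unhealthy"
    return report


if __name__ == "__main__":
    print(check({"ingest_lag_seconds": 120.0, "error_rate_percent": 0.2}))
```

Keeping the expected range next to the signal name means the guide states what "unhealthy" looks like explicitly rather than leaving it to the reader's judgment.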

Available guides

There are no Library-native diagnostics guides yet. For service-specific inspection guides, see the cross-repo Runbooks index and the Systems overviews. Cross-cutting diagnostics belong here (e.g. tracing a request across control plane → data plane).
