Admin Worker Failed Requests Exceed 1 in 5m
This runbook covers the Grafana alert Admin Worker Failed Requests Exceed 1 in 5m.
The alert fires when the Cloudflare admin_worker records more than one server-side failed request (5XX) within a 5-minute window, so two failures in that window are enough to trigger it. The admin worker backs internal Marqo Cloud support and operations workflows, so failures can block operators even when customer-facing ecom API traffic is healthy.
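The firing condition can be sketched as follows. This is a minimal illustration of the rule as stated in the alert name, not the actual Grafana query, which this runbook does not show; the function names and the half-open window boundary are assumptions.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # evaluation window, taken from the alert name
THRESHOLD = 1                  # the alert fires when the count exceeds this


def failed_in_window(failure_times, now, window=WINDOW):
    """Count failures whose timestamp falls inside (now - window, now]."""
    start = now - window
    return sum(1 for t in failure_times if start < t <= now)


def alert_fires(failure_times, now, threshold=THRESHOLD, window=WINDOW):
    """True when strictly more than `threshold` failures fall in the window."""
    return failed_in_window(failure_times, now, window) > threshold


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    one_failure = [now - timedelta(minutes=2)]
    two_failures = one_failure + [now - timedelta(minutes=4)]
    print(alert_fires(one_failure, now))   # a single failure does not exceed 1
    print(alert_fires(two_failures, now))  # two failures in the window do
```

The practical consequence is that a single isolated 5XX does not page, while a second failure within five minutes does.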
Triage
- Check the alert time window and confirm whether failures are ongoing.
- Tail the production admin worker logs and identify the failing route, HTTP status code, and downstream dependency.
- Check whether the issue is isolated to one admin action or affects the whole worker.
- If the failing route proxies to Admin Lambda or controller APIs, check the downstream Lambda/API logs with the same request IDs or timestamps.
- Check for recent admin worker, admin Lambda, controller, or Cloudflare Access changes.
Useful starting point (wrangler tail streams live events only, so start it before reproducing the failure):
npx wrangler tail prod-admin-api
See Cloudflare Workers, Lambda, and Admin Platform.
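To see whether failures are confined to one admin action or spread across the worker, the tail output can be grouped by route and status. The sketch below assumes the events were captured with the `--format json` flag and that each event carries `outcome`, `event.request.url`, `event.request.method`, and `event.response.status`; check those field names against real output before relying on it. The script name `summarize_tail.py` and the capture file `tail.json` are hypothetical.

```python
import json
import sys
from collections import Counter
from urllib.parse import urlsplit

# Outcomes that are not treated as server-side failures on their own.
NON_FAILURE_OUTCOMES = {None, "ok", "canceled"}


def iter_json_values(text):
    """Yield each JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


def summarize_failures(events):
    """Count failed events keyed by (method, path, status, outcome)."""
    counts = Counter()
    for ev in events:
        inner = ev.get("event") or {}
        request = inner.get("request") or {}
        response = inner.get("response") or {}
        status = response.get("status")
        outcome = ev.get("outcome")
        is_5xx = isinstance(status, int) and status >= 500
        if not is_5xx and outcome in NON_FAILURE_OUTCOMES:
            continue
        # Drop the query string so requests to the same route group together.
        path = urlsplit(request.get("url") or "").path or "?"
        counts[(request.get("method") or "?", path, status, outcome)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        summary = summarize_failures(iter_json_values(fh.read()))
    for (method, path, status, outcome), n in summary.most_common():
        print(f"{n:4d}  {method} {path}  status={status} outcome={outcome}")
```

One way to use it: capture events with `npx wrangler tail prod-admin-api --format json > tail.json` while the failure is reproduced, stop the tail, then run `python summarize_tail.py tail.json`. A single dominant row points to one admin action; many rows across unrelated paths suggest a worker-wide or downstream problem.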
Remediation
- If a deploy caused the failures, roll back or patch the admin worker/admin Lambda change.
- If the failure is downstream, fix the downstream service or route around the broken operation if there is an approved manual path.
- If Cloudflare Access/auth is failing, check the Access policy and token/session behavior before changing application code.
- If the route is low-impact but noisy, confirm with the owning team before muting or changing thresholds.
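When Access or auth is the suspect, it can help to inspect a failing request's token before touching code. The sketch below assumes the request carries a Cloudflare Access JWT (Access forwards one in the `Cf-Access-Jwt-Assertion` header) and only decodes the payload to read claims such as `exp` and `aud`. It does not verify the signature, so it is for inspection only and must not be used to make an auth decision. The helper names are hypothetical.

```python
import base64
import json
import time


def decode_jwt_payload(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("expected header.payload.signature")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(claims, now=None):
    """Seconds until the `exp` claim; negative if the token has expired."""
    now = time.time() if now is None else now
    return claims["exp"] - now
```

An expired token or an `aud` value that does not match the expected Access application points at session or policy configuration rather than at the admin worker code.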
Validation
- The failing admin action succeeds.
- The admin worker produces no new 5XXs for the route.
- The alert clears after the next evaluation window.
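How long to wait for the alert to clear follows from the window arithmetic. The sketch below assumes the condition is a count greater than 1 over a trailing 5-minute window and that no new failures occur; the Grafana evaluation interval and any pending or no-data settings are not described in this runbook, so the alert would clear at the first evaluation at or after the computed time. The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)  # window from the alert name
THRESHOLD = 1                  # alert condition is count > THRESHOLD


def earliest_clear_time(failure_times, window=WINDOW, threshold=THRESHOLD):
    """Earliest time the windowed count can be <= threshold, given no new failures.

    Returns None when there were never enough failures to exceed the threshold.
    """
    ordered = sorted(failure_times, reverse=True)
    if len(ordered) <= threshold:
        return None
    # The count stays above the threshold until the (threshold + 1)-th most
    # recent failure has aged out of the window.
    return ordered[threshold] + window


if __name__ == "__main__":
    base = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    failures = [base, base + timedelta(minutes=1), base + timedelta(minutes=3)]
    print(earliest_clear_time(failures))
```

With failures at 12:00, 12:01, and 12:03, the second most recent failure (12:01) leaves the window at 12:06, so that is the earliest point at which the count drops to 1 and the condition stops holding.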