Admin Worker Failed Requests Exceed 1 in 5m

This runbook covers the Grafana alert Admin Worker Failed Requests Exceed 1 in 5m.

The alert fires when the Cloudflare admin_worker records more than 1 server-side failed request in 5 minutes. The admin worker backs internal Marqo Cloud support and operations workflows, so failures can block operators even when customer-facing ecom API traffic is healthy.
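
The threshold is "more than 1", so a single failed request in the window does not page anyone; two or more do. The following is a minimal sketch of that arithmetic only. It is not the Grafana rule itself, which this runbook does not reproduce, and the names `WINDOW`, `THRESHOLD`, and `alert_would_fire` are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed values taken from the alert name: "Exceed 1 in 5m".
WINDOW = timedelta(minutes=5)
THRESHOLD = 1


def alert_would_fire(failure_times, now):
    """Return True when more than THRESHOLD failures fall inside the window ending at `now`."""
    recent = [t for t in failure_times if now - WINDOW < t <= now]
    return len(recent) > THRESHOLD
```

With this reading, one failure four minutes ago leaves the alert quiet, while a second failure inside the same five minutes trips it.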

Triage

  1. Check the alert time window and confirm whether failures are ongoing.
  2. Tail the production admin worker logs and identify the failing route, status, and downstream dependency.
  3. Check whether the issue is isolated to one admin action or affects the whole worker.
  4. If the failing route proxies to Admin Lambda or controller APIs, search the downstream Lambda/API logs for the same request IDs or timestamps.
  5. Check for recent admin worker, admin Lambda, controller, or Cloudflare Access changes.

Useful starting point:

npx wrangler tail prod-admin-api
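
To find the failing route and status quickly, it can help to capture tail output as JSON (for example with wrangler's `--format json` option, redirected to a file) and summarize it offline. The sketch below is one hedged way to do that. The event field names it reads (`outcome`, `event.request.url`, `event.request.method`, `event.response.status`) are assumptions about the tail event shape and should be checked against a real captured event; the function names are hypothetical.

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def iter_json_values(text):
    """Yield each JSON value from text holding concatenated JSON documents.

    Works whether events are one per line or pretty-printed across lines.
    """
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so do it here.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos >= len(text):
            break
        value, pos = decoder.raw_decode(text, pos)
        yield value


def summarize_failures(events):
    """Count failed events by (method, path, status).

    An event counts as failed if its outcome is present and not "ok",
    or if its response status is 500 or higher (assumed field names).
    """
    counts = Counter()
    for ev in events:
        inner = ev.get("event") or {}
        request = inner.get("request") or {}
        response = inner.get("response") or {}
        status = response.get("status")
        bad_outcome = ev.get("outcome") not in (None, "ok")
        bad_status = isinstance(status, int) and status >= 500
        if not (bad_outcome or bad_status):
            continue
        path = urlsplit(request.get("url", "")).path or "?"
        counts[(request.get("method", "?"), path, status)] += 1
    return counts


def summarize_text(text):
    """Parse captured tail output and return failure counts, most frequent first."""
    return summarize_failures(iter_json_values(text)).most_common()
```

Calling `summarize_text(open("tail.json").read())` on a captured file (the filename is illustrative) returns a list of `((method, path, status), count)` pairs. Query strings are dropped from the path so that the same admin action groups together.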

For background, see Cloudflare Workers, Lambda, and Admin Platform.

Remediation

  • If a deploy caused the failures, roll back or patch the admin worker/admin Lambda change.
  • If the failure is downstream, fix the downstream service, or route around the broken operation if an approved manual path exists.
  • If Cloudflare Access/auth is failing, check access policy and token/session behavior before changing application code.
  • If the route is low-impact but noisy, confirm with the owning team before muting or changing thresholds.

Validation

  • The failing admin action succeeds.
  • The admin worker produces no new 5XXs for the route.
  • The alert clears after the next evaluation window.
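
The "no new 5XXs for the route" check can be made explicit against tail events captured after the fix. This is a rough sketch, assuming each event is a dict with `eventTimestamp` in epoch milliseconds plus `event.request.url` and `event.response.status`; those field names are assumptions to verify against real output, and `new_5xx_for_route` is a hypothetical helper.

```python
from urllib.parse import urlsplit


def new_5xx_for_route(events, path, since_ms):
    """Return events for `path` with a 5XX status at or after `since_ms`."""
    hits = []
    for ev in events:
        inner = ev.get("event") or {}
        url = (inner.get("request") or {}).get("url", "")
        status = (inner.get("response") or {}).get("status")
        timestamp = ev.get("eventTimestamp") or 0
        if (
            urlsplit(url).path == path
            and isinstance(status, int)
            and 500 <= status <= 599
            and timestamp >= since_ms
        ):
            hits.append(ev)
    return hits
```

An empty result for the affected route, over at least one full evaluation window after the fix, is consistent with the validation points above.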