Ecom API 4XX Rate Alerts

This runbook covers these Grafana alerts:

  • Ecom API 4XX Rate Exceeds 25%
  • Ecom API 4XX Rate Anomalous vs Baseline

Both alerts mean that a single account/index pair is returning an unusually high share of 4XX responses from the ecom API. The fixed-threshold alert pages when more than 25% of recent requests return 4XX and the request volume is high enough for that rate to be meaningful. The anomaly alert pages when the 4XX rate is high relative to that account/index's own recent baseline, even if it stays below 25%.
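To make the difference between the two conditions concrete, here is a minimal sketch of how they could be evaluated for one account/index. It is not the Grafana rule definition: the 25% threshold comes from the alert name, but `MIN_REQUESTS`, `ANOMALY_MULTIPLIER`, `BASELINE_FLOOR`, and the `Window` type are hypothetical values and names chosen for illustration.

```python
from dataclasses import dataclass

FIXED_THRESHOLD = 0.25     # from the alert name: "Exceeds 25%"
MIN_REQUESTS = 100         # hypothetical floor for "enough absolute volume"
ANOMALY_MULTIPLIER = 3.0   # hypothetical: how far above baseline counts as anomalous
BASELINE_FLOOR = 0.01      # hypothetical: avoids firing on any 4XX when the baseline is ~0


@dataclass
class Window:
    total: int          # all requests for one account/index in the window
    client_errors: int  # requests in the window that returned 4XX

    @property
    def rate(self) -> float:
        return self.client_errors / self.total if self.total else 0.0


def fixed_threshold_fires(recent: Window) -> bool:
    """More than 25% of recent requests are 4XX, with enough volume."""
    return recent.total >= MIN_REQUESTS and recent.rate > FIXED_THRESHOLD


def anomaly_fires(recent: Window, baseline: Window) -> bool:
    """Recent 4XX rate is high relative to this account/index's own baseline."""
    reference = max(baseline.rate, BASELINE_FLOOR)
    return recent.rate > ANOMALY_MULTIPLIER * reference
```

Under these assumed values, an index at a 10% 4XX rate against a 2% baseline would trip the anomaly check but not the fixed threshold, which is why the two alerts can fire independently.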

Triage

  1. Use the alert labels to identify label_system_account_id and label_index_name.
  2. Check the prod ecom API worker logs in Cloudflare: prod-ecom-api logs.
  3. Filter around the alert window and the affected index. Preserve a few failing requests, statuses, and response bodies before changing anything.
  4. Decide whether the failures are customer input or platform behavior:
    • Customer/input examples: missing API key, malformed body, missing required fields, invalid settings supplied by the customer.
    • Platform examples: API key lookup failures, bad settings exported to KV, validation bugs, alias/index routing bugs, or a deploy causing valid requests to be rejected.
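When deciding between customer input and platform behavior, it can help to group the preserved failing requests by status and error message so the dominant pattern stands out. The sketch below assumes the failures have been copied into a list of dicts with `status` and `error` keys; that record shape is hypothetical and is not the Cloudflare log format.

```python
from collections import Counter


def summarize_failures(records):
    """Count 4XX responses by (status, error), most frequent first.

    `records` is a hypothetical list of dicts such as
    {"status": 401, "error": "missing api key"}.
    """
    counts = Counter()
    for record in records:
        status = record["status"]
        if 400 <= status < 500:
            counts[(status, record.get("error", "unknown"))] += 1
    return counts.most_common()


sample = [
    {"status": 401, "error": "missing api key"},
    {"status": 401, "error": "missing api key"},
    {"status": 400, "error": "malformed body"},
    {"status": 200},
]
for (status, error), count in summarize_failures(sample):
    print(status, error, count)
```

A single dominant (status, error) pair with a customer-shaped cause, such as a missing API key, points toward customer input. A sudden spread of errors on requests that previously succeeded points toward platform behavior.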

Remediation

  • If the customer is sending invalid requests, confirm the pattern is persistent and ask the account manager to contact them with examples.
  • If settings or alias config is wrong, use Edit ecommerce index settings or the relevant controller/admin path. Avoid direct DDB edits unless the approved path cannot work.
  • If Cloudflare KV is stale or wrong, follow Settings Sync, then re-export or wait for the exporter as appropriate.
  • If this started after a deploy, compare with the previous worker version and roll back or patch the bad deploy.
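To check whether the problem started with a deploy, one option is to compare the 4XX rate on either side of the deploy time. This is a rough sketch under assumptions: the `timestamp` and `status` keys and the `deploy_time` value are hypothetical and would need to be filled in from the logs and the worker's deploy history.

```python
from datetime import datetime


def rate_4xx(records):
    """Fraction of records with a 4XX status; 0.0 for an empty list."""
    total = len(records)
    errors = sum(1 for r in records if 400 <= r["status"] < 500)
    return errors / total if total else 0.0


def rates_around_deploy(records, deploy_time):
    """Return (4XX rate before deploy, 4XX rate at or after deploy)."""
    before = [r for r in records if r["timestamp"] < deploy_time]
    after = [r for r in records if r["timestamp"] >= deploy_time]
    return rate_4xx(before), rate_4xx(after)


deploy_time = datetime(2024, 1, 1, 12, 0)  # hypothetical deploy time
sample = [
    {"timestamp": datetime(2024, 1, 1, 11, 0), "status": 200},
    {"timestamp": datetime(2024, 1, 1, 11, 30), "status": 400},
    {"timestamp": datetime(2024, 1, 1, 12, 5), "status": 422},
    {"timestamp": datetime(2024, 1, 1, 12, 10), "status": 400},
]
print(rates_around_deploy(sample, deploy_time))
```

A clear jump at the deploy boundary for the same request shape supports rolling back. A rate that was already elevated before the deploy suggests looking at settings, KV, or customer input instead.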

Validation

  • Requests of the same shape should stop returning unexpected 4XX responses.
  • The alert should resolve after one or two evaluation windows.
  • Check whether the success-rate alert is also firing. A 4XX-only problem is not counted as a failure by the success-rate alert, but the root cause can still degrade search/indexing behavior.
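The last point can be illustrated with a small sketch showing how a success-rate metric can stay healthy while the 4XX rate is high. The runbook only states that 4XX is not counted as a failure; treating 5XX as the only failure class is an assumption made here for illustration, and both function names are hypothetical.

```python
def success_rate(statuses):
    """Share of requests not counted as failures.

    Assumption: only 5XX responses count as failures, so 4XX does not
    lower this number.
    """
    total = len(statuses)
    failures = sum(1 for s in statuses if s >= 500)
    return 1.0 - failures / total if total else 1.0


def client_error_rate(statuses):
    """Share of requests that returned 4XX."""
    total = len(statuses)
    errors = sum(1 for s in statuses if 400 <= s < 500)
    return errors / total if total else 0.0


statuses = [200] * 6 + [400] * 4
print(success_rate(statuses))       # 1.0
print(client_error_rate(statuses))  # 0.4
```

Under that assumption the success-rate alert stays quiet while 40% of requests are rejected, so a silent success-rate alert is not evidence that the account/index is healthy.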