Ecom API 4XX Rate Alerts
This runbook covers these Grafana alerts:
- Ecom API 4XX Rate Exceeds 25%
- Ecom API 4XX Rate Anomalous vs Baseline
Both alerts mean the ecom API is returning an unusually high share of 4XX responses for one account/index. The fixed-threshold alert pages when more than 25% of recent requests are 4XX and the absolute request volume is high enough to be meaningful. The anomaly alert pages when the 4XX rate is high relative to that account/index's recent baseline, even if it stays below 25%.
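The two conditions can be sketched as follows. This is only an illustration of the logic described above, not the actual Grafana rule definitions. The 25% threshold comes from this runbook. The volume floor, the baseline multiplier, and all names (`WindowCounts`, `MIN_4XX_VOLUME`, `ANOMALY_MULTIPLIER`) are hypothetical, and the real anomaly rule may compare against the baseline differently.

```python
from dataclasses import dataclass


@dataclass
class WindowCounts:
    """Request counts for one account/index in one evaluation window."""
    total: int       # all requests in the window
    status_4xx: int  # requests that returned a 4XX status


FIXED_THRESHOLD = 0.25    # from this runbook: more than 25% of requests are 4XX
MIN_4XX_VOLUME = 50       # hypothetical stand-in for "enough absolute volume"
ANOMALY_MULTIPLIER = 3.0  # hypothetical: how far above baseline is "anomalous"


def rate_4xx(counts: WindowCounts) -> float:
    """Fraction of requests that were 4XX; 0.0 when there was no traffic."""
    return counts.status_4xx / counts.total if counts.total else 0.0


def fixed_threshold_fires(counts: WindowCounts) -> bool:
    """True when the 4XX rate exceeds 25% and there is enough 4XX volume."""
    return counts.status_4xx >= MIN_4XX_VOLUME and rate_4xx(counts) > FIXED_THRESHOLD


def anomaly_fires(counts: WindowCounts, baseline_rate: float) -> bool:
    """True when the 4XX rate is well above this account/index's baseline rate."""
    return (
        counts.status_4xx >= MIN_4XX_VOLUME
        and rate_4xx(counts) > baseline_rate * ANOMALY_MULTIPLIER
    )
```

Under these assumed values, a window with 100 4XX responses out of 1000 requests (10%) does not trip the fixed threshold, but it does trip the anomaly check for an account/index whose baseline is 2%. That is why one alert can fire without the other.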
Triage
- Use the alert labels to identify label_system_account_id and label_index_name.
- Check the prod ecom API worker logs in Cloudflare: prod-ecom-api logs.
- Filter around the alert window and the affected index. Preserve a few failing requests, statuses, and response bodies before changing anything.
- Decide whether the failures are customer input or platform behavior:
  - Customer/input examples: missing API key, malformed body, missing required fields, invalid settings supplied by the customer.
  - Platform examples: API key lookup failures, bad settings exported to KV, validation bugs, alias/index routing bugs, or a deploy causing valid requests to be rejected.
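Grouping the preserved failures by status and error message usually makes the dominant pattern obvious before deciding between customer input and platform behavior. The sketch below assumes each preserved failure was saved as a dict with a `status` key and an optional `error` key. That record shape and the function name are hypothetical, not the worker's real log format.

```python
from collections import Counter


def summarize_failures(records):
    """Count 4XX responses by (status, error message), most frequent first."""
    counts = Counter()
    for record in records:
        status = record["status"]
        if 400 <= status < 500:
            # Records without an error message are grouped under a placeholder.
            counts[(status, record.get("error", "<no message>"))] += 1
    return counts.most_common()


# Illustrative records only; real entries come from the preserved log samples.
sample = [
    {"status": 401, "error": "missing API key"},
    {"status": 401, "error": "missing API key"},
    {"status": 400, "error": "malformed body"},
    {"status": 200},
    {"status": 500, "error": "internal"},
]

for (status, message), count in summarize_failures(sample):
    print(status, message, count)
```

The summary does not classify anything by itself. A single repeated message tied to one customer's request shape points toward customer input, while the same message appearing across many otherwise valid requests points toward a platform cause.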
Remediation
- If the customer is sending invalid requests, confirm the pattern is persistent and ask the account manager to contact them with examples.
- If settings or alias config is wrong, use Edit ecommerce index settings or the relevant controller/admin path. Avoid direct DDB edits unless the approved path cannot work.
- If Cloudflare KV is stale or wrong, follow Settings Sync, then re-export or wait for the exporter as appropriate.
- If this started after a deploy, compare with the previous worker version and roll back or patch the bad deploy.
Validation
- Requests with the same shape should no longer return unexpected 4XX responses.
- The alert should resolve after one or two evaluation windows.
- Check whether the success-rate alert is also firing. A 4XX-only problem is not counted as a failure by the success-rate alert, but the root cause can still degrade search/indexing behavior.
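One way to check the first validation point is to replay the requests preserved during triage and list any that still return a 4XX. The runbook does not name a replay tool, so this is a minimal sketch: `send` is a hypothetical caller-supplied function that issues one request and returns its HTTP status code. Replay only requests that are safe to repeat.

```python
from typing import Any, Callable, Iterable, List, Tuple


def still_failing(
    requests: Iterable[Any],
    send: Callable[[Any], int],
) -> List[Tuple[Any, int]]:
    """Replay preserved requests and return (request, status) pairs that still get a 4XX."""
    failures = []
    for request in requests:
        status = send(request)  # hypothetical: performs the request, returns the status
        if 400 <= status < 500:
            failures.append((request, status))
    return failures
```

An empty result means the preserved request shapes no longer produce 4XX responses. Anything left in the list is either expected (genuinely invalid customer input) or evidence that the fix is incomplete.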