Ecom Index Settings Exporter Lambda Has Errors

This runbook covers the Grafana alert Ecom Index Settings Exporter Lambda Has Errors.

The alert fires when prod-EcomSettingsExporterLambda records any Lambda error within a 5-minute window. The exporter reads stream events from prod-EcomIndexSettingsTable and writes ecom index settings to Cloudflare KV. While it is failing, changes to settings, aliases, query config, or API keys may not reach the ecom API worker.

Triage

  1. Check prod-EcomSettingsExporterLambda logs in the controller account around the alert window.
  2. Identify the failing system_account_id, index_name, and DDB sort key if the log includes them.
  3. Check whether the error is from DDB stream parsing, API key lookup, Pydantic/model validation, Cloudflare KV writes, or Cloudflare credentials.
  4. If a customer-facing setting changed recently, compare the DDB item with the KV value read by the search proxy.
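
For step 4, a field-by-field diff is usually easier to read than two raw JSON blobs. The following is a minimal sketch, assuming both the DDB item and the KV value have already been fetched and deserialized into plain Python dicts with comparable field names. That assumption is not confirmed by this runbook: the exporter may rename or transform fields on the way to KV. `diff_settings`, `MISSING`, and the field names in the usage note are hypothetical.

```python
from typing import Any, Iterable

# Placeholder reported when a field exists on only one side.
MISSING = "<missing>"


def diff_settings(
    ddb_item: dict[str, Any],
    kv_value: dict[str, Any],
    ignore: Iterable[str] = (),
) -> dict[str, tuple[Any, Any]]:
    """Return {field: (ddb_value, kv_value)} for every field that differs.

    Fields listed in `ignore` are skipped, which is useful for bookkeeping
    attributes that are not expected to be exported.
    """
    ignored = set(ignore)
    diffs: dict[str, tuple[Any, Any]] = {}
    for key in sorted(set(ddb_item) | set(kv_value)):
        if key in ignored:
            continue
        left = ddb_item.get(key, MISSING)
        right = kv_value.get(key, MISSING)
        if left != right:
            diffs[key] = (left, right)
    return diffs
```

For example, `diff_settings(ddb_item, kv_value, ignore=("updated_at",))` returns an empty dict when the export is up to date, and one entry per stale or missing field otherwise.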

Useful starting point:

aws logs tail /aws/lambda/prod-EcomSettingsExporterLambda --since 30m --filter-pattern "?ERROR ?Traceback ?Exception"
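
When the tail returns many lines, a rough bucketing by error source (triage step 3) can show which failure dominates. The sketch below is a first-pass heuristic only: the category names mirror the sources listed in step 3, but every keyword in `CATEGORY_HINTS` is a guess, not something taken from the exporter's real log format, and should be adjusted after reading a few actual messages.

```python
from collections import Counter
from typing import Iterable

# Ordered (category, keywords) pairs; the first category with a matching
# keyword wins. All keywords are hypothetical and matched case-insensitively.
CATEGORY_HINTS = [
    ("model_validation", ("validationerror", "pydantic")),
    ("cloudflare_credentials", ("unauthorized", "authentication", "invalid api token")),
    ("api_key_lookup", ("api key", "api_key", "apikey")),
    ("cloudflare_kv_write", ("cloudflare", "workers kv")),
    ("ddb_stream_parsing", ("dynamodb", "stream record", "newimage")),
]


def classify(line: str) -> str:
    """Return the first category whose keyword appears in the line."""
    lowered = line.lower()
    for category, keywords in CATEGORY_HINTS:
        if any(keyword in lowered for keyword in keywords):
            return category
    return "unknown"


def summarize(lines: Iterable[str]) -> Counter:
    """Count non-empty log lines per category."""
    return Counter(classify(line) for line in lines if line.strip())
```

One way to use it is to save the output of the command above to a file and call `summarize(open(path))`. A large `unknown` count means the keyword list needs tuning before the summary can be trusted.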

See Settings Sync, DynamoDB, and Cloudflare Workers.

Remediation

  • If a settings record is malformed, fix it through the admin/controller path when possible. Use direct DDB edits only with explicit approval.
  • If Cloudflare KV writes are failing, check the Cloudflare API token in Secrets Manager and Cloudflare API status/rate limits.
  • If API key lookup is failing, check the account's cell gateway and API keys service.
  • After fixing the root cause, manually re-export the affected index if the exporter supports scoped invocation, or make a safe no-op settings update to retrigger the stream.

Example scoped event shape:

{"system_account_id": "<account-id>", "index_name": "<index-name>"}
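
To avoid hand-quoting JSON on the command line, the scoped event can be generated and the invoke command printed for review before anyone runs it. This is a sketch under two assumptions: that the exporter accepts the event shape above (this runbook leaves that conditional), and that AWS CLI v2 is in use, where `--cli-binary-format raw-in-base64-out` lets `aws lambda invoke` take a raw JSON payload. `build_scoped_event` and `build_invoke_command` are hypothetical helpers; nothing here calls AWS.

```python
import json
import shlex

FUNCTION_NAME = "prod-EcomSettingsExporterLambda"


def build_scoped_event(system_account_id: str, index_name: str) -> str:
    """Serialize the scoped event, rejecting empty identifiers."""
    if not system_account_id or not index_name:
        raise ValueError("system_account_id and index_name are both required")
    return json.dumps(
        {"system_account_id": system_account_id, "index_name": index_name}
    )


def build_invoke_command(event_json: str, out_file: str = "response.json") -> list[str]:
    """Build the argv for `aws lambda invoke` without executing it."""
    return [
        "aws", "lambda", "invoke",
        "--function-name", FUNCTION_NAME,
        "--cli-binary-format", "raw-in-base64-out",
        "--payload", event_json,
        out_file,
    ]


def render_command(system_account_id: str, index_name: str) -> str:
    """Return a shell-quoted command line suitable for copy and paste."""
    event = build_scoped_event(system_account_id, index_name)
    return shlex.join(build_invoke_command(event))
```

Calling `print(render_command("<account-id>", "<index-name>"))` with the real identifiers yields a fully quoted command. Run it in the controller account only after the root cause is fixed, then inspect the response file for a function error.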

Validation

  • The Lambda has no new errors.
  • The affected KV key contains the expected settings.
  • A request through the ecom API worker uses the updated settings.
  • Related 4XX/5XX/success-rate alerts clear if they were caused by stale or bad settings.