Metrics Publishing
E-commerce metrics currently have two supported publishing transports:
- AWS/Python producers publish to EcomMetricsQueue. The components/ecom_metrics_consumer service consumes those events and remote-writes them to Amazon Managed Service for Prometheus (AMP).
- components/search_proxy can publish request metrics through the Cloudflare METRICS_GATEWAY Service Binding. This is the high-volume Worker producer and the first target for reducing SQS cost.
Python/AWS producers do not have a direct Cloudflare Service Binding path yet. Keep them on the existing SQS publishers until a supported AWS route exists.
Current SQS Path
Most e-commerce metrics are sent to EcomMetricsQueue and flushed by
components/ecom_metrics_consumer to AMP.
Current event shapes include:
- typed events such as request, controller_request, indexer_job, agentic_stream, onboarding_progress, and failure;
- generic type: "timeseries" events produced by components/metrics_publisher.SqsMetricsPublisher.
The Python MetricsPublisher interface accepts pre-formatted Prometheus
TimeSeries values. SqsMetricsPublisher serializes those as:
{
"type": "timeseries",
"env": "staging",
"merge": "sum",
"series": []
}
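As a rough illustration of the envelope above, a consumer-side check could look like the following TypeScript sketch. The names TimeseriesEnvelope and isTimeseriesEnvelope are hypothetical, and the "last" merge value is an assumption inferred from the consumer's last bucket; the contents of series are left opaque because this document does not show them.

```typescript
// Hypothetical type for the envelope shown above. "last" is an assumed
// second merge value; only "sum" appears in the example payload.
type MergeMode = "sum" | "last";

interface TimeseriesEnvelope {
  type: "timeseries";
  env: string;
  merge: MergeMode;
  series: unknown[]; // Prometheus TimeSeries values, kept opaque here
}

// Narrow an arbitrary parsed message body to the envelope shape.
function isTimeseriesEnvelope(value: unknown): value is TimeseriesEnvelope {
  if (typeof value !== "object" || value === null) {
    return false;
  }
  const v = value as Record<string, unknown>;
  return (
    v["type"] === "timeseries" &&
    typeof v["env"] === "string" &&
    (v["merge"] === "sum" || v["merge"] === "last") &&
    Array.isArray(v["series"])
  );
}
```

A guard like this lets typed events (request, failure, and so on) and generic timeseries events share one queue while still being dispatched separately.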
The consumer coalesces additive series in the sum bucket and gauges in the
last bucket before remote-writing to AMP.
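The coalescing step can be sketched as follows. This is a minimal TypeScript illustration, not the consumer's actual code (the consumer is a separate service and its implementation is not shown here). The Sample shape, the seriesKey helper, and the Coalescer class are hypothetical, and the sketch assumes series identity is the metric name plus its label set.

```typescript
// Hypothetical, simplified sample shape; the real TimeSeries payload differs.
type Labels = Record<string, string>;
type MergeMode = "sum" | "last";

interface Sample {
  name: string;
  labels: Labels;
  value: number;
  timestampMs: number;
}

// Build a stable key from the metric name and its sorted labels.
function seriesKey(name: string, labels: Labels): string {
  const parts = Object.keys(labels)
    .sort()
    .map((k) => `${k}=${JSON.stringify(labels[k])}`);
  return `${name}{${parts.join(",")}}`;
}

class Coalescer {
  private readonly sum = new Map<string, Sample>();
  private readonly last = new Map<string, Sample>();

  add(sample: Sample, merge: MergeMode): void {
    const key = seriesKey(sample.name, sample.labels);
    if (merge === "sum") {
      // Additive series: accumulate values, keep the newest timestamp.
      const prev = this.sum.get(key);
      this.sum.set(key, {
        ...sample,
        value: (prev?.value ?? 0) + sample.value,
        timestampMs: Math.max(prev?.timestampMs ?? 0, sample.timestampMs),
      });
    } else {
      // Gauges: keep only the most recent sample per series.
      const prev = this.last.get(key);
      if (prev === undefined || sample.timestampMs >= prev.timestampMs) {
        this.last.set(key, { ...sample });
      }
    }
  }

  // Return everything buffered (sum bucket first) and reset both buckets.
  flush(): Sample[] {
    const out = [
      ...Array.from(this.sum.values()),
      ...Array.from(this.last.values()),
    ];
    this.sum.clear();
    this.last.clear();
    return out;
  }
}
```

Keeping two separate buckets means a gauge is never accidentally summed, and each flush sends one sample per series rather than one per event.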
Search Proxy Gateway Worker
search_proxy still builds the same request metric event used by the SQS path.
When METRICS_GATEWAY is bound, it converts that event to the same Prometheus
series produced by components/ecom_metrics_consumer:
- ecom_api_request_latency_milliseconds_bucket
- ecom_api_request_latency_milliseconds_sum
- ecom_api_request_latency_milliseconds_count
- ecom_api_cache_hit_count
- ecom_api_cache_miss_count
It sends those datapoints through the gateway worker batch API:
env.METRICS_GATEWAY.reportAndConfirm([
{
name: "ecom_api_request_latency_milliseconds_count",
type: "counter",
value: 1,
labels: { env: "dev-feat-grpc-push" },
},
]);
Zero-valued counter increments are omitted from the gateway batch. They are no-ops for additive counters, and the SQS fallback still preserves the original request event contract.
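The conversion and the zero-increment filter described above might look roughly like this sketch. The RequestEvent fields, the bucket bounds, and the toDatapoints helper are assumptions for illustration; the real event contract and histogram bounds are defined in the components themselves and are not shown in this document. Only the datapoint shape (name, type, value, labels) and the metric names come from the text above.

```typescript
interface Datapoint {
  name: string;
  type: "counter";
  value: number;
  labels: Record<string, string>;
}

// Hypothetical request event; the real contract has more fields.
interface RequestEvent {
  env: string;
  latencyMs: number;
  cacheHit: boolean;
}

// Illustrative bounds only; the real bounds must match the consumer's.
const BUCKET_BOUNDS_MS: number[] = [10, 50, 100, 500, 1000];
const LATENCY = "ecom_api_request_latency_milliseconds";

function toDatapoints(event: RequestEvent): Datapoint[] {
  const base: Record<string, string> = { env: event.env };
  const counter = (
    name: string,
    value: number,
    labels: Record<string, string> = base,
  ): Datapoint => ({ name, type: "counter", value, labels });

  const points: Datapoint[] = [];
  // Cumulative histogram: every bucket whose bound covers the latency gets +1.
  for (const bound of BUCKET_BOUNDS_MS) {
    const hit = event.latencyMs <= bound ? 1 : 0;
    points.push(counter(`${LATENCY}_bucket`, hit, { ...base, le: String(bound) }));
  }
  points.push(counter(`${LATENCY}_bucket`, 1, { ...base, le: "+Inf" }));
  points.push(counter(`${LATENCY}_sum`, event.latencyMs));
  points.push(counter(`${LATENCY}_count`, 1));
  points.push(counter("ecom_api_cache_hit_count", event.cacheHit ? 1 : 0));
  points.push(counter("ecom_api_cache_miss_count", event.cacheHit ? 0 : 1));

  // Zero increments are no-ops for additive counters, so drop them.
  return points.filter((p) => p.value !== 0);
}
```

With these assumed bounds, a 75 ms cache hit yields seven datapoints: four bucket increments (le 100, 500, 1000, +Inf), the sum, the count, and the cache-hit counter. The cache-miss counter and the two smaller buckets are filtered out.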
Wrangler only binds METRICS_GATEWAY for dev search proxy deployments:
| Environment | Service | Entrypoint |
|---|---|---|
| dev cells | cell1-dev-metrics-gateway-worker | MetricsReporter |
| local dev env.dev | cell1-dev-metrics-gateway-worker | MetricsReporter |
Staging, preprod, and prod do not declare this Service Binding yet. They keep using the existing SQS path until the gateway/ingest path is ready for those environments.
If the binding is absent locally, or if the gateway worker call fails at
runtime, search_proxy falls back to the existing SQS path and logs the gateway
failure. Metrics publishing remains best-effort and must not affect the request
flow.
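The fallback behavior can be sketched as below. This is a hedged illustration rather than the search_proxy implementation: publishMetrics, the sendToSqs callback, the log parameter, and the returned transport label are all hypothetical, and reportAndConfirm is assumed to return a promise that rejects on failure.

```typescript
interface Datapoint {
  name: string;
  type: string;
  value: number;
  labels: Record<string, string>;
}

// Assumed shape of the Service Binding; only the method name comes from this doc.
interface MetricsGateway {
  reportAndConfirm(batch: Datapoint[]): Promise<unknown>;
}

type Transport = "gateway" | "sqs" | "none";

// Best-effort publish: try the gateway, fall back to SQS, never throw.
async function publishMetrics(
  gateway: MetricsGateway | undefined,
  batch: Datapoint[],
  sendToSqs: () => Promise<void>, // hypothetical wrapper around the existing SQS publisher
  log: (message: string, error: unknown) => void = console.error,
): Promise<Transport> {
  if (gateway !== undefined) {
    try {
      await gateway.reportAndConfirm(batch);
      return "gateway";
    } catch (error) {
      // Log the gateway failure, then continue to the SQS fallback.
      log("metrics gateway call failed, falling back to SQS", error);
    }
  }
  try {
    await sendToSqs();
    return "sqs";
  } catch (error) {
    // Swallow the error so metrics never affect the request flow.
    log("SQS metrics publish failed", error);
    return "none";
  }
}
```

Because the function catches every error, a caller can await it or hand it to the Worker's background-task mechanism without wrapping it in its own error handling.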
Ingest Service
The future ingest service will receive single-datapoint requests behind the
metrics gateway worker. search_proxy does not call that ingest service
directly and does not know its hostname, credentials, or request format.
Until that service exists, the gateway worker must own any forwarding or compatibility behavior needed to keep the published Prometheus metrics equivalent to the current SQS consumer output.
Tunnel
The metrics tunnel is not owned by search_proxy.
The metrics gateway worker owns any tunnel/private-connectivity setup required
to reach an ingest service in the cluster. This repo's search proxy config only
declares the METRICS_GATEWAY Service Binding and keeps SQS as the fallback
transport.
Validation
Relevant checks:
cd components/search_proxy
npm test -- metrics.test.ts
npm run lint