Runbook: grpc_metrics_gateway_workers_vpc_setup
How to finish wiring the gRPC metrics gateway's private Cloudflare Tunnel so
that it is reachable only from our Cloudflare Workers, via a Workers VPC
service binding. This covers the one step OpenTofu (v5 Cloudflare provider)
cannot express — creating the VPC service object in the Cloudflare
dashboard / API. The Worker-side vpc_service binding is OpenTofu-managed (the
metrics_gateway_worker deploys with the binding applied in
metrics_gateway_worker.tf); you just supply the service id via tfvars.
Background
infra/multitenant_eks_cluster/grpc_metrics_gateway_tunnel.tf already creates,
when enable_grpc_metrics_gateway_tunnel = true:
- the Cloudflare Tunnel + a cloudflared Deployment (2 replicas) in the metrics-gateway namespace,
- a dedicated virtual network (the isolation boundary),
- optionally an IP-based tunnel route (only if grpc_metrics_gateway_tunnel_route_network is set).
There is no public hostname and no public DNS record — nothing is publicly resolvable. The remaining work is to let a specific Worker reach the gateway across that private network.
The gateway itself is reachable inside the cluster at:
grpc-metrics-gateway.metrics-gateway.svc.cluster.local:8080 # gRPC-Web (HTTP)
(8080 serves the gRPC-Web wrapper; clients call the path
/marqo.metrics.v1.MetricsIngest/Push.)
Prerequisites
- enable_grpc_metrics_gateway = true and enable_grpc_metrics_gateway_tunnel = true applied for the cell.
- cloudflared connector healthy and on a recent enough version. Creating the VPC service fails with Tunnel's cloudflared version … is too old (requires >= 2025.7.0) if the connector is older. var.cloudflared_image defaults to a compatible tag; if you bump it, re-apply and let the Deployment roll so the tunnel reports the new version to Cloudflare before you create the VPC service.
  aws eks update-kubeconfig --region <region> --name <cluster>
  kubectl -n metrics-gateway get pods -l app=cloudflared-grpc-metrics-gateway
  kubectl -n metrics-gateway logs -l app=cloudflared-grpc-metrics-gateway --tail=50
  # expect "Registered tunnel connection" lines, no credential errors
  # check the running version:
  kubectl -n metrics-gateway get deploy cloudflared-grpc-metrics-gateway \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
- Grab the IDs Terraform exported (run in infra/multitenant_eks_cluster):
  tofu output grpc_metrics_gateway_tunnel_id
  tofu output grpc_metrics_gateway_tunnel_virtual_network_id
  tofu output grpc_metrics_gateway_tunnel_virtual_network_name
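The version gate in the prerequisites can be checked mechanically against the image string printed by the last kubectl command. A minimal sketch, assuming the image tag is a plain numeric YYYY.M.P cloudflared release (for example cloudflare/cloudflared:2025.7.0); the helper name is hypothetical, and tags that are not plain numeric versions (latest, digest-pinned images, suffixed tags) are reported as unknown rather than guessed.

```typescript
// Hypothetical helper: compare the tag of the Deployment's image string
// against the minimum cloudflared version quoted in the error message.
const MIN_CLOUDFLARED = [2025, 7, 0];

// Returns true/false for a parseable tag, or null when the tag is not a
// plain numeric YYYY.M.P version (e.g. "latest" or a digest-pinned image).
function cloudflaredMeetsMinimum(image: string): boolean | null {
  const match = /:(\d+)\.(\d+)\.(\d+)$/.exec(image.trim());
  if (!match) return null;
  const version = [Number(match[1]), Number(match[2]), Number(match[3])];
  // Compare component by component, numerically (so 2025.10.0 > 2025.7.0).
  for (let i = 0; i < 3; i++) {
    if (version[i] > MIN_CLOUDFLARED[i]) return true;
    if (version[i] < MIN_CLOUDFLARED[i]) return false;
  }
  return true; // exactly the minimum version
}
```

A null result means the tag has to be checked by hand, for example from the connector's startup logs.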
Choose a model
Path A — Workers VPC service (recommended). A VPC service points the Worker
at the gateway via the tunnel; no CIDR routing, no warp-routing toggle. Leave
grpc_metrics_gateway_tunnel_route_network empty.
Path B — IP-based private route. Set
grpc_metrics_gateway_tunnel_route_network to the gateway's ClusterIP as a
/32 (IPv4) or /128 (IPv6), enable warp-routing on the tunnel, and have the
Worker target that IP. Use only if the VPC service product is unavailable. Get
the ClusterIP with:
kubectl -n metrics-gateway get svc grpc-metrics-gateway -o jsonpath='{.spec.clusterIPs}{"\n"}'
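For Path B, the value for grpc_metrics_gateway_tunnel_route_network can be derived from that command's output. A minimal sketch, assuming the jsonpath prints the clusterIPs list as a JSON array (for example ["10.100.12.34"]); the function name is hypothetical, and the address family is detected only by the presence of a colon.

```typescript
// Hypothetical helper: turn the kubectl jsonpath output for .spec.clusterIPs
// into host routes, /32 for IPv4 and /128 for IPv6.
function clusterIpsToHostRoutes(jsonpathOutput: string): string[] {
  const ips: unknown = JSON.parse(jsonpathOutput.trim());
  if (!Array.isArray(ips)) throw new Error("expected a JSON array of IPs");
  return ips.map((ip) => {
    if (typeof ip !== "string" || ip === "") {
      throw new Error("unexpected clusterIPs entry");
    }
    // A headless Service reports "None" instead of an address.
    if (ip === "None") throw new Error("headless Service has no ClusterIP");
    return ip.includes(":") ? `${ip}/128` : `${ip}/32`;
  });
}
```

On a dual-stack Service this returns one route per address family; which of them to set as the route network is still a manual choice.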
The exact Cloudflare console/API surface for "VPC services" / "Workers VPC" is evolving (beta). Names below may differ slightly — confirm against current Cloudflare docs. The Terraform-managed tunnel + virtual network IDs above are the stable inputs every variant needs.
Steps (Path A)
1. Create a VPC service for the gateway
In the Cloudflare dashboard: Zero Trust → Networks → VPC services (or via the API), create a service that references:
- Tunnel: the grpc_metrics_gateway_tunnel_id from above.
- Virtual network: the grpc_metrics_gateway_tunnel_virtual_network_id.
- Target (HTTP): grpc-metrics-gateway.metrics-gateway.svc.cluster.local port 8080. cloudflared runs in-cluster and resolves this via cluster DNS.
Record the resulting VPC service ID.
2. Bind the VPC service to the Worker
The gateway-facing Worker is components/metrics_gateway_worker. OpenTofu deploys
it (cloudflare_workers_script in metrics_gateway_worker.tf) together with its
vpc_service binding (name = "METRICS_GATEWAY"), so there is no manual wrangler
step in production: set the VPC service id from step 1 in the env's tfvars and
apply:
# config/vars/<env>.tfvars
enable_metrics_gateway_cloudflare_worker = true
metrics_gateway_worker_vpc_service_id = "<vpc-service-id-from-step-1>"
For local wrangler dev only, the same id goes in
components/metrics_gateway_worker/wrangler.jsonc under vpc_services
(that block is ignored by the OpenTofu deploy).
Other Workers that want to emit metrics bind to this Worker over RPC (a
services binding with entrypoint = "MetricsReporter"), not a VPC service; only
metrics_gateway_worker holds the tunnel binding.
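From a calling Worker's side, the RPC binding might be used roughly as sketched below. This is illustrative only: the binding name METRICS and the incr/observe signatures are assumptions, not taken from components/metrics_gateway_worker, so check the MetricsReporter entrypoint there for the real ones.

```typescript
// ASSUMED shape of the MetricsReporter entrypoint; the real signatures live
// in components/metrics_gateway_worker and may differ.
interface MetricsReporterLike {
  incr(name: string, labels?: Record<string, string>): Promise<void>;
  observe(name: string, value: number, labels?: Record<string, string>): Promise<void>;
}

// ASSUMED binding name; use whatever the caller's services binding declares.
interface CallerEnv {
  METRICS: MetricsReporterLike;
}

// Record one request without letting a metrics failure reach the caller.
async function recordRequest(env: CallerEnv, route: string, latencyMs: number): Promise<void> {
  try {
    await env.METRICS.incr("requests_total", { route });
    await env.METRICS.observe("request_latency_ms", latencyMs, { route });
  } catch (err) {
    console.warn("metrics RPC failed (ignored):", err);
  }
}
```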
3. Call the gateway from the Worker
components/metrics_gateway_worker already implements this encoder (src/grpcweb.ts) and exposes it as the MetricsReporter RPC entrypoint, so most callers just bind that Worker (a services binding) and call incr/observe. The wire-level detail below is reference for that implementation, or for hand-rolling a direct gRPC-Web caller.
The binding exposes a fetch. A call is three layers: protobuf-encode a
PushRequest, wrap it in a single gRPC-Web length-prefixed frame, then POST it
to the RPC path /<package>.<Service>/<Method>
(/marqo.metrics.v1.MetricsIngest/Push) with content-type
application/grpc-web+proto. The gateway's gRPC-Web wrapper (port 8080)
dispatches it to the same handler the in-cluster gRPC listener uses.
Wire layout, straight from components/grpc_metrics_gateway/proto/metrics.proto:
| Message | Field | # | proto type | wire type |
|---|---|---|---|---|
| PushRequest | metrics (repeated Metric) | 1 | message | 2 (len-delimited) |
| Metric | name | 1 | string | 2 |
| Metric | help (optional) | 2 | string | 2 |
| Metric | type | 3 | enum (COUNTER=1, HISTOGRAM=3) | 0 (varint) |
| Metric | labels (map<string,string>) | 4 | message entry {key=1,value=2} | 2 |
| Metric | value | 5 | double | 1 (64-bit LE) |
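To make the table concrete, here is a hand-assembled test vector for about the smallest useful request, PushRequest{ metrics: [Metric{ name: "a", type: COUNTER, value: 1.0 }] }, wrapped in a gRPC-Web data frame. It is a sketch for checking an encoder against, derived from the wire layout above rather than captured from the gateway; the variable names are illustrative.

```typescript
// Metric{ name: "a", type: COUNTER (1), value: 1.0 }
const metricBytes: number[] = [
  0x0a, 0x01, 0x61, // field 1 (name): tag (1<<3)|2 = 0x0a, length 1, "a"
  0x18, 0x01, // field 3 (type): tag (3<<3)|0 = 0x18, varint 1 (COUNTER)
  0x29, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0, 0x3f, // field 5 (value): tag (5<<3)|1 = 0x29, 1.0 as float64 LE
];

// PushRequest: field 1 (metrics), length-delimited. The length fits in a
// single varint byte because it is below 128.
const pushRequestBytes: number[] = [0x0a, metricBytes.length, ...metricBytes];

// gRPC-Web data frame: flags 0x00, then a 4-byte big-endian payload length.
// Only the last length byte is non-zero because the payload is under 256 bytes.
const frameBytes = Uint8Array.from([
  0x00, 0x00, 0x00, 0x00, pushRequestBytes.length,
  ...pushRequestBytes,
]);

const toHex = (bytes: ArrayLike<number>): string =>
  Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");

console.log(toHex(frameBytes));
// 00000000100a0e0a0161180129000000000000f03f
```

Read left to right: 5 bytes of frame header (payload length 0x10 = 16), 2 bytes of PushRequest tag and length (0x0e = 14), then the 14-byte Metric.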
type must be 1 (counter) or 3 (histogram) — 0/UNSPECIFIED and 2/GAUGE
are rejected. For a counter, value is the increment; for a histogram, value
is the observed sample (e.g. a latency in the same unit as the gateway's
buckets). The gateway prepends the pushed_ prefix, so request_latency_ms is
scraped as pushed_request_latency_ms_{bucket,sum,count}.
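Those rules can be mirrored client-side before a sample is sent, and the expected series names worked out for the verification grep. A minimal sketch with hypothetical helper names; the counter case assumes the pushed_ prefix is the only change the gateway makes to a counter's name, which the text above spells out only for histograms.

```typescript
const COUNTER = 1;
const HISTOGRAM = 3;

// Throws for the enum values the gateway rejects (0/UNSPECIFIED, 2/GAUGE)
// and for anything else that is not a counter or histogram.
function assertPushableType(type: number): void {
  if (type !== COUNTER && type !== HISTOGRAM) {
    throw new Error(`metric type ${type} is rejected by the gateway; use 1 or 3`);
  }
}

// Series names to look for on the gateway's /metrics output.
function scrapedSeriesNames(name: string, type: number): string[] {
  assertPushableType(type);
  const base = `pushed_${name}`;
  return type === HISTOGRAM
    ? [`${base}_bucket`, `${base}_sum`, `${base}_count`]
    : [base]; // ASSUMPTION: counters keep their name apart from the prefix
}
```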
Dependency-free encoder + framing + send (drop into the Worker):
type Sample = {
  name: string;
  type: 1 | 3; // 1 = counter (value is ADDED), 3 = histogram (value is OBSERVED into buckets)
  value: number;
  labels?: Record<string, string>;
  help?: string;
};

const TEXT = new TextEncoder();

// ── protobuf primitives (proto3 wire format) ────────────────────────────────
function putVarint(out: number[], v: number) {
  // unsigned LEB128; v is a non-negative safe integer (tag, length, enum)
  while (v > 0x7f) {
    out.push((v & 0x7f) | 0x80);
    v = Math.floor(v / 128);
  }
  out.push(v & 0x7f);
}

function putTag(out: number[], field: number, wire: number) {
  putVarint(out, (field << 3) | wire);
}

function putLenDelim(out: number[], field: number, bytes: ArrayLike<number>) {
  putTag(out, field, 2);
  putVarint(out, bytes.length);
  for (let i = 0; i < bytes.length; i++) out.push(bytes[i]);
}

function putString(out: number[], field: number, s: string) {
  putLenDelim(out, field, TEXT.encode(s));
}

function putDouble(out: number[], field: number, n: number) {
  putTag(out, field, 1);
  const b = new Uint8Array(8);
  new DataView(b.buffer).setFloat64(0, n, true); // IEEE-754, little-endian
  for (const x of b) out.push(x);
}

function encodeMetric(m: Sample): number[] {
  const out: number[] = [];
  putString(out, 1, m.name);
  if (m.help) putString(out, 2, m.help);
  putTag(out, 3, 0);
  putVarint(out, m.type); // enum as varint
  for (const [k, v] of Object.entries(m.labels ?? {})) {
    const entry: number[] = []; // each map entry is a message {key=1, value=2}
    putString(entry, 1, k);
    putString(entry, 2, v);
    putLenDelim(out, 4, entry);
  }
  putDouble(out, 5, m.value);
  return out;
}

function encodePushRequest(samples: Sample[]): Uint8Array {
  const out: number[] = [];
  for (const m of samples) putLenDelim(out, 1, encodeMetric(m));
  return Uint8Array.from(out);
}

// ── gRPC-Web frame: [flags=0x00][uint32 length, big-endian][protobuf] ────────
function grpcWebFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(5 + payload.length);
  frame[0] = 0x00; // uncompressed data frame
  new DataView(frame.buffer).setUint32(1, payload.length, false); // big-endian
  frame.set(payload, 5);
  return frame;
}

// ── send through the VPC binding ─────────────────────────────────────────────
// Env is the Worker's bindings type; METRICS_GATEWAY is the vpc_service binding.
async function pushMetrics(env: Env, samples: Sample[]): Promise<void> {
  const body = grpcWebFrame(encodePushRequest(samples));
  const res = await env.METRICS_GATEWAY.fetch(
    // host is ignored by the binding; only the RPC path matters
    "http://gateway/marqo.metrics.v1.MetricsIngest/Push",
    {
      method: "POST",
      headers: {
        "content-type": "application/grpc-web+proto",
        "x-grpc-web": "1",
      },
      body,
    },
  );
  // gRPC-Web returns HTTP 200 even on gRPC errors; the authoritative result is
  // the grpc-status trailer (0 = OK), carried in a trailer frame (flags MSB set,
  // 0x80) at the end of the body. Best-effort: log and move on.
  if (res.status !== 200) {
    console.warn(`metrics push transport error: HTTP ${res.status}`);
    return;
  }
  const raw = new Uint8Array(await res.arrayBuffer());
  let off = 0;
  // A trailers-only response may carry grpc-status in the HTTP headers instead
  // of a body trailer frame, so seed from the header; a body trailer overrides it.
  let grpcStatus = Number(res.headers.get("grpc-status") ?? 0);
  while (off + 5 <= raw.length) {
    const flags = raw[off];
    const len = new DataView(raw.buffer, raw.byteOffset + off + 1, 4).getUint32(0, false);
    const payload = raw.subarray(off + 5, off + 5 + len);
    off += 5 + len;
    if (flags & 0x80) {
      const m = /grpc-status:\s*(\d+)/i.exec(new TextDecoder().decode(payload));
      if (m) grpcStatus = Number(m[1]);
    }
    // else: `payload` is the serialized PushResponse (accepted=field 1,
    // rejected=field 2); decode it only if you need the counts.
  }
  if (grpcStatus !== 0) {
    console.warn(`metrics push grpc-status ${grpcStatus}`);
  }
}

// usage (inside a handler, where `env` is in scope)
await pushMetrics(env, [
  { name: "requests_total", type: 1, value: 1, labels: { route: "/search", method: "GET" } },
  { name: "request_latency_ms", type: 3, value: 42.5, labels: { route: "/search" } },
]);
Treat the push as best-effort — do not block request handling on it (wrap the call so a failure can't fail the Worker's main response).
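One way to do that wrapping is to hand the push to the handler's ExecutionContext via waitUntil, so the response returns immediately while the push finishes in the background. A minimal sketch: the helper name and the narrowed context interface are illustrative, and only the waitUntil method of the Workers runtime is relied on.

```typescript
// Minimal slice of the Workers ExecutionContext used here; in a real Worker
// the full type comes from the Workers type definitions.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical helper: start `push`, swallow any failure (sync or async), and
// keep the Worker alive until the push settles without delaying the response.
function pushBestEffort(ctx: WaitUntilContext, push: () => Promise<void>): void {
  const guarded = (async () => {
    try {
      await push();
    } catch (err) {
      console.warn("metrics push failed (ignored):", err);
    }
  })();
  ctx.waitUntil(guarded);
}

// Usage inside a fetch handler (pushMetrics and samples as defined by the caller):
//   pushBestEffort(ctx, () => pushMetrics(env, samples));
//   return response;
```

The promise passed to waitUntil never rejects, so a gateway outage shows up only as a warning in the Worker's logs.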
If you'd rather not hand-roll the encoder, generate a client from
metrics.proto with protobuf-es
(@bufbuild/protobuf) and feed its serialized bytes to grpcWebFrame() — the
framing and the fetch are identical.
4. Deploy the Worker
metrics_gateway_worker is deployed by OpenTofu (no manual wrangler deploy for production). With the service id set in tfvars (step 2), apply the
stack — the build bundles the Worker and cloudflare_workers_script deploys it
with the VPC binding:
cd infra/multitenant_eks_cluster
./scripts/deploy.sh --env dev # or the target env
Verification
- From the Worker (or wrangler dev bound to the same VPC service), issue one push and confirm a 200 with accepted: 1.
- Confirm the series landed (note the pushed_ prefix), from inside a gateway pod:
  kubectl -n metrics-gateway exec deploy/grpc-metrics-gateway -- \
    wget -qO- http://localhost:8080/metrics | grep pushed_
- Confirm the gateway is not publicly reachable: there should be no public DNS record, and there is no public hostname on the tunnel. A plain curl https://<anything> cannot reach it; only the bound Worker can.
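Confirming accepted: 1 in the first check means decoding the PushResponse carried in the response's data frame. A minimal sketch, assuming accepted (field 1) and rejected (field 2) are varint-encoded integer fields; confirm the exact types in metrics.proto. The function names are hypothetical, unknown fields are skipped, and an absent field reads as 0 because proto3 omits zero values.

```typescript
type PushCounts = { accepted: number; rejected: number };

// Read one unsigned LEB128 varint starting at `pos`.
function readVarint(buf: Uint8Array, pos: number): { value: number; next: number } {
  let value = 0;
  let scale = 1;
  for (;;) {
    if (pos >= buf.length) throw new Error("truncated varint");
    const byte = buf[pos++];
    value += (byte & 0x7f) * scale;
    if ((byte & 0x80) === 0) return { value, next: pos };
    scale *= 128;
  }
}

// Decode the payload of a data frame (flags 0x00) from the Push response.
// ASSUMPTION: fields 1 and 2 are varints; check metrics.proto.
function decodePushResponse(buf: Uint8Array): PushCounts {
  const counts: PushCounts = { accepted: 0, rejected: 0 };
  let pos = 0;
  while (pos < buf.length) {
    const tag = readVarint(buf, pos);
    pos = tag.next;
    const field = Math.floor(tag.value / 8);
    const wire = tag.value % 8;
    if (wire === 0) {
      const v = readVarint(buf, pos);
      pos = v.next;
      if (field === 1) counts.accepted = v.value;
      else if (field === 2) counts.rejected = v.value;
    } else if (wire === 2) {
      const len = readVarint(buf, pos); // skip unknown length-delimited field
      pos = len.next + len.value;
    } else if (wire === 1) {
      pos += 8; // skip unknown 64-bit field
    } else if (wire === 5) {
      pos += 4; // skip unknown 32-bit field
    } else {
      throw new Error(`unsupported wire type ${wire}`);
    }
  }
  return counts;
}
```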
Logpush: shipping Worker logs to the Parquet pipeline
The metrics-gateway Worker has in-dashboard observability off and Cloudflare
Logpush on (logpush = true on the cloudflare_workers_script in
metrics_gateway_worker.tf). That flag only enables the feature on the script; a
Logpush job (the S3 destination + ScriptName filter) must be created
out of band, the same way the global-worker does it. Logs only reach
Parquet/Athena/Grafana in an env whose stack has
enable_cloudflare_log_analytics = true.
- Create/update the job via the manual workflow .github/workflows/cloudflare-upsert-metrics-gateway-worker-logpush-job.yml (Actions → "Cloudflare Upsert Logpush Job - Metrics Gateway Worker" → Run):
  - environment: the target env (e.g. prod2).
  - worker_names: the deployed script name(s), <cell>-<env>-metrics-gateway-worker (e.g. cell1-prod2-metrics-gateway-worker). Output metrics_gateway_worker_script_name is the exact name.
  - job_name: a new name, e.g. cell1-prod2-metrics-gateway-worker-logs. This is also the S3 prefix and the Grafana dropdown value.
  - sampling_rate: keep low, since this Worker is on the metrics hot path (default 0.01 = 1%).
- Register the job name so the rest of the pipeline picks it up. Append it to cloudflare_logpush_job_names in that env's config/vars/<env>.tfvars and tofu apply:
  cloudflare_logpush_job_names = ["...existing...", "cell1-prod2-metrics-gateway-worker-logs"]
  This adds the converter Lambda's S3-event filter prefix, includes the job in the Glue compaction --worker_names, and adds it to the Grafana "Logpush Job Name" dropdown. If you skip this, logs land in S3 but are never converted to Parquet or shown in Grafana.
Rollback / disable
Set enable_grpc_metrics_gateway_tunnel = false (and
enable_metrics_gateway_cloudflare_worker = false) and apply: this removes the
cloudflared Deployment, the tunnel, the virtual network, the token Secret, and
the Worker (with its VPC binding). Then delete the VPC service created in step 1
in the Cloudflare dashboard/API.
Notes
- cloudflared connectors are stateless; scale grpc_metrics_gateway_tunnel_replicas freely. They all register against the same tunnel ID.
- This path intentionally has no per-request app-layer auth (edge/private-network enforcement only). If you later want defense in depth, add a Cloudflare Access service-token policy using the pattern in infra/multitenant_eks_cluster/modules/media_proxy_cloudflare_worker/main.tf.