Runbook: grpc_metrics_gateway_workers_vpc_setup

How to finish wiring the gRPC metrics gateway's private Cloudflare Tunnel so that it is reachable only from our Cloudflare Workers, via a Workers VPC service binding. This covers the one step OpenTofu (v5 Cloudflare provider) cannot express — creating the VPC service object in the Cloudflare dashboard / API. The Worker-side vpc_service binding is OpenTofu-managed (the metrics_gateway_worker deploys with the binding applied in metrics_gateway_worker.tf); you just supply the service id via tfvars.

Background

infra/multitenant_eks_cluster/grpc_metrics_gateway_tunnel.tf already creates, when enable_grpc_metrics_gateway_tunnel = true:

  • the Cloudflare Tunnel + a cloudflared Deployment (2 replicas) in the metrics-gateway namespace,
  • a dedicated virtual network (the isolation boundary),
  • optionally an IP-based tunnel route (only if grpc_metrics_gateway_tunnel_route_network is set).

There is no public hostname and no public DNS record — nothing is publicly resolvable. The remaining work is to let a specific Worker reach the gateway across that private network.

The gateway itself is reachable inside the cluster at:

grpc-metrics-gateway.metrics-gateway.svc.cluster.local:8080 # gRPC-Web (HTTP)

(8080 serves the gRPC-Web wrapper; clients call the path /marqo.metrics.v1.MetricsIngest/Push.)

Prerequisites

  • enable_grpc_metrics_gateway = true and enable_grpc_metrics_gateway_tunnel = true applied for the cell.

  • cloudflared connector healthy and on a recent enough version. If the connector is older, creating the VPC service fails with "Tunnel's cloudflared version … is too old (requires >= 2025.7.0)". var.cloudflared_image defaults to a compatible tag; if you bump it, re-apply and let the Deployment roll so the tunnel reports the new version to Cloudflare before you create the VPC service.

    aws eks update-kubeconfig --region <region> --name <cluster>
    kubectl -n metrics-gateway get pods -l app=cloudflared-grpc-metrics-gateway
    kubectl -n metrics-gateway logs -l app=cloudflared-grpc-metrics-gateway --tail=50
    # expect "Registered tunnel connection" lines, no credential errors
    # check the running version:
    kubectl -n metrics-gateway get deploy cloudflared-grpc-metrics-gateway \
    -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'

  • Grab the IDs OpenTofu exported (run in infra/multitenant_eks_cluster):

    tofu output grpc_metrics_gateway_tunnel_id
    tofu output grpc_metrics_gateway_tunnel_virtual_network_id
    tofu output grpc_metrics_gateway_tunnel_virtual_network_name
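
The cloudflared version floor from the prerequisites can be checked mechanically against the image string printed by the jsonpath command. This is a minimal sketch, assuming the image reference ends in a three-part numeric tag such as :2025.7.0; parseCloudflaredTag and meetsMinimum are hypothetical helper names, not part of the repo.

```typescript
// Hypothetical helpers: compare a cloudflared image tag against the 2025.7.0 floor.
type Version = [number, number, number];

const MIN_CLOUDFLARED: Version = [2025, 7, 0];

// Extracts the numeric tag from an image reference like "<repo>:2025.7.0".
// Returns null when the tag is missing or is not three numeric parts
// (for example "latest" or a digest-pinned reference).
function parseCloudflaredTag(image: string): Version | null {
  const m = /:(\d+)\.(\d+)\.(\d+)$/.exec(image.trim());
  return m ? [Number(m[1]), Number(m[2]), Number(m[3])] : null;
}

// Component-wise numeric comparison, most significant part first.
function meetsMinimum(v: Version, min: Version = MIN_CLOUDFLARED): boolean {
  for (let i = 0; i < 3; i++) {
    if (v[i] !== min[i]) return v[i] > min[i];
  }
  return true; // all parts equal
}
```

A null result means the tag could not be interpreted, so fall back to reading the version from the connector logs rather than assuming it is compatible.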

Choose a model

Path A — Workers VPC service (recommended). A VPC service points the Worker at the gateway via the tunnel; no CIDR routing, no warp-routing toggle. Leave grpc_metrics_gateway_tunnel_route_network empty.

Path B — IP-based private route. Set grpc_metrics_gateway_tunnel_route_network to the gateway's ClusterIP as a /32 (IPv4) or /128 (IPv6), enable warp-routing on the tunnel, and have the Worker target that IP. Use only if the VPC service product is unavailable. Get the ClusterIP with:

kubectl -n metrics-gateway get svc grpc-metrics-gateway -o jsonpath='{.spec.clusterIPs}{"\n"}'
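
For Path B, the ClusterIP has to be turned into a host route (/32 for IPv4, /128 for IPv6). A minimal sketch of that conversion, assuming kubectl prints the clusterIPs field as a JSON array (recent versions do); hostRoutes is a hypothetical helper name.

```typescript
// Hypothetical helper: turn the kubectl jsonpath output into host routes.
// Input is the JSON array printed by '{.spec.clusterIPs}', e.g. ["10.100.12.34"].
function hostRoutes(clusterIPsJson: string): string[] {
  const ips: unknown = JSON.parse(clusterIPsJson);
  if (!Array.isArray(ips)) throw new Error("expected a JSON array of IPs");
  return ips.map((ip) => {
    if (typeof ip !== "string" || ip.length === 0) {
      throw new Error(`unexpected clusterIPs entry: ${JSON.stringify(ip)}`);
    }
    // An IPv6 literal always contains ':'; an IPv4 dotted quad never does.
    return ip.includes(":") ? `${ip}/128` : `${ip}/32`;
  });
}
```

On a dual-stack Service this returns one route per address family; pick the one that matches how the Worker will address the gateway when setting grpc_metrics_gateway_tunnel_route_network.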

Note: the Cloudflare console/API surface for "VPC services" / "Workers VPC" is in beta and still changing. Names below may differ slightly, so confirm against current Cloudflare docs. The OpenTofu-managed tunnel and virtual network IDs above are the stable inputs every variant needs.

Steps (Path A)

1. Create a VPC service for the gateway

In the Cloudflare dashboard: Zero Trust → Networks → VPC services (or via the API), create a service that references:

  • Tunnel: the grpc_metrics_gateway_tunnel_id from above.
  • Virtual network: the grpc_metrics_gateway_tunnel_virtual_network_id.
  • Target (HTTP): grpc-metrics-gateway.metrics-gateway.svc.cluster.local port 8080. cloudflared runs in-cluster and resolves this via cluster DNS.

Record the resulting VPC service ID.

2. Bind the VPC service to the Worker

The gateway-facing Worker is components/metrics_gateway_worker. OpenTofu deploys it (cloudflare_workers_script in metrics_gateway_worker.tf) with the vpc_service binding (name = "METRICS_GATEWAY") already applied, so production needs no manual wrangler step: set the VPC service ID from step 1 in the env's tfvars and apply:

# config/vars/<env>.tfvars
enable_metrics_gateway_cloudflare_worker = true
metrics_gateway_worker_vpc_service_id = "<vpc-service-id-from-step-1>"

For local wrangler dev only, the same id goes in components/metrics_gateway_worker/wrangler.jsonc under vpc_services (that block is ignored by the OpenTofu deploy).

Other Workers that want to emit metrics bind to this Worker over RPC (a services binding with entrypoint = "MetricsReporter"), not a VPC service — only metrics_gateway_worker holds the tunnel binding.

3. Call the gateway from the Worker

components/metrics_gateway_worker already implements the gRPC-Web encoder described in this step (src/grpcweb.ts) and exposes it as the MetricsReporter RPC entrypoint, so most callers just bind that Worker (a services binding) and call incr/observe. The wire-level detail below is reference for that implementation, or for hand-rolling a direct gRPC-Web caller.

The binding exposes a fetch. A call is three layers: protobuf-encode a PushRequest, wrap it in a single gRPC-Web length-prefixed frame, then POST it to the RPC path /<package>.<Service>/<Method> (/marqo.metrics.v1.MetricsIngest/Push) with content-type application/grpc-web+proto. The gateway's gRPC-Web wrapper (port 8080) dispatches it to the same handler the in-cluster gRPC listener uses.

Wire layout, straight from components/grpc_metrics_gateway/proto/metrics.proto:

Message      Field                        #  proto type                     wire type
PushRequest  metrics (repeated Metric)    1  message                        2 (len-delimited)
Metric       name                         1  string                         2
Metric       help (optional)              2  string                         2
Metric       type                         3  enum (COUNTER=1, HISTOGRAM=3)  0 (varint)
Metric       labels (map<string,string>)  4  message entry {key=1,value=2}  2
Metric       value                        5  double                         1 (64-bit LE)

type must be 1 (counter) or 3 (histogram) — 0/UNSPECIFIED and 2/GAUGE are rejected. For a counter, value is the increment; for a histogram, value is the observed sample (e.g. a latency in the same unit as the gateway's buckets). The gateway prepends the pushed_ prefix, so request_latency_ms is scraped as pushed_request_latency_ms_{bucket,sum,count}.
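
The type rule and the pushed_ prefix can be captured in a small helper that predicts which series to grep for after a push. This is a sketch only: scrapedSeries is a hypothetical name, and the counter case assumes the gateway exposes counters under the prefixed name with no further suffix (the text above only states the histogram suffixes).

```typescript
// Hypothetical helper: series names a pushed metric is expected to appear
// under on the gateway's /metrics endpoint.
function scrapedSeries(name: string, type: number): string[] {
  const base = `pushed_${name}`; // the gateway prepends this prefix
  switch (type) {
    case 1: // COUNTER (assumption: exposed under the prefixed name as-is)
      return [base];
    case 3: // HISTOGRAM
      return [`${base}_bucket`, `${base}_sum`, `${base}_count`];
    default:
      // 0/UNSPECIFIED and 2/GAUGE are rejected by the gateway
      throw new RangeError(`type ${type} is not accepted (only 1 and 3 are)`);
  }
}
```

Calling it before building a sample also catches an unsupported type on the Worker side instead of after a round trip.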

Dependency-free encoder + framing + send (drop into the Worker):

type Sample = {
  name: string;
  type: 1 | 3; // 1 = counter (value is ADDED), 3 = histogram (value is OBSERVED into buckets)
  value: number;
  labels?: Record<string, string>;
  help?: string;
};

const TEXT = new TextEncoder();

// ── protobuf primitives (proto3 wire format) ────────────────────────────────
function putVarint(out: number[], v: number) {
  // unsigned LEB128; v is a non-negative safe integer (tag, length, enum)
  while (v > 0x7f) {
    out.push((v & 0x7f) | 0x80);
    v = Math.floor(v / 128);
  }
  out.push(v & 0x7f);
}
function putTag(out: number[], field: number, wire: number) {
  putVarint(out, (field << 3) | wire);
}
function putLenDelim(out: number[], field: number, bytes: ArrayLike<number>) {
  putTag(out, field, 2);
  putVarint(out, bytes.length);
  for (let i = 0; i < bytes.length; i++) out.push(bytes[i]);
}
function putString(out: number[], field: number, s: string) {
  putLenDelim(out, field, TEXT.encode(s));
}
function putDouble(out: number[], field: number, n: number) {
  putTag(out, field, 1);
  const b = new Uint8Array(8);
  new DataView(b.buffer).setFloat64(0, n, true); // IEEE-754, little-endian
  for (const x of b) out.push(x);
}

function encodeMetric(m: Sample): number[] {
  const out: number[] = [];
  putString(out, 1, m.name);
  if (m.help) putString(out, 2, m.help);
  putTag(out, 3, 0);
  putVarint(out, m.type); // enum as varint
  for (const [k, v] of Object.entries(m.labels ?? {})) {
    const entry: number[] = []; // each map entry is a message {key=1, value=2}
    putString(entry, 1, k);
    putString(entry, 2, v);
    putLenDelim(out, 4, entry);
  }
  putDouble(out, 5, m.value);
  return out;
}

function encodePushRequest(samples: Sample[]): Uint8Array {
  const out: number[] = [];
  for (const m of samples) putLenDelim(out, 1, encodeMetric(m));
  return Uint8Array.from(out);
}

// ── gRPC-Web frame: [flags=0x00][uint32 length, big-endian][protobuf] ────────
function grpcWebFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(5 + payload.length);
  frame[0] = 0x00; // uncompressed data frame
  new DataView(frame.buffer).setUint32(1, payload.length, false); // big-endian
  frame.set(payload, 5);
  return frame;
}

// ── send through the VPC binding ─────────────────────────────────────────────
async function pushMetrics(env: Env, samples: Sample[]): Promise<void> {
  const body = grpcWebFrame(encodePushRequest(samples));
  const res = await env.METRICS_GATEWAY.fetch(
    // host is ignored by the binding; only the RPC path matters
    "http://gateway/marqo.metrics.v1.MetricsIngest/Push",
    {
      method: "POST",
      headers: {
        "content-type": "application/grpc-web+proto",
        "x-grpc-web": "1",
      },
      body,
    },
  );
  // gRPC-Web returns HTTP 200 even on gRPC errors; the authoritative result is
  // the grpc-status trailer (0 = OK), carried in a trailer frame (flags MSB set,
  // 0x80) at the end of the body. Best-effort: log and move on.
  if (res.status !== 200) {
    console.warn(`metrics push transport error: HTTP ${res.status}`);
    return;
  }
  const raw = new Uint8Array(await res.arrayBuffer());
  let off = 0;
  let grpcStatus = 0;
  while (off + 5 <= raw.length) {
    const flags = raw[off];
    const len = new DataView(raw.buffer, raw.byteOffset + off + 1, 4).getUint32(0, false);
    const payload = raw.subarray(off + 5, off + 5 + len);
    off += 5 + len;
    if (flags & 0x80) {
      const m = /grpc-status:\s*(\d+)/i.exec(new TextDecoder().decode(payload));
      if (m) grpcStatus = Number(m[1]);
    }
    // else: `payload` is the serialized PushResponse (accepted=field 1,
    // rejected=field 2); decode it only if you need the counts.
  }
  if (grpcStatus !== 0) {
    console.warn(`metrics push grpc-status ${grpcStatus}`);
  }
}

// usage
await pushMetrics(env, [
  { name: "requests_total", type: 1, value: 1, labels: { route: "/search", method: "GET" } },
  { name: "request_latency_ms", type: 3, value: 42.5, labels: { route: "/search" } },
]);

Treat the push as best-effort — do not block request handling on it (wrap the call so a failure can't fail the Worker's main response).
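
One way to get that best-effort behavior is to hand the push to the Worker's waitUntil and swallow every failure. A minimal sketch under stated assumptions: pushBestEffort and WaitUntilContext are hypothetical names, and the interface only mirrors the waitUntil method of the Workers ExecutionContext so the example stays self-contained.

```typescript
// Minimal structural type for the part of ExecutionContext used here.
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

// Hypothetical wrapper: schedule a push so that it can neither fail nor
// delay the Worker's main response.
function pushBestEffort(ctx: WaitUntilContext, push: () => Promise<void>): void {
  const guarded = Promise.resolve()
    .then(push) // running push inside .then also captures synchronous throws
    .catch((err: unknown) => {
      console.warn("metrics push failed (ignored):", err);
    });
  // waitUntil keeps the invocation alive until the push settles,
  // while the response is returned without awaiting it.
  ctx.waitUntil(guarded);
}

// Usage inside a fetch handler (pushMetrics and samples as defined by the caller):
//   pushBestEffort(ctx, () => pushMetrics(env, samples));
```

Because the guarded promise always resolves, a gateway outage shows up only as a warning in the Worker logs.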

If you'd rather not hand-roll the encoder, generate a client from metrics.proto with protobuf-es (@bufbuild/protobuf) and feed its serialized bytes to grpcWebFrame() — the framing and the fetch are identical.

4. Deploy the Worker

metrics_gateway_worker is deployed by OpenTofu (no manual wrangler deploy for production). With the service id set in tfvars (step 2), apply the stack — the build bundles the Worker and cloudflare_workers_script deploys it with the VPC binding:

cd infra/multitenant_eks_cluster
./scripts/deploy.sh --env dev # or the target env

Verification

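  1. From the Worker (or wrangler dev bound to the same VPC service), issue one push and confirm HTTP 200, a grpc-status of 0 in the trailer frame, and accepted: 1 in the PushResponse. HTTP 200 alone is not sufficient, because gRPC-Web returns 200 even on gRPC errors.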

  2. Confirm the series landed (note the pushed_ prefix), from inside a gateway pod:

    kubectl -n metrics-gateway exec deploy/grpc-metrics-gateway -- \
    wget -qO- http://localhost:8080/metrics | grep pushed_
  3. Confirm the gateway is not publicly reachable: there should be no public DNS record, and there is no public hostname on the tunnel. A plain curl https://<anything> cannot reach it — only the bound Worker can.
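
To read the accepted count in verification step 1, the PushResponse data frame has to be decoded. A minimal sketch, assuming accepted (field 1) and rejected (field 2) are varint-encoded integer fields; confirm the field types against metrics.proto before relying on it. decodePushResponse is a hypothetical name, and its input is the payload of the data frame (the 5-byte gRPC-Web header already stripped).

```typescript
// Hypothetical decoder for PushResponse, assuming both fields are varints.
function decodePushResponse(buf: Uint8Array): { accepted: number; rejected: number } {
  const res = { accepted: 0, rejected: 0 }; // proto3 omits zero values on the wire
  let off = 0;

  const readVarint = (): number => {
    let value = 0;
    let scale = 1;
    for (;;) {
      if (off >= buf.length) throw new Error("truncated varint");
      const b = buf[off++];
      value += (b & 0x7f) * scale; // multiply instead of shift: safe past 32 bits
      if ((b & 0x80) === 0) return value;
      scale *= 128;
    }
  };

  while (off < buf.length) {
    const tag = readVarint();
    const field = Math.floor(tag / 8);
    const wire = tag % 8;
    // This sketch only understands varint fields; anything else is an error.
    if (wire !== 0) throw new Error(`unexpected wire type ${wire} for field ${field}`);
    const v = readVarint();
    if (field === 1) res.accepted = v;
    else if (field === 2) res.rejected = v;
    // other varint fields are read and ignored
  }
  return res;
}
```

An empty payload decodes to zero for both counts, so treat accepted: 0 after a push as a failed verification.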

Logpush: shipping Worker logs to the Parquet pipeline

The metrics-gateway Worker has in-dashboard observability off and Cloudflare Logpush on (logpush = true on the cloudflare_workers_script in metrics_gateway_worker.tf). That flag only enables the feature on the script; a Logpush job (the S3 destination + ScriptName filter) must be created out of band, the same way the global-worker does it. Logs only reach Parquet/Athena/Grafana in an env whose stack has enable_cloudflare_log_analytics = true.

  1. Create/update the job via the manual workflow .github/workflows/cloudflare-upsert-metrics-gateway-worker-logpush-job.yml (Actions → "Cloudflare Upsert Logpush Job - Metrics Gateway Worker" → Run):

    • environment: the target env (e.g. prod2).
    • worker_names: the deployed script name(s), <cell>-<env>-metrics-gateway-worker (e.g. cell1-prod2-metrics-gateway-worker). Output metrics_gateway_worker_script_name is the exact name.
    • job_name: a new name, e.g. cell1-prod2-metrics-gateway-worker-logs. This is also the S3 prefix and the Grafana dropdown value.
    • sampling_rate: keep low — this Worker is on the metrics hot path (default 0.01 = 1%).
  2. Register the job name so the rest of the pipeline picks it up. Append it to cloudflare_logpush_job_names in that env's config/vars/<env>.tfvars and tofu apply:

    cloudflare_logpush_job_names = ["...existing...", "cell1-prod2-metrics-gateway-worker-logs"]

    This adds the converter Lambda's S3-event filter prefix, includes the job in the Glue compaction --worker_names, and adds it to the Grafana "Logpush Job Name" dropdown. If you skip this, logs land in S3 but are never converted to Parquet or shown in Grafana.

Rollback / disable

Set enable_grpc_metrics_gateway_tunnel = false (and enable_metrics_gateway_cloudflare_worker = false) and apply: this removes the cloudflared Deployment, the tunnel, the virtual network, the token Secret, and the Worker (with its VPC binding). Then delete the VPC service created in step 1 in the Cloudflare dashboard/API.

Notes

  • cloudflared connectors are stateless; scale grpc_metrics_gateway_tunnel_replicas freely. They all register against the same tunnel ID.
  • This path intentionally has no per-request app-layer auth (edge/private-network enforcement only). If you later want defense in depth, add a Cloudflare Access service-token policy using the pattern in infra/multitenant_eks_cluster/modules/media_proxy_cloudflare_worker/main.tf.