Runbook: grpc_metrics_gateway_workers_vpc_setup

How to finish wiring the gRPC metrics gateway's private Cloudflare Tunnel so that only our Cloudflare Workers can reach it, via a Workers VPC service binding. This runbook covers the one step OpenTofu (v5 Cloudflare provider) cannot express: creating the VPC service object in the Cloudflare dashboard or API. The Worker-side vpc_service binding is OpenTofu-managed (metrics_gateway_worker.tf deploys metrics_gateway_worker with the binding applied), so you only supply the service ID via tfvars.

Background

infra/multitenant_eks_cluster/grpc_metrics_gateway_tunnel.tf already creates, when enable_grpc_metrics_gateway_tunnel = true:

- the Cloudflare Tunnel and a cloudflared Deployment (2 replicas) in the metrics-gateway namespace
- a dedicated virtual network (the isolation boundary)
- optionally, an IP-based tunnel route (only if grpc_metrics_gateway_tunnel_route_network is set)

There is no public hostname and no public DNS record — nothing is publicly resolvable. The remaining work is to let a specific Worker reach the gateway across that private network.

The gateway itself is reachable inside the cluster at:

```
grpc-metrics-gateway.metrics-gateway.svc.cluster.local:8080   # gRPC-Web (HTTP)
```

(8080 serves the gRPC-Web wrapper; clients call the path /marqo.metrics.v1.MetricsIngest/Push.)

Prerequisites

enable_grpc_metrics_gateway = true and enable_grpc_metrics_gateway_tunnel = true applied for the cell.

cloudflared connector healthy and on a recent enough version (>= 2025.7.0). If the connector is older, creating the VPC service fails with "Tunnel's cloudflared version … is too old (requires >= 2025.7.0)". var.cloudflared_image defaults to a compatible tag; if you bump it, re-apply and let the Deployment roll out so the tunnel reports the new version to Cloudflare before you create the VPC service.

```
aws eks update-kubeconfig --region <region> --name <cluster>
kubectl -n metrics-gateway get pods -l app=cloudflared-grpc-metrics-gateway
kubectl -n metrics-gateway logs -l app=cloudflared-grpc-metrics-gateway --tail=50
# expect "Registered tunnel connection" lines, no credential errors
# check the running version:
kubectl -n metrics-gateway get deploy cloudflared-grpc-metrics-gateway \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
```
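If you want to check the image printed by the jsonpath command above against the 2025.7.0 minimum without eyeballing it, a small comparison helps. This is a minimal sketch that assumes the image carries a cloudflared release tag of the form YYYY.M.P (e.g. cloudflare/cloudflared:2025.8.1); for tags such as latest or a bare digest it returns null, and you have to check those by hand from the connector logs.

```typescript
// Minimum cloudflared version the VPC service requires (from the error text above).
const MIN_CLOUDFLARED: [number, number, number] = [2025, 7, 0];

// Parse an image reference such as "cloudflare/cloudflared:2025.8.1" into
// [year, month, patch]. Returns null when the tag is not YYYY.M.P.
function parseCloudflaredTag(image: string): [number, number, number] | null {
  const m = /:(\d{4})\.(\d{1,2})\.(\d+)(?:@.*)?$/.exec(image.trim());
  if (!m) return null;
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Numeric comparison, year first, then month, then patch, so 2025.10.0 > 2025.7.0
// (a plain string comparison would get that wrong).
function versionAtLeast(a: [number, number, number], b: [number, number, number]): boolean {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] > b[i];
  }
  return true;
}

// true/false when the tag is parseable, null when it must be checked manually.
function imageMeetsMinimum(image: string): boolean | null {
  const v = parseCloudflaredTag(image);
  return v === null ? null : versionAtLeast(v, MIN_CLOUDFLARED);
}
```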

Grab the IDs OpenTofu exported (run in infra/multitenant_eks_cluster):

```
tofu output grpc_metrics_gateway_tunnel_id
tofu output grpc_metrics_gateway_tunnel_virtual_network_id
tofu output grpc_metrics_gateway_tunnel_virtual_network_name
```
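When scripting this step, `tofu output -json` returns every output at once as an object keyed by output name, each entry carrying `sensitive`, `type` and `value`. The sketch below pulls the three IDs out of that JSON and fails loudly if one is missing or empty, which usually means the tunnel flag is not applied for this cell. The interface names are illustrative, not part of the repo.

```typescript
// One entry of `tofu output -json` (same shape as `terraform output -json`).
interface TofuOutput {
  sensitive: boolean;
  type: unknown;
  value: unknown;
}

interface TunnelIds {
  tunnelId: string;
  virtualNetworkId: string;
  virtualNetworkName: string;
}

// Read one string output, rejecting missing, null or empty values.
function readString(outputs: Record<string, TofuOutput>, key: string): string {
  const v = outputs[key]?.value;
  if (typeof v !== "string" || v === "") {
    throw new Error(`output ${key} is missing or empty; is enable_grpc_metrics_gateway_tunnel applied?`);
  }
  return v;
}

// Parse the full JSON document and return the inputs the VPC service needs.
function extractTunnelIds(json: string): TunnelIds {
  const outputs = JSON.parse(json) as Record<string, TofuOutput>;
  return {
    tunnelId: readString(outputs, "grpc_metrics_gateway_tunnel_id"),
    virtualNetworkId: readString(outputs, "grpc_metrics_gateway_tunnel_virtual_network_id"),
    virtualNetworkName: readString(outputs, "grpc_metrics_gateway_tunnel_virtual_network_name"),
  };
}
```

Feed it the stdout of `tofu output -json` run in infra/multitenant_eks_cluster.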

Choose a model

Path A — Workers VPC service (recommended). A VPC service points the Worker at the gateway via the tunnel; no CIDR routing, no warp-routing toggle. Leave grpc_metrics_gateway_tunnel_route_network empty.

Path B — IP-based private route. Set grpc_metrics_gateway_tunnel_route_network to the gateway's ClusterIP as a /32 (IPv4) or /128 (IPv6), enable warp-routing on the tunnel, and have the Worker target that IP. Use only if the VPC service product is unavailable. Get the ClusterIP with:

```
kubectl -n metrics-gateway get svc grpc-metrics-gateway -o jsonpath='{.spec.clusterIPs}{"\n"}'
```
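For Path B, the ClusterIP has to become a single-host route: /32 for IPv4, /128 for IPv6. This is a sketch that assumes kubectl prints `.spec.clusterIPs` as a JSON array (e.g. `["10.100.12.34"]`). A dual-stack Service yields one route per family, and grpc_metrics_gateway_tunnel_route_network takes one value, so pick the family your Worker targets. A headless Service (clusterIP "None") is rejected, because there is no single IP to route to.

```typescript
import { isIP } from "node:net";

// Convert the jsonpath output (a JSON array of IP strings) into host routes.
function clusterIpsToRoutes(jsonpathOutput: string): string[] {
  const ips: unknown = JSON.parse(jsonpathOutput.trim());
  if (!Array.isArray(ips) || ips.length === 0) {
    throw new Error(`expected a non-empty JSON array, got: ${jsonpathOutput}`);
  }
  return ips.map((ip: unknown) => {
    if (typeof ip !== "string") throw new Error(`unexpected entry: ${String(ip)}`);
    const family = isIP(ip); // 4, 6, or 0 when not an IP address
    if (family === 4) return `${ip}/32`;
    if (family === 6) return `${ip}/128`;
    throw new Error(`not an IP address: ${ip}`);
  });
}
```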

The exact Cloudflare console/API surface for "VPC services" / "Workers VPC" is still evolving (beta), so the names below may differ slightly; confirm them against current Cloudflare docs. The OpenTofu-managed tunnel and virtual network IDs above are the stable inputs every variant needs.

Steps (Path A)

1. Create a VPC service for the gateway

In the Cloudflare dashboard: Zero Trust → Networks → VPC services (or via the API), create a service that references:

- Tunnel: the grpc_metrics_gateway_tunnel_id from above.
- Virtual network: the grpc_metrics_gateway_tunnel_virtual_network_id.
- Target (HTTP): host grpc-metrics-gateway.metrics-gateway.svc.cluster.local, port 8080. cloudflared runs in-cluster, so it resolves this name via cluster DNS.

Record the resulting VPC service ID.
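Because this step is manual, a quick pre-flight check of the values before you type them in can catch a swapped ID or a wrong port. This is a hypothetical checklist helper: the field names are ours, not Cloudflare's API schema. It assumes the tunnel and virtual network IDs are UUIDs, which is how Cloudflare presents them, and it hard-codes the target from the list above.

```typescript
// Hypothetical record of what you enter in step 1; NOT the Cloudflare API payload.
interface VpcServiceInputs {
  tunnelId: string;
  virtualNetworkId: string;
  targetHost: string;
  targetPort: number;
}

const UUID_RE = /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;
const EXPECTED_HOST = "grpc-metrics-gateway.metrics-gateway.svc.cluster.local";
const EXPECTED_PORT = 8080; // the gRPC-Web listener

// Returns a list of problems; an empty list means the inputs look right.
function checkVpcServiceInputs(i: VpcServiceInputs): string[] {
  const problems: string[] = [];
  if (!UUID_RE.test(i.tunnelId)) problems.push("tunnelId is not a UUID");
  if (!UUID_RE.test(i.virtualNetworkId)) problems.push("virtualNetworkId is not a UUID");
  if (i.targetHost !== EXPECTED_HOST) problems.push(`targetHost should be ${EXPECTED_HOST}`);
  if (i.targetPort !== EXPECTED_PORT) problems.push(`targetPort should be ${EXPECTED_PORT}`);
  return problems;
}
```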

2. Bind the VPC service to the Worker

The gateway-facing Worker is components/metrics_gateway_worker. OpenTofu deploys it (cloudflare_workers_script in metrics_gateway_worker.tf) together with its vpc_service binding (name = "METRICS_GATEWAY"), so production needs no manual wrangler step: set the VPC service ID from step 1 in the env's tfvars and apply:

```
# config/vars/<env>.tfvars
enable_metrics_gateway_cloudflare_worker = true
metrics_gateway_worker_vpc_service_id    = "<vpc-service-id-from-step-1>"
```

For local wrangler dev only, put the same ID in components/metrics_gateway_worker/wrangler.jsonc under vpc_services; the OpenTofu deploy ignores that block.

Other Workers that want to emit metrics bind to this Worker over RPC (a services binding with entrypoint = "MetricsReporter"), not a VPC service — only metrics_gateway_worker holds the tunnel binding.
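On the caller side, such a Worker awaits nothing on its hot path and schedules the emit with waitUntil. This is a sketch under loud assumptions: the binding name METRICS and the incr/observe signatures below are hypothetical, so check components/metrics_gateway_worker for the real MetricsReporter methods. WaitUntilContext is a minimal stand-in for the Workers ExecutionContext.

```typescript
// HYPOTHETICAL shape of the MetricsReporter RPC entrypoint; verify against
// components/metrics_gateway_worker before relying on it.
interface MetricsReporterLike {
  incr(name: string, labels?: Record<string, string>): Promise<void>;
  observe(name: string, value: number, labels?: Record<string, string>): Promise<void>;
}

// Minimal stand-in for the Workers ExecutionContext (only waitUntil is used).
interface WaitUntilContext {
  waitUntil(promise: Promise<unknown>): void;
}

interface CallerEnv {
  METRICS: MetricsReporterLike; // hypothetical services-binding name (entrypoint = "MetricsReporter")
}

async function handle(req: Request, env: CallerEnv, ctx: WaitUntilContext): Promise<Response> {
  const started = Date.now();
  const res = new Response("ok"); // the caller's real work goes here
  const route = new URL(req.url).pathname;
  // Fire-and-forget: waitUntil keeps the Worker alive until the RPC settles,
  // and .catch stops a metrics failure from surfacing as an unhandled rejection.
  ctx.waitUntil(
    Promise.all([
      env.METRICS.incr("requests_total", { route, method: req.method }),
      env.METRICS.observe("request_latency_ms", Date.now() - started, { route }),
    ]).catch((err) => console.warn("metrics emit failed", err)),
  );
  return res;
}
```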

3. Call the gateway from the Worker

components/metrics_gateway_worker already implements this encoder (src/grpcweb.ts) and exposes it as the MetricsReporter RPC entrypoint, so most callers just bind that Worker (a services binding) and call incr/observe. The wire-level detail below is reference for that implementation, or for hand-rolling a direct gRPC-Web caller.

The METRICS_GATEWAY VPC binding exposes a fetch() method. A call has three layers: protobuf-encode a PushRequest, wrap it in a single gRPC-Web length-prefixed frame, then POST it to the RPC path /<package>.<Service>/<Method> (here /marqo.metrics.v1.MetricsIngest/Push) with content-type application/grpc-web+proto. The gateway's gRPC-Web wrapper (port 8080) dispatches it to the same handler the in-cluster gRPC listener uses.

Wire layout, straight from components/grpc_metrics_gateway/proto/metrics.proto:

| Message | Field | Number | Proto type | Wire type |
|---|---|---|---|---|
| `PushRequest` | `metrics` (repeated `Metric`) | 1 | message | 2 (length-delimited) |
| `Metric` | `name` | 1 | string | 2 |
| `Metric` | `help` (optional) | 2 | string | 2 |
| `Metric` | `type` | 3 | enum (`COUNTER=1`, `HISTOGRAM=3`) | 0 (varint) |
| `Metric` | `labels` (`map<string,string>`) | 4 | map entry message `{key=1,value=2}` | 2 |
| `Metric` | `value` | 5 | double | 1 (64-bit, little-endian) |
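Each row of the table turns into a one-byte field key, `(field_number << 3) | wire_type`, followed by the payload. The worked example below assembles the smallest useful message by hand, Metric{name: "x", type: COUNTER, value: 1.0}, and then wraps it the way pushMetrics does. It is handy when comparing a hex dump from a failing call. The constant and function names are illustrative only.

```typescript
// A protobuf field key is the varint of (field_number << 3) | wire_type;
// every key in the table fits in a single byte.
function fieldKey(field: number, wireType: number): number {
  return (field << 3) | wireType;
}

const KEYS = {
  pushRequestMetrics: fieldKey(1, 2), // 0x0a
  metricName: fieldKey(1, 2),         // 0x0a
  metricHelp: fieldKey(2, 2),         // 0x12
  metricType: fieldKey(3, 0),         // 0x18
  metricLabels: fieldKey(4, 2),       // 0x22
  metricValue: fieldKey(5, 1),        // 0x29
};

// Metric{name:"x", type:COUNTER, value:1.0}, byte by byte.
function exampleMetricBytes(): Uint8Array {
  const head = [
    KEYS.metricName, 1, 0x78, // name: length 1, then "x"
    KEYS.metricType, 1,       // type: COUNTER = 1 as a varint
    KEYS.metricValue,         // value: followed by 8 bytes of little-endian double
  ];
  const out = new Uint8Array(head.length + 8);
  out.set(head, 0);
  new DataView(out.buffer).setFloat64(head.length, 1.0, true);
  return out;
}

// PushRequest{metrics:[that metric]} inside one gRPC-Web data frame.
function exampleFrame(): Uint8Array {
  const metric = exampleMetricBytes();
  const frame = new Uint8Array(5 + 2 + metric.length);
  // byte 0 stays 0x00 (uncompressed data); bytes 1..4 are the big-endian message length
  new DataView(frame.buffer).setUint32(1, 2 + metric.length, false);
  frame[5] = KEYS.pushRequestMetrics; // field 1, length-delimited
  frame[6] = metric.length;           // 14 fits in a one-byte varint
  frame.set(metric, 7);
  return frame;
}

const toHex = (b: Uint8Array): string =>
  Array.from(b, (x) => x.toString(16).padStart(2, "0")).join(" ");
```

The Metric encodes to `0a 01 78 18 01 29 00 00 00 00 00 00 f0 3f` (14 bytes). PushRequest adds `0a 0e` (16 bytes), and the frame header `00 00 00 00 10` brings the wire size to 21 bytes.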

type must be 1 (COUNTER) or 3 (HISTOGRAM); the gateway rejects 0 (UNSPECIFIED) and 2 (GAUGE). For a counter, value is the increment; for a histogram, value is the observed sample (e.g. a latency in the same unit as the gateway's buckets). The gateway prepends the pushed_ prefix, so request_latency_ms is scraped as pushed_request_latency_ms_{bucket,sum,count}.
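A caller can reject bad samples locally instead of spending a round trip on a push the gateway will refuse. This sketch enforces the type rule stated above and applies the standard Prometheus naming rules: metric names match `[a-zA-Z_:][a-zA-Z0-9_:]*`, label names match `[a-zA-Z_][a-zA-Z0-9_]*`, and names starting with `__` are reserved. Whether the gateway enforces exactly these regexes, rejects negative counter increments, or passes counter names through unchanged apart from the prefix are all assumptions. The finite-value check is our own policy.

```typescript
type SampleLike = { name: string; type: number; value: number; labels?: Record<string, string> };

// Prometheus naming rules (assumed to match what the gateway accepts).
const METRIC_NAME = /^[a-zA-Z_:][a-zA-Z0-9_:]*$/;
const LABEL_NAME = /^[a-zA-Z_][a-zA-Z0-9_]*$/;

// Returns human-readable problems; an empty array means the sample looks pushable.
function sampleProblems(s: SampleLike): string[] {
  const p: string[] = [];
  if (s.type !== 1 && s.type !== 3) p.push(`type ${s.type} is rejected; use 1 (COUNTER) or 3 (HISTOGRAM)`);
  if (!METRIC_NAME.test(s.name)) p.push(`invalid metric name ${JSON.stringify(s.name)}`);
  if (!Number.isFinite(s.value)) p.push("value must be a finite number");
  if (s.type === 1 && s.value < 0) p.push("counter increments must be >= 0");
  for (const k of Object.keys(s.labels ?? {})) {
    if (!LABEL_NAME.test(k) || k.startsWith("__")) p.push(`invalid label name ${JSON.stringify(k)}`);
  }
  return p;
}

// Series names to look for after the gateway adds "pushed_". The histogram suffixes
// come from this runbook; an unchanged counter name is an assumption.
function scrapedNames(s: SampleLike): string[] {
  const base = `pushed_${s.name}`;
  return s.type === 3 ? [`${base}_bucket`, `${base}_sum`, `${base}_count`] : [base];
}
```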

Dependency-free encoder + framing + send (drop into the Worker):

```
type Sample = {
  name: string;
  type: 1 | 3; // 1 = counter (value is ADDED), 3 = histogram (value is OBSERVED into buckets)
  value: number;
  labels?: Record<string, string>;
  help?: string;
};

const TEXT = new TextEncoder();

// ── protobuf primitives (proto3 wire format) ────────────────────────────────
function putVarint(out: number[], v: number) {
  // unsigned LEB128; v is a non-negative safe integer (tag, length, enum)
  while (v > 0x7f) {
    out.push((v & 0x7f) | 0x80);
    v = Math.floor(v / 128);
  }
  out.push(v & 0x7f);
}
function putTag(out: number[], field: number, wire: number) {
  putVarint(out, (field << 3) | wire);
}
function putLenDelim(out: number[], field: number, bytes: ArrayLike<number>) {
  putTag(out, field, 2);
  putVarint(out, bytes.length);
  for (let i = 0; i < bytes.length; i++) out.push(bytes[i]);
}
function putString(out: number[], field: number, s: string) {
  putLenDelim(out, field, TEXT.encode(s));
}
function putDouble(out: number[], field: number, n: number) {
  putTag(out, field, 1);
  const b = new Uint8Array(8);
  new DataView(b.buffer).setFloat64(0, n, true); // IEEE-754, little-endian
  for (const x of b) out.push(x);
}

function encodeMetric(m: Sample): number[] {
  const out: number[] = [];
  putString(out, 1, m.name);
  if (m.help) putString(out, 2, m.help);
  putTag(out, 3, 0);
  putVarint(out, m.type); // enum as varint
  for (const [k, v] of Object.entries(m.labels ?? {})) {
    const entry: number[] = []; // each map entry is a message {key=1, value=2}
    putString(entry, 1, k);
    putString(entry, 2, v);
    putLenDelim(out, 4, entry);
  }
  putDouble(out, 5, m.value);
  return out;
}

function encodePushRequest(samples: Sample[]): Uint8Array {
  const out: number[] = [];
  for (const m of samples) putLenDelim(out, 1, encodeMetric(m));
  return Uint8Array.from(out);
}

// ── gRPC-Web frame: [flags=0x00][uint32 length, big-endian][protobuf] ────────
function grpcWebFrame(payload: Uint8Array): Uint8Array {
  const frame = new Uint8Array(5 + payload.length);
  frame[0] = 0x00; // uncompressed data frame
  new DataView(frame.buffer).setUint32(1, payload.length, false); // big-endian
  frame.set(payload, 5);
  return frame;
}

// ── send through the VPC binding ─────────────────────────────────────────────
// Env is the Worker's binding type (e.g. generated by `wrangler types`);
// METRICS_GATEWAY is the vpc_service binding from step 2.
async function pushMetrics(env: Env, samples: Sample[]): Promise<void> {
  const body = grpcWebFrame(encodePushRequest(samples));
  const res = await env.METRICS_GATEWAY.fetch(
    // host is ignored by the binding; only the RPC path matters
    "http://gateway/marqo.metrics.v1.MetricsIngest/Push",
    {
      method: "POST",
      headers: {
        "content-type": "application/grpc-web+proto",
        "x-grpc-web": "1",
      },
      body,
    },
  );
  // gRPC-Web returns HTTP 200 even on gRPC errors; the authoritative result is
  // grpc-status (0 = OK). Best-effort: log and move on.
  if (res.status !== 200) {
    console.warn(`metrics push transport error: HTTP ${res.status}`);
    return;
  }
  // A "trailers-only" response (typical for errors) has an empty body and
  // carries grpc-status in the HTTP headers; otherwise it arrives in a trailer
  // frame (flags MSB set, 0x80) at the end of the body.
  const headerStatus = res.headers.get("grpc-status");
  let grpcStatus: number | null = headerStatus === null ? null : Number(headerStatus);
  const raw = new Uint8Array(await res.arrayBuffer());
  let off = 0;
  while (off + 5 <= raw.length) {
    const flags = raw[off];
    const len = new DataView(raw.buffer, raw.byteOffset + off + 1, 4).getUint32(0, false);
    const payload = raw.subarray(off + 5, off + 5 + len);
    off += 5 + len;
    if (flags & 0x80) {
      const m = /grpc-status:\s*(\d+)/i.exec(new TextDecoder().decode(payload));
      if (m) grpcStatus = Number(m[1]);
    }
    // else: `payload` is the serialized PushResponse (accepted=field 1,
    // rejected=field 2); decode it only if you need the counts.
  }
  if (grpcStatus === null) {
    console.warn("metrics push: response carried no grpc-status");
  } else if (grpcStatus !== 0) {
    console.warn(`metrics push grpc-status ${grpcStatus}`);
  }
}

// usage, inside a handler where `env` and `ctx` are in scope: waitUntil lets the
// push finish after the response is sent, and .catch keeps a network failure
// from becoming an unhandled rejection
ctx.waitUntil(
  pushMetrics(env, [
    { name: "requests_total", type: 1, value: 1, labels: { route: "/search", method: "GET" } },
    { name: "request_latency_ms", type: 3, value: 42.5, labels: { route: "/search" } },
  ]).catch((err) => console.warn("metrics push failed", err)),
);
```

Treat the push as best-effort: do not block request handling on it, and wrap the call (for example in ctx.waitUntil(...) with a .catch) so a failure cannot fail the Worker's main response.

If you'd rather not hand-roll the encoder, generate a client from metrics.proto with protobuf-es (@bufbuild/protobuf) and feed its serialized bytes to grpcWebFrame() — the framing and the fetch are identical.

4. Deploy the Worker

metrics_gateway_worker is deployed by OpenTofu (no manual wrangler deploy for production). With the service id set in tfvars (step 2), apply the stack — the build bundles the Worker and cloudflare_workers_script deploys it with the VPC binding:

```
cd infra/multitenant_eks_cluster
./scripts/deploy.sh --env dev   # or the target env
```

Verification

From the Worker (or wrangler dev bound to the same VPC service), push a single sample and confirm HTTP 200, grpc-status 0, and accepted: 1 in the PushResponse.
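Reading accepted off the wire means decoding the data-frame payload that pushMetrics leaves undecoded. This is a minimal decoder sketch. It assumes accepted (field 1) and rejected (field 2) are varint-encoded integers such as uint32 or int64, so confirm the types in metrics.proto. Unknown fields are skipped. proto3 omits zero-valued fields, so an empty payload means accepted = 0, rejected = 0.

```typescript
// Read an unsigned varint at `pos`; returns [value, nextPos].
function readVarint(buf: Uint8Array, pos: number): [number, number] {
  let result = 0;
  let mul = 1;
  for (let i = 0; i < 10; i++) {
    if (pos >= buf.length) throw new Error("truncated varint");
    const b = buf[pos++];
    result += (b & 0x7f) * mul; // multiply instead of shifting to stay exact past 32 bits
    if ((b & 0x80) === 0) return [result, pos];
    mul *= 128;
  }
  throw new Error("varint too long");
}

// ASSUMPTION: accepted=1 and rejected=2 are varint (wire type 0) fields.
function decodePushResponse(buf: Uint8Array): { accepted: number; rejected: number } {
  let accepted = 0;
  let rejected = 0;
  let pos = 0;
  while (pos < buf.length) {
    const [key, afterKey] = readVarint(buf, pos);
    pos = afterKey;
    const field = Math.floor(key / 8);
    const wire = key & 7;
    if (wire === 0) {
      const [v, next] = readVarint(buf, pos);
      pos = next;
      if (field === 1) accepted = v;
      else if (field === 2) rejected = v;
    } else if (wire === 1) {
      pos += 8; // skip unknown 64-bit field
    } else if (wire === 2) {
      const [len, next] = readVarint(buf, pos);
      pos = next + len; // skip unknown length-delimited field
    } else if (wire === 5) {
      pos += 4; // skip unknown 32-bit field
    } else {
      throw new Error(`unsupported wire type ${wire}`);
    }
  }
  return { accepted, rejected };
}
```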

Confirm the series landed (note the pushed_ prefix), from inside a gateway pod:

```
kubectl -n metrics-gateway exec deploy/grpc-metrics-gateway -- \
  wget -qO- http://localhost:8080/metrics | grep pushed_
```
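If you capture that /metrics output in a script rather than reading grep output, a tiny filter over the Prometheus text exposition format pulls out the pushed_ series with their labels and values. This is a deliberately partial parser: it skips # HELP/# TYPE comments, assumes label values contain no "}", ignores optional timestamps, and leaves non-numeric values such as +Inf as NaN.

```typescript
interface SeriesLine {
  name: string;
  labels: string; // raw "{...}" block, or "" when the series has no labels
  value: number;
}

// Extract pushed_* samples from Prometheus text exposition output.
function pushedSeries(exposition: string): SeriesLine[] {
  const out: SeriesLine[] = [];
  for (const raw of exposition.split("\n")) {
    const line = raw.trim();
    if (line === "" || line.startsWith("#")) continue;
    const m = /^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)/.exec(line);
    if (!m || !m[1].startsWith("pushed_")) continue;
    out.push({ name: m[1], labels: m[2] ?? "", value: Number(m[3]) });
  }
  return out;
}
```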

Confirm the gateway is not publicly reachable: there should be no public DNS record, and there is no public hostname on the tunnel. A plain curl https://<anything> cannot reach it — only the bound Worker can.

Logpush: shipping Worker logs to the Parquet pipeline

The metrics-gateway Worker has in-dashboard observability off and Cloudflare Logpush on (logpush = true on the cloudflare_workers_script in metrics_gateway_worker.tf). That flag only enables the feature on the script; a Logpush job (the S3 destination + ScriptName filter) must be created out of band, the same way the global-worker does it. Logs only reach Parquet/Athena/Grafana in an env whose stack has enable_cloudflare_log_analytics = true.

Create/update the job via the manual workflow .github/workflows/cloudflare-upsert-metrics-gateway-worker-logpush-job.yml (Actions → "Cloudflare Upsert Logpush Job - Metrics Gateway Worker" → Run):
- environment: the target env (e.g. prod2).
- worker_names: the deployed script name(s), <cell>-<env>-metrics-gateway-worker (e.g. cell1-prod2-metrics-gateway-worker). The OpenTofu output metrics_gateway_worker_script_name gives the exact name.
- job_name: a new name, e.g. cell1-prod2-metrics-gateway-worker-logs. This is also the S3 prefix and the Grafana dropdown value.
- sampling_rate: keep low — this Worker is on the metrics hot path (default 0.01 = 1%).
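The workflow inputs above and the tfvars registration that follows share one naming convention, so it is easy to derive them together. This sketch builds the script and job names from a cell and env, sanity-checks the sampling rate, and renders the cloudflare_logpush_job_names line without duplicating an entry. It assumes sampling_rate is a fraction in (0, 1]. The OpenTofu output metrics_gateway_worker_script_name stays the authoritative script name.

```typescript
// Names following the conventions listed above (cell1 + prod2 -> cell1-prod2-...).
function logpushNames(cell: string, env: string): { scriptName: string; jobName: string } {
  const scriptName = `${cell}-${env}-metrics-gateway-worker`;
  return { scriptName, jobName: `${scriptName}-logs` };
}

// ASSUMPTION: the workflow expects a fraction in (0, 1]; 0.01 means 1%.
function checkSamplingRate(rate: number): void {
  if (!(rate > 0 && rate <= 1)) {
    throw new Error(`sampling_rate must be in (0, 1], got ${rate}`);
  }
}

// Render the tfvars line, appending the job only if it is not registered yet.
function renderJobNamesLine(existing: string[], jobName: string): string {
  const names = existing.includes(jobName) ? existing : [...existing, jobName];
  return `cloudflare_logpush_job_names = [${names.map((n) => JSON.stringify(n)).join(", ")}]`;
}
```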
Register the job name so the rest of the pipeline picks it up. Append it to cloudflare_logpush_job_names in that env's config/vars/<env>.tfvars and tofu apply:
```
cloudflare_logpush_job_names = ["...existing...", "cell1-prod2-metrics-gateway-worker-logs"]
```
This adds the converter Lambda's S3-event filter prefix, includes the job in the Glue compaction --worker_names, and adds it to the Grafana "Logpush Job Name" dropdown. If you skip this, logs land in S3 but are never converted to Parquet or shown in Grafana.

Rollback / disable

Set enable_grpc_metrics_gateway_tunnel = false (and enable_metrics_gateway_cloudflare_worker = false) and apply: this removes the cloudflared Deployment, the tunnel, the virtual network, the token Secret, and the Worker (with its VPC binding). Then delete the VPC service created in step 1 in the Cloudflare dashboard/API.

Notes

- cloudflared connectors are stateless, so you can scale grpc_metrics_gateway_tunnel_replicas freely; all replicas register against the same tunnel ID.
- This path intentionally has no per-request app-layer auth (enforcement is at the edge/private network only). If you later want defense in depth, add a Cloudflare Access service-token policy using the pattern in infra/multitenant_eks_cluster/modules/media_proxy_cloudflare_worker/main.tf.
