Skip to main content

Cost Delta Decomposition

This is Polo's most important analytical capability. When someone asks "costs are up $3,000 this week — why?", Polo decomposes the delta into a ranked list of concrete causes.

The Drill-Down Flow

Dashboard: "Total cost this week: $18,400 (+$3,000 vs last week)"

│ click "+$3,000"

Delta view: Top movers (hierarchy nodes ranked by |delta|)
│ cluster:acme-prod-1 +$1,200 (new test instances)
│ account:staging +$800 (forgot to tear down)
│ customer:beta +$600 (legitimate scale-up)
│ platform:github +$400 (CI minutes spike)

│ click "cluster:acme-prod-1 +$1,200"

Cluster delta: Resources that changed
│ i-new99 (m5.2xlarge) +$806 new_resource (created by alice@marqo.ai)
│ nat-abc +$360 cost_changed (traffic 4x)
│ vol-big (500GB gp3) +$34 new_resource

│ click "nat-abc +$360 cost_changed"

Resource timeline: Events for nat-abc
│ Mar 25 09:14 config 3 new instances routing via this NAT
│ Mar 25 09:14 metric bytes_processed: 840GB (was 210GB)
│ Mar 24 14:00 metric bytes_processed: 210GB (normal)

│ Root cause identified: deployment added 3 instances without
│ private endpoints, routing all S3 traffic through NAT.

Every table, index, and materialised view in the design should make this path fast.

API Endpoints

Status: IMPLEMENTED — All three delta endpoints are live. Query builders in components/api/src/queries/delta.ts, hooks in components/ui/src/api/useDelta.ts, UI page at components/ui/src/routes/delta/index.tsx. Uses single-pass conditional aggregation for efficiency.

GET /api/v1/cost/delta?currentStart=...&currentEnd=...&compareStart=...&compareEnd=...&hierarchy=marqo_logical
GET /api/v1/cost/delta?...&parentNodeId=cluster:acme-prod-1 # drill into children
GET /api/v1/cost/delta/resources?nodeId=cluster:acme-prod-1&currentStart=...&currentEnd=...&compareStart=...&compareEnd=...
GET /api/v1/cost/delta/events?arn=arn:aws:ec2:...:i-xxx&start=...&end=...

SQL Queries

Level 1: Which hierarchy nodes moved?

WITH
this_week AS (
SELECT ra.ancestor_id, ra.ancestor_type, ra.hierarchy, sum(re.value) AS cost
FROM polo.resource_events re
JOIN polo.resource_ancestry ra ON re.resource_arn = ra.resource_arn
WHERE re.event_type = 'cost'
AND re.event_time >= toStartOfWeek(today())
GROUP BY ra.ancestor_id, ra.ancestor_type, ra.hierarchy
),
last_week AS (
SELECT ra.ancestor_id, ra.ancestor_type, ra.hierarchy, sum(re.value) AS cost
FROM polo.resource_events re
JOIN polo.resource_ancestry ra ON re.resource_arn = ra.resource_arn
WHERE re.event_type = 'cost'
AND re.event_time >= toStartOfWeek(today()) - 7
AND re.event_time < toStartOfWeek(today())
GROUP BY ra.ancestor_id, ra.ancestor_type, ra.hierarchy
)
SELECT
coalesce(t.ancestor_id, l.ancestor_id) AS node_id,
coalesce(t.ancestor_type, l.ancestor_type) AS node_type,
coalesce(l.cost, 0) AS last_week_cost,
coalesce(t.cost, 0) AS this_week_cost,
coalesce(t.cost, 0) - coalesce(l.cost, 0) AS delta,
if(l.cost > 0, (t.cost - l.cost) / l.cost * 100, 999) AS pct_change
FROM this_week t
FULL OUTER JOIN last_week l ON t.ancestor_id = l.ancestor_id AND t.hierarchy = l.hierarchy
WHERE abs(delta) > 1.0
ORDER BY abs(delta) DESC
LIMIT 20;

Level 2: Which resources within a node moved?

WITH
this_week AS (
SELECT re.resource_arn, re.resource_type, re.resource_name, sum(re.value) AS cost
FROM polo.resource_events re
JOIN polo.resource_ancestry ra ON re.resource_arn = ra.resource_arn
WHERE re.event_type = 'cost' AND re.event_time >= toStartOfWeek(today())
AND ra.ancestor_id = {nodeId:String}
GROUP BY re.resource_arn, re.resource_type, re.resource_name
),
last_week AS (
SELECT re.resource_arn, re.resource_type, re.resource_name, sum(re.value) AS cost
FROM polo.resource_events re
JOIN polo.resource_ancestry ra ON re.resource_arn = ra.resource_arn
WHERE re.event_type = 'cost'
AND re.event_time >= toStartOfWeek(today()) - 7
AND re.event_time < toStartOfWeek(today())
AND ra.ancestor_id = {nodeId:String}
GROUP BY re.resource_arn, re.resource_type, re.resource_name
)
SELECT
coalesce(t.resource_arn, l.resource_arn) AS resource_arn,
coalesce(t.resource_type, l.resource_type) AS resource_type,
coalesce(l.cost, 0) AS last_week,
coalesce(t.cost, 0) AS this_week,
coalesce(t.cost, 0) - coalesce(l.cost, 0) AS delta,
multiIf(
l.resource_arn IS NULL, 'new_resource',
t.resource_arn IS NULL, 'terminated',
'cost_changed'
) AS change_type
FROM this_week t
FULL OUTER JOIN last_week l ON t.resource_arn = l.resource_arn
WHERE abs(delta) > 0.50
ORDER BY abs(delta) DESC;

Level 3: What events explain a resource's cost change?

SELECT event_time, event_type, event_action, value, unit,
properties, actor_name
FROM polo.resource_events
WHERE resource_arn = {arn:String}
AND event_time >= {start:DateTime64} AND event_time < {end:DateTime64}
ORDER BY event_time DESC
LIMIT 50;

UI Components

Status: IMPLEMENTED — The delta UI is at components/ui/src/routes/delta/index.tsx. Uses shadcn/ui Card, Table, Badge, Skeleton, Select, Button, and ResourceTypeBadge components.

  • routes/delta/index.tsx — Ranked list of movers with delta bars (red=increase, green=decrease), period selector (7d/14d/30d), click-to-drill into resource-level changes with back navigation.
  • ResourceTimeline.tsx — NOT YET BUILT. The API endpoint (/cost/delta/events) and hook (useResourceEvents) are ready; only the UI component is missing.
  • The dashboard shows KPI cards by account role and a top-5 customers table. Cost Delta page is accessible via sidebar navigation.

Performance Targets

QueryTarget
Hierarchy-level delta (100K events, 500 resources)< 500ms
Resource-level drill-down within a node< 300ms
Event timeline for a single resource< 50ms