Resource Hierarchy & Cost Rollup

The Problem

Polo's most important analytical capability is rolling costs up through multiple overlapping hierarchies that extend all the way down to physical AWS resource parentage.

A snapshot's cost rolls up to its volume, which rolls up to its instance, which rolls up to its index, cluster, and customer. The same snapshot simultaneously rolls up through the account hierarchy. The UI needs total cost at every level, and each level carries metadata that all descendants inherit (account role, customer tier, cluster version).

Marqo logical hierarchy:

  Customer "Acme Corp"  (tier: enterprise)
    ├── Cluster "acme-prod-1"  (version: v2)
    │     ├── Index "products"
    │     │     ├── ALB alb-xxx ────────────── $0.02/hr    (backed_by → i-aaa, i-ggg)
    │     │     ├── EC2 i-aaa ─────────────── $2.10/hr
    │     │     │     ├── EBS vol-bbb ──────── $0.08/hr    (attached_to i-aaa)
    │     │     │     │     └── Snapshot snap-ccc ── $0.01  (snapshot_of vol-bbb)
    │     │     │     ├── ENI eni-ddd ──────── $0.00        (eni_of i-aaa)
    │     │     │     └── Public IPv4 ──────── $0.005/hr    (associated_with i-aaa)
    │     │     └── NAT nat-fff ────────────── $0.045/hr
    │     └── Index "images"
    │           └── EC2 i-ggg ─────────────── $1.50/hr
    └── Cluster "acme-staging-1"
          └── ...

Network topology (non-cost edges):

  VPC vpc-111
    ├── Subnet subnet-aaa  (us-east-1a)
    │     ├── EC2 i-aaa            (in_subnet)
    │     ├── ENI eni-ddd          (in_subnet)
    │     └── NAT nat-fff          (in_subnet)
    └── Subnet subnet-bbb  (us-east-1b)
          └── EC2 i-ggg            (in_subnet)

Three Hierarchy Dimensions

Dimension	Edges	Example path	Purpose
`aws_physical`	`is_cost_parent=1` edges from `resource_relationships`	snapshot → volume → instance	Roll up child resource costs to their physical parents
`marqo_logical`	Tags on resources (or inherited from physical parents)	resource → index → cluster → customer	Business-level cost attribution
`aws_account`	Resource's `aws_account_id`	resource → account	Account-level cost tracking and role-based filtering

All three coexist in resource_ancestry and can be queried independently or together.

Reify vs Infer — The Hybrid Approach

Strategy	Pro	Con
Query-time inference	Always correct	ClickHouse handles recursive CTEs poorly; every query re-walks the graph
Full ingest-time reification	Blazing fast GROUP BY	Metadata changes require backfilling billions of rows
Hybrid (chosen)	Fast GROUP BY on identity; metadata changes are instant	Slight query complexity for metadata columns

Identity columns (account_id, cluster, index) are stamped on events at ingest — they don't change after creation.

Metadata (account role, customer tier) lives in hierarchy_nodes and is resolved via ClickHouse dictionary at query time — changes take effect instantly without backfill.

account_role is the one exception: although it is metadata, it is also reified on events because it is useful as a filter in almost every query. In the rare case that an account's role changes, a backfill job rewrites the affected events.
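The split can be illustrated with a small Python sketch. It assumes events are plain dicts and uses a hypothetical in-memory `HIERARCHY_NODES` mapping in place of `hierarchy_nodes`/`hierarchy_dict`. The function names are illustrative, not Polo's ingest code.

```python
from typing import Dict

# Hypothetical stand-in for polo.hierarchy_nodes, read through hierarchy_dict at query time.
HIERARCHY_NODES: Dict[str, Dict[str, str]] = {
    "account:111111111111": {"role": "customer"},
    "customer:acme": {"tier": "enterprise"},
}


def stamp_event(raw_event: dict, identity: Dict[str, str]) -> dict:
    """Ingest time: copy immutable identity columns, plus the reified account_role, onto the event."""
    event = {**raw_event, **identity}
    # account_role is the one metadata field copied at ingest; changing it later needs a backfill.
    event["account_role"] = HIERARCHY_NODES[f"account:{identity['account_id']}"]["role"]
    return event


def customer_tier(event: dict) -> str:
    """Query time: resolve mutable metadata by lookup, the equivalent of dictGet()."""
    return HIERARCHY_NODES[f"customer:{event['marqo_customer']}"].get("tier", "")
```

Editing a customer's tier in the lookup changes what `customer_tier()` returns for events that are already stored. By contrast, `event["account_role"]` keeps its ingest-time value until a backfill rewrites it.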

Key Tables

`hierarchy_nodes` — Logical hierarchy levels

Small table (hundreds of rows) defining accounts, customers, clusters, and indexes, each carrying inheritable metadata. Individual AWS resources are NOT stored here; they participate via resource_ancestry.

CREATE TABLE polo.hierarchy_nodes
(
    node_id          String,                      -- 'account:123456789012', 'cluster:abc123', etc.
    node_type        LowCardinality(String),      -- 'account', 'customer', 'cluster', 'index'
    node_name        String,
    parent_id        String DEFAULT '',            -- '' = root
    hierarchy        LowCardinality(String),       -- 'aws_account', 'marqo_logical'
    metadata         Map(String, String),          -- 'role' → 'customer', 'tier' → 'enterprise', etc.
    created_at       DateTime64(3),
    updated_at       DateTime64(3),
    _version         UInt64
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY (hierarchy, node_id);

`resource_ancestry` — Closure table

Pre-computed transitive closure of the full hierarchy. For every resource, every ancestor at every depth. Enables "total cost of everything under node X" without recursion.

CREATE TABLE polo.resource_ancestry
(
    resource_arn     String,                       -- the leaf resource
    ancestor_id      String,                       -- node_id or resource ARN
    ancestor_type    LowCardinality(String),       -- 'account', 'customer', 'cluster', 'index', 'ec2:instance', etc.
    hierarchy        LowCardinality(String),       -- 'aws_physical', 'marqo_logical', 'aws_account'
    depth            UInt8,                        -- 1 = parent, 2 = grandparent, etc.
    _version         UInt64
)
ENGINE = ReplacingMergeTree(_version)
ORDER BY (ancestor_id, resource_arn);

Example rows for snapshot snap-ccc:

resource_arn	ancestor_id	ancestor_type	hierarchy	depth
`arn:...:snap-ccc`	`arn:...:vol-bbb`	ebs:volume	aws_physical	1
`arn:...:snap-ccc`	`arn:...:i-aaa`	ec2:instance	aws_physical	2
`arn:...:snap-ccc`	`index:acme-products`	index	marqo_logical	3
`arn:...:snap-ccc`	`cluster:acme-prod-1`	cluster	marqo_logical	4
`arn:...:snap-ccc`	`customer:acme`	customer	marqo_logical	5
`arn:...:snap-ccc`	`account:111111111111`	account	aws_account	1

Size estimate: ~5K resources × ~6 ancestors each ≈ 30K rows. Even at tens of thousands of resources the table stays around O(100K) rows, which is trivial for ClickHouse. The hierarchy_builder rebuilds it every 15 minutes, and a rebuild takes under a second.

`hierarchy_dict` — In-memory metadata lookups

CREATE DICTIONARY polo.hierarchy_dict
(
    node_id String, node_type String, node_name String,
    parent_id String, hierarchy String, metadata Map(String, String)
)
PRIMARY KEY node_id
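SOURCE(CLICKHOUSE(
    -- FINAL collapses unmerged ReplacingMergeTree versions so only the latest row per node is loaded
    QUERY 'SELECT node_id, node_type, node_name, parent_id, hierarchy, metadata FROM polo.hierarchy_nodes FINAL'
))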
LAYOUT(COMPLEX_KEY_HASHED())
LIFETIME(MIN 60 MAX 300);

Usage: dictGet('polo.hierarchy_dict', 'metadata', tuple('account:111'))['role']. The key is wrapped in tuple() because the dictionary uses a complex-key layout. Each lookup costs next to nothing.

`cost_rollup_daily` — Pre-aggregated hierarchical costs

Daily cost at every hierarchy level (logical AND physical). Powers treemap and sunburst views.

CREATE TABLE polo.cost_rollup_daily
(
    day              Date,
    node_id          String,                      -- node_id or resource ARN
    node_type        LowCardinality(String),
    node_name        String,
    hierarchy        LowCardinality(String),
    parent_id        String,
    cost_usd         Float64,
    resource_count   UInt32,
    cost_by_type     Map(String, Float64)          -- 'ec2:instance' → 142.50, etc.
)
ENGINE = SummingMergeTree()
PARTITION BY toYYYYMM(day)
ORDER BY (hierarchy, node_type, node_id, day);

Tag Inheritance Through Physical Parents

Snapshots and other child resources typically have no Marqo tags. The HierarchyResolver walks up the physical parent chain (snapshot → volume → instance) until it finds an ancestor with marqo:index tags, then uses those for the logical hierarchy. This means an untagged snapshot still rolls up correctly.

MARQO_TAG_MAP = {
    'marqo_customer':  ['marqo:customer', 'Customer', 'customer'],
    'marqo_cluster':   ['marqo:cluster', 'marqo:cluster-id', 'Cluster'],
    'marqo_index':     ['marqo:index', 'marqo:index-name', 'Index'],
    'marqo_env':       ['marqo:environment', 'Environment', 'env'],
    'marqo_purpose':   ['marqo:purpose', 'Purpose'],
}
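The following is a minimal sketch of how tag normalisation and inheritance could fit together. It makes three assumptions:

- The candidate keys in each `MARQO_TAG_MAP` list are tried in order, and the first non-empty one wins.
- Parent links are available as a `{child: parent}` map.
- "Has an index tag" means any of the `marqo_index` candidate keys is present.

`normalise_tags` and `resolve_logical_tags` are hypothetical names, not the actual `HierarchyResolver` API.

```python
from typing import Dict, Mapping, Optional

# Copied from the map above so the sketch runs standalone.
MARQO_TAG_MAP = {
    'marqo_customer':  ['marqo:customer', 'Customer', 'customer'],
    'marqo_cluster':   ['marqo:cluster', 'marqo:cluster-id', 'Cluster'],
    'marqo_index':     ['marqo:index', 'marqo:index-name', 'Index'],
    'marqo_env':       ['marqo:environment', 'Environment', 'env'],
    'marqo_purpose':   ['marqo:purpose', 'Purpose'],
}


def normalise_tags(tags: Mapping[str, str]) -> Dict[str, str]:
    """Map raw AWS tag keys onto canonical marqo_* fields (first matching candidate wins)."""
    out: Dict[str, str] = {}
    for field, candidates in MARQO_TAG_MAP.items():
        for key in candidates:
            if tags.get(key):
                out[field] = tags[key]
                break
    return out


def resolve_logical_tags(
    resource: str,
    tags: Mapping[str, Mapping[str, str]],
    parent: Mapping[str, str],
    max_depth: int = 10,
) -> Optional[Dict[str, str]]:
    """Walk up the physical parent chain until a resource carries an index tag."""
    current: Optional[str] = resource
    for _ in range(max_depth):  # bound the walk in case relationship data contains a cycle
        if current is None:
            return None
        normalised = normalise_tags(tags.get(current, {}))
        if "marqo_index" in normalised:
            return normalised
        current = parent.get(current)
    return None
```

With `snap-ccc → vol-bbb → i-aaa` and tags only on `i-aaa`, the snapshot resolves to the instance's customer and index tags.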

Resource Relationship Types

The relationships collector discovers edges between resources from AWS APIs. Each edge has a type and an is_cost_parent flag that controls whether the edge participates in cost rollup.

Cost-parent edges (roll up costs)

Relationship	Source → Target	AWS API	Purpose
`attached_to`	EBS volume → EC2 instance	`describe_volumes` → `Attachments[].InstanceId`	Volume cost attributed to instance
`snapshot_of`	EBS snapshot → EBS volume	`describe_snapshots` → `VolumeId`	Snapshot cost attributed to volume
`associated_with`	EIP → EC2 instance	`describe_addresses` → `InstanceId`	EIP cost attributed to instance
`eni_of`	ENI → EC2 instance	`describe_network_interfaces` → `Attachment.InstanceId`	ENI cost attributed to instance
`backed_by`	Load balancer → EC2 instance	`describe_target_groups` + `describe_target_health` → target instances	LB cost attributed to target instances
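The API-field mapping in these tables can be expressed as pure functions over response dictionaries. This sketch makes several assumptions and simplifications:

- Responses are shaped like boto3's EC2 `describe_volumes`, `describe_snapshots`, `describe_addresses`, and `describe_network_interfaces` output, and only the fields named in the tables are read.
- Raw IDs stand in for ARNs.
- The `Edge` tuple and the function names are hypothetical.
- Pagination and the load-balancer lookups are omitted.

The ENI function also emits the topology `in_subnet` edge with `is_cost_parent=0`, to show how the flag separates the two edge kinds.

```python
from typing import Iterator, NamedTuple


class Edge(NamedTuple):
    source: str
    target: str
    relationship: str
    is_cost_parent: int  # 1 = participates in cost rollup, 0 = topology only


def volume_edges(resp: dict) -> Iterator[Edge]:
    # describe_volumes -> Attachments[].InstanceId
    for vol in resp.get("Volumes", []):
        for att in vol.get("Attachments", []):
            if att.get("InstanceId"):
                yield Edge(vol["VolumeId"], att["InstanceId"], "attached_to", 1)


def snapshot_edges(resp: dict) -> Iterator[Edge]:
    # describe_snapshots -> VolumeId
    for snap in resp.get("Snapshots", []):
        if snap.get("VolumeId"):
            yield Edge(snap["SnapshotId"], snap["VolumeId"], "snapshot_of", 1)


def address_edges(resp: dict) -> Iterator[Edge]:
    # describe_addresses -> InstanceId; unassociated EIPs produce no edge
    for addr in resp.get("Addresses", []):
        if addr.get("InstanceId"):
            source = addr.get("AllocationId") or addr["PublicIp"]
            yield Edge(source, addr["InstanceId"], "associated_with", 1)


def eni_edges(resp: dict) -> Iterator[Edge]:
    # describe_network_interfaces -> Attachment.InstanceId (cost) and SubnetId (topology)
    for eni in resp.get("NetworkInterfaces", []):
        instance_id = eni.get("Attachment", {}).get("InstanceId")
        if instance_id:
            yield Edge(eni["NetworkInterfaceId"], instance_id, "eni_of", 1)
        if eni.get("SubnetId"):
            yield Edge(eni["NetworkInterfaceId"], eni["SubnetId"], "in_subnet", 0)
```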

Topology edges (no cost rollup)

Relationship	Source → Target	AWS API	Purpose
`in_subnet`	EC2 instance → Subnet	`describe_instances` → `SubnetId`	Network placement
`in_subnet`	ENI → Subnet	`describe_network_interfaces` → `SubnetId`	Network placement
`in_subnet`	NAT gateway → Subnet	`describe_nat_gateways` → `SubnetId`	Network placement
`in_vpc`	Subnet → VPC	`describe_subnets` → `VpcId`	Network containment

Topology edges enable "show me all resources in VPC X" queries without polluting the cost-optimised closure table. They're queried directly from resource_relationships, not through resource_ancestry.
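"All resources in VPC X" is a two-hop lookup over topology edges:

1. Find the subnets that have an `in_vpc` edge to the VPC.
2. Find the resources that have an `in_subnet` edge to any of those subnets.

Below is a minimal in-memory sketch. It assumes edges are `(source, target, relationship)` tuples read from `resource_relationships`. `resources_in_vpc` is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

EdgeRow = Tuple[str, str, str]  # (source, target, relationship)


def resources_in_vpc(edges: Iterable[EdgeRow], vpc_id: str) -> Set[str]:
    edges = list(edges)  # iterated twice below
    # Hop 1: subnets contained in the VPC.
    subnets = {src for src, dst, rel in edges if rel == "in_vpc" and dst == vpc_id}
    # Hop 2: resources placed in any of those subnets.
    return {src for src, dst, rel in edges if rel == "in_subnet" and dst in subnets}
```

Cost-parent edges such as `attached_to` are ignored here, which matches keeping topology out of the closure table.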

Relationship Between Tables

resource_relationships (direct edges, typed)
        │
        │  hierarchy_builder reads is_cost_parent=1 edges
        ▼
resource_ancestry (transitive closure, all 3 hierarchies)
        │
        │  cost queries JOIN here for rollups
        ▼
cost_rollup_daily (pre-aggregated, for dashboards)

resource_relationships stores the raw graph. resource_ancestry is the query-optimised closure table derived from it. The hierarchy_builder collector rebuilds resource_ancestry every 15 minutes.
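The core of the rebuild is a transitive closure over cost-parent edges. Here is a minimal Python sketch. It assumes the edges fit in memory as `(child, parent)` pairs, which the size estimate for resource_ancestry suggests they do. `build_physical_closure` is a hypothetical name, and the sketch produces only the `aws_physical` rows. The `marqo_logical` and `aws_account` rows would be appended separately, from resolved tags and each resource's account ID.

A resource can have several cost parents, for example an ALB `backed_by` two instances. The walk is therefore breadth-first. It records the shortest depth to each ancestor and skips ancestors it has already seen, which also stops a bad cyclic edge from looping forever.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

ClosureRow = Tuple[str, str, int]  # (resource, ancestor, depth)


def build_physical_closure(edges: Iterable[Tuple[str, str]]) -> List[ClosureRow]:
    """Expand direct child->parent cost edges into (resource, ancestor, depth) rows."""
    parents: Dict[str, List[str]] = defaultdict(list)
    for child, parent in edges:
        parents[child].append(parent)

    rows: List[ClosureRow] = []
    for resource in list(parents):
        seen = {resource}
        queue = deque((p, 1) for p in parents[resource])
        while queue:
            ancestor, depth = queue.popleft()  # BFS: first visit is the shortest depth
            if ancestor in seen:
                continue
            seen.add(ancestor)
            rows.append((resource, ancestor, depth))
            queue.extend((p, depth + 1) for p in parents.get(ancestor, []))
    return rows
```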

Marqo Logical Hierarchy — Data Sources

The hierarchy_admin collector populates the marqo_logical hierarchy from multiple DynamoDB tables and resource tags. This section documents the data sources, join logic, and heuristics.

Data Sources

Source	Store	Account	What it provides
UsersAccountsTable	DynamoDB	468036072962 (Staging)	Customer accounts: `visible_account_id` (UUID), `system_account_id` (short ID), `organization`, `name`
CustomerIndexConfigTable	DynamoDB	468036072962 (Staging)	Indexes: `index_name` (SK), `system_account_id` (PK), `index_status`, `inference_type`, `storage_class`, `marqo_version`, `marqo_endpoint`
Resource tags	ClickHouse `resource_snapshots`	All collected accounts	Clusters and indexes via `marqo:cluster`, `marqo:index` AWS tags (fallback source)

Join Logic

UsersAccountsTable (ACCOUNT entities)
    │
    │  system_account_id = system_account_id (PK)
    ▼
CustomerIndexConfigTable (index records)
    │
    │  visible_account_id from UsersAccountsTable
    ▼
hierarchy_nodes:  customer:{visible_account_id} → index:{system_account_id}:{index_name}

The system_account_id is an opaque short identifier (e.g., nh236hm2) that serves as:

The partition key in CustomerIndexConfigTable
The join key between customers and their indexes
Part of the Marqo endpoint subdomain: {index-name}-{hash}-{system_account_id}.marqo-staging.com

It is stored in hierarchy_nodes customer metadata so that subsequent runs can read the mapping from ClickHouse via _load_sys_to_visible() without re-scanning DynamoDB.
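The join can be sketched in Python as follows. The sketch makes several assumptions:

- Both DynamoDB tables have already been scanned, and UsersAccountsTable has been filtered to ACCOUNT entities, producing lists of plain dicts with the attribute names listed above.
- The customer display name prefers `organization` and falls back to `name`.
- Indexes whose `system_account_id` matches no account are skipped.

`build_logical_nodes` is a hypothetical helper. The returned `(index_node_id, customer_node_id)` pairs stand in for full `hierarchy_nodes` rows.

```python
from typing import Dict, Iterable, List, Tuple


def build_logical_nodes(
    accounts: Iterable[dict], indexes: Iterable[dict]
) -> Tuple[Dict[str, dict], List[Tuple[str, str]]]:
    """Return customer nodes keyed by node_id, and (index_node_id, parent_customer_id) links."""
    customers: Dict[str, dict] = {}
    sys_to_visible: Dict[str, str] = {}
    for acct in accounts:
        node_id = f"customer:{acct['visible_account_id']}"
        customers[node_id] = {
            # Assumption: organization is preferred as the display name, falling back to name.
            "node_name": acct.get("organization") or acct.get("name", ""),
            # Kept in metadata so later runs can rebuild sys_to_visible from ClickHouse.
            "metadata": {"system_account_id": acct["system_account_id"]},
        }
        sys_to_visible[acct["system_account_id"]] = acct["visible_account_id"]

    links: List[Tuple[str, str]] = []
    for idx in indexes:
        visible = sys_to_visible.get(idx["system_account_id"])
        if visible is None:
            continue  # Assumption: skip indexes whose owning account was not found
        index_id = f"index:{idx['system_account_id']}:{idx['index_name']}"
        links.append((index_id, f"customer:{visible}"))
    return customers, links
```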

Index Node Identity

Index node IDs use the (system_account_id, index_name) tuple: index:{system_account_id}:{index_name}. This is necessary because different customers can have indexes with the same name (e.g., kogan-prod-20251122 exists under two different accounts).

Index Status Filtering

Only indexes with active statuses are collected: CREATING, DELETING, MODIFYING, READY. The DELETED status (which accounts for the vast majority of records — ~13,800 of ~13,850) is excluded.
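As stated, the filter is an allow-list: any status outside the four listed, including DELETED, is dropped. In this sketch, `ACTIVE_INDEX_STATUSES` and `active_indexes` are hypothetical names.

```python
from typing import Iterable, List

# Statuses kept by the collector, per the list above; everything else (notably DELETED) is skipped.
ACTIVE_INDEX_STATUSES = frozenset({"CREATING", "DELETING", "MODIFYING", "READY"})


def active_indexes(records: Iterable[dict]) -> List[dict]:
    return [r for r in records if r.get("index_status") in ACTIVE_INDEX_STATUSES]
```

The filter could instead be pushed into the DynamoDB scan as a `FilterExpression`. DynamoDB would still read, and consume read capacity for, every item, because filter expressions are applied after items are read. The only saving would be a smaller response payload.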

Index Metadata

Each index node stores metadata from CustomerIndexConfigTable:

Field	Example	Purpose
`index_status`	`READY`	Current lifecycle state
`inference_type`	`CPU`, `CPU.SMALL`, `GPU`, `GPU.LARGE`	Compute tier
`storage_class`	`BASIC`, `BALANCED_STORAGE`, `PERFORMANCE`, `BALANCED_THROUGHPUT_PLUS`	Storage tier
`marqo_version`	`2.25.0-cloud`	Marqo engine version
`marqo_endpoint`	`my-index-abc123.marqo-staging.com`	Customer-facing endpoint
`index_namespace`	`deadbeef1234...`	Internal namespace hash
`dev_cell`	`mehul` or empty	Dev cell identifier (see below)

Dev Cells

In staging, dev cells are isolated Marqo Cloud instances used by developers for testing. They're identified by naming patterns in resource names and index names:

Pattern	Example	Dev cell ID
Known developer prefix	`mehul-dev-MultitenantEKSCluster-*`	`mehul`
Known developer prefix	`keshavjois-test-index`	`keshavjois`
Generic `{name}-dev-` prefix	`cell1-dev-MultitenantEKSCluster-*`	`cell1-dev`
Branch-based deployment	`dev-controller-branchname-*`	`branchname`

Known developer prefixes (maintained in hierarchy_admin/handler.py): mehul, mehulporuwal, keshav, keshavjois, vfacabado.

Regular staging, preprod, and prod resources have an empty dev_cell value. The dev_cell metadata field enables filtering the hierarchy to show only production-grade resources or only a specific developer's environment.
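The pattern table can be sketched as an ordered set of checks. This is not the code in hierarchy_admin/handler.py. It makes three assumptions:

- Known prefixes are matched longest-first and must be followed by a hyphen, so `keshavjois-test-index` resolves to `keshavjois`, not `keshav`.
- Known prefixes are checked before the generic `{name}-dev-` rule.
- Branch names contain no hyphens.

```python
import re

KNOWN_DEV_PREFIXES = ("mehul", "mehulporuwal", "keshav", "keshavjois", "vfacabado")

_BRANCH_RE = re.compile(r"^dev-controller-([^-]+)-")
_GENERIC_DEV_RE = re.compile(r"^([A-Za-z0-9]+)-dev-")


def dev_cell(name: str) -> str:
    """Return the dev cell ID for a resource or index name, or '' for regular environments."""
    # 1. Known developer prefixes, longest first so 'keshavjois' wins over 'keshav'.
    for prefix in sorted(KNOWN_DEV_PREFIXES, key=len, reverse=True):
        if name.startswith(prefix + "-"):
            return prefix
    # 2. Branch-based deployments: dev-controller-<branch>-*
    match = _BRANCH_RE.match(name)
    if match:
        return match.group(1)
    # 3. Generic '{name}-dev-' prefix keeps the '-dev' suffix in the ID.
    match = _GENERIC_DEV_RE.match(name)
    if match:
        return f"{match.group(1)}-dev"
    return ""
```

Checking known prefixes first is what makes `mehul-dev-MultitenantEKSCluster-*` resolve to `mehul` rather than `mehul-dev`, matching the table.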

Hierarchy Shape

The Marqo logical hierarchy is currently two levels (customer → index), not three:

Customer "Acme Corp"  (organization, name from UsersAccountsTable)
  ├── Index "prod-search"       (READY, GPU, PERFORMANCE)
  ├── Index "staging-search"    (READY, CPU.SMALL, BASIC)
  └── Index "dev-experiment"    (CREATING, CPU, BASIC, dev_cell: mehul)

There is no explicit "cluster" concept in the CustomerIndexConfigTable. Clusters only appear when AWS resources carry marqo:cluster tags (currently none do in staging). If Marqo introduces a cluster concept in the control plane, a third hierarchy level can be added.

Three Query Paths

Path	When to use	Joins required
Fast path	GROUP BY a denormalised column (account_role, marqo_customer)	None
Full rollup	Cost of a node including all physical children	JOIN resource_ancestry
Metadata	Resolve account role, customer tier for display	dictGet() — near free
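The full-rollup path sums per-resource cost over every resource that lists the target node as an ancestor in resource_ancestry. It also adds the node's own cost when the node is itself a resource. The sketch below mirrors that join in memory. It assumes costs arrive as a `{resource: cost}` dict and ancestry rows as `(resource_arn, ancestor_id)` pairs. `rollup_cost` is a hypothetical helper, and short IDs stand in for ARNs.

```python
from typing import Dict, Iterable, Tuple


def rollup_cost(
    node_id: str,
    costs: Dict[str, float],
    ancestry: Iterable[Tuple[str, str]],
) -> float:
    """Total cost of node_id plus every resource that lists it as an ancestor."""
    # A set guards against counting a resource twice if it appears in several ancestry rows.
    descendants = {res for res, anc in ancestry if anc == node_id}
    descendants.add(node_id)  # a resource node's own cost counts; logical nodes have none
    return sum(costs.get(r, 0.0) for r in descendants)
```

With the hourly figures from the hierarchy diagram, `i-aaa` rolls up its volume and snapshot: 2.10 + 0.08 + 0.01 = 2.19.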
