Polo Legacy — Data Schema
DynamoDB Tables
ResourceTable
Single-table design storing all infrastructure resources.
| Attribute | Key | Type | Example |
|---|---|---|---|
| type | PK | String | I (Instance), V (VPC), CL (Cluster) |
| path | SK | String | A#023568249301/V#vpc-abc/S#subnet-123/I#i-xyz |
| All Resource fields | — | Mixed | See Resource model below |
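The sort key encodes the resource hierarchy as `/`-separated segments, each of the form `PREFIX#identifier`. As a minimal sketch, assuming that format holds for every path, a path could be split into its ancestors like this. The helper names `parse_path` and `parent_path` are hypothetical. Prefixes are treated as opaque strings because the example's `A#` prefix does not appear in the type code table.

```python
from typing import List, Tuple


def parse_path(path: str) -> List[Tuple[str, str]]:
    """Split a hierarchical path into (prefix, identifier) pairs, root first."""
    segments = []
    for part in path.split("/"):
        prefix, sep, ident = part.partition("#")
        if not sep or not prefix or not ident:
            raise ValueError(f"malformed path segment: {part!r}")
        segments.append((prefix, ident))
    return segments


def parent_path(path: str) -> str:
    """Return the path of the enclosing resource, or '' for a root path."""
    head, _, _ = path.rpartition("/")
    return head


example = "A#023568249301/V#vpc-abc/S#subnet-123/I#i-xyz"
print(parse_path(example))
print(parent_path(example))
```

Because a child's path starts with its parent's path, a sort-key prefix match would select a whole subtree within one `type` partition.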
ReportTable
Stores pre-computed reports.
| Attribute | Key | Type | Example |
|---|---|---|---|
| PK | PK | String | REPORT#MultiCostReport |
| SK | SK | String | 2025-06-21T00:00:00Z |
| data | — | Map | Serialized report payload |
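A sketch of how a ReportTable key might be built, assuming the PK is always `REPORT#` followed by the report class name and the SK is a UTC timestamp in the format shown in the example row. The function name `report_key` is hypothetical.

```python
from datetime import datetime, timezone


def report_key(report_name: str, as_of: datetime) -> dict:
    """Build the PK/SK pair for a stored report (assumed key layout)."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    as_of_utc = as_of.astimezone(timezone.utc)
    return {
        "PK": f"REPORT#{report_name}",
        "SK": as_of_utc.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


key = report_key("MultiCostReport", datetime(2025, 6, 21, tzinfo=timezone.utc))
print(key)
```

Fixed-width UTC timestamps in this format sort lexicographically in chronological order, so the most recent report for a given PK has the largest SK.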
Resource Type Codes
| Code | Class | Description |
|---|---|---|
| CU | Customer | Parent org/customer |
| AC | Account | Customer sub-account |
| US | User | IAM/Cognito user |
| CL | Cluster | Kubernetes cluster |
| IX | MarqoIndex | Vector search index |
| AWS | AwsAccount | AWS account |
| V | Vpc | VPC |
| S | Subnet | Subnet |
| NG | NatGateway | NAT gateway |
| IP | PublicIPv4 | Elastic IP |
| L | LoadBalancer | ELB (Classic/ALB/NLB) |
| I | Instance | EC2 instance |
| VO | Volume | EBS volume |
| SN | Snapshot | EBS snapshot |
| NB | Notebook | SageMaker notebook |
| BU | Bucket | S3 bucket |
| SP | SavingsPlan | AWS Savings Plan |
| OT | Other | Miscellaneous |
Resource Base Model
All resources share these fields:
type : str — type code (PK)
path : str — hierarchical path (SK)
subtype : str — hardware type (e.g., t3.micro, gp3)
account_id : str — Marqo account ID
dev_cell : str — development cell
name : str — human-readable name
region : str — AWS region
az : str — availability zone
state : str — running | stopped | terminated | available | deleted
cost : float — $/hr for this resource
sum_cost : float — $/hr including all descendants
role : Role — functional role (see enum)
system : System — system classification
audience : Audience — who this serves
team : Team — owning team
tags : dict — AWS tags
created_at : datetime
updated_at : datetime
deleted_at : datetime — soft-delete timestamp (None = alive)
created_by : str
updated_by : str
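To illustrate how `cost`, `sum_cost`, and `deleted_at` relate, here is a minimal sketch using a reduced subset of the fields above. It assumes `sum_cost` is the resource's own `cost` plus the `cost` of every resource whose path is nested beneath it, and that descendants are identified by path prefix. The `is_alive` property and the `roll_up_costs` function are hypothetical names, not part of the documented model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Resource:
    type: str                              # type code (PK)
    path: str                              # hierarchical path (SK)
    cost: float = 0.0                      # $/hr for this resource
    sum_cost: float = 0.0                  # $/hr including all descendants
    deleted_at: Optional[datetime] = None  # None means the resource is alive

    @property
    def is_alive(self) -> bool:
        return self.deleted_at is None


def roll_up_costs(resources: List[Resource]) -> None:
    """Set sum_cost to own cost plus the cost of all nested resources."""
    for res in resources:
        prefix = res.path + "/"
        res.sum_cost = res.cost + sum(
            other.cost for other in resources if other.path.startswith(prefix)
        )


vpc = Resource("V", "A#1/V#vpc-abc")
nat = Resource("NG", "A#1/V#vpc-abc/S#subnet-123/NG#nat-1", cost=0.25)
inst = Resource("I", "A#1/V#vpc-abc/S#subnet-123/I#i-xyz", cost=0.5)
roll_up_costs([vpc, nat, inst])
print(vpc.sum_cost)
```

The quadratic scan keeps the sketch short. Whether soft-deleted resources contribute to `sum_cost` is not stated in the schema, so this version counts whatever it is given.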
Enums
Role (24 values)
Infrastructure: grouping, saving, dev, controller, control-plane, networking
Index/Cluster: control, bastion, inference, vespa-config, vespa-content, vespa-api
Metrics: metrics, workflow, datadog
Other: monitoring, validation, notebook, bucket, jumpbox, testing, dataset, training, evaluation
System
CP (Control Plane), DP (Data Plane), MT (Marqtune), OS (Open Source), IT (Internal)
Audience
customers, development, applied_science, sales
Team
CP, DP, MT, OS, AS, CO, IT
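One way the System and Audience values could be declared in Python, shown as a sketch. The class layout and the `parse_system` helper are assumptions; only the member values come from the lists above. Mixing in `str` lets members compare equal to the plain strings stored in DynamoDB.

```python
from enum import Enum


class System(str, Enum):
    CP = "CP"  # Control Plane
    DP = "DP"  # Data Plane
    MT = "MT"  # Marqtune
    OS = "OS"  # Open Source
    IT = "IT"  # Internal


class Audience(str, Enum):
    CUSTOMERS = "customers"
    DEVELOPMENT = "development"
    APPLIED_SCIENCE = "applied_science"
    SALES = "sales"


def parse_system(raw: str) -> System:
    """Convert a stored string to a System member; raises ValueError if unknown."""
    return System(raw)


print(parse_system("DP"))
print(Audience("applied_science") == "applied_science")
```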
Extended Fields by Resource Type
Instance (I)
cluster_id : str
marqo_index : str
public_ip : str
elastic_ip : str
network_bytes_in : int
network_bytes_out : int
cloud_version : int — 1 (legacy tagged), 2 (current role-based), null (non-cloud)
elb : str
Account (AC)
system_account_id : str
customer_visible_account_id : str
aws_id : str
owner_id : str
owner_name : str
organization : str
MarqoIndex (IX)
config : dict — index configuration
price : float — customer-facing price
cloud_version : int
User (US)
email : str
country : str
signup_method : str
industry : str
intention_of_use : str
use_case : str
utm_* : str — marketing attribution fields
Volume (VO)
size : int — GB
instance_ids : list — attached EC2 instances
Notebook (NB)
volume_size : int — GB
Bucket (BU)
size : int — bytes
Everything Object
The Everything dataclass aggregates all resources with cached relationship lookups:
volumes_by_instance : Dict[instance_id, List[Volume]]
accounts_by_aws : Dict[aws_id, List[Account]]
clusters_by_account : Dict[account_id, List[Cluster]]
instances_by_cluster : Dict[cluster_id, List[Instance]]
instances_by_index : Dict[index_name, List[Instance]]
instances_by_subnet : Dict[subnet_id, List[Instance]]
instances_by_vpc : Dict[vpc_id, List[Instance]]
vpcs_by_cluster : Dict[cluster_id, List[Vpc]]
subnets_by_vpc : Dict[vpc_id, List[Subnet]]
subnets_by_cluster : Dict[cluster_id, List[Subnet]]
nats_by_subnet : Dict[subnet_id, List[NatGateway]]
nats_by_cluster : Dict[cluster_id, List[NatGateway]]
users_by_email : Dict[email, User]
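The lookups above are plain dictionaries keyed by a related resource's identifier. As a sketch of how one of them might be built, assuming the index is computed once when the object is constructed, the following uses reduced stand-ins for `Instance` and `Everything` with only the fields needed here.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    path: str
    cluster_id: str


@dataclass
class Everything:
    instances: List[Instance] = field(default_factory=list)
    # Built in __post_init__, so it is not a constructor argument.
    instances_by_cluster: Dict[str, List[Instance]] = field(
        init=False, default_factory=dict
    )

    def __post_init__(self) -> None:
        by_cluster = defaultdict(list)
        for inst in self.instances:
            by_cluster[inst.cluster_id].append(inst)
        # Store a plain dict so lookups of unknown keys do not create entries.
        self.instances_by_cluster = dict(by_cluster)


everything = Everything(
    instances=[
        Instance("A#1/I#i-1", "c1"),
        Instance("A#1/I#i-2", "c1"),
        Instance("A#1/I#i-3", "c2"),
    ]
)
print(len(everything.instances_by_cluster["c1"]))
```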
Report Models
MultiCostReport
Nested dict: {account_id: {usage_type: {period: cost}}}
Periods: last7, prev7, prev21, golden
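As a sketch of how the nested structure can be consumed, the function below sums cost per period across all accounts and usage types. The account IDs, usage types, and amounts in the sample are made up for illustration, and `totals_by_period` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict

# {account_id: {usage_type: {period: cost}}}
Report = Dict[str, Dict[str, Dict[str, float]]]


def totals_by_period(report: Report) -> Dict[str, float]:
    """Sum cost over every account and usage type, grouped by period."""
    totals = defaultdict(float)
    for usage_types in report.values():
        for periods in usage_types.values():
            for period, cost in periods.items():
                totals[period] += cost
    return dict(totals)


sample: Report = {
    "acct-a": {
        "compute": {"last7": 10.0, "prev7": 8.0},
        "storage": {"last7": 2.5, "prev7": 2.0},
    },
    "acct-b": {
        "compute": {"last7": 4.0, "golden": 3.0},
    },
}
print(totals_by_period(sample))
```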
Budget
Same structure as MultiCostReport, plus target tracking. Golden date: June 21, 2025. Monthly target: $79,167. Daily target: $2,639.
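The two targets are consistent with a 30-day month: $79,167 / 30 rounds to $2,639. That relationship is inferred from the numbers, not stated by the schema. The sketch below compares spend over a window against the daily target; `variance_vs_target` is a hypothetical helper and the spend figure is invented.

```python
MONTHLY_TARGET = 79_167  # USD, from the Budget definition
DAILY_TARGET = 2_639     # USD, from the Budget definition


def variance_vs_target(spend: float, days: int) -> float:
    """Return spend minus the target for the window; positive means over budget."""
    if days <= 0:
        raise ValueError("days must be positive")
    return spend - DAILY_TARGET * days


# Assumed derivation: the monthly target spread over 30 days.
print(round(MONTHLY_TARGET / 30))
print(variance_vs_target(20_000.0, 7))
```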
Action Reports
PruneClustersReport — clusters with no active instances for 3+ days
PruneVolumesReport — unattached volumes, created 30+ days ago
PruneIPsReport — IPs associated with dead instances
PruneNatGatewaysReport — NATs unused by any instance
StopNotebooksReport — notebooks idle 3+ days
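As an illustration of one of these rules, the sketch below selects volumes matching the PruneVolumesReport criteria. It assumes "unattached" means an empty `instance_ids` list and that age is measured from `created_at`. The reduced `Volume` class and the `prunable_volumes` function are stand-ins, not the actual report code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Volume:
    path: str
    created_at: datetime
    instance_ids: List[str] = field(default_factory=list)


def prunable_volumes(
    volumes: List[Volume],
    now: datetime,
    min_age: timedelta = timedelta(days=30),
) -> List[Volume]:
    """Return volumes with no attached instances that are at least min_age old."""
    return [
        vol
        for vol in volumes
        if not vol.instance_ids and now - vol.created_at >= min_age
    ]


now = datetime(2025, 6, 21, tzinfo=timezone.utc)
volumes = [
    Volume("A#1/VO#vol-old", now - timedelta(days=45)),
    Volume("A#1/VO#vol-new", now - timedelta(days=5)),
    Volume("A#1/VO#vol-used", now - timedelta(days=90), ["i-xyz"]),
]
print([vol.path for vol in prunable_volumes(volumes, now)])
```

Passing `now` explicitly keeps the filter deterministic, which makes a report computed for a fixed date reproducible.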