Polo Legacy — Data Schema

DynamoDB Tables

ResourceTable

Single-table design storing all infrastructure resources.

Attribute | Key | Type | Example
type | PK | String | I (Instance), V (VPC), CL (Cluster)
path | SK | String | A#023568249301/V#vpc-abc/S#subnet-123/I#i-xyz
All Resource fields | | Mixed | See Resource model below
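
The example sort key reads as a chain of type-code and identifier pairs joined by "/". A minimal sketch of how such a path could be taken apart, assuming every segment has the form code#id as in the example; parse_path and parent_path are hypothetical helper names, not part of the documented schema:

```python
from typing import List, Tuple


def parse_path(path: str) -> List[Tuple[str, str]]:
    """Split a hierarchical sort key into (type_code, resource_id) pairs."""
    segments = []
    for part in path.split("/"):
        # Assumption: each segment looks like "<code>#<id>".
        code, sep, resource_id = part.partition("#")
        if not sep or not code or not resource_id:
            raise ValueError(f"malformed path segment: {part!r}")
        segments.append((code, resource_id))
    return segments


def parent_path(path: str) -> str:
    """Return the path of the enclosing resource, or "" for a root path."""
    head, _, _ = path.rpartition("/")
    return head


example = "A#023568249301/V#vpc-abc/S#subnet-123/I#i-xyz"
segments = parse_path(example)
```

Because a child path begins with its parent's path, a prefix match on the sort key is one plausible way to select a resource together with its descendants.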

ReportTable

Stores pre-computed reports.

Attribute | Key | Type | Example
PK | PK | String | REPORT#MultiCostReport
SK | SK | String | 2025-06-21T00:00:00Z
data | | Map | Serialized report payload
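
As an illustration of the key layout above, the following sketch builds a PK/SK pair in the same shape as the examples. The function name report_key is hypothetical, and formatting the timestamp in UTC with a trailing Z is an assumption inferred from the example value:

```python
from datetime import datetime, timezone


def report_key(report_name: str, generated_at: datetime) -> dict:
    """Build a key dict shaped like the ReportTable examples."""
    # Normalise to UTC, then format like "2025-06-21T00:00:00Z".
    # Note: a naive datetime is interpreted as local time by astimezone().
    stamp = generated_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {"PK": f"REPORT#{report_name}", "SK": stamp}


key = report_key("MultiCostReport", datetime(2025, 6, 21, tzinfo=timezone.utc))
```

Fixed-width UTC timestamps in this format sort lexicographically in chronological order, so sorting on SK orders the reports of one kind by time.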

Resource Type Codes

Code | Class | Description
CU | Customer | Parent org/customer
AC | Account | Customer sub-account
US | User | IAM/Cognito user
CL | Cluster | Kubernetes cluster
IX | MarqoIndex | Vector search index
AWS | AwsAccount | AWS account
V | Vpc | VPC
S | Subnet | Subnet
NG | NatGateway | NAT gateway
IP | PublicIPv4 | Elastic IP
L | LoadBalancer | ELB (Classic/ALB/NLB)
I | Instance | EC2 instance
VO | Volume | EBS volume
SN | Snapshot | EBS snapshot
NB | Notebook | SageMaker notebook
BU | Bucket | S3 bucket
SP | SavingsPlan | AWS Savings Plan
OT | Other | Miscellaneous
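
For reference, the code table can be expressed as a plain lookup. This is only a sketch: the dictionary name RESOURCE_CLASS_BY_CODE and the helper code_for_class are hypothetical, and the values are class names as strings rather than the real classes:

```python
# Type code -> class name, copied from the table above.
RESOURCE_CLASS_BY_CODE = {
    "CU": "Customer",
    "AC": "Account",
    "US": "User",
    "CL": "Cluster",
    "IX": "MarqoIndex",
    "AWS": "AwsAccount",
    "V": "Vpc",
    "S": "Subnet",
    "NG": "NatGateway",
    "IP": "PublicIPv4",
    "L": "LoadBalancer",
    "I": "Instance",
    "VO": "Volume",
    "SN": "Snapshot",
    "NB": "Notebook",
    "BU": "Bucket",
    "SP": "SavingsPlan",
    "OT": "Other",
}

# Reverse lookup, built once from the forward mapping.
CODE_BY_RESOURCE_CLASS = {name: code for code, name in RESOURCE_CLASS_BY_CODE.items()}


def code_for_class(class_name: str) -> str:
    """Return the type code for a class name; raises KeyError if unknown."""
    return CODE_BY_RESOURCE_CLASS[class_name]
```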

Resource Base Model

All resources share these fields:

type : str — type code (PK)
path : str — hierarchical path (SK)
subtype : str — hardware type (e.g., t3.micro, gp3)
account_id : str — Marqo account ID
dev_cell : str — development cell
name : str — human-readable name
region : str — AWS region
az : str — availability zone
state : str — running | stopped | terminated | available | deleted
cost : float — $/hr for this resource
sum_cost : float — $/hr including all descendants
role : Role — functional role (see enum)
system : System — system classification
audience : Audience — who this serves
team : Team — owning team
tags : dict — AWS tags
created_at : datetime
updated_at : datetime
deleted_at : datetime — soft-delete timestamp (None = alive)
created_by : str
updated_by : str
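
The relationship between cost, sum_cost, and deleted_at can be sketched with a reduced model. This is an illustration under stated assumptions, not the actual Resource class: only a handful of the fields above are included, descendants are identified by path prefix, and roll_up_costs is a hypothetical helper.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Resource:
    type: str                               # type code (PK)
    path: str                               # hierarchical path (SK)
    cost: float = 0.0                       # $/hr for this resource
    sum_cost: float = 0.0                   # $/hr including all descendants
    deleted_at: Optional[datetime] = None   # None = alive

    @property
    def is_alive(self) -> bool:
        return self.deleted_at is None


def roll_up_costs(resources: List[Resource]) -> None:
    """Set sum_cost to own cost plus the cost of every descendant."""
    for parent in resources:
        # Assumption: a descendant's path starts with "<parent path>/".
        prefix = parent.path + "/"
        parent.sum_cost = parent.cost + sum(
            child.cost for child in resources if child.path.startswith(prefix)
        )


vpc = Resource("V", "A#023568249301/V#vpc-abc")
subnet = Resource("S", "A#023568249301/V#vpc-abc/S#subnet-123")
instance = Resource("I", "A#023568249301/V#vpc-abc/S#subnet-123/I#i-xyz", cost=0.25)
nat = Resource("NG", "A#023568249301/V#vpc-abc/S#subnet-123/NG#nat-1", cost=0.5)
roll_up_costs([vpc, subnet, instance, nat])
```

The nested loop is quadratic in the number of resources; sorting by path first would allow a single pass, at the price of a less obvious example.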

Enums

Role (25 values)

Infrastructure: grouping, saving, dev, controller, control-plane, networking
Index/Cluster: control, bastion, inference, vespa-config, vespa-content, vespa-api
Metrics: metrics, workflow, datadog
Other: monitoring, validation, notebook, bucket, jumpbox, testing, dataset, training, evaluation

System

CP (Control Plane), DP (Data Plane), MT (Marqtune), OS (Open Source), IT (Internal)

Audience

customers, development, applied_science, sales

Team

CP, DP, MT, OS, AS, CO, IT
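
The System and Audience values above map naturally onto Python enums. The sketch below is one possible encoding; the member names and the choice of value strings are assumptions, since the source lists only the codes and labels:

```python
from enum import Enum


class System(Enum):
    # Member name is the code; the value is the label from the list above.
    CP = "Control Plane"
    DP = "Data Plane"
    MT = "Marqtune"
    OS = "Open Source"
    IT = "Internal"


class Audience(str, Enum):
    # Mixing in str lets members compare equal to the stored string.
    CUSTOMERS = "customers"
    DEVELOPMENT = "development"
    APPLIED_SCIENCE = "applied_science"
    SALES = "sales"


system = System["CP"]          # lookup by code
audience = Audience("sales")   # lookup by stored value
```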

Extended Fields by Resource Type

Instance (I)

cluster_id : str
marqo_index : str
public_ip : str
elastic_ip : str
network_bytes_in : int
network_bytes_out : int
cloud_version : int — 1 (legacy tagged), 2 (current role-based), null (non-cloud)
elb : str

Account (AC)

system_account_id : str
customer_visible_account_id : str
aws_id : str
owner_id : str
owner_name : str
organization : str

MarqoIndex (IX)

config : dict — index configuration
price : float — customer-facing price
cloud_version : int

User (US)

email : str
country : str
signup_method : str
industry : str
intention_of_use : str
use_case : str
utm_* : str — marketing attribution fields

Volume (VO)

size : int — GB
instance_ids : list — attached EC2 instances

Notebook (NB)

volume_size : int — GB

Bucket (BU)

size : int — bytes

Everything Object

The Everything dataclass aggregates all resources with cached relationship lookups:

volumes_by_instance : Dict[instance_id, List[Volume]]
accounts_by_aws : Dict[aws_id, List[Account]]
clusters_by_account : Dict[account_id, List[Cluster]]
instances_by_cluster : Dict[cluster_id, List[Instance]]
instances_by_index : Dict[index_name, List[Instance]]
instances_by_subnet : Dict[subnet_id, List[Instance]]
instances_by_vpc : Dict[vpc_id, List[Instance]]
vpcs_by_cluster : Dict[cluster_id, List[Vpc]]
subnets_by_vpc : Dict[vpc_id, List[Subnet]]
subnets_by_cluster : Dict[cluster_id, List[Subnet]]
nats_by_subnet : Dict[subnet_id, List[NatGateway]]
nats_by_cluster : Dict[cluster_id, List[NatGateway]]
users_by_email : Dict[email, User]
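
One way such cached lookups could be built is with functools.cached_property, so that each index is computed on first access and reused afterwards. The sketch below covers only instances_by_cluster and uses a cut-down Instance class; whether the real Everything dataclass caches this way is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from functools import cached_property
from typing import Dict, List


@dataclass
class Instance:
    path: str
    cluster_id: str


@dataclass
class Everything:
    instances: List[Instance] = field(default_factory=list)

    @cached_property
    def instances_by_cluster(self) -> Dict[str, List[Instance]]:
        # Group instances by cluster_id; computed once per Everything object.
        index: Dict[str, List[Instance]] = defaultdict(list)
        for inst in self.instances:
            index[inst.cluster_id].append(inst)
        return dict(index)


everything = Everything(
    instances=[
        Instance("I#i-1", "c-1"),
        Instance("I#i-2", "c-1"),
        Instance("I#i-3", "c-2"),
    ]
)
by_cluster = everything.instances_by_cluster
```

A cache of this kind goes stale if instances is mutated after the first access, so it suits a snapshot that is loaded once and then only read.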

Report Models

MultiCostReport

Nested dict: {account_id: {usage_type: {period: cost}}}
Periods: last7, prev7, prev21, golden
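
To make the nesting concrete, the sketch below walks a report of this shape and totals cost per period across all accounts and usage types. The account IDs, usage types, and figures are invented for illustration, and totals_by_period is a hypothetical helper:

```python
from collections import defaultdict
from typing import Dict

# {account_id: {usage_type: {period: cost}}}
CostReport = Dict[str, Dict[str, Dict[str, float]]]


def totals_by_period(report: CostReport) -> Dict[str, float]:
    """Sum cost per period over every account and usage type."""
    totals: Dict[str, float] = defaultdict(float)
    for usage_types in report.values():
        for periods in usage_types.values():
            for period, cost in periods.items():
                totals[period] += cost
    return dict(totals)


# Illustrative data only; not taken from a real report.
sample_report: CostReport = {
    "acct-1": {
        "EC2": {"last7": 70.0, "prev7": 56.0},
        "EBS": {"last7": 14.0, "prev7": 14.0},
    },
    "acct-2": {
        "EC2": {"last7": 16.0, "prev7": 20.0},
    },
}
totals = totals_by_period(sample_report)
```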

Budget

Same structure as MultiCostReport, plus target tracking. Golden date: June 21, 2025. Monthly target: $79,167. Daily target: $2,639.
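
The two targets are consistent with a 30-day month: 79,167 / 30 is about 2,638.9, which rounds to 2,639. A small sketch of that arithmetic and of a variance check against the daily target follows; the constant and function names are hypothetical, and the 30-day convention is inferred from the numbers rather than stated in the source:

```python
MONTHLY_TARGET_USD = 79_167
DAILY_TARGET_USD = 2_639

# Ratio of the two published targets; close to 30 days per month.
implied_days_per_month = MONTHLY_TARGET_USD / DAILY_TARGET_USD


def daily_variance(actual_daily_cost: float) -> float:
    """Positive means over the daily target, negative means under."""
    return actual_daily_cost - DAILY_TARGET_USD


over_by = daily_variance(3000.0)
```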

Action Reports

  • PruneClustersReport — clusters with no active instances for 3+ days
  • PruneVolumesReport — unattached volumes, created 30+ days ago
  • PruneIPsReport — IPs associated with dead instances
  • PruneNatGatewaysReport — NATs unused by any instance
  • StopNotebooksReport — notebooks idle 3+ days
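
As an example of how one of these rules could be evaluated, the sketch below selects candidates for PruneVolumesReport: volumes with no attached instances that were created at least 30 days ago. The reduced Volume class and the function prune_volume_candidates are hypothetical; only the fields instance_ids and created_at come from the schema above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass
class Volume:
    path: str
    created_at: datetime
    instance_ids: List[str] = field(default_factory=list)  # attached EC2 instances


def prune_volume_candidates(
    volumes: List[Volume],
    now: datetime,
    min_age: timedelta = timedelta(days=30),
) -> List[Volume]:
    """Return unattached volumes that are at least min_age old."""
    return [
        v for v in volumes
        if not v.instance_ids and now - v.created_at >= min_age
    ]


now = datetime(2025, 6, 21, tzinfo=timezone.utc)
volumes = [
    Volume("VO#vol-old", now - timedelta(days=45)),                 # old, unattached
    Volume("VO#vol-new", now - timedelta(days=5)),                  # too recent
    Volume("VO#vol-used", now - timedelta(days=90), ["i-xyz"]),     # still attached
]
candidates = prune_volume_candidates(volumes, now)
```

Passing now explicitly keeps the rule deterministic and easy to test; the other reports could follow the same pattern with their own thresholds.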