Project Structure & Conventions
Directory Layout
polo/
├── docs/  # Documentation (this directory)
│
├── components/
│   ├── cli/  # Polo development CLI (click)
│   │   ├── commands.py  # All commands
│   │   └── prod_sync.py  # Production data sync logic
│   │
│   ├── schema/  # ClickHouse DDL + migrations
│   │   ├── migrations/  # 001_*.sql through 010_*.sql
│   │   ├── migrate.py  # Applies migrations in order
│   │   └── test_schema.py
│   │
│   ├── collectors/  # AWS data collection (Python, Lambda-deployable)
│   │   ├── common/  # Shared library
│   │   │   ├── models.py  # ResourceEvent Pydantic model
│   │   │   ├── normaliser.py  # Raw AWS data → ResourceEvent
│   │   │   ├── tag_resolver.py  # Tags → marqo_* fields
│   │   │   ├── hierarchy_resolver.py  # Walks physical + logical hierarchies
│   │   │   ├── clickhouse_client.py  # CH HTTP client with batching + retry
│   │   │   ├── arn.py  # ARN parsing
│   │   │   └── config.py  # Env var config
│   │   │
│   │   ├── config_ec2/  # EC2 instance collector
│   │   ├── config_ebs/  # EBS volume collector
│   │   ├── config_network/  # NATs, EIPs, ELBs, ENIs
│   │   ├── cost_explorer/  # Cost Explorer API
│   │   ├── lifecycle/  # CloudTrail events via EventBridge
│   │   ├── tags/  # Resource Groups Tagging API
│   │   ├── relationships/  # Discovers resource_relationships
│   │   ├── snapshot_builder/  # Maintains resource_snapshots
│   │   └── hierarchy_builder/  # Builds resource_ancestry closure table
│   │
│   ├── api/  # Cloudflare Worker (TypeScript)
│   │   ├── src/
│   │   │   ├── index.ts  # Worker entrypoint
│   │   │   ├── router.ts  # URL-pattern-based routing
│   │   │   ├── queries/  # Named parameterised SQL queries
│   │   │   │   ├── cost.ts
│   │   │   │   ├── delta.ts  # Delta decomposition (3 endpoints)
│   │   │   │   ├── resources.ts
│   │   │   │   ├── hierarchy.ts
│   │   │   │   ├── types.ts
│   │   │   │   └── index.ts
│   │   │   ├── clickhouse.ts  # CH HTTP client
│   │   │   └── auth.ts  # Cloudflare Access JWT validation
│   │   ├── test/
│   │   ├── wrangler.toml
│   │   └── package.json
│   │
│   └── ui/  # React SPA
│       ├── src/
│       │   ├── App.tsx  # Root: useState-based routing (5 pages)
│       │   ├── main.tsx
│       │   ├── routes/
│       │   │   ├── index.tsx  # Dashboard (KPI cards + top customers)
│       │   │   ├── delta/index.tsx  # Cost delta decomposition + drill-down
│       │   │   ├── costs/index.tsx  # Costs page
│       │   │   ├── resources/index.tsx  # Resources page
│       │   │   └── hierarchy/index.tsx  # Hierarchy explorer + cost trend
│       │   ├── api/
│       │   │   ├── client.ts  # Base API client
│       │   │   ├── useCosts.ts  # Cost hooks (by-customer, by-account-role)
│       │   │   ├── useDelta.ts  # Delta hooks (nodes, resources, events)
│       │   │   ├── useHierarchy.ts  # Hierarchy tree + cost trend hooks
│       │   │   └── useResources.ts  # Resource list hook
│       │   ├── components/
│       │   │   ├── Layout.tsx  # Responsive sidebar, theme toggle, header
│       │   │   ├── CostTrendChart.tsx  # Recharts cost visualisation
│       │   │   ├── ResourceTable.tsx  # Resource inventory table
│       │   │   └── ui/  # shadcn/ui primitives (card, button, badge, table, select, skeleton)
│       │   └── lib/
│       │       ├── formatters.ts
│       │       ├── colors.ts
│       │       └── utils.ts  # cn() utility (clsx + tailwind-merge)
│       ├── e2e/  # Playwright tests (26 tests, 14 screenshots)
│       ├── playwright.config.ts
│       ├── vite.config.ts
│       └── package.json
│
├── tests/
│   ├── integration/  # Tests against real ClickHouse
│   ├── performance/  # Benchmarks with latency targets
│   └── e2e/  # Real AWS resources → full pipeline
│
├── infra/
│   ├── polo/  # Polo v2 infrastructure (CDK)
│   │   ├── app.py  # CDK app: ClickHouse + collectors
│   │   ├── destination.py  # CDK app: PoloReadRole in target accounts
│   │   └── stacks/
│   │       ├── clickhouse_stack.py  # VPC + EC2 ClickHouse + Secrets Manager
│   │       ├── collectors_stack.py  # Collector Lambdas + EventBridge schedules
│   │       └── destination_stack.py  # Cross-account PoloReadRole IAM
│   │
│   └── legacy/  # Polo v1 (DynamoDB, Cognito, legacy sync)
│       ├── app.py  # CDK app entry point
│       ├── config/  # Environment config
│       ├── lambda/  # Auth Lambdas (OAuth, Cognito hooks)
│       └── stacks/
│           ├── polo_stack.py  # Main orchestrator
│           ├── api_stack.py  # Cloudflare Worker + CloudFront
│           ├── service_stack.py  # Lambda + EventBridge orchestration
│           ├── database_stack.py  # DynamoDB tables
│           ├── cognito_stack.py  # Cognito user pools
│           ├── eventbridge_stack.py  # Event scheduling
│           ├── sync_stack.py  # Data sync
│           ├── ci_stack.py  # CI/CD
│           ├── route53_stack.py  # DNS
│           └── destination_stack.py  # Event destinations
│
├── scripts/  # Utility scripts
│   ├── seed_test_data.py
│   ├── prices.py
│   └── ...
│
├── docker-compose.yml  # ClickHouse for local dev
├── pants.toml  # Pants build system config
├── pyproject.toml
└── CLAUDE.md
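The layout lists migrations named 001_*.sql through 010_*.sql and a migrate.py that applies them in order. A minimal sketch of that ordering, assuming the numeric filename prefix determines the sequence. ordered_migrations is a hypothetical helper for illustration and is not taken from migrate.py:

```python
from pathlib import Path


def ordered_migrations(migrations_dir: Path) -> list[Path]:
    """Return the *.sql files in a directory, sorted by numeric prefix.

    Hypothetical helper: assumes every migration file is named
    "<digits>_<description>.sql", as the 001_*.sql pattern suggests.
    """
    def prefix(path: Path) -> int:
        # "003_add_tags.sql" -> 3
        return int(path.name.split("_", 1)[0])

    return sorted(migrations_dir.glob("*.sql"), key=prefix)
```

Sorting on the parsed integer rather than the raw filename keeps the order correct even if the zero-padding width ever changes.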
Key Dependencies
Python
- clickhouse-connect: ClickHouse HTTP client
- boto3: AWS SDK
- pydantic: Data validation for events
- pytest, pytest-benchmark
TypeScript (API Worker)
- jose: JWT validation
- wrangler: Cloudflare Workers CLI
- @cloudflare/workers-types
TypeScript (UI)
- react 19, react-dom 19
- @tanstack/react-query: Data fetching + caching
- @tanstack/react-table: Resource table
- @tanstack/react-router: Installed but not yet wired up (App.tsx uses useState)
- recharts: Charts
- tailwindcss: Styling (CSS variable theming)
- lucide-react: Icons
- clsx + tailwind-merge: Conditional class utility
- @playwright/test: E2E testing
Conventions
- Python 3.11+, type hints everywhere
- All collectors return list[ResourceEvent] (Pydantic model in components/collectors/common/models.py)
- ClickHouse inserts are always batched (DEFAULT_BATCH_SIZE = 1000 in clickhouse_client.py)
- Every module has a corresponding test file
- AWS calls are wrapped with retry + exponential backoff
- All test resources in AWS get tag polo:test=true
- Use AWS CLI profile "polo"
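The batching and retry conventions can be sketched roughly as follows. This is an illustration only: batched, with_backoff, and insert_all are hypothetical names, not the functions in clickhouse_client.py, and catching every Exception is a simplification (real code would likely retry only on throttling or transient errors). Only DEFAULT_BATCH_SIZE = 1000 comes from the conventions.

```python
import random
import time
from collections.abc import Callable, Iterator, Sequence
from typing import TypeVar

T = TypeVar("T")

DEFAULT_BATCH_SIZE = 1000  # value stated in the conventions


def batched(rows: Sequence[T], size: int = DEFAULT_BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on any exception with exponential backoff plus jitter."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # Delay doubles each attempt (base, 2*base, 4*base, ...) plus jitter.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    raise AssertionError("unreachable")


def insert_all(rows: Sequence[T], insert: Callable[[Sequence[T]], None]) -> None:
    """Send rows to `insert` one batch at a time, retrying each batch."""
    for batch in batched(rows):
        with_backoff(lambda: insert(batch))
```

Passing sleep as a parameter lets tests substitute a no-op so that retries do not slow the suite down.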
Build System
Pants 2.29.0 manages Python code. Source roots:
- components/schema, components/collectors
- infra/legacy
- scripts
- tests/integration, tests/performance, tests/e2e
Pants-ignored (legacy, not Pants-managed):
- components/polo-legacy/: the pre-restructure Flask app
- tasks.py, requirements.dev.txt: CDK deployment shims (see deployment.md)
Linting/formatting: ruff via Pants (pants lint ::, pants fmt ::)
ruff.toml defines source roots and known-first-party modules to keep standalone ruff and Pants ruff in sync. When adding a new collector, add its module name to known-first-party.
CLI
All development commands go through the polo CLI (components/cli/commands.py). See components/cli.md for design and full reference.