
Project Structure & Conventions

Directory Layout

polo/
├── docs/ # Documentation (this directory)

├── components/
│ ├── cli/ # Polo development CLI (click)
│ │ ├── commands.py # All commands
│ │ └── prod_sync.py # Production data sync logic
│ │
│ ├── schema/ # ClickHouse DDL + migrations
│ │ ├── migrations/ # 001_*.sql through 010_*.sql
│ │ ├── migrate.py # Applies migrations in order
│ │ └── test_schema.py
│ │
│ ├── collectors/ # AWS data collection (Python, Lambda-deployable)
│ │ ├── common/ # Shared library
│ │ │ ├── models.py # ResourceEvent Pydantic model
│ │ │ ├── normaliser.py # Raw AWS data → ResourceEvent
│ │ │ ├── tag_resolver.py # Tags → marqo_* fields
│ │ │ ├── hierarchy_resolver.py # Walks physical + logical hierarchies
│ │ │ ├── clickhouse_client.py # CH HTTP client with batching + retry
│ │ │ ├── arn.py # ARN parsing
│ │ │ └── config.py # Env var config
│ │ │
│ │ ├── config_ec2/ # EC2 instance collector
│ │ ├── config_ebs/ # EBS volume collector
│ │ ├── config_network/ # NATs, EIPs, ELBs, ENIs
│ │ ├── cost_explorer/ # Cost Explorer API
│ │ ├── lifecycle/ # CloudTrail events via EventBridge
│ │ ├── tags/ # Resource Groups Tagging API
│ │ ├── relationships/ # Discovers resource_relationships
│ │ ├── snapshot_builder/ # Maintains resource_snapshots
│ │ └── hierarchy_builder/ # Builds resource_ancestry closure table
│ │
│ ├── api/ # Cloudflare Worker (TypeScript)
│ │ ├── src/
│ │ │ ├── index.ts # Worker entrypoint
│ │ │ ├── router.ts # URL-pattern-based routing
│ │ │ ├── queries/ # Named parameterised SQL queries
│ │ │ │ ├── cost.ts
│ │ │ │ ├── delta.ts # Delta decomposition (3 endpoints)
│ │ │ │ ├── resources.ts
│ │ │ │ ├── hierarchy.ts
│ │ │ │ ├── types.ts
│ │ │ │ └── index.ts
│ │ │ ├── clickhouse.ts # CH HTTP client
│ │ │ └── auth.ts # Cloudflare Access JWT validation
│ │ ├── test/
│ │ ├── wrangler.toml
│ │ └── package.json
│ │
│ └── ui/ # React SPA
│   ├── src/
│   │ ├── App.tsx # Root component, useState-based routing (5 pages)
│   │ ├── main.tsx
│   │ ├── routes/
│   │ │ ├── index.tsx # Dashboard (KPI cards + top customers)
│   │ │ ├── delta/index.tsx # Cost delta decomposition + drill-down
│   │ │ ├── costs/index.tsx # Costs page
│   │ │ ├── resources/index.tsx # Resources page
│   │ │ └── hierarchy/index.tsx # Hierarchy explorer + cost trend
│   │ ├── api/
│   │ │ ├── client.ts # Base API client
│   │ │ ├── useCosts.ts # Cost hooks (by-customer, by-account-role)
│   │ │ ├── useDelta.ts # Delta hooks (nodes, resources, events)
│   │ │ ├── useHierarchy.ts # Hierarchy tree + cost trend hooks
│   │ │ └── useResources.ts # Resource list hook
│   │ ├── components/
│   │ │ ├── Layout.tsx # Responsive sidebar, theme toggle, header
│   │ │ ├── CostTrendChart.tsx # Recharts cost visualisation
│   │ │ ├── ResourceTable.tsx # Resource inventory table
│   │ │ └── ui/ # shadcn/ui primitives (card, button, badge, table, select, skeleton)
│   │ └── lib/
│   │   ├── formatters.ts
│   │   ├── colors.ts
│   │   └── utils.ts # cn() utility (clsx + tailwind-merge)
│   ├── e2e/ # Playwright tests (26 tests, 14 screenshots)
│   ├── playwright.config.ts
│   ├── vite.config.ts
│   └── package.json

├── tests/
│ ├── integration/ # Tests against real ClickHouse
│ ├── performance/ # Benchmarks with latency targets
│ └── e2e/ # Real AWS resources → full pipeline

├── infra/
│ ├── polo/ # Polo v2 infrastructure (CDK)
│ │ ├── app.py # CDK app: ClickHouse + collectors
│ │ ├── destination.py # CDK app: PoloReadRole in target accounts
│ │ └── stacks/
│ │   ├── clickhouse_stack.py # VPC + EC2 ClickHouse + Secrets Manager
│ │   ├── collectors_stack.py # Collector Lambdas + EventBridge schedules
│ │   └── destination_stack.py # Cross-account PoloReadRole IAM
│ │
│ └── legacy/ # Polo v1 (DynamoDB, Cognito, legacy sync)
│   ├── app.py # CDK app entry point
│   ├── config/ # Environment config
│   ├── lambda/ # Auth Lambdas (OAuth, Cognito hooks)
│   └── stacks/
│     ├── polo_stack.py # Main orchestrator
│     ├── api_stack.py # Cloudflare Worker + CloudFront
│     ├── service_stack.py # Lambda + EventBridge orchestration
│     ├── database_stack.py # DynamoDB tables
│     ├── cognito_stack.py # Cognito user pools
│     ├── eventbridge_stack.py # Event scheduling
│     ├── sync_stack.py # Data sync
│     ├── ci_stack.py # CI/CD
│     ├── route53_stack.py # DNS
│     └── destination_stack.py # Event destinations

├── scripts/ # Utility scripts
│ ├── seed_test_data.py
│ ├── prices.py
│ └── ...

├── docker-compose.yml # ClickHouse for local dev
├── pants.toml # Pants build system config
├── pyproject.toml
└── CLAUDE.md
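
The layout notes that components/schema/migrate.py applies the numbered files in components/schema/migrations/ in order. As a rough illustration of what "in order" can mean for names like 001_*.sql through 010_*.sql, the sketch below sorts file names by their numeric prefix. It is not the project's migrate.py: the function names, the regular expression, and the duplicate-number check are assumptions made for this example.

```python
import re
from pathlib import Path

# Assumed naming scheme: a numeric prefix, an underscore, then a description,
# e.g. "001_create_events.sql" (the description part is hypothetical).
MIGRATION_NAME = re.compile(r"^(\d+)_.+\.sql$")


def ordered_migrations(names: list[str]) -> list[str]:
    """Return migration file names sorted by their numeric prefix.

    Names that do not match the pattern are skipped. Two files sharing a
    number raise ValueError, because their relative order would be ambiguous.
    """
    by_number: dict[int, str] = {}
    for name in names:
        match = MIGRATION_NAME.match(name)
        if match is None:
            continue
        number = int(match.group(1))
        if number in by_number:
            raise ValueError(
                f"duplicate migration number {number}: "
                f"{by_number[number]!r} and {name!r}"
            )
        by_number[number] = name
    return [by_number[number] for number in sorted(by_number)]


def discover(migrations_dir: Path) -> list[str]:
    """List the migration files found in a directory, in application order."""
    names = [path.name for path in migrations_dir.iterdir() if path.is_file()]
    return ordered_migrations(names)
```

Sorting on the parsed integer rather than on the raw string keeps the order stable even if a later file drops the zero padding.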

Key Dependencies

Python

  • clickhouse-connect — ClickHouse HTTP client
  • boto3 — AWS SDK
  • pydantic — Data validation for events
  • pytest, pytest-benchmark

TypeScript (API Worker)

  • jose — JWT validation
  • wrangler — Cloudflare Workers CLI
  • @cloudflare/workers-types

TypeScript (UI)

  • react 19, react-dom 19
  • @tanstack/react-query — Data fetching + caching
  • @tanstack/react-table — Resource table
  • @tanstack/react-router — Installed but not yet wired up (App.tsx uses useState)
  • recharts — Charts
  • tailwindcss — Styling (CSS variable theming)
  • lucide-react — Icons
  • clsx + tailwind-merge — Conditional class utility
  • @playwright/test — E2E testing

Conventions

  • Python 3.11+, type hints everywhere
  • All collectors return list[ResourceEvent] (Pydantic model in components/collectors/common/models.py)
  • ClickHouse inserts are always batched (DEFAULT_BATCH_SIZE = 1000 in clickhouse_client.py)
  • Every module has a corresponding test file
  • AWS calls are wrapped with retry + exponential backoff
  • All test resources in AWS get tag polo:test=true
  • Use AWS CLI profile "polo"
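
The batching and retry conventions above can be sketched with the standard library alone. This is a minimal illustration, not the code in clickhouse_client.py or the collectors: iter_batches, call_with_backoff, and all of their parameters are hypothetical names chosen for the example, and only DEFAULT_BATCH_SIZE = 1000 comes from the conventions. The retry helper uses exponential backoff with full jitter, which is one common variant; the project may use a different scheme.

```python
import random
import time
from collections.abc import Callable, Iterable, Iterator
from itertools import islice
from typing import TypeVar

T = TypeVar("T")

DEFAULT_BATCH_SIZE = 1000  # value quoted in the conventions above


def iter_batches(items: Iterable[T], size: int = DEFAULT_BATCH_SIZE) -> Iterator[list[T]]:
    """Yield consecutive lists of at most `size` items (hypothetical helper)."""
    if size < 1:
        # Raised on first iteration, because this function is a generator.
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def call_with_backoff(
    fn: Callable[[], T],
    *,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    retry_on: tuple[type[Exception], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `fn`, retrying on `retry_on` with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Upper bound doubles each attempt and is capped at max_delay.
            upper = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random amount between 0 and the bound.
            sleep(random.uniform(0, upper))
    raise AssertionError("unreachable")  # keeps type checkers satisfied
```

With these helpers, list(iter_batches(range(5), 2)) produces [[0, 1], [2, 3], [4]]. In real use, retry_on would be narrowed to the transient errors worth retrying rather than left at Exception, and the sleep parameter exists so tests can record delays instead of waiting.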

Build System

Pants 2.29.0 manages Python code. Source roots:

  • components/schema, components/collectors
  • infra/legacy
  • scripts
  • tests/integration, tests/performance, tests/e2e

Ignored by Pants (legacy code that Pants does not manage):

  • components/polo-legacy/ — the pre-restructure Flask app
  • tasks.py, requirements.dev.txt — CDK deployment shims (see deployment.md)

Linting/formatting: ruff via Pants (pants lint ::, pants fmt ::)

ruff.toml defines source roots and known-first-party modules to keep standalone ruff and Pants ruff in sync. When adding a new collector, add its module name to known-first-party.

CLI

All development commands go through the polo CLI (components/cli/commands.py). See components/cli.md for design and full reference.