Testing Strategy
Working Methodology
Work in phases. Each phase must be fully tested and verified before proceeding. If any test fails:
- Read the error message carefully.
- Identify the root cause (do not guess — inspect logs, query state, print debug output).
- Fix the root cause (not the symptom).
- Re-run ALL tests in the current phase to verify the fix didn't break anything.
- Only then proceed to the next phase.
Never skip tests. Never leave a failing test behind. Never comment out a test to make the suite pass. If a test is wrong, fix the test and document why.
Use the aws CLI to inspect real resources when debugging AWS integration issues.
Test Tiers
| Tier | What | Requires | Speed |
|---|---|---|---|
| Unit | Mocked AWS, pure logic | Nothing | Fast |
| Integration | Real ClickHouse with test data | Docker CH running | Medium |
| Performance | 100K+ rows, latency benchmarks | Docker CH running | Medium |
| E2E | Real AWS resources, full pipeline | AWS creds + CH | Slow |
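
The Integration and Performance tiers both depend on a running ClickHouse, so a precondition check that fails fast can save time reading confusing connection errors. The following is a minimal sketch, not the project's actual fixture: the function name `clickhouse_available` is hypothetical, and it assumes ClickHouse's HTTP interface is exposed on the default port 8123.

```python
import urllib.request


def clickhouse_available(url: str = "http://localhost:8123/ping", timeout: float = 2.0) -> bool:
    """Return True if the ClickHouse HTTP interface answers its ping endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            # ClickHouse replies "Ok." followed by a newline on /ping when healthy.
            return response.status == 200 and response.read().strip() == b"Ok."
    except OSError:
        # Connection refused, timeouts, and HTTP errors are all OSError subclasses.
        return False
```

One way to use it is inside a session-level setup step: assert `clickhouse_available()` with a clear message, so the tier fails immediately rather than being skipped.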
Phase Checkpoints
| Phase | Focus | Status | Checkpoint |
|---|---|---|---|
| 0 | Project bootstrap | Done | ClickHouse running, responds to ping |
| 1 | Schema | Done | All tables exist, correct columns, inserts work, MV triggers, dictionary resolves |
| 2 | Common library | Done | ARN parser, tag resolver, hierarchy resolver, CH client all pass |
| 3 | Collectors | Done | All unit tests (mocked) + integration tests (real CH) green |
| 4 | Integration | Done | Full pipeline round-trips, hierarchy rollups, MV correctness |
| 5 | Performance | Done | Query latency targets met, batch insert performance verified |
| 6 | API | Done | Query generation, routing, auth all pass |
| 7 | UI | Done | Component tests pass, npm run build succeeds |
| 8 | E2E | Done | Real AWS resources detected, hierarchy correct, tags propagate, cleanup verified |
| 9 | Actions | Not started | Safety tests, dry-run, suggestions, savings, full action e2e pipeline |
| 10 | Multi-account | Not started | Account discovery, cross-account collection, CUR v2, collector_runs |
| 11 | Governance | Not started | Budget rules, anomalies, Slack notifications, weekly digest |
| 12 | Multi-platform | Not started | GitHub/CF collectors, pricing, daily snapshots, allocation, system status |
| 13 | Forecasting | Not started | Forecasts, changelog diffs, allocated costs |
| 14 | Deployment | Not started | Worker deploys, SPA loads, all API endpoints respond, auth works |
Test Files
Implemented
Unit tests (components/collectors/)
Each collector has a test file alongside its handler. Common library tests:
- components/collectors/common/test_arn.py
- components/collectors/common/test_tag_resolver.py
- components/collectors/common/test_hierarchy_resolver.py
- components/collectors/common/test_normaliser.py
- components/collectors/common/test_models.py
Integration tests (tests/integration/)
- test_ingestion_pipeline.py - Full pipeline round-trips
- test_hierarchy_rollup.py - Hierarchy building and cost rollup
- test_materialised_views.py - MV correctness
Performance tests (tests/performance/)
- test_query_latency.py - Query performance benchmarks
- test_ingest_throughput.py - Batch insert performance
E2E tests (tests/e2e/)
- test_ec2_lifecycle.py - EC2 instance lifecycle detection
- test_ebs_hierarchy.py - Instance + volume + snapshot hierarchy chain
- test_cleanup_verification.py - No test resources remain after suite
Schema tests (components/schema/)
- test_schema.py - Schema validation
UI e2e tests (components/ui/e2e/)
Playwright tests run against the Vite preview server with mocked API responses (no real backend required).
- smoke.spec.ts - 12 smoke tests: app load, navigation to all 5 pages, delta drill-down + back, hierarchy node click + cost trend, theme cycling
- screenshots.spec.ts - 14 screenshot tests (7 pages/states x 2 themes): dashboard, delta, delta drill-down, costs, resources, hierarchy, hierarchy with trend
- fixtures.ts - Mock API data and shared mockApi() route handler for all endpoints
- screenshots/ - 14 captured PNGs for visual reference
Run with: cd components/ui && npm run test:e2e
Planned (for phases 9-14)
- components/collectors/action_executor/test_safety.py - Protected tags, production restrictions, minimum age, batch limits
- components/collectors/action_executor/test_executor.py - Dry-run makes no API calls, real execution correct
- tests/integration/test_delta_decomposition.py - "Why are costs different?" at all three levels
- tests/e2e/test_actions.py - Orphaned volume -> suggestion -> preview -> execute -> verify
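
The "dry-run makes no API calls" requirement for the planned executor tests could be checked with a recording mock. This is only a sketch under assumptions: `delete_orphaned_volume` and its return shape are hypothetical stand-ins, since the action executor has not been written yet. Only `delete_volume(VolumeId=...)` mirrors a real boto3 EC2 client call.

```python
from unittest.mock import MagicMock


def delete_orphaned_volume(ec2, volume_id: str, dry_run: bool = True) -> dict:
    """Hypothetical executor step: delete a volume, or only report the plan."""
    if dry_run:
        # Dry-run: describe what would happen without touching the client.
        return {"volume_id": volume_id, "executed": False}
    ec2.delete_volume(VolumeId=volume_id)
    return {"volume_id": volume_id, "executed": True}


def test_dry_run_makes_no_api_calls():
    ec2 = MagicMock()
    result = delete_orphaned_volume(ec2, "vol-0123456789abcdef0", dry_run=True)
    assert result["executed"] is False
    # MagicMock records every method call; an empty list means no API calls.
    assert ec2.method_calls == []


def test_real_execution_deletes_the_volume():
    ec2 = MagicMock()
    result = delete_orphaned_volume(ec2, "vol-0123456789abcdef0", dry_run=False)
    assert result["executed"] is True
    ec2.delete_volume.assert_called_once_with(VolumeId="vol-0123456789abcdef0")
```

Passing the client in as a parameter keeps both paths testable without AWS credentials, which matches the Unit tier's "mocked AWS" requirement.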
Performance Targets
| Query | Target |
|---|---|
| Delta decomposition (100K events, 500 resources) | < 500ms |
| Resource drill-down within a node | < 300ms |
| Cost-by-customer | < 500ms |
| Fast-path (denormalised column GROUP BY) | < 200ms |
| Snapshot lookup | < 50ms |
| 10K batch insert | < 2s |
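
The targets above could be enforced with a small timing helper around each query. The sketch below makes several assumptions: the helper names are hypothetical, and taking the median of a few repeats is an assumed choice, since the table does not say which statistic the targets refer to.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(run_query: Callable[[], object], repeats: int = 5) -> float:
    """Run the callable `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def assert_within_target(run_query: Callable[[], object], target_ms: float, repeats: int = 5) -> float:
    """Fail with a readable message if the median latency misses the target."""
    latency = measure_latency_ms(run_query, repeats)
    assert latency < target_ms, f"median {latency:.1f} ms exceeds target {target_ms} ms"
    return latency
```

A snapshot lookup test, for example, might call `assert_within_target(lambda: client.query(sql), target_ms=50)`, where `client.query` stands for whatever ClickHouse client call the project uses.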
E2E Resource Rules
All test resources created in AWS must:
- Be tagged with polo:test=true AND Name=polo-e2e-test-<timestamp>
- Use the cheapest possible options: t3.nano instances (stopped immediately), 1GB gp3 volumes
- Be cleaned up in a finally block even on test failure
- Have a final verification test (test_no_test_resources_remain_after_cleanup)
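
The tagging and cleanup rules above can be combined in one context manager. This is a sketch under stated assumptions: `e2e_tags` and `temporary_volume` are hypothetical names, `ec2` is expected to behave like a boto3 EC2 client, and the `<timestamp>` is assumed to be Unix epoch seconds because the rules do not specify a format.

```python
import time
from contextlib import contextmanager
from typing import Optional


def e2e_tags(timestamp: Optional[int] = None) -> list:
    """Build the two tags every E2E test resource must carry."""
    ts = int(time.time()) if timestamp is None else timestamp  # assumed format
    return [
        {"Key": "polo:test", "Value": "true"},
        {"Key": "Name", "Value": f"polo-e2e-test-{ts}"},
    ]


@contextmanager
def temporary_volume(ec2, availability_zone: str):
    """Create a 1 GB gp3 volume and delete it even if the test body raises."""
    volume_id = None
    try:
        response = ec2.create_volume(
            AvailabilityZone=availability_zone,
            Size=1,
            VolumeType="gp3",
            TagSpecifications=[{"ResourceType": "volume", "Tags": e2e_tags()}],
        )
        volume_id = response["VolumeId"]
        yield volume_id
    finally:
        # Runs on success and on failure; skipped only if creation never succeeded.
        if volume_id is not None:
            ec2.delete_volume(VolumeId=volume_id)
```

A real suite would probably also wait for the volume to become available before deleting it; that step is left out here to keep the pattern visible.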
Manual verification commands
# Verify AWS credentials
aws sts get-caller-identity
# Check for leaked test resources
aws ec2 describe-instances --filters "Name=tag:polo:test,Values=true" \
--query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name}'
aws ec2 describe-volumes --filters "Name=tag:polo:test,Values=true" \
--query 'Volumes[].{ID:VolumeId,State:State}'
aws ec2 describe-snapshots --owner-ids self --filters "Name=tag:polo:test,Values=true" \
--query 'Snapshots[].{ID:SnapshotId}'
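
The same leak check can be expressed in Python, for instance as the body of the final verification test. This is a sketch rather than the project's implementation: `find_leaked_test_resources` is a hypothetical name, `ec2` is assumed to behave like a boto3 EC2 client, pagination is ignored, and filtering out terminated instances is an assumption (they remain visible to describe calls for a while after termination).

```python
# Same tag filter as the CLI commands above.
TEST_TAG_FILTER = [{"Name": "tag:polo:test", "Values": ["true"]}]


def find_leaked_test_resources(ec2) -> dict:
    """Return IDs of instances, volumes, and snapshots still tagged polo:test=true."""
    instances = [
        instance["InstanceId"]
        for reservation in ec2.describe_instances(Filters=TEST_TAG_FILTER)["Reservations"]
        for instance in reservation["Instances"]
        # Assumption: terminated instances are already cleaned up, just not yet purged.
        if instance["State"]["Name"] != "terminated"
    ]
    volumes = [
        volume["VolumeId"]
        for volume in ec2.describe_volumes(Filters=TEST_TAG_FILTER)["Volumes"]
    ]
    snapshots = [
        snapshot["SnapshotId"]
        for snapshot in ec2.describe_snapshots(
            OwnerIds=["self"], Filters=TEST_TAG_FILTER
        )["Snapshots"]
    ]
    return {"instances": instances, "volumes": volumes, "snapshots": snapshots}
```

A verification test could then assert that every list in the returned dictionary is empty and print the offending IDs otherwise.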