Testing Strategy

Working Methodology

Work in phases. Each phase must be fully tested and verified before proceeding. If any test fails:

  1. Read the error message carefully.
  2. Identify the root cause (do not guess — inspect logs, query state, print debug output).
  3. Fix the root cause (not the symptom).
  4. Re-run ALL tests in the current phase to verify the fix didn't break anything.
  5. Only then proceed to the next phase.

Never skip tests. Never leave a failing test behind. Never comment out a test to make the suite pass. If a test is wrong, fix the test and document why.

Use the AWS CLI (aws) to inspect real resources when debugging AWS integration issues.

Test Tiers

Tier | What | Requires | Speed
Unit | Mocked AWS, pure logic | Nothing | Fast
Integration | Real ClickHouse with test data | Docker CH running | Medium
Performance | 100K+ rows, latency benchmarks | Docker CH running | Medium
E2E | Real AWS resources, full pipeline | AWS creds + CH | Slow
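
The tier requirements above can be expressed as a small lookup that decides which tiers are runnable in a given environment. This is a minimal sketch, not the project's actual test runner: the Tier class, the TIERS list, and runnable_tiers() are hypothetical names introduced for illustration, and it assumes "AWS creds + CH" means the E2E tier needs both ClickHouse and AWS credentials.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    """One row of the tier table (hypothetical representation)."""
    name: str
    needs_clickhouse: bool
    needs_aws: bool


# Mirrors the "Requires" column of the tier table.
TIERS = [
    Tier("unit", needs_clickhouse=False, needs_aws=False),
    Tier("integration", needs_clickhouse=True, needs_aws=False),
    Tier("performance", needs_clickhouse=True, needs_aws=False),
    Tier("e2e", needs_clickhouse=True, needs_aws=True),
]


def runnable_tiers(clickhouse_up: bool, aws_creds: bool) -> List[str]:
    """Return the names of tiers whose requirements are all satisfied."""
    return [
        tier.name
        for tier in TIERS
        if (clickhouse_up or not tier.needs_clickhouse)
        and (aws_creds or not tier.needs_aws)
    ]
```

With only Docker ClickHouse running, runnable_tiers(True, False) yields the unit, integration, and performance tiers; the E2E tier is added once AWS credentials are available.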

Phase Checkpoints

Phase | Focus | Status | Checkpoint
0 | Project bootstrap | Done | ClickHouse running, responds to ping
1 | Schema | Done | All tables exist, correct columns, inserts work, MV triggers, dictionary resolves
2 | Common library | Done | ARN parser, tag resolver, hierarchy resolver, CH client all pass
3 | Collectors | Done | All unit tests (mocked) + integration tests (real CH) green
4 | Integration | Done | Full pipeline round-trips, hierarchy rollups, MV correctness
5 | Performance | Done | Query latency targets met, batch insert performance verified
6 | API | Done | Query generation, routing, auth all pass
7 | UI | Done | Component tests pass, npm run build succeeds
8 | E2E | Done | Real AWS resources detected, hierarchy correct, tags propagate, cleanup verified
9 | Actions | Not started | Safety tests, dry-run, suggestions, savings, full action e2e pipeline
10 | Multi-account | Not started | Account discovery, cross-account collection, CUR v2, collector_runs
11 | Governance | Not started | Budget rules, anomalies, Slack notifications, weekly digest
12 | Multi-platform | Not started | GitHub/CF collectors, pricing, daily snapshots, allocation, system status
13 | Forecasting | Not started | Forecasts, changelog diffs, allocated costs
14 | Deployment | Not started | Worker deploys, SPA loads, all API endpoints respond, auth works

Test Files

Implemented

Unit tests (components/collectors/)

Each collector has a test file alongside its handler. Common library tests:

  • components/collectors/common/test_arn.py
  • components/collectors/common/test_tag_resolver.py
  • components/collectors/common/test_hierarchy_resolver.py
  • components/collectors/common/test_normaliser.py
  • components/collectors/common/test_models.py

Integration tests (tests/integration/)

  • test_ingestion_pipeline.py — Full pipeline round-trips
  • test_hierarchy_rollup.py — Hierarchy building and cost rollup
  • test_materialised_views.py — MV correctness

Performance tests (tests/performance/)

  • test_query_latency.py — Query performance benchmarks
  • test_ingest_throughput.py — Batch insert performance

E2E tests (tests/e2e/)

  • test_ec2_lifecycle.py — EC2 instance lifecycle detection
  • test_ebs_hierarchy.py — Instance + volume + snapshot hierarchy chain
  • test_cleanup_verification.py — No test resources remain after suite

Schema tests (components/schema/)

  • test_schema.py — Schema validation

UI e2e tests (components/ui/e2e/)

Playwright tests run against the Vite preview server with mocked API responses (no real backend required).

  • smoke.spec.ts — 12 smoke tests: app load, navigation to all 5 pages, delta drill-down + back, hierarchy node click + cost trend, theme cycling
  • screenshots.spec.ts — 14 screenshot tests (7 pages/states x 2 themes): dashboard, delta, delta drill-down, costs, resources, hierarchy, hierarchy with trend
  • fixtures.ts — Mock API data and shared mockApi() route handler for all endpoints
  • screenshots/ — 14 captured PNGs for visual reference

Run with: cd components/ui && npm run test:e2e

Planned (for phases 9-14)

  • components/collectors/action_executor/test_safety.py — Protected tags, production restrictions, minimum age, batch limits
  • components/collectors/action_executor/test_executor.py — Dry-run makes no API calls, real execution correct
  • tests/integration/test_delta_decomposition.py — "Why are costs different?" at all three levels
  • tests/e2e/test_actions.py — Orphaned volume -> suggestion -> preview -> execute -> verify
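
As an illustration of the "dry-run makes no API calls" property planned for test_executor.py, the sketch below uses unittest.mock to check that a dry run never touches the client. The delete_volume() function here is a hypothetical stand-in, not the real action executor, whose interface is not defined in this document; the only real API assumed is the EC2 client's delete_volume(VolumeId=...) call.

```python
from unittest.mock import Mock


def delete_volume(ec2_client, volume_id, dry_run=True):
    """Hypothetical action: delete an EBS volume unless dry_run is set."""
    if dry_run:
        # Report what would happen without calling AWS.
        return {"volume_id": volume_id, "executed": False}
    ec2_client.delete_volume(VolumeId=volume_id)
    return {"volume_id": volume_id, "executed": True}


def test_dry_run_makes_no_api_calls():
    client = Mock()
    result = delete_volume(client, "vol-123", dry_run=True)
    assert result["executed"] is False
    # A Mock records every call made on it; an empty list means no API use.
    assert client.mock_calls == []


def test_real_execution_calls_api_once():
    client = Mock()
    result = delete_volume(client, "vol-123", dry_run=False)
    assert result["executed"] is True
    client.delete_volume.assert_called_once_with(VolumeId="vol-123")
```

Checking mock_calls against an empty list is stricter than asserting on a single method: it fails if the dry-run path calls any client method at all.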

Performance Targets

Query | Target
Delta decomposition (100K events, 500 resources) | < 500 ms
Resource drill-down within a node | < 300 ms
Cost-by-customer | < 500 ms
Fast-path (denormalised column GROUP BY) | < 200 ms
Snapshot lookup | < 50 ms
10K batch insert | < 2 s
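
One way a benchmark could enforce these targets is to time a callable and compare the elapsed time against a lookup table. This is a rough sketch only: the TARGETS_SECONDS keys, measure(), and assert_within_target() are hypothetical names, and the actual tests in tests/performance/ may be structured differently.

```python
import time

# Targets from the table above, in seconds (key names are illustrative).
TARGETS_SECONDS = {
    "delta_decomposition": 0.500,
    "resource_drill_down": 0.300,
    "cost_by_customer": 0.500,
    "fast_path": 0.200,
    "snapshot_lookup": 0.050,
    "batch_insert_10k": 2.0,
}


def measure(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def assert_within_target(name, elapsed):
    """Fail with a readable message if elapsed exceeds the named target."""
    target = TARGETS_SECONDS[name]
    assert elapsed < target, (
        f"{name}: {elapsed * 1000:.1f} ms exceeds target {target * 1000:.0f} ms"
    )
```

A test would wrap the query call, for example result, elapsed = measure(run_query, sql) followed by assert_within_target("snapshot_lookup", elapsed), where run_query is whatever client function the test already uses. A single timed run is noisy, so repeating the measurement and comparing a median may give steadier results.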

E2E Resource Rules

All test resources created in AWS must:

  • Be tagged with polo:test=true AND Name=polo-e2e-test-<timestamp>
  • Use the cheapest possible options: t3.nano instances (stopped immediately), 1GB gp3 volumes
  • Be cleaned up in a finally block even on test failure
  • Have a final verification test (test_no_test_resources_remain_after_cleanup)
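
The tagging and cleanup rules above could be captured in two small helpers. This is a sketch under stated assumptions: e2e_tags() and tracked_resources() are hypothetical names, and the delete callable is supplied by the test (for example, a wrapper around the relevant EC2 terminate or delete call). The tag list uses the Key/Value shape that EC2 tagging APIs accept.

```python
import time
from contextlib import contextmanager
from typing import Optional


def e2e_tags(timestamp: Optional[int] = None) -> list:
    """Build the two tags every E2E test resource must carry."""
    ts = int(time.time()) if timestamp is None else timestamp
    return [
        {"Key": "polo:test", "Value": "true"},
        {"Key": "Name", "Value": f"polo-e2e-test-{ts}"},
    ]


@contextmanager
def tracked_resources(delete):
    """Yield a list for created resource IDs; delete them all on exit.

    Cleanup runs in a finally block, so it happens even when the test
    body raises. IDs are deleted in reverse creation order.
    """
    created = []
    try:
        yield created
    finally:
        failures = []
        for resource_id in reversed(created):
            try:
                delete(resource_id)
            except Exception as exc:  # keep going so nothing is skipped
                failures.append((resource_id, exc))
        if failures:
            raise RuntimeError(f"cleanup failed for: {failures}")
```

A test appends each ID to the yielded list as soon as the resource is created, so a failure partway through still cleans up everything created so far.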

Manual verification commands

# Verify AWS credentials
aws sts get-caller-identity

# Check for leaked test resources
aws ec2 describe-instances --filters "Name=tag:polo:test,Values=true" \
--query 'Reservations[].Instances[].{ID:InstanceId,State:State.Name}'

aws ec2 describe-volumes --filters "Name=tag:polo:test,Values=true" \
--query 'Volumes[].{ID:VolumeId,State:State}'

aws ec2 describe-snapshots --owner-ids self --filters "Name=tag:polo:test,Values=true" \
--query 'Snapshots[].{ID:SnapshotId}'