CUR v2 Setup

Status: IMPLEMENTED — The cost_cur collector parses CUR v2 Parquet files from S3 and emits hourly per-resource cost events. CUR v2 must still be enabled manually in the AWS Billing console (see below). Until CUR data accumulates, the cost_explorer collector continues to provide daily cost data as a backup.

CUR (Cost and Usage Report) v2 is Polo's primary cost data source, providing hourly per-resource cost data. It must be enabled manually in the AWS Billing console; Polo does not create the export itself. It only generates data from the day it is enabled; there is no backfill of historical data.

Until CUR v2 is enabled and has accumulated enough history, the cost_explorer collector serves as the cost source (daily granularity, $0.01 per query).

Setup Steps

  1. Go to AWS Billing Console → Data Exports → Create export
  2. Export type: CUR 2.0
  3. Export name: polo-cur-v2
  4. Time granularity: Hourly
  5. Include resource IDs: Yes
  6. Data format: Parquet
  7. Compression: Parquet native (Snappy)
  8. S3 bucket: create marqo-polo-cur-v2 (or similar) in the management account
  9. S3 prefix: cur/
  10. Enable S3 event notification on the bucket → SQS queue → triggers the cost_cur Lambda
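The notification chain in step 10 can be sketched as follows. This is a minimal illustration, not the actual cost_cur Lambda code: the function name extract_s3_objects is hypothetical, and the event shape assumed here is the standard SQS-triggered Lambda payload whose message bodies carry S3 event notifications.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_objects(sqs_event):
    """Return (bucket, key) pairs for Parquet objects named in an SQS-triggered Lambda event.

    Hypothetical helper; the real cost_cur handler may be structured differently.
    """
    objects = []
    for sqs_record in sqs_event.get("Records", []):
        # Each SQS message body is a JSON-encoded S3 event notification.
        body = json.loads(sqs_record["body"])
        # The S3 test message sent when a notification is configured has no "Records".
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded (for example "=" becomes "%3D").
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            if key.endswith(".parquet"):
                objects.append((bucket, key))
    return objects
```

Decoding the key matters here because CUR v2 paths contain BILLING_PERIOD=..., and the "=" is percent-encoded in the notification.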

CUR v2 Column Mapping

CUR v2 uses a different naming convention than legacy CUR. Key mappings for the cost_cur collector:

Polo field                  CUR v2 column
resource_arn                line_item_resource_id
aws_account_id              line_item_usage_account_id
aws_region                  product_region
value (cost USD)            line_item_unblended_cost
resource_type               Derived from line_item_resource_id via ARN parsing
Properties: usage_type      line_item_usage_type
Properties: operation       line_item_operation
Properties: pricing_term    pricing_term
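As a rough sketch of how this mapping might be applied to a single CUR row, assuming the row is available as a plain dict keyed by the column names above: the function names, the output dict layout, and the "service:type" format for resource_type are all illustrative assumptions, not the collector's actual ResourceEvent schema.

```python
import re


def resource_type_from_arn(resource_id):
    """Best-effort resource type from an ARN; None when the value is not an ARN.

    Assumed output format: "service:type" (e.g. "ec2:instance"), or just the
    service when the ARN has no type segment (e.g. "s3").
    """
    parts = resource_id.split(":", 5)
    if len(parts) < 6 or parts[0] != "arn":
        return None  # line_item_resource_id is not always an ARN
    service, resource = parts[2], parts[5]
    # The resource part is "type/id", "type:id", or a bare id.
    pieces = re.split(r"[:/]", resource, maxsplit=1)
    if len(pieces) == 2:
        return f"{service}:{pieces[0]}"
    return service


def map_cur_row(row):
    """Map one CUR v2 row (dict of column name to value) to Polo-style fields."""
    resource_id = row["line_item_resource_id"]
    return {
        "resource_arn": resource_id,
        "aws_account_id": row["line_item_usage_account_id"],
        "aws_region": row["product_region"],
        "value": float(row["line_item_unblended_cost"]),  # cost in USD
        "resource_type": resource_type_from_arn(resource_id),
        "properties": {
            "usage_type": row["line_item_usage_type"],
            "operation": row["line_item_operation"],
            "pricing_term": row["pricing_term"],
        },
    }
```

Returning None for non-ARN identifiers keeps the fallback decision (skip, or label as unknown) with the caller.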

File Structure

CUR v2 delivers Parquet files to a path like:

s3://marqo-polo-cur-v2/cur/polo-cur-v2/data/BILLING_PERIOD=2025-03/polo-cur-v2-00001.snappy.parquet

The cost_cur collector reads these files using pyarrow, maps to ResourceEvent, and inserts into ClickHouse.
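If the collector needs to know which billing period a file belongs to (for example, to group rows by month), the period can be read from the object key. The following is a minimal sketch assuming keys follow the path layout shown above; billing_period_from_key is a hypothetical helper, not part of the collector.

```python
import re

# Matches ".../BILLING_PERIOD=YYYY-MM/<file>.parquet" at the end of a key.
_KEY_PATTERN = re.compile(r"BILLING_PERIOD=(\d{4})-(\d{2})/[^/]+\.parquet$")


def billing_period_from_key(key):
    """Return (year, month) parsed from a CUR v2 object key, or None if it does not match."""
    match = _KEY_PATTERN.search(key)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))
```

Keys that do not end in a Parquet file directly under a BILLING_PERIOD= folder return None, so non-data objects can be skipped.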

Verification

import boto3
import pytest


@pytest.mark.aws
def test_cur_v2_bucket_has_recent_data():
    # Passes if at least one object exists under the CUR prefix.
    s3 = boto3.client('s3')
    response = s3.list_objects_v2(Bucket='marqo-polo-cur-v2', Prefix='cur/')
    assert response.get('KeyCount', 0) > 0, (
        "No CUR files found. Ensure CUR v2 export 'polo-cur-v2' is enabled."
    )

Cost

CUR itself is free. S3 storage for the Parquet files is minimal (a few GB/month). The Cost Explorer API fallback costs $0.01 per query.
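To put the Cost Explorer figure in perspective, here is a small arithmetic sketch. The $0.01 per query rate comes from the text above; the query volumes and the 30-day month are hypothetical, since the document does not state how often the cost_explorer collector runs.

```python
from decimal import Decimal

COST_PER_QUERY_USD = Decimal("0.01")  # rate stated in this document


def monthly_cost_explorer_cost(queries_per_day, days=30):
    """Estimated monthly Cost Explorer API spend in USD for a given query rate."""
    return COST_PER_QUERY_USD * queries_per_day * days


# Hypothetical volumes: 1 query per day costs 0.30 USD per 30-day month,
# 24 queries per day costs 7.20 USD.
print(monthly_cost_explorer_cost(1))
print(monthly_cost_explorer_cost(24))
```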