CUR v2 Setup
Status: IMPLEMENTED. The cost_cur collector parses CUR v2 Parquet files from S3 and emits hourly per-resource cost events. CUR v2 must still be enabled manually in the AWS Billing console (see below). Until CUR data accumulates, the cost_explorer collector continues to provide daily cost data as a backup.
CUR (Cost and Usage Report) v2 is Polo's planned primary cost data source, providing hourly per-resource cost data. It must be enabled manually in the AWS Billing console — there is no API to create it. It only generates data from the day it's enabled; there is no backfill of historical data.
Until CUR v2 is enabled and has accumulated enough history, the cost_explorer collector serves as the primary cost source (daily granularity, $0.01 per query).
Setup Steps
- Go to AWS Billing Console → Data Exports → Create export
- Export type: CUR 2.0
- Export name: polo-cur-v2
- Time granularity: Hourly
- Include resource IDs: Yes
- Data format: Parquet
- Compression: Parquet native (Snappy)
- S3 bucket: create marqo-polo-cur-v2 (or similar) in the management account
- S3 prefix: cur/
- Enable S3 event notification on the bucket → SQS queue → triggers the cost_cur Lambda
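The last step (S3 event notification → SQS → Lambda) can be sketched as follows. This is a minimal illustration, not the actual cost_cur Lambda: the names `extract_parquet_objects` and `handler` are hypothetical, and the sketch assumes the S3 notification is delivered to SQS directly (not wrapped in an SNS envelope).

```python
import json
from urllib.parse import unquote_plus


def extract_parquet_objects(sqs_event):
    """Return (bucket, key) pairs for Parquet objects named in SQS-wrapped S3 events."""
    found = []
    for message in sqs_event.get("Records", []):
        # Each SQS message body is the S3 notification serialized as JSON.
        body = json.loads(message["body"])
        # S3 test events ("s3:TestEvent") carry no "Records" list and are skipped.
        for record in body.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded, e.g. "=" becomes "%3D".
            key = unquote_plus(record["s3"]["object"]["key"])
            if key.endswith(".parquet"):
                found.append((bucket, key))
    return found


def handler(event, context):
    # Hypothetical entry point; the real cost_cur Lambda may be organised differently.
    objects = extract_parquet_objects(event)
    return {"parquet_objects": [f"s3://{bucket}/{key}" for bucket, key in objects]}
```

Decoding the key matters here because the BILLING_PERIOD=... path segment contains an "=" character, which is percent-encoded in the notification payload.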
CUR v2 Column Mapping
CUR v2 uses a different naming convention than legacy CUR. Key mappings for the cost_cur collector:
| Polo field | CUR v2 column |
|---|---|
| resource_arn | line_item_resource_id |
| aws_account_id | line_item_usage_account_id |
| aws_region | product_region |
| value (cost USD) | line_item_unblended_cost |
| resource_type | Derived from line_item_resource_id via ARN parsing |
| Properties: usage_type | line_item_usage_type |
| Properties: operation | line_item_operation |
| Properties: pricing_term | pricing_term |
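The mapping above could be applied to a single CUR row roughly as follows. This is a sketch under assumptions: the row is a plain dict keyed by CUR v2 column name, the output is a dict instead of the real ResourceEvent type (whose definition is not shown here), and the `service:type` format returned by the hypothetical `resource_type_from_arn` helper is illustrative only.

```python
import re
from decimal import Decimal


def resource_type_from_arn(resource_id):
    """Derive a coarse resource type from an ARN; 'unknown' if the ID is not an ARN."""
    # General ARN layout: arn:partition:service:region:account-id:resource
    parts = resource_id.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        # line_item_resource_id may not always be a full ARN.
        return "unknown"
    service, resource = parts[2], parts[5]
    # The resource part is "type/id", "type:id", or a bare "id".
    pieces = re.split(r"[/:]", resource, maxsplit=1)
    if len(pieces) == 2 and pieces[0]:
        return f"{service}:{pieces[0]}"
    return service


def map_cur_row(row):
    """Map one CUR v2 row (dict keyed by column name) to Polo field names."""
    resource_id = row["line_item_resource_id"]
    return {
        "resource_arn": resource_id,
        "aws_account_id": row["line_item_usage_account_id"],
        "aws_region": row["product_region"],
        # Go through str() so a float cost does not pick up binary rounding noise.
        "value": Decimal(str(row["line_item_unblended_cost"])),
        "resource_type": resource_type_from_arn(resource_id),
        "properties": {
            "usage_type": row["line_item_usage_type"],
            "operation": row["line_item_operation"],
            "pricing_term": row["pricing_term"],
        },
    }
```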
File Structure
CUR v2 delivers Parquet files to a path like:
s3://marqo-polo-cur-v2/cur/polo-cur-v2/data/BILLING_PERIOD=2025-03/polo-cur-v2-00001.snappy.parquet
The cost_cur collector reads these files using pyarrow, maps to ResourceEvent, and inserts into ClickHouse.
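As an illustration of the path layout, the billing period can be recovered from the object URI before the file is opened. The helper name `parse_cur_uri` is hypothetical, and the sketch assumes the partition segment always has the form BILLING_PERIOD=YYYY-MM shown above.

```python
import re
from urllib.parse import urlparse

# Matches the partition segment, e.g. "BILLING_PERIOD=2025-03/".
_BILLING_PERIOD = re.compile(r"BILLING_PERIOD=(\d{4})-(\d{2})/")


def parse_cur_uri(uri):
    """Split an s3:// URI into bucket, key, and (year, month) billing period."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3":
        raise ValueError(f"not an S3 URI: {uri}")
    key = parsed.path.lstrip("/")
    match = _BILLING_PERIOD.search(key)
    if match is None:
        raise ValueError(f"no BILLING_PERIOD partition in key: {key}")
    year, month = int(match.group(1)), int(match.group(2))
    return {"bucket": parsed.netloc, "key": key, "billing_period": (year, month)}
```

Keeping the billing period alongside each file makes it possible to tell which month a Parquet file belongs to without reading its contents.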
Verification
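import boto3
import pytest


@pytest.mark.aws
def test_cur_v2_bucket_has_recent_data():
    # Requires AWS credentials that can list the CUR bucket.
    s3 = boto3.client('s3')
    response = s3.list_objects_v2(Bucket='marqo-polo-cur-v2', Prefix='cur/')
    # KeyCount is the number of keys returned for this prefix.
    assert response.get('KeyCount', 0) > 0, (
        "No CUR files found. Ensure CUR v2 export 'polo-cur-v2' is enabled."
    )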
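The test above only confirms that some object exists under the prefix. If recency also matters, a check along these lines could be layered on top. It is a sketch: `newest_object_age` is a hypothetical helper, it inspects a single list_objects_v2 response (at most 1,000 keys, so a paginator would be needed for larger prefixes), and any freshness threshold is an assumption to tune.

```python
from datetime import datetime, timedelta, timezone


def newest_object_age(response, now=None):
    """Return the age of the newest object in a list_objects_v2 response, or None."""
    now = now or datetime.now(timezone.utc)
    contents = response.get("Contents", [])
    if not contents:
        return None
    # boto3 returns LastModified as a timezone-aware datetime.
    newest = max(obj["LastModified"] for obj in contents)
    return now - newest


# Usage with a hand-built response (no AWS call is made here).
fake_now = datetime(2025, 3, 10, 12, 0, tzinfo=timezone.utc)
fake_response = {
    "KeyCount": 2,
    "Contents": [
        {"Key": "cur/a.parquet", "LastModified": datetime(2025, 3, 8, 12, 0, tzinfo=timezone.utc)},
        {"Key": "cur/b.parquet", "LastModified": datetime(2025, 3, 10, 6, 0, tzinfo=timezone.utc)},
    ],
}
age = newest_object_age(fake_response, now=fake_now)
is_fresh = age is not None and age <= timedelta(days=2)
```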
Cost
CUR itself is free. S3 storage for the Parquet files is minimal (a few GB/month). The Cost Explorer API fallback costs $0.01 per query.
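For a rough sense of the Cost Explorer fallback cost, the per-query price can be multiplied out. The query volume below is purely hypothetical and is not taken from Polo's actual usage; only the $0.01 per query price comes from this document.

```python
from decimal import Decimal

COST_PER_QUERY_USD = Decimal("0.01")  # Cost Explorer API price stated above


def monthly_cost_explorer_cost(queries_per_day, days=30):
    """Estimate monthly Cost Explorer API spend in USD for a given query volume."""
    return COST_PER_QUERY_USD * queries_per_day * days


# Hypothetical volume: 24 queries per day over a 30-day month.
estimate = monthly_cost_explorer_cost(24)
```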