
Budget Rules & Zero-Based Budgeting

Status: PLANNED — This feature is not yet implemented. The budget_rules and rule_violations tables, the rule_evaluator collector, and the /api/v1/rules/* endpoints do not exist yet. This document is the design specification for future development.

Every dollar of spend needs a declared justification. The rules engine periodically compares actual spend and resource state against that declared intent and records any gap as a violation.

Rule Types

| Type | What it checks | Example |
|---|---|---|
| budget | Cost metric ≤ threshold | "Testing accounts total < $500/month" |
| existence | Resource count/state within bounds | "No running instances in staging outside business hours" |
| config | Resource property matches expectations | "No io1/io2 volumes in staging" |
| lifecycle | Resource age within bounds | "No SageMaker notebook running > 8 hours" |
| ratio | Cost proportion within bounds | "NAT cost < 15% of cluster total" |
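To make the five types concrete, here is a minimal Python sketch of the comparison each one performs once the evaluator has fetched the actual value. The function names and `CheckResult` are hypothetical. The boundary handling is an assumption rather than part of the spec: ≤ for budget, as in the table, and a strict < for ratio, as in the NAT example.

```python
# Hypothetical per-type checks; names and boundary semantics are assumptions, not the spec.
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class CheckResult:
    passed: bool
    actual_value: str     # would populate rule_violations.actual_value
    threshold_value: str  # would populate rule_violations.threshold_value


def check_budget(cost: float, max_cost: float) -> CheckResult:
    # budget: cost metric must not exceed the threshold (table uses <=)
    return CheckResult(cost <= max_cost, f"{cost:.2f}", f"{max_cost:.2f}")


def check_existence(count: int, max_count: int) -> CheckResult:
    # existence: count of matching resources within bounds ("No ..." means max 0)
    return CheckResult(count <= max_count, str(count), str(max_count))


def check_config(value: str, forbidden: set[str]) -> CheckResult:
    # config: a resource property (e.g. EBS volume type) must not take a forbidden value
    return CheckResult(value not in forbidden, value, ",".join(sorted(forbidden)))


def check_lifecycle(age_hours: float, max_hours: float) -> CheckResult:
    # lifecycle: resource age (e.g. notebook InService time) must not exceed the limit
    return CheckResult(age_hours <= max_hours, f"{age_hours:.1f}h", f"{max_hours:.1f}h")


def check_ratio(part: float, total: float, max_ratio: float) -> CheckResult:
    # ratio: part / total must stay below the bound; a zero total is treated as compliant
    ratio = part / total if total > 0 else 0.0
    return CheckResult(ratio < max_ratio, f"{ratio:.1%}", f"{max_ratio:.1%}")
```

For example, `check_ratio(200.0, 1000.0, 0.15)` fails with `actual_value` "20.0%" against a threshold of "15.0%".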

Default Rules (seeded at migration)

  1. Staging off-hours — No running instances in testing accounts outside business hours (Melbourne TZ)
  2. SageMaker notebook limit — No notebook InService for > 8 hours
  3. Untagged resource cost — No untagged resource costing > $1/day
  4. Testing account budget — Testing accounts total < $500/month
  5. NAT cost ratio — NAT gateway cost < 15% of cluster total
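Rule 1 is the only default rule that depends on wall-clock time. Below is a minimal sketch of the off-hours test. It assumes business hours mean Monday to Friday, 09:00 to 18:00 in `Australia/Melbourne`. The spec does not define the window, and public holidays are ignored here.

```python
# Sketch of default rule 1 (staging off-hours). The spec only says "business hours
# (Melbourne TZ)"; the Mon-Fri 09:00-18:00 window below is an assumption.
from datetime import datetime, time, timezone
from zoneinfo import ZoneInfo

MELBOURNE = ZoneInfo("Australia/Melbourne")
BUSINESS_START = time(9, 0)   # assumed, inclusive
BUSINESS_END = time(18, 0)    # assumed, exclusive


def is_business_hours(now: datetime) -> bool:
    """True if `now` falls inside the assumed Melbourne business window."""
    if now.tzinfo is None:
        now = now.replace(tzinfo=timezone.utc)  # treat naive timestamps as UTC
    local = now.astimezone(MELBOURNE)  # zoneinfo applies AEST/AEDT automatically
    if local.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        return False
    return BUSINESS_START <= local.time() < BUSINESS_END


def staging_off_hours_violated(running_instances: int, now: datetime) -> bool:
    # Any running instance in a testing account outside business hours violates the rule.
    return running_instances > 0 and not is_business_hours(now)
```

Converting through `zoneinfo` rather than applying a fixed +10 or +11 offset keeps the check correct across daylight-saving transitions.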

Schema

budget_rules

CREATE TABLE polo.budget_rules
(
rule_id String,
rule_name String,
rule_type LowCardinality(String),
scope_hierarchy LowCardinality(String), -- 'marqo_logical', 'aws_account', '*'
scope_node_id String, -- 'customer:acme', 'account:222', '*'
scope_filters Map(String, String), -- {'resource_type': 'ec2:instance', 'marqo_env': 'staging'}
condition Map(String, String), -- type-specific condition parameters
severity LowCardinality(String), -- 'info', 'warning', 'critical'
notification_channel String DEFAULT '', -- 'slack', 'email', ''
enabled UInt8 DEFAULT 1,
created_by String,
created_at DateTime64(3),
updated_at DateTime64(3),
_version UInt64
)
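-- Updates re-insert the row with a higher _version; merges keep the latest row per rule_id, so read with FINAL for current state.
ENGINE = ReplacingMergeTree(_version) ORDER BY (rule_id);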
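The three `scope_*` columns combine to select which resources a rule applies to. Below is a minimal sketch of that matching. It assumes each resource knows every node id it sits under in each hierarchy (its own node plus ancestors), and that `'*'` disables the node constraint. `Resource` and `in_scope` are hypothetical names.

```python
# Hypothetical scope matching for budget_rules.scope_hierarchy / scope_node_id /
# scope_filters. The Resource model and wildcard semantics are assumptions.
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Resource:
    # hierarchy name -> node ids the resource sits under (own node plus ancestors)
    nodes: dict[str, set[str]] = field(default_factory=dict)
    # flat attributes compared against scope_filters, e.g. {'marqo_env': 'staging'}
    attributes: dict[str, str] = field(default_factory=dict)


def in_scope(rule: dict, res: Resource) -> bool:
    node_id = rule["scope_node_id"]
    if node_id != "*":
        hierarchy = rule["scope_hierarchy"]
        # '*' hierarchy: accept the node id from any hierarchy
        if hierarchy == "*":
            candidates = list(res.nodes.values())
        else:
            candidates = [res.nodes.get(hierarchy, set())]
        if not any(node_id in ids for ids in candidates):
            return False
    # Every filter must match exactly; an empty map matches every resource.
    return all(res.attributes.get(k) == v for k, v in rule["scope_filters"].items())
```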

rule_violations

CREATE TABLE polo.rule_violations
(
violation_id UUID DEFAULT generateUUIDv4(),
rule_id String,
rule_name String,
rule_type LowCardinality(String),
severity LowCardinality(String),
detected_at DateTime64(3),
resolved_at Nullable(DateTime64(3)),
resource_arn String DEFAULT '',
node_id String DEFAULT '',
actual_value String,
threshold_value String,
message String,
notified UInt8 DEFAULT 0,
notified_at Nullable(DateTime64(3)),
_version UInt64
)
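-- Resolving re-inserts the row with the same rule_id, detected_at and violation_id and a higher _version;
-- pass violation_id explicitly, because the DEFAULT would generate a new key.
ENGINE = ReplacingMergeTree(_version) ORDER BY (rule_id, detected_at, violation_id);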
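`rule_violations` is a ReplacingMergeTree keyed on `(rule_id, detected_at, violation_id)`, so resolving a violation means inserting a second row with the same key and a higher `_version`. The following in-memory model shows why the resolver must carry the original `violation_id` forward. The names are hypothetical, and `version` stands in for `_version`.

```python
# In-memory model of how rule_violations rows collapse under
# ReplacingMergeTree(_version) when read with FINAL. Names are illustrative.
from __future__ import annotations

import uuid
from dataclasses import dataclass, replace
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Violation:
    rule_id: str
    detected_at: datetime
    violation_id: uuid.UUID
    resolved_at: Optional[datetime]
    version: int  # stands in for the _version column


def open_violation(rule_id: str, now: datetime) -> Violation:
    # Generate violation_id client-side so the resolving row can reuse it.
    return Violation(rule_id, now, uuid.uuid4(), None, 1)


def resolve(v: Violation, now: datetime) -> Violation:
    # Same sorting key (rule_id, detected_at, violation_id), higher version.
    return replace(v, resolved_at=now, version=v.version + 1)


def final(rows: list[Violation]) -> list[Violation]:
    """Keep the highest version per sorting key (the later row wins on a tie)."""
    latest: dict[tuple, Violation] = {}
    for r in rows:
        key = (r.rule_id, r.detected_at, r.violation_id)
        if key not in latest or r.version >= latest[key].version:
            latest[key] = r
    return list(latest.values())
```

If the resolving insert omitted `violation_id`, the column default would mint a fresh UUID. The open row and the resolved row would then never collapse into one.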

Rule Evaluator

The rule_evaluator is a scheduled collector. It reads all enabled rules, runs the query appropriate to each rule's type and scope, compares the results against the rule's thresholds, creates or resolves violations, and sends notifications.

  • Budget/ratio rules: evaluated hourly
  • Existence/config/lifecycle rules: evaluated every 15 minutes
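The two cadences and the create/resolve step can be sketched as follows. This is a hypothetical Python outline that assumes violations are tracked per `(rule_id, resource_arn)` pair. Query execution and notification delivery are left out.

```python
# Sketch of rule_evaluator scheduling and the violation diff; names are hypothetical.
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Set, Tuple

# Cadences from the spec: budget/ratio hourly, existence/config/lifecycle every 15 minutes.
INTERVALS = {
    "budget": timedelta(hours=1),
    "ratio": timedelta(hours=1),
    "existence": timedelta(minutes=15),
    "config": timedelta(minutes=15),
    "lifecycle": timedelta(minutes=15),
}

Key = Tuple[str, str]  # (rule_id, resource_arn)


def is_due(rule_type: str, last_run: datetime | None, now: datetime) -> bool:
    # A rule that has never run is always due.
    return last_run is None or now - last_run >= INTERVALS[rule_type]


def reconcile(
    failing: Set[Key], open_keys: Set[Key], evaluated_rules: Set[str]
) -> Tuple[Set[Key], Set[Key]]:
    """Diff this run's failing checks against currently open violations.

    to_open: failures with no open violation yet (insert a row, then notify).
    to_resolve: open violations whose check now passes (re-insert with resolved_at).
    Only rules evaluated in this run may resolve, so a 15-minute pass never
    closes violations belonging to an hourly rule it skipped.
    """
    to_open = failing - open_keys
    to_resolve = {k for k in open_keys - failing if k[0] in evaluated_rules}
    return to_open, to_resolve
```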

UI

  • Policies page: Rule list with rule builder form (not raw SQL)
  • Violations feed: Filtered by severity/scope/rule type
  • Compliance score: Per hierarchy node (e.g. "customer acme: 94% compliant, 3 open violations")
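The spec does not define how the compliance score is calculated. One possible definition, offered purely as an assumption, is the share of the latest run's in-scope checks that passed, rolled up to every node a resource sits under. Under that definition, 3 failures out of 50 checks gives the 94% in the example.

```python
# One *assumed* definition of the compliance score: passed checks / total checks
# per hierarchy node, counting each failed check as one open violation.
from __future__ import annotations

from collections import defaultdict


def compliance_by_node(results: list[tuple[set[str], bool]]) -> dict[str, tuple[int, int]]:
    """Map node_id -> (score_percent, open_violations) from (nodes, passed) pairs."""
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])  # node -> [checks, failures]
    for nodes, passed in results:
        for node in nodes:  # a check counts toward every node the resource sits under
            totals[node][0] += 1
            if not passed:
                totals[node][1] += 1
    return {n: (round(100 * (c - f) / c), f) for n, (c, f) in totals.items()}


def label(node_name: str, score: int, open_violations: int) -> str:
    noun = "violation" if open_violations == 1 else "violations"
    return f"{node_name}: {score}% compliant, {open_violations} open {noun}"
```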