A/B Test Runtime Design Doc

Author: Ruchira Jayasekara
Date: 2026-04-07
Updated: 2026-04-08
Status: Implemented


1. Problem Statement

A/B tests are partially implemented: the Controller has full CRUD for ABTest and ABTestRulesSet entities, and the Console has UI for creating/managing them. However, no runtime variant assignment exists — when a user performs a search, the Cloudflare worker has no mechanism to:

  1. Determine which A/B test(s) apply to the user.
  2. Select the appropriate merchandising rules variant.
  3. Apply variant-specific rules to the search request.

Additionally, there is no API for downstream consumers (analytics, reporting) to query which A/B tests were active at a given point in time.

Goals

  • Search-time variant assignment: The Cloudflare worker accepts a userId (or falls back to sessionId), determines the user's A/B test bucket, and applies the corresponding merchandising rules.
  • Live test filtering: An endpoint that accepts an ISO 8601 timestamp and returns tests that were active (started but not completed) at that time.
  • Deterministic bucketing: The same user always lands in the same variant for a given test (no randomness per-request).
  • Low latency: Variant assignment must add zero additional KV lookups — data is embedded in existing trigger KV entries.

2. Architecture

2.1 Controller (Django)

Models (components/controller/merchandise/services/models.py):

  • ABTestVariant: name (str), traffic_percent (float 0-100), rule_set_ids (list of ABTestRulesSet IDs — empty list = control/baseline).
  • ABTest: id, system_account_id, index_name, name, description, status (IN_PROGRESS | COMPLETED), variants (list of ABTestVariant), created_at, created_by, completed_at, deployed_at.
  • ABTestRulesSet: id, system_account_id, index_name, trigger_context, name, description, global_rules, trigger_rules.

DynamoDB (table: MERCHANDISING_TABLE_NAME):

  • ABTest PK: {system_account_id}#{index_name}#AB_TEST, SK: {ab_test_id}
  • ABTestRulesSet PK: {system_account_id}#{index_name}#AB_TEST_RULES_SET, SK: {rules_set_id}
  • Creates use ConditionExpression="attribute_not_exists(pk)" so that an 8-char UUID collision fails the write instead of silently overwriting the existing item.

Endpoints (components/controller/merchandise/urls.py):

  • CRUD on /ab-test/<index_name> and /ab-test-rules/<index_name> (list, create, update, delete).
  • liveAt query parameter on list endpoint for temporal filtering (DDB FilterExpression).

Validation (enforced on both create and update):

  • At least 2 variants required.
  • Variant trafficPercent values must sum to 100 (±0.01 tolerance).
  • Variant names must be unique within a test.
  • update_ab_test view converts incoming camelCase keys to snake_case via _camel_to_snake_dict before dispatching to the service.
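
The three validation rules above can be sketched as a single pure function. This is an illustrative sketch, not the Controller's actual code: the function name validate_variants and the error strings are assumptions, and it operates on camelCase dicts as they arrive in the request body.

```python
from typing import Any

TOLERANCE = 0.01  # allowed deviation of the traffic sum from 100


def validate_variants(variants: list[dict[str, Any]]) -> list[str]:
    """Return a list of validation errors; an empty list means the variants are valid."""
    errors: list[str] = []
    if len(variants) < 2:
        errors.append("at least 2 variants are required")
    total = sum(float(v.get("trafficPercent", 0)) for v in variants)
    if abs(total - 100.0) > TOLERANCE:
        errors.append(f"trafficPercent values sum to {total}, expected 100")
    names = [v.get("name") for v in variants]
    if len(set(names)) != len(names):
        errors.append("variant names must be unique within a test")
    return errors
```

Returning every error at once, rather than raising on the first, lets a caller report all problems in a single 400 response.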

2.2 Cloudflare Worker

Location: ~/Marqo/cloud_data_plane/components/cloudflare-worker/

Search flow (src/index.ts):

  1. Request arrives at POST /api/v1/indexes/:index/search.
  2. Middleware authenticates and loads IndexSettings from INDEX_SETTINGS KV.
  3. Customer-specific interceptor applied if configured.
  4. getMerchandisingInterceptor() initializes merchandising:
    • Checks for merchandisingTriggerContext query param.
    • Loads index config from MERCHANDISING_KV (keyed by systemAccountId-indexName).
    • Returns DefaultMerchandisingInterceptor or NO_OP_INTERCEPTOR.
  5. getAbTestCacheKeySuffix() computes variant assignments early (before cache lookup):
    • Extracts userId (or falls back to sessionId) from request body, fetches trigger and global rules from KV (cached), computes testName(testId)=variantName for each active test.
    • Strips userId/sessionId from cache key body, appends variant assignment suffix.
    • Cache key: SHA256(apiKey + bodyWithoutUserIdSessionId + "|ab:testName(testId)=variantName").
  6. Cache lookup — users in the same variant share a cache entry.
  7. DefaultMerchandisingInterceptor.applyRules():
    • Extracts userId (or falls back to sessionId) from request body for A/B test bucketing; strips both userId and sessionId before forwarding to Marqo.
    • Extracts trigger from query param or request body q.
    • Calls getRulesForTrigger(), which MD5-hashes triggerContext|trigger, looks up the rules in MERCHANDISING_KV, and performs A/B test bucketing if userId or sessionId is present.
    • Applies request modifiers (excludes, pins, filters, score modifiers).
    • Sends modified request to Marqo.
    • Applies response modifiers (pin insertion, pagination restore).
    • Adds X-AB-Test-Assignments response header if assignments exist.
  8. Response cached at CF edge.
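
The variant-aware cache key from step 5 can be sketched as follows, in Python for illustration (the worker itself is TypeScript). Two details are assumptions not stated in this doc: the body is serialized as JSON with sorted keys, and multiple assignments are comma-joined in the suffix. The function name cache_key is hypothetical.

```python
import hashlib
import json


def cache_key(api_key: str, body: dict, assignments: list[tuple[str, str, str]]) -> str:
    """Build a cache key; assignments holds (test_name, test_id, variant_name) tuples."""
    # Drop the per-user fields so they do not fragment the cache.
    shared_body = {k: v for k, v in body.items() if k not in ("userId", "sessionId")}
    # Assumption: canonical JSON with sorted keys; the worker's serialization may differ.
    serialized = json.dumps(shared_body, sort_keys=True, separators=(",", ":"))
    suffix = ""
    if assignments:
        # Assumption: multiple assignments are comma-joined after the "|ab:" marker.
        suffix = "|ab:" + ",".join(
            f"{name}({test_id})={variant}" for name, test_id, variant in assignments
        )
    return hashlib.sha256((api_key + serialized + suffix).encode("utf-8")).hexdigest()
```

Two users with different IDs but the same variant assignment produce the same key, while the same query under a different variant produces a different one.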

KV Namespaces:

  • INDEX_SETTINGS — per-subdomain index config (cacheTtl, logLevel).
  • MERCHANDISING_KV — merchandising rules with embedded A/B test data. Key format: ${keyPrefix}|${md5Hash} for trigger rules, ${keyPrefix}|${triggerContext} for global rules.

Hashing: MD5 via Web Crypto API (native to Workers runtime) — used for both trigger key hashing and A/B test bucketing.
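
As a rough illustration of the key formats above, the sketch below builds both kinds of MERCHANDISING_KV key in Python. It assumes keyPrefix is systemAccountId-indexName (as in the search flow) and that the hash input is the literal string triggerContext|trigger with no normalization; the function names are hypothetical.

```python
import hashlib


def trigger_rules_key(account_id: str, index_name: str, trigger_context: str, trigger: str) -> str:
    """Key for a trigger rules entry: {keyPrefix}|{md5Hash}."""
    # Assumption: no lowercasing or trimming is applied before hashing.
    digest = hashlib.md5(f"{trigger_context}|{trigger}".encode("utf-8")).hexdigest()
    return f"{account_id}-{index_name}|{digest}"


def global_rules_key(account_id: str, index_name: str, trigger_context: str) -> str:
    """Key for a global rules entry: {keyPrefix}|{triggerContext}."""
    return f"{account_id}-{index_name}|{trigger_context}"
```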

2.3 Merchandising Exporter

Pipeline (components/merchandising_exporter/):

DynamoDB → MerchandisingExporter.export_index() → transform → CloudflareKVStoreClient.write_to_kv_store() → Cloudflare KV

Exports trigger rules, global rules, synonyms, index config, and A/B test variant rules embedded in trigger/global KV entries.

2.4 Console (React)

The Console is where users configure A/B tests, add and manage rule sets, view traffic splits, and complete tests. The UI currently supports 2-variant tests (Control + Variant) with a slider for traffic split. The backend supports N-way splits for future UI expansion.


3. Design

3.1 Data Model: Variants

Each ABTest contains a variants list. Each variant defines:

  • name — display label (e.g., "Control", "Holiday Boost"). Must be unique within a test.
  • traffic_percent — percentage of traffic (0-100). All variants must sum to 100.
  • rule_set_ids — list of ABTestRulesSet IDs to apply. Empty list means baseline behavior (control).

A typical 2-variant test:

{
  "variants": [
    { "name": "Control", "trafficPercent": 50, "ruleSetIds": [] },
    { "name": "Holiday Boost", "trafficPercent": 50, "ruleSetIds": ["rs-abc", "rs-def"] }
  ]
}

This generalizes to N-way tests — divide the 0-100 range into N segments for bucketing.

3.2 User Variant Assignment (Bucketing)

Algorithm: Deterministic Hashing

bucketingId = userId || sessionId          // prefer userId; an empty or missing userId falls back to sessionId
hash = md5(bucketingId + "|" + testId)     // pipe separator prevents input collisions
bucket = parseInt(hash[0:8], 16) % 10000   // first 8 hex chars → 32-bit int → mod 10000 (0.01% granularity)
cumulativePercent = 0
for each variant in test.variants:
    cumulativePercent += round(variant.trafficPercent * 100)  // scale to 10000, rounded to avoid FP drift
    if bucket < cumulativePercent:
        return variant
return variants[last]                      // fallback (should not happen if percentages sum to 100)

  • Hash function: MD5 via Web Crypto API (already used in the Cloudflare worker for merchandising rule key generation).
  • Bucketing identifier: userId is preferred; if absent, sessionId is used as a fallback. The resolved identifier (bucketingId) is concatenated with testId — ensures the same identifier gets different buckets for different tests (no correlated assignments).
  • Deterministic: Same input always produces the same bucket. No need to store assignments.
  • N-way support: Walk through variants accumulating trafficPercent until the bucket falls within a variant's range.
  • Empty variants guard: assignVariant returns null if test.variants is empty; caller skips the test.
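
A minimal runnable version of the bucketing algorithm, written in Python for illustration (the worker implements it in TypeScript with Web Crypto). It follows the pseudocode above; the names bucket_for and assign_variant and the dict shapes are assumptions modeled on the KV entry format.

```python
import hashlib
from typing import Optional

BUCKETS = 10_000  # 0.01% granularity


def bucket_for(bucketing_id: str, test_id: str) -> int:
    """Map (bucketing_id, test_id) to a stable bucket in [0, 10000)."""
    digest = hashlib.md5(f"{bucketing_id}|{test_id}".encode("utf-8")).hexdigest()
    return int(digest[:8], 16) % BUCKETS  # first 8 hex chars as a 32-bit int


def assign_variant(bucketing_id: str, test: dict) -> Optional[dict]:
    """Walk the variants, accumulating traffic, until the bucket falls in range."""
    variants = test.get("variants") or []
    if not variants:
        return None  # empty variants guard: the caller skips the test
    bucket = bucket_for(bucketing_id, test["testId"])
    cumulative = 0
    for variant in variants:
        # Scale percent to the 0..10000 range; rounding avoids float drift.
        cumulative += round(variant["trafficPercent"] * 100)
        if bucket < cumulative:
            return variant
    return variants[-1]  # fallback if the percentages do not quite reach 100
```

Because the test ID is part of the hash input, a user's bucket in one test says nothing about their bucket in another. Note that Python's round() uses banker's rounding, which can differ from JavaScript's Math.round only when the scaled value lands exactly on .5.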

3.3 Cloudflare Worker Changes

3.3.1 New Request Parameters

The search API accepts optional parameters in the request body:

interface SearchRequestBody {
  q: string;
  // ... existing fields
  userId?: string;    // Optional user ID for A/B test bucketing (preferred)
  sessionId?: string; // Optional session ID, used as fallback for A/B test bucketing when userId is absent
}

Search proxy (components/search_proxy): forwards both userId and sessionId from the client request to the Cloudflare worker via defaultBodyValues(). The search proxy also uses these fields locally for personalization (via PIXEL_WORKER) and traffic bucketing (via alias routing).

Cloudflare worker: extracts userId (or falls back to sessionId) for A/B test bucketing, then strips both fields from the request body before forwarding to Marqo (Marqo doesn't recognize these fields).
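
The extract-then-strip step can be sketched as below (Python for illustration; the function name resolve_bucketing_id is hypothetical). It uses truthiness rather than a null check, so an empty-string userId falls back to sessionId, matching the edge-case handling in Section 6.

```python
from typing import Optional


def resolve_bucketing_id(body: dict) -> tuple[Optional[str], dict]:
    """Return (bucketing_id, body_to_forward) for a search request body."""
    # Truthiness check: None, a missing key, and "" all count as absent.
    bucketing_id = body.get("userId") or body.get("sessionId") or None
    # Neither field is forwarded to Marqo, regardless of which one was used.
    forwarded = {k: v for k, v in body.items() if k not in ("userId", "sessionId")}
    return bucketing_id, forwarded
```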

3.3.2 A/B Test Config Embedded in KV Entries

Rather than storing A/B test data in a separate KV key, variant rules are embedded directly into each trigger's existing KV entry as an ab_tests field. This means the worker gets A/B test data for free — zero additional KV lookups.

Trigger rules KV entry (existing fields + new ab_tests):

{
  "triggerContext": "search",
  "trigger": "jackets",
  "pin_rules": {"1": "prod_123"},
  "exclude_rules": ["prod_456"],
  "filter_string": "color:(red)",
  "score_modifiers": {"price": 0.8},
  "inherit_global_score_modifiers": true,
  "inherit_global_filters": true,
  "ab_tests": [
    {
      "testId": "abc123",
      "testName": "Holiday Boost",
      "variants": [
        { "name": "Control", "trafficPercent": 50, "rules": null },
        {
          "name": "Variant",
          "trafficPercent": 50,
          "rules": {
            "pin_rules": {"1": "prod_789"},
            "exclude_rules": ["prod_456"],
            "filter_string": "collection:(winter)",
            "score_modifiers": {"popularity": 2.0},
            "inherit_global_score_modifiers": true,
            "inherit_global_filters": true
          }
        }
      ]
    }
  ]
}

Global rules KV entry (same pattern):

{
  "triggerContext": "search",
  "filter_string": "collection:(winter)",
  "score_modifiers": {"brand_popularity": 2.0},
  "ab_tests": [
    {
      "testId": "abc123",
      "testName": "Holiday Boost",
      "variants": [
        { "name": "Control", "trafficPercent": 50, "rules": null },
        {
          "name": "Variant",
          "trafficPercent": 50,
          "rules": {
            "filter_string": "collection:(summer)",
            "score_modifiers": {"trending": 3.0}
          }
        }
      ]
    }
  ]
}

Each trigger entry only contains A/B tests that affect that specific trigger — not all tests for the index. A variant with "rules": null is the control (baseline behavior). All variants appear at every scope (even those without rules for that scope) to ensure traffic percentages sum to 100%.

3.3.3 Search-Time Flow

Key design: A/B test bucketing happens inside getRulesForTrigger() after parsing the KV entry — not as a separate step. Global rules and trigger rules are fetched in parallel.

applyRules(fn) → async (req) =>
│
├── Extract userId/sessionId from request body, resolve bucketingId = userId || sessionId, strip both before forwarding
├── [existing] Extract trigger from query param / request body
│
├── Start global rules fetch (parallel, not awaited yet)
├── getRulesForTrigger(triggerContext, trigger, ..., bucketingId)
│   ├── [existing] MD5 hash trigger, fetch KV entry
│   ├── [existing] Parse JSON, validate trigger match
│   ├── If bucketingId present and ab_tests field exists:
│   │   ├── For each test: bucket = parseInt(md5(bucketingId + "|" + testId)[0:8], 16) % 10000
│   │   ├── Walk variants to find assigned variant
│   │   ├── If variant.rules != null: override base rules with variant rules
│   │   └── Collect ABTestAssignment for response header
│   ├── [existing] Instantiate rule objects from (possibly overridden) rules
│   ├── Await global rules (also with ab_tests bucketing)
│   └── Return { rules, abTestAssignments }
│
├── [existing] Apply request modifiers
├── [existing] Fetch from Marqo
├── [existing] Apply response modifiers
└── Add X-AB-Test-Assignments response header

3.3.4 Rule Replacement (No Merging)

When a user is assigned to a non-control variant, the variant's rules fully replace the base trigger rules ({...variant.rules, triggerContext, trigger}). No fields from the base rules leak through — the variant must provide a complete rule set. Only triggerContext and trigger are preserved from the base (needed for the hash collision check). The effective rules are then instantiated and applied in the standard pipeline:

  1. Variant-specific rules (fully replace base rules if user is in treatment)
  2. Global merchandising rules (inherited according to the variant's inherit_global_filters / inherit_global_score_modifiers flags)
  3. Base search config (from settings)

If a user is in the control variant (rules: null), the base trigger rules apply unchanged (baseline behavior).

For global rules, the same full-replacement semantics apply: a variant's global rules fully replace the base global rules. Only triggerContext is preserved.

When a variant references multiple rule sets in the Controller, the exporter pre-resolves and merges them at export time — the worker sees a single flattened rules object per variant. This ensures the variant always has a complete rule set for full replacement.
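
The full-replacement semantics for trigger rules can be illustrated with a short Python sketch. The function name effective_trigger_rules is hypothetical, and the dict shapes follow the KV entry examples in Section 3.3.2.

```python
from typing import Optional


def effective_trigger_rules(base: dict, variant: Optional[dict]) -> dict:
    """Return the rules to instantiate for this request."""
    if variant is None or variant.get("rules") is None:
        return base  # control or no assignment: base rules apply unchanged
    # Full replacement: only triggerContext and trigger survive from the base.
    return {
        **variant["rules"],
        "triggerContext": base["triggerContext"],
        "trigger": base["trigger"],
    }
```

A base field that the variant omits, such as pin_rules, is therefore absent from the effective rules rather than inherited.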

3.3.5 Response Headers

The Cloudflare worker adds a response header for observability and client-side tracking:

X-AB-Test-Assignments: test-001=Control,test-002=Holiday%20Boost

Both testId and variantName are URL-encoded via encodeURIComponent to prevent commas or equals signs in names from corrupting the header format. The header is added to both 200 and non-200 responses.
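
For consumers that need to produce or decode this header outside JavaScript, the sketch below mirrors encodeURIComponent in Python using urllib.parse.quote with the extra safe characters that encodeURIComponent leaves unescaped. The helper names are hypothetical.

```python
from urllib.parse import quote, unquote

# encodeURIComponent leaves A-Z a-z 0-9 - _ . ! ~ * ' ( ) unescaped;
# quote() already keeps letters, digits and "_.-~", so only these are added.
_EXTRA_SAFE = "!*'()"


def build_assignments_header(assignments: list[tuple[str, str]]) -> str:
    """Encode (test_id, variant_name) pairs as 'id=name,id=name'."""
    return ",".join(
        f"{quote(test_id, safe=_EXTRA_SAFE)}={quote(variant, safe=_EXTRA_SAFE)}"
        for test_id, variant in assignments
    )


def parse_assignments_header(header: str) -> dict[str, str]:
    """Decode the header back into a {test_id: variant_name} mapping."""
    result: dict[str, str] = {}
    for pair in header.split(","):
        if not pair:
            continue
        test_id, _, variant = pair.partition("=")
        result[unquote(test_id)] = unquote(variant)
    return result
```

Splitting on the raw commas and equals signs before decoding is safe because any such characters inside a name arrive percent-encoded.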

3.4 Live Test Filtering API

Endpoint

The existing list endpoint supports filtering to tests that were active at a specific point in time:

GET /api/merchandise/ab-test/<index_name>?accountId=<account_id>&liveAt=<ISO 8601 timestamp>

Query Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| accountId | string | Yes | Account ID |
| liveAt | string | No | ISO 8601 timestamp; when provided, returns only tests that were live at this time |

Filter logic (applied as a DynamoDB FilterExpression server-side with pagination):

  • createdAt <= liveAt — test had started
  • completedAt does not exist OR completedAt > liveAt — test had not yet been completed

When liveAt is omitted, all tests for the index are returned (no filtering).

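Implementation: MerchandisingService.list_ab_tests_for_index(live_at=...) passes the filter to DynamoDB via Attr("createdAt").lte(live_at) & (Attr("completedAt").not_exists() | Attr("completedAt").gt(live_at)). Results are paginated via LastEvaluatedKey. ISO 8601 strings sort lexicographically in chronological order only when they share one format (same UTC designator and the same fractional-second precision), so string comparison in DDB is correct as long as stored timestamps and liveAt use that format.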
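
The filter predicate can be expressed as a plain Python function, which is handy for reasoning about boundary cases without DynamoDB. This is a sketch under the assumption that all timestamps are ISO 8601 strings in one uniform UTC format; is_live_at is a hypothetical name.

```python
from typing import Optional


def is_live_at(test: dict, live_at: str) -> bool:
    """True if the test had started and was not yet completed at live_at."""
    created_at: str = test["createdAt"]
    completed_at: Optional[str] = test.get("completedAt")
    # Plain string comparison, mirroring what the DDB FilterExpression does.
    started = created_at <= live_at
    not_completed = completed_at is None or completed_at > live_at
    return started and not_completed
```

A test is live at the exact instant it was created (the start bound is inclusive) and not live at the exact instant it was completed (the end bound is exclusive).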

Response: Standard list of ABTest objects:

[
  {
    "id": "a1b2c3d4",
    "name": "Holiday boost rules",
    "status": "IN_PROGRESS",
    "variants": [
      { "name": "Control", "trafficPercent": 50, "ruleSetIds": [] },
      { "name": "Treatment", "trafficPercent": 50, "ruleSetIds": ["rs-abc"] }
    ],
    "createdAt": "2026-04-01T00:00:00Z",
    "completedAt": null
  }
]

3.5 Merchandising Exporter Changes

The exporter embeds A/B test variant rules directly into existing trigger and global rules KV entries. This is done in two phases:

Phase 1: Build A/B test index (_build_ab_test_data()):

Two-pass algorithm:

  1. Pass 1 — Resolve: Fetch all IN_PROGRESS AB tests and all rule sets for the index (2 DDB queries). For each variant's rule sets, resolve trigger rules and global rules into KV-ready format (camelCase input from DDB → snake_case output for worker). Track which triggers and contexts each test touches.

  2. Pass 2 — Complete variant lists: For each trigger/context, build the full variant list including ALL test variants — with rules set to the resolved object if the variant has rules for that scope, or None if it doesn't. This ensures traffic percentages always sum to 100% at every scope.

Output: Two lookup structures:

  • by_trigger: {md5_hash → [test entries]}
  • by_context: {trigger_context → [test entries]}

Phase 2: Embed during KV serialization (_trigger_rules_data() / _global_rules_data()):

When serializing each trigger's KV value, look up the trigger's MD5 hash in by_trigger. If A/B tests exist for that trigger, add the ab_tests field to the JSON value. Same for global rules via by_context.

DDB queries added: 2 (one for tests, one for rule sets) — both are single-partition queries. Additional KV writes: 0 — the A/B test data is included in the existing bulk PUT.
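
Pass 2 is the less obvious half of the algorithm, so here is a simplified Python sketch of it. It is not the exporter's actual code: complete_variant_lists is a hypothetical name, and the resolved argument (a mapping from (test_id, variant_name, scope) to KV-ready rules) stands in for whatever Pass 1 produces. A scope is either a trigger's MD5 hash or a trigger context.

```python
def complete_variant_lists(tests: list[dict], resolved: dict) -> dict[str, list[dict]]:
    """Build {scope: [test entries]} where every entry lists ALL of the test's variants."""
    # Collect the scopes each test touches, based on what Pass 1 resolved.
    scopes_per_test: dict[str, set] = {}
    for test_id, _variant_name, scope in resolved:
        scopes_per_test.setdefault(test_id, set()).add(scope)

    by_scope: dict[str, list[dict]] = {}
    for test in tests:
        for scope in sorted(scopes_per_test.get(test["id"], ())):
            entry = {
                "testId": test["id"],
                "testName": test["name"],
                "variants": [
                    {
                        "name": v["name"],
                        "trafficPercent": v["trafficPercent"],
                        # None when this variant has no rules for this scope.
                        "rules": resolved.get((test["id"], v["name"], scope)),
                    }
                    for v in test["variants"]
                ],
            }
            by_scope.setdefault(scope, []).append(entry)
    return by_scope
```

Listing every variant at every touched scope is what keeps the traffic percentages summing to 100 there, so the worker's cumulative walk never runs short.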

3.6 Data Flow Summary

┌──────────────────────────────────────────────────────────────────┐
│ Console UI                                                       │
│ Create/manage A/B tests, complete tests                          │
└─────────┬────────────────────────────────────────────────────────┘
          │ CRUD + liveAt filtering
          ▼
┌─────────────────┐                ┌───────────────────────────────┐
│ Controller      │                │ Cloudflare Worker             │
│ (Django)        │                │ (cloud_data_plane/            │
│                 │                │  cloudflare-worker/)          │
│ ABTest CRUD     │                │                               │
│ ABTestRulesSet  │                │ ┌───────────────────────────┐ │
│ CRUD            │                │ │ getRulesForTrigger()      │ │
│ liveAt filter   │                │ │ - Fetch KV (1 read)       │ │
│                 │                │ │ - Parse ab_tests field    │ │
└────────┬────────┘                │ │ - MD5 bucketing on userId │ │
         │                         │ │   (fallback to sessionId) │ │
         │                         │ │ - Apply variant rules     │ │
         ▼                         │ │ - X-AB-Test-Assignments   │ │
┌─────────────────┐                │ └───────────────────────────┘ │
│ Merchandising   │                │                               │
│ Exporter        │───embeds───────│►┌───────────────────────────┐ │
│ (Lambda, 5min)  │  AB test data  │ │ MERCHANDISING_KV          │ │
│                 │  into trigger  │ │ {acct}-{idx}|{md5} +      │ │
│                 │  & global KV   │ │ ab_tests field embedded   │ │
└─────────────────┘  entries       │ └───────────────────────────┘ │
                                   └───────────────────────────────┘

4. API Contracts

4.1 Search API (Modified)

POST /api/v1/indexes/{index}/search

Request body additions:

{
  "q": "winter jackets",
  "userId": "user-abc-123",
  "sessionId": "sess-xyz-789"
}

Both fields are forwarded by the search proxy to the Cloudflare worker. The worker uses userId for A/B test bucketing (falling back to sessionId when userId is absent) and strips both fields before forwarding to Marqo. Neither field reaches Marqo.

Response header additions (URL-encoded):

X-AB-Test-Assignments: test-001=Control,test-002=Holiday%20Boost

No changes to the response body. The search results are simply the result of applying the variant's merchandising rules.

4.2 Live Test Filtering

GET /api/merchandise/ab-test/my-index?accountId=acct-123&liveAt=2026-04-07T12:00:00Z

Response: See Section 3.4.


5. Performance Analysis

KV Lookup Cost

| Operation | Latency | Frequency |
|---|---|---|
| Fetch trigger rules from KV (includes embedded A/B test data) | ~5-10ms | Once per search (cached) |
| MD5 bucketing computation | <0.1ms | Per active test |
| Additional KV lookups for A/B tests | 0ms | None (embedded in existing fetch) |

Total added latency from A/B testing: <0.1ms per active test (hash computation only). No additional KV reads. The existing trigger rules KV fetch already includes the ab_tests field. Global rules and trigger rules are fetched in parallel.

Caching Strategy

  • Variant-aware cache keys: The worker computes variant assignments before the cache lookup via getAbTestCacheKeySuffix(). The cache key is SHA256(apiKey + bodyWithoutUserIdSessionId + "|ab:testName(testId)=variantName"). Both trigger-level and global-level A/B tests are included. This means users in the same variant share a cache entry (typically 2-3 entries per test), while userId/sessionId are stripped from the cache key body to prevent per-user cache fragmentation.
  • KV rule data caching: A/B test data is part of the trigger/global rules KV entries and benefits from the existing MerchandisingRuleStore three-tier caching (CF Cache API → KV → preemptive background fetch). The MerchandisingRuleStore also maintains an in-memory cache (Map) scoped to the current request, so when getAbTestCacheKeySuffix() fetches trigger rules and applyRules() later requests the same key, the second call returns instantly from memory — no duplicate KV or CF Cache API lookups, even on a cold cache where cache.put hasn't completed yet.
  • Cache invalidation: When the exporter updates a trigger's KV entry (because A/B test data changed), the existing stale-while-revalidate strategy applies.
  • Trade-off: Trigger KV entries become slightly larger when A/B tests are active. For a typical 2-variant test with one rule set, this adds ~200-500 bytes to the affected trigger entries.

Scalability

  • Bucketing is a pure computation (hash + modulo) — O(N) per test where N is variant count (typically 2-3).
  • KV entry size increase: ~200-500 bytes per A/B test per trigger. Negligible even with 10 concurrent tests.
  • No per-user state stored. Stateless design scales horizontally.
  • Only triggers affected by A/B tests have the extra data — unaffected triggers are unchanged.

6. Edge Cases and Risks

| Case | Handling |
|---|---|
| userId not provided | Falls back to sessionId for bucketing. If neither is provided, no A/B test assignment; ab_tests field ignored. |
| userId is empty string | Treated as no userId (falsy in JS); falls back to sessionId. If sessionId is also empty/absent, no bucketing performed. |
| No active A/B tests for index | No ab_tests field in KV entries; bucketing code skipped entirely. |
| Test with empty variants array | assignVariant returns null; test is skipped. No assignment recorded. |
| Exporter enforces one active test per trigger | Multiple tests on same trigger prevented at export time. If stale data has multiple, last test's rules win. |
| Test created/deleted between export cycles | Stale KV entries used until next export (~5 min cycle); acceptable lag. |
| User in control variant | variant.rules is null; base trigger rules apply unchanged (baseline behavior). |
| Variant with multiple rule sets | Pre-resolved and merged at export time; worker sees a single flattened rules object. |
| Variant names with commas/equals in header | URL-encoded via encodeURIComponent; consumers must decode. |
| Very skewed traffic split (e.g., 1/99) | Hash bucketing with 10000 buckets handles this; 1% of users deterministically assigned. |
| userId stripped before Marqo | Search proxy forwards userId to the Cloudflare worker; the merchandising interceptor strips it before forwarding to Marqo. |
| sessionId used as bucketing fallback | Search proxy forwards sessionId to the Cloudflare worker; if userId is absent, sessionId is used for A/B test bucketing. Both fields are stripped before forwarding to Marqo. |
| Cache key excludes userId/sessionId | Both fields are stripped from the cache key body. Variant assignments (testName(testId)=variantName) from both trigger and global scopes are appended instead, so users in the same variant share a cache entry. |
| Duplicate variant names | Rejected by controller validation (400 response). |
| UUID collision on create | DDB ConditionExpression prevents silent overwrites; returns 409 Conflict. |
| DDB query pagination | list_ab_tests_for_index paginates via LastEvaluatedKey for correctness. |

7. Rollout

Phase 1: KV Export ✅ Implemented

  1. ✅ Merchandising exporter extended with _build_ab_test_data() — two-pass algorithm resolves A/B test rule sets and embeds into trigger/global KV entries.
  2. ✅ Two new DDB queries: get_ab_tests() and get_ab_test_rules_sets() in dynamodb_client.py.
  3. ✅ KV entries include ab_tests field only for triggers affected by active tests.
  4. Deploy and verify KV data is correct.

Phase 2: Search-Time Variant Assignment ✅ Implemented

  1. getRulesForTrigger() reads ab_tests from KV entry, performs bucketing, applies variant rules.
  2. DefaultMerchandisingInterceptor.applyRules() extracts userId (falling back to sessionId) from the request body, passes it to the rule loader, and strips both fields before forwarding.
  3. X-AB-Test-Assignments response header with URL-encoded values added to all responses when assignments exist.
  4. ✅ Global rules and trigger rules fetched in parallel for minimal latency.
  5. ✅ All existing tests pass with updated mocks + 42 new A/B test unit tests.
  6. Gate behind feature flag in IndexSettings for gradual rollout.

Phase 3: Live Test Filtering ✅ Implemented

  1. list_ab_tests_for_index accepts optional liveAt query parameter.
  2. ✅ Filtering applied at DDB level via FilterExpression with pagination.
  3. ✅ 13 unit tests covering boundary conditions, mixed lifecycles, millisecond precision.

Phase 4: Observability and Guardrails

  1. Export-time conflict detection and warnings.
  2. Metrics: variant assignment distribution, latency impact.
  3. Console UI for viewing live assignment distributions.

8. Alternatives Considered

A. Store Assignments in DynamoDB

Instead of deterministic hashing, store each user's variant assignment in DDB.

Pros: Can handle complex assignment logic, supports mid-test reassignment.
Cons: Adds a DDB read to every search request (~5-15ms), requires managing state for potentially millions of users, breaks the stateless worker model.

Rejected: Deterministic hashing is simpler, faster, and sufficient for the stated requirements.

B. Variant Assignment in Controller

The Controller computes and returns assignments; Cloudflare worker receives variant ID directly.

Pros: Centralizes logic in one backend.
Cons: Adds a synchronous dependency on Controller during search (latency and availability risk). Controller is not on the edge.

Rejected: Cloudflare worker should be self-sufficient for the hot path.

C. Separate A/B Test KV Key Per Index

Store all A/B test configs in a single key ({accountId}-{indexName}|ab-tests) and fetch it alongside the trigger rules.

Pros: Clean separation; trigger KV entries don't grow; A/B test data is self-contained.
Cons: Requires an additional KV read per search (even if cached, it's an extra hop). Worker must coordinate two fetches. KV entry contains data for all triggers, most of which aren't relevant to the current search.

Rejected: Embedding in trigger entries adds zero KV lookups and keeps data locality — each trigger entry only contains the A/B tests that affect it.

D. xxHash for Bucketing

Use xxHash32 instead of MD5 for the bucketing hash.

Pros: Faster (~10x), purpose-built for non-cryptographic hashing.
Cons: Not available in the Workers runtime; would require a WASM dependency or polyfill. MD5 is already available via Web Crypto API and the latency difference (<0.1ms vs <0.01ms) is negligible in context.

Rejected: MD5 is fast enough and already available natively.