A/B Test Runtime Design Doc
Author: Ruchira Jayasekara
Date: 2026-04-07
Updated: 2026-04-08
Status: Implemented
1. Problem Statement
A/B tests are partially implemented: the Controller has full CRUD for ABTest and ABTestRulesSet entities, and the Console has UI for creating/managing them. However, no runtime variant assignment exists — when a user performs a search, the Cloudflare worker has no mechanism to:
- Determine which A/B test(s) apply to the user.
- Select the appropriate merchandising rules variant.
- Apply variant-specific rules to the search request.
Additionally, there is no API for downstream consumers (analytics, reporting) to query which A/B tests were active at a given point in time.
Goals
- Search-time variant assignment: The Cloudflare worker accepts a userId (falling back to sessionId), determines the user's A/B test bucket, and applies the corresponding merchandising rules.
- Live test filtering: An endpoint that accepts an ISO 8601 timestamp and returns tests that were active (started but not completed) at that time.
- Deterministic bucketing: The same user always lands in the same variant for a given test (no randomness per-request).
- Low latency: Variant assignment must add zero additional KV lookups — data is embedded in existing trigger KV entries.
2. Architecture
2.1 Controller (Django)
Models (components/controller/merchandise/services/models.py):
- ABTestVariant: name (str), traffic_percent (float, 0-100), rule_set_ids (list of ABTestRulesSet IDs; empty list = control/baseline).
- ABTest: id, system_account_id, index_name, name, description, status (IN_PROGRESS | COMPLETED), variants (list of ABTestVariant), created_at, created_by, completed_at, deployed_at.
- ABTestRulesSet: id, system_account_id, index_name, trigger_context, name, description, global_rules, trigger_rules.
DynamoDB (table: MERCHANDISING_TABLE_NAME):
- ABTest: PK {system_account_id}#{index_name}#AB_TEST, SK {ab_test_id}
- ABTestRulesSet: PK {system_account_id}#{index_name}#AB_TEST_RULES_SET, SK {rules_set_id}
- Creates use ConditionExpression="attribute_not_exists(pk)" to prevent silent overwrites when two 8-character UUID-based IDs collide.
Endpoints (components/controller/merchandise/urls.py):
- CRUD on /ab-test/<index_name> and /ab-test-rules/<index_name> (list, create, update, delete).
- liveAt query parameter on the list endpoint for temporal filtering (DDB FilterExpression).
Validation (enforced on both create and update):
- At least 2 variants required.
- Variant trafficPercent values must sum to 100 (±0.01 tolerance).
- Variant names must be unique within a test.
The update_ab_test view converts incoming camelCase keys to snake_case via _camel_to_snake_dict before dispatching to the service.
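The validation rules above can be sketched as follows. This is a TypeScript illustration (the controller itself is Python/Django), and the names validateVariants and camelToSnake are hypothetical; whether _camel_to_snake_dict recurses into nested variant objects is an assumption.

```typescript
// Hypothetical TypeScript mirror of the controller's variant validation.
interface VariantInput {
  name: string;
  trafficPercent: number;
  ruleSetIds: string[];
}

const TRAFFIC_SUM_TOLERANCE = 0.01; // the ±0.01 tolerance described above

// Returns a list of human-readable errors; an empty list means the variants are valid.
function validateVariants(variants: VariantInput[]): string[] {
  const errors: string[] = [];
  if (variants.length < 2) {
    errors.push("At least 2 variants are required.");
  }
  const total = variants.reduce((sum, v) => sum + v.trafficPercent, 0);
  if (Math.abs(total - 100) > TRAFFIC_SUM_TOLERANCE) {
    errors.push(`Variant trafficPercent values sum to ${total}, expected 100.`);
  }
  const seen: Record<string, boolean> = {};
  for (const v of variants) {
    if (seen[v.name]) {
      errors.push(`Duplicate variant name: ${v.name}`);
    }
    seen[v.name] = true;
  }
  return errors;
}

// Recursively converts camelCase keys to snake_case (recursion is an assumption
// about the real _camel_to_snake_dict helper).
function camelToSnake(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map(camelToSnake);
  }
  if (value !== null && typeof value === "object") {
    const source = value as Record<string, unknown>;
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(source)) {
      const snakeKey = key.replace(/[A-Z]/g, (c) => "_" + c.toLowerCase());
      out[snakeKey] = camelToSnake(source[key]);
    }
    return out;
  }
  return value;
}
```

Returning every error at once (rather than failing on the first) lets a single 400 response list all problems with the submitted test.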
2.2 Cloudflare Worker
Location: ~/Marqo/cloud_data_plane/components/cloudflare-worker/
Search flow (src/index.ts):
- Request arrives at POST /api/v1/indexes/:index/search.
- Middleware authenticates and loads IndexSettings from the INDEX_SETTINGS KV.
- Customer-specific interceptor applied if configured.
- getMerchandisingInterceptor() initializes merchandising:
  - Checks for the merchandisingTriggerContext query param.
  - Loads index config from MERCHANDISING_KV (keyed by systemAccountId-indexName).
  - Returns DefaultMerchandisingInterceptor or NO_OP_INTERCEPTOR.
- getAbTestCacheKeySuffix() computes variant assignments early (before the cache lookup):
  - Extracts userId (or falls back to sessionId) from the request body, fetches trigger and global rules from KV (cached), and computes testName(testId)=variantName for each active test.
  - Strips userId/sessionId from the cache key body and appends the variant assignment suffix.
  - Cache key: SHA256(apiKey + bodyWithoutUserIdSessionId + "|ab:testName(testId)=variantName").
- Cache lookup: users in the same variant share a cache entry.
- DefaultMerchandisingInterceptor.applyRules():
  - Extracts userId (or falls back to sessionId) from the request body for A/B test bucketing; strips both userId and sessionId before forwarding to Marqo.
  - Extracts the trigger from the query param or the request body q.
  - Calls getRulesForTrigger(), which hashes triggerContext|trigger to MD5, looks up rules in MERCHANDISING_KV, and performs A/B test bucketing if userId or sessionId is present.
  - Applies request modifiers (excludes, pins, filters, score modifiers).
  - Sends the modified request to Marqo.
  - Applies response modifiers (pin insertion, pagination restore).
  - Adds the X-AB-Test-Assignments response header if assignments exist.
- Response cached at the CF edge.
KV Namespaces:
- INDEX_SETTINGS: per-subdomain index config (cacheTtl, logLevel).
- MERCHANDISING_KV: merchandising rules with embedded A/B test data. Key format: ${keyPrefix}|${md5Hash} for trigger rules, ${keyPrefix}|${triggerContext} for global rules.
Hashing: MD5 via Web Crypto API (native to Workers runtime) — used for both trigger key hashing and A/B test bucketing.
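A minimal sketch of the two KV key formats, assuming keyPrefix is the systemAccountId-indexName string from the search flow. In the Workers runtime the digest would come from crypto.subtle.digest("MD5", ...), which Workers supports as a non-standard algorithm; Node's createHash stands in here so the sketch runs locally. The helper names are hypothetical, and any normalization of the trigger text before hashing is not covered.

```typescript
import { createHash } from "node:crypto";

// Stand-in for crypto.subtle.digest("MD5", ...) in Workers; returns lowercase hex.
function md5Hex(input: string): string {
  return createHash("md5").update(input, "utf8").digest("hex");
}

// Hypothetical helper: the "systemAccountId-indexName" prefix.
function kvKeyPrefix(systemAccountId: string, indexName: string): string {
  return `${systemAccountId}-${indexName}`;
}

// Trigger rules live under ${keyPrefix}|md5(triggerContext|trigger).
function triggerRulesKey(keyPrefix: string, triggerContext: string, trigger: string): string {
  return `${keyPrefix}|${md5Hex(`${triggerContext}|${trigger}`)}`;
}

// Global rules live under ${keyPrefix}|${triggerContext} (no hashing).
function globalRulesKey(keyPrefix: string, triggerContext: string): string {
  return `${keyPrefix}|${triggerContext}`;
}
```

Hashing the trigger keeps keys a fixed length regardless of how long or unusual the search text is, while global keys stay readable because there is only one per trigger context.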
2.3 Merchandising Exporter
Pipeline (components/merchandising_exporter/):
DynamoDB → MerchandisingExporter.export_index() → transform → CloudflareKVStoreClient.write_to_kv_store() → Cloudflare KV
Exports trigger rules, global rules, synonyms, index config, and A/B test variant rules embedded in trigger/global KV entries.
2.4 Console (React)
The Console is where users configure A/B tests, add and manage rule sets, view traffic splits, and complete tests. The UI currently supports 2-variant tests (Control + Variant) with a slider for traffic split. The backend supports N-way splits for future UI expansion.
3. Design
3.1 Data Model: Variants
Each ABTest contains a variants list. Each variant defines:
- name: display label (e.g., "Control", "Holiday Boost"). Must be unique within a test.
- traffic_percent: percentage of traffic (0-100). All variants must sum to 100.
- rule_set_ids: list of ABTestRulesSet IDs to apply. An empty list means baseline behavior (control).
A typical 2-variant test:
{
"variants": [
{ "name": "Control", "trafficPercent": 50, "ruleSetIds": [] },
{ "name": "Holiday Boost", "trafficPercent": 50, "ruleSetIds": ["rs-abc", "rs-def"] }
]
}
This generalizes to N-way tests — divide the 0-100 range into N segments for bucketing.
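To make the "divide into N segments" idea concrete, the following hypothetical helper computes the bucket range each variant owns on the 0-10000 scale used by the bucketing algorithm in Section 3.2. It is an illustration, not code taken from the worker.

```typescript
interface VariantSplit {
  name: string;
  trafficPercent: number; // 0-100
}

interface BucketRange {
  name: string;
  start: number; // inclusive
  end: number; // exclusive
}

const TOTAL_BUCKETS = 10000; // 0.01% granularity

// Splits [0, TOTAL_BUCKETS) into contiguous segments, one per variant, in list order.
function bucketRanges(variants: VariantSplit[]): BucketRange[] {
  const ranges: BucketRange[] = [];
  let cumulative = 0;
  for (const v of variants) {
    const start = cumulative;
    // Scale percent to buckets and round, so 33.33 maps to exactly 3333 buckets.
    cumulative += Math.round(v.trafficPercent * 100);
    ranges.push({ name: v.name, start, end: cumulative });
  }
  return ranges;
}
```

For the 2-variant example above this yields [0, 5000) for Control and [5000, 10000) for Holiday Boost. A 33.33/33.33/33.34 split yields boundaries at 3333, 6666, and 10000; the rounding keeps floating-point products such as 33.33 * 100 from drifting off integer boundaries.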
3.2 User Variant Assignment (Bucketing)
Algorithm: Deterministic Hashing
bucketingId = userId || sessionId  // prefer userId; an empty or absent userId falls back to sessionId
hash = md5(bucketingId + "|" + testId) // pipe separator prevents input collisions
bucket = parseInt(hash[0:8], 16) % 10000 // first 8 hex chars → 32-bit int → mod 10000 (0.01% granularity)
cumulativePercent = 0
for each variant in test.variants:
cumulativePercent += round(variant.trafficPercent * 100) // scale to 10000, rounded to avoid FP drift
if bucket < cumulativePercent:
return variant
return variants[last] // fallback (should not happen if percentages sum to 100)
- Hash function: MD5 via Web Crypto API (already used in the Cloudflare worker for merchandising rule key generation).
- Bucketing identifier: userId is preferred; if it is absent or empty, sessionId is used as a fallback. The resolved identifier (bucketingId) is concatenated with testId, so the same identifier gets different buckets for different tests (no correlated assignments).
- Deterministic: Same input always produces the same bucket. No need to store assignments.
- N-way support: Walk through the variants accumulating trafficPercent until the bucket falls within a variant's range.
- Empty variants guard: assignVariant returns null if test.variants is empty; the caller skips the test.
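The pseudocode above translates roughly into the following TypeScript sketch. It uses Node's createHash as a stand-in for the worker's Web Crypto MD5 call; computeBucket is a hypothetical name, and the exact signature of assignVariant here is assumed rather than copied from the worker.

```typescript
import { createHash } from "node:crypto";

interface Variant {
  name: string;
  trafficPercent: number;
}

interface AbTest {
  testId: string;
  variants: Variant[];
}

const TOTAL_BUCKETS = 10000;

// Node stand-in for crypto.subtle.digest("MD5", ...) in the Workers runtime.
function computeBucket(bucketingId: string, testId: string): number {
  const hex = createHash("md5").update(`${bucketingId}|${testId}`, "utf8").digest("hex");
  // First 8 hex chars form a 32-bit unsigned int, reduced to [0, 10000).
  return parseInt(hex.slice(0, 8), 16) % TOTAL_BUCKETS;
}

function assignVariant(bucketingId: string, test: AbTest): Variant | null {
  if (test.variants.length === 0) {
    return null; // empty-variants guard: caller skips this test
  }
  const bucket = computeBucket(bucketingId, test.testId);
  let cumulative = 0;
  for (const variant of test.variants) {
    cumulative += Math.round(variant.trafficPercent * 100);
    if (bucket < cumulative) {
      return variant;
    }
  }
  // Only reached if the percentages sum to less than 100.
  return test.variants[test.variants.length - 1];
}
```

The modulo step introduces a negligible bias: 2^32 = 429,496 × 10,000 + 7,296, so buckets 0-7295 each receive one extra hash value out of roughly 429,497, a relative skew of about 2 in a million.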
3.3 Cloudflare Worker Changes
3.3.1 New Request Parameters
The search API accepts optional parameters in the request body:
interface SearchRequestBody {
q: string;
// ... existing fields
userId?: string; // Optional user ID for A/B test bucketing (preferred)
sessionId?: string; // Optional session ID, used as fallback for A/B test bucketing when userId is absent
}
Search proxy (components/search_proxy): forwards both userId and sessionId from the client request to the Cloudflare worker via defaultBodyValues(). The search proxy also uses these fields locally for personalization (via PIXEL_WORKER) and traffic bucketing (via alias routing).
Cloudflare worker: extracts userId (or falls back to sessionId) for A/B test bucketing, then strips both fields from the request body before forwarding to Marqo (Marqo doesn't recognize these fields).
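A sketch of the extraction step described above, assuming the body has already been parsed into a plain object. extractBucketingId is a hypothetical helper; it uses || rather than ?? so that an empty-string userId falls back to sessionId, matching the edge-case table in Section 6.

```typescript
// Request shape from this section, with an index signature for the "existing fields".
interface SearchRequestBody {
  q: string;
  userId?: string;
  sessionId?: string;
  [field: string]: unknown;
}

interface ExtractedIdentity {
  bucketingId: string | undefined; // undefined = no A/B bucketing for this request
  forwardedBody: Record<string, unknown>; // body sent on to Marqo
}

function extractBucketingId(body: SearchRequestBody): ExtractedIdentity {
  // Pull both identifiers out so neither reaches Marqo.
  const { userId, sessionId, ...forwardedBody } = body;
  // `||` (not `??`) so an empty-string userId falls back to sessionId.
  const bucketingId = userId || sessionId || undefined;
  return { bucketingId, forwardedBody };
}
```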
3.3.2 A/B Test Config Embedded in KV Entries
Rather than storing A/B test data in a separate KV key, variant rules are embedded directly into each trigger's existing KV entry as an ab_tests field. This means the worker gets A/B test data for free — zero additional KV lookups.
Trigger rules KV entry (existing fields + new ab_tests):
{
"triggerContext": "search",
"trigger": "jackets",
"pin_rules": {"1": "prod_123"},
"exclude_rules": ["prod_456"],
"filter_string": "color:(red)",
"score_modifiers": {"price": 0.8},
"inherit_global_score_modifiers": true,
"inherit_global_filters": true,
"ab_tests": [
{
"testId": "abc123",
"testName": "Holiday Boost",
"variants": [
{ "name": "Control", "trafficPercent": 50, "rules": null },
{
"name": "Variant",
"trafficPercent": 50,
"rules": {
"pin_rules": {"1": "prod_789"},
"exclude_rules": ["prod_456"],
"filter_string": "collection:(winter)",
"score_modifiers": {"popularity": 2.0},
"inherit_global_score_modifiers": true,
"inherit_global_filters": true
}
}
]
}
]
}
Global rules KV entry (same pattern):
{
"triggerContext": "search",
"filter_string": "collection:(winter)",
"score_modifiers": {"brand_popularity": 2.0},
"ab_tests": [
{
"testId": "abc123",
"testName": "Holiday Boost",
"variants": [
{ "name": "Control", "trafficPercent": 50, "rules": null },
{
"name": "Variant",
"trafficPercent": 50,
"rules": {
"filter_string": "collection:(summer)",
"score_modifiers": {"trending": 3.0}
}
}
]
}
]
}
Each trigger entry only contains A/B tests that affect that specific trigger — not all tests for the index. A variant with "rules": null is the control (baseline behavior). All variants appear at every scope (even those without rules for that scope) to ensure traffic percentages sum to 100%.
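For reference, the KV entry shapes above can be written as TypeScript types. They are inferred from the JSON examples rather than copied from the worker, and findIncompleteTests is a hypothetical sanity check for the "variants sum to 100% at every scope" invariant.

```typescript
// Field names follow the KV examples above; the worker's exact types are assumed.
interface MerchandisingRules {
  pin_rules?: Record<string, string>;
  exclude_rules?: string[];
  filter_string?: string;
  score_modifiers?: Record<string, number>;
  inherit_global_score_modifiers?: boolean;
  inherit_global_filters?: boolean;
}

interface EmbeddedVariant {
  name: string;
  trafficPercent: number;
  rules: MerchandisingRules | null; // null = control (baseline behavior)
}

interface EmbeddedAbTest {
  testId: string;
  testName: string;
  variants: EmbeddedVariant[];
}

interface TriggerRulesEntry extends MerchandisingRules {
  triggerContext: string;
  trigger: string;
  ab_tests?: EmbeddedAbTest[]; // present only when an active test touches this trigger
}

interface GlobalRulesEntry extends MerchandisingRules {
  triggerContext: string;
  ab_tests?: EmbeddedAbTest[];
}

// Returns the IDs of embedded tests whose variant percentages do not sum to 100.
function findIncompleteTests(entry: TriggerRulesEntry | GlobalRulesEntry): string[] {
  const incomplete: string[] = [];
  for (const test of entry.ab_tests ?? []) {
    const total = test.variants.reduce((sum, v) => sum + v.trafficPercent, 0);
    if (Math.abs(total - 100) > 0.01) {
      incomplete.push(test.testId);
    }
  }
  return incomplete;
}
```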
3.3.3 Search-Time Flow
Key design: A/B test bucketing happens inside getRulesForTrigger() after parsing the KV entry — not as a separate step. Global rules and trigger rules are fetched in parallel.
applyRules(fn) → async (req) =>
│
├── Extract userId/sessionId from request body, resolve bucketingId = userId || sessionId, strip before forwarding
├── [existing] Extract trigger from query param / request body
│
├── Start global rules fetch (parallel, not awaited yet)
├── getRulesForTrigger(triggerContext, trigger, ..., bucketingId)
│ ├── [existing] MD5 hash trigger, fetch KV entry
│ ├── [existing] Parse JSON, validate trigger match
│ ├── If bucketingId present and ab_tests field exists:
│ │ ├── For each test: bucket = parseInt(md5(bucketingId + "|" + testId)[0:8], 16) % 10000
│ │ ├── Walk variants to find assigned variant
│ │ ├── If variant.rules != null: override base rules with variant rules
│ │ └── Collect ABTestAssignment for response header
│ ├── [existing] Instantiate rule objects from (possibly overridden) rules
│ ├── Await global rules (also with ab_tests bucketing)
│ └── Return { rules, abTestAssignments }
│
├── [existing] Apply request modifiers
├── [existing] Fetch from Marqo
├── [existing] Apply response modifiers
└── Add X-AB-Test-Assignments response header
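The flow above can be sketched as follows, under stated assumptions: the two fetchers stand in for the rule store lookups, the bucket function is injected (in the worker it would be the MD5 bucketing from Section 3.2), and all names are hypothetical. The points it illustrates are that the global fetch starts before the trigger work, that ab_tests is dropped from the effective rules, and that assignments from both scopes are collected for the response header.

```typescript
// Minimal shapes for the sketch; the worker's real types are assumed.
type Rules = { [field: string]: unknown };

interface EmbeddedVariant {
  name: string;
  trafficPercent: number;
  rules: Rules | null; // null = control
}

interface EmbeddedAbTest {
  testId: string;
  testName: string;
  variants: EmbeddedVariant[];
}

type KvEntry = Rules & { ab_tests?: EmbeddedAbTest[] };

interface AbTestAssignment {
  testId: string;
  testName: string;
  variantName: string;
}

// Injected so the sketch does not depend on a particular hash implementation.
type BucketFn = (bucketingId: string, testId: string) => number;

function pickVariant(test: EmbeddedAbTest, bucket: number): EmbeddedVariant | null {
  let cumulative = 0;
  for (const variant of test.variants) {
    cumulative += Math.round(variant.trafficPercent * 100);
    if (bucket < cumulative) return variant;
  }
  // Empty variants -> null (test skipped); otherwise fall back to the last variant.
  return test.variants.length > 0 ? test.variants[test.variants.length - 1] : null;
}

// Buckets one KV entry and returns its effective rules (ab_tests removed).
// keepKeys are the base fields preserved when a variant replaces the rules.
function applyAbTests(
  entry: KvEntry | null,
  keepKeys: string[],
  bucketingId: string | undefined,
  bucketFor: BucketFn,
  assignments: AbTestAssignment[],
): Rules | null {
  if (entry === null) return null;
  const { ab_tests, ...base } = entry;
  if (!bucketingId || !ab_tests) return base;
  let effective: Rules = base;
  for (const test of ab_tests) {
    const variant = pickVariant(test, bucketFor(bucketingId, test.testId));
    if (variant === null) continue;
    assignments.push({ testId: test.testId, testName: test.testName, variantName: variant.name });
    if (variant.rules !== null) {
      const replaced: Rules = { ...variant.rules };
      for (const key of keepKeys) replaced[key] = base[key];
      effective = replaced; // full replacement; if several tests apply, the last wins
    }
  }
  return effective;
}

async function resolveRules(
  fetchTriggerEntry: () => Promise<KvEntry | null>,
  fetchGlobalEntry: () => Promise<KvEntry | null>,
  bucketingId: string | undefined,
  bucketFor: BucketFn,
): Promise<{ trigger: Rules | null; global: Rules | null; assignments: AbTestAssignment[] }> {
  const assignments: AbTestAssignment[] = [];
  const globalPromise = fetchGlobalEntry(); // started first, awaited after trigger work
  const triggerRules = applyAbTests(
    await fetchTriggerEntry(), ["triggerContext", "trigger"], bucketingId, bucketFor, assignments);
  const globalRules = applyAbTests(
    await globalPromise, ["triggerContext"], bucketingId, bucketFor, assignments);
  return { trigger: triggerRules, global: globalRules, assignments };
}
```

If fetchTriggerEntry rejects, the already-started global promise is never awaited; a production version would want to handle that rejection explicitly.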
3.3.4 Rule Replacement (No Merging)
When a user is assigned to a non-control variant, the variant's rules fully replace the base trigger rules ({...variant.rules, triggerContext, trigger}). No fields from the base rules leak through — the variant must provide a complete rule set. Only triggerContext and trigger are preserved from the base (needed for the hash collision check). The effective rules are then instantiated and applied in the standard pipeline:
- Variant-specific rules (fully replace base rules if the user is in treatment)
- Global merchandising rules (inherited unless the variant's inherit_global_filters / inherit_global_score_modifiers flags disable it)
- Base search config (from settings)
If a user is in the control variant (rules: null), the base trigger rules apply unchanged (baseline behavior).
For global rules, the same full-replacement semantics apply: a variant's global rules fully replace the base global rules. Only triggerContext is preserved.
When a variant references multiple rule sets in the Controller, the exporter pre-resolves and merges them at export time — the worker sees a single flattened rules object per variant. This ensures the variant always has a complete rule set for full replacement.
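The replacement semantics reduce to an object spread with a few preserved fields. A minimal sketch with hypothetical helper names:

```typescript
type Rules = { [field: string]: unknown };

// Trigger scope: variant rules replace the base wholesale; only the identity
// fields needed for the hash-collision check are carried over.
function effectiveTriggerRules(base: Rules, variantRules: Rules | null): Rules {
  if (variantRules === null) {
    return base; // control: baseline behavior
  }
  return { ...variantRules, triggerContext: base.triggerContext, trigger: base.trigger };
}

// Global scope: same semantics, but only triggerContext is preserved.
function effectiveGlobalRules(base: Rules, variantRules: Rules | null): Rules {
  if (variantRules === null) {
    return base;
  }
  return { ...variantRules, triggerContext: base.triggerContext };
}
```

A merge ({...base, ...variantRules}) would instead let base fields such as exclude_rules leak into the treatment whenever the variant omits them; full replacement avoids that, which is why the exporter must hand the worker a complete rule set per variant.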
3.3.5 Response Headers
The Cloudflare worker adds a response header for observability and client-side tracking:
X-AB-Test-Assignments: test-001=Control,test-002=Holiday%20Boost
Both testId and variantName are URL-encoded via encodeURIComponent to prevent commas or equals signs in names from corrupting the header format. The header is added to both 200 and non-200 responses.
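A sketch of producing and consuming the header value, with hypothetical function names. Because each part is encoded with encodeURIComponent, literal "," and "=" characters can only appear as separators, so a consumer can split first and decode afterwards.

```typescript
interface AbTestAssignment {
  testId: string;
  variantName: string;
}

// Returns null when there is nothing to report (header omitted).
function formatAssignmentsHeader(assignments: AbTestAssignment[]): string | null {
  if (assignments.length === 0) {
    return null;
  }
  return assignments
    .map((a) => `${encodeURIComponent(a.testId)}=${encodeURIComponent(a.variantName)}`)
    .join(",");
}

// Consumer side: split on the raw separators, then decode each part.
function parseAssignmentsHeader(header: string): AbTestAssignment[] {
  const result: AbTestAssignment[] = [];
  for (const pair of header.split(",")) {
    const eq = pair.indexOf("=");
    if (eq <= 0) continue; // skip empty or malformed pairs
    result.push({
      testId: decodeURIComponent(pair.slice(0, eq)),
      variantName: decodeURIComponent(pair.slice(eq + 1)),
    });
  }
  return result;
}
```

In a Worker, the formatted value would be set with headers.set("X-AB-Test-Assignments", value) on a response whose headers are mutable.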
3.4 Live Test Filtering API
Endpoint
The existing list endpoint supports filtering to tests that were active at a specific point in time:
GET /api/merchandise/ab-test/<index_name>?accountId=<account_id>&liveAt=<ISO 8601 timestamp>
Query Parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| accountId | string | Yes | Account ID |
| liveAt | string | No | ISO 8601 timestamp; when provided, returns only tests that were live at this time |
Filter logic (applied as a DynamoDB FilterExpression server-side with pagination):
- createdAt <= liveAt: the test had started.
- completedAt does not exist OR completedAt > liveAt: the test had not yet been completed.
When liveAt is omitted, all tests for the index are returned (no filtering).
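Implementation: MerchandisingService.list_ab_tests_for_index(live_at=...) passes the filter to DynamoDB via Attr("createdAt").lte(live_at) & (Attr("completedAt").not_exists() | Attr("completedAt").gt(live_at)). Results are paginated via LastEvaluatedKey. ISO 8601 strings sort lexicographically in chronological order only when they share the same format (UTC "Z" suffix and the same fractional-second precision), so string comparison in DDB is correct as long as createdAt, completedAt, and liveAt are serialized consistently.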
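The same predicate can be expressed client-side, which helps when checking API results or unit-testing boundaries. isLiveAt is hypothetical; it compares strings exactly as DynamoDB does and treats a missing or null completedAt as "not completed" (an assumption about how a missing attribute appears in the JSON response).

```typescript
interface AbTestLifecycle {
  createdAt: string; // ISO 8601, e.g. "2026-04-01T00:00:00Z"
  completedAt?: string | null; // missing or null while IN_PROGRESS
}

// Client-side mirror of the DDB filter:
// createdAt <= liveAt AND (completedAt missing OR completedAt > liveAt)
function isLiveAt(test: AbTestLifecycle, liveAt: string): boolean {
  const started = test.createdAt <= liveAt; // plain string comparison, as in DDB
  const notYetCompleted = test.completedAt == null || test.completedAt > liveAt;
  return started && notYetCompleted;
}
```

The boundaries are asymmetric: a test created exactly at liveAt counts as live, while one completed exactly at liveAt does not. Mixed precision breaks the ordering: "2026-04-07T12:00:00.500Z" sorts before "2026-04-07T12:00:00Z" as a string even though it is later in time.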
Response: Standard list of ABTest objects:
[
{
"id": "a1b2c3d4",
"name": "Holiday boost rules",
"status": "IN_PROGRESS",
"variants": [
{ "name": "Control", "trafficPercent": 50, "ruleSetIds": [] },
{ "name": "Treatment", "trafficPercent": 50, "ruleSetIds": ["rs-abc"] }
],
"createdAt": "2026-04-01T00:00:00Z",
"completedAt": null
}
]
3.5 Merchandising Exporter Changes
The exporter embeds A/B test variant rules directly into existing trigger and global rules KV entries. This is done in two phases:
Phase 1: Build A/B test index (_build_ab_test_data()):
Two-pass algorithm:
- Pass 1 (Resolve): Fetch all IN_PROGRESS AB tests and all rule sets for the index (2 DDB queries). For each variant's rule sets, resolve trigger rules and global rules into KV-ready format (camelCase input from DDB → snake_case output for the worker). Track which triggers and contexts each test touches.
- Pass 2 (Complete variant lists): For each trigger/context, build the full variant list including ALL of the test's variants, with rules set to the resolved object if the variant has rules for that scope, or None if it doesn't. This ensures traffic percentages always sum to 100% at every scope.
Output: Two lookup structures:
- by_trigger: {md5_hash → [test entries]}
- by_context: {trigger_context → [test entries]}
Phase 2: Embed during KV serialization (_trigger_rules_data() / _global_rules_data()):
When serializing each trigger's KV value, look up the trigger's MD5 hash in by_trigger. If A/B tests exist for that trigger, add the ab_tests field to the JSON value. Same for global rules via by_context.
DDB queries added: 2 (one for tests, one for rule sets) — both are single-partition queries. Additional KV writes: 0 — the A/B test data is included in the existing bulk PUT.
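Pass 2 can be sketched as follows, in TypeScript for consistency with the worker examples (the exporter itself is Python). The ResolvedTest shape standing in for pass-1 output is hypothetical, as is buildByTrigger; by_context would follow the same pattern keyed by trigger context.

```typescript
type Rules = { [field: string]: unknown };

// Hypothetical pass-1 output for one test: each variant's KV-ready trigger
// rules, keyed by trigger MD5 hash (absent hash = no rules for that trigger).
interface ResolvedVariant {
  name: string;
  trafficPercent: number;
  rulesByTrigger: { [md5Hash: string]: Rules };
}

interface ResolvedTest {
  testId: string;
  testName: string;
  variants: ResolvedVariant[];
}

interface EmbeddedAbTest {
  testId: string;
  testName: string;
  variants: { name: string; trafficPercent: number; rules: Rules | null }[];
}

// Pass 2: for every trigger any variant touches, emit ALL variants of the
// test, using null where a variant has no rules for that trigger.
function buildByTrigger(tests: ResolvedTest[]): { [md5Hash: string]: EmbeddedAbTest[] } {
  const byTrigger: { [md5Hash: string]: EmbeddedAbTest[] } = {};
  for (const test of tests) {
    const touched: { [md5Hash: string]: true } = {};
    for (const variant of test.variants) {
      for (const hash of Object.keys(variant.rulesByTrigger)) {
        touched[hash] = true;
      }
    }
    for (const hash of Object.keys(touched)) {
      const entry: EmbeddedAbTest = {
        testId: test.testId,
        testName: test.testName,
        variants: test.variants.map((v) => ({
          name: v.name,
          trafficPercent: v.trafficPercent,
          rules: v.rulesByTrigger[hash] ?? null,
        })),
      };
      const list = byTrigger[hash] || [];
      list.push(entry);
      byTrigger[hash] = list;
    }
  }
  return byTrigger;
}
```

Emitting the control variant with null rules at every touched trigger is what keeps the worker's cumulative walk covering all 10000 buckets at each scope.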
3.6 Data Flow Summary
┌─────────────────────────────────────────────────────────────────┐
│ Console UI │
│ Create/manage A/B tests, complete tests │
└─────────┬───────────────────────────────────────────────────────┘
│ CRUD + liveAt filtering
▼
┌─────────────────┐ ┌───────────────────────────────┐
│ Controller │ │ Cloudflare Worker │
│ (Django) │ │ (cloud_data_plane/ │
│ │ │ cloudflare-worker/) │
│ ABTest CRUD │ │ │
│ ABTestRulesSet │ │ ┌───────────────────────────┐ │
│ CRUD │ │ │ getRulesForTrigger() │ │
│ liveAt filter │ │ │ - Fetch KV (1 read) │ │
│ │ │ │ - Parse ab_tests field │ │
└────────┬────────┘ │ │ - MD5 bucketing on userId │ │
│ │ │ (fallback to sessionId) │ │
│ │ │ - Apply variant rules │ │
▼ │ │ - X-AB-Test-Assignments │ │
┌─────────────────┐ │ └───────────────────────────┘ │
│ Merchandising │ │ │
│ Exporter │───embeds───────│►┌───────────────────────────┐ │
│ (Lambda, 5min) │ AB test data │ │ MERCHANDISING_KV │ │
│ │ into trigger │ │ {acct}-{idx}|{md5} + │ │
│ │ & global KV │ │ ab_tests field embedded │ │
└─────────────────┘ entries │ └───────────────────────────┘ │
└───────────────────────────────┘
4. API Contracts
4.1 Search API (Modified)
POST /api/v1/indexes/{index}/search
Request body additions:
{
"q": "winter jackets",
"userId": "user-abc-123",
"sessionId": "sess-xyz-789"
}
Both fields are forwarded by the search proxy to the Cloudflare worker. The worker uses userId for A/B test bucketing (falling back to sessionId when userId is absent) and strips both fields before forwarding to Marqo. Neither field reaches Marqo.
Response header additions (URL-encoded):
X-AB-Test-Assignments: test-001=Control,test-002=Holiday%20Boost
No changes to the response body. The search results are simply the result of applying the variant's merchandising rules.
4.2 Live Test Filtering
GET /api/merchandise/ab-test/my-index?accountId=acct-123&liveAt=2026-04-07T12:00:00Z
Response: See Section 3.4.
5. Performance Analysis
KV Lookup Cost
| Operation | Latency | Frequency |
|---|---|---|
| Fetch trigger rules from KV (includes embedded A/B test data) | ~5-10ms | Once per search (cached) |
| MD5 bucketing computation | <0.1ms | Per active test |
| Additional KV lookups for A/B tests | 0ms | None — embedded in existing fetch |
Total added latency from A/B testing: <0.1ms per active test (hash computation only). No additional KV reads. The existing trigger rules KV fetch already includes the ab_tests field. Global rules and trigger rules are fetched in parallel.
Caching Strategy
- Variant-aware cache keys: The worker computes variant assignments before the cache lookup via getAbTestCacheKeySuffix(). The cache key is SHA256(apiKey + bodyWithoutUserIdSessionId + "|ab:testName(testId)=variantName"). Both trigger-level and global-level A/B tests are included. Users in the same variant therefore share a cache entry (typically 2-3 entries per test), while userId/sessionId are stripped from the cache key body to prevent per-user cache fragmentation.
- KV rule data caching: A/B test data is part of the trigger/global rules KV entries and benefits from the existing MerchandisingRuleStore three-tier caching (CF Cache API → KV → preemptive background fetch). The MerchandisingRuleStore also maintains an in-memory cache (Map) scoped to the current request, so when getAbTestCacheKeySuffix() fetches trigger rules and applyRules() later requests the same key, the second call returns immediately from memory. This avoids duplicate KV or CF Cache API lookups, even on a cold cache where cache.put hasn't completed yet.
- Cache invalidation: When the exporter updates a trigger's KV entry (because A/B test data changed), the existing stale-while-revalidate strategy applies.
- Trade-off: Trigger KV entries become slightly larger when A/B tests are active. For a typical 2-variant test with one rule set, this adds ~200-500 bytes to the affected trigger entries.
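A hypothetical reconstruction of the variant-aware cache key. The separator between multiple assignments, the sorting, and omitting the suffix when there are no assignments are assumptions not stated in this doc; Node's createHash stands in for crypto.subtle.digest("SHA-256", ...).

```typescript
import { createHash } from "node:crypto";

interface CacheAssignment {
  testId: string;
  testName: string;
  variantName: string;
}

// SHA256(apiKey + bodyWithoutUserIdSessionId + "|ab:testName(testId)=variantName")
function variantAwareCacheKey(
  apiKey: string,
  body: Record<string, unknown>,
  assignments: CacheAssignment[],
): string {
  const stripped: Record<string, unknown> = { ...body };
  delete stripped.userId; // per-user fields would fragment the cache
  delete stripped.sessionId;
  // Assumed: sorted, comma-separated assignments so key order is stable.
  const suffix = assignments
    .map((a) => `${a.testName}(${a.testId})=${a.variantName}`)
    .sort()
    .join(",");
  const material = apiKey + JSON.stringify(stripped) + (suffix ? `|ab:${suffix}` : "");
  return createHash("sha256").update(material, "utf8").digest("hex");
}
```

Because JSON.stringify preserves property insertion order, two bodies with the same fields in a different order would produce different keys in this sketch.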
Scalability
- Bucketing is a pure computation (hash + modulo) — O(N) per test where N is variant count (typically 2-3).
- KV entry size increase: ~200-500 bytes per A/B test per trigger. Negligible even with 10 concurrent tests.
- No per-user state stored. Stateless design scales horizontally.
- Only triggers affected by A/B tests have the extra data — unaffected triggers are unchanged.
6. Edge Cases and Risks
| Case | Handling |
|---|---|
| userId not provided | Falls back to sessionId for bucketing. If neither is provided, no A/B test assignment is made and the ab_tests field is ignored. |
| userId is empty string | Treated as no userId (falsy in JS); falls back to sessionId. If sessionId is also empty or absent, no bucketing is performed. |
| No active A/B tests for index | No ab_tests field in KV entries; bucketing code skipped entirely. |
| Test with empty variants array | assignVariant returns null; test is skipped. No assignment recorded. |
| Multiple active tests on the same trigger | Prevented at export time (the exporter enforces one active test per trigger). If stale data contains multiple tests, the last test's rules win. |
| Test created/deleted between export cycles | Stale KV entries used until next export (~5 min cycle); acceptable lag. |
| User in control variant | variant.rules is null; base trigger rules apply unchanged (baseline behavior). |
| Variant with multiple rule sets | Pre-resolved and merged at export time; worker sees a single flattened rules object. |
| Variant names with commas/equals in header | URL-encoded via encodeURIComponent; consumers must decode. |
| Very skewed traffic split (e.g., 1/99) | Hash bucketing with 10000 buckets handles this; 1% of users deterministically assigned. |
| userId stripped before Marqo | The search proxy forwards userId to the Cloudflare worker; the merchandising interceptor strips it before forwarding to Marqo. |
| sessionId used as bucketing fallback | The search proxy forwards sessionId to the Cloudflare worker; if userId is absent, sessionId is used for A/B test bucketing. Both fields are stripped before forwarding to Marqo. |
| Cache key excludes userId/sessionId | Both fields are stripped from the cache key body. Variant assignments (testName(testId)=variantName) from both trigger and global scopes are appended instead, so users in the same variant share a cache entry. |
| Duplicate variant names | Rejected by controller validation (400 response). |
| UUID collision on create | DDB ConditionExpression prevents silent overwrites; returns 409 Conflict. |
| DDB query pagination | list_ab_tests_for_index paginates via LastEvaluatedKey for correctness. |
7. Rollout
Phase 1: KV Export ✅ Implemented
- ✅ Merchandising exporter extended with _build_ab_test_data(): a two-pass algorithm resolves A/B test rule sets and embeds them into trigger/global KV entries.
- ✅ Two new DDB queries: get_ab_tests() and get_ab_test_rules_sets() in dynamodb_client.py.
- ✅ KV entries include the ab_tests field only for triggers affected by active tests.
- Deploy and verify KV data is correct.
Phase 2: Search-Time Variant Assignment ✅ Implemented
- ✅ getRulesForTrigger() reads ab_tests from the KV entry, performs bucketing, and applies variant rules.
- ✅ DefaultMerchandisingInterceptor.applyRules() extracts userId (falling back to sessionId) from the request body, passes it to the rule loader, and strips both fields before forwarding.
- ✅ X-AB-Test-Assignments response header with URL-encoded values added to all responses when assignments exist.
- ✅ Global rules and trigger rules fetched in parallel for minimal latency.
- ✅ All existing tests pass with updated mocks + 42 new A/B test unit tests.
- Gate behind feature flag in IndexSettings for gradual rollout.
Phase 3: Live Test Filtering ✅ Implemented
- ✅ list_ab_tests_for_index accepts an optional liveAt query parameter.
- ✅ Filtering applied at the DDB level via FilterExpression with pagination.
- ✅ 13 unit tests covering boundary conditions, mixed lifecycles, millisecond precision.
Phase 4: Observability and Guardrails
- Export-time conflict detection and warnings.
- Metrics: variant assignment distribution, latency impact.
- Console UI for viewing live assignment distributions.
8. Alternatives Considered
A. Store Assignments in DynamoDB
Instead of deterministic hashing, store each user's variant assignment in DDB.
Pros: Can handle complex assignment logic, supports mid-test reassignment.
Cons: Adds a DDB read to every search request (~5-15ms), requires managing state for potentially millions of users, breaks the stateless worker model.
Rejected: Deterministic hashing is simpler, faster, and sufficient for the stated requirements.
B. Variant Assignment in Controller
The Controller computes and returns assignments; Cloudflare worker receives variant ID directly.
Pros: Centralizes logic in one backend.
Cons: Adds a synchronous dependency on Controller during search (latency and availability risk). Controller is not on the edge.
Rejected: Cloudflare worker should be self-sufficient for the hot path.
C. Separate A/B Test KV Key Per Index
Store all A/B test configs in a single key ({accountId}-{indexName}|ab-tests) and fetch it alongside the trigger rules.
Pros: Clean separation; trigger KV entries don't grow; A/B test data is self-contained.
Cons: Requires an additional KV read per search (even if cached, it's an extra hop). Worker must coordinate two fetches. KV entry contains data for all triggers, most of which aren't relevant to the current search.
Rejected: Embedding in trigger entries adds zero KV lookups and keeps data locality — each trigger entry only contains the A/B tests that affect it.
D. xxHash for Bucketing
Use xxHash32 instead of MD5 for the bucketing hash.
Pros: Faster (~10x), purpose-built for non-cryptographic hashing.
Cons: Not available in the Workers runtime; would require a WASM dependency or polyfill. MD5 is already available via Web Crypto API and the latency difference (<0.1ms vs <0.01ms) is negligible in context.
Rejected: MD5 is fast enough and already available natively.