Facet Rules Merchandising — Phased Implementation Plan

Companion to the Facet Rules Merchandising spec (Notion, owner: Joshua) and its "Ishaaq's Plan Divergences" section. This document sequences the build into independently shippable phases. Where this plan diverges from Joshua's spec, it says so and points at the divergences table; everything else follows the spec as written.

0. Decisions this plan is built on

The spec's design is adopted almost wholesale. Three things were corrected against the live code, and three are genuine divergences recorded in Notion. Everything else (controller placement, single global list, live catalog discovery, client-facets-win carve-out, limits, MVP seed, 5-min KV cache, parallel KV fetch, audit fields) is taken from the spec unchanged.

Code-grounded corrections (not design changes)

Spec wording	Reality in the repo	Consequence
Persistence as a generic "controller CRUD"	`components/controller/merchandise/` is Django REST Framework — `ViewSet`s + `urls.py` + `authentication_classes`/`permission_classes`, services built in `ViewSet.__init__`.	Mirror the DRF merchandise component (no FastAPI `Depends()`/`injector`).
Optimistic locking, unspecified mechanism	The `record_version` lock in `synonym_service.py` reads `:cv` server-side and never round-trips it through the client. It therefore only guards the in-request read→write window, not the load→edit→publish gap where a real lost update happens (`merchandising_service`'s `put_view` doesn't lock at all).	Facets is last-write-wins (§4): keep only the create-path `attribute_not_exists` guard and drop the ineffective update-path version condition. The shared synonyms fix is deferred.
"Pattern reference #2639"	Confirmed: facet rows belong in `MERCHANDISING_TABLE`; `merchandising_exporter` already scans it and serialises into the per-`{account}-{index}` KV blob the proxy reads.	The single KV blob (Decision 6) is the export channel; the proxy reads it via the existing `resolveProfileOverride` machinery.

The three divergences from Joshua's spec (see Notion divergences table)

Draft → Publish is UI-only. Drafting lives in client state (the pendingView/currentView house pattern); nothing persists until an explicit Publish, which writes the live row. No backend draft/published separation.
Proxy enforcement is request-side only. The proxy pushes field selection + pin order into the upstream request and Marqo returns facets already ordered (matching existing product-pin enforcement). No response rewrite. Depends on the in-progress Marqo facet field pin/order capability.
The superset stores types and is customer-editable + extendable. Joshua's Decision 3 keeps the superset id-only and has the UI join the live catalog for type on every render. Instead, the superset persists { fieldName, type } (type ∈ {text, numeric, array, boolean}). Seeding detects the type from a q='*'&limit=10 sample (Marqo string→text, number→numeric, array→array, boolean→boolean, internal fields dropped). The merchant can still override a mis-detected type and can add facet fields the discovery sample never surfaced (arbitrary field name + chosen type). The persisted type is the single source of truth that flows superset → exporter → proxy, so the type-aware facet request needs no re-resolution at request time.

1. Data model (frozen for all phases)

Three row types per index live in MERCHANDISING_TABLE (one superset row, one global row, and one row per override), with pk = "{systemAccountId}#INDEX#{indexName}", keyed identically to the ranking-rule rows. Each row carries audit fields (createdAt/By, updatedAt/By). Writes are last-write-wins (§4): the create path guards a lost first-write race with attribute_not_exists(pk), but the update path carries no client-round-tripped version lock. There are no draft/published fields: the live row is the published state, and drafting is UI-only.

sk = "FACET_SUPERSET"            # index-wide candidate pool
  fields: [{ fieldName, type }]                            # type ∈ {text, numeric, array, boolean}; seeded from discovery, customer-editable (divergence 3)

sk = "GLOBAL_FACETS"             # index-wide pin order (one list, not per-context — Decision 2)
  enabled: bool                                            # merchant on/off intent (D5); exporter translates to presence/absence — NOT exported as a flag, NOT read by proxy
  onboarded: bool                                          # UI onboarding-banner state
  facets: [{ fieldName, pinned, position? }]               # dense 1..N for pinned; absent for dynamic

sk = "FACET_OVERRIDE#{context}#{trigger}[#{profileId}]"    # per (context, trigger); context ∈ {collection, search}
  context, trigger                                         # top-level attrs, mirroring MerchandiseViewRecord
  facets: [{ fieldName, pinned, position? }]

The superset is the typed candidate pool. Each entry is { fieldName, type } with type ∈ {text, numeric, array, boolean}. type is seeded from the discovery sample but is merchant-editable (override a mis-detection), and the merchant may add fields the sample never surfaced (arbitrary fieldName + chosen type). The persisted type is the single source of truth — the exporter carries it into the KV blob and the proxy uses it to build the type-aware facet request (no re-resolution at request time).
Both GLOBAL_FACETS and overrides validate their facets against the shared FACET_SUPERSET. The superset is the single candidate pool: nothing (global or trigger) may reference a fieldName outside it. Pin entries carry only fieldName + pinned/position and look up type from the superset. A trigger override independently picks/pins from the full superset, so it can surface a field the global list omits, but it cannot introduce one the superset lacks. There is no separate eligibleFields list.
profileId is reserved in the override sk now (Decision/OQ-5) so A/B profile scoping can be added later in the house way; default overrides omit it.
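A minimal sketch of the key layout above. It assumes plain string concatenation and does not escape `#` inside trigger values. `facet_pk` and `override_sk` are hypothetical helper names, not the real models.py API.

```python
from typing import Optional

# Hypothetical helpers illustrating the row keys; names are not from the codebase.
SUPERSET_SK = "FACET_SUPERSET"
GLOBAL_SK = "GLOBAL_FACETS"
CONTEXTS = frozenset({"collection", "search"})


def facet_pk(system_account_id: str, index_name: str) -> str:
    """Partition key shared by all facet rows (same shape as the ranking-rule rows)."""
    return f"{system_account_id}#INDEX#{index_name}"


def override_sk(context: str, trigger: str, profile_id: Optional[str] = None) -> str:
    """FACET_OVERRIDE#{context}#{trigger}[#{profileId}]; default overrides omit profileId."""
    if context not in CONTEXTS:
        raise ValueError(f"unknown override context: {context!r}")
    sk = f"FACET_OVERRIDE#{context}#{trigger}"
    # profileId is reserved for later A/B scoping; append it only when supplied.
    return f"{sk}#{profile_id}" if profile_id else sk
```

With this layout, a Query on the pk with begins_with(sk, "FACET_OVERRIDE#collection#") would return every collection override for an index in one call.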
Limits (Decision 8): superset ≤ 100; pinned per global / per override ≤ 100; overrides per (index, context) ≤ 100.

Two switches: entitlement vs runtime

Account entitlement = the facet_merchandising flag (done in #3708): gates UI visibility + write authorization, fail-closed. Turned on operationally via the per-account exception list — the Rollout step. Effectively one-way: once an account is entitled and using the feature, we never revoke entitlement at runtime — doing so would regress a live customer. So entitlement-off only removes new UI access; it never tears down exported config or running behaviour.
Per-index runtime = GLOBAL_FACETS.enabled, the in-UI Settings toggle (D5). It's the merchant's on/off intent, persisted in DDB; edits still persist when off. It is not read by the proxy — it only controls whether the exporter emits config.
The proxy gate is client opt-in + presence. Merchandised facets apply only when the request carries useDynamicFacets: true (a new boolean on the /search and /collections APIs) and facet config is present in the KV blob; otherwise the request passes through unchanged. useDynamicFacets is mutually exclusive with a client-supplied facets param (passing both is a 422), and it is not forwarded to Marqo. Like the existing facets param, it requires a HYBRID search: Marqo rejects a non-HYBRID request that opts in exactly as it rejects facets today (an SE-config responsibility, not new behaviour). The proxy reads neither entitlement nor enabled. The exporter translates the merchant's enabled toggle into blob presence/absence, and entitlement never reaches the blob. (Decision: opt-in supersedes the original "pure presence, auto-apply" model. Auto-applying facets to every request would inject them into wildcard/browse/TENSOR traffic the customer never asked to facet, which Marqo rejects. The explicit param scopes facet application to where the customer intends it.)
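The gate reduces to a small pure function. This is a Python sketch of the logic only; the real proxy is TypeScript in search.ts. The function and exception names are hypothetical, and using the `facetSuperset` key as the presence marker is an assumption.

```python
from typing import Any, Dict, Mapping, Optional, Tuple


class UnprocessableRequest(Exception):
    """Hypothetical stand-in for the proxy's 422 response."""


def gate_dynamic_facets(
    request: Mapping[str, Any], merch_blob: Optional[Mapping[str, Any]]
) -> Tuple[bool, Dict[str, Any]]:
    """Return (apply_merch_facets, upstream_body) for one /search or /collections call."""
    opted_in = request.get("useDynamicFacets") is True
    # Mutual exclusion is validated up front, before any blob lookup.
    if opted_in and "facets" in request:
        raise UnprocessableRequest("useDynamicFacets and facets are mutually exclusive")
    # useDynamicFacets is a proxy-only flag and is never forwarded to Marqo.
    upstream = {k: v for k, v in request.items() if k != "useDynamicFacets"}
    # Presence gate: the exporter omits facet keys when the merchant toggle is off.
    has_config = merch_blob is not None and "facetSuperset" in merch_blob
    return opted_in and has_config, upstream
```

Neither entitlement nor `enabled` appears as an input. That is deliberate: both are resolved upstream, before the blob is written.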

2. Phases

Search behaviour changes only at Phase 5. Everything before it is dark behind the fail-closed facet_merchandising flag. P1 and P2 are independent and can run in parallel.

Phase 1 — Console draft/publish UX (against the existing mock seam)

Goal: adopt the house edit→publish pattern in the facet-rules UI, decoupled from real persistence via the FacetRulesStorage seam — no backend needed.

Mirror slices/merchandise + SaveChangesButton.tsx: a current/pending pair, a change-count selector (à la selectNumMerchandiseChanges), a "Publish" button gated on the count + an "Undo Changes" button.
Remove the 300 ms autosave effect in FacetRulesProvider; edits mutate pending only; Publish calls storage.save(pending) (still the in-memory mock at this phase), then current = pending. Drop the PREVIEW chips and the "admin_lambda CRUD" tooltip; rewrite the disabled-state copy.
The Settings enable/disable toggle (global.enabled, D5) is a config field like any other — it mutates pending and takes effect on Publish, surfacing the existing "disabled in Settings — rules saved but not applied" warning when off.
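The current/pending pair can be modelled as a small state holder. This Python sketch mirrors the console behaviour described above; the real code is a Redux slice in TypeScript. The class name is hypothetical, and counting changes per differing top-level key is an assumption, not the semantics of selectNumMerchandiseChanges.

```python
import copy
from typing import Any, Callable, Dict


class FacetRulesDraft:
    """Hypothetical model of the edit→publish UX: edits touch pending only."""

    def __init__(self, current: Dict[str, Any]) -> None:
        self.current = copy.deepcopy(current)
        self.pending = copy.deepcopy(current)

    def edit(self, key: str, value: Any) -> None:
        # Nothing persists here, including the Settings `enabled` toggle.
        self.pending[key] = value

    def num_changes(self) -> int:
        keys = self.current.keys() | self.pending.keys()
        return sum(1 for k in keys if self.current.get(k) != self.pending.get(k))

    def can_publish(self) -> bool:
        return self.num_changes() > 0

    def undo(self) -> None:
        self.pending = copy.deepcopy(self.current)

    def publish(self, save: Callable[[Dict[str, Any]], None]) -> bool:
        if not self.can_publish():
            return False
        save(copy.deepcopy(self.pending))  # storage seam; if it raises, current is untouched
        self.current = copy.deepcopy(self.pending)
        return True
```

Because `current` is only reassigned after `save` returns, a failed publish leaves the draft intact and still publishable.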
Tests (RTL): edits mutate pending only; Publish disabled when clean; Undo resets to current; Publish persists through the seam.
Exit: the draft/publish UX is reviewable and demonstrable behind the flag against the mock. No backend, no search effect. No dependencies.

Phase 2 — Backend: rows + service + CRUD API

Goal: the three row models, the service (create-guarded, otherwise last-write-wins), and the DRF endpoints. They go hand in hand, so they ship as one phase.

Row models in merchandise/services/models.py (dataclass Mixin, camelCase, to_ddb/from_ddb, floats_to_decimals).
merchandise/services/facet_rules_service.py: per-row get/upsert/delete; attribute_not_exists(pk) guards create against a lost first-write race; updates are unconditional (last-write-wins — §4); bump the #static row on every write so the exporter re-runs.
Validation: superset dedupe by fieldName + every entry's type ∈ {text, numeric, array, boolean} + ≤100; global/override facets → every fieldName in the superset (the single shared candidate pool), pinned ⇒ position, dense 1..N, ≤100; trigger resolved via _resolve_synonym_trigger with 409-on-resolved-collision.
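The validation rules above, sketched as plain functions that assume the request bodies are already parsed into dicts. The function and error names are hypothetical. This sketch rejects duplicate fieldNames rather than silently collapsing them. It also treats a position on an unpinned entry as an error, because the data model says position is absent for dynamic entries. Trigger resolution and the 409 collision check are left out.

```python
from typing import Any, Dict, List, Set

FACET_TYPES = frozenset({"text", "numeric", "array", "boolean"})
MAX_SUPERSET = 100  # Decision 8
MAX_PINNED = 100    # per global list / per override


class FacetValidationError(ValueError):
    """Hypothetical error type; the real service maps failures to 4xx responses."""


def validate_superset(fields: List[Dict[str, Any]]) -> Set[str]:
    if len(fields) > MAX_SUPERSET:
        raise FacetValidationError(f"superset has {len(fields)} fields (max {MAX_SUPERSET})")
    names: Set[str] = set()
    for entry in fields:
        name, ftype = entry["fieldName"], entry["type"]
        if name in names:
            raise FacetValidationError(f"duplicate fieldName {name!r}")
        if ftype not in FACET_TYPES:
            raise FacetValidationError(f"invalid type {ftype!r} for {name!r}")
        names.add(name)
    return names


def validate_facets(facets: List[Dict[str, Any]], superset_names: Set[str]) -> None:
    positions = []
    for entry in facets:
        name = entry["fieldName"]
        if name not in superset_names:
            raise FacetValidationError(f"{name!r} is not in the superset")
        if entry.get("pinned"):
            if "position" not in entry:
                raise FacetValidationError(f"pinned {name!r} has no position")
            positions.append(entry["position"])
        elif "position" in entry:
            raise FacetValidationError(f"dynamic {name!r} must not carry a position")
    if len(positions) > MAX_PINNED:
        raise FacetValidationError(f"{len(positions)} pinned fields (max {MAX_PINNED})")
    # Pinned positions must be exactly 1..N with no gaps or repeats.
    if sorted(positions) != list(range(1, len(positions) + 1)):
        raise FacetValidationError(f"pinned positions {sorted(positions)} are not dense 1..N")
```

The same `validate_facets` call serves both GLOBAL_FACETS and every override, which is what keeps the superset the single candidate pool.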
API: extend MerchandiseViewSet / register in urls.py exactly as the spec's Controller API section: facet-superset, global-facets, facet-overrides (list + per-trigger PUT/GET/DELETE). Add put_facet_* to _SERVICE_AUTH_ACTIONS. Enforce Feature.FACET_MERCHANDISING on every route (fail-closed); permissions match MerchandiseViewSet (Decision 9).
Tests (gating): create→get round-trip per row (superset round-trips fieldName + type); a concurrent first-write create race is rejected (attribute_not_exists); an override can pin a superset field the global list omits; PUT rejects a fieldName outside the superset; PUT rejects an invalid type; validation rejections; 409 collision surfaces the existing trigger; authed CRUD per endpoint; entitlement-off → blocked.
Exit: pants test //components/controller/merchandise:: green; API usable via curl; flag off ⇒ invisible. No dependencies.
Scope note: the row stores type, but the live index-touching surface that detects it — facet-fields discovery (the q='*'&limit=10 sample), the per-trigger probe, and seed-on-first-enable — is not in P2. It needs live index/Marqo access and the OQ-1/OQ-2 confirmations, so it lands in P6 with the real-catalog work. In P2/P3 the type is supplied by the caller (mock catalog or the customer's own choice); first enable here just creates the rows the UI writes; nothing is pre-populated from the index.

Phase 3 — Console: real storage wiring

Goal: swap the mock seam for real persistence — the draft/publish UX from P1 is unchanged.

api/facetRules/api.ts + thunks/facetRules.thunk.ts (axios + createAsyncThunk, Zod-parsed), one call per row endpoint. Implement FacetRulesStorage against them; delete getDefaultFacetRulesStorage / makeMockDynamoFacetRulesStorage / INITIAL_MOCK_STATE.
Superset type round-trips through the row — reload is self-contained for the functional fields (fieldName + type); the mock catalog is consulted only for volatile decoration (distinct count) and the live-presence/"missing" badge. The customer can edit a field's type and add fields not in the candidate list (free fieldName + chosen type); both persist as ordinary superset entries.
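Reload can then join the persisted superset with the catalog for display only. This is a minimal Python sketch of that join; the console code itself is TypeScript. It assumes a catalog shaped as `{fieldName: {"distinctCount": int}}`, and `decorate_superset` is a hypothetical name. The persisted type always wins. The catalog contributes only the distinct count and the "missing" badge.

```python
from typing import Any, Dict, List


def decorate_superset(
    superset: List[Dict[str, Any]], catalog: Dict[str, Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Hypothetical view-model builder: functional fields come from the row only."""
    rows = []
    for entry in superset:
        live = catalog.get(entry["fieldName"])
        rows.append(
            {
                "fieldName": entry["fieldName"],
                "type": entry["type"],  # persisted type, never re-read from the catalog
                "distinctCount": live.get("distinctCount") if live else None,
                "missing": live is None,  # configured but absent from the catalog
            }
        )
    return rows
```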
Index-scope the provider: thread indexName into FacetRulesProvider at all three mounts (MerchandisingSettings.page.tsx, FacetsTab.tsx, GlobalFacets.page.tsx). Publish PUTs the live row(s), then refetches.
Tests: provider load/save against a mocked thunk; index-scoped keying; Publish dispatches the real PUT; superset type edit + custom field add persist and reload.
Exit: UI persists per (account, index), survives reload. No search-path effect yet. Depends on P1 (UX) + P2 (API).

Phase 4 — Exporter: publish → KV blob

Goal: live facet rows reach the proxy via the existing single KV channel.

Extend merchandising_exporter to scan sk ∈ {FACET_SUPERSET, GLOBAL_FACETS, FACET_OVERRIDE#*} and serialise into the existing {systemAccountId}-{indexName} blob: top-level facetSuperset (carrying each field's type, so the proxy needs no re-resolution), globalFacets, and triggers[md5].facetOverride. Reuse get_trigger_hash byte-for-byte.
Honour the runtime toggle by presence: when GLOBAL_FACETS.enabled is false, omit the facet keys entirely — the proxy gate is presence, so absence = pass-through. The exporter does not gate on account entitlement (never revoke a live customer) and does not emit an enabled flag.
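A sketch of the exporter's facet serialisation. It assumes the trigger hash is the hex md5 of the UTF-8 string "context|trigger" (the md5("collection|trigger") form quoted in Phase 5) and that rows arrive as already-deserialised dicts. `export_facets` and `trigger_hash` are hypothetical names; the real code should keep calling get_trigger_hash rather than re-deriving it.

```python
import hashlib
from typing import Any, Dict, List, Optional

# Persisted vocabulary → Marqo vocabulary (mapped exporter-side only).
TYPE_TO_MARQO = {"text": "string", "numeric": "number", "array": "array", "boolean": "boolean"}


def trigger_hash(context: str, trigger: str) -> str:
    # Assumed equivalent of get_trigger_hash; verify against the real helper.
    return hashlib.md5(f"{context}|{trigger}".encode("utf-8")).hexdigest()


def export_facets(
    blob: Dict[str, Any],
    superset_row: Optional[Dict[str, Any]],
    global_row: Optional[Dict[str, Any]],
    override_rows: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Return a copy of the per-index blob with facet keys added, or the blob unchanged."""
    if not global_row or not global_row.get("enabled"):
        return blob  # runtime OFF: omit every facet key, so the proxy passes through
    out = dict(blob)
    out["facetSuperset"] = [
        {"fieldName": f["fieldName"], "type": TYPE_TO_MARQO[f["type"]]}
        for f in (superset_row or {}).get("fields", [])
    ]
    out["globalFacets"] = global_row["facets"]  # `enabled` itself is never exported
    triggers = {k: dict(v) for k, v in blob.get("triggers", {}).items()}
    for row in override_rows:
        key = trigger_hash(row["context"], row["trigger"])
        triggers.setdefault(key, {})["facetOverride"] = row["facets"]
    out["triggers"] = triggers
    return out
```

Merging `facetOverride` into existing `triggers[md5]` entries keeps the boost/bury data already stored under the same hash intact.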
Extend fork_routes.py's sk allowlist to copy the three new sks (Decision: lifecycle); index delete already rides along via the pk wipe.
Tests: blob shape exact; trigger-hash keys match the proxy's getMerchandisingHashKey; empty/absent ⇒ proxy pass-through; fork copies facet rows.
Exit: publish → blob carries facet config. Still inert (proxy not yet reading it). Depends on P2 shapes.

Phase 5 — Proxy enforcement (the only phase that changes live search)

Goal: ON indexes serve the merchandised facet set, request-side.

In search.ts/merchandising_overrides.ts: from the already-fetched merch blob, resolve the effective facet set (triggers[md5].facetOverride else globalFacets) and build the upstream facets request — fields[fieldName] = { type } (type from facetSuperset, already Marqo vocab), plus per-field pinning + dynamic ordering per the Marqo facets contract (marqo-internal docs/specs/search/facets.md, 2.28.1): a pinned field carries a 0-based position (controller stores 1-based dense, so the proxy subtracts 1), and crossFieldOrderingSettings: { dynamic: true, engagementField } divergence-ranks the unpinned fields with pins overlaid on top. engagementField defaults to _pixel_four_week_click_count (matches DEFAULT_POPULARITY_SETTINGS); a per-index override via index settings is a deferred fast-follow. Remap the boost/bury page context to the facet collection vocabulary before the override md5 (the exporter hashes overrides as md5("collection|trigger")). No response rewrite.
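The effective-set resolution and request build, sketched in Python for readability; the real code is TypeScript in merchandising_overrides.ts. The function names are hypothetical, and the only context remap assumed is page → collection. Placing crossFieldOrderingSettings alongside fields inside the facets object is also an assumption to check against facets.md.

```python
import hashlib
from typing import Any, Dict, List

DEFAULT_ENGAGEMENT_FIELD = "_pixel_four_week_click_count"
# Boost/bury "page" context → facet "collection" vocabulary, applied before hashing.
CONTEXT_REMAP = {"page": "collection"}


def effective_facets(blob: Dict[str, Any], context: str, trigger: str) -> List[Dict[str, Any]]:
    facet_context = CONTEXT_REMAP.get(context, context)
    key = hashlib.md5(f"{facet_context}|{trigger}".encode("utf-8")).hexdigest()
    override = blob.get("triggers", {}).get(key, {}).get("facetOverride")
    # A per-trigger override beats the global list.
    return override if override is not None else blob["globalFacets"]


def build_facets_request(
    blob: Dict[str, Any],
    context: str,
    trigger: str,
    engagement_field: str = DEFAULT_ENGAGEMENT_FIELD,
) -> Dict[str, Any]:
    types = {f["fieldName"]: f["type"] for f in blob["facetSuperset"]}  # already Marqo vocab
    fields: Dict[str, Dict[str, Any]] = {}
    for entry in effective_facets(blob, context, trigger):
        spec: Dict[str, Any] = {"type": types[entry["fieldName"]]}
        if entry.get("pinned"):
            spec["position"] = entry["position"] - 1  # stored 1-based, Marqo expects 0-based
        fields[entry["fieldName"]] = spec
    return {
        "fields": fields,
        "crossFieldOrderingSettings": {"dynamic": True, "engagementField": engagement_field},
    }
```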
Opt-in trigger: facets are built only when the request sets useDynamicFacets: true — not auto-applied. A non-HYBRID request that opts in 422s on Marqo, identical to the existing facets param (confirmed on staging: omitted searchMethod defaults to TENSOR → 422; explicit searchMethod: HYBRID → 200). The proxy does not force HYBRID — sending facets on a non-HYBRID surface is the same SE-config mistake it already is for facets.
Carve-outs (spec Decisions 5 & 7): no useDynamicFacets: true ⇒ no merch facets; no facet config in the blob ⇒ pass through unchanged (D5 runtime OFF, by absence); useDynamicFacets: true + explicit facets ⇒ 422 (mutually exclusive, validated up front). The proxy reads neither the controller feature flag nor an enabled field.
Tests (merchandising_overrides.test.ts + search.test.ts): facet config present ⇒ merchandised facets.fields (typed) + 0-based pins + crossFieldOrderingSettings on the request; config absent ⇒ pass-through (byte-identical to today); client facets ⇒ untouched; below-threshold ⇒ omitted; per-trigger override beats global, with the page → collection remap exercised. Cross-language contract: no shared fixture file — the collection-override md5 (md5("collection|mens-shoes")) is pinned to the same literal in both the proxy test (merchandising_overrides.test.ts) and the exporter test (test_cloudflare_kv_store_client.py::TestFacetExport), with cross-reference comments, so a hash/context-vocab change fails a test on each side (see the §4 revision).
Exit: npm test + npm run lint green (tsc has pre-existing errors — not a gate). Depends on P4 and the Marqo capability — both now landed (facets contract shipped in Marqo 2.28.1: per-field position + crossFieldOrderingSettings).

Phase 6 — Live surface: catalog, probe, seed

Goal: build the live index-touching endpoints and retire the last mocks.

Real catalog (supersedes Decision 3 / D9 — see divergence 3): the GET /merchandise/facet-fields/{index}?apiKey=... endpoint runs q='*'&limit=10 against the index, walks the sampled hits, drops internal fields via the existing is_base_field rule (rejects ^_ and dotted keys — equivalent to the is_internal qualifier used in admin_worker), and detects each field's type (bool→boolean checked before int because of Python's subclass relation; int/float/Decimal→numeric; str→text; non-empty list→array; empty list defers to a later sample). The detected type only seeds the superset entry: the merchant can override it, and can add fields the sample never surfaced (free fieldName + chosen type). Keep the "missing" badge for configured-but-absent fields.
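The detection rules above, as a minimal sketch that assumes hits arrive as plain dicts. `is_candidate_field` restates the is_base_field rule for illustration and is not the real helper. Letting the first non-empty observation of a field win is an assumption.

```python
from decimal import Decimal
from typing import Any, Dict, Iterable, Optional


def is_candidate_field(name: str) -> bool:
    # Same rule as is_base_field: drop ^_ (e.g. _id, _pixel_*) and dotted keys.
    return not name.startswith("_") and "." not in name


def detect_type(value: Any) -> Optional[str]:
    if isinstance(value, bool):  # must precede the int check: bool subclasses int
        return "boolean"
    if isinstance(value, (int, float, Decimal)):
        return "numeric"
    if isinstance(value, str):
        return "text"
    if isinstance(value, list):
        return "array" if value else None  # empty list: defer to a later hit
    return None


def detect_field_types(hits: Iterable[Dict[str, Any]]) -> Dict[str, str]:
    detected: Dict[str, str] = {}
    for hit in hits:
        for name, value in hit.items():
            if name in detected or not is_candidate_field(name):
                continue
            field_type = detect_type(value)
            if field_type is not None:
                detected[name] = field_type
    return detected
```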
Real probe (Decision 4 / D8): GET /merchandise/facet-probe/{index}/{context}/{trigger}?apiKey=... issues one Marqo search with limit=1 and facets={fields: <every persisted superset field>}. Marqo computes facets via a separate Vespa limit 0 | <grouping> query that runs over the full match set regardless of the hits-side limit (OQ-1 resolved — semi_structured_vespa_index.py:_generate_facet_queries), so a single round-trip is enough. Per-field popularity = sum of bucket counts (numeric uses the stats count). Returns superset fields ranked desc by popularity, ties broken by fieldName asc. Search-context triggers are synonym-resolved; collection triggers go through the /collections path with the collection name. Classic-vs-Ecom collection scoping: the Ecom backend scopes server-side via collectionName on /collections; Classic Marqo has no collections concept, so the view builds an explicit <contexts[PAGE].trigger_list.source>:(<trigger>) filter and passes it through (mirrors MerchandisingService.get_ranking_modifiers's pattern). Live per drawer open with no controller-side cache — the ecom search proxy already caches /search and /collections responses at the edge (CacheableEndpoint in cache-config.ts) keyed by URL + body, and the probe body is deterministic per (account, index, context, trigger, superset), so the existing edge cache absorbs repeated drawer-opens. A controller-layer cache would double-cache for no gain (OQ-5 resolved).
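Ranking from a single facet response might look like the following sketch. The response shape (`buckets` with `count` for string/array/boolean, `stats.count` for numeric) is an assumption for illustration, not Marqo's documented schema. `rank_superset_fields` and `field_popularity` are hypothetical names.

```python
from typing import Any, Dict, List


def field_popularity(field_type: str, result: Dict[str, Any]) -> int:
    if field_type == "numeric":
        return int(result.get("stats", {}).get("count", 0))
    # string/array/boolean facets: popularity is the sum of bucket counts.
    return sum(int(b.get("count", 0)) for b in result.get("buckets", []))


def rank_superset_fields(
    superset: List[Dict[str, Any]], facet_results: Dict[str, Dict[str, Any]]
) -> List[str]:
    if not superset:
        raise LookupError("no superset to probe")  # the view surfaces this as 409
    scored = [
        (field_popularity(e["type"], facet_results.get(e["fieldName"], {})), e["fieldName"])
        for e in superset
    ]
    # Popularity descending, then fieldName ascending to break ties deterministically.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [name for _, name in scored]
```

Superset fields that produce no facet result score 0 and sink to the bottom instead of being dropped, so the drawer still lists every candidate.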
Seed-on-first-enable (Decision 11): POST /merchandise/facet-superset/{index}/seed?apiKey=... runs the same discovery sample, filters to text + array types (every text / text-array field is lexically searchable in a Marqo index — OQ-2 resolved), and creates the FACET_SUPERSET row only if none exists. Idempotent: subsequent calls return the existing row with seeded: false; concurrent first-seed surfaces as a ConcurrentModificationError against the create's attribute_not_exists(pk) condition and the loser returns whichever row won the race. First-PUT/enable logic lives in FacetRulesService.seed_superset_if_absent, not a migration. Cognito-only — deliberately NOT in _SERVICE_AUTH_ACTIONS: the console drives the first enable, and the merchandise-service fan-out path couldn't supply the customer's Marqo apiKey anyway.
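The seed's create-if-absent flow, sketched against an in-memory stand-in for the table. `InMemoryTable`, `ConditionFailed` and `seed_if_absent` are hypothetical. In the real service, the conditional create is the attribute_not_exists(pk) put and the failure is ConcurrentModificationError. Sorting seeded fields by name is an assumption.

```python
from typing import Any, Dict, Optional, Tuple

SEEDABLE_TYPES = frozenset({"text", "array"})  # lexically searchable in a Marqo index


class ConditionFailed(Exception):
    """Stand-in for the attribute_not_exists(pk) condition failing."""


class InMemoryTable:
    def __init__(self) -> None:
        self.rows: Dict[Tuple[str, str], Dict[str, Any]] = {}

    def get(self, key: Tuple[str, str]) -> Optional[Dict[str, Any]]:
        return self.rows.get(key)

    def put_if_absent(self, key: Tuple[str, str], item: Dict[str, Any]) -> None:
        if key in self.rows:
            raise ConditionFailed(key)
        self.rows[key] = item


def seed_if_absent(
    table: InMemoryTable, pk: str, detected: Dict[str, str]
) -> Tuple[Dict[str, Any], bool]:
    """Return (superset_row, seeded). Only the first successful create seeds."""
    key = (pk, "FACET_SUPERSET")
    existing = table.get(key)
    if existing is not None:
        return existing, False
    row = {
        "fields": [
            {"fieldName": name, "type": ftype}
            for name, ftype in sorted(detected.items())
            if ftype in SEEDABLE_TYPES
        ]
    }
    try:
        table.put_if_absent(key, row)
    except ConditionFailed:
        # Lost the first-seed race: return whichever row won.
        return table.get(key), False
    return row, True
```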
Pre-rollout: atomic backend publish endpoint (combined backend + console). A single Publish previously fanned out N independent row writes (superset + global + each override) from the console via Promise.all, and every write's backend transaction also Put the shared #static row — so a lone publish self-contended on #static and intermittently surfaced a spurious ConcurrentModificationError (moto hides it; see §4 "Publish is atomic"). Two concurrent publishers from the same account also raced even if the console serialised within a single publish. Fix (this PR): a new controller endpoint POST /merchandise/facets-publish/{indexName} accepts the whole desired facet-rules state (superset + global + every override) and writes it in one transact_write_items — all rows plus a single #static bump — with attribute_not_exists(pk) applied only on rows that didn't already exist (the create-race guard). Override rows present in DDB but absent from the payload are deleted in the same transaction. The console drops its multi-call Promise.all flush entirely and calls the new endpoint once. This supersedes the cheaper "serialise the console writes" workaround, which only hid a single self-racing publisher and still broke under two concurrent publishers. Hard pre-rollout gate before any account is entitled — unreachable behind the fail-closed flag until then.
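The transaction's action list can be sketched as follows. The shapes follow the DynamoDB TransactWriteItems request (Put/Delete with an optional ConditionExpression), but the values are left as plain Python. A real low-level call would serialise them to typed attribute values, for example with boto3's TypeSerializer. The #static row is passed in whole because its key layout is not specified here, and all names are hypothetical.

```python
from typing import Any, Dict, List, Set

OVERRIDE_PREFIX = "FACET_OVERRIDE#"


def build_publish_actions(
    table_name: str,
    pk: str,
    desired: Dict[str, Dict[str, Any]],  # sk -> row attributes for every desired row
    existing_sks: Set[str],              # sks found in DDB before the publish
    static_item: Dict[str, Any],         # full #static row (key + bump attributes)
) -> List[Dict[str, Any]]:
    actions: List[Dict[str, Any]] = []
    for sk, attrs in desired.items():
        put: Dict[str, Any] = {"TableName": table_name, "Item": {"pk": pk, "sk": sk, **attrs}}
        if sk not in existing_sks:
            put["ConditionExpression"] = "attribute_not_exists(pk)"  # create-race guard only
        actions.append({"Put": put})  # existing rows: unconditional, last-write-wins
    # Overrides present in DDB but absent from the payload are deleted in the same transaction.
    for sk in sorted(existing_sks - set(desired)):
        if sk.startswith(OVERRIDE_PREFIX):
            actions.append({"Delete": {"TableName": table_name, "Key": {"pk": pk, "sk": sk}}})
    # Exactly one #static bump per publish, so a lone publish cannot contend with itself.
    actions.append({"Put": {"TableName": table_name, "Item": dict(static_item)}})
    return actions
```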
Tests (P6 backend): type detection covers all four types and the bool-before-int precedence + empty-list deferral; internal-field filter drops _pixel_*/_merch_*/dotted keys; popularity aggregator sums bucket counts for string/array/boolean and uses count for numeric; probe ranks desc + tie-breaks name-asc; probe with no superset is 409; seed creates from text+array only; seed is idempotent and loses the create race cleanly; seed under service-auth POST is rejected (regression guard against accidentally restoring the grant).
Tests (atomic publish): publish creates all rows in one transact; LWW on update with createdAt audit preserved; orphan-reference + duplicate-trigger + over-limit + oversized-transaction rejected up front with nothing landed; deletes overrides absent from the payload; search-context synonym resolution; concurrent first-create race surfaces as a 409; service-auth fan-out lands on the explicit target account; entitlement-gate 403.
Console mock retirement (this PR). FacetRulesProvider fetches the catalog on mount via the live facet-fields endpoint; MOCK_INDEX_CATALOG is removed. makeControllerFacetProbe replaces makeMockMarqoFacetProbe — empty trigger short-circuits to the current snapshot (preserves the new-override drawer's open-with-no-trigger flow), non-empty trigger hits the live facet-probe endpoint. The seed endpoint is dispatched on the enabled false→true transition (idempotent, gated on the local superset being empty so subsequent toggles don't re-call). FacetOverrideDrawer catches probe failures so a 409 on an unseeded superset doesn't wedge the drawer.
Exit: P6 backend endpoints + atomic publish endpoint land green; the console publish flow goes through the new endpoint (single transact_write_items, one #static bump); the console-side mocks are retired (MOCK_INDEX_CATALOG, makeMockMarqoFacetProbe removed). Depends on P3 & P5.

Rollout (post-P6)

Internal QA on staging (flag already true there) → per-account exception list for one design partner → gradual prod default flip (Decision: feature flag rollout).

Pre-rollout gates (hard, before any account is entitled):

Production Marqo ≥ 2.28.1 for the indexes being enabled. P5 emits per-field position + top-level crossFieldOrderingSettings, which ship in Marqo 2.28.1. Before entitling an account, confirm with the Marqo team that each of its indexes runs 2.28.1 or later. This is validated against staging; the production version is the open item. (The useDynamicFacets opt-in removes the broad-traffic 422 risk, since facets now attach only when the customer asks. What remains is the same HYBRID contract the existing facets param already carries: an SE must point useDynamicFacets at HYBRID surfaces, just as for facets.)
Confirm engagementField-absent behaviour for non-pixel accounts. The cross-field engagementField defaults to _pixel_four_week_click_count, hardcoded even for accounts without pixel data. The facets spec says a cold-start / zero-engagement field falls back to request order (graceful, not an error), but verify on staging that an engagementField absent from every document degrades cleanly (no 422, sensible order) before entitling any non-pixel account.
Atomic backend publish endpoint shipped (P6, combined backend + console). See §4 "Publish is atomic" — folds the per-row writes into one transact_write_items covering every row + a single #static bump; console calls the new endpoint instead of fanning out per-row PUTs. Required because two concurrent publishers on the same account still race the #static bump under any console-only mitigation.

3. Dependency graph & sequencing

P1 (draft/publish UX) ──┐
                        ├──► P3 (real storage) ─────────────┐
P2 (rows + CRUD API) ───┤                                   ├──► P6 (catalog/probe)
                        └──► P4 (exporter) ─► P5 (proxy) ───┘
                                              ▲
                                   Marqo facet pin/order capability (external)

P1 and P2 are independent — UX (against the mock seam) and backend can run fully in parallel.
P3 (real storage swap) needs both: the UX from P1 and the API from P2.
Parallelizable once P2 freezes the row shapes: P4 (exporter) and P5 (proxy, TS, against frozen blob fixtures).
Single risky cutover: only P5 changes live search, and it stays inert until P4 exports config — so the behaviour change can be staged/rolled back at the export step, not the code step.
External gate: P5 also waits on the Marqo facet field pin/order change (colleague-owned). Nothing else does.
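The hard dependencies in the graph can be checked mechanically. This sketch uses Python's standard-library graphlib (3.9+). It encodes only exit dependencies, so it does not show that P5 development can start against frozen fixtures before P4 lands.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# node -> prerequisites, as drawn in the dependency graph above
DEPS: Dict[str, Set[str]] = {
    "P1": set(),
    "P2": set(),
    "Marqo": set(),  # external facet pin/order capability
    "P3": {"P1", "P2"},
    "P4": {"P2"},
    "P5": {"P4", "Marqo"},
    "P6": {"P3", "P5"},
}


def waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group nodes into batches whose members can proceed in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the plan ever gains a cycle
    batches: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For this graph, the batches are [Marqo, P1, P2], then [P3, P4], then [P5], then [P6], which matches the sequencing above.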

4. Cross-cutting risks

Marqo capability dependency. Request-side ordering assumes Marqo will honour configured facet field order. If it ships differently than assumed, only P5 is affected — revisit enforcement (a scoped response rewrite is the fallback), nothing upstream.
Concurrency is last-write-wins (revised). P2 originally modelled an optimistic lock on synonym_service. That pattern reads record_version server-side and never round-trips it through the client, so it only guards the in-request read→write window, not the load→edit→publish gap where a real lost update happens. Building the full client round-trip for facets alone would also need a transactional publish to be meaningful (see below). Rather than do that, facets is explicitly last-write-wins, matching the core put_view rules path. The update path carries no version condition. Only the create path keeps attribute_not_exists(pk), a genuine first-write-race guard that P6's seed_superset_if_absent and the atomic publish's per-row create guard also rely on. The shared synonym_service/facets pattern is a deferred fix: revisit real optimistic concurrency if and when multi-editor contention becomes a concrete problem.
Publish is atomic (this PR). Before this PR, a Publish issued several independent row writes from the console: superset first, then global + the per-context override reconcile via Promise.all. Each row write's backend transaction also Put the shared #static row, so a single publish firing them via Promise.all could intermittently hit a DynamoDB TransactionConflict → ConcurrentModificationError ("modified concurrently, retry"). moto serialises transactions, so the suite didn't catch it; it surfaced in review on PR #3806. Even with the console serialising within a single publish, two concurrent publishers on the same account still raced the #static bump and silently interleaved their row writes (publisher A's superset, B's global, A's override 1, B's override 2, …).
The atomic publish endpoint (P6 pre-rollout) writes the whole desired state (superset + global + every override + a single #static bump) in one transact_write_items. attribute_not_exists(pk) applies only where the snapshot showed no existing row (the create-race guard), and updates are last-write-wins. The console pre-validates the draft (every field referenced by global/overrides must be in the superset) before the publish call, so it can show an in-rule error message; the server re-validates and rejects identically. Override rows present in DDB but absent from the publish payload are deleted in the same transaction.
Cross-publisher conflicts are not eliminated; they are made recoverable. Two concurrent publishers on the same (account, index) still both touch the shared #static item, and DDB serialises on that conflict. One transaction wins fully. The other is cancelled without having landed any of its rows and surfaces as a clean 409 the console can retry. That is the substantive improvement over the cheaper "serialise the console writes" workaround, which kept the inter-publisher race but lacked the all-or-nothing guarantee that turns it from "silent partial state" into "recoverable conflict". This supersedes the workaround.
Transaction-size cap. DDB caps a single transact_write_items at 100 actions, so the publish path enforces a publish-level ceiling: ≤ 97 total override writes + deletes (= 100 − superset − global − #static). Oversized publishes are rejected up front with a numeric breakdown rather than letting DDB cancel mid-flight. This is tighter than Decision 8's per-context limit (100 each, 200 total), which still governs the per-row CRUD path used by service-to-service fan-out. Merchant usage is in the low tens of overrides in practice, so the ceiling has plenty of headroom. If we ever approach it, the fix is to lower Decision 8's per-context cap to match, not to chunk the publish, which would defeat the atomicity guarantee this whole change exists for.
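The ceiling arithmetic as a minimal up-front check. The function name and message format are hypothetical. The only inputs are the counts of override writes and override deletes, because the superset put, global put and #static bump are always present.

```python
DDB_TRANSACT_MAX_ACTIONS = 100  # DynamoDB TransactWriteItems action limit
FIXED_ACTIONS = 3  # superset put + global put + single #static bump


def check_publish_size(override_puts: int, override_deletes: int) -> int:
    """Raise before calling DDB if the publish would exceed one transaction; return headroom."""
    budget = DDB_TRANSACT_MAX_ACTIONS - FIXED_ACTIONS  # 97
    used = override_puts + override_deletes
    if used > budget:
        raise ValueError(
            f"publish needs {used + FIXED_ACTIONS} actions "
            f"({override_puts} override writes + {override_deletes} deletes + "
            f"{FIXED_ACTIONS} fixed) but the limit is {DDB_TRANSACT_MAX_ACTIONS}"
        )
    return budget - used
```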
Blob-shape drift (TS proxy ↔ Python exporter). The facetSuperset (incl. each field's type) / globalFacets / triggers[md5].facetOverride shapes are a cross-language contract. A single shared fixture file was not adopted. It would only auto-enforce drift if both hermetic toolchains loaded the same file, which needs cross-component Pants wiring for marginal gain. Instead, each side independently pins the shape in its own test. The load-bearing invariant is the collection-override md5 (get_trigger_hash/getMerchandisingHashKey parity, via the page → collection remap), and it is pinned to the same literal in both the exporter and proxy tests with cross-reference comments. The type vocabulary (text/numeric/array/boolean → Marqo string/number/array/boolean) is mapped exporter-side only; the proxy consumes Marqo vocab and never re-maps. The controller mirrors the mapping for its direct-to-Marqo calls in P6's probe (_MARQO_FACET_TYPE in search_api.py).
OQ confirmations (Yihan):
- OQ-1 (Marqo limit=1 facet behaviour) — RESOLVED. Marqo computes facets via a separate limit 0 | <grouping> Vespa query (see marqo-internal/components/marqo/src/marqo/core/semi_structured_vespa_index/semi_structured_vespa_index.py:_generate_facet_queries) that runs over the full match set regardless of the hits-side limit. The probe holds limit=1 for cheapness.
- OQ-2 (internal field set) — RESOLVED. Drop ^_ and dotted keys (the existing is_base_field rule, equivalent to the is_internal qualifier used in admin_worker). The seed additionally narrows to text + array<text> because those are the lexically-searchable types in a Marqo index.
- OQ-5 (sync vs cached probe) — RESOLVED, no controller-side cache needed. The probe's underlying call goes to the ecom search proxy's /search and /collections endpoints, both of which are CacheableEndpoints in components/search_proxy/src/cache-config.ts. Cloudflare keys POST cache by URL + body and the probe body is deterministic for a given (account, index, context, trigger, superset) — so repeated drawer-opens hit the edge cache directly. Adding a controller-side cache would either double-cache (no benefit) or override an admin's deliberate cache_config.endpoints.search = 0 decision (worse than no cache). The Classic Marqo path isn't cached, but facet merchandising targets ecom customers — not a real surface.
