Theme-Targeted Settings Deploy System

Status: Draft for review (Raynor)
Author: theme-deploys planning agent, 2026-06-10/11
Related plans: settings-concurrency-control.md, settings-versioning.md, storefront-admin-sso.md (see Cross-plan interfaces)

Problem

Every settings save from the storefront_admin editor goes straight to production. POST /api/v1/storefront/shops/{domain}/settings (components/shopify/admin_server/admin_server/routes/storefront_routes.py:149) writes the single DDB settings record and immediately mirrors it to the shop-level marqo.search_settings metafield, which the live storefront widget reads on the next page load. Bad CSS or a broken selector saved from the editor breaks live customer storefronts instantly, with no staging step and no explicit promotion.

Goals:

  (a) Per-theme settings records: different settings for different Shopify themes (live theme vs. redesign/preview themes).
  (b) Live-target detection: detect and warn when a save targets the LIVE theme.
  (c) Staging mode: saves go to a preview theme only; live customers never see them.
  (d) Explicit "Deploy to live": a deliberate, guarded operation that promotes staged settings to production.

Current state (verified)

Persistence

  • Settings live in DynamoDB table ShopifyEntities as a single record per shop: pk=SHOP#{domain}, sk=SETTINGS (components/shopify/admin_server/admin_server/repositories/shopify_settings_repository.py:19-22, models/shopify_entities.py:53-72, key constants in constants/database.py:18).
  • ShopifySettings carries ui_components, selector_components, configuration, plus index-association fields active_index / system_account_id / cell_id (models/shopify_entities.py:74-102) and audit fields updated_by_user_id / created_at / last_updated.
  • Account→shop listing goes through GSI_SystemAccountId with sort-key equality on sk = "SETTINGS" (repositories/shopify_settings_repository.py:81-100; equality confirmed in repositories/base_repository.py:151-160). New sort keys prefixed SETTINGS#... will NOT leak into this listing.

Save path

  • Storefront admin editor (the primary client): components/storefront_admin/app/lib/api-client.ts:65-67 → POST /api/v1/storefront/shops/{domain}/settings (routes/storefront_routes.py:149-256). Flow: validate → SettingsService.update_ui_components (DDB write, services/settings_service.py:470-499) → save_settings_to_metafields (mirror full settings JSON to shop metafield namespace marqo, key search_settings; services/settings_service.py:158-199) → save_public_search_metafields (searchBase, storefrontAccessToken, baseCurrency, indexId, cdnBase; services/settings_service.py:201-307). Metafield failures return HTTP 207 "partial".
  • Embedded Shopify app save: POST /api/settings (routes/settings_routes.py:60-127) — same service calls, session-token auth.
  • Metafields are written shop-level via metafieldsSet with the shop's owner ID (services/shopify_service.py:172-193, mutation in graphql/mutations/metafield_mutations.py:5-23) — global across all themes. This is the core problem for theme scoping.
  • The editor save UI is a single Save button in components/storefront_admin/app/components/layout/header.tsx:73-76, state managed by app/hooks/use-settings.ts:325-357.

Widget config resolution (the crux)

  • The theme app embed components/shopify/extensions/marqo-search-theme/blocks/marqo-search-embed.liquid:9-25 renders a <meta id="marqo-config"> tag whose content attribute is the JSON value of shop.metafields.marqo.search_settings, plus data-attributes for searchBase/indexId/etc.
  • assets/marqo-loader.js readConfigFromDom() (marqo-loader.js:385-449) parses that meta tag into window.MarqoUIConfig; the main bundle validates it (components/shopify/storefront_search/src/marqo-search-app.ts:181-186, config-validator.ts:312).
  • So the settings the widget uses are decided server-side by Liquid at page render time. Liquid in app embeds has access to the global theme object (theme.id, which is numeric, and theme.role, one of main / unpublished / demo / development). When a merchant previews an unpublished theme (?preview_theme_id= or theme editor preview), Liquid renders with that theme's context, so theme.id/theme.role reflect the preview theme. This is the lever for theme-scoped resolution.
  • The search proxy does not consume ui_components (verified: no uiComponents reads in components/search_proxy/src/); x-marqo-settings-override is index/search settings, a different domain. The only runtime consumer of UI settings is the metafield → Liquid → loader path. (storefront_routes.py:160 mentions a "KV export via DDB stream" for settings; the existing exporter components/ecom_settings_exporter reads the IndexSettings table, not ShopifyEntities — treat that comment as aspirational; see Open questions.)

Live-theme detection

  • Admin GraphQL exposes themes(first: N, roles: [MAIN]) / OnlineStoreTheme { id name role }. No theme query exists yet in graphql/queries/ (only shop/product/collection/metafield/bulk queries) — one must be added.
  • All app configs already grant read_themes,write_themes (components/shopify/shopify.app.prod.toml:23 and every per-merchant toml) — no scope migration needed.
  • Storefront-admin API-key requests have no Shopify session; the Admin GraphQL token is recovered from stored OAuth sessions via _get_access_token_for_shop (routes/storefront_routes.py:99-108). Theme listing therefore degrades when no token exists (see Edge cases).

Design

1. Data model: per-theme staged records

The existing record stays untouched as the live record — the thing production storefronts serve. Staged settings get sibling records keyed by theme:

| Record | pk | sk |
|---|---|---|
| Live (existing, unchanged) | SHOP#{domain} | SETTINGS |
| Staged, per theme (new) | SHOP#{domain} | SETTINGS#THEME#{theme_id} |

theme_id is the numeric Shopify OnlineStoreTheme id (stable across renames and across publish — publishing changes role, not id).

Plus one internal record for deploy safety (see deploy endpoint): sk=SETTINGS#DEPLOY#BACKUP — a rolling snapshot of the live record's content fields (ui_components, selector_components, configuration, plus audit info) taken transactionally at each deploy, giving an undo path before the versioning plan lands. It deliberately does not function as a restorable full record: restore is always a content-merge onto the current live record (see rollback), never a verbatim put — infra fields (active_index/system_account_id/cell_id/metadata) are exactly the fields other writers (index_service, webhook_service, api_key_routes) may legitimately change between deploy and rollback, and clobbering them with deploy-time values would, e.g., revive a deleted index id. This matches the versioning plan's content-fields-only restore rule (docs/plans/settings-versioning.md §"Infra fields … not versioned and never restored").

Staged and backup records carry no system_account_id (left None; model_dump() already strips it when None for the sparse GSI, models/shopify_entities.py:133-143) — account authorization always goes through the live record (dependencies.py:665-683), so staged/backup records must stay invisible to GSI_SystemAccountId by construction, not just by the current sk-equality query shape. They also don't carry active_index/cell_id; deploy reads those from the live record it is updating.

New fields on ShopifySettings (all optional, absent on live records — model stays one class; sk: Literal["SETTINGS"] at models/shopify_entities.py:70-72 relaxes to str with a validator enforcing the SETTINGS / SETTINGS#THEME#{id} / SETTINGS#DEPLOY#BACKUP shapes — a templated key cannot be a Literal; entity_type stays "SETTINGS"):

theme_id: str | None # numeric theme id, set only on staged records
theme_name: str | None # display only; refreshed on each save from GraphQL
created_from: str | None # "live" | "blank" — provenance of initial copy

Deployment audit fields on the live record:

deployed_from_theme_id: str | None
deployed_at: str | None
deployed_by: str | None
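
The sort-key shape rule that replaces the Literal["SETTINGS"] annotation could look roughly like the following. This is a minimal sketch in plain Python rather than a Pydantic validator; validate_settings_sk and theme_settings_sk are hypothetical names, and only the three key shapes themselves come from this plan.

```python
import re

# The three allowed shapes: live record, per-theme staged record, deploy backup.
_SK_PATTERN = re.compile(r"SETTINGS(#THEME#[0-9]+|#DEPLOY#BACKUP)?")


def validate_settings_sk(sk: str) -> str:
    """Return sk unchanged if it matches an allowed shape, else raise ValueError."""
    if not _SK_PATTERN.fullmatch(sk):
        raise ValueError(f"invalid settings sort key: {sk!r}")
    return sk


def theme_settings_sk(theme_id: str) -> str:
    """Build the staged-record sort key for a numeric theme id (hypothetical helper)."""
    return validate_settings_sk(f"SETTINGS#THEME#{theme_id}")
```

In a Pydantic model the same check would presumably run inside a field validator on sk, so malformed keys fail at model construction instead of at write time.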

Repository additions (ShopifySettingsRepository):

  • get_theme_settings(domain, theme_id) / save_theme_settings(settings) / delete_theme_settings(domain, theme_id) — same get_item/put_item/delete_item plumbing with the new sk.
  • list_theme_settings(domain) — reuses the existing BaseRepository.query_by_pk_and_sk_prefix(pk, "SETTINGS#THEME#") (base_repository.py:95-123); the prefix excludes both the live record and the backup record. No new query primitive needed.
  • No bespoke transaction method: the concurrency plan's canonical save_settings(settings, *, expected_version, change_source, extra_transact_items=None) already accepts extra transaction items (their §4.2 — designed for exactly this consumer). Deploy passes its backup Put (and an optional ConditionCheck on the staged source record) via extra_transact_items; the conditional live write itself is the canonical one. The domain methods above are sk-parameterized wrappers over key-agnostic BaseRepository primitives — the existing methods hard-code sk=SETTINGS (shopify_settings_repository.py:34-35,62-63), per the concurrency teammate's implementation note.

A hot-path hygiene note that becomes more important with this plan: ShopifySessionRepository.list_shop_sessions sweeps the whole SHOP#{domain} partition with query_by_pk and filters client-side (shopify_session_repository.py:135), and it runs on every storefront save/deploy via _get_access_token_for_shop (storefront_routes.py:104). Settings records are large (real merchant payloads up to ~120KB), so each staged theme + the backup record adds discarded read volume to that sweep. The versioning plan already recommends switching it to query_by_pk_and_sk_prefix(pk, "USER#"); this plan adopts that one-line fix as part of implementation step 2 rather than leaving it as someone else's hygiene.

Why the same table/model rather than a new entity: staged records are the same shape, same validation (InputValidator.validate_ui_components), same service merge logic, and the deploy operation is a copy between two records in one table. Fail-fast and Model-Everything principles both hold with a single frozen model.

Default/fallback semantics: the live record is the default. A theme with no staged record falls back to live everywhere (widget resolution, GET API). A shop with no theme records behaves exactly as today.

2. Widget resolution: theme-scoped metafields + role-based Liquid selection

Shop-level metafields are global, but metafield keys are namespaced strings and Liquid supports dynamic key lookup on the metafields drop. Staged settings mirror to a per-theme key:

  • Live: namespace marqo, key search_settings (unchanged).
  • Staged: namespace marqo, key search_settings_theme_{theme_id} (type json). Each metafield value has its own size budget (see §7 Size limits), so staging does not eat into the live payload's headroom — the decisive advantage over packing theme payloads into the single live metafield.

Resolution happens in the Liquid embed (marqo-search-embed.liquid), which already renders per-request with the correct theme context:

{% assign marqo_settings = shop.metafields.marqo.search_settings %}
{% assign marqo_settings_source = 'live' %}
{% if theme.role != 'main' %}
  {% assign marqo_theme_key = 'search_settings_theme_' | append: theme.id %}
  {% assign marqo_theme_settings = shop.metafields.marqo[marqo_theme_key] %}
  {% if marqo_theme_settings %}
    {% assign marqo_settings = marqo_theme_settings %}
    {% assign marqo_settings_source = 'theme' %}
  {% endif %}
{% endif %}

…then data-has-settings, content (rendered as {{ marqo_settings.value | json | escape }} — the existing template renders .value of the metafield drop at marqo-search-embed.liquid:25 and the assigned variable must too), and new debug attributes data-theme-id="{{ theme.id }}" data-theme-role="{{ theme.role }}" data-settings-source="{{ marqo_settings_source }}" are rendered from marqo_settings. marqo-loader.js readConfigFromDom() surfaces themeId / themeRole / settingsSource on window.MarqoUIConfig for diagnostics; no other widget change is required — the bundle keeps consuming uiComponents/selectorComponents exactly as today.

The rule: the MAIN (live) theme always serves the live metafield, never a theme-keyed one. Non-main themes serve their theme-keyed metafield when present, else fall back to live. Consequences:

  • Publishing a staging theme in Shopify admin can never silently push staged settings to customers — the moment role becomes main, that theme serves live settings. Promotion happens only through the explicit deploy operation. (The trade-off — "theme + its settings ship together on publish" — is rejected deliberately: it reintroduces an unguarded path to prod.)
  • Preview of an unpublished theme (?preview_theme_id=, theme-editor preview pane) renders with that theme's theme.id, so merchants/agents see staged settings on the real storefront with zero risk to live traffic.
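
The resolution rule can also be restated as a small pure function, which may be handy for backend unit tests that pin the same behavior as the Liquid block. This is a sketch only: resolve_settings and theme_metafield_key are hypothetical names, and the metafields mapping stands in for the shop.metafields.marqo namespace.

```python
from typing import Mapping, Optional, Tuple

LIVE_KEY = "search_settings"


def theme_metafield_key(theme_id: str) -> str:
    """Per-theme metafield key, following the naming used in this plan."""
    return f"search_settings_theme_{theme_id}"


def resolve_settings(
    theme_role: str, theme_id: str, metafields: Mapping[str, dict]
) -> Tuple[Optional[dict], str]:
    """Return (settings, source), where source is 'live' or 'theme'."""
    # The main theme never consults a theme-keyed metafield.
    if theme_role != "main":
        staged = metafields.get(theme_metafield_key(theme_id))
        if staged is not None:
            return staged, "theme"
    # Main theme, or a non-main theme without staged settings: fall back to live.
    return metafields.get(LIVE_KEY), "live"
```

Note that publishing theme 42 only changes the role passed in; the staged metafield still exists but is no longer selected.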

Feasibility notes (verify in implementation step 1): theme is a global Liquid object available in theme app extension blocks; dynamic bracket lookup on a metafield namespace drop (shop.metafields.marqo[var]) is supported Liquid. Both are standard Shopify behavior but must be smoke-tested on a dev store before the backend work lands (implementation order below makes this step 1 precisely because it is the load-bearing assumption).

Plan B if the Liquid spike fails (e.g. dynamic metafield-key lookup turns out not to work in app embeds): keep the single search_settings metafield and embed staged payloads inside it as a themes sub-object keyed by theme id ({"uiComponents": ..., "themes": {"123": {...}}}); the Liquid embed renders data-theme-id="{{ theme.id }}" data-theme-role="{{ theme.role }}" (plain interpolation, certainly supported) and readConfigFromDom() in marqo-loader.js selects the sub-object when data-theme-role != "main". Same resolution rule, same backend record shape — only the mirror format and ~10 loader lines change. Cost: staged payloads share the live metafield's size budget, so Plan B caps the number of concurrently staged themes; this is why Plan A (per-theme keys) is preferred and verified first.

The widget bundle and loader are theme-agnostic assets served from the app extension/CDN — they are shared across themes. Only the settings are theme-scoped, which matches the product need (testing new CSS/layout settings against a redesign theme). Testing a new widget bundle per theme is out of scope.

3. API changes (admin_server, storefront routes)

All new endpoints live in routes/storefront_routes.py (API-key auth, the storefront_admin editor's surface). The embedded-app routes (settings_routes.py) are untouched in v1 (see Rollout).

GET /shops/{domain}/themes (new)

Lists themes via a new Admin GraphQL query (graphql/queries/theme_queries.py):

query getThemes($first: Int!) {
  themes(first: $first) {
    nodes { id name role updatedAt }
  }
}

Response model ThemeResponse: theme_id (numeric string extracted from the GID), name, role, is_live (role == MAIN), has_staged_settings (joined against list_theme_settings). Requires an access token from _get_access_token_for_shop; if none exists, return 409 no_shopify_session. A token may also be present but stale/revoked — GraphQL errors (401/403 from Shopify) are caught and mapped to the same 409 no_shopify_session (not a 500), so the editor's degraded mode handles both identically (Edge cases #5). Other GraphQL failures bubble as 502.
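
Mapping a GraphQL theme node to the response model might be sketched as below. The dataclass stands in for the real (presumably Pydantic) ThemeResponse, and theme_id_from_gid and to_theme_response are hypothetical helper names. The gid://shopify/OnlineStoreTheme/{id} form is Shopify's global ID format for themes.

```python
from dataclasses import dataclass
from typing import Set

_GID_PREFIX = "gid://shopify/OnlineStoreTheme/"


@dataclass(frozen=True)
class ThemeResponse:
    theme_id: str
    name: str
    role: str
    is_live: bool
    has_staged_settings: bool


def theme_id_from_gid(gid: str) -> str:
    """Extract the numeric id from a theme GID; fail fast on any other shape."""
    numeric = gid[len(_GID_PREFIX):] if gid.startswith(_GID_PREFIX) else ""
    if not (numeric.isascii() and numeric.isdigit()):
        raise ValueError(f"not an OnlineStoreTheme GID: {gid!r}")
    return numeric


def to_theme_response(node: dict, staged_theme_ids: Set[str]) -> ThemeResponse:
    """Join one GraphQL node against the ids that have staged records."""
    theme_id = theme_id_from_gid(node["id"])
    return ThemeResponse(
        theme_id=theme_id,
        name=node["name"],
        role=node["role"],
        is_live=node["role"] == "MAIN",
        has_staged_settings=theme_id in staged_theme_ids,
    )
```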

GET /shops/{domain}/settings?theme_id={id} (extended)

  • No theme_id (default): live record, exactly today's behavior and response shape (back-compat for current editor build).
  • theme_id present: return the staged record. If none exists, return the live settings with meta.exists=false so the editor can initialize a staging copy from live without a separate call.
  • Response gains a meta object: {target: "live"|"theme", theme_id, exists, is_live_theme, version, last_updated}. Here version is the record's record_version (0 for legacy records, per the concurrency plan, which independently adds it to the GET payload; get_settings_with_defaults already returns last_updated, settings_service.py:140, the route just doesn't surface it today, storefront_routes.py:140-143). The deploy dialog's guard value comes from here.

POST /shops/{domain}/settings?theme_id={id} (extended) — staged save

  • No theme_id: legacy live save, unchanged pipeline (DDB → search_settings metafield → public metafields). Back-compat for existing clients. Response gains target: "live" so updated clients can warn.
  • theme_id present (server-side guard, requirement (b)):
    1. Require an existing live record. Shop resolution already depends on it (_resolve_storefront_shop_from_settings reads the live record for account authorization, dependencies.py:665-683), and deploy needs its active_index/system_account_id/cell_id. No live record → 409 no_live_settings ("initialize live settings first"). This removes the "deploy with no live record" branch entirely.
    2. Resolve themes via GraphQL. Fail fast if the theme id doesn't exist (404) or if its role is MAIN (409 live_theme_save_rejected) — saving "to the live theme" is not a thing; live writes go through the no-param path or deploy. This re-check at save time closes the race where a theme is published between editor load and save. Stale-token GraphQL auth errors map to 409 no_shopify_session as in GET /themes.
    3. Size guard (new paths only): serialized settings JSON exceeding the JSON-metafield design ceiling (128KB — see §7 Size limits for the Shopify numbers and their caveats) → 422 settings_too_large before the DDB write — a staged record whose metafield can't be written is unpreviewable and therefore useless, so fail fast. The check lives in the new service methods (update_theme_ui_components, deploy), not inside the shared save_settings_to_metafields (settings_service.py:158), where it would leak into the legacy path. The legacy no-theme_id path keeps today's exact behavior (size logged at settings_service.py:178-180, oversize surfaces as 207 partial) — a regression test pins that a near-ceiling legacy save still succeeds unchanged.
    4. Write the staged DDB record (sk=SETTINGS#THEME#{id}, theme_name refreshed from the GraphQL response) — versioned from birth: the staged write implements the concurrency plan's canonical conditional-write contract from day one (request body carries version = the staged record's record_version from GET meta.version; first create uses the attribute_not_exists(record_version) condition; mismatch → the canonical settings_conflict 409). The staged path is brand new with exactly one client (the new editor build), so unlike the live record there is no legacy-client transition to manage — it can be strict immediately, and no lost-update window ever exists for staged records. This makes the staged write path dependent on concurrency Phase 1's repo primitives (dependency stated in §5).
    5. Mirror to metafield search_settings_theme_{id} via the existing save_settings_to_metafields extended with a metafield_key parameter (default "search_settings").
    6. Do not touch search_settings or the public metafields (searchBase/indexId/etc. are infrastructure config, identical across themes; staging them has no meaning).
    7. Metafield mirror failure → 207 partial as the live path does today (storefront_routes.py:229-236) — but the message must say "staged settings saved but not previewable yet; retry" (staged settings are only served via the metafield; there is no other propagation path).
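
Guards 2 and 3 above could be expressed as one pre-write check, sketched here under stated assumptions: the 128 * 1024 byte reading of "128KB", the compact JSON serialization, the "theme_not_found" code for the 404, and the function names are all illustrative and not taken from the codebase.

```python
import json
from typing import Mapping, Optional, Tuple

# Assumption: the 128KB design ceiling is read as 128 * 1024 bytes of UTF-8 JSON.
METAFIELD_CEILING_BYTES = 128 * 1024


def serialized_size(settings: dict) -> int:
    """Byte length of the compact JSON form (assumed to match the mirror format)."""
    text = json.dumps(settings, separators=(",", ":"), ensure_ascii=False)
    return len(text.encode("utf-8"))


def staged_save_rejection(
    theme_id: str, roles_by_theme_id: Mapping[str, str], settings: dict
) -> Optional[Tuple[int, str]]:
    """Return (http_status, code) if the staged save must be rejected, else None."""
    role = roles_by_theme_id.get(theme_id)
    if role is None:
        return 404, "theme_not_found"  # hypothetical code; the plan only fixes the 404
    if role == "MAIN":
        return 409, "live_theme_save_rejected"
    if serialized_size(settings) > METAFIELD_CEILING_BYTES:
        return 422, "settings_too_large"
    return None
```

Running both checks before the DDB write keeps the fail-fast ordering described above: nothing is persisted for a theme that is live, missing, or unpreviewable.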

SettingsService gains update_theme_ui_components(shop_id, theme_id, theme_name, ui_components, selector_components, user). This is not a pass-through to create_or_update_settings, because _create_default_settings hardcodes sk=SETTINGS_SK (settings_service.py:459-462). It reuses _merge_settings for updates to an existing staged record, and a new _create_theme_settings constructor for first saves: initialized from the live record's content fields only (created_from="live"; live record guaranteed to exist per guard 1) with the theme sk and theme metadata, infra fields left None (§1). (created_from="blank" is reserved for a possible future "start from defaults" editor action; v1 always copies live.)

POST /shops/{domain}/settings/deploy (new) — requirement (d)

Body: {source_theme_id: str, expected_live_version: int, expected_source_version: int | null}.

Deploy is a guarded, reversible write. Its guard is the concurrency plan's record_version optimistic lock — deploy depends on that plan's Phase 1 (server-side record_version attribute + _version_condition builder + conditional save_settings; independently deployable, no client changes required — see docs/plans/settings-concurrency-control.md §3, §10). That plan explicitly rejected last_updated as a condition attribute (untrustworthy writers, no-timezone timestamps, equal-timestamp collisions — its §3.1), so this plan does not ship a divergent transitional guard; it sequences after the agreed one. Steps:

  1. Load staged record; 404 if absent. (Deploying a theme with no staged settings is a no-op error, not a silent success.) Size guard as in staged save.
  2. Build the live-record update: the staged record's ui_components, selector_components, configuration applied onto the current live record's other fields (infra fields active_index/system_account_id/cell_id/metadata come from the live record just read — staged records don't carry them); stamp deployed_from_theme_id, deployed_at, deployed_by (the auth plan's canonical actor string: user:{sub} | token:{token_id} | api_key:{system_account_id}), last_updated, updated_by_user_id.
  3. Route the write through the canonical service path, per agreement with both sibling planners: SettingsService.create_or_update_settings(shop_id, content, updated_by, expected_version=expected_live_version, change_source="theme_deploy", event_type="deploy", source_scope=f"theme:{theme_id}", source_version=staged.record_version) — so input validation, the concurrency conditional write, and (once landed) versioning's history capture all fire from one place. The rolling content backup Put to sk=SETTINGS#DEPLOY#BACKUP (stamped backed_up_at, backup_of_live_version) rides the same TransactWriteItems via the repo's extra_transact_items hook; when expected_source_version is supplied, a ConditionCheck on the staged record ("staged unchanged since the dialog was opened") joins it. Live-condition failure → 409 with the concurrency plan's SettingsConflictError payload ({detail: {code: "settings_conflict", expectedVersion, currentVersion, lastUpdated, updatedBy, changeSource}}); legacy live records without the attribute are version 0 and guarded by attribute_not_exists(record_version) exactly as in that plan.
  4. When the versioning plan lands, the same call produces the deploy version event automatically (their event_type="deploy", source_scope, source_version mapping — confirmed by that planner), and the rolling backup + rollback endpoint below are retired in favor of general restore. Until then, the backup record is the rollback story.
  5. Mirror to the live search_settings metafield + save_public_search_metafields (same as a live save). Metafield failure → 207 partial with status: "deployed_not_live" — and the editor must show this loudly, because the runtime read path is the metafield, not DDB (marqo-search-embed.liquid:25): a 207 deploy means customers are still seeing the OLD settings until a retry succeeds. There is no background reconciliation (the "KV export" mentioned at storefront_routes.py:218,233 does not exist for ShopifyEntities — see Open questions), so retry is the client's job and the response must say so explicitly.
  6. The staged record is kept (not deleted): the staging theme keeps serving its own settings, and the merchant can iterate and re-deploy. Re-running deploy is content-convergent (live content and metafield converge to the staged content) though not byte-idempotent, since each run stamps a fresh deployed_at, increments record_version, and rolls the backup. Response: {status, deployed_at, source_theme_id, live_version}. Here live_version is the new live record_version, so the editor can chain a follow-up action without a re-GET (matches the concurrency plan's save responses).
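
Step 2 (building the live update) and the backup snapshot from step 3 might be sketched with plain dicts as follows. The real code works on ShopifySettings models and writes both items in one transaction; build_deploy_update and build_backup are hypothetical names, and the copies are shallow for brevity.

```python
CONTENT_FIELDS = ("ui_components", "selector_components", "configuration")


def build_deploy_update(
    live: dict, staged: dict, theme_id: str, actor: str, user_id: str, now: str
) -> dict:
    """Apply staged content onto a copy of the live record and stamp audit fields."""
    updated = dict(live)  # infra fields (active_index, cell_id, ...) stay as read
    for field in CONTENT_FIELDS:
        updated[field] = staged.get(field)
    updated.update(
        deployed_from_theme_id=theme_id,
        deployed_at=now,
        deployed_by=actor,
        last_updated=now,
        updated_by_user_id=user_id,
    )
    return updated


def build_backup(live: dict, now: str) -> dict:
    """Snapshot only the content fields of the pre-deploy live record."""
    backup = {field: live.get(field) for field in CONTENT_FIELDS}
    backup.update(
        sk="SETTINGS#DEPLOY#BACKUP",
        backed_up_at=now,
        backup_of_live_version=live.get("record_version", 0),
    )
    return backup
```

Keeping both builders pure makes the "staged records carry no infra fields" property easy to assert in tests without touching DynamoDB.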

POST /shops/{domain}/settings/deploy/rollback (new) — undo before versioning lands

Body: {expected_live_version: int}.

Rollback is a content-fields-only merge, never a record swap: read the current live record, replace its ui_components/selector_components/configuration with the backup's content, keep every infra field (active_index/system_account_id/cell_id/metadata) from the current record, stamp fresh audit fields (change_source="theme_deploy", rolled_back_from_version), and write with _version_condition(expected_live_version); then mirror to the live metafield. The current live content is simultaneously written to the backup record (same transaction), so rollback is itself reversible. 404 if no backup exists; 409 on version conflict.

Why merge, not swap: infra fields are exactly what other writers legitimately change between deploy and rollback — index_service clears active_index on index deletion and sets it on creation, webhook_service rewrites metadata — and restoring deploy-time values would, e.g., point the indexId metafield at a deleted index. This mirrors the versioning plan's "infra fields are never restored" rule, and a dedicated test pins it (rollback after an active_index change preserves the new active_index).
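
The rollback merge, including the regression case named above, could be sketched like this. build_rollback is a hypothetical name, records are plain dicts for illustration, and the version-conditioned transactional write is omitted.

```python
from typing import Tuple

CONTENT_FIELDS = ("ui_components", "selector_components", "configuration")


def build_rollback(live: dict, backup: dict) -> Tuple[dict, dict]:
    """Return (restored_live, new_backup): content fields trade places, infra stays."""
    restored = dict(live)
    new_backup = dict(backup)
    for field in CONTENT_FIELDS:
        restored[field] = backup.get(field)  # old content goes back to live
        new_backup[field] = live.get(field)  # current content becomes the undo copy
    restored["rolled_back_from_version"] = live.get("record_version", 0)
    return restored, new_backup


# Scenario: active_index changed between deploy and rollback.
live_now = {"active_index": "idx-new", "record_version": 5, "ui_components": {"v": 2},
            "selector_components": {}, "configuration": {}}
backup = {"ui_components": {"v": 1}, "selector_components": {}, "configuration": {}}
restored, new_backup = build_rollback(live_now, backup)
```

In this scenario restored carries the backup's ui_components but keeps active_index "idx-new", while new_backup holds the content that was live just before the rollback.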

Atomicity: DDB and Shopify metafields are two systems; a single atomic commit is impossible. Within DDB, backup+promote (and rollback's content merge plus backup refresh) are atomic transactions. Across systems the order is DDB-first, mirror second, explicit client retry on partial. This is the same consistency model the existing save path already uses (207 handling at storefront_routes.py:210-236), with the honest caveat in step 5 about what 207 means for live traffic.

Auth (per the locked scope matrix in docs/plans/storefront-admin-sso.md §4.5): POST /deploy requires settings:deploy_live only (promote-but-not-edit is a valid reviewer persona); POST /deploy/rollback requires settings:write + settings:deploy_live (it chooses non-head content to put live — deliberately aligned with their stricter restore-to-live rule); the no-theme_id live save requires both once scoped tokens land.
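
The scope requirements in that paragraph amount to a subset check per operation. A minimal sketch, assuming scopes arrive as a set of strings; the operation names and is_authorized are hypothetical, and only the scope strings come from this plan.

```python
from typing import Iterable

REQUIRED_SCOPES = {
    "deploy": frozenset({"settings:deploy_live"}),
    "rollback": frozenset({"settings:write", "settings:deploy_live"}),
    "live_save": frozenset({"settings:write", "settings:deploy_live"}),
}


def is_authorized(operation: str, granted: Iterable[str]) -> bool:
    """True when every scope required for the operation has been granted."""
    return REQUIRED_SCOPES[operation] <= frozenset(granted)
```

A reviewer token holding only settings:deploy_live passes for "deploy" and fails for "rollback", which is the promote-but-not-edit persona described above.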

DELETE /shops/{domain}/settings/themes/{theme_id} (new)

Deletes the staged DDB record and its search_settings_theme_{id} metafield. Note: metafield deletion is net-new GraphQL surface — the codebase only has metafieldsSet today (graphql/mutations/metafield_mutations.py:5-23). Needs the metafieldsDelete mutation, a ShopifyService.delete_metafield method with userErrors handling mirroring the existing pattern (shopify_service.py:165-171), and a response transformer + test. Tolerates an already-absent metafield. Used by the editor's "discard staging" action and for cleaning up records for deleted themes.

4. Editor UX (storefront_admin)

  • Theme picker in the editor header (components/layout/header.tsx + new theme-picker.tsx): dropdown of themes from GET /themes — entries like Live — Dawn (current) (amber/red LIVE badge) and Preview — Dawn Redesign (+ "staged changes" dot when has_staged_settings). Selecting a target reloads settings for that target via GET /settings?theme_id=.
  • Default target: the live theme, preserving current behavior and muscle memory — but with the LIVE badge and save-guard below, the "didn't realize I was editing prod" failure mode is gone. (Defaulting to a staging theme was considered and rejected: which one? and silently editing a theme the merchant didn't pick is its own surprise.)
  • Save behavior (requirements (b)+(c)):
    • Target = staging theme: button reads Save to "<name>" — plain save, no friction.
    • Target = live: button reads Save to LIVE with warning styling; clicking opens a confirm dialog ("This updates the live storefront for all customers immediately. Consider saving to a preview theme and deploying instead."). Confirmation state is per-session, not per-click-forever (no "don't ask again" persistence in v1).
  • Deploy button: visible when target is a staging theme with staged settings; opens a confirm dialog showing source theme name, live last_updated/updated_by, and a coarse diff summary (counts of components whose serialized JSON differs between staged and live; full visual diff is the versioning plan's territory). Opening the dialog fetches the live record fresh (GET /settings, no theme param) — the live payload is not in memory while editing a staging target — and that fetch supplies both the diff baseline and the expected_live_version (meta.version) sent with POST /deploy. On 409 version conflict: re-fetch, re-show dialog with a "live changed since you opened this" banner. On 207 deployed_not_live: persistent error banner with a retry action — customers are still on the old settings until retry succeeds. A "Roll back last deploy" action (calls /deploy/rollback) lives behind an overflow menu with its own confirm.
  • Preview link: when target is a staging theme, a "Preview on storefront" link to https://{domain}/?preview_theme_id={theme_id} (and the existing search-preview iframe keeps rendering local state as today — components/preview/search-preview.tsx is target-agnostic since it renders from in-memory settings).
  • State plumbing — this is a real refactor of use-settings.ts, not a parameter add. Today the hook is single-record: one backendSnapshot, one editVersionRef, one load effect keyed on [getClient, shopifyDomain] (use-settings.ts:271-323). It becomes target-keyed: targetThemeId state joins the load-effect deps (target switch = reload), backendSnapshot/isDirty/editVersionRef reset on target switch, and switching with unsaved changes prompts (discard/save-first). getSettings/saveSettings in api-client.ts take optional themeId; new listThemes, deploySettings, rollbackDeploy, deleteThemeSettings client methods; settings-context.tsx exposes target + deploy actions. Holding parallel per-target edit buffers was considered and rejected for v1 (reload-on-switch is simpler and the prompt prevents data loss).
  • Degraded mode: GET /themes → 409 no_shopify_session: hide the picker, show a banner "Theme staging unavailable for this shop (no Shopify session) — saves go directly to live", keep today's exact flow.
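
The coarse diff summary shown in the deploy dialog could be computed along these lines. The editor itself is TypeScript; this sketch uses Python only for consistency with the other examples. It assumes ui_components is a mapping from component name to JSON-serializable config, which this plan does not state, and coarse_diff_summary is a hypothetical name.

```python
import json
from typing import Dict


def _canonical(value: object) -> str:
    """Key-order-independent JSON form, so reordered keys do not count as a change."""
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def coarse_diff_summary(staged: dict, live: dict) -> Dict[str, int]:
    """Count components that differ, exist only in staged, or exist only in live."""
    changed = sum(
        1 for name in staged
        if name in live and _canonical(staged[name]) != _canonical(live[name])
    )
    added = sum(1 for name in staged if name not in live)
    removed = sum(1 for name in live if name not in staged)
    return {"changed": changed, "added": added, "removed": removed}
```

Comparing canonical serializations keeps the summary cheap and avoids a structural diff, which this plan leaves to the versioning work.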

5. Migration & rollout (backward compatible)

There is no data migration. Existing single-record shops are already in the target state: their record is the live record; staged records appear lazily on first theme-targeted save.

Deploy order (each step independently safe):

  1. Theme extension (Liquid + loader): the new resolution block is pure fallback — shops with no theme metafields take the marqo_settings = shop.metafields.marqo.search_settings path identical to today. Theme app extension versions roll out globally to all shops on release, so this must be merged with the fallback verified in e2e/manual testing first.
  2. Backend (model fields, repo methods, routes, GraphQL theme query, metafield key parameter): additive; legacy request shapes (no theme_id) hit unchanged code paths. New optional model fields require no backfill (Pydantic defaults).
  3. Editor (theme picker, guarded save, deploy): ships last; old editor builds keep calling the legacy shapes.

Coordination with sibling plans: every settings WRITE this plan introduces — staged saves, deploy, rollback — depends on concurrency Phase 1's repo primitives (server-side record_version + _version_condition + conditional save; independently deployable, no client coordination — settings-concurrency-control.md §10). Staged saves are deliberately versioned from birth (§3 step 4): the path has no pre-existing clients, so shipping it un-versioned and retrofitting later would create exactly the lost-update window the concurrency plan exists to close — there is no "staged saves before Phase 1" configuration. What IS dependency-free: the Liquid/theme-extension resolution, theme listing, the GET extensions, DELETE, and the editor's read/preview UX — those can land in any order, but requirements (a)–(d) all activate only with Phase 1 in place. Phase 1 is the smallest, first, server-only step of the four-plan program, so this gates little in practice. Deploy/rollback additionally carry their own transactional backup + content-merge rollback until versioning's general restore supersedes both; when versioning lands, deploy emits its event_type="deploy" version event and the backup/rollback machinery is retired. Scoped-token enforcement on deploy activates when the auth plan lands; until then the endpoint is full-access-API-key-only (current auth model), which is no weaker than today's live save.

6. Edge cases

  1. Staging theme gets published (role → MAIN): resolution rule instantly serves live settings on it (no leak). Its staged record stays; the editor shows it under its new role and POST /settings?theme_id= now rejects it (409) — the merchant deploys it properly or discards. Deploy from it remains allowed (deploy copies content; the source's role is irrelevant — but the deploy response warns when source role is MAIN so the editor can suggest cleanup).
  2. Live theme unpublished (another published): the old MAIN theme becomes unpublished; if it has a stale staged record, that now starts serving on previews of it — correct semantics (it's a preview theme now). Live traffic serves the new MAIN theme → live metafield. No action needed.
  3. Theme deleted in Shopify: staged record + metafield orphaned. GET /themes joins records to themes; records without a matching theme are listed as role: "deleted" with delete-only affordance in the editor. Orphaned metafields are inert (nothing resolves them) and removed by the DELETE endpoint.
  4. Theme renamed: theme_id is stable; theme_name on the record is display-only and refreshed on every staged save and theme listing.
  5. No or stale Shopify access token (session expired/never installed via OAuth/token revoked): theme listing impossible → both "no token" and "token rejected by Shopify" map to 409 no_shopify_session → degraded live-only editor mode; staged saves would 409 at the role-check step anyway. Documented, not silent, never a 500.
  6. Metafield size: see §7. The JSON-metafield write limit is the binding constraint end-to-end, and large merchants are already close to the future 128KB ceiling. New paths (staged save, deploy) fail fast at the ceiling with 422 (§3); the legacy live path is deliberately left byte-identical (oversize → 207, as today) to avoid regressing CSS-heavy shops, with a regression test pinning that.
  6a. App embed disabled on the staging theme: app embeds are enabled per theme. A theme duplicated from the live theme inherits the Marqo embed's enablement (the normal redesign workflow, which works out of the box), but a freshly installed theme-store theme has it disabled → no #marqo-config meta tag → staged settings can't be previewed there at all. The editor's preview link section shows a hint ("If search doesn't appear, enable the Marqo Search app embed for this theme in the Shopify theme editor"). Detecting enablement programmatically (reading the theme's config/settings_data.json via the Asset API) is out of scope for v1.
  7. Two editors staging the same theme: protected by the per-(pk,sk) optimistic lock from the staged path's first release — staged saves are versioned from birth (§3 step 4), so the second writer gets the canonical settings_conflict 409 and refreshes. Unlike the live record, staged records never pass through an unguarded last-writer-wins phase.
  8. Shopify.theme JS global absent/blocked: irrelevant — resolution is Liquid-side; the JS global is never load-bearing.
  9. Markets/locale domains: localization.* attributes in the embed are orthogonal; theme resolution is per-rendering-theme regardless of market.
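
Edge cases 3 and 4 both fall out of how GET /themes joins Shopify's theme list to the staged records. The following is a minimal sketch of that join, assuming themes arrive as dicts with `id`, `name`, and `role`, and that staged records are reduced to a `theme_id → stored theme_name` mapping. `ThemeEntry` and `join_themes_with_records` are hypothetical names, not the plan's actual types.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class ThemeEntry:
    theme_id: int
    name: str
    role: str
    has_staged_settings: bool


def join_themes_with_records(
    shopify_themes: Iterable[dict], staged_names: Dict[int, str]
) -> List[ThemeEntry]:
    """Join Shopify themes to staged records keyed by the stable theme_id."""
    entries: List[ThemeEntry] = []
    seen = set()
    for theme in shopify_themes:
        theme_id = int(theme["id"])
        seen.add(theme_id)
        entries.append(
            ThemeEntry(
                theme_id=theme_id,
                # The name comes from Shopify; the stored name is display-only.
                name=theme["name"],
                role=theme["role"],
                has_staged_settings=theme_id in staged_names,
            )
        )
    # Staged records whose theme no longer exists surface as role "deleted".
    for theme_id in sorted(set(staged_names) - seen):
        entries.append(
            ThemeEntry(
                theme_id=theme_id,
                name=staged_names[theme_id],
                role="deleted",
                has_staged_settings=True,
            )
        )
    return entries
```

Because the join key is `theme_id`, a rename changes only the displayed name, and an orphaned record stays visible so the editor can offer its delete-only affordance.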

7. Size limits & sharding trajectory (the 400KB question)

Raynor asked all settings planners what happens when settings outgrow DynamoDB's 400KB item cap, and whether split-records (shards) are needed. This plan's angle, aligned with the statements in settings-concurrency-control.md §6.1 and settings-versioning.md §"sharding":

Per-theme records multiply record COUNT, not item size. Each staged record is a full settings document in its own DDB item with its own independent 400KB budget and its own lock counter. The largest real merchant payload today is ~120KB (Muji CA; ~30% of the DDB cap). This feature adds zero new pressure on any single item's size.

The binding constraint end-to-end is the Shopify metafield mirror, not DDB. Most metafield types cap at 64KB, but json-type values (what search_settings uses, settings_service.py:183-187) are currently 2MB for apps that used JSON metafields before April 2026 (grandfathered — this app qualifies), dropping to 128KB per write on API version 2026-04+ per the Shopify changelog ("Reduced metafield value sizes") — confirm the exact effective version and grandfathering scope during the implementation spike (step 1). Two API-version pins matter and they differ: the GraphQL client performing the metafield writes pins ShopifyAPI.VERSION = "2024-01" (constants/shopify.py:11) — the pin that governs the write-path limit — while all 16 shopify.app.*.toml manifests declare api_version = "2025-04" (webhook/extension surface). Bumping ShopifyAPI.VERSION past 2026-04 is the explicit 128KB tripwire; that bump must not happen without checking per-shop mirror sizes against the new cap. Since the bump is eventually inevitable, 128KB is the design ceiling — and the largest merchant payload (~120KB, Muji CA) is already at ~94% of it. Ordering of walls: metafield 128KB → DDB 400KB → TransactWriteItems 4MB (the deploy transaction is 2 items + 1 condition check, ~240KB worst case today — never binding). The per-theme metafield keys help here: each staged theme gets its own 128KB budget instead of sharing one value (and Plan B in §2 would forfeit exactly that).

Consequences adopted in this plan: the new-path 422 guard (§3) checks the serialized mirror JSON against the 128KB ceiling (not DDB's 400KB, which the metafield wall makes unreachable); implementation should add a warn-level size log/metric at 100KB per (shop, theme) alongside the existing size log (settings_service.py:178-180), complementing the concurrency plan's 300KB DDB alarm.
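
The guard and the warn threshold described above might look like the sketch below. It assumes 1KB = 1024 bytes, that the limit applies to the UTF-8 byte length of the serialized JSON, and compact serialization. All three should be confirmed in the step 1 spike. `MirrorTooLargeError`, `mirror_size_bytes`, and `check_mirror_size` are hypothetical names.

```python
import json
import logging

logger = logging.getLogger(__name__)

# Design ceiling from this section; unit and counting rules are assumptions.
METAFIELD_CEILING_BYTES = 128 * 1024
MIRROR_WARN_BYTES = 100 * 1024


class MirrorTooLargeError(ValueError):
    """Raised before any DDB write; a route would map this to a 422."""


def mirror_size_bytes(settings: dict) -> int:
    """UTF-8 byte length of the compactly serialized mirror JSON."""
    serialized = json.dumps(settings, separators=(",", ":"), ensure_ascii=False)
    return len(serialized.encode("utf-8"))


def check_mirror_size(settings: dict, *, shop: str, theme_id: str) -> int:
    """Fail fast above the ceiling and warn once the payload nears it."""
    size = mirror_size_bytes(settings)
    if size > METAFIELD_CEILING_BYTES:
        raise MirrorTooLargeError(
            f"mirror is {size} bytes; ceiling is {METAFIELD_CEILING_BYTES}"
        )
    if size >= MIRROR_WARN_BYTES:
        logger.warning(
            "settings mirror near ceiling: shop=%s theme=%s bytes=%d",
            shop, theme_id, size,
        )
    return size
```

Under these assumptions a 120KB payload sits at 120/128 = 93.75% of the ceiling, so it passes the guard but already logs the warning.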

Sharding: defer, with the shared trigger. This plan adopts the cross-plan trajectory verbatim: single item per scope now; if a shop crosses the 300KB DDB alarm, move to manifest + N shard items behind a settings_schema_version bump with the lock counter on the manifest only — the key shape extends naturally (SETTINGS#THEME#{id} stays the root/manifest; shards would be SETTINGS#THEME#{id}#SHARD#{n}), and expected_live_version semantics are unaffected because the version always lives on exactly one root item per scope. But note the honest ordering above: a payload big enough to need DDB sharding has already broken the single-metafield distribution model at 128KB, so the realistic trigger is metafield growth, and the realistic response is splitting the mirror (per-component metafields, or serving settings from CDN/KV instead of metafields) — alarmed here, designed when actually approached, out of scope for this plan.
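
To make the key shapes concrete, here is a small sketch of sort-key helpers. The three accepted shapes (live, per-theme, deploy backup) come from the test plan. The shard key is only the future extension named above and is deliberately rejected by the validator. Helper names are hypothetical, and numeric theme ids are assumed.

```python
import re

LIVE_SK = "SETTINGS"
BACKUP_SK = "SETTINGS#DEPLOY#BACKUP"
_THEME_SK = re.compile(r"SETTINGS#THEME#(\d+)")


def theme_sk(theme_id: int) -> str:
    """Root (and future manifest) key for one theme's staged settings."""
    return f"SETTINGS#THEME#{theme_id}"


def shard_sk(theme_id: int, shard: int) -> str:
    """Future shard key: extends the root key and shares its prefix."""
    return f"{theme_sk(theme_id)}#SHARD#{shard}"


def is_valid_settings_sk(sk: str) -> bool:
    """Accept only the three shapes in use today; shards are not valid yet."""
    return sk in (LIVE_SK, BACKUP_SK) or _THEME_SK.fullmatch(sk) is not None
```

Since the listing GSI matches `sk = "SETTINGS"` by equality, none of the prefixed keys can appear in account-level listings.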

Cross-plan interfaces

Interface proposals were exchanged and confirmed by all three teammates (2026-06-10/11); the agreed contracts below are also recorded in their plans' cross-plan sections.

  • settings-concurrency (docs/plans/settings-concurrency-control.md, confirmed): locking is strictly per (pk, sk) record — SETTINGS#THEME#{id} inherits the mechanism unchanged with its own independent counter. Adopted contract: storage attribute record_version (exposed as version in API payloads), legacy/absent ≡ version 0 via attribute_not_exists, helpers _version_condition(expected_version) / put_item_versioned / update_item_versioned on BaseRepository, and canonical save_settings(settings, *, expected_version, change_source, extra_transact_items=None) — deploy's backup Put and optional staged ConditionCheck ride extra_transact_items. Conflicts raise SettingsConflictError → 409 with {detail: {code: "settings_conflict", expectedVersion, currentVersion, lastUpdated, updatedBy, changeSource}}. Agreed rules: deploy routes through SettingsService.create_or_update_settings(..., expected_version, change_source="theme_deploy") (value added to their enum) rather than the repo; the source counter is never copied onto the target — live increments from its own value; deploy responses return the new live version. Deploy depends on their Phase 1 (server-side only, independently deployable); they rejected last_updated as a condition attribute and this plan follows that rejection.
  • settings-versioning (docs/plans/settings-versioning.md, confirmed): history lives in a dedicated SHOPVER#{domain} partition (NOT under SHOP#, for partition-sweep isolation), sk live#{record_version:010d} / theme#{theme_id}#{record_version:010d}. Per-record histories cannot interleave, and theme history numbering is per-theme because each theme record has its own counter. Deploy event mapping (their schema, confirmed): the deploy/save distinction is event_type="deploy" (NOT change_source, which the concurrency plan owns as the writing-path label); this plan's source_theme_id → source_scope="theme:{theme_id}" + source_version=<staged record_version deployed>; deployed_by → author_id (canonical actor string). Capture hooks inside SettingsService.create_or_update_settings, whose signature they're extending with the deploy kwargs, so deploy gets history capture for free by routing through it (§3 step 3). Their v1 captures live-scope only; wiring capture for saves to staged theme records is a small extension owned by this plan (a target-record param on the shared path or a direct SettingsVersionService.capture call; converge at implementation). Their restore is content-fields-only (the same rule this plan's rollback follows) and supersedes the rolling backup + rollback endpoint when it lands.
  • storefront-admin-auth (docs/plans/storefront-admin-sso.md, confirmed; scope names adopted verbatim in their §4.5/§8.1 matrix): staged save + theme-record delete → settings:write; direct live save (no theme_id) → settings:write + settings:deploy_live; POST /deploy → settings:deploy_live only (independent-flags accepted; promote-but-not-edit persona works); POST /deploy/rollback → both (aligned with their stricter restore-to-live rule). Sessions carry both scopes (interactive auth + this plan's confirm dialogs as the gate); CLI tokens get deploy_live only via warning-gated opt-in; enforcement is immediate when scoped tokens land (resolves Open question 2). Actor identity for deployed_by/audit fields: user:{cognito_sub} | token:{token_id} | api_key:{system_account_id}. Legacy raw API keys: wrapped by their new authenticate_storefront_request dependency with all settings scopes and shops=('*',) (expressed in code on the legacy branch, no data migration), persisting through their STOREFRONT_LEGACY_KEYS allow→warn→deny ratchet, so existing API-key callers (including deploy) keep working until deny.
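
The version-condition rule in the concurrency contract (legacy/absent ≡ version 0 via attribute_not_exists) can be sketched as below. This is illustrative only: the real `_version_condition` lives on BaseRepository in the concurrency plan. The dict shape mirrors boto3 Table-style keyword arguments, which is an assumption about how the repository issues writes. `conflict_detail` is a hypothetical helper that reproduces the agreed 409 payload.

```python
from typing import Any, Dict, Optional


def version_condition(expected_version: int) -> Dict[str, Any]:
    """Build DynamoDB condition kwargs for an optimistic-lock write.

    Version 0 means the record has never been versioned (or does not exist),
    so the condition is that the record_version attribute is absent.
    """
    if expected_version < 0:
        raise ValueError("expected_version must be >= 0")
    if expected_version == 0:
        return {"ConditionExpression": "attribute_not_exists(record_version)"}
    return {
        "ConditionExpression": "record_version = :expected",
        "ExpressionAttributeValues": {":expected": expected_version},
    }


def conflict_detail(
    expected_version: int,
    current_version: int,
    last_updated: Optional[str],
    updated_by: Optional[str],
    change_source: Optional[str],
) -> Dict[str, Any]:
    """Body for the canonical settings_conflict 409 response."""
    return {
        "detail": {
            "code": "settings_conflict",
            "expectedVersion": expected_version,
            "currentVersion": current_version,
            "lastUpdated": last_updated,
            "updatedBy": updated_by,
            "changeSource": change_source,
        }
    }
```

The same condition serves the first staged create and a deploy onto a legacy version-0 live record, which is why both paths need no special casing.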

Test plan

Per CLAUDE.md, tests are a completion precondition for every new branch and error path.

Backend (pants test //components/shopify/admin_server::)

  • Repository: theme record round-trip with new sk; list_theme_settings returns only SETTINGS#THEME#* (excludes live and SETTINGS#DEPLOY#BACKUP); list_settings_by_system_account excludes theme and backup records; delete removes only the targeted record; the deploy transaction (canonical save_settings + extra_transact_items) on condition failure surfaces SettingsConflictError with neither item written.
  • Service: first staged save copies live (created_from="live"); staged save without a live record → no_live_settings error; staged merge semantics match live merge; save_settings_to_metafields writes search_settings_theme_{id} for staged and search_settings for live; over-limit payload on staged save fails fast (422 semantics) before the DDB write; staged save with a stale version → canonical settings_conflict 409 with no write, and first staged create succeeds via attribute_not_exists(record_version); regression: a near-ceiling legacy live save still succeeds exactly as today — fixture parameterized at ~95% of the 128KB metafield design ceiling (~122KB, mirroring the largest real payload, ~120KB Muji CA) and derived from the same constant the 422 guard uses, so the test tracks the ceiling if it changes (DDB written, 207 on metafield failure, no 422); deploy copy preserves active_index/system_account_id/cell_id and stamps deploy audit fields; sk validator accepts the three shapes and rejects others.
  • Routes (storefront_routes_test.py): GET /themes happy path (mocked GraphQL), no-token 409, stale-token (Shopify 401) → 409 not 500, has_staged_settings join; GET /settings?theme_id exists/not-exists meta shapes and live fallback payload; POST /settings?theme_id rejects MAIN-role theme (409), unknown theme (404), missing live record (409), writes correct record + metafield key, 207 partial with the "not previewable" message on metafield failure, no public-metafield writes on staged save; legacy no-param save byte-identical behavior (regression); deploy: success writes backup + live transactionally with incremented record_version, missing staged record 404, record_version mismatch → 409 with neither live nor backup mutated, legacy version-0 live record deploys via attribute_not_exists, metafield failure → 207 deployed_not_live, re-run converges content, staged record retained, deployed live record preserves active_index/system_account_id/cell_id/metadata from the current live record; rollback: restores backup content onto current live, preserves an active_index/metadata changed after the deploy (the clobber test — must fail against a verbatim-swap implementation), swaps current live content into backup (rollback is reversible), 404 with no backup, 409 on conflict; staged + backup records carry no system_account_id and never appear in GSI_SystemAccountId; DELETE removes record + metafield and tolerates missing metafield.
  • GraphQL: theme query GID→numeric id extraction.
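
For the GID→numeric id extraction test, a minimal sketch of the parser follows. It assumes Shopify's usual `gid://shopify/<Type>/<numeric id>` form. The `OnlineStoreTheme` type in the usage note is an assumption to confirm against the actual theme query response, and `theme_id_from_gid` is a hypothetical name.

```python
import re

_SHOPIFY_GID = re.compile(r"gid://shopify/(?P<type>[A-Za-z]+)/(?P<id>\d+)")


def theme_id_from_gid(gid: str) -> int:
    """Extract the numeric id from a Shopify global id.

    Raises ValueError on anything that is not a plain gid with a numeric
    tail, so a malformed id can never be used to build a sort key.
    """
    match = _SHOPIFY_GID.fullmatch(gid)
    if match is None:
        raise ValueError(f"not a Shopify gid: {gid!r}")
    return int(match.group("id"))
```

For example, `theme_id_from_gid("gid://shopify/OnlineStoreTheme/123456789")` returns `123456789`, the value that would then feed the `SETTINGS#THEME#{id}` sort key and the theme-keyed metafield name.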

Editor (npm test in components/storefront_admin)

  • api-client: themeId query param threading, deploy/rollback/delete/listThemes request shapes.
  • use-settings: target-keyed reload, dirty reset on target switch, switch-with-unsaved-changes prompt, staged saves send version from meta.version and surface the 409 conflict refresh prompt, deploy flow incl. fresh live fetch for expected_live_version, 409 conflict refresh path, 207 deployed_not_live banner state.
  • Components: theme picker rendering (live badge, staged dot, deleted-theme entry), live-save confirm dialog gating the save call, deploy dialog content + conflict banner + retry banner, rollback confirm, degraded no-session banner, embed-enablement hint.

Liquid/widget (manual + e2e)

  • Dev store smoke test (step 1 of implementation): duplicate theme, write a theme-keyed metafield by hand, verify dynamic-key lookup and role-based selection in preview vs. live, verify data-settings-source. Then an e2e scenario in components/shopify/e2e_tests (CI): staged save → preview URL shows staged CSS, live URL unchanged → deploy → live shows it. Loader change (readConfigFromDom new attributes) covered by existing storefront_search vitest patterns if present, else by the e2e.

Implementation order

  1. Liquid feasibility spike on a dev store — load-bearing assumptions, fail fast; fall back to Plan B (§2) if it fails. Checklist: (a) theme.id/theme.role are available inside a target: "head" app-embed block specifically (global-object docs say yes; verify in this exact rendering context); (b) dynamic bracket lookup shop.metafields.marqo[var]; (c) role-based selection behaves correctly under ?preview_theme_id= and the theme-editor preview; (d) confirm the JSON-metafield write limit effective for ShopifyAPI.VERSION (§7).
  2. Model + repository + service (staged records, sk validator, metafield key param, new-path size guard, list_shop_sessions sk-prefix hygiene fix) + tests. The staged save methods build on concurrency Phase 1's repo primitives (§5) — read paths and record shapes do not.
  3. Theme GraphQL query + GET /themes + tests.
  4. Extended GET/POST settings routes + tests (staged POST blocked on concurrency Phase 1, as step 2).
  5. Deploy + rollback + DELETE endpoints (canonical conditional save + extra_transact_items backup; content-merge rollback; net-new metafieldsDelete mutation + ShopifyService.delete_metafield + userErrors test per §3) + tests. Blocked on settings-concurrency Phase 1, as are the staged write paths in steps 2 and 4 (§5); steps 1, 3, 6, the read paths, and the editor work minus save/deploy wiring are not.
  6. Theme extension change (Liquid + loader debug attrs) + e2e.
  7. Editor: client + hook refactor, theme picker, guarded save, deploy/rollback dialogs + tests.
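
Step 5's content-merge rollback is the piece most likely to be implemented as a verbatim swap by mistake, so a sketch of the intended merge may help. It assumes the content fields are `ui_components`, `selector_components`, and `configuration`; the exact field list is an assumption to settle at implementation. Records are treated as plain dicts, and `content_merge_rollback` is a hypothetical name.

```python
import copy
from typing import Tuple

# Assumed content fields; index association and audit fields are not content.
CONTENT_FIELDS = ("ui_components", "selector_components", "configuration")


def content_merge_rollback(live: dict, backup: dict) -> Tuple[dict, dict]:
    """Swap only content fields between the live record and the backup.

    Everything else on the live record (for example active_index or
    metadata changed after the deploy) is preserved. The previous live
    content lands in the backup, so running the rollback again reverses it.
    """
    new_live = copy.deepcopy(live)
    new_backup = copy.deepcopy(backup)
    for field in CONTENT_FIELDS:
        new_live[field] = copy.deepcopy(backup.get(field))
        new_backup[field] = copy.deepcopy(live.get(field))
    return new_live, new_backup
```

The clobber test in the test plan corresponds to asserting that a non-content field changed after the deploy survives this function, which a whole-record swap would fail.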

Out of scope

  • Per-theme public metafields (searchBase, indexId, storefrontAccessToken, baseCurrency, cdnBase) — infrastructure config, theme-invariant.
  • Staging for index/search settings (x-marqo-settings-override already serves tuning), merchandising rules, or promo metaobjects.
  • Per-theme widget bundle versions (settings only).
  • Embedded Shopify app settings page (settings_routes.py) and the deprecated admin_ui ui-customization page — they keep their current live-write behavior; adding the live-save warning there is a fast follow.
  • Visual diff of staged vs. live settings (versioning plan owns history/diff UX; v1 deploy dialog shows a coarse component-count diff only).
  • Automatic deploy on Shopify theme publish events (explicitly rejected — publish must never imply settings promotion).

Open questions

  1. storefront_routes.py:160,218,233 claims settings propagate via "DDB stream → KV export", but no such path exists for ShopifyEntities (verified: ecom_settings_exporter/lambda_function.py:106,156 reads only INDEX_SETTINGS_TABLE_NAME). The 207 response messages promising KV propagation are therefore misleading today. Fixing those two legacy messages is owned by the settings-concurrency plan (its §4.5 touches that route); this plan only ensures its new messages don't repeat the claim. If a ShopifyEntities→KV export is ever built, staged/backup records must be excluded from it (export only sk=SETTINGS). To confirm with Raynor.
  2. Should settings:deploy_live on the no-theme_id POST be enforced immediately when the auth plan lands? Resolved with storefront-admin-auth: enforcement is immediate for scoped tokens; legacy raw keys are exempt via their dependency's legacy branch until the STOREFRONT_LEGACY_KEYS ratchet reaches deny (see Cross-plan interfaces).