Storefront Admin: SSO with Marqo Console Accounts + Scoped Tokens for CLI

Status: Draft for review (Raynor)
Owner: storefront-admin-auth (feature-plans team)
Date: 2026-06-10
Related plans: settings concurrency control (task #1), settings versioning (task #2), theme-targeted deploys (task #4) — see Cross-plan interfaces

Summary

Replace "paste a raw, never-expiring, admin-scope API key into a login form and keep it in localStorage" with:

SSO: log in to storefront_admin with a Marqo cloud console account (Cognito email/password or Google/GitHub via Cognito Hosted UI), reusing the Controller's existing identity stack and building directly on the in-flight worktree-feat+cognito-login branch.
Sessions: short-lived, scoped session JWTs held in an httpOnly cookie; the React Router 7 worker proxies admin_server calls and guards routes in loaders. Raw API keys and Cognito tokens never sit in localStorage.
CLI tokens: revocable, expiring, scope- and shop-limited tokens minted in the storefront_admin UI, stored hashed in DynamoDB, replacing raw --api-key usage in scripts/ecom/storefront_settings.py and agent tooling.
Scopes: settings:read, settings:write, settings:deploy_live, tokens:manage, plus per-shop scoping, enforced by a new unified auth dependency in admin_server. Legacy raw-API-key callers keep working unchanged during a metered deprecation.

1. Current state (verified, with evidence)

1.1 storefront_admin (the editor)

React Router 7 app on a Cloudflare Worker (components/storefront_admin), deployed at shopify.marqo-ep.ai (prod), staging-shopify.dev-marqo.org, preprod-shopify.dev-marqo.org (components/storefront_admin/wrangler.toml). No KV/DO bindings; the worker only validates env (Zod) and renders (workers/app.ts:17-41, app/.server/env.ts:7-13 — only ENV, FULL_ENV, ADMIN_SERVER_BASE_URL).

Login = paste raw Marqo API key, validate by calling listShops(), pick a shop (app/routes/_index.tsx:36-97).
Auth state = three localStorage keys including the raw API key; isAuthenticated: !!storedKey (app/hooks/use-auth.ts:11-13,26-40). No expiry, no server-side session, no loader guards — the editor route is gated client-side only.
API calls go browser → admin_server directly with Authorization: Bearer <raw key> (app/lib/api-client.ts:28-31), against https://admin.ecom.marqo.ai/api/v1/storefront/*.
Live preview queries the search proxy directly with the read-only x-marqo-index-id credential, not the API key (README "How the live preview works") — unaffected by this plan.

1.2 admin_server (the API)

Storefront routes (components/shopify/admin_server/admin_server/routes/storefront_routes.py) authenticate with authenticate_api_key_request (auth/dependencies.py:220-274): decrypt the key locally (3DES-ECB, secret from Secrets Manager — utils/api_key_utils.py) to get {system_account_id, cell, token}, then validate against the per-cell legacy control plane POST /account/key/validate through the IAM-signed ControlPlaneGateway (components/ecom_utils/ecom_utils/control_plane/gateway.py:30-120).
No scope check exists in admin_server. Any valid key gets full access to every storefront route and every ecom route (index delete, doc writes, etc.; see routes/ecom_routes.py). The validate response does return a scope field, and the Controller checks it (components/controller/account/authentication/api_key_auth.py:16,45-60, ALLOWED_SCOPES = {"read", "read_write", "admin"}), but admin_server ignores it.
Per-shop access control exists: resolve_storefront_shop 403s when the shop's system_account_id doesn't match the key's (admin_server/dependencies.py:725-755).
Author identity: writes stamp updated_by_user_id = f"api_key:{auth.system_account_id}" (routes/storefront_routes.py:174-179 → services/settings_service.py:60-89; field on models/shopify_entities.py:111). Every write by every person on an account is attributed to the same string.
CORS is allow_origins=["*"], allow_headers=["*"] (admin_server/main.py:66-70) — any website can make authenticated requests if it has a key.

1.3 Console identity (the thing to reuse)

The live console (cloud.marqo.ai) is the Django Controller (components/controller), backed by AWS Cognito:

Email/password sign-in → POST /account/signin; Google/GitHub SSO → Cognito Hosted UI code → POST /account/sso exchange (controller/account/views/sso.py).
Token verification: standard Cognito JWKS (controller/account/authentication/verification.py:23-75, AccessTokenVerifier fetching https://cognito-idp.{region}.amazonaws.com/{userpool_id}/.well-known/jwks.json), then cognito_service.get_user_complete + membership resolution (backend.py:22-110).
Users belong to accounts via memberships with roles OWNER/MEMBER/MERCHANDISER ("MERCH") and status PENDING/ACTIVE (controller/models/users_accounts.py:27-35,242-260). MERCH is a deliberately restricted role: the console gates merchandiser users into a limited view via the IS_MERCHANDISER feature flag (controller/account/feature_flagging_checks.py:126, with a standing TODO to move to proper RBAC).
The monolith identity_service in this repo (Cognito + GCP backends, sso() is a TODO at identity_service/index.py:81-83) is not in production use (root CLAUDE.md: "These are not currently used") — do not build on it.

1.4 In-flight prior art: `worktree-feat+cognito-login`

Branch worktree-feat+cognito-login (6 commits, dc4739966..928a05831) already implements Cognito login UX for storefront_admin:

app/lib/auth-client.ts (new): ControllerAuthClient — POST /account/signin, POST /account/sso (code exchange with redirect_uri), GET /api_keys?accountId=; pickBestApiKey() prefers admin > read_write > any; resolveApiKeyAndShops() fetches the raw API key and shop list.
app/routes/auth.callback.tsx (new): OAuth callback; OAuth state CSRF protection; single state across Google/GitHub buttons.
use-auth.ts: stores Cognito token + account id + raw API key in localStorage; treats Cognito token expiry as session anchor.
Controller side: _validate_redirect_uri checks the redirect origin against CORS_ORIGIN_WHITELIST (account/views/sso.py), config additions per env, tests (test_sso.py).
wrangler.toml/env.ts: adds CONTROLLER_BASE_URL (e.g. https://cloud.marqo.ai/api), COGNITO_HOSTED_UI_URL (https://auth.controller.marqo.com), COGNITO_CLIENT_ID.

Gap: it solves the login UX but the security posture is unchanged — the session still degenerates to a raw admin key in localStorage, with no scopes, no expiry on the key, no revocation, and no server-side session. This plan adopts its Controller-side work and login UI wholesale, and replaces the "resolve raw API key" step with scoped session tokens.

1.5 CLI / agent usage today

scripts/ecom/storefront_settings.py (backup/push/diff helper, BASE_URL = "https://admin.ecom.marqo.ai/api/v1/storefront", line 36) takes --api-key <raw key>. The storefront CSS customization guide and agent workflows pass raw keys on the command line; keys end up in shell history, tmp files, and agent context/memory. This is the second consumer of the new tokens.

1.6 Internal-staff prior art (not reused, for contrast)

admin_worker/admin_lambda use Cloudflare Access JWTs validated by an API Gateway JWT authorizer (admin_worker/app/.server/gateway.ts:67-78; admin_lambda/admin_lambda/auth/auth.py). That stack authenticates Marqo staff via Cloudflare Zero Trust and is unsuitable for customers (they aren't in our CF org), but two patterns carry over: the worker-as-auth-proxy (gateway.ts promotes a credential into Authorization before forwarding) and the Zod refine that forbids local-dev escape hatches outside dev (admin_worker/app/.server/env.ts:36-38).

2. Threat model

Assets: shop settings (defacement of live storefront UX, malicious custom CSS/JS-adjacent injection into merchant sites), live theme deploys (task #4 — direct write access to a customer's published Shopify theme), Marqo index/account control (the same key works on /api/v1/indexes — deletion, data exfiltration), Shopify access tokens reachable through settings writes (metafield sync uses stored shop tokens, storefront_routes.py:182-208).

Actors: merchant users (per-account), Marqo staff/SE doing integrations, CI/agents running CLI scripts, attackers (XSS on the editor, stolen laptops/shell history, leaked keys in repos/docs/screen shares).

Today's weaknesses (ranked):

#	Weakness	Impact
W1	Raw admin-scope key in localStorage; any XSS in the editor (it renders merchant-controlled CSS/templates in a live preview) exfiltrates a credential that can delete indexes	Critical
W2	Keys never expire and revocation = deleting the key in the console, which breaks every other consumer of that key (widget provisioning, scripts) — so in practice nobody revokes	High
W3	No scopes at admin_server; the validate response's `scope` is ignored (`auth/dependencies.py:247-254`), so even a `read` key gets write access on storefront routes	High
W4	CLI keys in argv/shell history/agent memory; one paste into the wrong place = W1 without needing XSS	High
W5	No per-user identity — `updated_by_user_id` is `api_key:{account}` for everyone; no audit trail, no conflict attribution (blocks tasks #1, #2)	Medium
W6	admin_server sets `allow_origins=["*"]` with `allow_credentials=True` (`main.py:66-71`). Starlette reflects the request Origin when credentials are enabled, so any origin can already make credentialed cross-site requests to admin_server. Today nothing is exploitable only because auth is a Bearer header (which an attacker page can't auto-attach without the key) and no cookie auth exists. This config must be fixed in the same PR that introduces cookies (§7 step 3), not after	Medium
W7	API key cipher is 3DES-ECB (`utils/api_key_utils.py`) — legacy format, out of scope to replace here, but new tokens must not inherit it	Low (here)

Design responses: W1 → httpOnly cookie + worker proxy, short-lived session; W2/W4 → dedicated revocable CLI tokens, key stays in the console vault; W3 → scope enforcement dependency; W5 → actor identity on every request; W6 → CORS allowlist shipped in the same PR as cookie auth; W7 → new tokens are asymmetric-JWT/SHA-256, no new uses of the 3DES path.

3. Goals and non-goals

Goals

Console account (Cognito) login to storefront_admin — email/password and Google/GitHub SSO.
No raw API key or Cognito token in localStorage; sessions expire and are server-verifiable.
Scoped, shop-bound, revocable, expiring tokens for CLI/agents, self-served from the storefront_admin UI.
Scope + shop enforcement in admin_server on all /api/v1/storefront/* routes (and exported for task #4's deploy routes).
Per-request actor identity available to versioning (task #2) and concurrency (task #1).
Zero-lockout rollout: legacy API-key auth keeps working until metrics show it's unused, and remains as a documented break-glass path.

Non-goals (out of scope)

Changing auth on non-storefront admin_server routes (/api/v1/indexes, docs, sync, collections) or the Shopify-embedded-app session auth.
Replacing the 3DES API-key format or the per-cell /account/key/validate contract.
Console (cloud.marqo.ai) UI changes beyond Cognito app-client callback config + CORS allowlist entries (already drafted on the cognito-login branch).
Cognito custom claims / groups; SCIM; fine-grained roles beyond OWNER/MEMBER passthrough.
Registering the token prefix with GitHub secret scanning (worth doing later; noted, not planned).
search_proxy auth (the preview's x-marqo-index-id read-only credential is unchanged).

4. Recommended design

4.1 Why reuse the console Cognito identity (and not invent anything)

It's the only production customer IdP we have; users already exist, with verified emails, MFA-capable Cognito pools, and Google/GitHub federation via the Hosted UI (auth.controller.marqo.com).
The Controller-side enablement (redirect-uri allowlisting, SSO code exchange for an external origin) is already written and reviewed on worktree-feat+cognito-login — we adopt those commits rather than re-deriving them.
Cognito access tokens are standard JWTs verifiable offline via JWKS; the verification pattern already exists in-repo (controller/account/authentication/verification.py:23-75) and admin_server can copy it without calling Cognito on the hot path.
Alternatives rejected: Cloudflare Access (staff-only, customers not in our ZT org); building a new IdP in identity_service (explicitly not in production, sso() unimplemented); passing Cognito tokens straight to admin_server on every request (Cognito tokens carry no scopes/shops/account claims, forcing a membership lookup per request and making CLI tokens a separate mechanism anyway — one token format the server mints itself is simpler and uniform).

4.2 Login flow

Browser                storefront_admin worker          Controller (cloud.marqo.ai)      admin_server
   |                            |                                |                           |
   |--- GET / (login page) ---->|                                |                           |
   |  [Google/GitHub] redirect to Cognito Hosted UI (state=CSRF nonce, redirect_uri=/auth/callback)
   |--- GET /auth/callback?code=... -->|                         |                           |
   |                            |-- POST /account/sso (code, redirect_uri) ->|               |
   |                            |<------- cognito access token, account list ---------------|
   |                            |-- POST /api/v1/storefront/auth/session ------------------>|
   |                            |   {cognito_token, account_id}   verifies JWKS sig locally  |
   |                            |                                 verifies membership via    |
   |                            |                                 Controller, mints session JWT
   |                            |<-------------------- {session_jwt, shops, email} ---------|
   |<-- Set-Cookie: __sft_session=<jwt>; HttpOnly; Secure; SameSite=Lax; redirect /editor    |

Email/password follows the same shape: the login form posts to a worker action, which calls Controller POST /account/signin server-side, then the same auth/session exchange. The password transits the worker but is never stored; this matches what the console itself does and what the cognito-login branch does client-side — moving it server-side keeps credentials out of browser JS.
Multi-account users: Controller signin/SSO responses include the account context (get_user_accounts_and_selected, controller/account/views/account_data.py); if the user has >1 account, the worker shows an account picker before the auth/session exchange (replacing today's shop picker step at the same spot in the UX; shop picking stays after login as today).
OAuth state: keep the branch's CSRF nonce, but store it in a short-lived httpOnly cookie set by the worker (not localStorage), compared in the callback action. This cookie must be SameSite=Lax and Max-Age ≤ 10 minutes — not Strict: the return from the Cognito Hosted UI is a cross-site top-level navigation, and Strict would withhold the cookie exactly when the callback needs it.

New admin_server endpoint — POST /api/v1/storefront/auth/session:

1. Verify the Cognito access token signature/expiry/client_id/issuer against the pool's JWKS (new admin_server/auth/cognito_verifier.py, ported from controller/account/authentication/verification.py; JWKS cached with TTL; pool id + client id from env). Fail fast on any mismatch (CLAUDE.md: no silent fallbacks).
2. Resolve membership authoritatively via a new, thin Controller endpoint, GET /account/memberships, which is a committed PR2 deliverable, not a fallback. No existing Controller endpoint fits: sso.py and the account-context view (get_user_accounts_and_selected, controller/account/views/account_data.py) return only the user's selected account, not the membership set, and nothing confirmed today takes a raw Cognito access token and returns memberships. The new view authenticates the user's own Cognito access token through the existing CognitoAuthentication backend (controller/account/authentication/backend.py:22) and returns every membership as {account_id, system_account_id, role, cell_id}. Note the join this requires: membership records carry only {cognito_username, visible_account_id, role, status} (BaseMembershipData, controller/models/users_accounts.py:242-249); neither system_account_id nor cell_id is on them, and both live on AccountData (users_accounts.py:174-181). The view therefore does a per-account AccountData read for each membership (N+1 reads at login time, which is acceptable because this endpoint is never on the editor's request path). That same join is the authoritative account_id ↔ system_account_id mapping source: AccountData carries both visible_account_id and system_account_id, confirming they are distinct identifiers whose mapping is held on the account record (feeds step 3's parity gate). admin_server requires that the requested account_id is in that set and that the membership's status == ACTIVE (MemberStatus, users_accounts.py:33-35; a PENDING invitee must not mint a session); the request param is a selector, never trusted. The membership's role feeds the role→scope mapping in step 4. Without this endpoint, session issuance does not ship: there is no "trust the client's account_id" interim mode.
3. Take system_account_id (and cell_id) for the JWT only from the matched membership record returned by step 2, never from client input. Whether console account_id and system_account_id are the same identifier is unresolved (the cognito-login branch lists API keys by account_id; resolve_storefront_shop and the settings GSI key by system_account_id); this is settled by a PR2 design spike with a test gate: an integration test (staging data) asserting that GET /shops under a minted session returns exactly the same shop list as under a legacy API key of the same account. PR2 does not merge until that parity test passes.
4. Mint the session JWT (4.4) with scopes derived from the membership's role (role→scope mapping, 4.4A; unknown roles fail closed) and return it with the user's shop list.

This is a login-time exchange — Controller/Cognito are not on the per-request path. If the Controller is down, existing sessions keep working; only new logins fail (and the break-glass path in §7 still works).
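
The membership check in the exchange above can be sketched as a pure function. This is illustrative only: the Membership class, MembershipError, and select_membership are hypothetical names, and the status field is assumed to be available alongside the {account_id, system_account_id, role, cell_id} shape (the plan requires an ACTIVE check but does not say how status reaches admin_server).

```python
from dataclasses import dataclass


class MembershipError(Exception):
    """No usable membership; a caller would map this to a 403."""


@dataclass(frozen=True)
class Membership:
    account_id: str
    system_account_id: str
    role: str
    cell_id: str
    status: str  # assumption: "ACTIVE" or "PENDING" is returned with each membership


def select_membership(memberships: list[Membership], requested_account_id: str) -> Membership:
    """Treat the requested account_id as a selector only; fail closed otherwise."""
    for membership in memberships:
        if membership.account_id != requested_account_id:
            continue
        if membership.status != "ACTIVE":
            # A PENDING invitee must not mint a session.
            raise MembershipError("membership is not ACTIVE")
        return membership
    raise MembershipError("requested account is not among the user's memberships")
```

The returned record, not the request body, would then supply system_account_id, cell_id, and role for the mint step.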

4.3 Session handling in the RR7 worker

Cookie: __sft_session, httpOnly, Secure, SameSite=Lax, Path=/, Max-Age = session lifetime. Stateless (the JWT is the session); no KV needed.
Loader guards: a shared requireSession(request, context) helper in app/.server/session.ts parses and verifies the cookie (ES256 signature check against the public verification key, a plain Wrangler var SESSION_JWT_PUBLIC_KEY; see 4.4A. The worker holds no signing material) and redirects to / (redirect("/")) when the cookie is missing, expired, or invalid. It is applied in the loaders of editor.tsx, editor.$section.tsx, and the new tokens route. The root loader exposes {email, accountId, shops} claims to the UI, so components stop reading localStorage for auth.
API proxying: new resource route app/routes/api.proxy.$.ts — the browser calls same-origin /api/proxy/<path>; the worker verifies the cookie, then forwards to ADMIN_SERVER_BASE_URL with Authorization: Bearer <session JWT>. This mirrors admin_worker's gateway promotion pattern (gateway.ts:67-78). ApiClient changes only its baseUrl and drops the key parameter. Methods allowlist: GET/POST/PUT/DELETE on /api/v1/storefront/* and /api/v1/account only.
CSRF: SameSite=Lax + same-origin proxy + JSON content-type checks on admin_server cover the cookie-auth CSRF surface. With Lax, the browser attaches the cookie to cross-site requests only on top-level navigations that use a safe method (GET), so no cross-site POST, PUT, or DELETE can carry it. The proxy additionally rejects requests whose Origin/Sec-Fetch-Site indicate cross-site.
Session refresh: the proxy transparently refreshes near-expiry sessions — when the cookie's JWT has < 30 minutes left, the worker calls POST /api/v1/storefront/auth/session/refresh (Bearer = the still-valid session JWT) and admin_server re-mints with the same claims, new exp/jti, preserving orig_iat; refresh is refused once now - orig_iat > 24h (absolute cap → forced re-login). Active users never see a logout mid-edit; idle sessions die within 2h.
Logout: clears the cookie; optional best-effort Controller logout. Session JWTs are not individually revocable; the mitigations are the short 2h lifetime + 24h cap (see 4.4A). A jti deny-list checked on the proxy was considered and rejected: at a 2h lifetime the added state buys little over the cap, and the deny-list itself becomes an availability dependency. The global kill switch is rotating the signing key in Secrets Manager, which logs out every user (4.4A).
CORS fix (same PR as cookies, §7 step 3): admin_server today runs allow_origins=["*"] with allow_credentials=True (main.py:66-71), which makes Starlette reflect any Origin on credentialed requests. The cookie itself never reaches admin_server: it is scoped to the worker's host (shopify.marqo-ep.ai) and the worker swaps it for a Bearer header, which is the actual cross-site safety property of this design. Even so, shipping cookie auth anywhere in the system while that CORS config stands is indefensible from a defense-in-depth standpoint, so the cookie-introducing PR (PR5) also changes admin_server CORS to an explicit origin allowlist (storefront_admin origins, admin app origins, localhost dev ports) and drops allow_credentials (nothing uses cookies against admin_server). The CSRF surface that does exist (the worker proxy, on the same host as the cookie) is covered by the SameSite=Lax + Origin/Sec-Fetch-Site checks above.

4.4 Token design

Two token types, one verifier (admin_server), one scope model.

A. Session JWT (UI sessions) — stateless, short-lived.

{
  "iss": "marqo-storefront-admin",
  "token_use": "session",
  "sub": "user:<cognito_sub>",
  "email": "raynor@marqo.ai",
  "account_id": "<console account id>",
  "system_account_id": "<system account id>",
  "cell_id": "<cell id>",
  "scopes": ["settings:read", "settings:write", "settings:deploy_live", "tokens:manage"],
  "shops": ["*"],
  "jti": "<uuid>",
  "iat": 1760000000,
  "orig_iat": 1760000000,
  "exp": 1760007200
}

ES256 (asymmetric). The private signing key lives in AWS Secrets Manager (STOREFRONT_SESSION_SIGNING_KEY_NAME) and only admin_server can sign. The worker verifies with the public key, delivered as a plain (non-secret) Wrangler var SESSION_JWT_PUBLIC_KEY — a compromised worker config can read sessions' claims but can never mint or alter one. Rotation: tokens carry a kid header; verifiers (admin_server and worker) accept the current + previous public keys, the signer uses current.
Lifetime 2h sliding, 24h absolute cap. Each JWT lives 2 hours; the worker proxy silently refreshes it when < 30 min remain (4.3) via POST .../auth/session/refresh, which re-mints with the same claims and the original orig_iat; refresh is denied past orig_iat + 24h. Rationale: this tool mutates live storefronts, sessions are not individually revocable, and the only global remedy (key rotation) logs out everyone — so the exposure window of a stolen cookie must be short. Idle theft window ≤ 2h; active-attacker window ≤ 24h; honest users re-login at most daily.
jti is logged with every write for audit and is offered to task #1 as a same-user-two-tabs discriminator; refresh issues a new jti (the chain is linkable via orig_iat + sub in logs).
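
A minimal sketch of the sliding-lifetime rule, assuming claims are handled as a plain dict with the fields shown in the example payload. needs_refresh and remint_claims are hypothetical names; signing is out of scope here.

```python
SESSION_LIFETIME_S = 2 * 60 * 60    # each JWT lives 2 hours
REFRESH_THRESHOLD_S = 30 * 60       # the proxy refreshes when < 30 minutes remain
ABSOLUTE_CAP_S = 24 * 60 * 60       # refresh is refused once now - orig_iat > 24 hours


def needs_refresh(claims: dict, now: int) -> bool:
    """Worker-side decision: the token is still valid but close to expiry."""
    remaining = claims["exp"] - now
    return 0 < remaining < REFRESH_THRESHOLD_S


def remint_claims(claims: dict, now: int, new_jti: str) -> dict:
    """Server-side re-mint: same claims, new iat/exp/jti, orig_iat preserved."""
    if now >= claims["exp"]:
        raise PermissionError("session expired; re-login required")
    if now - claims["orig_iat"] > ABSOLUTE_CAP_S:
        raise PermissionError("absolute session cap reached; re-login required")
    return {**claims, "iat": now, "exp": now + SESSION_LIFETIME_S, "jti": new_jti}
```

Whether the re-minted exp should additionally be clamped to orig_iat + 24h is not stated above; this sketch leaves it unclamped.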

Role→scope mapping at mint (UserRole, users_accounts.py:27-31): session scopes are derived from the membership role — interactive authentication alone does not grant everything:

Membership role	Session scopes
`OWNER`, `MEMBER`	`settings:read`, `settings:write`, `settings:deploy_live`, `tokens:manage`
`MERCHANDISER` (`"MERCH"`)	`settings:read`, `settings:write`, `tokens:manage` — no `settings:deploy_live`
any other / future role	mint refused (403, explicit error) — a role added in the Controller can never silently inherit full scopes

Rationale: MERCH is the console's restricted role (feature-flag-gated limited view, feature_flagging_checks.py:126), and settings:deploy_live mutates live storefronts — the most privileged operation in this surface. The CLI-token subset rule (4.4B) composes with this: a MERCH session can only mint tokens ≤ its own scopes, so MERCH can never produce a deploy-capable credential. Honest consequence (same shape as the §4.5 CLI-default note): until task #4's staged records exist, every settings save is a live mutation, so MERCH sessions are effectively read-only on settings; revisiting MERCH live-save once staged saves exist is a named follow-up (§11). The deploy UI's confirmation step (task #4's gate) remains on top for roles that do hold the scope. tokens:manage is session-only.
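
The mapping table above, sketched as code. The dict keys assume the stored role values are the strings "OWNER", "MEMBER", and "MERCH"; UnknownRoleError and scopes_for_role are hypothetical names.

```python
ALL_SESSION_SCOPES = frozenset(
    {"settings:read", "settings:write", "settings:deploy_live", "tokens:manage"}
)

# Role values are assumed to match the Controller's stored strings.
ROLE_SCOPES = {
    "OWNER": ALL_SESSION_SCOPES,
    "MEMBER": ALL_SESSION_SCOPES,
    "MERCH": ALL_SESSION_SCOPES - {"settings:deploy_live"},
}


class UnknownRoleError(Exception):
    """Mint refused; a caller would map this to a 403 with an explicit error."""


def scopes_for_role(role: str) -> frozenset[str]:
    """Return session scopes for a membership role, failing closed on unknown roles."""
    try:
        return ROLE_SCOPES[role]
    except KeyError:
        raise UnknownRoleError(f"no scope mapping for role {role!r}") from None
```

Using an explicit dict lookup rather than a default branch is what keeps a newly added Controller role from inheriting any scopes.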

Lambda runtime note: admin_server runs as Lambda via Mangum (run_lambda.py:5), so in-process caches (Cognito JWKS, key material from Secrets Manager) survive only per warm container. That's acceptable by construction: JWKS verification happens on the login/refresh path only (never per-request), so a cold start costs one JWKS + one Secrets Manager fetch on a login — not on editor traffic.

B. CLI token (PAT) — opaque, stored, revocable.

Format: mqsft_<token_id>_<secret> where token_id = 12-char base32, secret = 32 bytes urlsafe-base64. Distinct greppable prefix; never a JWT (nothing to decode offline, and a leak of the session signing key cannot be used to forge CLI tokens).
Storage: ShopifyEntities table (same table as settings, no new infra): PK=AUTHTOKEN#{token_id}, SK=DETAILS, attributes: secret_hash (SHA-256), system_account_id, account_id, scopes: list, shops: list, name, created_by (actor string of the creating session), created_at, expires_at (required, default 90d, max 365d), revoked_at?, last_used_at (updated via a conditional update_item that only fires when the stored value is older than one hour — concurrency-safe across parallel Lambda containers and bounds write load; a lost update here is cosmetic and acceptable). New Pydantic StorefrontAuthToken(RecordModel) in models/shopify_entities.py. GSI on system_account_id for listing (reuse the existing system-account GSI pattern used by list_settings_by_system_account).
Issuance UX: new "CLI tokens" page in storefront_admin (session auth, tokens:manage): create (choose name, scopes — deploy scope behind an explicit warning toggle, shops, expiry), list (name, scopes, shops, last used, expiry), revoke. Secret displayed exactly once at creation.
Endpoints (session-auth only; CLI tokens deliberately cannot mint or revoke tokens — no self-escalation):
- POST /api/v1/storefront/auth/tokens → {token: "mqsft_...", ...metadata}. Requested scopes must be ⊆ the creating session's scopes; requested shops ⊆ the session's shops.
- GET /api/v1/storefront/auth/tokens → metadata list (never secrets/hashes).
- DELETE /api/v1/storefront/auth/tokens/{token_id} → sets revoked_at.
Verification: prefix parse → DDB get by token_id → constant-time SHA-256 compare → reject if revoked_at set or expires_at past → build auth context. One DDB point-read per request; no cache in v1 (revocation is then immediate).
CLI usage: scripts/ecom/storefront_settings.py gains MARQO_STOREFRONT_TOKEN env-var support (preferred) while keeping --api-key working; the CSS customization guide and agent prompts switch to the env var. Same Authorization: Bearer header — admin_server distinguishes credential types by shape (4.6).
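
A stdlib-only sketch of the token format and verification order described above. Function names are hypothetical, the lowercase base32 alphabet is an assumption (the plan does not fix the casing), and the DynamoDB read is replaced by plain arguments.

```python
import hashlib
import hmac
import secrets
from typing import Optional

TOKEN_PREFIX = "mqsft"
# Lowercase RFC 4648 base32 alphabet; the casing is an assumption of this sketch.
_BASE32 = "abcdefghijklmnopqrstuvwxyz234567"


def generate_cli_token() -> tuple[str, str, str]:
    """Return (full_token, token_id, secret_hash); only the hash would be stored."""
    token_id = "".join(secrets.choice(_BASE32) for _ in range(12))
    secret = secrets.token_urlsafe(32)  # 32 random bytes as URL-safe base64 text
    secret_hash = hashlib.sha256(secret.encode("ascii")).hexdigest()
    return f"{TOKEN_PREFIX}_{token_id}_{secret}", token_id, secret_hash


def parse_cli_token(token: str) -> tuple[str, str]:
    """Split mqsft_<token_id>_<secret>; the secret itself may contain '_'."""
    prefix, sep, rest = token.partition("_")
    if prefix != TOKEN_PREFIX or not sep:
        raise ValueError("not a CLI token")
    token_id, sep, secret = rest.partition("_")
    if len(token_id) != 12 or not sep or not secret:
        raise ValueError("malformed CLI token")
    return token_id, secret


def verify_cli_secret(secret: str, stored_hash: str, *, revoked_at: Optional[int],
                      expires_at: int, now: int) -> bool:
    """Constant-time hash compare, then the revocation and expiry checks."""
    candidate = hashlib.sha256(secret.encode("utf-8")).hexdigest()
    if not hmac.compare_digest(candidate, stored_hash):
        return False
    if revoked_at is not None:
        return False
    return now < expires_at
```

Splitting on the first separator after the token_id matters because URL-safe base64 can itself contain underscores.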

4.5 Scope model

Scope	Grants	In sessions (OWNER/MEMBER; see 4.4A for MERCH)	Default in CLI tokens
`settings:read`	GET shops/settings/fields/defaults, `GET /api/v1/account`	yes	yes
`settings:write`	settings mutations that do not touch the live record: staged/theme-scoped saves, theme-record deletes (task #4), restore-to-staged (task #2)	yes	yes (can be unticked)
`settings:deploy_live`	any mutation of the live settings record (`sk=SETTINGS`) or the live `search_settings` metafield: today's plain `POST .../settings` (no `theme_id`), task #4's `POST /deploy`, restores targeting live	yes (UI adds per-action confirm)	no — explicit opt-in with warning
`tokens:manage`	create/list/revoke CLI tokens	yes	never (not grantable)

Shop scoping: shops claim/attribute — ["*"] (all shops of the account, tracks newly connected shops) or an explicit domain list. Enforced in resolve_storefront_shop (4.6): a shop must pass both the existing account-ownership check and the token's shop list.
Names are flat strings, validated against a closed registry (admin_server/auth/scopes.py, a frozenset + helpers) — unknown scope in a token record fails closed at verification (fail fast).
settings:deploy_live name and semantics are taken from the theme-deploys plan (docs/plans/theme-targeted-deploys.md, Cross-plan interfaces section): settings:write and settings:deploy_live are independent (deploy-without-edit is a valid reviewer persona), and the privileged scope gates live-record mutations, not a specific endpoint. Until task #4's staged-settings records exist, every settings POST is a live mutation, so a default CLI token (no settings:deploy_live) is read-only in practice until staged saves ship. The token-creation UI must say this explicitly so users minting a push-capable token know to tick the deploy scope.
Enforcement timing (theme-deploys' recommendation, adopted): scoped tokens are enforced immediately on landing; legacy API keys are exempt (all scopes except tokens:manage) during the §5 transition.
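
A sketch of the closed registry and the subset rule for CLI-token creation, assuming scopes and shops are passed as plain collections. validate_scopes and authorize_token_request are hypothetical names, not the contents of admin_server/auth/scopes.py.

```python
from typing import Iterable

KNOWN_SCOPES = frozenset(
    {"settings:read", "settings:write", "settings:deploy_live", "tokens:manage"}
)
SESSION_ONLY_SCOPES = frozenset({"tokens:manage"})  # never grantable to CLI tokens


def validate_scopes(scopes: Iterable[str]) -> frozenset[str]:
    """Fail closed on any scope name outside the registry."""
    requested = frozenset(scopes)
    unknown = requested - KNOWN_SCOPES
    if unknown:
        raise ValueError(f"unknown scope(s): {sorted(unknown)}")
    return requested


def authorize_token_request(requested_scopes: Iterable[str], session_scopes: Iterable[str],
                            requested_shops: Iterable[str],
                            session_shops: Iterable[str]) -> frozenset[str]:
    """Requested scopes and shops must both be subsets of the creating session's."""
    requested = validate_scopes(requested_scopes)
    if requested & SESSION_ONLY_SCOPES:
        raise PermissionError("tokens:manage is session-only")
    if not requested <= frozenset(session_scopes):
        raise PermissionError("requested scopes exceed the session's scopes")
    shops, allowed = set(requested_shops), set(session_shops)
    # A session limited to explicit domains cannot hand out a wildcard.
    if allowed != {"*"} and ("*" in shops or not shops <= allowed):
        raise PermissionError("requested shops exceed the session's shops")
    return requested
```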

4.6 Authorization enforcement in admin_server

New module admin_server/auth/storefront_auth.py. The full principal enum {console_user, cli_token, api_key, shopify_user} is defined exactly once, in admin_server/models/auth.py (the single-source file that also holds the actor grammar below); storefront auth uses a documented subset alias of it — storefront credentials can only ever be the first three, while shopify_user exists in the shared enum for the embedded-app surface (tasks #1/#2 author records). One definition, two views — implementers must not create a second enum:

# admin_server/models/auth.py (single source)
from typing import Literal

PrincipalType = Literal["console_user", "cli_token", "api_key", "shopify_user"]
# Subset produced by storefront credentials (no Shopify session auth on this surface):
StorefrontPrincipalType = Literal["console_user", "cli_token", "api_key"]

# admin_server/auth/storefront_auth.py
from pydantic import BaseModel

from admin_server.models.auth import StorefrontPrincipalType  # import path assumed from the file layout above


class StorefrontAuthContext(BaseModel):
    actor_id: str            # "user:<sub>" | "token:<token_id>" | "api_key:<system_account_id>"
    actor_display: str | None  # email | token name | None for legacy keys
    principal_type: StorefrontPrincipalType  # subset of the shared PrincipalType (models/auth.py)
    principal_id: str        # cognito_sub | token_id | system_account_id (the bare id, no prefix)
    system_account_id: str
    cell_id: str | None      # decrypted from the key for api_key creds; for session/CLI creds, the value persisted at issuance (see Cell note)
    scopes: frozenset[str]
    shops: tuple[str, ...]   # ("*",) or explicit domains
    session_id: str | None   # jti for sessions; offered to task #1

authenticate_storefront_request (FastAPI dependency) classifies the Bearer credential:

mqsft_ prefix → CLI-token path (4.4B).
Three-segment JWT (header.payload.signature, two dots) with iss=marqo-storefront-admin → verify session JWT.
Otherwise → legacy path: delegate to the existing authenticate_api_key_request logic (auth/dependencies.py:220-274) unchanged, then wrap the result in a StorefrontAuthContext with all scopes except tokens:manage (token CRUD requires a console session, §4.4B) + all shops (zero behavior change for existing callers — no legacy route is lost because token CRUD is new). Additionally read the scope field that /account/key/validate already returns and log (not enforce, v1) when a read-scope key performs a write — input for the deprecation ratchet.
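
The shape-based routing above could look roughly like this. It only picks a verification path; it does not verify anything, and the session path would still need the full ES256 signature check. classify_credential and _unverified_issuer are hypothetical names.

```python
import base64
import json
from typing import Optional

SESSION_ISSUER = "marqo-storefront-admin"


def _unverified_issuer(credential: str) -> Optional[str]:
    """Read iss from a JWT-shaped string WITHOUT verifying it (routing only)."""
    parts = credential.split(".")
    if len(parts) != 3:
        return None
    payload = parts[1]
    try:
        padded = payload + "=" * (-len(payload) % 4)
        claims = json.loads(base64.urlsafe_b64decode(padded))
    except ValueError:
        # Covers bad base64, invalid UTF-8, and invalid JSON.
        return None
    if not isinstance(claims, dict):
        return None
    issuer = claims.get("iss")
    return issuer if isinstance(issuer, str) else None


def classify_credential(credential: str) -> str:
    """Return which verification path a Bearer credential should take."""
    if credential.startswith("mqsft_"):
        return "cli_token"
    if _unverified_issuer(credential) == SESSION_ISSUER:
        return "session"
    return "api_key"  # legacy path: existing API-key validation, unchanged
```

Anything that fails to parse falls through to the legacy path, which rejects it through the existing key validation rather than through a new error branch.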

require_scopes(*scopes) returns a dependency that raises 403 (insufficient_scope, listing the missing scope) — applied per-route:

GET /shops, GET .../settings, GET .../fields, GET /defaults → settings:read
POST .../settings with a theme_id (staged/theme-scoped saves, task #4) → settings:write
token CRUD → tokens:manage
task #4 deploy routes and the existing live POST .../settings (no theme_id) → settings:deploy_live (dependency exported for task #4's routes)

resolve_storefront_shop (dependencies.py:725-755) gains the shop-list check after the existing ownership check; 403 message distinguishes "not your shop" from "token not scoped to this shop".
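
A framework-free sketch of the two checks. In admin_server the inner function would be a FastAPI dependency raising a 403; here a hypothetical Forbidden exception and a trimmed AuthContext stand in so the logic is visible on its own.

```python
from dataclasses import dataclass


class Forbidden(Exception):
    """Stand-in for a 403 response; code distinguishes the failure reason."""

    def __init__(self, code: str, detail: str):
        super().__init__(detail)
        self.code = code
        self.detail = detail


@dataclass(frozen=True)
class AuthContext:
    # Trimmed stand-in for StorefrontAuthContext: only the fields used here.
    scopes: frozenset[str]
    shops: tuple[str, ...]


def require_scopes(*required: str):
    """Build a checker that rejects contexts missing any required scope."""
    def check(auth: AuthContext) -> AuthContext:
        missing = sorted(set(required) - auth.scopes)
        if missing:
            raise Forbidden("insufficient_scope", "missing scope(s): " + ", ".join(missing))
        return auth
    return check


def check_shop_in_token(auth: AuthContext, shop_domain: str) -> None:
    """Runs after the existing account-ownership check has passed."""
    if auth.shops != ("*",) and shop_domain not in auth.shops:
        raise Forbidden("shop_not_in_token", "token not scoped to this shop")
```

Keeping a separate error code for the shop-list failure is what lets the 403 message distinguish it from the ownership failure.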

Cell note: cell_id today comes from decrypting the raw key (auth/dependencies.py:236-251) and is used by get_fields to resolve a data-plane key (storefront_routes.py:299-301). For session/CLI credentials, persist cell_id into the session/token record at issuance (Controller membership data knows the account's cell) so get_fields keeps working for all credential types. This is a concrete implementation task, not an afterthought — get_fields breaks otherwise.

Identity propagation (tasks #1/#2): storefront_routes.save_settings passes auth.actor_id (and actor_display) instead of the hardcoded f"api_key:{...}" at storefront_routes.py:174-179. updated_by_user_id keeps receiving the actor string — format is backward compatible (legacy callers produce the identical string as today).

Canonical actor grammar (single source, shared with tasks #1/#2/#4): every canonical actor_id is uniformly prefixed — <prefix>:<principal_id> with no unprefixed canonical form — and the prefix ↔ principal_type mapping is defined once, as code, in admin_server/models/auth.py (importable by the versioning/concurrency/deploy code, which all live in the same component). The locked string forms:

| `actor_id` form | `principal_type` | produced by |
| --- | --- | --- |
| `user:{cognito_sub}` | `console_user` | session JWT (`sub`) |
| `token:{token_id}` | `cli_token` | CLI token verification |
| `api_key:{system_account_id}` | `api_key` | legacy key path |
| `shopify_user:{shopify_user_id}` | `shopify_user` | embedded-app writers, normalized at capture time by the tasks #1/#2 write paths (the raw Shopify JWT user id gets the prefix when an actor_id-bearing record is produced) |

The canonical enum is {console_user, cli_token, api_key, shopify_user}. Derivation is a plain split at the first ":": known prefix → (principal_type, principal_id); unknown prefix → fail closed (ValueError), never silently bucketed into principal_id.
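A minimal sketch of that derivation. The mapping mirrors the table above; the names PREFIX_TO_PRINCIPAL_TYPE and parse_actor_id are hypothetical, and rejecting an empty principal_id is an assumption the plan does not state.

```python
PREFIX_TO_PRINCIPAL_TYPE = {
    "user": "console_user",
    "token": "cli_token",
    "api_key": "api_key",
    "shopify_user": "shopify_user",
}


def parse_actor_id(actor_id: str) -> tuple[str, str]:
    """Split a canonical actor_id into (principal_type, principal_id), failing closed."""
    # partition splits at the FIRST colon only, so ids containing ":" survive intact.
    prefix, sep, principal_id = actor_id.partition(":")
    if not sep or prefix not in PREFIX_TO_PRINCIPAL_TYPE:
        raise ValueError(f"unknown or missing actor prefix: {actor_id!r}")
    if not principal_id:  # assumption: an empty id is never valid
        raise ValueError(f"empty principal id: {actor_id!r}")
    return PREFIX_TO_PRINCIPAL_TYPE[prefix], principal_id
```

For example, parse_actor_id("token:tok_123") yields ("cli_token", "tok_123"), while an unprefixed historical value raises instead of being guessed at.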

Legacy-compat branch (data, not grammar): existing updated_by_user_id values predate the grammar — raw unprefixed Shopify user ids (embedded saves, settings_routes.py:88) and the literal "system" (webhook writers, webhook_service.py:702). The shared helper exposes a separate classify_legacy(value) for reading historical data only: no colon and not "system" → shopify_user; "system" → system writer. New writes never produce these forms — canonical actor_ids are always prefixed.
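The legacy reader could look roughly like this. The returned labels are hypothetical, and raising on a colon-bearing value (so prefixed strings are never routed through the legacy reader) is an assumption rather than something the plan specifies.

```python
def classify_legacy(value: str) -> str:
    """Classify a HISTORICAL updated_by_user_id value; never used for new writes."""
    if value == "system":
        return "system"  # webhook writers' literal
    if ":" not in value:
        return "shopify_user"  # raw, unprefixed Shopify user id from embedded saves
    # Assumption: anything with a colon belongs to the canonical parser instead.
    raise ValueError(f"not a legacy value, parse as canonical actor_id: {value!r}")
```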

4.7 What does NOT change

Ecom routes (/api/v1/indexes/* etc.) keep authenticate_api_key_request untouched. Session/CLI tokens are rejected there (they don't decrypt as keys) — by design: a leaked storefront token cannot touch indexes; §9 includes an explicit test for this.
One deliberate exception: ecom_routes.get_account (routes/ecom_routes.py:122-141, currently on authenticate_api_key_request) is the single ecom route that swaps to the unified dependency (with settings:read), because the editor needs GET /api/v1/account to build the preview credential. It is listed in PR1's route-annotation table; every other ecom route is untouched.
The Shopify embedded app auth (authenticate_shopify_request), webhooks, app proxy: untouched.
Search preview credential (x-marqo-index-id): untouched.

5. Backward compatibility & deprecation path

Phase A (land scopes, no enforcement change): unified dependency on storefront routes; legacy keys → all scopes except tokens:manage. Existing UI, scripts, agents: zero change. Log principal_type + scope-mismatch metrics (CloudWatch, dimension on principal type).
Phase B (SSO default): storefront_admin login page defaults to SSO; "Use API key instead" remains as a secondary link (it exercises the legacy path end-to-end, which doubles as the break-glass path). CLI docs/scripts switch to tokens.
Phase C (ratchet): when metrics show legacy-key traffic ≈ 0 on storefront routes (target: 30 consecutive days), flip STOREFRONT_LEGACY_KEYS env from allow → warn (response header + log) → deny per environment, staging first. The flag is read at request time; flipping back is instant (lockout antidote). Raw keys on ecom routes are unaffected forever (out of scope).
Never remove the API-key code path in v1 of this project; removal is a separate decision after Phase C holds in prod.
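The Phase C flag could be read along these lines. STOREFRONT_LEGACY_KEYS and its three values come from the plan; the header name, the default of allow when the variable is unset, and treating an unrecognized value as an error are all assumptions made for this sketch.

```python
import os
from collections.abc import Mapping

VALID_MODES = ("allow", "warn", "deny")
WARNING_HEADER = "X-Storefront-Legacy-Key"  # hypothetical header name


class LegacyKeysDenied(Exception):
    """Stand-in for the HTTP error a real route would return in deny mode."""


def legacy_key_mode(environ: Mapping[str, str] = os.environ) -> str:
    """Read the flag on every call rather than caching it at import time."""
    mode = environ.get("STOREFRONT_LEGACY_KEYS", "allow")  # default is an assumption
    if mode not in VALID_MODES:
        raise ValueError(f"invalid STOREFRONT_LEGACY_KEYS value: {mode!r}")
    return mode


def legacy_key_headers(environ: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return extra response headers for a legacy-key request, or raise in deny mode."""
    mode = legacy_key_mode(environ)
    if mode == "deny":
        raise LegacyKeysDenied("legacy API keys are disabled on storefront routes")
    if mode == "warn":
        return {WARNING_HEADER: "deprecated"}
    return {}
```

Only the legacy path would call legacy_key_headers; session and CLI-token requests never consult the flag.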

6. Local development story

Hippodrome already runs fake_cognito (local Cognito replacement issuing JWTs, port 9012) + the Controller + admin_server (components/hippodrome/AGENTS.md:76,134,152). Point COGNITO_*/CONTROLLER_BASE_URL at the local stack; admin_server's JWKS URL is env-configurable so it can verify fake_cognito-issued tokens. Full SSO loop testable offline.
Against real backends: .dev.vars with staging Controller/Cognito values (as the cognito-login branch documents in wrangler.toml comments).
Escape hatch: API-key login stays available in the UI in all envs through Phase B, so local dev against prod data (ADMIN_SERVER_BASE_URL=https://admin.ecom.marqo.ai, per README) keeps working with no auth infra at all.
No prod-weakening backdoors: any local-only bypass vars (e.g. a pre-made session secret) must be guarded by a Zod .refine that rejects them when ENV ∉ {local, dev}, the exact pattern admin_worker enforces for LOCAL_CF_ACCESS_TOKEN (admin_worker/app/.server/env.ts:36-38). Server-side mirror: admin_server refuses a session signing key from plain env (vs Secrets Manager) unless FULL_ENV ∈ {local, dev, test}. This is a hard gate that raises at startup, never a logged fallback. The existing MARQO_API_KEY_SECRET env fallback (utils/api_key_utils.py:150-159) has exactly the laxity we are avoiding (an env fallback with no environment check, reachable in prod if Secrets Manager errors); do not copy it, and file a follow-up to env-gate it (noted in §11).
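The server-side gate might be shaped like the sketch below. FULL_ENV and the allowed set come from the plan; the SESSION_SIGNING_KEY variable name and the injected fetch_from_secrets_manager callable are hypothetical placeholders.

```python
from collections.abc import Callable, Mapping

PLAIN_ENV_ALLOWED = frozenset({"local", "dev", "test"})
ENV_KEY_VAR = "SESSION_SIGNING_KEY"  # hypothetical variable name


def load_session_signing_key(
    environ: Mapping[str, str],
    fetch_from_secrets_manager: Callable[[], str],
) -> str:
    """Resolve the signing key at startup; a plain-env key is refused outside dev envs."""
    env_key = environ.get(ENV_KEY_VAR)
    if env_key is not None:
        if environ["FULL_ENV"] not in PLAIN_ENV_ALLOWED:
            # Hard gate: raise at startup instead of logging and carrying on.
            raise RuntimeError(f"{ENV_KEY_VAR} must not be set when FULL_ENV={environ['FULL_ENV']}")
        return env_key
    # No env fallback here: if Secrets Manager errors, the exception propagates.
    return fetch_from_secrets_manager()
```

The env value is never consulted as a fallback after a Secrets Manager failure, which is the difference from the MARQO_API_KEY_SECRET behaviour called out above.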

7. Rollout (zero lockout)

| Step | Change | Lockout risk & mitigation |
| --- | --- | --- |
| 1 | admin_server: scopes module, unified dependency (legacy=all-scopes), session/token endpoints, secrets | None: pure addition; legacy path byte-identical |
| 2 | Controller: adopt cognito-login branch commits (redirect allowlist, Cognito app-client callbacks for storefront origins, CORS entries) | None: console unaffected; new origins only |
| 3 | storefront_admin: SSO login + cookie sessions + proxy, API-key login kept; same PR fixes admin_server CORS (explicit origin allowlist, drop `allow_credentials`, since nothing uses cookies against admin_server) so cookies and the CORS fix ship atomically | If SSO breaks → users click "Use API key instead" (old flow, fully server-side legacy path). CORS: an unknown-origin log-only report runs during steps 1–2 so the allowlist is data-backed before enforcement |
| 4 | CLI tokens UI + script env-var support | None: `--api-key` still accepted |
| 5 | Phase C ratchet (allow→warn→deny), staging→prod | Flag is runtime-flippable; deny only after 30 quiet days; ecom routes never affected |

Break-glass: legacy API-key auth (same request shape as today) works at every step until the final deny; even at deny, flipping the env var back restores it without a deploy (an env var change is an ECS/Lambda config update, taking minutes). Signing-key rotation: an ES256 kid header plus verifiers that accept both the current and previous public keys mean rotation never invalidates every live session at once unless intended (pulling both keys at once is the deliberate kill switch).

8. Cross-plan interfaces

(Proposals messaged to all three teammates 2026-06-10; reconciled against their published plans — theme-targeted-deploys.md, settings-versioning.md, settings-concurrency-control.md — same day.)

8.1 theme-deploys (task #4) — agreed with their published plan (messages exchanged 2026-06-10)

Scope names adopted from their plan: settings:write and settings:deploy_live (required for any mutation of the live record sk=SETTINGS or the live search_settings metafield). The flags are independent — deploy-without-edit is a valid reviewer persona. My earlier themes:deploy-live proposal is superseded.

Operation → scope matrix (locked):

| Operation | Required scopes |
| --- | --- |
| Staged (theme-targeted) save; theme-record delete | `settings:write` |
| Direct live save (no `theme_id`; today's `POST .../settings`) | `settings:write` + `settings:deploy_live` |
| `POST .../settings/deploy` (promote staged → live) | `settings:deploy_live` only |
| `POST .../settings/deploy/rollback` (task #4) | `settings:write` + `settings:deploy_live` (same rule as restore-to-live: puts non-head content live) |
| Restore-to-live (task #2) | `settings:write` + `settings:deploy_live` (stricter than promote on purpose: restore chooses arbitrary historical content; promote moves the single staged head) |
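The matrix can be expressed as data, which makes the "deploy-without-edit" persona easy to check. The operation keys below are hypothetical labels for the table rows, not route names from either plan.

```python
WRITE = "settings:write"
DEPLOY_LIVE = "settings:deploy_live"

# Hypothetical operation keys, one per row of the locked matrix.
REQUIRED_SCOPES: dict[str, frozenset[str]] = {
    "staged_save": frozenset({WRITE}),
    "theme_record_delete": frozenset({WRITE}),
    "direct_live_save": frozenset({WRITE, DEPLOY_LIVE}),
    "promote_staged_to_live": frozenset({DEPLOY_LIVE}),
    "deploy_rollback": frozenset({WRITE, DEPLOY_LIVE}),
    "restore_to_live": frozenset({WRITE, DEPLOY_LIVE}),
}


def is_allowed(operation: str, held_scopes: frozenset[str]) -> bool:
    """True when the caller holds every scope the operation requires."""
    return REQUIRED_SCOPES[operation] <= held_scopes
```

A reviewer holding only settings:read and settings:deploy_live can promote the staged head but cannot save, roll back or restore.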

Enforcement: they declare Depends(require_scopes(...)) + resolve_storefront_shop; their plan states scoped-token enforcement "activates when the auth plan lands; until then the endpoint is full-access-API-key-only" — consistent with §5 Phase A here.
Their open question 2 ("enforce immediately for scoped tokens vs grace period") is answered here: enforce immediately for scoped tokens; legacy keys exempt (all scopes except tokens:manage) during the transition. No grace-period ambiguity because scoped tokens are new — nothing existing breaks.
Defaults: OWNER/MEMBER sessions carry settings:deploy_live (their UI confirm is the interactive gate); MERCHANDISER sessions do not (role→scope mapping, §4.4A — post-review addition 2026-06-11); CLI tokens exclude it unless explicitly granted with a warning (§4.5). No interface change for them: the scope check is identical regardless of how the caller came to hold (or not hold) the scope.
Audit: StorefrontAuthContext.actor_id/actor_display/principal_type available for their deploy records (they stamp actor_id into deployed_by).
Token claims: confirmed no per-token theme_id allowlist in v1 — theme targeting is request-level; the deploy gate is scope + confirm dialog.
Status: confirmed both ways 2026-06-10/11; matrix pinned in their plan's Cross-plan interfaces section.

8.2 settings-versioning (task #2) — agreed (messages exchanged 2026-06-10)

Author shape (their final form, accepted): version records store two fields — author_id = the canonical actor string (user:<cognito_sub> | token:<token_id> | api_key:<system_account_id> | shopify_user:<shopify_user_id>; principal type derivable from the prefix) and author_display (email / token name, denormalized at write time, nullable for legacy api_key callers). Their earlier structured-object draft is superseded. StorefrontAuthContext additionally exposes principal_type/principal_id as convenience fields, but the version-record interface is just the two strings. (Embedded-app saves keep their Shopify-user identity, populated from ShopifyAuthRequestContext — outside this plan's surface.)
updated_by_user_id on ShopifySettings continues to receive the flat actor_id string — legacy writers produce today's exact api_key:{system_account_id} value, so no migration and no breakage for existing consumers. Their version capture copies it.
Reads (version list/get/diff) record no actor; restore is a write and gets the same context — the restoring actor becomes the new version's author (event_type=restore, their concern).
They import the prefix↔type mapping from admin_server/models/auth.py (never re-derived). Grammar update (round-2 review, re-coordinated and confirmed 2026-06-11): version authors use the uniformly prefixed form — embedded-app authors are normalized to shopify_user:{id} at capture time (§4.6); the classify_legacy helper covers historical raw values only. They confirmed adoption and updated their doc in all three relevant spots (model comment, cross-plan section, dependencies) as a post-approval interface correction; their backfill stamps the script principal as author, so classify_legacy is only needed if historical updated_by_user_id values are ever attributed. The webhook writers' literal "system" can never appear as a version author on their surface (infra writes go through their non-capturing path), so no special case is needed there.
Restore scoping (agreed): restore requires the same scopes as the equivalent save against the same target — live: settings:write + settings:deploy_live (identical to the plain live POST under §8.1 semantics); staged (post task #4): settings:write. No separate restore scope.

8.3 settings-concurrency (task #1) — agreed (messages exchanged 2026-06-10; recorded in their plan §11)

Storage: their write path persists updated_by_user_id = actor_id (same attribute as today, richer values — no schema migration) plus a new optional updated_by_display = actor_display. settings-versioning derives its author pair from the same two attributes — one identity scheme feeds both plans.
Payload shape (theirs, accepted): flat fields shared by GET and 409 — GET returns updatedBy (actor_id) + updatedByDisplay + lastUpdated; the 409 detail adds expectedVersion, currentVersion, changeSource. (My earlier nested conflicting_writer proposal is superseded — same information, flatter shape.)
Pre-SSO degradation (agreed): when the conflicting actor_id equals the caller's own api_key:{system_account_id}, the conflict dialog says "saved from another session of this account" instead of naming a writer; real identities light up automatically once SSO lands.
change_source (storefront_admin | script | ...) is set by the route/caller, not by the auth context — no auth work needed.
session_id (JWT jti) stays on the context; not required for their v1, persistable later without contract change.
Sequencing: until StorefrontAuthContext replaces ApiKeyAuthRequestContext on storefront routes (PR1 here), their Phase 1 stores the legacy api_key:{system_account_id} string as actor_id-compatible attribution; the context swap is then a one-line change per write path on their side, no contract change.
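The pre-SSO degradation rule can be illustrated on the flat 409 detail. The field names come from the agreed payload; the function name and the fallback to the raw actor_id when no display value exists are assumptions.

```python
SAME_ACCOUNT_MESSAGE = "saved from another session of this account"


def conflict_writer_label(detail: dict, caller_actor_id: str) -> str:
    """Choose what the conflict dialog shows for the conflicting writer."""
    updated_by = detail["updatedBy"]
    # Legacy keys share one actor per account, so naming the writer would be meaningless.
    if caller_actor_id.startswith("api_key:") and updated_by == caller_actor_id:
        return SAME_ACCOUNT_MESSAGE
    # Assumption: fall back to the actor_id when no display name was stored.
    return detail.get("updatedByDisplay") or updated_by
```

Once SSO lands, callers carry user: or token: actors, the first branch stops matching, and real names appear with no change on the concurrency side.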

9. Test plan

admin_server (pants test, moto for DDB/Secrets Manager):

Credential classification: legacy key → legacy path (existing tests keep passing unmodified — that's the back-compat proof); mqsft_ → token path; session JWT → session path; garbage → 401.
Session issuance: valid Cognito token + ACTIVE member account → JWT with correct claims; tampered/expired/wrong-audience Cognito token → 401; non-member account_id → 403; PENDING membership → 403; unknown/future role value → 403 (fail closed); Controller membership call failure → 502, never a silent default (fail fast).
Role→scope mapping: OWNER and MEMBER sessions carry settings:deploy_live; a MERCHANDISER session does not — its live POST .../settings → 403, and its attempt to mint a CLI token requesting settings:deploy_live → 403 (subset rule composes with role mapping).
Session verification: expired / tampered / wrong-key-signed / unknown-kid / token_use-mismatch matrix; previous-public-key (rotation window) still verifies.
Session refresh: valid session → new JWT with same claims, fresh exp/jti, preserved orig_iat; expired session → 401; refresh past orig_iat + 24h → 401 (forced re-login); refresh never extends scopes/shops.
CLI tokens: create (scope ⊆ session scopes enforced — escalation attempt → 403; tokens:manage ungrantable; expiry required and capped), verify (wrong secret constant-time-fails, revoked → 401, expired → 401), list never returns hashes, revoke is immediate (no cache).
Scope enforcement: per-route 403 matrix (read-only token POSTs settings → 403; no-deploy token hits deploy route → 403); shop scoping (token for shop A → shop B → 403 even when same account).
Identity propagation: updated_by_user_id receives each credential type's actor string (this also locks the tasks #1/#2 interface).
get_fields works for all three credential types (cell_id resolution — the regression trap from §4.6).
Ecom-route rejection (locks the §4.7 isolation property, currently true but untested): a session JWT and an mqsft_ token presented to /api/v1/indexes/... (e.g. DELETE) → 401; ecom_routes.get_account is the only ecom route accepting them.
Shop-list parity gate (PR2, C3): integration test asserting GET /shops under a minted session equals GET /shops under a legacy API key of the same account.
Non-default-cell account: a member of an account whose AccountData.cell_id is not the default cell gets a session carrying that cell_id, and get_fields (storefront_routes.py:299-301) resolves the correct data-plane key with it — guards the §4.2 join and the §4.6 cell note together.
CORS (PR5): preflight from an allowlisted origin passes; unknown origin gets no CORS headers; allow_credentials is off.
last_used_at: concurrent token uses → exactly one conditional update within the hour window (no ConditionalCheckFailedException surfacing to the caller).
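The session-refresh cases in the list above can be pinned down with a pure function over claims. The 24-hour cap, orig_iat, exp and jti come from the plan; the 15-minute lifetime, the iat claim and the names used here are assumptions for illustration.

```python
import uuid

MAX_SESSION_AGE_SECONDS = 24 * 60 * 60  # forced re-login window from the plan
SESSION_TTL_SECONDS = 15 * 60           # assumption: the plan only says "short-lived"


class RefreshDenied(Exception):
    """Stand-in for an HTTP 401 on the refresh endpoint."""


def refresh_claims(claims: dict, now: int) -> dict:
    """Return claims for a refreshed session, or raise RefreshDenied."""
    if now >= claims["exp"]:
        raise RefreshDenied("session expired")
    if now > claims["orig_iat"] + MAX_SESSION_AGE_SECONDS:
        raise RefreshDenied("re-login required")
    refreshed = dict(claims)  # scopes, shops and orig_iat are copied unchanged
    refreshed.update(iat=now, exp=now + SESSION_TTL_SECONDS, jti=str(uuid.uuid4()))
    return refreshed
```

Because the function only copies the existing claims, a refresh cannot widen scopes or shops; any change to either requires a fresh login.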

storefront_admin (vitest): session cookie parse/verify helpers (expired/garbage → null), loader guard redirects, proxy route (path allowlist, header promotion, cookie-less → 401, cross-site Origin → 403), auth-client (extend the branch's existing auth-client.test.ts), token-management UI state. OAuth state cookie roundtrip.

E2E (CI, optionally Hippodrome + fake_cognito): SSO login → editor loads → save settings → version/author visible; create CLI token in UI → storefront_settings.py backup/push with MARQO_STOREFRONT_TOKEN → revoke → same command 401s; legacy --api-key still works end-to-end.

Security checks (part of code review, not automated): no token/secret ever logged (request_logging review); secret displayed once; constant-time compares; cookie flags.

10. Work breakdown (suggested PR sequence)

PR1 — admin_server auth substrate: scopes registry + canonical actor grammar (admin_server/models/auth.py), StorefrontAuthContext, unified dependency wrapping legacy path, route annotations incl. ecom_routes.get_account (legacy = all scopes except tokens:manage ⇒ no behavioral change), metrics, unknown-origin CORS log-only report. Independently shippable.
PR2 — Controller: memberships endpoint + branch adoption: the committed GET /account/memberships view (Cognito-token-authenticated, returns {account_id, system_account_id, role, cell_id} per membership — C2) plus cherry-pick/rebase of the worktree-feat+cognito-login Controller commits (sso redirect validation, config) — coordinate with that branch's owner. Includes the account_id ↔ system_account_id mapping spike; merge gate: the shop-list parity test (§9) must pass against staging (C3).
PR3 — session issuance (admin_server): Cognito JWKS verifier, membership client, POST /auth/session + /auth/session/refresh, ES256 key plumbing (Secrets Manager private key, public-key distribution, kid rotation), cell_id persistence. Depends on PR2.
PR4 — CLI tokens backend: token model/repo, CRUD endpoints, verification path.
PR5 — storefront_admin SSO + CORS fix (atomic): login UI (reusing branch UI work), callback route, cookie session + silent refresh, loader guards, proxy route, API-key fallback link; same PR: admin_server CORS allowlist + drop allow_credentials (C1 — cookies never ship while allow_origins=["*"] + allow_credentials=True stands).
PR6 — tokens UI + CLI: tokens page; storefront_settings.py env-var support; docs (CSS customization guide, integration playbooks).
PR7 — hardening: deprecation flag (allow→warn→deny), dashboards, scope-mismatch alarms.

Each PR follows CLAUDE.md Definition of Done (tests for new branches/paths land with the PR, not after).

11. Open questions / verification items for implementation

~~Membership endpoint~~ — resolved: committed PR2 deliverable (GET /account/memberships, §4.2 step 2). The cell_id location question is answered: it lives on AccountData (users_accounts.py:178), not membership records, so the view joins memberships → AccountData per account. Remaining for PR2: exact serializer shape only.
~~account_id ↔ system_account_id~~ — resolved process-wise: PR2 design spike with the shop-list parity merge gate (§4.2 step 3, §9). The factual answer is discovered, not assumed.
Coordinate with the owner of worktree-feat+cognito-login (branch is ~review-complete; this plan supersedes its localStorage approach but adopts everything else).
theme-deploys scope naming confirmed both ways (§8.1: settings:deploy_live); no rename expected before PR1 lands the registry.
Follow-up issue (out of scope, S2): env-gate the existing MARQO_API_KEY_SECRET fallback in utils/api_key_utils.py:150-159 the way the new signing-key gate works.
Follow-up (named, post task #4): revisit whether MERCHANDISER sessions should get settings:deploy_live — or a staged-write-only workflow — once staged saves exist and MERCH is no longer effectively read-only on settings (§4.4A). Aligns with the Controller's standing RBAC TODO (feature_flagging_checks.py:126).
