Storefront Integration Automation Architecture

Design for automating Shopify storefront integration work — CSS customization, feature development, third-party app integration, and search tuning — using Claude Code primitives (skills, agent teams, workflows, subagents) as composable building blocks.

Status: Historical design doc

This was the planning document for the orchestration system. The system has since been BUILT and has drifted from this design in places (role names, hooks, loop defaults). The operational source of truth is AGENTS.md plus the agent definitions under .claude/agents/, the skills under .claude/skills/, and the workflows under .claude/workflows/. Known deltas: independent-reviewer / theme-inspector were never created under those names (the realized roles are finisher and the implementer/verifier pairs); the TODO checklist at the bottom is stale — most items exist on disk; inline-iterate (not the scripted workflow) is now the default teammate loop (see docs/dev/orchestration/iteration-patterns.md). Read this doc for design rationale, not current behavior.

Primitives and How They Compose

Each Claude Code primitive has a strength. We use each where it fits:

  +------------------+------------------------------------------+
  | Primitive        | Our use                                  |
  +------------------+------------------------------------------+
  | Skill            | Entry points + coordinator behavior      |
  |                  | /integrate-storefront                    |
  |                  | /grill-me                                |
  +------------------+------------------------------------------+
  | Agent Team       | Parallel coordination layer              |
  |                  | Lead = coordinator, teammates = workers  |
  |                  | Shared task list, inter-agent messaging   |
  |                  | Hooks enforce phase transitions           |
  +------------------+------------------------------------------+
  | Workflow         | Deterministic iteration loops             |
  |                  | Workflow A: single-section CSS loop       |
  |                  | Workflow C: search tuning iteration       |
  |                  | Script controls retry, not agent judgment |
  +------------------+------------------------------------------+
  | Subagent         | Quick focused tasks within any of above  |
  |                  | Single screenshot, GraphQL query,        |
  |                  | theme token extraction                   |
  +------------------+------------------------------------------+
  | Subagent Defs    | Reusable teammate/worker roles           |
  | (.claude/agents) | css-implementer, independent-reviewer,   |
  |                  | search-tuner, theme-inspector            |
  +------------------+------------------------------------------+

How they nest

  User invokes /integrate-storefront
  |
  | SKILL (coordinator instructions)
  v
  COORDINATOR (your session, becomes team lead)
  |
  | Setup phase: uses SUBAGENTS for quick tasks
  | (GraphQL queries, screenshot references)
  |
  | Dispatches work via AGENT TEAM:
  |
  +---> Spawns teammates from SUBAGENT DEFINITIONS
  |     |
  |     | Teammate "cards" --> runs WORKFLOW A (cards, browser1)
  |     | Teammate "filters" --> runs WORKFLOW A (filters, browser2)
  |     | Teammate "paging" --> runs WORKFLOW A (paging, browser3)
  |     | Teammate "heading" --> runs WORKFLOW A (heading, browser4)
  |     |
  |     | Teammates CAN message each other:
  |     |   "I changed heading height, check your grid layout"
  |     |
  |     | HOOKS enforce task transitions
  |     |
  |     | After all workflows done:
  |     |   Spawns independent-reviewer teammate
  |     |
  |     | Escalations logged to ${SHARED_WORKSPACE}/escalations/*.json
  |     |   Clear scope --> lead spawns Workflow B teammate
  |     |   Unclear scope --> lead invokes /grill-me with human
  |
  +---> Workflow C teammate for search tuning (if needed)
  |
  +---> Workflow B teammate for feature PRs (from escalations)
  |
  v
  Lead collates outputs, gates prod changes with human

Infrastructure (validated 2026-06-09)

  .mcp.json                          .claude/settings.json
  +--------------------------+       +----------------------------+
  | browser1 (--isolated)    |       | playwright plugin          |
  | browser2 (--isolated)    |       | (coordinator's browser)    |
  | browser3 (--isolated)    |       +----------------------------+
  | browser4 (--isolated)    |
  +--------------------------+       scripts/ecom/
                                     +----------------------------+
  docs/integrations/AGENTS.md        | shopify_graphql.py         |
  +--------------------------+       | storefront_settings.py     |
  | Single source of truth   |       | inject_marqo_blocks.py     |
  | for all agent work       |       +----------------------------+
  +--------------------------+

Parallel Playwright MCP

4 named @playwright/mcp instances with --isolated flag, registered in .mcp.json. Each spawns a separate Chrome process with an ephemeral profile — no shared state. Validated working 2026-06-09 with @playwright/mcp@0.0.75.

Ships with Phase 2 (skills implementation), not this planning doc. The .mcp.json is project-scoped and auto-discovered by Claude Code for every developer — it should only land alongside the skills that consume it.

{
  "mcpServers": {
    "browser1": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] },
    "browser2": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] },
    "browser3": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] },
    "browser4": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] }
  }
}

Tools are namespaced: mcp__browser1__browser_navigate, mcp__browser2__browser_navigate, etc. Subagents inherit all MCP tools. Each agent is prompted to use ONLY its assigned browserN.

Coexists with the existing Playwright plugin (mcp__plugin_playwright_playwright__*) which the coordinator keeps for interactive human work.

Hooks (enforcement layer)

Configured in .claude/settings.json. Shell scripts that enforce work unit transitions.

TaskCompleted — main gate. Runs when any teammate marks a task done.

  For [IMPLEMENT] tasks (all types):
    - ${SHARED_WORKSPACE}/escalations/{task-name}.json exists (even if empty [])

  For [IMPLEMENT] CSS tasks (cards, filters, paging, heading):
    - ${SHARED_WORKSPACE}/css/{section}.css exists (CSS output)
    - ${SHARED_WORKSPACE}/screenshots/{section}-* exists (native theme screenshots)

  For [FINISH] tasks:
    - ${SHARED_WORKSPACE}/screenshots/final-* exists (per-element verification screenshots)
    - ${SHARED_WORKSPACE}/verification-report.json exists (per-section pass/fail)

  For [INTEGRATION-CODE] tasks:
    - ${SHARED_WORKSPACE}/escalations/{task-name}.json exists (blockers/scope/clarifications)

  Exit 2 with feedback listing all missing artifacts.

TaskCreated — validates new tasks follow conventions.

  For [INTEGRATION-CODE] or [INTEGRATION-VERIFY] tasks:
    - Check matching escalation file exists in ${SHARED_WORKSPACE}/escalations/
    - Verify escalation has real content (not empty [])
    - Prevents phantom tasks that didn't come from a real escalation
    Exit 2 if no matching escalation

TeammateIdle — nudges teammates before going idle.

  No integration workspace: exit 0 (not an integration session)
  CSS sections with no CSS output yet: exit 2 (pending implement tasks)
  CSS exists but no verification report: exit 2 (pending finish task)
  All done: exit 0 (allow idle, remind to run cleanup)

Subagent definitions (`.claude/agents/`)

Reusable teammate roles with tool restrictions and browser assignments:

css-implementer.md — writes CSS, self-verifies, logs escalations. NO admin API access, NO prod push.
independent-reviewer.md — cross-checks all CSS sections. Separate context, didn't write the CSS.
search-tuner.md — builds + tests config overrides via header. NO DDB writes.

Universal Work Unit

Every workflow type — CSS, features, integrations, search tuning — follows the same loop:

  +----------+    +----------+    +-----------+    +-------------+
  | CLARIFY  |--->|   PLAN   |--->|   GATE    |--->|  IMPLEMENT  |
  |  specs   |    |          |    |           |    |             |
  +----------+    +----------+    | auto:     |    +------+------+
                                  |  specific |          |
                                  |  & scoped |          v
                                  |           |    +-------------+
                                  | /grill-me:|    | INDEPENDENT |
                                  |  vague or |    |   REVIEW    |
                                  |  new feat |    | (separate   |
                                  +-----------+    |  agent)     |
                                                   +------+------+
                                                          |
                                                          v
                                                   +-------------+
                                               +-->|   VERIFY    |
                                               |   | (targeted   |
                                               |   |  element    |
                                               |   |  screenshot |
                                               |   |  + compare) |
                                               |   +------+------+
                                               |          |
                                               |     pass | fail
                                               |          |   |
                                               |          |   +-> iterate
                                               |          |       (max 3)
                                               |          |          |
                                               |          |    still failing?
                                               |          |          |
                                               |          |          v
                                               |          |    "unresolved gap"
                                               |          |    with evidence
                                               |          |          |
                                               +----------+----------+
                                                          |
                                                          v
                                                   +-------------+
                                                   |   OUTPUT    |
                                                   | CSS candidate|
                                                   | or raised PR |
                                                   | or gap report|
                                                   +-------------+

System Architecture: Agent Team

The coordinator (your session) becomes the team lead. It spawns teammates for parallel work, manages the shared task list, and gates all production changes.

                          +-----------+
                          |   HUMAN   |
                          +-----+-----+
                                |
     ===========================+================================
     ||                                                         ||
     ||  /integrate-storefront           ||
     ||  COORDINATOR = TEAM LEAD (main session)                 ||
     ||                                                         ||
     ||  Owns: plugin Playwright (interactive, human-facing)    ||
     ||                                                         ||
     ||  Does:                          Does NOT:               ||
     ||  - Interface with human         - Write CSS             ||
     ||  - Setup & inspect phase        - Make code changes     ||
     ||  - Spawn teammates              - Push to prod without  ||
     ||  - Manage shared task list        human approval        ||
     ||  - Triage escalations           - Own browser1-4        ||
     ||  - Invoke /grill-me             - Run iteration loops   ||
     ||  - Collate + gate prod pushes                           ||
     ||                                                         ||
     ||=========================================================||
     ||                                                         ||
     ||  PHASE 1: SETUP (lead does directly)                    ||
     ||  +- Backup settings (storefront_settings.py)            ||
     ||  +- Create/update merchant doc from template            ||
     ||  +- Query theme tokens (shopify_graphql.py + Playwright)||
     ||  +- Map native filters -> Marqo fields                  ||
     ||  +- Identify scope -> create tasks on shared list       ||
     ||                                                         ||
     ||  PHASE 2: SPAWN TEAM + CSS WORK                         ||
     ||  +- Spawn teammates from subagent definitions           ||
     ||  +- Assign browserN per teammate                        ||
     ||  +- Require plan approval for IMPLEMENT tasks           ||
     ||  +- Each teammate runs Workflow A for their section      ||
     ||                                                         ||
     ||  PHASE 3: HANDLE ESCALATIONS (during Phase 2)           ||
     ||  +- Read ${SHARED_WORKSPACE}/escalations/*.json as they arrive          ||
     ||  +- Clear scope (app in AGENTS.md) -->                  ||
     ||  |    spawn new teammate for Workflow B                  ||
     ||  +- Unclear scope -->                                   ||
     ||  |    /grill-me with human, then dispatch or defer       ||
     ||  +- Message other teammates if escalation affects them   ||
     ||  |    "Skip this feature entirely, don't style it"       ||
     ||                                                         ||
     ||  PHASE 4: COLLATE & GATE                                ||
     ||  +- CSS candidate --> show diff to human --> push on ok ||
     ||  +- Feature PR -----> link to human for review          ||
     ||  +- Config diff -----> show comparison to human         ||
     ||  +- Present escalation log:                             ||
     ||       "These features were found but not implemented:    ||
     ||        - Okendo reviews (Workflow B dispatched)          ||
     ||        - Custom badge X (deferred, needs /grill-me)"    ||
     ||                                                         ||
     =============================================================

Workflow A: CSS (single section, deterministic loop)

Workflow A is always single-section, single-browser. The agent team handles fan-out (spawning N teammates); the workflow script handles the iteration loop.

A tweak is a team of 1 teammate running Workflow A for one section. Initial setup is a team of 4 teammates, each running Workflow A for their section.

Workflow A script (what the runtime executes)

  Input (from team lead via task):
  - Section spec (e.g., "cards", "filters", "paging", "heading")
  - Assigned browserN
  - Merchant doc (design tokens, native theme reference)
  - Current settings backup

  Script:
    // Implementor inspects native theme
    nativeRef = agent("css-implementer", { phase: "inspect", section, browserN })

    for (i = 0; i < 3; i++):
      // Implementor writes CSS (separate agent from verifier)
      css = agent("css-implementer", {
        phase: "implement", section, merchantDoc, nativeRef, previousFeedback
      })

      // Verifier evaluates (separate agent — did not write the CSS)
      // Injects CSS into browser, screenshots, compares to native
      result = agent("css-verifier", {
        css, nativeRef, marqoUrl, elementSelector, browserN
      })

      if (result.pass): break
      previousFeedback = result.feedback

    return { css, screenshots: result.screenshots,
             verification: result,
             escalations: readFile("${SHARED_WORKSPACE}/escalations/{section}.json"),
             iterations: i + 1 }

What the teammate does vs what the script does

  +---------------------------+---------------------------+
  | Implementor agent         | Verifier agent            |
  +---------------------------+---------------------------+
  | Inspect native theme      | Inject CSS into browser   |
  | Write CSS                 | Screenshot elements       |
  | Identify escalations      | Compare native vs Marqo   |
  | NEVER verifies own work   | Decide pass/fail          |
  |                           | Describe gaps             |
  |                           | NEVER writes CSS          |
  +---------------------------+---------------------------+

  Workflow script (orchestration):
  | Controls iteration count                            |
  | Passes verifier feedback to implementor             |
  | Decides when to give up                             |
  | Formats final output                                |

End-to-end team flow (how Workflow A fits into the team)

  Input from coordinator:
  - Merchant doc (design tokens, filter mapping)
  - Scope: "initial setup" (fan out 4) or "fix: {problem}" (single agent)
  - Current settings backup + reference screenshots

  +-----------------------------------------------------------------------+
  |                                                                       |
  |  IF initial setup:                 IF tweak:                          |
  |                                                                       |
  |  +-------------+ +-------------+  +------------------+               |
  |  |  Agent 1    | |  Agent 2    |  |  Single Agent    |               |
  |  |  Cards+Grid | |  Filters    |  |  "fix: {desc}"   |               |
  |  |  browser1   | |  browser2   |  |  browser1        |               |
  |  +------+------+ +------+------+  +--------+---------+               |
  |  +------+------+ +------+------+           |                         |
  |  |  Agent 3    | |  Agent 4    |           |                         |
  |  |  Paging+Sort| |  Heading+   |           |                         |
  |  |  browser3   | |  Layout     |           |                         |
  |  +------+------+ |  browser4   |           |                         |
  |         |        +------+------+           |                         |
  |         |               |                  |                         |
  |         | Each runs Workflow A:             |                         |
  |         | 1. Inspect native (own browser)  |                         |
  |         | 2. Write CSS                     |                         |
  |         | 3. Inject CSS into browser,      |                         |
  |         |    screenshot, compare to native |                         |
  |         | 4. Iterate on feedback (max 3)   |                         |
  |         |    (workflow script controls      |                         |
  |         |     when to retry / give up)     |                         |
  |         | 5. Output: CSS text + screenshots|                         |
  |         |                                  |                         |
  |         +---------------+------------------+                         |
  |                         |                                            |
  |                         v                                            |
  |          +--------------+---------------+                            |
  |          |       COLLATE (lead)         |                            |
  |          | Merge all CSS in order:      |                            |
  |          | heading -> layout -> cards ->|                            |
  |          | filters -> paging -> sort -> |                            |
  |          | count -> states             |                            |
  |          +--------------+---------------+                            |
  |                         |                                            |
  |                         v                                            |
  |          +--------------+---------------+                            |
  |          |   FINISHER TEAMMATE          |                            |
  |          |   (didn't write any CSS)     |                            |
  |          |                              |                            |
  |          |   Uses any free browserN     |                            |
  |          |                              |                            |
  |          |   1. Inject FULL collated    |                            |
  |          |      CSS into browser        |                            |
  |          |   2. Screenshot + verify     |                            |
  |          |      EVERY element           |                            |
  |          |   3. Fix integration issues  |                            |
  |          |      (z-index, specificity,  |                            |
  |          |       layout conflicts)      |                            |
  |          |   4. Own iterate loop        |                            |
  |          |      (max 3 rounds)          |                            |
  |          |   5. Output: final CSS +     |                            |
  |          |      per-section pass/fail   |                            |
  |          +--------------+---------------+                            |
  |                         |                                            |
  |                         v                                            |
  |   OUTPUT to coordinator:                                              |
  |   - Final CSS candidate (ready to push)                              |
  |   - Per-section verification report                                  |
  |   - Fixes applied + unresolved gaps                                  |
  |                                                                       |
  |   Meanwhile (parallel with CSS work):                                 |
  |   INTEGRATION teammates may be running Workflow B for                  |
  |   escalated features (e.g., Okendo PR). These run                     |
  |   independently — the PR can be written while CSS work                |
  |   continues. After human deploys the integration, the SAME            |
  |   teammate verifies it and outputs any CSS patches needed.            |
  |                                                                       |
  +-----------------------------------------------------------------------+

Escalation Model

When a teammate finds a feature that can't be replicated with CSS (e.g., review stars, wishlist buttons, custom badges), it does NOT attempt workarounds.

Teammate behavior (enforced by subagent definition + tool restrictions)

  1. Skip the feature entirely — no placeholder, no workaround
  2. Write structured escalation to ${SHARED_WORKSPACE}/escalations/{section}.json:
     {
       "section": "cards",
       "feature": "Okendo review stars",
       "location": "product card, below title",
       "type": "integration",        // integration | config_change | code_change
       "screenshot": "${SHARED_WORKSPACE}/screenshots/native-card-okendo.png",
       "notes": "Stars show 1-5 rating + review count, loads via JS"
     }
  3. Message lead: "Escalation logged: {feature}. See ${SHARED_WORKSPACE}/escalations/{section}.json"
  4. Continue CSS work for everything in scope

  Teammates CANNOT (tool restrictions in subagent def):
  - Push to prod (no admin API access)
  - Enable integrations
  - Write to DDB
  - Modify application code

Lead behavior (on receiving escalation)

  1. Read ${SHARED_WORKSPACE}/escalations/{section}.json

  2. GATE: Is scope clear?
     +-- YES (app documented in AGENTS.md) ------+
     |   Spawn new teammate for Workflow B        |
     |   (integration implementation)             |
     +--------------------------------------------+
     +-- NO (novel app, unclear scope) ----------+
     |   Invoke /grill-me with human:             |
     |   "Teammate found {feature}. Options:      |
     |    A) Integrate now  B) Defer  C) Other"   |
     |   Human decides -> dispatch or log          |
     +--------------------------------------------+

  3. Message other teammates if relevant:
     "Skip {feature} entirely if you see it in your section"

  4. After all CSS work completes, present escalation summary:
     "CSS complete. These features were found but not implemented:
      - Okendo reviews (Workflow B dispatched / deferred)
      - Custom badge X (deferred, needs human decision)
      Human decisions needed on deferred items."

Workflow B: Feature / Integration PR (code changes)

  Input from coordinator:
  - Feature spec (e.g., "integrate Okendo reviews for shop X")
  - Relevant AGENTS.md section (per-app guide)
  - Current codebase state

  +-----------------------------------------------------------------------+
  |                                                                       |
  |  CLARIFY                                                              |
  |  +- Read per-app guide from AGENTS.md                                 |
  |  +- Identify which components to touch:                               |
  |     +- storefront_search (TypeScript bundle)                          |
  |     +- search_proxy (Cloudflare Worker)                               |
  |     +- admin_server (Python/FastAPI)                                  |
  |     +- admin_ui (React)                                               |
  |     +- extensions (Liquid theme blocks)                               |
  |  +- Check if changes span multiple components                         |
  |                                                                       |
  |  PLAN                                                                 |
  |  +- Implementation plan with file list per component                  |
  |  +- Test plan (which test suites to run)                              |
  |  +- UI impact assessment (does this need browser verification?)       |
  |                                                                       |
  |  GATE                                                                 |
  |  +- auto: well-documented app in AGENTS.md with clear guide           |
  |  +- /grill-me -> coordinator -> human: novel integration,             |
  |     unclear scope, or architectural decision needed                   |
  |                                                                       |
  |  IMPLEMENT (in isolated worktree)                                     |
  |  |                                                                    |
  |  |  IF single component:           IF multi-component:                |
  |  |                                                                    |
  |  |  +----------------+            +--------------+ +--------------+   |
  |  |  | Single Agent   |            | Agent 1      | | Agent 2      |   |
  |  |  | All changes    |            | storefront   | | search_proxy |   |
  |  |  | in one worktree|            | _search      | | changes      |   |
  |  |  | browser1       |            | browser1     | | browser2     |   |
  |  |  +-------+--------+            +------+-------+ +------+-------+   |
  |  |          |                            |                |           |
  |  |          |                            +-------+--------+           |
  |  |          |                                    |                    |
  |  |          +------------------------------------+                    |
  |  |                         |                                          |
  |  |                         v                                          |
  |  |          +--------------+---------------+                          |
  |  |          |  INDEPENDENT REVIEW          |                          |
  |  |          |  (separate agent)            |                          |
  |  |          |                              |                          |
  |  |          |  Reads the diff, checks:     |                          |
  |  |          |  - Code quality              |                          |
  |  |          |  - CLAUDE.md compliance      |                          |
  |  |          |  - No unintended side effects|                          |
  |  |          |  - Test coverage for new     |                          |
  |  |          |    behavior                  |                          |
  |  |          +--------------+---------------+                          |
  |  |                         |                                          |
  |  |                    pass | changes requested                        |
  |  |                         |      |                                   |
  |  |                         |      v                                   |
  |  |                         |   Fix agent addresses feedback           |
  |  |                         |   (max 3 rounds, then escalate           |
  |  |                         |    to coordinator)                       |
  |  |                         |      |                                   |
  |  |                         <------+                                   |
  |  |                         |                                          |
  |  |                         v                                          |
  |  |          +--------------+---------------+                          |
  |  |          |       VERIFY                 |                          |
  |  |          |                              |                          |
  |  |          |  Run test suites:            |                          |
  |  |          |  +- storefront_search:       |                          |
  |  |          |  |  cd components/shopify/   |                          |
  |  |          |  |  storefront_search &&     |                          |
  |  |          |  |  npm run build && npm test|                          |
  |  |          |  +- search_proxy:            |                          |
  |  |          |  |  cd components/           |                          |
  |  |          |  |  search_proxy && npm test |                          |
  |  |          |  +- admin_server:            |                          |
  |  |          |  |  pants test               |                          |
  |  |          |  |  //components/shopify/    |                          |
  |  |          |  |  admin_server::           |                          |
  |  |          |  +- ruff / biome formatting  |                          |
  |  |          |                              |                          |
  |  |          |  If UI change:               |                          |
  |  |          |  +- Screenshot targeted      |                          |
  |  |          |     element via browserN     |                          |
  |  |          |  +- Compare before/after     |                          |
  |  |          +--------------+---------------+                          |
  |  |                         |                                          |
  |  |                    pass | fail                                     |
  |  |                         |   |                                      |
  |  |                         |   +-> fix + re-verify (max 3)            |
  |  |                         |           |                              |
  |  |                         |      still failing?                      |
  |  |                         |           v                              |
  |  |                         |   Escalate to coordinator                |
  |  |                         |   with failure details                   |
  |  |                         |           |                              |
  |  |                         <-----------+                              |
  |  |                         |                                          |
  |  |                         v                                          |
  |  |   OUTPUT:                                                          |
  |  |   +- Raised PR via /raise (branch, commit, push, PR, reviewers)   |
  |  |   +- Run /monitor-pr to poll for auto-reviewer feedback,           |
  |  |   |  address comments, and iterate until approved                  |
  |  |   +- PR link returned to coordinator                              |
  |  |   +- OR: gap report if unresolved issues                          |
  |  |                                                                    |
  +-----------------------------------------------------------------------+

  Third-party app examples and which components they touch:

  +-------------------+----------------------------------------------+
  | Integration       | Components affected                          |
  +-------------------+----------------------------------------------+
  | Okendo reviews    | storefront_search (hydration, card template) |
  |                   | admin_server (settings: product_reviews)     |
  +-------------------+----------------------------------------------+
  | Swym wishlist     | storefront_search (wishlist button, events)  |
  |                   | admin_server (settings: wishlist_button)     |
  +-------------------+----------------------------------------------+
  | Selly/TDF badges  | storefront_search (selly anchor, hydration)  |
  |                   | admin_server (settings: integrations.selly)  |
  +-------------------+----------------------------------------------+
  | Videowise video   | admin_server (grid injection config)         |
  | tiles             | (no code changes — config-only via API)      |
  +-------------------+----------------------------------------------+
  | Quickshop modal   | storefront_search (DOM event bridge only)    |
  |                   | (merchant's theme handles the modal)         |
  +-------------------+----------------------------------------------+
  | New search field  | search_proxy (score modifiers, field config) |
  | or facet          | admin_server (index settings, field mapping) |
  |                   | storefront_search (filter UI if new facet)   |
  +-------------------+----------------------------------------------+

Workflow C: Search Tuning

Data flow

  Normal path (production):
  Admin API  --->  DDB (EcomIndexSettingsTable)  --> Exporter Lambda --> KV --> Proxy
                   (source of truth)                 (DDB Streams)      (60s TTL)

  Override path (testing — used by this workflow):
  Agent sends search request with x-marqo-settings-override header
  --> Proxy merges override onto base KV settings per-request
  --> Results reflect the override WITHOUT writing to DDB
  --> Nothing changes in production

Workflow C agents NEVER write to DDB. They use the x-marqo-settings-override header (PR #3140) to test config changes per-request. The override is ephemeral — it only affects that single request. Production settings are untouched until the coordinator (with human approval) applies the change via the admin API.

Override header reference

  Header: x-marqo-settings-override
  Value: JSON object

  {
    "mode": "deep",           // "replace" | "shallow" | "deep" (default)
    "settings": {             // partial or full MarqoSettings object
      "search_config": {
        "hybridParameters": {
          "scoreModifiers": {
            "add_to_score": [
              { "field_name": "outOfStock", "weight": -2.0 }
            ]
          }
        }
      }
    }
  }

  Merge modes:
  - "replace" — discard base settings, use override as full settings
  - "shallow" — Object.assign override onto base (top-level keys)
  - "deep" — recursively merge (objects merged, arrays replaced)

  Result is re-parsed through MarqoSettings schema. Malformed input = 400.

  Input from coordinator:
  - What to tune (relevance, sort, facets, boosting, etc.)
  - Current search behavior (example queries + results)
  - Desired behavior (what should rank higher/lower)

  +-----------------------------------------------------------------------+
  |                                                                       |
  |  CLARIFY                                                              |
  |  +- Identify tuning type:                                             |
  |  |                                                                    |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Type              | DDB field path                           |  |
  |  |  |                   | (override via x-marqo-settings-override) |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Score modifiers   | search_config.hybridParameters           |  |
  |  |  |                   | .scoreModifiers.add_to_score[]           |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Query overrides   | search_config.queryOverrides[]           |  |
  |  |  | (boost/bury)      |                                          |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Sort mapping      | storefront_search: constants.ts          |  |
  |  |  |                   | (code change — delegate to Workflow B)   |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Facet config      | Storefront settings API                  |  |
  |  |  | (storefront       | filter_config.value.items                |  |
  |  |  |  filters)         | (NOT overridable — delegate to coord)   |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Collection config | collections_config.{collection_name}     |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Relevance mode    | relevance_search, relevance_sort         |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Merchandising     | add_docs_config.merchandising_fields[]   |  |
  |  |  | fields            |                                          |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |  | Personalization   | for_you_config, personalization_enabled  |  |
  |  |  | (for-you)         |                                          |  |
  |  |  +-------------------+------------------------------------------+  |
  |  |                                                                    |
  |  +- Gather baseline: run example queries via search proxy             |
  |     and record current result ordering (no override header)           |
  |                                                                       |
  |  PLAN                                                                 |
  |  +- Read current config via admin API:                                |
  |  |  GET /api/v1/indexes/{index_name}/config                           |
  |  +- Build proposed override JSON                                      |
  |  +- List example queries to test                                      |
  |  +- Define expected impact per query                                  |
  |                                                                       |
  |  GATE                                                                 |
  |  +- auto: testing via override header is always safe                  |
  |  |  (no production impact — override is per-request only)             |
  |  +- /grill-me: if scope is unclear (e.g., "make search better")       |
  |                                                                       |
  |  IMPLEMENT + VERIFY (iterative — agent tests its own changes)         |
  |  |                                                                    |
  |  |  IF overridable config:              IF code change (new sort      |
  |  |  (score modifiers, query overrides,  field, new facet UI):         |
  |  |  relevance, collections config,                                    |
  |  |  merchandising, personalization):    Delegate to Workflow B        |
  |  |                                      (Feature/Integration PR)     |
  |  |  +----------------------------+                                    |
  |  |  | Single Agent               |                                    |
  |  |  |                            |                                    |
  |  |  | 1. Read current config     |                                    |
  |  |  |    GET /api/v1/indexes/    |                                    |
  |  |  |    {index}/config          |                                    |
  |  |  |                            |                                    |
  |  |  | 2. Build override JSON     |                                    |
  |  |  |                            |                                    |
  |  |  | 3. Test via search proxy:  |                                    |
  |  |  |    POST /api/v1/indexes/   |                                    |
  |  |  |    {index}/search          |                                    |
  |  |  |    Headers:                |                                    |
  |  |  |      x-marqo-settings-    |                                    |
  |  |  |      override: {override} |                                    |
  |  |  |    Body: {example query}   |                                    |
  |  |  |                            |                                    |
  |  |  | 4. Compare results to      |                                    |
  |  |  |    baseline (no override)  |                                    |
  |  |  |                            |                                    |
  |  |  | 5. Check:                  |                                    |
  |  |  |    - Target products moved |                                    |
  |  |  |      as expected?          |                                    |
  |  |  |    - Unrelated queries     |                                    |
  |  |  |      not degraded?         |                                    |
  |  |  |    - Facet counts correct? |                                    |
  |  |  |    - Sort ordering right?  |                                    |
  |  |  |                            |                                    |
  |  |  | 6. If not right, adjust    |                                    |
  |  |  |    override and re-test    |                                    |
  |  |  |    (iterate, max 5 rounds  |                                    |
  |  |  |     — no prod risk)        |                                    |
  |  |  +-------------+--------------+                                    |
  |  |                |                                                   |
  |  |                v                                                   |
  |  |                                                                    |
  |  |  If UI change (new facet visible in storefront):                   |
  |  |  +- Screenshot filter sidebar via browserN                         |
  |  |  +- Compare to current                                             |
  |  |                                                                    |
  |  |  OUTPUT to coordinator:                                            |
  |  |  +- Validated config override (JSON)                               |
  |  |  +- Before/after query comparison report                           |
  |  |  +- The override was tested N times with real queries              |
  |  |  +- Ready-to-apply API call:                                       |
  |  |  |  PUT /api/v1/indexes/{index}/config                             |
  |  |  |  { ...validated changes }                                       |
  |  |  +- Rollback values (current config snapshot)                      |
  |  |                                                                    |
  |  |  NOTE: the agent does NOT call PUT /config.                        |
  |  |  Only the coordinator (with human approval) writes to DDB.         |
  |  |                                                                    |
  +-----------------------------------------------------------------------+

  The coordinator then:
  1. Shows the validated diff + query comparison report to the human
  2. The config was already tested via override header — results are real
  3. On approval, coordinator applies via PUT /api/v1/indexes/{index}/config
     (writes to DDB -> streams to KV -> live in ~60s)
  4. Optionally re-runs queries to confirm live behavior matches override test

  Example: adding an out-of-stock demotion

  Baseline (no override):
    POST /search  {"q": "red shoes"}
    -> Result: [Nike Air (in stock), Adidas Run (OOS), Puma Slide (in stock)]

  Test with override:
    POST /search
    Headers: x-marqo-settings-override: {
      "mode": "deep",
      "settings": {
        "search_config": {
          "hybridParameters": {
            "scoreModifiers": {
              "add_to_score": [
                {"field_name": "onSale", "weight": 0.5},
                {"field_name": "outOfStock", "weight": -2.0}
              ]
            }
          }
        }
      }
    }
    Body: {"q": "red shoes"}
    -> Result: [Nike Air (in stock), Puma Slide (in stock), Adidas Run (OOS)]
    -> OOS item demoted as expected. No override written to DDB.

  Spot-check unrelated query:
    POST /search (same override)  {"q": "gift card"}
    -> Result: [unchanged — gift cards have no stock field]
    -> No degradation confirmed.

  Agent outputs validated override + comparison report to coordinator.
  Coordinator shows to human, human approves, coordinator applies to DDB.

/grill-me Skill

Structured clarification when auto-gate can't approve. Invoked by coordinator, runs interactively with human.

Input:

What we're trying to do
What's ambiguous (specific questions)
Options identified (if any)
Evidence (screenshots, current vs native behavior)

Output:

Structured spec with clear acceptance criteria
Returned to coordinator who re-dispatches to the appropriate workflow

Examples:

"Native theme has quickshop modal. Replicate via DOM events or skip? [screenshot]"
"Store uses Okendo AND Judge.me. Which review provider?"
"12 native filters, 8 Marqo fields match. Which 4 to drop?"

Browser Allocation

  .mcp.json (4x @playwright/mcp --isolated)
  Each instance: separate Chrome process, ephemeral profile

  +----------------------------------------------------------------+
  |                                                                |
  |  Coordinator                                                   |
  |  +- mcp__plugin_playwright_playwright__*                       |
  |     (interactive, shows things to human)                       |
  |                                                                |
  |  Subagent work:                                                |
  |  +- Agent 1 --> mcp__browser1__*  (exclusive)                  |
  |  +- Agent 2 --> mcp__browser2__*  (exclusive)                  |
  |  +- Agent 3 --> mcp__browser3__*  (exclusive)                  |
  |  +- Agent 4 --> mcp__browser4__*  (exclusive)                  |
  |  |                                                             |
  |  Independent reviewer / fix agents:                            |
  |  +- Uses whichever browserN is free                            |
  |     (implementation agents are done by this point)             |
  |                                                                |
  |  Rule: prompt each agent with                                  |
  |  "Use ONLY mcp__browserN__* tools"                             |
  |                                                                |
  +----------------------------------------------------------------+

Verification Standards (per-element, not viewport)

Full-page screenshots are useless for verifying small CSS changes. Each subtask defines WHAT ELEMENT to screenshot and compare.

Subtask	Element selector	Viewports
Product cards	`.marqo-product-card`	1440, 375
Filter sidebar	`.marqo-filter-sidebar`	1440
Filter drawer	`.marqo-filter-drawer`	375
Pagination	`.marqo-pagination`	1440, 375
Sort dropdown	`.marqo-sort-select`	1440
Results count	`.marqo-results-count`	1440
Page heading	`.marqo-collection-header`	1440, 375
No results	`.marqo-no-results`	1440
CTA hover	`.marqo-product-card:hover`	1440

Each agent screenshots:

The element on the NATIVE theme (reference)
The element on the MARQO theme (current)
Compares visually and reports pass/fail + description

Docs Structure: Current vs Needed

Existing docs (keep, reference from AGENTS.md)

Doc	Purpose	Action
`docs/integrations/README.md`	Quickstart workflow (8 steps)	Update to reference AGENTS.md
`docs/integrations/storefront-css-customization-guide.md`	CSS how-to, agent prompts	Keep as reference
`docs/integrations/shopify-collection-blocks-guide.md`	Block injection	Keep as reference
`docs/integrations/msqc.md`	Per-merchant reference	Keep
`docs/integrations/muji.md`	Per-merchant reference	Keep
`docs/dev/shopify-styling-architecture.md`	Three-layer CSS system	Keep as reference
`docs/dev/storefront-extensibility.md`	DOM events, grid injections	Keep as reference
`docs/dev/customer-theme-matching-guide.md`	Theme analysis prompts	Consolidate into AGENTS.md
`docs/dev/agent-workflow-guide.md`	Parallel planning patterns	Consolidate into AGENTS.md
`docs/runbooks/diagnostics/components/shopify.md`	GraphQL, metafields, diagnostics	Keep
`docs/runbooks/diagnostics/components/search-proxy.md`	Search proxy diagnostics	Keep

New artifacts to create

Artifact	Type	Purpose
`docs/integrations/AGENTS.md`	Doc	Single source of truth for all agent integration work
`.claude/agents/css-implementer.md`	Subagent def	Teammate: write CSS, self-verify, log escalations. Tool-restricted.
`.claude/agents/independent-reviewer.md`	Subagent def	Teammate: cross-check all sections, separate context
`.claude/agents/search-tuner.md`	Subagent def	Teammate: build + test config overrides, no DDB writes
`.claude/skills/grill-me/SKILL.md`	Skill	Structured human clarification
`.claude/skills/integrate-storefront/SKILL.md`	Skill	Team lead behavior (scope determines fan-out)
Workflow A	Workflow script	Single-section CSS implement+verify loop
Workflow C	Workflow script	Search tuning iteration with override header
TaskCompleted hook	Hook	Validate phase outputs before task completion
TaskCreated hook	Hook	Validate escalation file exists for integration tasks
TeammateIdle hook	Hook	Redirect idle teammates to unclaimed tasks
`.mcp.json`	Config	4 pinned isolated Playwright instances

TODO List

Phase 0: Infrastructure

scripts/ecom/shopify_graphql.py — reusable Shopify GraphQL helper
Shopify diagnostics docs updated with script reference
Parallel Playwright MCP validated (4 isolated instances, 2026-06-09, @playwright/mcp@0.0.75)

Storefront Integration Automation Architecture

Status: Historical design doc

Primitives and How They Compose

How they nest

Infrastructure (validated 2026-06-09)

Parallel Playwright MCP

Hooks (enforcement layer)

Subagent definitions (`.claude/agents/`)

Universal Work Unit

System Architecture: Agent Team

Workflow A: CSS (single section, deterministic loop)

Workflow A script (what the runtime executes)

What the teammate does vs what the script does

End-to-end team flow (how Workflow A fits into the team)

Escalation Model

Teammate behavior (enforced by subagent definition + tool restrictions)

Lead behavior (on receiving escalation)

Workflow B: Feature / Integration PR (code changes)

Workflow C: Search Tuning

Data flow

Override header reference

/grill-me Skill

Browser Allocation

Verification Standards (per-element, not viewport)

Docs Structure: Current vs Needed

Existing docs (keep, reference from AGENTS.md)

New artifacts to create

TODO List

Phase 0: Infrastructure

Phase 1: Documentation

Phase 2: Subagent Definitions + Skills + Config

Phase 3: Workflows

Phase 4: Testing & Validation

Phase 5: Iteration

Status: Historical design doc​

Primitives and How They Compose​

How they nest​

Infrastructure (validated 2026-06-09)​

Parallel Playwright MCP​

Hooks (enforcement layer)​

Subagent definitions (.claude/agents/)​

Universal Work Unit​

System Architecture: Agent Team​

Workflow A: CSS (single section, deterministic loop)​

Workflow A script (what the runtime executes)​

What the teammate does vs what the script does​

End-to-end team flow (how Workflow A fits into the team)​

Escalation Model​

Teammate behavior (enforced by subagent definition + tool restrictions)​

Lead behavior (on receiving escalation)​

Workflow B: Feature / Integration PR (code changes)​

Workflow C: Search Tuning​

Data flow​

Override header reference​

/grill-me Skill​

Browser Allocation​

Verification Standards (per-element, not viewport)​

Docs Structure: Current vs Needed​

Existing docs (keep, reference from AGENTS.md)​

New artifacts to create​

TODO List​

Phase 0: Infrastructure​

Phase 1: Documentation​

Phase 2: Subagent Definitions + Skills + Config​

Phase 3: Workflows​

Phase 4: Testing & Validation​

Phase 5: Iteration​

Status: Historical design doc

Primitives and How They Compose

How they nest

Infrastructure (validated 2026-06-09)

Parallel Playwright MCP

Hooks (enforcement layer)

Subagent definitions (`.claude/agents/`)

Universal Work Unit

System Architecture: Agent Team

Workflow A: CSS (single section, deterministic loop)

Workflow A script (what the runtime executes)

What the teammate does vs what the script does

End-to-end team flow (how Workflow A fits into the team)

Escalation Model

Teammate behavior (enforced by subagent definition + tool restrictions)

Lead behavior (on receiving escalation)

Workflow B: Feature / Integration PR (code changes)

Workflow C: Search Tuning

Data flow

Override header reference

/grill-me Skill

Browser Allocation

Verification Standards (per-element, not viewport)

Docs Structure: Current vs Needed

Existing docs (keep, reference from AGENTS.md)

New artifacts to create

TODO List

Phase 0: Infrastructure

Phase 1: Documentation

Phase 2: Subagent Definitions + Skills + Config

Phase 3: Workflows

Phase 4: Testing & Validation

Phase 5: Iteration