Skip to main content

Storefront Integration Automation Architecture

Design for automating Shopify storefront integration work — CSS customization, feature development, third-party app integration, and search tuning — using Claude Code primitives (skills, agent teams, workflows, subagents) as composable building blocks.

Status: Historical design doc

This was the planning document for the orchestration system. The system has since been BUILT and has drifted from this design in places (role names, hooks, loop defaults). The operational source of truth is AGENTS.md plus the agent definitions under .claude/agents/, the skills under .claude/skills/, and the workflows under .claude/workflows/. Known deltas: independent-reviewer / theme-inspector were never created under those names (the realized roles are finisher and the implementer/verifier pairs); the TODO checklist at the bottom is stale — most items exist on disk; inline-iterate (not the scripted workflow) is now the default teammate loop (see docs/dev/orchestration/iteration-patterns.md). Read this doc for design rationale, not current behavior.


Primitives and How They Compose

Each Claude Code primitive has a strength. We use each where it fits:

+------------------+------------------------------------------+
| Primitive | Our use |
+------------------+------------------------------------------+
| Skill | Entry points + coordinator behavior |
| | /integrate-storefront |
| | /grill-me |
+------------------+------------------------------------------+
| Agent Team | Parallel coordination layer |
| | Lead = coordinator, teammates = workers |
| | Shared task list, inter-agent messaging |
| | Hooks enforce phase transitions |
+------------------+------------------------------------------+
| Workflow | Deterministic iteration loops |
| | Workflow A: single-section CSS loop |
| | Workflow C: search tuning iteration |
| | Script controls retry, not agent judgment |
+------------------+------------------------------------------+
| Subagent | Quick focused tasks within any of above |
| | Single screenshot, GraphQL query, |
| | theme token extraction |
+------------------+------------------------------------------+
| Subagent Defs | Reusable teammate/worker roles |
| (.claude/agents) | css-implementer, independent-reviewer, |
| | search-tuner, theme-inspector |
+------------------+------------------------------------------+

How they nest

User invokes /integrate-storefront
|
| SKILL (coordinator instructions)
v
COORDINATOR (your session, becomes team lead)
|
| Setup phase: uses SUBAGENTS for quick tasks
| (GraphQL queries, screenshot references)
|
| Dispatches work via AGENT TEAM:
|
+---> Spawns teammates from SUBAGENT DEFINITIONS
| |
| | Teammate "cards" --> runs WORKFLOW A (cards, browser1)
| | Teammate "filters" --> runs WORKFLOW A (filters, browser2)
| | Teammate "paging" --> runs WORKFLOW A (paging, browser3)
| | Teammate "heading" --> runs WORKFLOW A (heading, browser4)
| |
| | Teammates CAN message each other:
| | "I changed heading height, check your grid layout"
| |
| | HOOKS enforce task transitions
| |
| | After all workflows done:
| | Spawns independent-reviewer teammate
| |
| | Escalations logged to ${SHARED_WORKSPACE}/escalations/*.json
| | Clear scope --> lead spawns Workflow B teammate
| | Unclear scope --> lead invokes /grill-me with human
|
+---> Workflow C teammate for search tuning (if needed)
|
+---> Workflow B teammate for feature PRs (from escalations)
|
v
Lead collates outputs, gates prod changes with human

Infrastructure (validated 2026-06-09)

.mcp.json .claude/settings.json
+--------------------------+ +----------------------------+
| browser1 (--isolated) | | playwright plugin |
| browser2 (--isolated) | | (coordinator's browser) |
| browser3 (--isolated) | +----------------------------+
| browser4 (--isolated) |
+--------------------------+ scripts/ecom/
+----------------------------+
docs/integrations/AGENTS.md | shopify_graphql.py |
+--------------------------+ | storefront_settings.py |
| Single source of truth | | inject_marqo_blocks.py |
| for all agent work | +----------------------------+
+--------------------------+

Parallel Playwright MCP

4 named @playwright/mcp instances with --isolated flag, registered in .mcp.json. Each spawns a separate Chrome process with an ephemeral profile — no shared state. Validated working 2026-06-09 with @playwright/mcp@0.0.75.

Ships with Phase 2 (skills implementation), not this planning doc. The .mcp.json is project-scoped and auto-discovered by Claude Code for every developer — it should only land alongside the skills that consume it.

{
"mcpServers": {
"browser1": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] },
"browser2": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] },
"browser3": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] },
"browser4": { "command": "npx", "args": ["-y", "@playwright/mcp@0.0.75", "--isolated"] }
}
}

Tools are namespaced: mcp__browser1__browser_navigate, mcp__browser2__browser_navigate, etc. Subagents inherit all MCP tools. Each agent is prompted to use ONLY its assigned browserN.

Coexists with the existing Playwright plugin (mcp__plugin_playwright_playwright__*) which the coordinator keeps for interactive human work.

Hooks (enforcement layer)

Configured in .claude/settings.json. Shell scripts that enforce work unit transitions.

TaskCompleted — main gate. Runs when any teammate marks a task done.

For [IMPLEMENT] tasks (all types):
- ${SHARED_WORKSPACE}/escalations/{task-name}.json exists (even if empty [])

For [IMPLEMENT] CSS tasks (cards, filters, paging, heading):
- ${SHARED_WORKSPACE}/css/{section}.css exists (CSS output)
- ${SHARED_WORKSPACE}/screenshots/{section}-* exists (native theme screenshots)

For [FINISH] tasks:
- ${SHARED_WORKSPACE}/screenshots/final-* exists (per-element verification screenshots)
- ${SHARED_WORKSPACE}/verification-report.json exists (per-section pass/fail)

For [INTEGRATION-CODE] tasks:
- ${SHARED_WORKSPACE}/escalations/{task-name}.json exists (blockers/scope/clarifications)

Exit 2 with feedback listing all missing artifacts.

TaskCreated — validates new tasks follow conventions.

For [INTEGRATION-CODE] or [INTEGRATION-VERIFY] tasks:
- Check matching escalation file exists in ${SHARED_WORKSPACE}/escalations/
- Verify escalation has real content (not empty [])
- Prevents phantom tasks that didn't come from a real escalation
Exit 2 if no matching escalation

TeammateIdle — nudges teammates before going idle.

No integration workspace: exit 0 (not an integration session)
CSS sections with no CSS output yet: exit 2 (pending implement tasks)
CSS exists but no verification report: exit 2 (pending finish task)
All done: exit 0 (allow idle, remind to run cleanup)

Subagent definitions (.claude/agents/)

Reusable teammate roles with tool restrictions and browser assignments:

  • css-implementer.md — writes CSS, self-verifies, logs escalations. NO admin API access, NO prod push.
  • independent-reviewer.md — cross-checks all CSS sections. Separate context, didn't write the CSS.
  • search-tuner.md — builds + tests config overrides via header. NO DDB writes.

Universal Work Unit

Every workflow type — CSS, features, integrations, search tuning — follows the same loop:

+----------+ +----------+ +-----------+ +-------------+
| CLARIFY |--->| PLAN |--->| GATE |--->| IMPLEMENT |
| specs | | | | | | |
+----------+ +----------+ | auto: | +------+------+
| specific | |
| & scoped | v
| | +-------------+
| /grill-me:| | INDEPENDENT |
| vague or | | REVIEW |
| new feat | | (separate |
+-----------+ | agent) |
+------+------+
|
v
+-------------+
+-->| VERIFY |
| | (targeted |
| | element |
| | screenshot |
| | + compare) |
| +------+------+
| |
| pass | fail
| | |
| | +-> iterate
| | (max 3)
| | |
| | still failing?
| | |
| | v
| | "unresolved gap"
| | with evidence
| | |
+----------+----------+
|
v
+-------------+
| OUTPUT |
| CSS candidate|
| or raised PR |
| or gap report|
+-------------+

System Architecture: Agent Team

The coordinator (your session) becomes the team lead. It spawns teammates for parallel work, manages the shared task list, and gates all production changes.

+-----------+
| HUMAN |
+-----+-----+
|
===========================+================================
|| ||
|| /integrate-storefront ||
|| COORDINATOR = TEAM LEAD (main session) ||
|| ||
|| Owns: plugin Playwright (interactive, human-facing) ||
|| ||
|| Does: Does NOT: ||
|| - Interface with human - Write CSS ||
|| - Setup & inspect phase - Make code changes ||
|| - Spawn teammates - Push to prod without ||
|| - Manage shared task list human approval ||
|| - Triage escalations - Own browser1-4 ||
|| - Invoke /grill-me - Run iteration loops ||
|| - Collate + gate prod pushes ||
|| ||
||=========================================================||
|| ||
|| PHASE 1: SETUP (lead does directly) ||
|| +- Backup settings (storefront_settings.py) ||
|| +- Create/update merchant doc from template ||
|| +- Query theme tokens (shopify_graphql.py + Playwright)||
|| +- Map native filters -> Marqo fields ||
|| +- Identify scope -> create tasks on shared list ||
|| ||
|| PHASE 2: SPAWN TEAM + CSS WORK ||
|| +- Spawn teammates from subagent definitions ||
|| +- Assign browserN per teammate ||
|| +- Require plan approval for IMPLEMENT tasks ||
|| +- Each teammate runs Workflow A for their section ||
|| ||
|| PHASE 3: HANDLE ESCALATIONS (during Phase 2) ||
|| +- Read ${SHARED_WORKSPACE}/escalations/*.json as they arrive ||
|| +- Clear scope (app in AGENTS.md) --> ||
|| | spawn new teammate for Workflow B ||
|| +- Unclear scope --> ||
|| | /grill-me with human, then dispatch or defer ||
|| +- Message other teammates if escalation affects them ||
|| | "Skip this feature entirely, don't style it" ||
|| ||
|| PHASE 4: COLLATE & GATE ||
|| +- CSS candidate --> show diff to human --> push on ok ||
|| +- Feature PR -----> link to human for review ||
|| +- Config diff -----> show comparison to human ||
|| +- Present escalation log: ||
|| "These features were found but not implemented: ||
|| - Okendo reviews (Workflow B dispatched) ||
|| - Custom badge X (deferred, needs /grill-me)" ||
|| ||
=============================================================

Workflow A: CSS (single section, deterministic loop)

Workflow A is always single-section, single-browser. The agent team handles fan-out (spawning N teammates); the workflow script handles the iteration loop.

A tweak is a team of 1 teammate running Workflow A for one section. Initial setup is a team of 4 teammates, each running Workflow A for their section.

Workflow A script (what the runtime executes)

Input (from team lead via task):
- Section spec (e.g., "cards", "filters", "paging", "heading")
- Assigned browserN
- Merchant doc (design tokens, native theme reference)
- Current settings backup

Script:
// Implementor inspects native theme
nativeRef = agent("css-implementer", { phase: "inspect", section, browserN })

for (i = 0; i < 3; i++):
// Implementor writes CSS (separate agent from verifier)
css = agent("css-implementer", {
phase: "implement", section, merchantDoc, nativeRef, previousFeedback
})

// Verifier evaluates (separate agent — did not write the CSS)
// Injects CSS into browser, screenshots, compares to native
result = agent("css-verifier", {
css, nativeRef, marqoUrl, elementSelector, browserN
})

if (result.pass): break
previousFeedback = result.feedback

return { css, screenshots: result.screenshots,
verification: result,
escalations: readFile("${SHARED_WORKSPACE}/escalations/{section}.json"),
iterations: i + 1 }

What the teammate does vs what the script does

+---------------------------+---------------------------+
| Implementor agent | Verifier agent |
+---------------------------+---------------------------+
| Inspect native theme | Inject CSS into browser |
| Write CSS | Screenshot elements |
| Identify escalations | Compare native vs Marqo |
| NEVER verifies own work | Decide pass/fail |
| | Describe gaps |
| | NEVER writes CSS |
+---------------------------+---------------------------+

Workflow script (orchestration):
| Controls iteration count |
| Passes verifier feedback to implementor |
| Decides when to give up |
| Formats final output |

End-to-end team flow (how Workflow A fits into the team)

Input from coordinator:
- Merchant doc (design tokens, filter mapping)
- Scope: "initial setup" (fan out 4) or "fix: {problem}" (single agent)
- Current settings backup + reference screenshots

+-----------------------------------------------------------------------+
| |
| IF initial setup: IF tweak: |
| |
| +-------------+ +-------------+ +------------------+ |
| | Agent 1 | | Agent 2 | | Single Agent | |
| | Cards+Grid | | Filters | | "fix: {desc}" | |
| | browser1 | | browser2 | | browser1 | |
| +------+------+ +------+------+ +--------+---------+ |
| +------+------+ +------+------+ | |
| | Agent 3 | | Agent 4 | | |
| | Paging+Sort| | Heading+ | | |
| | browser3 | | Layout | | |
| +------+------+ | browser4 | | |
| | +------+------+ | |
| | | | |
| | Each runs Workflow A: | |
| | 1. Inspect native (own browser) | |
| | 2. Write CSS | |
| | 3. Inject CSS into browser, | |
| | screenshot, compare to native | |
| | 4. Iterate on feedback (max 3) | |
| | (workflow script controls | |
| | when to retry / give up) | |
| | 5. Output: CSS text + screenshots| |
| | | |
| +---------------+------------------+ |
| | |
| v |
| +--------------+---------------+ |
| | COLLATE (lead) | |
| | Merge all CSS in order: | |
| | heading -> layout -> cards ->| |
| | filters -> paging -> sort -> | |
| | count -> states | |
| +--------------+---------------+ |
| | |
| v |
| +--------------+---------------+ |
| | FINISHER TEAMMATE | |
| | (didn't write any CSS) | |
| | | |
| | Uses any free browserN | |
| | | |
| | 1. Inject FULL collated | |
| | CSS into browser | |
| | 2. Screenshot + verify | |
| | EVERY element | |
| | 3. Fix integration issues | |
| | (z-index, specificity, | |
| | layout conflicts) | |
| | 4. Own iterate loop | |
| | (max 3 rounds) | |
| | 5. Output: final CSS + | |
| | per-section pass/fail | |
| +--------------+---------------+ |
| | |
| v |
| OUTPUT to coordinator: |
| - Final CSS candidate (ready to push) |
| - Per-section verification report |
| - Fixes applied + unresolved gaps |
| |
| Meanwhile (parallel with CSS work): |
| INTEGRATION teammates may be running Workflow B for |
| escalated features (e.g., Okendo PR). These run |
| independently — the PR can be written while CSS work |
| continues. After human deploys the integration, the SAME |
| teammate verifies it and outputs any CSS patches needed. |
| |
+-----------------------------------------------------------------------+

Escalation Model

When a teammate finds a feature that can't be replicated with CSS (e.g., review stars, wishlist buttons, custom badges), it does NOT attempt workarounds.

Teammate behavior (enforced by subagent definition + tool restrictions)

1. Skip the feature entirely — no placeholder, no workaround
2. Write structured escalation to ${SHARED_WORKSPACE}/escalations/{section}.json:
{
"section": "cards",
"feature": "Okendo review stars",
"location": "product card, below title",
"type": "integration", // integration | config_change | code_change
"screenshot": "${SHARED_WORKSPACE}/screenshots/native-card-okendo.png",
"notes": "Stars show 1-5 rating + review count, loads via JS"
}
3. Message lead: "Escalation logged: {feature}. See ${SHARED_WORKSPACE}/escalations/{section}.json"
4. Continue CSS work for everything in scope

Teammates CANNOT (tool restrictions in subagent def):
- Push to prod (no admin API access)
- Enable integrations
- Write to DDB
- Modify application code

Lead behavior (on receiving escalation)

1. Read ${SHARED_WORKSPACE}/escalations/{section}.json

2. GATE: Is scope clear?
+-- YES (app documented in AGENTS.md) ------+
| Spawn new teammate for Workflow B |
| (integration implementation) |
+--------------------------------------------+
+-- NO (novel app, unclear scope) ----------+
| Invoke /grill-me with human: |
| "Teammate found {feature}. Options: |
| A) Integrate now B) Defer C) Other" |
| Human decides -> dispatch or log |
+--------------------------------------------+

3. Message other teammates if relevant:
"Skip {feature} entirely if you see it in your section"

4. After all CSS work completes, present escalation summary:
"CSS complete. These features were found but not implemented:
- Okendo reviews (Workflow B dispatched / deferred)
- Custom badge X (deferred, needs human decision)
Human decisions needed on deferred items."

Workflow B: Feature / Integration PR (code changes)

Input from coordinator:
- Feature spec (e.g., "integrate Okendo reviews for shop X")
- Relevant AGENTS.md section (per-app guide)
- Current codebase state

+-----------------------------------------------------------------------+
| |
| CLARIFY |
| +- Read per-app guide from AGENTS.md |
| +- Identify which components to touch: |
| +- storefront_search (TypeScript bundle) |
| +- search_proxy (Cloudflare Worker) |
| +- admin_server (Python/FastAPI) |
| +- admin_ui (React) |
| +- extensions (Liquid theme blocks) |
| +- Check if changes span multiple components |
| |
| PLAN |
| +- Implementation plan with file list per component |
| +- Test plan (which test suites to run) |
| +- UI impact assessment (does this need browser verification?) |
| |
| GATE |
| +- auto: well-documented app in AGENTS.md with clear guide |
| +- /grill-me -> coordinator -> human: novel integration, |
| unclear scope, or architectural decision needed |
| |
| IMPLEMENT (in isolated worktree) |
| | |
| | IF single component: IF multi-component: |
| | |
| | +----------------+ +--------------+ +--------------+ |
| | | Single Agent | | Agent 1 | | Agent 2 | |
| | | All changes | | storefront | | search_proxy | |
| | | in one worktree| | _search | | changes | |
| | | browser1 | | browser1 | | browser2 | |
| | +-------+--------+ +------+-------+ +------+-------+ |
| | | | | |
| | | +-------+--------+ |
| | | | |
| | +------------------------------------+ |
| | | |
| | v |
| | +--------------+---------------+ |
| | | INDEPENDENT REVIEW | |
| | | (separate agent) | |
| | | | |
| | | Reads the diff, checks: | |
| | | - Code quality | |
| | | - CLAUDE.md compliance | |
| | | - No unintended side effects| |
| | | - Test coverage for new | |
| | | behavior | |
| | +--------------+---------------+ |
| | | |
| | pass | changes requested |
| | | | |
| | | v |
| | | Fix agent addresses feedback |
| | | (max 3 rounds, then escalate |
| | | to coordinator) |
| | | | |
| | <------+ |
| | | |
| | v |
| | +--------------+---------------+ |
| | | VERIFY | |
| | | | |
| | | Run test suites: | |
| | | +- storefront_search: | |
| | | | cd components/shopify/ | |
| | | | storefront_search && | |
| | | | npm run build && npm test| |
| | | +- search_proxy: | |
| | | | cd components/ | |
| | | | search_proxy && npm test | |
| | | +- admin_server: | |
| | | | pants test | |
| | | | //components/shopify/ | |
| | | | admin_server:: | |
| | | +- ruff / biome formatting | |
| | | | |
| | | If UI change: | |
| | | +- Screenshot targeted | |
| | | element via browserN | |
| | | +- Compare before/after | |
| | +--------------+---------------+ |
| | | |
| | pass | fail |
| | | | |
| | | +-> fix + re-verify (max 3) |
| | | | |
| | | still failing? |
| | | v |
| | | Escalate to coordinator |
| | | with failure details |
| | | | |
| | <-----------+ |
| | | |
| | v |
| | OUTPUT: |
| | +- Raised PR via /raise (branch, commit, push, PR, reviewers) |
| | +- Run /monitor-pr to poll for auto-reviewer feedback, |
| | | address comments, and iterate until approved |
| | +- PR link returned to coordinator |
| | +- OR: gap report if unresolved issues |
| | |
+-----------------------------------------------------------------------+

Third-party app examples and which components they touch:

+-------------------+----------------------------------------------+
| Integration | Components affected |
+-------------------+----------------------------------------------+
| Okendo reviews | storefront_search (hydration, card template) |
| | admin_server (settings: product_reviews) |
+-------------------+----------------------------------------------+
| Swym wishlist | storefront_search (wishlist button, events) |
| | admin_server (settings: wishlist_button) |
+-------------------+----------------------------------------------+
| Selly/TDF badges | storefront_search (selly anchor, hydration) |
| | admin_server (settings: integrations.selly) |
+-------------------+----------------------------------------------+
| Videowise video | admin_server (grid injection config) |
| tiles | (no code changes — config-only via API) |
+-------------------+----------------------------------------------+
| Quickshop modal | storefront_search (DOM event bridge only) |
| | (merchant's theme handles the modal) |
+-------------------+----------------------------------------------+
| New search field | search_proxy (score modifiers, field config) |
| or facet | admin_server (index settings, field mapping) |
| | storefront_search (filter UI if new facet) |
+-------------------+----------------------------------------------+

Workflow C: Search Tuning

Data flow

Normal path (production):
Admin API ---> DDB (EcomIndexSettingsTable) --> Exporter Lambda --> KV --> Proxy
(source of truth) (DDB Streams) (60s TTL)

Override path (testing — used by this workflow):
Agent sends search request with x-marqo-settings-override header
--> Proxy merges override onto base KV settings per-request
--> Results reflect the override WITHOUT writing to DDB
--> Nothing changes in production

Workflow C agents NEVER write to DDB. They use the x-marqo-settings-override header (PR #3140) to test config changes per-request. The override is ephemeral — it only affects that single request. Production settings are untouched until the coordinator (with human approval) applies the change via the admin API.

Override header reference

Header: x-marqo-settings-override
Value: JSON object

{
"mode": "deep", // "replace" | "shallow" | "deep" (default)
"settings": { // partial or full MarqoSettings object
"search_config": {
"hybridParameters": {
"scoreModifiers": {
"add_to_score": [
{ "field_name": "outOfStock", "weight": -2.0 }
]
}
}
}
}
}

Merge modes:
- "replace" — discard base settings, use override as full settings
- "shallow" — Object.assign override onto base (top-level keys)
- "deep" — recursively merge (objects merged, arrays replaced)

Result is re-parsed through MarqoSettings schema. Malformed input = 400.
Input from coordinator:
- What to tune (relevance, sort, facets, boosting, etc.)
- Current search behavior (example queries + results)
- Desired behavior (what should rank higher/lower)

+-----------------------------------------------------------------------+
| |
| CLARIFY |
| +- Identify tuning type: |
| | |
| | +-------------------+------------------------------------------+ |
| | | Type | DDB field path | |
| | | | (override via x-marqo-settings-override) | |
| | +-------------------+------------------------------------------+ |
| | | Score modifiers | search_config.hybridParameters | |
| | | | .scoreModifiers.add_to_score[] | |
| | +-------------------+------------------------------------------+ |
| | | Query overrides | search_config.queryOverrides[] | |
| | | (boost/bury) | | |
| | +-------------------+------------------------------------------+ |
| | | Sort mapping | storefront_search: constants.ts | |
| | | | (code change — delegate to Workflow B) | |
| | +-------------------+------------------------------------------+ |
| | | Facet config | Storefront settings API | |
| | | (storefront | filter_config.value.items | |
| | | filters) | (NOT overridable — delegate to coord) | |
| | +-------------------+------------------------------------------+ |
| | | Collection config | collections_config.{collection_name} | |
| | +-------------------+------------------------------------------+ |
| | | Relevance mode | relevance_search, relevance_sort | |
| | +-------------------+------------------------------------------+ |
| | | Merchandising | add_docs_config.merchandising_fields[] | |
| | | fields | | |
| | +-------------------+------------------------------------------+ |
| | | Personalization | for_you_config, personalization_enabled | |
| | | (for-you) | | |
| | +-------------------+------------------------------------------+ |
| | |
| +- Gather baseline: run example queries via search proxy |
| and record current result ordering (no override header) |
| |
| PLAN |
| +- Read current config via admin API: |
| | GET /api/v1/indexes/{index_name}/config |
| +- Build proposed override JSON |
| +- List example queries to test |
| +- Define expected impact per query |
| |
| GATE |
| +- auto: testing via override header is always safe |
| | (no production impact — override is per-request only) |
| +- /grill-me: if scope is unclear (e.g., "make search better") |
| |
| IMPLEMENT + VERIFY (iterative — agent tests its own changes) |
| | |
| | IF overridable config: IF code change (new sort |
| | (score modifiers, query overrides, field, new facet UI): |
| | relevance, collections config, |
| | merchandising, personalization): Delegate to Workflow B |
| | (Feature/Integration PR) |
| | +----------------------------+ |
| | | Single Agent | |
| | | | |
| | | 1. Read current config | |
| | | GET /api/v1/indexes/ | |
| | | {index}/config | |
| | | | |
| | | 2. Build override JSON | |
| | | | |
| | | 3. Test via search proxy: | |
| | | POST /api/v1/indexes/ | |
| | | {index}/search | |
| | | Headers: | |
| | | x-marqo-settings- | |
| | | override: {override} | |
| | | Body: {example query} | |
| | | | |
| | | 4. Compare results to | |
| | | baseline (no override) | |
| | | | |
| | | 5. Check: | |
| | | - Target products moved | |
| | | as expected? | |
| | | - Unrelated queries | |
| | | not degraded? | |
| | | - Facet counts correct? | |
| | | - Sort ordering right? | |
| | | | |
| | | 6. If not right, adjust | |
| | | override and re-test | |
| | | (iterate, max 5 rounds | |
| | | — no prod risk) | |
| | +-------------+--------------+ |
| | | |
| | v |
| | |
| | If UI change (new facet visible in storefront): |
| | +- Screenshot filter sidebar via browserN |
| | +- Compare to current |
| | |
| | OUTPUT to coordinator: |
| | +- Validated config override (JSON) |
| | +- Before/after query comparison report |
| | +- The override was tested N times with real queries |
| | +- Ready-to-apply API call: |
| | | PUT /api/v1/indexes/{index}/config |
| | | { ...validated changes } |
| | +- Rollback values (current config snapshot) |
| | |
| | NOTE: the agent does NOT call PUT /config. |
| | Only the coordinator (with human approval) writes to DDB. |
| | |
+-----------------------------------------------------------------------+

The coordinator then:
1. Shows the validated diff + query comparison report to the human
2. The config was already tested via override header — results are real
3. On approval, coordinator applies via PUT /api/v1/indexes/{index}/config
(writes to DDB -> streams to KV -> live in ~60s)
4. Optionally re-runs queries to confirm live behavior matches override test

Example: adding an out-of-stock demotion

Baseline (no override):
POST /search {"q": "red shoes"}
-> Result: [Nike Air (in stock), Adidas Run (OOS), Puma Slide (in stock)]

Test with override:
POST /search
Headers: x-marqo-settings-override: {
"mode": "deep",
"settings": {
"search_config": {
"hybridParameters": {
"scoreModifiers": {
"add_to_score": [
{"field_name": "onSale", "weight": 0.5},
{"field_name": "outOfStock", "weight": -2.0}
]
}
}
}
}
}
Body: {"q": "red shoes"}
-> Result: [Nike Air (in stock), Puma Slide (in stock), Adidas Run (OOS)]
-> OOS item demoted as expected. No override written to DDB.

Spot-check unrelated query:
POST /search (same override) {"q": "gift card"}
-> Result: [unchanged — gift cards have no stock field]
-> No degradation confirmed.

Agent outputs validated override + comparison report to coordinator.
Coordinator shows to human, human approves, coordinator applies to DDB.

/grill-me Skill

Structured clarification when auto-gate can't approve. Invoked by coordinator, runs interactively with human.

Input:

  • What we're trying to do
  • What's ambiguous (specific questions)
  • Options identified (if any)
  • Evidence (screenshots, current vs native behavior)

Output:

  • Structured spec with clear acceptance criteria
  • Returned to coordinator who re-dispatches to the appropriate workflow

Examples:

  • "Native theme has quickshop modal. Replicate via DOM events or skip? [screenshot]"
  • "Store uses Okendo AND Judge.me. Which review provider?"
  • "12 native filters, 8 Marqo fields match. Which 4 to drop?"

Browser Allocation

.mcp.json (4x @playwright/mcp --isolated)
Each instance: separate Chrome process, ephemeral profile

+----------------------------------------------------------------+
| |
| Coordinator |
| +- mcp__plugin_playwright_playwright__* |
| (interactive, shows things to human) |
| |
| Subagent work: |
| +- Agent 1 --> mcp__browser1__* (exclusive) |
| +- Agent 2 --> mcp__browser2__* (exclusive) |
| +- Agent 3 --> mcp__browser3__* (exclusive) |
| +- Agent 4 --> mcp__browser4__* (exclusive) |
| | |
| Independent reviewer / fix agents: |
| +- Uses whichever browserN is free |
| (implementation agents are done by this point) |
| |
| Rule: prompt each agent with |
| "Use ONLY mcp__browserN__* tools" |
| |
+----------------------------------------------------------------+

Verification Standards (per-element, not viewport)

Full-page screenshots are useless for verifying small CSS changes. Each subtask defines WHAT ELEMENT to screenshot and compare.

SubtaskElement selectorViewports
Product cards.marqo-product-card1440, 375
Filter sidebar.marqo-filter-sidebar1440
Filter drawer.marqo-filter-drawer375
Pagination.marqo-pagination1440, 375
Sort dropdown.marqo-sort-select1440
Results count.marqo-results-count1440
Page heading.marqo-collection-header1440, 375
No results.marqo-no-results1440
CTA hover.marqo-product-card:hover1440

Each agent screenshots:

  1. The element on the NATIVE theme (reference)
  2. The element on the MARQO theme (current)
  3. Compares visually and reports pass/fail + description

Docs Structure: Current vs Needed

Existing docs (keep, reference from AGENTS.md)

DocPurposeAction
docs/integrations/README.mdQuickstart workflow (8 steps)Update to reference AGENTS.md
docs/integrations/storefront-css-customization-guide.mdCSS how-to, agent promptsKeep as reference
docs/integrations/shopify-collection-blocks-guide.mdBlock injectionKeep as reference
docs/integrations/msqc.mdPer-merchant referenceKeep
docs/integrations/muji.mdPer-merchant referenceKeep
docs/dev/shopify-styling-architecture.mdThree-layer CSS systemKeep as reference
docs/dev/storefront-extensibility.mdDOM events, grid injectionsKeep as reference
docs/dev/customer-theme-matching-guide.mdTheme analysis promptsConsolidate into AGENTS.md
docs/dev/agent-workflow-guide.mdParallel planning patternsConsolidate into AGENTS.md
docs/runbooks/diagnostics/components/shopify.mdGraphQL, metafields, diagnosticsKeep
docs/runbooks/diagnostics/components/search-proxy.mdSearch proxy diagnosticsKeep

New artifacts to create

ArtifactTypePurpose
docs/integrations/AGENTS.mdDocSingle source of truth for all agent integration work
.claude/agents/css-implementer.mdSubagent defTeammate: write CSS, self-verify, log escalations. Tool-restricted.
.claude/agents/independent-reviewer.mdSubagent defTeammate: cross-check all sections, separate context
.claude/agents/search-tuner.mdSubagent defTeammate: build + test config overrides, no DDB writes
.claude/skills/grill-me/SKILL.mdSkillStructured human clarification
.claude/skills/integrate-storefront/SKILL.mdSkillTeam lead behavior (scope determines fan-out)
Workflow AWorkflow scriptSingle-section CSS implement+verify loop
Workflow CWorkflow scriptSearch tuning iteration with override header
TaskCompleted hookHookValidate phase outputs before task completion
TaskCreated hookHookValidate escalation file exists for integration tasks
TeammateIdle hookHookRedirect idle teammates to unclaimed tasks
.mcp.jsonConfig4 pinned isolated Playwright instances

TODO List

Phase 0: Infrastructure

  • scripts/ecom/shopify_graphql.py — reusable Shopify GraphQL helper
  • Shopify diagnostics docs updated with script reference
  • Parallel Playwright MCP validated (4 isolated instances, 2026-06-09, @playwright/mcp@0.0.75)

Phase 1: Documentation

  • Create docs/integrations/AGENTS.md — the single source of truth
    • Universal work unit definition (clarify -> plan -> gate -> implement -> review -> verify -> output)
    • Team lead (coordinator) responsibilities and boundaries
    • Workflow A spec: single-section CSS loop (script pseudocode + agent responsibilities)
    • Workflow B spec: feature/integration PR in worktree
    • Workflow C spec: search tuning with override header
    • Escalation model (file-based, no workarounds, gated dispatch)
    • Verification standards (per-element screenshot table, scoring rubric)
    • Playwright multi-browser rules (browserN assignment, --isolated requirement)
    • Third-party app integration guides:
      • Okendo reviews
      • Swym wishlist
      • Selly/TDF badges
      • Videowise video tiles
      • Quickshop modals (DOM event bridge)
    • Search tuning guide (score modifiers, query overrides, sort mapping, relevance, collections config)
    • Marqo request anatomy (prefetch, normal search, composite search, recommendations)
    • Per-merchant doc template
    • Reference links to existing docs
  • Update docs/integrations/README.md — point to AGENTS.md as the primary reference
  • Consolidate docs/dev/customer-theme-matching-guide.md — fold into AGENTS.md
  • Consolidate docs/dev/agent-workflow-guide.md — fold into AGENTS.md

Phase 2: Subagent Definitions + Skills + Config

  • Land .mcp.json with 4 pinned @playwright/mcp@0.0.75 --isolated instances
    • Ship alongside the first skill that consumes browser1-4
  • Create subagent definitions (.claude/agents/)
    • css-implementer.md — tool restrictions (no admin API, no prod push), browserN assignment, escalation file requirement
    • independent-reviewer.md — cross-check role, separate context, browserN assignment
    • search-tuner.md — override header usage, no DDB writes, comparison report format
  • Create hooks (.claude/settings.json)
    • TaskCompleted — validate escalation file, CSS output, screenshots before allowing completion
    • TaskCreated — validate escalation file exists for [INTEGRATION]/[FEATURE] tasks
    • TeammateIdle — redirect to unclaimed tasks
  • Create /grill-me skill (.claude/skills/grill-me/SKILL.md)
    • Structured question template
    • Evidence attachment (screenshots, current behavior)
    • Decision recording format
    • Output: structured spec with acceptance criteria
  • Create /integrate-storefront skill (.claude/skills/integrate-storefront/SKILL.md)
    • Scope detection: full setup (4 teammates) vs tweak (1) vs tuning vs integration
    • Phase 1: Setup (backup, merchant doc, theme tokens, filter mapping)
    • Phase 2: Spawn team, assign tasks, require plan approval for IMPLEMENT
    • Phase 3: Handle escalations (gate: clear scope -> Workflow B teammate, unclear -> /grill-me)
    • Phase 4: Collate outputs, present escalation log, gate prod pushes

Phase 3: Workflows

  • Workflow A: CSS single-section loop
    • Script: inspect -> write -> deploy -> screenshot -> compare -> iterate (max 3)
    • Escalation file output (even if empty)
    • Verification artifacts (native + Marqo screenshots per element)
  • Workflow B: Feature/PR (teammate runs in worktree)
    • Script: implement -> verify (code-verifier) -> iterate on feedback (max 3)
    • Same implementor/verifier handoff loop as Workflows A and C
    • Plan approval from lead before first implementation round
    • Test execution per component (npm test, pants test, ruff, biome)
    • /raise -> /monitor-pr -> PR link to lead
  • Workflow C: Search tuning iteration
    • Script: baseline -> build override -> test with header -> compare -> iterate (max 5)
    • Spot-check pattern (unrelated queries for regression)
    • Output: validated override JSON + comparison report + rollback values
    • Depends on: PR #3140 (settings override header)
  • Deploy-to-preview pattern
    • CSS: inject-css -> push to settings (preview theme only, lead gates prod)
    • Search config: agents test via override header only. Lead applies to DDB on human approval.

Phase 4: Testing & Validation

  • End-to-end test: initial setup against a dev store
    • Team of 4 teammates, each running Workflow A with own browser
    • Verify inter-teammate messaging works
    • Verify escalation flow (teammate logs, lead triages)
    • Verify hooks enforce task completion requirements
  • End-to-end test: single tweak against a dev store
    • Same skill, scope = 1 section, team of 1 teammate
  • End-to-end test: search tuning with override header
    • Workflow C iteration loop, verify no DDB writes
  • Test /grill-me skill with sample ambiguous specs
  • Test browser allocation under load (4 Chrome instances + plugin)

Phase 5: Iteration

  • Gather feedback from first real integration using the automation
  • Refine subagent definitions and workflow scripts based on what works
  • Add more per-app integration guides as new third-party apps are encountered
  • Consider Dynamic Workflows for Workflow A when the feature stabilizes