
Iteration Patterns

How an implementer + reviewer iterate on a code change. Five patterns are in use across this repo; the right one depends on scope, lifetime, and what you're protecting against. This page is the canonical taxonomy — agent definitions, skills, and the code-review guide all link here rather than restate it.

Quick choice

| Situation | Use |
| --- | --- |
| One PR, moderate spec, persistent teammate doing the work | B (inline-iterate, default) |
| Multi-section parallel fan-out (e.g. 4 CSS sections) | A (workflow) |
| Repeated tuning rounds with hard ratchet (e.g. search-tuning) | A (workflow) |
| Same reviewer concerns recur across multiple PRs in the same session | E (B→D mode-switch) |
| Two long-lived peers messaging each other (implementer + dedicated reviewer teammate from the start) | C (pair-driven) |
| Long-lived reviewer reviewing many sequential PRs from the same implementer | D (persistent reviewer teammate) |

When in doubt, start with B.


A — Workflow (scripted)

A deterministic Node script (feature-pr.js, general-feature-pr.js, css-section.js, search-tuning.js) orchestrates the loop:

script.initialImplementPrompt → spawn implementer subagent (one turn)
script.verifyPrompt → spawn verifier subagent (one turn), structured {pass, feedback}
if pass: exit
else: script.fixImplementPrompt(feedback) → spawn implementer subagent again
repeat up to maxRounds (3 for feature-pr / css-section, 5 for search-tuning)
  • Lifetime: subagents are one-shot per spawn; the script holds state between rounds. The persistent session that invoked the workflow is what survives.
  • Strengths: deterministic, max-rounds bound, structured output (good for downstream consumers), good for fan-out (one workflow per section running in parallel).
  • Weaknesses: rigid; the verifier's checklist is whatever the script's verifyPrompt says. Round-2+ prompts historically dropped the spec (fixed in PR3 — scope contract is now stamped into every prompt-builder). Lib helpers + tests at .claude/workflows/lib/.
  • Who invokes it: per harness reality (2026-06-10/11), teammates do NOT have the Workflow tool, so only the lead or a plain non-team session can invoke a workflow, and a lead-invoked run is detached from any persistent teammate. Use it only for genuinely detached one-off work; for owned work, use Pattern B-hub below. See integrate-storefront SKILL.md "What the lead CANNOT do".
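
The loop above can be sketched in Node roughly as follows. This is an illustration only, not the contents of .claude/workflows/lib/implement-verify-loop.js: runLoop, spawnImplementer, and spawnVerifier are hypothetical names, the prompt builders are assumed to be plain functions, and the spawn functions are injected so the sketch runs without the harness.

```javascript
// Minimal sketch of a bounded implement/verify loop (hypothetical names).
// `script` supplies the three prompt builders named above; the two spawn
// functions stand in for one-shot subagent spawns.
async function runLoop({ script, spawnImplementer, spawnVerifier, maxRounds = 3 }) {
  let prompt = script.initialImplementPrompt();
  for (let round = 1; round <= maxRounds; round++) {
    await spawnImplementer(prompt); // one turn, no memory of earlier rounds
    const verdict = await spawnVerifier(script.verifyPrompt()); // {pass, feedback}
    if (verdict.pass) return { pass: true, rounds: round };
    // The script, not the subagent, carries state into the next round.
    prompt = script.fixImplementPrompt(verdict.feedback);
  }
  // Bound reached without a passing verdict.
  return { pass: false, rounds: maxRounds };
}
```

Because each spawn is one-shot, everything the next round needs (including the scope contract) has to be rebuilt into the prompt by the script on every iteration.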

B — Inline-iterate (default)

The persistent teammate writes code itself, then spawns an independent reviewer subagent with Agent({subagent_type: "code-verifier", ...}): no team_name, fresh context, one turn. (The agent definition is code-verifier; "reviewer" is the prose name used here.)

teammate edits files
teammate spawns reviewer subagent ({diff, spec, project gates})
reviewer returns {pass, feedback}
if pass: teammate /raise + /monitor-pr
else: teammate fixes, spawns a NEW reviewer subagent (fresh context, not the same one)
repeat up to 3 rounds; if still failing, SendMessage lead
  • Lifetime: the teammate is persistent; each reviewer subagent is one-shot. Each round gets a fresh subagent — that's the whole point. The fresh context is what makes the review independent.

  • Strengths: cheap (~30-50k tokens per spawn), catches real bugs workflow self-verify misses (session empirics: B caught 2 P1s + 2 P2s the workflow's Node-harness self-verify missed). Implementer has full build context; reviewer has fresh-eyes context. No script rigidity.

  • Weaknesses: no max-rounds enforcement except by convention. Reviewer has no memory across rounds — for some review concerns that's a feature (independence), for others it's a bug (see Pattern D / E).

  • Cross-PR knowledge does NOT transfer. Fresh subagents have no memory of prior PRs' findings. Empirically (PR #3499 dogfooding):

    • PR #3500 fixed a path-traversal bug class for one slug derivation; two rounds of independent reviewers on PR #3499 then BOTH missed the same class of bug in adjacent slug derivations — auto-reviewer (claude[bot]) caught it on round 3.
    • PR #3486 fixed an await import() runtime risk by inlining helpers into each workflow script; PR #3499 then reintroduced an await import("node:fs/promises") at the top level of every script — three rounds of independent reviewers missed it; claude[bot] caught it on round 4 with the exact same reasoning PR #3486 had used.

    Implication: if a class of bug or risk recurs and you want reviewer-side memory, switch to Pattern D or E. Otherwise budget for late-stage auto-reviewer catches — and accept that the auto-reviewer's persistent cross-PR memory (knowledge of recent fixes on the same repo) is the practical substitute for any "shared review brain" you'd want here.

  • Mitigation — the review ledger. The teammate appends every reviewer verdict to ${SHARED_WORKSPACE}/reviews/<task-slug>.md and includes a "CONFIRMED FINDINGS SO FAR" section in each fresh reviewer's brief (the canonical brief template lives in code-implementer.md step 3). Confirmed findings transfer between rounds; independence of judgment is preserved. Workflow scripts (Pattern A) append to the same ledger automatically.

  • Who invokes it: the teammate (default for code-implementer).

  • Harness reality (2026-06-10/11): teammates have no Agent tool, so a teammate cannot literally spawn the reviewer. The working variant is B-hub (hub-and-spoke): the teammate sends the reviewer brief to the LEAD via SendMessage; the lead spawns the fresh reviewer subagent and relays the verdict. Same independence properties (fresh context, new reviewer per round, ledger) — the lead is a message bus, not a judge. The direct form applies when the implementer is itself a non-team session or transient subagent.
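
The ledger round-trip described above might look like the sketch below. The canonical brief template lives in code-implementer.md and is not reproduced here; appendVerdict, buildReviewerBrief, the entry format, and the assumption that feedback is an array of strings are all hypothetical. The helpers are pure string functions, so writing the result to the ledger file is left to the caller.

```javascript
// Sketch of the ledger round-trip (hypothetical helpers and entry format).
// Append one reviewer verdict to the ledger text and return the new text.
function appendVerdict(ledgerText, round, verdict) {
  const status = verdict.pass ? "PASS" : "FAIL";
  // Assumption: feedback is an array of finding strings.
  const findings = (verdict.feedback || []).map((f) => `- ${f}`).join("\n");
  const entry = `## Round ${round}: ${status}\n${findings}`.trimEnd();
  return ledgerText ? `${ledgerText}\n\n${entry}` : entry;
}

// Build the brief for a fresh reviewer: it sees confirmed findings from
// earlier rounds but none of the earlier reviewers' conversation.
function buildReviewerBrief({ diff, spec, gates, ledgerText }) {
  return [
    "SPEC", spec,
    "PROJECT GATES", gates,
    "CONFIRMED FINDINGS SO FAR", ledgerText || "(none yet)",
    "DIFF", diff,
  ].join("\n\n");
}
```

Only the findings cross the round boundary; each reviewer still forms its own judgment of the diff from a fresh context.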


C — Pair-driven (two long-lived peers)

Two persistent teammates spawned at session start: one implementer, one reviewer (both addressable via SendMessage). They iterate by messaging each other directly; the lead optionally observes.

  • Lifetime: both teammates live for the team's lifetime, go idle between turns.
  • Strengths: reviewer accumulates context across review rounds — knows the spec, prior trade-offs, and which findings were already discussed. Good for non-trivial features with many rounds where re-explaining context to a fresh subagent each round is the bottleneck.
  • Weaknesses: reviewer is no longer "independent" by round 3 — they share an internal model with the implementer of "what's OK here." That failure mode (verifier-as-collaborator vs. verifier-as-judge) is the reason Pattern B is the default. Use C only when you specifically want the accumulating-context property.
  • Who invokes it: the lead, at session start.

D — Persistent reviewer teammate

A long-lived reviewer teammate that reviews many sequential PRs from the same implementer (or implementers). Unlike C, the reviewer is not paired with one implementer; they're a standing capability.

  • Lifetime: reviewer teammate alive for the team's lifetime.
  • Strengths: institutional memory across PRs — the reviewer remembers prior decisions, recurring concerns, the merchant's gotchas. Good for long-running integration sessions where one teammate reviews CSS, code, and search tuning PRs across days.
  • Weaknesses: same independence drift as C, just slower. Reviewer can drift into approving anything the implementer raises if they share too much context. Mitigate by periodic fresh-subagent spot-checks (basically Pattern B as an audit on Pattern D).
  • Who invokes it: the lead, when scope warrants.

E — B → D mode-switch

The interesting one. Start with B (inline-iterate, ad-hoc reviewer subagent). If during a session the same review concerns recur across multiple PRs from the same teammate — or the work clearly spans multiple PRs that would benefit from shared review context — promote the ad-hoc reviewer subagent to a persistent reviewer teammate.

Mechanic: spawn an Agent with team_name + name + subagent_type: "code-verifier" (i.e., promote from subagent to teammate). The new teammate is "instantiated from" the prior subagent's last review output — pass it the prior review's verdict + reasoning in the spawn prompt so the persistent reviewer's first turn starts where the last subagent's turn left off.
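
As a rough sketch of that promotion, the helper below assembles the spawn arguments. The team_name, name, and subagent_type fields come from the text above; the prompt field name, the promotionSpawnArgs helper, and the shape of lastReview are assumptions made for illustration, not the harness's documented API.

```javascript
// Sketch: build the arguments for promoting a one-shot reviewer to a teammate.
// Assumptions: the spawn call accepts a `prompt` field, and `lastReview`
// has the {pass, feedback} shape with feedback as a string.
function promotionSpawnArgs({ teamName, reviewerName, lastReview }) {
  const prompt = [
    "You are now the persistent reviewer for this team.",
    `Previous verdict: ${lastReview.pass ? "pass" : "fail"}`,
    "Previous reasoning:",
    lastReview.feedback,
  ].join("\n");
  return {
    team_name: teamName, // presence of team_name makes this a teammate
    name: reviewerName,
    subagent_type: "code-verifier",
    prompt, // seeds the first turn with the last subagent's review
  };
}
```

The only structural difference from a Pattern B spawn is the presence of team_name and name; the seeded prompt is what carries the prior review across.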

Empirical evidence

From session-2025-12 / tmp/integrations/orchestration-impl-b/reports/orch-improvement-collation.md §5:

  • Pattern B dominated (4 instances end-to-end vs. A=0, D=1, lead-as-broker=1).
  • One session-long task (Okendo + Swym + CSS finishing) spawned 7 independent code-verifier subagents — 5 of them flagged the same P2 concern (inconsistent error wrapping pattern). The fix was the same each time; re-explaining context cost ~30k tokens per spawn.
  • Switching to a persistent reviewer teammate (D) for the rest of that session caught a regression on PR #4 that a fresh subagent would likely have missed (the regression undid a fix the reviewer had explicitly asked for on PR #2). Net: 5x fewer review tokens, one caught regression that B alone would have shipped.

When to switch

Concrete signals (any one is enough):

  • The same finding shows up in 3+ consecutive reviewer subagent outputs across different PRs in the session — fresh context isn't buying you independence, it's costing you tokens.
  • The implementer is touching a long-lived subsystem where review depends on understanding evolving constraints (e.g., merchant integration where PRs land over days and reviewer needs to remember "this shop has X").
  • You're entering a final-mile sprint with 5+ PRs left and the same reviewer profile each time.
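
The first signal is the only one that can be checked mechanically. A minimal sketch follows, assuming each reviewer output has been reduced to a PR identifier plus a list of normalized finding keys; that review shape, the recurringFindings name, and exact-string matching of findings are all simplifying assumptions.

```javascript
// Sketch: find findings that appear in `threshold` or more consecutive
// reviewer outputs spanning more than one PR. `reviews` is ordered oldest
// to newest; each item is { pr, findings: [string] } (assumed shape).
function recurringFindings(reviews, threshold = 3) {
  const streaks = new Map(); // finding -> { count, prs }
  const recurring = new Set();
  for (const review of reviews) {
    const current = new Set(review.findings);
    // A finding absent from this review breaks its streak.
    for (const key of [...streaks.keys()]) {
      if (!current.has(key)) streaks.delete(key);
    }
    for (const key of current) {
      const streak = streaks.get(key) || { count: 0, prs: new Set() };
      streak.count += 1;
      streak.prs.add(review.pr);
      streaks.set(key, streak);
      if (streak.count >= threshold && streak.prs.size > 1) recurring.add(key);
    }
  }
  return [...recurring];
}
```

A non-empty result suggests the fresh context is no longer buying independence for that concern; the other two signals remain judgment calls for the lead.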

When to switch back

If the persistent reviewer starts rubber-stamping (no P0/P1 findings on two consecutive PRs that prior fresh subagents would have flagged), spot-check with a one-off B-style fresh subagent review on the next PR. If it finds something the persistent reviewer missed, that's the signal to shorten the persistent reviewer's lifetime or rotate to a new one.
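
The "would have flagged" part of that condition is a counterfactual and cannot be computed directly. One observable proxy is sketched below: trigger a spot-check whenever the persistent reviewer reports no P0/P1 findings on the last two PRs. The shouldSpotCheck name, the review shape, and the severity field are assumptions for illustration.

```javascript
// Proxy for the rubber-stamp signal: the persistent reviewer reported no
// P0/P1 findings on each of the last `window` PRs.
// Assumed shape: reviews = [{ pr, findings: [{ severity, note }] }], oldest first.
function shouldSpotCheck(reviews, window = 2) {
  if (reviews.length < window) return false; // not enough history yet
  return reviews
    .slice(-window)
    .every((r) => !r.findings.some((f) => f.severity === "P0" || f.severity === "P1"));
}
```

The proxy over-triggers on genuinely clean PRs, which is acceptable here: the cost of a false positive is one extra fresh-subagent review.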


Cross-reference

  • For Shopify integration work: /docs/integrations/AGENTS.md §Canonical Loop
  • For general dev work: /docs/dev/orchestration/code-review-guide.md §Entry point
  • Workflow scripts: .claude/workflows/{feature-pr,general-feature-pr,css-section,search-tuning}.js
  • Shared loop primitive: .claude/workflows/lib/implement-verify-loop.js
  • Agent definitions: .claude/agents/{code-implementer,code-verifier}.md
  • Lead's CANNOT rules: .claude/skills/integrate-storefront/SKILL.md §What the lead CANNOT do