Orchestration — Deferred Follow-ups
Items the Option B series (#3482 + #3486) explicitly did NOT implement, with the reasoning and suggested resolution for each. Reference this doc from PR descriptions and endstate-summary documents.
1. Workflow-runtime await import() support
Status: deferred; drift risk now mitigated (orch-uplift-v2). Source-of-truth
lib lives at .claude/workflows/lib/implement-verify-loop.js; each workflow script
inlines a copy of the primitive at the top of its body. Unit tests on the lib
validate the semantics in one place, and
.claude/workflows/lib/inline-parity.test.js now enforces that every inlined
copy is byte-identical to the lib (minus the export keyword), so manual mirroring can no
longer drift silently. The smoke-test workflow (see item 6) reports whether
dynamic await import("node:fs/promises") resolves in the real runtime, which
is the same question for sibling imports.
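The parity check described above can be pictured with a minimal sketch. It assumes the comparison is a plain substring test after removing the export keyword; the helper names stripExportKeyword and hasIdenticalInlinedCopy and the sample sources are hypothetical, not taken from inline-parity.test.js.

```javascript
// Illustrative sketch only: hypothetical helpers, not the real inline-parity.test.js.

// Drop a leading "export " from each declaration; everything else stays byte-for-byte.
function stripExportKeyword(libSource) {
  return libSource.replace(/^export /gm, "");
}

// True when the workflow script contains the lib body verbatim (minus "export ").
function hasIdenticalInlinedCopy(workflowSource, libSource) {
  const expected = stripExportKeyword(libSource).trim();
  return workflowSource.includes(expected);
}

// Made-up sources standing in for the lib and two workflow scripts.
const libSource = "export function greet(name) {\n  return 'hi ' + name;\n}\n";
const inSyncWorkflow = "function greet(name) {\n  return 'hi ' + name;\n}\n\ngreet('a');\n";
const driftedWorkflow = "function greet(name) {\n  return 'hello ' + name;\n}\n\ngreet('a');\n";

console.log(hasIdenticalInlinedCopy(inSyncWorkflow, libSource));  // true
console.log(hasIdenticalInlinedCopy(driftedWorkflow, libSource)); // false
```

A byte-level comparison like this flags even whitespace-only edits, which is the point: any divergence between a copy and the lib surfaces as a test failure rather than as silent drift.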
Why deferred: An early Option B revision had each workflow start with
const {...} = await import("./lib/implement-verify-loop.js"). claude[bot] and
the option-b-review teammate both flagged that there is no precedent for
sibling-relative import() from a workflow script anywhere in .claude/workflows/
on main, and the Node test harness used for verification doesn't exercise the
real Workflow tool runtime — it uses Node's native ESM loader, which always
resolves relative imports. The Node-test harness was a false-confidence signal
on the question "does the real runtime support await import()".
To eliminate the runtime question entirely, the workflow scripts were re-inlined. Cost: ~80 lines × 4 workflows = ~320 lines of duplication.
Resolution — CONFIRMED NEGATIVE (real-runtime smoke-test, 2026-06-10):
Workflow({name: "smoke-test"}) ran in the real runtime and reported
fs-import: "unavailable: import() is not available in workflow scripts." —
dynamic import() of ANY module (including siblings) does not resolve there.
Re-extraction is off the table; the inlined copies + inline-parity.test.js
are the permanent design, not a stopgap. Consequence: writeSpecFile /
appendLedgerEntry script-side calls are no-ops in the real runtime (their
.catch(() => null) guards absorb the rejection); the spec flows in-prompt
via the scope contract, and the review ledger is written by the VERIFIER
subagent itself (instructed in every verifyPrompt), since spawned agents have
real tool access even though the script does not.
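The no-op behaviour of the guarded calls can be illustrated with a small sketch. The function bestEffortWrite and the injected importModule parameter are hypothetical stand-ins (the real writeSpecFile and appendLedgerEntry bodies are not shown in this doc); the rejecting importer simulates the error the runtime reported.

```javascript
// Illustrative sketch only: bestEffortWrite is a hypothetical stand-in for a
// script-side write such as writeSpecFile. importModule is injected so the
// "import() unavailable" case can be simulated outside the real runtime.
function bestEffortWrite(importModule, path, text) {
  return importModule("node:fs/promises")
    .then((fs) => fs.writeFile(path, text, "utf8"))
    .then(() => path)
    // The guard: any rejection (including a missing import()) becomes null.
    .catch(() => null);
}

// Simulates the real Workflow runtime, where dynamic import() rejects.
const unavailableImport = () =>
  Promise.reject(new Error("import() is not available in workflow scripts."));

const pending = bestEffortWrite(unavailableImport, "/abs/workspace/spec.md", "spec text");
pending.then((result) => console.log(result)); // null: the write silently no-ops
```

Because the guard swallows the rejection, the calling workflow cannot tell a skipped write from a successful one unless it inspects the resolved value, which is why the spec and ledger are carried by the prompts and the verifier instead.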
2. Teammate-spawn validation hook (absolute-workspace + isolation)
Status: RESOLVED (orch-uplift-v2) — the probe captured a real payload
and the enforcement gate is now live at .claude/hooks/subagent-start-gate.sh
(replaces the probe; wired to SubagentStart in .claude/settings.json).
Discovered schema (from a real capture, 2026-06-10):
{"session_id": "...", "transcript_path": "...", "cwd": "...",
"agent_id": "...", "agent_type": "...", "hook_event_name": "SubagentStart"}
The payload does NOT expose spawn-prompt text or the isolation flag, so the
originally-planned "validate ${SHARED_WORKSPACE} is absolute in the prompt"
check is not implementable from this event. What IS enforceable:
- Warn (non-blocking): implementer-class agents (code-implementer, css-implementer, finisher) whose cwd is not under .claude/worktrees/ get a loud warning on stdout and a logged violation at tmp/.subagent-gate/<session>-violations.log, but the spawn is allowed (exit 0). See the e2e finding below for why this is warn-not-block.
- Allow (silent): implementer-class agents whose cwd IS under .claude/worktrees/ pass and have their worktree occupancy registered.
- Soft signal: worktree reuse by a second agent_id is logged to tmp/.subagent-gate/<session>-worktrees.log (never blocks, because the payload can't distinguish legitimate sequential workflow rounds from teammate contamination).
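The three outcomes above can be summarised as a decision function. This sketch is written in JavaScript for illustration only (the real gate is the shell script subagent-start-gate.sh); the names classifySpawn and recordOccupancy, the substring test for the worktree path, and the ignore outcome for non-implementer agents are assumptions.

```javascript
// Illustrative sketch only; the real gate is a shell hook.
const IMPLEMENTER_TYPES = new Set(["code-implementer", "css-implementer", "finisher"]);

// Assumption: "under .claude/worktrees/" is approximated by a substring test.
function isUnderWorktrees(cwd) {
  return cwd.includes("/.claude/worktrees/");
}

// Every outcome exits 0: the gate warns and logs but never blocks a spawn.
function classifySpawn(payload) {
  if (!IMPLEMENTER_TYPES.has(payload.agent_type)) {
    return { action: "ignore", exitCode: 0 }; // assumption: other agent types pass untouched
  }
  return isUnderWorktrees(payload.cwd)
    ? { action: "allow", exitCode: 0 }
    : { action: "warn", exitCode: 0 };
}

// Soft signal: report reuse when a second agent_id registers the same worktree cwd.
function recordOccupancy(registry, payload) {
  const previous = registry.get(payload.cwd);
  registry.set(payload.cwd, payload.agent_id);
  return previous !== undefined && previous !== payload.agent_id;
}

const registry = new Map();
const isolated = { agent_id: "a1", agent_type: "code-implementer", cwd: "/repo/.claude/worktrees/agent-1" };
const shared = { agent_id: "a2", agent_type: "finisher", cwd: "/repo" };

console.log(classifySpawn(isolated).action); // allow
console.log(classifySpawn(shared).action);   // warn
console.log(recordOccupancy(registry, isolated));                        // false
console.log(recordOccupancy(registry, { ...isolated, agent_id: "a3" })); // true
```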
E2E finding (orch-uplift-v2, 2026-06-10) — why warn, not block. The gate
originally blocked (exit 2) on case 1. End-to-end testing in the real harness
found that SubagentStart fires with the repo-root cwd, not the worktree
cwd, even for spawns the orchestration intends to isolate. Every real
SubagentStart capture in the test session — an Explore agent, the
e2e-tester agent itself (meant to run isolated) — reported
cwd: <repo-root>, never a .claude/worktrees/ path. The payload at this
event therefore cannot distinguish a genuinely non-isolated implementer (the
failure to catch) from an isolation: "worktree" spawn whose worktree cwd is
not yet reflected. A hard exit-2 block under that timing would deadlock EVERY
legitimate isolated implementer spawn and take down the whole workflow
substrate. The gate was downgraded to a loud, logged, non-blocking warning:
the signal survives for the lead and post-mortems, but a false positive can
never wedge a real run.
Open verification CLOSED — with a more serious finding (2026-06-10, lead
session): a real code-implementer teammate spawned with isolation: "worktree" was captured by the gate with cwd = <repo-root>, AND the agent's
own pwd (run via Bash inside its session) returned the repo root, AND no new
worktree appeared under .claude/worktrees/. The SubagentStart cwd was not
"early" — it was ACCURATE: the harness is not honoring isolation: "worktree" for team-spawned agents at all in the current build. Two more
data points from the same session agree: the e2e-tester teammate (also spawned
with isolation) committed directly onto the lead's checked-out branch, and
PR #3499's dogfooding logged the same failure twice. Consequences:
- The gate's violations log is currently a TRUE-positive detector — every warned spawn really is sharing the lead's tree.
- Warn-only remains the only viable posture: a hard block would reject every implementer spawn, because none currently get isolation.
- The real fix is harness-side: honor isolation: "worktree" for teammate spawns (or expose an isolation field in the SubagentStart payload so the gate can block only genuine violations). Until then, leads should assume teammates SHARE the lead cwd: keep the lead's checkout on the integration branch teammates are expected to commit to, and avoid concurrent implementer teammates touching git state.
Narrowing (same session): the gap is SPECIFIC to team spawns. Transient
Agent spawns (no team_name) with isolation: "worktree" DO get real
worktrees — verified twice via the agents' own pwd (an Explore probe and a
code-implementer probe both ran under .claude/worktrees/agent-*). So
Pattern B's transient reviewer subagents and verifier-spawns-fix implementers
are isolated correctly; it is the persistent TEAMMATE itself that shares the
lead's cwd. Gate-behavior consequence: hooks run in the spawned agent's cwd,
so isolated transient spawns evaluate the gate inside their own worktree (cwd
check passes; their registry entries live in worktree-local tmp/ and vanish
with auto-cleanup), while non-isolated teammates evaluate it at the repo root
and land in the lead-side violations log. The lead-side
<session>-violations.log therefore contains only true positives — every
entry is an implementer-class agent genuinely running in the lead's tree.
Tests: cases 9-14 in .claude/hooks/test_workspace_resolution.sh (case 11 now
asserts warn-not-block + violation logging).
Residual gap: prompt-content validation (absolute ${SHARED_WORKSPACE})
still needs either a harness-side field or typed workspace params on
Agent({...}). Tracked below as the original resolution option 3.
Why an enforcement gate isn't shipping yet:
The followups doc originally named the event TeammateSpawned. That literal
event name does not exist in the current Claude Code harness — the wording
was a conceptual placeholder. What DOES exist:
- SubagentStart: fires when any subagent (including teammates, which the agent-teams substrate spawns as subagents) starts. Recently added per ~/.claude/cache/changelog.md. Must be configured as a type: "command" hook (prompt-/agent-type hooks for this event error out by design).
- WorktreeCreate / WorktreeRemove: fire when agent worktree isolation creates or removes a worktree. Closer to the isolation: "worktree" side of the validation contract; hookSpecificOutput.worktreePath available for HTTP hooks.
Both are candidate landing surfaces. The unknown blocking confident
enforcement is the hook-input schema: the changelog confirms agent_id
and agent_type are in hook payloads generally, but does not document
whether SubagentStart exposes the spawn prompt text (where
${SHARED_WORKSPACE} is interpolated) or the isolation parameter. Without
those fields in the payload, a hook can't enforce "the workspace path in
the spawn prompt is absolute" — the data isn't reachable. Landing
enforcement on guesswork would ship a placebo gate.
What PR5 does ship (probe-only):
.claude/hooks/subagent-start-probe.sh, wired to SubagentStart in
.claude/settings.json, captures the first spawn's full input payload per
session and exits 0 unconditionally. NEVER blocks a spawn. The capture lets
the follow-up PR build enforcement on real-world payload data instead of
the changelog's partial schema.
Empirical motivation — the gap fired during PR5 work itself, in real
time. While orchestrating the PR3 + PR5 + PR6 series on 2026-06-10, two
implementer subagents (PR3 and PR6) had their Edit calls land in the lead's
parent worktree at /Users/.../cloud_control_plane/ instead of their assigned
isolated worktrees. The lead's PR2 branch (fix/last-page-clamp) was dirtied
twice. Recovery required spinning up fresh worktrees, applying patches by
hand, and re-running. This is the same failure mode the holistic review §3
P0-A predicted and PR1 first observed — and it happened again, twice,
within minutes, while the followups doc literally describes the gap on
disk. Voluntary prose contracts demonstrably do not hold under team
parallelism. The enforcement gate must land.
Resolution (next PR, owner: TBD):
- After PR5 ships and a few real spawns capture probe data, read the dumps at tmp/.subagent-start-probe/*.json to confirm whether SubagentStart exposes spawn-prompt content + isolation flag.
- If yes: write .claude/hooks/subagent-start.sh that fails loud (exit 2, blocking) when ${SHARED_WORKSPACE} interpolates to a relative path OR when isolation is not "worktree" on code-touching roles.
- If no: escalate to harness-team for an explicit hookSpecificOutput-style field, OR add typed workspace and isolation parameters to Agent({...}) spawns so they appear in the probe payload.
Owner: whoever picks up the followup once one or two probe payloads have been captured in real sessions.
3. Workspace path generalization in hooks (tmp/work/ for general-dev)
Status: landed in PR5. Hooks now resolve workspaces via the shared
.claude/hooks/_lib.sh helper, which picks tmp/work/<slug>/ when present
else tmp/integrations/<slug>/. teammate-idle.sh walks both subtrees.
Smoke test at .claude/hooks/test_workspace_resolution.sh. SKILL.md and
docs/integrations/AGENTS.md updated to document the dual-root convention.
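The dual-root lookup can be sketched as follows. This is a JavaScript illustration of the rule, not the shell helper in _lib.sh; resolveWorkspace, the injected dirExists callback, and the example slugs are hypothetical.

```javascript
// Illustrative sketch only: mirrors the "tmp/work/ when present, else
// tmp/integrations/" rule. dirExists is injected so no filesystem is needed.
function resolveWorkspace(slug, dirExists) {
  const generalDev = `tmp/work/${slug}/`;
  const integration = `tmp/integrations/${slug}/`;
  return dirExists(generalDev) ? generalDev : integration;
}

// Pretend only one general-dev workspace exists on disk.
const onDisk = new Set(["tmp/work/feature-a/"]);
const dirExists = (path) => onDisk.has(path);

console.log(resolveWorkspace("feature-a", dirExists));  // tmp/work/feature-a/
console.log(resolveWorkspace("acme-store", dirExists)); // tmp/integrations/acme-store/
```

Note that under this rule the integrations path is the fallback whether or not it exists, so a caller that needs a guaranteed directory would still have to check the returned path.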
4. finisher.md — verifier-spawns-fix section
Status: RESOLVED in PR6. Option (b) picked: clarify the role, do not add a verifier-spawns-fix section, do not rename the file.
Rationale:
- The finisher already owns implementer authority (Write/Edit in tools: frontmatter, step 3 writes CSS fixes directly during its own max-3-rounds verify-fix loop).
- Adding a verifier-spawns-fix section would be redundant: the finisher does not need an escape hatch for post-loop tactical fixes; it already writes fixes inline.
- The file is already named finisher.md (not *-verifier.md). The confusion was in prose framing, not the filename. Renaming would create cross-reference churn for cosmetic gain.
Concrete change in PR6: Added "Role classification — NOT a strict verifier"
section to .claude/agents/finisher.md explicitly stating it is a post-collation
integration pass with its own self-contained implement-verify cycle, NOT a strict
verifier in the loop-scoped sense. Also routes Liquid/JS integration gaps to a
code-implementer rather than attempting them inline.
Recorded in: ASCII endstate diagram Layer 3 already notes:
(writes targeted CSS fixes — not a "verifier" in the strict sense).
5. Plan-verifier / test-case-generator / project-clarity-interviewer orphans
Status: PARTIALLY RESOLVED in PR6.
- project-clarity-interviewer.md: DELETED in PR6. Confidently superseded by the /grill-me skill (.claude/skills/grill-me/), which is the active and skill-system-registered clarification tool covering the same workflow. Having two parallel clarification entry points was a confusion source ("spawn the agent or invoke the skill?"); collapsed to the skill.
- plan-verifier.md: DEFERRED. Plausibly useful for a future plan-first-feature-pr skill that runs a pre-implement plan-review gate before any code is written. The PR1 verifier-spawns-fix policy covers in-loop implementation review via code-verifier (verifies code against a given plan/spec), but does not cover pre-loop plan review; that remains a real gap a plan-verifier could fill. Low cost to keep the unreferenced agent file; if the plan-first workflow does not materialize in ~3 subsequent orchestration PRs, revisit deletion.
- test-case-generator.md: DEFERRED. Produces test-case markdown documentation (not test code). Niche but legitimate utility: useful for QC/audit work, test-coverage planning, or before refactoring a critical module. Usable on-demand via the Agent tool without being part of a fixed workflow. Could also integrate into a future plan-first-feature-pr slot (after plan-verifier, generates test cases the implementer must satisfy).
Target follow-up slot for both deferred agents: a future
plan-first-feature-pr skill OR continued on-demand Agent invocation. If the
plan-first flow does not get built in the next ~3 orchestration PRs, delete them
in a follow-up cleanup.
Catalog cleanup in PR6: docs/agentic-development.md updated to drop the
deleted agent and note the deferred status of the remaining two.
6. Smoke-test workflow exercising the real Workflow tool
Status: RESOLVED (orch-uplift-v2). .claude/workflows/smoke-test.js
spawns NO agents and validates, in the real Workflow runtime: args delivery,
log(), the inlined helpers (safeSlug, buildScopeContractClause,
formatFeedbackHistory, assertAbsoluteWorkspace), dynamic
import("node:fs/promises") availability, and (when given an absolute
args.workspace) a file round-trip. Run it after touching the workflow lib or
upgrading the harness:
Workflow({name: "smoke-test", args: {workspace: "<abs path, optional>"}})
The returned diagnostic object answers the open FS question from item 1
and settles the spec/ledger best-effort writes: if fsAvailable is false, the
writeSpecFile/appendLedgerEntry calls in workflows are no-ops and all
context flows in-prompt only (by design — the .catch(() => null) pattern).
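A probe of this kind can be sketched in a few lines. The name probeFsImport, the shape of the returned object beyond fsAvailable, and the injected importer are assumptions for illustration; the actual smoke-test.js source is not reproduced here.

```javascript
// Illustrative sketch only: probeFsImport is a hypothetical name. The importer is
// injected so both outcomes can be shown outside the real Workflow runtime.
function probeFsImport(importModule) {
  return importModule("node:fs/promises")
    .then(() => ({ fsAvailable: true, detail: "available" }))
    .catch((err) => ({ fsAvailable: false, detail: `unavailable: ${err.message}` }));
}

// Simulates a runtime where dynamic import() is rejected.
const rejectingImport = () =>
  Promise.reject(new Error("import() is not available in workflow scripts."));

probeFsImport(rejectingImport).then((diag) => console.log(diag.fsAvailable, diag.detail));
// false unavailable: import() is not available in workflow scripts.
```

Reporting the failure as data rather than throwing lets the same script finish its other checks and return one complete diagnostic object.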
First real-runtime result (2026-06-10): pass=true; args delivered as
object; log works; all inlined helpers behave; fsAvailable=false
("import() is not available in workflow scripts"). The same session also ran
general-feature-pr end-to-end in the real runtime (toy spec): implementer + verifier spawned, scope contract + implementer report present in prompts, approved on round 1, artifacts written by the AGENTS (escalation file, probe output) while script-side spec/ledger writes no-op'd as designed.
7. Teammate tool availability (no Agent / no Workflow) — hub-and-spoke
Status: documented + designed-around (2026-06-11). Empirics: 5+ teammates
across two production teams confirmed via ToolSearch that team-spawned agents
have NO Agent tool and NO Workflow tool (absent, not deferred).
Consequence: the in-teammate inline-iterate loop (teammate spawns its own
reviewer) and teammate-invoked workflows are unrunnable for teammates.
Intended vs bug (researched against the official agent-teams docs):
intended. The docs state "No nested teams: teammates cannot spawn their own
teams or teammates. Only the lead can manage the team" — the blanket removal
of Agent/Workflow is the (broad) enforcement of that limitation. Likewise the
docs define NO worktree isolation for teammates (teammates share the project
directory; the docs explicitly warn "two teammates editing the same file
leads to overwrites" and recommend manual git worktrees for parallel
sessions) — so isolation: "worktree" being ignored on team spawns is
unsupported-by-design, not a regression. Final-text invisibility is also
intended: teammates communicate only via SendMessage / idle notifications.
Designed-around (PR: fix/orch-hub-spoke):
- Hub-and-spoke review (proven, 8+ cycles): teammate implements → SendMessages the lead the reviewer brief → lead spawns code-verifier (transient or background) → lead relays the verdict → iterate. Codified in code-implementer.md §3 item 4, both SKILLs, AGENTS.md, code-review-guide, iteration-patterns (Pattern B-hub).
- Self-provisioned worktrees: implementer-class teammates run git worktree add .claude/worktrees/<your-name> -b <branch> before any git work (item 1 of the loop in code-implementer.md §3; spawn-prompt directive in the integrate-storefront SKILL). Note: the SubagentStart violations log still records these spawns (cwd is the root AT SPAWN); an entry means "spawned unisolated". Teammates following the convention remediate immediately after, so cross-check the log against .claude/worktrees/ before treating an entry as an incident.
- SendMessage-verdict clause appended to all verifier-class agent definitions (code/css/search/plan-verifier, finisher): final text is invisible; the verdict must be SendMessage'd. 4 of 5 reviewers empirically needed this.
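The hub-and-spoke cycle in the first bullet can be sketched as a loop. The callbacks implement and review stand in for the teammate's work and the lead-spawned code-verifier; their names, the verdict shape, and the maxRounds cap are assumptions for illustration, not part of the codified contract.

```javascript
// Illustrative sketch only: a synchronous model of the hub-and-spoke review cycle.
function runHubAndSpoke({ implement, review, maxRounds }) {
  let feedback = null;
  for (let round = 1; round <= maxRounds; round++) {
    const brief = implement(feedback); // teammate implements, sends the reviewer brief to the lead
    const verdict = review(brief);     // lead spawns the verifier and relays its verdict
    if (verdict.approved) {
      return { approved: true, rounds: round };
    }
    feedback = verdict.feedback;       // iterate with the relayed feedback
  }
  return { approved: false, rounds: maxRounds };
}

// Toy run: the verifier rejects the first brief and approves the second.
const outcome = runHubAndSpoke({
  implement: (feedback) => (feedback === null ? "draft" : "draft + fix"),
  review: (brief) =>
    brief === "draft" ? { approved: false, feedback: "handle empty input" } : { approved: true },
  maxRounds: 3,
});
console.log(outcome); // { approved: true, rounds: 2 }
```

The lead sits on every edge of the loop: the teammate never calls the reviewer directly, which matches the constraint that teammates have neither the Agent nor the Workflow tool.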
Re-tightening plan: if a future harness grants teammates Agent/Workflow
or real isolation, the direct inline-iterate path in code-implementer.md
§3 item 4 (route "Direct") simply becomes reachable again — the docs describe
both routes, gated on a ToolSearch("select:Agent") probe, so no doc change
is needed to benefit. The SubagentStart gate re-tightens per item 2.
8. qc-investigator accounting
.claude/agents/qc-investigator.md exists and is the durable investigation
role (spawned by /integrate-storefront for "investigate X" scopes). The
holistic review predates it and calls the role "hallucinated" — that claim is
stale; the agent definition IS the codified contract. Catalog:
docs/agentic-development.md.
Investigation heuristics (from the June 2026 LG promo investigation — resolved as merchant config, after eliminating four code hypotheses):
- Effective-config first. Multi-source features (manual/shopify, per-shop/per-theme) make half the visible config dead. Establish which source is live before debugging any rules you can see.
- Sibling control test. When X doesn't render, immediately check a sibling feature in the same pipeline batch (same enrichment pass, same ref-update path). Sibling renders ⇒ the shared plumbing is innocent; the break is feature-specific. This single check eliminated the prime refactor suspect in minutes.
- Source of truth, not its projection. "Read the actual metafields" means query the API (Storefront/Admin GraphQL), not the page-injected globals derived from it. The decisive 12-entries-vs-2-delivered delta was invisible in the projection.
- Deploy fingerprint before blaming code. gzip -9 | wc -c of the served CDN bundle vs known per-PR sizes answers "which code is live" in 30 seconds.
- Verify the environment you think you're testing. Preview-theme URLs silently fall back to the live theme under curl and across some redirects; assert window.Shopify.theme.id after every navigation.
- Maintain an explicit hypothesis tree in the dispatches; welcome human-supplied control tests: the user's "check the sibling batch" suggestion was the investigation's decisive move.
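The theme check from the environment-verification heuristic can be wrapped in a small assertion helper. assertThemeId and the example theme ids are hypothetical; the sketch assumes the storefront exposes window.Shopify.theme.id as the heuristic states, and takes the window object as a parameter so it can be exercised outside a browser.

```javascript
// Illustrative sketch only: throws when the page is not rendering the expected theme.
function assertThemeId(win, expectedThemeId) {
  const actual = win && win.Shopify && win.Shopify.theme ? win.Shopify.theme.id : undefined;
  if (actual !== expectedThemeId) {
    throw new Error(`Theme mismatch: expected ${expectedThemeId}, got ${actual}`);
  }
  return actual;
}

// Made-up window objects: one on the preview theme, one that fell back to live.
const previewWindow = { Shopify: { theme: { id: 111 } } };
const fellBackToLive = { Shopify: { theme: { id: 222 } } };

console.log(assertThemeId(previewWindow, 111)); // 111
try {
  assertThemeId(fellBackToLive, 111);
} catch (err) {
  console.log(err.message); // Theme mismatch: expected 111, got 222
}
```

Calling such a helper after each navigation turns a silent fallback to the live theme into an immediate failure instead of a misleading observation.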