
Orchestration — Deferred Follow-ups

Items the Option B series (#3482 + #3486) explicitly did NOT implement, with the reasoning and suggested resolution for each. Reference this doc from PR descriptions and endstate-summary documents.

1. Workflow-runtime await import() support

Status: deferred; drift risk now mitigated (orch-uplift-v2). The source-of-truth lib lives at .claude/workflows/lib/implement-verify-loop.js; each workflow script inlines a copy of the primitive at the top of its body. Unit tests on the lib validate the semantics in one place, and .claude/workflows/lib/inline-parity.test.js now enforces that every inlined copy is byte-identical to the lib (minus the export prefix), so manual mirroring can no longer drift silently. The smoke-test workflow (see item 6) reports whether dynamic await import("node:fs/promises") resolves in the real runtime, which answers the same question for sibling imports.
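The parity rule above can be sketched roughly as follows. This is a minimal illustration, not the contents of inline-parity.test.js: the helper names stripExports and hasInlinedCopy are hypothetical, the safeSlug body is a stand-in, and the sketch assumes that "minus export" means dropping a line-leading export prefix from each declaration.

```javascript
// Hypothetical sketch of an inline-parity check; not the real test file.

// Assumption: the inlined copy equals the lib text with each line-leading
// "export " prefix removed.
function stripExports(libSource) {
  return libSource.replace(/^export /gm, "");
}

// True only when the script contains the de-exported lib text byte for byte.
function hasInlinedCopy(scriptSource, libSource) {
  return scriptSource.includes(stripExports(libSource));
}

// Stand-in sources (the real safeSlug body is not shown in this doc).
const lib = "export function safeSlug(s) {\n  return s.toLowerCase();\n}\n";
const inSync =
  "// workflow\nfunction safeSlug(s) {\n  return s.toLowerCase();\n}\nrun();\n";
const drifted = inSync.replace("toLowerCase", "toUpperCase");

console.log(hasInlinedCopy(inSync, lib));  // true
console.log(hasInlinedCopy(drifted, lib)); // false
```

A substring check like this catches any single-byte drift, though a real test would presumably also need to locate each workflow file and report which copy diverged.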

Why deferred: An early Option B revision had each workflow start with const {...} = await import("./lib/implement-verify-loop.js"). claude[bot] and the option-b-review teammate both flagged that there is no precedent for sibling-relative import() from a workflow script anywhere in .claude/workflows/ on main, and the Node test harness used for verification doesn't exercise the real Workflow tool runtime — it uses Node's native ESM loader, which always resolves relative imports. The Node-test harness was a false-confidence signal on the question "does the real runtime support await import()".

To eliminate the runtime question entirely, the workflow scripts were re-inlined. Cost: ~80 lines × 4 workflows = ~320 lines of duplication.

Resolution — CONFIRMED NEGATIVE (real-runtime smoke-test, 2026-06-10): Workflow({name: "smoke-test"}) ran in the real runtime and reported fs-import: "unavailable: import() is not available in workflow scripts." — dynamic import() of ANY module (including siblings) does not resolve there. Re-extraction is off the table; the inlined copies + inline-parity.test.js are the permanent design, not a stopgap. Consequence: writeSpecFile / appendLedgerEntry script-side calls are no-ops in the real runtime (their .catch(() => null) guards absorb the rejection); the spec flows in-prompt via the scope contract, and the review ledger is written by the VERIFIER subagent itself (instructed in every verifyPrompt), since spawned agents have real tool access even though the script does not.
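The no-op behaviour described above can be illustrated with a small sketch. Everything here is an assumption made for illustration: the writeSpecFile signature, the buildImplementerPrompt helper, and the prompt layout are hypothetical stand-ins, since the real lib code is not reproduced in this doc. The point is only that a .catch(() => null) guard turns a failed script-side write into a null while the spec still travels in the prompt.

```javascript
// Hypothetical stand-in for a script-side write; the real writeSpecFile
// signature is not shown in this doc.
async function writeSpecFile(path, text) {
  // Rejects in any runtime where import() is unavailable.
  const fs = await import("node:fs/promises");
  await fs.writeFile(path, text, "utf8");
  return path;
}

// The guard converts any rejection into null, so the step still returns a
// prompt that carries the spec in-band.
async function buildImplementerPrompt(specPath, spec) {
  const written = await writeSpecFile(specPath, spec).catch(() => null);
  return {
    specOnDisk: written !== null,
    prompt: `SCOPE CONTRACT\n${spec}`, // hypothetical prompt layout
  };
}
```

Calling buildImplementerPrompt with an unwritable path resolves with specOnDisk set to false rather than rejecting, which is the same shape of outcome the real runtime produces when import() itself is unavailable.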


2. Teammate-spawn validation hook (absolute-workspace + isolation)

Status: RESOLVED (orch-uplift-v2) — the probe captured a real payload and the enforcement gate is now live at .claude/hooks/subagent-start-gate.sh (replaces the probe; wired to SubagentStart in .claude/settings.json).

Discovered schema (from a real capture, 2026-06-10):

{"session_id": "...", "transcript_path": "...", "cwd": "...",
"agent_id": "...", "agent_type": "...", "hook_event_name": "SubagentStart"}

The payload does NOT expose spawn-prompt text or the isolation flag, so the originally-planned "validate ${SHARED_WORKSPACE} is absolute in the prompt" check is not implementable from this event. What IS enforceable:

  1. Warn (non-blocking): implementer-class agents (code-implementer, css-implementer, finisher) whose cwd is not under .claude/worktrees/ get a loud warning on stdout and a logged violation at tmp/.subagent-gate/<session>-violations.log, but the spawn is allowed (exit 0). See the e2e finding below for why this is warn-not-block.
  2. Allow (silent): implementer-class agents whose cwd IS under .claude/worktrees/ pass and have their worktree occupancy registered.
  3. Soft signal: worktree reuse by a second agent_id is logged to tmp/.subagent-gate/<session>-worktrees.log (never blocks — the payload can't distinguish legitimate sequential workflow rounds from teammate contamination).
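The three cases above can be restated as a small decision function. This is an illustrative JavaScript sketch only: the live gate is the shell script named in the status line, and the function names, the in-memory occupancy map, and the "ignore" outcome for non-implementer agent types are assumptions. Only the payload fields (cwd, agent_id, agent_type) and the implementer-class role names come from the doc.

```javascript
// Illustrative restatement in JavaScript; the live gate is a shell script.
const IMPLEMENTER_TYPES = new Set(["code-implementer", "css-implementer", "finisher"]);

// Assumption: "under .claude/worktrees/" is tested by path segment.
function isUnderWorktrees(cwd) {
  return cwd.split(/[\\/]+/).join("/").includes("/.claude/worktrees/");
}

// Worktree cwd -> first agent_id seen (the real gate logs to files instead).
const occupancy = new Map();

function evaluateSpawn(payload) {
  const { cwd, agent_id, agent_type } = payload;
  if (!IMPLEMENTER_TYPES.has(agent_type)) {
    // Assumption: other agent types pass untouched.
    return { action: "ignore", exitCode: 0 };
  }
  if (!isUnderWorktrees(cwd)) {
    // Case 1: loud warning plus a logged violation, never a block.
    return { action: "warn", exitCode: 0 };
  }
  // Cases 2 and 3: allow, register occupancy, flag reuse by another agent.
  const owner = occupancy.get(cwd);
  if (owner === undefined) occupancy.set(cwd, agent_id);
  const reused = owner !== undefined && owner !== agent_id;
  return { action: "allow", exitCode: 0, reused };
}
```

Every branch returns exit code 0, matching the warn-not-block posture: the action field is a signal for logs and post-mortems, not a verdict on the spawn.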

E2E finding (orch-uplift-v2, 2026-06-10) — why warn, not block. The gate originally blocked (exit 2) on case 1. End-to-end testing in the real harness found that SubagentStart fires with the repo-root cwd, not the worktree cwd, even for spawns the orchestration intends to isolate. Every real SubagentStart capture in the test session — an Explore agent, the e2e-tester agent itself (meant to run isolated) — reported cwd: <repo-root>, never a .claude/worktrees/ path. The payload at this event therefore cannot distinguish a genuinely non-isolated implementer (the failure to catch) from an isolation: "worktree" spawn whose worktree cwd is not yet reflected. A hard exit-2 block under that timing would deadlock EVERY legitimate isolated implementer spawn and take down the whole workflow substrate. The gate was downgraded to a loud, logged, non-blocking warning: the signal survives for the lead and post-mortems, but a false positive can never wedge a real run.

Open verification CLOSED — with a more serious finding (2026-06-10, lead session): a real code-implementer teammate spawned with isolation: "worktree" was captured by the gate with cwd = <repo-root>, AND the agent's own pwd (run via Bash inside its session) returned the repo root, AND no new worktree appeared under .claude/worktrees/. The SubagentStart cwd was not "early" — it was ACCURATE: the harness is not honoring isolation: "worktree" for team-spawned agents at all in the current build. Two more data points from the same session agree: the e2e-tester teammate (also spawned with isolation) committed directly onto the lead's checked-out branch, and PR #3499's dogfooding logged the same failure twice. Consequences:

  • The gate's violations log is currently a TRUE-positive detector — every warned spawn really is sharing the lead's tree.
  • Warn-only remains the only viable posture: a hard block would reject every implementer spawn, because none currently get isolation.
  • The real fix is harness-side: honor isolation: "worktree" for teammate spawns (or expose an isolation field in the SubagentStart payload so the gate can block only genuine violations). Until then, leads should assume teammates SHARE the lead cwd: keep the lead's checkout on the integration branch teammates are expected to commit to, and avoid concurrent implementer teammates touching git state.

Narrowing (same session): the gap is SPECIFIC to team spawns. Transient Agent spawns (no team_name) with isolation: "worktree" DO get real worktrees — verified twice via the agents' own pwd (an Explore probe and a code-implementer probe both ran under .claude/worktrees/agent-*). So Pattern B's transient reviewer subagents and verifier-spawns-fix implementers are isolated correctly; it is the persistent TEAMMATE itself that shares the lead's cwd. Gate-behavior consequence: hooks run in the spawned agent's cwd, so isolated transient spawns evaluate the gate inside their own worktree (cwd check passes; their registry entries live in worktree-local tmp/ and vanish with auto-cleanup), while non-isolated teammates evaluate it at the repo root and land in the lead-side violations log. The lead-side <session>-violations.log therefore contains only true positives — every entry is an implementer-class agent genuinely running in the lead's tree.

Tests: cases 9-14 in .claude/hooks/test_workspace_resolution.sh (case 11 now asserts warn-not-block + violation logging).

Residual gap: prompt-content validation (absolute ${SHARED_WORKSPACE}) still needs either a harness-side field or typed workspace params on Agent({...}). Tracked below as the original resolution option 3.

Why an enforcement gate did not ship with PR5 (historical context, predating the RESOLVED status above):

The followups doc originally named the event TeammateSpawned. That literal event name does not exist in the current Claude Code harness — the wording was a conceptual placeholder. What DOES exist:

  • SubagentStart — fires when any subagent (including teammates, which the agent-teams substrate spawns as subagents) starts. Recently added per ~/.claude/cache/changelog.md. Must be configured as a type: "command" hook (prompt-/agent-type hooks for this event error out by design).
  • WorktreeCreate / WorktreeRemove — fire when agent worktree isolation creates or removes a worktree. Closer to the isolation: "worktree" side of the validation contract; hookSpecificOutput.worktreePath available for HTTP hooks.

Both are candidate landing surfaces. The unknown blocking confident enforcement is the hook-input schema: the changelog confirms agent_id and agent_type are in hook payloads generally, but does not document whether SubagentStart exposes the spawn prompt text (where ${SHARED_WORKSPACE} is interpolated) or the isolation parameter. Without those fields in the payload, a hook can't enforce "the workspace path in the spawn prompt is absolute" — the data isn't reachable. Landing enforcement on guesswork would ship a placebo gate.

What PR5 does ship (probe-only):

.claude/hooks/subagent-start-probe.sh, wired to SubagentStart in .claude/settings.json, captures the first spawn's full input payload per session and exits 0 unconditionally. NEVER blocks a spawn. The capture lets the follow-up PR build enforcement on real-world payload data instead of the changelog's partial schema.

Empirical motivation — the gap fired during PR5 work itself, in real time. While orchestrating the PR3 + PR5 + PR6 series on 2026-06-10, two implementer subagents (PR3 and PR6) had their Edit calls land in the lead's parent worktree at /Users/.../cloud_control_plane/ instead of their assigned isolated worktrees. The lead's PR2 branch (fix/last-page-clamp) was dirtied twice. Recovery required spinning up fresh worktrees, applying patches by hand, and re-running. This is the same failure mode the holistic review §3 P0-A predicted and PR1 first observed — and it happened again, twice, within minutes, while the followups doc literally describes the gap on disk. Voluntary prose contracts demonstrably do not hold under team parallelism. The enforcement gate must land.

Resolution (next PR, owner: TBD):

  1. After PR5 ships and a few real spawns capture probe data, read the dumps at tmp/.subagent-start-probe/*.json to confirm whether SubagentStart exposes spawn-prompt content + isolation flag.
  2. If yes — write .claude/hooks/subagent-start.sh that fails loud (exit 2, blocking) when ${SHARED_WORKSPACE} interpolates to a relative path OR when isolation is not "worktree" on code-touching roles.
  3. If no — escalate to harness-team for an explicit hookSpecificOutput-style field, OR add typed workspace and isolation parameters to Agent({...}) spawns so they appear in the probe payload.

Owner: whoever picks up the followup once one or two probe payloads have been captured in real sessions.


3. Workspace path generalization in hooks (tmp/work/ for general-dev)

Status: landed in PR5. Hooks now resolve workspaces via the shared .claude/hooks/_lib.sh helper, which picks tmp/work/<slug>/ when present else tmp/integrations/<slug>/. teammate-idle.sh walks both subtrees. Smoke test at .claude/hooks/test_workspace_resolution.sh. SKILL.md and docs/integrations/AGENTS.md updated to document the dual-root convention.
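The dual-root rule can be sketched as below. This is a hypothetical JavaScript mirror for illustration only: the real logic lives in the shell helper .claude/hooks/_lib.sh, and the function name resolveWorkspace and its injectable exists parameter are assumptions introduced here so the rule can be exercised without touching the disk.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical mirror of the helper's rule: prefer tmp/work/<slug>/ when it
// exists, otherwise fall back to tmp/integrations/<slug>/.
function resolveWorkspace(repoRoot, slug, exists = fs.existsSync) {
  const general = path.join(repoRoot, "tmp", "work", slug);
  const integration = path.join(repoRoot, "tmp", "integrations", slug);
  return exists(general) ? general : integration;
}
```

Note that the fallback is returned whether or not tmp/integrations/<slug>/ exists; whether the real helper also validates the fallback path is not stated here.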


4. finisher.md — verifier-spawns-fix section

Status: RESOLVED in PR6. Option (b) picked: clarify the role, do not add a verifier-spawns-fix section, do not rename the file.

Rationale:

  • The finisher already owns implementer authority (Write/Edit in tools: frontmatter, step 3 writes CSS fixes directly during its own max-3-rounds verify-fix loop).
  • Adding a verifier-spawns-fix section would be redundant — the finisher does not need an escape hatch for post-loop tactical fixes; it already writes fixes inline.
  • The file is already named finisher.md (not *-verifier.md). The confusion was in prose framing, not the filename. Renaming would create cross-reference churn for cosmetic gain.

Concrete change in PR6: Added "Role classification — NOT a strict verifier" section to .claude/agents/finisher.md explicitly stating it is a post-collation integration pass with its own self-contained implement-verify cycle, NOT a strict verifier in the loop-scoped sense. Also routes Liquid/JS integration gaps to a code-implementer rather than attempting them inline.

Recorded in: ASCII endstate diagram Layer 3 already notes: (writes targeted CSS fixes — not a "verifier" in the strict sense).


5. Plan-verifier / test-case-generator / project-clarity-interviewer orphans

Status: PARTIALLY RESOLVED in PR6.

  • project-clarity-interviewer.md: DELETED in PR6. Confidently superseded by the /grill-me skill (.claude/skills/grill-me/), which is the active and skill-system-registered clarification tool covering the same workflow. Having two parallel clarification entry points was a confusion source ("spawn the agent or invoke the skill?"); they were collapsed to the skill.

  • plan-verifier.md: DEFERRED. Plausibly useful for a future plan-first-feature-pr skill that runs a pre-implement plan-review gate before any code is written. The PR1 verifier-spawns-fix policy covers in-loop implementation review via code-verifier (verifies code against a given plan/spec), but does not cover pre-loop plan review — that remains a real gap a plan-verifier could fill. Low cost to keep the unreferenced agent file; if the plan-first workflow does not materialize in ~3 subsequent orchestration PRs, revisit deletion.

  • test-case-generator.md: DEFERRED. Produces test-case markdown documentation (not test code). Niche but legitimate utility — useful for QC/audit work, test-coverage planning, or before refactoring a critical module. Usable on-demand via the Agent tool without being part of a fixed workflow. Could also integrate into a future plan-first-feature-pr slot (after plan-verifier, generates test cases the implementer must satisfy).

Target follow-up slot for both deferred agents: a future plan-first-feature-pr skill OR continued on-demand Agent invocation. If the plan-first flow does not get built in the next ~3 orchestration PRs, delete them in a follow-up cleanup.

Catalog cleanup in PR6: docs/agentic-development.md updated to drop the deleted agent and note the deferred status of the remaining two.


6. Smoke-test workflow exercising the real Workflow tool

Status: RESOLVED (orch-uplift-v2). .claude/workflows/smoke-test.js spawns NO agents and validates, in the real Workflow runtime: args delivery, log(), the inlined helpers (safeSlug, buildScopeContractClause, formatFeedbackHistory, assertAbsoluteWorkspace), dynamic import("node:fs/promises") availability, and (when given an absolute args.workspace) a file round-trip. Run it after touching the workflow lib or upgrading the harness:

Workflow({name: "smoke-test", args: {workspace: "<abs path, optional>"}})

The returned diagnostic object answers the open filesystem question behind item 1 and the best-effort spec/ledger writes: if fsAvailable is false, the writeSpecFile/appendLedgerEntry calls in workflows are no-ops and all context flows in-prompt only (by design, via the .catch(() => null) pattern).
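A probe of this kind might look roughly like the sketch below. It is not the code in smoke-test.js: the probeFs name, the injectable importer parameter, and the detail field are hypothetical, and only the fsAvailable flag comes from the doc.

```javascript
// Hypothetical probe; the real smoke-test.js is not reproduced here.
// The importer is injectable so a runtime without import() can be simulated.
async function probeFs(importer = (name) => import(name)) {
  try {
    const fs = await importer("node:fs/promises");
    return { fsAvailable: typeof fs.writeFile === "function", detail: "ok" };
  } catch (err) {
    // Covers both a rejected import() promise and a synchronous throw.
    return { fsAvailable: false, detail: `unavailable: ${err.message}` };
  }
}
```

Under plain Node the default importer should resolve and report fsAvailable as true, which is exactly why a Node-only harness cannot answer the question for the real Workflow runtime.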

First real-runtime result (2026-06-10): pass=true; args delivered as object; log works; all inlined helpers behave; fsAvailable=false ("import() is not available in workflow scripts"). The same session also ran general-feature-pr end-to-end in the real runtime (toy spec): implementer + verifier spawned, scope contract + implementer report present in prompts, approved on round 1, artifacts written by the AGENTS (escalation file, probe output) while script-side spec/ledger writes no-op'd as designed.

7. Teammate tool availability (no Agent / no Workflow) — hub-and-spoke

Status: documented + designed-around (2026-06-11). Empirics: 5+ teammates across two production teams confirmed via ToolSearch that team-spawned agents have NO Agent tool and NO Workflow tool (absent, not deferred). Consequence: the in-teammate inline-iterate loop (teammate spawns its own reviewer) and teammate-invoked workflows are unrunnable for teammates.

Intended vs bug (researched against the official agent-teams docs): intended. The docs state "No nested teams: teammates cannot spawn their own teams or teammates. Only the lead can manage the team" — the blanket removal of Agent/Workflow is the (broad) enforcement of that limitation. Likewise the docs define NO worktree isolation for teammates (teammates share the project directory; the docs explicitly warn "two teammates editing the same file leads to overwrites" and recommend manual git worktrees for parallel sessions) — so isolation: "worktree" being ignored on team spawns is unsupported-by-design, not a regression. Final-text invisibility is also intended: teammates communicate only via SendMessage / idle notifications.

Designed-around (PR: fix/orch-hub-spoke):

  • Hub-and-spoke review (proven, 8+ cycles): teammate implements → SendMessages the lead the reviewer brief → lead spawns code-verifier (transient or background) → lead relays the verdict → iterate. Codified in code-implementer.md §3 item 4, both SKILLs, AGENTS.md, code-review-guide, iteration-patterns (Pattern B-hub).
  • Self-provisioned worktrees: implementer-class teammates run git worktree add .claude/worktrees/<your-name> -b <branch> before any git work (item 1 of the loop in code-implementer.md §3; spawn-prompt directive in the integrate-storefront SKILL). Note: the SubagentStart violations log still records these spawns (cwd is the root AT SPAWN); an entry means "spawned unisolated" — teammates following the convention remediate immediately after, so cross-check the log against .claude/worktrees/ before treating an entry as an incident.
  • SendMessage-verdict clause appended to all verifier-class agent definitions (code/css/search/plan-verifier, finisher): final text is invisible; the verdict must be SendMessage'd. 4 of 5 reviewers empirically needed this.

Re-tightening plan: if a future harness grants teammates Agent/Workflow or real isolation, the direct inline-iterate path in code-implementer.md §3 item 4 (route "Direct") simply becomes reachable again — the docs describe both routes, gated on a ToolSearch("select:Agent") probe, so no doc change is needed to benefit. The SubagentStart gate re-tightens per item 2.


8. qc-investigator accounting

.claude/agents/qc-investigator.md exists and is the durable investigation role (spawned by /integrate-storefront for "investigate X" scopes). The holistic review predates it and calls the role "hallucinated" — that claim is stale; the agent definition IS the codified contract. Catalog: docs/agentic-development.md.

Investigation heuristics (from the June 2026 LG promo investigation — resolved as merchant config, after eliminating four code hypotheses):

  1. Effective-config first. Multi-source features (manual/shopify, per-shop/per-theme) make half the visible config dead. Establish which source is live before debugging any rules you can see.
  2. Sibling control test. When X doesn't render, immediately check a sibling feature in the same pipeline batch (same enrichment pass, same ref-update path). Sibling renders ⇒ the shared plumbing is innocent; the break is feature-specific. This single check eliminated the prime refactor suspect in minutes.
  3. Source of truth, not its projection. "Read the actual metafields" means query the API (Storefront/Admin GraphQL), not the page-injected globals derived from it. The decisive 12-entries-vs-2-delivered delta was invisible in the projection.
  4. Deploy fingerprint before blaming code. gzip -9 | wc -c of the served CDN bundle vs known per-PR sizes answers "which code is live" in 30 seconds.
  5. Verify the environment you think you're testing. Preview-theme URLs silently fall back to the live theme under curl and across some redirects; assert window.Shopify.theme.id after every navigation.
  6. Maintain an explicit hypothesis tree in the dispatches; welcome human-supplied control tests — the user's "check the sibling batch" suggestion was the investigation's decisive move.
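Heuristic 4 (deploy fingerprint) can be approximated in Node as sketched below, assuming the served bundle has already been fetched into a Buffer. The function names and the knownSizes map are hypothetical. Sizes produced by Node's zlib may differ by a few bytes from the gzip CLI, so fingerprints should be recorded and compared with the same tool.

```javascript
const zlib = require("node:zlib");

// Size in bytes of the bundle after gzip at maximum compression level.
function gzipSize(bundle) {
  return zlib.gzipSync(bundle, { level: 9 }).length;
}

// knownSizes: hypothetical map of build label -> recorded gzip size.
// Returns the matching label, or null when no recorded size matches.
function identifyBuild(bundle, knownSizes) {
  const size = gzipSize(bundle);
  const match = Object.entries(knownSizes).find(([, expected]) => expected === size);
  return { size, build: match ? match[0] : null };
}
```

A size match is a quick fingerprint rather than proof: two different builds can compress to the same length, so a content hash is the safer follow-up when the answer matters.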