Orchestration Improvement Collation
Sources synthesized: holistic review (msqc/reports/orchestration-holistic-review.md), team audit (msqc/reports/team-audit.md), Option B context + non-workflow options (orchestration-impl-b/reports/context-and-non-workflow-options.md), endstate architecture + summary (orchestration-impl-b/reports/endstate-architecture.ascii, endstate-summary.md), review-pass-1 (orchestration-impl-b/reports/review-pass-1.md), Option B code audit of PR #3482 + PR #3486 (open, not merged), session empirics (qa-multiplier / option-b-impl / option-b-review / heading-toggle / chip-fix), Anthropic official docs for workflows / agent-teams / sub-agents.
1. TL;DR
Five items every source agrees on
- Verifier → implementer feedback path is broken after workflow exit. Holistic P0-B, team-audit B1, endstate PR1-Rec-3, review-pass-1 (PASS verdict on PR1). Once `feature-pr.js` exits, the implementer subagent is gone; verifier prose forbids fixing; lead becomes mandatory broker. PR1 codifies the "Escalating Fixes Outside the Loop" section on all three verifiers + qc-investigator Paths A/B/C. No source dissents.
- The QC role was hallucinated; codify it. Holistic P0-C, team-audit B4, endstate PR1-Rec-2, session empirics (qa-multiplier did real P0 work + refused PR raise on a contract that didn't exist). PR1 adds `.claude/agents/qc-investigator.md` (161 lines). Empirically demanded; unanimously endorsed.
- `tools:` frontmatter must move from prose to harness-enforced lists on every agent. Holistic P0-A / Rec-A2, team-audit A1 / Rec4, endstate PR1-Rec-1. PR1 implements this on 11 agent files. Anthropic sub-agents doc echoes: "Limit tool access: grant only necessary permissions for security and focus." Unanimous.
- `/raise` ownership must be single-sourced to the persistent teammate, not duplicated in the workflow. Holistic Rec-A5, team-audit A3 / Rec5, endstate PR1-Rec-4. PR1 removes `feature-pr.js` Phase 4; ownership lives in `code-implementer.md` step 6. Unanimous.
- Workspace contract must be absolute path + `isolation: 'worktree'` or it silently fails. Holistic Hooks-inventory + P0-A, team-audit F1/F2/F3 + Rec-F1, endstate Layer5-gap-2 + Anecdote-LeadIsolation (the implementer literally hit this gap during PR1 work). PR #3470 is the prerequisite; Option B layers `assertAbsoluteWorkspace` on top. Unanimous + empirically observed.
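The workspace contract in the last item can be illustrated with a small guard. This is a minimal sketch, assuming a Node-style runtime: only the name `assertAbsoluteWorkspace` and the `isolation: 'worktree'` requirement come from the sources, while the argument shape and error messages are assumptions and may not match the code in PR #3482 or #3486.

```javascript
const path = require('node:path');

// Hypothetical guard: the real assertAbsoluteWorkspace in the PRs may differ.
// The { workspace, isolation } argument shape is an assumption.
function assertAbsoluteWorkspace({ workspace, isolation } = {}) {
  if (typeof workspace !== 'string' || workspace.length === 0) {
    throw new Error('workspace is required');
  }
  // An unresolved placeholder such as "${SHARED_WORKSPACE}" is not absolute,
  // so it is rejected here instead of failing silently later.
  if (!path.isAbsolute(workspace)) {
    throw new Error(`workspace must be an absolute path, got: ${workspace}`);
  }
  if (isolation !== 'worktree') {
    throw new Error(`isolation must be 'worktree', got: ${String(isolation)}`);
  }
  return workspace;
}
```

Failing loudly at the call site is the point of the sketch: a relative path or an unexpanded placeholder raises immediately rather than letting a teammate write into the wrong directory.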
Three disagreements between sources
- Where universal gates live. Endstate-summary says "AGENTS.md (CLAUDE.md symlink)" and the Option B code audit confirms the section was added to `AGENTS.md` only; `CLAUDE.md` is unmodified, yet `universal-gates.test.js` opens `CLAUDE.md` and asserts the section. Holistic Rec-B5 explicitly said "CLAUDE.md". Verdict: bug in PR2. Either move the section to `CLAUDE.md` or change the test to read `AGENTS.md`. Holistic + review-pass-1 + the test itself favor `CLAUDE.md`; the actual diff favors `AGENTS.md`.
- Whether `await import()` works at the real Workflow runtime. Endstate-summary claims the lib is "canonical source of truth"; review-pass-1 P1-B says the pattern is unvalidated and the production code is in fact INLINED copies of the lib in every workflow. Session empirics confirm option-b-impl never drove a refactored workflow through the real `Workflow` tool. Verdict: review-pass-1 is right. The lib file is dead-at-runtime; only the test harness loads it.
- Domain-pointer richness in verifier prompts. Pre-Option-B, css-section and search-tuning verifier prompts named specific AGENTS.md anchors ("for verification standards and element selectors", "for the override header reference"). Option B collapsed these into a generic `playbookClause`. Holistic Rec-B2 endorsed the generic refactor; review-pass-1 P2-A flagged the regression; option-b-impl's iteration-1 then re-added a `purpose` arg. Verdict: review-pass-1 was right to flag. The `purpose` arg now exists, but the audit confirms callers in `general-feature-pr.js` may not be passing it consistently.
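To make the third disagreement concrete, here is a hedged sketch of how a `playbookClause(path, purpose)` helper could behave. Only the function name and its two parameters come from the sources; the clause wording and the `findCallsMissingPurpose` audit helper are hypothetical.

```javascript
// Hypothetical reconstruction: only the name playbookClause and its
// (path, purpose) parameters come from the sources; the wording is assumed.
function playbookClause(playbookPath, purpose) {
  if (!playbookPath) return '';
  const base = `Read ${playbookPath}`;
  // Without a purpose the clause degrades to the generic form that
  // review-pass-1 P2-A flagged as a regression.
  return purpose ? `${base} ${purpose}.` : `${base}.`;
}

// Hypothetical audit helper: given a hand-collected list of call sites,
// return the ones that do not pass a purpose.
function findCallsMissingPurpose(calls) {
  return calls.filter((call) => !call.purpose).map((call) => call.site);
}
```

A list of call sites gathered from the four workflows could be fed to `findCallsMissingPurpose` to check whether every site that had a domain pointer before Option B still passes one.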
2. The Four Streams — Quick Status
Inline session retrospective. This session was simultaneously the venue and the subject. option-b-impl built PR #3482 + #3486 (Option B substrate) and self-verified through a Node test harness. option-b-review then independently re-verified and caught 2 P1 bugs the implementer missed (SHARED_WORKSPACE label drift, await import() unvalidated against real runtime), 2 P2 issues (domain-pointer regression, TeammateSpawned deferral undocumented), and 1 P3. qa-multiplier discovered a P0 storefront bug across the chip-multiplier incident; the lead had to spawn chip-fix as a broker because qa-multiplier (no agent definition) refused to raise the PR — the exact failure mode the audit was investigating. heading-toggle teammate did multi-pass investigation, self-corrected on the wrong-theme-ID mistake, and ended awaiting human approval. namespaced-tags produced research only.
Option B implementation (#3482 + #3486). PR1 (Option A tail) is contracts + dedup, no new abstractions: tools frontmatter on all 11 agents, verifier escalation sections, qc-investigator codification, /raise dedup, SKILL.md Phase 1.1 rewrite + isolation contract, audit D3 documented. PR2 (Option B core) adds the shared loop primitive (lib + inlined copies in 4 workflows), playbook arg, general-feature-pr workflow + skill, slim docs/dev/code-review-guide.md, universal gates section. Both PRs are OPEN — neither has merged. Audit confirms a real discrepancy: universal gates were added to AGENTS.md not CLAUDE.md, contradicting the test file and the holistic recommendation.
Prior workflow audits (holistic + non-workflow options + team audit). Three independent passes arrived at consistent verdicts. Holistic mapped 4 implementation options (A=docs-only, B=substrate, C=harness, D=investigator role) and recommended Steps 1-3 (PR #3470 + universal gates + qc-investigator → tools frontmatter + escalation sections + /raise dedup → extract loop primitive + parameterize playbook). Team audit named 28 specific findings with file:line citations. Non-workflow options laid out a 4-pattern taxonomy (full workflow / ad-hoc verifier subagent / persistent verifier teammate / monitor-pr-only) plus inline self-verify plus hybrid. context-and-non-workflow-options flagged a separate critical bug that the other streams missed: the fix-round prompt in feature-pr.js:114-123 (and lib/implement-verify-loop.js:128, general-feature-pr.js:80-87) drops spec, appGuide, workspace, and browserN entirely — round-2 sees only feature + verifier feedback. Verifier prompt also never gets spec, so it judges against CLAUDE.md + severity rubric with no scope contract.
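The fix-round bug described above can be pictured with a prompt builder that reuses one shared header in every round. This is a sketch under stated assumptions: the field names `spec`, `appGuide`, `workspace`, and `browserN` come from the audit, while the function name `buildImplementerPrompt`, the `outOfScope` list shape, and the prompt wording are hypothetical.

```javascript
// Hypothetical prompt builder. Field names spec, appGuide, workspace and
// browserN come from the audit; everything else is an assumption.
function buildImplementerPrompt(
  { feature, spec, appGuide, workspace, browserN, outOfScope = [] },
  feedback
) {
  const shared = [
    `Feature: ${feature}`,
    `Spec: ${spec}`,
    `App guide: ${appGuide}`,
    `Workspace: ${workspace}`,
    `Browser: ${browserN}`,
    `Out of scope: ${outOfScope.length ? outOfScope.join('; ') : 'none stated'}`,
  ];
  // Round 1 has no feedback. Fix rounds append the feedback to the SAME
  // shared header instead of replacing it, so scope is never dropped.
  const tail = feedback
    ? ['Verifier feedback to address:', feedback]
    : ['Implement the feature as specified.'];
  return [...shared, ...tail].join('\n');
}
```

Calling the builder with and without `feedback` yields prompts that differ only in their tail, which is the property the current fix-round prompt lacks.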
Anthropic official recommendations. Workflows are for "more agents than one conversation can coordinate" — codebase audits, large migrations, cross-checked research, hard plans drafted from multiple angles. Anti-pattern: "No mid-run user input. For sign-off between stages, run each stage as its own workflow." Cost warning: "A single run can use meaningfully more tokens than working through the same task in conversation." Agent teams are explicitly experimental and gated on CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS=1; "no nested teams: teammates cannot spawn their own teams or teammates"; recommended scale 3-5 teammates. Subagents are for "quick, focused workers that report back" where "only the result matters"; cannot nest. The canonical decision axis is who decides the next step: workflows = script; subagents/skills = Claude turn-by-turn; agent teams = lead turn-by-turn with shared task list.
3. Convergence Map
| Recommendation | Sources that agree | Action urgency |
|---|---|---|
| Add tools: frontmatter to every agent (matches AGENTS.md role table) | Holistic Rec-A2/Rec-C2, team-audit A1/Rec4, endstate PR1-Rec-1, Anthropic sub-agents doc | P0 — in PR1, ready to merge |
| Codify qc-investigator role (replaces hallucinated qa-multiplier) | Holistic Rec-A4/Rec-D1, team-audit B4/Rec3, endstate PR1-Rec-2, session empirics (qa-multiplier incident) | P0 — in PR1, ready to merge |
| Verifier "Escalating Fixes Outside the Loop" section on code/css/search verifiers | Holistic Rec-A3, team-audit B1/Rec2, endstate PR1-Rec-3 | P0 — in PR1, ready to merge |
| Remove feature-pr.js Phase 4; teammate owns /raise | Holistic Rec-A5/Step2-c, team-audit A3/Rec5, endstate PR1-Rec-4 | P0 — in PR1, ready to merge |
| Land PR #3470 (workspace placeholder + isolation mandate) | Holistic Step1-a, team-audit Rec1/F1/F2/F3, endstate Layer5-gap-2 + Anecdote-LeadIsolation | P0 — already in flight, prerequisite for both Option B PRs |
| Shared runImplementVerifyLoop primitive | Holistic Rec-B1, endstate PR2-Rec-1 | P1 — in PR2, blocked on await import resolution |
| playbook workflow arg parameterizing the domain guide | Holistic Rec-B2, endstate PR2-Rec-3 | P1 — in PR2; needs purpose arg verified across callers |
| Universal gates section in project-wide doc (CLAUDE.md per most sources; AGENTS.md per actual PR2 diff) | Holistic Rec-B5/Step1-b, endstate PR2-Rec-6, Layer0-gate-1..6 | P1 — in PR2 but in wrong file; see §4 |
| Domain-neutral general-feature-pr workflow + skill + slim playbook | Holistic Rec-B3, endstate PR2-Rec-4/5 | P2 — in PR2, demonstrates substrate composes |
| Drive at least one refactored workflow through the real Workflow tool before merging PR2 | Review-pass-1 Required-2 / P1-B, session empirics (no end-to-end invocation observed) | P0 — gate on PR2 merge |
| Persist spec + context into fix-round prompts AND into verifier prompts | context-and-non-workflow-options Rec-1/2/3 | P0 — separate fix, not yet in either PR |
| Generalize hook scan to tmp/work/*/ for non-Shopify general-dev work | Holistic Step4-deferred, endstate FutureDefer-1 | P2 — explicitly deferred |
| TeammateSpawned hook (typed workspace + isolation enforcement) | Holistic Rec-C1, endstate FutureDefer-2 + Anecdote-LeadIsolation, review-pass-1 P2-B | P2 — deferred to Option C; flagged in followups |
| Reversibility classification of actions (reversible / effortful / irreversible) | Holistic Rec-C8 / blind spot #7, endstate Layer0-Per-domain (implicit) | P3 — open question Q8 deferred |
| Team observability surface / team-status slash command | Holistic Rec-C4 / blind spot #8 | P3 — Option C |
| Idle-watchdog hook auto-pinging lead | Holistic Rec-C5 | P3 — Option C |
| Knowledge persistence across sessions (audit findings out of gitignored tmp/) | Holistic Q9 | P3 — open question |
4. Divergence Map
| Contested item | Source A says | Source B says | What we believe (session empirics) |
|---|---|---|---|
| Where universal gates live | Holistic Rec-B5 + universal-gates.test.js: CLAUDE.md | Endstate-summary + PR2 actual diff: AGENTS.md | CLAUDE.md is right. The test file already expects it there; CLAUDE.md is the project root and what every component reads; AGENTS.md is positioned as a Shopify playbook elsewhere in the repo. Action: move the section to CLAUDE.md (or symlink CLAUDE.md→AGENTS.md if that's the intent, but the test still fails today). |
| Does the lib await import() work at the real Workflow runtime? | Endstate-summary: "canonical source of truth" | Review-pass-1 P1-B: "unvalidated; production code is inlined" | Review-pass-1 is right. option-b-impl never drove a workflow through the real Workflow tool; iteration-1 inlined the primitive. The lib is currently dead-at-runtime. Action: either drive one real invocation and confirm, or remove the lib + ship inlined-only with a follow-up. |
| Domain-specific anchors in verifier prompts | Holistic Rec-B2: collapse to generic playbookClause | Review-pass-1 P2-A: keep per-prompt purpose | Both partially right. The generic clause is the substrate; a purpose arg restores domain-pointer richness. option-b-impl iteration-1 added playbookClause(path, purpose). Need to verify all 4 workflows actually pass purpose where their pre-Option-B code did. |
| Should verifiers be permitted to spawn implementer subagents at all? | Team-audit Rec-B1-fix: yes, post-loop only | Holistic Q1: leaves it open as "sandbox vs self-orchestration" tension | Yes, post-loop only. PR1 codifies the tiering: in-loop = "verifier never writes"; post-loop = Agent without team_name permitted. tools frontmatter on verifiers now includes Agent. Empirically the lack of this caused the chip-fix incident. |
| Should /raise phase stay in workflow or move to teammate? | Holistic Q3 + team-audit Q3 leave it open | Endstate-summary + PR1 actual: moved to teammate | Moved to teammate is right. Empirically, the workflow's Phase 4 was spawning anonymous subagents in the wrong worktree. PR1 removes it. Downside (teammate forgets to /raise) is mitigated by SKILL.md rule 8 wording. |
| Is qc-investigator Shopify-specific or general-dev? | Holistic Q6: open | Endstate: lives in .claude/agents/ (domain-neutral path) and is referenced in integrate-storefront/SKILL.md scope table | General-dev primitive. Path C (invoke feature-pr) makes it useful for any flow; the agent file lives at the root .claude/agents/ not inside the skill. Treat as general-dev. |
| Plan approval: lead-only or human-too? | Holistic Q2: open | (no other source) | Triage rule. Plans touching production / customer data / cross-component / over N files → human; rest → lead-only. The universal gates section in CLAUDE.md/AGENTS.md already enforces the production half. |
| Persistent verifier teammate (Option D) vs ad-hoc verifier subagent (Option B) | Context-and-non-workflow-options decision guide: D for long sessions (5+ verifies), B for single moderate PR | Session empirics: qa-multiplier mode-switched B→D organically | The mode switch is the right pattern. Don't pre-classify; spawn as ad-hoc, promote to persistent if recheck is needed across PR cycles. |
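The plan-approval triage rule in the table above could be written down as a single predicate. This is a sketch only: the four triggers come from the "What we believe" column, but the plan object shape, the function name `planApprover`, and the explicit `maxFiles` argument are assumptions, since the sources leave the "N files" cutoff unspecified.

```javascript
// Hypothetical triage rule. The four triggers come from the report; the plan
// shape and the explicit maxFiles argument are assumptions, because the
// report does not fix a value for the "N files" cutoff.
function planApprover(plan, maxFiles) {
  if (!Number.isInteger(maxFiles) || maxFiles < 0) {
    throw new Error('maxFiles cutoff must be a non-negative integer');
  }
  const needsHuman =
    plan.touchesProduction === true ||
    plan.touchesCustomerData === true ||
    plan.components.length > 1 || // cross-component
    plan.files.length > maxFiles; // "over N files"
  return needsHuman ? 'human' : 'lead';
}
```

Requiring the caller to pass `maxFiles` keeps the undecided cutoff visible instead of hiding a guessed default inside the rule.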
5. Iteration-Cycle Inventory — Where We Are Right Now
Mapping session teammates to the 4-pattern taxonomy from context-and-non-workflow-options.md:
| Teammate | Pattern | Did it work? | Token cost | Lesson |
|---|---|---|---|---|
| option-b-impl | A (the workflow primitive was the artifact, never invoked end-to-end) | Refactor landed; self-verification via Node harness MISSED 2 P1 bugs | ~0 spawn tokens (self-verify) | When the workflow is the artifact, you cannot self-verify with a Node harness. You must drive at least one real Workflow invocation as smoke test. |
| option-b-review | B (ad-hoc verifier subagent, one-shot Agent without team_name) | YES — caught both P1s + 2 P2s the implementer missed | ~30-50k | Fresh-context independent verification is empirically the highest-yield cheap insurance for substrate / contract work. |
| qa-multiplier (initial) | B (ad-hoc, no agent def — "hallucinated role") | YES on discovery (P0 chip-multiplier root cause); FAILED on follow-through (refused /raise) | ~20-40k | The discovery side of B is genuine; the contract-boundary side is model-self-imposed. qc-investigator.md now backs both. |
| qa-multiplier (recheck) | D (persistent verifier teammate via SendMessage) | YES — confirmed E/F pass post-deploy | ~5-15k (amortized) | Same teammate identity bridges B→D organically. Mode switch is empirical, not pre-classified. |
| chip-fix | Lead-as-broker fallback (NOT in 4-pattern taxonomy) | Worked but is the failure mode | ~10-20k | This is the pattern PR1 was designed to eliminate. With qc-investigator Path B/C, qa-multiplier would have raised the PR itself. |
| heading-toggle teammate | B (ad-hoc, no verifier dyad — investigation only) | Self-corrected on wrong-theme-ID; no independent verifier caught the mistake | ~30-50k | Investigation-only B-pattern works for human-gated output but is fragile without a paired verifier. |
| namespaced-tags research | B (ad-hoc, research-only, no dyad) | Produced plan; never implemented | ~10-20k | Appropriate for plan-only work. No grading needed because nothing ships. |
Pattern counts in session: A=0 confirmed end-to-end, B=4, C=0, D=1, mode-switch=1, lead-as-broker fallback=1.
Lessons (empirically grounded):
- Pattern A is the most expensive AND the rarest in this session. It's the right tool for "fuzzy spec + customer-facing + ratchet needed" per the decision guide, but most session work didn't fit that profile. When you're modifying the workflow itself, A is structurally impossible without bootstrapping (you need a smoke-test invocation).
- Pattern B dominated by a wide margin. It's the workhorse for both meta-work and live QA. The ~30-50k cost is acceptable when the alternative is "ship a bug that didn't get caught."
- Pattern C (monitor-pr-only) was a layer, never the primary verification. Both Option B PRs got claude[bot] review on top of option-b-review's ad-hoc check. Treat C as belt-and-suspenders.
- Pattern D emerges organically from B, not from pre-classification. Lead messages the same teammate again → it becomes persistent. The "decide upfront which pattern" mental model from the non-workflow-options table is too rigid.
- Lead-as-broker is empirically the bottleneck. Chip-fix happened because qa-multiplier had no contract authority. PR1's qc-investigator + verifier-spawns-fix paths are the right fix.
6. Anthropic-Anchor Cross-Reference
Where we LINE UP with Anthropic recommendations:
- Subagents for "quick focused workers that report back." PR1 verifier-spawns-fix (Agent without team_name, single-turn) maps exactly to Anthropic's subagent guidance: "results return to your main conversation" and "their context isn't seen by you." Used for tactical post-loop fixes, this is textbook.
- Workflows for "more agents than one conversation can coordinate." `feature-pr.js` / `css-section.js` / `search-tuning.js` / `general-feature-pr.js` each drive an implement→verify→iterate loop with up to MAX_ROUNDS turns. The plan is in the script; intermediate results live in script variables. Anthropic: "A workflow moves the plan into code." We've done this.
- Agent teams for "handful of long-running peers." integrate-storefront SKILL.md spawns 3-5 teammates per shop integration (css-implementer + css-verifier + finisher + code-implementer + code-verifier). Anthropic recommends 3-5; we're in the sweet spot. SKILL.md rule 4 ("teammates own their workflow invocations") matches Anthropic's "teammates need to share findings, challenge each other, and coordinate on their own."
- `tools:` frontmatter for least-privilege. PR1's universal frontmatter implements Anthropic's explicit "Limit tool access: grant only necessary permissions for security and focus" guidance.
- "For sign-off between stages, run each stage as its own workflow." PR1's /raise dedup + teammate-owns-/raise pattern is exactly this: `feature-pr` is one stage, `/raise` is the next stage, `/monitor-pr` is the third. We don't try to cram it into one workflow.
Where we DIVERGE intentionally:
- Nested delegation via verifier-spawns-fix. Anthropic explicitly says "Subagents cannot spawn other subagents. If your workflow requires nested delegation, use Skills or chain subagents from the main conversation." Our `Escalating Fixes Outside the Loop` lets a verifier-class persistent teammate spawn a transient code-implementer subagent. This is technically allowed because the verifier is a teammate, not a subagent, but it is one level deeper than Anthropic's recommended structure. Intentional, but worth noting: if Anthropic tightens "no nested" to include teammates, we'd need to reroute through the lead.
- Long-running agent teams with experimental flag. Anthropic explicitly: "Agent teams are experimental and disabled by default." We've built an entire `integrate-storefront` skill around them. We accept the experimental risk because the empirical value is large (multi-shop work, parallel CSS sections).
- Workflow used for moderate-spec verified PRs, not just dozens-to-hundreds of agents. Anthropic's headline use case is bigger than ours; we use workflows for ~3-5 agent rounds per invocation. The decision guide in `context-and-non-workflow-options.md` was written specifically to acknowledge that we use workflow at the bottom of its scale band.
Where we DIVERGE accidentally (misuse to fix):
- `feature-pr.js` Phase 4 spawning an anonymous code-implementer in a possibly-wrong worktree. This violated Anthropic's subagent-lifetime model (subagents can't spawn other subagents; the workflow's last call effectively did just that). PR1 removed it.
- The fix-round prompt dropping `spec` + `appGuide` + `workspace`. Anthropic on subagents: "Give teammates enough context — they don't inherit lead's conversation history." Our fix-round implementer subagent inherits NOTHING of the original scope. This is a flat-out misuse of the subagent context model. `context-and-non-workflow-options.md` Rec-1/2/3 are the fix.
- The Node test harness as substitute for real `Workflow` invocation. Anthropic: "Background runs that persist while the session is responsive… runs count toward plan usage and rate limits." Our test harness is not the runtime. option-b-impl essentially asserted "my code compiles in Node, ship it." Review-pass-1 P1-B caught this.
- Hooks scanning `tmp/integrations/*/` only. Anthropic's recommendation is to use convention-based discovery. Our discovery is hard-coded to one path tier. For general-dev workflows, this is a quiet misuse: `general-feature-pr` writes outputs that hooks don't see.
7. Prioritized Next Steps
P0 — block-on-this-before-merging-anything-else
- Fix the CLAUDE.md vs AGENTS.md gate-location discrepancy. Either move the "Executing Actions with Care" section from AGENTS.md to CLAUDE.md (recommended; matches the test + holistic recommendation), or update `universal-gates.test.js` to read AGENTS.md. Owner: option-b-impl. Effort: 15 min. Why: The unit test currently fails against the actual diff; this is a self-inflicted incoherence.
- Drive at least one refactored workflow through the real `Workflow` tool before PR2 merges. Pick the smallest of feature-pr / css-section / search-tuning / general-feature-pr; invoke as `Workflow({name: 'general-feature-pr', args: {...trivial spec...}})`; confirm `await import()` resolves OR confirm the inlined copies execute. Log the run as evidence in PR description. Owner: option-b-impl (with option-b-review as ad-hoc verifier). Effort: 30-60 min. Why: Review-pass-1 P1-B; the lib is currently dead-at-runtime.
- Persist `spec` + `context` into fix-round AND verifier prompts. Implement `context-and-non-workflow-options.md` Rec-1/2/3. Round-2 implementer + verifier currently judge against CLAUDE.md + severity rubric with no scope contract; out-of-scope changes silently pass. Add `args.context` / `args.outOfScope` typed fields; stamp into every prompt INCLUDING fix-rounds; auto-write `args.spec` to `${workspace}/spec/${feature-slug}.md`. Owner: option-b-impl. Effort: 1-2 hours including regression test (Rec-5). Why: This is the highest-severity bug NOT in either PR; it caused unnamed prior incidents implicitly (scope creep passing review).
- Land PR #3470 (workspace placeholder + isolation mandate) if not already merged. Prerequisite for PR1 + PR2. Owner: human / lead. Effort: review + merge.
- Verify `playbookClause(path, purpose)` is invoked with purpose at every site that used to have one pre-Option-B. Cross-check feature-pr.js / css-section.js / search-tuning.js / general-feature-pr.js verifier + fix prompts against pre-Option-B equivalents. Owner: option-b-impl. Effort: 30 min. Why: Review-pass-1 P2-A; partial fix in iteration-1, verify completeness.
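The gate-location discrepancy in the first P0 item can be reproduced with a small check over file contents. This is a sketch under assumptions: the section title "Executing Actions with Care" comes from this report, but the helper name `filesWithGateSection`, the assumption that the section is a Markdown heading, and the sample contents are hypothetical and do not reproduce `universal-gates.test.js`.

```javascript
// Hypothetical helper. The heading text comes from the report; treating it
// as a Markdown heading line is an assumption about the real files.
const GATE_HEADING = 'Executing Actions with Care';

// docs maps a filename to that file's contents.
function filesWithGateSection(docs) {
  return Object.keys(docs).filter((name) =>
    docs[name]
      .split('\n')
      .some((line) => /^#+\s/.test(line) && line.includes(GATE_HEADING))
  );
}

// Illustrative contents only, mirroring the state the audit describes:
// the section exists in AGENTS.md but not in CLAUDE.md.
const sampleDocs = {
  'AGENTS.md': '# Guide\n\n## Executing Actions with Care\n- gate one',
  'CLAUDE.md': '# Project\n',
};
const located = filesWithGateSection(sampleDocs);
```

With the sample contents, a test that opens only `CLAUDE.md` would not find the section, which is the failure the P0 item describes; either remedy makes the file under test and the file containing the section agree.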
P1 — finish-this-PR-cycle
- Add a smoke-test workflow that exercises the loop primitive on a no-op diff. Persistent regression guard against future workflow-runtime drift. Owner: new teammate (smoke-test) or option-b-impl. Effort: 1-2 hours. Why: Closes Test-Gap from review-pass-1.
- Document the verifier-spawns-fix contract with a test or example invocation. PR1 ships the prose contract; no test exercises Paths A/B/C of qc-investigator or the three verifier escalation sections. Add an example invocation pattern to SKILL.md or a fixture. Owner: new teammate. Effort: 1 hour.
- Resolve the orphan agents (`plan-verifier`, `test-case-generator`, `project-clarity-interviewer`). Three options: delete, integrate into a future general-dev plan-first workflow, or mark as deferred in `docs/dev/orchestration-followups.md`. Holistic Q7 + endstate OutOfScope-Orphans. Owner: lead decision. Effort: 15 min decision + 30 min execution.
- Lead CANNOT list in SKILL.md. Currently every role has CANNOT bullets except lead. Document "lead never does implementation work inline — always spawn a teammate" + "lead never invokes workflows — teammates own workflows." Holistic Q10 / blind spot #9. Owner: lead. Effort: 20 min.
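For the smoke-test item at the top of this P1 list, the loop contract can be sketched with injected stub agents. The names `runImplementVerifyLoop` and `MAX_ROUNDS` appear in the sources; the value 3, the argument shape, and the verdict shape `{ pass, feedback }` are assumptions. A sketch like this only exercises the control flow in plain Node; it does not validate behavior inside the real `Workflow` runtime, which is the gap review-pass-1 P1-B identified.

```javascript
// Hypothetical loop primitive with injected agents. Only the names
// runImplementVerifyLoop and MAX_ROUNDS come from the sources; the value 3
// and the argument and verdict shapes are assumptions.
const MAX_ROUNDS = 3;

async function runImplementVerifyLoop({ spec, implement, verify, maxRounds = MAX_ROUNDS }) {
  let feedback = null;
  for (let round = 1; round <= maxRounds; round += 1) {
    // spec is passed on every round, including fix rounds.
    const result = await implement({ spec, feedback, round });
    const verdict = await verify({ spec, result, round });
    if (verdict.pass) return { pass: true, rounds: round, result };
    feedback = verdict.feedback;
  }
  // Rounds exhausted: surface the last feedback for escalation.
  return { pass: false, rounds: maxRounds, feedback };
}
```

A no-op smoke run would pass an `implement` stub that returns an empty diff and a `verify` stub that fails once and then passes, then confirm that the second `implement` call still received the original `spec`.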
P2 — Option C runway
- TeammateSpawned hook validating absolute `${SHARED_WORKSPACE}` + `isolation: 'worktree'`. The empirically-observed gap from the Anecdote-LeadIsolation incident. ~30 lines of bash. Holistic Rec-C1 / Fix-P2B-Option1. Owner: new teammate (harness-hooks). Effort: 2-3 hours including test.
- Generalize hook scan from `tmp/integrations/*/` to `tmp/{integrations,work}/*/`. Required for general-feature-pr to participate in the artifact-collection conventions hooks enforce. Owner: new teammate. Effort: 1 hour + regression test.
- Idle-watchdog hook auto-pinging lead after N turns of TeammateIdle. Holistic Rec-C5. Owner: new teammate. Effort: 2-3 hours.
- Team observability surface (`/team-status` slash command with JSON output). Holistic Rec-C4. Owner: new teammate. Effort: 4-6 hours.
- Verifier tool tiering (declare loop-scope tools vs escalation-scope tools). Resolves the cross-lens tension in Holistic Q1. Today the `Agent` tool is in the verifier's frontmatter unconditionally; tiering would make the in-loop CANNOT enforceable. Owner: harness work. Effort: TBD (depends on harness team).
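The generalized hook scan in the second P2 item amounts to listing one directory level under two roots instead of one. The following is a JavaScript rendering of that discovery logic for illustration, not the bash hook itself: the two roots come from this report, while the function name `listWorkspaces` and its return shape are assumptions.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// The two roots come from the report; everything else is hypothetical.
const SCAN_ROOTS = ['tmp/integrations', 'tmp/work'];

// Return repo-relative paths of every directory directly under each root,
// i.e. the matches of tmp/{integrations,work}/*/.
function listWorkspaces(repoRoot, roots = SCAN_ROOTS) {
  const found = [];
  for (const root of roots) {
    const dir = path.join(repoRoot, root);
    if (!fs.existsSync(dir)) continue; // a missing tier is not an error
    for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
      if (entry.isDirectory()) found.push(path.join(root, entry.name));
    }
  }
  return found.sort();
}
```

Keeping the roots in one list means a later tier can be added without touching the traversal, which is the convention-based discovery the report asks for.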
P3 — open questions to schedule discussion on
- Reversibility classification (Holistic Q8) — one-tap approve / show diff / two-step confirm UX per tier.
- Knowledge persistence across sessions (Holistic Q9) — `docs/integrations/learnings.md` or auto-proposed AGENTS.md edits.
- Plan approval triage rule (Holistic Q2) — formal cutoff for human-required plans.
- Auto Mode precedence vs hard gates (Holistic Q5) — codify which gates Auto Mode does and does not override (universal gates already say "Auto Mode does NOT relax these").
- Mode-switch as first-class pattern (session lesson) — document the B→D promotion explicitly in `context-and-non-workflow-options.md` table or successor doc; today it lives only as session empirics.
End of collation.