fix(ce-brainstorm): scoping synthesis and Q&A interaction cleanup #829
Replace the three-bucket "Stated / Inferred / Out of scope" audit
that Phase 2.5 emitted with a collaborator-style scoping synthesis:
prose "What we're building" plus three render-conditional sections
(Key trade-offs / What's not in scope / Call outs). Each section
earns its slot through a section-specific keep test; empty sections
are omitted, not padded.
Before, the audit reliably produced 20+ bullets for a Standard
brainstorm, even when granularity rules were followed. The volume
made confirmation feel like rubber-stamping rather than a real
checkpoint, and the audit shape did not match how two product
collaborators actually confirm scope before writing a PRD.
Key decisions:
- Path A vs Path B gate fires on TWO signals (questions asked AND
tier), not just question signal. Path A (1-3 sentence
announce-mode) fires only when tier is Lightweight AND no
questions fired. Standard and Deep tiers always get Path B
regardless of question signal: substance earns the checkpoint,
not interaction history. Without the tier guard, a richly
pre-loaded brainstorm context got a 1-sentence checkpoint for
20+ items worth of scope.
- Each bullet passes both an affirmability test (can the user
evaluate without reading code?) and a detail test (1-2 lines
max, conversational not documentary). The detail test catches a
failure mode the count cap alone misses: fewer bullets that each
bloat to paragraph length.
- Pre-flight re-review: re-read the draft as a user would before
emitting. One mental act, not a checklist. Catches "reads like
a doc preview" drift the keep tests miss.
- Confirmation phrasing tells the user what happens next ("Confirm
and I'll write the requirements doc next, drawing on our dialogue
and this synthesis") so the gate does not ambiguously ask
"proceed to what?"
- Section heading is "What's not in scope," not "What's not in V1,"
to avoid presuming versioned-software brainstorms.
doc updated: Section 5 reframed from three-bucket synthesis to
scoping synthesis with conditional sections; Quick Example and FAQ
revised to match.
Replace the soft-cut blocking question option text "Proceed with the current revised synthesis" / "Stop and redirect: discuss further before [research / plan-write]" with plain-English labels "Proceed and continue to [research / plan-write]" / "Hold off: keep discussing before continuing." The old labels used "redirect" jargon, which collides with the unrelated self-redirect mechanism. Plain English says what each option does without needing internal-spec vocabulary. Parity fix discovered while updating ce-brainstorm's soft-cut menu; both skills are now consistent. doc not updated: change is internal blocking-question wording, does not surface at doc level.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50c83732ac
Three changes that together reduce call-out leakage:
1. Phase 1.3 integration check (SKILL.md): before exiting dialogue,
the agent mentally combines user answers and probes any non-obvious
consequences. Combination effects ("X applies per-channel AND no
warning on delete: silent loss") should be resolved in dialogue
while the user is in flow, not stacked up as call-outs at the
exit gate.
2. Stage 2 call-outs framing tightening (synthesis-summary.md): the
section description now explicitly bounds call-outs to residual
forks (post-dialogue consequences, silent agent inferences, or
pre-loaded contexts with no dialogue). Explicitly NOT "questions
the agent could have asked but didn't"; if a candidate call-out
reads like a missed Phase 1.3 question, flag the gap rather than
pad the section.
3. Headless mode summary preservation (synthesis-summary.md):
addresses Codex review feedback on PR #829. The prior "skip stage
2 entirely" wording left no defined step producing the "What we're
building" prose, but requirements-capture.md requires a
`## Summary` for all but trivial Lightweight docs. Clarified that
in headless mode the prose IS still composed (it routes to
`## Summary`); only the chat-time rendering (conditional sections,
confirmation, Path A/B gate) is skipped.
The Phase 1.3 and Stage 2 changes are a paired upstream/downstream
fix for the same failure mode: defense in depth against call-out
leakage when the question could have been resolved in dialogue.
doc not updated: changes are internal phase-2.5 mechanics and
phase-1.3 exit-gate guidance, do not surface at doc level.
Three changes that make the Q&A interaction guidance more human and
actionable:
1. New Rule 6: open-ended question discipline. Apply Rule 5 silently
(do not narrate the form choice to the user); the question itself
must give the user something concrete to anchor on. "What's your
take?" is too thin and wastes the open question; rigor-probe-style
specificity ("What's the most concrete thing someone's already
done about this: paid for it, built a workaround, quit a tool
over it?") earns the open-endedness. Anti-patterns include
narrating the form choice, "in one sentence" framings, yes/no
traps, and AI-slop warmth wrappers.
2. Plain-English Q&A vocabulary. Replaced "leak your priors" and
"bias the answer by signaling which dimensions the agent considers
relevant" (clinical Bayesian jargon, confusing to a reader) with
"unintentionally influence the user's answer": accurate to the
dynamic (it is a side effect, not deliberate manipulation),
conversational, and voice-consistent (no mid-sentence switching
between "you" and "the user").
3. "Prose" -> "open-ended" for interaction-form references. The
property is open-endedness, not the textual form. Updated
SKILL.md Rule 5, Phase 1.3 rigor probes, the integration check,
the parallel rule in universal-brainstorming.md, and two
parenthetical justifications in synthesis-summary.md. Kept
"prose" where it means "flowing text" (vs bullets / diagrams).
doc not updated: changes are interaction-rule wording, do not
surface at doc level.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: da17124876
Headless mode in ce-brainstorm was over-engineered for a use case that does not exist. Brainstorming is interactive by definition: the dialogue IS the value (it surfaces context the user does not yet know they have, pressure-tests their premise, uncovers consequences they could not see in isolation). Stripping the dialogue leaves an agent making up requirements from context and flagging them with `## Assumptions`: not a brainstorm, just hallucinating requirements with a fig leaf.
No legitimate use case to defend:
- LFG pipeline: if the feature description is clear enough for autonomous execution, LFG goes straight to /ce-plan. If it is vague enough to need brainstorming, it needs a human; LFG should refuse and prompt the user, not silently fake a brainstorm.
- Skill-to-skill invocation: if the calling skill has enough context to write the doc, it should write the doc directly, not invoke /ce-brainstorm on the human's behalf.
What got removed:
- SKILL.md Phase 2.5 "Headless mode" stub line
- synthesis-summary.md entire "## Headless mode" section
- synthesis-summary.md interactive / non-interactive branches in the preamble and doc-shape routing table
- requirements-capture.md `## Assumptions` template section and the section-matrix row
- docs/skills/ce-brainstorm.md Section 5 and FAQ headless mentions
What got added: one explanatory sentence in synthesis-summary.md documenting the decision so a future contributor does not reintroduce headless mode without thinking it through.
Side effect: resolves the Codex review feedback on PR #829 (about preserving `## Summary` in headless mode) by removing the inconsistency at its source, deleting the over-elaborated thing rather than elaborating it further.
doc updated: Section 5 and FAQ entry trimmed to drop headless references.
Four places in the spec describe when Phase 3 fires under Path A, and the last three contradicted the first:
1. SKILL.md Phase 0.2 (line 86): "go straight to ... announce-mode, then to Phase 3". This is the right behavior; no change.
2. SKILL.md Phase 2.5 Path A bullet: said "end turn ... otherwise Phase 3 fires", contradicting (1).
3. synthesis-summary.md Path A template: the user-facing prompt said "writing the requirements doc" (immediate action), but the template instructions said "End turn. On the next user message ...", contradicting both (1) and itself.
4. synthesis-summary.md Path A vs Path B description: the same "end turn ... otherwise Phase 3 fires" wording as (2).
The end-turn behavior created a UX dead end: the user reads "writing the requirements doc" and waits, but the agent has already ended the turn to wait for an acknowledgment. The doc never lands in the common case where the user takes the prompt at face value.
Aligned all four locations: Path A now proceeds to the Phase 3 doc write in the same turn. The "Interrupt if wrong" affordance still works, because the user can revise after the doc lands. Lightweight Path A docs are short, so post-hoc revision is cheap. The historical rationale for end-turn (preserving an interruption window) was overkill for Lightweight specifically; the tier-guarded, Lightweight-only Path A doesn't need it.
Addresses Codex review feedback on PR #829.
doc not updated: change is internal Phase 2.5 mechanics, doesn't surface at doc level.
Bring in PR #829 (ce-brainstorm scoping synthesis and Q&A interaction cleanup), which removed the brainstorm headless mode and reshaped Phase 2.5.
Conflict resolution:
- requirements-capture.md: main dropped the Assumptions row (headless mode is gone), and my branch added Key Decisions at section 3. Kept the Key Decisions placement and dropped Assumptions from both the section matrix and the template; nothing emits an `## Assumptions` block anymore.
- ce-brainstorm-section-order.test.ts: removed Assumptions references from the order assertions for the same reason.
All other touched files (ce-brainstorm SKILL.md, ce-plan SKILL.md, visual-communication.md, plan-template.md, etc.) auto-merged cleanly: PR #829's Phase 2.5 scoping synthesis and my Phase 0.0/0.1 output-mode resolution sit at different phases and don't overlap.
bun test: 1433 pass; bun run release:validate: in sync.
Apply the shape and discipline changes from ce-brainstorm's scoping-synthesis fix (#829) to ce-plan's Phase 0.7 / 5.1.5:
- Tier guard on auto-proceed: Lightweight + zero call-outs is the only path that skips the confirmation gate. Standard and Deep plans always fire the confirmation gate even with zero call-outs, because substance earns the checkpoint. A 1-3 line summary on a Deep plan is exactly the rubber-stamping case the gate is supposed to prevent.
- Confirmation phrasing names what happens on confirm ("Confirm and I'll proceed to research, drawing on this scope" / "Confirm and I'll write the plan next..."), replacing the ambiguous "Confirm to proceed."
- Detail test for each surviving call-out and summary bullet: 1-2 lines max, conversational not documentary. The count cap was gameable without it: three call-outs could each be a 6-line paragraph and still "fit."
- Re-cut rule extended to fire on detail overflow, not just count overflow.
- Summary form is flexible: prose, bullets, or a mix, whichever communicates best. Tier-aware budgets: Lightweight 1-3 lines; Standard 3-5 lines or 2-4 bullets; Deep 4-6 lines or 3-6 bullets.
- Rename "Scope Summary" / "Synthesis Summary" to "Scoping Synthesis" for parity with ce-brainstorm's terminology.
- Soft-cut option wording updated per the parity note in #819 (the "redirect" verb collided with the unrelated self-redirect mechanism).
Skill doc updated: the Quick Example referenced "short prose summary" and "the gate skips when there are no forks worth flagging," both of which would mislead a reader under the new behavior.
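The auto-proceed tier guard and the summary budgets above reduce to a small predicate. A minimal sketch, assuming hypothetical names (`skipsConfirmationGate`, `fitsSummaryBudget`, `SUMMARY_BUDGETS`) that do not appear in ce-plan's skill files, which are prose instructions for an agent rather than code:

```typescript
// Hypothetical sketch: names and types are illustrative, not from ce-plan's SKILL.md.
type Tier = "Lightweight" | "Standard" | "Deep";
type Range = readonly [min: number, max: number];

// Tier-aware summary budgets as listed in the commit message.
const SUMMARY_BUDGETS: Record<Tier, { lines: Range; bullets?: Range }> = {
  Lightweight: { lines: [1, 3] },
  Standard: { lines: [3, 5], bullets: [2, 4] },
  Deep: { lines: [4, 6], bullets: [3, 6] },
};

function inRange(n: number, [min, max]: Range): boolean {
  return n >= min && n <= max;
}

// Only Lightweight plans with zero call-outs auto-proceed past the gate.
// Standard and Deep always confirm, even with zero call-outs.
function skipsConfirmationGate(tier: Tier, callOutCount: number): boolean {
  return tier === "Lightweight" && callOutCount === 0;
}

// Prose is budgeted by line count, bullets by bullet count. Lightweight has no
// separate bullet budget, so this sketch ASSUMES its bullets count against the
// line budget. Mixed prose-and-bullet summaries are not modeled here.
function fitsSummaryBudget(tier: Tier, form: "prose" | "bullets", count: number): boolean {
  const budget = SUMMARY_BUDGETS[tier];
  const range = form === "bullets" ? (budget.bullets ?? budget.lines) : budget.lines;
  return inRange(count, range);
}
```

Under this sketch, a 2-line prose summary on a Deep plan fails the budget (below the 4-line floor) and the plan still hits the confirmation gate, which is the rubber-stamping case the tier guard targets.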
Summary
`ce-brainstorm` now confirms scope with a collaborator-style scoping synthesis before writing the requirements doc, instead of the 20+ bullet audit Phase 2.5 used to emit. The PR also tightens upstream Phase 1.3 dialogue (integration check before exit), replaces clinical jargon throughout the Q&A interaction rules with plain English, and deletes the over-engineered headless mode that produced one-sentence checkpoints on Deep-tier pre-loads and internal `## Summary` inconsistencies.
Two failure modes prompted the original change. The Stated / Inferred / Out of scope audit shape reliably produced 20+ bullets for Standard brainstorms, even when granularity rules were followed. And the Phase 0.2 fast-path routing to "announce-mode" produced a 1-sentence checkpoint for richly pre-loaded brainstorm contexts with 20+ items of scope. Both made the confirmation gate feel like rubber-stamping rather than a real checkpoint.
What changed
Phase 2.5 now has a two-stage shape:
Each bullet passes both an affirmability test (can the user evaluate this without reading code?) and a detail test (1-2 lines max, conversational not documentary). Tier-aware bullet budgets cap total bullets across sections 2-4 (Lightweight 0-1, Standard 2-4, Deep-feature 3-5, Deep-product 4-7) with re-cut on overflow rather than cap-raising.
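The count cap and the per-bullet detail test are two independent checks, and either can force a re-cut. A minimal sketch of the mechanical half, assuming hypothetical names (`recutReasons`, `BULLET_BUDGETS`, `Section`); the affirmability test is a judgment call and is not modeled, and counting explicit line breaks is only a rough stand-in for rendered bullet length:

```typescript
// Hypothetical sketch of the Phase 2.5 bullet checks; names are illustrative.
type BrainstormTier = "Lightweight" | "Standard" | "Deep-feature" | "Deep-product";

// Total bullets allowed across sections 2-4 (trade-offs, not in scope, call outs).
// Only the upper bound is enforced: empty sections are omitted, never padded.
const BULLET_BUDGETS: Record<BrainstormTier, readonly [min: number, max: number]> = {
  Lightweight: [0, 1],
  Standard: [2, 4],
  "Deep-feature": [3, 5],
  "Deep-product": [4, 7],
};

const MAX_LINES_PER_BULLET = 2; // detail test: 1-2 lines, conversational

interface Section {
  heading: string;
  bullets: string[]; // a bullet may span several lines
}

// Returns the reasons a draft needs a re-cut; an empty array means it fits.
// Overflow is resolved by cutting or merging bullets, never by raising the cap.
function recutReasons(tier: BrainstormTier, sections: Section[]): string[] {
  const reasons: string[] = [];
  const [, max] = BULLET_BUDGETS[tier];
  const total = sections.reduce((sum, s) => sum + s.bullets.length, 0);
  if (total > max) reasons.push(`count: ${total} bullets exceeds ${max}`);
  for (const s of sections) {
    for (const b of s.bullets) {
      const lines = b.split("\n").length;
      if (lines > MAX_LINES_PER_BULLET) {
        reasons.push(`detail: a "${s.heading}" bullet runs ${lines} lines`);
      }
    }
  }
  return reasons;
}
```

Three six-line call-outs on a Standard brainstorm pass the count check but fail the detail check: the "compress horizontally, not vertically" case the detail test exists to catch.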
Phase 1.3 now has an integration check before exit: combine user answers and probe any non-obvious consequences before reaching Phase 2.5, so the dialogue resolves them in flow rather than stacking them as call-outs.
The Q&A interaction rules use plain English throughout: "open-ended" instead of "prose," "unintentionally influence the user's answer" instead of "leak your priors / bias by signaling which dimensions matter."
Headless mode is gone. Brainstorming is interactive by design: the dialogue IS the value. Without it, the skill is just an agent making up requirements with `## Assumptions` markers; that's hallucination with a fig leaf, not brainstorming. Removing the mode resolves an internal `## Summary` inconsistency at its source and simplifies the spec.
Design decisions
Path A vs Path B gate fires on two signals, not one. Path A (1-3 sentence announce-mode, then straight to the Phase 3 doc write) fires only when tier is Lightweight AND no questions fired. Standard and Deep tiers always get Path B regardless of question signal: substance earns the checkpoint, not interaction history. The earlier single-signal gate let richly pre-loaded brainstorm contexts hit Path A and produce 1-sentence checkpoints for 20+ items of scope.
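The gate is a conjunction of the two signals. A minimal sketch, assuming hypothetical names (`chooseSynthesisPath`, and `oldChooseSynthesisPath` as a reconstruction of the earlier question-only gate for contrast); neither function exists in the skill files:

```typescript
// Hypothetical sketch of the two-signal gate; names are illustrative.
type Tier = "Lightweight" | "Standard" | "Deep";
type SynthesisPath = "A" | "B"; // A: announce-mode; B: full scoping synthesis + confirmation

// Path A requires BOTH signals: Lightweight tier and zero questions fired.
function chooseSynthesisPath(tier: Tier, questionsFired: number): SynthesisPath {
  if (tier === "Lightweight" && questionsFired === 0) return "A";
  // Standard and Deep always get Path B, regardless of question signal.
  return "B";
}

// Assumed shape of the earlier single-signal gate: a richly pre-loaded Deep
// brainstorm that needed no questions slipped through to Path A.
function oldChooseSynthesisPath(questionsFired: number): SynthesisPath {
  return questionsFired === 0 ? "A" : "B";
}
```

The two functions disagree exactly on the pre-loaded case (`Deep`, zero questions), which is the failure the tier guard closes.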
Affirmability and detail tests both apply per bullet. The count cap addresses bullet volume; the detail test addresses bullet bloat. Without the detail test, an agent can hit the count cap by compressing horizontally (fewer bullets) without compressing vertically (less per bullet), and the cap becomes meaningless.
Pre-flight re-review is a single mental act, not a checklist. Re-read the draft as the user would read it before emitting. Catches "reads like a doc preview" drift the keep tests miss. Heavy multi-step checklists become performative; one act tied to the user's reading frame is the right forcing function.
Confirmation phrasing sets expectation. "Confirm and I'll write the requirements doc next, drawing on our dialogue and this synthesis" tells the user what happens on confirm. The previous "Confirm to proceed" was ambiguous about what "proceed" meant.
Section heading is "What's not in scope," not "What's not in V1." The old heading presumed versioned-software brainstorms. The skill supports non-software topics (naming briefs, decisions) and unversioned work. Agent-facing guidance and worked-example bullets were also generalized so the agent does not default to versioning language.
Phase 1.3 integration check is the upstream side of a defense-in-depth pair. Before exiting dialogue, the agent mentally combines user answers and probes any non-obvious consequences. Combination effects ("if X applies per-channel AND no warning on delete, then rule-delete silently loses pause state") get resolved in dialogue while the user is in flow, instead of stacking up as call-outs at the exit gate. This pairs with the downstream call-outs framing tightening (call-outs are bounded to genuine residuals: post-dialogue consequences, silent agent inferences — explicitly NOT "questions the agent should have asked but didn't").
Headless mode removed. Brainstorming requires dialogue with a synchronous user: the dialogue IS the value. No legitimate use case to defend: LFG should refuse and prompt the user when a feature description is too vague for autonomous execution, not fake a brainstorm; skill-to-skill invocation should write the doc from context directly if it has the context. The previous headless mode produced wrong-shaped output (1-sentence checkpoints on Deep-tier pre-loads) and internal spec inconsistencies (Codex correctly flagged that "skip stage 2 entirely" left no defined step producing the `## Summary`). Deleting the mode resolves the inconsistency at its source.
New Interaction Rule 6: open-ended questions earn their place when they're specific enough to elicit a substantive answer. Apply Rule 5 silently (don't narrate the form choice); the question itself must give the user something concrete to anchor on. "What's your take?" is too thin; rigor-probe-style specificity earns it ("What's the most concrete thing someone's already done about this: paid for it, built a workaround, quit a tool over it?"). Anti-patterns include narrating the choice, "in one sentence" framings, yes/no traps, and AI-slop warmth wrappers.
Plain-English Q&A vocabulary throughout. Replaced "leak your priors," "bias the answer by signaling which dimensions the agent considers relevant," and "use prose" with plain phrasings: "unintentionally influence the user's answer," "ask it open-ended." The technical term "priors" (Bayesian) was confusing in a user-facing context, and "prose" conflated form with the property we actually care about (open-endedness). "Prose" preserved where it means "flowing text" (vs bullets / diagrams).
Soft-cut blocking question options use plain English. Old: "Proceed with the current revised synthesis" / "Stop and redirect: discuss further before writing the doc." New: "Proceed and write the requirements doc" / "Hold off: keep discussing before the doc." The "redirect" verb collided with the unrelated self-redirect mechanism. The same fix propagates to `ce-plan`'s soft-cut menu in a parity commit.
Test plan
`bun test` (1340 pass) and `bun run release:validate` pass.
Behavioral changes to skill prose are not exercised by automated tests. Verify by running `/ce-brainstorm` on: