fix(ce-brainstorm): scoping synthesis and Q&A interaction cleanup #829

Merged: tmchow merged 6 commits into main from tmchow/ce-brainstorm-scope-confirm on May 14, 2026

@tmchow tmchow commented May 14, 2026

Summary

ce-brainstorm now confirms scope with a collaborator-style scoping synthesis before writing the requirements doc, instead of the 20+ bullet audit Phase 2.5 used to emit. The PR also tightens upstream Phase 1.3 dialogue (integration check before exit), replaces clinical jargon throughout the Q&A interaction rules with plain English, and deletes the over-engineered headless mode that produced one-sentence checkpoints on Deep-tier pre-loads and internal ## Summary inconsistencies.

Two failure modes prompted the original change. The Stated / Inferred / Out of scope audit shape reliably produced 20+ bullets for Standard brainstorms, even when granularity rules were followed. And the Phase 0.2 fast-path routing to "announce-mode" produced a 1-sentence checkpoint for richly pre-loaded brainstorm contexts with 20+ items of scope. Both made the confirmation gate feel like rubber-stamping rather than a real checkpoint.

What changed

Phase 2.5 now has a two-stage shape:

  1. Internal three-bucket draft (Stated / Inferred / Out of scope): the agent's comprehensive thinking surface. Dissolves into requirements doc body sections when Phase 3 writes (Requirements, Key Decisions, Scope Boundaries).
  2. Chat-time scoping synthesis: shaped like what two product collaborators would confirm before writing a PRD. Prose "What we're building" plus three render-conditional sections (Key trade-offs / What's not in scope / Call outs). Each section earns its slot through a section-specific keep test; empty sections are omitted, not padded.
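A minimal TypeScript sketch of the render-conditional shape described above. The interface and function names are hypothetical (they do not come from the skill files), and the section-specific keep tests are assumed to have already filtered each bullet list before rendering:

```typescript
// Hypothetical data shape for illustration; only the section headings come from the PR description.
interface ScopingSynthesis {
  whatWereBuilding: string; // prose, always rendered
  keyTradeOffs: string[];
  notInScope: string[];
  callOuts: string[];
}

function renderSynthesis(s: ScopingSynthesis): string {
  const parts: string[] = [`What we're building\n${s.whatWereBuilding}`];

  const conditional: Array<[string, string[]]> = [
    ["Key trade-offs", s.keyTradeOffs],
    ["What's not in scope", s.notInScope],
    ["Call outs", s.callOuts],
  ];

  for (const [heading, bullets] of conditional) {
    // Empty sections are omitted, not padded with filler bullets.
    if (bullets.length === 0) continue;
    parts.push(`${heading}\n${bullets.map((b) => `- ${b}`).join("\n")}`);
  }

  return parts.join("\n\n");
}
```

With an empty `callOuts` array, the rendered text contains no "Call outs" heading at all.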

Each bullet passes both an affirmability test (can the user evaluate this without reading code?) and a detail test (1-2 lines max, conversational not documentary). Tier-aware bullet budgets cap total bullets across sections 2-4 (Lightweight 0-1, Standard 2-4, Deep-feature 3-5, Deep-product 4-7) with re-cut on overflow rather than cap-raising.
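The tier budgets can be read as a lookup table plus an overflow check. The sketch below is illustrative only: `BULLET_BUDGET` and `needsRecut` are hypothetical names, and it assumes that only the upper bound of each range acts as a hard cap (the PR description does not say the lower bound is enforced).

```typescript
// Tier labels and ranges are taken from the PR description; everything else is assumed.
type Tier = "Lightweight" | "Standard" | "Deep-feature" | "Deep-product";

const BULLET_BUDGET: Record<Tier, { min: number; max: number }> = {
  "Lightweight": { min: 0, max: 1 },
  "Standard": { min: 2, max: 4 },
  "Deep-feature": { min: 3, max: 5 },
  "Deep-product": { min: 4, max: 7 },
};

// The budget applies to the total across the three conditional sections (sections 2-4),
// not to each section separately.
function needsRecut(tier: Tier, bulletsPerSection: number[]): boolean {
  const total = bulletsPerSection.reduce((sum, n) => sum + n, 0);
  // On overflow the draft is re-cut; the cap itself is never raised.
  return total > BULLET_BUDGET[tier].max;
}
```

For example, a Standard synthesis with 2 + 2 + 1 bullets is over budget and gets re-cut, while 2 + 1 + 1 fits.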

Phase 1.3 now has an integration check before exit: combine user answers and probe any non-obvious consequences before reaching Phase 2.5, so the dialogue resolves them in flow rather than stacking them as call-outs.

The Q&A interaction rules use plain English throughout: "open-ended" instead of "prose," "unintentionally influence the user's answer" instead of "leak your priors / bias by signaling which dimensions matter."

Headless mode is gone. Brainstorming is interactive by design — the dialogue IS the value. Without it, the skill is just an agent making up requirements with ## Assumptions markers; that's hallucination with a fig leaf, not brainstorming. Removing the mode resolves an internal ## Summary inconsistency at its source and simplifies the spec.

Design decisions

Path A vs Path B gate fires on two signals, not one. Path A (1-3 sentence announce-mode, end turn) fires only when tier is Lightweight AND no questions fired. Standard and Deep tiers always get Path B regardless of question signal: substance earns the checkpoint, not interaction history. The earlier single-signal gate let richly pre-loaded brainstorm contexts hit Path A and produce 1-sentence checkpoints for 20+ items of scope.
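As a sketch, the two-signal gate reduces to a single conjunction. `choosePath` and its parameters are hypothetical names for illustration; the skill itself expresses this rule in prose, not code.

```typescript
type Tier = "Lightweight" | "Standard" | "Deep-feature" | "Deep-product";
type Path = "A" | "B";

// Path A requires BOTH signals: Lightweight tier AND no questions fired.
// Any other combination gets the full Path B synthesis.
function choosePath(tier: Tier, questionsAsked: number): Path {
  return tier === "Lightweight" && questionsAsked === 0 ? "A" : "B";
}
```

The bug case from the earlier single-signal gate is `choosePath("Deep-feature", 0)`: no questions fired, but the tier guard still routes it to Path B.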

Affirmability and detail tests both apply per bullet. The count cap addresses bullet volume; the detail test addresses bullet bloat. Without the detail test, an agent can hit the count cap by compressing horizontally (fewer bullets) without compressing vertically (less per bullet), and the cap becomes meaningless.
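A rough sketch of why the count cap alone is not enough. The PR does not define how a "line" is measured, so the 80-character width below is an assumption, as are all function names. The affirmability test and the "conversational not documentary" half of the detail test are judgment calls and are not modeled here.

```typescript
// Assumption: one rendered line is about 80 characters. The PR states only "1-2 lines max".
const ASSUMED_LINE_WIDTH = 80;
const MAX_LINES_PER_BULLET = 2;

function estimatedLines(bullet: string): number {
  return Math.ceil(bullet.trim().length / ASSUMED_LINE_WIDTH);
}

// Count cap: limits how many bullets there are.
function passesCountCap(bullets: string[], cap: number): boolean {
  return bullets.length <= cap;
}

// Detail test (length half only): limits how long each bullet is.
function passesDetailTest(bullets: string[]): boolean {
  return bullets.every((b) => estimatedLines(b) <= MAX_LINES_PER_BULLET);
}

// Three paragraph-length bullets: within a cap of 4, yet each is about 5 lines long.
const bloated: string[] = Array.from({ length: 3 }, () => "x".repeat(400));
```

Here `passesCountCap(bloated, 4)` holds while `passesDetailTest(bloated)` does not, which is the gap the detail test closes.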

Pre-flight re-review is a single mental act, not a checklist. Re-read the draft as the user would read it before emitting. Catches "reads like a doc preview" drift the keep tests miss. Heavy multi-step checklists become performative; one act tied to the user's reading frame is the right forcing function.

Confirmation phrasing sets expectation. "Confirm and I'll write the requirements doc next, drawing on our dialogue and this synthesis" tells the user what happens on confirm. The previous "Confirm to proceed" was ambiguous about what "proceed" meant.

Section heading is "What's not in scope," not "What's not in V1." The old heading presumed versioned-software brainstorms. The skill supports non-software topics (naming briefs, decisions) and unversioned work. Agent-facing guidance and worked-example bullets were also generalized so the agent does not default to versioning language.

Phase 1.3 integration check is the upstream side of a defense-in-depth pair. Before exiting dialogue, the agent mentally combines user answers and probes any non-obvious consequences. Combination effects ("if X applies per-channel AND no warning on delete, then rule-delete silently loses pause state") get resolved in dialogue while the user is in flow, instead of stacking up as call-outs at the exit gate. This pairs with the downstream call-outs framing tightening (call-outs are bounded to genuine residuals: post-dialogue consequences, silent agent inferences — explicitly NOT "questions the agent should have asked but didn't").

Headless mode removed. Brainstorming requires dialogue with a synchronous user — the dialogue IS the value. No legitimate use case to defend: LFG should refuse and prompt the user when a feature description is too vague for autonomous execution, not fake a brainstorm; skill-to-skill invocation should write the doc from context directly if it has the context. The previous headless mode produced wrong-shaped output (1-sentence checkpoints on Deep-tier pre-loads) and internal spec inconsistencies (Codex correctly flagged that "skip stage 2 entirely" left no defined step producing the ## Summary). Deleting the mode resolves the inconsistency at its source.

New Interaction Rule 6: open-ended questions earn their place when they're specific enough to elicit a substantive answer. Apply Rule 5 silently (don't narrate the form choice); the question itself must give the user something concrete to anchor on. "What's your take?" is too thin; rigor-probe-style specificity earns it ("What's the most concrete thing someone's already done about this — paid for it, built a workaround, quit a tool over it?"). Anti-patterns include narrating the choice, "in one sentence" framings, yes/no traps, and AI-slop warmth wrappers.
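Rule 6 is a judgment rule for the agent, not something the skill checks mechanically. Purely as an illustration, a few of the listed anti-patterns can be approximated with crude heuristics. Every regex, the six-word threshold, and the function name below are assumptions, and "AI-slop warmth wrappers" are not modeled.

```typescript
// Heuristic sketch only: these patterns are illustrative guesses, not part of the skill.
function openEndedQuestionSmells(question: string): string[] {
  const text = question.trim();
  const smells: string[] = [];

  // "What's your take?" style: too little for the user to anchor on.
  if (text.split(/\s+/).length < 6) smells.push("too thin");

  // "in one sentence" framings shrink an open question.
  if (/\bin (one|a single) sentence\b/i.test(text)) smells.push("one-sentence framing");

  // Questions opening with an auxiliary verb tend to invite a yes/no answer.
  if (/^(is|are|do|does|did|can|could|should|would|will|have|has)\b/i.test(text)) {
    smells.push("yes/no trap");
  }

  // Narrating the form choice instead of just asking the question.
  if (/\b(open-ended|in prose|leak my priors)\b/i.test(text)) {
    smells.push("narrates the form choice");
  }

  return smells;
}

const rigorProbe =
  "What's the most concrete thing someone's already done about this: " +
  "paid for it, built a workaround, quit a tool over it?";
```

Under these heuristics "What's your take?" is flagged as too thin, while the rigor-probe example from the rule comes back with no smells.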

Plain-English Q&A vocabulary throughout. Replaced "leak your priors," "bias the answer by signaling which dimensions the agent considers relevant," and "use prose" with plain phrasings: "unintentionally influence the user's answer," "ask it open-ended." The technical term "priors" (Bayesian) was confusing in a user-facing context, and "prose" conflated form with the property we actually care about (open-endedness). "Prose" preserved where it means "flowing text" (vs bullets / diagrams).

Soft-cut blocking question options use plain English. Old: "Proceed with the current revised synthesis" / "Stop and redirect — discuss further before writing the doc." New: "Proceed and write the requirements doc" / "Hold off — keep discussing before the doc." The "redirect" verb collided with the unrelated self-redirect mechanism. The same fix propagates to ce-plan's soft-cut menu in a parity commit.

Test plan

bun test (1340 pass) and bun run release:validate pass.

Behavioral changes to skill prose are not exercised by automated tests. Verify by running /ce-brainstorm on:

  • A tight Lightweight prompt with no dialogue (expect Path A: 1-3 sentence announce, end turn)
  • A Standard prompt with multi-turn Q&A (expect Path B: full scoping synthesis with confirmation gate, even if zero call-outs survive)
  • A richly pre-loaded Deep-feature context where no questions fire (expect Path B: tier guard routes correctly, not Path A; this was the bug case)
  • A prompt that previously produced 20+ audit bullets (expect compression to scoping synthesis with substance proportional to dialogue, capped at tier ceiling)
  • Any open-ended question the agent asks: should be specific enough to anchor a real answer, with no "in prose, because options would leak my priors" narration

Compound Engineering
Claude Code

tmchow added 2 commits May 13, 2026 17:40

Replace the three-bucket "Stated / Inferred / Out of scope" audit
that Phase 2.5 emitted with a collaborator-style scoping synthesis:
prose "What we're building" plus three render-conditional sections
(Key trade-offs / What's not in scope / Call outs). Each section
earns its slot through a section-specific keep test; empty sections
are omitted, not padded.

Before, the audit reliably produced 20+ bullets for a Standard
brainstorm, even when granularity rules were followed. The volume
made confirmation feel like rubber-stamping rather than a real
checkpoint, and the audit shape did not match how two product
collaborators actually confirm scope before writing a PRD.

Key decisions:

- Path A vs Path B gate fires on TWO signals (questions asked AND
  tier), not just question signal. Path A (1-3 sentence
  announce-mode) fires only when tier is Lightweight AND no
  questions fired. Standard and Deep tiers always get Path B
  regardless of question signal: substance earns the checkpoint,
  not interaction history. Without the tier guard, a richly
  pre-loaded brainstorm context got a 1-sentence checkpoint for
  20+ items worth of scope.

- Each bullet passes both an affirmability test (can the user
  evaluate without reading code?) and a detail test (1-2 lines
  max, conversational not documentary). The detail test catches a
  failure mode the count cap alone misses: fewer bullets that each
  bloat to paragraph length.

- Pre-flight re-review: re-read the draft as a user would before
  emitting. One mental act, not a checklist. Catches "reads like
  a doc preview" drift the keep tests miss.

- Confirmation phrasing tells the user what happens next ("Confirm
  and I'll write the requirements doc next, drawing on our dialogue
  and this synthesis") so the gate does not ambiguously ask
  "proceed to what?"

- Section heading is "What's not in scope," not "What's not in V1,"
  to avoid presuming versioned-software brainstorms.

doc updated: Section 5 reframed from three-bucket synthesis to
scoping synthesis with conditional sections; Quick Example and FAQ
revised to match.

Replace the soft-cut blocking question option text "Proceed with
the current revised synthesis" / "Stop and redirect: discuss further
before [research / plan-write]" with plain-English labels "Proceed
and continue to [research / plan-write]" / "Hold off: keep
discussing before continuing."

The old labels used "redirect" jargon, which collides with the
unrelated self-redirect mechanism. Plain English says what each
option does without needing internal-spec vocabulary.

Parity fix discovered while updating ce-brainstorm's soft-cut
menu; both skills are now consistent.

doc not updated: change is internal blocking-question wording,
does not surface at doc level.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50c83732ac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md Outdated
tmchow added 2 commits May 13, 2026 19:34

Three changes that together reduce call-out leakage:

1. Phase 1.3 integration check (SKILL.md): before exiting dialogue,
   the agent mentally combines user answers and probes any non-obvious
   consequences. Combination effects ("X applies per-channel AND no
   warning on delete: silent loss") should be resolved in dialogue
   while the user is in flow, not stacked up as call-outs at the
   exit gate.

2. Stage 2 call-outs framing tightening (synthesis-summary.md): the
   section description now explicitly bounds call-outs to residual
   forks (post-dialogue consequences, silent agent inferences, or
   pre-loaded contexts with no dialogue). Explicitly NOT "questions
   the agent could have asked but didn't"; if a candidate call-out
   reads like a missed Phase 1.3 question, flag the gap rather than
   pad the section.

3. Headless mode summary preservation (synthesis-summary.md):
   addresses Codex review feedback on PR #829. The prior "skip stage
   2 entirely" wording left no defined step producing the "What we're
   building" prose, but requirements-capture.md requires a
   `## Summary` for all but trivial Lightweight docs. Clarified that
   in headless mode the prose IS still composed (it routes to
   `## Summary`); only the chat-time rendering (conditional sections,
   confirmation, Path A/B gate) is skipped.

The Phase 1.3 and Stage 2 changes are a paired upstream/downstream
fix for the same failure mode: defense in depth against call-out
leakage when the question could have been resolved in dialogue.

doc not updated: changes are internal phase-2.5 mechanics and
phase-1.3 exit-gate guidance, do not surface at doc level.

Three changes that make the Q&A interaction guidance more human and
actionable:

1. New Rule 6: open-ended question discipline. Apply Rule 5 silently
   (do not narrate the form choice to the user); the question itself
   must give the user something concrete to anchor on. "What's your
   take?" is too thin and wastes the open question; rigor-probe-style
   specificity ("What's the most concrete thing someone's already
   done about this: paid for it, built a workaround, quit a tool
   over it?") earns the open-endedness. Anti-patterns include
   narrating the form choice, "in one sentence" framings, yes/no
   traps, and AI-slop warmth wrappers.

2. Plain-English Q&A vocabulary. Replaced "leak your priors" and
   "bias the answer by signaling which dimensions the agent considers
   relevant" (clinical Bayesian jargon, confusing to a reader) with
   "unintentionally influence the user's answer": accurate to the
   dynamic (it is a side effect, not deliberate manipulation),
   conversational, and voice-consistent (no mid-sentence switching
   between "you" and "the user").

3. "Prose" -> "open-ended" for interaction-form references. The
   property is open-endedness, not the textual form. Updated
   SKILL.md Rule 5, Phase 1.3 rigor probes, the integration check,
   the parallel rule in universal-brainstorming.md, and two
   parenthetical justifications in synthesis-summary.md. Kept
   "prose" where it means "flowing text" (vs bullets / diagrams).

doc not updated: changes are interaction-rule wording, do not
surface at doc level.
@tmchow tmchow changed the title from "fix(ce-brainstorm): replace three-bucket audit with scoping synthesis" to "fix(ce-brainstorm): scoping synthesis, integration check, and plain-English Q&A rules" on May 14, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da17124876

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md Outdated

Headless mode in ce-brainstorm was over-engineered for a use case
that does not exist. Brainstorming is interactive by definition:
the dialogue IS the value (it surfaces context the user does not
yet know they have, pressure-tests their premise, uncovers
consequences they could not see in isolation). Stripping the
dialogue leaves an agent making up requirements from context and
flagging them with `## Assumptions`: not a brainstorm, just
hallucinating requirements with a fig leaf.

No legitimate use case to defend:

- LFG pipeline: if the feature description is clear enough for
  autonomous execution, LFG goes straight to /ce-plan. If vague
  enough to need brainstorming, it needs a human; LFG should
  refuse and prompt the user, not silently fake a brainstorm.
- Skill-to-skill invocation: if the calling skill has enough
  context to write the doc, it should write the doc directly,
  not invoke /ce-brainstorm on the human's behalf.

What got removed:
- SKILL.md Phase 2.5 "Headless mode" stub line
- synthesis-summary.md entire "## Headless mode" section
- synthesis-summary.md interactive / non-interactive branches in
  the preamble and doc-shape routing table
- requirements-capture.md `## Assumptions` template section and
  the section-matrix row
- docs/skills/ce-brainstorm.md Section 5 and FAQ headless mentions

What got added: one explanatory sentence in synthesis-summary.md
documenting the decision so a future contributor does not
reintroduce headless mode without thinking it through.

Side effect: resolves the Codex review feedback on PR #829 (about
preserving `## Summary` in headless mode) by removing the
inconsistency at its source — deleting the over-elaborated thing
rather than further elaborating it.

doc updated: Section 5 and FAQ entry trimmed to drop headless
references.
@tmchow tmchow changed the title from "fix(ce-brainstorm): scoping synthesis, integration check, and plain-English Q&A rules" to "fix(ce-brainstorm): scoping synthesis, integration check, plain-English Q&A rules, headless removal" on May 14, 2026
@tmchow tmchow changed the title from "fix(ce-brainstorm): scoping synthesis, integration check, plain-English Q&A rules, headless removal" to "fix(ce-brainstorm): scoping synthesis and Q&A interaction cleanup" on May 14, 2026

Four places in the spec disagreed about when Phase 3 fires under
Path A:

1. SKILL.md Phase 0.2 (line 86): "go straight to ... announce-mode,
   then to Phase 3" — the right behavior, no change.
2. SKILL.md Phase 2.5 Path A bullet: said "end turn ... otherwise
   Phase 3 fires" — contradicted (1).
3. synthesis-summary.md Path A template: user-facing prompt said
   "writing the requirements doc" (immediate action) but template
   instructions said "End turn. On the next user message ..." —
   contradicted (1) and itself.
4. synthesis-summary.md Path A vs Path B description: same "end
   turn ... otherwise Phase 3 fires" as (2).

The end-turn behavior created a UX dead-end: user reads "writing
the requirements doc" and waits, but the agent has already ended
the turn waiting for an acknowledgment. The doc never lands in the
common case where the user takes the prompt at face value.

Aligned all four locations: Path A now proceeds to Phase 3
doc-write in the same turn. The "Interrupt if wrong" affordance
still works — the user can revise after the doc lands. Lightweight
Path A docs are short, so post-hoc revision is cheap. The historical
rationale for end-turn (preserving an interruption window) was
overkill for Lightweight specifically; the tier-guarded Path A
(Lightweight-only) doesn't need it.
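The aligned behavior can be sketched as the list of steps each path performs within a single turn. The step names and `stepsThisTurn` are hypothetical labels for illustration, based only on what this commit message describes.

```typescript
// Hypothetical step labels; the skill describes these phases in prose.
type Step =
  | "announce"
  | "render-scoping-synthesis"
  | "await-confirmation"
  | "write-requirements-doc";

function stepsThisTurn(path: "A" | "B"): Step[] {
  return path === "A"
    ? // Path A: announce, then Phase 3 doc-write in the same turn (no end-turn wait).
      ["announce", "write-requirements-doc"]
    : // Path B: the doc is written only after the user confirms in a later turn.
      ["render-scoping-synthesis", "await-confirmation"];
}
```

Under the earlier end-turn wording, Path A would have stopped after `"announce"`, which is the dead-end the commit removes.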

Addresses Codex review feedback on PR #829.

doc not updated: change is internal Phase 2.5 mechanics, doesn't
surface at doc level.
@tmchow tmchow merged commit 6df3f96 into main May 14, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request May 14, 2026
tmchow added a commit that referenced this pull request May 14, 2026

Bring in PR #829 (ce-brainstorm scoping synthesis and Q&A interaction
cleanup) which removed the brainstorm headless mode and reshaped Phase 2.5.

Conflict resolution:
- requirements-capture.md: main dropped the Assumptions row (headless mode
  is gone), my branch added Key Decisions at section 3. Kept the Key
  Decisions placement and dropped Assumptions from both the section matrix
  and the template — there's nothing emitting an `## Assumptions` block
  anymore.
- ce-brainstorm-section-order.test.ts: removed Assumptions references from
  the order assertions for the same reason.

All other touched files (ce-brainstorm SKILL.md, ce-plan SKILL.md,
visual-communication.md, plan-template.md, etc.) auto-merged cleanly —
PR #829's Phase 2.5 scoping synthesis and my Phase 0.0/0.1 output-mode
resolution sit at different phases and don't overlap.

bun test: 1433 pass, bun run release:validate: in sync.
tmchow added a commit that referenced this pull request May 14, 2026

Apply the shape and discipline changes from ce-brainstorm's
scoping-synthesis fix (#829) to ce-plan's Phase 0.7 / 5.1.5:

- Tier guard on auto-proceed: Lightweight + zero call-outs is the only
  path that skips the confirmation gate. Standard and Deep plans always
  fire the confirmation gate even with zero call-outs, because substance
  earns the checkpoint. A 1-3 line summary on a Deep plan is exactly
  the rubber-stamping case the gate is supposed to prevent.
- Confirmation phrasing names what happens on confirm ("Confirm and
  I'll proceed to research, drawing on this scope" / "Confirm and
  I'll write the plan next..."), replacing the ambiguous "Confirm
  to proceed."
- Detail test for each surviving call-out and summary bullet: 1-2
  lines max, conversational not documentary. The count cap was
  gameable without it -- three call-outs could each be a 6-line
  paragraph and still "fit."
- Re-cut rule extended to fire on detail overflow, not just count
  overflow.
- Summary form is flexible: prose, bullets, or mix, whichever
  communicates best. Tier-aware budgets (Lightweight 1-3 lines;
  Standard 3-5 lines or 2-4 bullets; Deep 4-6 lines or 3-6 bullets).
- Rename "Scope Summary" / "Synthesis Summary" to "Scoping Synthesis"
  for parity with ce-brainstorm's terminology.
- Soft-cut option wording updated per the parity note in #819 (the
  "redirect" verb collided with the unrelated self-redirect mechanism).

Skill doc updated -- the Quick Example referenced "short prose
summary" and "the gate skips when there are no forks worth flagging,"
both of which would mislead a reader under the new behavior.