tmchow · 2026-05-14T00:40:58Z

Summary

ce-brainstorm now confirms scope with a collaborator-style scoping synthesis before writing the requirements doc, instead of the 20+ bullet audit Phase 2.5 used to emit. The PR also tightens upstream Phase 1.3 dialogue (integration check before exit), replaces clinical jargon throughout the Q&A interaction rules with plain English, and deletes the over-engineered headless mode that produced one-sentence checkpoints on Deep-tier pre-loads and internal ## Summary inconsistencies.

Two failure modes prompted the original change. The Stated / Inferred / Out of scope audit shape reliably produced 20+ bullets for Standard brainstorms, even when granularity rules were followed. And the Phase 0.2 fast-path routing to "announce-mode" produced a 1-sentence checkpoint for richly pre-loaded brainstorm contexts with 20+ items of scope. Both made the confirmation gate feel like rubber-stamping rather than a real checkpoint.

What changed

Phase 2.5 now has a two-stage shape:

- Internal three-bucket draft (Stated / Inferred / Out of scope): the agent's comprehensive thinking surface. Dissolves into requirements doc body sections when Phase 3 writes (Requirements, Key Decisions, Scope Boundaries).
- Chat-time scoping synthesis: shaped like what two product collaborators would confirm before writing a PRD. Prose "What we're building" plus three render-conditional sections (Key trade-offs / What's not in scope / Call outs). Each section earns its slot through a section-specific keep test; empty sections are omitted, not padded.
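
The render-conditional behavior of the chat-time synthesis can be sketched roughly as follows. This is only an illustration: the skill is written as prose instructions, and the `ScopingSynthesis` interface and `renderScopingSynthesis` function are hypothetical names, not anything in the repo. The section-specific keep tests are judgment calls and are assumed to have run before the arrays are populated.

```typescript
// Hypothetical shape for illustration; the real skill is prose, not code.
interface ScopingSynthesis {
  whatWeAreBuilding: string; // always rendered, as prose
  keyTradeOffs: string[];    // bullets that survived that section's keep test
  notInScope: string[];
  callOuts: string[];
}

function renderScopingSynthesis(s: ScopingSynthesis): string {
  const conditionalSections: Array<[string, string[]]> = [
    ["Key trade-offs", s.keyTradeOffs],
    ["What's not in scope", s.notInScope],
    ["Call outs", s.callOuts],
  ];

  const blocks: string[] = [`What we're building\n${s.whatWeAreBuilding}`];
  for (const [heading, bullets] of conditionalSections) {
    // Empty sections are omitted entirely, not padded with filler.
    if (bullets.length === 0) continue;
    blocks.push([heading, ...bullets.map((b) => `- ${b}`)].join("\n"));
  }
  return blocks.join("\n\n");
}
```

A synthesis with only out-of-scope bullets would render the prose block and a single "What's not in scope" section, with no empty headings.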

Each bullet passes both an affirmability test (can the user evaluate this without reading code?) and a detail test (1-2 lines max, conversational not documentary). Tier-aware bullet budgets cap total bullets across sections 2-4 (Lightweight 0-1, Standard 2-4, Deep-feature 3-5, Deep-product 4-7) with re-cut on overflow rather than cap-raising.
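
As a rough sketch of how the count cap and the detail test interact, the check below treats them as two separate signals. The caps are the upper ends of the ranges quoted above; everything else is an assumption made for illustration, in particular the 100-character line width used to approximate "1-2 lines" and the names `checkBullets` and `passesDetailTest`. The affirmability test is a judgment call and is not modeled.

```typescript
type Tier = "Lightweight" | "Standard" | "Deep-feature" | "Deep-product";

// Upper end of each range quoted above (total bullets across sections 2-4).
const MAX_BULLETS: Record<Tier, number> = {
  "Lightweight": 1,
  "Standard": 4,
  "Deep-feature": 5,
  "Deep-product": 7,
};

// Assumption: "1-2 lines" is approximated as at most 2 lines of 100 characters.
const MAX_LINES = 2;
const ASSUMED_LINE_WIDTH = 100;

function passesDetailTest(bullet: string): boolean {
  const lineCount = bullet.split("\n").length;
  return lineCount <= MAX_LINES && bullet.length <= MAX_LINES * ASSUMED_LINE_WIDTH;
}

interface BulletCheck {
  overBudget: boolean; // count overflow: re-cut, do not raise the cap
  bloated: string[];   // bullets that fail the detail test
}

function checkBullets(tier: Tier, bullets: string[]): BulletCheck {
  return {
    overBudget: bullets.length > MAX_BULLETS[tier],
    bloated: bullets.filter((b) => !passesDetailTest(b)),
  };
}
```

Keeping the two results separate makes the failure mode visible: a draft can be within budget on count while still carrying paragraph-length bullets.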

Phase 1.3 now has an integration check before exit: combine user answers and probe any non-obvious consequences before reaching Phase 2.5, so the dialogue resolves them in flow rather than stacking them as call-outs.

The Q&A interaction rules use plain English throughout: "open-ended" instead of "prose," "unintentionally influence the user's answer" instead of "leak your priors / bias by signaling which dimensions matter."

Headless mode is gone. Brainstorming is interactive by design — the dialogue IS the value. Without it, the skill is just an agent making up requirements with ## Assumptions markers; that's hallucination with a fig leaf, not brainstorming. Removing the mode resolves an internal ## Summary inconsistency at its source and simplifies the spec.

Design decisions

Path A vs Path B gate fires on two signals, not one. Path A (1-3 sentence announce-mode, end turn) fires only when tier is Lightweight AND no questions fired. Standard and Deep tiers always get Path B regardless of question signal: substance earns the checkpoint, not interaction history. The earlier single-signal gate let richly pre-loaded brainstorm contexts hit Path A and produce 1-sentence checkpoints for 20+ items of scope.
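
The two-signal gate reduces to a small predicate. The sketch below is illustrative only: the function names are hypothetical, and `legacySingleSignalPath` is a reconstruction of the earlier behavior from the description above, not code taken from the skill.

```typescript
type Tier = "Lightweight" | "Standard" | "Deep-feature" | "Deep-product";
type ConfirmationPath = "A" | "B";

// Current gate: Path A needs BOTH a Lightweight tier AND zero questions fired.
function chooseConfirmationPath(tier: Tier, questionsFired: number): ConfirmationPath {
  return tier === "Lightweight" && questionsFired === 0 ? "A" : "B";
}

// Assumed reconstruction of the earlier single-signal gate, for contrast:
// only the question signal was consulted, so tier could not block Path A.
function legacySingleSignalPath(questionsFired: number): ConfirmationPath {
  return questionsFired === 0 ? "A" : "B";
}
```

Under this sketch, a richly pre-loaded Deep-feature context with no questions gets Path B from `chooseConfirmationPath`, where the single-signal version would have returned Path A.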

Affirmability and detail tests both apply per bullet. The count cap addresses bullet volume; the detail test addresses bullet bloat. Without the detail test, an agent can hit the count cap by compressing horizontally (fewer bullets) without compressing vertically (less per bullet), and the cap becomes meaningless.

Pre-flight re-review is a single mental act, not a checklist. Re-read the draft as the user would read it before emitting. Catches "reads like a doc preview" drift the keep tests miss. Heavy multi-step checklists become performative; one act tied to the user's reading frame is the right forcing function.

Confirmation phrasing sets expectation. "Confirm and I'll write the requirements doc next, drawing on our dialogue and this synthesis" tells the user what happens on confirm. The previous "Confirm to proceed" was ambiguous about what "proceed" meant.

Section heading is "What's not in scope," not "What's not in V1." The old heading presumed versioned-software brainstorms. The skill supports non-software topics (naming briefs, decisions) and unversioned work. Agent-facing guidance and worked-example bullets were also generalized so the agent does not default to versioning language.

Phase 1.3 integration check is the upstream side of a defense-in-depth pair. Before exiting dialogue, the agent mentally combines user answers and probes any non-obvious consequences. Combination effects ("if X applies per-channel AND no warning on delete, then rule-delete silently loses pause state") get resolved in dialogue while the user is in flow, instead of stacking up as call-outs at the exit gate. This pairs with the downstream call-outs framing tightening (call-outs are bounded to genuine residuals: post-dialogue consequences, silent agent inferences — explicitly NOT "questions the agent should have asked but didn't").

Headless mode removed. Brainstorming requires dialogue with a synchronous user — the dialogue IS the value. No legitimate use case to defend: LFG should refuse and prompt the user when a feature description is too vague for autonomous execution, not fake a brainstorm; skill-to-skill invocation should write the doc from context directly if it has the context. The previous headless mode produced wrong-shaped output (1-sentence checkpoints on Deep-tier pre-loads) and internal spec inconsistencies (Codex correctly flagged that "skip stage 2 entirely" left no defined step producing the ## Summary). Deleting the mode resolves the inconsistency at its source.

New Interaction Rule 6: open-ended questions earn their place when they're specific enough to elicit a substantive answer. Apply Rule 5 silently (don't narrate the form choice); the question itself must give the user something concrete to anchor on. "What's your take?" is too thin; rigor-probe-style specificity earns it ("What's the most concrete thing someone's already done about this — paid for it, built a workaround, quit a tool over it?"). Anti-patterns include narrating the choice, "in one sentence" framings, yes/no traps, and AI-slop warmth wrappers.

Plain-English Q&A vocabulary throughout. Replaced "leak your priors," "bias the answer by signaling which dimensions the agent considers relevant," and "use prose" with plain phrasings: "unintentionally influence the user's answer," "ask it open-ended." The technical term "priors" (Bayesian) was confusing in a user-facing context, and "prose" conflated form with the property we actually care about (open-endedness). "Prose" is preserved where it means "flowing text" (vs bullets / diagrams).

Soft-cut blocking question options use plain English. Old: "Proceed with the current revised synthesis" / "Stop and redirect — discuss further before writing the doc." New: "Proceed and write the requirements doc" / "Hold off — keep discussing before the doc." The "redirect" verb collided with the unrelated self-redirect mechanism. The same fix propagates to ce-plan's soft-cut menu in a parity commit.

Test plan

bun test (1340 pass) and bun run release:validate pass.

Behavioral changes to skill prose are not exercised by automated tests. Verify by running /ce-brainstorm on:

- A tight Lightweight prompt with no dialogue (expect Path A: 1-3 sentence announce, end turn)
- A Standard prompt with multi-turn Q&A (expect Path B: full scoping synthesis with confirmation gate, even if zero call-outs survive)
- A richly pre-loaded Deep-feature context where no questions fire (expect Path B: tier guard routes correctly, not Path A; this was the bug case)
- A prompt that previously produced 20+ audit bullets (expect compression to scoping synthesis with substance proportional to dialogue, capped at tier ceiling)
- Any open-ended question the agent asks: should be specific enough to anchor a real answer, with no "in prose, because options would leak my priors" narration

Replace the three-bucket "Stated / Inferred / Out of scope" audit that Phase 2.5 emitted with a collaborator-style scoping synthesis: prose "What we're building" plus three render-conditional sections (Key trade-offs / What's not in scope / Call outs). Each section earns its slot through a section-specific keep test; empty sections are omitted, not padded.
Before, the audit reliably produced 20+ bullets for a Standard brainstorm, even when granularity rules were followed. The volume made confirmation feel like rubber-stamping rather than a real checkpoint, and the audit shape did not match how two product collaborators actually confirm scope before writing a PRD.
Key decisions:
- Path A vs Path B gate fires on TWO signals (questions asked AND tier), not just question signal. Path A (1-3 sentence announce-mode) fires only when tier is Lightweight AND no questions fired. Standard and Deep tiers always get Path B regardless of question signal: substance earns the checkpoint, not interaction history. Without the tier guard, a richly pre-loaded brainstorm context got a 1-sentence checkpoint for 20+ items worth of scope.
- Each bullet passes both an affirmability test (can the user evaluate without reading code?) and a detail test (1-2 lines max, conversational not documentary). The detail test catches a failure mode the count cap alone misses: fewer bullets that each bloat to paragraph length.
- Pre-flight re-review: re-read the draft as a user would before emitting. One mental act, not a checklist. Catches "reads like a doc preview" drift the keep tests miss.
- Confirmation phrasing tells the user what happens next ("Confirm and I'll write the requirements doc next, drawing on our dialogue and this synthesis") so the gate does not ambiguously ask "proceed to what?"
- Section heading is "What's not in scope," not "What's not in V1," to avoid presuming versioned-software brainstorms.
doc updated: Section 5 reframed from three-bucket synthesis to scoping synthesis with conditional sections; Quick Example and FAQ revised to match.

Replace the soft-cut blocking question option text "Proceed with the current revised synthesis" / "Stop and redirect: discuss further before [research / plan-write]" with plain-English labels "Proceed and continue to [research / plan-write]" / "Hold off: keep discussing before continuing." The old labels used "redirect" jargon, which collides with the unrelated self-redirect mechanism. Plain English says what each option does without needing internal-spec vocabulary.
Parity fix discovered while updating ce-brainstorm's soft-cut menu; both skills are now consistent.
doc not updated: change is internal blocking-question wording, does not surface at doc level.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 50c83732ac

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Three changes that together reduce call-out leakage:
1. Phase 1.3 integration check (SKILL.md): before exiting dialogue, the agent mentally combines user answers and probes any non-obvious consequences. Combination effects ("X applies per-channel AND no warning on delete: silent loss") should be resolved in dialogue while the user is in flow, not stacked up as call-outs at the exit gate.
2. Stage 2 call-outs framing tightening (synthesis-summary.md): the section description now explicitly bounds call-outs to residual forks (post-dialogue consequences, silent agent inferences, or pre-loaded contexts with no dialogue). Explicitly NOT "questions the agent could have asked but didn't"; if a candidate call-out reads like a missed Phase 1.3 question, flag the gap rather than pad the section.
3. Headless mode summary preservation (synthesis-summary.md): addresses Codex review feedback on PR #829. The prior "skip stage 2 entirely" wording left no defined step producing the "What we're building" prose, but requirements-capture.md requires a `## Summary` for all but trivial Lightweight docs. Clarified that in headless mode the prose IS still composed (it routes to `## Summary`); only the chat-time rendering (conditional sections, confirmation, Path A/B gate) is skipped.
The Phase 1.3 and Stage 2 changes are a paired upstream/downstream fix for the same failure mode: defense in depth against call-out leakage when the question could have been resolved in dialogue.
doc not updated: changes are internal phase-2.5 mechanics and phase-1.3 exit-gate guidance, do not surface at doc level.

Three changes that make the Q&A interaction guidance more human and actionable:
1. New Rule 6: open-ended question discipline. Apply Rule 5 silently (do not narrate the form choice to the user); the question itself must give the user something concrete to anchor on. "What's your take?" is too thin and wastes the open question; rigor-probe-style specificity ("What's the most concrete thing someone's already done about this: paid for it, built a workaround, quit a tool over it?") earns the open-endedness. Anti-patterns include narrating the form choice, "in one sentence" framings, yes/no traps, and AI-slop warmth wrappers.
2. Plain-English Q&A vocabulary. Replaced "leak your priors" and "bias the answer by signaling which dimensions the agent considers relevant" (clinical Bayesian jargon, confusing to a reader) with "unintentionally influence the user's answer": accurate to the dynamic (it is a side effect, not deliberate manipulation), conversational, and voice-consistent (no mid-sentence switching between "you" and "the user").
3. "Prose" -> "open-ended" for interaction-form references. The property is open-endedness, not the textual form. Updated SKILL.md Rule 5, Phase 1.3 rigor probes, the integration check, the parallel rule in universal-brainstorming.md, and two parenthetical justifications in synthesis-summary.md. Kept "prose" where it means "flowing text" (vs bullets / diagrams).
doc not updated: changes are interaction-rule wording, do not surface at doc level.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: da17124876

Headless mode in ce-brainstorm was over-engineered for a use case that does not exist. Brainstorming is interactive by definition: the dialogue IS the value (it surfaces context the user does not yet know they have, pressure-tests their premise, uncovers consequences they could not see in isolation). Stripping the dialogue leaves an agent making up requirements from context and flagging them with `## Assumptions`: not a brainstorm, just hallucinating requirements with a fig leaf.
No legitimate use case to defend:
- LFG pipeline: if the feature description is clear enough for autonomous execution, LFG goes straight to /ce-plan. If vague enough to need brainstorming, it needs a human; LFG should refuse and prompt the user, not silently fake a brainstorm.
- Skill-to-skill invocation: if the calling skill has enough context to write the doc, it should write the doc directly, not invoke /ce-brainstorm on the human's behalf.
What got removed:
- SKILL.md Phase 2.5 "Headless mode" stub line
- synthesis-summary.md entire "## Headless mode" section
- synthesis-summary.md interactive / non-interactive branches in the preamble and doc-shape routing table
- requirements-capture.md `## Assumptions` template section and the section-matrix row
- docs/skills/ce-brainstorm.md Section 5 and FAQ headless mentions
What got added: one explanatory sentence in synthesis-summary.md documenting the decision so a future contributor does not reintroduce headless mode without thinking it through.
Side effect: resolves the Codex review feedback on PR #829 (about preserving `## Summary` in headless mode) by removing the inconsistency at its source: deleting the over-elaborated thing rather than further elaborating it.
doc updated: Section 5 and FAQ entry trimmed to drop headless references.

Four places in the spec gave conflicting accounts of when Phase 3 fires under Path A:
1. SKILL.md Phase 0.2 (line 86): "go straight to ... announce-mode, then to Phase 3". This is the right behavior; no change.
2. SKILL.md Phase 2.5 Path A bullet: said "end turn ... otherwise Phase 3 fires", which contradicted (1).
3. synthesis-summary.md Path A template: user-facing prompt said "writing the requirements doc" (immediate action) but template instructions said "End turn. On the next user message ...", which contradicted (1) and itself.
4. synthesis-summary.md Path A vs Path B description: same "end turn ... otherwise Phase 3 fires" as (2).
The end-turn behavior created a UX dead-end: user reads "writing the requirements doc" and waits, but the agent has already ended the turn waiting for an acknowledgment. The doc never lands in the common case where the user takes the prompt at face value.
Aligned all four locations: Path A now proceeds to Phase 3 doc-write in the same turn. The "Interrupt if wrong" affordance still works: the user can revise after the doc lands. Lightweight Path A docs are short, so post-hoc revision is cheap. The historical rationale for end-turn (preserving an interruption window) was overkill for Lightweight specifically; the tier-guarded Path A (Lightweight-only) doesn't need it.
Addresses Codex review feedback on PR #829.
doc not updated: change is internal Phase 2.5 mechanics, doesn't surface at doc level.

Bring in PR #829 (ce-brainstorm scoping synthesis and Q&A interaction cleanup) which removed the brainstorm headless mode and reshaped Phase 2.5.
Conflict resolution:
- requirements-capture.md: main dropped the Assumptions row (headless mode is gone), my branch added Key Decisions at section 3. Kept the Key Decisions placement and dropped Assumptions from both the section matrix and the template; there's nothing emitting an `## Assumptions` block anymore.
- ce-brainstorm-section-order.test.ts: removed Assumptions references from the order assertions for the same reason.
All other touched files (ce-brainstorm SKILL.md, ce-plan SKILL.md, visual-communication.md, plan-template.md, etc.) auto-merged cleanly: PR #829's Phase 2.5 scoping synthesis and my Phase 0.0/0.1 output-mode resolution sit at different phases and don't overlap.
bun test: 1433 pass, bun run release:validate: in sync.

Apply the shape and discipline changes from ce-brainstorm's scoping-synthesis fix (#829) to ce-plan's Phase 0.7 / 5.1.5:
- Tier guard on auto-proceed: Lightweight + zero call-outs is the only path that skips the confirmation gate. Standard and Deep plans always fire the confirmation gate even with zero call-outs, because substance earns the checkpoint. A 1-3 line summary on a Deep plan is exactly the rubber-stamping case the gate is supposed to prevent.
- Confirmation phrasing names what happens on confirm ("Confirm and I'll proceed to research, drawing on this scope" / "Confirm and I'll write the plan next..."), replacing the ambiguous "Confirm to proceed."
- Detail test for each surviving call-out and summary bullet: 1-2 lines max, conversational not documentary. The count cap was gameable without it: three call-outs could each be a 6-line paragraph and still "fit."
- Re-cut rule extended to fire on detail overflow, not just count overflow.
- Summary form is flexible: prose, bullets, or mix, whichever communicates best. Tier-aware budgets (Lightweight 1-3 lines; Standard 3-5 lines or 2-4 bullets; Deep 4-6 lines or 3-6 bullets).
- Rename "Scope Summary" / "Synthesis Summary" to "Scoping Synthesis" for parity with ce-brainstorm's terminology.
- Soft-cut option wording updated per the parity note in #819 (the "redirect" verb collided with the unrelated self-redirect mechanism).
Skill doc updated: the Quick Example referenced "short prose summary" and "the gate skips when there are no forks worth flagging," both of which would mislead a reader under the new behavior.
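
A minimal sketch of the ce-plan tier guard and the extended re-cut rule described in this commit, under stated assumptions: the function names are hypothetical, the count cap is passed in as a parameter because the message does not state it, and "lines" are approximated by newline-separated lines in each item.

```typescript
type PlanTier = "Lightweight" | "Standard" | "Deep";

// Only Lightweight with zero call-outs may skip the confirmation gate.
function skipsConfirmationGate(tier: PlanTier, callOuts: string[]): boolean {
  return tier === "Lightweight" && callOuts.length === 0;
}

// Re-cut fires on count overflow OR detail overflow.
// Assumption: countCap is supplied by the caller; maxLinesPerItem defaults
// to the "1-2 lines max" detail test.
function needsRecut(items: string[], countCap: number, maxLinesPerItem = 2): boolean {
  const countOverflow = items.length > countCap;
  const detailOverflow = items.some(
    (item) => item.split("\n").length > maxLinesPerItem,
  );
  return countOverflow || detailOverflow;
}
```

With the detail check in place, a single six-line call-out triggers a re-cut even though it is well under any count cap.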

tmchow added 2 commits May 13, 2026 17:40

chatgpt-codex-connector Bot reviewed May 14, 2026

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md Outdated

tmchow added 2 commits May 13, 2026 19:34

tmchow changed the title ~~fix(ce-brainstorm): replace three-bucket audit with scoping synthesis~~ fix(ce-brainstorm): scoping synthesis, integration check, and plain-English Q&A rules May 14, 2026

chatgpt-codex-connector Bot reviewed May 14, 2026

Comment thread plugins/compound-engineering/skills/ce-brainstorm/references/synthesis-summary.md Outdated

tmchow changed the title ~~fix(ce-brainstorm): scoping synthesis, integration check, and plain-English Q&A rules~~ fix(ce-brainstorm): scoping synthesis, integration check, plain-English Q&A rules, headless removal May 14, 2026

tmchow changed the title ~~fix(ce-brainstorm): scoping synthesis, integration check, plain-English Q&A rules, headless removal~~ fix(ce-brainstorm): scoping synthesis and Q&A interaction cleanup May 14, 2026

tmchow merged commit 6df3f96 into main May 14, 2026
2 checks passed

github-actions Bot mentioned this pull request May 14, 2026

chore: release main #831

Merged

fix(ce-brainstorm): scoping synthesis and Q&A interaction cleanup #829
tmchow merged 6 commits into main from tmchow/ce-brainstorm-scope-confirm
