tmchow · 2026-05-12T05:02:18Z

The synthesis gate at Phase 0.7 / 5.1.5 now fires reliably without depending on the synthesis-summary.md reference loading. The literal templates and the "silent proceeding is not allowed" rule are inlined in SKILL.md so the gate output appears even on a load-failure case. The gate block leads with a firm read-the-reference instruction so call-outs are still well-shaped when the load succeeds.

The inlined templates also drop phase-number jargon from user-facing text ("Phase 0.4 bootstrap" became "our brief discussion"), reduce a two-bullet placeholder that was biasing call-out count, and add purpose context to the Stated / Inferred / Out of scope bucket names.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a24e092abe

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

The Phase 0.7 / 5.1.5 synthesis gate was being skipped silently when the synthesis-summary.md reference did not load: the templates and mandatory-announce rule lived only there, behind a "STOP, read this" indirection that the agent could (and did) skip. Move the load-bearing pieces inline so the gate fires reliably even on a load-failure case, and reorder the gate block so the reference-load instruction is the first step. The reference now provides best-effort quality guidance for call-out shaping; the gate itself no longer depends on it loading.

Also fix the inlined templates:

- Replace "Phase 0.4 bootstrap" / "Phase 1 research" with user-facing language (users do not track phase numbers).
- Reduce two-bullet placeholders to a single placeholder with explicit count guidance (the multi-bullet placeholder biased toward a fixed count).
- Add purpose context to the Stated / Inferred / Out of scope bucket names so it is clear they drive plan-body routing rather than chat output.

…plate Address PR review feedback (#822) The inlined solo template dropped the "(You can also redirect to /ce-brainstorm if this is bigger than you initially thought...)" parenthetical that the reference template carries. Inlining was supposed to make the gate fire reliably without the reference loading, so dropping the escape-hatch line from the inline copy weakened the guardrail it was meant to preserve. Sync the inline template back to the reference's wording.

Apply the shape and discipline changes from ce-brainstorm's scoping-synthesis fix (#829) to ce-plan's Phase 0.7 / 5.1.5:

- Tier guard on auto-proceed: Lightweight + zero call-outs is the only path that skips the confirmation gate. Standard and Deep plans always fire the confirmation gate even with zero call-outs, because substance earns the checkpoint. A 1-3 line summary on a Deep plan is exactly the rubber-stamping case the gate is supposed to prevent.
- Confirmation phrasing names what happens on confirm ("Confirm and I'll proceed to research, drawing on this scope" / "Confirm and I'll write the plan next..."), replacing the ambiguous "Confirm to proceed."
- Detail test for each surviving call-out and summary bullet: 1-2 lines max, conversational not documentary. The count cap was gameable without it -- three call-outs could each be a 6-line paragraph and still "fit."
- Re-cut rule extended to fire on detail overflow, not just count overflow.
- Summary form is flexible: prose, bullets, or mix, whichever communicates best. Tier-aware budgets (Lightweight 1-3 lines; Standard 3-5 lines or 2-4 bullets; Deep 4-6 lines or 3-6 bullets).
- Rename "Scope Summary" / "Synthesis Summary" to "Scoping Synthesis" for parity with ce-brainstorm's terminology.
- Soft-cut option wording updated per the parity note in #819 (the "redirect" verb collided with the unrelated self-redirect mechanism).

Skill doc updated -- the Quick Example referenced "short prose summary" and "the gate skips when there are no forks worth flagging," both of which would mislead a reader under the new behavior.
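The tier guard and detail test above can be read as two small predicates. The following is only an illustrative sketch: the skill expresses these rules as prose instructions in SKILL.md, not as code, and the names `Tier`, `gate_fires`, `SUMMARY_BUDGETS`, and `needs_recut` are hypothetical.

```python
from enum import Enum


class Tier(Enum):
    LIGHTWEIGHT = "lightweight"
    STANDARD = "standard"
    DEEP = "deep"


def gate_fires(tier: Tier, callout_count: int) -> bool:
    """Return True when the confirmation gate must fire.

    Only a Lightweight plan with zero call-outs may auto-proceed;
    Standard and Deep plans always get the checkpoint.
    """
    return not (tier is Tier.LIGHTWEIGHT and callout_count == 0)


# Upper ends of the tier-aware budgets quoted in the commit message.
# Lightweight lists only a line budget, so it has no "bullets" entry.
SUMMARY_BUDGETS = {
    Tier.LIGHTWEIGHT: {"lines": 3},
    Tier.STANDARD: {"lines": 5, "bullets": 4},
    Tier.DEEP: {"lines": 6, "bullets": 6},
}


def needs_recut(callouts: list[str], max_lines: int = 2) -> bool:
    """Detail test: a call-out longer than max_lines triggers a re-cut.

    Count overflow is not modeled here because the commit message
    does not state the count cap.
    """
    return any(len(c.splitlines()) > max_lines for c in callouts)
```

For example, `gate_fires(Tier.DEEP, 0)` is true even though there are no call-outs, which is the rubber-stamping case the guard is meant to close.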

The brainstorm-sourced synthesis was producing plan-pitch outputs that read like a Table of Contents for the plan body: enumerating Implementation Units, restating brainstorm constraints, and accounting for how deferred-Qs route into plan sections. None of that gives the user something to push back on; it just rubber-stamps work the brainstorm already validated.

Restructure the brainstorm-sourced synthesis into two distinct content sections plus call-outs:

1. Brainstorm-scope restatement (1-2 sentences). The user wrote this content, but the synthesis may be read days later or in parallel with other plans. The restatement is the topic anchor that names which artifact is being planned against, in the brainstorm's own vocabulary. Not an enumeration.
2. Plan-specific scoping (prose or bullets). What this plan covers vs. defers vs. expands relative to the brainstorm: staging decisions, test scope, adjacent refactors. This is the part the user can actively push back on at plan-time.

Solo plans have no upstream and the summary is a single scope claim.

Other changes:

- Tier budgets are reframed as ceilings, not targets. Filling the budget when there is nothing more substantive to say produces noise.
- Source-document vocabulary discipline: when a brainstorm exists, use its terms; do not invent agent-coded shorthand like "skill-instruction shape" or "hooks engine selection at Step 2a entry" that forces the user to flip back and translate.
- Both templates renamed and restructured to communicate the new shape via placeholder hints.

…scipline

A test run of the new two-paragraph synthesis still surfaced plan-pitch leakage in three patterns the rules didn't yet block:

1. The agent claimed "one PR", a sequencing decision that plan-write produces, not something knowable at synthesis time.
2. "Plan-specific scoping" was enumerating where the implementation reaches into the codebase (file paths, Implementation Unit inventory) instead of stating scope-claim decisions.
3. Call-outs kept the 3-5 line "name fork, explain A, explain B, my default is X" rationale-dump shape, which is exactly what belongs in Key Technical Decisions in the plan body.

Encode the underlying rule explicitly: the synthesis is composed before plan-write, so it can only surface what the agent knows from the brainstorm + research + posture commitments. Implementation Unit boundaries, PR count, commit/branch sequencing, effort estimates, and exact file paths are all plan-write outputs the synthesis cannot honestly claim. Even when the agent has formed plan-write opinions earlier in the session, those stay internal until plan-write.

Other refinements:

- Reword plan-specific scoping from "what this plan covers vs defers vs expands" to "scope-level decisions"; the "covers" framing was pulling agents toward inventory.
- Make plan-specific scoping items pass the same affirmability test as call-outs: the user can affirm or redirect without reading code.
- Strengthen the call-out template placeholder to forbid multi-sentence rationale and "my default is X" pitches.
- Generalize the bare-ID anti-pattern in source-vocabulary discipline (AE4, R6, F3 all flip the user back to the brainstorm).

A test run showed two wording patterns the prior rules didn't block:

1. Bare ID references resurfaced ("AE1-AE3", "AE4", "AE5", "R6") even when the cases were already named in plain terms in the same sentence. Strengthen the source-vocabulary rule into a mechanical pre-emit scan: before emitting, look for `AE\d+`, `R\d+`, `F\d+`, `A\d+`, `U\d+` patterns and replace with plain names.
2. Numerical attestation ("all nine requirements, all three flows, all five acceptance examples") read as the agent showing its work; "covers the full brainstorm scope" already conveys the claim and the count adds nothing the user can affirm. Add as a named anti-pattern alongside synthesis-as-plan-pitch.

Both are wording-polish refinements on top of structural rules that are now landing. Reference-only changes; no SKILL.md inline updates needed since these refine quality, not gate firing.
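The pre-emit scan for bare IDs is a plain regex check. A minimal sketch, assuming the five prefixes named above are the whole set; in the skill the scan is an instruction the agent follows rather than executed code, and `find_bare_ids` is a hypothetical helper name.

```python
import re

# One pattern covering AE\d+, R\d+, F\d+, A\d+ and U\d+.
# The word boundaries keep it from firing inside longer tokens such as "PR822".
BARE_ID = re.compile(r"\b(?:AE|R|F|A|U)\d+\b")


def find_bare_ids(text: str) -> list[str]:
    """Return every bare ID reference found in a draft synthesis."""
    return BARE_ID.findall(text)


draft = "Covers AE1-AE3 and R6; see PR822 for history."
print(find_bare_ids(draft))  # ['AE1', 'AE3', 'R6']
```

Any hit would be replaced with the plain name of the case before the synthesis is emitted.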

Fresh-session test still produced "The touch surface is X (subpaths), Y (subpaths), Z..." enumeration in paragraph 2 plus "all 9 requirements" numerical attestation. The rules forbidding both were in the reference but the touch-surface prohibition was buried in a comma-separated list of NOTs and the file-path scan didn't exist yet.

Promote both to load-reliable inline placement in SKILL.md and add the file-path scan as a pre-emit mechanical check:

- "Do NOT enumerate the touch surface" gets its own bold inline paragraph in both Phase 0.7 and Phase 5.1.5. Names trigger phrases ("The touch surface is...", "This plan touches...", "The implementation reaches into...", "Files modified include...") so the agent recognizes the pattern even when the buried rule misses.
- Pre-emit scan rule expanded from bare-IDs-only to bare-IDs + file paths. Same mechanical shape: before emitting, scan for `path/like.md` / `path/like.py` patterns and cut unless the path IS the topic of an explicit fork in the call-outs.
- Reference section reorganized: source-vocabulary covers vocab choice; a separate "Pre-emit mechanical checks" bullet groups both scans with examples of allowed vs forbidden path usage.
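The file-path scan and the trigger-phrase check can be sketched the same way. This is an illustration under assumptions: the extension list is limited to the two the commit message names (`.md`, `.py`), the `allowed` parameter stands in for "the path is the topic of an explicit fork in the call-outs", and both function names are hypothetical.

```python
import re

# Matches tokens shaped like path/like.md or path/like.py (assumed extension set).
PATH_LIKE = re.compile(r"\b[\w.-]+(?:/[\w.-]+)*\.(?:md|py)\b")

# Trigger phrases quoted in the commit message.
TRIGGER_PHRASES = (
    "The touch surface is",
    "This plan touches",
    "The implementation reaches into",
    "Files modified include",
)


def find_paths(text: str, allowed: frozenset[str] = frozenset()) -> list[str]:
    """Return path-like tokens that are not the subject of a call-out fork."""
    return [p for p in PATH_LIKE.findall(text) if p not in allowed]


def find_trigger_phrases(text: str) -> list[str]:
    """Return the touch-surface trigger phrases present in the text."""
    lowered = text.lower()
    return [p for p in TRIGGER_PHRASES if p.lower() in lowered]


draft = "The touch surface is skills/ce-plan/SKILL.md and references/synthesis-summary.md."
print(find_paths(draft))
# ['skills/ce-plan/SKILL.md', 'references/synthesis-summary.md']
print(find_trigger_phrases(draft))
# ['The touch surface is']
```

A path passed in `allowed` survives the scan, mirroring the exception for paths that are themselves the fork the user is asked to decide.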

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 08476de9a7

Address PR review feedback (#822)

I'd copy-pasted the same "Synthesis is pre-plan-write" rule into both Phase 0.7 (solo) and Phase 5.1.5 (brainstorm-sourced), naming "brainstorm + research + agent posture" as the inputs available at synthesis time. That's correct for Phase 5.1.5 (which fires after Phase 1 research and from an upstream brainstorm doc), but wrong for Phase 0.7: the solo variant fires when no brainstorm exists AND before Phase 1 research runs. Naming sources that aren't there can push the agent to fabricate grounding or overstate confidence.

Phase 0.7 now names what's actually available (the user's request, the Phase 0.4 bootstrap dialogue, and the agent's internal three-bucket draft) and explicitly says Phase 1 research has not happened yet and there is no upstream brainstorm. Phase 5.1.5 unchanged; its phrasing is accurate for that variant.

Adds an eval suite that tests whether ce-sessions findings preserve terminology resolution context: specifically, whether distinctive coined terms and their resolution rationale survive the session-historian synthesis step intact.

Four test cases with ground truth from recently merged PRs:

- synthesis-gate-recovery (PR #822): distinctive term recovery
- mode-headless-semantic-alignment (PR #813): multi-piece nuance
- tangential-term-recovery: indexing-gap test
- near-miss-false-positive: discriminating-power test

Two-stage grader: programmatic substring match per criticality tier, plus LLM-graded context preservation. Variance protocol: 3 runs per eval.

This suite was built during PR #838's design exploration to validate a load-bearing assumption (that ce-sessions findings could feed ce-compound Phase 2.4's vocabulary scan). That assumption was ultimately retired in favor of doc-and-conversation-only scanning, so the suite is not load-bearing for PR #838. Kept as future infrastructure for validating ce-sessions's behavior as the skill evolves, e.g., when changing the session-historian synthesis prompt or adjusting scan-window defaults.

Iteration-1 results (executed via skill-creator framework, captured to /tmp/compound-engineering/ce-sessions/evals/iteration-1/) showed ce-sessions preserved terminology strongly across all 4 evals with 100% must-tier recall and 0% stddev, but this is a capability test of the skill in isolation, not a test of any specific integration.
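The first grader stage and the variance protocol can be approximated in a few lines. This is a sketch, not the suite's actual grader: only the "must" tier name appears in the commit message, so the "should" tier, the case-insensitive matching, the use of population standard deviation, and the function names are all assumptions. The LLM-graded second stage is not modeled.

```python
from statistics import mean, pstdev


def tier_recall(output: str, terms_by_tier: dict[str, list[str]]) -> dict[str, float]:
    """Stage 1: fraction of expected terms found as substrings, per criticality tier."""
    lowered = output.lower()
    recall = {}
    for tier, terms in terms_by_tier.items():
        hits = sum(1 for term in terms if term.lower() in lowered)
        # An empty tier counts as fully recalled (assumed convention).
        recall[tier] = hits / len(terms) if terms else 1.0
    return recall


def summarize_runs(
    outputs: list[str], terms_by_tier: dict[str, list[str]], tier: str = "must"
) -> tuple[float, float]:
    """Variance protocol: mean and standard deviation of one tier's recall across runs."""
    scores = [tier_recall(out, terms_by_tier)[tier] for out in outputs]
    return mean(scores), pstdev(scores)


# Hypothetical ground truth and three hypothetical run outputs for one eval.
ground_truth = {"must": ["synthesis gate"], "should": ["load-failure"]}
runs = [
    "The synthesis gate was inlined to survive a load-failure case.",
    "Session notes: the Synthesis Gate now fires reliably.",
    "Inlined the synthesis gate templates into SKILL.md.",
]
print(summarize_runs(runs, ground_truth))  # (1.0, 0.0)
```

A result of `(1.0, 0.0)` for the must tier corresponds to the "100% must-tier recall and 0% stddev" shape reported for iteration 1, though the data here is made up for illustration.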

chatgpt-codex-connector Bot reviewed May 12, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/SKILL.md Outdated

tmchow added 3 commits May 14, 2026 12:06

tmchow force-pushed the tmchow/ce-plan-synthesis-gate branch from 322c521 to 0c0afc8 Compare May 14, 2026 19:07

tmchow added 4 commits May 14, 2026 12:41

chatgpt-codex-connector Bot reviewed May 14, 2026

Comment thread plugins/compound-engineering/skills/ce-plan/SKILL.md Outdated

tmchow merged commit 39cb9da into main May 15, 2026
2 checks passed

github-actions Bot mentioned this pull request May 15, 2026

chore: release main #834

Merged

LLMpsycho pushed a commit to LLMpsycho/compound-engineering-plugin that referenced this pull request May 21, 2026

fix(ce-plan): inline synthesis gate output into SKILL.md (EveryInc#822)

edfe4c3

fix(ce-plan): inline synthesis gate output into SKILL.md #822

tmchow merged 8 commits into main from tmchow/ce-plan-synthesis-gate