
tmchow · 2026-04-26T04:52:26Z

Summary

On dispatches with no relevant prior sessions, ce-session-historian was running for 17+ minutes with 33 tool calls and then hitting a stream-idle timeout, when the correct answer should arrive in under a minute. The agent had no efficient way to filter sessions by content, treated repo membership as a sufficient relevance signal, and received a verbose dispatch prompt that licensed it to keep widening the search.

Fixes #696

What changed

`ce-session-inventory` gains a `--keyword` filter

extract-metadata.py adds an opt-in --keyword K1[,K2,...] mode that scans only user and assistant text content (not JSONL metadata, tool calls, or thinking blocks), filters out zero-match sessions, and emits per-session match_count plus per-keyword counts. This replaces the pattern of hand-rolling per-file grep -l invocations across multi-MB JSONL files.
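A minimal Python sketch of the content-only scan described above. It assumes a Claude-style JSONL layout (records whose `type` is `user` or `assistant`, with a `message.content` that is either a string or a list of typed blocks); the function names `user_assistant_text` and `keyword_counts` and the `keywords` output field are hypothetical, not the script's actual API.

```python
import json
from typing import Iterable


def user_assistant_text(jsonl_lines: Iterable[str]) -> str:
    """Join only the text a user or the model wrote; everything else is ignored.

    Assumed (hypothetical) record shapes:
      {"type": "user", "message": {"content": "..."}}
      {"type": "assistant", "message": {"content": [{"type": "text", "text": "..."}, ...]}}
    """
    parts = []
    for line in jsonl_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or corrupt lines instead of failing the scan
        if not isinstance(record, dict) or record.get("type") not in ("user", "assistant"):
            continue  # non-conversation records are never scanned
        message = record.get("message")
        content = message.get("content") if isinstance(message, dict) else None
        if isinstance(content, str):
            parts.append(content)
        elif isinstance(content, list):
            for block in content:
                # Only text blocks count; tool_use, tool_result and thinking are skipped.
                if isinstance(block, dict) and block.get("type") == "text":
                    parts.append(block.get("text") or "")
    return "\n".join(parts)


def keyword_counts(text: str, keywords: list) -> dict:
    """Case-insensitive substring counts; a session matches if ANY keyword hits (OR)."""
    haystack = text.lower()
    per_keyword = {kw: haystack.count(kw.lower()) for kw in keywords}
    return {"match_count": sum(per_keyword.values()), "keywords": per_keyword}
```

Under this sketch a session is kept only when `match_count` is greater than zero. It uses substring rather than whole-word matching, which is an assumption based on the "case-insensitive substring scan" wording in the first commit on this branch.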

The keyword scan runs only after --cwd-filter passes, so cross-repo Codex sessions are not scanned just to be discarded. The empty-input branch emits files_matched: 0 when --keyword was supplied, so callers that gate on it can short-circuit cleanly. For Codex sessions specifically, the <system_instruction>...</system_instruction> wrapper is stripped before counting, so environment terms like "Conductor" do not false-match.
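The ordering and the empty-input rule can be sketched as a small driver. `build_inventory`, `cwd_matches`, and `count_matches` are hypothetical stand-ins for the script's internals, and `_meta` here models only `files_matched`, not the script's other metadata fields.

```python
from typing import Callable, Iterable, Optional


def build_inventory(
    paths: Iterable[str],
    keywords: Optional[list],
    cwd_matches: Callable[[str], bool],
    count_matches: Callable[[str, list], int],
) -> dict:
    """Hypothetical driver: cheap cwd filter first, expensive keyword scan second."""
    keyword_mode = keywords is not None
    sessions = []
    for path in paths:
        if not cwd_matches(path):
            continue  # cross-repo session: discarded before any content is read
        if not keyword_mode:
            sessions.append({"path": path})
            continue
        count = count_matches(path, keywords)
        if count > 0:  # zero-match sessions are filtered out
            sessions.append({"path": path, "match_count": count})
    meta = {}
    if keyword_mode:
        # Emitted even when there was no input at all, so files_matched == 0 is unambiguous.
        meta["files_matched"] = len(sessions)
    return {"sessions": sessions, "_meta": meta}
```

Because the cwd check runs first, a Codex session from another repo costs one cheap predicate call rather than a full-file read.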

`ce-session-historian` adds a relevance gate before extraction

Step 3 is now an explicit numbered decision sequence: branch filter first; if that yields zero candidates, run --keyword and stop on files_matched: 0 without extracting; otherwise apply the hard cap of 5 deep-dives and extract. A new top-level guardrail (Never extract a session to verify whether it is relevant) makes the gate binding, since prose-level priority lists did not prevent extract-to-verify behavior. Tail extraction is now conditional: it runs only when head:200 ends mid-investigation. The gitBranch caveat (the branch is captured at the first user message only, so a branch miss is not conclusive) is documented inline. The time budget drops the 5-7 minute target wording in favor of "stop when complete"; structural caps in Step 3 and Step 4 bound runtime by construction.
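The agent definition is prose, not code, but the Step 3 gate reads naturally as a function. The sketch below uses hypothetical names (`step3_select`, `Candidate`, `keyword_filter`) and leaves out window and current-session filtering so the gate itself stays visible.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

DEEP_DIVE_CAP = 5  # hard cap on deep-dives from Step 3


@dataclass
class Candidate:
    session_id: str
    git_branch: str  # recorded at the first user message only


def step3_select(
    candidates: List[Candidate],
    target_branch: str,
    keyword_filter: Callable[[List[Candidate]], List[Candidate]],
) -> Tuple[List[Candidate], str]:
    """Return (sessions to extract, outcome). Nothing is extracted just to check relevance."""
    # 1. Branch filter first.
    selected = [c for c in candidates if c.git_branch == target_branch]
    if not selected:
        # 2. No branch match. A branch miss is not conclusive, so fall back to the
        #    inventory's keyword filter instead of extracting sessions to verify them.
        selected = keyword_filter(candidates)
        if not selected:
            # 3. files_matched == 0: stop here; the extract step is never invoked.
            return [], "no relevant prior sessions"
    # 4. Apply the hard cap before any extraction.
    return selected[:DEEP_DIVE_CAP], "extract"
```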

`/ce-compound` dispatch tightens

The Session Historian dispatch in Phase 1 was a long context block with topic-keyword bullets that licensed widening. The new dispatch is a 5-field schema: pre-resolved repo and branch, 7-day window, one-sentence problem topic, one-line filter rule, fixed output schema. Pre-resolution uses a case on absolute-vs-relative output from git rev-parse --git-common-dir, which correctly handles repo root, subdirectory, and linked-worktree invocations.
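A Python sketch of the absolute-vs-relative resolution, assuming POSIX paths and that the common directory is a `.git` directory directly under the repo root (a bare repository would not fit that assumption). The dispatch itself uses a shell `case`; `repo_name_from_common_dir` and `current_repo_name` are hypothetical helpers that mirror the same logic.

```python
import os
import posixpath
import subprocess


def repo_name_from_common_dir(common_dir: str, cwd: str) -> str:
    """Derive the repo name from `git rev-parse --git-common-dir` output (POSIX paths)."""
    if posixpath.isabs(common_dir):
        # Absolute output, e.g. from a linked worktree: points at the main repo's .git.
        git_dir = common_dir
    else:
        # Relative output: ".git" at the repo root, "../.git" from a subdirectory.
        git_dir = posixpath.join(cwd, common_dir)
    # The repo root is the directory containing the normalized .git directory.
    return posixpath.basename(posixpath.dirname(posixpath.normpath(git_dir)))


def current_repo_name() -> str:
    """Run git in the current directory and resolve the repo name."""
    out = subprocess.run(
        ["git", "rev-parse", "--git-common-dir"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return repo_name_from_common_dir(out, os.getcwd())
```

For comparison, `posixpath.dirname("../.git")` is `..`, which matches the wrong repo name that the later review fix describes for subdirectory invocations.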

AGENTS.md gains a "Validating Agent and Skill Changes" section

The section documents how to test agent and skill changes correctly: use the skill-creator skill, which spawns a generic subagent and injects content from disk at dispatch time. Both plugin agents and skills cache at session start, so dispatching the typed agent or invoking via the Skill tool inside the same session tests cached pre-edit content. Editing ~/.claude/plugins/cache/ or ~/.claude/plugins/marketplaces/ to force a reload is explicitly called out as wrong.

Test plan

tests/session-history-scripts.test.ts adds 12 new cases under --keyword mode covering single-keyword filtering, zero-match exclusion, OR semantics, case insensitivity, content-only scanning (sessionId / gitBranch / tool names do not false-match), CWD-filter ordering with --keyword, empty-input emits files_matched: 0, and the Codex system_instruction strip. bun test passes 951/951; bun run release:validate clean.

Agent prose changes are validated via the skill-creator pattern documented in the new AGENTS.md section: spawn a general-purpose subagent with the agent definition injected from disk. Three scenarios were exercised during this work (sparse-mismatch, branch-match success, keyword-match without branch-match), and all three behaved as designed.

… tighten dispatch

Sparse-history dispatches to ce-session-historian were running 17+ minutes with 33 tool calls before stream-idle-timing-out, when the correct answer ("no relevant prior sessions") should arrive in under a minute. Six gaps were compounding:

- No skill primitive for "search inventory by keyword across sessions", forcing the agent to roll 20 by-hand `grep -l` invocations across MB-sized JSONL files.
- Soft "typically 2-5 sessions" guidance with no hard cap.
- Tail-after-head extraction allowed unconditionally; ran reflexively on half of selected sessions.
- Verbose dispatch prompt from /ce-compound with topic-keyword bullets that licensed the agent to keep widening.
- Repo name not pre-resolved by caller; agent burned its first turn deriving via git rev-parse.
- No wall-time budget anywhere in the agent.

This change:

- Adds an opt-in --keyword K1[,K2,...] mode to ce-session-inventory's extract-metadata.py. When set, the script does a full-file case-insensitive substring scan, filters out zero-match sessions, and emits per-session match_count plus per-keyword counts. _meta gains files_matched.
- Tightens ce-session-historian: 5-7 min wall budget, hard cap of 5 deep-dives, conditional tail-extract, gitBranch start-of-session limitation note, Step 3 #4 now points to --keyword instead of by-hand grep.
- Tightens /ce-compound: pre-resolves repo + branch via backtick syntax, rewrites the historian dispatch block as a 5-field schema (pre-resolved context, time window, problem topic, filter rule, output schema).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 4d753c08ea

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

…d evals scaffolding

The first iteration on session-historian (this PR's prior commit) tightened prose around extraction caps, conditional tail extraction, and a keyword-search hint, but did not stop the agent's strongest default behavior: when the candidate list is small enough to fit under the cap, the agent extracts every returned session "to verify" rather than running the keyword filter first. On sparse-mismatch dispatches that means deep-extracting 3 unrelated sessions instead of issuing one --keyword invocation that returns 0 matches in a single call.

This commit replaces the soft Step 3 priority list with an explicit numbered decision sequence:

- Branch filter first.
- If branch filter is empty, run ce-session-inventory --keyword with keywords derived from the dispatch problem topic.
- If files_matched is 0, return "no relevant prior sessions" and STOP -- ce-session-extract is not invoked.

Step 4 gains an explicit "only run if Step 3 produced selected sessions" guard, and a new guardrail at the top forbids extraction-to-verify outright. The time-budget block drops the 5-7 minute target wording (which read as a target rather than a max) in favor of "stop when complete; structural caps bound runtime by construction."

Adds top-level evals/ for repo-only LLM-driven behavioral checks. The session-historian eval covers the sparse-history scenario via synthetic ~/.claude/projects/-tmp-eval-... fixtures and a generic-subagent dispatch pattern (inject the agent definition into a general-purpose subagent rather than dispatching the typed agent, which reads cached definitions from session start). Documents the cache caveat in evals/README.md.

Validated on the sparse-mismatch scenario:

- Before: 4 tool calls (inventory + 3 deep extracts), agent ignored keyword filter even with explicit prose guidance.
- After: 2 tool calls (inventory + --keyword filter), zero deep extractions, correct response. Wall time 55s, well under the 60s soft target.

Full results in evals/session-historian/results-2026-04-25.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

…ll changes

When iterating on a plugin agent or skill, dispatching the typed agent in the same session does not test repo edits: the agent definition is loaded once at session start and cached in memory. The previous PR commit added an evals/ scaffold but pointed to "restart the session" as the validation path, which is correct but heavy.

The right tool is the skill-creator skill: it spawns a generic subagent and injects the agent or skill content into the subagent's prompt at dispatch time, so each run reads from current disk and iteration works inside a single session.

Adds a "Validating Agent and Skill Changes" section to repo-root AGENTS.md that names skill-creator as the primary path, calls out the session-start caching gotcha, and explicitly warns against editing ~/.claude/plugins/ to try to force a reload; that path was tried during this work, did not bypass the cache, and is not a valid testing technique.

Updates evals/README.md and the session-historian eval docs to point to skill-creator first, restart as a fallback only, and to reframe the "file-sync under ~/.claude" narrative so it doesn't read as a recommended approach.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: 79add10c20


…scaffolding

Two changes that emerged from validating the prior commits:

1. --keyword was scanning JSONL metadata, not user/assistant content. Common topic words like "session" matched every file via the sessionId field: on a 4-session keyword-match fixture, only 1 had real topical content, but all 4 returned non-zero match_count from metadata noise. extract-metadata.py now extracts user message text + assistant text blocks first and scans only those, skipping JSONL field names, tool_use blocks (tool names + inputs), tool_result blocks, and thinking blocks. Verified on the same fixture: files_matched drops from 4 to 1, only the real match remains. Adds 3 tests:
   - sessionId / gitBranch / parentUuid as keywords return zero matches against the Claude fixture (would have all matched under the prior impl)
   - "Edit" as a keyword does not match against tool_use names in the fixture
   - "auth" still matches against actual user/assistant text content

2. Drops the evals/ scaffolding added in earlier commits. The skill-creator skill is the canonical tool for evaluating agent and skill changes: it has its own conventions (evals/evals.json, <skill>-workspace/) and a purpose-built workflow. The repo-local scaffolding under evals/ was a one-off investigation artifact that doesn't conform to skill-creator's shape and would silently rot when agent definitions change. The durable lessons from that work are kept in repo-root AGENTS.md (skill-creator as the canonical path; agents AND skills both cache at session start; never edit ~/.claude/plugins/ to test). Removes the evals/ entry from the AGENTS.md Directory Layout section in the same commit so the directory reference doesn't outlive the directory.

The two positive-path scenarios (branch-match success, keyword-match without branch-match) were both validated during this work via the skill-creator dispatch pattern. Results captured in the PR description and commit history; they don't need a committed results doc.
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
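The metadata false-match and the three regression cases can be reproduced with a toy fixture. Both the fixture and the helpers (`content_text`, `raw_count`, `content_count`) are hypothetical; `raw_count` approximates the earlier full-file substring scan, and the record layout is an assumed Claude-style JSONL shape.

```python
import json


def content_text(record: dict) -> str:
    """User/assistant text only; metadata fields and tool/thinking blocks are ignored."""
    if record.get("type") not in ("user", "assistant"):
        return ""
    message = record.get("message")
    content = message.get("content") if isinstance(message, dict) else None
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            block.get("text") or ""
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        )
    return ""


def raw_count(lines, keyword):
    # Approximates the prior behavior: substring scan over whole JSONL lines, field names included.
    return sum(line.lower().count(keyword.lower()) for line in lines)


def content_count(lines, keyword):
    # Content-only scan: parse each line and count only user/assistant text.
    return sum(content_text(json.loads(line)).lower().count(keyword.lower()) for line in lines)


# Hypothetical miniature fixture in the assumed Claude JSONL shape.
FIXTURE = [
    json.dumps({"type": "user", "sessionId": "s-1", "gitBranch": "main", "parentUuid": None,
                "message": {"content": "Why does auth fail after login?"}}),
    json.dumps({"type": "assistant", "sessionId": "s-1", "gitBranch": "main", "parentUuid": "u-1",
                "message": {"content": [
                    {"type": "tool_use", "name": "Edit", "input": {"path": "login.py"}},
                    {"type": "text", "text": "The auth token TTL was too short."},
                ]}}),
]
```

Every metadata keyword and the tool name hit under `raw_count` but score zero under `content_count`, while "auth" still counts twice from real conversation text.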

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: 7b65f188b5


- Pre-resolved repo name: distinguish absolute vs relative output from git rev-parse --git-common-dir. The previous case-on-".git" check failed from a normal repo subdirectory (where the command returns ../.git, not .git or an absolute path), making the prompt resolve to ".." instead of the repo name. Applied to ce-compound, ce-sessions, and the ce-session-historian agent's Step 1 example.
- extract-metadata.py: defer the full-file --keyword scan until after --cwd-filter passes. Previously process_file ran the keyword scan before the cwd_filter check, which on Codex (cross-repo discovery) wasted scanning on sessions immediately discarded by the filter, recreating the long-runtime behavior this work is trying to eliminate.
- extract-metadata.py: emit files_matched: 0 in the empty-input _meta branch when --keyword was supplied. Without it, no-result keyword scans were ambiguous to the historian's "files_matched: 0 -> stop" rule.
- Tests: added a Codex cross-repo-filtered + --keyword regression test and an empty-stdin + --keyword test covering the previously-uncovered no-input branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

- Strip Codex <system_instruction>...</system_instruction> wrapper from the keyword-scan corpus in _extract_user_assistant_text. Codex prepends the wrapper to event_msg.user_message payloads (e.g., "You are working inside Conductor."), and counting matches against that text produced false positives on environment-label terms. Mirrors the existing split in extract-skeleton.py.
- Test: searching the Codex fixture for "Conductor" now returns zero matches, since "Conductor" only appears inside the wrapper.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
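A regex-based sketch of the wrapper strip. The commit applies the strip inside `_extract_user_assistant_text` and says it mirrors a split in extract-skeleton.py, whose exact mechanics are not shown here; `strip_system_instruction` is a hypothetical standalone helper.

```python
import re

# Non-greedy and DOTALL so multi-line wrappers are removed and the text after them survives.
_WRAPPER = re.compile(r"<system_instruction>.*?</system_instruction>", re.DOTALL)


def strip_system_instruction(text: str) -> str:
    """Drop Codex-style <system_instruction> wrappers before keyword counting."""
    return _WRAPPER.sub("", text).strip()
```

After stripping, an environment label such as "Conductor" that appears only inside the wrapper no longer contributes to `match_count`.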

chatgpt-codex-connector

💡 Codex Review


Reviewed commit: 9a8291047a


- Reorder Step 3 sub-steps in ce-session-historian.agent.md: drop out-of-window sessions and exclude the current session BEFORE applying the 5-session deep-dive cap. Previous order capped first, which could discard all in-window candidates when high-scoring older sessions occupied the cap slots, leaving the agent to falsely return "no relevant prior sessions" even when valid in-window matches existed further down the candidate list. Tie-breaker rules (branch-match → match_count → file size → recency) and STOP semantics unchanged.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
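Filter-then-cap with the stated tie-breakers, as a Python sketch. The `Session` fields, the 7-day default window (borrowed from the /ce-compound dispatch), and `pick_deep_dives` are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Session:
    session_id: str
    branch_match: bool
    match_count: int
    size_bytes: int
    last_active: datetime


def pick_deep_dives(sessions: List[Session], current_id: str, now: datetime,
                    window: timedelta = timedelta(days=7), cap: int = 5) -> List[Session]:
    # Filter BEFORE capping so stale or current sessions cannot occupy cap slots.
    eligible = [
        s for s in sessions
        if s.session_id != current_id and now - s.last_active <= window
    ]
    # Tie-breakers, highest first: branch match, match_count, file size, recency.
    eligible.sort(
        key=lambda s: (s.branch_match, s.match_count, s.size_bytes, s.last_active),
        reverse=True,
    )
    return eligible[:cap]
```

With five high-scoring but stale sessions, capping first would fill all five slots with them, and the later window check would discard every one, producing the false "no relevant prior sessions" outcome the commit fixes. Filtering first leaves the in-window session selectable.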

… tighten dispatch (EveryInc#699)

Co-authored-by: Claude Opus 4.7 (1M context) <[email protected]>

chatgpt-codex-connector Bot reviewed Apr 26, 2026


Comment thread plugins/compound-engineering/skills/ce-compound/SKILL.md Outdated

Comment thread plugins/compound-engineering/skills/ce-session-inventory/scripts/extract-metadata.py Outdated

tmchow and others added 2 commits April 25, 2026 22:24

chatgpt-codex-connector Bot reviewed Apr 26, 2026


Comment thread plugins/compound-engineering/skills/ce-session-inventory/scripts/extract-metadata.py

chatgpt-codex-connector Bot reviewed Apr 26, 2026


Comment thread plugins/compound-engineering/skills/ce-session-inventory/scripts/extract-metadata.py Outdated

tmchow and others added 2 commits April 25, 2026 23:25

chatgpt-codex-connector Bot reviewed Apr 26, 2026


Comment thread plugins/compound-engineering/agents/ce-session-historian.agent.md Outdated

tmchow merged commit a91270c into main Apr 26, 2026
2 checks passed

github-actions Bot mentioned this pull request Apr 26, 2026

chore: release main #684

Merged

tmchow mentioned this pull request Apr 26, 2026

fix(skills): replace case statements blocked by permission check #701

Merged


fix(session-historian): cap deep-dives, add keyword filter primitive, tighten dispatch #699

tmchow merged 7 commits into main from fix/ce-session-historian-sparse-history
