
tmchow · 2026-03-20T21:57:05Z

Summary

The repo-research-analyst agent previously jumped straight into open-ended exploration without grounding itself in the project's actual technology stack. This caused unfocused research, wasted tool calls on ecosystems that don't exist, and missed context that downstream planning agents need.

This PR adds a structured Phase 0 scan that runs before the existing research phases, and updates ce-plan-beta to leverage the structured output.

What Phase 0 does

0.1 Root-level discovery: One broad glob to detect which ecosystems are present, then selective manifest reads (only files that exist -- no globbing for go.mod in a TypeScript repo)
0.1b Monorepo detection: Reuses data already read in 0.1 to identify workspace structure and scope subsequent scans to the relevant service
0.2 Infrastructure & API surface: Conditional checks with explicit skip rules -- entire categories are skipped when 0.1 rules them out (e.g., no API/data layer globs for a pure CLI tool with no backend)
0.3 Module structure: Scans internal boundaries within the detected scope
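The selective-read idea in step 0.1 can be sketched roughly as follows. This is an illustration only: the agent is defined by a prompt file, not by this code, and the `MANIFESTS` mapping and both function names are assumptions introduced here, not taken from the PR.

```python
# Hypothetical mapping from manifest filename to ecosystem. The real agent
# prompt may recognize a different or longer list; this one is an assumption.
MANIFESTS = {
    "package.json": "node",
    "go.mod": "go",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
    "Gemfile": "ruby",
}


def detect_ecosystems(root_listing):
    """Return {ecosystem: manifest} for manifests that appear in the root listing."""
    present = set(root_listing)
    return {eco: name for name, eco in MANIFESTS.items() if name in present}


def manifests_to_read(root_listing):
    """Only manifests that actually exist are queued for reading."""
    return sorted(detect_ecosystems(root_listing).values())


# A TypeScript repo: go.mod is never looked for because it is not in the listing.
listing = ["package.json", "tsconfig.json", "src", "README.md"]
print(detect_ecosystems(listing))
print(manifests_to_read(listing))
```

The point of the sketch is that one listing drives every later decision, so no extra tool call is spent probing for ecosystems that the listing already rules out.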

The output is a Technology & Infrastructure section at the top of every research summary, reporting languages, frameworks, deployment model, API styles, data stores, and monorepo layout. Absence is reported as an explicit signal ("None detected") rather than being silently omitted.
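One way to picture the "absence as an explicit signal" rule is a renderer that never drops an empty field. The field names below come from the list above; the function name, the input shape, and the exact line format are assumptions made for illustration.

```python
# Field names follow the PR description; the rendering format is hypothetical.
FIELDS = [
    "Languages",
    "Frameworks",
    "Deployment model",
    "API styles",
    "Data stores",
    "Monorepo layout",
]


def render_section(findings):
    """Render every field, writing 'None detected' instead of omitting empties."""
    lines = ["Technology & Infrastructure"]
    for name in FIELDS:
        values = findings.get(name) or []
        lines.append(f"{name}: {', '.join(values) if values else 'None detected'}")
    return "\n".join(lines)


print(render_section({"Languages": ["TypeScript"], "Frameworks": []}))
```

Because every field is always emitted, a downstream planner can distinguish "checked and found nothing" from "not checked".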

ce-plan-beta integration

Updated the external research decision logic to leverage Phase 0's structured output:

Pass detected framework versions to framework-docs-researcher for version-specific lookups
Skip external research when the technology layer is well-established locally
Lean toward external research when the technology layer is absent or thin
Scope research to the relevant monorepo service, not the aggregate
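The four rules above could be expressed as a small decision function. This is a sketch under stated assumptions, not the logic in ce-plan-beta's SKILL.md: the `TechScan` shape, the `local_pattern_count` measure, and the `thin_threshold` value are all hypothetical stand-ins for whatever "well-established" and "thin" mean in practice.

```python
from dataclasses import dataclass, field


@dataclass
class TechScan:
    # Hypothetical summary of Phase 0 output for one scope.
    frameworks: dict = field(default_factory=dict)  # framework name -> version
    local_pattern_count: int = 0  # assumed proxy for "well-established locally"
    scope: str = "."  # the relevant monorepo service, not the aggregate


def plan_external_research(scan, thin_threshold=2):
    """Lean toward research when the layer is absent or thin; otherwise skip."""
    absent_or_thin = not scan.frameworks or scan.local_pattern_count < thin_threshold
    return {
        "decision": "research" if absent_or_thin else "skip",
        "scope": scan.scope,
        # Versions are passed along so doc lookups can be version-specific.
        "framework_versions": dict(scan.frameworks),
    }


print(plan_external_research(TechScan()))
print(plan_external_research(TechScan({"example-framework": "1.0"}, 5, "apps/web")))
```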

Eval results

Tested with 3 scenarios (basic scan, scoped monorepo, absent tech layer), each comparing with-skill vs baseline:

Eval	With Skill	Baseline	Delta
Basic scan	44 calls, 57K tokens, 204s	68 calls, 86K tokens, 415s	35% fewer calls, 34% fewer tokens, 2x faster
Scoped monorepo	Phase 0 in 4 calls	No structured scan	Efficient scoping
Absent tech	Explicit "None detected"	Absence buried in prose	Clearer signal for planning

Token efficiency was the main iteration focus. The conditional skip rules in 0.2 are the key optimization -- they prevent 7+ wasted globs when the manifest already shows the project has no backend framework, database, or server dependencies.
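For readers checking the Delta column, the basic-scan percentages follow directly from the raw figures in the table. The helper name below is an assumption; the inputs are the numbers reported above.

```python
def pct_fewer(with_skill, baseline):
    """Percentage reduction relative to the baseline, rounded to a whole percent."""
    return round(100 * (baseline - with_skill) / baseline)


calls = pct_fewer(44, 68)            # tool calls
tokens = pct_fewer(57_000, 86_000)   # tokens
speedup = round(415 / 204, 2)        # baseline seconds / with-skill seconds

print(calls, tokens, speedup)  # 35 34 2.03
```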

Test plan

Run repo-research-analyst against this repo (basic scan) -- produces structured T&I section
Run with scoped monorepo context -- Phase 0 completes in 4 tool calls
Run with absent tech query (gRPC) -- absence explicitly reported
Verify ce-plan-beta SKILL.md integrates with Phase 0 output
Confirm no changes to release-owned files (plugin.json, marketplace.json, CHANGELOG.md)

The repo-research-analyst agent previously jumped straight into open-ended exploration without first grounding itself in the project's actual technology stack. This led to unfocused research and missed context that downstream planning agents (like ce-plan-beta) needed to make sharp decisions about external research.

This adds a structured Phase 0 scan that runs before the existing research phases:

- 0.1 Root-level discovery: single broad glob to detect ecosystems, then selective manifest reads (only files that exist)
- 0.1b Monorepo detection: reuses 0.1 data to identify workspace structure and scope subsequent scans
- 0.2 Infrastructure & API surface: conditional checks that skip entire categories when 0.1 rules them out (e.g., no API/data layer globs for a pure CLI tool)
- 0.3 Module structure: scans internal boundaries within the detected scope

The scan output is a Technology & Infrastructure section at the top of every research summary, reporting languages, frameworks, deployment model, API styles, data stores, and monorepo layout.

ce-plan-beta's SKILL.md is updated to leverage this structured output for smarter external research routing -- using detected framework versions for targeted doc lookups and skipping external research when local patterns are already established.

Tested with 3 eval scenarios (basic scan, scoped monorepo, absent tech layer), each with skill vs baseline comparisons:

- Basic scan: 44 vs 68 tool calls, 57K vs 86K tokens (2x faster)
- Scoped monorepo: Phase 0 completed in 4 tool calls
- Absent tech: correctly reported absence of gRPC/GraphQL/REST as explicit signal rather than omitting it

Token efficiency was the main iteration focus -- the conditional skip rules prevent wasted globs when 0.1 already shows the project has no backend framework, database, or server dependencies.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aeaf8b25b2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

- Read workspace config files (pnpm-workspace.yaml, nx.json, lerna.json) when detected in root listing so monorepo scoping can extract actual workspace paths
- Require root listing to show no API-related directories/files before skipping API surface checks, preventing false negatives for Go/Node stdlib servers
- Check for infra files within scoped monorepo subtrees, not just root
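The second fix in this commit, the stricter skip guard, might look something like the sketch below. The `API_HINTS` names are hypothetical examples of "API-related directories/files"; the commit message does not list them, and the function name is likewise an assumption.

```python
# Hypothetical examples of API-related root entries; the real list is not
# given in the commit message.
API_HINTS = {"api", "routes", "handlers", "controllers", "server", "proto"}


def can_skip_api_checks(manifest_has_server_deps, root_listing):
    """Skip API surface checks only if manifest AND root listing both rule them out."""
    if manifest_has_server_deps:
        return False
    names = {entry.rstrip("/").lower() for entry in root_listing}
    return names.isdisjoint(API_HINTS)


# A Go service using only the standard library has no server dependency in
# go.mod, but its handlers/ directory still blocks the skip.
print(can_skip_api_checks(False, ["go.mod", "handlers/", "main.go"]))
print(can_skip_api_checks(False, ["go.mod", "cmd/", "README.md"]))
```

Requiring both signals trades a few extra globs for fewer false negatives, which matches the stdlib-server case the commit calls out.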

tmchow · 2026-03-20T22:05:40Z

@codex re-review this

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63230a1e65


…e probe

- Data layer and API surface now have independent skip conditions. A CLI or worker with a database (e.g., Prisma/SQLite) but no HTTP surface will still get its data layer scanned.
- Go multi-module detection now adds a conditional */go.mod glob when Go directories are visible but no root go.mod was found.
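Both changes in this commit amount to decoupling conditions that were previously tied together. A minimal sketch, with hypothetical flag and probe names that do not appear in the PR:

```python
def plan_probes(has_http_deps, has_data_deps, root_has_go_mod, go_dirs_visible):
    """Decide which Phase 0 probes to run; each condition is evaluated on its own."""
    probes = []
    if has_http_deps:
        probes.append("api_surface")
    if has_data_deps:
        # Runs even when there is no HTTP surface, e.g. a CLI backed by SQLite.
        probes.append("data_layer")
    if go_dirs_visible and not root_has_go_mod:
        # Conditional one-level glob to catch Go multi-module layouts.
        probes.append("glob:*/go.mod")
    return probes


print(plan_probes(False, True, False, False))
print(plan_probes(False, False, False, True))
```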

tmchow changed the title from "feat: add structured Phase 0 technology scan to repo-research-analyst" to "feat: improve repo-research-analyst by adding a structured technology scan" on Mar 20, 2026

chatgpt-codex-connector Bot reviewed Mar 20, 2026


Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated

Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated

Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated

chatgpt-codex-connector Bot reviewed Mar 20, 2026


Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated

Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated

tmchow merged commit 1c28d03 into main Mar 20, 2026
2 checks passed

github-actions Bot mentioned this pull request Mar 20, 2026

chore: release main #326

Merged


feat: improve `repo-research-analyst` by adding a structured technology scan #327

tmchow merged 3 commits into main from feat/structured-repo-scan
