feat: improve repo-research-analyst by adding a structured technology scan #327
The repo-research-analyst agent previously jumped straight into open-ended exploration without first grounding itself in the project's actual technology stack. This led to unfocused research and missed context that downstream planning agents (like ce-plan-beta) needed to make sharp decisions about external research.

This adds a structured Phase 0 scan that runs before the existing research phases:

- 0.1 Root-level discovery: single broad glob to detect ecosystems, then selective manifest reads (only files that exist)
- 0.1b Monorepo detection: reuses 0.1 data to identify workspace structure and scope subsequent scans
- 0.2 Infrastructure & API surface: conditional checks that skip entire categories when 0.1 rules them out (e.g., no API/data layer globs for a pure CLI tool)
- 0.3 Module structure: scans internal boundaries within the detected scope

The scan output is a Technology & Infrastructure section at the top of every research summary, reporting languages, frameworks, deployment model, API styles, data stores, and monorepo layout.

ce-plan-beta's SKILL.md is updated to leverage this structured output for smarter external research routing -- using detected framework versions for targeted doc lookups and skipping external research when local patterns are already established.

Tested with 3 eval scenarios (basic scan, scoped monorepo, absent tech layer), each with skill vs baseline comparisons:

- Basic scan: 44 vs 68 tool calls, 57K vs 86K tokens (2x faster)
- Scoped monorepo: Phase 0 completed in 4 tool calls
- Absent tech: correctly reported absence of gRPC/GraphQL/REST as explicit signal rather than omitting it

Token efficiency was the main iteration focus -- the conditional skip rules prevent wasted globs when 0.1 already shows the project has no backend framework, database, or server dependencies.
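As a rough illustration of step 0.1, the sketch below maps a root listing to detected ecosystems and returns only the manifests that are actually present. The manifest table and the function names are assumptions made for this example; the agent's real detection list is not shown in this PR.

```python
# Hypothetical manifest-to-ecosystem table (assumption, not the agent's real list).
MANIFEST_ECOSYSTEMS = {
    "package.json": "Node.js/TypeScript",
    "go.mod": "Go",
    "Cargo.toml": "Rust",
    "pyproject.toml": "Python",
    "Gemfile": "Ruby",
}


def detect_ecosystems(root_listing):
    """Return {ecosystem: manifest} for every known manifest in the root listing."""
    present = set(root_listing)
    return {
        ecosystem: manifest
        for manifest, ecosystem in MANIFEST_ECOSYSTEMS.items()
        if manifest in present
    }


def manifests_to_read(root_listing):
    """Only manifests that exist are scheduled for a read; absent ones cost nothing."""
    return sorted(detect_ecosystems(root_listing).values())
```

For a listing such as `["README.md", "package.json", "src"]`, only `package.json` would be read, which is the behavior the "only files that exist" rule describes.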
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: aeaf8b25b2
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
- Read workspace config files (pnpm-workspace.yaml, nx.json, lerna.json) when detected in root listing so monorepo scoping can extract actual workspace paths
- Require root listing to show no API-related directories/files before skipping API surface checks, preventing false negatives for Go/Node stdlib servers
- Check for infra files within scoped monorepo subtrees, not just root
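The first two fixes can be sketched as follows. The workspace config names come from the commit message above; the `API_HINTS` set and both function names are hypothetical and stand in for whatever signals the agent actually checks.

```python
# Workspace config files named in the commit message.
WORKSPACE_CONFIGS = ("pnpm-workspace.yaml", "nx.json", "lerna.json")

# Hypothetical directory/file names treated as API-related (assumption).
API_HINTS = {"api", "routes", "handlers", "controllers", "server", "proto"}


def workspace_configs_to_read(root_listing):
    """Pick the workspace config files that appear in the root listing."""
    present = set(root_listing)
    return [name for name in WORKSPACE_CONFIGS if name in present]


def can_skip_api_scan(has_server_dependency, root_listing):
    """Skip only when the manifest AND the root listing both show no API signal.

    A Go or Node server built on the standard library has no framework
    dependency, so the directory check guards against a false negative.
    """
    names = {entry.rstrip("/").lower() for entry in root_listing}
    return not has_server_dependency and names.isdisjoint(API_HINTS)
```

Requiring both conditions trades a few extra globs for fewer missed API surfaces, which matches the intent stated in the commit.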
@codex re-review this
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 63230a1e65
…e probe

- Data layer and API surface now have independent skip conditions. A CLI or worker with a database (e.g., Prisma/SQLite) but no HTTP surface will still get its data layer scanned.
- Go multi-module detection now adds a conditional */go.mod glob when Go directories are visible but no root go.mod was found.
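A minimal sketch of these two changes, under stated assumptions: the category labels, the `GO_DIR_HINTS` heuristic for "Go directories are visible", and the function names are all invented for illustration and are not taken from the agent's prompt.

```python
# Hypothetical directory names that suggest a Go layout (assumption).
GO_DIR_HINTS = {"cmd", "pkg", "internal"}


def plan_infrastructure_checks(has_api_signal, has_data_signal):
    """Decide each category on its own signal, never as a combined skip."""
    checks = []
    if has_api_signal:
        checks.append("api_surface")
    if has_data_signal:
        checks.append("data_layer")
    return checks


def needs_nested_go_mod_glob(root_listing):
    """Run the extra */go.mod glob only if Go dirs exist without a root go.mod."""
    names = {entry.rstrip("/") for entry in root_listing}
    return "go.mod" not in names and not names.isdisjoint(GO_DIR_HINTS)
```

With independent signals, a worker that uses a database but exposes no HTTP surface yields `["data_layer"]` rather than an empty plan.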
Summary
The repo-research-analyst agent previously jumped straight into open-ended exploration without grounding itself in the project's actual technology stack. This caused unfocused research, wasted tool calls on ecosystems that don't exist, and missed context that downstream planning agents need.

This PR adds a structured Phase 0 scan that runs before the existing research phases, and updates ce-plan-beta to leverage the structured output.
What Phase 0 does
go.mod in a TypeScript repo)

The output is a Technology & Infrastructure section at the top of every research summary, reporting languages, frameworks, deployment model, API styles, data stores, and monorepo layout. Absence is reported as an explicit signal ("None detected") rather than being silently omitted.
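To make the explicit-absence rule concrete, here is one way such a section could be rendered. The field names follow the list in the paragraph above, but the exact layout and the `render_technology_section` helper are assumptions for this example, not the agent's actual output format.

```python
# Fields taken from the description above; the rendering layout is an assumption.
FIELDS = (
    "Languages",
    "Frameworks",
    "Deployment model",
    "API styles",
    "Data stores",
    "Monorepo layout",
)


def render_technology_section(findings):
    """Render every field, writing 'None detected' instead of omitting empty ones."""
    lines = ["Technology & Infrastructure"]
    for field in FIELDS:
        values = findings.get(field) or []
        rendered = ", ".join(values) if values else "None detected"
        lines.append(f"- {field}: {rendered}")
    return "\n".join(lines)
```

Because every field is always emitted, a downstream reader can distinguish "checked and found nothing" from "never checked".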
ce-plan-beta integration
Updated the external research decision logic to leverage Phase 0's structured output:
Eval results
Tested with 3 scenarios (basic scan, scoped monorepo, absent tech layer), each comparing with-skill vs baseline:
Token efficiency was the main iteration focus. The conditional skip rules in 0.2 are the key optimization -- they prevent 7+ wasted globs when the manifest already shows the project has no backend framework, database, or server dependencies.
Test plan