feat: improve repo-research-analyst by adding a structured technology scan #327

Merged: tmchow merged 3 commits into main from feat/structured-repo-scan on Mar 20, 2026

@tmchow commented Mar 20, 2026

Summary

The repo-research-analyst agent previously jumped straight into open-ended exploration without grounding itself in the project's actual technology stack. This caused unfocused research, wasted tool calls on ecosystems that don't exist, and missed context that downstream planning agents need.

This PR adds a structured Phase 0 scan that runs before the existing research phases, and updates ce-plan-beta to leverage the structured output.

What Phase 0 does

  • 0.1 Root-level discovery: One broad glob to detect which ecosystems are present, then selective manifest reads (only files that exist -- no globbing for go.mod in a TypeScript repo)
  • 0.1b Monorepo detection: Reuses data already read in 0.1 to identify workspace structure and scope subsequent scans to the relevant service
  • 0.2 Infrastructure & API surface: Conditional checks with explicit skip rules -- entire categories are skipped when 0.1 rules them out (e.g., no API/data layer globs for a pure CLI tool with no backend)
  • 0.3 Module structure: Scans internal boundaries within the detected scope
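
To make step 0.1 concrete, here is a minimal sketch of "one broad listing, then selective manifest reads". It is not the agent's actual prompt or code: the `MANIFESTS` mapping and both function names are hypothetical, chosen only to illustrate that a manifest is read only when the root listing already shows it exists.

```python
from pathlib import Path

# Hypothetical mapping from manifest filename to ecosystem (illustrative only).
MANIFESTS = {
    "package.json": "node",
    "go.mod": "go",
    "pyproject.toml": "python",
    "Cargo.toml": "rust",
    "Gemfile": "ruby",
}


def detect_ecosystems(root_entries):
    """Map each detected ecosystem to the manifest seen in the root listing."""
    present = set(root_entries)
    return {eco: name for name, eco in MANIFESTS.items() if name in present}


def read_present_manifests(root, root_entries):
    """Read only the manifests that the single root listing showed to exist."""
    found = detect_ecosystems(root_entries)
    return {eco: (Path(root) / name).read_text() for eco, name in found.items()}


# A TypeScript repo: go.mod is never probed because it is not in the listing.
print(detect_ecosystems(["package.json", "src", "README.md"]))
```

The point of the design is that the cost of discovery stays at one listing plus one read per ecosystem actually present, regardless of how many ecosystems the scan knows about.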

The output is a Technology & Infrastructure section at the top of every research summary, reporting languages, frameworks, deployment model, API styles, data stores, and monorepo layout. Absence is reported as an explicit signal ("None detected") rather than being silently omitted.
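
As an illustration of the "absence is a signal" rule, the sketch below renders a Technology & Infrastructure section in which every field always appears. The field list mirrors the categories named above; the function name, the input shape, and the exact line layout are assumptions, not the agent's real output format.

```python
# Field names follow the categories listed in this PR; the layout is assumed.
FIELDS = [
    "Languages",
    "Frameworks",
    "Deployment model",
    "API styles",
    "Data stores",
    "Monorepo layout",
]


def render_section(findings):
    """Render every field, writing 'None detected' instead of omitting it."""
    lines = ["Technology & Infrastructure"]
    for field_name in FIELDS:
        values = findings.get(field_name) or []
        shown = ", ".join(values) if values else "None detected"
        lines.append(f"- {field_name}: {shown}")
    return "\n".join(lines)


print(render_section({"Languages": ["TypeScript"], "Frameworks": ["React"]}))
```

Because the field list is fixed, a downstream planner can tell "checked and found nothing" apart from "never checked".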

ce-plan-beta integration

Updated the external research decision logic to leverage Phase 0's structured output:

  • Pass detected framework versions to framework-docs-researcher for version-specific lookups
  • Skip external research when the technology layer is well-established locally
  • Lean toward external research when the technology layer is absent or thin
  • Scope research to the relevant monorepo service, not the aggregate
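
The routing rules above could be sketched roughly as follows. This is a simplification under stated assumptions: `TechScan`, `local_pattern_count`, and `thin_threshold` are hypothetical names and values, and the real logic lives in prose in ce-plan-beta's SKILL.md rather than in code.

```python
from dataclasses import dataclass, field


@dataclass
class TechScan:
    """Hypothetical summary of Phase 0 output for one scoped service."""
    service: str
    frameworks: dict = field(default_factory=dict)  # framework name -> version
    local_pattern_count: int = 0  # assumed measure of how established the layer is


def plan_external_research(scan, thin_threshold=2):
    """Lean toward external research when the layer is absent or thin."""
    absent_or_thin = not scan.frameworks or scan.local_pattern_count < thin_threshold
    return {
        "scope": scan.service,                  # the relevant service, not the aggregate
        "external": absent_or_thin,
        "versions": dict(scan.frameworks),      # passed on for version-specific lookups
    }


print(plan_external_research(TechScan("api", {"rails": "7.1"}, local_pattern_count=5)))
```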

Eval results

Tested with 3 scenarios (basic scan, scoped monorepo, absent tech layer), each comparing with-skill vs baseline:

| Eval | With Skill | Baseline | Delta |
| --- | --- | --- | --- |
| Basic scan | 44 calls, 57K tokens, 204s | 68 calls, 86K tokens, 415s | 35% fewer calls, 34% fewer tokens, 2x faster |
| Scoped monorepo | Phase 0 in 4 calls | No structured scan | Efficient scoping |
| Absent tech | Explicit "None detected" | Absence buried in prose | Clearer signal for planning |

Token efficiency was the main iteration focus. The conditional skip rules in 0.2 are the key optimization -- they prevent 7+ wasted globs when the manifest already shows the project has no backend framework, database, or server dependencies.
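
For reference, the basic-scan deltas follow directly from the raw numbers reported above. The helper name is made up for this check.

```python
def pct_fewer(with_skill, baseline):
    """Percentage reduction relative to the baseline, rounded to a whole percent."""
    return round(100 * (baseline - with_skill) / baseline)


calls = pct_fewer(44, 68)     # (68 - 44) / 68 = 35.3%
tokens = pct_fewer(57, 86)    # (86 - 57) / 86 = 33.7%, token counts in thousands
speedup = 415 / 204           # baseline seconds / with-skill seconds

print(calls, tokens, round(speedup, 2))  # 35 34 2.03
```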

Test plan

  • Run repo-research-analyst against this repo (basic scan) -- produces structured T&I section
  • Run with scoped monorepo context -- Phase 0 completes in 4 tool calls
  • Run with absent tech query (gRPC) -- absence explicitly reported
  • Verify ce-plan-beta SKILL.md integrates with Phase 0 output
  • Confirm no changes to release-owned files (plugin.json, marketplace.json, CHANGELOG.md)

The repo-research-analyst agent previously jumped straight into
open-ended exploration without first grounding itself in the project's
actual technology stack. This led to unfocused research and missed
context that downstream planning agents (like ce-plan-beta) needed to
make sharp decisions about external research.

This adds a structured Phase 0 scan that runs before the existing
research phases:

- 0.1 Root-level discovery: single broad glob to detect ecosystems,
  then selective manifest reads (only files that exist)
- 0.1b Monorepo detection: reuses 0.1 data to identify workspace
  structure and scope subsequent scans
- 0.2 Infrastructure & API surface: conditional checks that skip
  entire categories when 0.1 rules them out (e.g., no API/data layer
  globs for a pure CLI tool)
- 0.3 Module structure: scans internal boundaries within the detected
  scope

The scan output is a Technology & Infrastructure section at the top of
every research summary, reporting languages, frameworks, deployment
model, API styles, data stores, and monorepo layout.

ce-plan-beta's SKILL.md is updated to leverage this structured output
for smarter external research routing -- using detected framework
versions for targeted doc lookups and skipping external research when
local patterns are already established.

Tested with 3 eval scenarios (basic scan, scoped monorepo, absent tech
layer), each with skill vs baseline comparisons:

- Basic scan: 44 vs 68 tool calls, 57K vs 86K tokens, 204s vs 415s (about 2x faster)
- Scoped monorepo: Phase 0 completed in 4 tool calls
- Absent tech: correctly reported absence of gRPC/GraphQL/REST as
  explicit signal rather than omitting it

Token efficiency was the main iteration focus -- the conditional skip
rules prevent wasted globs when 0.1 already shows the project has no
backend framework, database, or server dependencies.
@tmchow changed the title from "feat: add structured Phase 0 technology scan to repo-research-analyst" to "feat: improve repo-research-analyst by adding a structured technology scan" on Mar 20, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: aeaf8b25b2

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated
Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated
Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated
- Read workspace config files (pnpm-workspace.yaml, nx.json, lerna.json)
  when detected in root listing so monorepo scoping can extract actual
  workspace paths
- Require root listing to show no API-related directories/files before
  skipping API surface checks, preventing false negatives for Go/Node
  stdlib servers
- Check for infra files within scoped monorepo subtrees, not just root
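
A rough sketch of the first fix: choose workspace config files from the root listing already in hand, then extract workspace paths from one of them. The function names are hypothetical. Only the `lerna.json` case is parsed here because it is plain JSON with a `packages` array; `pnpm-workspace.yaml` would need a YAML parser, which this sketch deliberately leaves out.

```python
import json

# Workspace config files named in this commit.
WORKSPACE_CONFIGS = ("pnpm-workspace.yaml", "nx.json", "lerna.json")


def workspace_configs_to_read(root_entries):
    """Return only the workspace configs that appear in the root listing."""
    present = set(root_entries)
    return [name for name in WORKSPACE_CONFIGS if name in present]


def lerna_workspace_globs(lerna_json_text):
    """Extract workspace globs from lerna.json; empty if the key is not set."""
    return json.loads(lerna_json_text).get("packages", [])


print(workspace_configs_to_read(["package.json", "lerna.json", "packages"]))
print(lerna_workspace_globs('{"version": "1.0.0", "packages": ["packages/*"]}'))
```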

tmchow commented Mar 20, 2026

@codex re-review this

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 63230a1e65

Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated
Comment thread plugins/compound-engineering/agents/research/repo-research-analyst.md Outdated
…e probe

- Data layer and API surface now have independent skip conditions. A CLI
  or worker with a database (e.g., Prisma/SQLite) but no HTTP surface
  will still get its data layer scanned.
- Go multi-module detection now adds a conditional */go.mod glob when Go
  directories are visible but no root go.mod was found.
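
The independent skip conditions can be pictured as three unrelated guards. This is a minimal sketch: the boolean inputs and the check labels are hypothetical stand-ins for signals the agent derives from the 0.1 manifest reads and the root listing.

```python
def plan_checks(has_data_deps, has_api_deps, root_has_api_dirs,
                go_dirs_visible, root_go_mod_found):
    """Decide which Phase 0.2 checks to run; each guard is evaluated on its own."""
    checks = []
    # Data layer: scanned whenever data dependencies exist, even with no HTTP surface.
    if has_data_deps:
        checks.append("data_layer")
    # API surface: skipped only when neither the manifest nor the root listing hints at one.
    if has_api_deps or root_has_api_dirs:
        checks.append("api_surface")
    # Go multi-module: probe subdirectories only when Go is visible but the root has no go.mod.
    if go_dirs_visible and not root_go_mod_found:
        checks.append("glob:*/go.mod")
    return checks


# A CLI with Prisma/SQLite and no HTTP surface still gets its data layer scanned.
print(plan_checks(True, False, False, False, False))  # ['data_layer']
```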
@tmchow tmchow merged commit 1c28d03 into main Mar 20, 2026
2 checks passed
@github-actions github-actions Bot mentioned this pull request Mar 20, 2026