🤖 feat: add in-place workspace support for CLI/benchmark sessions #472
Enables cmux to work directly in provided directories without requiring git worktrees, essential for terminal-bench integration and CLI usage.

Changes:
- agentSession: Detect in-place workspaces (not under srcBaseDir) and store the path directly, setting projectPath === name as a sentinel
- aiService: Check for in-place mode and use the stored path instead of reconstructing it via runtime.getWorkspacePath()
- streamManager: Fix cleanup safety by running rm -rf from the parent directory instead of root (limits blast radius if the path is malformed)

Before: Terminal-bench failed with 'Working directory does not exist'
After: Agents run successfully in task containers (e.g., /app)

Tested with the terminal-bench harness running multiple tasks successfully.
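The sentinel described in this commit can be sketched roughly as follows. This is a minimal illustration, not the actual cmux code: the `WorkspaceMetadata` shape, the helper names `toInPlaceMetadata`, `isInPlace`, and `resolveWorkspacePath`, and the injected `getWorkspacePath` callback are all assumptions made for the example.

```typescript
import * as path from "node:path";

// Hypothetical metadata shape; the real cmux type may carry more fields.
interface WorkspaceMetadata {
  projectPath: string;
  name: string;
}

// True when `workspacePath` does not live inside `srcBaseDir`.
function isOutsideSrcBaseDir(workspacePath: string, srcBaseDir: string): boolean {
  const rel = path.relative(path.resolve(srcBaseDir), path.resolve(workspacePath));
  return rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
}

// For a directory outside srcBaseDir, store the absolute path in both fields.
// Returns null for paths under srcBaseDir (assumed to be regular worktrees).
function toInPlaceMetadata(workspacePath: string, srcBaseDir: string): WorkspaceMetadata | null {
  if (!isOutsideSrcBaseDir(workspacePath, srcBaseDir)) return null;
  const abs = path.resolve(workspacePath);
  return { projectPath: abs, name: abs };
}

// The sentinel check: projectPath === name marks an in-place workspace.
function isInPlace(meta: WorkspaceMetadata): boolean {
  return meta.projectPath === meta.name;
}

// Use the stored path for in-place workspaces; otherwise defer to the
// caller-supplied lookup (a stand-in for runtime.getWorkspacePath()).
function resolveWorkspacePath(
  meta: WorkspaceMetadata,
  getWorkspacePath: (projectPath: string, name: string) => string,
): string {
  return isInPlace(meta) ? meta.projectPath : getWorkspacePath(meta.projectPath, meta.name);
}
```

Reusing two existing fields as a sentinel avoids adding a new metadata field, at the cost of making the in-place case implicit, so a single helper like `isInPlace` keeps the check in one place.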
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
The workflow was trying to upload terminal-bench-results/ which doesn't exist. Terminal-bench writes results to runs/ by default.
Downloaded artifacts from terminal-bench CI runs should not be committed.
In-place workspaces (identified by projectPath === workspaceName) are direct workspace directories used by CLI/benchmark sessions, not git worktrees. Attempting to run 'git worktree remove' on them fails or attempts to remove the main checkout. This fix detects the in-place sentinel pattern and skips git worktree operations, allowing session cleanup without destructive filesystem operations. Resolves Codex review comment in PR #472.
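The cleanup guard described here might look roughly like the sketch below. It only plans the step and does not execute anything; the `CleanupStep` type and `planWorkspaceCleanup` function are hypothetical names for illustration, and the actual cmux cleanup code may be structured differently.

```typescript
// Hypothetical metadata shape, mirroring the sentinel convention in this PR.
interface WorkspaceMetadata {
  projectPath: string;
  name: string;
}

// Hypothetical result type: either skip git entirely or describe the git call.
type CleanupStep =
  | { kind: "skip"; reason: string }
  | { kind: "git-worktree-remove"; cwd: string; args: string[] };

// Decide how to clean up a workspace without touching the filesystem.
function planWorkspaceCleanup(meta: WorkspaceMetadata, workspacePath: string): CleanupStep {
  // In-place sentinel: the directory is not a git worktree, so leave it alone.
  if (meta.projectPath === meta.name) {
    return { kind: "skip", reason: "in-place workspace, not a git worktree" };
  }
  // Regular worktree: run `git worktree remove <path>` from the main checkout.
  return {
    kind: "git-worktree-remove",
    cwd: meta.projectPath,
    args: ["worktree", "remove", workspacePath],
  };
}
```

Returning a plan instead of running the command keeps the decision easy to test, and the caller can pass `cwd` and `args` to whatever process runner it already uses.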
When the current branch has no upstream, automatically run git push -u to set it instead of failing. This makes the script more user-friendly for new branches.
- Run full benchmark suite (~80 tasks) every night at midnight UTC
- Concurrency=4 is appropriate for full suite (60-90 min estimated)
- Timeout=180 min (3 hours) provides safety margin
- Use default fallbacks for scheduled runs (no inputs)
- Add unique artifact names with run_id to avoid conflicts
- Set 30-day retention for nightly benchmark artifacts
terminal-bench-core==0.1.1 contains ~80 tasks, which is the complete stable benchmark suite. The -head version is bleeding-edge dev.
Use matrix strategy to run both models every night:
- anthropic:claude-sonnet-4-5 (high thinking)
- openai:gpt-5-codex (high thinking)

Matrix only applies to scheduled runs (cron), not manual workflow_dispatch. Artifacts are named uniquely per model to avoid conflicts. This enables direct comparison of model performance on the full 80-task suite.
Cleaner architecture:
- terminal-bench.yml: Reusable workflow (workflow_call + workflow_dispatch)
- nightly-terminal-bench.yml: Scheduled runner with matrix strategy

Benefits:
- Main workflow stays simple for manual use
- Nightly schedule logic isolated in dedicated file
- Easy to add more models to nightly runs
- Manual workflow_dispatch supports model/thinking overrides

Nightly runs both models at midnight UTC:
- anthropic:claude-sonnet-4-5 (high thinking)
- openai:gpt-5-codex (high thinking)
Enables cmux to work directly in provided directories without requiring git worktrees. This is essential for terminal-bench integration and agentSessionCli usage.
Problem
The terminal-bench harness (and agentSessionCli) need to work in arbitrary directories such as `/app` in benchmark containers. Previously, cmux assumed all workspaces were git worktrees under `~/.cmux/src/<project>/<branch>`, causing systematic failures ('Working directory does not exist').
Solution
Detect "in-place" workspaces (directories not under srcBaseDir) and store them directly without worktree reconstruction. This uses a simple sentinel: `projectPath === name` indicates in-place mode.
agentSession.ts: When `workspacePath` is outside `~/.cmux/src/`, store it directly by setting both `projectPath` and `name` to the absolute path.
aiService.ts: Check for in-place mode (`projectPath === name`) and use the path directly instead of calling `runtime.getWorkspacePath()`.
streamManager.ts: Fixed cleanup safety by running `rm -rf` from the parent directory instead of `/`, which limits the blast radius if the path is malformed.
Testing
Ran the terminal-bench harness with multiple tasks; agents ran successfully in the `/app` directory.
Generated with cmux
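As an illustration of the streamManager cleanup change, the sketch below plans an `rm -rf` that runs from the target's parent directory and names only the final path component. `RemovalPlan` and `planSafeRemoval` are hypothetical names, and the rejection rules (relative paths, the filesystem root) are assumptions of this example rather than confirmed cmux behavior.

```typescript
import * as path from "node:path";

// Hypothetical plan: run `rm -rf -- <target>` with `cwd` as working directory.
interface RemovalPlan {
  cwd: string;
  target: string;
}

// Build a removal plan scoped to the parent directory, or null if the
// path looks unsafe to delete.
function planSafeRemoval(dirPath: string): RemovalPlan | null {
  // Assumption: refuse relative paths, which depend on the caller's cwd.
  if (!path.isAbsolute(dirPath)) return null;

  const abs = path.resolve(dirPath); // normalizes "..", "." and trailing slashes
  const parent = path.dirname(abs);
  const target = path.basename(abs);

  // The filesystem root has no basename and is its own parent: never delete it.
  if (target === "" || parent === abs) return null;

  return { cwd: parent, target };
}
```

With this shape, a malformed path can at worst remove one entry inside `cwd`, rather than resolving relative to `/`.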