@spiceoogway
Contributor

Summary

This PR improves subagent error messages by categorizing errors and providing actionable hints for common failure scenarios.

Changes

1. Enhanced Error Types

  • Extended SubagentRunOutcome to include errorType and errorHint fields
  • Error types: model, tool, network, config, timeout, unknown
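As a rough illustration of the extended outcome shape, the sketch below adds the two new fields to a simplified type. Only `errorType`, `errorHint`, and the six error type names come from this PR; the `status` and `error` fields are assumptions made so the example is self-contained.

```typescript
// The six categories listed in this PR.
type SubagentErrorType = "model" | "tool" | "network" | "config" | "timeout" | "unknown";

// Simplified, hypothetical outcome shape. `status` and `error` are assumed;
// the real SubagentRunOutcome may carry different or additional fields.
interface SubagentRunOutcome {
  status: "ok" | "error";
  error?: string;
  errorType?: SubagentErrorType; // new: coarse category of the failure
  errorHint?: string; // new: short remediation hint
}

// Both new fields are optional, so outcomes built without them still type-check.
const legacyOutcome: SubagentRunOutcome = { status: "error", error: "unknown error" };

const categorizedOutcome: SubagentRunOutcome = {
  status: "error",
  error: "ENOENT",
  errorType: "tool",
  errorHint: "File or directory not found",
};
```

Keeping the new fields optional is one way to stay additive, which matches the backward-compatibility claim in this PR.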

2. Error Categorization

  • Added categorizeError() helper that identifies common error patterns:
    • File system errors: ENOENT, EACCES, EISDIR, etc.
    • API/model errors: Rate limits (429), auth failures (401), invalid requests (400), service errors (500/503)
    • Network errors: Connection refused, DNS failures, network unreachable
    • Timeout errors: Various timeout scenarios
    • Configuration errors: Missing credentials, quota limits
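The pattern matching described above could look roughly like the following sketch. It is not the PR's actual implementation: the rule table, the regular expressions, the rule order, and the `CategorizedError` return type are assumptions, and only a handful of the listed patterns are covered. The hint strings reuse wording from the examples in this PR where available.

```typescript
type SubagentErrorType = "model" | "tool" | "network" | "config" | "timeout" | "unknown";

// Hypothetical return shape for this sketch.
interface CategorizedError {
  errorType: SubagentErrorType;
  errorHint?: string;
}

// Ordered rules: the first matching pattern wins. Patterns and order are assumed.
const RULES: Array<{ pattern: RegExp; result: CategorizedError }> = [
  { pattern: /\bENOENT\b/, result: { errorType: "tool", errorHint: "File or directory not found" } },
  { pattern: /\b(EACCES|EPERM)\b/, result: { errorType: "tool", errorHint: "Permission denied" } },
  { pattern: /\bETIMEDOUT\b|timed out|timeout/i, result: { errorType: "timeout", errorHint: "Operation timed out" } },
  {
    pattern: /\b(ECONNREFUSED|ENOTFOUND|ENETUNREACH)\b/,
    result: { errorType: "network", errorHint: "Connection failed - check network connectivity" },
  },
  {
    pattern: /missing (api )?(key|token)/i,
    result: { errorType: "config", errorHint: "Check API credentials and permissions" },
  },
  {
    pattern: /\b429\b|rate limit/i,
    result: { errorType: "model", errorHint: "Rate limit exceeded - retry in a few moments" },
  },
];

// Classify a raw error message; unrecognized messages fall back to "unknown".
function categorizeError(message: string): CategorizedError {
  for (const { pattern, result } of RULES) {
    if (pattern.test(message)) return result;
  }
  return { errorType: "unknown" };
}
```

A table of ordered rules keeps each category easy to test in isolation, at the cost of making rule order significant when a message matches more than one pattern.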

3. Improved Error Emission

  • Updated agent-runner-execution.ts to categorize errors before emitting lifecycle events
  • Error metadata now includes type and hint for remediation

4. Better User-Facing Messages

  • Added buildErrorStatusLabel() to create descriptive error messages
  • Announcements now include error type context and remediation hints
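A label builder along these lines would produce messages in the format shown in the Examples section. This is a sketch under assumptions: the input shape, the display names in `TYPE_LABELS` (in particular mapping `model` to "API error" and the "timeout error" name), and the fallback to "unknown error" are inferred from the example messages rather than taken from the PR's code.

```typescript
type SubagentErrorType = "model" | "tool" | "network" | "config" | "timeout" | "unknown";

// Assumed display names; "unknown" gets no parenthesized type.
const TYPE_LABELS: Record<SubagentErrorType, string | undefined> = {
  model: "API error",
  tool: "tool error",
  network: "network error",
  config: "configuration error",
  timeout: "timeout error",
  unknown: undefined,
};

// Hypothetical input shape for this sketch.
interface ErrorInfo {
  error?: string;
  errorType?: SubagentErrorType;
  errorHint?: string;
}

// Build "failed (<type>): <detail> <dash> <hint>", omitting the parts that are missing.
function buildErrorStatusLabel(info: ErrorInfo): string {
  const typeLabel = info.errorType ? TYPE_LABELS[info.errorType] : undefined;
  const prefix = typeLabel ? `failed (${typeLabel})` : "failed";
  const detail = info.error ?? "unknown error";
  // \u2014 is the dash separator used in this PR's example messages.
  const hint = info.errorHint ? ` \u2014 ${info.errorHint}` : "";
  return `${prefix}: ${detail}${hint}`;
}
```

With no type or hint available, the sketch degrades to the old "failed: unknown error" wording, so callers that never set the new fields see unchanged output.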

Examples

Before:

A background task "research-task" just failed: unknown error.

After:

A background task "research-task" just failed (tool error): ENOENT — File or directory not found.

More examples:

  • failed (API error): Rate limit exceeded — Rate limit exceeded - retry in a few moments
  • failed (configuration error): Missing API key — Check API credentials and permissions
  • failed (network error): ECONNREFUSED — Connection failed - check network connectivity

Impact

  • Better DX: Developers can quickly understand what went wrong
  • Faster debugging: Error type and hints guide towards solutions
  • Backward compatible: Existing error handling continues to work
  • Low risk: Changes are additive; no existing behavior modified

Testing

  • ✅ Build passes
  • ⏳ Existing tests pass, apart from some unrelated memory-lancedb failures
  • Manual testing recommended with real subagent failures

Related

Research document: ~/clawd/memory/research/2026-01-30-subagent-errors.md

@moltbot-barnacle (bot) added the "agents" label (Agent runtime and tooling) on Jan 30, 2026
@spiceoogway
Contributor Author

CI failures are all pre-existing on main; none are caused by this PR:

  • install-smoke: Docker smoke test infra issue
  • format: src/commands/onboard-helpers.ts formatting (exists on main)
  • test: src/config/paths.test.ts:52 expects length 1 but gets 16 (exists on main)

…d Opus

Resolves openclaw#4315. The slug-generator embedded run was hardcoded to use
DEFAULT_MODEL (claude-opus-4-5) regardless of the user's configured
agents.defaults.model.primary. This caused unexpected Opus charges on
every /new command.

Now uses resolveDefaultModelForAgent() to honor the user's configured
default model, falling back to DEFAULT_MODEL only when no config exists.
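The fallback behavior described in this commit can be sketched as follows. The config interface and the function signature are assumptions based on the `agents.defaults.model.primary` path named above; the real `resolveDefaultModelForAgent()` likely takes different arguments, and the configured model name is a placeholder.

```typescript
// Hypothetical config shape, modeled on the agents.defaults.model.primary path.
interface AgentConfig {
  agents?: { defaults?: { model?: { primary?: string } } };
}

const DEFAULT_MODEL = "claude-opus-4-5";

// Stand-in for resolveDefaultModelForAgent(): prefer the user's configured
// primary model and fall back to DEFAULT_MODEL only when none is set.
function resolveDefaultModelForAgent(cfg: AgentConfig | undefined): string {
  return cfg?.agents?.defaults?.model?.primary ?? DEFAULT_MODEL;
}

// Placeholder model name, used only for illustration.
const userConfig: AgentConfig = {
  agents: { defaults: { model: { primary: "example-provider/small-model" } } },
};

const slugModel = resolveDefaultModelForAgent(userConfig);
const fallbackModel = resolveDefaultModelForAgent(undefined);
```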
@moltbot-barnacle (bot) added the "commands" label (Command implementations) on Jan 30, 2026
…ixes openclaw#4367)

The message processing pipeline had a synchronization bug where
limitHistoryTurns() truncated conversation history AFTER
repairToolUseResultPairing() had already fixed tool_use/tool_result
pairings. This could split assistant messages (with tool_use) from
their corresponding tool_result blocks, creating orphaned tool_result
blocks that the Anthropic API rejects.

This fix calls sanitizeToolUseResultPairing() AFTER limitHistoryTurns()
to repair any pairings broken by truncation, ensuring the transcript
remains valid before being sent to the LLM API.
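To make the ordering issue concrete, the sketch below truncates a short transcript and then repairs it. Both functions are simplified stand-ins that share only their names with the real helpers: this `limitHistoryTurns()` keeps the last N messages, and this `sanitizeToolUseResultPairing()` only drops orphaned `tool_result` blocks. The message and block types are also assumptions.

```typescript
// Simplified, hypothetical content-block and message shapes.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

interface ChatMessage {
  role: "user" | "assistant";
  content: ContentBlock[];
}

// Stand-in: keep only the last `maxMessages` messages.
function limitHistoryTurns(messages: ChatMessage[], maxMessages: number): ChatMessage[] {
  return maxMessages > 0 ? messages.slice(-maxMessages) : messages;
}

// Stand-in: drop tool_result blocks with no earlier tool_use in the transcript,
// then drop any message left without content.
function sanitizeToolUseResultPairing(messages: ChatMessage[]): ChatMessage[] {
  const seenToolUseIds = new Set<string>();
  const repaired: ChatMessage[] = [];
  for (const message of messages) {
    const content = message.content.filter((block) => {
      if (block.type === "tool_use") seenToolUseIds.add(block.id);
      return block.type !== "tool_result" || seenToolUseIds.has(block.tool_use_id);
    });
    if (content.length > 0) repaired.push({ ...message, content });
  }
  return repaired;
}

const transcript: ChatMessage[] = [
  { role: "user", content: [{ type: "text", text: "list files" }] },
  { role: "assistant", content: [{ type: "tool_use", id: "t1" }] },
  { role: "user", content: [{ type: "tool_result", tool_use_id: "t1" }] },
  { role: "assistant", content: [{ type: "text", text: "done" }] },
];

// Truncation cuts between the tool_use and its tool_result...
const limited = limitHistoryTurns(transcript, 2);
// ...so the repair step has to run afterwards to remove the orphan.
const repaired = sanitizeToolUseResultPairing(limited);
```

Running the repair before truncation would leave the orphaned `tool_result` in `limited` untouched, which is the failure mode this commit describes.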

Changes:
- Added import for sanitizeToolUseResultPairing from session-transcript-repair.js
- Call sanitizeToolUseResultPairing() on the limited message array
- Updated variable name from 'limited' to 'repaired' for clarity

- Enhanced SubagentRunOutcome type with errorType and errorHint fields
- Added categorizeError() helper to classify common error patterns:
  * File system errors (ENOENT, EACCES, etc.)
  * API/model errors (rate limits, auth failures, invalid requests)
  * Network errors (connection refused, DNS failures)
  * Timeout errors
  * Configuration errors (missing credentials, quota limits)
- Updated error emission in agent-runner-execution.ts to categorize errors
- Updated subagent-registry.ts to capture and propagate new error fields
- Added buildErrorStatusLabel() helper for user-friendly error messages
- Error announcements now include error type and remediation hints

Example improved messages:
- Before: 'failed: unknown error'
- After: 'failed (tool error): ENOENT — File or directory not found'

This makes subagent failures much easier to understand and debug while
maintaining backward compatibility.

- Tests timeout errors (timeout keyword, 'timed out', ETIMEDOUT)
- Tests authentication errors (401, unauthorized, 403, forbidden)
- Tests rate limit errors (rate limit keyword, HTTP 429)
- Tests unknown/unrecognized errors
- Tests API/model errors (400, 500, 503, quota, billing)
- Tests network errors (ECONNREFUSED, ENOTFOUND, DNS, ENETUNREACH)
- Tests file system errors (ENOENT, EACCES, EPERM, EISDIR)
- Tests configuration errors (missing key/token)
- Tests context/memory errors
- Export categorizeError() function for testing
- 37 passing tests covering all error categorization paths
@spiceoogway force-pushed the improve/subagent-error-messages branch from c713a8b to a39811c on January 30, 2026 07:38
