Improve subagent error messages with categorization and hints #4395

spiceoogway · 2026-01-30T05:26:44Z

Summary

This PR improves subagent error messages by categorizing errors and providing actionable hints for common failure scenarios.

Changes

1. Enhanced Error Types

Extended SubagentRunOutcome to include errorType and errorHint fields
Error types: model, tool, network, config, timeout, unknown
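
As a rough illustration of the extended type, the sketch below assumes `SubagentRunOutcome` already carries a `status` and an optional `error` string. Those two fields and the `describeOutcome` helper are assumptions made for this example and are not taken from the codebase; only `errorType`, `errorHint`, and the six error type values come from this PR.

```typescript
// The six categories listed in this PR.
type SubagentErrorType = "model" | "tool" | "network" | "config" | "timeout" | "unknown";

interface SubagentRunOutcome {
  status: "ok" | "error"; // assumed existing field
  error?: string; // assumed existing field
  errorType?: SubagentErrorType; // new in this PR
  errorHint?: string; // new in this PR
}

// Hypothetical consumer: outcomes that lack the new fields still work,
// because both additions are optional.
function describeOutcome(outcome: SubagentRunOutcome): string {
  if (outcome.status === "ok") return "ok";
  const type = outcome.errorType ?? "unknown";
  return `${type}: ${outcome.error ?? "unknown error"}`;
}
```

Keeping both new fields optional is what would let older producers and consumers of the outcome continue to type-check unchanged.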

2. Error Categorization

Added categorizeError() helper that identifies common error patterns:
- File system errors: ENOENT, EACCES, EISDIR, etc.
- API/model errors: Rate limits (429), auth failures (401), invalid requests (400), service errors (500/503)
- Network errors: Connection refused, DNS failures, network unreachable
- Timeout errors: Various timeout scenarios
- Configuration errors: Missing credentials, quota limits
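
The categories above could be matched with an ordered rule table, roughly as sketched below. This is a minimal sketch, not the PR's implementation: the patterns, their order, and the hints for `EACCES`, `EISDIR`, and timeouts are assumptions. The other hints are copied from the examples in this description.

```typescript
type SubagentErrorType = "model" | "tool" | "network" | "config" | "timeout" | "unknown";

interface CategorizedError {
  type: SubagentErrorType;
  hint?: string;
}

// Rules are checked top to bottom; the first matching pattern wins.
// Patterns and ordering are illustrative assumptions.
const RULES: Array<{ pattern: RegExp; type: SubagentErrorType; hint: string }> = [
  { pattern: /ENOENT/, type: "tool", hint: "File or directory not found" },
  { pattern: /EACCES|EPERM/, type: "tool", hint: "Permission denied" }, // assumed hint
  { pattern: /EISDIR/, type: "tool", hint: "Expected a file but found a directory" }, // assumed hint
  { pattern: /rate limit|\b429\b/i, type: "model", hint: "Rate limit exceeded - retry in a few moments" },
  { pattern: /missing api key|quota/i, type: "config", hint: "Check API credentials and permissions" },
  { pattern: /ECONNREFUSED|ENOTFOUND|ENETUNREACH/, type: "network", hint: "Connection failed - check network connectivity" },
  { pattern: /ETIMEDOUT|timed out|timeout/i, type: "timeout", hint: "Operation timed out" }, // assumed hint
];

// Hypothetical stand-in for the PR's categorizeError(); the real rules are not shown here.
function categorizeError(message: string): CategorizedError {
  for (const rule of RULES) {
    if (rule.pattern.test(message)) {
      return { type: rule.type, hint: rule.hint };
    }
  }
  // Nothing matched: report the error as unknown, with no hint.
  return { type: "unknown" };
}
```

A first-match table keeps precedence explicit, which matters when a message could plausibly fit two categories (for example a timeout reported alongside a network error code).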

3. Improved Error Emission

Updated agent-runner-execution.ts to categorize errors before emitting lifecycle events
Error metadata now includes type and hint for remediation
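
To make the emission step concrete, here is a hedged sketch of attaching the category to an error event before it is emitted. The `LifecycleErrorEvent` shape, the `toLifecycleErrorEvent` name, and the `Categorize` callback type are all hypothetical; the actual event format in agent-runner-execution.ts is not shown in this description.

```typescript
// Hypothetical event payload; the field names are assumptions.
interface LifecycleErrorEvent {
  phase: "error";
  error: string;
  errorType: string;
  errorHint?: string;
}

// Any function with this shape could be plugged in, e.g. a categorizeError() helper.
type Categorize = (message: string) => { type: string; hint?: string };

function toLifecycleErrorEvent(err: unknown, categorize: Categorize): LifecycleErrorEvent {
  // Normalize thrown values: Error instances expose .message, anything else is stringified.
  const message = err instanceof Error ? err.message : String(err);
  const { type, hint } = categorize(message);
  const event: LifecycleErrorEvent = { phase: "error", error: message, errorType: type };
  // Only attach a hint when the categorizer produced one.
  if (hint) event.errorHint = hint;
  return event;
}
```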

4. Better User-Facing Messages

Added buildErrorStatusLabel() to create descriptive error messages
Announcements now include error type context and remediation hints

Examples

Before:

A background task "research-task" just failed: unknown error.

After:

A background task "research-task" just failed (tool error): ENOENT — File or directory not found.

More examples:

failed (API error): Rate limit exceeded — Rate limit exceeded - retry in a few moments
failed (configuration error): Missing API key — Check API credentials and permissions
failed (network error): ECONNREFUSED — Connection failed - check network connectivity
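
The message format in the examples above could be produced by a helper along these lines. This is a sketch under assumptions: the `ErrorFields` shape and the label for timeouts are invented for illustration, while the other labels and the separator are inferred from the example messages.

```typescript
type SubagentErrorType = "model" | "tool" | "network" | "config" | "timeout" | "unknown";

// Hypothetical subset of the outcome fields that the label needs.
interface ErrorFields {
  error?: string;
  errorType?: SubagentErrorType;
  errorHint?: string;
}

// Labels inferred from the examples; "timeout error" is an assumption.
// "unknown" has no label, so the output falls back to the old format.
const TYPE_LABELS: Record<SubagentErrorType, string | undefined> = {
  model: "API error",
  tool: "tool error",
  network: "network error",
  config: "configuration error",
  timeout: "timeout error",
  unknown: undefined,
};

// Em dash separator (U+2014), as used in the example messages.
const HINT_SEPARATOR = " \u2014 ";

// Hypothetical stand-in for the PR's buildErrorStatusLabel().
function buildErrorStatusLabel(fields: ErrorFields): string {
  const message = fields.error ?? "unknown error";
  const label = fields.errorType ? TYPE_LABELS[fields.errorType] : undefined;
  const head = label ? `failed (${label}): ${message}` : `failed: ${message}`;
  return fields.errorHint ? `${head}${HINT_SEPARATOR}${fields.errorHint}` : head;
}
```

With no type or hint available, the helper produces the same `failed: unknown error` text as before, which is one way the change could stay backward compatible.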

Impact

Better DX: Developers can quickly understand what went wrong
Faster debugging: Error type and hints guide towards solutions
Backward compatible: Existing error handling continues to work
Low risk: Changes are additive; no existing behavior is modified

Testing

✅ Build passes
⏳ Existing tests pass, apart from some unrelated memory-lancedb failures
Manual testing recommended with real subagent failures

…d Opus

Resolves openclaw#4315.

The slug-generator embedded run was hardcoded to use DEFAULT_MODEL (claude-opus-4-5) regardless of the user's configured agents.defaults.model.primary. This caused unexpected Opus charges on every /new command.

Now uses resolveDefaultModelForAgent() to honor the user's configured default model, falling back to DEFAULT_MODEL only when no config exists.
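
The fallback behavior described in this commit can be sketched as follows. The `Config` shape mirrors the `agents.defaults.model.primary` path named above, but the interface and the `resolveModelForSlug` helper are hypothetical; the real resolveDefaultModelForAgent() signature is not shown here.

```typescript
const DEFAULT_MODEL = "claude-opus-4-5";

// Hypothetical config shape covering only the path this commit mentions.
interface Config {
  agents?: { defaults?: { model?: { primary?: string } } };
}

// Prefer the user's configured primary model; fall back to DEFAULT_MODEL
// when the config is missing or the value is blank.
function resolveModelForSlug(config?: Config): string {
  const configured = config?.agents?.defaults?.model?.primary?.trim();
  return configured ? configured : DEFAULT_MODEL;
}
```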

…ixes openclaw#4367)

The message processing pipeline had a synchronization bug where limitHistoryTurns() truncated conversation history AFTER repairToolUseResultPairing() had already fixed tool_use/tool_result pairings. This could split assistant messages (with tool_use) from their corresponding tool_result blocks, creating orphaned tool_result blocks that the Anthropic API rejects.

This fix calls sanitizeToolUseResultPairing() AFTER limitHistoryTurns() to repair any pairings broken by truncation, ensuring the transcript remains valid before being sent to the LLM API.

Changes:
- Added import for sanitizeToolUseResultPairing from session-transcript-repair.js
- Call sanitizeToolUseResultPairing() on the limited message array
- Updated variable name from 'limited' to 'repaired' for clarity
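
To show why the order of these two steps matters, here is a simplified sketch. The message types and both helpers (`truncateHistory`, `dropOrphanedToolResults`) are hypothetical stand-ins for limitHistoryTurns() and sanitizeToolUseResultPairing(). The repair step only checks that a matching tool_use appears earlier in the kept history, which is looser than what a real implementation would likely enforce.

```typescript
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; tool_use_id: string };

interface Message {
  role: "user" | "assistant";
  content: Block[];
}

// Keep only the most recent maxMessages messages (simplified truncation).
function truncateHistory(messages: Message[], maxMessages: number): Message[] {
  return maxMessages <= 0 ? [] : messages.slice(-maxMessages);
}

// Drop tool_result blocks whose tool_use is no longer in the history,
// and drop user messages that become empty as a result.
function dropOrphanedToolResults(messages: Message[]): Message[] {
  const seenToolUseIds = new Set<string>();
  const repaired: Message[] = [];
  for (const message of messages) {
    if (message.role === "assistant") {
      for (const block of message.content) {
        if (block.type === "tool_use") seenToolUseIds.add(block.id);
      }
      repaired.push(message);
      continue;
    }
    const content = message.content.filter(
      (block) => block.type !== "tool_result" || seenToolUseIds.has(block.tool_use_id),
    );
    if (content.length > 0) repaired.push({ ...message, content });
  }
  return repaired;
}

// Truncate first, then repair, so any pairing broken by truncation is fixed last.
function prepareHistory(messages: Message[], maxMessages: number): Message[] {
  return dropOrphanedToolResults(truncateHistory(messages, maxMessages));
}
```

Running the repair before truncation, as the old pipeline did, would leave the transcript valid only until the cut removed an assistant message holding a tool_use.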

- Enhanced SubagentRunOutcome type with errorType and errorHint fields
- Added categorizeError() helper to classify common error patterns:
  * File system errors (ENOENT, EACCES, etc.)
  * API/model errors (rate limits, auth failures, invalid requests)
  * Network errors (connection refused, DNS failures)
  * Timeout errors
  * Configuration errors (missing credentials, quota limits)
- Updated error emission in agent-runner-execution.ts to categorize errors
- Updated subagent-registry.ts to capture and propagate new error fields
- Added buildErrorStatusLabel() helper for user-friendly error messages
- Error announcements now include error type and remediation hints

Example improved messages:
- Before: 'failed: unknown error'
- After: 'failed (tool error): ENOENT — File or directory not found'

This makes subagent failures much easier to understand and debug while maintaining backward compatibility.

- Tests timeout errors (timeout keyword, 'timed out', ETIMEDOUT)
- Tests authentication errors (401, unauthorized, 403, forbidden)
- Tests rate limit errors (rate limit keyword, HTTP 429)
- Tests unknown/unrecognized errors
- Tests API/model errors (400, 500, 503, quota, billing)
- Tests network errors (ECONNREFUSED, ENOTFOUND, DNS, ENETUNREACH)
- Tests file system errors (ENOENT, EACCES, EPERM, EISDIR)
- Tests configuration errors (missing key/token)
- Tests context/memory errors
- Export categorizeError() function for testing
- 37 passing tests covering all error categorization paths

moltbot-barnacle bot added the agents Agent runtime and tooling label Jan 30, 2026

spiceoogway added 2 commits January 30, 2026 01:01

fix: apply format to onboard-helpers.ts (pre-existing formatting issue)

d89ca4a

moltbot-barnacle bot added the commands Command implementations label Jan 30, 2026

spiceoogway added 3 commits January 30, 2026 02:10

spiceoogway force-pushed the improve/subagent-error-messages branch from c713a8b to a39811c Compare January 30, 2026 07:38
