Monitor and wake agent sessions from the CLI (sessions watch / wake / list) #219

Description

@germanescobar

Problem

Today controller sessions start <project> --worktree <id> --message <text> (#190) is a fire-and-forget primitive. The CLI kicks off the agent run, prints { sessionId, url }, and exits. There is no symmetric way for the same agent (or any other agent) to:

  1. Observe what another session is doing — is it running? which provider? what events has it emitted?
  2. Wait for a long-running run to finish without holding a turn open for the entire duration.
  3. Wake a session with a follow-up message at a later point, including arbitrarily in the future.

The pieces are already wired on the server:

  • The runtime map in server/lib/session-runtime.ts already tracks active, provider, projectId, worktreeId, plus the child process and pending approvals. It is exposed via GET /api/runtimes (bulk) and GET /api/projects/:id/sessions/:sessionId/runtime (per-session). The React UI already consumes both.
  • Events persist to <orchestratorHome>/projects/<name>-<hash>/events/<sessionId>.jsonl (server/lib/sessions.ts) and are read back via GET /api/projects/:id/sessions/:sessionId/events. The headless advanceSessionQueue flow (#113, "Message enqueuing + steering for Claude and Ada (unify with Codex)") already replays queued messages on a clean run completion without any client attached.
  • The per-session message queue (session-queue.ts) is already CRUD-exposed at /api/projects/:id/sessions/:sessionId/queue[/messageId].
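
To make the event-log piece concrete, here is a minimal TypeScript sketch of reading a session's .jsonl log and checking it for a terminal event. The SessionEvent shape and its `type` field are assumptions made for illustration (the real schema lives in server/lib/sessions.ts); only the three terminal event names come from this issue.

```typescript
// Hypothetical event shape; the real schema is defined in server/lib/sessions.ts.
interface SessionEvent {
  type: string;
  [key: string]: unknown;
}

// Terminal event names as listed in this issue.
const TERMINAL_TYPES = new Set(["run.completed", "run.failed", "run.cancelled"]);

// Parse JSONL text into events, skipping blank lines.
function parseEventLog(jsonl: string): SessionEvent[] {
  return jsonl
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as SessionEvent);
}

// Return the most recent terminal event in the log, or undefined if none was logged.
function findLastTerminal(events: SessionEvent[]): SessionEvent | undefined {
  for (let i = events.length - 1; i >= 0; i--) {
    if (TERMINAL_TYPES.has(events[i].type)) return events[i];
  }
  return undefined;
}

// Last n events, roughly what `--tail [N]` would print (default 20).
function tailEvents(events: SessionEvent[], n = 20): SessionEvent[] {
  return n <= 0 ? [] : events.slice(-n);
}
```

A session can contain several runs, so a terminal event in the log does not by itself prove the session is idle now; the runtime snapshot remains the authority for "is it still running?".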

What is missing is the CLI surface that lets an agent (or another script) reach all of this. Today the only session-aware CLI surface is sessions start, so an agent can spawn work but cannot supervise it.

Concrete motivation — "run a half-hour script, then keep going"

A common pattern we would like to support:

# Kick off a long-running build.
controller sessions start coding-orchestrator \
  --worktree <w> --message "Run ./big-build.sh and summarize the failures"

# Time passes. The agent's own turn has long since ended, or it is now
# doing other work in a different session.
controller sessions watch coding-orchestrator <sessionId> --until terminal
#   blocks until run.completed / run.failed / run.cancelled, then prints
#   a one-line summary + exit code.

# Or wake it later (from the same or a different agent) with the next step.
controller sessions wake coding-orchestrator <sessionId> --message "Build is done; now deploy."

The wake is the same primitive as the existing in-UI "send while running" path — POST /api/projects/:id/sessions/:sessionId/queue + advanceSessionQueue — so this does not introduce a new execution model. It just makes the existing one reachable from the CLI.
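
As a rough illustration of how thin the wake path could be, the sketch below builds the HTTP request that sessions wake would send to the existing queue route. buildWakeRequest and WakeRequest are hypothetical names, and sending only `text` (leaving provider, model, and mode to the server) is an assumption, not confirmed behavior.

```typescript
// Hypothetical helper: builds the request that `sessions wake` would send.
interface WakeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildWakeRequest(
  serverUrl: string,
  projectId: string,
  sessionId: string,
  text: string,
): WakeRequest {
  if (text.trim() === "") throw new Error("--message must not be empty");
  const base = serverUrl.replace(/\/+$/, ""); // tolerate trailing slashes
  const path =
    `/api/projects/${encodeURIComponent(projectId)}` +
    `/sessions/${encodeURIComponent(sessionId)}/queue`;
  return {
    url: base + path,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      // Assumption: only `text` is sent; provider, model, and mode are left to the server.
      body: JSON.stringify({ text }),
    },
  };
}
```

The result can be passed straight to fetch(url, init). Keeping request construction separate from the network call makes the CLI side easy to unit test.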

Proposed surfaces

All under the existing controller CLI (cli/controller) so they live next to sessions start, mirror its argument style, and inherit the existing CONTROLLER_SERVER_URL resolution + project/worktree resolvers from #190.

  1. controller sessions list <project> [--worktree <id>] [--include-archived]
    Wraps GET /api/projects/:id/sessions. The server already returns SessionSummary[] (metadata without message history, see getSessionSummaries). Print id, title, status, provider, lastActiveAt. Lets an agent enumerate its own past work or check what is running.

  2. controller sessions status <project> <sessionId> (optional, can fold into runtime)
    Wraps GET /api/projects/:id/sessions/:sessionId/runtime. Prints the runtime snapshot: active, provider, projectId, worktreeId. Quick "is it still running?" probe.

  3. controller sessions watch <project> <sessionId> (the headline new surface)
    Two modes:

    • --until terminal (default): long-poll or SSE on a new GET /api/projects/:id/sessions/:sessionId/wait route. The server watches the runtime map (the same data advanceSessionQueue already uses) and resolves when run.completed / run.failed / run.cancelled lands for that session, or when the child process exits. The CLI prints a one-line summary plus the run's exit code and exits with that same code, so agents can branch on it (for example, if ! controller sessions watch ...).
    • --tail [N]: prints the last N events (default 20) and exits. Lets an agent quickly catch up after re-entering a session, or after coming back from a delay. Wraps GET /api/projects/:id/sessions/:sessionId/events and uses the existing dedupeUserMessageEvents from routes/sessions.ts.

  4. controller sessions wake <project> <sessionId> --message <text>
    Wraps POST /api/projects/:id/sessions/:sessionId/queue. Writes a QueuedMessage (session-queue.ts) using the same { text, provider, model, mode, ... } shape that the UI queue uses, so the existing advanceSessionQueue picks it up unchanged on clean completion. Resolves to the new messageId and exits.

  5. controller sessions wake <project> <sessionId> --message <text> --delay <duration> (follow-up; depends on 4)
    Writes a { runAt: <ISO>, ... } envelope instead of a ready-to-replay item. The server queue-advance loop checks for due items when it wakes for any reason (e.g. an existing run finishing, or a lightweight setTimeout set at insert time). This is the literal "wake me in 30 min" primitive. Should be filed separately from 1–4 if we want to keep the first PR reviewable.
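
The --delay envelope from item 5 could be built along these lines. This is a sketch under stated assumptions: the duration syntax (an integer followed by s, m, or h) and the helper names parseDuration and buildDelayedMessage are invented for illustration; only the `runAt` ISO field comes from the proposal.

```typescript
// Assumed duration syntax: <integer><unit>, where unit is s, m, or h (e.g. "30m").
const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000 };

function parseDuration(input: string): number {
  const match = /^(\d+)([smh])$/.exec(input.trim());
  if (!match) throw new Error(`invalid --delay value: ${input}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Build the deferred envelope: the message text plus an ISO-8601 `runAt`.
// `now` is injectable so the computation is deterministic in tests.
function buildDelayedMessage(text: string, delay: string, now: Date = new Date()) {
  const runAt = new Date(now.getTime() + parseDuration(delay)).toISOString();
  return { text, runAt };
}
```

Resolving the delay to an absolute timestamp on the client keeps the server side simple (it only compares `runAt` to the current time), at the cost of depending on the client's clock.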

Why this fits the current architecture

  • The runtime map and event log are already the source of truth for the React UI sidebar and SessionView. The CLI surfaces only add a read/write path; they do not add new state.
  • advanceSessionQueue is already server-driven and client-independent — a headless wake (no SSE client attached) drains the queue to completion. So sessions wake from another shell, cron, or agent works without any UI open.
  • The CLI install path and project/worktree resolution are already abstracted in cli/controller (controllerCliInstalledPath, resolveProjectId from #190, "Expose worktree + session start to agents"). Adding surfaces is parseX(argv) + runX(argv, serverUrl) + a server route or wrapper.
  • The agent preamble (server/lib/agent-preamble.ts) and the agent system prompt already document the absolute CLI install path, so agents will discover these new subcommands via the same channel.
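
For a sense of what the parseX(argv) half might look like for sessions watch, here is a minimal sketch. WatchArgs and parseWatchArgs are hypothetical names and the real parser in cli/controller may be structured differently; the flags and the default of 20 events are taken from the proposal above.

```typescript
type WatchArgs =
  | { mode: "until-terminal"; project: string; sessionId: string }
  | { mode: "tail"; project: string; sessionId: string; count: number };

// `argv` is everything after `controller sessions watch`.
function parseWatchArgs(argv: string[]): WatchArgs {
  const positionals: string[] = [];
  let tail: number | undefined;
  for (let i = 0; i < argv.length; i++) {
    const arg = argv[i];
    if (arg === "--until") {
      const value = argv[++i];
      if (value !== "terminal") throw new Error(`unsupported --until value: ${value}`);
    } else if (arg === "--tail") {
      // N is optional: consume the next token only if it is a non-negative integer.
      const next = argv[i + 1];
      if (next !== undefined && /^\d+$/.test(next)) {
        tail = Number(next);
        i++;
      } else {
        tail = 20;
      }
    } else if (arg.startsWith("--")) {
      throw new Error(`unknown flag: ${arg}`);
    } else {
      positionals.push(arg);
    }
  }
  if (positionals.length !== 2) {
    throw new Error("usage: controller sessions watch <project> <sessionId> [--until terminal | --tail [N]]");
  }
  const [project, sessionId] = positionals;
  return tail === undefined
    ? { mode: "until-terminal", project, sessionId }
    : { mode: "tail", project, sessionId, count: tail };
}
```

Because N is optional, this sketch expects flags to come after the two positionals; a purely numeric session id placed directly after --tail would be read as N.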

Non-goals

  • Not adding a generic scheduler / cron system. --delay is just a deferred-enqueue, not a recurring job.
  • Not exposing the child process handle or letting the agent kill a sibling session. controller sessions stop can be a separate surface (and POST /sessions/:id/stop already exists on the server).
  • Not changing the runtime map itself. The map is server-internal; the CLI surfaces read point-in-time snapshots of it and never hold or mutate the live state.

Open questions

  • Should sessions watch --until terminal use SSE on the server, or short-poll GET /runtime + GET /events? SSE keeps the cost on the server (push when terminal lands); polling is simpler but chattier. SSE matches the existing pattern (every other live surface uses SSE), but the server route does not exist yet — would need a new endpoint, or reuse the existing /events SSE if we add a ?wait=terminal mode.
  • Should sessions wake deduplicate identical follow-ups? The existing queue is just a list, so two identical --messages will both replay.
  • For --delay, do we need the server to actively poll due items even when no run is active (i.e. across a full idle period), or is "deliver on the next natural run" acceptable? An idle server with no session running is the case where the agent is most likely to want a wake.
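
To ground the last question, the sketch below shows the two computations an idle-period wake would need: which delayed items are due now, and how long a single timer should sleep until the next one. DelayedItem, partitionDue, and msUntilNextDue are hypothetical names; the only field taken from the proposal is the ISO `runAt`.

```typescript
// Hypothetical queue item shape; `runAt` is an ISO-8601 timestamp.
interface DelayedItem {
  id: string;
  runAt: string;
}

// Split queued items into those due at `now` and those still pending.
function partitionDue<T extends DelayedItem>(items: T[], now: Date): { due: T[]; pending: T[] } {
  const due: T[] = [];
  const pending: T[] = [];
  for (const item of items) {
    (Date.parse(item.runAt) <= now.getTime() ? due : pending).push(item);
  }
  return { due, pending };
}

// Milliseconds until the earliest pending item becomes due, or undefined if
// nothing is pending. This is the delay one setTimeout would need to cover.
function msUntilNextDue(items: DelayedItem[], now: Date): number | undefined {
  const future = items
    .map((item) => Date.parse(item.runAt) - now.getTime())
    .filter((ms) => ms > 0);
  return future.length === 0 ? undefined : Math.min(...future);
}
```

With "deliver on the next natural run", only partitionDue is needed at queue-advance time. Active delivery across an idle period additionally needs a timer armed with msUntilNextDue, re-armed on insert and restored after a server restart.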

Acceptance criteria

  • controller sessions list <project> prints the session list with status + provider.
  • controller sessions watch <project> <sessionId> --until terminal blocks until the run terminates and exits with the run's exit code.
  • controller sessions watch <project> <sessionId> --tail prints the last N events.
  • controller sessions wake <project> <sessionId> --message <text> enqueues a follow-up that runs on the existing advanceSessionQueue path. Verify by starting session A, kicking off a long tool call, waking it with a follow-up from another shell, and observing that the follow-up replays headlessly when the first run completes.
  • All surfaces work with the absolute install path (~/coding-orchestrator/bin/controller) and inherit the existing CONTROLLER_SERVER_URL resolution.
  • New server routes (if any) get tests; CLI parsing gets unit tests under cli/__tests__/.
