Computer use

Drive local desktop apps from an agent via accessibility trees, screenshots, and safe UI actions.

The orca computer CLI lets an agent inspect and control native desktop apps — list running apps, read accessibility trees, click controls, set values, type text, scroll, and take screenshots. Use it when a task needs to operate the OS or a third-party app rather than a terminal or the built-in browser.

First-time setup

Check the runtime and permissions:

orca status --json
orca computer permissions --json
orca computer capabilities --json

If permissions reports anything missing, grant Accessibility (and Screen Recording on macOS) to Orca Computer Use in System Settings, then re-run permissions --json to confirm.
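The three checks can be chained so that a failing step stops the run. This is a minimal sketch, assuming each command exits non-zero on failure (the exit-code behavior is not stated here); `orca_preflight` is a hypothetical helper name. Missing permissions may be reported only in the JSON output, so the output is still worth reading.

```shell
# Hypothetical helper: run the setup checks in order and stop at the first failure.
# Assumption: each orca command exits non-zero when it fails.
orca_preflight() {
  orca status --json || { echo "orca status failed" >&2; return 1; }
  orca computer permissions --json || { echo "permissions check failed" >&2; return 1; }
  orca computer capabilities --json || { echo "capabilities check failed" >&2; return 1; }
}
```

Calling `orca_preflight` prints the JSON from each step to stdout and a short message to stderr for the step that failed.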

Snapshot → act → snapshot

Every interaction follows the same loop: read the app's current state, act on a specific element, then re-read state to verify the result.

orca computer list-apps --json
orca computer get-app-state --app com.spotify.client --json
orca computer click --app com.spotify.client --element-index 42 --json

Element indexes are scoped to the latest get-app-state result. Refresh state after navigation, focus changes, scrolling, or any app re-render before reusing an index.
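One way to make the refresh hard to forget is to pair every click with a fresh snapshot. The sketch below uses only the commands shown above; `click_then_refresh` is a hypothetical wrapper, not part of the CLI.

```shell
# Hypothetical helper: click an element, then immediately re-read the app state.
# $1 = app selector (bundle ID preferred)
# $2 = element index taken from the latest get-app-state result
click_then_refresh() {
  # Send the click result to stderr so stdout carries only the new snapshot.
  orca computer click --app "$1" --element-index "$2" --json >&2 || return 1
  # Indexes from the earlier snapshot are stale from here on; print a fresh one.
  orca computer get-app-state --app "$1" --json
}
```

For example, `click_then_refresh com.spotify.client 42` acts on index 42 and prints the state to pick the next index from.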

Selecting an app

Prefer bundle IDs returned by list-apps:

orca computer get-app-state --app com.microsoft.edgemac --json

App names work when unambiguous (--app Spotify). Use --app pid:<number> only when neither the bundle ID nor the name identifies a single app.
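Before targeting an app, a script can confirm that its bundle ID is present in the list-apps output. This is a rough sketch: it assumes the bundle ID appears in the JSON as a complete quoted string, and `app_is_listed` is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the bundle ID appears in list-apps output.
# Assumption: the JSON contains the bundle ID as a complete quoted string.
# $1 = bundle ID, for example com.spotify.client
app_is_listed() {
  orca computer list-apps --json | grep -Fq "\"$1\""
}
```

Usage: `app_is_listed com.spotify.client && orca computer get-app-state --app com.spotify.client --json`. A JSON-aware tool would be more robust than a text match if the output shape is known.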

Available actions

orca computer click --app <app> --element-index <i> --json
orca computer set-value --app <app> --element-index <i> --value "text" --json
orca computer type-text --app <app> --text "text" --json
orca computer press-key --app <app> --key Return --json
orca computer hotkey --app <app> --key CmdOrCtrl+A --json
orca computer paste-text --app <app> --text "text" --json
orca computer scroll --app <app> --element-index <i> --direction down --json
orca computer drag --app <app> --from-x 100 --from-y 100 --to-x 300 --to-y 300 --json
orca computer perform-secondary-action --app <app> --element-index <i> --action <name> --json

Prefer semantic actions (click, set-value, perform-secondary-action) over raw type-text or press-key — they target accessibility elements directly and survive focus changes that keyboard input doesn't.
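As an illustration of the semantic route, filling a text field can be a single set-value call against the element rather than a click followed by type-text. `fill_field` below is a hypothetical wrapper around the command listed above.

```shell
# Hypothetical helper: set a field's value through its accessibility element
# instead of clicking it and sending keystrokes to whatever has focus.
# $1 = app selector
# $2 = element index from the latest get-app-state result
# $3 = new value (quoted by the caller if it contains spaces)
fill_field() {
  orca computer set-value --app "$1" --element-index "$2" --value "$3" --json
}
```

Usage: `fill_field com.apple.Safari 7 "hello world"`. Do not use this form for secrets; the stdin variant is the safer choice there.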

Sensitive input

Pass secrets through stdin so they don't land in shell history:

printf '%s' "$TEXT" | orca computer set-value \
  --app com.apple.Safari --element-index 7 --value-stdin --json

--text-stdin works the same way for type-text and paste-text.
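The same pattern can be wrapped so the secret only ever travels on stdin. A minimal sketch, with `send_secret` as a hypothetical helper name:

```shell
# Hypothetical helper: forward a secret from stdin to type-text or paste-text.
# $1 = action: type-text or paste-text
# $2 = app selector
send_secret() {
  case $1 in
    type-text|paste-text)
      # The function's stdin is inherited by orca, so the secret is never an argument.
      orca computer "$1" --app "$2" --text-stdin --json ;;
    *)
      echo "send_secret: expected type-text or paste-text" >&2
      return 2 ;;
  esac
}
```

Usage: `printf '%s' "$TEXT" | send_secret paste-text com.apple.Safari`.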

Screenshots

get-app-state returns an accessibility tree and, by default, a screenshot. With --json, the image bytes are written to disk and the path is returned in screenshot.path rather than embedded in the response. Pass --no-screenshot when pixels aren't needed (faster, smaller payload). Pass --restore-window to bring a hidden or minimized window into view before capture.
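The two flags can be folded into one small wrapper that picks a snapshot mode. This is a sketch: `snapshot_app` and its mode names are hypothetical, and only the flags described above are used.

```shell
# Hypothetical helper: choose how get-app-state captures the app.
# $1 = app selector
# $2 = mode: "tree" (skip the screenshot), "visible" (restore the window first),
#      anything else falls back to the default snapshot with a screenshot
snapshot_app() {
  case $2 in
    tree)    orca computer get-app-state --app "$1" --no-screenshot --json ;;
    visible) orca computer get-app-state --app "$1" --restore-window --json ;;
    *)       orca computer get-app-state --app "$1" --json ;;
  esac
}
```

Usage: `snapshot_app com.spotify.client tree` for routine index lookups, and `snapshot_app com.spotify.client visible` when the window may be hidden or minimized.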

Use it from an agent

The shipped computer-use skill packages the same command surface with safety guidance. Install it into the agent's skill directory:

npx skills add https://github.com/stablyai/orca --skill computer-use

See Skills registry & MCP for how skills are picked up.
