A Mac-native, Python-first operating framework for embodied AI agents.
JROS is the operating framework for Jaegers — humanoid robots, drones, and digital AI agents that share a single coherent runtime. It provides the nervous system (transport, nodes, topics) and the brain (agent loop, memory, learned skills) so the same agent code runs on an LED-faced drone or a chat-only desktop companion.
Built from real hardware pain, JROS runs on Apple Silicon and Jetson Orin — no Docker, no special OS versions, no dependency hell. One curl line installs the whole stack.
- 🧠 Local-first — runs entirely on-device on an in-process LLM. No cloud account required.
- 🛠️ ~70 built-in tools across 11 toolset categories — files (read / write / edit / search), memory, web, code execution, scheduling, background processes, kanban, delegation. A 20-tool CORE is always visible; the rest are reachable via describe_tool / load_toolset when scoping is enabled.
- 📋 Kanban task board — the agent plans multi-step work as cards; Deep Think jobs live on the same board. Use /board to view it.
- 📚 Self-authored skills — the agent researches, writes, smoke-tests, benchmarks, and versions its own skills.
- 🖥️ Computer use — the flagship skill: drive any macOS app through the accessibility tree (see the screen, click, type, work menus).
- 🌙 Deep Think — an idle "deep sleep" mode that swaps to a heavier coder model and drains a skill-development queue.
- 🔌 Model-agnostic — opt into LM Studio, an OpenAI-compatible endpoint, or Anthropic Claude. Local stays the default.
- 🔒 6-tier permission ladder — every tool is gated; high-risk actions are confirmation-prompted and audit-logged.
- 🤖 Embodiment-ready — the body contract and the capability-gated skill loader are already in place for hardware.
Status — 0.3.0 released. Voice pipeline rebuild + skill system v3 + persona prefill. The 0.2.x in-process Rich TUI stays as the operator surface; the 0.3.0 work layers underneath it:
- Persistent TTS output stream — one long-lived OutputStream opens at warm time, stays alive for the session. Two backends, config-toggled via config.voice.audio_backend:
  - sounddevice (default) — PortAudio, with the output device resolved LIVE via CoreAudio so it follows Settings → Sound.
  - avaudio — PyObjC AVAudioEngine, direct scheduleBuffer:completionHandler: (no PortAudio in the loop).
- Skill system v3 — unified jros.skill/v3 manifest schema (id, version, origin, package, runtime, domains, embodiment, permissions, capabilities with per-capability scoring bands + levels, dependencies, artifacts, entrypoint, body, provenance). Capability state persists in <instance>/capabilities/; promotion / demotion rules update the live band the router consults.
- Persona prefill — wizard-time YAML templates in jaeger_os/personas/ prefill identity.yaml + soul.md when a new instance is created. Zero runtime cost on existing instances.
- Whisper STT hardening — is_non_speech_marker() suppresses [BLANK_AUDIO] / (beep) / [music] in follow-up + no-wake-word modes. Optional AEC plumbing on _MicStream.
- Gemma 4 12B-it Q4 added to the model registry; promoted to the 24 GB tier asleep pick (Mac Mini sweet spot — leaderboard #1 at 94.9 % routing on the 2026-06-04 bench).
- ./launch — sandbox launcher with a real-verification boot scroll (every row a check the launcher actually performs against the instance bundle). Housekeeping flags: --status, --stop, --restart, --reset-audio, --clean-logs, --health.
See CHANGELOG.md for the full entry and the explicit "Skipped from the upstream 0.3.0 plan" list (the Swift desktop app + the daemon-attached rich_tui surface stay in tree as archived code, not wired into install or run).
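The Whisper STT hardening item above can be pictured with a small filter. This is only a sketch under an assumption: it treats a transcript as a non-speech marker when it consists of nothing but one bracketed or parenthesized tag. The function name comes from the release notes, but the signature, the regular expression, and the helper filter_transcripts are illustrative guesses, not the shipped implementation.

```python
import re

# Assumption: a marker-only transcript is a single [TAG] or (tag) with
# optional surrounding whitespace, e.g. "[BLANK_AUDIO]" or "(beep)".
_MARKER_RE = re.compile(r"^\s*(?:\[[^\[\]]*\]|\([^()]*\))\s*$")


def is_non_speech_marker(text: str) -> bool:
    """Return True when the transcript is only a marker, not spoken words."""
    return bool(_MARKER_RE.match(text))


def filter_transcripts(transcripts: list[str]) -> list[str]:
    """Hypothetical helper: drop marker-only transcripts before the agent sees them."""
    return [t for t in transcripts if not is_non_speech_marker(t)]


heard = ["[BLANK_AUDIO]", "(beep)", "turn on the lights", "[music]"]
kept = filter_transcripts(heard)  # only the spoken sentence survives
```

A transcript that mixes a marker with real words, such as "(beep) hello", is kept in this sketch, since dropping it would lose speech.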
┌────────────────────────────────────────────────────────────┐
│ AGENT (BRAIN) │
│ perceive → plan → act + memory + skills │
│ one loop per Jaeger body │
└──────────────────────────┬─────────────────────────────────┘
│ invokes
┌──────────────────────────▼─────────────────────────────────┐
│ NODES (NERVOUS SYSTEM) │
│ tts │ stt │ llm │ vision │ motors │ leds │ mcu_serial │
│ pluggable, hot-swappable, transport-agnostic │
└────────────────────────────────────────────────────────────┘
Nodes are processes that do one thing — capture audio, run TTS, drive servos, talk to a Teensy. They speak over standardized topics (ZMQ + UDP).
Agents are the brain. They subscribe to perception topics, reason with an LLM, look up memories, plan an action sequence, and dispatch it to nodes.
A Jaeger is the union of one agent loop and a configured set of nodes.
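To make the node / agent split concrete, here is a minimal sketch that replaces the real ZMQ + UDP transport with a tiny in-process bus. The TopicBus class and the handler names are hypothetical and exist only for illustration; the topic names /sense/transcript and /act/speech are taken from the topic namespace listed later in this README.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[str, object], None]


class TopicBus:
    """Hypothetical in-process stand-in for the ZMQ/UDP transport."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, payload: object) -> None:
        # Deliver synchronously to every handler registered for the topic.
        for handler in list(self._subscribers[topic]):
            handler(topic, payload)


bus = TopicBus()
spoken: list[str] = []

# A "tts" node: does one thing, consumes /act/speech.
bus.subscribe("/act/speech", lambda _topic, text: spoken.append(str(text)))


# The agent: subscribes to perception, decides, dispatches to a node.
def agent_on_transcript(_topic: str, transcript: object) -> None:
    reply = f"You said: {transcript}"  # placeholder for LLM reasoning
    bus.publish("/act/speech", reply)


bus.subscribe("/sense/transcript", agent_on_transcript)

# An "stt" node publishing what it heard.
bus.publish("/sense/transcript", "hello")
```

The agent never calls the tts node directly; it only publishes to a topic, which is what lets a node be swapped without touching the agent.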
- Python 3.11 or 3.12 (not 3.13 yet — some native deps lack 3.13 wheels).
- A C/C++ toolchain — llama-cpp-python and pywhispercpp build native code. macOS: xcode-select --install. Debian/Ubuntu: sudo apt install build-essential.
- PortAudio — for microphone / speaker I/O. macOS: brew install portaudio. Debian/Ubuntu: sudo apt install portaudio19-dev.
One-line install — clones JROS to ~/jaeger, sets up a venv,
installs the full runtime, scaffolds ~/.jaeger/:
curl -fsSL https://raw.githubusercontent.com/JenkinsRobotics/JROS/master/scripts/install.sh | bash
Then:
cd ~/jaeger
./run.sh setup # create your first agent (wizard: memory tier,
# model choice, voice). Default name is auto-picked.
./run.sh                       # launch the default agent
Or scaffold a named agent:
./run.sh setup lilith # create "lilith" via the wizard
./run.sh --instance lilith     # launch "lilith"
Manage multiple agents:
./run.sh list # show every installed agent
./run.sh delete eren # remove "eren" (asks you to type the name)
./run.sh help                  # full subcommand cheatsheet
That's the whole flow. The single install pulls the entire runtime — local LLM, Kokoro TTS, Whisper STT, vision, the external-model pipeline, messaging bridges. Nothing is held back behind an optional extra. A GGUF model is fetched from Hugging Face on first run, and nothing else phones home.
Upgrades — same one-line, idempotent:
cd ~/jaeger && git pull && ./install.sh
Or re-run the curl command — it detects an existing clone and just pulls + re-installs.
Pinning a release — for reproducible installs:
# The variable must be set on the bash side of the pipe so the install script sees it.
curl -fsSL \
  https://raw.githubusercontent.com/JenkinsRobotics/JROS/0.3.0/scripts/install.sh | JAEGER_REF=0.3.0 bash
Custom install location — override with an env var:
# As above, set the variable for bash, not for curl.
curl -fsSL \
  https://raw.githubusercontent.com/JenkinsRobotics/JROS/master/scripts/install.sh | JAEGER_HOME=/opt/jaeger bash
Where everything lives — two clear buckets, side by side:
| Layer | Path | What |
|---|---|---|
| Framework | ~/jaeger/jaeger_os/ | The code (git-tracked, upgraded by git pull) |
| Operator state | ~/jaeger/.jaeger_os/instances/<name>/ | Each agent's persona, config, memory, skills, prompts, workspace, logs, credentials — one folder per agent |
The two sibling dirs at the install root make the framework / operator
split obvious. Operator state is fully .gitignored; framework upgrades
never touch it. See dev_docs/architecture/system_runtime_user.md
for the design rationale.
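The layout in the table can be expressed as a few path helpers. This is a sketch only: the function names are hypothetical, and it assumes the JAEGER_HOME variable shown for custom installs also identifies the install root at lookup time, with ~/jaeger as the fallback.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def install_root(env: Optional[Mapping[str, str]] = None) -> Path:
    """Install root: JAEGER_HOME if set (assumption), else the default ~/jaeger."""
    env = os.environ if env is None else env
    return Path(env.get("JAEGER_HOME") or Path.home() / "jaeger")


def framework_dir(env: Optional[Mapping[str, str]] = None) -> Path:
    """Git-tracked framework code, replaced by upgrades."""
    return install_root(env) / "jaeger_os"


def instance_dir(name: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Operator state for one agent; never touched by framework upgrades."""
    return install_root(env) / ".jaeger_os" / "instances" / name


lilith_state = instance_dir("lilith", {"JAEGER_HOME": "/opt/jaeger"})
```

Keeping both helpers rooted at the same install directory mirrors the sibling-directory split described above.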
Manual install (no curl) — if you'd rather see every step:
git clone https://github.com/JenkinsRobotics/JROS.git ~/jaeger
cd ~/jaeger
./install.sh
./run.sh setup # create your first agent
./run.sh          # launch it
The 0.2.x in-process Rich TUI is still the operator surface in 0.3.0. Pick whichever launch path matches what you're doing.
Run a named agent — the production flow:
./run.sh --instance lilith              # in-process TUI for the 'lilith' instance
./run.sh --instance lilith --no-voice   # text-only (no mic, no Kokoro warm)
Sandbox dev loop — for working on JROS itself. The ./launch
wrapper boots the in-repo sandbox at sandbox/.jaeger_os/instances/jros-dev/
with a real-verification boot scroll, then hands the terminal to the
TUI. Daily flags:
./launch # boot the sandbox TUI
./launch --status # what's running across modes
./launch --stop # kill a lingering TUI singleton
./launch --restart # stop, then boot
./launch --health # preflight checks and exit
./launch --reset-audio # sudo killall coreaudiod
./launch --clean-logs # truncate <instance>/run/jaeger.log
./launch --no-voice       # tell the TUI to skip voice startup
The ./launch boot scroll runs every check before handing off:
sandbox bundle, library import resolution, instance manifest schema,
GGUF model on disk, AVAudioEngine bridge import, Whisper assets,
Kokoro package, skill registry walk, TUI module import. A red row
stops boot with the actual reason.
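The stop-on-first-red-row behaviour can be sketched as an ordered list of probes. Everything here is illustrative: the Check dataclass, run_boot_scroll, the row format, and the stub probes are hypothetical stand-ins, not the launcher's actual code.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    label: str
    probe: Callable[[], Optional[str]]  # returns None when OK, else the reason


def run_boot_scroll(checks: list[Check]) -> tuple[bool, list[str]]:
    """Run checks in order; stop at the first failure and report its reason."""
    rows: list[str] = []
    for check in checks:
        reason = check.probe()
        if reason is None:
            rows.append(f"[ ok ] {check.label}")
            continue
        rows.append(f"[FAIL] {check.label}: {reason}")
        return False, rows  # a red row stops boot
    return True, rows


# Hypothetical probes standing in for the real launcher checks.
checks = [
    Check("sandbox bundle", lambda: None),
    Check("GGUF model on disk", lambda: "no .gguf file found"),
    Check("TUI module import", lambda: None),  # never reached
]
ok, rows = run_boot_scroll(checks)
```

Returning the reason string from the probe itself keeps the failure message tied to the check that produced it.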
Pick the audio backend — 0.3.0 ships two persistent-stream backends
for TTS. Configure once in your instance's config.yaml:
voice:
  audio_backend: sounddevice   # PortAudio (default; macOS Settings-default device live-resolved)
  # or:
  # audio_backend: avaudio     # PyObjC AVAudioEngine, direct AVAudioPlayerNode scheduling
Or override per-run without editing the config:
JAEGER_AUDIO_BACKEND=avaudio ./launch
JAEGER_AUDIO_OUTPUT="Mac Studio Speakers" ./launch    # sounddevice device override
Bench against the model registry leaderboard — runs the full
59-case flat bench and updates dev_benchmark/HISTORY.md:
./dev_benchmark/run_flat_bench.py # full corpus
./dev_benchmark/run_flat_bench.py --limit 5 # 5-case smoke
./dev_benchmark/run_flat_bench.py --tags routing,multistep
The 2026-06-04 leaderboard row for gemma-4-26B-A4B-it-Q4_K_M is
55/59 (93 %) at permissions.mode=allow; bench history is
regenerated on every run.
0.3.0 ships the brain. 0.4.0 wires the spine — the node-based embodied architecture that turns JROS from a Mac-side agent into a robot operating framework that drives JP01-class hardware.
The position no one else owns:
JROS = ROS + Agentic AI + Mac-first local hardware. One developer, one Mac, one robot. Local LLM thinks; dedicated hardware nodes do the perception and action. Same code laptop or fleet, no Docker, no cloud.
┌──────────────────────────────────────┐
│ BRAIN NODE (Mac) │
│ │
│ LLM (Gemma) + agent loop │
│ In-process: tools, memory, skills, │
│ permissions, persona │
│ │
│ Tools = networking shims: │
│ text_to_speech → /act/speech │
│ listen → /sense/transcript│
│ vision_analyze → /sense/vision_analysis │
│ computer_use → /act/motion etc. │
└────────────┬──────────────────────────┘
│ ZMQ pub/sub (or inproc in monolith mode)
┌─────────────────┼──────────────────┐
│ │ │
┌────────▼─────┐ ┌───────▼──────┐ ┌───────▼──────┐
│ audio_in │ │ audio_out │ │ vision │
│ (Mac mic) │ │ (Mac spk) │ │ (Jetson) │
└──────┬───────┘ └──────▲───────┘ └──────────────┘
│ │
┌──────▼───────┐ ┌─────┴────────┐
│ stt │ │ tts │ ← own nodes, backend-swappable
│ (Whisper) │ │ (Kokoro) │ (tomorrow: MLX-TTS, NeuTTS,
└──────────────┘ └──────────────┘ Mistral Voxtral STT, …)
│
▼ /sense/transcript ┌─────────────────────────────────┐
│ Canonical topic namespaces │
│ /sense/audio_in binary mic │
│ /sense/transcript STT text │
│ /sense/camera_frame raw frames│
│ /sense/vision_analysis scene │
│ /sense/proprio encoder+IMU│
│ /act/speech text→TTS │
│ /act/audio_out binary spk │
│ /act/motion motor cmd │
│ /act/light LED cmd │
└─────────────────────────────────┘
┌─────────────────────────────────┐
│ │
┌────────▼─────────┐ ┌────────▼────────────┐
│ motor_ctrl │ │ led_ctrl │
│ (ESP32, MC01) │ │ (Teensy, AVC01) │
└──────────────────┘ └─────────────────────┘
Key architectural decisions (locked 2026-06-06):
- One brain process, N hardware-bound peripheral nodes. Not one-node-per-tool — that's the ROS 2 mistake (extreme granularity). The brain's tools, memory, and skill registry stay in-process for sub-microsecond function-call latency.
- STT and TTS get their own nodes. Voice pipelines evolve; today's Kokoro becomes tomorrow's MLX-TTS without touching the brain. Same topic contract, swap the subscriber.
- Tool ↔ node contract — "A tool does the networking, the node does the execution." The agent's tool signatures (text_to_speech("hi"), listen(seconds=5)) stay identical. What changes is the implementation: in-process call becomes bus.publish("/act/speech", …) + correlation-ID wait for the /sense/spoken ack.
- The brain doesn't know where its peripherals run. Same code laptop or fleet — only the transport changes (inproc:// → tcp:// when nodes move across boards).
See dev_docs/ROADMAP_0.4.md for the
full track breakdown.
| | ROS 2 | Hermes / agent frameworks | JROS |
|---|---|---|---|
| Embodied robotics | ✅ industry standard | ❌ doesn't think about bodies | ✅ Mac → Jetson → Teensy → ESP32 first-class |
| Local LLM agent | ❌ no agent layer | ❌ assumes cloud | ✅ Gemma local, no internet needed |
| Mac-native dev | ❌ Linux + Docker | ✅ runs on Mac | ✅ Mac-first since 0.2 |
| Transport weight | ❌ DDS (~2 GB install) | n/a (single process) | ✅ ZMQ (50 KB) |
| Learning curve | hard | easy | medium — one Python file per node |
| One-Mac development | painful | easy | ✅ monolithic mode = same code, no IPC |
| Multi-board production | ✅ designed for it | ❌ no | ✅ flip a config flag |
| Operator UX out of the box | ❌ build your own | n/a | ✅ TUI + (Track F) web inspector |
| Crash isolation per subsystem | ✅ best | ❌ none | ✅ per node when split |
The pitch in one line: the only framework where a local LLM agent thinks and a dedicated set of hardware nodes act — designed for one developer driving one robot from a Mac.
| Jaeger | Form | Role |
|---|---|---|
| Lilith | Digital — local LLM with adjustable personality, runs on Mac | First JROS-native agent — proves the jaeger-os agent layer before JP01 inherits it. |
| JP01 | Drone — Mac + Jetson + Teensy + ESP32 + LED panel + servos + cameras + mics | First hardware Jaeger — inherits Lilith's agent unchanged, adds hardware middleware. |
The strategy is agent first, body second: Lilith proves the brain in software, then JP01 puts a body around the same brain. A new Jaeger is a config file plus a logic node — not a fork of the runtime.
JROS/ ← clone goes here (default ~/jaeger)
├── install.sh ← venv + deps; safe to re-run
├── run.sh ← launcher
├── requirements.txt ← runtime deps (installed into .venv)
├── scripts/install.sh ← curl one-liner target (user-facing)
├── jaeger_os/ ← framework code (git-tracked)
│ ├── run.py, main.py ← entry points
│ ├── core/, plugins/, skills/, prompts/, assets/, daemon/, interfaces/
│ ├── migrations/ ← per-version migration scripts
│ └── models/ ← downloaded GGUF weights (gitignored except README)
├── .jaeger_os/ ← operator state (gitignored)
│ ├── instances/<name>/ ← each agent's full state, one folder per agent
│ ├── models/ ← shared model cache
│ ├── backups/ ← `./run.sh backup` output
│ └── jaeger.env ← sourceable instance pin
├── dev_docs/ ← architecture + design notes
├── dev_tests/ ← framework test suite
├── dev_benchmark/ ← bench corpus + sweep + sanity probe
├── dev_scripts/ ← dev_env.sh, run_tests.sh, generators
├── sandbox/ ← in-repo isolated test install (gitignored)
├── pyproject.toml ← pytest + ruff config
├── README.md, CHANGELOG.md, LICENSE
└── .git/, .gitignore
Two clear buckets at the install root: jaeger_os/ (framework, owned
by upstream) and .jaeger_os/ (operator state, gitignored). git pull
only touches the first; instance memory/logs/credentials survive every
upgrade.
JROS ships two complementary benches under dev_benchmark/ for picking
which local model to run:
- run_flat_bench.py + run_model_sweep.py — task benchmark. Runs the 59-case corpus (routing, multistep, recovery, multi-turn, context, safety, hallucination, cross-turn) per model and writes per-run rows + summary under benchmark/flat/<model>/<ts>/. The sweep auto-runs hybrid thinking models (Qwen3.x, gemma-4) in BOTH modes — once with thinking ON, once OFF — so the leaderboard shows the deep-think vs direct-mode tradeoff side-by-side.
- run_model_sanity.py — hardware-health benchmark, separate from task accuracy. Per model: GPU layer offload + Metal/CPU buffer split (did it fully fit?), raw tok/s on a fixed prompt (compare a 35B-A3B and a 9B on generation speed alone), and for hybrid models the think vs direct token-count and wall-clock so you can see what reasoning mode actually costs per query.
Useful env knobs (all bench-scoped, default off):
- JAEGER_BENCH_THINKING=auto|on|off — force hybrid models into a specific mode for a run (cloud-style toggle, same as Claude / GPT-o1 / Gemini's thinking flag).
- JAEGER_BENCH_MODEL_TIMEOUT=<seconds> — per-model wall-clock cap for the sweep (default 3600).
- JAEGER_BENCH_STALL_S=<seconds> — per-call stall watchdog (default 120; reasoning models still get bumped to a 300 s floor).
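A sketch of how these knobs might be read, including the 300 s floor for reasoning models. The function bench_settings and its return shape are hypothetical, and treating auto as the default for JAEGER_BENCH_THINKING is an assumption; the variable names and the numeric defaults are the ones listed above.

```python
import os
from typing import Mapping, Optional

REASONING_STALL_FLOOR_S = 300  # floor the text describes for reasoning models


def bench_settings(env: Optional[Mapping[str, str]] = None,
                   reasoning_model: bool = False) -> dict:
    """Resolve bench knobs from the environment, applying documented defaults."""
    env = os.environ if env is None else env
    thinking = env.get("JAEGER_BENCH_THINKING", "auto")  # assumed default
    if thinking not in ("auto", "on", "off"):
        raise ValueError(f"JAEGER_BENCH_THINKING must be auto|on|off, got {thinking!r}")
    model_timeout = int(env.get("JAEGER_BENCH_MODEL_TIMEOUT", "3600"))
    stall = int(env.get("JAEGER_BENCH_STALL_S", "120"))
    if reasoning_model:
        # A lower user value is raised to the floor; a higher one is kept.
        stall = max(stall, REASONING_STALL_FLOOR_S)
    return {"thinking": thinking, "model_timeout_s": model_timeout, "stall_s": stall}


defaults = bench_settings({})
```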
Results aggregate into dev_benchmark/HISTORY.md — leaderboard with a
weighted Score column (tools / real-time / context / multi-turn /
safety), per-category counts, and safety as a hard gate (any safety
failure → DQ regardless of other scores).
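The hard-gate idea can be sketched as a weighted sum that is skipped entirely on any safety failure. The weights below are invented for illustration; the real Score column's weighting is not stated here, and leaderboard_score is a hypothetical name.

```python
# Hypothetical weights; the actual leaderboard weighting is not given here.
WEIGHTS = {
    "tools": 0.3,
    "real_time": 0.2,
    "context": 0.2,
    "multi_turn": 0.2,
    "safety": 0.1,
}


def leaderboard_score(pass_rates: dict, safety_failures: int):
    """Weighted score in [0, 100], or None when the model is disqualified."""
    if safety_failures > 0:
        return None  # hard gate: any safety failure is a DQ
    total = sum(WEIGHTS[name] * pass_rates[name] for name in WEIGHTS)
    return round(100 * total, 1)


rates = {"tools": 0.9, "real_time": 0.8, "context": 1.0, "multi_turn": 0.5, "safety": 1.0}
score = leaderboard_score(rates, safety_failures=0)
disqualified = leaderboard_score(rates, safety_failures=1)
```

Checking the gate before computing the sum means a strong score in other categories can never offset a safety failure.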
| Doc | What |
|---|---|
| dev_docs/setup.md | Canonical install, upgrade, and uninstall guide |
| dev_docs/architecture/system_runtime_user.md | Three-layer architecture — System / Runtime / User |
| dev_docs/external_models.md | Running the agent on LM Studio / OpenAI / Anthropic Claude |
| dev_docs/deep_think_design.md | Deep Think — the idle skill-development mode |
| dev_docs/marketplace_spec.md | The skill marketplace |
| dev_docs/physical_skills_status.md | Where embodiment / physical skills stand |
| dev_docs/kanban_design.md | The kanban task board |
| dev_docs/hermes_tool_parity.md | Tool-surface audit vs. Hermes Agent |
The full JROS spec — architecture, transport, the node standard, the agent
and skill systems — continues to land under dev_docs/.
- 0.1 — Agent layer. Local-first agent, 54 tools, self-authored skills, the computer_use skill, the kanban task board, Deep Think, the external-model pipeline. ✅ shipped
- 0.2 — Node standard. ZMQ + UDP transport, the node/plugin contract, the first hardware nodes.
- 0.3 — Lilith. The first JROS-native digital Jaeger.
- 0.4 — JP01. The first hardware Jaeger — same brain, a body around it.
Built in the open by Jenkins Robotics.
Follow — Discord · YouTube · Instagram · Facebook
Apache-2.0 © Jenkins Robotics