Autonomy - Self-Discipline/Self-Harness Agent Focused on Enterprise (in active early-stage development)
This project builds an AI system around a skill-aware autonomy loop:
goal -> candidates -> scored candidate selection -> execution boundary validation -> one action
-> observation -> outcome evaluation -> agent decision -> learning -> explicit termination
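The loop above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Autonomy's implementation: `Candidate`, `run_loop`, and the `propose`/`validate`/`execute`/`evaluate` callables are hypothetical names, the score is a plain number, and the learning step is left out.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Candidate:
    """Hypothetical per-turn candidate: one proposed tool action with a score."""
    tool: str
    arguments: dict
    score: float

def run_loop(
    goal: str,
    propose: Callable[[str, list], list],
    validate: Callable[[Candidate], bool],
    execute: Callable[[Candidate], str],
    evaluate: Callable[[str, str], bool],
    max_steps: int = 5,
) -> dict:
    """Run at most max_steps single actions and always terminate explicitly."""
    history: list = []
    for step in range(1, max_steps + 1):
        candidates = propose(goal, history)
        if not candidates:
            return {"status": "no_candidates", "steps": step - 1}
        best = max(candidates, key=lambda c: c.score)  # scored candidate selection
        if not validate(best):                         # execution boundary validation
            return {"status": "rejected", "steps": step - 1}
        observation = execute(best)                    # exactly one action per step
        history.append((best, observation))
        if evaluate(goal, observation):                # outcome evaluation
            return {"status": "achieved", "steps": step}
    return {"status": "max_steps_reached", "steps": max_steps}
```

Every exit path returns a status, which mirrors the "explicit termination" stage: the loop never ends without saying why.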
The previous Kernel and Runtime concepts have been retired. Interactive
sessions now flow through ConversationLoop into AgentLoop, while one-shot
autonomy run calls the same loop directly. Actual tool execution is routed
through ActionGateway, so future loops can propose or initiate action while
sharing the same governed execution boundary.
SessionShell -> ConversationLoop -> AgentLoop -> ActionGateway -> ToolRegistry
AgentLoop -> ConversationResponder
autonomy run -> AgentLoop -> ActionGateway -> ToolRegistry
AgentLoop -> OutcomeEvaluator
AgentLoop -> LearningLoop -> CuratorDaemon
The system separates procedure knowledge, executable experience, and per-turn candidates:
Procedure Skill -> planning knowledge from SKILL.md
ActionRecipe -> successful single-action template learned from tool use
CandidatePath -> current-turn candidate, not long-term graph memory
- Python 3.13+
- Ollama or an API key for a supported OpenAI-compatible provider
Clone the repository, create an isolated Python environment, install this checkout, configure a model provider, then run the TUI:
git clone https://github.com/willweimike/autonomy.git
cd autonomy
python3.13 -m venv .venv
source .venv/bin/activate
python -m pip install -U pip
python -m pip install -e ".[dev]"
autonomy --help
autonomy model setup
autonomy doctor
autonomy tui
Use autonomy model setup ollama if you run Ollama locally, or choose one of
the API providers when you have a provider key. Workspace configuration and
secrets are written under <workspace>/.autonomy/.
Browser tools are optional. Install Chromium only if you want website automation:
python -m playwright install chromium
autonomy doctor
Run the test suite from the activated environment:
python -m pytest
CLI commands, invoked through the module entry point:
python3.13 -m autonomy
python3.13 -m autonomy tui
python3.13 -m autonomy tui --workspace . --max-steps 5
python3.13 -m autonomy model setup
python3.13 -m autonomy model setup ollama
python3.13 -m autonomy model setup openai-api
python3.13 -m autonomy model setup nvidia
python3.13 -m autonomy model setup openrouter
python3.13 -m autonomy model setup deepseek
python3.13 -m autonomy --db /tmp/autonomy.db doctor
python3.13 -m autonomy run "Analyze why this project's tests fail" --workspace .
python3.13 -m autonomy inspect RUN_ID
python3.13 -m autonomy recipes list # ActionRecipe commands
python3.13 -m autonomy recipes activate RECIPE_ID
python3.13 -m autonomy recipes disable RECIPE_ID
python3.13 -m autonomy skills list
python3.13 -m autonomy skills install-bundled code-editing process-management systematic-debugging test-driven-development technical-spike api-debugging codebase-documentation requesting-code-review plan writing-plans procedure-skill-authoring browser-navigation website-inspection email-himalaya database-retrieval
python3.13 -m autonomy skills install-clawhub SKILL_SLUG
python3.13 -m autonomy skills install-hermes SKILL_NAME_OR_DOC_URL
python3.13 -m autonomy skills view test-diagnosis
python3.13 -m autonomy skills candidates
python3.13 -m autonomy skills view-candidate CANDIDATE_ID
python3.13 -m autonomy skills approve CANDIDATE_ID
python3.13 -m autonomy skills reject CANDIDATE_ID
python3.13 -m autonomy skills disable SKILL_NAME
Autonomy includes a local Chrome side panel UI for development.
The extension does not execute tools directly. It talks to the local native messaging host, and the host routes requests into:
ConversationLoop(interface="chrome") -> AgentLoop -> ActionGateway -> ToolRegistry
Host commands:
autonomy chrome-host
autonomy-chrome-host
Development setup:
- Install this checkout in your active Python environment.
- Load chrome-extension/ as an unpacked extension in Chrome.
- Copy chrome-extension/native-host.example.json to Chrome's native messaging host directory.
- Replace EXTENSION_ID with the unpacked extension ID.
- Replace path with the absolute path to the autonomy-chrome-host executable from your environment.
Native host name:
com.autonomy.app
The native host manifest restricts access to the configured extension origin.
The extension never receives provider API keys or .autonomy/.env content.
Approval prompts default to deny on timeout or disconnect.
The panel status action currently reports host/session count only; it does not expose model/tool status.
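The deny-by-default approval behavior can be illustrated with a small sketch. It assumes, purely for illustration, that approval answers arrive on a `queue.Queue` as strings; `await_approval` and the `"allow"` value are hypothetical and not part of Autonomy's API.

```python
import queue

def await_approval(responses: queue.Queue, timeout_seconds: float) -> bool:
    """Return True only for an explicit "allow"; everything else denies."""
    try:
        answer = responses.get(timeout=timeout_seconds)
    except queue.Empty:
        return False  # timeout: default to deny
    # Any other message, including a disconnect notice, is treated as deny.
    return answer == "allow"
```

The design choice is that approval must be an affirmative signal: silence, errors, and unknown messages all fall through to denial.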
Autonomy can run as an owner-only Discord DM bot for mobile chat access. Discord is only a UI adapter:
ConversationLoop(interface="discord") -> AgentLoop -> ActionGateway -> ToolRegistry
Install the optional Discord dependency:
python -m pip install -e ".[discord]"
Create a Discord application and bot in the Discord Developer Portal, enable Message Content Intent for the bot, then add the bot token and your Discord user id to the workspace secrets file:
<workspace>/.autonomy/.env
DISCORD_BOT_TOKEN="your-bot-token"
DISCORD_OWNER_ID="123456789012345678"
Keep the file mode at 0600. Start the bot from the workspace:
autonomy discord-bot --workspace . --max-steps 12
Only DISCORD_OWNER_ID can use the bot. Normal DM messages become Autonomy
prompts. DM commands:
!status
!inspect RUN_ID
!reset
Approval prompts are sent as Discord Allow/Deny buttons. Pending approvals
default to deny on timeout or shutdown. The bot never sends provider API keys
or .autonomy/.env content to Discord.
Autonomy can also run as an owner-only Telegram private chat bot. Telegram is only a UI adapter:
ConversationLoop(interface="telegram") -> AgentLoop -> ActionGateway -> ToolRegistry
Install the optional Telegram dependency:
python -m pip install -e ".[telegram]"
Create a bot with BotFather, then add the bot token and your Telegram user id to the workspace secrets file:
<workspace>/.autonomy/.env
TELEGRAM_BOT_TOKEN="your-bot-token"
TELEGRAM_OWNER_ID="123456789"
Keep the file mode at 0600. Start the polling bot from the workspace:
autonomy telegram-bot --workspace . --max-steps 12
Only TELEGRAM_OWNER_ID can use the bot. v1 ignores group and channel
messages; use a private chat with the bot. Commands:
/status
/inspect RUN_ID
/reset
Approval prompts are sent as Telegram inline Allow/Deny buttons. Pending
approvals default to deny on timeout or shutdown. The bot never sends provider
API keys or .autonomy/.env content to Telegram.
autonomy and autonomy tui start the terminal UI. Natural language input now
flows directly into AgentLoop; there is no pre-agent chat/task classifier
that can stop a task before governance runs. The model can either choose
governed tools or return a direct answer through assistant.respond, which is
still journaled as a low-risk action. Each turn gets a run_id, ActionGateway
authorization, outcome evaluation, and audit trail. The conversation session
keeps the recent transcript and linked run summaries available as context for
follow-up requests. autonomy run "goal" remains available for one-shot tasks
and automation, and uses the same agent loop.
The TUI wraps the same ConversationLoop. It renders a responsive startup banner,
a session overview panel, explicit runtime boundary notes, and a compact status
rule before each prompt with turn count and run state. It also provides
transcript-style response panels, route metadata, run metadata, an Action trail
derived from the run journal, a toggleable compact/full details mode, a /
command palette, and skill review queues. The runtime boundaries stay the same:
the UI never executes tools directly, and all actions still go through
AgentLoop -> ActionGateway -> ToolRegistry.
Session commands:
/help
/
/?
/exit
/quit
/doctor
/inspect RUN_ID
/details compact
/details full
/workspace PATH
/max-steps N
/skills
/recipes # ActionRecipe view
/tools
The system supports ollama plus OpenAI-compatible providers: openai-api,
nvidia, openrouter, deepseek, xai, zai, kimi-coding, and alibaba.
Run autonomy model setup from the workspace to choose a provider, endpoint,
and model. Re-running setup is the only way to switch that workspace's provider
or model. The interactive setup shows the current configuration and accepts
either numbered selections or provider/model names.
Validated workspace configuration is stored under:
<workspace>/.autonomy/config.yaml # active provider, endpoint, model, and timeout
<workspace>/.autonomy/.env # provider API keys, mode 0600
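As a rough illustration of what "mode 0600" means for the secrets file, the sketch below creates the file with owner-only permissions and checks them afterwards. `write_secret_env` and `is_owner_only` are hypothetical helpers written for this example (POSIX only); in practice `autonomy model setup` manages this file.

```python
import os
import stat
from pathlib import Path

def write_secret_env(workspace: Path, name: str, value: str) -> Path:
    """Append NAME="value" to <workspace>/.autonomy/.env with mode 0600."""
    env_dir = workspace / ".autonomy"
    env_dir.mkdir(parents=True, exist_ok=True)
    env_path = env_dir / ".env"
    # Passing the mode to os.open avoids a window where a new file is world-readable.
    fd = os.open(env_path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    with os.fdopen(fd, "a", encoding="utf-8") as handle:
        handle.write(f'{name}="{value}"\n')
    os.chmod(env_path, 0o600)  # tighten permissions if the file already existed
    return env_path

def is_owner_only(path: Path) -> bool:
    """Return True when group and others have no permissions on the file."""
    return stat.S_IMODE(path.stat().st_mode) & 0o077 == 0
```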
Live runs do not read legacy model environment variables, do not read
~/.autonomy as fallback storage, and do not accept per-run provider or model
overrides. autonomy doctor is the diagnostic entry point for configuration,
credentials, endpoint reachability, and model availability.
autonomy model setup ollama
autonomy doctor
autonomy run "Read README.md and summarize the implemented system" \
--workspace . \
--max-steps 5 \
--non-interactive
Ollama's base URL must include /v1. The default is
http://127.0.0.1:11434/v1.
autonomy model setup nvidia
autonomy doctor
autonomy run "Read README.md and summarize the implemented system" \
--workspace . \
--max-steps 5 \
--non-interactive
The default NVIDIA endpoint is https://integrate.api.nvidia.com/v1, the
default model is moonshotai/kimi-k2.6, and the API key is stored as
NVIDIA_API_KEY in the workspace .autonomy/.env file.
autonomy model setup openrouter
autonomy model setup deepseek
autonomy model setup xai
autonomy model setup zai
autonomy model setup kimi-coding
autonomy model setup alibaba
Provider API keys are stored in .autonomy/.env using the provider's native
environment variable name, such as OPENROUTER_API_KEY, DEEPSEEK_API_KEY,
XAI_API_KEY, GLM_API_KEY, KIMI_API_KEY, or DASHSCOPE_API_KEY.
At run start, Autonomy loads the first workspace guidance file it finds from:
AUTONOMY.md
.autonomy.md
AGENTS.md
agents.md
.cursorrules
The content is bounded and passed to planning as project context. It can guide
candidate generation, but it does not grant tool permissions, bypass
ActionGateway, or affect approval and outcome evaluation.
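The "first file found, bounded content" rule can be sketched as follows. The file order comes from the list above; `load_guidance` and the 8000-character bound are assumptions made for this example, since the actual limit is not stated here.

```python
from pathlib import Path
from typing import Optional

GUIDANCE_FILES = ("AUTONOMY.md", ".autonomy.md", "AGENTS.md", "agents.md", ".cursorrules")
MAX_GUIDANCE_CHARS = 8000  # assumed bound; the real limit is not documented here

def load_guidance(workspace: Path, max_chars: int = MAX_GUIDANCE_CHARS) -> Optional[tuple]:
    """Return (filename, bounded content) for the first guidance file found."""
    for name in GUIDANCE_FILES:
        candidate = workspace / name
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8", errors="replace")
            return name, text[:max_chars]
    return None  # no guidance file in this workspace
```

Only one file is ever loaded: a workspace that contains both AGENTS.md and .cursorrules contributes AGENTS.md alone.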
Tool availability is controlled by an Autonomy-native toolset catalog and a workspace configuration file:
<workspace>/.autonomy/tools.yaml
The default enabled toolsets are:
assistant
browser
file
terminal
search
skills
memory
Inspect or change toolset exposure with:
autonomy tools status
autonomy tools enable project
autonomy tools enable database
autonomy tools enable mcp
autonomy tools enable delegate
autonomy tools enable browser
autonomy tools disable terminal
The catalog includes implemented project, browser, memory, database,
mcp, and delegate toolsets plus planned Hermes-like toolsets such as
cronjob and computer_use. Planned or unavailable tools are not exposed to
the agent loop.
Enabling a toolset only controls which implemented and available tools are
visible to planning; it does not grant extra permissions or bypass
ActionGateway.
Explicit subagent requests expose delegate.run; otherwise the delegate toolset
stays hidden from planning even when enabled.
Configure databases for the database.retrieve tool in:
# <workspace>/.autonomy/database_connections.yaml
connections:
  sample:
    dialect: sqlite
    path: sample.db
    allowed_tables: [orders]
  warehouse:
    dialect: postgres
    allowed_tables: [orders]
    schema:
      tables:
        orders:
          id: integer
          total: numeric
The tool uses SQLGlot to validate and transpile read-only SQL across dialects,
and action: generate can ask the configured workspace model to draft SQL from
a natural-language request before validation. Use action: explain to inspect a
SQLite query plan before running a complex query. SQLite paths are
workspace-bounded and executable; other dialects support configured schema,
validation, transpilation, and generation until a connector is added.
To import external MCP tools, install the optional SDK and enable the mcp
toolset:
python3.13 -m pip install -e ".[mcp]"
autonomy tools enable mcp
Then add servers to <workspace>/.autonomy/mcp_servers.yaml:
servers:
  filesystem:
    command: npx
    args: ["-y", "@modelcontextprotocol/server-filesystem", "."]
    tools:
      include: ["read_file"]
  remote:
    url: "https://example.test/mcp"
Discovered tools are exposed as mcp_<server>_<tool> through the normal
ToolRegistry and ActionGateway path, with MEDIUM risk and
external-mcp side effects. This imports MCP tools only; Autonomy does not
act as an MCP server and does not implement OAuth, resources, prompts, sampling,
hot reload, SSE, or dynamic tools/list_changed.
Tool implementation code is grouped under autonomy/tools/:
- autonomy/tools/registry.py: ToolSpec, ToolRegistry, and ApprovalPolicy
- autonomy/tools/local.py: local registry assembly plus file/search/shell tools
- autonomy/tools/toolsets/: toolset-specific implementations such as project, browser, and process
The first implemented local software-engineering tools are:
- filesystem.read
- filesystem.read_many
- filesystem.list
- filesystem.tree
- filesystem.stat
- filesystem.stat_many
- filesystem.diff
- filesystem.outline
- filesystem.imports
- filesystem.symbol_search
- filesystem.syntax_check
- filesystem.write
- filesystem.patch
- filesystem.trash
- filesystem.mkdir
- filesystem.move
- filesystem.search_files
- search.text
- shell.execute
- process.start
- process.poll
- process.log
- process.wait
- process.stop
- git.status
- git.diff
- git.log
- git.show
- json.parse
- yaml.parse
- project.detect
- python.test_discover
filesystem.read supports line pagination with optional offset and limit.
Small files still return raw text. Large or explicitly paginated reads return
LINE|CONTENT output with a continuation hint, which keeps model context
bounded during repository analysis. filesystem.list also supports offset
and limit so broad or recursive directory listings can be paged.
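To make the paginated output shape concrete, here is a minimal sketch of a LINE|CONTENT window with a continuation hint. `read_window`, the 1-based `offset`, and the exact hint wording are assumptions for illustration; the real tool's output may differ.

```python
def read_window(text: str, offset: int = 1, limit: int = 200) -> str:
    """Render lines offset..offset+limit-1 as LINE|CONTENT plus a continuation hint."""
    lines = text.splitlines()
    start = max(offset, 1)  # treat offset as a 1-based line number (assumption)
    window = lines[start - 1 : start - 1 + limit]
    rendered = [f"{number}|{content}" for number, content in enumerate(window, start=start)]
    next_line = start + len(window)
    if next_line <= len(lines):
        # Tell the caller where to resume instead of returning the whole file.
        rendered.append(f"[more lines available; continue with offset={next_line}]")
    return "\n".join(rendered)
```

For example, `read_window("a\nb\nc", offset=2, limit=1)` returns the line `2|b` followed by a hint to continue at offset 3.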
filesystem.read_many reads up to 12 UTF-8 text files in one bounded JSON
observation with shared line-window options and a total character budget. Use
it for small manifest, README, entrypoint, or config batches.
filesystem.tree returns a compact bounded ASCII tree and is the preferred
first step for repository orientation before broad recursive listing.
filesystem.stat returns bounded JSON metadata for one workspace path, such as
type, size, modified time, and immediate directory counts, without reading file
content. filesystem.stat_many returns the same metadata for up to 50 paths in
one bounded observation, which reduces tool turns when checking whether several
candidate files or directories exist before reading them. filesystem.stat,
filesystem.stat_many, and filesystem.read expose a lightweight file
revision token that can be passed as expected_revision to
filesystem.write or filesystem.patch to fail fast if a file changed after it
was inspected.
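The fail-fast check behind expected_revision resembles optimistic concurrency control. The sketch below assumes, for illustration only, that a revision token is a truncated SHA-256 of the file bytes; `revision_of`, `guarded_write`, and `RevisionMismatch` are hypothetical names, and the real token format is not specified here.

```python
import hashlib
from pathlib import Path
from typing import Optional

class RevisionMismatch(Exception):
    """Raised when a file changed after it was inspected."""

def revision_of(path: Path) -> str:
    """Hypothetical revision token: a short hash of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:16]

def guarded_write(path: Path, content: str, expected_revision: Optional[str] = None) -> str:
    """Write content only if the file still matches expected_revision."""
    if expected_revision is not None and path.exists():
        current = revision_of(path)
        if current != expected_revision:
            raise RevisionMismatch(f"{path.name}: expected {expected_revision}, found {current}")
    path.write_text(content, encoding="utf-8")
    return revision_of(path)  # new token for follow-up edits
```

A caller that read the file, saw token T, and passes T with its write is protected against silently overwriting an edit made in between.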
filesystem.diff returns bounded read-only git status and diff information for
the workspace or one path. It omits secret-bearing environment files and should
be preferred over shell git diff when checking what changed after edits.
filesystem.outline returns a compact Python class/function/method outline for
a file or directory, which helps locate relevant code before reading full files.
filesystem.imports summarizes Python import statements for a file or
directory, which helps identify module dependencies and likely integration
points.
filesystem.symbol_search searches Python class/function/method definitions by
name, match mode, and symbol kind, which is useful for jumping directly to
relevant code.
filesystem.syntax_check checks Python syntax without executing code and is a
cheap post-edit diagnostic before running broader tests.
filesystem.search_files and search.text also support offset and limit
so broad searches can be paged instead of flooding the model context.
filesystem.search_files can additionally return output_mode=files_only or
output_mode=count, and context=N can include nearby lines around content
matches to reduce follow-up file reads.
filesystem.patch defaults to exact replacement. When a recent read proves
the intended lines are present but indentation or surrounding whitespace has
drifted, match_mode=strip_lines can match the same line sequence after
trimming each line.
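The difference between exact replacement and strip_lines matching can be sketched like this. `patch_text` is a hypothetical stand-in for the tool, and the requirement that the old text match exactly once is an assumption of this example.

```python
def patch_text(text: str, old: str, new: str, match_mode: str = "exact") -> str:
    """Replace old with new; the old text must match exactly once."""
    if match_mode == "exact":
        if text.count(old) != 1:
            raise ValueError("old text must match exactly once")
        return text.replace(old, new)
    if match_mode != "strip_lines":
        raise ValueError(f"unknown match_mode: {match_mode}")
    lines = text.splitlines()
    wanted = [line.strip() for line in old.splitlines()]
    # Compare the same line sequence after trimming each line.
    hits = [
        i for i in range(len(lines) - len(wanted) + 1)
        if [line.strip() for line in lines[i : i + len(wanted)]] == wanted
    ]
    if len(hits) != 1:
        raise ValueError("old lines must match exactly once")
    start = hits[0]
    patched = lines[:start] + new.splitlines() + lines[start + len(wanted):]
    return "\n".join(patched) + ("\n" if text.endswith("\n") else "")
```

In this sketch the replacement lines are inserted as given, so the caller is responsible for supplying the indentation it wants in the result.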
Successful filesystem.write and filesystem.patch actions against Python
files also include lightweight syntax_ok diagnostics in their observation
payloads, without executing code.
filesystem.trash moves one workspace file or directory to the system Trash
through the trash CLI. Use it for deletion instead of shell rm, rmdir,
or rm -rf; it is medium risk and is only exposed when the trash CLI is
available.
filesystem.mkdir creates workspace directories and filesystem.move renames
or moves one workspace file or directory without overwriting an existing
destination. Use them instead of shell mkdir or mv commands.
When filesystem.read, filesystem.list, or search tools receive a missing
workspace path, they include similar path suggestions when available.
Secret-bearing environment files such as .env, .env.local, and .envrc
are blocked from file read/list/search/write/patch tools to avoid putting
credentials into model context. Use .env.example when configuration shape is
needed.
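A path filter along these lines would implement the block described above. `is_secret_env_file` is hypothetical, and treating every `.env.*` variant other than `.env.example` as secret-bearing is an assumption that goes slightly beyond the three names listed.

```python
from pathlib import PurePath

ALLOWED_TEMPLATES = {".env.example"}  # assumed allow-list; only .env.example is named above

def is_secret_env_file(path: str) -> bool:
    """Return True for environment files that should never reach model context."""
    name = PurePath(path).name
    if name in ALLOWED_TEMPLATES:
        return False
    return name == ".env" or name.startswith(".env.") or name == ".envrc"
```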
Use shell.execute for short foreground commands. It runs the command string
through the platform shell, so shell operators such as ||, &&, and pipes
use normal shell semantics. Use process.start for dev servers, watchers,
long tests, or other commands that need later inspection through
process.poll, process.log, or process.wait.
process.stop terminates managed background processes. Starting and stopping
processes are medium-risk actions and still require approval in interactive
use; non-interactive runs reject them by default.
shell.execute bounds stdout and stderr by default and accepts optional
max_chars for focused command output, which prevents large build or test logs
from flooding the model context. Shell and managed process output also redacts
common API keys, bearer tokens, credential assignments, and private key blocks
before observations are written to the run journal.
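Redaction and output bounding can be combined as in the following sketch. The three patterns are illustrative only and are not the patterns Autonomy uses; `redact`, `bound`, and the 4000-character default are assumptions of this example.

```python
import re

# Illustrative patterns only; a real redactor would cover more credential formats.
_BEARER = re.compile(r"(?i)\b(bearer)\s+[A-Za-z0-9._~+/=-]{8,}")
_ASSIGNMENT = re.compile(r"(?i)\b([A-Z0-9_]*(?:API_KEY|TOKEN|SECRET|PASSWORD))\s*=\s*\S+")
_PRIVATE_KEY = re.compile(
    r"-----BEGIN [A-Z ]*PRIVATE KEY-----.*?-----END [A-Z ]*PRIVATE KEY-----", re.DOTALL
)

def redact(text: str) -> str:
    """Mask bearer tokens, credential assignments, and private key blocks."""
    text = _BEARER.sub(r"\1 [REDACTED]", text)
    text = _ASSIGNMENT.sub(r"\1=[REDACTED]", text)
    return _PRIVATE_KEY.sub("[REDACTED PRIVATE KEY]", text)

def bound(text: str, max_chars: int = 4000) -> str:
    """Truncate long output and say how much was dropped."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + f"\n[truncated {len(text) - max_chars} chars]"
```

Redacting before bounding matters: truncating first could cut a secret in half and leave a fragment that no pattern matches.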
The implemented project toolset is read-only and opt-in. It adds bounded
project-inspection helpers for git state, recent commits, commit summaries,
JSON/YAML validation, manifest detection, and Python test command discovery.
Use autonomy tools enable project before exposing these tools to planning.
The implemented browser tools use headless Chromium through Playwright:
- browser.navigate
- browser.snapshot
- browser.click
- browser.type
- browser.scroll
- browser.back
- browser.press
- browser.screenshot
- browser.get_images
- browser.console
- browser.dialog
Install the Python package through the project environment, then install the Chromium runtime:
python3.13 -m playwright install chromium
autonomy doctor
If Chromium is missing, doctor and tools status report the browser tools as
unavailable and they are not exposed to planning.
browser.snapshot returns URL, title, bounded visible text, and an elements
inventory of visible actionable controls. Use optional full and max_chars
when a compact snapshot is not enough. Browser interaction candidates should
use selectors from this inventory instead of guessing selectors from page text.
browser.screenshot captures a PNG under workspace .autonomy/browser-screenshots/
when visual evidence is needed.
browser.get_images returns page image URLs, alt text, dimensions, and
selectors. browser.console returns console output and JavaScript page errors,
or evaluates a small diagnostic JavaScript expression in the current page.
browser.dialog accepts or dismisses native JavaScript dialogs reported by
browser.snapshot. Browser observations redact secret-like page URLs, image
URLs, console output, and diagnostic expression results before journaling.
Read-only local actions are low risk. File write/patch/trash, mkdir, move, and browser actions are medium risk. Unknown shell commands require interactive approval and are rejected in non-interactive mode. File editing tools are workspace-only and text-only; use them instead of shell heredocs or in-place shell edits.
Model-generated tool use is represented as an ActionIntent:
tool
arguments
purpose optional
The model does not provide risk, progress, cost, uncertainty, expected effect,
or outcome judgment. ActionGateway derives executable Action metadata from
the registered ToolSpec, validates the execution boundary, and applies
approval before a single tool action can run.
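The split between what the model proposes and what the gateway derives might look like the sketch below. The class names echo the concepts above, but their fields beyond tool, arguments, and purpose, the `Risk` enum, and the `authorize` function are hypothetical and do not reflect the actual signatures in autonomy/tools/registry.py.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional

class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"

@dataclass(frozen=True)
class ToolSpec:
    name: str
    risk: Risk
    required_arguments: tuple = ()

@dataclass(frozen=True)
class ActionIntent:
    tool: str
    arguments: dict
    purpose: Optional[str] = None  # the model supplies nothing about risk or outcome

@dataclass(frozen=True)
class Action:
    intent: ActionIntent
    risk: Risk  # derived from the registered ToolSpec, never from the model

def authorize(intent: ActionIntent, registry: dict, approve: Callable[[Action], bool]) -> Action:
    """Derive an Action from the ToolSpec, validate it, then apply approval."""
    spec = registry.get(intent.tool)
    if spec is None:
        raise PermissionError(f"unknown tool: {intent.tool}")
    missing = [name for name in spec.required_arguments if name not in intent.arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    action = Action(intent=intent, risk=spec.risk)
    if spec.risk is not Risk.LOW and not approve(action):
        raise PermissionError("approval denied")
    return action
```

Because `ActionIntent` has no risk field at all, there is no way for a model response to lower the risk level of the action it proposes.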
Tool execution returns an Observation; the agent loop evaluates that observation
into an Outcome with execution status, goal status, reason, evidence, and
confidence. Deterministic agent-side evidence is authoritative. Model assistance
is used only to interpret ambiguous successful observations, and it cannot
override a tool failure.
Procedure Skills are governed SKILL.md documents that teach the model how to
plan a class of task. They never execute tools, grant permission, bypass
execution governance, or participate in outcome evaluation.
The formal skill loader scans one workspace store:
<workspace>/.autonomy/skills/
Each planning round filters skills by platform, required tools, and enabled state. The model chooses at most three summaries, and only those full documents are loaded for candidate generation.
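The two-stage selection described above, filter first and then load at most three, can be sketched as follows. `SkillSummary`, `eligible_skills`, and `load_selected` are hypothetical names, and treating an empty platform set as "any platform" is an assumption of this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillSummary:
    name: str
    platforms: frozenset        # empty means "any platform" (assumption)
    required_tools: frozenset
    enabled: bool = True

def eligible_skills(skills: list, platform: str, available_tools: set) -> list:
    """Keep enabled skills whose platform and required tools are satisfied."""
    return [
        skill for skill in skills
        if skill.enabled
        and (not skill.platforms or platform in skill.platforms)
        and skill.required_tools <= available_tools
    ]

def load_selected(chosen_names: list, eligible: list, limit: int = 3) -> list:
    """Resolve the model's choices against the eligible set, capped at limit."""
    by_name = {skill.name: skill for skill in eligible}
    return [by_name[name] for name in chosen_names if name in by_name][:limit]
```

Names the model returns that are not in the eligible set are simply dropped, so a disabled or unavailable skill cannot be loaded by being named.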
Initial workspace skills can be installed under <workspace>/.autonomy/skills/:
- repository-orientation
- test-diagnosis
- implementation-status-audit
- read-only-code-review
- code-editing
- process-management
- systematic-debugging
- test-driven-development
- technical-spike
- api-debugging
- codebase-documentation
- requesting-code-review
- plan
- writing-plans
- procedure-skill-authoring
- email-himalaya
- database-retrieval
Bundled Procedure Skills are Autonomy-native workflow guidance, adapted from
Hermes as an engineering reference without importing Hermes runtime or skill
files. Bundled skill sources live under
autonomy/bundled_skills/<skill-name>/SKILL.md; add a new bundled skill by
creating that directory and matching the YAML frontmatter name to the
directory name. Code editing, process, software-engineering, and browser
planning skills can be installed from bundled templates:
autonomy skills install-bundled code-editing process-management systematic-debugging test-driven-development technical-spike api-debugging codebase-documentation requesting-code-review plan writing-plans procedure-skill-authoring browser-navigation website-inspection email-himalaya database-retrieval
These skills require the corresponding enabled and available tools before they are considered by the agent loop.
Public ClawHub skills can also be installed into the same workspace store:
autonomy skills install-clawhub SKILL_SLUG
The installer downloads from https://clawhub.ai/api/v1/download, validates
the archive contains one SKILL.md, rejects unsafe archive paths, and installs
the skill enabled by default.
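The two archive checks, exactly one SKILL.md and no unsafe paths, can be sketched with the standard library. This example assumes the download is a ZIP archive, which is not stated above; `validate_skill_archive` is a hypothetical helper, not the installer's actual code.

```python
import zipfile
from pathlib import PurePosixPath

def validate_skill_archive(archive: zipfile.ZipFile) -> str:
    """Return the archive path of the single SKILL.md, or raise ValueError."""
    skill_files = []
    for name in archive.namelist():
        path = PurePosixPath(name)
        # Reject absolute paths, parent traversal, and backslash separators.
        if path.is_absolute() or ".." in path.parts or "\\" in name:
            raise ValueError(f"unsafe archive path: {name}")
        if path.name == "SKILL.md":
            skill_files.append(name)
    if len(skill_files) != 1:
        raise ValueError(f"expected exactly one SKILL.md, found {len(skill_files)}")
    return skill_files[0]
```

Validating member names before extracting anything is what prevents a crafted archive from writing outside the skills directory.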
Hermes Agent skill docs can be installed from the public Skills Hub too:
autonomy skills install-hermes apple-notes
autonomy skills install-hermes https://hermes-agent.nousresearch.com/docs/user-guide/skills/bundled/apple/apple-apple-notes
The installer uses https://hermes-agent.nousresearch.com/docs/api/skills.json
to resolve the skill, pulls the generated source markdown, extracts the
Reference: full SKILL.md section, and installs the skill enabled by default.
During candidate generation, the model receives the enabled and available tool
specs from the live ToolRegistry, including descriptions, argument contracts,
toolset, risk level, and side effects. The model still only proposes
ActionIntent; execution remains gated by ActionGateway.
ActionRecipes are learned single-action templates. They can propose one
ActionIntent after being activated through the recipes CLI, but they do not
form graph paths and do not participate in governance. There is no long-term
graph path layer for recipes; governance remains in
ActionGateway, ToolSpec, ApprovalPolicy, and outcome evaluation.
Every run finishes with a lightweight LearningLoop review. Achieved runs
with at least two successful outcomes may generate a new_skill candidate
under <workspace>/.autonomy/skill-candidates/. Candidate documents are not
scanned or used until a user approves them with autonomy skills approve.
Rejected and approved candidates remain as audit artifacts and are hidden from
the default candidate list.
Model-generated Procedure Skills are always candidate-first. The model can
draft a SKILL.md from a successful run, but it cannot approve or activate
that draft. Approval is the boundary that moves a candidate into the formal
Procedure Skill Library.
CuratorDaemon runs in the background after each run and uses SkillCurator
to consolidate clear duplicate or subcase Skills. Auto-merge is allowed only
when required tools and platforms do not expand and the merged target
SKILL.md validates. After a successful merge, the source Skill is deleted
from the formal store; agent prompts do not retain source lineage.
python3.13 -m pytest