AXIS is an open source tooling and a scoring framework to measure how well services work for AI agents. Think Lighthouse, but for agent experience.
Give AXIS a scenario, an agent, and a prompt. It runs the agent, captures a full transcript, and produces a graded score across four independent dimensions: Goal Achievement, Environment, Service, and Agent.
The web has Lighthouse. APIs have contract testing. Performance has k6. But there's no standardized way to answer: "How well does my system work when an AI agent tries to use it?".
As agents become a primary interface for interacting with sites, APIs, and developer platforms, the systems they interact with need to be measured and optimized for that experience — just like we optimize for page load time or accessibility. AXIS is that measurement.
npm install @netlify/axisaxis.config.json:
{
"scenarios": "./scenarios",
"agents": ["claude-code"]
}scenarios/hello-world.json:
{
"name": "Hello world",
"prompt": "Navigate to https://example.com and describe what you see on the page.",
"judge": [
{ "check": "Agent visited the target URL", "weight": 0.5 },
{ "check": "Agent provided a description of the page content", "weight": 0.5 }
]
}axis runAXIS executes the scenario, scores the result, and writes a report to .axis/reports/.
Full documentation lives at axis.run:
- Quick start - install through your first scored run
- Configuration -
axis.config.json, scenarios, MCP servers, skills - CLI reference -
axis run,axis reports,axis baseline - Running tests - execution model, workspace isolation, custom adapters, CI integration
- Scoring framework - the four dimensions, signals, calibration
Use the programmatic API when you want to integrate AXIS into an existing test runner, build tool, or CI pipeline rather than calling the CLI directly.
Delivered: scenario runner, four-dimension scoring pipeline, baselines with regression detection, MCP/skills wiring, custom adapter API, built-in adapters for Claude Code, Codex, and Gemini.
Planned:
- Historical trending - score regression detection over time
- AXIS badge - embeddable score badge for READMEs
- Configurable judge - separate adapter/model for scoring, independent of the agent under test
- Score thresholds - CI gating with configurable pass/fail thresholds
- Human interruption detection - penalize agent requests for human intervention
AXIS is built in the open. Contributions are welcome. New scenarios, agent adapters,
bug fixes, and documentation improvements all help.
AXIS is open source under the MIT license, created by Netlify and developed with founding contributors including Auth0 and Resend.
Full docs: axis.run