Thanks to visit codestin.com
Credit goes to github.com

Skip to content

METR/vivaria

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Vivaria

Vivaria is METR's tool for running evaluations and conducting agent elicitation research. Vivaria is a web application with which users can interact using a web UI and a command-line interface.

See https://vivaria.metr.org for more documentation.

Transitioning to Inspect

METR is transitioning its internal tooling from Vivaria to Inspect for evaluations and agent elicitation research. While Vivaria remains available as an open-source tool, we recommend that new projects consider using Inspect instead:

  • For new evaluation and research projects, we recommend using Inspect as your primary tool.
  • While Vivaria remains functional and open source, we are ramping down new feature development.
  • Existing Vivaria users can continue using it but should be aware of this transition.

For questions about this transition, please contact [email protected]. To learn more about Inspect, please visit inspect.ai-safety-institute.org.uk.

Demo

Getting started

See here for a tutorial on running Vivaria on your own computer using Docker Compose.

Features

  • Start task environments based on METR Task Standard task definitions
  • Run AI agents inside these task environments
  • Powerful tools for performing agent elicitation research
    • View LLM API requests and responses, agent actions and observations, etc.
    • Add tags and comments to important points in a run's trajectory, for later analysis
    • Quick feedback loop for "run agent on task, observe issue, make change to agent or reconfigure it, repeat"
    • Run results are stored in a PostgreSQL database, making it easy to perform data analysis on them
  • Built-in playground for testing arbitrary prompts against LLMs
  • Authentication and authorization using Auth0

Screenshots

The Vivaria runs page, displaying a list of recent runs.

The Vivaria runs page, displaying a list of recent runs.

A Vivaria run page, showing details for a particular run.

A Vivaria run page, showing details for a particular run.

The Vivaria playground, where users can test arbitrary prompts against LLMs.

The Vivaria playground, where users can test arbitrary prompts against LLMs.

Contents of this repo

  • server: A web server, written in TypeScript and using PostgreSQL, for creating METR Task Standard task environments and running agents on them
  • ui: A web UI, written in TypeScript and React, that uses the server to let users view runs, annotate traces, and interact with agents as they complete tasks
  • cli: A command-line interface, written in Python, that uses the server to let users create and interact with runs and task environments
  • pyhooks: A Python package that Vivaria agents use to interact with the server (to call LLM APIs, record trace entries, etc.)
  • scripts: Scripts for Vivaria developers and users, as well as a couple of scripts used by the Vivaria server

Security issues

If you discover a security issue in Vivaria, please email [email protected].

Versioning

The METR Task Standard and pyhooks follow Semantic Versioning.

The Vivaria server's HTTP API, the Vivaria UI, and the viv CLI don't have versions. Their interfaces are unstable and can change at any time.

Contact us

We encourage you to either file an issue on this repo or email [email protected].

About

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

Topics

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors 29