Harbor

Harbor is a framework for evals, post-training, and prompt optimization using agentic environments.

Installation

uv tool install harbor

or

pip install harbor

Getting started

Run the following command to see a list of all available commands:

harbor --help

Running an eval

The primary command is harbor run, which is used to run evals or generate rollouts.

harbor run --help

To view registered datasets, run:

harbor datasets list

Running a registered dataset

To evaluate an agent and model on one of these datasets, you can use the following command:

harbor run -d "<dataset@version>" -m "<model>" -a "<agent>"

Harbor will automatically download registered datasets.

Running a local dataset

Local datasets (directories of tasks) can also be run using:

harbor run -p "<path/to/dataset>" -m "<model>" -a "<agent>"
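If you launch runs from a script, the two invocations above differ only in how the dataset is named (-d for a registered dataset, -p for a local path). The sketch below assembles the argument list for either case. The helper name build_harbor_run is hypothetical and not part of Harbor; it uses only the flags shown in this README and does not execute anything.

```python
import shlex
from typing import List, Optional


def build_harbor_run(
    model: str,
    agent: str,
    dataset: Optional[str] = None,
    path: Optional[str] = None,
) -> List[str]:
    """Build an argv list for `harbor run` (hypothetical helper, not part of Harbor).

    Exactly one of `dataset` (registered, "<dataset@version>") or
    `path` (local directory of tasks) must be given.
    """
    if (dataset is None) == (path is None):
        raise ValueError("pass exactly one of dataset or path")
    # -d selects a registered dataset, -p a local dataset directory.
    source = ["-d", dataset] if dataset is not None else ["-p", path]
    return ["harbor", "run", *source, "-m", model, "-a", agent]


if __name__ == "__main__":
    argv = build_harbor_run(
        model="<model>", agent="<agent>", path="<path/to/dataset>"
    )
    # Print a shell-quoted command line; pass argv to subprocess.run to execute it.
    print(shlex.join(argv))
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when a dataset path contains spaces.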

Running a cloud sandbox

To run using a cloud sandbox provider like Daytona, you can use the following command:

harbor run -d "<dataset@version>" -m "<model>" -a "<agent>" --env "daytona" -n 32

When you combine a cloud sandbox with an API-hosted model, trials become I/O-bound rather than compute-bound, so you can typically run far more trials in parallel than you have CPU cores (the example command above runs 32 trials concurrently).
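To see why I/O-bound work parallelizes past the CPU count, consider the toy simulation below. It is not how Harbor schedules trials; fake_trial is a hypothetical stand-in that only waits, the way a trial waits on a remote model API or sandbox, and the semaphore plays the role of a concurrency limit such as -n.

```python
import asyncio
import time
from typing import List, Tuple


async def fake_trial(trial_id: int, wait_s: float) -> int:
    # Hypothetical trial: all of its time is spent waiting on "remote" work,
    # so it uses almost no local CPU.
    await asyncio.sleep(wait_s)
    return trial_id


async def run_trials(n_trials: int, concurrency: int, wait_s: float) -> List[int]:
    # The semaphore caps how many trials are in flight at once.
    sem = asyncio.Semaphore(concurrency)

    async def bounded(trial_id: int) -> int:
        async with sem:
            return await fake_trial(trial_id, wait_s)

    return await asyncio.gather(*(bounded(i) for i in range(n_trials)))


def timed_run(n_trials: int, concurrency: int, wait_s: float = 0.05) -> Tuple[List[int], float]:
    start = time.perf_counter()
    results = asyncio.run(run_trials(n_trials, concurrency, wait_s))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    for concurrency in (1, 4, 32):
        _, elapsed = timed_run(n_trials=32, concurrency=concurrency)
        print(f"concurrency={concurrency:>2}  elapsed={elapsed:.2f}s")
```

All of this runs on a single thread, yet elapsed time falls roughly in proportion to the concurrency limit, because the waiting overlaps. Real trials also do some local work, so the speedup in practice will be smaller than in this idealized case.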

Sandboxed agent evaluations are often slow, because they can require many turns to complete and each command requires time to execute. Horizontal scaling becomes the only viable way to accelerate experimentation, so we recommend using a cloud sandbox provider like Daytona.
