canhazgpu

A GPU reservation tool for single-host shared development systems

In shared development environments with multiple GPUs, researchers and developers often face conflicts when trying to use GPUs simultaneously, leading to out-of-memory errors, failed training runs, and wasted time debugging resource conflicts. This utility provides a simple reservation system that coordinates GPU access across multiple users and processes on a single machine.

Who this is for

You peacefully share a host but want a helper to avoid accidental conflicts.

You have a single host with GPUs (NVIDIA or AMD) shared by multiple users
You all log in and run commands manually for development and/or testing
You can still talk to each other about playing nice and sharing your (GPU) toys

Who this is NOT for

If your needs go beyond this, you probably want something more powerful, such as Kubernetes.

You want to manage resources across a cluster
You want to set resource usage limits or other policies
You want to support workload priorities and preemption

Quick Start

# Start Redis server listening on localhost:6379
# This is the default configuration in most cases.

# Initialize GPU pool
canhazgpu admin --gpus 8

# Check current status
canhazgpu status

# Run vLLM with an automatic 2-GPU reservation.
# - CUDA_VISIBLE_DEVICES is set in the environment before running the command.
canhazgpu run --gpus 2 -- vllm serve my/model --tensor-parallel-size 2

# Reserve specific GPUs by ID
canhazgpu run --gpu-ids 1,3 -- python train.py

# Reserve a single GPU manually for development
canhazgpu reserve --gpus 1 --duration 4h

# Reserve specific GPU IDs manually
canhazgpu reserve --gpu-ids 0,2 --duration 2h

# Release manual reservations when done
canhazgpu release

# Generate reservation reports
canhazgpu report --days 7

# Start web dashboard
canhazgpu web --port 8080
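
The run command sets CUDA_VISIBLE_DEVICES before launching the wrapped command. As a rough illustration of what the child process can do with it, the Go sketch below parses that variable into a list of device IDs. The helper name parseVisibleDevices is hypothetical and not part of canhazgpu, and the sketch assumes the value is a comma-separated list of integer indices (CUDA also accepts GPU UUIDs, which this does not handle).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseVisibleDevices is a hypothetical helper (not part of canhazgpu).
// It splits a CUDA_VISIBLE_DEVICES-style value such as "1,3" into integer IDs.
// Assumption: the value holds integer indices, not GPU UUIDs.
func parseVisibleDevices(value string) ([]int, error) {
	value = strings.TrimSpace(value)
	if value == "" {
		return nil, nil // variable unset or empty: nothing to parse
	}
	parts := strings.Split(value, ",")
	ids := make([]int, 0, len(parts))
	for _, p := range parts {
		id, err := strconv.Atoi(strings.TrimSpace(p))
		if err != nil {
			return nil, fmt.Errorf("invalid device ID %q: %w", p, err)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	ids, err := parseVisibleDevices(os.Getenv("CUDA_VISIBLE_DEVICES"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("visible GPUs: %v\n", ids)
}
```

Built as a binary (say, a hypothetical ./show-gpus), it could be launched with canhazgpu run --gpus 2 -- ./show-gpus to see which IDs the reservation exposed.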

Key Features

Race condition protection: Uses Redis-based distributed locking
Automatic cleanup: GPUs auto-released when processes end or reservations expire
MRU-per-user allocation: GPU affinity via a most-recently-used (MRU) per-user strategy, with least-recently-used (LRU) fallback
Specific GPU reservation: Reserve exact GPU IDs when needed (e.g., --gpu-ids 1,3)
Unreserved usage detection: Identifies GPUs in use without proper reservations
Real-time validation: Uses nvidia-smi or amd-smi to verify actual GPU usage
Multi-provider support: Supports both NVIDIA and AMD GPUs with automatic detection
Flexible reservations: Support for both command execution and manual reservations
Reservation reporting: Track and analyze GPU reservation patterns over time by user
Web dashboard: Real-time monitoring interface with status and reservation reports
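
To make the unreserved usage detection feature more concrete, here is a minimal Go sketch of the underlying idea: compare the GPUs that show activity against the GPUs that hold a reservation. The function name findUnreserved and both inputs are assumptions made for illustration. In the real tool the in-use list would come from nvidia-smi or amd-smi and the reservations from Redis, and the actual implementation may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// findUnreserved returns the IDs of GPUs that show activity but have no reservation.
// Hypothetical helper: inUse stands in for data gathered from nvidia-smi or amd-smi,
// and reserved maps GPU ID to the reserving user. Neither data source is modeled here.
func findUnreserved(inUse []int, reserved map[int]string) []int {
	var out []int
	for _, id := range inUse {
		if _, ok := reserved[id]; !ok {
			out = append(out, id)
		}
	}
	sort.Ints(out)
	return out
}

func main() {
	inUse := []int{0, 2, 5}
	reserved := map[int]string{0: "alice", 3: "bob"}
	fmt.Println(findUnreserved(inUse, reserved)) // prints [2 5]
}
```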

Web Dashboard

The web dashboard provides:

Real-time GPU status monitoring with automatic refresh
Interactive reservation reports with customizable time periods
Visual status indicators and usage visualization
Quick access to documentation and source code

Start the dashboard with canhazgpu web --port 8080, then open http://localhost:8080.

Documentation

For detailed usage, configuration, and administration:

📚 Full Documentation

Installation Guide - Setup and dependencies
Quick Start - Get up and running
Usage Guide - Detailed command examples
Configuration - Configuration options and settings
Troubleshooting - Common issues and solutions

Requirements

Go 1.23+ (for building from source)
Redis server running on localhost:6379
GPUs with appropriate management tools:
- NVIDIA GPUs: nvidia-smi available
- AMD GPUs: amd-smi available (ROCm 5.7+)
Access to the /proc filesystem or the ps command
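
As a quick way to check the GPU tooling requirement, the Go sketch below looks for nvidia-smi and amd-smi on PATH. It is only an illustration: detectProvider and onPath are hypothetical names, checking NVIDIA first is an arbitrary choice, and canhazgpu's own automatic detection may work differently.

```go
package main

import (
	"fmt"
	"os/exec"
)

// detectProvider reports which GPU management tool is available.
// has reports whether a command exists; NVIDIA is checked first (an arbitrary choice).
// Hypothetical helper, not canhazgpu's actual detection logic.
func detectProvider(has func(name string) bool) string {
	switch {
	case has("nvidia-smi"):
		return "nvidia"
	case has("amd-smi"):
		return "amd"
	default:
		return ""
	}
}

// onPath reports whether name resolves to an executable on PATH.
func onPath(name string) bool {
	_, err := exec.LookPath(name)
	return err == nil
}

func main() {
	if p := detectProvider(onPath); p != "" {
		fmt.Println("detected provider:", p)
	} else {
		fmt.Println("no supported GPU tool found on PATH")
	}
}
```

Passing the lookup as a function keeps the decision logic testable on machines without GPUs.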

Installation

# Option 1: Install directly from GitHub (recommended)
go install github.com/russellb/canhazgpu@latest

# Option 2: Build from source
git clone https://github.com/russellb/canhazgpu.git
cd canhazgpu
make install

# Option 3: Download pre-built binary (when available)
wget https://github.com/russellb/canhazgpu/releases/latest/download/canhazgpu
chmod +x canhazgpu
sudo cp canhazgpu /usr/local/bin/

# Install bash completion (optional but recommended)
wget https://raw.githubusercontent.com/russellb/canhazgpu/main/autocomplete_canhazgpu.sh
sudo cp autocomplete_canhazgpu.sh /etc/bash_completion.d/

# Optional: Create short alias symlink (after installing to /usr/local/bin)
sudo ln -s /usr/local/bin/canhazgpu /usr/local/bin/chg

# Initialize GPU pool (auto-detects GPU provider)
canhazgpu admin --gpus $(nvidia-smi -L | wc -l)  # For NVIDIA
# OR
canhazgpu admin --gpus $(amd-smi list --json | jq 'length')  # For AMD

# Initialize with specific provider (optional)
canhazgpu admin --gpus 8 --provider nvidia
canhazgpu admin --gpus 8 --provider amd

How It Works

Validation: Uses nvidia-smi or amd-smi to detect actual GPU usage and identify conflicts
Coordination: Uses Redis for distributed state management and race condition prevention
Allocation: MRU-per-user (Most Recently Used per user) strategy provides GPU affinity with LRU fallback for fair distribution
Monitoring: Heartbeat system tracks active reservations and handles cleanup
Enforcement: Automatically excludes GPUs with unreserved usage from allocation
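
The allocation step can be pictured with a small Go sketch. This is one plausible reading of "MRU per user with LRU fallback", not the project's actual code: the gpu type, its fields, and pickGPUs are all hypothetical. The sketch first offers the requesting user the available GPUs they used most recently, then fills any remaining slots with the GPUs that have been idle the longest.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// gpu is a hypothetical record of one available (unreserved, idle) GPU.
type gpu struct {
	ID           int
	LastReleased time.Time            // when any user last released it
	LastUsedBy   map[string]time.Time // per-user last-use timestamps
}

// pickGPUs chooses n GPUs for user: first the ones that user used most
// recently (MRU per user), then the globally least recently used ones (LRU).
// It returns nil if n is not positive or fewer than n GPUs are available.
func pickGPUs(available []gpu, user string, n int) []int {
	if n <= 0 || len(available) < n {
		return nil
	}
	var mine, others []gpu
	for _, g := range available {
		if _, ok := g.LastUsedBy[user]; ok {
			mine = append(mine, g)
		} else {
			others = append(others, g)
		}
	}
	// MRU: newest per-user timestamp first.
	sort.Slice(mine, func(i, j int) bool {
		return mine[i].LastUsedBy[user].After(mine[j].LastUsedBy[user])
	})
	// LRU fallback: oldest release first.
	sort.Slice(others, func(i, j int) bool {
		return others[i].LastReleased.Before(others[j].LastReleased)
	})
	picked := make([]int, 0, n)
	for _, g := range append(mine, others...) {
		if len(picked) == n {
			break
		}
		picked = append(picked, g.ID)
	}
	return picked
}

func main() {
	t0 := time.Date(2025, 1, 1, 0, 0, 0, 0, time.UTC)
	available := []gpu{
		{ID: 0, LastReleased: t0.Add(3 * time.Hour)},
		{ID: 1, LastReleased: t0.Add(1 * time.Hour)},
		{ID: 2, LastReleased: t0.Add(2 * time.Hour),
			LastUsedBy: map[string]time.Time{"alice": t0.Add(2 * time.Hour)}},
		{ID: 3, LastReleased: t0.Add(4 * time.Hour),
			LastUsedBy: map[string]time.Time{"alice": t0.Add(4 * time.Hour)}},
	}
	fmt.Println(pickGPUs(available, "alice", 3)) // prints [3 2 1]
}
```

In the example, alice gets GPUs 3 and 2 back because she used them last, and GPU 1 fills the third slot because it has been idle longer than GPU 0.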

Contributing

See the Contributing Guide for development setup, coding standards, and how to submit contributions.
