A smart GPU resource manager that automatically holds idle GPU memory and maintains controlled utilization to prevent resource preemption.
doma (DOg in the MAnager) is a lightweight daemon designed to intelligently occupy idle GPU resources. It monitors GPU usage patterns and, when GPUs become idle, automatically claims memory and maintains a specified utilization level, preventing resource preemption due to low utilization.
- 🤖 Automatic GPU Detection: Monitors all available CUDA GPUs automatically
- ⏱️ Smart Idle Detection: Waits for configurable idle periods before claiming resources
- 🎛️ Precise Utilization Control: Maintains target GPU utilization using adaptive algorithms
- 💾 Memory Management: Configurable memory holding with automatic cleanup
- 🔧 Daemon Architecture: Runs as a background service with socket-based control
- 📊 Real-time Monitoring: Continuous tracking of GPU memory and utilization metrics
- 🛡️ Safe Resource Handling: Graceful cleanup and release of GPU resources
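The daemon architecture relies on a local control socket (the troubleshooting section below mentions `/tmp/doma/doma.sock`). doma's actual wire protocol is not documented here, but a control client might look like this minimal sketch; the `send_command` helper and the plain-text one-line protocol are illustrative assumptions, not doma's API:

```python
import socket

def send_command(sock_path: str, command: str) -> str:
    """Send a one-line command over the daemon's Unix socket and return the reply.

    Hypothetical protocol: newline-terminated command, single text reply.
    """
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        s.sendall(command.encode() + b"\n")
        return s.recv(4096).decode().strip()

# Hypothetical usage against a running daemon:
# print(send_command("/tmp/doma/doma.sock", "status"))
```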
```bash
# Install using uv (recommended)
git clone <repository-url>
cd doma
uv tool install .
```
Launch the doma server:

```bash
doma launch
```

Start holding idle GPUs:

```bash
doma start
```

Check server status:

```bash
doma status
```

Stop holding GPUs (keeps server running):

```bash
doma stop
```

Shutdown the server:

```bash
doma shutdown
```
Starts the doma daemon server in the background.
Options:
- `--log-path`: Path to log file (default: `/tmp/doma/doma.log`)
Begins monitoring and holding idle GPUs with specified configuration.
Options:
- `--wait-minutes`: Minutes to wait before holding a GPU (default: 10)
- `--mem-threshold`: Memory threshold in GB for idle detection (default: 0.5)
- `--hold-mem`: Memory to hold in GB (default: 10)
- `--hold-util`: Target GPU utilization to maintain (0–1, default: 0.5)
Algorithm Options:
- `--operator-gb`: Operator size in GB for control precision (default: 1.0)
- `--util-eps`: Utilization epsilon for convergence (default: 0.01)
- `--max-sleep-time`: Initial maximum sleep time in seconds for the binary search (default: 1)
- `--min-sleep-time`: Initial minimum sleep time in seconds for the binary search (default: 0)
- `--inspect-interval`: Interval in seconds between GPU utilization inspections during the binary search (default: 1)
- `--util-samples-num`: Number of utilization samples to take during the binary search (default: 5)
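The binary search these options tune can be pictured with a small simulation. This is a sketch of the idea rather than doma's implementation: shorter sleeps between compute bursts drive utilization up, so the sleep interval is bisected until the measured utilization lands within `--util-eps` of the target. The function names and the toy utilization model below are illustrative assumptions:

```python
def tune_sleep_time(measure_util, target, eps=0.01, lo=0.0, hi=1.0, max_iters=50):
    """Binary-search the per-cycle sleep time until utilization hits the target.

    measure_util(sleep_s) -> observed utilization in [0, 1]; assumed to
    decrease monotonically as sleep_s grows.
    """
    for _ in range(max_iters):
        mid = (lo + hi) / 2
        util = measure_util(mid)
        if abs(util - target) <= eps:
            return mid
        if util > target:      # too busy: the answer needs a longer sleep
            lo = mid
        else:                  # too idle: the answer needs a shorter sleep
            hi = mid
    return (lo + hi) / 2

# Toy model: each compute burst takes 0.1 s, so util = 0.1 / (0.1 + sleep).
model = lambda sleep_s: 0.1 / (0.1 + sleep_s)
sleep = tune_sleep_time(model, target=0.5, eps=0.01)
```

In doma itself, `measure_util` would correspond to averaging `--util-samples-num` utilization readings taken every `--inspect-interval` seconds, with `--min-sleep-time`/`--max-sleep-time` as the initial bisection bounds.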
Releases all GPUs and restarts with new configuration.
Stops holding GPUs and releases all resources (server continues running).
Completely shuts down the doma server.
Shows current server status.
Doma continuously monitors each GPU's memory usage and utilization. A GPU is considered "idle" when:
- Memory usage stays below the configured threshold (`--mem-threshold`)
- This condition persists for the specified waiting period (`--wait-minutes`)
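The idle check can be sketched as a small state machine. Here the memory probe and the clock are injected so the logic stands alone; in doma itself the reading would come from CUDA. The class name and method signature are illustrative, not doma's API:

```python
import time

class IdleDetector:
    """Flags a GPU as idle once memory stays below a threshold long enough."""

    def __init__(self, mem_threshold_gb=0.5, wait_minutes=10.0, clock=time.monotonic):
        self.mem_threshold_gb = mem_threshold_gb
        self.wait_seconds = wait_minutes * 60
        self.clock = clock
        self._below_since = None  # when memory first dropped under the threshold

    def update(self, mem_used_gb: float) -> bool:
        """Feed one memory sample; return True once the GPU counts as idle."""
        now = self.clock()
        if mem_used_gb >= self.mem_threshold_gb:
            self._below_since = None   # any activity resets the waiting period
            return False
        if self._below_since is None:
            self._below_since = now
        return now - self._below_since >= self.wait_seconds
```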
When a GPU becomes idle, doma:
- Allocates the specified amount of memory (`--hold-mem`)
- Maintains the target utilization (`--hold-util`) through controlled compute operations
- Uses adaptive algorithms to precisely control utilization levels
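Claiming the hold memory itself amounts to keeping a tensor of the right size alive. The helper below is a sketch, not doma's code: it converts a `--hold-mem` value in GiB to a float32 element count, and the commented lines show how the actual allocation would look with PyTorch on a CUDA machine:

```python
def gb_to_float32_elems(hold_mem_gb: float) -> int:
    """Number of float32 elements (4 bytes each) occupying hold_mem_gb GiB."""
    return int(hold_mem_gb * (1024 ** 3)) // 4

n = gb_to_float32_elems(10.0)  # the default --hold-mem of 10 GB
# On a CUDA machine (illustrative, not doma's implementation):
# import torch
# held = torch.empty(n, dtype=torch.float32, device="cuda")
```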
Resources are automatically released when:
- The `stop` command is issued
- The server is shut down
```bash
doma start --wait-minutes 15 --hold-util 0.3 --hold-mem 2.0
doma start --wait-minutes 5 --hold-util 0.8 --mem-threshold 0.1
doma start --util-eps 0.005 --operator-gb 0.5 --util-samples-num 10
doma launch --log-path /var/log/doma/doma.log

# Change configuration without restarting server
doma restart --hold-util 0.7 --wait-minutes 5

# Launch with custom log path
doma launch --log-path /opt/doma/logs/doma.log

# Start with production settings
doma start --wait-minutes 20 --hold-util 0.6 --mem-threshold 0.5
```

- Python ≥ 3.11
- CUDA-capable GPU(s)
- PyTorch with CUDA support
- NVIDIA drivers
```bash
git clone <repository-url>
cd doma
uv sync --group dev
```

- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Resource Management: Doma is designed for responsible resource sharing. Always ensure you have permission to use GPU resources in shared environments.
- Memory Safety: The tool includes automatic cleanup mechanisms, but system crashes may require manual GPU memory cleanup.
- Compatibility: Requires NVIDIA GPUs with CUDA support. AMD GPUs are not currently supported.
- Performance Impact: Holding operations use minimal resources but may slightly impact system performance.
- GPU Memory Calculation: Doma uses `torch.cuda.device_memory_used` to calculate GPU memory usage; the value may differ from what the `nvidia-smi` command reports.
Server won't start:

```bash
# Check if socket file exists
ls -la /tmp/doma/
# Remove if necessary
rm -f /tmp/doma/doma.sock
```

GPU memory not released:

```bash
# Force shutdown and restart
doma shutdown
# Wait a moment, then relaunch
doma launch
```

Permission issues:

```bash
# Ensure proper CUDA permissions
nvidia-smi
# Check if user has access to CUDA devices
```

Author: TideDra ([email protected])
Version: 0.1.0