Scale Unsloth to multiple GPUs with just torchrun. No configuration files, no custom frameworks - pure PyTorch DDP.
- 🚀 2-4x faster than single GPU
- 🎯 Zero configuration - works out of the box
- 💾 Same VRAM per GPU as single GPU Unsloth
- 🔧 Any Unsloth model - Qwen, Llama, Gemma, etc.
```bash
# Install dependencies
uv add torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
uv add unsloth datasets transformers trl
uv add git+https://github.com/anhvth/opensloth.git
```
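Before launching, you can optionally confirm how many GPUs PyTorch sees; the count should match the `--nproc_per_node` value you plan to pass:

```python
# Optional sanity check: list the GPUs visible to PyTorch before launching torchrun.
import torch

print(torch.cuda.device_count())          # should match --nproc_per_node
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```

To train on only a subset of the machine's GPUs, set `CUDA_VISIBLE_DEVICES` (e.g. `CUDA_VISIBLE_DEVICES=0,1`) before the `torchrun` command.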
Replace `python` with `torchrun`:

```bash
# Single GPU
python train_scripts/train_ddp.py

# Multi-GPU
torchrun --nproc_per_node=2 train_scripts/train_ddp.py  # 2 GPUs
torchrun --nproc_per_node=4 train_scripts/train_ddp.py  # 4 GPUs
```

OpenSloth automatically handles GPU distribution, gradient sync, and batch sizing.
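As a rough mental model for the batch sizing (illustrative numbers, not OpenSloth defaults): under DDP the effective global batch is the per-device batch times gradient-accumulation steps times the number of processes.

```python
# Effective global batch size under DDP (hypothetical values for illustration).
per_device_batch = 8     # per_device_train_batch_size in your SFTConfig
grad_accum = 4           # gradient_accumulation_steps in your SFTConfig
world_size = 2           # GPUs launched via --nproc_per_node

print(per_device_batch * grad_accum * world_size)  # 64; doubles again on 4 GPUs
```

With per-device settings held fixed, adding GPUs processes more samples per optimizer step, which is where the wall-clock savings in the table below come from.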
| Setup | Time | Speedup |
|---|---|---|
| 1 GPU | 19m 34s | 1.0x |
| 2 GPUs | 8m 28s | 2.3x |
Expected scaling: 2 GPUs = ~2.3x, 4 GPUs = ~4.5x, 8 GPUs = ~9x
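The speedup column is simply single-GPU wall-clock time divided by multi-GPU time; reproducing the 2-GPU figure from the table:

```python
# Speedup = single-GPU time / multi-GPU time, using the benchmark above.
one_gpu = 19 * 60 + 34   # 19m 34s -> 1174 s
two_gpus = 8 * 60 + 28   # 8m 28s  ->  508 s

print(round(one_gpu / two_gpus, 1))  # ~2.3x
```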
```python
import os

from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

from opensloth.patching.ddp_patch import ddp_patch

ddp_patch()  # Enable DDP compatibility

# Standard Unsloth setup
local_rank = int(os.environ.get("LOCAL_RANK", 0))

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-0.6B",
    device_map={"": local_rank},
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16)

trainer = SFTTrainer(model=model, tokenizer=tokenizer, ...)
trainer.train()
```

Run: `torchrun --nproc_per_node=4 your_script.py`
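For a script you can launch end to end, a minimal sketch could look like the following. The dataset (`imdb`), the split slice, the script name, and all `SFTConfig` values are illustrative assumptions, not OpenSloth defaults; swap in your own data and hyperparameters.

```python
# train_ddp_sketch.py -- minimal end-to-end sketch (illustrative values throughout).
import os

from unsloth import FastLanguageModel   # keep unsloth imported first, as in the snippet above
from trl import SFTConfig, SFTTrainer
from datasets import load_dataset

from opensloth.patching.ddp_patch import ddp_patch

ddp_patch()  # Enable DDP compatibility

local_rank = int(os.environ.get("LOCAL_RANK", 0))

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Qwen3-0.6B",     # any Unsloth model works
    device_map={"": local_rank},         # pin each process to its own GPU
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16)

# Placeholder dataset with a plain "text" column; replace with your own data.
dataset = load_dataset("imdb", split="train[:1%]")

args = SFTConfig(
    output_dir="outputs",
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    max_steps=60,
    logging_steps=10,
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,   # newer TRL versions may call this processing_class
    train_dataset=dataset,
    args=args,
)
trainer.train()
```

Launch it the same way: `torchrun --nproc_per_node=4 train_ddp_sketch.py`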
Current (Recommended): Simple torchrun + DDP patch
```python
from opensloth.patching.ddp_patch import ddp_patch

ddp_patch()
# ... standard Unsloth code
```

Old Approach (v0.1.8): if you need the configuration-file based workflow, check out the v0.1.8 release:
```bash
git checkout v0.1.8  # https://github.com/anhvth/opensloth/releases/tag/v0.1.8
```

- Unsloth - 2x faster training library
- TRL - Transformer Reinforcement Learning
- PyTorch DDP - Distributed training
```bash
git clone https://github.com/anhvth/opensloth.git
cd opensloth
torchrun --nproc_per_node=4 train_scripts/train_ddp.py
```

Happy training! 🦥⚡