Optimized Docker image for Unsloth fine-tuning + GGUF export via llama.cpp
This image combines Unsloth (ultra-fast LLM fine-tuning) with llama.cpp to seamlessly export quantized GGUF models after training.
- Pre-installed Unsloth (with FlashAttention, xformers, and optimized kernels)
- Full llama.cpp toolchain (including convert_hf_to_gguf.py)
- Jupyter Lab environment ready for development
- GPU-accelerated (CUDA 12.1 + cuDNN)
- Quantization-ready (supports all GGUF quant types)
Prerequisites: install Docker and the NVIDIA Container Toolkit.
# Build the image
docker compose build
# Start the container (runs Jupyter Lab on port 8888)
docker compose up -d

# Note: remove the comment below if you need to push the image to a registry:
# docker compose push
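The compose file behind these commands needs to expose the Jupyter port and hand the GPU through to the container. A minimal sketch of what that typically looks like (the service name, volume path, and port mapping here are assumptions; adjust them to your actual docker-compose.yml):

```yaml
services:
  unsloth:
    build: .
    ports:
      - "8888:8888"          # host:container - Jupyter Lab
    volumes:
      - ./workspace:/workspace  # persist notebooks and exported models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]   # required for CUDA inside the container
```

The `deploy.resources.reservations.devices` block is Docker Compose's supported way of requesting NVIDIA GPUs when the NVIDIA Container Toolkit is installed.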
Open your browser at http://127.0.0.1:8888 and enter your password.
Create a Jupyter notebook and train your model. After training completes, add the following at the bottom of the notebook:
# Save merged model (Unsloth syntax)
model.save_pretrained_merged("your-new-model", tokenizer)
# Convert to GGUF (using pre-installed llama.cpp)
!python /workspace/llama.cpp/convert_hf_to_gguf.py --outfile your-new-model-gguf --outtype q8_0 your-new-model

Replace q8_0 with your preferred quant type:
- f16 (no quantization)
- q4_k_m (recommended balance)
- q5_k_m
- q6_k
- q8_0 (highest quality quant)

Note: convert_hf_to_gguf.py itself only writes a handful of types directly (f32, f16, bf16, q8_0, and a few others). K-quants such as q4_k_m are produced by first converting to f16 and then running llama.cpp's llama-quantize tool on the resulting file.
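To make the one-step vs. two-step distinction concrete, here is a small helper that builds the command line(s) needed for a given target quant type. It is a sketch: the tool paths mirror this image's /workspace layout but are assumptions, and the set of directly supported --outtype values may vary with your llama.cpp version.

```python
# Build the shell commands needed to produce a GGUF of a given quant type.
# Paths below are assumptions based on this image's /workspace layout.
CONVERT = "python /workspace/llama.cpp/convert_hf_to_gguf.py"
QUANTIZE = "/workspace/llama.cpp/llama-quantize"

# Types convert_hf_to_gguf.py can emit directly via --outtype (may vary by version)
DIRECT_OUTTYPES = {"f32", "f16", "bf16", "q8_0"}

def gguf_commands(model_dir: str, quant: str) -> list[str]:
    """Return the command(s) that produce a <model_dir>-<quant>.gguf file."""
    if quant in DIRECT_OUTTYPES:
        # One step: the converter writes the requested type directly.
        return [
            f"{CONVERT} --outfile {model_dir}-{quant}.gguf "
            f"--outtype {quant} {model_dir}"
        ]
    # Two steps for k-quants: convert to f16, then requantize.
    f16 = f"{model_dir}-f16.gguf"
    return [
        f"{CONVERT} --outfile {f16} --outtype f16 {model_dir}",
        f"{QUANTIZE} {f16} {model_dir}-{quant}.gguf {quant.upper()}",
    ]

for cmd in gguf_commands("your-new-model", "q4_k_m"):
    print(cmd)
```

Running this for q8_0 yields a single converter invocation, while q4_k_m yields the f16 conversion followed by a llama-quantize call with the uppercased type name (Q4_K_M), matching llama-quantize's convention.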