This repository contains hands-on implementations of post-training techniques for Large Language Models (LLMs), with practical examples for fine-tuning and adapting models to specific tasks.
The lab currently covers the following techniques:
- SFT (Supervised Fine-Tuning) - A complete pipeline using QLoRA for memory-efficient fine-tuning (see the first sketch after this list)
- DPO (Direct Preference Optimization) - Preference-based training (WIP; a sketch follows below)
- RL (Reinforcement Learning) - Fine-tuning with reinforcement learning methods (WIP; a sketch follows below)
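For orientation, here is a minimal sketch of what a QLoRA-based SFT run typically looks like with the Hugging Face transformers/peft/trl stack. The model and dataset names are placeholders for illustration, not necessarily what this lab uses:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Placeholder model/dataset choices; swap in your own.
model_name = "Qwen/Qwen2.5-0.5B"
dataset = load_dataset("trl-lib/Capybara", split="train")

# Load the base model with 4-bit NF4 quantization (the "Q" in QLoRA):
# the quantized base weights stay frozen, keeping memory usage low.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map="auto",
)

# Only small low-rank adapter matrices are trained on top of the frozen model.
peft_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    args=SFTConfig(output_dir="sft-qlora-output", max_steps=100),
)
trainer.train()
```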
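Since the DPO module is still a work in progress, here is a minimal sketch of what preference-based training commonly looks like with trl's DPOTrainer; again, the model and dataset names are placeholder assumptions:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

# Placeholder model/dataset choices; an SFT'd model is the usual starting point.
model_name = "Qwen/Qwen2.5-0.5B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# DPO trains on preference pairs: each row holds a prompt plus a
# "chosen" and a "rejected" response.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train")

trainer = DPOTrainer(
    model=model,  # a frozen copy is used internally as the reference model
    args=DPOConfig(output_dir="dpo-output", beta=0.1, max_steps=100),
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```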
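The RL module is also a work in progress, so as one representative example (not necessarily the method this lab will implement), here is a sketch of RL-based fine-tuning with trl's GRPOTrainer and a toy reward function:

```python
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# Placeholder dataset; GRPO needs prompts plus a reward signal.
dataset = load_dataset("trl-lib/tldr", split="train")

# A toy reward function: prefer completions close to 200 characters.
# Real setups would use a learned reward model or task-specific checks.
def reward_len(completions, **kwargs):
    return [-abs(200 - len(c)) for c in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",  # placeholder model
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="grpo-output", max_steps=100),
    train_dataset=dataset,
)
trainer.train()
```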
Each module contains its own setup instructions and examples. Start with the SFT module for a complete fine-tuning pipeline.
Contributions are welcome! Please feel free to submit issues, feature requests, or pull requests.
This project is licensed under the MIT License.