Lightron is a lightweight, educational, yet modern distributed training framework for LLMs. It aims to bridge the gap between minimal teaching implementations and modern production features such as 4-D parallelism: Tensor Parallelism (TP), Pipeline Parallelism (PP), Data Parallelism (DP), and Context Parallelism (CP).
- Distributed Ready: Supports 4-D parallelism (TP, PP, DP, CP), Expert Parallelism (EP), and FSDP v2 (see the sketch after this list).
- Modern Architecture: RMSNorm, SwiGLU, Rotary Embeddings (RoPE), FlashAttention V2.
- Clean Code: Type-hinted, dataclass-based configuration, <1000 lines of core code.
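The repository's process-group plumbing isn't shown in this README. As a rough sketch of the idea behind a 4-D layout, stock PyTorch's `DeviceMesh` can name one axis per parallelism dimension; the 16-rank split below is purely illustrative and is not Lightron's actual setup.

```python
# Sketch only: expressing a 4-D rank layout with stock PyTorch's
# DeviceMesh. Lightron's own process-group setup may differ.
# Launch with: torchrun --nproc_per_node=16 this_script.py
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh

# 2 x 2 x 2 x 2 = 16 ranks; the split sizes here are illustrative.
mesh = init_device_mesh(
    "cuda",  # "cpu" (with gloo) also works for experimentation
    mesh_shape=(2, 2, 2, 2),
    mesh_dim_names=("dp", "pp", "tp", "cp"),
)

# Slicing by name yields the sub-mesh (and process group) for one axis,
# e.g. the group a rank uses for tensor-parallel collectives:
tp_group = mesh["tp"].get_group()
print(f"rank {dist.get_rank()}: tp-local rank {mesh['tp'].get_local_rank()}")
```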
```bash
git clone https://github.com/lwj2015/lightron.git
cd lightron
pip install -r requirements.txt
```

```bash
# run on local machine with 8 GPUs, tp_size=2, dp_size=4
torchrun --nproc_per_node=8 trainer.py --config examples/config_tinystories.json
```
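The contents of `examples/config_tinystories.json` aren't reproduced in this README. Below is a hypothetical sketch of how a dataclass-based configuration (per the feature list above) might deserialize such a file; only `tp_size` and `dp_size` echo the comment in the quick-start command, and every other field name is an assumption.

```python
# Hypothetical sketch of a dataclass-based config loaded from JSON.
# Only tp_size and dp_size echo the quick-start comment above; the
# remaining field names are invented and need not match Lightron's schema.
import json
from dataclasses import dataclass, fields

@dataclass
class TrainConfig:
    tp_size: int = 1          # tensor-parallel degree
    dp_size: int = 1          # data-parallel degree
    pp_size: int = 1          # pipeline-parallel degree
    cp_size: int = 1          # context-parallel degree
    micro_batch_size: int = 8
    seq_len: int = 1024
    lr: float = 3e-4

    @classmethod
    def from_json(cls, path: str) -> "TrainConfig":
        with open(path) as f:
            raw = json.load(f)
        known = {f.name for f in fields(cls)}
        # Ignore unknown keys so the sketch tolerates extra fields.
        return cls(**{k: v for k, v in raw.items() if k in known})

# tp_size=2 * dp_size=4 fills the 8 ranks from the quick-start command.
cfg = TrainConfig(tp_size=2, dp_size=4)
assert cfg.tp_size * cfg.dp_size * cfg.pp_size * cfg.cp_size == 8
```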
**Test All-Reduce Communication**

```bash
torchrun --nproc_per_node=8 tests/test_all_reduce.py
```
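For intuition, a minimal all-reduce check of the same shape (a sketch, not the repository's actual test) looks like this:

```python
# Minimal all-reduce sanity check (a sketch, not tests/test_all_reduce.py).
# Launch with: torchrun --nproc_per_node=8 this_script.py
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo")  # gloo runs on CPU; use nccl on GPUs
rank, world = dist.get_rank(), dist.get_world_size()

# Each rank contributes a tensor filled with its own rank id.
x = torch.full((4,), float(rank))
dist.all_reduce(x, op=dist.ReduceOp.SUM)

# After a SUM all-reduce every element equals 0 + 1 + ... + (world - 1).
expected = world * (world - 1) / 2
assert torch.equal(x, torch.full((4,), expected))
if rank == 0:
    print("all_reduce OK:", x.tolist())
dist.destroy_process_group()
```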
**Test Ring Attention**

```bash
python tests/test_ring_attention.py
```
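Ring attention shards the sequence across ranks and rotates KV blocks around a ring while each rank accumulates attention with an online softmax. The toy, non-causal, single-head sketch below illustrates the communication pattern only; it is not Lightron's implementation, and all shapes are made up.

```python
# Toy ring attention pass (non-causal, single head): each rank keeps its
# query shard fixed while KV shards travel around the ring, accumulating
# attention with a running log-sum-exp. A didactic sketch, not Lightron's code.
# Launch with: torchrun --nproc_per_node=8 this_script.py
import math
import torch
import torch.distributed as dist

dist.init_process_group(backend="gloo")  # gloo so the sketch runs on CPU
rank, world = dist.get_rank(), dist.get_world_size()
torch.manual_seed(rank)

S, D = 16, 32                        # local sequence shard length, head dim
q = torch.randn(S, D)                # this rank's query shard (stays put)
k, v = torch.randn(S, D), torch.randn(S, D)  # KV shard that travels the ring

o = torch.zeros(S, D)                # running output numerator
l = torch.zeros(S)                   # running softmax denominator
m = torch.full((S,), -float("inf"))  # running row max for stability

for step in range(world):
    # Attend q against whichever KV block this rank currently holds.
    scores = q @ k.T / math.sqrt(D)                      # (S, S)
    m_new = torch.maximum(m, scores.max(dim=-1).values)
    p = torch.exp(scores - m_new[:, None])
    scale = torch.exp(m - m_new)     # rescale previous partials to the new max
    o = o * scale[:, None] + p @ v
    l = l * scale + p.sum(dim=-1)
    m = m_new

    if step == world - 1:
        break
    # Rotate KV blocks one hop: send to the next rank, receive from the previous.
    send = torch.cat([k, v]).contiguous()
    recv = torch.empty_like(send)
    send_req = dist.isend(send, dst=(rank + 1) % world)
    recv_req = dist.irecv(recv, src=(rank - 1) % world)
    send_req.wait()
    recv_req.wait()
    k, v = recv[:S], recv[S:]

out = o / l[:, None]                 # attention output for the local q shard
print(f"rank {rank}: out shape {tuple(out.shape)}")
dist.destroy_process_group()
```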
**Test DataLoader**

```bash
torchrun --nproc_per_node=8 tests/test_dataloader.py
```
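As a sketch of what a distributed dataloader check can look like (not the repository's actual test), stock PyTorch's `DistributedSampler` should hand each rank a disjoint, equal-size shard:

```python
# Sketch of a rank-sharded DataLoader check: with DistributedSampler,
# the ranks should see disjoint, equal-size shards of the dataset.
# Launch with: torchrun --nproc_per_node=8 this_script.py
import torch
import torch.distributed as dist
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dist.init_process_group(backend="gloo")  # gloo keeps the sketch CPU-only
rank, world = dist.get_rank(), dist.get_world_size()

dataset = TensorDataset(torch.arange(64))
sampler = DistributedSampler(dataset, num_replicas=world, rank=rank, shuffle=True)
loader = DataLoader(dataset, batch_size=4, sampler=sampler)

sampler.set_epoch(0)  # reshuffles deterministically per epoch
local = torch.cat([batch[0] for batch in loader])

# Gather every rank's shard and verify the shards partition the dataset.
shards = [torch.empty_like(local) for _ in range(world)]
dist.all_gather(shards, local)
if rank == 0:
    seen = torch.cat(shards)
    assert seen.unique().numel() == len(dataset), "ranks saw overlapping samples"
    print("dataloader sharding OK")
dist.destroy_process_group()
```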
If you use Lightron in your research or learning journey, please cite it as follows:

```bibtex
@misc{lightron2025,
  author       = {Wenjun Liu},
  title        = {Lightron: A Modern Minimalist Distributed Training Framework},
  year         = {2025},
  publisher    = {GitHub},
  journal      = {GitHub repository},
  howpublished = {\url{https://github.com/lwj2015/lightron}}
}
```