slime is an LLM post-training framework for RL scaling, providing two core capabilities:
- High-Performance Training: Supports efficient training in various modes by connecting Megatron with SGLang;
- Flexible Data Generation: Enables arbitrary training data generation workflows through custom data generation interfaces and server-based engines.
- Our vision: slime: An SGLang-Native Post-Training Framework for RL Scaling.
- Our ideas on agentic training: Agent-Oriented Design: An Asynchronous and Decoupled Framework for Agentic RL.
- slime has served as the RL framework for GLM-4.5: GLM-4.5: Reasoning, Coding, and Agentic Abilities
- Architecture Overview
- Quick Start
- Checkpoint Format Conversion
- Starting the Training Process
- Argument Descriptions
- Developer Guide
- FAQ & Acknowledgements
Module Descriptions:
- training (Megatron): Responsible for the main training process, reads data from the Data Buffer, and synchronizes parameters to the rollout module after training.
- rollout (SGLang + router): Generates new data (including rewards/verifier outputs) and stores it in the Data Buffer.
- data buffer: A bridge module that manages prompt initialization, custom data, and rollout generation methods.
For a comprehensive quick start guide covering environment setup, data preparation, training startup, and key code analysis, please refer to:
Since slime uses Megatron, and Megatron does not support loading Hugging Face checkpoints directly, we need to convert the model to the torch_dist format that Megatron supports.
We are using mbridge for conversion:
cd slime/
source scripts/models/glm4-9B.sh
PYTHONPATH=/root/Megatron-LM python tools/convert_hf_to_torch_dist.py \
${MODEL_ARGS[@]} \
--hf-checkpoint /root/GLM-Z1-9B-0414 \
--save /root/GLM-Z1-9B-0414_torch_dist
This conversion requires a GPU, so for large models you can use the following method to convert with multiple GPUs; note that you can add the parallelism configuration in the same way as for training (see the sketch after the command below):
source scripts/models/glm4.5-355B-A32B.sh
PYTHONPATH=/root/Megatron-LM/ torchrun \
--nproc-per-node 8 \
--master-addr ${MASTER_ADDR} --master-port 12345 \
--nnodes=2 --node-rank ${NODE_RANK} \
tools/convert_hf_to_torch_dist.py \
${MODEL_ARGS[@]} \
--hf-checkpoint $BASE_DIR/GLM-4.5-355B-A32B/ \
--save $BASE_DIR/GLM-4.5-355B-A32B_torch_dist/
Note: run `pip install -e .` in the slime directory to install slime.
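For reference, here is a sketch of how Megatron parallelism flags could be appended to the multi-GPU conversion command above. The `--tensor-model-parallel-size` and `--pipeline-model-parallel-size` values are purely illustrative assumptions and must match your GPU layout and model:
# Illustrative only: parallelism flags are added exactly as they would be for training
PYTHONPATH=/root/Megatron-LM/ torchrun \
--nproc-per-node 8 \
--master-addr ${MASTER_ADDR} --master-port 12345 \
--nnodes=2 --node-rank ${NODE_RANK} \
tools/convert_hf_to_torch_dist.py \
${MODEL_ARGS[@]} \
--tensor-model-parallel-size 4 \
--pipeline-model-parallel-size 4 \
--hf-checkpoint $BASE_DIR/GLM-4.5-355B-A32B/ \
--save $BASE_DIR/GLM-4.5-355B-A32B_torch_dist/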
To convert a torch_dist checkpoint saved during training back to a Hugging Face checkpoint:
cd slime/
PYTHONPATH=/root/Megatron-LM python tools/convert_torch_dist_to_hf.py \
--input-dir /path/to/torch_dist_ckpt/iter_xxx/ \
--output-dir /root/GLM-Z1-9B-0414-iter_xxx \
--origin-hf-dir /root/GLM-Z1-9B-0414
In some cases Megatron pads the embedding; you can pass `--vocab-size` to make sure the embedding size of the converted HF checkpoint is correct.
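For example, a minimal sketch (the `--vocab-size` value here is an illustrative assumption; use the true vocabulary size from your model's original Hugging Face config):
PYTHONPATH=/root/Megatron-LM python tools/convert_torch_dist_to_hf.py \
--input-dir /path/to/torch_dist_ckpt/iter_xxx/ \
--output-dir /root/GLM-Z1-9B-0414-iter_xxx \
--origin-hf-dir /root/GLM-Z1-9B-0414 \
--vocab-size 151552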
Note: a torch_dist checkpoint converted by mbridge does not currently save the training args, so you cannot convert the checkpoint from the previous step back to HF format.
The following method is applicable to custom save formats (e.g., `--ckpt-format torch`).
The principle behind this conversion method is to reuse the function that updates parameters from Megatron to SGLang during training. This means reusing the training script and changing the original command from:
ray job submit --address="http://127.0.0.1:8265" \
--runtime-env-json='{
"env_vars": { ...}
}' \
-- python3 train.py \
... # Other training args
to:
torchrun --nproc_per_node ${NUM_GPU} tools/convert_to_hf.py \
--load /your/saved/megatron_ckpt \
--output-dir /your/converted/hf_ckpt \
... # Other training args
That is, keep all other arguments the same, and:
- Change the task launcher from `ray` to `torchrun`. Set the number of GPUs to the minimum required for Megatron's parallelism without data parallelism (DP). For example, if you are using `tp4`, set it to 4 (see the sketch after this list).
- Make sure to change `--load` to the path of the checkpoint you want to load.
- Add the `--output-dir` argument to specify where the converted Hugging Face checkpoint should be saved.
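As a concrete sketch, assuming a checkpoint trained with `--tensor-model-parallel-size 4` and no data parallelism (the paths, environment, and model script are placeholders that follow the earlier examples):
source scripts/models/glm4-9B.sh
PYTHONPATH=/root/Megatron-LM torchrun --nproc_per_node 4 tools/convert_to_hf.py \
${MODEL_ARGS[@]} \
--tensor-model-parallel-size 4 \
--load /your/saved/megatron_ckpt \
--output-dir /your/converted/hf_ckpt \
... # Other training args, unchanged from the training script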
The entire program needs to be launched using Ray. First, you need to start a Ray cluster. On node 0, run:
# Node0 (HEAD)
ray start --head --node-ip-address ${MASTER_ADDR} \
--num-gpus 8 --disable-usage-stats
# Other Nodes
ray start --address=${MASTER_ADDR}:6379 --num-gpus 8
After the Ray cluster has started, you can submit a job from node 0, for example:
ray job submit --address="http://127.0.0.1:8265" \
--runtime-env-json='{
"env_vars": {
"PYTHONPATH": "/root/Megatron-LM/",
... # e.g., no_proxy, API variables, etc.
}
}' \
-- python3 train.py \
--... # Other Megatron/SGLang/slime arguments
Arguments are divided into three categories:
- Megatron arguments: slime reads all arguments of the Megatron installation found via `PYTHONPATH`. You can configure Megatron by passing arguments such as `--tensor-model-parallel-size 2`.
- SGLang arguments: all arguments of the installed SGLang are supported, but they must be prefixed with `--sglang-`. For example, `--mem-fraction-static` should be passed as `--sglang-mem-fraction-static`.
- slime-specific arguments: please refer to slime/utils/arguments.py (see the combined sketch after this list).
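For instance, a sketch of a single submission combining all three categories (the values are illustrative assumptions; slime-specific flags should be taken from slime/utils/arguments.py):
ray job submit --address="http://127.0.0.1:8265" \
--runtime-env-json='{
"env_vars": { "PYTHONPATH": "/root/Megatron-LM/" }
}' \
-- python3 train.py \
--tensor-model-parallel-size 2 \
--sglang-mem-fraction-static 0.8 \
... # slime-specific arguments and the rest of the Megatron/SGLang config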
For complete usage instructions, please refer to the Usage Documentation.
- Contributions are welcome! If you have suggestions for new features, performance tuning, or feedback on user experience, feel free to submit an Issue or PR 😊
- Use pre-commit to ensure code style consistency for your commits:
apt install pre-commit -y
pre-commit install
- For debugging tips, please refer to the Debugging Guide
- Nvidia: refer to this repo README
- AMD: refer to the tutorial
- For frequently asked questions, please see the Q&A
- Special thanks to the following projects & communities: SGLang, Megatron-LM, mbridge, OpenRLHF, veRL, Pai-Megatron-Patch, and others.