- Hangzhou, China
LLM_RL
Train transformer language models with reinforcement learning.
verl: Volcano Engine Reinforcement Learning for LLMs
[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and po…
This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest resea…
Fully open reproduction of DeepSeek-R1
slime is an LLM post-training framework for RL Scaling.
An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models
A live-stream development of RL tuning for LLM agents