Joserii
😪
I may be slow to respond.
  • Hangzhou, China
  • 17:48 (UTC +08:00)

Stars

LLM_RL

18 repositories
Python 214 9 Updated Feb 20, 2025

Train transformer language models with reinforcement learning.

Python 17,017 2,425 Updated Jan 18, 2026

verl: Volcano Engine Reinforcement Learning for LLMs

Python 18,439 3,046 Updated Jan 18, 2026

s1: Simple test-time scaling

Python 6,631 764 Updated Jun 25, 2025

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

Jupyter Notebook 512 45 Updated Oct 20, 2024

Reproduce R1 Zero on Logic Puzzle

Python 2,430 164 Updated Mar 20, 2025

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Python 3,132 563 Updated Apr 15, 2024
Python 260 12 Updated May 14, 2025

[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning

Python 957 66 Updated Sep 26, 2025

A comprehensive collection of learning from rewards in the post-training and test-time scaling of LLMs, with a focus on both reward models and learning strategies across training, inference, and po…

60 2 Updated Jun 13, 2025

Democratizing Reinforcement Learning for LLMs

Python 4,995 487 Updated Jan 18, 2026

Simple RL training for reasoning

Python 3,826 283 Updated Dec 23, 2025

This repository serves as a collection of research notes and resources on training large language models (LLMs) and Reinforcement Learning from Human Feedback (RLHF). It focuses on the latest resea…

Python 125 11 Updated Jul 28, 2025

Fully open reproduction of DeepSeek-R1

Python 25,825 2,411 Updated Nov 24, 2025

slime is an LLM post-training framework for RL Scaling.

Python 3,378 427 Updated Jan 18, 2026

An Efficient and User-Friendly Scaling Library for Reinforcement Learning with Large Language Models

Python 2,664 204 Updated Jan 18, 2026

A live-streamed development of RL tuning for LLM agents

Python 3,814 527 Updated Oct 8, 2025