Stars
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
[ICLR Workshop 2025] Official source code for the paper "GuardReasoner: Towards Reasoning-based LLM Safeguards".
verl: Volcano Engine Reinforcement Learning for LLMs
Train transformer language models with reinforcement learning.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Model Context Protocol Servers
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
[TMLR 2025] Efficient Reasoning Models: A Survey
GLM-4 series: Open Multilingual Multimodal Chat LMs
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
Autonomous Agents (LLMs) research papers. Updated daily.
An open-source RL system from ByteDance Seed and Tsinghua AIR
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
Fully open reproduction of DeepSeek-R1
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
A framework for few-shot evaluation of language models.
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Function Vectors in Large Language Models (ICLR 2024)
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
An Efficient "Factory" to Build Multiple LoRA Adapters