Stars
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models
[ICLR Workshop 2025] Official source code for the paper "GuardReasoner: Towards Reasoning-based LLM Safeguards".
verl: Volcano Engine Reinforcement Learning for LLMs
Train transformer language models with reinforcement learning.
Agent framework and applications built upon Qwen>=3.0, featuring Function Calling, MCP, Code Interpreter, RAG, Chrome extension, etc.
Model Context Protocol Servers
[TMLR 2025] Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models
Qwen3 is the large language model series developed by the Qwen team at Alibaba Cloud.
[TMLR 2025] Efficient Reasoning Models: A Survey
GLM-4 series: Open Multilingual Multimodal Chat LMs
Unleashing the Power of Reinforcement Learning for Math and Code Reasoners
Autonomous Agents (LLMs) research papers. Updated daily.
An open-source RL system from ByteDance Seed and Tsinghua AIR
[ICML 2024] TrustLLM: Trustworthiness in Large Language Models
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
Fully open reproduction of DeepSeek-R1
A collection of LLM papers, blogs, and projects, with a focus on OpenAI o1 🍓 and reasoning techniques.
A framework for few-shot evaluation of language models.
[ICML'24] Data and code for our paper "Training-Free Long-Context Scaling of Large Language Models"
Function Vectors in Large Language Models (ICLR 2024)
Unified Efficient Fine-Tuning of 100+ LLMs & VLMs (ACL 2024)
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs
An Efficient "Factory" to Build Multiple LoRA Adapters