Stars
Implementing DeepSeek R1's GRPO algorithm from scratch
🌾 OAT: A research-friendly framework for LLM online alignment, including reinforcement learning, preference learning, etc.
A Framework for LLM-based Multi-Agent Reinforced Training and Inference
Low ReSource Reinforcement Learning with CPU Offloading Training Support
The absolute trainer to light up AI agents.
Our library for RL environments + evals
verl-agent is an extension of veRL, designed for training LLM/VLM agents via RL. verl-agent is also the official code for the paper "Group-in-Group Policy Optimization for LLM Agent Training"
A version of verl to support diverse tool use
Agent Reinforcement Trainer: train multi-step agents for real-world tasks using GRPO. Give your agents on-the-job training. Reinforcement learning for Qwen2.5, Qwen3, Llama, and more!
A collection of prompts to challenge the reasoning abilities of large language models in the presence of misguiding information
Official Repo for Open-Reasoner-Zero
A library with extensible implementations of DPO, KTO, PPO, ORPO, and other human-aware loss functions (HALOs).
A Python module to repair invalid JSON from LLMs
Turns Data and AI algorithms into production-ready web applications in no time.
A simple tool to update bib entries with their official information (e.g., DBLP or the ACL anthology).
The Learning Interpretability Tool: Interactively analyze ML models to understand their behavior in an extensible and framework agnostic interface.
Hydra is a framework for elegantly configuring complex applications
Datasets, tools, and benchmarks for representation learning of code.
🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
One hundred challenge problems for logical formalizations of commonsense psychology
🔮 A refreshing functional take on deep learning, compatible with your favorite libraries
Documentation on how to access and use the Quick, Draw! Dataset.