Lists (32)
Agents
Algorithm💶
BigModelTraining
ChatGPT
Code
Conversation
DB
Finetune-Lightweight
Graph
Inc
interact
IR
KGC
KGE
Knowledge Summary
llama
LLM-Benchmark
LLM+Data
llm dataset
LLM Framework
LLM Tutorial
LM-RL
Reinforce learning for language modelmm
multimodalNLP+KG📚
Paper
PLM
Pretraining
RAG
structureddata
Tokenizer
Tools
work
Starred repositories
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning
Fully open reproduction of DeepSeek-R1
MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.
Postgres MCP Pro provides configurable read/write access and performance analysis for you and your AI agents.
τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment
[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning
MUA-RL: Multi-Turn User-Interacting Agent Reinforcement Learning for Agentic Tool Use
The raw UserRL repo (under construction)
VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo, but scores >70% on SWE-bench Verified!
An open-source AI agent that brings the power of Gemini directly into your terminal.
[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.
verl: Volcano Engine Reinforcement Learning for LLMs
Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"
[VLDB '25] Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.
Universal database MCP server connecting to MySQL, PostgreSQL, SQL Server, and MariaDB.
[BIRD-INTERACT] Re-imagines Text-to-SQL evaluation through the lens of dynamic interactions.
[PVLDB 2024 Best Paper Nomination] TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods
[NeurIPS 2025] Atom of Thoughts for Markov LLM Test-Time Scaling
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.
An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)
Production-ready platform for agentic workflow development.
[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments