Starred repositories

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 10,457 1,165 Updated Oct 22, 2025

CSC-SQL: Corrective Self-Consistency in Text-to-SQL via Reinforcement Learning

Python 41 5 Updated Aug 12, 2025

Fully open reproduction of DeepSeek-R1

Python 25,567 2,397 Updated Sep 8, 2025

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 286 17 Updated Oct 22, 2025

Postgres MCP Pro provides configurable read/write access and performance analysis for you and your AI agents.

Python 1,370 153 Updated May 16, 2025

τ²-Bench: Evaluating Conversational Agents in a Dual-Control Environment

Python 364 63 Updated Oct 13, 2025

[NeurIPS 2025] TTRL: Test-Time Reinforcement Learning

Python 870 64 Updated Sep 26, 2025

MUA-RL: Multi-turn User-interacting Agent Reinforcement Learning for Agentic Tool Use

Python 35 Updated Sep 29, 2025

The raw UserRL repo under construction

Python 63 7 Updated Sep 25, 2025

VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications

Python 15 Updated Oct 17, 2025

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >70% on SWE-bench verified!

Python 1,916 200 Updated Oct 21, 2025

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 80,141 8,812 Updated Oct 23, 2025

[TMLR] A curated list of language modeling research for code (and other software engineering activities), plus related datasets.

2,975 193 Updated Oct 15, 2025

verl: Volcano Engine Reinforcement Learning for LLMs

Python 14,642 2,332 Updated Oct 23, 2025

Code, benchmark and environment for "ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows"

Python 112 10 Updated Aug 28, 2025

[VLDB '25] Synthesizing High-quality Text-to-SQL Data at Scale. SynSQL-2.5M is the first million-scale cross-domain text-to-SQL dataset.

Python 357 40 Updated Sep 8, 2025

Universal database MCP server connecting to MySQL, PostgreSQL, SQL Server, MariaDB.

TypeScript 1,453 129 Updated Oct 12, 2025

🙌 OpenHands: Code Less, Make More

Python 64,399 7,808 Updated Oct 22, 2025

[BIRD-INTERACT] Re-imagines Text-to-SQL evaluation via the lens of dynamic interactions.

Python 319 12 Updated Oct 21, 2025
Python 109 5 Updated Oct 21, 2025

[PVLDB 2024 Best Paper Nomination] TFB: Towards Comprehensive and Fair Benchmarking of Time Series Forecasting Methods

Shell 1,049 77 Updated Oct 15, 2025

[NeurIPS 2025] Atom of Thoughts for Markov LLM Test-Time Scaling

Python 590 51 Updated Jun 16, 2025

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Python 2,364 183 Updated Oct 23, 2025

The Desktop AgentOS.

Python 7,668 934 Updated Sep 5, 2025

TiDB - the open-source, cloud-native, distributed SQL database designed for modern applications.

Go 39,202 6,038 Updated Oct 23, 2025

An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray (PPO & GRPO & REINFORCE++ & vLLM & Ray & Dynamic Sampling & Async Agentic RL)

Python 8,211 804 Updated Oct 23, 2025

Production-ready platform for agentic workflow development.

TypeScript 117,085 18,088 Updated Oct 23, 2025

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Python 2,258 313 Updated Oct 23, 2025