hanningzhang


A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.

Python 347 88 Updated Oct 29, 2025

The official implementation of "Self-play LLM Theorem Provers with Iterative Conjecturing and Proving"

Python 112 9 Updated Mar 28, 2025

Post-training with Tinker

Python 1,865 149 Updated Nov 12, 2025

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

Python 2,108 377 Updated Nov 12, 2025

Build resilient language agents as graphs.

Python 20,952 3,683 Updated Nov 10, 2025

Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflo…

TypeScript 42,192 2,793 Updated Nov 12, 2025

[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style

Python 66 3 Updated Jul 18, 2025

AIRA-dojo: a framework for developing and evaluating AI research agents

Python 110 18 Updated Sep 26, 2025

Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]

Jupyter Notebook 573 35 Updated Jul 29, 2025

Build effective agents using Model Context Protocol and simple workflow patterns

Python 7,725 782 Updated Nov 13, 2025

SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]

Python 17,768 1,877 Updated Nov 10, 2025

SWE-bench: Can Language Models Resolve Real-world Github Issues?

Python 3,786 686 Updated Oct 11, 2025

AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.

Python 1,070 157 Updated Nov 5, 2025

An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)

Python 10,616 1,181 Updated Nov 6, 2025

A toolkit for developing and comparing reinforcement learning algorithms.

Python 36,763 8,708 Updated Oct 11, 2024

[ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?

Jupyter Notebook 81 6 Updated Aug 17, 2025
Python 78 5 Updated Oct 30, 2025

[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"

Python 617 50 Updated Mar 16, 2025

MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering

Python 1,168 176 Updated Nov 11, 2025

The official implementation of "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering"

Python 50 4 Updated Jun 21, 2025

⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)

Python 3,134 269 Updated Nov 8, 2025

Fully open reproduction of DeepSeek-R1

Python 25,634 2,398 Updated Sep 8, 2025
Python 999 46 Updated Jul 2, 2025

Fully open data curation for reasoning models

Python 2,141 178 Updated Sep 3, 2025

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

Python 1,767 337 Updated Oct 24, 2025

RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.

Python 2,395 186 Updated Nov 12, 2025

Google Research

Jupyter Notebook 36,720 8,237 Updated Nov 10, 2025
Python 2 Updated Apr 11, 2025

WIP - Automated Question Answering for ArXiv Papers with Large Language Models (https://arxiv.taesiri.xyz/)

Python 363 16 Updated Aug 25, 2025