UIUC - Illinois - hanningzhang.github.io
Stars
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
The official implementation of "Self-play LLM Theorem Provers with Iterative Conjecturing and Proving"
Post-training with Tinker
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
Build resilient language agents as graphs.
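LangGraph models an agent as a graph of nodes that read and update a shared state. A minimal sketch of that pattern, assuming the langgraph package is installed (the node body is a placeholder, not this library's own logic):

```python
from typing import TypedDict

from langgraph.graph import StateGraph, START, END


class State(TypedDict):
    question: str
    answer: str


def answer_node(state: State) -> dict:
    # Placeholder node: a real agent would call an LLM or a tool here.
    return {"answer": f"echo: {state['question']}"}


builder = StateGraph(State)
builder.add_node("answer", answer_node)
builder.add_edge(START, "answer")
builder.add_edge("answer", END)

graph = builder.compile()
print(graph.invoke({"question": "What is a state graph?"}))
```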
Claude Code is an agentic coding tool that lives in your terminal, understands your codebase, and helps you code faster by executing routine tasks, explaining complex code, and handling git workflows.
[ICLR 25 Oral] RM-Bench: Benchmarking Reward Models of Language Models with Subtlety and Style
AIRA-dojo: a framework for developing and evaluating AI research agents
Code for Paper: Training Software Engineering Agents and Verifiers with SWE-Gym [ICML 2025]
Build effective agents using Model Context Protocol and simple workflow patterns
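One of the "simple workflow patterns" referenced above is prompt chaining, where each LLM call's output becomes the next call's input. A minimal sketch in plain Python; call_llm is a hypothetical stub standing in for any LLM client, not this repo's API:

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stub: replace with a real LLM client call.
    return f"[llm output for: {prompt[:40]}...]"


def chain(document: str) -> str:
    # Prompt chaining: each step's output feeds the next step's prompt.
    outline = call_llm(f"Outline the key points of:\n{document}")
    draft = call_llm(f"Write a summary following this outline:\n{outline}")
    return call_llm(f"Tighten this summary to three sentences:\n{draft}")


print(chain("Example document text."))
```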
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
SWE-bench: Can Language Models Resolve Real-world GitHub Issues?
AIDE: AI-Driven Exploration in the Space of Code. The machine learning engineering agent that automates AI R&D.
An API standard for single-agent reinforcement learning environments, with popular reference environments and related utilities (formerly Gym)
A toolkit for developing and comparing reinforcement learning algorithms.
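The two entries above cover the same interface lineage: Gymnasium standardizes the reset/step loop that Gym introduced. A minimal sketch of that loop, assuming gymnasium and its bundled CartPole-v1 environment are installed:

```python
import gymnasium as gym

# Create a reference environment and run one episode with random actions.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=0)

done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += float(reward)
    done = terminated or truncated

env.close()
print(f"episode return: {total_reward}")
```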
[ICLR 2025] DSBench: How Far are Data Science Agents from Becoming Data Science Experts?
[NeurIPS'25] Official codebase for "SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution"
MLE-bench is a benchmark for measuring how well AI agents perform at machine learning engineering
The official implementation of "ML-Agent: Reinforcing LLM Agents for Autonomous Machine Learning Engineering"
⚡FlashRAG: A Python Toolkit for Efficient RAG Research (WWW2025 Resource)
Fully open reproduction of DeepSeek-R1
Fully open data curation for reasoning models
The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
RAGEN leverages reinforcement learning to train LLM reasoning agents in interactive, stochastic environments.
Google Research
WIP - Automated Question Answering for ArXiv Papers with Large Language Models (https://arxiv.taesiri.xyz/)