The 100-line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo, but scores >74% on SWE-bench Verified!

Python 2,436 308 Updated Jan 7, 2026

google-gemini / gemini-cli

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 90,085 10,409 Updated Jan 8, 2026

purpcode-uiuc / purpcode

🔮Reasoning for Safer Code Generation; 🥇Winner Solution of Amazon Nova AI Challenge 2025

Python 35 1 Updated Aug 24, 2025

QwenLM / qwen-code

An open-source AI agent that lives in your terminal.

TypeScript 17,164 1,486 Updated Jan 8, 2026

metauto-ai / agent-as-a-judge

👩‍⚖️ Agent-as-a-Judge: The Magic for Open-Endedness

Python 703 97 Updated May 14, 2025

eval-sys / mcpmark

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 362 27 Updated Dec 30, 2025

microsoft / playwright

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

TypeScript 80,924 4,983 Updated Jan 8, 2026

browser-use / browser-use

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Python 74,823 8,938 Updated Jan 7, 2026

ag2ai / Agents_Failure_Attribution

Benchmark for automated failure attributions in agentic systems (🏆 ICML 2025 Spotlight)

Python 337 20 Updated Jan 6, 2026

n8n-io / n8n

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

TypeScript 167,291 53,237 Updated Jan 7, 2026

laude-institute / terminal-bench

A benchmark for LLMs on complicated tasks in the terminal

Python 1,302 445 Updated Dec 26, 2025

lyang36 / IMO25

An AI agent system for solving International Mathematical Olympiad (IMO) problems using Google's Gemini, OpenAI, and XAI APIs.

Python 899 122 Updated Oct 1, 2025

facebookresearch / BigOBench

BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated code.

Python 39 5 Updated Apr 15, 2025

jlowin / fastmcp

🚀 The fast, Pythonic way to build MCP servers and clients

Python 21,771 1,631 Updated Jan 7, 2026

evalplus / evalplus

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,665 188 Updated Oct 2, 2025

The-Pocket / PocketFlow

Pocket Flow: 100-line LLM framework. Let Agents build Agents!

Python 9,420 1,041 Updated Dec 24, 2025

huangd1999 / AgentCoder

AgentCoder: multi-agent code generation framework.

Python 368 74 Updated Nov 18, 2025


AlphaPav


Stars

relai-ai / relai-sdk

trailofbits / anamorpher

google-research / android_world

OSU-NLP-Group / Online-Mind2Web

MinorJerry / WebVoyager

open-thought / reasoning-gym

facebookresearch / Meta_SecAlign

uiuc-kang-lab / InjecAgent

facebookresearch / rl-injector

AI-secure / PolyGuard

algorithmicsuperintelligence / openevolve

docling-project / docling

microsoft / latent-zoning-networks

SWE-agent / mini-swe-agent