AlphaPav
🏠 Working from home

Highlights

  • Pro

Organizations

@AI-secure


A platform for building reliable AI agents

Python 88 5 Updated Dec 23, 2025

Image scaling attacks for multi-modal prompt injection

Python 1,012 89 Updated Sep 4, 2025

AndroidWorld is an environment and benchmark for autonomous agents

Python 572 116 Updated Nov 24, 2025

An Illusion of Progress? Assessing the Current State of Web Agents

Python 132 7 Updated Jan 2, 2026

Code for "WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models"

Python 990 111 Updated Mar 4, 2024

[NeurIPS 2025 Spotlight] Reasoning Environments for Reinforcement Learning with Verifiable Rewards

Python 1,300 108 Updated Dec 15, 2025

Repo for the paper "Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks".

Python 39 10 Updated Jan 6, 2026
Python 104 16 Updated Jul 2, 2024

Official release of code for the paper "RL is a hammer and LLMs are nails: A simple RL approach to stronger prompt injection attacks"

Python 34 4 Updated Nov 24, 2025
Python 16 2 Updated Jun 18, 2025

Open-source implementation of AlphaEvolve

Python 5,057 787 Updated Dec 24, 2025

Get your documents ready for gen AI

Python 49,361 3,430 Updated Jan 7, 2026

[NeurIPS 2025] Latent Zoning Networks

Python 56 3 Updated Oct 29, 2025

The 100 line AI agent that solves GitHub issues or helps you in your command line. Radically simple, no huge configs, no giant monorepo—but scores >74% on SWE-bench verified!

Python 2,436 308 Updated Jan 7, 2026

An open-source AI agent that brings the power of Gemini directly into your terminal.

TypeScript 90,085 10,409 Updated Jan 8, 2026

🔮Reasoning for Safer Code Generation; 🥇Winner Solution of Amazon Nova AI Challenge 2025

Python 35 1 Updated Aug 24, 2025

An open-source AI agent that lives in your terminal.

TypeScript 17,164 1,486 Updated Jan 8, 2026

👩‍⚖️ Agent-as-a-Judge: The Magic for Open-Endedness

Python 703 97 Updated May 14, 2025

MCPMark is a comprehensive, stress-testing MCP benchmark designed to evaluate model and agent capabilities in real-world MCP use.

Python 362 27 Updated Dec 30, 2025

Playwright is a framework for Web Testing and Automation. It allows testing Chromium, Firefox and WebKit with a single API.

TypeScript 80,924 4,983 Updated Jan 8, 2026

🌐 Make websites accessible for AI agents. Automate tasks online with ease.

Python 74,823 8,938 Updated Jan 7, 2026

Benchmark for automated failure attributions in agentic systems (🏆 ICML 2025 Spotlight)

Python 337 20 Updated Jan 6, 2026

Fair-code workflow automation platform with native AI capabilities. Combine visual building with custom code, self-host or cloud, 400+ integrations.

TypeScript 167,291 53,237 Updated Jan 7, 2026

A benchmark for LLMs on complicated tasks in the terminal

Python 1,302 445 Updated Dec 26, 2025

An AI agent system for solving International Mathematical Olympiad (IMO) problems using Google's Gemini, OpenAI, and XAI APIs.

Python 899 122 Updated Oct 1, 2025

BigOBench assesses the capacity of Large Language Models (LLMs) to comprehend time-space computational complexity of input or generated code.

Python 39 5 Updated Apr 15, 2025

🚀 The fast, Pythonic way to build MCP servers and clients

Python 21,771 1,631 Updated Jan 7, 2026

Rigorous evaluation of LLM-synthesized code - NeurIPS 2023 & COLM 2024

Python 1,665 188 Updated Oct 2, 2025

Pocket Flow: 100-line LLM framework. Let Agents build Agents!

Python 9,420 1,041 Updated Dec 24, 2025

AgentCoder: multi-agent code generation framework.

Python 368 74 Updated Nov 18, 2025