Optimize any text parameter — prompts, code, agent architectures, configurations — using LLM-based reflection and Pareto-efficient evolutionary search.
Website | Quick Start | Paper | Blog | Discord
GEPA (Genetic-Pareto) is a framework for optimizing any system with textual parameters against any evaluation metric. Unlike RL or gradient-based methods that collapse execution traces into a single scalar reward, GEPA uses LLMs to read full execution traces — error messages, profiling data, reasoning logs — to diagnose why a candidate failed and propose targeted fixes. Through iterative reflection, mutation, and Pareto-aware selection, GEPA evolves high-performing variants with minimal evaluations.
If you can measure it, you can optimize it: prompts, code, agent architectures, scheduling policies, vector graphics, and more.
| Highlight | Details |
|---|---|
| 90x cheaper | Open-source models + GEPA beat Claude Opus 4.1 at Databricks |
| 35x faster than RL | 100–500 evaluations vs. 5,000–25,000+ for GRPO (paper) |
| 32% → 89% | ARC-AGI agent accuracy via architecture discovery |
| 40.2% cost savings | Cloud scheduling policy discovered by GEPA, beating expert heuristics |
| 55% → 82% | Coding agent resolve rate on Jinja via auto-learned skills |
| 50+ production uses | Across Shopify, Databricks, Dropbox, OpenAI, Pydantic, MLflow, Comet ML, and more |
"Both DSPy and (especially) GEPA are currently severely under hyped in the AI context engineering world" — Tobi Lutke, CEO, Shopify
```bash
pip install gepa
```

To install the latest from main:

```bash
pip install git+https://github.com/gepa-ai/gepa.git
```

Optimize a system prompt for math problems from the AIME benchmark in a few lines of code (full tutorial):
```python
import gepa

trainset, valset, _ = gepa.examples.aime.init_dataset()

seed_prompt = {
    "system_prompt": "You are a helpful assistant. Answer the question. "
                     "Put your final answer in the format '### <answer>'"
}

result = gepa.optimize(
    seed_candidate=seed_prompt,
    trainset=trainset,
    valset=valset,
    task_lm="openai/gpt-4.1-mini",
    max_metric_calls=150,  # evaluation budget
    reflection_lm="openai/gpt-5",
)

print("Optimized prompt:", result.best_candidate["system_prompt"])
```

Result: GPT-4.1 Mini goes from 46.6% → 56.6% on AIME 2025 (+10 percentage points).
The most powerful way to use GEPA for prompt optimization is within DSPy, where it's available as dspy.GEPA. See dspy.GEPA tutorials for executable notebooks.
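The `metric` you pass can be any scorer over (gold, prediction) pairs. As a hedged sketch (the five-argument signature follows the dspy.GEPA tutorials; the `answer` field and exact-match scoring are assumptions about your program):

```python
import dspy

def your_metric(gold, pred, trace=None, pred_name=None, pred_trace=None):
    # Hypothetical exact-match metric; `answer` is a placeholder field name.
    score = float(gold.answer == pred.answer)
    # Returning textual feedback alongside the score gives GEPA's
    # reflection LM more to work with than a bare scalar.
    return dspy.Prediction(score=score, feedback=f"Expected '{gold.answer}', got '{pred.answer}'.")
```

Then hand the metric to the optimizer: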
```python
import dspy

optimizer = dspy.GEPA(
    metric=your_metric,
    max_metric_calls=150,
    reflection_lm="openai/gpt-5",
)

optimized_program = optimizer.compile(student=MyProgram(), trainset=trainset, valset=valset)
```

The `optimize_anything` API optimizes any text artifact — code, agent architectures, configurations, SVGs — not just prompts. You provide an evaluator; the system handles the search.
```python
import gepa.optimize_anything as oa
from gepa.optimize_anything import optimize_anything, GEPAConfig, EngineConfig

def evaluate(candidate: str) -> float:
    result = run_my_system(candidate)   # your own harness for executing the artifact
    oa.log(f"Output: {result.output}")  # Actionable Side Information:
    oa.log(f"Error: {result.error}")    # logged text feeds back into reflection
    return result.score

result = optimize_anything(
    seed_candidate="<your initial artifact>",
    evaluator=evaluate,
    objective="Describe what you want to optimize for.",
    config=GEPAConfig(engine=EngineConfig(max_metric_calls=100)),
)
```

Traditional optimizers know that a candidate failed but not why. GEPA takes a different approach:
- Select a candidate from the Pareto frontier (candidates excelling on different task subsets)
- Execute on a minibatch, capturing full execution traces
- Reflect — an LLM reads the traces (error messages, profiler output, reasoning logs) and diagnoses failures
- Mutate — generate an improved candidate informed by accumulated lessons from all ancestors
- Accept — add to the pool if improved, update the Pareto front
GEPA also supports system-aware merge — combining strengths of two Pareto-optimal candidates excelling on different tasks. The key concept is Actionable Side Information (ASI): diagnostic feedback returned by evaluators that serves as the text-optimization analogue of a gradient.
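In sketch form, the loop above looks roughly like this (illustrative pseudocode, not the library's actual internals; every function name here is hypothetical):

```python
# Illustrative sketch of GEPA's reflective evolution loop.
pool = [seed_candidate]
while budget_remaining():
    parent = sample_pareto_front(pool)        # select: candidates excelling on some task subset
    traces = run_minibatch(parent)            # execute: keep full traces, not just scores
    lessons = reflection_lm.diagnose(traces)  # reflect: read errors/logs, explain failures
    child = mutate(parent, lessons)           # mutate: targeted textual rewrite
    if improves(child, parent):               # accept: keep only if the minibatch improves
        pool.append(child)
        update_pareto_front(pool)
```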
For details, see the paper and the documentation.
GEPA connects to your system via the GEPAAdapter interface — implement evaluate and make_reflective_dataset, and GEPA handles the rest.
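A minimal skeleton, with signatures paraphrased for illustration (consult the adapters guide and src/gepa/adapters/ for the exact protocol):

```python
class MyAdapter:
    """Hypothetical sketch of a GEPAAdapter implementation."""

    def evaluate(self, batch, candidate, capture_traces=False):
        # Run your system with the candidate's text components on each
        # example, returning scores and (optionally) full execution traces.
        ...

    def make_reflective_dataset(self, candidate, eval_batch, components_to_update):
        # Turn captured traces into text records (inputs, outputs, errors,
        # feedback) that the reflection LM reads to propose improvements.
        ...
```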
Built-in adapters:
| Adapter | Description |
|---|---|
| DefaultAdapter | System prompt optimization for single-turn LLM tasks |
| DSPy Full Program | Evolves entire DSPy programs (signatures, modules, control flow). 67% → 93% on MATH. |
| Generic RAG | Vector store-agnostic RAG optimization (ChromaDB, Weaviate, Qdrant, Pinecone) |
| MCP Adapter | Optimize MCP tool descriptions and system prompts |
| TerminalBench | Optimize the Terminus terminal-use agent |
| AnyMaths | Mathematical problem-solving and reasoning tasks |
See the adapters guide for how to build your own, and DSPy's adapter as a reference.
GEPA is integrated into several major frameworks:
- DSPy — `dspy.GEPA` for optimizing DSPy programs. Tutorials.
- MLflow — `mlflow.genai.optimize_prompts()` for automatic prompt improvement.
- Comet ML Opik — Core optimization algorithm in Opik Agent Optimizer.
- Pydantic — Prompt optimization for Pydantic AI.
- OpenAI Cookbook — Self-evolving agents with GEPA.
- HuggingFace Cookbook — Prompt optimization guide.
- Google ADK — Optimizing Google Agent Development Kit agents.
GEPA can be thought of as precomputing reasoning during optimization to produce a plan for future task instances; see the paper and documentation for examples of the detailed prompts GEPA discovers. GEPA is a particularly good fit when:
- Expensive rollouts — Scientific simulations, complex agents with tool calls, slow compilation. GEPA needs 100–500 evals vs 10K+ for RL.
- Scarce data — Works with as few as 3 examples. No large training sets required.
- API-only models — No weights access needed. Optimize GPT-5, Claude, Gemini directly through their APIs.
- Interpretability — Human-readable optimization traces show why each prompt changed.
- Complements RL — Use GEPA for rapid initial optimization, then apply RL/fine-tuning for additional gains (BetterTogether).
- Paper: GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning (arXiv:2507.19457)
- Experiment reproduction artifact: GEPA Artifact Repository
- Talk Slides: GEPA Talk Slides
- Blog Posts:
- Tutorials & Examples:
- dspy.GEPA Tutorials (executable notebooks): step-by-step notebooks showing how to use GEPA for practical optimization tasks via DSPy, including math, structured data extraction for enterprise tasks, and privacy-conscious delegation.
- Video tutorial by @weaviate on using dspy.GEPA to optimize a listwise reranker
- Matei Zaharia - Reflective Optimization of Agents with GEPA and DSPy
- Building and optimizing a multi-agent system for healthcare domain using DSPy+GEPA
- Social and Discussion:
- X (formerly Twitter) Announcement Thread (Lakshya A Agrawal)
- GEPA covered by VentureBeat
- GEPA's use by Databricks covered by VentureBeat
- Stay up to date: follow the X announcement thread linked above.
- Questions or discussions? Join the Discord (linked at the top of this README).
- GEPA Integrations:
Want to use GEPA in other frameworks?
- DSPy Adapter Code (integrates GEPA with DSPy)
- MLflow Prompt Optimization - GEPA is integrated into MLflow's `mlflow.genai.optimize_prompts()` API for automatic prompt improvement using evaluation metrics and training data. Works with any agent framework and supports multi-prompt optimization.
- Pydantic AI - Prompt optimization for Pydantic AI.
- Comet ML Opik - Core optimization algorithm in Opik Agent Optimizer.
- OpenAI Cookbook - Self-evolving agents with GEPA.
- HuggingFace Cookbook - Prompt optimization guide.
- Google ADK - Optimizing Google Agent Development Kit agents.
- Contributed Adapters – see our adapter templates and issue tracker to request new integrations.
- DefaultAdapter - System Prompt Optimization for a single-turn task.
- DSPy Full Program Adapter - Evolves entire DSPy programs including signatures, modules, and control flow. Achieves 93% accuracy on MATH benchmark (vs 67% with basic DSPy ChainOfThought).
- Generic RAG Adapter - Vector store-agnostic RAG optimization supporting ChromaDB, Weaviate, Qdrant, Pinecone, and more. Optimizes query reformulation, context synthesis, answer generation, and document reranking prompts.
- MCP Adapter - Optimize Model Context Protocol (MCP) tool usage. Supports local stdio servers, remote SSE/HTTP servers, and optimizes tool descriptions and system prompts.
- TerminalBench Adapter - Integrates GEPA into Terminus, a sophisticated external agentic pipeline, and optimizes the agent's system prompt.
- AnyMaths Adapter - Adapter for optimizing mathematical problem-solving and reasoning tasks. Contributed by @egmaminta.
- GEPA use cases:
- Context Compression using GEPA
- GEPA Integration into SuperOptiX-AI
- GEPA for Observable JavaScript
- bandit_dspy
- GEPA in Go Programming Language
- 100% accuracy using GEPA on the clock-hands problem
- Prompt Optimization for Reliable Backdoor Detection in AI-Generated Code
- Teaching LLMs to Diagnose Production Incidents with ATLAS+GEPA
- DataBricks: Building State-of-the-Art Enterprise Agents 90x Cheaper with GEPA
- comet-ml/opik adds support for GEPA
- Tuning small models (Gemma3-1B) for writing fiction
- Cut OCR error rates by up to 38% across model classes (Gemini 2.5 Pro, 2.5 Flash, 2.0 Flash)
- Optimizing a Data Analysis coding agent with GEPA, using execution-guided feedback on real-world workloads
- Generating Naruto (Anime) style dialogues with GPT-4o-mini using GEPA
- Augmenting RL-tuned models with GEPA: achieving +142% student performance improvement by augmenting an RL-tuned teacher with GEPA
- DeepResearch Agent Optimized with GEPA
- Boosting Sanskrit QA: finetuning EmbeddingGemma with 50k GEPA-generated synthetic data samples (Tweet), (Code)
- Simulating Realistic Market Research Focus Groups with GEPA-Optimized AI Personas
- Optimizing Google ADK Agents' SOP using GEPA
- HuggingFace Cookbook on prompt optimization with DSPy and GEPA
- OpenAI Cookbook showing how to build self-evolving agents using GEPA
We welcome adapters, bug fixes, and new use cases. See src/gepa/adapters/ for adapter examples and the contributing guide.
Want to highlight your use case? Reach out to [email protected] or submit via GitHub.
```bibtex
@misc{agrawal2025gepareflectivepromptevolution,
  title={GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning},
  author={Lakshya A Agrawal and Shangyin Tan and Dilara Soylu and Noah Ziems and Rishi Khare and Krista Opsahl-Ong and Arnav Singhvi and Herumb Shandilya and Michael J Ryan and Meng Jiang and Christopher Potts and Koushik Sen and Alexandros G. Dimakis and Ion Stoica and Dan Klein and Matei Zaharia and Omar Khattab},
  year={2025},
  eprint={2507.19457},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2507.19457},
}
```