Add Offline GEPA (Genetic-Pareto) Optimizer #4604

anndvision · 2025-11-14T03:15:26Z

Implements an offline version of GEPA, a prompt optimization algorithm that uses genetic programming to evolve prompt templates through iterative evaluation and mutation.

Key Files

Main optimization loop: tensorzero-optimizers/src/gepa/lib.rs - run_gepa_optimization() orchestrates the iterative evaluation, analysis, and mutation cycle
Configuration types: tensorzero-core/src/optimization/gepa.rs - GEPAConfig, UninitializedGEPAConfig, and GEPAJobHandle definitions
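
A rough sketch of the shape of an evaluate, analyze, mutate loop follows. It is not the PR's run_gepa_optimization(): the Variant type, the run_loop name, and the closure-based hooks are hypothetical, and parent selection is simplified to greedy hill-climbing instead of GEPA's Pareto-frontier sampling. It does mirror two behaviors mentioned in this PR: a failed step skips the iteration instead of aborting the run, and the run returns an error when no variant improves on the seed.

```rust
/// Hypothetical stand-in for a prompt variant: a name plus a template string.
#[derive(Clone, Debug, PartialEq)]
pub struct Variant {
    pub name: String,
    pub template: String,
}

/// Illustrative evaluate -> analyze -> mutate loop. `evaluate` returns a score
/// (higher is better), `analyze` produces feedback, and `mutate` proposes a new
/// template from that feedback. Each hook returns `None` on failure.
pub fn run_loop<E, A, M>(
    seed: Variant,
    iterations: usize,
    mut evaluate: E,
    mut analyze: A,
    mut mutate: M,
) -> Result<Vec<Variant>, String>
where
    E: FnMut(&Variant) -> Option<f64>,
    A: FnMut(&Variant) -> Option<String>,
    M: FnMut(&Variant, &str) -> Option<String>,
{
    let seed_score = evaluate(&seed).ok_or("failed to evaluate the seed variant")?;
    let (mut parent, mut parent_score) = (seed, seed_score);
    let mut improved = Vec::new();

    for i in 0..iterations {
        // A failed step skips this iteration instead of aborting the whole run.
        let Some(feedback) = analyze(&parent) else { continue };
        let Some(template) = mutate(&parent, feedback.as_str()) else { continue };
        let child = Variant { name: format!("{}-{}", parent.name, i + 1), template };
        let Some(score) = evaluate(&child) else { continue };

        // Greedy acceptance (a simplification): keep the child only if it beats its parent.
        if score > parent_score {
            improved.push(child.clone());
            parent = child;
            parent_score = score;
        }
    }

    if improved.is_empty() {
        Err("no mutated variant improved on the seed".to_string())
    } else {
        Ok(improved)
    }
}
```

Replacing the greedy parent choice with frontier-based sampling is where the Pareto logic would plug in.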

Key Features

Multi-objective optimization with Pareto frontier management - finds prompts that balance multiple metrics (accuracy, latency, cost, etc.)
Instance-wise Pareto filtering - identifies locally optimal variants per datapoint, then globally filters out dominated variants
Frequency-based sampling - selects parent variants with probability proportional to how often they are instance-optimal
Built-in analysis and mutation functions - tensorzero::optimization::gepa::analyze provides structured feedback, tensorzero::optimization::gepa::mutate generates improved templates
Graceful error handling - synchronous execution with proper failure modes when no improvements are found
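
The instance-wise filtering and frequency-based sampling described above can be sketched as follows, under simplifying assumptions: one higher-is-better score per datapoint, no missing values, and a variant counted as globally dominated when another variant scores at least as well on every datapoint and strictly better on at least one. The function names are hypothetical, not the PR's API, and a uniform number r in [0, 1) is passed in instead of being drawn from the rand crate, so the example has no dependencies.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// scores[variant][datapoint] = score (higher is better, single objective).
type Scores = BTreeMap<String, Vec<f64>>;

/// Step 1 (instance-wise): the variants achieving the max score on each datapoint.
fn instance_winners(scores: &Scores, n_datapoints: usize) -> Vec<Vec<String>> {
    (0..n_datapoints)
        .map(|d| {
            let best = scores.values().map(|s| s[d]).fold(f64::NEG_INFINITY, f64::max);
            scores
                .iter()
                .filter(|(_, s)| s[d] == best)
                .map(|(name, _)| name.clone())
                .collect()
        })
        .collect()
}

/// `a` dominates `b`: >= on every datapoint and > on at least one.
fn dominates(a: &[f64], b: &[f64]) -> bool {
    a.iter().zip(b).all(|(x, y)| x >= y) && a.iter().zip(b).any(|(x, y)| x > y)
}

/// Step 2 (global): keep the instance winners that no other winner dominates.
fn pareto_frontier(scores: &Scores, winners: &[Vec<String>]) -> Vec<String> {
    let candidates: BTreeSet<String> = winners.iter().flatten().cloned().collect();
    candidates
        .iter()
        .filter(|a| !candidates.iter().any(|b| b != *a && dominates(&scores[b], &scores[*a])))
        .cloned()
        .collect()
}

/// Step 3: how many datapoints each frontier variant is instance-optimal on.
fn frequencies(winners: &[Vec<String>], frontier: &[String]) -> BTreeMap<String, usize> {
    let mut freq: BTreeMap<String, usize> = frontier.iter().map(|v| (v.clone(), 0)).collect();
    for name in winners.iter().flatten() {
        if let Some(count) = freq.get_mut(name) {
            *count += 1;
        }
    }
    freq
}

/// Sample a parent with probability proportional to its frequency.
/// `r` must be uniform in [0, 1).
fn sample_parent(freq: &BTreeMap<String, usize>, r: f64) -> Option<&String> {
    let total: usize = freq.values().sum();
    if total == 0 {
        return None;
    }
    // Map r to an integer ticket in 0..total, then walk the cumulative counts.
    let mut ticket = ((r * total as f64) as usize).min(total - 1);
    for (name, &count) in freq {
        if ticket < count {
            return Some(name);
        }
        ticket -= count;
    }
    None
}
```

With scores A = [1, 0, 1], B = [1, 1, 0] and C = [1, 0, 0], C is instance-optimal only through a tie on datapoint 0 and is dominated by A, so the frontier is {A, B}; each is instance-optimal on two datapoints and is therefore sampled with equal probability.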

Infrastructure Changes

Extended optimizer type system to support returning multiple variants (OptimizerOutput::Variants)
Made UninitializedChatTemplates::inner public
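
A simplified sketch of what returning multiple variants can look like for callers. The VariantConfig struct, the single-variant arm, and the into_named_variants helper are hypothetical stand-ins, not TensorZero's actual definitions; only the idea of an OptimizerOutput::Variants arm comes from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for a variant config.
#[derive(Clone, Debug, PartialEq)]
pub struct VariantConfig {
    pub system_template: String,
}

/// Sketch of an optimizer output that can carry one or many variants.
#[derive(Debug)]
pub enum OptimizerOutput {
    /// A single optimized variant.
    Variant(Box<VariantConfig>),
    /// Several named variants, e.g. every variant on a GEPA Pareto frontier.
    Variants(HashMap<String, Box<VariantConfig>>),
}

impl OptimizerOutput {
    /// Normalize either shape into (name, config) pairs sorted by name.
    pub fn into_named_variants(self, default_name: &str) -> Vec<(String, VariantConfig)> {
        match self {
            OptimizerOutput::Variant(v) => vec![(default_name.to_string(), *v)],
            OptimizerOutput::Variants(map) => {
                let mut out: Vec<(String, VariantConfig)> =
                    map.into_iter().map(|(name, v)| (name, *v)).collect();
                // HashMap iteration order is unspecified; sort for stable output.
                out.sort_by(|a, b| a.0.cmp(&b.0));
                out
            }
        }
    }
}
```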

Testing

Comprehensive unit tests for Pareto logic, sampling, and validation
E2E tests for analysis and mutation built-in functions
Integration tests with both mock and live models (needs work)

TODO

make live and mock tests meaningful
add retries
use timeout
verify that loop continues on any error
add batch eval scores to analyze (reflect) function

- 29 tests covering Pareto frontier logic and helper functions
- Integration tests for 3-step GEPA algorithm
- Performance test: 10 variants × 100 datapoints

- Explain Step 1 generalization: max score → Pareto non-dominated
- Add rationale for aggressive missing data imputation
- Clarify design decisions for multi-evaluator support
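
To make the Step 1 generalization concrete, here is a minimal sketch under stated assumptions: every evaluator is higher-is-better, a missing score (for example from a failed inference) is imputed as the worst possible value, and names such as non_dominated are hypothetical. The PR's actual imputation rule may differ. With a single evaluator, the non-dominated set reduces to the variants tied for the max score, which is the original Step 1.

```rust
use std::collections::BTreeMap;

/// One variant's scores on a single datapoint, one entry per evaluator.
/// `None` marks a missing score. Higher is better for every evaluator.
type ScoreVec = Vec<Option<f64>>;

/// Assumed imputation rule: a missing score counts as the worst possible value.
fn impute(x: Option<f64>) -> f64 {
    x.unwrap_or(f64::NEG_INFINITY)
}

/// `a` Pareto-dominates `b`: >= on every evaluator and > on at least one.
fn dominates(a: &[Option<f64>], b: &[Option<f64>]) -> bool {
    let pairs = || a.iter().zip(b).map(|(x, y)| (impute(*x), impute(*y)));
    pairs().all(|(x, y)| x >= y) && pairs().any(|(x, y)| x > y)
}

/// Generalized Step 1 for one datapoint: keep every variant whose score vector
/// is not dominated by another variant's (one evaluator: the max-score ties).
fn non_dominated(datapoint: &BTreeMap<String, ScoreVec>) -> Vec<String> {
    datapoint
        .iter()
        .filter(|(name, s)| {
            !datapoint.iter().any(|(other, t)| other != *name && dominates(t, s))
        })
        .map(|(name, _)| name.clone())
        .collect()
}
```

In the test data, D survives despite a missing score only because it is best on the first metric, while E, missing its first score, is dominated by B; that is the practical effect of imputing missing values aggressively.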

- Add module-level imports for EvaluatorConfig, EvaluationConfig, GEPAConfig
- Extract compare_values helper to eliminate 38 lines of duplication
- Extract calculate_frequencies helper to eliminate duplicate logic
- Replace explicit loops with iterator chains (more idiomatic Rust)
- Fix (*variant_name).clone() → variant_name.to_string()
- All 29 tests passing, clippy clean
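
A direction-aware comparison helper is the kind of function that removes duplicated max/min branches. The sketch below is hypothetical: the Optimize enum, the Option<f64> signature, the best_index helper, and the rule that missing or NaN values always rank worst are assumptions, not the signature of the PR's compare_values.

```rust
use std::cmp::Ordering;

/// Hypothetical optimization direction for a metric.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Optimize {
    Max, // e.g. accuracy
    Min, // e.g. latency or cost
}

/// Compare two metric values so that `Ordering::Greater` always means
/// "`a` is better than `b`", whatever the metric's direction.
/// Missing values and NaN are treated as the worst possible outcome.
pub fn compare_values(a: Option<f64>, b: Option<f64>, optimize: Optimize) -> Ordering {
    let clean = |x: Option<f64>| x.filter(|v| !v.is_nan());
    match (clean(a), clean(b)) {
        (None, None) => Ordering::Equal,
        (None, Some(_)) => Ordering::Less,
        (Some(_), None) => Ordering::Greater,
        (Some(x), Some(y)) => {
            // NaN was filtered out above, so `partial_cmp` always returns `Some`.
            let ord = x.partial_cmp(&y).unwrap_or(Ordering::Equal);
            match optimize {
                Optimize::Max => ord,
                Optimize::Min => ord.reverse(),
            }
        }
    }
}

/// Index of the best value under `optimize`, reusing the single helper above.
pub fn best_index(values: &[Option<f64>], optimize: Optimize) -> Option<usize> {
    (0..values.len()).max_by(|&i, &j| compare_values(values[i], values[j], optimize))
}
```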

Copilot

Pull Request Overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated no new comments.

Copilot

Pull Request Overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.

tensorzero-optimizers/tests/common/gepa.rs

Copilot · 2025-11-15T19:59:01Z

tensorzero-optimizers/tests/e2e/gepa/mod.rs

@@ -0,0 +1,410 @@
+//! Shared test helpers for GEPA e2e tests
+
+#![allow(clippy::unwrap_used, clippy::expect_used, clippy::missing_panics_doc)]


Inconsistent use of #![allow(...)] here versus #![expect(...)] in other test files (e.g., tests/common/gepa.rs). For consistency across the codebase, use the same approach in all test files.

Copilot · 2025-11-15T19:59:01Z

tensorzero-optimizers/tests/e2e/gepa/analyze.rs

+//! These tests exercise the analyze_inferences component using real gateway clients
+//! and the full inference pipeline.
+
+#![allow(clippy::unwrap_used, clippy::expect_used)]


Inconsistent use of #![allow(...)] versus #![expect(...)] used in tests/common/gepa.rs. Use the same lint suppression approach consistently across all test files in this PR.

Copilot · 2025-11-15T19:59:01Z

tensorzero-optimizers/tests/e2e/gepa/mutate.rs

+//! These tests exercise the mutate_templates component using real gateway clients
+//! and the full inference pipeline.
+
+#![allow(clippy::unwrap_used, clippy::expect_used)]


Inconsistent use of #![allow(...)] versus #![expect(...)] used in tests/common/gepa.rs. Use the same lint suppression approach consistently across all test files in this PR.
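
Copilot's three comments turn on the difference between the two lint-level attributes. A minimal sketch, with a hypothetical parse_fixture helper, assuming Rust 1.81 or later (where #[expect] and the reason field are stable); the item-level form is shown, and the file-level form used in these test files is the same with #! instead of #.

```rust
// `#[allow(lint)]` silences a lint unconditionally.
// `#[expect(lint)]` also silences it, but when Clippy runs and the lint never
// fires, an `unfulfilled_lint_expectations` warning is emitted, so a stale
// suppression gets noticed instead of lingering.
#[expect(clippy::unwrap_used, reason = "test helper: panicking on a bad fixture is intended")]
fn parse_fixture(raw: &str) -> u32 {
    raw.trim().parse().unwrap()
}
```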

Copilot

Pull Request Overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated no new comments.

anndvision added 30 commits November 12, 2025 18:15

Add GEPA optimizer types and built-in functions to tensorzero-core

a82b2d6

add gepa lib functions

9007a49

don't use tools

23354c5

modularize

a6f6a8e

add config validation

855cb78

create dataset for pareto frontier

7e2c9bc

add proper UninitializedChatCompletionConfig extraction

7d50ae4

Add unit tests for GEPA Pareto module

32db166

Document multi-objective extension and missing data rationale

213e38c

no tools, and modern templates

27aae65

Remove unused config files and duplicate planning.md from tensorzero-core

dfdc9fb

reorder import

8ed7869

Merge branch 'main' of github.com:tensorzero/tensorzero into andrew/gepa-optimizer

f75aac3

move validation functions to module and add unit tests

e211026

refactor schemas and parallelize loop

6c8c908

use correct loop for template

4cf188d

add unit tests for sampling and use rand crate

8b2959e

Merge branch 'andrew/gepa-sample' into andrew/gepa-analyze

c024b37

simplify docstrings

45dbe2c

evaluate variants, sample batch, sample variant

59ac29e

allow failed analyses

4aeda97

Merge branch 'andrew/gepa-analyze' into andrew/gepa-optimizer

5ddafac

use new schema and template format

f57a3c1

add e2e tests for analyze and fix input datatype

4a407a5

parallelize evals

575345c

simplify evaluations

cd01c3f

refactor mutate inputs

17b997a

add mutate tests

480294b

use new format

5de902c

Copilot finished reviewing on behalf of anndvision November 15, 2025 01:38

Copilot AI reviewed Nov 15, 2025

anndvision added 2 commits November 15, 2025 12:43

clean analyze e2e tests

6f230d0

clean mutate function tests

010e56f

Copilot AI review requested due to automatic review settings November 15, 2025 18:34

Copilot started reviewing on behalf of anndvision November 15, 2025 18:35

Copilot finished reviewing on behalf of anndvision November 15, 2025 18:35

Copilot AI reviewed Nov 15, 2025

anndvision added 2 commits November 15, 2025 13:49

fix pareto datapoint_id fetching

0fd2185

continue on error

c1f794e

Copilot AI review requested due to automatic review settings November 15, 2025 18:58

Copilot started reviewing on behalf of anndvision November 15, 2025 18:58

Copilot finished reviewing on behalf of anndvision November 15, 2025 18:59

Copilot AI reviewed Nov 15, 2025

anndvision added 2 commits November 15, 2025 14:19

split concurrency over evals

ad64c39

clean up datasets

a6d6e33

Copilot AI review requested due to automatic review settings November 15, 2025 19:56

Copilot started reviewing on behalf of anndvision November 15, 2025 19:57

Copilot finished reviewing on behalf of anndvision November 15, 2025 19:57

Copilot AI reviewed Nov 15, 2025

anndvision added 6 commits November 15, 2025 21:17

use meaningful test example (needs work)

b698766

render inference_input in mutate user schema if provided, rename datapoint_input -> inference_input, document

5b5e36c

add max_tokens to init

b56eba4

update bindings

22c038a

Merge branch 'main' of github.com:tensorzero/tensorzero into andrew/gepa-optimizer

33779a7

fix linting errors

18cafba

Copilot AI review requested due to automatic review settings November 16, 2025 02:55

Copilot started reviewing on behalf of anndvision November 16, 2025 02:56

Copilot finished reviewing on behalf of anndvision November 16, 2025 02:56

Copilot AI reviewed Nov 16, 2025

3 participants
