@anndvision anndvision commented Nov 14, 2025

Implements an offline version of GEPA, a prompt optimization algorithm that uses genetic programming to evolve prompt templates through iterative evaluation and mutation.

Key Files

  • Main optimization loop: tensorzero-optimizers/src/gepa/lib.rs - run_gepa_optimization() orchestrates the iterative evaluation, analysis, and mutation cycle
  • Configuration types: tensorzero-core/src/optimization/gepa.rs - GEPAConfig, UninitializedGEPAConfig, and GEPAJobHandle definitions

Key Features

  • Multi-objective optimization with Pareto frontier management - finds prompts that balance multiple metrics (accuracy, latency, cost, etc.)
  • Instance-wise Pareto filtering - identifies locally optimal variants per datapoint, then globally filters dominated solutions
  • Frequency-based sampling - selects parent variants proportional to how often they're instance-optimal
  • Built-in analysis and mutation functions - tensorzero::optimization::gepa::analyze provides structured feedback, tensorzero::optimization::gepa::mutate generates improved templates
  • Graceful error handling - synchronous execution with proper failure modes when no improvements are found
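
For readers unfamiliar with the selection step, here is a minimal sketch of how instance-wise Pareto filtering and frequency-based sampling could fit together. It is not the code in this PR: the function names, the tuple-based data layout, and the assumption that higher is better for every metric are all illustrative, and the global filtering of dominated variants is omitted. The uniform draw `u` is passed in by the caller so the sketch needs no RNG crate.

```rust
use std::collections::BTreeMap;

/// `a` dominates `b` if it is at least as good on every metric and strictly
/// better on at least one. Assumes higher is better for all metrics.
fn dominates(a: &[f64], b: &[f64]) -> bool {
    a.iter().zip(b).all(|(x, y)| x >= y) && a.iter().zip(b).any(|(x, y)| x > y)
}

/// Variants on a single datapoint that no other variant dominates.
/// Each entry of `row` is (variant name, per-metric scores); hypothetical layout.
fn instance_frontier<'a>(row: &[(&'a str, Vec<f64>)]) -> Vec<&'a str> {
    row.iter()
        .filter(|(_, s)| !row.iter().any(|(_, other)| dominates(other, s)))
        .map(|(name, _)| *name)
        .collect()
}

/// Count how often each variant is instance-optimal across all datapoints.
fn frontier_frequencies<'a>(rows: &[Vec<(&'a str, Vec<f64>)>]) -> BTreeMap<&'a str, usize> {
    let mut freq = BTreeMap::new();
    for row in rows {
        for name in instance_frontier(row) {
            *freq.entry(name).or_insert(0) += 1;
        }
    }
    freq
}

/// Pick a parent with probability proportional to its frequency.
/// `u` is a uniform draw in [0, 1) supplied by the caller.
fn sample_parent<'a>(freq: &BTreeMap<&'a str, usize>, u: f64) -> Option<&'a str> {
    let total: usize = freq.values().sum();
    if total == 0 {
        return None;
    }
    let target = u * total as f64;
    let mut cumulative = 0.0;
    for (name, count) in freq {
        cumulative += *count as f64;
        if target < cumulative {
            return Some(*name);
        }
    }
    // Guard against floating-point edge cases when `u` is very close to 1.
    freq.keys().next_back().copied()
}
```

With variant `a` on the frontier for two datapoints and `b` for one, `sample_parent` returns `a` for draws below 2/3 and `b` otherwise.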

Infrastructure Changes

  • Extended optimizer type system to support returning multiple variants (OptimizerOutput::Variants)
  • Made UninitializedChatTemplates::inner public

Testing

  • Comprehensive unit tests for Pareto logic, sampling, and validation
  • E2E tests for analysis and mutation built-in functions
  • Integration tests with both mock and live models (needs work)

TODO

  • make live and mock tests meaningful
  • add retries
  • use timeout
  • verify that loop continues on any error
  • add batch eval scores to analyze (reflect) function

- 29 tests covering Pareto frontier logic and helper functions
- Integration tests for 3-step GEPA algorithm
- Performance test: 10 variants × 100 datapoints
- Explain Step 1 generalization: max score → Pareto non-dominated
- Add rationale for aggressive missing data imputation
- Clarify design decisions for multi-evaluator support
- Add module-level imports for EvaluatorConfig, EvaluationConfig, GEPAConfig
- Extract compare_values helper to eliminate 38 lines of duplication
- Extract calculate_frequencies helper to eliminate duplicate logic
- Replace explicit loops with iterator chains (more idiomatic Rust)
- Fix (*variant_name).clone() → variant_name.to_string()
- All 29 tests passing, clippy clean
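
The commit notes above mention a compare_values helper, missing-data imputation, and the generalization from max score to Pareto non-dominance, but the page does not show the code. The sketch below is one possible reading, not the PR's implementation: the signature of `compare_values`, the `Direction` enum, and the choice to treat a missing score as the worst possible value are all assumptions made for illustration.

```rust
use std::cmp::Ordering;

/// Whether larger or smaller values of a metric are better (hypothetical type).
#[derive(Clone, Copy)]
enum Direction {
    Max,
    Min,
}

/// Assumed imputation rule: a missing score becomes the worst value for its direction.
fn impute(value: Option<f64>, dir: Direction) -> f64 {
    match (value, dir) {
        (Some(v), _) => v,
        (None, Direction::Max) => f64::NEG_INFINITY,
        (None, Direction::Min) => f64::INFINITY,
    }
}

/// Compare two scores so that `Greater` always means "`a` is better than `b`".
fn compare_values(a: Option<f64>, b: Option<f64>, dir: Direction) -> Ordering {
    let (a, b) = (impute(a, dir), impute(b, dir));
    match dir {
        Direction::Max => a.total_cmp(&b),
        Direction::Min => b.total_cmp(&a),
    }
}

/// `a` dominates `b` if it is never worse and strictly better on some metric.
fn dominates(a: &[Option<f64>], b: &[Option<f64>], dirs: &[Direction]) -> bool {
    let cmps: Vec<Ordering> = a
        .iter()
        .zip(b)
        .zip(dirs)
        .map(|((x, y), d)| compare_values(*x, *y, *d))
        .collect();
    cmps.iter().all(|c| *c != Ordering::Less) && cmps.iter().any(|c| *c == Ordering::Greater)
}
```

With a single metric, the non-dominated variants are exactly those tied for the best score, so the max-score rule is the one-metric special case of the Pareto rule.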
Copilot finished reviewing on behalf of anndvision November 15, 2025 01:38

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings November 15, 2025 18:34
Copilot finished reviewing on behalf of anndvision November 15, 2025 18:35

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings November 15, 2025 18:58
Copilot finished reviewing on behalf of anndvision November 15, 2025 18:59

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated no new comments.

Copilot AI review requested due to automatic review settings November 15, 2025 19:56
Copilot finished reviewing on behalf of anndvision November 15, 2025 19:57

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 37 out of 37 changed files in this pull request and generated 4 comments.

@@ -0,0 +1,410 @@
//! Shared test helpers for GEPA e2e tests

#![allow(clippy::unwrap_used, clippy::expect_used, clippy::missing_panics_doc)]

Copilot AI Nov 15, 2025

Inconsistent use of #![allow(...)] here versus #![expect(...)] in other test files (e.g., tests/common/gepa.rs). For consistency across the codebase, use the same approach in all test files.

//! These tests exercise the analyze_inferences component using real gateway clients
//! and the full inference pipeline.

#![allow(clippy::unwrap_used, clippy::expect_used)]

Copilot AI Nov 15, 2025

Inconsistent use of #![allow(...)] versus #![expect(...)] used in tests/common/gepa.rs. Use the same lint suppression approach consistently across all test files in this PR.

//! These tests exercise the mutate_templates component using real gateway clients
//! and the full inference pipeline.

#![allow(clippy::unwrap_used, clippy::expect_used)]

Copilot AI Nov 15, 2025

Inconsistent use of #![allow(...)] versus #![expect(...)] used in tests/common/gepa.rs. Use the same lint suppression approach consistently across all test files in this PR.
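
For context on the review comments above, here is a small sketch of how the two attributes differ. The function names are hypothetical and the attributes are applied per item rather than crate-wide to keep the example short. `#[expect]` is stable since Rust 1.81, and the clippy lint in question is only evaluated when the code is checked with clippy.

```rust
// `allow` silences the lint whether or not it would ever fire, so a stale
// suppression goes unnoticed.
#[allow(clippy::unwrap_used)]
fn parse_with_allow(s: &str) -> i32 {
    s.parse().unwrap()
}

// `expect` silences the lint too, but the lint checker reports
// `unfulfilled_lint_expectations` if the lint stops firing here,
// for example after the `unwrap` is removed.
#[expect(clippy::unwrap_used)]
fn parse_with_expect(s: &str) -> i32 {
    s.parse().unwrap()
}
```

Both functions behave identically at runtime; the difference only shows up in lint output, which is why mixing the two styles across test files is a consistency concern rather than a correctness one.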

Copilot AI review requested due to automatic review settings November 16, 2025 02:55
Copilot finished reviewing on behalf of anndvision November 16, 2025 02:56

Copilot AI left a comment

Pull Request Overview

Copilot reviewed 41 out of 41 changed files in this pull request and generated no new comments.
