
REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization


About

  • This is the code for the paper REVOLVE: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization.
  • REVOLVE is an optimization framework that improves the stability and efficiency of AI system optimization by tracking how model responses evolve across iterations. Building on textual feedback from LLMs, REVOLVE simulates higher-order optimization effects: adjustments are guided not only by immediate feedback but also by the model's performance trajectory, leading to faster and more stable optimization without relying on traditional derivative-based methods.
  • REVOLVE offers an intuitive API, built on the foundation of TextGrad, that lets users define custom optimization tasks and loss functions. This makes it an adaptable and effective tool for optimizing LLM-based systems across a range of applications, including prompt optimization, solution refinement, and code optimization.

Analogy with Second-order Optimization
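
The figure that accompanies this heading in the repository is not reproduced here. As a rough, hedged sketch of the analogy (not the paper's exact formulation): a first-order update uses only the current gradient, a second-order (Newton-style) update also uses how the gradient is changing, and REVOLVE analogously conditions each revision on how the responses have evolved so far.

% first-order: only the current (textual) gradient g_t is used
x_{t+1} = x_t - \eta \, g_t
% second-order: the Hessian H_t captures how the gradient itself changes
x_{t+1} = x_t - \eta \, H_t^{-1} g_t
% REVOLVE-style analogue (informal placeholder, not an equation from the paper):
% the revision depends on the full response trajectory r_1, ..., r_t,
% so the update is steered by how responses evolve, not just the latest feedback f_t
x_{t+1} = \mathrm{Revise}(x_t, f_t, r_1, \dots, r_t)

Here x_t is the artifact being optimized (a prompt, solution, or program), f_t is the textual feedback at iteration t, and r_1, ..., r_t are the model responses observed so far.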

Installation

pip install revolve
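
As a minimal usage sketch, assuming the package mirrors TextGrad's Variable/loss/optimizer interface (the module name revolve, the class and helper names, and the engine string below are illustrative assumptions; see the repository for the exact API):

# Minimal sketch, assuming a TextGrad-style interface; names are illustrative.
import revolve as rv

rv.set_backward_engine("gpt-4o")  # assumed helper: the LLM that generates textual feedback

# The artifact to optimize, e.g. a draft solution.
solution = rv.Variable(
    "Draft answer to the question ...",
    role_description="solution to refine",
    requires_grad=True,
)

# A natural-language loss and an optimizer over the variable (assumed classes).
loss_fn = rv.TextLoss("Critique this solution for correctness and clarity.")
optimizer = rv.TextualGradientDescent(parameters=[solution])

for _ in range(3):          # a few refinement iterations
    loss = loss_fn(solution)
    loss.backward()         # textual "gradients" derived from LLM feedback
    optimizer.step()        # revise the solution; REVOLVE also tracks how responses evolve
    optimizer.zero_grad()

print(solution.value)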

Method Evaluation

Evaluating Solution Optimization

To evaluate solution optimization, you can use various LLMs as the evaluation engine. For example, we use gpt-4o as the evaluation engine:

  • For the GPQA_diamond dataset:
python evaluation/solution_optimization.py --task GPQA_diamond --engine gpt-4o --num_threads 10 --optimizer_version v2

  • For the MMLU_machine_learning dataset:
python evaluation/solution_optimization.py --task MMLU_machine_learning --engine gpt-4o --num_threads 10 --optimizer_version v2

  • For the MMLU_college_physics dataset:
python evaluation/solution_optimization.py --task MMLU_college_physics --engine gpt-4o --num_threads 10 --optimizer_version v2

Available Optimization Methods

We provide multiple optimization methods for testing:

  • v1: The original TextGrad, which optimizes based on textual feedback.
  • v1_momentum: Momentum-TextGrad, which adjusts optimization steps using feedback trends across iterations.
  • v2: Our REVOLVE method, which tracks response evolution over time for more stable and efficient optimization.

Use the --optimizer_version flag to select the desired method.
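
For example, the same solution-optimization run can be repeated with each method by changing only this flag:

python evaluation/solution_optimization.py --task GPQA_diamond --engine gpt-4o --num_threads 10 --optimizer_version v1
python evaluation/solution_optimization.py --task GPQA_diamond --engine gpt-4o --num_threads 10 --optimizer_version v1_momentum
python evaluation/solution_optimization.py --task GPQA_diamond --engine gpt-4o --num_threads 10 --optimizer_version v2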

Evaluating Prompt Optimization

To evaluate prompt optimization, two LLMs need to be specified:

  • --backbone_engine: the LLM used by REVOLVE (or other optimizers) to perform the optimization process.
  • --model: the LLM on which the prompt is being optimized.

For example, we use gpt-4o as the backbone_engine and gpt-3.5-turbo as the model:

  • For the BBH_object_counting dataset:
python evaluation/prompt_optimization.py --task BBH_object_counting --backbone_engine gpt-4o --model gpt-3.5-turbo --num_threads 10 --optimizer_version v2

  • For the GSM8K dataset:
python evaluation/prompt_optimization.py --task GSM8K_DSPy --backbone_engine gpt-4o --model gpt-3.5-turbo --num_threads 10 --optimizer_version v2

Evaluating Code Optimization

To evaluate code optimization, follow these steps:

  • Clone the leetcode-hard-gym repository:
git clone https://github.com/GammaTauAI/leetcode-hard-gym.git && cd leetcode-hard-gym
  • Install the package in editable mode:
python -m pip install -e .
  • Run the evaluation script, choosing --optimizer_version v1 (TextGrad), v1_momentum (Momentum-TextGrad), or v2 (REVOLVE):
python ./evaluation/code_optimization/leetcode_testtime_with_supervision.py --engine meta-llama/Meta-Llama-3.1-70B-Instruct --optimizer_version v2 --size 200

Related Links

This project has been inspired by numerous excellent works! Below is a non-exhaustive list of key references:

  • 📖 DSPy: A pioneering framework for leveraging LMs in diverse applications, which significantly influenced our approach.
  • 📖 ProTeGi: The term 'Textual Gradients' was inspired by ProTeGi’s prompt optimization methods.
  • 📖 Reflexion: A self-reflection framework that demonstrated the power of text-based reflection in optimization.
  • 📖 TextGrad: Laying the foundation for implementing LLM-based "gradient" pipelines, TextGrad offers a streamlined interface for text optimization tasks, which directly contributed to the development of our approach.

BibTeX citation

If you find our work useful, please consider citing us:

@misc{zhang2024revolveoptimizingaisystems,
      title={Revolve: Optimizing AI Systems by Tracking Response Evolution in Textual Optimization}, 
      author={Peiyan Zhang and Haibo Jin and Leyang Hu and Xinnuo Li and Liying Kang and Man Luo and Yangqiu Song and Haohan Wang},
      year={2024},
      eprint={2412.03092},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.03092}, 
}
