Conversation

@chinganc (Member) commented Sep 12, 2025

This PR finishes PrioritySearch.

It has been tested on convex optimization and some prompt optimization problems. It supports:

  • Running multiple optimizers in parallel
  • Branching out optimizers along their update chains. This is useful when using optimizers with memory and multiple optimizers.

Changes to core in this PR:

  1. ModelWrapper has been moved to the top-level module as Model, so that classes decorated by @trace.model can be pickled (see the sketch after this list).
  2. The old implementations of Algorithm.save and Algorithm.load at the abstract class level are commented out; we will leave that to the subclasses.
  3. Some minor bug fixes.
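
A minimal sketch of the intended effect of item 1, assuming the import path (from opto import trace) and a simple illustrative user class; the details below are not from the PR itself. Pickle can only serialize instances whose class machinery is importable at module top level, which is why exposing the wrapper as a top-level Model matters.

import pickle
from opto import trace  # import path is an assumption

@trace.model            # user class decorated as described in item 1
class MyAgent:
    def respond(self, query):
        return f"echo: {query}"

agent = MyAgent()
# Pickling relies on the wrapper class (Model) being defined at module top level.
blob = pickle.dumps(agent)
restored = pickle.loads(blob)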

NOTE: Resuming experiments will be done in future PRs.

chinganc and others added 28 commits September 17, 2025 20:12
  • …emory
  • Add compress_candidate_memory method to PrioritySearch
  • Add a saving method to pre-save source code
  • Make the copied modules' parameter nodes match the original ones, so that the optimizer's memory works
  • Add a flag to allow using the same optimizer instance across search
  • Remove commented code
  • Added GEPA in examples/priority_search_on_convex_fn.py
@allenanie (Member) left a comment:

A few files need to be removed (including one that I accidentally committed/pushed). No immediate implementation problems spotted.

Member:

Can we rename this file to be gepa_on_convex_fn.py? @doxav

Member:

Why is save/load removed here -- is it because this solution doesn't work and needs more testing, or a more customized implementation?

chinganc (Member Author):

The base Optimizer class now implements __getstate__ and __setstate__, which skip parameters, so the optimizers can be pickled directly. save and load are implemented there. What the original code did manually is basically the same thing.
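
A generic Python sketch of that pattern (not the library's actual code): __getstate__ drops the parameters entry before pickling, and __setstate__ restores everything else, so the rest of the optimizer state round-trips through pickle.

import pickle

class OptimizerSketch:
    """Generic sketch, not the library's actual Optimizer."""

    def __init__(self, parameters):
        self.parameters = parameters   # e.g. live graph nodes that don't pickle
        self.memory = []

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("parameters", None)  # skip parameters when pickling
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.parameters = None         # re-attached by the caller after loading

opt = OptimizerSketch(parameters=[lambda x: x])  # lambdas are not picklable
restored = pickle.loads(pickle.dumps(opt))       # works because parameters are skipped
assert restored.parameters is None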

Member:

Is the separation between Model and Module mostly for pickling?

chinganc (Member Author):

Module is quite high level; it's also the parent of FunModule created by bundle.
Model is what the user should use when building a traceable agent.
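
A rough sketch of that hierarchy based on the names in this thread (the import path and exact decorator signatures are assumptions, not confirmed by the PR):

from opto import trace   # import path is an assumption

@trace.bundle()          # bundle wraps the function into a FunModule (a Module)
def add_one(x):
    return x + 1

@trace.model             # Model: the user-facing way to build a traceable agent
class Agent:
    def act(self, obs):
        return add_one(obs)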

Member:

Should we leave resume for a future feature/implementation?

chinganc (Member Author):

yes

Member:

GEPA should not be part of the Fix/priority search branch -- can it be a different one? @doxav any chance you can remove this from this PR? You can commit the convex_fn_BENCH to that PR too if you want.

Member:

I guess I don't know if @chinganc wants them in this PR lol -- I'll leave it to him to decide :)

chinganc (Member Author):

Let's do it in a different PR. This PR is already quite large; let's follow the principle of keeping each PR targeted.
@doxav sorry for the trouble.

chinganc (Member Author):

Maybe a simpler resolution is to move GEPA under features in this PR?
That way, the intention that the code is not fully tested would be clear.

# Reset the index for the next epoch
self._i = 0
self.n_epochs += 1
"""Get the next batch of data, always of batch_size. If the dataset is smaller or at the end, the batch will include data from the next epoch after shuffling."""
Member:

Wait, is this the logic we want?
This will create a step vs. epoch distinction. I think a lot of trainers do this, so I'm OK if this is the route we want to go, but I'm just curious whether there's any special rationale for it.

chinganc (Member Author):

This is to guarantee that the sampled batch is always the same size.
In many applications of generative optimization, I've found that the training dataset is much smaller than what we typically see in DL, so we hit the boundary effects more often. A special degenerate case is single-node optimization, where the original sampler won't honor the batch size at all (even if you specify batch_size > 0, the optimizer only sees one feedback as opposed to batch_size feedbacks).
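
A minimal sketch of a sampler with that behavior (not the PR's actual implementation): when the index reaches the end of the dataset, it reshuffles, resets the index, and bumps the epoch counter while continuing to fill the batch, so every batch has exactly batch_size items even when the dataset is smaller than batch_size.

import random

class WrapAroundSampler:
    def __init__(self, dataset, batch_size, seed=0):
        self.dataset = list(dataset)
        self.batch_size = batch_size
        self._rng = random.Random(seed)
        self._rng.shuffle(self.dataset)
        self._i = 0
        self.n_epochs = 0

    def next_batch(self):
        """Get the next batch, always of batch_size; wraps into the next epoch after shuffling."""
        batch = []
        while len(batch) < self.batch_size:
            if self._i >= len(self.dataset):
                self._rng.shuffle(self.dataset)
                # Reset the index for the next epoch
                self._i = 0
                self.n_epochs += 1
            batch.append(self.dataset[self._i])
            self._i += 1
        return batch

sampler = WrapAroundSampler(dataset=[1, 2, 3], batch_size=5)
print(sampler.next_batch())  # 5 items, drawn across two epochs of a 3-item dataset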

Member:

I see -- that makes sense!

Member:

I have the same concern as before -- GEPA should be its own PR (even for future references/code checks)
