Dedup identical subtrees in Dcp2Cone canonicalization #3353
Closed
PTNobel wants to merge 5 commits into
Add a per-apply common-subexpression cache to Dcp2Cone so that structurally identical Expression subtrees share one canonicalized expression and one set of auxiliary constraints within a reduction pass. For `cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm1(x) <= 1])` this collapses two epigraph variables and two pairs of abs-epigraph inequalities down to one of each.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
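As a rough illustration of the effect described above, the following toy sketch mimics a per-apply cache with plain Python classes. `Var`, `Norm1`, `structural_key`, and `ToyCanonicalizer` are hypothetical stand-ins invented for this example, not cvxpy APIs, and the real `Dcp2Cone` logic is more involved.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(eq=False)  # identity-based equality, like separately built expressions
class Var:
    name: str


@dataclass(eq=False)
class Norm1:
    arg: object


def structural_key(expr):
    """Hypothetical structural key: equal for separately built but identical trees."""
    if isinstance(expr, Var):
        return ("var", expr.name)
    return ("norm1", structural_key(expr.arg))


class ToyCanonicalizer:
    def __init__(self, use_cache):
        self.use_cache = use_cache
        self.cache = {}

    def apply(self, exprs):
        self.cache = {}  # reset at the top of every apply()
        fresh = count()
        constraints = []
        canon = [self._canon(e, fresh, constraints) for e in exprs]
        self.cache = {}  # release so the cache does not outlive the pass
        return canon, constraints

    def _canon(self, expr, fresh, constraints):
        if isinstance(expr, Var):
            return expr.name
        key = structural_key(expr)
        if self.use_cache and key in self.cache:
            return self.cache[key]  # reuse the epigraph variable, emit nothing new
        arg = self._canon(expr.arg, fresh, constraints)
        t = f"t{next(fresh)}"
        # toy abs-epigraph pair for a scalar argument
        constraints.append(f"{arg} <= {t}")
        constraints.append(f"-{arg} <= {t}")
        if self.use_cache:
            self.cache[key] = t
        return t


# Two separately constructed, structurally identical subtrees.
objective_term = Norm1(Var("x"))
constraint_term = Norm1(Var("x"))
plain = ToyCanonicalizer(use_cache=False).apply([objective_term, constraint_term])
cached = ToyCanonicalizer(use_cache=True).apply([objective_term, constraint_term])
```

In this toy model, `plain` holds two epigraph names and four inequalities, while `cached` holds one shared name and two inequalities.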
Previously the cache embedded `Constant.value.tobytes()` in keys, which made the cache footprint scale with the problem's constant data (~8 MB for an LP 1000x500). Switch Constant keying to object identity, which still deduplicates the common case of a shared Constant reference and brings cache size below ~16 KB across representative problems.

Also skip caching any subtree whose canonicalization went through the quad branch. Those canonicalizers emit SymbolicQuadForm markers that downstream code (replace_quad_forms, coeff_extractor) identifies by Python id and assumes are distinct per occurrence; sharing one across sites silently halves quadratic coefficients. Track this with a counter incremented on quad branches; don't cache if it advances under a node.

Release the cache at the end of apply() so it does not outlive the reduction.

Add a regression test using the QuadForm 0.5*qf + 0.5*qf pattern from test_qp_solvers.py::rep_quad_form.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
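The trade-off between value-based and identity-based keys for constants can be sketched as follows. `ToyConstant` and both key functions are hypothetical names made up for this example (the standard-library `array` stands in for a NumPy array), so this is only a sketch of the idea, not the PR's code.

```python
from array import array


class ToyConstant:
    """Hypothetical stand-in for a constant leaf holding numeric data."""

    def __init__(self, values):
        self.value = array("d", values)


def key_by_value_bytes(const):
    # The key embeds a full copy of the data, so its size grows with the constant.
    return ("const", len(const.value), const.value.tobytes())


def key_by_identity(const):
    # The key is a small tuple regardless of how much data the constant holds.
    # id() is only unique among live objects, so the constant must stay
    # referenced for as long as the cache entry exists.
    return ("const", id(const))


shared = ToyConstant([0.0] * 10_000)
equal_copy = ToyConstant([0.0] * 10_000)

payload_size = len(key_by_value_bytes(shared)[2])          # scales with the data
same_reference_hits = key_by_identity(shared) == key_by_identity(shared)
equal_values_hit = key_by_identity(shared) == key_by_identity(equal_copy)
```

Under these assumptions, a shared reference still produces a cache hit, while two equal-valued but separately constructed constants no longer do; that is the price paid for the smaller footprint.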
replace_quad_form previously gave the placeholder Variable the same id as the quad form it replaced. When the same SymbolicQuadForm appeared at multiple positions in the objective expression (e.g. via CSE, or via 0.5*qf + 0.5*qf with a shared qf), the placeholders collapsed onto a single row in get_var_offsets and the quadratic coefficient was halved.

Mint a fresh placeholder id per replacement. quad_forms is keyed by placeholder id, so all downstream lookups continue to work; only the incidental identity quad_form.id == placeholder.id is dropped.

With this in place, the CSE cache in Dcp2Cone can also dedup subtrees that go through the quad branch, so the _quad_canon_count guard is removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
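A toy model of how an id collision can lose a term is sketched below. The data layout and the functions `replace_occurrences` and `total_coeff` are invented for illustration and do not reproduce cvxpy's actual `replace_quad_form` or coefficient-extraction code; the sketch only assumes that some downstream step keeps one slot per placeholder id.

```python
from itertools import count

_fresh_ids = count(1000)  # hypothetical id generator for placeholders


def replace_occurrences(terms, fresh_ids):
    """terms is a list of (coefficient, quad_form_id) pairs.

    Returns the placeholder terms and a quad_forms map keyed by placeholder id.
    """
    placeholders = []
    quad_forms = {}
    for coeff, qf_id in terms:
        # Old behavior (fresh_ids=False): the placeholder reuses the quad form's id.
        pid = next(_fresh_ids) if fresh_ids else qf_id
        placeholders.append((coeff, pid))
        quad_forms[pid] = qf_id
    return placeholders, quad_forms


def total_coeff(placeholders, quad_forms, qf_id):
    # Assumed downstream behavior: one slot per placeholder id, so a second
    # term carrying the same id overwrites the first instead of adding to it.
    slot = {}
    for coeff, pid in placeholders:
        slot[pid] = coeff
    return sum(c for pid, c in slot.items() if quad_forms[pid] == qf_id)


objective = [(0.5, 7), (0.5, 7)]  # 0.5*qf + 0.5*qf with a shared qf (id 7)

collided = total_coeff(*replace_occurrences(objective, fresh_ids=False), qf_id=7)
distinct = total_coeff(*replace_occurrences(objective, fresh_ids=True), qf_id=7)
```

In this model `collided` recovers only half of the intended coefficient, while `distinct` recovers all of it, because each occurrence keeps its own slot and `quad_forms` still maps every placeholder back to the same quad form.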
By the time coeff_extractor invokes this, Dcp2Cone has already rewritten every QuadForm into either a SymbolicQuadForm or sum_squares, so the isinstance check on QuadForm is defensive rather than load-bearing.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- Cache hits discard the stored constraints anyway, so store just the canonical expression. Keeps the per-apply working set smaller.
- Add two tests that exercise Dcp2Cone(quad_obj=True): one for shared quad_over_lin subtrees within the objective (dedup to a single SymbolicQuadForm), one for the same subtree appearing in objective and constraint (different canonicalizations kept distinct by the affine_above component of the cache key).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Combined into #3355, which now targets master directly.
Description
Currently, `cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm1(x) <= 1])` canonicalizes the two separately-constructed `cp.norm1(x)` expressions into independent epigraph variables and constraint pairs. This PR adds a per-`apply()` common-subexpression cache to `Dcp2Cone` so structurally identical Expression subtrees share one canonicalized expression and one set of auxiliary constraints within a single reduction pass.

For the example above, the canonicalized data the solver sees shrinks from 3 scalar variables / 5 constraint rows down to 2 scalar variables / 3 constraint rows.
How it works
- The cache lives on the `Dcp2Cone` instance and is reset at the top of every `apply()` call.
- Cache keys are built from `get_data()`, child keys, leaf cvxpy ids for `Variable`/`Parameter`, and shape+dtype+value bytes for `Constant`.
- `affine_above` is folded into the key only when a quad-canon-eligible descendant could make canonicalization depend on it; otherwise identical subtrees deduplicate across objective and constraint contexts (which start with different `affine_above` values).
- `partial_problem` subtrees are excluded so their IDs flow into `inverse_data` unchanged.
- If a key cannot be built from a `get_data()` entry, that subtree is silently skipped rather than risking incorrect reuse.

Type of change
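The keying rules listed under "How it works" can be sketched roughly as follows. `Leaf`, `Atom`, the `quad_eligible` flag, and `cache_key` are hypothetical names for this example only; treating a node's own quad eligibility the same as a descendant's is also an assumption, and the PR's real key construction may differ.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Leaf:
    id: int  # stand-in for a cvxpy Variable/Parameter id


@dataclass(eq=False)
class Atom:
    name: str
    args: list
    data: list = field(default_factory=list)  # stand-in for get_data()
    quad_eligible: bool = False               # hypothetical flag


def depends_on_context(node):
    """True if this node or any descendant could take a quad branch."""
    if isinstance(node, Leaf):
        return False
    return node.quad_eligible or any(depends_on_context(a) for a in node.args)


def cache_key(node, affine_above):
    """Return a hashable key, or None to signal 'do not cache this subtree'."""
    if isinstance(node, Leaf):
        return ("leaf", node.id)
    child_keys = []
    for arg in node.args:
        key = cache_key(arg, affine_above)
        if key is None:
            return None  # an unkeyable child makes the parent unkeyable too
        child_keys.append(key)
    data = tuple(node.data)
    try:
        hash(data)
    except TypeError:
        return None  # skip silently rather than risk incorrect reuse
    # Fold the context in only when it can change the canonicalization.
    context = affine_above if depends_on_context(node) else None
    return (node.name, data, tuple(child_keys), context)


x = Leaf(1)
norm_obj = cache_key(Atom("norm1", [x]), affine_above=True)
norm_con = cache_key(Atom("norm1", [x]), affine_above=False)
quad_obj = cache_key(Atom("quad_over_lin", [x], quad_eligible=True), affine_above=True)
quad_con = cache_key(Atom("quad_over_lin", [x], quad_eligible=True), affine_above=False)
unkeyable = cache_key(Atom("f", [x], data=[[1, 2]]), affine_above=True)
```

Here the two `norm1` keys match across contexts, the two `quad_over_lin` keys stay distinct, and the atom with an unhashable data entry yields `None` so it is never cached.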
Contribution checklist
Test plan
- `cvxpy/tests/test_dcp2cone_cse.py` covers scalar dedup, vector dedup, distinct-subtree non-merge (`norm1(x)` vs `norm1(-x)`), end-to-end solve equivalence, and dedup across a shared `Parameter` subtree.
- `pytest cvxpy/tests/test_problem.py cvxpy/tests/test_qp_solvers.py cvxpy/tests/test_atoms.py cvxpy/tests/test_dgp2dcp.py cvxpy/tests/test_dqcp.py` all green.

🤖 Generated with Claude Code