Dedup identical subtrees in Dcp2Cone and Dnlp2Smooth canonicalization#3355
Dedup identical subtrees in Dcp2Cone and Dnlp2Smooth canonicalization#3355PTNobel wants to merge 11 commits into
Conversation
Add a per-apply common-subexpression cache to Dcp2Cone so that structurally identical Expression subtrees share one canonicalized expression and one set of auxiliary constraints within a reduction pass. For cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm1(x) <= 1]) this collapses two epigraph variables and two pairs of abs-epigraph inequalities down to one of each. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Previously the cache embedded `Constant.value.tobytes()` in keys, which made the cache footprint scale with the problem's constant data (~8 MB for an LP 1000x500). Switch Constant keying to object identity, which still deduplicates the common case of a shared Constant reference and brings cache size below ~16 KB across representative problems. Also skip caching any subtree whose canonicalization went through the quad branch. Those canonicalizers emit SymbolicQuadForm markers that downstream code (replace_quad_forms, coeff_extractor) identifies by Python id and assumes are distinct per occurrence; sharing one across sites silently halves quadratic coefficients. Track this with a counter incremented on quad branches; don't cache if it advances under a node. Release the cache at the end of apply() so it does not outlive the reduction. Add a regression test using the QuadForm 0.5*qf + 0.5*qf pattern from test_qp_solvers.py::rep_quad_form. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
replace_quad_form previously gave the placeholder Variable the same id as the quad form it replaced. When the same SymbolicQuadForm appeared at multiple positions in the objective expression (e.g. via CSE, or via 0.5*qf + 0.5*qf with a shared qf), the placeholders collapsed onto a single row in get_var_offsets and the quadratic coefficient was halved. Mint a fresh placeholder id per replacement. quad_forms is keyed by placeholder id, so all downstream lookups continue to work; only the incidental identity quad_form.id == placeholder.id is dropped. With this in place, the CSE cache in Dcp2Cone can also dedup subtrees that go through the quad branch, so the _quad_canon_count guard is removed. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
By the time coeff_extractor invokes this, Dcp2Cone has already rewritten every QuadForm into either a SymbolicQuadForm or sum_squares, so the isinstance check on QuadForm is defensive rather than load-bearing. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
- Cache hits discard the stored constraints anyway, so store just the canonical expression. Keeps the per-apply working set smaller. - Add two tests that exercise Dcp2Cone(quad_obj=True): one for shared quad_over_lin subtrees within the objective (dedup to a single SymbolicQuadForm), one for the same subtree appearing in objective and constraint (different canonicalizations kept distinct by the affine_above component of the cache key). Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Moves the structural-key helpers from dcp2cone.py into a shared private module cvxpy/reductions/_cse.py so Dnlp2Smooth can reuse them, and wires an analogous per-apply cache into Dnlp2Smooth.canonicalize_tree. The NLP chain (Dnlp2Smooth -> NLP solver, via nlp_solving_chain.py) is not downstream of Dcp2Cone, so duplicate subtrees in DNLP problems were producing duplicate aux Variables and constraints; this PR fixes that the same way #3353 did for Dcp2Cone. Also tightens _constant_key: small Constants (<= 64 elements) are now keyed by value rather than id. This catches the case where two structurally identical user expressions embed distinct Constant objects for default scalar parameters (e.g. each cp.huber(x) call mints a fresh Constant(0.5) for the default M), which would otherwise defeat the merge. Large arrays stay id-keyed to avoid copying problem data into cache keys. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
cvxpy uses bare acronyms as module names (psd.py, soc_canon.py, scs_conif.py, dcp2cone/, dgp2dcp/, dnlp2smooth/, etc.) and does not prefix internal modules with an underscore. cse (Common Subexpression Elimination) is descriptive enough on its own. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
Benchmarks that have improved:
Benchmarks that have stayed the same: |
'CSE' is compiler-optimization jargon and not widely understood outside that niche. 'subexpression cache' describes what the module supports in terms that don't require CS expertise. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Two small fixes for PR #3355: 1. PR review nit: expr_key's docstring still claimed "Constants key by object identity," but small Constants (<= 64 elements) now key by value; updated to reflect both branches. 2. CI: test_copt_mi_socp_1 fails by ~7e-5 on the CSE-deduplicated formulation. The continuous SOCP relaxation (verified at high precision via CLARABEL) sits at x[0] = -0.78510, while COPT lands at -0.78503 with CSE -- both within typical MI-SOCP precision, but the test's hardcoded -0.78510265 expects 4 decimals. MOSEK, CPLEX, and SCIP already use places=3 on this same test for the same tolerance reason; align COPT. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Matches the existing MOSEK/CPLEX/SCIP places=3 lines, which carry no comment. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
Constant.__init__ stores float64 ndarrays by reference (no copy in ndarray_interface.const_to_matrix), so two cp.Constant(arr) wrappers around the same source ndarray share _value. Keying on id(expr.value) in that case catches the dedup without copying bytes into the cache key. Restricted to float64 ndarrays because other dtypes go through astype(float64) (which copies) or scipy sparse's csc_array constructor (which builds a fresh wrapper), so the id-of-underlying branch only fires when sharing is real. Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
|
This generally looks really good. @Transurgeon can you review the changes to DNLP? |
SteveDiamond
left a comment
There was a problem hiding this comment.
Reviewed with differential testing of shared-vs-unshared canonicalization across many atoms/solvers — the CSE correctness looks solid (the DAG-mutation risk is correctly defended by the fresh-placeholder-id fix, and the _affine_above_relevant key logic mirrors canonicalize_tree). A few inline notes below; the main one is the O(N²) canonicalization regression on deep expression trees.
| """ | ||
|
|
||
|
|
||
| def expr_key(expr): |
There was a problem hiding this comment.
Performance: O(N²) canonicalization regression on deep trees.
expr_key recursively walks the entire subtree to build a key, and canonicalize_tree calls it at every node with no memoization → O(N²) where canonicalization was O(N). The key tuples are themselves O(subtree)-sized, so per-node hashing/storage is also O(subtree) → O(N²) memory.
Measured on a depth-N cp.abs(-(-(…-x))) chain (toggling only the cache-key construction):
| depth | master | this PR | slowdown |
|---|---|---|---|
| 100 | 0.72 ms | 3.79 ms | 5× |
| 400 | 3.11 ms | 68.7 ms | 22× |
| 800 | 7.36 ms | 311 ms | 42× |
Master scales ×2 per doubling (linear); this PR scales ×4 (quadratic), and the factor widens with N. Flat n-ary sums are unaffected (CVXPY flattens associative ops) — it bites genuinely deep trees: nested abs/reshape/neg/index chains.
Suggested fix: thread a per-apply {id(expr): key} memo through expr_key so each node's key is built once and child keys are reused (one bottom-up O(N) pass). Same change applies to the Dnlp2Smooth call site.
| return (structural, bool(affine_above)) | ||
| return (structural, None) | ||
|
|
||
| def _affine_above_relevant(self, expr) -> bool: |
There was a problem hiding this comment.
This is a second, independent O(N²) whole-subtree walk: _make_cache_key calls _affine_above_relevant at every node (when quad_obj=True), and it recurses over the full subtree each time. Instrumented call count tracks N²/2 exactly (5,155 calls at depth 100; 80,605 at depth 400).
It can fold into the same bottom-up pass suggested for expr_key: a node is relevant iff it is quad-eligible or any child is relevant — O(1) amortized per node.
| cache_key = None | ||
| if isinstance(expr, Expression): | ||
| try: | ||
| cache_key = expr_key(expr) |
There was a problem hiding this comment.
Same O(N²) pattern as Dcp2Cone: expr_key(expr) is recomputed from scratch at every node with no reuse of child keys already built. The memoization fix (per-apply {id(expr): key} memo) should be applied here too.
| except (TypeError, ValueError): | ||
| return ("const", id(expr)) | ||
| if arr.size <= _CONSTANT_VALUE_HASH_MAX_SIZE: | ||
| return ("const-val", arr.shape, str(arr.dtype), arr.tobytes()) |
There was a problem hiding this comment.
Sparse Constants are silently mis-keyed here. CVXPY Constants frequently wrap SciPy sparse matrices, and np.asarray(sparse) returns a 0-dim object array of size 1 — which passes the arr.size <= 64 guard. arr.tobytes() then returns the 8 bytes of id(value) (verified), not the contents, under the 'const-val' label, with the real shape recorded as ().
This is not a correctness bug — within one apply() all Constants are live, so ids are unique and no false merge can occur — but it (a) defeats CSE entirely for every sparse-constant subtree and (b) is a fragile footgun: a "by-value" path that actually encodes a transient pointer and discards shape.
Suggest guarding arr.dtype != object before the tobytes() value branch, and handling sparse explicitly (e.g. key on (data.tobytes(), indices.tobytes(), indptr.tobytes(), shape)), falling back to id(expr) (the wrapper, kept alive by the problem tree) rather than id(value).
| # and user constraints are intentionally excluded so their IDs flow | ||
| # through to inverse_data unchanged. | ||
| cache_key = None | ||
| if isinstance(expr, Expression): |
There was a problem hiding this comment.
Minor / latent: Dcp2Cone explicitly excludes partial_problem from the cache (so embedded constraint ids flow through unchanged), but here the eligibility check is a bare isinstance(expr, Expression) — and PartialProblem is an Expression. It's currently latent (the NLP path doesn't support partial_optimize and would fail earlier), but it's an inconsistency that becomes a real id-collapse trap if NLP partial_optimize support is ever added. Worth mirroring the Dcp2Cone guard now.
| quad_form = expr.args[idx] | ||
| placeholder = Variable(quad_form.shape, | ||
| var_id=quad_form.id) | ||
| placeholder = Variable(quad_form.shape) |
There was a problem hiding this comment.
The fresh-id change is correct. One trivial follow-up: the comment at coeff_extractor.py:131 ("var_id is the placeholder's ID (= the SymbolicQuadForm's ID)") is now stale — the placeholder no longer shares the quad form's id. The logic is unaffected (orig_id is recomputed from quad_forms[var_id][2].args[0].id), but updating that comment in this PR would prevent someone from re-introducing the var_id=quad_form.id assumption.
Description
When the same Expression subtree appears in two places in a problem — e.g.
cp.norm1(x)in both the objective and a constraint — the recursive canonicalizers in Dcp2Cone and Dnlp2Smooth previously emitted a fresh set of auxiliary variables and epigraph constraints per occurrence. This PR adds a per-apply()structural-key cache so each canonicalizer fires once per structurally identical subtree.Worked example for the reported case:
The canonicalized problem now has one epigraph variable
tshared across the objective and the constraint, with one pair oft >= x,t >= -xinequalities.What's in the PR
cvxpy/reductions/subexpr_cache.py(new): shared structural-key helpers (expr_key,_constant_key,_hashable_value,UncacheableError). Keys treat two subtrees as equal exactly when they match on atom types, shapes,get_data()payloads, and Variable/Parameter ids at the leaves. Small Constants (≤ 64 elements) key by value so that the implicitConstant(0.5)minted forcp.huber(x)'s defaultMargument doesn't defeat the merge; larger arrays key byid()to avoid copying problem data into cache keys.Dcp2Cone.canonicalize_tree: per-apply()cache keyed on(expr_key, affine_above-if-relevant). Theaffine_abovecomponent is included only when the subtree could reach the quad-canonicalization branch (which depends onaffine_above); for purely cone-mode subtrees the result is independent of context and the merge is unconditional.Dnlp2Smooth.canonicalize_tree: per-apply()cache keyed purely onexpr_key.Dnlp2Smooth.canonicalize_exprdoesn't branch onaffine_above, so there's nothing else to include in the key.cvxpy/utilities/replace_quad_forms.py: a latent bug shown by the new CSE — when two occurrences of the sameSymbolicQuadFormshare an object after CSE, the QP coefficient extractor's placeholder Variable mechanism keyed byquad_form.idcollapsed them onto a single row and halved the quadratic coefficient. Fixed by minting a fresh placeholder id perreplace_quad_formcall. The QuadForm branch inreplace_quad_formsis now documented as defensive: Dcp2Cone rewrites every QuadForm into SymbolicQuadForm or sum_squares beforecoeff_extractorcalls this.Why this is one PR
The Dcp2Cone change came first, the Dnlp2Smooth change followed once we'd refactored the structural-key helpers into a shared module. They share enough surface (the keying helpers, the
_constant_keybehavior for default-parameter Constants, the audit pattern for downstream placeholder-id assumptions) that splitting them would just create review churn.Tests
New unit tests:
cvxpy/tests/test_dcp2cone_cse.py— 8 tests: scalar/vectornorm1dedup, distinct subtrees not merged, solve-matches-unduplicated, parameter subtree dedup, shared QuadForm solves correctly, quad-objective shared-subtree dedup, quad-objective cross-context (objective vs constraint) not merged.cvxpy/tests/test_dnlp2smooth_cse.py— 7 tests: shared huber across obj/constraint, shared pnorm across two constraints, distinct subtrees not merged, parameter subtree dedup, constraint id preservation under dedup, per-apply cache isolation, dangling-aux sanity.Existing tests run: full
cvxpy/tests/nlp_tests/(30 passed, 222 solver-skipped),test_qp_solvers,test_quad_form,test_quad_dpp,test_problem,test_atoms— all passing.Downstream audits (no analog of the
replace_quad_formsbug found)cvxpy/reductions/solvers/nlp_solvers/diff_engine/): pure recursive tree walker, var/param dicts key on.idfor lookup — that's the normal lookup pattern, unaffected by structural sharing.Type of change
Contribution checklist