Dedup identical subtrees in Dcp2Cone and Dnlp2Smooth canonicalization #3355

Open
PTNobel wants to merge 11 commits into master from ptn/dnlp2smooth-cse

Conversation

Collaborator

@PTNobel PTNobel commented May 27, 2026

Description

When the same Expression subtree appears in two places in a problem — e.g. cp.norm1(x) in both the objective and a constraint — the recursive canonicalizers in Dcp2Cone and Dnlp2Smooth previously emitted a fresh set of auxiliary variables and epigraph constraints per occurrence. This PR adds a per-apply() structural-key cache so each canonicalizer fires once per structurally identical subtree.

Worked example for the reported case:

cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm1(x) <= 1])

The canonicalized problem now has one epigraph variable t shared across the objective and the constraint, with one pair of t >= x, t >= -x inequalities.
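
As a rough illustration of the effect described above, here is a toy sketch of an epigraph rewrite with and without a per-call cache. It is not CVXPY code: the tuple-based expressions, the `canonicalize` helper, and the `t0`/`t1` names are all hypothetical, and the real canonicalizers operate on Expression trees.

```python
from itertools import count


def canonicalize(exprs, use_cache):
    """Toy epigraph rewrite: each ("norm1", var) becomes a fresh variable t
    plus the constraints t >= var and t >= -var.

    Assumption: expressions are plain hashable tuples, not CVXPY objects.
    """
    fresh = count()
    cache = {}  # lives only for this call, like a per-apply() cache
    outputs, constraints = [], []
    for expr in exprs:
        if use_cache and expr in cache:
            outputs.append(cache[expr])  # reuse t, emit no new constraints
            continue
        kind, var = expr
        assert kind == "norm1"
        t = f"t{next(fresh)}"
        constraints += [f"{t} >= {var}", f"{t} >= -{var}"]
        if use_cache:
            cache[expr] = t
        outputs.append(t)
    return outputs, constraints


# The objective and the constraint both contain norm1(x).
problem = [("norm1", "x"), ("norm1", "x")]

print(canonicalize(problem, use_cache=False))  # two variables, four constraints
print(canonicalize(problem, use_cache=True))   # one variable, two constraints
```

Because the cache is created inside the call, nothing is shared between separate canonicalization passes, which is the same isolation the per-apply() design aims for.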

What's in the PR

  • cvxpy/reductions/subexpr_cache.py (new): shared structural-key helpers (expr_key, _constant_key, _hashable_value, UncacheableError). Keys treat two subtrees as equal exactly when they match on atom types, shapes, get_data() payloads, and Variable/Parameter ids at the leaves. Small Constants (≤ 64 elements) key by value so that the implicit Constant(0.5) minted for cp.huber(x)'s default M argument doesn't defeat the merge; larger arrays key by id() to avoid copying problem data into cache keys.
  • Dcp2Cone.canonicalize_tree: per-apply() cache keyed on (expr_key, affine_above-if-relevant). The affine_above component is included only when the subtree could reach the quad-canonicalization branch (which depends on affine_above); for purely cone-mode subtrees the result is independent of context and the merge is unconditional.
  • Dnlp2Smooth.canonicalize_tree: per-apply() cache keyed purely on expr_key. Dnlp2Smooth.canonicalize_expr doesn't branch on affine_above, so there's nothing else to include in the key.
  • cvxpy/utilities/replace_quad_forms.py: fixes a latent bug exposed by the new CSE. When two occurrences of the same SymbolicQuadForm share an object after CSE, the QP coefficient extractor's placeholder Variable mechanism, which keyed placeholders by quad_form.id, collapsed them onto a single row and halved the quadratic coefficient. Fixed by minting a fresh placeholder id per replace_quad_form call. The QuadForm branch in replace_quad_forms is now documented as defensive: Dcp2Cone rewrites every QuadForm into SymbolicQuadForm or sum_squares before coeff_extractor calls this.
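
The keying rules in the first bullet can be sketched roughly as follows. This is a minimal stand-in, not the code in subexpr_cache.py: the `Node` class, `toy_key`, and the use of Python `id()` for leaves are assumptions made for illustration, and the 0.5 default is simply the value quoted in the description.

```python
from dataclasses import dataclass, field

SMALL_CONSTANT_MAX = 64  # mirrors the "<= 64 elements" threshold in the description


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for an expression node (not a CVXPY class)."""
    kind: str                 # "atom", "var", or "const"
    name: str = ""            # atom type name
    shape: tuple = ()
    data: tuple = ()          # stand-in for the get_data() payload
    value: tuple = ()         # constant payload as a flat tuple of floats
    args: list = field(default_factory=list)


def toy_key(node):
    if node.kind == "var":
        return ("var", id(node))              # leaves key by identity
    if node.kind == "const":
        if len(node.value) <= SMALL_CONSTANT_MAX:
            return ("const-val", node.shape, node.value)
        return ("const-id", id(node))         # avoid copying large data into keys
    child_keys = tuple(toy_key(a) for a in node.args)
    return (node.name, node.shape, node.data, child_keys)


def huber(arg):
    # Each call mints a fresh default-M constant, as the description says cp.huber does.
    default_m = Node("const", value=(0.5,))
    return Node("atom", "huber", arg.shape, args=[arg, default_m])


x = Node("var", shape=(3,))
y = Node("var", shape=(3,))

print(toy_key(huber(x)) == toy_key(huber(x)))  # True: small constant keys by value
print(toy_key(huber(x)) == toy_key(huber(y)))  # False: different leaf variables
```

Keying the small constant by value is what lets two separately built `huber(x)` trees merge; keying it by identity would make every call look distinct.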

Why this is one PR

The Dcp2Cone change came first; the Dnlp2Smooth change followed once we'd refactored the structural-key helpers into a shared module. They share enough surface (the keying helpers, the _constant_key behavior for default-parameter Constants, the audit pattern for downstream placeholder-id assumptions) that splitting them would just create review churn.

Tests

New unit tests:

  • cvxpy/tests/test_dcp2cone_cse.py — 8 tests: scalar/vector norm1 dedup, distinct subtrees not merged, solve-matches-unduplicated, parameter subtree dedup, shared QuadForm solves correctly, quad-objective shared-subtree dedup, quad-objective cross-context (objective vs constraint) not merged.
  • cvxpy/tests/test_dnlp2smooth_cse.py — 7 tests: shared huber across obj/constraint, shared pnorm across two constraints, distinct subtrees not merged, parameter subtree dedup, constraint id preservation under dedup, per-apply cache isolation, dangling-aux sanity.

Existing tests run: full cvxpy/tests/nlp_tests/ (30 passed, 222 solver-skipped), test_qp_solvers, test_quad_form, test_quad_dpp, test_problem, test_atoms — all passing.

Downstream audits (no analog of the replace_quad_forms bug found)

  • NLP chain (cvxpy/reductions/solvers/nlp_solvers/diff_engine/): pure recursive tree walker, var/param dicts key on .id for lookup — that's the normal lookup pattern, unaffected by structural sharing.

Type of change

  • New feature (backwards compatible)
  • New feature (breaking API changes)
  • Bug fix
  • Other (Documentation, CI, ...)

Contribution checklist

  • Add our license to new files.
  • Check that your code adheres to our coding style.
  • Write unittests.
  • Run the unittests and check that they're passing.
  • Run the benchmarks to make sure your change doesn't introduce a regression.

PTNobel and others added 7 commits May 27, 2026 13:45
Add a per-apply common-subexpression cache to Dcp2Cone so that
structurally identical Expression subtrees share one canonicalized
expression and one set of auxiliary constraints within a reduction
pass. For cp.Problem(cp.Minimize(cp.norm1(x)), [cp.norm1(x) <= 1])
this collapses two epigraph variables and two pairs of abs-epigraph
inequalities down to one of each.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Previously the cache embedded `Constant.value.tobytes()` in keys, which
made the cache footprint scale with the problem's constant data (~8 MB
for an LP 1000x500). Switch Constant keying to object identity, which
still deduplicates the common case of a shared Constant reference and
brings cache size below ~16 KB across representative problems.

Also skip caching any subtree whose canonicalization went through the
quad branch. Those canonicalizers emit SymbolicQuadForm markers that
downstream code (replace_quad_forms, coeff_extractor) identifies by
Python id and assumes are distinct per occurrence; sharing one across
sites silently halves quadratic coefficients. Track this with a counter
incremented on quad branches; don't cache if it advances under a node.

Release the cache at the end of apply() so it does not outlive the
reduction.

Add a regression test using the QuadForm 0.5*qf + 0.5*qf pattern from
test_qp_solvers.py::rep_quad_form.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

replace_quad_form previously gave the placeholder Variable the same id
as the quad form it replaced. When the same SymbolicQuadForm appeared
at multiple positions in the objective expression (e.g. via CSE, or via
0.5*qf + 0.5*qf with a shared qf), the placeholders collapsed onto a
single row in get_var_offsets and the quadratic coefficient was halved.

Mint a fresh placeholder id per replacement. quad_forms is keyed by
placeholder id, so all downstream lookups continue to work; only the
incidental identity quad_form.id == placeholder.id is dropped.

With this in place, the CSE cache in Dcp2Cone can also dedup subtrees
that go through the quad branch, so the _quad_canon_count guard is
removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
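
A toy model of the failure mode described in the commit above, under loud assumptions: it does not reproduce CVXPY's actual data flow through get_var_offsets or coeff_extractor. The `QuadForm` class, `extract_quad_coeff`, and the id counter are hypothetical. It only shows how reusing one id for two occurrences can drop a term from an id-keyed table.

```python
from itertools import count

_fresh_ids = count(1000)  # hypothetical id source for placeholders


class QuadForm:
    """Hypothetical stand-in for a SymbolicQuadForm node."""

    def __init__(self, node_id):
        self.id = node_id


def extract_quad_coeff(terms, fresh_placeholder_ids):
    """terms: list of (scalar coefficient, QuadForm) pairs in the objective.

    Each occurrence is swapped for a placeholder and recorded in a dict
    keyed by the placeholder id; the recorded coefficients are then summed.
    """
    recorded = {}
    for coeff, qf in terms:
        placeholder_id = next(_fresh_ids) if fresh_placeholder_ids else qf.id
        recorded[placeholder_id] = coeff  # a repeated id overwrites the entry
    return sum(recorded.values())


qf = QuadForm(node_id=7)
objective = [(0.5, qf), (0.5, qf)]  # 0.5*qf + 0.5*qf with one shared qf

print(extract_quad_coeff(objective, fresh_placeholder_ids=False))  # 0.5
print(extract_quad_coeff(objective, fresh_placeholder_ids=True))   # 1.0
```

With inherited ids the two occurrences collide and the total comes out as 0.5; with a fresh id per replacement both occurrences are counted.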
By the time coeff_extractor invokes this, Dcp2Cone has already rewritten
every QuadForm into either a SymbolicQuadForm or sum_squares, so the
isinstance check on QuadForm is defensive rather than load-bearing.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

- Cache hits discard the stored constraints anyway, so store just the
  canonical expression. Keeps the per-apply working set smaller.
- Add two tests that exercise Dcp2Cone(quad_obj=True): one for shared
  quad_over_lin subtrees within the objective (dedup to a single
  SymbolicQuadForm), one for the same subtree appearing in objective and
  constraint (different canonicalizations kept distinct by the
  affine_above component of the cache key).

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Moves the structural-key helpers from dcp2cone.py into a shared private
module cvxpy/reductions/_cse.py so Dnlp2Smooth can reuse them, and wires
an analogous per-apply cache into Dnlp2Smooth.canonicalize_tree. The NLP
chain (Dnlp2Smooth -> NLP solver, via nlp_solving_chain.py) is not
downstream of Dcp2Cone, so duplicate subtrees in DNLP problems were
producing duplicate aux Variables and constraints; this PR fixes that
the same way #3353 did for Dcp2Cone.

Also tightens _constant_key: small Constants (<= 64 elements) are now
keyed by value rather than id. This catches the case where two
structurally identical user expressions embed distinct Constant objects
for default scalar parameters (e.g. each cp.huber(x) call mints a fresh
Constant(0.5) for the default M), which would otherwise defeat the
merge. Large arrays stay id-keyed to avoid copying problem data into
cache keys.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

cvxpy uses bare acronyms as module names (psd.py, soc_canon.py,
scs_conif.py, dcp2cone/, dgp2dcp/, dnlp2smooth/, etc.) and does not
prefix internal modules with an underscore. cse (Common Subexpression
Elimination) is descriptive enough on its own.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

github-actions Bot commented May 27, 2026

Benchmarks that have improved:

   before           after         ratio
 [6b637368]       [ca2037b6]
      904±0ms          784±0ms     0.87  matrix_stuffing.ConeMatrixStuffingBench.time_compile_problem

Benchmarks that have stayed the same:

   before           after         ratio
 [6b637368]       [ca2037b6]
     14.9±0ms         15.4±0ms     1.04  simple_QP_benchmarks.ParametrizedQPBenchmark.time_compile_problem
      964±0ms          994±0ms     1.03  gini_portfolio.Cajas.time_compile_problem
      2.59±0s          2.66±0s     1.03  quantum_hilbert_matrix.QuantumHilbertMatrix.time_compile_problem
      705±0ms          719±0ms     1.02  simple_QP_benchmarks.LeastSquares.time_compile_problem
      245±0ms          249±0ms     1.02  simple_QP_benchmarks.SimpleQPBenchmark.time_compile_problem
      274±0ms          278±0ms     1.01  slow_pruning_1668_benchmark.SlowPruningBenchmark.time_compile_problem
      1.79±0s          1.80±0s     1.01  simple_QP_benchmarks.UnconstrainedQP.time_compile_problem
      324±0ms          326±0ms     1.00  gini_portfolio.Yitzhaki.time_compile_problem
      12.9±0s          13.0±0s     1.00  finance.CVaRBenchmark.time_compile_problem
     14.8±0ms         14.9±0ms     1.00  simple_LP_benchmarks.SimpleFullyParametrizedLPBenchmark.time_compile_problem
      887±0ms          889±0ms     1.00  simple_LP_benchmarks.SimpleScalarParametrizedLPBenchmark.time_compile_problem
      9.92±0s          9.93±0s     1.00  simple_LP_benchmarks.SimpleLPBenchmark.time_compile_problem
      518±0ms          518±0ms     1.00  semidefinite_programming.SemidefiniteProgramming.time_compile_problem
      293±0ms          292±0ms     1.00  matrix_stuffing.ParamSmallMatrixStuffing.time_compile_problem
      21.2±0s          21.1±0s     1.00  sdp_segfault_1132_benchmark.SDPSegfault1132Benchmark.time_compile_problem
      982±0ms          979±0ms     1.00  finance.FactorCovarianceModel.time_compile_problem
      235±0ms          233±0ms     0.99  gini_portfolio.Murray.time_compile_problem
      1.48±0s          1.47±0s     0.99  matrix_stuffing.ParamConeMatrixStuffing.time_compile_problem
      5.59±0s          5.54±0s     0.99  optimal_advertising.OptimalAdvertising.time_compile_problem
      1.56±0s          1.55±0s     0.99  tv_inpainting.TvInpainting.time_compile_problem
      4.49±0s          4.44±0s     0.99  svm_l1_regularization.SVMWithL1Regularization.time_compile_problem
     49.6±0ms         48.7±0ms     0.98  matrix_stuffing.SmallMatrixStuffing.time_compile_problem
     23.7±0ms         22.8±0ms     0.96  high_dim_convex_plasticity.ConvexPlasticity.time_compile_problem
      4.31±0s          3.93±0s     0.91  huber_regression.HuberRegression.time_compile_problem

'CSE' is compiler-optimization jargon and not widely understood outside
that niche. 'subexpression cache' describes what the module supports in
terms that don't require CS expertise.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Comment thread cvxpy/reductions/subexpr_cache.py Outdated
@PTNobel PTNobel changed the base branch from ptn/dcp2cone-cse to master May 27, 2026 22:48
@PTNobel PTNobel changed the title from "Dedup identical subtrees in Dnlp2Smooth canonicalization" to "Dedup identical subtrees in Dcp2Cone and Dnlp2Smooth canonicalization" May 27, 2026
PTNobel and others added 3 commits May 27, 2026 15:56
Two small fixes for PR #3355:

1. PR review nit: expr_key's docstring still claimed "Constants key by
   object identity," but small Constants (<= 64 elements) now key by
   value; updated to reflect both branches.

2. CI: test_copt_mi_socp_1 fails by ~7e-5 on the CSE-deduplicated
   formulation. The continuous SOCP relaxation (verified at high
   precision via CLARABEL) sits at x[0] = -0.78510, while COPT lands
   at -0.78503 with CSE -- both within typical MI-SOCP precision, but
   the test's hardcoded -0.78510265 expects 4 decimals. MOSEK, CPLEX,
   and SCIP already use places=3 on this same test for the same
   tolerance reason; align COPT.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
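
The tolerance arithmetic in item 2 can be checked directly. The sketch below assumes the test's `places` argument has the semantics of `unittest.TestCase.assertAlmostEqual` (the difference rounded to `places` decimals must be zero); the `almost_equal` helper is hypothetical and only mirrors that rule.

```python
expected = -0.78510265  # hardcoded value quoted in the commit message
copt_value = -0.78503   # COPT result quoted in the commit message


def almost_equal(a, b, places):
    # Same rule as unittest's assertAlmostEqual: the difference,
    # rounded to `places` decimal places, must be zero.
    return round(abs(a - b), places) == 0


print(abs(copt_value - expected))                    # about 7.3e-05
print(almost_equal(copt_value, expected, places=4))  # False
print(almost_equal(copt_value, expected, places=3))  # True
```

A gap of roughly 7e-5 rounds to 0.0001 at four places, so the check fails, while at three places it rounds to zero and passes.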
Matches the existing MOSEK/CPLEX/SCIP places=3 lines, which carry no
comment.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

Constant.__init__ stores float64 ndarrays by reference (no copy in
ndarray_interface.const_to_matrix), so two cp.Constant(arr) wrappers
around the same source ndarray share _value. Keying on id(expr.value)
in that case catches the dedup without copying bytes into the cache
key.

Restricted to float64 ndarrays because other dtypes go through
astype(float64) (which copies) or scipy sparse's csc_array constructor
(which builds a fresh wrapper), so the id-of-underlying branch only
fires when sharing is real.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

@SteveDiamond
Collaborator

This generally looks really good. @Transurgeon can you review the changes to DNLP?

Collaborator

@SteveDiamond SteveDiamond left a comment

Reviewed with differential testing of shared-vs-unshared canonicalization across many atoms/solvers — the CSE correctness looks solid (the DAG-mutation risk is correctly defended by the fresh-placeholder-id fix, and the _affine_above_relevant key logic mirrors canonicalize_tree). A few inline notes below; the main one is the O(N²) canonicalization regression on deep expression trees.

"""


def expr_key(expr):

Performance: O(N²) canonicalization regression on deep trees.

expr_key recursively walks the entire subtree to build a key, and canonicalize_tree calls it at every node with no memoization → O(N²) where canonicalization was O(N). The key tuples are themselves O(subtree)-sized, so per-node hashing/storage is also O(subtree) → O(N²) memory.

Measured on a depth-N cp.abs(-(-(…-x))) chain (toggling only the cache-key construction):

depth    master     this PR    slowdown
100      0.72 ms    3.79 ms
400      3.11 ms    68.7 ms    22×
800      7.36 ms    311 ms     42×

Master scales ×2 per doubling (linear); this PR scales ×4 (quadratic), and the factor widens with N. Flat n-ary sums are unaffected (CVXPY flattens associative ops) — it bites genuinely deep trees: nested abs/reshape/neg/index chains.

Suggested fix: thread a per-apply {id(expr): key} memo through expr_key so each node's key is built once and child keys are reused (one bottom-up O(N) pass). Same change applies to the Dnlp2Smooth call site.
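
One possible shape for the suggested memo, as a sketch rather than a proposed patch. The `Node` class and `make_keyer` are hypothetical. It also goes one step beyond the suggestion by interning each structural key to a small integer (hash-consing), so a parent's key holds only its children's integers instead of nested tuples; that part is a swapped-in variation, not something the review asks for.

```python
class Node:
    """Hypothetical minimal expression node (not a CVXPY class)."""

    def __init__(self, name, *args):
        self.name = name
        self.args = args


def make_keyer():
    memo = {}      # id(node) -> integer key; nodes must outlive this memo
    interned = {}  # (name, child keys) -> integer key
    visits = 0

    def key_of(root):
        nonlocal visits
        stack = [(root, False)]
        while stack:  # iterative post-order, so deep chains do not recurse
            node, children_done = stack.pop()
            if id(node) in memo:
                continue
            if not children_done:
                stack.append((node, True))
                stack.extend((a, False) for a in node.args)
                continue
            visits += 1
            if node.args:
                shape = (node.name, tuple(memo[id(a)] for a in node.args))
            else:
                shape = ("leaf", id(node))  # leaves key by identity
            memo[id(node)] = interned.setdefault(shape, len(interned))
        return memo[id(root)]

    return key_of, lambda: visits


# Depth-400 chain -(-(...-x)), keyed at every node like canonicalize_tree would.
x = Node("x")
nodes = [x]
for _ in range(400):
    nodes.append(Node("neg", nodes[-1]))

key_of, visit_count = make_keyer()
keys = [key_of(n) for n in nodes]
print(visit_count())  # 401: each node's key is built once
```

The memo is keyed by `id(node)`, so it is only safe while the tree is alive, which holds within a single apply() because the problem keeps its expressions referenced.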

        return (structural, bool(affine_above))
    return (structural, None)

def _affine_above_relevant(self, expr) -> bool:

This is a second, independent O(N²) whole-subtree walk: _make_cache_key calls _affine_above_relevant at every node (when quad_obj=True), and it recurses over the full subtree each time. Instrumented call count tracks roughly N²/2 (5,155 calls at depth 100; 80,605 at depth 400).

It can fold into the same bottom-up pass suggested for expr_key: a node is relevant iff it is quad-eligible or any child is relevant — O(1) amortized per node.
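
A sketch of the bottom-up rule stated above, with hypothetical names throughout: `Node`, its `quad_eligible` flag, and `relevance_map` are stand-ins, and the real eligibility test lives in Dcp2Cone. Each node is visited once and its flag is derived from its own eligibility plus its children's already-computed flags.

```python
class Node:
    """Hypothetical expression node; `quad_eligible` stands in for whatever
    test the real canonicalizer uses to pick the quad branch."""

    def __init__(self, name, *args, quad_eligible=False):
        self.name = name
        self.args = args
        self.quad_eligible = quad_eligible


def relevance_map(root):
    """Return {id(node): bool} for every node under root in one post-order pass."""
    relevant = {}
    stack = [(root, False)]
    while stack:
        node, children_done = stack.pop()
        if id(node) in relevant:
            continue
        if not children_done:
            stack.append((node, True))
            stack.extend((a, False) for a in node.args)
        else:
            # Relevant iff this node is quad-eligible or any child is relevant.
            relevant[id(node)] = node.quad_eligible or any(
                relevant[id(a)] for a in node.args
            )
    return relevant


x = Node("x")
quad = Node("quad_over_lin", x, quad_eligible=True)
tree = Node("add", Node("abs", x), Node("neg", quad))

flags = relevance_map(tree)
print(flags[id(tree)], flags[id(tree.args[0])], flags[id(tree.args[1])])
# True False True
```

In a combined implementation the flag could be stored next to the memoized key so both come out of the same traversal.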

cache_key = None
if isinstance(expr, Expression):
    try:
        cache_key = expr_key(expr)

Same O(N²) pattern as Dcp2Cone: expr_key(expr) is recomputed from scratch at every node with no reuse of child keys already built. The memoization fix (per-apply {id(expr): key} memo) should be applied here too.

except (TypeError, ValueError):
    return ("const", id(expr))
if arr.size <= _CONSTANT_VALUE_HASH_MAX_SIZE:
    return ("const-val", arr.shape, str(arr.dtype), arr.tobytes())

Sparse Constants are silently mis-keyed here. CVXPY Constants frequently wrap SciPy sparse matrices, and np.asarray(sparse) returns a 0-dim object array of size 1 — which passes the arr.size <= 64 guard. arr.tobytes() then returns the 8 bytes of id(value) (verified), not the contents, under the 'const-val' label, with the real shape recorded as ().

This is not a correctness bug — within one apply() all Constants are live, so ids are unique and no false merge can occur — but it (a) defeats CSE entirely for every sparse-constant subtree and (b) is a fragile footgun: a "by-value" path that actually encodes a transient pointer and discards shape.

Suggest guarding arr.dtype != object before the tobytes() value branch, and handling sparse explicitly (e.g. key on (data.tobytes(), indices.tobytes(), indptr.tobytes(), shape)), falling back to id(expr) (the wrapper, kept alive by the problem tree) rather than id(value).
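
A rough sketch of the suggested guard, assuming NumPy is available. `constant_key`, `looks_sparse`, and `FakeCSC` are hypothetical: the duck-typed sparse check stands in for a real SciPy test (for example `scipy.sparse.issparse`), and `FakeCSC` only imitates the attribute layout of a CSC matrix so the example runs without SciPy.

```python
import numpy as np

MAX_VALUE_KEY_SIZE = 64  # threshold quoted in the PR description


def looks_sparse(value):
    # Assumption: duck-typed check for a CSC/CSR-style payload.
    return all(hasattr(value, a) for a in ("data", "indices", "indptr", "shape"))


def constant_key(wrapper, value):
    """Key a constant by value when cheap and safe, else by wrapper identity."""
    if looks_sparse(value):
        if value.data.size <= MAX_VALUE_KEY_SIZE:
            return ("sparse-val", type(value).__name__, tuple(value.shape),
                    str(value.data.dtype), value.data.tobytes(),
                    value.indices.tobytes(), value.indptr.tobytes())
        return ("const", id(wrapper))
    arr = np.asarray(value)
    if arr.dtype == object or arr.size > MAX_VALUE_KEY_SIZE:
        # Object arrays serialize pointers, not contents: never value-key them.
        return ("const", id(wrapper))
    return ("const-val", arr.shape, str(arr.dtype), arr.tobytes())


class FakeCSC:
    """Stand-in with the same attribute layout as a SciPy CSC matrix."""

    def __init__(self):
        self.data = np.array([1.0, 2.0])
        self.indices = np.array([0, 1], dtype=np.int32)
        self.indptr = np.array([0, 1, 2], dtype=np.int32)
        self.shape = (2, 2)


wrapper = object()
a, b = FakeCSC(), FakeCSC()
print(np.asarray(a).dtype, np.asarray(a).shape)              # object ()
print(constant_key(wrapper, a) == constant_key(wrapper, b))  # True
print(constant_key(wrapper, np.array([0.5]))[0])             # const-val
```

The identity fallback uses the wrapper rather than the payload, following the suggestion that the wrapper is the object the problem tree keeps alive.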

# and user constraints are intentionally excluded so their IDs flow
# through to inverse_data unchanged.
cache_key = None
if isinstance(expr, Expression):

Minor / latent: Dcp2Cone explicitly excludes partial_problem from the cache (so embedded constraint ids flow through unchanged), but here the eligibility check is a bare isinstance(expr, Expression) — and PartialProblem is an Expression. It's currently latent (the NLP path doesn't support partial_optimize and would fail earlier), but it's an inconsistency that becomes a real id-collapse trap if NLP partial_optimize support is ever added. Worth mirroring the Dcp2Cone guard now.

  quad_form = expr.args[idx]
- placeholder = Variable(quad_form.shape,
-                        var_id=quad_form.id)
+ placeholder = Variable(quad_form.shape)

The fresh-id change is correct. One trivial follow-up: the comment at coeff_extractor.py:131 ("var_id is the placeholder's ID (= the SymbolicQuadForm's ID)") is now stale — the placeholder no longer shares the quad form's id. The logic is unaffected (orig_id is recomputed from quad_forms[var_id][2].args[0].id), but updating that comment in this PR would prevent someone from re-introducing the var_id=quad_form.id assumption.

2 participants