
@nathanneike

Types of changes

  • New feature (non-breaking change which adds functionality)
  • Tests (new tests or changes to existing tests)

Motivation and context / Related issue

This PR implements a sparse EMD solver for memory-efficient optimal transport when the cost matrix has many infinite or forbidden edges (e.g., k-NN graphs, sparse networks).

Problem: The current dense EMD solver requires O(n²) memory for the full cost matrix, which becomes prohibitive for large-scale problems even when most edges are forbidden.

Solution: This PR adds a sparse bipartite graph solver that only stores edges with finite costs, reducing memory usage from O(n²) to O(E) where E is the number of edges.
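
To make the O(n²) versus O(E) difference concrete, here is a minimal sketch of edge-list storage for a k-NN cost graph. It uses only NumPy and is not the PR's C++ data structure; the helper name `knn_edge_list` and the squared Euclidean cost are assumptions chosen for illustration. The dense matrix is built once only to pick the neighbours; a real pipeline would use a spatial index instead.

```python
import numpy as np


def knn_edge_list(xs, xt, k):
    """Return (rows, cols, costs) for the k cheapest targets of each source.

    Hypothetical helper for illustration; not part of the PR.
    """
    # Squared Euclidean costs (dense here only for brevity).
    diff = xs[:, None, :] - xt[None, :, :]
    cost = np.einsum("ijk,ijk->ij", diff, diff)
    # The first k columns after argpartition hold the k smallest costs per row.
    nearest = np.argpartition(cost, k - 1, axis=1)[:, :k]
    rows = np.repeat(np.arange(xs.shape[0]), k)
    cols = nearest.ravel()
    return rows, cols, cost[rows, cols]


rng = np.random.default_rng(0)
n, k = 200, 5
xs = rng.normal(size=(n, 2))
xt = rng.normal(size=(n, 2))
rows, cols, costs = knn_edge_list(xs, xt, k)

dense_entries = n * n        # entries in the full cost matrix
sparse_entries = costs.size  # E = n * k finite-cost edges
print(dense_entries, sparse_entries)  # 40000 1000
```

Only the three arrays of length E need to be kept once the edges are selected.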

Use cases:

  • k-NN graph optimal transport
  • Large-scale sparse matching problems
  • Network flow with forbidden edges

How has this been tested

Unit Tests

Added two comprehensive tests in test/test_ot.py:

  • test_emd_sparse_vs_dense() - Verifies sparse and dense solvers produce identical transport matrices
  • test_emd2_sparse_vs_dense() - Verifies sparse and dense solvers produce identical costs

Both tests use the augmented k-NN approach:

  1. Create initial k-NN sparse graph
  2. Solve with dense solver to identify needed edges
  3. Augment graph with those edges
  4. Compare both solvers on identical graph structure
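
The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the test code from the PR: it uses SciPy stand-ins (`linear_sum_assignment` as the dense solver, `linprog` restricted to the graph's edges as the sparse solver) rather than the POT solvers, and it assumes uniform marginals with equally many source and target points, in which case an optimal assignment scaled by 1/n is an optimal transport plan.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment, linprog
from scipy.sparse import coo_matrix, vstack

rng = np.random.default_rng(0)
n, k = 30, 3
xs = rng.normal(size=(n, 2))
xt = rng.normal(size=(n, 2))
M = ((xs[:, None, :] - xt[None, :, :]) ** 2).sum(axis=2)
a = np.full(n, 1.0 / n)  # uniform source weights (assumption)
b = np.full(n, 1.0 / n)  # uniform target weights (assumption)

# 1. Initial k-NN sparse graph, stored as a boolean edge mask.
mask = np.zeros((n, n), dtype=bool)
nearest = np.argpartition(M, k - 1, axis=1)[:, :k]
mask[np.arange(n)[:, None], nearest] = True

# 2. Dense solve to identify the edges an optimal plan needs.
r, c = linear_sum_assignment(M)
dense_cost = M[r, c].sum() / n

# 3. Augment the graph with those edges so the sparse problem is feasible.
mask[r, c] = True

# 4. Solve the transport LP with one variable per edge and compare costs.
rows, cols = np.nonzero(mask)
E = rows.size
edge_ids = np.arange(E)
A_src = coo_matrix((np.ones(E), (rows, edge_ids)), shape=(n, E))
A_tgt = coo_matrix((np.ones(E), (cols, edge_ids)), shape=(n, E))
A_eq = vstack([A_src, A_tgt]).tocsr()
res = linprog(M[rows, cols], A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
sparse_cost = res.fun

print(dense_cost, sparse_cost)
```

Without step 3 the k-NN graph alone may admit no feasible plan, or only a more expensive one, which is why the comparison is made on the augmented graph.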

Test results: All 50 tests in test/test_ot.py pass

Verification

  • Costs match between solvers
  • Marginal constraints satisfied for both solvers
  • No regression in existing tests
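
A marginal check of this kind can be written directly on an edge-list plan. The sketch below is illustrative only; `check_marginals` is a hypothetical helper, not a function from the PR or from POT.

```python
import numpy as np


def check_marginals(rows, cols, flow, a, b, atol=1e-9):
    """Return True if the edge-list plan has row sums a and column sums b."""
    row_sums = np.bincount(rows, weights=flow, minlength=a.size)
    col_sums = np.bincount(cols, weights=flow, minlength=b.size)
    return bool(np.allclose(row_sums, a, atol=atol)
                and np.allclose(col_sums, b, atol=atol))


# Tiny hand-checkable plan on three edges: (0,0), (0,1), (1,1).
a = np.array([0.5, 0.5])
b = np.array([0.25, 0.75])
rows = np.array([0, 0, 1])
cols = np.array([0, 1, 1])
flow = np.array([0.25, 0.25, 0.5])

print(check_marginals(rows, cols, flow, a, b))  # True
```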

PR checklist

  • I have read the CONTRIBUTING document.
  • The documentation is up-to-date with the changes I made (check build artifacts). TODO: Add documentation
  • All tests passed, and additional code has been covered with new tests.
  • I have added the PR and Issue fix to the RELEASES.md file. TODO: Will add once ready for merge

TODO before [MRG]:

  • Add example script in examples/ folder demonstrating sparse solver usage
  • Add documentation explaining when to use sparse vs dense
  • Performance benchmarks comparing memory usage and runtime
  • Update RELEASES.md

Feedback requested:

  • Is the API design appropriate? (using sparse=True parameter)
  • Should we add more comprehensive tests?
  • Any concerns about the C++ implementation approach?

  - Implement sparse bipartite graph EMD solver in C++
  - Add Python bindings for sparse solver (emd_wrap.pyx, _network_simplex.py)
  - Add unit tests to verify sparse and dense solvers produce identical results
  - Tests use augmented k-NN approach to ensure fair comparison
  - Update setup.py to include sparse solver compilation

  Both test_emd_sparse_vs_dense() and test_emd2_sparse_vs_dense() verify:
    * Identical costs between sparse and dense solvers
    * Marginal constraint satisfaction for both solvers

  This PR implements a sparse bipartite graph EMD solver for memory-efficient
  optimal transport when the cost matrix has many infinite or forbidden edges.

  Tests verify correctness:
    * test_emd_sparse_vs_dense() - verifies identical costs and marginal constraints
    * test_emd2_sparse_vs_dense() - verifies cost-only version

  Status: WIP - seeking feedback on implementation approach
  TODO: Add example script and documentation
@rflamary changed the title from "Sparse emd implementation" to "[WIP] Sparse emd implementation" on Oct 28, 2025