ENH Private Cython Submodule for Reduction over Pairwise Distances #20254
Closed
jjerphan wants to merge 261 commits into scikit-learn:main from jjerphan:pairwise_aggregation_cython
cb85791
Adapt cython submodule for heaps
jjerphan 5abda94
Reintroduce deleted test_neighbors_heap
jjerphan bc8925e
Lint
jjerphan e2bb562
Minify utils._heap definition file
jjerphan 9e9065d
Merge branch 'main' into pairwise_aggregation_cython
jjerphan ac76852
Post-merge black code formatting
jjerphan cac7313
Spread datasets for the tests of the fast_sqeuclidean strategy
jjerphan 8a06c3f
Rectify test
jjerphan 41bd644
[WIP] Adapting to use class hierarchy
jjerphan 568ed2a
[WIP] Adapting to use class hierarchy
jjerphan 2bf34aa
[WIP] Adapting to use class hierarchy
jjerphan c1415d6
[WIP] Adapting to use class hierarchy
jjerphan 80aaf0b
[WIP] Adapting to use class hierarchy
jjerphan 49e247d
[WIP] Adapting to use class hierarchy
jjerphan 25c9a2c
fixup! [WIP] Adapting to use class hierarchy
jjerphan eb8b931
[WIP] Adapting to use class hierarchy
jjerphan e0d1c99
Move neighbors.DistanceMetric to metrics
jjerphan ac5ddc1
Rename private submodule to _parallel_reductions
jjerphan 4e7b3cb
Introduce DistanceMetric and ArgKmin factory method
jjerphan d386fe1
Support DistanceMetric's and branch on ArgKmin when possible
jjerphan a61e81f
Document GEMM call and change wording to "approximated distance"
jjerphan e0d2881
Support memmapped and integral arrays
jjerphan b23f972
Pull _parallel_on_{X,Y} up on ParallelReduction
jjerphan 67e02d4
Pull GEMM buffers down to FastSquaredEuclideanArgKmin
jjerphan a8331da
Skip tests for translation invariance
jjerphan 0617dbd
Define methods on the base class
jjerphan 38b97c3
Improve tests for NearestNeighbors the algorithm
jjerphan ef9c7f6
Propagate metric kwargs in KNeighborsMixin.kneighbors
jjerphan 882e6e7
Add DistanceMetric data validation at initialisation
jjerphan 15c110a
Remove warning checks for 'wminkowski' now that Scipy is not used
jjerphan 7e3c4b7
Parametrise test_k_and_radius_neighbors_duplicates on algorithms
jjerphan 2ec36c1
Remove uncalled snippet
jjerphan ad496f0
Do not branch on sparse arrays, yet
jjerphan 3cdc476
Merge pull request #2 from jjerphan/pairwise_aggregation_cython-oop
jjerphan 68af6a7
Document
jjerphan f740334
Remove unnecessary parallel_on_X thread-local datastructures
jjerphan 0b9732b
Remove attributes for dtypes' size
jjerphan 5c57860
Use all threads when sorting
jjerphan 384b9a8
Rename ParallelReduction to PairwiseDistancesReduction
jjerphan ed03b88
Cast pointer to const value for gemm interface
jjerphan 6c5e0b9
[WIP] Add RadiusNeighborhood
jjerphan e37b147
[WIP] Add RadiusNeighborhood
jjerphan 29d145a
[WIP] Add RadiusNeighborhood
jjerphan dcca503
[WIP] Add RadiusNeighborhood
jjerphan eaedf53
Excluse some distances from valid ones
jjerphan 70b13cd
Add temporary consistency test
jjerphan 0f87e6f
Move results allocation from initialisation to compute
jjerphan 633d3f4
Address review comments
jjerphan f5d2915
Exclude some DistanceMetrics for PairwiseDistancesReduction
jjerphan 39c4788
Introduce utils functions for vector to ndarray coercion
jjerphan b58321f
Introduce parallel_on_Y for RadiusNeighborhood
jjerphan c24d184
Sort returned valid distances
jjerphan 1ee3ae1
Update comments
jjerphan 6ecf3f3
Change template for parallel_on_X
jjerphan 4e0c465
Fix number of threads for parallel_on_Y
jjerphan 30dcbba
Adapt test for 'brute' algorithm
jjerphan 1a25476
Only allocate temporary buffer once
jjerphan 93db1ee
Remove called to Argkmin.__dealloc__ in subclass'
jjerphan 12d7dfd
[WIP] Introduce parallel_on_Y for RadiusNeighborhood
jjerphan 937c4f3
Introduce parallel_on_Y for RadiusNeighborhood
jjerphan 9780f28
Remove unnecessary intermediary sorts
jjerphan 40e369d
Remove duplicate code to vector-to-ndarray coercion
jjerphan a3de08a
Use consistent names
jjerphan 81b1b1b
Plug RadiusNeighborhood in RadiusNeighborsMixin.radius_neighbors
jjerphan b8fe6e1
Correctly free vectors using del
jjerphan 54fb2c5
Use a sentinel for managing vectors' memory
jjerphan 9df4047
Remove temporary consistency test
jjerphan 7c713a1
Add comments
jjerphan 7ea6daa
Merge pull request #3 from jjerphan/pairwise_aggregation_cython-radius
jjerphan 8e54572
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 1d3336d
Revert to 'euclidean' when 'fast_sqeuclidean' can't be used
jjerphan d96b163
Use 'fast_sqeuclidean' for Birch internals
jjerphan f660dbf
Fix PairwiseDistancesReduction.is_usable
jjerphan bb24d95
Make array C-ordered for test
jjerphan 6eea1aa
Annotate with cython.final when relevant
jjerphan 15c4150
Black contains all the color that I like
jjerphan a9706d6
Use relative imports
jjerphan 0935975
Use method for flatten
jjerphan e2b5398
Correct cross-referencing for metrics.DistanceMetric
jjerphan ce1ccdc
Precise that p is the parameter used by 'minkowski'
jjerphan a9fe71f
Prefer assert_allclose over assert_array_equal
jjerphan 1593dab
Prefer csr_matrix.toarray over csr_matrix.A
jjerphan 01c1294
Rework test for correct behavior regarding the radius
jjerphan 4b6a041
Inline heap pushes
jjerphan c26b583
Mirror ValueError for incorrectly set sort_results and return_distances
jjerphan 5bcea9f
Parametrise test_radius_neighbors_graph_sparse
jjerphan 474804c
Lighten and correct test
jjerphan e84ff1b
Allow other dtypes than np.float64
jjerphan ac2ce70
Parametrise test_kneighbors_graph_sparse
jjerphan 9f612a1
fixup! Remove uncalled snippet
jjerphan 74240bd
Mark a test case as xfail for test_fast_sqeuclidean_correctness
jjerphan 32b08af
fixup! Inline heap pushes
jjerphan 8cfabc8
fixup! Adapt cython submodule for heaps
jjerphan dc1079f
Do not push if the element is identical to the largest
jjerphan f5308b0
Remove X and Y from _reduce_on_chunks signature
jjerphan 19d461f
Introduce DistanceMetric.sparse_{rdist,dist}
jjerphan 444c4bc
Introduce DistanceComputer
jjerphan cf24fd0
Adapt tests for behavior on duplicates
jjerphan e7aaa71
Free datastructure if present and do not raise otherwise
jjerphan 718a4c2
Adapt doctest to the new behavior
jjerphan b70db13
Monkey-patch neighbors to alias metrics.DistanceMetric
jjerphan 39e8189
Revert "Make array C-ordered for test"
jjerphan 71f0130
Makes test parametrisation execution order fixed
jjerphan cd7e1a2
Introduce _openmp_thread_num in helpers
jjerphan 29e5c7b
Reorder macros, cython and python imports
jjerphan dbbb8e9
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 550bb76
Improve comments for DistanceComputer
jjerphan 0d5480c
Use 'proxy' instead for 'approx' as a wording
jjerphan 305f007
Follow PEP 257
jjerphan ba89532
Fix RadiusNeighborhood.__dealloc__
jjerphan 3b1e98c
Rename DistanceComputer for DatasetPairs
jjerphan 7ae5091
Minimalistically document template methods
jjerphan 2f9dc02
Reallocate datastructures for results at each new call
jjerphan 4f3bd4c
Avoid thread over-subscription for BLAS
jjerphan 9d6b83b
Introduce FastSquaredEuclideanRadiusNeighborhood
jjerphan a319358
Adapt internal checks
jjerphan 3ebb200
Add test suite for PairwiseDistancesReduction
jjerphan 69b0ad9
Merge branch 'main' into pairwise_aggregation_cython
jjerphan f63692a
Merge vectors at the really end using dynamic scheduling
jjerphan 5f1d3a0
Pull the distance metric up and make it readonly
jjerphan bd81245
Use proxy distance for RadiusNeighborhood reduction
jjerphan 2e48051
fixup! Correct cross-referencing for metrics.DistanceMetric
jjerphan 53ba89d
Improve inputs checks
jjerphan 1632e14
Rename submodule for pairwise distances reductions
jjerphan 906b1e4
Move DatasetsPair closer to DistanceMetrics
jjerphan eb988ea
Add const qualifier on squared norms memoryviews
jjerphan 9fc5e95
Improve style
jjerphan 4f73848
Improve docstring notse for FastSquaredEuclidean alternatives
jjerphan ea3d791
Fix __cinit__
jjerphan 445a22d
Skip for tests for 32 bits
jjerphan 2647327
Adapt tests for better parametrisation variability
jjerphan 420dbc8
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 4b2c165
Adaptation for DistanceMetrics
jjerphan 3c71fd6
Improve style
jjerphan eaa892f
Parametrise tests by seed last
jjerphan 7d5f603
Lighten tests parametrization
jjerphan 91e3b27
Use squared norms of X vectors in FastSquaredEuclideanArgKmin
jjerphan c4add77
fixup! Adapt tests for better parametrisation variability
jjerphan caa7faa
Simplify tests
jjerphan c48c886
Improve test parametrisation on DistanceMetric
jjerphan 4f06c3a
Improve heap routines' interfaces
jjerphan 962f535
fixup! Improve heap routines' interfaces
jjerphan 1b71282
Fix docstring
jjerphan dc8ddf4
Add missing dtype for indices
jjerphan 5505f4f
Merge branch 'main' into pairwise_aggregation_cython
jjerphan bd1b0d9
Better deprecate neighbors.DistanceMetric
jjerphan c0dbc97
Add link to CPython docs regarding reference stealing
jjerphan 2b0d3a6
Force the coretype to be armv8 on linux-arm64
jjerphan fb3866c
Revert "Force the coretype to be armv8 on linux-arm64"
jjerphan e55fd94
Use conda-forge to test arm64
jjerphan 50d2669
Use Mambaforge instead
jjerphan 84c4315
Install all dependencies in a row via mamba
jjerphan 2411ffc
Mark tests as xfail when in unstable OpenBLAS configuration
jjerphan b7bbd06
Lighten tests' parametrizations
jjerphan 042e228
Improve checks for unstable OpenBLAS configuration
jjerphan 394d9dc
fixup! Use Mambaforge instead
jjerphan b2d80dc
Check against now made privated in_unstable_openblas_configuration
jjerphan ae097cf
fixup! Use conda-forge to test arm64
jjerphan 1821aad
Remove SparseEfficiencyWarning
jjerphan ca942a5
Apply suggestions from reviews comments and discussions
jjerphan 89f909c
Correct string alignement
jjerphan 6887a37
Improve comment for 32 bits fallback
jjerphan 0832dc4
Remove some checks and interactions with python
jjerphan 4fba30a
CI Specify latest lib versions for linux-arm64
jjerphan 2a8c026
DOC Fix glossary
jjerphan f4c5b64
Remove neighbors.DistanceMetric.__init__
jjerphan dfd9661
Format Parameters section
jjerphan afc8bf8
Improve some docstrings and comments
jjerphan de2dbf6
Use libc.float.DBL_MAX instead of constant defined via macro
jjerphan 36f7b6e
Change default metric for 'fast_sqeuclidean'
jjerphan 9e7c8a0
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 16bf24a
Adapt error message in test
jjerphan 5353794
Add guard against negative zeros when computing exact distances
jjerphan d7b984d
Introduce 'fast_euclidean' and adapt KNeighborsMixins accordingly
jjerphan 6427883
[WIP] Use 'fast_sqeuclidean' instead when possible in KNeighborsMixins
jjerphan c865dc6
fixup! Introduce 'fast_euclidean' and adapt KNeighborsMixins accordingly
jjerphan be6741b
fixup! [WIP] Use 'fast_sqeuclidean' instead when possible in KNeighbo…
jjerphan e8664df
Fix NearestNeighbors docstring for Numpydoc
jjerphan 339ab30
Use metric="fast_sqeuclidean" for pairwise_distances_argmin internal …
jjerphan f9e337c
Add n_threads on PairwiseDistancesReduction
jjerphan d14af8e
Pass n_threads on PairwiseDistancesReduction calls
jjerphan e8f0468
Factorise tests and add another for n_threads agnosticism
jjerphan 7e5775c
Add docstring for RadiusNeighborhood.compute and improve others
jjerphan 2d267b4
Merge branch 'main' into pairwise_aggregation_cython
jjerphan e9803de
Fix conjugation
jjerphan fb05746
Use correct wording
jjerphan 5f14488
Format code à la black
jjerphan b71811c
Format docstring for 'auto' strategy
jjerphan 9c819c8
Reword 'reduced distance' for 'rank-preserving surrogate distance'
jjerphan 1ff2433
Look-up for the strategy in scikit-learn's configuration if not speci…
jjerphan 0170af3
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 8530267
Clarify wording in comment regarding n_threads
jjerphan 0365c34
Apply some small reviews suggestions
jjerphan e754b67
Remove redundant statement
jjerphan fe2f8de
Do not validate X and Y for same number of dimensions
jjerphan 4c6253e
Do not validate X and Y for same number of dimensions
jjerphan 7c86a39
Remove checks for CSR matrices in DatasetsPair
jjerphan b3efe85
Rename sparse_{dist,rdist} to csr_{dist,rdist}
jjerphan a81c2f8
Merge branch 'main' into pairwise_aggregation_cython
jjerphan a6f0e4a
fixup! Do not validate X and Y for same number of dimensions
jjerphan 7231535
Don't use f-strings for docstrings
jjerphan faed7cc
Remove some checks on arrays
jjerphan 7ca3d5e
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 66b60b8
Compute squared euclidean norm for rows in parallel
jjerphan 90cb9fd
Validate arrays for C-contiguity where needed
jjerphan 1ae16d7
Add tests for Neighbors-mixins subclasses
jjerphan e9dfc95
Xfail test on another numerical edge-case
jjerphan 2dcac3f
Use PyArray_SetBaseObject via NumPy Cython API
jjerphan c5524f3
Apply review suggestions
jjerphan eda2b26
Revert whats_new entry.
jjerphan 952d41a
Test the fast euclidean overriding
jjerphan 86c0d6f
Mention distances computations and their reduction in dedicated method
jjerphan bd5a1db
fixup! Apply review suggestions
jjerphan 1e2181d
Tight self.neigh_{indices,distances}'s lifetime to their composite
jjerphan 3052db3
Adapt docstrings
jjerphan a758fc1
Factor compute in base class
jjerphan 3dbe038
Do not use frenchism
jjerphan e8deb0f
Test fast metric alternatives fallbacks
jjerphan 5911f1c
fixup! Test fast metric alternatives fallbacks
jjerphan 9b9fb7c
Change warning message
jjerphan 5fc91e1
Document and reword interfaces
jjerphan f75f08e
Better adapt the strategy for uniform weighting
jjerphan b484320
fixup! Better adapt the strategy for uniform weighting
jjerphan c134674
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 79786d0
Make pytest happy with proper checks on array-likes
jjerphan cbf40ea
Optimize squared norms' computations
jjerphan d25021e
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 55af187
Merge branch 'main' into pairwise_aggregation_cython
jjerphan bf37c3d
Import backport to avoid runtime introspection
jjerphan 5660c0e
Clarify comments for sorting arrays
jjerphan 072de9e
Correct indentation
jjerphan e191be2
Prefer "surrogate distance" as a naming
jjerphan 775a10d
Use better notations for maths
jjerphan dab0d4c
Clarify the fast specialized alternative internals in the docstring
jjerphan b423d6a
Some more doc-string
jjerphan f6f76ce
Clean post-merge Circle CI script
jjerphan 4078b0d
Doc Add whats_new entry for #20254
jjerphan ceff923
Better explain PairwiseDistancesArgKmin datastructures usage
jjerphan 51caae2
fixup! Doc Add whats_new entry for #20254
jjerphan 8613fd6
Better explain PairwiseDistancesRadiusNeighborhood behavior
jjerphan 8b5e0b6
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 4bf7eee
Move changelog entry under the Miscellaneous section
jjerphan 7fa4a40
Address review comments
jjerphan 4a89d7f
Address review comments
jjerphan 8f63e01
Fix config for 'pairwise_dist_chunk_size'
jjerphan 2ad33ec
Delay and better scope arrays C ordering
jjerphan eba6f03
Simplify counting for remainder chunks
jjerphan 34468ad
Better motivate heaps' parallel allocation
jjerphan fa424a4
Remove PairwiseDistancesRadiusNeighborhood
jjerphan 5678666
Remove DatasetsPair used for sparse datasets
jjerphan