ENH Pairwise Distances ArgKmin #21462
Closed
322 commits
40e369d
Remove duplicate code to vector-to-ndarray coercion
jjerphan a3de08a
Use consistent names
jjerphan 81b1b1b
Plug RadiusNeighborhood in RadiusNeighborsMixin.radius_neighbors
jjerphan b8fe6e1
Correctly free vectors using del
jjerphan 54fb2c5
Use a sentinel for managing vectors' memory
jjerphan 9df4047
Remove temporary consistency test
jjerphan 7c713a1
Add comments
jjerphan 7ea6daa
Merge pull request #3 from jjerphan/pairwise_aggregation_cython-radius
jjerphan 8e54572
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 1d3336d
Revert to 'euclidean' when 'fast_sqeuclidean' can't be used
jjerphan d96b163
Use 'fast_sqeuclidean' for Birch internals
jjerphan f660dbf
Fix PairwiseDistancesReduction.is_usable
jjerphan bb24d95
Make array C-ordered for test
jjerphan 6eea1aa
Annotate with cython.final when relevant
jjerphan 15c4150
Black contains all the color that I like
jjerphan a9706d6
Use relative imports
jjerphan 0935975
Use method for flatten
jjerphan e2b5398
Correct cross-referencing for metrics.DistanceMetric
jjerphan ce1ccdc
Precise that p is the parameter used by 'minkowski'
jjerphan a9fe71f
Prefer assert_allclose over assert_array_equal
jjerphan 1593dab
Prefer csr_matrix.toarray over csr_matrix.A
jjerphan 01c1294
Rework test for correct behavior regarding the radius
jjerphan 4b6a041
Inline heap pushes
jjerphan c26b583
Mirror ValueError for incorrectly set sort_results and return_distances
jjerphan 5bcea9f
Parametrise test_radius_neighbors_graph_sparse
jjerphan 474804c
Lighten and correct test
jjerphan e84ff1b
Allow other dtypes than np.float64
jjerphan ac2ce70
Parametrise test_kneighbors_graph_sparse
jjerphan 9f612a1
fixup! Remove uncalled snippet
jjerphan 74240bd
Mark a test case as xfail for test_fast_sqeuclidean_correctness
jjerphan 32b08af
fixup! Inline heap pushes
jjerphan 8cfabc8
fixup! Adapt cython submodule for heaps
jjerphan dc1079f
Do not push if the element is identical to the largest
jjerphan f5308b0
Remove X and Y from _reduce_on_chunks signature
jjerphan 19d461f
Introduce DistanceMetric.sparse_{rdist,dist}
jjerphan 444c4bc
Introduce DistanceComputer
jjerphan cf24fd0
Adapt tests for behavior on duplicates
jjerphan e7aaa71
Free datastructure if present and do not raise otherwise
jjerphan 718a4c2
Adapt doctest to the new behavior
jjerphan b70db13
Monkey-patch neighbors to alias metrics.DistanceMetric
jjerphan 39e8189
Revert "Make array C-ordered for test"
jjerphan 71f0130
Makes test parametrisation execution order fixed
jjerphan cd7e1a2
Introduce _openmp_thread_num in helpers
jjerphan 29e5c7b
Reorder macros, cython and python imports
jjerphan dbbb8e9
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 550bb76
Improve comments for DistanceComputer
jjerphan 0d5480c
Use 'proxy' instead for 'approx' as a wording
jjerphan 305f007
Follow PEP 257
jjerphan ba89532
Fix RadiusNeighborhood.__dealloc__
jjerphan 3b1e98c
Rename DistanceComputer for DatasetPairs
jjerphan 7ae5091
Minimalistically document template methods
jjerphan 2f9dc02
Reallocate datastructures for results at each new call
jjerphan 4f3bd4c
Avoid thread over-subscription for BLAS
jjerphan 9d6b83b
Introduce FastSquaredEuclideanRadiusNeighborhood
jjerphan a319358
Adapt internal checks
jjerphan 3ebb200
Add test suite for PairwiseDistancesReduction
jjerphan 69b0ad9
Merge branch 'main' into pairwise_aggregation_cython
jjerphan f63692a
Merge vectors at the really end using dynamic scheduling
jjerphan 5f1d3a0
Pull the distance metric up and make it readonly
jjerphan bd81245
Use proxy distance for RadiusNeighborhood reduction
jjerphan 2e48051
fixup! Correct cross-referencing for metrics.DistanceMetric
jjerphan 53ba89d
Improve inputs checks
jjerphan 1632e14
Rename submodule for pairwise distances reductions
jjerphan 906b1e4
Move DatasetsPair closer to DistanceMetrics
jjerphan eb988ea
Add const qualifier on squared norms memoryviews
jjerphan 9fc5e95
Improve style
jjerphan 4f73848
Improve docstring notse for FastSquaredEuclidean alternatives
jjerphan ea3d791
Fix __cinit__
jjerphan 445a22d
Skip for tests for 32 bits
jjerphan 2647327
Adapt tests for better parametrisation variability
jjerphan 420dbc8
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 4b2c165
Adaptation for DistanceMetrics
jjerphan 3c71fd6
Improve style
jjerphan eaa892f
Parametrise tests by seed last
jjerphan 7d5f603
Lighten tests parametrization
jjerphan 91e3b27
Use squared norms of X vectors in FastSquaredEuclideanArgKmin
jjerphan c4add77
fixup! Adapt tests for better parametrisation variability
jjerphan caa7faa
Simplify tests
jjerphan c48c886
Improve test parametrisation on DistanceMetric
jjerphan 4f06c3a
Improve heap routines' interfaces
jjerphan 962f535
fixup! Improve heap routines' interfaces
jjerphan 1b71282
Fix docstring
jjerphan dc8ddf4
Add missing dtype for indices
jjerphan 5505f4f
Merge branch 'main' into pairwise_aggregation_cython
jjerphan bd1b0d9
Better deprecate neighbors.DistanceMetric
jjerphan c0dbc97
Add link to CPython docs regarding reference stealing
jjerphan 2b0d3a6
Force the coretype to be armv8 on linux-arm64
jjerphan fb3866c
Revert "Force the coretype to be armv8 on linux-arm64"
jjerphan e55fd94
Use conda-forge to test arm64
jjerphan 50d2669
Use Mambaforge instead
jjerphan 84c4315
Install all dependencies in a row via mamba
jjerphan 2411ffc
Mark tests as xfail when in unstable OpenBLAS configuration
jjerphan b7bbd06
Lighten tests' parametrizations
jjerphan 042e228
Improve checks for unstable OpenBLAS configuration
jjerphan 394d9dc
fixup! Use Mambaforge instead
jjerphan b2d80dc
Check against now made privated in_unstable_openblas_configuration
jjerphan ae097cf
fixup! Use conda-forge to test arm64
jjerphan 1821aad
Remove SparseEfficiencyWarning
jjerphan ca942a5
Apply suggestions from reviews comments and discussions
jjerphan 89f909c
Correct string alignement
jjerphan 6887a37
Improve comment for 32 bits fallback
jjerphan 0832dc4
Remove some checks and interactions with python
jjerphan 4fba30a
CI Specify latest lib versions for linux-arm64
jjerphan 2a8c026
DOC Fix glossary
jjerphan f4c5b64
Remove neighbors.DistanceMetric.__init__
jjerphan dfd9661
Format Parameters section
jjerphan afc8bf8
Improve some docstrings and comments
jjerphan de2dbf6
Use libc.float.DBL_MAX instead of constant defined via macro
jjerphan 36f7b6e
Change default metric for 'fast_sqeuclidean'
jjerphan 9e7c8a0
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 16bf24a
Adapt error message in test
jjerphan 5353794
Add guard against negative zeros when computing exact distances
jjerphan d7b984d
Introduce 'fast_euclidean' and adapt KNeighborsMixins accordingly
jjerphan 6427883
[WIP] Use 'fast_sqeuclidean' instead when possible in KNeighborsMixins
jjerphan c865dc6
fixup! Introduce 'fast_euclidean' and adapt KNeighborsMixins accordingly
jjerphan be6741b
fixup! [WIP] Use 'fast_sqeuclidean' instead when possible in KNeighbo…
jjerphan e8664df
Fix NearestNeighbors docstring for Numpydoc
jjerphan 339ab30
Use metric="fast_sqeuclidean" for pairwise_distances_argmin internal …
jjerphan f9e337c
Add n_threads on PairwiseDistancesReduction
jjerphan d14af8e
Pass n_threads on PairwiseDistancesReduction calls
jjerphan e8f0468
Factorise tests and add another for n_threads agnosticism
jjerphan 7e5775c
Add docstring for RadiusNeighborhood.compute and improve others
jjerphan 2d267b4
Merge branch 'main' into pairwise_aggregation_cython
jjerphan e9803de
Fix conjugation
jjerphan fb05746
Use correct wording
jjerphan 5f14488
Format code à la black
jjerphan b71811c
Format docstring for 'auto' strategy
jjerphan 9c819c8
Reword 'reduced distance' for 'rank-preserving surrogate distance'
jjerphan 1ff2433
Look-up for the strategy in scikit-learn's configuration if not speci…
jjerphan 0170af3
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 8530267
Clarify wording in comment regarding n_threads
jjerphan 0365c34
Apply some small reviews suggestions
jjerphan e754b67
Remove redundant statement
jjerphan fe2f8de
Do not validate X and Y for same number of dimensions
jjerphan 4c6253e
Do not validate X and Y for same number of dimensions
jjerphan 7c86a39
Remove checks for CSR matrices in DatasetsPair
jjerphan b3efe85
Rename sparse_{dist,rdist} to csr_{dist,rdist}
jjerphan a81c2f8
Merge branch 'main' into pairwise_aggregation_cython
jjerphan a6f0e4a
fixup! Do not validate X and Y for same number of dimensions
jjerphan 7231535
Don't use f-strings for docstrings
jjerphan faed7cc
Remove some checks on arrays
jjerphan 7ca3d5e
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 66b60b8
Compute squared euclidean norm for rows in parallel
jjerphan 90cb9fd
Validate arrays for C-contiguity where needed
jjerphan 1ae16d7
Add tests for Neighbors-mixins subclasses
jjerphan e9dfc95
Xfail test on another numerical edge-case
jjerphan 2dcac3f
Use PyArray_SetBaseObject via NumPy Cython API
jjerphan c5524f3
Apply review suggestions
jjerphan eda2b26
Revert whats_new entry.
jjerphan 952d41a
Test the fast euclidean overriding
jjerphan 86c0d6f
Mention distances computations and their reduction in dedicated method
jjerphan bd5a1db
fixup! Apply review suggestions
jjerphan 1e2181d
Tight self.neigh_{indices,distances}'s lifetime to their composite
jjerphan 3052db3
Adapt docstrings
jjerphan a758fc1
Factor compute in base class
jjerphan 3dbe038
Do not use frenchism
jjerphan e8deb0f
Test fast metric alternatives fallbacks
jjerphan 5911f1c
fixup! Test fast metric alternatives fallbacks
jjerphan 9b9fb7c
Change warning message
jjerphan 5fc91e1
Document and reword interfaces
jjerphan f75f08e
Better adapt the strategy for uniform weighting
jjerphan b484320
fixup! Better adapt the strategy for uniform weighting
jjerphan c134674
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 79786d0
Make pytest happy with proper checks on array-likes
jjerphan cbf40ea
Optimize squared norms' computations
jjerphan d25021e
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 55af187
Merge branch 'main' into pairwise_aggregation_cython
jjerphan bf37c3d
Import backport to avoid runtime introspection
jjerphan 5660c0e
Clarify comments for sorting arrays
jjerphan 072de9e
Correct indentation
jjerphan e191be2
Prefer "surrogate distance" as a naming
jjerphan 775a10d
Use better notations for maths
jjerphan dab0d4c
Clarify the fast specialized alternative internals in the docstring
jjerphan b423d6a
Some more doc-string
jjerphan f6f76ce
Clean post-merge Circle CI script
jjerphan 4078b0d
Doc Add whats_new entry for #20254
jjerphan ceff923
Better explain PairwiseDistancesArgKmin datastructures usage
jjerphan 51caae2
fixup! Doc Add whats_new entry for #20254
jjerphan 8613fd6
Better explain PairwiseDistancesRadiusNeighborhood behavior
jjerphan 8b5e0b6
Merge branch 'main' into pairwise_aggregation_cython
jjerphan 4bf7eee
Move changelog entry under the Miscellaneous section
jjerphan 7fa4a40
Address review comments
jjerphan 4a89d7f
Address review comments
jjerphan 8f63e01
Fix config for 'pairwise_dist_chunk_size'
jjerphan 2ad33ec
Delay and better scope arrays C ordering
jjerphan eba6f03
Simplify counting for remainder chunks
jjerphan 34468ad
Better motivate heaps' parallel allocation
jjerphan fa424a4
Remove PairwiseDistancesRadiusNeighborhood
jjerphan 5678666
Remove DatasetsPair used for sparse datasets
jjerphan 45c7f6e
Add some general notes about the implementations
jjerphan 0b85167
fixup! Remove PairwiseDistancesRadiusNeighborhood
jjerphan e7b0689
Turn off finitness checks in pairwise_distances_argmin{,_min}
jjerphan 8d2a3d2
Improve test for pairwise_distances_argmin{,_min}
jjerphan 843a894
Update whats_new entry
jjerphan effd897
Check for consistency when X_train is the query
jjerphan 00577c5
Inject placeholder value for MeanShift.bandwidth
jjerphan b1338d5
Merge branch 'main' into pairwise-distances-argkmin
ogrisel 4c3dd1f
Merge branch 'main' into pairwise-distances-argkmin
jjerphan eab07b5
Rename PairwiseDistancesReduction callbacks
jjerphan 8a48ffd
Link back to _openmp_effective_n_threads for n_threads' description
jjerphan 83854fa
Use self.k directly
jjerphan 445c860
Remove unneeded csr_dist and csr_rdist interfaces
jjerphan 6a4d7fe
Add pairwise_dist_chunk_size keyword argument to config_context
jjerphan 0ba2e39
Merge branch 'main' into pairwise-distances-argkmin
jjerphan 5399cc6
fixup! Rename PairwiseDistancesReduction callbacks
jjerphan d40b333
TST Refactor test and adapt checks and tolerances
jjerphan 2f02350
TST Factorise fixtures for metric params
jjerphan 19dd7ca
Change metric to fast_sqeuclidean for pairwise_distances_argmin*
jjerphan 54b4b96
TST Remove spurious skip for tests
jjerphan 56e86ef
TST Remove useless guard for haversine
jjerphan 355cbe2
DOC Clarify docstrings and comments
jjerphan 56151ab
MAINT Drop unneeded Cython directive
jjerphan 96aaa0b
MAINT Better validate and use chunk_size
jjerphan 9d5f7f7
MAINT Raise UserWarning when uneeded metric_params are specified
jjerphan 5fa4cb1
Correctly fallback on standard metric
jjerphan 7c36f11
TST Remove uneeded tests and use adapted version parsing
jjerphan d2396a7
Rename variable and fix docstring for the simultaneous swap
jjerphan 02a0e92
fixup! MAINT Raise UserWarning when uneeded metric_params are specified
jjerphan e2e5282
fixup! DOC Clarify docstrings and comments
jjerphan 1b77d2f
fixup! MAINT Drop unneeded Cython directive
jjerphan 94facea
Merge branch 'main' into pairwise-distances-argkmin
jjerphan bba8976
Merge branch 'main' into pairwise-distances-argkmin
jjerphan f9037f0
[WIP] Rework PairwiseDistancesArgKmin.compute
jjerphan 6983c32
DOC Add notes for `PairwiseDistancesArgKmin.compute`
jjerphan 449545c
DOC Better word
jjerphan a6f9f9c
Merge branch 'main' into pairwise-distances-argkmin
jjerphan 0915a36
Remove 'fast_sqeuclidean' and 'fast_euclidean'
jjerphan 09873f1
Merge pull request #6 from jjerphan/remove-fast-alternatives
jjerphan 048b958
Revert uneeded changes
jjerphan 720b807
Merge branch 'pairwise-distances-argkmin' into pairwise-distances-arg…
jjerphan 7485f44
DOC Correct typos
jjerphan 0f7a4e7
Merge branch 'main' into pairwise-distances-argkmin
jjerphan 7b6b399
Choose the strategy at initialisation
jjerphan 5749c02
TST Adapt for the new `compute` interface
jjerphan 2300c5e
Distinguish between effective and available threads
jjerphan a6ab85f
DOC Add and correct comments
jjerphan 7435885
Update sklearn/metrics/_pairwise_distances_reduction.pyx
jjerphan 27329c1
fixup! DOC Add and correct comments
jjerphan 8754998
FIX Change strategy default value to None
jjerphan 9cda95f
MAINT Rename n_threads variables and document them.
jjerphan 5d77467
DOC Improve remark regarding support for discrete distance metrics
jjerphan af5a991
MAINT Remove useless private _compute method
jjerphan 8a0771a
Merge pull request #5 from jjerphan/pairwise-distances-argkmin-raii
jjerphan dd1dde3
Merge branch 'main' into pairwise-distances-argkmin
jjerphan decb029
Remove duplicated submodules registration
jjerphan 527555d
fixup! Remove duplicated submodules registration
jjerphan c265944
Merge branch 'main' into pairwise-distances-argkmin
jjerphan 25446ef
Merge branch 'main' into pairwise-distances-argkmin
ogrisel b11109c
Move changelog entry at the top
ogrisel 5d6c9c2
Merge branch 'main' into pairwise-distances-argkmin
ogrisel