ENH Pairwise Kernels Cython Routines #21218

Closed
wants to merge 235 commits into from
60fc4ad
Add private submodule for fast argkmin
jjerphan Jun 11, 2021
13442b6
Add minimal documentation
jjerphan Jun 14, 2021
7aa2c3c
Plug 'fast_sqeuclidean' strategy implementation and test for KNeighbo…
jjerphan Jun 14, 2021
e5b33a1
Add test for translation invariance
jjerphan Jun 15, 2021
59c8a57
Complete test parametrisation for translation invariance
jjerphan Jun 16, 2021
20570c2
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Jun 21, 2021
0171827
Lighten test parametrisation
jjerphan Jun 21, 2021
36a52ef
Factorise NeighborsHeap code under a private Cython submodule
jjerphan Jun 21, 2021
9a65958
Use relative imports
jjerphan Jun 21, 2021
e1bb0a1
Use utils._openmp_helpers._openmp_effective_n_threads directly
jjerphan Jun 22, 2021
03e516f
Plug 'fast_sqeuclidean' strategy implementation and test for pairwise…
jjerphan Jun 16, 2021
cb85791
Adapt cython submodule for heaps
jjerphan Jun 23, 2021
5abda94
Reintroduce deleted test_neighbors_heap
jjerphan Jun 23, 2021
bc8925e
Lint
jjerphan Jun 23, 2021
e2bb562
Minify utils._heap definition file
jjerphan Jun 23, 2021
9e9065d
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Jun 24, 2021
ac76852
Post-merge black code formatting
jjerphan Jun 24, 2021
cac7313
Spread datasets for the tests of the fast_sqeuclidean strategy
jjerphan Jun 24, 2021
8a06c3f
Rectify test
jjerphan Jun 30, 2021
41bd644
[WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
568ed2a
[WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
2bf34aa
[WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
c1415d6
[WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
80aaf0b
[WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
49e247d
[WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
25c9a2c
fixup! [WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
eb8b931
[WIP] Adapting to use class hierarchy
jjerphan Jun 30, 2021
e0d1c99
Move neighbors.DistanceMetric to metrics
jjerphan Jun 25, 2021
ac5ddc1
Rename private submodule to _parallel_reductions
jjerphan Jun 30, 2021
4e7b3cb
Introduce DistanceMetric and ArgKmin factory method
jjerphan Jun 30, 2021
d386fe1
Support DistanceMetric's and branch on ArgKmin when possible
jjerphan Jun 30, 2021
a61e81f
Document GEMM call and change wording to "approximated distance"
jjerphan Jul 1, 2021
e0d2881
Support memmapped and integral arrays
jjerphan Jul 1, 2021
b23f972
Pull _parallel_on_{X,Y} up on ParallelReduction
jjerphan Jul 1, 2021
67e02d4
Pull GEMM buffers down to FastSquaredEuclideanArgKmin
jjerphan Jul 1, 2021
a8331da
Skip tests for translation invariance
jjerphan Jul 1, 2021
0617dbd
Define methods on the base class
jjerphan Jul 1, 2021
38b97c3
Improve tests for NearestNeighbors the algorithm
jjerphan Jul 1, 2021
ef9c7f6
Propagate metric kwargs in KNeighborsMixin.kneighbors
jjerphan Jul 1, 2021
882e6e7
Add DistanceMetric data validation at initialisation
jjerphan Jul 1, 2021
15c110a
Remove warning checks for 'wminkowski' now that Scipy is not used
jjerphan Jul 1, 2021
7e3c4b7
Parametrise test_k_and_radius_neighbors_duplicates on algorithms
jjerphan Jul 1, 2021
2ec36c1
Remove uncalled snippet
jjerphan Jul 1, 2021
ad496f0
Do not branch on sparse arrays, yet
jjerphan Jul 1, 2021
3cdc476
Merge pull request #2 from jjerphan/pairwise_aggregation_cython-oop
jjerphan Jul 1, 2021
68af6a7
Document
jjerphan Jul 2, 2021
f740334
Remove unnecessary parallel_on_X thread-local datastructures
jjerphan Jul 2, 2021
0b9732b
Remove attributes for dtypes' size
jjerphan Jul 2, 2021
5c57860
Use all threads when sorting
jjerphan Jul 2, 2021
384b9a8
Rename ParallelReduction to PairwiseDistancesReduction
jjerphan Jul 2, 2021
ed03b88
Cast pointer to const value for gemm interface
jjerphan Jul 5, 2021
6c5e0b9
[WIP] Add RadiusNeighborhood
jjerphan Jul 1, 2021
e37b147
[WIP] Add RadiusNeighborhood
jjerphan Jul 1, 2021
29d145a
[WIP] Add RadiusNeighborhood
jjerphan Jul 1, 2021
dcca503
[WIP] Add RadiusNeighborhood
jjerphan Jul 1, 2021
eaedf53
Excluse some distances from valid ones
jjerphan Jul 5, 2021
70b13cd
Add temporary consistency test
jjerphan Jul 5, 2021
0f87e6f
Move results allocation from initialisation to compute
jjerphan Jul 5, 2021
633d3f4
Address review comments
jjerphan Jul 7, 2021
f5d2915
Exclude some DistanceMetrics for PairwiseDistancesReduction
jjerphan Jul 7, 2021
39c4788
Introduce utils functions for vector to ndarray coercion
jjerphan Jul 8, 2021
b58321f
Introduce parallel_on_Y for RadiusNeighborhood
jjerphan Jul 8, 2021
c24d184
Sort returned valid distances
jjerphan Jul 9, 2021
1ee3ae1
Update comments
jjerphan Jul 9, 2021
6ecf3f3
Change template for parallel_on_X
jjerphan Jul 9, 2021
4e0c465
Fix number of threads for parallel_on_Y
jjerphan Jul 9, 2021
30dcbba
Adapt test for 'brute' algorithm
jjerphan Jul 9, 2021
1a25476
Only allocate temporary buffer once
jjerphan Jul 9, 2021
93db1ee
Remove called to Argkmin.__dealloc__ in subclass'
jjerphan Jul 9, 2021
12d7dfd
[WIP] Introduce parallel_on_Y for RadiusNeighborhood
jjerphan Jul 9, 2021
937c4f3
Introduce parallel_on_Y for RadiusNeighborhood
jjerphan Jul 9, 2021
9780f28
Remove unnecessary intermediary sorts
jjerphan Jul 9, 2021
40e369d
Remove duplicate code to vector-to-ndarray coercion
jjerphan Jul 9, 2021
a3de08a
Use consistent names
jjerphan Jul 9, 2021
81b1b1b
Plug RadiusNeighborhood in RadiusNeighborsMixin.radius_neighbors
jjerphan Jul 9, 2021
b8fe6e1
Correctly free vectors using del
jjerphan Jul 12, 2021
54fb2c5
Use a sentinel for managing vectors' memory
jjerphan Jul 12, 2021
9df4047
Remove temporary consistency test
jjerphan Jul 12, 2021
7c713a1
Add comments
jjerphan Jul 12, 2021
7ea6daa
Merge pull request #3 from jjerphan/pairwise_aggregation_cython-radius
jjerphan Jul 13, 2021
8e54572
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Jul 13, 2021
1d3336d
Revert to 'euclidean' when 'fast_sqeuclidean' can't be used
jjerphan Jul 13, 2021
d96b163
Use 'fast_sqeuclidean' for Birch internals
jjerphan Jul 13, 2021
f660dbf
Fix PairwiseDistancesReduction.is_usable
jjerphan Jul 13, 2021
bb24d95
Make array C-ordered for test
jjerphan Jul 13, 2021
6eea1aa
Annotate with cython.final when relevant
jjerphan Jul 13, 2021
15c4150
Black contains all the color that I like
jjerphan Jul 15, 2021
a9706d6
Use relative imports
jjerphan Jul 15, 2021
0935975
Use method for flatten
jjerphan Jul 15, 2021
e2b5398
Correct cross-referencing for metrics.DistanceMetric
jjerphan Jul 15, 2021
ce1ccdc
Precise that p is the parameter used by 'minkowski'
jjerphan Jul 15, 2021
a9fe71f
Prefer assert_allclose over assert_array_equal
jjerphan Jul 20, 2021
1593dab
Prefer csr_matrix.toarray over csr_matrix.A
jjerphan Jul 20, 2021
01c1294
Rework test for correct behavior regarding the radius
jjerphan Jul 20, 2021
4b6a041
Inline heap pushes
jjerphan Jul 20, 2021
c26b583
Mirror ValueError for incorrectly set sort_results and return_distances
jjerphan Jul 20, 2021
5bcea9f
Parametrise test_radius_neighbors_graph_sparse
jjerphan Jul 20, 2021
474804c
Lighten and correct test
jjerphan Jul 21, 2021
e84ff1b
Allow other dtypes than np.float64
jjerphan Jul 21, 2021
ac2ce70
Parametrise test_kneighbors_graph_sparse
jjerphan Jul 21, 2021
9f612a1
fixup! Remove uncalled snippet
jjerphan Jul 21, 2021
74240bd
Mark a test case as xfail for test_fast_sqeuclidean_correctness
jjerphan Jul 21, 2021
32b08af
fixup! Inline heap pushes
jjerphan Jul 21, 2021
8cfabc8
fixup! Adapt cython submodule for heaps
jjerphan Jul 21, 2021
dc1079f
Do not push if the element is identical to the largest
jjerphan Jul 21, 2021
f5308b0
Remove X and Y from _reduce_on_chunks signature
jjerphan Jul 12, 2021
19d461f
Introduce DistanceMetric.sparse_{rdist,dist}
jjerphan Jul 22, 2021
444c4bc
Introduce DistanceComputer
jjerphan Jul 21, 2021
cf24fd0
Adapt tests for behavior on duplicates
jjerphan Jul 21, 2021
e7aaa71
Free datastructure if present and do not raise otherwise
jjerphan Jul 22, 2021
718a4c2
Adapt doctest to the new behavior
jjerphan Jul 22, 2021
b70db13
Monkey-patch neighbors to alias metrics.DistanceMetric
jjerphan Jul 22, 2021
39e8189
Revert "Make array C-ordered for test"
jjerphan Jul 22, 2021
71f0130
Makes test parametrisation execution order fixed
jjerphan Jul 22, 2021
cd7e1a2
Introduce _openmp_thread_num in helpers
jjerphan Jul 22, 2021
29e5c7b
Reorder macros, cython and python imports
jjerphan Jul 22, 2021
dbbb8e9
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Jul 22, 2021
550bb76
Improve comments for DistanceComputer
jjerphan Jul 27, 2021
0d5480c
Use 'proxy' instead for 'approx' as a wording
jjerphan Jul 27, 2021
305f007
Follow PEP 257
jjerphan Jul 27, 2021
ba89532
Fix RadiusNeighborhood.__dealloc__
jjerphan Jul 27, 2021
3b1e98c
Rename DistanceComputer for DatasetPairs
jjerphan Jul 27, 2021
7ae5091
Minimalistically document template methods
jjerphan Jul 27, 2021
2f9dc02
Reallocate datastructures for results at each new call
jjerphan Jul 27, 2021
4f3bd4c
Avoid thread over-subscription for BLAS
jjerphan Jul 27, 2021
9d6b83b
Introduce FastSquaredEuclideanRadiusNeighborhood
jjerphan Jul 28, 2021
a319358
Adapt internal checks
jjerphan Jul 28, 2021
3ebb200
Add test suite for PairwiseDistancesReduction
jjerphan Jul 28, 2021
69b0ad9
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Jul 28, 2021
f63692a
Merge vectors at the really end using dynamic scheduling
jjerphan Jul 28, 2021
5f1d3a0
Pull the distance metric up and make it readonly
jjerphan Jul 28, 2021
bd81245
Use proxy distance for RadiusNeighborhood reduction
jjerphan Jul 28, 2021
2e48051
fixup! Correct cross-referencing for metrics.DistanceMetric
jjerphan Jul 29, 2021
53ba89d
Improve inputs checks
jjerphan Jul 29, 2021
1632e14
Rename submodule for pairwise distances reductions
jjerphan Jul 29, 2021
906b1e4
Move DatasetsPair closer to DistanceMetrics
jjerphan Jul 29, 2021
eb988ea
Add const qualifier on squared norms memoryviews
jjerphan Jul 29, 2021
9fc5e95
Improve style
jjerphan Jul 29, 2021
4f73848
Improve docstring notse for FastSquaredEuclidean alternatives
jjerphan Jul 29, 2021
ea3d791
Fix __cinit__
jjerphan Jul 29, 2021
445a22d
Skip for tests for 32 bits
jjerphan Jul 29, 2021
2647327
Adapt tests for better parametrisation variability
jjerphan Jul 29, 2021
420dbc8
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Aug 4, 2021
4b2c165
Adaptation for DistanceMetrics
jjerphan Aug 4, 2021
3c71fd6
Improve style
jjerphan Aug 4, 2021
eaa892f
Parametrise tests by seed last
jjerphan Aug 4, 2021
7d5f603
Lighten tests parametrization
jjerphan Aug 4, 2021
91e3b27
Use squared norms of X vectors in FastSquaredEuclideanArgKmin
jjerphan Aug 4, 2021
c4add77
fixup! Adapt tests for better parametrisation variability
jjerphan Aug 4, 2021
caa7faa
Simplify tests
jjerphan Aug 4, 2021
c48c886
Improve test parametrisation on DistanceMetric
jjerphan Aug 4, 2021
4f06c3a
Improve heap routines' interfaces
jjerphan Aug 4, 2021
962f535
fixup! Improve heap routines' interfaces
jjerphan Aug 4, 2021
1b71282
Fix docstring
jjerphan Aug 4, 2021
dc8ddf4
Add missing dtype for indices
jjerphan Aug 4, 2021
5505f4f
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Aug 4, 2021
bd1b0d9
Better deprecate neighbors.DistanceMetric
jjerphan Aug 4, 2021
c0dbc97
Add link to CPython docs regarding reference stealing
jjerphan Aug 5, 2021
2b0d3a6
Force the coretype to be armv8 on linux-arm64
jjerphan Aug 5, 2021
fb3866c
Revert "Force the coretype to be armv8 on linux-arm64"
jjerphan Aug 6, 2021
e55fd94
Use conda-forge to test arm64
jjerphan Aug 6, 2021
50d2669
Use Mambaforge instead
jjerphan Aug 6, 2021
84c4315
Install all dependencies in a row via mamba
jjerphan Aug 6, 2021
2411ffc
Mark tests as xfail when in unstable OpenBLAS configuration
jjerphan Aug 6, 2021
b7bbd06
Lighten tests' parametrizations
jjerphan Aug 6, 2021
042e228
Improve checks for unstable OpenBLAS configuration
jjerphan Aug 6, 2021
394d9dc
fixup! Use Mambaforge instead
jjerphan Aug 6, 2021
b2d80dc
Check against now made privated in_unstable_openblas_configuration
jjerphan Aug 6, 2021
ae097cf
fixup! Use conda-forge to test arm64
jjerphan Aug 6, 2021
1821aad
Remove SparseEfficiencyWarning
jjerphan Aug 11, 2021
ca942a5
Apply suggestions from reviews comments and discussions
jjerphan Aug 11, 2021
89f909c
Correct string alignement
jjerphan Aug 11, 2021
6887a37
Improve comment for 32 bits fallback
jjerphan Aug 11, 2021
0832dc4
Remove some checks and interactions with python
jjerphan Aug 12, 2021
4fba30a
CI Specify latest lib versions for linux-arm64
jjerphan Aug 31, 2021
2a8c026
DOC Fix glossary
jjerphan Aug 31, 2021
f4c5b64
Remove neighbors.DistanceMetric.__init__
jjerphan Aug 31, 2021
dfd9661
Format Parameters section
jjerphan Aug 31, 2021
afc8bf8
Improve some docstrings and comments
jjerphan Aug 31, 2021
de2dbf6
Use libc.float.DBL_MAX instead of constant defined via macro
jjerphan Aug 31, 2021
36f7b6e
Change default metric for 'fast_sqeuclidean'
jjerphan Aug 31, 2021
9e7c8a0
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Sep 1, 2021
16bf24a
Adapt error message in test
jjerphan Sep 1, 2021
5353794
Add guard against negative zeros when computing exact distances
jjerphan Sep 1, 2021
d7b984d
Introduce 'fast_euclidean' and adapt KNeighborsMixins accordingly
jjerphan Sep 1, 2021
6427883
[WIP] Use 'fast_sqeuclidean' instead when possible in KNeighborsMixins
jjerphan Sep 1, 2021
c865dc6
fixup! Introduce 'fast_euclidean' and adapt KNeighborsMixins accordingly
jjerphan Sep 2, 2021
be6741b
fixup! [WIP] Use 'fast_sqeuclidean' instead when possible in KNeighbo…
jjerphan Sep 2, 2021
e8664df
Fix NearestNeighbors docstring for Numpydoc
jjerphan Sep 2, 2021
339ab30
Use metric="fast_sqeuclidean" for pairwise_distances_argmin internal …
jjerphan Sep 2, 2021
f9e337c
Add n_threads on PairwiseDistancesReduction
jjerphan Sep 3, 2021
d14af8e
Pass n_threads on PairwiseDistancesReduction calls
jjerphan Sep 3, 2021
e8f0468
Factorise tests and add another for n_threads agnosticism
jjerphan Sep 3, 2021
7e5775c
Add docstring for RadiusNeighborhood.compute and improve others
jjerphan Sep 3, 2021
2d267b4
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Sep 3, 2021
e9803de
Fix conjugation
jjerphan Sep 6, 2021
fb05746
Use correct wording
jjerphan Sep 6, 2021
5f14488
Format code à la black
jjerphan Sep 6, 2021
b71811c
Format docstring for 'auto' strategy
jjerphan Sep 6, 2021
9c819c8
Reword 'reduced distance' for 'rank-preserving surrogate distance'
jjerphan Sep 6, 2021
1ff2433
Look-up for the strategy in scikit-learn's configuration if not speci…
jjerphan Sep 6, 2021
0170af3
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Sep 6, 2021
8530267
Clarify wording in comment regarding n_threads
jjerphan Sep 8, 2021
0365c34
Apply some small reviews suggestions
jjerphan Sep 9, 2021
e754b67
Remove redundant statement
jjerphan Sep 9, 2021
fe2f8de
Do not validate X and Y for same number of dimensions
jjerphan Sep 14, 2021
4c6253e
Do not validate X and Y for same number of dimensions
jjerphan Sep 14, 2021
7c86a39
Remove checks for CSR matrices in DatasetsPair
jjerphan Sep 14, 2021
b3efe85
Rename sparse_{dist,rdist} to csr_{dist,rdist}
jjerphan Sep 14, 2021
a81c2f8
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Sep 14, 2021
a6f0e4a
fixup! Do not validate X and Y for same number of dimensions
jjerphan Sep 14, 2021
7231535
Don't use f-strings for docstrings
jjerphan Sep 14, 2021
faed7cc
Remove some checks on arrays
jjerphan Sep 23, 2021
7ca3d5e
Merge branch 'main' into pairwise_aggregation_cython
jjerphan Sep 23, 2021
66b60b8
Compute squared euclidean norm for rows in parallel
jjerphan Sep 24, 2021
90cb9fd
Validate arrays for C-contiguity where needed
jjerphan Sep 24, 2021
1ae16d7
Add tests for Neighbors-mixins subclasses
jjerphan Sep 27, 2021
e9dfc95
Xfail test on another numerical edge-case
jjerphan Sep 27, 2021
2dcac3f
Use PyArray_SetBaseObject via NumPy Cython API
jjerphan Sep 27, 2021
c5524f3
Apply review suggestions
jjerphan Sep 28, 2021
eda2b26
Revert whats_new entry.
jjerphan Sep 14, 2021
952d41a
Test the fast euclidean overriding
jjerphan Sep 29, 2021
86c0d6f
Mention distances computations and their reduction in dedicated method
jjerphan Sep 29, 2021
bd5a1db
fixup! Apply review suggestions
jjerphan Sep 29, 2021
1e2181d
Tight self.neigh_{indices,distances}'s lifetime to their composite
jjerphan Sep 29, 2021
3052db3
Adapt docstrings
jjerphan Sep 29, 2021
a758fc1
Factor compute in base class
jjerphan Sep 29, 2021
3dbe038
Do not use frenchism
jjerphan Sep 29, 2021
e8deb0f
Test fast metric alternatives fallbacks
jjerphan Sep 30, 2021
5911f1c
fixup! Test fast metric alternatives fallbacks
jjerphan Sep 30, 2021
9b9fb7c
Change warning message
jjerphan Sep 30, 2021
042f306
[WIP] Introduce RBFKernel
jjerphan Oct 1, 2021
7890aab
Merge branch 'main' into pairwise-kernels
jjerphan Oct 5, 2021
ed55c26
[WIP] Introduce RBFKernel
jjerphan Oct 1, 2021
ea769ef
Change default value for gamma
jjerphan Oct 5, 2021
2 changes: 2 additions & 0 deletions .circleci/config.yml
@@ -116,6 +116,8 @@ jobs:
environment:
- OMP_NUM_THREADS: 2
- OPENBLAS_NUM_THREADS: 2
+ - NUMPY_VERSION: 'latest'
+ - SCIPY_VERSION: 'latest'
- CYTHON_VERSION: 'latest'
- JOBLIB_VERSION: 'latest'
- THREADPOOLCTL_VERSION: 'latest'
48 changes: 30 additions & 18 deletions build_tools/circle/build_test_arm.sh
@@ -21,39 +21,51 @@ source build_tools/shared.sh

sudo add-apt-repository --remove ppa:ubuntu-toolchain-r/test
sudo apt-get update
- sudo apt-get install python3-virtualenv ccache
- python3 -m virtualenv --system-site-packages --python=python3 testenv
- source testenv/bin/activate
- pip install --upgrade pip

+ # Setup conda environment
+ MINICONDA_URL="https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-Linux-aarch64.sh"
+
+ # Install Mambaforge
+ wget $MINICONDA_URL -O mambaforge.sh
+ MINICONDA_PATH=$HOME/miniconda
+ chmod +x mambaforge.sh && ./mambaforge.sh -b -p $MINICONDA_PATH
+ export PATH=$MINICONDA_PATH/bin:$PATH
+ mamba update --yes conda
+
+ # Create environment and install dependencies
+ mamba create -n testenv --yes python=3.7
+ source activate testenv
+
+ # Use the latest by default
+ mamba install --verbose -y ccache \
+ pip \
+ $(get_dep numpy $NUMPY_VERSION) \
+ $(get_dep scipy $SCIPY_VERSION) \
+ $(get_dep cython $CYTHON_VERSION) \
+ $(get_dep joblib $JOBLIB_VERSION) \
+ $(get_dep threadpoolctl $THREADPOOLCTL_VERSION) \
+ $(get_dep pytest $PYTEST_VERSION) \
+ $(get_dep pytest-xdist $PYTEST_XDIST_VERSION)
setup_ccache
- python -m pip install $(get_dep cython $CYTHON_VERSION) \
- $(get_dep joblib $JOBLIB_VERSION)
- python -m pip install $(get_dep threadpoolctl $THREADPOOLCTL_VERSION) \
- $(get_dep pytest $PYTEST_VERSION) \
- $(get_dep pytest-xdist $PYTEST_XDIST_VERSION)

if [[ "$COVERAGE" == "true" ]]; then
- python -m pip install codecov pytest-cov
- fi
-
- if [[ "$PYTEST_XDIST_VERSION" != "none" ]]; then
- python -m pip install pytest-xdist
+ mamba install --verbose -y codecov pytest-cov
fi

if [[ "$TEST_DOCSTRINGS" == "true" ]]; then
# numpydoc requires sphinx
- python -m pip install sphinx
- python -m pip install numpydoc
+ mamba install --verbose -y sphinx
+ mamba install --verbose -y numpydoc
fi

python --version
+ conda list

# Set parallelism to 3 to overlap IO bound tasks with CPU bound tasks on CI
# workers with 2 cores when building the compiled extensions of scikit-learn.
export SKLEARN_BUILD_PARALLEL=3

python -m pip list
- pip install --verbose --editable .
+ pip install --verbose --editable . --no-build-isolation
ccache -s
python -c "import sklearn; sklearn.show_versions()"
python -m threadpoolctl --import sklearn
7 changes: 3 additions & 4 deletions doc/glossary.rst
@@ -644,9 +644,8 @@ General Concepts

Note that for most distance metrics, we rely on implementations from
:mod:`scipy.spatial.distance`, but may reimplement for efficiency in
- our context. The :mod:`neighbors` module also duplicates some metric
- implementations for integration with efficient binary tree search data
- structures.
+ our context. The :class:`metrics.DistanceMetric` interface is used to implement
+ distance metrics for integration with efficient neighbors search.

pd
A shorthand for `Pandas <https://pandas.pydata.org>`_ due to the
@@ -1023,7 +1022,7 @@ such as:

Further examples:

- * :class:`neighbors.DistanceMetric`
+ * :class:`metrics.DistanceMetric`
* :class:`gaussian_process.kernels.Kernel`
* ``tree.Criterion``

11 changes: 10 additions & 1 deletion doc/modules/classes.rst
@@ -1058,6 +1058,16 @@ further details.

metrics.consensus_score

+ Distance metrics
+ ----------------
+
+ .. currentmodule:: sklearn
+
+ .. autosummary::
+ :toctree: generated/
+ :template: class.rst
+
+ metrics.DistanceMetric

Pairwise metrics
----------------
@@ -1317,7 +1327,6 @@ Model validation
:template: class.rst

neighbors.BallTree
- neighbors.DistanceMetric
neighbors.KDTree
neighbors.KernelDensity
neighbors.KNeighborsClassifier
2 changes: 1 addition & 1 deletion doc/modules/density.rst
@@ -136,7 +136,7 @@ The form of these kernels is as follows:
:math:`K(x; h) \propto \cos(\frac{\pi x}{2h})` if :math:`x < h`

The kernel density estimator can be used with any of the valid distance
- metrics (see :class:`~sklearn.neighbors.DistanceMetric` for a list of available metrics), though
+ metrics (see :class:`~sklearn.metrics.DistanceMetric` for a list of available metrics), though
the results are properly normalized only for the Euclidean metric. One
particularly useful metric is the
`Haversine distance <https://en.wikipedia.org/wiki/Haversine_formula>`_
4 changes: 3 additions & 1 deletion sklearn/cluster/_affinity_propagation.py
@@ -523,7 +523,9 @@ def predict(self, X):

if self.cluster_centers_.shape[0] > 0:
with config_context(assume_finite=True):
- return pairwise_distances_argmin(X, self.cluster_centers_)
+ return pairwise_distances_argmin(
+ X, self.cluster_centers_, metric="fast_euclidean"
+ )
else:
warnings.warn(
"This model does not have any cluster centers "
4 changes: 2 additions & 2 deletions sklearn/cluster/_agglomerative.py
@@ -16,8 +16,8 @@

from ..base import BaseEstimator, ClusterMixin
from ..metrics.pairwise import paired_distances
- from ..neighbors import DistanceMetric
- from ..neighbors._dist_metrics import METRIC_MAPPING
+ from ..metrics import DistanceMetric
+ from ..metrics._dist_metrics import METRIC_MAPPING
from ..utils import check_array
from ..utils._fast_dict import IntFloatDict
from ..utils.fixes import _astype_copy_false
7 changes: 1 addition & 6 deletions sklearn/cluster/_birch.py
@@ -12,7 +12,6 @@
from ..metrics import pairwise_distances_argmin
from ..metrics.pairwise import euclidean_distances
from ..base import TransformerMixin, ClusterMixin, BaseEstimator
- from ..utils.extmath import row_norms
from ..utils import deprecated
from ..utils.validation import check_is_fitted
from ..exceptions import ConvergenceWarning
@@ -654,11 +653,10 @@ def predict(self, X):
"""
check_is_fitted(self)
X = self._validate_data(X, accept_sparse="csr", reset=False)
- kwargs = {"Y_norm_squared": self._subcluster_norms}

with config_context(assume_finite=True):
argmin = pairwise_distances_argmin(
- X, self.subcluster_centers_, metric_kwargs=kwargs
+ X, self.subcluster_centers_, metric="fast_euclidean"
)
return self.subcluster_labels_[argmin]

@@ -704,9 +702,6 @@ def _global_clustering(self, X=None):
"n_clusters should be an instance of ClusterMixin or an int"
)

- # To use in predict to avoid recalculation.
- self._subcluster_norms = row_norms(self.subcluster_centers_, squared=True)
-
if clusterer is None or not_enough_centroids:
self.subcluster_labels_ = np.arange(len(centroids))
if not_enough_centroids:
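The `_birch.py` change above drops the cached `Y_norm_squared` and passes `metric="fast_euclidean"` to `pairwise_distances_argmin` instead. The commit messages in this PR mention a GEMM call and squared row norms, which suggests the usual expansion ‖x − y‖² = ‖x‖² − 2·x·y + ‖y‖². The following is a minimal pure-Python sketch of an argmin built on that expansion, not the PR's Cython code; `squared_norm` and `argmin_sqeuclidean` are hypothetical names introduced only for illustration.

```python
from typing import List, Sequence


def squared_norm(v: Sequence[float]) -> float:
    """Squared Euclidean norm of a single row."""
    return sum(x * x for x in v)


def argmin_sqeuclidean(
    X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]
) -> List[int]:
    """For each row of X, return the index of the closest row of Y.

    Hypothetical helper: it ranks candidates with
    ||x||^2 - 2 * <x, y> + ||y||^2, which preserves the ordering of the
    true Euclidean distance without taking any square root.
    """
    # Squared norms of Y are computed once and reused for every row of X.
    y_sq = [squared_norm(y) for y in Y]
    result = []
    for x in X:
        x_sq = squared_norm(x)
        best_j, best_d = -1, float("inf")
        for j, y in enumerate(Y):
            dot = sum(a * b for a, b in zip(x, y))
            # Clamp at zero: rounding can make the expansion slightly negative.
            d = max(x_sq - 2.0 * dot + y_sq[j], 0.0)
            if d < best_d:
                best_j, best_d = j, d
        result.append(best_j)
    return result


X = [[0.0, 0.0], [5.0, 5.0], [9.0, 1.0]]
centers = [[1.0, 1.0], [6.0, 4.0], [10.0, 0.0]]
labels = argmin_sqeuclidean(X, centers)
```

In a vectorised implementation the inner dot products would typically come from a single matrix product, which is presumably where the GEMM call mentioned in the commit list comes in.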
15 changes: 7 additions & 8 deletions sklearn/cluster/_hierarchical_fast.pyx
@@ -13,7 +13,7 @@ ctypedef np.int8_t INT8

np.import_array()

- from ..neighbors._dist_metrics cimport DistanceMetric
+ from ..metrics._dist_metrics cimport DistanceMetric
from ..utils._fast_dict cimport IntFloatDict

# C++
@@ -236,8 +236,8 @@ def max_merge(IntFloatDict a, IntFloatDict b,
def average_merge(IntFloatDict a, IntFloatDict b,
np.ndarray[ITYPE_t, ndim=1] mask,
ITYPE_t n_a, ITYPE_t n_b):
- """Merge two IntFloatDicts with the average strategy: when the
- same key is present in the two dicts, the weighted average of the two
+ """Merge two IntFloatDicts with the average strategy: when the
+ same key is present in the two dicts, the weighted average of the two
values is used.

Parameters
@@ -290,13 +290,13 @@ def average_merge(IntFloatDict a, IntFloatDict b,


###############################################################################
- # An edge object for fast comparisons
+ # An edge object for fast comparisons

cdef class WeightedEdge:
cdef public ITYPE_t a
cdef public ITYPE_t b
cdef public DTYPE_t weight

def __init__(self, DTYPE_t weight, ITYPE_t a, ITYPE_t b):
self.weight = weight
self.a = a
@@ -326,7 +326,7 @@ cdef class WeightedEdge:
return self.weight > other.weight
elif op == 5:
return self.weight >= other.weight

def __repr__(self):
return "%s(weight=%f, a=%i, b=%i)" % (self.__class__.__name__,
self.weight,
@@ -475,7 +475,7 @@ def mst_linkage_core(

dist_metric: DistanceMetric
A DistanceMetric object conforming to the API from
- ``sklearn.neighbors._dist_metrics.pxd`` that will be
+ ``sklearn.metrics._dist_metrics.pxd`` that will be
used to compute distances.

Returns
@@ -534,4 +534,3 @@ def mst_linkage_core(
current_node = new_node

return np.array(result)
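The `average_merge` docstring in the diff above says that when a key is present in both dicts, the weighted average of the two values is used. A minimal sketch of that rule with plain Python dicts standing in for `IntFloatDict` could look like the following. Two things here are assumptions of the sketch rather than facts from the diff: keys present in only one dict are kept unchanged, and the `mask` argument is left out. `average_merge_sketch` is a hypothetical name.

```python
from typing import Dict


def average_merge_sketch(
    a: Dict[int, float], b: Dict[int, float], n_a: int, n_b: int
) -> Dict[int, float]:
    """Merge two dicts; shared keys get the size-weighted average.

    n_a and n_b play the role of the two cluster sizes used as weights.
    Assumption of this sketch: keys found in only one dict are copied as-is.
    """
    n_out = n_a + n_b
    out = dict(a)  # start from a copy so the inputs are not mutated
    for key, value in b.items():
        if key in out:
            out[key] = (n_a * out[key] + n_b * value) / n_out
        else:
            out[key] = value
    return out


merged = average_merge_sketch({0: 1.0, 1: 4.0}, {1: 1.0, 2: 3.0}, n_a=2, n_b=1)
```

Here key `1` is shared, so its merged value is `(2 * 4.0 + 1 * 1.0) / 3`, while keys `0` and `2` are carried over.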

4 changes: 3 additions & 1 deletion sklearn/cluster/_mean_shift.py
@@ -512,4 +512,6 @@ def predict(self, X):
check_is_fitted(self)
X = self._validate_data(X, reset=False)
with config_context(assume_finite=True):
- return pairwise_distances_argmin(X, self.cluster_centers_)
+ return pairwise_distances_argmin(
+ X, self.cluster_centers_, metric="fast_euclidean"
+ )
5 changes: 3 additions & 2 deletions sklearn/cluster/tests/test_hierarchical.py
@@ -17,7 +17,7 @@
from scipy.sparse.csgraph import connected_components

from sklearn.metrics.cluster import adjusted_rand_score
- from sklearn.neighbors.tests.test_dist_metrics import METRICS_DEFAULT_PARAMS
+ from sklearn.metrics.tests.test_dist_metrics import METRICS_DEFAULT_PARAMS
from sklearn.utils._testing import assert_almost_equal, create_memmap_backed_data
from sklearn.utils._testing import assert_array_almost_equal
from sklearn.utils._testing import ignore_warnings
@@ -31,14 +31,15 @@
_fix_connectivity,
)
from sklearn.feature_extraction.image import grid_to_graph
+ from sklearn.metrics import DistanceMetric
from sklearn.metrics.pairwise import (
PAIRED_DISTANCES,
cosine_distances,
manhattan_distances,
pairwise_distances,
)
from sklearn.metrics.cluster import normalized_mutual_info_score
- from sklearn.neighbors import kneighbors_graph, DistanceMetric
+ from sklearn.neighbors import kneighbors_graph
from sklearn.cluster._hierarchical_fast import (
average_merge,
max_merge,
3 changes: 3 additions & 0 deletions sklearn/metrics/__init__.py
@@ -36,6 +36,8 @@
from ._classification import brier_score_loss
from ._classification import multilabel_confusion_matrix

+ from ._dist_metrics import DistanceMetric
+
from . import cluster
from .cluster import adjusted_mutual_info_score
from .cluster import adjusted_rand_score
@@ -115,6 +117,7 @@
"davies_bouldin_score",
"DetCurveDisplay",
"det_curve",
+ "DistanceMetric",
"euclidean_distances",
"explained_variance_score",
"f1_score",
@@ -1,14 +1,13 @@
- #!python
- #cython: boundscheck=False
- #cython: wraparound=False
- #cython: cdivision=True
+ # cython: boundscheck=False
+ # cython: cdivision=True
+ # cython: initializedcheck=False
+ # cython: wraparound=False

cimport cython
cimport numpy as np
- from libc.math cimport fabs, sqrt, exp, cos, pow
+ from libc.math cimport sqrt, exp

- from ._typedefs cimport DTYPE_t, ITYPE_t, DITYPE_t
- from ._typedefs import DTYPE, ITYPE
+ from ..utils._typedefs cimport DTYPE_t, ITYPE_t

######################################################################
# Inline distance functions
@@ -60,9 +59,25 @@ cdef class DistanceMetric:
cdef DTYPE_t dist(self, const DTYPE_t* x1, const DTYPE_t* x2,
ITYPE_t size) nogil except -1

- cdef DTYPE_t rdist(self, DTYPE_t* x1, DTYPE_t* x2,
+ cdef DTYPE_t rdist(self, const DTYPE_t* x1, const DTYPE_t* x2,
ITYPE_t size) nogil except -1

+ cdef DTYPE_t csr_dist(
+ self,
+ const DTYPE_t[:] x1_data,
+ const ITYPE_t[:] x1_indices,
+ const DTYPE_t[:] x2_data,
+ const ITYPE_t[:] x2_indices,
+ ) nogil except -1
+
+ cdef DTYPE_t csr_rdist(
+ self,
+ const DTYPE_t[:] x1_data,
+ const ITYPE_t[:] x1_indices,
+ const DTYPE_t[:] x2_data,
+ const ITYPE_t[:] x2_indices,
+ ) nogil except -1

cdef int pdist(self, const DTYPE_t[:, ::1] X, DTYPE_t[:, ::1] D) except -1

cdef int cdist(self, const DTYPE_t[:, ::1] X, const DTYPE_t[:, ::1] Y,
@@ -71,3 +86,24 @@ cdef class DistanceMetric:
cdef DTYPE_t _rdist_to_dist(self, DTYPE_t rdist) nogil except -1

cdef DTYPE_t _dist_to_rdist(self, DTYPE_t dist) nogil except -1


+ ######################################################################
+ # DatasetsPair base class
+ cdef class DatasetsPair:
+ cdef DistanceMetric distance_metric
+
+ cdef ITYPE_t n_X(self) nogil
+
+ cdef ITYPE_t n_Y(self) nogil
+
+ cdef DTYPE_t dist(self, ITYPE_t i, ITYPE_t j) nogil
+
+ cdef DTYPE_t ranking_preserving_dist(self, ITYPE_t i, ITYPE_t j) nogil
+
+
+ cdef class DenseDenseDatasetsPair(DatasetsPair):
+ cdef:
+ const DTYPE_t[:, ::1] X
+ const DTYPE_t[:, ::1] Y
+ ITYPE_t d
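The `DatasetsPair` declarations above expose two row counts and two index-based distance methods, so a reduction can ask for the distance between row `i` of `X` and row `j` of `Y` without knowing how the data is stored. The pure-Python sketch below mirrors that interface under stated assumptions: the `...Sketch` class names are hypothetical, only a Euclidean metric is modelled, and `ranking_preserving_dist` is assumed to delegate to the metric's `rdist` (the squared distance here). That delegation fits the commit wording "rank-preserving surrogate distance" but is not shown in this part of the diff.

```python
import math
from typing import Sequence


class EuclideanSketch:
    """Hypothetical stand-in for a DistanceMetric with dist and rdist."""

    def rdist(self, x1: Sequence[float], x2: Sequence[float]) -> float:
        # Squared Euclidean distance: cheaper, and ordered like dist.
        return sum((a - b) ** 2 for a, b in zip(x1, x2))

    def dist(self, x1: Sequence[float], x2: Sequence[float]) -> float:
        return math.sqrt(self.rdist(x1, x2))


class DenseDenseDatasetsPairSketch:
    """Hypothetical Python mirror of the declared DatasetsPair interface."""

    def __init__(self, X, Y, distance_metric):
        self.X = X
        self.Y = Y
        self.distance_metric = distance_metric

    def n_X(self) -> int:
        return len(self.X)

    def n_Y(self) -> int:
        return len(self.Y)

    def dist(self, i: int, j: int) -> float:
        return self.distance_metric.dist(self.X[i], self.Y[j])

    def ranking_preserving_dist(self, i: int, j: int) -> float:
        # Assumption: the surrogate is the metric's reduced distance.
        return self.distance_metric.rdist(self.X[i], self.Y[j])


pair = DenseDenseDatasetsPairSketch(
    [[0.0, 0.0]], [[3.0, 4.0], [1.0, 0.0]], EuclideanSketch()
)
nearest_by_dist = min(range(pair.n_Y()), key=lambda j: pair.dist(0, j))
nearest_by_rdist = min(
    range(pair.n_Y()), key=lambda j: pair.ranking_preserving_dist(0, j)
)
```

Both lookups pick the same neighbor, which is the property a surrogate needs for arg-k-min style reductions: only the ordering matters until the final distances are reported.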