FEAT Support precomputed distance matrix for PairwiseDistancesReductions #29483

Open: wants to merge 84 commits into main

@kyrajeep commented Jul 15, 2024

Towards #25888

  1. What does this implement/fix? Explain your changes.
    (1) To accept precomputed distances directly, it uses 'compute' in {ArgKmin, RadiusNeighbors}{ClassMode}{32,64}.
    (2) When the precomputed matrix is received as the parameter X, an instance of the class PrecomputedDistance is initialized.
    (3) The precomputed data is formatted by datasets_pair.py.
    (4) The data is passed on to the backend using the get_for() factory method.
    (5) Changes the API according to @Micky774's suggestion, which is to include 'precomputed' as one of the metrics instead of as an additional parameter. This simplifies usage, since the precomputed matrix is simply passed as X, and avoids confusion for users who do not have a precomputed matrix.
    For example, a user with a precomputed matrix specifies metric='precomputed' and sets X to the precomputed matrix. Previously, there was an extra 'precomputed' parameter that took a Boolean value.

  2. Any other comments?
    Thank you to @jjerphan for the reviews!

Co-authors: @jjerphan @adam2392
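For reference, here is a minimal NumPy sketch of what the two reductions are expected to return when X is a precomputed distance matrix of shape (n_samples_X, n_samples_Y). The function names are hypothetical and this is not the Cython implementation in the PR; it only pins down the expected semantics. The inclusive `<=` bound for the radius query is an assumption of this sketch.

```python
import numpy as np


def argkmin_precomputed(D, k):
    """Return the k smallest distances per row of D and their column indices."""
    D = np.asarray(D, dtype=np.float64)
    # A stable sort resolves ties in favour of the lower column index.
    indices = np.argsort(D, axis=1, kind="stable")[:, :k]
    distances = np.take_along_axis(D, indices, axis=1)
    return distances, indices


def radius_neighbors_precomputed(D, radius):
    """Return, for each row of D, the columns whose distance is <= radius."""
    D = np.asarray(D, dtype=np.float64)
    neigh_ind = [np.flatnonzero(row <= radius) for row in D]
    neigh_dist = [row[ind] for row, ind in zip(D, neigh_ind)]
    return neigh_dist, neigh_ind


# Usage: 2 query points against 3 indexed points.
D = np.array([[0.5, 0.1, 2.0],
              [3.0, 1.0, 0.2]])
dist, ind = argkmin_precomputed(D, k=2)
# ind -> [[1, 0], [2, 1]]
```

No distance is ever computed: both reductions only select entries of D, which is what makes the precomputed path cheap once the matrix exists.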

github-actions bot commented Jul 15, 2024

❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.

You can see the details of the linting issues under the lint job here


ruff check

ruff detected issues. Please run ruff check --fix --output-format=full locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py:119:12: E712 Avoid equality comparisons to `False`; use `if not is_usable:` for false checks
    |
118 |         # is_usable = (X is not None and Y is not None) ^ bool(precomputed)
119 |         if is_usable == False:
    |            ^^^^^^^^^^^^^^^^^^ E712
120 |             return is_usable
121 |         # FIXME: the current Cython implementation is too slow for a large number of
    |
    = help: Replace with `not is_usable`

sklearn/metrics/tests/test_pairwise_distances_reduction.py:408:89: E501 Line too long (96 > 88)
    |
406 | @pytest.mark.parametrize("cls", [ArgKmin, RadiusNeighbors])
407 | def test_precompute_invalid_metric(cls):
408 |     """Test that ValueError is raised when metric is not 'precomputed' for precomputed input."""
    |                                                                                         ^^^^^^^^ E501
409 |     X = np.random.rand(10, 10)  # Precomputed matrix
410 |     with pytest.raises(
    |

sklearn/metrics/tests/test_pairwise_distances_reduction.py:471:5: F811 Redefinition of unused `assert_precomputed` from line 106
    |
471 | def assert_precomputed(precomputed, n_samples_X, n_samples_Y):
    |     ^^^^^^^^^^^^^^^^^^ F811
472 |     """
473 |     Validates a precomputed matrix for compatibility.
    |
    = help: Remove definition: `assert_precomputed`

sklearn/metrics/tests/test_pairwise_distances_reduction.py:499:89: E501 Line too long (112 > 88)
    |
497 |     if precomputed.shape != expected_shape:
498 |         raise AssertionError(
499 |             f"Incorrect dimensions for precomputed matrix. Expected: {expected_shape}, Got: {precomputed.shape}"
    |                                                                                         ^^^^^^^^^^^^^^^^^^^^^^^^ E501
500 |         )
    |

Found 4 errors.
No fixes available (1 hidden fix can be enabled with the `--unsafe-fixes` option).
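A sketch of how the reported issues could be resolved, assuming the surrounding code is as shown in the excerpts. E712 is fixed by replacing `if is_usable == False:` with `if not is_usable:`, as ruff suggests. The E501 docstring can be shortened, e.g. to """Check ValueError when metric != 'precomputed' for precomputed input.""". For F811 (and the matching mypy no-redef error), keep a single definition of `assert_precomputed` in the module, with its long f-string split by implicit concatenation. The `expected_shape` of `(n_samples_X, n_samples_Y)` is an assumption based on the `precomputed` docstring quoted later in this thread.

```python
import numpy as np


def assert_precomputed(precomputed, n_samples_X, n_samples_Y):
    """Validate the shape of a precomputed distance matrix.

    Defined once per module to avoid F811 / mypy [no-redef].
    """
    expected_shape = (n_samples_X, n_samples_Y)
    if precomputed.shape != expected_shape:
        # E501: implicit string concatenation keeps each line under 88 chars.
        raise AssertionError(
            "Incorrect dimensions for precomputed matrix. "
            f"Expected: {expected_shape}, Got: {precomputed.shape}"
        )
```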

ruff format

ruff detected issues. Please run ruff format locally and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.11.7.


--- sklearn/metrics/tests/test_pairwise_distances_reduction.py
+++ sklearn/metrics/tests/test_pairwise_distances_reduction.py
@@ -378,11 +378,6 @@
 }
 
 
-
-
-
-
-
 @pytest.mark.parametrize("cls", [ArgKmin, RadiusNeighbors])
 def test_precompute_all_inputs_none(cls):
     """Test that ValueError is raised when all inputs are None."""

1 file would be reformatted, 922 files already formatted

mypy

mypy detected issues. Please fix them locally and push the changes. Here you can see the detected issues. Note that the installed mypy version is mypy=1.15.0.


sklearn/metrics/tests/test_pairwise_distances_reduction.py:471: error: Name "assert_precomputed" already defined on line 106  [no-redef]
Found 1 error in 1 file (checked 563 source files)

Generated for commit: c9f270d. Link to the linter CI: here

@Micky774 (Contributor)

> Also, in the abstract methods dist and surrogate_dist of PrecomputedDistanceMatrix, the indices [i,j] are required. Does this mean that other indices in the same dataset need computation and are not precomputed?

No. Keep in mind that surrogate_dist and dist methods are for scalar distances, i.e. they take a pair of indices and produce the distance between exactly those two data vectors. Hence, for the precomputed distances, it's simply indexing into the precomputed distances array.
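To illustrate, here is a pure-Python sketch (not the Cython class) of a DatasetsPair-like object backed by a precomputed matrix. The method names `dist`, `surrogate_dist`, `n_samples_X` and `n_samples_Y` mirror the interface discussed here; the class name and constructor are hypothetical. Because the stored values already are the distances, this sketch simply makes `surrogate_dist` return the same value as `dist`.

```python
import numpy as np


class PrecomputedDatasetsPairSketch:
    """Hypothetical Python stand-in for a precomputed-distance DatasetsPair."""

    def __init__(self, distance_matrix):
        D = np.ascontiguousarray(distance_matrix, dtype=np.float64)
        if D.ndim != 2:
            raise ValueError("distance_matrix must be 2D")
        self.distance_matrix = D

    def n_samples_X(self):
        return self.distance_matrix.shape[0]

    def n_samples_Y(self):
        return self.distance_matrix.shape[1]

    def dist(self, i, j):
        # No computation: entry (i, j) already is the distance.
        return self.distance_matrix[i, j]

    def surrogate_dist(self, i, j):
        # A surrogate only has to preserve ordering; the exact distance does.
        return self.dist(i, j)
```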

@kyrajeep (Author)

Thank you for the clarification

@kyrajeep (Author)

In the class PrecomputedDistanceMatrix, I don't see an option for passing in the precomputed distance matrix. Should I add that option, or is it stored somewhere else?

@jjerphan (Member) commented Jul 15, 2024

Hi! I don't have much time or energy to look at it in detail, but basically:

  • a reference to the distance matrix needs to be passed to the compute methods of the dispatchers (which subclass BaseDistancesReductionDispatcher)
  • this reference needs to be passed then to the constructors of the Cython implementations (which subclass BaseDistancesReduction{{name_suffix}}) similarly to X and Y and be stored in an instance of a potentially new DatasetsPair subclass.
  • the logic for computing chunks size and other pieces of information in BaseDistancesReduction{{name_suffix}}.__init__ also needs adaptation.
  • we could have a second class hierarchy for handling precomputed distance matrices, but let's start with this simpler adaptation for now.

Let me know if this is unclear. 🙂
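As a rough illustration of the chunking point, the sketch below computes an arg-k-min over a precomputed matrix one (X-chunk, Y-chunk) tile at a time and merges partial results, in the spirit of the chunked reductions. The function name, default chunk sizes and the concatenate-and-sort merge are assumptions for illustration; the Cython implementation is expected to differ (e.g. by using per-thread heaps).

```python
import numpy as np


def chunked_argkmin_precomputed(D, k, chunk_size_X=2, chunk_size_Y=3):
    """Arg-k-min over a precomputed (n_X, n_Y) matrix, processed tile by tile."""
    D = np.asarray(D, dtype=np.float64)
    n_X, n_Y = D.shape
    k = min(k, n_Y)
    out_dist = np.empty((n_X, k))
    out_ind = np.empty((n_X, k), dtype=np.intp)

    for x_start in range(0, n_X, chunk_size_X):
        x_stop = min(x_start + chunk_size_X, n_X)
        # Running best candidates for this X chunk (initially empty).
        best_dist = np.full((x_stop - x_start, 0), np.inf)
        best_ind = np.empty((x_stop - x_start, 0), dtype=np.intp)
        for y_start in range(0, n_Y, chunk_size_Y):
            y_stop = min(y_start + chunk_size_Y, n_Y)
            # With precomputed input, "computing" a tile is just slicing D.
            tile = D[x_start:x_stop, y_start:y_stop]
            tile_ind = np.broadcast_to(np.arange(y_start, y_stop), tile.shape)
            # Merge the tile with the current best candidates and keep k.
            cand_dist = np.hstack([best_dist, tile])
            cand_ind = np.hstack([best_ind, tile_ind])
            order = np.argsort(cand_dist, axis=1, kind="stable")[:, :k]
            best_dist = np.take_along_axis(cand_dist, order, axis=1)
            best_ind = np.take_along_axis(cand_ind, order, axis=1)
        out_dist[x_start:x_stop] = best_dist
        out_ind[x_start:x_stop] = best_ind
    return out_dist, out_ind
```

The chunk sizes only bound how much of D is touched at once; for precomputed input they no longer depend on n_features, which is one reason the chunk-size logic in `__init__` needs adapting.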

@jjerphan jjerphan changed the title Add the precomputed option to bypass the computation and call the precomputed distance instead. FEAT Support precomputed distance matrix for PairwiseDistancesReductions Jul 23, 2024
@kyrajeep (Author) commented Aug 3, 2024

Thank you for your comments. They make sense and are helpful, and I believe I am going in the right direction. So far I have made progress on all the points made above by @jjerphan except for creating an instance to store the precomputed matrix. Although some things (referencing the precomputed matrix, and syntax) definitely still need to change, I think a review would be helpful and worthwhile at this point :) A higher-level review will be enough, as I am familiar with the class hierarchy and codebase.

I have a couple of questions that are not syntax-related. We have a class for precomputed distances that subclasses DatasetsPair. In that class we have the surrogate_dist and dist functions. My current understanding is that they are both used to compute distances depending on the metric, chunking strategy, etc. I want to create a method that takes in a precomputed matrix, if provided, and another method that retrieves this matrix so it can be passed into the compute method. Does that sound about right?

As for creating an instance to store the precomputed matrix itself... which file should I include it in?

Feedback will be appreciated! Thank you!

@jjerphan (Member) left a comment

A quick review with more detailed explanations than my previous ones.

I pushed a commit (see 3a027c7) because your local venv_sklearn directory was staged and committed as part of e2ef246. To avoid this in the future, I would recommend adding venv* to your global .gitignore, or using a conda environment.

Signed-off-by: Julien Jerphanion <[email protected]>
@jjerphan (Member) left a comment

Please run the linters and remove the unused fixture from the tests.

Member

Please revert the changes to this file.

Member

We also need tests checking that the results of PairwiseDistancesReduction obtained on (X, Y) and on their precomputed distance matrix are identical.
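A minimal sketch of such a check, written with NumPy reference implementations rather than the PairwiseDistancesReduction classes themselves (all function names below are hypothetical): compute the k nearest neighbors once from the feature matrices and once from their precomputed Euclidean distance matrix, and assert that indices and distances agree. In the real test suite, the two sides would presumably be ArgKmin.compute called with a regular metric and with metric='precomputed'.

```python
import numpy as np


def argkmin_from_features(X, Y, k):
    """Reference k-NN: computes Euclidean distances row by row from features."""
    dist = np.empty((X.shape[0], k))
    ind = np.empty((X.shape[0], k), dtype=np.intp)
    for i, x in enumerate(X):
        row = np.sqrt(((Y - x) ** 2).sum(axis=1))
        order = np.argsort(row, kind="stable")[:k]
        dist[i], ind[i] = row[order], order
    return dist, ind


def argkmin_from_precomputed(D, k):
    """Reference k-NN on a precomputed (n_X, n_Y) distance matrix."""
    ind = np.argsort(D, axis=1, kind="stable")[:, :k]
    return np.take_along_axis(D, ind, axis=1), ind


def test_precomputed_matches_features(seed=0, k=4):
    rng = np.random.default_rng(seed)
    X, Y = rng.random((20, 5)), rng.random((30, 5))
    # Precompute the full (n_X, n_Y) Euclidean distance matrix up front.
    D = np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
    dist_f, ind_f = argkmin_from_features(X, Y, k)
    dist_p, ind_p = argkmin_from_precomputed(D, k)
    np.testing.assert_array_equal(ind_f, ind_p)
    np.testing.assert_allclose(dist_f, dist_p)
```

Comparing distances with a tolerance rather than exact equality leaves room for the different floating-point paths the two code paths may take.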

@Micky774 (Contributor)

Catching up on this PR. I wanted to mention that I am not sure I like the current API with precomputed as a separate argument, since we usually include it as an option for metric. I think keeping with convention leads to a more consistent experience since, e.g., you never have a situation where both precomputed and metric are defined (which would be entirely redundant), and instead you succinctly have metric='precomputed'.

To enable this, we could simply enforce an API where metric='precomputed' expects X to be the precomputed distances, and raises an error if Y is provided (in the rare case that folks accidentally use precomputed when they didn't mean to).

What do you all think?
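A minimal sketch of the API contract proposed here, using a hypothetical helper name: with metric='precomputed', X is interpreted as the distance matrix and passing Y is an error; for any other metric, distances are computed from (X, Y). Only a Euclidean branch is sketched, as a stand-in for the real metric dispatch, and defaulting Y to X is an assumption of this sketch.

```python
import numpy as np


def resolve_distances(X, Y=None, metric="euclidean"):
    """Return an (n_X, n_Y) distance matrix following the proposed API."""
    if metric == "precomputed":
        if Y is not None:
            # Guard against accidentally using 'precomputed' with feature data.
            raise ValueError(
                "When metric='precomputed', X must be the precomputed distance "
                "matrix and Y must be None."
            )
        return np.asarray(X, dtype=np.float64)
    if metric != "euclidean":
        raise NotImplementedError("Only 'euclidean' is sketched here.")
    X = np.asarray(X, dtype=np.float64)
    Y = X if Y is None else np.asarray(Y, dtype=np.float64)
    return np.sqrt(((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1))
```

With this contract there is a single source of truth (metric), so the redundant combination of a precomputed flag and a metric cannot occur.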

@jjerphan (Member)

Thank you for chiming in, @Micky774.

We can indeed revise the API to bring it in line with the public ones.

@kyrajeep (Author)

Hi @Micky774,

Sorry for taking a while to respond. That's a great point. I agree that including 'precomputed' as one of the options for metric is clearer. How about making the change in a separate PR so that this PR does not get too lengthy? Thanks!

@Micky774 (Contributor)

Considering that it is a fairly key part of the API being affected by this PR, I think it is better to resolve it here. In my opinion it costs less to fix it now, than it would to start up a new PR for that change. And no worries on the delay -- I appreciate you continuing your work here!

@kyrajeep (Author)

Ok, thank you :)

@kyrajeep (Author)

Hi @jjerphan, @Micky774

While working on the API change that makes 'precomputed' one of the metrics, I realized that passing the precomputed matrix as X may lead to issues, because its dimensions differ from those documented for X. This is from the current DatasetsPair docstring:

X : {ndarray, sparse matrix} of shape (n_samples_X, n_features)
    Input data.
    If provided as a ndarray, it must be C-contiguous.
    If provided as a sparse matrix, it must be in CSR format.
Y : {ndarray, sparse matrix} of shape (n_samples_Y, n_features)
    Input data.
    If provided as a ndarray, it must be C-contiguous.
    If provided as a sparse matrix, it must be in CSR format.
precomputed : ndarray of shape (n_samples_X, n_samples_Y)
    Precomputed distance if provided.

What do you all think?

@Micky774 (Contributor)

I'm not sure I see the problem here. If you're worried about the fact that an NxN pre-computed matrix may be a result of NxD data for arbitrary D, then I do not believe it is a problem since the responsibility lies on the user to ensure that they are entering the correct precomputed matrix for their data -- that is not something we could (or should) verify imo.

If that's not what your concern is, then I may need a bit more of an explanation :)

@kyrajeep (Author) commented Feb 26, 2025

Thank you for your input, @Micky774! That's exactly what I meant, but shouldn't we verify the shape of the precomputed matrix and raise an error if its dimensions are not correct? What do you think, @jjerphan? :)

@jjerphan (Member)

I would look at, and implement, the same logic as the estimators that accept either a data matrix or a precomputed distance matrix for X.
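One possible form of that validation, sketched with a hypothetical helper: a precomputed matrix passed as X must be 2D with one column per indexed sample, i.e. shape (n_queries, n_indexed), and, being a distance matrix, should contain no negative entries. This is meant to mirror the kind of checks scikit-learn's estimators apply when metric='precomputed', not to reproduce their exact code or error messages.

```python
import numpy as np


def check_precomputed_distances(D, n_indexed=None):
    """Validate a precomputed distance matrix of shape (n_queries, n_indexed)."""
    D = np.asarray(D, dtype=np.float64)
    if D.ndim != 2:
        raise ValueError(f"Expected a 2D distance matrix, got ndim={D.ndim}.")
    if n_indexed is not None and D.shape[1] != n_indexed:
        raise ValueError(
            "Precomputed distances must have shape (n_queries, n_indexed); "
            f"got {D.shape} for n_indexed={n_indexed}."
        )
    if (D < 0).any():
        raise ValueError(
            "Negative values found in the precomputed distance matrix."
        )
    return D
```

As @Micky774 notes, this cannot verify that D actually came from the user's data; it only catches malformed input.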

@kyrajeep (Author)

Sounds good. Thank you.

@kyrajeep (Author)

Hi @jjerphan, @Micky774, @adam2392

Tests for the PR have been implemented, and the newly requested API change to include 'precomputed' as a metric is almost done. When building the dev environment, however, I found that _dist_metrics.pyx.tp needs some changes for the API change. Since this is my first contribution to scikit-learn, I need help with this file from someone more experienced in Cython. Originally, a review said that this file should not be changed, but now that 'precomputed' is a metric instead of a parameter, it seems that this file also needs changing.

I learned a lot from working on this first PR and most of the work is done, but I will need some help to get it merged, as I had no experience with Cython before this PR. I would really like to collaborate with someone, whether one of the co-authors or somebody else who wants to be included as one. I have not been able to run pytest and the linters yet because of the issue mentioned above (the build log is pasted below), but I will as soon as it is resolved. Thank you!

Sincerely,
@kyrajeep

pip install --no-build-isolation -e .
Obtaining file:///Users/kyra/Documents/scikit-learn
Checking if build backend supports build_editable ... done
Preparing editable metadata (pyproject.toml) ... error
error: subprocess-exited-with-error

× Preparing editable metadata (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [124 lines of output]
+ meson setup --reconfigure /Users/kyra/Documents/scikit-learn /Users/kyra/Documents/scikit-learn/build/cp313 -Dbuildtype=release -Db_ndebug=if-release -Db_vscrt=md --native-file=/Users/kyra/Documents/scikit-learn/build/cp313/meson-python-native-file.ini
Cleaning... 0 files.
The Meson build system
Version: 1.8.0
Source dir: /Users/kyra/Documents/scikit-learn
Build dir: /Users/kyra/Documents/scikit-learn/build/cp313
Build type: native build
Project name: scikit-learn
Project version: 1.8.dev0
C compiler for the host machine: /usr/bin/clang (clang 17.0.0 "Apple clang version 17.0.0 (clang-1700.0.13.3)")
C linker for the host machine: /usr/bin/clang ld64 1167.4.1
C++ compiler for the host machine: /usr/bin/clang++ (clang 17.0.0 "Apple clang version 17.0.0 (clang-1700.0.13.3)")
C++ linker for the host machine: /usr/bin/clang++ ld64 1167.4.1
Cython compiler for the host machine: cython (cython 3.1.0)
Host machine cpu family: aarch64
Host machine cpu: aarch64
Compiler for C supports arguments -Wno-unused-but-set-variable: YES (cached)
Compiler for C supports arguments -Wno-unused-function: YES (cached)
Compiler for C supports arguments -Wno-conversion: YES (cached)
Compiler for C supports arguments -Wno-misleading-indentation: YES (cached)
Library m found: NO
Program sklearn/_build_utils/tempita.py found: YES (/Users/kyra/miniforge3/envs/sklearn-env/bin/python3.13 /Users/kyra/Documents/scikit-learn/sklearn/_build_utils/tempita.py)
Program python found: YES (/Users/kyra/miniforge3/envs/sklearn-env/bin/python3.13)
Dependency OpenMP found: YES 5.1 (cached)
Program cython found: YES (/Users/kyra/miniforge3/envs/sklearn-env/bin/cython)
Build targets in project: 111

  scikit-learn 1.8.dev0
  
    User defined options
      Native files: /Users/kyra/Documents/scikit-learn/build/cp313/meson-python-native-file.ini
      b_ndebug    : if-release
      b_vscrt     : md
      buildtype   : release
  
  Found ninja-1.12.1 at /Users/kyra/miniforge3/envs/sklearn-env/bin/ninja
  + /Users/kyra/miniforge3/envs/sklearn-env/bin/ninja
  [1/140] Compiling C object sklearn/__check_build/_check_build.cpython-313-darwin.so.p/meson-generated__check_build.c.o
  FAILED: sklearn/__check_build/_check_build.cpython-313-darwin.so.p/meson-generated__check_build.c.o
  /usr/bin/clang -Isklearn/__check_build/_check_build.cpython-313-darwin.so.p -Isklearn/__check_build -I../../sklearn/__check_build -I/Users/kyra/miniforge3/envs/sklearn-env/include/python3.13 -I/usr/local/opt/libomp/include -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c11 -O3 -Wno-unused-but-set-variable -Wno-unused-function -Wno-conversion -Wno-misleading-indentation -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -Xpreprocessor -fopenmp -MD -MQ sklearn/__check_build/_check_build.cpython-313-darwin.so.p/meson-generated__check_build.c.o -MF sklearn/__check_build/_check_build.cpython-313-darwin.so.p/meson-generated__check_build.c.o.d -o sklearn/__check_build/_check_build.cpython-313-darwin.so.p/meson-generated__check_build.c.o -c sklearn/__check_build/_check_build.cpython-313-darwin.so.p/_check_build.c
  sklearn/__check_build/_check_build.cpython-313-darwin.so.p/_check_build.c:1111:10: fatal error: 'omp.h' file not found
   1111 | #include <omp.h>
        |          ^~~~~~~
  1 error generated.
  [2/140] Compiling C++ object sklearn/utils/murmurhash.cpython-313-darwin.so.p/src_MurmurHash3.cpp.o
  [3/140] Compiling C object sklearn/utils/murmurhash.cpython-313-darwin.so.p/meson-generated_murmurhash.c.o
  FAILED: sklearn/utils/murmurhash.cpython-313-darwin.so.p/meson-generated_murmurhash.c.o
  /usr/bin/clang -Isklearn/utils/murmurhash.cpython-313-darwin.so.p -Isklearn/utils -I../../sklearn/utils -Isklearn -I/Users/kyra/miniforge3/envs/sklearn-env/include/python3.13 -I/usr/local/opt/libomp/include -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c11 -O3 -Wno-unused-but-set-variable -Wno-unused-function -Wno-conversion -Wno-misleading-indentation -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -Xpreprocessor -fopenmp -MD -MQ sklearn/utils/murmurhash.cpython-313-darwin.so.p/meson-generated_murmurhash.c.o -MF sklearn/utils/murmurhash.cpython-313-darwin.so.p/meson-generated_murmurhash.c.o.d -o sklearn/utils/murmurhash.cpython-313-darwin.so.p/meson-generated_murmurhash.c.o -c sklearn/utils/murmurhash.cpython-313-darwin.so.p/murmurhash.c
  sklearn/utils/murmurhash.cpython-313-darwin.so.p/murmurhash.c:1115:10: fatal error: 'omp.h' file not found
   1115 | #include <omp.h>
        |          ^~~~~~~
  1 error generated.
  [4/140] Compiling C object sklearn/utils/arrayfuncs.cpython-313-darwin.so.p/meson-generated_arrayfuncs.c.o
  FAILED: sklearn/utils/arrayfuncs.cpython-313-darwin.so.p/meson-generated_arrayfuncs.c.o
  /usr/bin/clang -Isklearn/utils/arrayfuncs.cpython-313-darwin.so.p -Isklearn/utils -I../../sklearn/utils -Isklearn -I/Users/kyra/miniforge3/envs/sklearn-env/include/python3.13 -I/usr/local/opt/libomp/include -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c11 -O3 -Wno-unused-but-set-variable -Wno-unused-function -Wno-conversion -Wno-misleading-indentation -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -Xpreprocessor -fopenmp -MD -MQ sklearn/utils/arrayfuncs.cpython-313-darwin.so.p/meson-generated_arrayfuncs.c.o -MF sklearn/utils/arrayfuncs.cpython-313-darwin.so.p/meson-generated_arrayfuncs.c.o.d -o sklearn/utils/arrayfuncs.cpython-313-darwin.so.p/meson-generated_arrayfuncs.c.o -c sklearn/utils/arrayfuncs.cpython-313-darwin.so.p/arrayfuncs.c
  sklearn/utils/arrayfuncs.cpython-313-darwin.so.p/arrayfuncs.c:1116:10: fatal error: 'omp.h' file not found
   1116 | #include <omp.h>
        |          ^~~~~~~
  1 error generated.
  [5/140] Compiling C object sklearn/_isotonic.cpython-313-darwin.so.p/meson-generated__isotonic.c.o
  FAILED: sklearn/_isotonic.cpython-313-darwin.so.p/meson-generated__isotonic.c.o
  /usr/bin/clang -Isklearn/_isotonic.cpython-313-darwin.so.p -Isklearn -I../../sklearn -I/Users/kyra/miniforge3/envs/sklearn-env/include/python3.13 -I/usr/local/opt/libomp/include -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c11 -O3 -Wno-unused-but-set-variable -Wno-unused-function -Wno-conversion -Wno-misleading-indentation -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -Xpreprocessor -fopenmp -MD -MQ sklearn/_isotonic.cpython-313-darwin.so.p/meson-generated__isotonic.c.o -MF sklearn/_isotonic.cpython-313-darwin.so.p/meson-generated__isotonic.c.o.d -o sklearn/_isotonic.cpython-313-darwin.so.p/meson-generated__isotonic.c.o -c sklearn/_isotonic.cpython-313-darwin.so.p/_isotonic.c
  sklearn/_isotonic.cpython-313-darwin.so.p/_isotonic.c:1114:10: fatal error: 'omp.h' file not found
   1114 | #include <omp.h>
        |          ^~~~~~~
  1 error generated.
  [6/140] Compiling C object sklearn/utils/_cython_blas.cpython-313-darwin.so.p/meson-generated__cython_blas.c.o
  FAILED: sklearn/utils/_cython_blas.cpython-313-darwin.so.p/meson-generated__cython_blas.c.o
  /usr/bin/clang -Isklearn/utils/_cython_blas.cpython-313-darwin.so.p -Isklearn/utils -I../../sklearn/utils -Isklearn -I/Users/kyra/miniforge3/envs/sklearn-env/include/python3.13 -I/usr/local/opt/libomp/include -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c11 -O3 -Wno-unused-but-set-variable -Wno-unused-function -Wno-conversion -Wno-misleading-indentation -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -Xpreprocessor -fopenmp -MD -MQ sklearn/utils/_cython_blas.cpython-313-darwin.so.p/meson-generated__cython_blas.c.o -MF sklearn/utils/_cython_blas.cpython-313-darwin.so.p/meson-generated__cython_blas.c.o.d -o sklearn/utils/_cython_blas.cpython-313-darwin.so.p/meson-generated__cython_blas.c.o -c sklearn/utils/_cython_blas.cpython-313-darwin.so.p/_cython_blas.c
  sklearn/utils/_cython_blas.cpython-313-darwin.so.p/_cython_blas.c:1114:10: fatal error: 'omp.h' file not found
   1114 | #include <omp.h>
        |          ^~~~~~~
  1 error generated.
  [7/140] Compiling C object sklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p/meson-generated_sparsefuncs_fast.c.o
  FAILED: sklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p/meson-generated_sparsefuncs_fast.c.o
  /usr/bin/clang -Isklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p -Isklearn/utils -I../../sklearn/utils -Isklearn -I/Users/kyra/miniforge3/envs/sklearn-env/include/python3.13 -I/usr/local/opt/libomp/include -fvisibility=hidden -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c11 -O3 -Wno-unused-but-set-variable -Wno-unused-function -Wno-conversion -Wno-misleading-indentation -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -Xpreprocessor -fopenmp -MD -MQ sklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p/meson-generated_sparsefuncs_fast.c.o -MF sklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p/meson-generated_sparsefuncs_fast.c.o.d -o sklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p/meson-generated_sparsefuncs_fast.c.o -c sklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p/sparsefuncs_fast.c
  sklearn/utils/sparsefuncs_fast.cpython-313-darwin.so.p/sparsefuncs_fast.c:1116:10: fatal error: 'omp.h' file not found
   1116 | #include <omp.h>
        |          ^~~~~~~
  1 error generated.
  [8/140] Compiling C object sklearn/utils/_openmp_helpers.cpython-313-darwin.so.p/meson-generated__openmp_helpers.c.o
  [9/140] Compiling C++ object sklearn/utils/_fast_dict.cpython-313-darwin.so.p/meson-generated__fast_dict.cpp.o
  FAILED: sklearn/utils/_fast_dict.cpython-313-darwin.so.p/meson-generated__fast_dict.cpp.o
  /usr/bin/clang++ -Isklearn/utils/_fast_dict.cpython-313-darwin.so.p -Isklearn/utils -I../../sklearn/utils -Isklearn -I/Users/kyra/miniforge3/envs/sklearn-env/include/python3.13 -I/usr/local/opt/libomp/include -fvisibility=hidden -fvisibility-inlines-hidden -fdiagnostics-color=always -DNDEBUG -Wall -Winvalid-pch -std=c++14 -O3 -ftree-vectorize -fPIC -fstack-protector-strong -O2 -pipe -stdlib=libc++ -fvisibility-inlines-hidden -fmessage-length=0 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -D_FORTIFY_SOURCE=2 -isystem /Users/kyra/miniforge3/envs/sklearn-dev/include -Xpreprocessor -fopenmp -MD -MQ sklearn/utils/_fast_dict.cpython-313-darwin.so.p/meson-generated__fast_dict.cpp.o -MF sklearn/utils/_fast_dict.cpython-313-darwin.so.p/meson-generated__fast_dict.cpp.o.d -o sklearn/utils/_fast_dict.cpython-313-darwin.so.p/meson-generated__fast_dict.cpp.o -c sklearn/utils/_fast_dict.cpython-313-darwin.so.p/_fast_dict.cpp
  sklearn/utils/_fast_dict.cpython-313-darwin.so.p/_fast_dict.cpp:1144:10: fatal error: 'omp.h' file not found
   1144 | #include <omp.h>
        |          ^~~~~~~
  1 error generated.
  [10/140] Generating 'sklearn/metrics/_dist_metrics.cpython-313-darwin.so.p/_dist_metrics.c'
  FAILED: sklearn/metrics/_dist_metrics.cpython-313-darwin.so.p/_dist_metrics.c
  /Users/kyra/miniforge3/envs/sklearn-env/bin/cython '-X language_level=3' '-X boundscheck=False' '-X wraparound=False' '-X initializedcheck=False' '-X nonecheck=False' '-X cdivision=True' '-X profile=False' --include-dir /Users/kyra/Documents/scikit-learn/build/cp313 sklearn/metrics/_dist_metrics.pyx --output-file sklearn/metrics/_dist_metrics.cpython-313-darwin.so.p/_dist_metrics.c
  
  Error compiling Cython file:
  ------------------------------------------------------------
  ...
  
  ######################################################################
  # metric mappings
  #  These map from metric id strings to class names
  METRIC_MAPPING64 = {
      'precomputed': PrecomputedDistanceMatrix64,
                     ^
  ------------------------------------------------------------
  
  sklearn/metrics/_dist_metrics.pyx:222:19: undeclared name not builtin: PrecomputedDistanceMatrix64
  
  Error compiling Cython file:
  ------------------------------------------------------------
  ...
  
  ######################################################################
  # metric mappings
  #  These map from metric id strings to class names
  METRIC_MAPPING32 = {
      'precomputed': PrecomputedDistanceMatrix32,
                     ^
  ------------------------------------------------------------
  
  sklearn/metrics/_dist_metrics.pyx:2807:19: undeclared name not builtin: PrecomputedDistanceMatrix32
  warning: sklearn/metrics/_dist_metrics.pyx:860:44: Assigning to 'float64_t *' from 'const float64_t *' discards const qualifier
  warning: sklearn/metrics/_dist_metrics.pyx:923:40: Assigning to 'float64_t *' from 'const float64_t *' discards const qualifier
  warning: sklearn/metrics/_dist_metrics.pyx:3445:44: Assigning to 'float32_t *' from 'const float32_t *' discards const qualifier
  warning: sklearn/metrics/_dist_metrics.pyx:3508:40: Assigning to 'float32_t *' from 'const float32_t *' discards const qualifier
  ninja: build stopped: subcommand failed.
  [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

@jjerphan (Member)

My two cents, briefly: you likely need to install and use the c-compiler and cxx-compiler packages from conda-forge, because Apple Clang does not support OpenMP.

@adam2392 (Member)

If you are using a Mac, I find this page helpful: https://scikit-learn.org/stable/developers/advanced_installation.html
