RFC should the scikit-learn metrics return a Python scalar or a NumPy scalar? #27339

Closed
glemaitre opened this issue Sep 11, 2023 · 12 comments · Fixed by #30575

Comments

@glemaitre
Member

While working on the representation changes imposed by NEP51, I found out that we recently made accuracy_score return a Python scalar while, up to now, the other metrics have been returning NumPy scalars.

This change was made due to the array API work:

def _weighted_sum(sample_score, sample_weight, normalize=False, xp=None):
# XXX: this function accepts Array API input but returns a Python scalar
# float. The call to float() is convenient because it removes the need to
# move back results from device to host memory (e.g. calling `.cpu()` on a
# torch tensor). However, this might interact in unexpected ways (break?)
# with lazy Array API implementations. See:
# https://github.com/data-apis/array-api/issues/642

I think we have reached a point where we should make the output of our metrics consistent while also anticipating future requirements: as the comment indicates, calling float() introduces a synchronization point, which might not be the best strategy for lazy computation.
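
To illustrate the current inconsistency, a minimal sketch (the exact return types depend on the installed scikit-learn and NumPy versions):

from sklearn.metrics import accuracy_score, mean_squared_error

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# accuracy_score now returns a built-in Python float...
print(type(accuracy_score(y_true, y_pred)))      # <class 'float'>
# ...while most other metrics still return a NumPy scalar
print(type(mean_squared_error(y_true, y_pred)))  # <class 'numpy.float64'>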

This RFC is a placeholder to discuss which strategy we should implement.

@github-actions github-actions bot added the Needs Triage Issue requires triage label Sep 11, 2023
@glemaitre glemaitre added RFC and removed Needs Triage Issue requires triage labels Sep 11, 2023
@ogrisel
Member

ogrisel commented Sep 11, 2023

As hinted by @glemaitre, the accepted NEP51 proposes to change the representation (__repr__) of NumPy scalars:

https://numpy.org/neps/nep-0051-scalar-representation.html

It is being implemented for numpy 2.0, scheduled for next year, and is causing many of our doctests to fail.
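
Concretely, the repr change looks like this (the output depends on the installed NumPy version):

import numpy as np

x = np.float64(0.5)
print(repr(x))
# NumPy 1.x prints:           0.5
# NumPy 2.0 (NEP51) prints:   np.float64(0.5)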

To address NEP51, we could go one of several ways.

[solution-a] Do nothing to our scoring metrics (continue returning NumPy scalars)

  • accept that this is a more informative __repr__ and still decide to use NumPy scalars in the output of our metric functions in scikit-learn. Users would get more verbose output when using scikit-learn in interactive mode (e.g. in a Jupyter notebook session), but at least it would be more explicit about, for instance, the precision of the floating point computation,
  • we could either update our doctests so that they pass on numpy 2.0.0.dev0+ and stop running the doctests on numpy < 2.0 CIs, or alternatively keep the doctests unchanged for now, stop running them on the [scipy-dev] build, and re-explore this decision once numpy 2.0 is out (or at least an official beta).
  • to add array API support, we might need to change our tests to call float() manually on the result of accuracy_score and the like when needed (e.g. to trigger a lazy computation, move the result to CPU, and be able to compare it to another Python scalar value); see the sketch below.
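
For example, a hypothetical test adjustment under solution (a) could look like this (a sketch only, not existing test code):

from sklearn.metrics import accuracy_score

score = accuracy_score([0, 1, 1, 0], [0, 1, 0, 0])
# explicit conversion: triggers any pending lazy computation and moves
# the scalar from device to host memory before the comparison
assert float(score) == 0.75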

[solution-b] Change our scoring metrics to always return a Python scalar

So we could decide to make our scalar metric functions always call float() (and maybe int() in some cases) internally.

This means that the information about the precision level of the computation would be lost.

This also means that eager/blocking execution semantics are forced when calling a metric function with lazy Array API compatible inputs. It would also always move the resulting scalar value back to the CPU without the user having to perform a library-specific operation.

But this would make our doctest suite pass unchanged on both numpy 1.x and numpy 2.x+.

It is also less verbose for scikit-learn users calling accuracy_score and similar functions in their notebooks.
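
A minimal sketch of what solution (b) implies inside a metric function (hypothetical simplified code, not the actual scikit-learn implementation):

import numpy as np

def toy_accuracy(y_true, y_pred):
    # the final float() call discards the NumPy dtype, forces eager
    # evaluation on lazy backends, and moves the value to host memory
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

print(type(toy_accuracy([0, 1, 1, 0], [0, 1, 0, 0])))  # <class 'float'>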

[solution-c] Add a new flag return_as_python_scalar=True to all metrics

This way we would do the conversion to a Python scalar by default but give control to the user, should they need to access the dtype of the result or keep the computation as lazy as possible when using the Array API.
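
The hypothetical API would look like this (return_as_python_scalar is not an existing scikit-learn parameter, it is the flag proposed above):

from sklearn.metrics import accuracy_score

y_true, y_pred = [0, 1, 1, 0], [0, 1, 0, 0]

# default: convert to a Python float, as in solution (b)
score = accuracy_score(y_true, y_pred)

# opt out (hypothetical flag): keep the library-native scalar,
# preserving its dtype and any laziness of the backend
raw_score = accuracy_score(y_true, y_pred, return_as_python_scalar=False)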

More context about the possibility of lazy computation via Array API in scikit-learn

Note that the fit method of estimators that use an iterative solver with a tol-based stopping criterion will by construction need to trigger eager computation internally anyway, so we cannot really hope to express 100% lazy machine learning pipelines. More precisely:

est = IterativeEstimator(tol=1e-4).fit(X_lazy_array_train, y_lazy_array_train)
test_score = est.score(X_lazy_array_test, y_lazy_array_test)

even if test_score is kept as a lazy scalar array until float(test_score) is called, a significant computation would already have been triggered (implicit eager evaluation) when calling fit, due to the tol-based checks at the end of each iteration of the internal solver.

Also note:

  • I opened an issue to discuss lazy Array API in scikit-learn: Array API support with lazy evaluation (#26724)
  • Right now, dask does not really implement the Array API, but it could quite quickly via array-api-compat: Dask support (data-apis/array-api-compat#17)
  • JAX's own Array API support might be enough for some scikit-learn estimators (e.g. PCA) but I haven't tried it, and we have not included JAX in our array API compliance test framework yet.
  • PyTorch uses eager computation semantics by default, unless the code is wrapped in a function decorated with torch.compile. The compiler attempts to discover blocks of code that can be executed natively (with lazy evaluation semantics) and automatically issues synchronizing blocking calls to the Python interpreter whenever it's needed (see the sketch below).
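
For reference, a minimal torch.compile sketch (requires PyTorch 2.x; illustration only):

import torch

@torch.compile
def normalized(x):
    # within the compiled region, operations can be fused and executed
    # with deferred semantics; synchronization happens when the result
    # is consumed from Python
    return x / x.sum()

print(normalized(torch.arange(4.0)))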

@ogrisel ogrisel changed the title RFC does the scikit-learn metric should return a Python scalar or a NumPy scalar RFC should the scikit-learn metrics return a Python scalar or a NumPy scalar Sep 11, 2023
@ogrisel ogrisel changed the title RFC should the scikit-learn metrics return a Python scalar or a NumPy scalar RFC should the scikit-learn metrics return a Python scalar or a NumPy scalar? Sep 11, 2023
@betatim
Member

betatim commented Sep 12, 2023

@ogrisel it seems some of your comment is misformatted or missing words? Could you take a look?

@ogrisel
Member

ogrisel commented Sep 13, 2023

@betatim I edited my comment.

@betatim
Member

betatim commented Sep 18, 2023

Thanks a lot!

I think my least favourite option is (c). It feels like we are delegating the complexity to our users, who probably have even less of a clue about "the right thing to do" than we do.

I like (b) and think that it isn't a big deal that something like Estimator.score can't be lazy. Trying to make it lazy seems to add a lot of complexity (you need a LazyScalar and to explain it to users), and I can't think of a use-case where making the computation lazy would make it more efficient. Compare this to, for example, reading data from a file, where you can make the reading more efficient by knowing that certain columns are never used.

I also wonder if we need to specify that the type of the return value of Estimator.score is a Python scalar, rather than relying on duck typing (so that it could be either a Python scalar or a NumPy scalar).

@adrinjalali
Member

I also like (b) for the same reasons as @betatim mentions.

@betatim
Member

betatim commented Oct 13, 2023

There seems to be some amount of consensus and no new comments for a while. Should we try to wrap this up?

@glemaitre what do you think of the options Olivier listed? Do you also like option (b)?

@glemaitre
Member Author

Thanks @betatim for keeping track of this issue. Option (b) looks like the right trade-off right now.
So we can therefore settle on converting the output to a Python scalar.

@ckosten

ckosten commented Nov 7, 2023

/take

@adrinjalali
Member

@ckosten this is not a good issue to begin with. You can look for good first issues and help wanted tags to find one.

@ckosten

ckosten commented Dec 15, 2023

It was assigned as a beginner problem in a scikit-learn workshop recently...

@ckosten this is not a good issue to begin with. You can look for good first issues and help wanted tags to find one.

@ogrisel
Member

ogrisel commented Jun 7, 2024

Shall we close this issue and open a matching meta-issue to track the remaining work to do?

It might be redundant with the array API meta-issue at #26024, which is already well under way, since most scalar-returning metric functions will likely need such a treatment to have their tests pass on PyTorch with CUDA.

Maybe we can leave it open for now and close it once all the metric functions referenced in #26024 have been addressed.

@adrinjalali
Member

most scalar-returning metric functions will likely need such a treatment to have their tests pass on PyTorch with CUDA.

To be clear, you mean for them to return a scalar? Then yeah I'm happy to have this closed.

@github-project-automation github-project-automation bot moved this from Discussion to Done in Array API Feb 3, 2025