ENH Deprecates _pairwise attribute and adds pairwise to estimator tags #18143

thomasjpfan · 2020-08-12T00:45:16Z

Reference Issues/PRs

Related to #17806

What does this implement/fix? Explain your changes.

Currently, this PR deprecates the _pairwise attribute and places it into estimator tags.

amueller · 2020-08-12T19:50:53Z

Thank you, I think this is an important cleanup.
Even though this property is private, I think we need to make this backward compatible. Otherwise downstream packages will have silent bugs. We prefixed the _pairwise property with an underscore, but it's actually essential for third-party packages to provide accurate cross-validation results.

jnothman · 2020-08-12T20:55:07Z

I agree this should be deprecated.

thomasjpfan · 2020-08-14T18:48:36Z

Updated PR to deprecate the _pairwise attribute.

glemaitre

LGTM

glemaitre · 2020-08-18T12:34:21Z

sklearn/base.py

@@ -37,6 +37,7 @@
    'binary_only': False,
    'requires_fit': True,
    'requires_y': False,
+    'pairwise': False


Suggested change

-    'pairwise': False
+    'pairwise': False,

To have one less line in the diff for the next tag :)

glemaitre · 2020-08-18T12:36:06Z

sklearn/kernel_ridge.py

    @property
    def _pairwise(self):
-        return self.kernel == "precomputed"
+        return self.kernel == 'precomputed'


Double quotes were fine :)

amueller · 2020-08-18T19:17:22Z

doc/developers/develop.rst

+    similar methods consists of pairwise measures over samples rather than a
+    feature representation for each sample.  It is usually `True` where an
+    estimator has a `metric` or `affinity` or `kernel` parameter with value
+    'precomputed'. Its primary purpose is that when a :term:`meta-estimator`


when using cross-validation or when a meta-estimator?

I don't think GridSearchCV is the first meta-estimator people think of and I think cross_val_score etc are an important use-case.

amueller · 2020-08-18T19:21:40Z

I don't think the deprecation is correct. What we want for a third party is that if they implement _pairwise but not the pairwise tag, then we should behave as we did before but warn. Right now, we behave as we did before but don't warn, right? So that doesn't really help.
Also, we currently ignore the pairwise tag?
I'm a bit confused, or maybe I misread what the PR is doing?

jnothman · 2020-08-18T22:07:31Z

It doesn't hurt to deprecate the attribute too, but I agree that deprecating the behaviour is the main thing

thomasjpfan · 2020-08-19T00:04:39Z

> Also, we currently ignore the pairwise tag?

I updated the estimator_checks to actually use the tag now. This means some of the estimator checks will fail if the tag is not set correctly.

> What we want for a third party is that if they implement _pairwise but not the pairwise tag, then we should behave as we did before but warn. Right now, we behave as we did before but don't warn, right?

If a third-party estimator inherits from BaseEstimator, it will have a pairwise=False tag by default. We could warn in _safe_split, but that means we would also be warning on our own estimators, which have the tags properly defined. I do not think this is a good user experience. I would prefer to remove the _pairwise attribute from our own estimators so we can have _safe_split do this:

if hasattr(estimator, '_pairwise'):
    pairwise = estimator._pairwise
else:
    pairwise = estimator._get_tags().get("pairwise", False)

which would be backward compatible with third-party estimators. The downside of this approach is that third-party estimators that use _pairwise would not work anymore. What do you think, @amueller?
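To make the default-tag point concrete, here is a minimal sketch of how an estimator might declare the tag instead of the `_pairwise` property. `BaseEstimatorSketch` is a hypothetical stand-in that only mimics the tag lookup (a `pairwise=False` default that subclasses override through a `_more_tags` hook); it is not scikit-learn's `BaseEstimator`, and both example classes are invented for illustration.

```python
# Hypothetical stand-in for the tag lookup; NOT scikit-learn's BaseEstimator.
_DEFAULT_TAGS = {"requires_y": False, "pairwise": False}


class BaseEstimatorSketch:
    def _get_tags(self):
        # Start from the defaults, then let each class in the MRO
        # (base classes first) override them through _more_tags.
        tags = dict(_DEFAULT_TAGS)
        for klass in reversed(type(self).__mro__):
            if "_more_tags" in vars(klass):
                tags.update(klass._more_tags(self))
        return tags


class PlainEstimator(BaseEstimatorSketch):
    """Declares no tags, so it inherits pairwise=False."""


class PrecomputedKernelEstimator(BaseEstimatorSketch):
    def __init__(self, kernel="rbf"):
        self.kernel = kernel

    def _more_tags(self):
        # The tag carries the same information as the old _pairwise property.
        return {"pairwise": self.kernel == "precomputed"}


default_tag = PlainEstimator()._get_tags()["pairwise"]
rbf_tag = PrecomputedKernelEstimator()._get_tags()["pairwise"]
precomputed_tag = PrecomputedKernelEstimator(
    kernel="precomputed"
)._get_tags()["pairwise"]
```

In this sketch an estimator that never mentions the tag still reports `pairwise=False`, which is why a tag-only check cannot tell "explicitly not pairwise" apart from "tag never set".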

glemaitre · 2020-08-19T07:02:19Z

> I updated the estimator_checks to actually use the tag now. This means some of the estimator checks will fail if the tag is not set correctly.

Isn't that something we want to avoid as part of the deprecation? I mean that third-party code will fail if it uses check_estimator.

glemaitre · 2020-08-19T07:09:31Z

Uhm actually, we have the default flag so the issue would only be if one does not inherit from BaseEstimator.
Thinking about it, should we detect the case where the tag is not equal to _pairwise (if present) and warn in this case? It would be an estimator for which the tag has not been set.

thomasjpfan · 2020-08-19T13:48:23Z

> Thinking about it, should we detect the case where the tag is not equal to _pairwise (if present) and warn in this case? It would be an estimator for which the tag has not been set.

Should we make a warning every time a call to _safe_split is made?

I think the issue is that _safe_split has to continue using _pairwise for backward compatibility, and the estimators need to deprecate _pairwise for backward compatibility as well. This means that scikit-learn would emit deprecation warnings even when the library is used by itself.

amueller · 2020-08-21T19:31:51Z

I had discussed this with @thomasjpfan two days ago, not sure if that was before or after he commented.

My suggestion is to check both the tag and the _pairwise attribute in _safe_split, with the following rules:

If both _pairwise and the tag are present and consistent, use that value, don't warn (the right thing will keep working indefinitely).
If only _pairwise is present and pairwise is not False, do a deprecation warning and use the value of _pairwise.
If both are present and they are inconsistent, use the value of _pairwise but do a deprecation warning. This ensures consistent behavior of estimators that inherit the default value of the tag but implement _pairwise correctly.

The last one is maybe a bit surprising but I think should be fine? We need to catch any deprecation warnings raised by accessing _pairwise but then throw one if appropriate.

This requires that all our estimators implement the tag, and backwards-compatibility for the estimators (in case someone else implemented cross-validation, maybe a bit far-fetched but whatever) dictates that we still have the _pairwise attribute.
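The rules above can be sketched as a small helper. This is an illustration under stated assumptions, not the code merged in this PR: `is_pairwise_sketch` and the three demo classes are hypothetical names, the tag is read through `_get_tags()` as in the earlier snippet, and `FutureWarning` is assumed as the deprecation category.

```python
import warnings


def is_pairwise_sketch(estimator):
    """Return the pairwise flag following the rules listed above (sketch)."""
    # Accessing a deprecated _pairwise property may itself warn; silence
    # that here so at most one warning is raised, and only if appropriate.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        has_attribute = hasattr(estimator, "_pairwise")
        attribute = getattr(estimator, "_pairwise", False)

    # Assumption: a missing tag is treated like the default, pairwise=False.
    get_tags = getattr(estimator, "_get_tags", None)
    tag = get_tags().get("pairwise", False) if get_tags is not None else False

    if has_attribute:
        if attribute != tag:
            # Attribute and tag disagree: keep the old behaviour, but warn.
            warnings.warn(
                "_pairwise is deprecated; set the pairwise estimator tag "
                "instead.",
                FutureWarning,
            )
        return attribute
    # No attribute at all: the tag is the only source of truth.
    return tag


class Consistent:
    """Attribute and tag agree, so no warning is expected."""
    _pairwise = True

    def _get_tags(self):
        return {"pairwise": True}


class LegacyOnly:
    """Implements _pairwise but only inherits the default tag."""
    _pairwise = True

    def _get_tags(self):
        return {"pairwise": False}


class TagOnly:
    """Already migrated: only the tag is defined."""

    def _get_tags(self):
        return {"pairwise": True}
```

With these demo classes, only `LegacyOnly` triggers a warning, while all three are still treated as pairwise.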

jnothman · 2020-08-22T23:08:51Z

Sounds good.

amueller

looks good apart from nitpick for test.

amueller · 2020-08-26T19:25:44Z

doc/developers/develop.rst

@@ -226,6 +226,11 @@ the dataset, e.g. when ``X`` is a precomputed kernel matrix. Specifically,
 the :term:`_pairwise` property is used by ``utils.metaestimators._safe_split``
 to slice rows and columns.

+.. deprecated:: 0.24
+
+    The _pairwise attribute is deprecated in 0.24. From 0.26 and onward,


Suggested change

-    The _pairwise attribute is deprecated in 0.24. From 0.26 and onward,
+    The _pairwise attribute is deprecated in 0.24. From 0.26 onward,

amueller · 2020-08-26T19:26:04Z

doc/glossary.rst

+                .. deprecated:: 0.24
+
+                    The _pairwise attribute is deprecated in 0.24. From 0.26
+                    and onward, the `pairwise` estimator tag should be used


Suggested change

-                    and onward, the `pairwise` estimator tag should be used
+                    onward, the `pairwise` estimator tag should be used

amueller · 2020-08-26T19:28:18Z

sklearn/manifold/_mds.py

    @property
    def _pairwise(self):
-        return self.kernel == "precomputed"
+        return self.metric == "precomputed"


was this a bug?!

amueller · 2020-08-26T19:32:37Z

sklearn/utils/metaestimators.py

@@ -189,7 +195,7 @@ def _safe_split(estimator, X, y, indices, train_indices=None):
        Indexed targets.

    """
-    if getattr(estimator, "_pairwise", False):
+    if _is_pairwise(estimator):


This here is the main usage right? I think it might be nice to have a direct test of this? Or of cross_validate or anything like that? Right now you're only testing the helper (extensively) but you're not testing anything that actually uses the helper, right?

It might be a bit overkill to test all of these places but maybe one?

This was done in cfeaa09
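For context on why this check matters for cross-validation, here is a simplified sketch of the row-and-column slicing that a pairwise estimator needs. `safe_split_sketch` is a hypothetical, list-of-lists illustration and not the library's `_safe_split`; the `pairwise` argument stands in for the result of the `_is_pairwise(estimator)` check in the diff above.

```python
def safe_split_sketch(X, indices, train_indices=None, pairwise=False):
    """Subset X for one CV split (illustrative, list-of-lists only)."""
    if not pairwise:
        # Feature matrix: keep the selected rows and all columns.
        return [X[i] for i in indices]
    if any(len(row) != len(X) for row in X):
        raise ValueError("X should be a square kernel matrix")
    # Precomputed kernel: rows are the samples being selected, columns
    # are the training samples they are compared against.
    columns = indices if train_indices is None else train_indices
    return [[X[i][j] for j in columns] for i in indices]


# 4x4 precomputed kernel; entry [i][j] encodes "sample i vs sample j".
K = [[10 * i + j for j in range(4)] for i in range(4)]
train, test = [0, 1, 2], [3]

K_train = safe_split_sketch(K, train, pairwise=True)
# → [[0, 1, 2], [10, 11, 12], [20, 21, 22]]
K_test = safe_split_sketch(K, test, train_indices=train, pairwise=True)
# → [[30, 31, 32]]
```

If the flag were wrongly `False`, the test fold would keep all four columns, including the test sample's similarity to itself, which is the kind of silent error discussed earlier in the thread.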

amueller · 2020-09-23T19:24:05Z

do you wanna fix conflicts?

jnothman

otherwise lgtm

jnothman · 2020-09-24T22:02:00Z

sklearn/model_selection/tests/test_validation.py

+
+# TODO: Remove in 0.26 when the _pairwise attribute is removed
+def test_validation_pairwise():
+    # Correctly warns with pairwise tags


Update this comment

thomasjpfan · 2020-10-05T17:00:18Z

Synced this PR up with master.

ENH Adds pairwise tag to estimator tags

f1b26fb

DOC Fix reference

de87e6e

thomasjpfan added 5 commits August 12, 2020 18:52

Merge remote-tracking branch 'upstream/master' into pairwise_estimato…

1b2a12c

…r_tags

Merge remote-tracking branch 'upstream/master' into pairwise_estimato…

ce602c1

…r_tags

MRG

3ce1b71

WIP Adds another test

f8e74ed

CLN Deprecates _pairwise

a2132c0

thomasjpfan changed the title ~~ENH Adds pairwise tag to estimator tags~~ ENH Deprecates _pairwise attribute and adds pairwise to estimator tags Aug 14, 2020

Merge remote-tracking branch 'upstream/master' into pairwise_estimato…

2064cc9

…r_tags

glemaitre approved these changes Aug 18, 2020

amueller reviewed Aug 18, 2020

thomasjpfan added 3 commits August 18, 2020 19:17

CLN Uses estimator tags

31768db

Merge remote-tracking branch 'upstream/master' into pairwise_estimato…

5228b23

…r_tags

BUG Fixes

93733f8

CLN Uses pairwise estimator tags

2a3bafb

thomasjpfan added 4 commits August 25, 2020 16:37

MNT Carefully deprecates pairwise

7b43694

DOC Improve grammer

9c932eb

STY Minor styling

9f3f8c8

BUG Fix for estimators with

f88eb56

amueller reviewed Aug 26, 2020

BUG Fix

9db5c2a

thomasjpfan mentioned this pull request Aug 27, 2020

FIX Correctly sets MDS pairwise attribute #18278

Merged

CLN Adds tests for cross_validate and pairwise estimators

cfeaa09

thomasjpfan added 2 commits September 24, 2020 12:41

MNT Only assign if the issue is open

dd578f6

Merge branch 'unassigned_help_wanted' into pairwise_estimator_tags

def1ef1

jnothman reviewed Sep 24, 2020

jnothman approved these changes Sep 24, 2020

jnothman mentioned this pull request Sep 24, 2020

MNT Deprecates _estimator_type and replaces by a estimator tag #17806

Closed

thomasjpfan added 4 commits September 29, 2020 14:22

Merge remote-tracking branch 'upstream/master' into pairwise_estimato…

caa2ee2

…r_tags

DOC Update comment in test

ba7f129

Merge remote-tracking branch 'upstream/master' into pairwise_estimato…

80552c8

…r_tags

TST Fixes test

723cda1

jnothman merged commit 02fa8f1 into scikit-learn:master Oct 7, 2020

amrcode pushed a commit to amrcode/scikit-learn that referenced this pull request Oct 19, 2020

ENH Deprecates _pairwise attribute and adds pairwise to estimator tags (

62bc35e

scikit-learn#18143)

jayzed82 pushed a commit to jayzed82/scikit-learn that referenced this pull request Oct 22, 2020

ENH Deprecates _pairwise attribute and adds pairwise to estimator tags (

71b52d2

scikit-learn#18143)

thomasjpfan mentioned this pull request Nov 9, 2020

Do we have a compelling reason to enforce tags? #18798

Open

jrbourbeau mentioned this pull request Nov 11, 2020

Update pairwise check for scikit-learn 0.24 dask/dask-ml#755

Merged

sebp added a commit to sebp/scikit-survival that referenced this pull request Mar 7, 2021

Remove deprecated _pairwise property

58cf21d

See scikit-learn/scikit-learn#18143

This was referenced Mar 12, 2022

[MRG] Fix DBSCAN is missing _pairwise property #11453

Closed

DBSCAN is missing _pairwise property #11432

Closed
