ENH Deprecates _pairwise attribute and adds pairwise to estimator tags #18143

Merged
merged 24 commits into from
Oct 7, 2020

@thomasjpfan
Member

thomasjpfan commented Aug 12, 2020

Reference Issues/PRs

Related to #17806

What does this implement/fix? Explain your changes.

Currently, this PR deprecates the _pairwise attribute and places it into estimator tags.
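
As a rough illustration of what moving `_pairwise` into the estimator tags could look like for an estimator author, here is a minimal sketch. `MiniBase` and `MiniKernelModel` are hypothetical stand-ins, not scikit-learn classes; the `_more_tags` and `_get_tags` names mirror the tag mechanism discussed in this PR, but the merging logic is a simplified assumption.

```python
# Hypothetical stand-in for a base estimator; not scikit-learn's actual code.
_DEFAULT_TAGS = {"requires_fit": True, "requires_y": False, "pairwise": False}


class MiniBase:
    def _more_tags(self):
        # Subclasses override this to change individual tags.
        return {}

    def _get_tags(self):
        # Start from the defaults and let the estimator override them.
        tags = dict(_DEFAULT_TAGS)
        tags.update(self._more_tags())
        return tags


class MiniKernelModel(MiniBase):
    def __init__(self, kernel="rbf"):
        self.kernel = kernel

    def _more_tags(self):
        # The tag depends on a constructor parameter, like the old property.
        return {"pairwise": self.kernel == "precomputed"}


default_tags = MiniKernelModel()._get_tags()
precomputed_tags = MiniKernelModel(kernel="precomputed")._get_tags()
```

In this sketch the tag is computed per instance, so it tracks the `kernel` parameter the same way the `_pairwise` property did.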

@amueller
Member

Thank you, I think this is an important cleanup.
Even though this property is private, I think we need to make this backward compatible. Otherwise downstream packages will have silent bugs. We prefaced the _pairwise property with an underscore, but it's actually essential for third party packages to provide accurate cross-validation results.

@jnothman
Member

I agree this should be deprecated.

@thomasjpfan changed the title from "ENH Adds pairwise tag to estimator tags" to "ENH Deprecates _pairwise attribute and adds pairwise to estimator tags" on Aug 14, 2020
@thomasjpfan
Member Author

Updated PR to deprecate the _pairwise attribute.

Member

@glemaitre glemaitre left a comment

LGTM

sklearn/base.py Outdated
@@ -37,6 +37,7 @@
     'binary_only': False,
     'requires_fit': True,
     'requires_y': False,
+    'pairwise': False
Member

Suggested change
-    'pairwise': False
+    'pairwise': False,

Member

To have one less line in the diff for the next tag :)

     @property
     def _pairwise(self):
-        return self.kernel == "precomputed"
+        return self.kernel == 'precomputed'
Member

Double quotes were fine :)

similar methods consists of pairwise measures over samples rather than a
feature representation for each sample. It is usually `True` where an
estimator has a `metric` or `affinity` or `kernel` parameter with value
'precomputed'. Its primary purpose is that when a :term:`meta-estimator`
Member

when using cross-validation or when a meta-estimator?

I don't think GridSearchCV is the first meta-estimator people think of and I think cross_val_score etc are an important use-case.
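
To make the cross-validation use case concrete: with a precomputed kernel, `X` is an n_samples by n_samples matrix, so a split has to select columns as well as rows. A minimal sketch with plain Python lists follows; `split_pairwise` is a hypothetical helper for illustration, not the scikit-learn function.

```python
def split_pairwise(K, indices, train_indices):
    """Select rows `indices` and columns `train_indices` from square matrix K."""
    return [[K[i][j] for j in train_indices] for i in indices]


# Toy 4x4 "kernel" where K[i][j] encodes the sample pair (i, j) as 10*i + j.
K = [[10 * i + j for j in range(4)] for i in range(4)]
train, test = [0, 1, 2], [3]

K_train = split_pairwise(K, train, train)  # train samples vs. train samples
K_test = split_pairwise(K, test, train)    # test samples vs. train samples

# Row-only slicing keeps the column for held-out sample 3 in the training data.
K_train_rows_only = [K[i] for i in train]
```

If the splitter does not know the estimator is pairwise, it falls back to the row-only behaviour, which is the kind of silent bug mentioned earlier in this thread.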

@amueller
Member

I don't think the deprecation is correct. What we want for a third party is that if they implement _pairwise but not the pairwise tag, then we should behave as we did before but warn. Right now, we behave as we did before but don't warn, right? So that doesn't really help.
Also, we currently ignore the pairwise tag?
I'm a bit confused, or maybe I misread what the PR is doing?

@jnothman
Member

It doesn't hurt to deprecate the attribute too, but I agree that deprecating the behaviour is the main thing

@thomasjpfan
Member Author

Also, we currently ignore the pairwise tag?

I updated the estimator_checks to actually use the tag now. This means some of the estimator checks will fail if the tag is not set correctly.

What we want for a third party is that if they implement _pairwise but not the pairwise tag, then we should behave as we did before but warn. Right now, we behave as we did before but don't warn, right?

If a third party estimator inherits from BaseEstimator, it will have a pairwise=False tag by default. We could warn in _safe_split, but that means we would also be warning on our own estimators, which have the tags properly defined. I do not think this is a good user experience. I would prefer to remove the _pairwise attribute from our own estimators so we can have _safe_split do this:

if hasattr(estimator, '_pairwise'):
    pairwise = estimator._pairwise
else:
    pairwise = estimator._get_tags().get("pairwise", False)

which would be backward compatible with third party estimators. The downside of this approach is that third party estimators that use _pairwise would not work anymore. What do you think, @amueller?

@glemaitre
Member

I updated the estimator_checks to actually use the tag now. This means some of the estimator checks will fail if the tag is not set correctly.

Isn't that something we want to avoid as part of the deprecation? I mean that third-party code will fail if it uses check_estimator.

@glemaitre
Member

Uhm actually, we have the default flag so the issue would only be if one does not inherit from BaseEstimator.
Thinking about it, should we detect the case where the tag is not equal to _pairwise (if present) and warn in that case? That would be an estimator for which the tag has not been set.

@thomasjpfan
Member Author

Thinking about it, should we detect the case where the tag is not equal to _pairwise (if present) and warn in that case? That would be an estimator for which the tag has not been set.

Should we make a warning every time a call to _safe_split is made?

I think the issue is that _safe_split has to continue using _pairwise for backward compatibility and the estimators need to deprecate _pairwise for backward compatibility. This means that scikit-learn would emit deprecation warnings when using the library by itself.
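
The tension can be sketched as follows: once an estimator's own `_pairwise` property emits a deprecation warning, any internal code that still reads the attribute triggers it too. `LegacyEstimator` and `naive_reader` are hypothetical names, and `FutureWarning` is assumed as the warning category.

```python
import warnings


class LegacyEstimator:
    """Hypothetical estimator whose `_pairwise` property is deprecated."""

    def __init__(self, metric="euclidean"):
        self.metric = metric

    @property
    def _pairwise(self):
        warnings.warn("_pairwise is deprecated", FutureWarning)
        return self.metric == "precomputed"


def naive_reader(estimator):
    # Stand-in for library code that still reads the attribute directly.
    return getattr(estimator, "_pairwise", False)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = naive_reader(LegacyEstimator(metric="precomputed"))

n_warnings = len(caught)
```

Here the user did nothing deprecated, yet the recorded list contains a warning, which is the poor user experience being described.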

@amueller
Member

amueller commented Aug 21, 2020

I had discussed this with @thomasjpfan two days ago, not sure if that was before or after he commented.

My suggestion is to check in _safe_split on both the tag and the _pairwise attribute and

  • If both _pairwise and the tag are present and consistent, use that value, don't warn (the right thing will keep working indefinitely).
  • If only _pairwise is present and pairwise is not False, do a deprecation warning and use the value of _pairwise.
  • If both are present and they are inconsistent, use the value of _pairwise but do a deprecation warning. This ensures consistent behavior of estimators that inherit the default value of the tag but implement _pairwise correctly.

The last one is maybe a bit surprising but I think should be fine? We need to catch any deprecation warnings raised by accessing _pairwise but then throw one if appropriate.

This requires that all our estimators implement the tag, and backwards-compatibility for the estimators (in case someone else implemented cross-validation, maybe a bit far-fetched but whatever) dictates that we still have the _pairwise attribute.
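
The rules above could be sketched roughly like this. `is_pairwise`, `Consistent`, and `LegacyOnly` are hypothetical names for illustration, `FutureWarning` is assumed as the warning category, and falling back to the tag when `_pairwise` is absent is an assumption taken from the snippet earlier in this thread. This is not necessarily the helper that was merged.

```python
import warnings


def is_pairwise(estimator):
    """Hypothetical helper sketching the three rules above."""
    # Silence any deprecation warning raised by merely reading `_pairwise`.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", FutureWarning)
        has_attr = hasattr(estimator, "_pairwise")
        attr = getattr(estimator, "_pairwise", False)

    get_tags = getattr(estimator, "_get_tags", None)
    tag = get_tags().get("pairwise") if get_tags is not None else None

    if not has_attr:
        # No legacy attribute: rely on the tag, defaulting to False.
        return bool(tag)
    if tag is None:
        # Only the attribute exists: warn only when it is not False.
        should_warn = bool(attr)
    else:
        # Both exist: warn only when they disagree.
        should_warn = attr != tag
    if should_warn:
        warnings.warn(
            "_pairwise is deprecated; set the 'pairwise' estimator tag instead",
            FutureWarning,
        )
    return attr


class Consistent:
    _pairwise = True

    def _get_tags(self):
        return {"pairwise": True}


class LegacyOnly:
    # Inherits a default tag of False but implements `_pairwise` correctly.
    _pairwise = True

    def _get_tags(self):
        return {"pairwise": False}
```

With these toy classes, `is_pairwise(Consistent())` returns the shared value silently, while `is_pairwise(LegacyOnly())` returns the attribute's value and emits one warning.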

@jnothman
Member

jnothman commented Aug 22, 2020 via email

Member

@amueller amueller left a comment

looks good apart from nitpick for test.

@@ -226,6 +226,11 @@ the dataset, e.g. when ``X`` is a precomputed kernel matrix. Specifically,
the :term:`_pairwise` property is used by ``utils.metaestimators._safe_split``
to slice rows and columns.

.. deprecated:: 0.24

The _pairwise attribute is deprecated in 0.24. From 0.26 and onward,
Member

Suggested change
-The _pairwise attribute is deprecated in 0.24. From 0.26 and onward,
+The _pairwise attribute is deprecated in 0.24. From 0.26 onward,

doc/glossary.rst Outdated
.. deprecated:: 0.24

The _pairwise attribute is deprecated in 0.24. From 0.26
and onward, the `pairwise` estimator tag should be used
Member

Suggested change
-and onward, the `pairwise` estimator tag should be used
+onward, the `pairwise` estimator tag should be used

     @property
     def _pairwise(self):
-        return self.kernel == "precomputed"
+        return self.metric == "precomputed"
Member

was this a bug?!

@@ -189,7 +195,7 @@ def _safe_split(estimator, X, y, indices, train_indices=None):
         Indexed targets.

     """
-    if getattr(estimator, "_pairwise", False):
+    if _is_pairwise(estimator):
Member

This here is the main usage right? I think it might be nice to have a direct test of this? Or of cross_validate or anything like that? Right now you're only testing the helper (extensively) but you're not testing anything that actually uses the helper, right?

It might be a bit overkill to test all of these places but maybe one?

Member Author

This was done in cfeaa09

@amueller
Member

do you wanna fix conflicts?

Member

@jnothman jnothman left a comment

otherwise lgtm


# TODO: Remove in 0.26 when the _pairwise attribute is removed
def test_validation_pairwise():
    # Correctly warns with pairwise tags
Member

Update this comment

@thomasjpfan
Member Author

Synced this PR up with master.
