MNT Expose allow_nan tag in bagging #25506

Merged
merged 6 commits into from
Feb 5, 2023
4 changes: 4 additions & 0 deletions doc/whats_new/v1.3.rst
@@ -136,6 +136,10 @@ Changelog
  scikit-learn 1.3: retraining with scikit-learn 1.3 is required.
  :pr:`25186` by :user:`Felipe Breve Siola <fsiola>`.

- |Enhancement| :class:`ensemble.BaggingClassifier` and
  :class:`ensemble.BaggingRegressor` expose the `allow_nan` tag from the
  underlying estimator. :pr:`25506` by `Thomas Fan`_.

:mod:`sklearn.exception`
........................
- |Feature| Added :class:`exception.InconsistentVersionWarning` which is raised
16 changes: 16 additions & 0 deletions sklearn/ensemble/_bagging.py
@@ -23,6 +23,7 @@
from ..utils.random import sample_without_replacement
from ..utils._param_validation import Interval, HasMethods, StrOptions
from ..utils.validation import has_fit_parameter, check_is_fitted, _check_sample_weight
from ..utils._tags import _safe_tags
from ..utils.parallel import delayed, Parallel


@@ -981,6 +982,14 @@ def decision_function(self, X):

        return decisions

    def _more_tags(self):
        if self.estimator is None:
            estimator = DecisionTreeClassifier()
        else:
            estimator = self.estimator
Comment on lines +986 to +989
Member

I wonder whether we should do something similar to available_if: delegate first to estimator_, then fall back to estimator if the bagging estimator is not fitted.

I am thinking of the following case:

  • fit a bagging estimator supporting nan
  • set estimator to an estimator that does not support nan

At this stage, the fitted model can still handle nan when predict is called.
I am not sure what the expected behaviour of the tag is there.

Member Author

@thomasjpfan, Jan 30, 2023

Assuming estimator_ does support nan and estimator does not support nan, then bagging.fit would not support nan. In this sense, Bagging* should get the allow_nan tag from estimator. Getting the tag from estimator_ would give the incorrect tag.


        return {"allow_nan": _safe_tags(estimator, "allow_nan")}
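
The scenario raised in the review thread (fit with a NaN-tolerant estimator, then swap the `estimator` parameter without refitting) can be sketched with toy stand-ins. This is a minimal illustration, not scikit-learn code: `ToyBagging`, `NanTolerantEstimator`, `NanIntolerantEstimator`, and `read_allow_nan` are hypothetical names, and the fallback default used here is an assumption (the PR falls back to a decision tree).

```python
# Toy stand-ins only: none of these classes or helpers are scikit-learn code.
DEFAULT_ALLOW_NAN = False


def read_allow_nan(estimator):
    """Hypothetical helper: read the tag, falling back to the default."""
    more_tags = getattr(estimator, "_more_tags", None)
    if more_tags is None:
        return DEFAULT_ALLOW_NAN
    return more_tags().get("allow_nan", DEFAULT_ALLOW_NAN)


class NanTolerantEstimator:
    def _more_tags(self):
        return {"allow_nan": True}


class NanIntolerantEstimator:
    def _more_tags(self):
        return {"allow_nan": False}


class ToyBagging:
    def __init__(self, estimator=None):
        self.estimator = estimator

    def fit(self):
        # Real bagging clones and fits many copies; here we only remember
        # which estimator was in place when fit() ran.
        self.estimator_ = self._resolve_estimator()
        return self

    def _resolve_estimator(self):
        # Hypothetical default; the PR falls back to a decision tree instead.
        if self.estimator is None:
            return NanIntolerantEstimator()
        return self.estimator

    def _more_tags(self):
        # The tag follows the unfitted `estimator` parameter, because that is
        # what the next call to fit() would use.
        return {"allow_nan": read_allow_nan(self._resolve_estimator())}


bagging = ToyBagging(NanTolerantEstimator()).fit()
tag_after_fit = read_allow_nan(bagging)

# Swap the parameter without refitting: estimator_ is still NaN-tolerant.
bagging.estimator = NanIntolerantEstimator()
tag_after_swap = read_allow_nan(bagging)
tag_from_fitted = read_allow_nan(bagging.estimator_)

print(tag_after_fit, tag_after_swap, tag_from_fitted)  # True False True
```

Reading the tag from `estimator_` would still report `True` after the swap, even though a new call to `fit` would use the NaN-intolerant estimator. That is the mismatch the author's reply points to.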


class BaggingRegressor(RegressorMixin, BaseBagging):
    """A Bagging regressor.
@@ -1261,3 +1270,10 @@ def _set_oob_score(self, X, y):

        self.oob_prediction_ = predictions
        self.oob_score_ = r2_score(y, predictions)

    def _more_tags(self):
        if self.estimator is None:
            estimator = DecisionTreeRegressor()
        else:
            estimator = self.estimator
        return {"allow_nan": _safe_tags(estimator, "allow_nan")}
16 changes: 16 additions & 0 deletions sklearn/ensemble/tests/test_bagging.py
@@ -25,6 +25,8 @@
from sklearn.pipeline import make_pipeline
from sklearn.feature_selection import SelectKBest
from sklearn.model_selection import train_test_split
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.datasets import load_diabetes, load_iris, make_hastie_10_2
from sklearn.utils import check_random_state
from sklearn.preprocessing import FunctionTransformer, scale
@@ -980,3 +982,17 @@ def test_deprecated_base_estimator_has_decision_function():
    with pytest.warns(FutureWarning, match=warn_msg):
        y_decision = clf.fit(X, y).decision_function(X)
    assert y_decision.shape == (150, 3)


@pytest.mark.parametrize(
    "bagging, expected_allow_nan",
    [
        (BaggingClassifier(HistGradientBoostingClassifier(max_iter=1)), True),
        (BaggingRegressor(HistGradientBoostingRegressor(max_iter=1)), True),
        (BaggingClassifier(LogisticRegression()), False),
        (BaggingRegressor(SVR()), False),
    ],
)
def test_bagging_allow_nan_tag(bagging, expected_allow_nan):
    """Check that bagging inherits allow_nan tag."""
    assert bagging._get_tags()["allow_nan"] == expected_allow_nan
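
The test relies on `_get_tags()` picking up each class's `_more_tags` override. As a rough, simplified sketch of that mechanism (assuming tags are merged by walking the class hierarchy from base to subclass), the following uses only toy classes. `ToyBase`, `ToyClassifierMixin`, `ToyNanEstimator`, `ToyPlainEstimator`, `ToyBagging`, and the two-entry `DEFAULT_TAGS` are hypothetical and not the library's implementation.

```python
import inspect

# Hypothetical defaults; the real default tag set is larger.
DEFAULT_TAGS = {"allow_nan": False, "requires_y": False}


class ToyBase:
    def _get_tags(self):
        tags = dict(DEFAULT_TAGS)
        # Walk from the most generic class to the most specific one so that
        # subclasses override the tags set by their parents.
        for klass in reversed(inspect.getmro(type(self))):
            if "_more_tags" in vars(klass):
                tags.update(klass._more_tags(self))
        return tags


class ToyClassifierMixin:
    def _more_tags(self):
        return {"requires_y": True}


class ToyNanEstimator(ToyBase):
    def _more_tags(self):
        return {"allow_nan": True}


class ToyPlainEstimator(ToyBase):
    pass  # no override, so the default allow_nan=False applies


class ToyBagging(ToyClassifierMixin, ToyBase):
    def __init__(self, estimator):
        self.estimator = estimator

    def _more_tags(self):
        # Delegate to the wrapped estimator; objects without _get_tags
        # fall back to the defaults.
        get_tags = getattr(self.estimator, "_get_tags", None)
        inner = get_tags() if get_tags is not None else DEFAULT_TAGS
        return {"allow_nan": inner["allow_nan"]}


# Same shape as the parametrized test above, with toy estimators.
cases = [
    (ToyBagging(ToyNanEstimator()), True),
    (ToyBagging(ToyPlainEstimator()), False),
    (ToyBagging(object()), False),
]
for bagging, expected_allow_nan in cases:
    assert bagging._get_tags()["allow_nan"] == expected_allow_nan
```

In this sketch the mixin's `requires_y` tag and the delegated `allow_nan` tag end up in the same dictionary, which is why the real test can index `_get_tags()` directly.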