
ENH Adds n_features_in_ to ensemble module #19326


Merged
merged 20 commits, Feb 22, 2021

Commits (20)
14778ba
ENH Adds n_features_in_ checking in bagging
lorentzenchr Feb 1, 2021
186e7a2
ENH Adds n_features_in_ checking in weighted boosting
lorentzenchr Feb 1, 2021
cb12f64
ENH Adds n_features_in_ checking in HGBT
lorentzenchr Feb 3, 2021
5366110
ENH Adds n_features_in_ checking in GradientBoosting
lorentzenchr Feb 3, 2021
f11e457
ENH Adds n_features_in_ checking in forests
lorentzenchr Feb 3, 2021
65edf9f
ENH Adds n_features_in_ checking in IsolationForest
lorentzenchr Feb 3, 2021
4f40b0f
DEP n_features_ in forests
lorentzenchr Feb 4, 2021
679443c
DEP n_features_ in bagging
lorentzenchr Feb 4, 2021
a04fbad
DEP n_features_ in IsolationForest
lorentzenchr Feb 4, 2021
9568948
DEP n_features_ in GradientBoosting
lorentzenchr Feb 4, 2021
d89cb0b
TST add test for deprecated attribute n_features_
lorentzenchr Feb 4, 2021
4e1bac9
TST remove ensemble from N_FEATURES_IN_AFTER_FIT_MODULES_TO_IGNORE
lorentzenchr Feb 4, 2021
905d85c
TST add ExtraTreesClassifier etc to deprecation test of n_features_
lorentzenchr Feb 4, 2021
803461d
CLN remove ensemble from N_FEATURES_IN_AFTER_FIT_MODULES_TO_IGNORE
lorentzenchr Feb 4, 2021
f30d144
CLN mark as code in docstrings
lorentzenchr Feb 5, 2021
43be0d8
Merge branch 'main' into n_features_ensemble
lorentzenchr Feb 5, 2021
44e21c1
Add comment from code review
lorentzenchr Feb 6, 2021
bc43b6a
Merge remote-tracking branch 'origin/main' into pr/lorentzenchr/19326
glemaitre Feb 11, 2021
b1eb01c
CLN version 1.0. to 1.0
lorentzenchr Feb 11, 2021
bed7034
Merge remote-tracking branch 'origin/main' into pr/lorentzenchr/19326
glemaitre Feb 11, 2021
62 changes: 31 additions & 31 deletions sklearn/ensemble/_bagging.py
@@ -16,7 +16,7 @@
from ..base import ClassifierMixin, RegressorMixin
from ..metrics import r2_score, accuracy_score
from ..tree import DecisionTreeClassifier, DecisionTreeRegressor
-from ..utils import check_random_state, check_array, column_or_1d
+from ..utils import check_random_state, column_or_1d, deprecated
from ..utils import indices_to_mask
from ..utils.metaestimators import if_delegate_has_method
from ..utils.multiclass import check_classification_targets
@@ -287,7 +287,7 @@ def _fit(self, X, y, max_samples=None, max_depth=None, sample_weight=None):
sample_weight = _check_sample_weight(sample_weight, X, dtype=None)

# Remap output
-        n_samples, self.n_features_ = X.shape
+        n_samples = X.shape[0]
self._n_samples = n_samples
y = self._validate_y(y)

@@ -313,11 +313,11 @@ def _fit(self, X, y, max_samples=None, max_depth=None, sample_weight=None):
if isinstance(self.max_features, numbers.Integral):
max_features = self.max_features
elif isinstance(self.max_features, float):
-            max_features = self.max_features * self.n_features_
+            max_features = self.max_features * self.n_features_in_
else:
raise ValueError("max_features must be int or float")

-        if not (0 < max_features <= self.n_features_):
+        if not (0 < max_features <= self.n_features_in_):
raise ValueError("max_features must be in (0, n_features]")

max_features = max(1, int(max_features))
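As a quick aside on the float branch above, a worked arithmetic example (values are made up for illustration):

```python
# Hypothetical values: max_features=0.5 (a float) with n_features_in_=11.
max_features = 0.5 * 11                   # 5.5, a fractional feature count
max_features = max(1, int(max_features))  # int(5.5) -> 5 features per estimator
```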
@@ -408,7 +408,7 @@ def _get_estimators_indices(self):
# to those in `_parallel_build_estimators()`
feature_indices, sample_indices = _generate_bagging_indices(
seed, self.bootstrap_features, self.bootstrap,
-            self.n_features_, self._n_samples, self._max_features,
+            self.n_features_in_, self._n_samples, self._max_features,
self._max_samples)

yield feature_indices, sample_indices
@@ -429,6 +429,16 @@ def estimators_samples_(self):
return [sample_indices
for _, sample_indices in self._get_estimators_indices()]

+    # TODO: Remove in 1.2
+    # mypy error: Decorated property not supported
+    @deprecated(  # type: ignore
+        "Attribute n_features_ was deprecated in version 1.0 and will be "
+        "removed in 1.2. Use 'n_features_in_' instead."
+    )
+    @property
+    def n_features_(self):
+        return self.n_features_in_


class BaggingClassifier(ClassifierMixin, BaseBagging):
"""A Bagging classifier.
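For context, a minimal usage sketch (not part of the diff) of how the deprecated property added above behaves once the estimator is fitted; the dataset is illustrative, and scikit-learn's `deprecated` decorator is assumed to emit a `FutureWarning`:

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier

X, y = make_classification(n_samples=50, n_features=4, random_state=0)
clf = BaggingClassifier(random_state=0).fit(X, y)

print(clf.n_features_in_)  # 4, the new canonical attribute

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(clf.n_features_)  # still 4, but routed through the deprecated property
assert any(issubclass(w.category, FutureWarning) for w in caught)
```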
@@ -523,6 +533,10 @@ class BaggingClassifier(ClassifierMixin, BaseBagging):
n_features_ : int
The number of features when :meth:`fit` is performed.

+        .. deprecated:: 1.0
+            Attribute `n_features_` was deprecated in version 1.0 and will be
+            removed in 1.2. Use `n_features_in_` instead.

estimators_ : list of estimators
The collection of fitted base estimators.

@@ -702,17 +716,11 @@ def predict_proba(self, X):
"""
check_is_fitted(self)
# Check data
-        X = check_array(
+        X = self._validate_data(
             X, accept_sparse=['csr', 'csc'], dtype=None,
-            force_all_finite=False
+            force_all_finite=False, reset=False
         )

-        if self.n_features_ != X.shape[1]:
-            raise ValueError("Number of features of the model must "
-                             "match the input. Model n_features is {0} and "
-                             "input n_features is {1}."
-                             "".format(self.n_features_, X.shape[1]))

# Parallel loop
n_jobs, n_estimators, starts = _partition_estimators(self.n_estimators,
self.n_jobs)
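A brief sketch (not from the diff) of what `reset=False` buys here: `_validate_data` checks `X.shape[1]` against the fitted `n_features_in_` and raises a consistent `ValueError` on mismatch, which is why the hand-rolled shape check is deleted above. The data and exact message wording are illustrative:

```python
import numpy as np

from sklearn.ensemble import BaggingClassifier

rng = np.random.RandomState(0)
X, y = rng.rand(20, 3), np.array([0, 1] * 10)
clf = BaggingClassifier(random_state=0).fit(X, y)

try:
    clf.predict_proba(rng.rand(5, 4))  # one feature too many
except ValueError as exc:
    # e.g. "X has 4 features, but BaggingClassifier is expecting 3 features as input."
    print(exc)
```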
@@ -753,17 +761,11 @@ def predict_log_proba(self, X):
check_is_fitted(self)
if hasattr(self.base_estimator_, "predict_log_proba"):
# Check data
-            X = check_array(
+            X = self._validate_data(
                 X, accept_sparse=['csr', 'csc'], dtype=None,
-                force_all_finite=False
+                force_all_finite=False, reset=False
             )

-            if self.n_features_ != X.shape[1]:
-                raise ValueError("Number of features of the model must "
-                                 "match the input. Model n_features is {0} "
-                                 "and input n_features is {1} "
-                                 "".format(self.n_features_, X.shape[1]))

# Parallel loop
n_jobs, n_estimators, starts = _partition_estimators(
self.n_estimators, self.n_jobs)
@@ -811,17 +813,11 @@ def decision_function(self, X):
check_is_fitted(self)

# Check data
-        X = check_array(
+        X = self._validate_data(
             X, accept_sparse=['csr', 'csc'], dtype=None,
-            force_all_finite=False
+            force_all_finite=False, reset=False
         )

-        if self.n_features_ != X.shape[1]:
-            raise ValueError("Number of features of the model must "
-                             "match the input. Model n_features is {0} and "
-                             "input n_features is {1} "
-                             "".format(self.n_features_, X.shape[1]))

# Parallel loop
n_jobs, n_estimators, starts = _partition_estimators(self.n_estimators,
self.n_jobs)
@@ -929,6 +925,10 @@ class BaggingRegressor(RegressorMixin, BaseBagging):
n_features_ : int
The number of features when :meth:`fit` is performed.

+        .. deprecated:: 1.0
+            Attribute `n_features_` was deprecated in version 1.0 and will be
+            removed in 1.2. Use `n_features_in_` instead.

estimators_ : list of estimators
The collection of fitted sub-estimators.

@@ -1024,9 +1024,9 @@ def predict(self, X):
"""
check_is_fitted(self)
# Check data
-        X = check_array(
+        X = self._validate_data(
             X, accept_sparse=['csr', 'csc'], dtype=None,
-            force_all_finite=False
+            force_all_finite=False, reset=False
         )

# Parallel loop
42 changes: 35 additions & 7 deletions sklearn/ensemble/_forest.py
@@ -57,7 +57,7 @@ class calls the ``fit`` method of each sub-estimator on random samples
from ..tree import (DecisionTreeClassifier, DecisionTreeRegressor,
ExtraTreeClassifier, ExtraTreeRegressor)
from ..tree._tree import DTYPE, DOUBLE
-from ..utils import check_random_state, check_array, compute_sample_weight
+from ..utils import check_random_state, compute_sample_weight, deprecated
from ..exceptions import DataConversionWarning
from ._base import BaseEnsemble, _partition_estimators
from ..utils.fixes import delayed
@@ -312,9 +312,6 @@ def fit(self, X, y, sample_weight=None):
# ensemble sorts the indices.
X.sort_indices()

-        # Remap output
-        self.n_features_ = X.shape[1]

y = np.atleast_1d(y)
if y.ndim == 2 and y.shape[1] == 1:
warn("A column-vector y was passed when a 1d array was"
@@ -446,7 +443,8 @@ def _compute_oob_predictions(self, X, y):
(n_samples, 1, n_outputs)
The OOB predictions.
"""
-        X = check_array(X, dtype=DTYPE, accept_sparse='csr')
+        X = self._validate_data(X, dtype=DTYPE, accept_sparse='csr',
+                                reset=False)

n_samples = y.shape[0]
n_outputs = self.n_outputs_
@@ -530,12 +528,22 @@ def feature_importances_(self):
for tree in self.estimators_ if tree.tree_.node_count > 1)

if not all_importances:
-            return np.zeros(self.n_features_, dtype=np.float64)
+            return np.zeros(self.n_features_in_, dtype=np.float64)

all_importances = np.mean(all_importances,
axis=0, dtype=np.float64)
return all_importances / np.sum(all_importances)

+    # TODO: Remove in 1.2
+    # mypy error: Decorated property not supported
+    @deprecated(  # type: ignore
+        "Attribute n_features_ was deprecated in version 1.0 and will be "
+        "removed in 1.2. Use 'n_features_in_' instead."
+    )
+    @property
+    def n_features_(self):
+        return self.n_features_in_


def _accumulate_prediction(predict, X, out, lock):
"""
@@ -1163,6 +1171,10 @@ class labels (multi-output problem).
n_features_ : int
The number of features when ``fit`` is performed.

+        .. deprecated:: 1.0
+            Attribute `n_features_` was deprecated in version 1.0 and will be
+            removed in 1.2. Use `n_features_in_` instead.

n_outputs_ : int
The number of outputs when ``fit`` is performed.

@@ -1463,6 +1475,10 @@ class RandomForestRegressor(ForestRegressor):
n_features_ : int
The number of features when ``fit`` is performed.

+        .. deprecated:: 1.0
+            Attribute `n_features_` was deprecated in version 1.0 and will be
+            removed in 1.2. Use `n_features_in_` instead.

n_outputs_ : int
The number of outputs when ``fit`` is performed.

@@ -1783,6 +1799,10 @@ class labels (multi-output problem).
n_features_ : int
The number of features when ``fit`` is performed.

+        .. deprecated:: 1.0
+            Attribute `n_features_` was deprecated in version 1.0 and will be
+            removed in 1.2. Use `n_features_in_` instead.

n_outputs_ : int
The number of outputs when ``fit`` is performed.

@@ -2068,6 +2088,10 @@ class ExtraTreesRegressor(ForestRegressor):
n_features_ : int
The number of features.

+        .. deprecated:: 1.0
+            Attribute `n_features_` was deprecated in version 1.0 and will be
+            removed in 1.2. Use `n_features_in_` instead.

n_outputs_ : int
The number of outputs.

@@ -2292,6 +2316,10 @@ class RandomTreesEmbedding(BaseForest):
n_features_ : int
The number of features when ``fit`` is performed.

+        .. deprecated:: 1.0
+            Attribute `n_features_` was deprecated in version 1.0 and will be
+            removed in 1.2. Use `n_features_in_` instead.

Review comment (Member): 1.0. -> 1.0

n_outputs_ : int
The number of outputs when ``fit`` is performed.

@@ -2421,7 +2449,7 @@ def fit_transform(self, X, y=None, sample_weight=None):
X_transformed : sparse matrix of shape (n_samples, n_out)
Transformed dataset.
"""
-        X = check_array(X, accept_sparse=['csc'])
+        X = self._validate_data(X, accept_sparse=['csc'])
Review comment (Member): Hmm, what is the reason that the common tests were not failing for this transformer, since we did not introduce _validate_data here before?

Review comment (Member Author): test_check_n_features_in_after_fitting is applied to all estimators except those from modules listed in N_FEATURES_IN_AFTER_FIT_MODULES_TO_IGNORE. Every module where we add n_features_in_ has to be removed from that list. This is done in this PR for ensemble. Or do you think of another test?

Review comment (Member): I was thinking about check_n_features_in(name, estimator_orig), but I can check on the side. Sorry, we merged a new PR that added some conflicts in test_bagging again.

if issparse(X):
# Pre-sort indices to avoid that each individual tree of the
# ensemble sorts the indices.
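To make the review thread above concrete, a simplified sketch of the ignore-list gating being discussed; this is not the actual scikit-learn test code, and every name except `N_FEATURES_IN_AFTER_FIT_MODULES_TO_IGNORE` and `"ensemble"` is a hypothetical placeholder:

```python
# Modules listed here are skipped by the n_features_in_ common test; this
# PR removes "ensemble" from the real list, so its estimators are now checked.
N_FEATURES_IN_AFTER_FIT_MODULES_TO_IGNORE = {
    # "ensemble",  # removed by this PR
    "some_module",  # hypothetical remaining entry
}

def should_skip_n_features_in_check(estimator) -> bool:
    # e.g. sklearn.ensemble._bagging -> "ensemble"
    module = type(estimator).__module__.split(".")[1]
    return module in N_FEATURES_IN_AFTER_FIT_MODULES_TO_IGNORE
```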