TST check n_features_in_ in pipeline module #20192
@@ -265,7 +265,6 @@ def test_search_cv(estimator, check, request):
     'model_selection',
     'multiclass',
     'multioutput',
-    'pipeline',
remark: even if removed from the list, the estimators of this module are not tested anyway (skipped in the instance constructions)
It is true that we only have a negative test for n_features_in_ via sklearn.tests.test_metaestimator.test_meta_estimators_delegate_data_validation and _generate_meta_estimator_instances_with_pipeline that is updated in this pipeline.
It would be great to have a new positive common test for the presence of n_features_in_ on tabular dataset in sklearn.tests.test_metaestimator.
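As a rough illustration of what such a positive check could look like, here is a minimal stdlib-only sketch. `ToyScaler`, `ToyPipeline`, and `check_n_features_in_after_fit` are hypothetical stand-ins invented for this example; they are not scikit-learn classes or the test that was eventually written, and only mimic the delegation pattern discussed here.

```python
class ToyScaler:
    """Hypothetical first step that records the number of input columns."""

    def fit(self, X, y=None):
        self.n_features_in_ = len(X[0])
        return self

    def transform(self, X):
        return X


class ToyPipeline:
    """Hypothetical meta-estimator that delegates n_features_in_ to its first step."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X, y=None):
        for step in self.steps:
            step.fit(X, y)
            X = step.transform(X)
        return self

    @property
    def n_features_in_(self):
        # Raises AttributeError before fit, so hasattr() returns False.
        return self.steps[0].n_features_in_


def check_n_features_in_after_fit(meta_estimator, X, y=None):
    """Positive check: the attribute is absent before fit and correct after fit."""
    assert not hasattr(meta_estimator, "n_features_in_")
    meta_estimator.fit(X, y)
    assert meta_estimator.n_features_in_ == len(X[0])


# Small tabular dataset: 2 samples, 3 features.
X = [[0.0, 1.0, 2.0], [3.0, 4.0, 5.0]]
pipe = ToyPipeline([ToyScaler(), ToyScaler()])
check_n_features_in_after_fit(pipe, X)
```

A real common test would presumably iterate over the actual meta-estimator instances rather than a toy class, but the two assertions (absent before fit, equal to the number of columns after fit) are the core of the idea.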
n_features_in_ still needs to be documented for these classes. It's not detected because it's a property (see #20190)
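To make the property remark concrete, the sketch below shows one plausible way a fitted attribute can go unnoticed: a check that only inspects the instance `__dict__` never sees attributes exposed through a `property` on the class. `ToyStep`, `ToyMeta`, and both helper functions are hypothetical and written only for this illustration; the actual docstring test in scikit-learn may collect attributes differently.

```python
import inspect


class ToyStep:
    """Hypothetical inner estimator that stores n_features_in_ on the instance."""

    def fit(self, X, y=None):
        self.n_features_in_ = len(X[0])
        return self


class ToyMeta:
    """Hypothetical meta-estimator exposing n_features_in_ as a property."""

    def __init__(self, step):
        self.step = step

    def fit(self, X, y=None):
        self.step.fit(X, y)
        self.fitted_ = True  # ordinary fitted attribute stored on the instance
        return self

    @property
    def n_features_in_(self):
        return self.step.n_features_in_


def instance_fitted_attrs(est):
    """Fitted attributes found by looking only at the instance __dict__."""
    return {k for k in vars(est) if k.endswith("_") and not k.startswith("_")}


def property_fitted_attrs(est):
    """Fitted attributes defined as properties on the estimator's class."""
    members = inspect.getmembers(type(est), lambda m: isinstance(m, property))
    return {
        name
        for name, _ in members
        if name.endswith("_") and not name.startswith("_")
    }


meta = ToyMeta(ToyStep()).fit([[1, 2], [3, 4]])
from_dict = instance_fitted_attrs(meta)    # misses the property
from_props = property_fitted_attrs(meta)   # finds it
```

Taking the union of both sets is one way a documentation check could cover properties as well as plain instance attributes.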
lgtm !
Co-authored-by: Jérémie du Boisberranger <[email protected]>
LGTM the way it is, as it's already a net improvement, but it's true that we probably do need a new n_features_in_ test for meta-estimators on tabular data.
Merged to move forward.
* TST enable test docstring params for feature extraction module (scikit-learn#20188)
* DOC fix a reference in sklearn.ensemble.GradientBoostingRegressor (scikit-learn#20198)
* FIX mcc zero divsion (scikit-learn#19977)
* TST Add TransformedTargetRegressor to test_meta_estimators_delegate_data_validation (scikit-learn#20175)
  Co-authored-by: Guillaume Lemaitre <[email protected]>
* TST enable n_feature_in_ test for feature_extraction module
* FIX Uses points instead of pixels in plot_tree (scikit-learn#20023)
* MNT n_features_in through the multiclass module (scikit-learn#20193)
* CI Removes python 3.6 builds from wheel building (scikit-learn#20184)
* FIX Fix typo in error message in `fetch_openml` (scikit-learn#20201)
* FIX Fix error when using Calibrated with Voting (scikit-learn#20087)
* FIX Fix RandomForestRegressor doesn't accept max_samples=1.0 (scikit-learn#20159)
  Co-authored-by: Olivier Grisel <[email protected]>
  Co-authored-by: Thomas J. Fan <[email protected]>
* ENH Adds Poisson criterion in RandomForestRegressor (scikit-learn#19836)
  Co-authored-by: Christian Lorentzen <[email protected]>
  Co-authored-by: Alihan Zihna <[email protected]>
  Co-authored-by: Alihan Zihna <[email protected]>
  Co-authored-by: Chiara Marmo <[email protected]>
  Co-authored-by: Olivier Grisel <[email protected]>
  Co-authored-by: naozin555 <[email protected]>
  Co-authored-by: Venkatachalam N <[email protected]>
  Co-authored-by: Thomas J. Fan <[email protected]>
* TST Replace assert_warns from decomposition/tests (scikit-learn#20214)
* TST check n_features_in_ in pipeline module (scikit-learn#20192)
  Co-authored-by: Olivier Grisel <[email protected]>
  Co-authored-by: Jérémie du Boisberranger <[email protected]>
  Co-authored-by: Olivier Grisel <[email protected]>
* Allow `n_knots=None` if knots are explicitly specified in `SplineTransformer` (scikit-learn#20191)
  Co-authored-by: Olivier Grisel <[email protected]>
* FIX make check_complex_data deterministic (scikit-learn#20221)
* TST test_fit_docstring_attributes include properties (scikit-learn#20190)
* FIX Uses the color max for colormap in ConfusionMatrixDisplay (scikit-learn#19784)
* STY Changing .format method to f-string formatting (scikit-learn#20215)
* CI Adds permissions for label action

Co-authored-by: Jérémie du Boisberranger <[email protected]>
Co-authored-by: tsuga <[email protected]>
Co-authored-by: Conner Shen <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: mlondschien <[email protected]>
Co-authored-by: Clément Fauchereau <[email protected]>
Co-authored-by: murata-yu <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Brian Sun <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>
Co-authored-by: Alihan Zihna <[email protected]>
Co-authored-by: Alihan Zihna <[email protected]>
Co-authored-by: Chiara Marmo <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: naozin555 <[email protected]>
Co-authored-by: Venkatachalam N <[email protected]>
Co-authored-by: Nanshan Li <[email protected]>
Co-authored-by: solosilence <[email protected]>
Towards #19333