TST check n_features_in_ in pipeline module #20192

glemaitre · 2021-06-02T16:09:08Z

Towards #19333

jeremiedbb · 2021-06-02T16:43:28Z

sklearn/tests/test_common.py

    'model_selection',
    'multiclass',
    'multioutput',
-    'pipeline',


Remark: even once 'pipeline' is removed from this list, the estimators of this module are not tested anyway (they are skipped when the instances are constructed).

It is true that we only have a negative test for n_features_in_, via sklearn.tests.test_metaestimator.test_meta_estimators_delegate_data_validation and _generate_meta_estimator_instances_with_pipeline, which is updated in this PR.

It would be great to have a new positive common test for the presence of n_features_in_ on a tabular dataset in sklearn.tests.test_metaestimator.
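A minimal sketch of what such a positive check could look like, assuming a duck-typed estimator with a `fit` method. The helper `check_n_features_in_positive` and the two toy classes are hypothetical names made up for illustration; this is not the test that exists (or was later added) in scikit-learn.

```python
def check_n_features_in_positive(estimator, X, y=None):
    """Fit `estimator` on 2D tabular data and check `n_features_in_`.

    Hypothetical helper, not part of scikit-learn.
    """
    n_features = len(X[0])
    # Before fit, the attribute should not be exposed. hasattr() returns
    # False when a property raises AttributeError.
    assert not hasattr(estimator, "n_features_in_")
    estimator.fit(X, y)
    # After fit on tabular data, the attribute must exist and match X.
    assert hasattr(estimator, "n_features_in_")
    assert estimator.n_features_in_ == n_features
    return estimator.n_features_in_


class ToyEstimator:
    """Toy base estimator that records the number of input features."""

    def fit(self, X, y=None):
        self.n_features_in_ = len(X[0])
        return self


class ToyMetaEstimator:
    """Toy meta-estimator that delegates n_features_in_ to its inner estimator."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator_ = self.estimator.fit(X, y)
        return self

    @property
    def n_features_in_(self):
        # Raises AttributeError until fit() has set `estimator_`.
        return self.estimator_.n_features_in_
```

The same helper could in principle be called with a fitted-from-scratch `Pipeline` or `FeatureUnion` and an array-like `X`, since it only relies on `fit` and on attribute access.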

jeremiedbb · 2021-06-02T20:59:45Z

n_features_in_ still needs to be documented for these classes. It's not detected because it's a property (see #20190)
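To illustrate why a property can go unnoticed, here is a small sketch under the assumption that fitted attributes are collected from the instance `__dict__`. The classes and both helper functions are hypothetical and written only for this example; they are not the code of `test_fit_docstring_attributes`.

```python
import inspect


class Step:
    """Toy step that stores n_features_in_ as a plain instance attribute."""

    def fit(self, X):
        self.n_features_in_ = len(X[0])
        return self


class ToyPipeline:
    """Toy pipeline exposing n_features_in_ as a property of its first step."""

    def __init__(self, steps):
        self.steps = steps

    def fit(self, X):
        # Simplification: every step is fitted on the same X (no transform chain).
        for _, step in self.steps:
            step.fit(X)
        self.fitted_ = True
        return self

    @property
    def n_features_in_(self):
        return self.steps[0][1].n_features_in_


def fitted_attributes_from_dict(est):
    """Public names ending in '_' found in the instance __dict__ only."""
    return sorted(k for k in vars(est) if k.endswith("_") and not k.startswith("_"))


def fitted_attributes_with_properties(est):
    """Same as above, plus class-level properties that resolve on the instance."""
    names = set(fitted_attributes_from_dict(est))
    for name, _ in inspect.getmembers(type(est), lambda m: isinstance(m, property)):
        if name.endswith("_") and not name.startswith("_") and hasattr(est, name):
            names.add(name)
    return sorted(names)
```

With these definitions, the `__dict__`-based lookup misses `n_features_in_` on `ToyPipeline`, while the property-aware lookup reports it.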

jeremiedbb

lgtm !

sklearn/pipeline.py

Co-authored-by: Jérémie du Boisberranger <[email protected]>

ogrisel

LGTM the way it is, as it's already a net improvement, but it's true that we probably do need a new n_features_in_ test for meta-estimators on tabular data.

ogrisel · 2021-06-03T12:48:53Z

sklearn/tests/test_common.py

    'model_selection',
    'multiclass',
    'multioutput',
-    'pipeline',


It is true that we only have a negative test for n_features_in_, via sklearn.tests.test_metaestimator.test_meta_estimators_delegate_data_validation and _generate_meta_estimator_instances_with_pipeline, which is updated in this PR.

It would be great to have a new positive common test for the presence of n_features_in_ on a tabular dataset in sklearn.tests.test_metaestimator.

Co-authored-by: Jérémie du Boisberranger <[email protected]>

ogrisel · 2021-06-07T10:29:54Z

Merged to move forward.

* TST enable test docstring params for feature extraction module (scikit-learn#20188)
* DOC fix a reference in sklearn.ensemble.GradientBoostingRegressor (scikit-learn#20198)
* FIX mcc zero divsion (scikit-learn#19977)
* TST Add TransformedTargetRegressor to test_meta_estimators_delegate_data_validation (scikit-learn#20175)
  Co-authored-by: Guillaume Lemaitre <[email protected]>
* TST enable n_feature_in_ test for feature_extraction module
* FIX Uses points instead of pixels in plot_tree (scikit-learn#20023)
* MNT n_features_in through the multiclass module (scikit-learn#20193)
* CI Removes python 3.6 builds from wheel building (scikit-learn#20184)
* FIX Fix typo in error message in `fetch_openml` (scikit-learn#20201)
* FIX Fix error when using Calibrated with Voting (scikit-learn#20087)
* FIX Fix RandomForestRegressor doesn't accept max_samples=1.0 (scikit-learn#20159)
  Co-authored-by: Olivier Grisel <[email protected]>
  Co-authored-by: Thomas J. Fan <[email protected]>
* ENH Adds Poisson criterion in RandomForestRegressor (scikit-learn#19836)
  Co-authored-by: Christian Lorentzen <[email protected]>
  Co-authored-by: Alihan Zihna <[email protected]>
  Co-authored-by: Alihan Zihna <[email protected]>
  Co-authored-by: Chiara Marmo <[email protected]>
  Co-authored-by: Olivier Grisel <[email protected]>
  Co-authored-by: naozin555 <[email protected]>
  Co-authored-by: Venkatachalam N <[email protected]>
  Co-authored-by: Thomas J. Fan <[email protected]>
* TST Replace assert_warns from decomposition/tests (scikit-learn#20214)
* TST check n_features_in_ in pipeline module (scikit-learn#20192)
  Co-authored-by: Olivier Grisel <[email protected]>
  Co-authored-by: Jérémie du Boisberranger <[email protected]>
  Co-authored-by: Olivier Grisel <[email protected]>
* Allow `n_knots=None` if knots are explicitly specified in `SplineTransformer` (scikit-learn#20191)
  Co-authored-by: Olivier Grisel <[email protected]>
* FIX make check_complex_data deterministic (scikit-learn#20221)
* TST test_fit_docstring_attributes include properties (scikit-learn#20190)
* FIX Uses the color max for colormap in ConfusionMatrixDisplay (scikit-learn#19784)
* STY Changing .format method to f-string formatting (scikit-learn#20215)
* CI Adds permissions for label action

Co-authored-by: Jérémie du Boisberranger <[email protected]>
Co-authored-by: tsuga <[email protected]>
Co-authored-by: Conner Shen <[email protected]>
Co-authored-by: Guillaume Lemaitre <[email protected]>
Co-authored-by: mlondschien <[email protected]>
Co-authored-by: Clément Fauchereau <[email protected]>
Co-authored-by: murata-yu <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: Brian Sun <[email protected]>
Co-authored-by: Christian Lorentzen <[email protected]>
Co-authored-by: Alihan Zihna <[email protected]>
Co-authored-by: Alihan Zihna <[email protected]>
Co-authored-by: Chiara Marmo <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
Co-authored-by: naozin555 <[email protected]>
Co-authored-by: Venkatachalam N <[email protected]>
Co-authored-by: Nanshan Li <[email protected]>
Co-authored-by: solosilence <[email protected]>

glemaitre added 2 commits June 2, 2021 18:08

TST check n_features_in_ in pipeline module

a6f05c7

TST add FeatureUnion in metaestimator check

db4e5dd

glemaitre added the No Changelog Needed label Jun 2, 2021

glemaitre mentioned this pull request Jun 2, 2021

Track SLEP10: Add n_features_in_ to all modules #19333

Closed

47 tasks

jeremiedbb reviewed Jun 2, 2021


glemaitre added 2 commits June 3, 2021 13:10

Add documentation

abdd88d

typo

7b4aa7b

github-actions bot added the module:pipeline label Jun 3, 2021

typo

3d2f3c0

jeremiedbb approved these changes Jun 3, 2021


Apply suggestions from code review

db8c2dc

Co-authored-by: Jérémie du Boisberranger <[email protected]>

ogrisel approved these changes Jun 3, 2021


ogrisel and others added 2 commits June 3, 2021 14:57

Merge branch 'main' into pipeline_n_features_in_

e0c7d68

Update sklearn/pipeline.py

08a2c86

Co-authored-by: Jérémie du Boisberranger <[email protected]>

ogrisel merged commit 800aee6 into scikit-learn:main Jun 7, 2021

This was referenced Jun 7, 2021

Random failure in test_estimators[OutputCodeClassifier(estimator=LogisticRegression(C=1))-check_complex_data] #20218

Closed

FIX make sure OutputCodeClassifier rejects complex labels #20219

Closed
