Conversation

@thomasjpfan
Member

@thomasjpfan thomasjpfan commented Nov 28, 2021

Reference Issues/PRs

Fixes #21743

What does this implement/fix? Explain your changes.

This PR adjusts RFE to allow nans by default and only changes allow_nan if the underlying estimator has tags.

Any other comments?

I agree with #21743 (comment) for meta-estimators in general. I think meta-estimators should default to allow_nan=True, unless proven otherwise.
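For illustration, the scenario from #21743 can be sketched as follows. This is a minimal sketch, not the test added in this PR: the helper name `rfe_fits_with_nan`, the iris data, and `n_features_to_select=2` are assumptions chosen for the example. It assumes a scikit-learn release that includes this change; on older releases RFE is expected to reject the nan during input validation, which the helper reports instead of raising.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def rfe_fits_with_nan():
    """Hypothetical helper: try RFE on data with one nan.

    Returns (fitted, support), where support is None if fit raised.
    """
    X, y = load_iris(return_X_y=True)
    X[0, 0] = np.nan  # a single missing value

    # The imputer removes the nan before it reaches the classifier.
    pipe = make_pipeline(
        SimpleImputer(),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )
    # A pipeline has no coef_ of its own, so point RFE at the final step.
    selector = RFE(
        pipe,
        n_features_to_select=2,
        importance_getter="named_steps.logisticregression.coef_",
    )
    try:
        selector.fit(X, y)
    except ValueError:
        # Releases without this change reject nans in RFE itself.
        return False, None
    return True, selector.support_


fitted, support = rfe_fits_with_nan()
print("RFE accepted nans:", fitted)
```

With the change in place, RFE no longer blocks the nan up front, and any error would come from the step that actually cannot handle it.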

@bmreiniger
Contributor

Can we bring this back to life? It's somewhat niche, but came up again organically with a SelectFromModel with a pipeline estimator (where we wanted to apply some processing for the selection, but not for the resulting output).

@adrinjalali
Member

@thomasjpfan wanna give this an update? happy to review.

@github-actions

github-actions bot commented Oct 13, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: cb96116.

@thomasjpfan thomasjpfan marked this pull request as draft October 13, 2023 15:05
@thomasjpfan thomasjpfan changed the title from "FIX Corrects tags for pipeline and RFE" to "FIX Adjust tags in RFE to allow nans by default" Oct 13, 2023
@thomasjpfan
Member Author

This one is a little trickier. Consider the following two pipelines:

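from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# The imputer fills in nans before they reach the later steps.
pipe1 = make_pipeline(
    SimpleImputer(),
    StandardScaler(),
    LogisticRegression(),
)
# No imputer: nans pass through the scaler to the classifier.
pipe2 = make_pipeline(
    StandardScaler(),
    LogisticRegression(),
)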

In both cases, the pipeline's first step accepts nans, but an imputer fills in the missing values in its output, while a scaler passes the nans through. This means pipe1 can accept nans, but pipe2 cannot. It is not possible to know whether a whole pipeline can accept nans, because the tags do not describe whether an estimator outputs nans.
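The difference between the two pipelines can be checked directly. This is a minimal sketch: the helper `fits_with_nan`, the iris data, and `max_iter=1000` are assumptions added for the example, not part of the PR.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X[0, 0] = np.nan  # a single missing value

pipe1 = make_pipeline(
    SimpleImputer(), StandardScaler(), LogisticRegression(max_iter=1000)
)
pipe2 = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))


def fits_with_nan(pipe):
    """Hypothetical helper: True if the pipeline fits on data with a nan."""
    try:
        pipe.fit(X, y)
    except ValueError:
        # Raised by the first step that cannot handle nans.
        return False
    return True


pipe1_ok = fits_with_nan(pipe1)
pipe2_ok = fits_with_nan(pipe2)

# The scaler on its own accepts the nan and leaves it in its output.
scaled = StandardScaler().fit_transform(X)
nan_survives_scaler = bool(np.isnan(scaled[0, 0]))

print(pipe1_ok, pipe2_ok, nan_survives_scaler)
```

Both pipelines start with a step that accepts nans, so inspecting the first step alone cannot tell them apart; only fitting reveals that the nan reaches LogisticRegression in pipe2.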

@thomasjpfan thomasjpfan marked this pull request as ready for review October 13, 2023 16:38
Member

@adrinjalali adrinjalali left a comment

LGTM. I think it's okay to delay the errors / warnings to the step which actually has to deal with it anyway.

@adrinjalali adrinjalali added the "Quick Review" (For PRs that are quick to review) and "Waiting for Second Reviewer" (First reviewer is done, need a second one!) labels Oct 17, 2023
Member

@ogrisel ogrisel left a comment

Let's also test for RFECV but otherwise, LGTM.

@ogrisel ogrisel enabled auto-merge (squash) October 20, 2023 15:10
@ogrisel ogrisel merged commit 3ff6c82 into scikit-learn:main Oct 20, 2023
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 31, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Labels

module:feature_selection, module:pipeline, module:utils, Quick Review (For PRs that are quick to review), Waiting for Second Reviewer (First reviewer is done, need a second one!)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Missing values break feature selection with pipeline estimator

4 participants