
Is there any reason for SelectFromModel.transform to use force_all_finite=True in check_array? #10985


Closed
Noctiphobia opened this issue Apr 16, 2018 · 5 comments · Fixed by #11635
Labels
Easy (Well-defined and straightforward way to resolve), Sprint

Comments

@Noctiphobia

Description

SelectFromModel's transform raises ValueError if any value is infinite or NaN, because check_array is called with the default force_all_finite=True. However, the values aren't actually used anywhere in transform, so it seems to me that this check could be lifted, as some models are capable of working with such values (e.g. tree-based models should handle infinities properly). This could also apply to some other feature selection methods.
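For illustration, here is a minimal sketch of what the force_all_finite check amounts to. This is not scikit-learn's actual implementation; the real check_array in sklearn.utils.validation also handles dtypes, sparse matrices, and shape validation.

```python
import numpy as np

def check_finite(X, force_all_finite=True):
    # Simplified stand-in for the finiteness check inside check_array.
    # With force_all_finite=True (the default), any NaN or inf in the
    # input raises ValueError before transform ever selects columns.
    X = np.asarray(X, dtype=float)
    if force_all_finite and not np.isfinite(X).all():
        raise ValueError("Input contains NaN or infinity.")
    return X
```

Under the default, transform-time input containing NaN is rejected even though the selected columns are only sliced out, never inspected.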

@rth
Member

rth commented Apr 16, 2018

This sounds reasonable to me for SelectFromModel.

However, SelectFromModel.transform is inherited from SelectorMixin.transform, which is used by other feature selectors. Relaxing the check there means adding additional checks to the feature selectors that still require it, for instance RFE, as far as I understand, which means RFE.predict would validate X twice. The alternative is to copy-paste the transform code from the mixin into SelectFromModel, which is also not ideal.
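The inheritance constraint described above can be sketched schematically. The class below is a hypothetical simplification, not scikit-learn's code: the mixin's transform both validates and selects, so every subclass shares the same check.

```python
import numpy as np

class SelectorMixin:
    # Shared transform used by SelectFromModel, RFE, etc.
    # Relaxing the finiteness check here relaxes it for every
    # subclass at once; subclasses that still need finite input
    # would then have to re-validate X themselves.
    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if not np.isfinite(X).all():  # the check under discussion
            raise ValueError("Input contains NaN or infinity.")
        return X[:, self.support_]

class ToySelector(SelectorMixin):
    """Hypothetical stand-in for SelectFromModel with a fixed mask."""
    def __init__(self, support):
        self.support_ = np.asarray(support)
```

If the check is removed from the mixin, a selector like RFE whose predict currently relies on transform for validation would need its own check, so X would be validated twice on that path.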

I'm not sure if it's worth it; let's wait for a second opinion on this.

@jnothman
Member

jnothman commented Apr 16, 2018 via email

@jnothman jnothman added Easy Well-defined and straightforward way to resolve help wanted labels Apr 16, 2018
@nsorros

nsorros commented Apr 17, 2018

I am happy to work on this issue if there is no one else working on it.

@alexpantyukhin

I have made an example of how it could be fixed. Could you please review it?

@adpeters
Contributor

adpeters commented Jul 19, 2018

I've just created a new PR that addresses this and #10821, and I would love some feedback on it. It seems like the feature selection classes should generally leave as much validation as possible to their estimators, since the estimators need to handle all of that already, and the current checks create unnecessary/artificial constraints. However, allowing NaN/Inf in SelectorMixin.transform creates test failures for the models that inherit from it, because they are subject to the check_estimators_nan_inf check. So I am wondering what the best way to deal with this is: should we add these classes to the ALLOW_NAN list, or is there a better way to express that feature selectors in general do not need this check?

I was also wondering whether people have ideas for what tests to write for this. I wrote some that simply check that NaN/Inf are allowed in the input and no errors are raised, which covers both the RFE/RFECV.fit methods and the SelectorMixin.transform method. Any other suggestions?
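As a sketch, such a test might look like the following. This is a pure-numpy stand-in (the function name apply_support is hypothetical); in the real suite the test would call a fitted selector's transform instead. The point it exercises is that column selection never inspects the values, so NaN/inf should pass through unchanged.

```python
import numpy as np

def apply_support(X, mask):
    # Column selection by boolean mask: values are sliced,
    # never compared or inspected, so non-finite entries survive.
    return np.asarray(X)[:, np.asarray(mask)]

def test_transform_allows_nan_inf():
    X = np.array([[1.0, np.nan, 3.0],
                  [np.inf, 5.0, 6.0]])
    mask = np.array([True, False, True])
    Xt = apply_support(X, mask)
    assert Xt.shape == (2, 2)
    assert np.isinf(Xt[1, 0])      # inf survives selection
    assert not np.isnan(Xt).any()  # the dropped column held the NaN
```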
