
Is there any reason for SelectFromModel.transform to use force_all_finite=True in check_array? #10985


Closed
Noctiphobia opened this issue Apr 16, 2018 · 5 comments · Fixed by #11635
Labels
Easy (Well-defined and straightforward way to resolve), Sprint

Comments

@Noctiphobia

Description

SelectFromModel's transform raises ValueError if any value is infinite or NaN, because check_array is called with the default force_all_finite=True. However, the values aren't actually used anywhere in transform, so it seems to me that this check could be lifted, as some models are capable of working with such values (e.g. tree-based models should handle infinities properly). This could also apply to some other feature selection methods.
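For illustration, here is a minimal sketch of what the force_all_finite check amounts to. This is not scikit-learn's actual implementation; the real check_array in sklearn.utils.validation also handles dtypes, sparse matrices, and shape validation.

```python
import numpy as np

def check_finite(X, force_all_finite=True):
    # Simplified stand-in for the finiteness check inside check_array.
    # With force_all_finite=True (the default), any NaN or inf in the
    # input raises ValueError before transform ever selects columns.
    X = np.asarray(X, dtype=float)
    if force_all_finite and not np.isfinite(X).all():
        raise ValueError("Input contains NaN or infinity.")
    return X
```

Under the default, transform-time input containing NaN is rejected even though the selected columns are only sliced out, never inspected.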

@rth
Member

rth commented Apr 16, 2018

This sounds reasonable to me for SelectFromModel.

However, SelectFromModel.transform is inherited from SelectorMixin.transform, which is used by other feature selectors. Relaxing the check there means adding additional checks to the feature selectors that still require it, for instance RFE, as far as I understand, which means RFE.predict would validate X twice. The alternative is to copy-paste the transform code from the mixin into SelectFromModel, which is also not ideal.
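The inheritance constraint described above can be sketched schematically. The class below is a hypothetical simplification, not scikit-learn's code: the mixin's transform both validates and selects, so every subclass shares the same check.

```python
import numpy as np

class SelectorMixin:
    # Shared transform used by SelectFromModel, RFE, etc.
    # Relaxing the finiteness check here relaxes it for every
    # subclass at once; subclasses that still need finite input
    # would then have to re-validate X themselves.
    def transform(self, X):
        X = np.asarray(X, dtype=float)
        if not np.isfinite(X).all():  # the check under discussion
            raise ValueError("Input contains NaN or infinity.")
        return X[:, self.support_]

class ToySelector(SelectorMixin):
    """Hypothetical stand-in for SelectFromModel with a fixed mask."""
    def __init__(self, support):
        self.support_ = np.asarray(support)
```

If the check is removed from the mixin, a selector like RFE whose predict currently relies on transform for validation would need its own check, so X would be validated twice on that path.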

I'm not sure if it's worth it; let's wait for a second opinion on this.

@jnothman
Member

jnothman commented Apr 16, 2018 via email

@jnothman jnothman added Easy Well-defined and straightforward way to resolve help wanted labels Apr 16, 2018
@nsorros

nsorros commented Apr 17, 2018

I am happy to work on this issue if there is no one else working on it.

@alexpantyukhin

I have made an example of how it could be fixed. Could you please review it?

@adpeters
Contributor

adpeters commented Jul 19, 2018

I've just created a new PR that addresses this and #10821, and I would love some feedback on it. It seems like the feature selection classes should generally leave as much validation as possible to their estimators, since the estimators need to handle all of that already, and the current checks create unnecessary/artificial constraints. However, allowing NaN/Inf in SelectorMixin.transform creates test failures for the models that inherit from it, because they are subject to the check_estimators_nan_inf check. So I am wondering what the best way to deal with this is: should we add these classes to the ALLOW_NAN list, or is there a better way to express that feature selectors in general do not need this check?

I was also wondering whether people have ideas for what tests to write for this. I wrote some that simply check that NaN/Inf are allowed in the input and no errors are raised, which covers both the RFE/RFECV.fit methods and the SelectorMixin.transform method. Any other suggestions?
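As a sketch, such a test might look like the following. This is a pure-numpy stand-in (the function name apply_support is hypothetical); in the real suite the test would call a fitted selector's transform instead. The point it exercises is that column selection never inspects the values, so NaN/inf should pass through unchanged.

```python
import numpy as np

def apply_support(X, mask):
    # Column selection by boolean mask: values are sliced,
    # never compared or inspected, so non-finite entries survive.
    return np.asarray(X)[:, np.asarray(mask)]

def test_transform_allows_nan_inf():
    X = np.array([[1.0, np.nan, 3.0],
                  [np.inf, 5.0, 6.0]])
    mask = np.array([True, False, True])
    Xt = apply_support(X, mask)
    assert Xt.shape == (2, 2)
    assert np.isinf(Xt[1, 0])      # inf survives selection
    assert not np.isnan(Xt).any()  # the dropped column held the NaN
```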
