FEAT allow RFE(CV) to be used with permutation_importance
#32251
Conversation
Thanks for the PR. This is very useful. Please indeed update the example and add a changelog entry.
EDIT: here are the instructions for the changelog entry: https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md
This allows methods like :func:`permutation_importance` to extract the relevant features
from its test set.
Suggested change:
- This allows methods like :func:`permutation_importance` to extract the relevant features
- from its test set.
+ This allows methods like :func:`permutation_importance` and similar tools to
+ iteratively extract the previously selected features from a test set.
Yeah, I realized it was not very clear, so I planned to change it to:
"This allows methods that need a test set, like :func:`permutation_importance`, to know which features to use in their predictions."
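The behaviour being reworded can be sketched as a callable. This is a minimal sketch, not the PR's code: `feature_indices` is the new argument this PR proposes, so the callable is invoked directly here to simulate what `RFECV` would do internally at one elimination step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)


def importance_getter(estimator, feature_indices):
    # Restrict the held-out set to the features that survived elimination,
    # then score each surviving feature by permutation importance.
    result = permutation_importance(
        estimator,
        X_val[:, feature_indices],
        y_val,
        n_repeats=5,
        random_state=0,
    )
    return result.importances_mean


# Simulate one RFE iteration in which only features 0, 2 and 5 remain.
remaining = np.array([0, 2, 5])
est = LogisticRegression().fit(X_train[:, remaining], y_train)
importances = importance_getter(est, feature_indices=remaining)
print(importances.shape)  # (3,) -- one value per remaining feature
```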
LGTM besides the following:
I found a small problem in the example + small suggestions for further improvement in the docstrings.
Also @ArturoAmorQ might be interested in reviewing this PR (the updated example in particular).
sklearn/feature_selection/_rfe.py
Outdated
If `callable`, overrides the default feature importance getter.
The callable is passed with the fitted estimator and it should
return importance for each feature. When it accepts it, the callable is passed
Suggested change:
- return importance for each feature. When it accepts it, the callable is passed
+ return importance for each feature. When it accepts it, the callable is passed
sklearn/feature_selection/_rfe.py
Outdated
- return importance for each feature.
+ return importance for each feature. When it accepts it, the callable is passed
+ `feature_indices` which stores the index of the features in the full dataset
+ that have not been eliminated yet.
Suggested change:
- that have not been eliminated yet.
+ that have not yet been eliminated in previous iterations.
shown at the end of
:ref:`sphx_glr_auto_examples_feature_selection_plot_rfe_with_cross_validation.py`.

.. versionadded:: 0.24
I think we should add a `.. versionchanged:: 1.8` note and mention that support was added for passing `feature_indices` to the callable when it is part of its signature.
And similarly for the docstring of the other class.
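For illustration, such a note could look like the following in the docstring; the wording here is a sketch, not the final text:

```rst
.. versionchanged:: 1.8
    The callable passed as `importance_getter` may also accept an additional
    ``feature_indices`` argument, in which case it receives the indices of
    the features that have not yet been eliminated.
```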
n_classes=8,
n_clusters_per_class=1,
class_sep=0.8,
random_state=0,
We need to make sure that we do not sample from the training set:

Suggested change:
- random_state=0,
+ random_state=1,  # Use a different seed to sample different points.
The centroid positions of `make_classification` depend on the `random_state` generator:

    centroids = _generate_hypercube(n_clusters, n_informative, generator).astype(

i.e. changing to `random_state=1` will not sample the same distribution. Let's rather make a `train_test_split` right after the first `make_classification` (with `n_samples=1_000`).

On a side note, `X_test`/`y_test` feel awkward for computing something (the permutation importances) that is later used during `fit`. Should we call them `X_val`/`y_val` instead (similar to the notation of early stopping in HGBT)?
Thanks for the PR @GaetandeCast, this is certainly a very nice addition! A few comments regarding documentation only.
# Under the hood, `RFECV` uses importance scores derived from the coefficients of the
# linear model we used, to choose which feature to eliminate. We show here how to use
# `permutation_importance` as an alternative way to measure the importance of features.
# For that, we use a callable in the `importance_getter` parameter of RFECV.
# This callable accepts a fitted model and an array containing the indices of the
# features that have not been eliminated yet.
Let's rephrase to introduce the `importance_getter` earlier in the sentence. By doing so it's easier to justify the "under the hood" statement. We can use something similar to:

The `importance_getter` parameter in `RFE` and `RFECV` uses by default the `coef_` (e.g. in linear models) or the `feature_importances_` attributes of an estimator to derive the feature importance. We show here how to use a callable instead to compute the `permutation_importance`. This callable accepts a fitted model and an array containing the indices of the features that remain after elimination.
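The default behaviour described in that rewording can be shown with released scikit-learn; a minimal sketch in which `RFE` falls back to the estimator's `coef_`:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)
# No callable given: RFE ranks features via the magnitude of `coef_`.
rfe = RFE(LogisticRegression(), n_features_to_select=3).fit(X, y)
print(rfe.support_.sum())  # 3 features kept
```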
sklearn/feature_selection/_rfe.py
Outdated
return importance for each feature. When it accepts it, the callable is passed
`feature_indices` which stores the index of the features in the full dataset
that have not yet been eliminated in previous iterations.
"When it accepts it" is a bit vague. How about being more explicit? Something along the lines of:

[...] it should return an importance value for each feature. If the callable accepts an additional argument `feature_indices`, it should contain the indices of the features in the full dataset that remain after elimination in previous iterations.
@@ -0,0 +1,9 @@
- :class:`feature_selection.RFE` and :class:`feature_selection.RFECV`
  now support the use of :func:`permutation_importance` as an :attr:`importance_getter`.
  When a callable, and when it can accept it, the :attr:`importance_getter` is passed
Same comment about "it can accept it", who is who in the pronouns "it"?
Reference Issues/PRs

Fixes #32201

What does this implement/fix? Explain your changes.

To be used in RFE and RFECV, `permutation_importance` needs to be aware of which features were already eliminated by the procedure, in order to reduce its test dataset. This PR adds a `feature_indices` parameter to `sklearn.feature_selection._base._get_feature_importances` that is given to the `importance_getter` so that it is aware of which features to compute the importance of.

Any other comments?

The new feature is added to the test suite and illustrated in the `RFECV` doc example.

@glemaitre @ogrisel
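One plausible sketch of the mechanism the description refers to (assumptions only, not the PR's actual code): forward `feature_indices` to the getter only when its signature declares that argument, matching the "when it accepts it" wording above. `get_feature_importances` and `Dummy` are hypothetical names for illustration.

```python
import inspect

import numpy as np


def get_feature_importances(estimator, getter, feature_indices=None):
    # Hypothetical helper: pass `feature_indices` only to callables
    # whose signature declares it; otherwise call the getter plainly.
    if feature_indices is not None and (
        "feature_indices" in inspect.signature(getter).parameters
    ):
        return np.asarray(getter(estimator, feature_indices=feature_indices))
    return np.asarray(getter(estimator))


class Dummy:  # stand-in for a fitted estimator
    coef_ = np.array([[0.5, -1.0, 2.0]])


def plain(est):
    # Classic getter: one importance per feature from |coef_|.
    return np.abs(est.coef_).ravel()


def aware(est, feature_indices):
    # Getter that uses the surviving indices (here it just echoes them).
    return np.asarray(feature_indices, dtype=float)


print(get_feature_importances(Dummy(), plain, feature_indices=[0, 2, 5]))
# [0.5 1.  2. ]  -- `plain` does not accept the argument
print(get_feature_importances(Dummy(), aware, feature_indices=[0, 2, 5]))
# [0. 2. 5.]     -- `aware` receives the surviving indices
```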