FEAT allow RFE(CV) to be used with permutation_importance
#32251
Conversation
Thanks for the PR. This is very useful. Please indeed update the example and add a changelog entry.
EDIT: here are the instructions for the changelog entry: https://github.com/scikit-learn/scikit-learn/blob/main/doc/whats_new/upcoming_changes/README.md
This allows methods like :func:`permutation_importance` to extract the relevant features
from its test set.
Suggested change:
- This allows methods like :func:`permutation_importance` to extract the relevant features
- from its test set.
+ This allows methods like :func:`permutation_importance` and similar tools to
+ iteratively extract the previously selected features from a test set.
Yeah, I realized it was not very clear, so I planned to change it to:
"This allows methods that need a test set, like :func:`permutation_importance`, to know which features to use in their predictions."
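The behaviour being reworded can be sketched as a callable. This is a minimal sketch, not the PR's code: `feature_indices` is the new argument this PR proposes, so the callable is invoked directly here to simulate what `RFECV` would do internally at one elimination step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)


def importance_getter(estimator, feature_indices):
    # Restrict the held-out set to the features that survived elimination,
    # then score each surviving feature by permutation importance.
    result = permutation_importance(
        estimator,
        X_val[:, feature_indices],
        y_val,
        n_repeats=5,
        random_state=0,
    )
    return result.importances_mean


# Simulate one RFE iteration in which only features 0, 2 and 5 remain.
remaining = np.array([0, 2, 5])
est = LogisticRegression().fit(X_train[:, remaining], y_train)
importances = importance_getter(est, feature_indices=remaining)
print(importances.shape)  # (3,) -- one value per remaining feature
```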
LGTM besides the following:
I found a small problem in the example + small suggestions for further improvement in the docstrings.
Also @ArturoAmorQ might be interested in reviewing this PR (the updated example in particular).
sklearn/feature_selection/_rfe.py
Outdated
If `callable`, overrides the default feature importance getter.
The callable is passed with the fitted estimator and it should
return importance for each feature. When it accepts it, the callable is passed
Suggested change:
- return importance for each feature. When it accepts it, the callable is passed
+ return importance for each feature. When it accepts it, the callable is passed
sklearn/feature_selection/_rfe.py
Outdated
- return importance for each feature.
+ return importance for each feature. When it accepts it, the callable is passed
+ `feature_indices` which stores the index of the features in the full dataset
+ that have not been eliminated yet.
Suggested change:
- that have not been eliminated yet.
+ that have not yet been eliminated in previous iterations.
shown at the end of
:ref:`sphx_glr_auto_examples_feature_selection_plot_rfe_with_cross_validation.py`.

.. versionadded:: 0.24
I think we should add a `.. versionchanged:: 1.8` note and mention that support was added for passing `feature_indices` to the callable when it is part of its signature.
And similarly for the docstring of the other class.
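For illustration, such a note could look like the following in the docstring; the wording here is a sketch, not the final text:

```rst
.. versionchanged:: 1.8
    The callable passed as `importance_getter` may also accept an additional
    ``feature_indices`` argument, in which case it receives the indices of
    the features that have not yet been eliminated.
```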
n_classes=8,
n_clusters_per_class=1,
class_sep=0.8,
random_state=0,
We need to make sure that we do not sample from the training set:

Suggested change:
- random_state=0,
+ random_state=1,  # Use a different seed to sample different points.
The centroid positions of `make_classification` depend on the `random_state` generator:

    centroids = _generate_hypercube(n_clusters, n_informative, generator).astype(

i.e. changing to `random_state=1` will not sample the same distribution. Let's rather make a `train_test_split` right after the first `make_classification` (with `n_samples=1_000`).

On a side note, `X_test`/`y_test` feel awkward for computing something (the permutation importances) that is later used during `fit`. Should we call them `X_val`/`y_val` instead (similar to the notation of early stopping in HGBT)?
Thanks for the PR @GaetandeCast, this is certainly a very nice addition! A few comments regarding documentation only.
# Under the hood, `RFECV` uses importance scores derived from the coefficients of the
# linear model we used, to choose which feature to eliminate. We show here how to use
# `permutation_importance` as an alternative way to measure the importance of features.
# For that, we use a callable in the `importance_getter` parameter of RFECV.
# This callable accepts a fitted model and an array containing the indices of the
# features that have not been eliminated yet.
Let's rephrase to introduce the `importance_getter` earlier in the sentence. By doing so it's easier to justify the "under the hood" statement. We can use something similar to:

The `importance_getter` parameter in `RFE` and `RFECV` uses by default the `coef_` (e.g. in linear models) or the `feature_importances_` attributes of an estimator to derive the feature importance. We show here how to use a callable instead to compute the `permutation_importance`. This callable accepts a fitted model and an array containing the indices of the features that remain after elimination.
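The default behaviour described in that rewording can be shown with released scikit-learn; a minimal sketch in which `RFE` falls back to the estimator's `coef_`:

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = make_classification(
    n_samples=200, n_features=8, n_informative=3, random_state=0
)
# No callable given: RFE ranks features via the magnitude of `coef_`.
rfe = RFE(LogisticRegression(), n_features_to_select=3).fit(X, y)
print(rfe.support_.sum())  # 3 features kept
```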
sklearn/feature_selection/_rfe.py
Outdated
return importance for each feature. When it accepts it, the callable is passed
`feature_indices` which stores the index of the features in the full dataset
that have not yet been eliminated in previous iterations.
"When it accepts it" is a bit vague. How about being more explicit? Something along the lines of:

[...] it should return an importance value for each feature. If the callable accepts an additional argument `feature_indices`, it should contain the indices of the features in the full dataset that remain after elimination in previous iterations.
@@ -0,0 +1,9 @@
- :class:`feature_selection.RFE` and :class:`feature_selection.RFECV`
  now support the use of :func:`permutation_importance` as an :attr:`importance_getter`.
  When a callable, and when it can accept it, the :attr:`importance_getter` is passed
Same comment about "it can accept it", who is who in the pronouns "it"?
Reference Issues/PRs

Fixes #32201

What does this implement/fix? Explain your changes.

To be used in RFE and RFECV, `permutation_importance` needs to be aware of which features were already eliminated by the procedure, in order to reduce its test dataset. This PR adds a `feature_indices` parameter to `sklearn.feature_selection._base._get_feature_importances` that is given to the `importance_getter` so that it is aware of which features to compute the importance of.

Any other comments?

The new feature is added to the test suite and illustrated in the `RFECV` doc example.

@glemaitre @ogrisel
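One plausible sketch of the mechanism the description refers to (assumptions only, not the PR's actual code): forward `feature_indices` to the getter only when its signature declares that argument, matching the "when it accepts it" wording above. `get_feature_importances` and `Dummy` are hypothetical names for illustration.

```python
import inspect

import numpy as np


def get_feature_importances(estimator, getter, feature_indices=None):
    # Hypothetical helper: pass `feature_indices` only to callables
    # whose signature declares it; otherwise call the getter plainly.
    if feature_indices is not None and (
        "feature_indices" in inspect.signature(getter).parameters
    ):
        return np.asarray(getter(estimator, feature_indices=feature_indices))
    return np.asarray(getter(estimator))


class Dummy:  # stand-in for a fitted estimator
    coef_ = np.array([[0.5, -1.0, 2.0]])


def plain(est):
    # Classic getter: one importance per feature from |coef_|.
    return np.abs(est.coef_).ravel()


def aware(est, feature_indices):
    # Getter that uses the surviving indices (here it just echoes them).
    return np.asarray(feature_indices, dtype=float)


print(get_feature_importances(Dummy(), plain, feature_indices=[0, 2, 5]))
# [0.5 1.  2. ]  -- `plain` does not accept the argument
print(get_feature_importances(Dummy(), aware, feature_indices=[0, 2, 5]))
# [0. 2. 5.]     -- `aware` receives the surviving indices
```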