
Conversation

FrancoisPgm
Contributor

Reference Issues/PRs

Deprecation cleanup for #29950.

What does this implement/fix? Explain your changes.

Remove the behaviour that made Imputers not drop empty features when strategy='constant', even when keep_empty_features is set to False. Now keep_empty_features=False makes the Imputer drop empty features in all cases.
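For context, an "empty feature" is a column in which every value is missing; with `keep_empty_features=False` such columns are dropped. A minimal plain-numpy sketch of that dropping semantics (an illustration, not scikit-learn's actual implementation):

```python
import numpy as np

X = np.array([
    [1.0, np.nan, 3.0],
    [4.0, np.nan, 6.0],
])

# A feature (column) is "empty" when every entry is missing.
empty = np.isnan(X).all(axis=0)

# keep_empty_features=False: drop the empty columns entirely.
X_dropped = X[:, ~empty]
print(X_dropped.shape)  # (2, 2)
```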

Any other comments?

I followed the indications left in the TODO(1.8) comments. However, in SimpleImputer._dense_fit the comment says to put np.nan into the statistic at the empty-feature positions so those features can be dropped later. The statistic is a numpy array whose dtype matches X, and np.nan is a float, so it cannot be inserted into integer arrays. As a result, a test is currently failing for integer arrays, and I'm not sure about the best strategy to get around this issue.
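The dtype clash is easy to reproduce in isolation; a small plain-numpy sketch, independent of the scikit-learn code:

```python
import numpy as np

# NaN is representable in float dtypes, so this works:
stats_float = np.full(3, 0.0)
stats_float[1] = np.nan

# ...but NaN has no integer representation, so this raises ValueError:
stats_int = np.full(3, 0, dtype=np.int64)
try:
    stats_int[1] = np.nan
except ValueError as exc:
    print("assignment failed:", exc)
```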


github-actions bot commented Sep 24, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: ec95191.

@adrinjalali
Member

CI failing here @FrancoisPgm

@FrancoisPgm
Contributor Author

> CI failing here @FrancoisPgm

Yes, the failure is related to what I mentioned in the comment: np.nan is a float and can't be inserted into an int array, so the deprecation approach described in the TODO comments does not work in all cases. An alternative way to mark the empty feature dimensions is needed. I figured I'd open the PR to have a place to discuss it.

Comment on lines 587 to 584
```diff
-    # TODO(1.8): Remove FutureWarning and add `np.nan` as a statistic
-    # for empty features to drop them later.
-    if not self.keep_empty_features and ma.getmask(masked_X).all(axis=0).any():
-        warnings.warn(
-            "Currently, when `keep_empty_feature=False` and "
-            '`strategy="constant"`, empty features are not dropped. '
-            "This behaviour will change in version 1.8. Set "
-            "`keep_empty_feature=True` to preserve this behaviour.",
-            FutureWarning,
-        )
-
     # for constant strategy, self.statistics_ is used to store
     # fill_value in each column
-    return np.full(X.shape[1], fill_value, dtype=X.dtype)
+    statistics = np.full(X.shape[1], fill_value, dtype=X.dtype)
+
+    if not self.keep_empty_features:
+        for i in range(masked_X.shape[1]):
+            if ma.getmask(masked_X[:, i]).all():
+                statistics[i] = np.nan
+
+    return statistics
```
Contributor Author

Here is the issue where np.nan is added to statistics.

```python
if not self.keep_empty_features:
    for i in range(missing_mask.shape[1]):
        if all(missing_mask[:, i].data):
            statistics[i] = np.nan
```
Member

The reason this works, but the dense case doesn't, is that here statistics is created as a float array, whereas there the array is created with the same dtype as the input, and then putting np.nan (which is a float) into the int array fails.

I think we should probably use an object dtype for statistics, and then convert the type before putting it into X when modifying it instead.
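A toy sketch of that object-dtype idea in plain numpy, using a hypothetical -1 sentinel for missing values (this is an assumption for illustration, not the actual SimpleImputer code): the statistics array holds Python ints alongside np.nan, and the fill values are cast back to X's dtype only at transform time.

```python
import numpy as np

X = np.array([[1, -1, 3], [4, -1, 6]], dtype=np.int64)  # -1 encodes "missing"
missing = X == -1
fill_value = 0

# "fit": object-dtype statistics can hold both ints and np.nan,
# so empty features can be marked with NaN even for integer X.
statistics = np.full(X.shape[1], fill_value, dtype=object)
statistics[missing.all(axis=0)] = np.nan

# "transform": drop columns whose statistic is NaN, then cast the
# remaining fill values back to X's dtype before inserting them.
keep = np.array(
    [not (isinstance(s, float) and np.isnan(s)) for s in statistics]
)
fill = statistics[keep].astype(X.dtype)
Xt = np.where(missing[:, keep], fill, X[:, keep])
print(Xt.dtype, Xt.shape)
```

The point of the design is that the int-vs-NaN clash only ever existed inside the statistics array; once it is object-typed, the cast happens per column on values that are guaranteed castable.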

…ibute declared during fit to convert the fill values during transform