Thanks to visit codestin.com
Credit goes to github.com

Skip to content

IterativeImputer behaviour on missing nan's in fit data #13773

@Pacman1984

Description

@Pacman1984

Why is this behaviour forced:

Features with missing values during transform which did not have any missing values during fit will be imputed with the initial imputation method only.

https://scikit-learn.org/dev/modules/generated/sklearn.impute.IterativeImputer.html#sklearn.impute.IterativeImputer

This means by default it will return the mean of that feature. I would prefer just fit one iteration of the chosen estimator and use that fitted estimator to impute missing values.

Actual behaviour:
Example - The second feature missing np.nan --> mean imputation

import numpy as np
from sklearn.impute import IterativeImputer
imp = IterativeImputer(max_iter=10, verbose=0)
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]])

X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))
Return:
[[ 2.  4.]
 [ 6. 12.]
 [ 3.  6.]
 [ 4. 12.]
 [33. 12.]]

Example adjusted - Second feature has np.nan values --> iterative imputation with estimator

import numpy as np
from sklearn.impute import IterativeImputer
imp = IterativeImputer(max_iter=10, verbose=0)
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, np.nan]])

X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))
Return:
[[ 2.  4.]
 [ 6. 12.]
 [ 3.  6.]
 [ 4. 8.]
 [33. 66.]]

Maybe sklearn/impute.py line 679 to 683 should be optional with a parameter like force-iterimpute.

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions