Closed
Labels: Easy, Enhancement, help wanted
Description
Why is this behaviour forced?
Features that have missing values during transform but had none during fit are imputed with the initial imputation method only.
By default this means the mean of that feature is returned. I would prefer to fit one iteration of the chosen estimator and use that fitted estimator to impute the missing values.
Actual behaviour:
Example: the second feature has no np.nan values during fit --> mean imputation
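import numpy as np
# Note: newer scikit-learn releases also need
# `from sklearn.experimental import enable_iterative_imputer` before this import.
from sklearn.impute import IterativeImputer
imp = IterativeImputer(max_iter=10, verbose=0)
# The second feature is fully observed during fit.
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]])
X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))
Return: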
[[ 2. 4.]
[ 6. 12.]
[ 3. 6.]
[ 4. 12.]
[33. 12.]]
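The value 12 in the second column is consistent with mean imputation. As a quick check in plain NumPy (this does not call IterativeImputer itself), the mean of the second feature in the fit data above can be computed directly:

```python
import numpy as np

# Fit data from the example above; the second column has no missing values.
X_fit = np.array(
    [[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]], dtype=float
)

# Mean of the second feature, ignoring NaNs (there are none in this column).
second_feature_mean = np.nanmean(X_fit[:, 1])
print(second_feature_mean)
```

This gives 12.0, which matches the value filled in for every missing entry of the second feature.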
Adjusted example: the second feature has np.nan values during fit --> iterative imputation with the estimator
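import numpy as np
# Note: newer scikit-learn releases also need
# `from sklearn.experimental import enable_iterative_imputer` before this import.
from sklearn.impute import IterativeImputer
imp = IterativeImputer(max_iter=10, verbose=0)
# The last row now has a missing value in the second feature.
imp.fit([[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, np.nan]])
X_test = [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]]
print(np.round(imp.transform(X_test)))
Return: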
[[ 2. 4.]
[ 6. 12.]
[ 3. 6.]
[ 4. 8.]
[33. 66.]]
Maybe lines 679 to 683 of sklearn/impute.py should be made optional via a parameter such as force-iterimpute.
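To illustrate the requested behaviour, here is a minimal sketch in plain NumPy. It is not how IterativeImputer is implemented: it assumes an ordinary least-squares line (via `np.polyfit`) in place of the imputer's default estimator, and it fits that line once on the complete rows of the first example's fit data, even though the second feature had no missing values there.

```python
import numpy as np

X_fit = np.array(
    [[1, 2], [3, 6], [4, 8], [10, 20], [np.nan, 22], [7, 14]], dtype=float
)
X_test = np.array(
    [[np.nan, 4], [6, np.nan], [np.nan, 6], [4, np.nan], [33, np.nan]], dtype=float
)

# Use only rows where both features are observed to fit the stand-in estimator.
complete = ~np.isnan(X_fit).any(axis=1)
slope, intercept = np.polyfit(X_fit[complete, 0], X_fit[complete, 1], deg=1)

# Impute the second feature from the first wherever only the second is missing.
X_imputed = X_test.copy()
missing_second = np.isnan(X_test[:, 1]) & ~np.isnan(X_test[:, 0])
X_imputed[missing_second, 1] = slope * X_test[missing_second, 0] + intercept

print(np.round(X_imputed[missing_second]))
```

With this stand-in estimator the missing second-feature values come out as 12, 8 and 66 rather than the mean 12 everywhere, which is the result the adjusted example produces.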
justus-hildebrand