[MRG] Remove SimpleImputer axis parameter (#10636) #10652
…into Remove-SimpleImputer-axis-parameter-scikit-learn#10636
This pull request introduces 2 alerts - view on lgtm.com new alerts:
Comment posted by lgtm.com |
Thank
doc/whats_new/v0.20.rst
Outdated
@@ -376,6 +376,13 @@ Imputer | |||
- Deprecate :class:`preprocessing.Imputer` and move the corresponding module to | |||
:class:`impute.SimpleImputer`. :issue:`9726` by :user:`Kumar Ashutosh | |||
<thechargedneutron>`. | |||
- The `sklearn.preprocessing.Imputer` has been renamed to |
There should already be an entry here about the renaming
sklearn/impute.py
Outdated
@@ -98,28 +92,23 @@ class SimpleImputer(BaseEstimator, TransformerMixin): | |||
a new copy will always be made, even if `copy=False`: | |||
|
|||
- If X is not an array of floating values; | |||
- If X is sparse and `missing_values=0`; | |||
- If `axis=0` and X is encoded as a CSR matrix; |
This is still relevant. We just assume axis=0 always
sklearn/impute.py
Outdated
|
||
Notes | ||
----- | ||
- When ``axis=0``, columns which only contained missing values at `fit` |
This still applies
sklearn/impute.py
Outdated
- When ``axis=1``, an exception is raised if there are rows for which it is | ||
not possible to fill in the missing values (e.g., because they only | ||
contain missing values). | ||
- The sklearn.preprocessing.Imputer has been renamed to |
I'm not sure this is helpful. People reading here have already found SimpleImputer. It might be helpful to just say that to do imputation across rows, they can use...
sklearn/preprocessing/imputation.py
Outdated
@@ -63,7 +63,8 @@ def _most_frequent(array, extra_value, n_repeat): | |||
|
|||
@deprecated("Imputer was deprecated in version 0.20 and will be " | |||
"removed in 0.22. Import impute.SimpleImputer from " | |||
"sklearn instead.") | |||
"sklearn instead." |
Missing a space at the end
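The space matters because Python joins adjacent string literals with nothing in between. A minimal sketch of the effect; the second fragment is a made-up placeholder, not the actual text of the deprecation message:

```python
# Adjacent string literals are concatenated with no separator,
# so a missing trailing space fuses two sentences together.
without_space = ("sklearn instead."
                 "Next sentence.")   # hypothetical follow-up fragment
with_space = ("sklearn instead. "
              "Next sentence.")

print(without_space)  # sklearn instead.Next sentence.
print(with_space)     # sklearn instead. Next sentence.
```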
sklearn/impute.py
Outdated
|
||
# Count the elements != 0 | ||
mask_non_zeros = sparse.csc_matrix( | ||
(mask_valids.astype(np.float64), | ||
X.indices, | ||
X.indptr), copy=False) | ||
s = mask_non_zeros.sum(axis=0) | ||
s = mask_non_zeros.sum() |
And here
sklearn/impute.py
Outdated
n_non_missing = np.add(n_non_missing, s) | ||
|
||
else: | ||
sums = X.sum(axis=axis) |
=0
sklearn/impute.py
Outdated
@@ -255,7 +218,7 @@ def _dense_fit(self, X, strategy, missing_values, axis): | |||
|
|||
# Mean | |||
if strategy == "mean": | |||
mean_masked = np.ma.mean(masked_X, axis=axis) | |||
mean_masked = np.ma.mean(masked_X) |
=0
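To illustrate why the argument should become `axis=0` rather than be dropped: without `axis`, NumPy reduces over the whole array and returns a single value instead of one statistic per column. A small sketch on a made-up matrix (the data is illustrative only):

```python
import numpy as np

# Illustrative matrix with one missing value (NaN) in each column.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [np.nan, 8.0]])
masked_X = np.ma.masked_invalid(X)

# No axis: a single scalar over every unmasked entry.
overall_mean = np.ma.mean(masked_X)

# axis=0: one mean per column, which is what column-wise imputation needs.
column_means = np.ma.mean(masked_X, axis=0)

print(overall_mean)   # 4.0
print(column_means)   # [2.0 6.0]
```

The same reasoning applies to the `sum` and `np.ma.median` calls flagged in the neighbouring comments.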
sklearn/impute.py
Outdated
@@ -270,7 +233,7 @@ def _dense_fit(self, X, strategy, missing_values, axis): | |||
# recent versions of numpy, which we want to mimic | |||
masked_X.mask = np.logical_or(masked_X.mask, | |||
np.isnan(X)) | |||
median_masked = np.ma.median(masked_X, axis=axis) | |||
median_masked = np.ma.median(masked_X) |
Here too
@@ -363,13 +320,10 @@ def transform(self, X): | |||
X = X.toarray() | |||
|
|||
mask = _get_mask(X, self.missing_values) | |||
n_missing = np.sum(mask, axis=self.axis) |
Here
@jnothman Tks! |
This pull request introduces 2 alerts - view on lgtm.com new alerts:
Comment posted by lgtm.com |
Otherwise this looks good!
sklearn/impute.py
Outdated
@@ -142,8 +135,7 @@ def fit(self, X, y=None): | |||
raise ValueError("Can only use these strategies: {0} " | |||
" got strategy={1}".format(allowed_strategies, | |||
self.strategy)) | |||
|
|||
if self.axis not in [0, 1]: | |||
if self.axis: |
This should be removed. The attribute no longer exists and the error is no longer relevant
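A rough sketch of what goes wrong if the check stays: once `__init__` no longer stores `axis`, reading `self.axis` fails before the check can do anything. The class below is a hypothetical stand-in, not the real `SimpleImputer`:

```python
class ImputerSketch:
    """Hypothetical stand-in whose __init__ no longer takes ``axis``."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy

    def fit(self):
        # Leftover check from before the parameter was removed.
        if self.axis:
            raise ValueError("axis is not supported")
        return self


try:
    ImputerSketch().fit()
    outcome = "no error"
except AttributeError as exc:
    outcome = "AttributeError: {}".format(exc)

print(outcome)
```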
=) tks! |
This pull request introduces 2 alerts - view on lgtm.com new alerts:
Comment posted by lgtm.com |
Sorry for my poor reviewing. Tests are not passing. You've forgotten to remove axis parameter from _sparse_fit and _dense_fit |
sklearn/impute.py
Outdated
self.statistics_ = self._sparse_fit(X, | ||
self.strategy, | ||
self.missing_values, | ||
self.axis) |
!!!!!
Please search the file for any remaining mentions of axis
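A leftover `axis` argument at a call site fails as soon as the callee's signature no longer accepts it. A minimal sketch, using a hypothetical class that only mirrors the shape of the `_sparse_fit` call above:

```python
class FitSketch:
    """Hypothetical stand-in; only the method signature matters here."""

    def _sparse_fit(self, X, strategy, missing_values):
        # ``axis`` has been removed from the signature.
        return "fitted"


sketch = FitSketch()
try:
    # Stale call site still passing the old fourth argument.
    result = sketch._sparse_fit([[0, 1]], "mean", "NaN", 0)
except TypeError as exc:
    result = "TypeError: {}".format(exc)

print(result)
```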
You have another flake8 error, and it looks like tests are still failing. Please check the travis logs. |
Tests still failing. Let us know if you need help |
I'm sorry, I'm still a newbie here! I believe it will be all right now. Thank you for your help. |
@jnothman I need your help! How can I reproduce the Travis and AppVeyor runs locally? I ran the test file and all tests passed. For example, Travis CI is reporting an error in the Base.py file, which I have not even touched. |
The difference between Travis and AppVeyor is the OS: the former runs Linux while the latter runs Windows. So if you want to replicate a failure, you need to create a Python environment with the same versions of the required packages. |
|
||
# Count the zeros | ||
if missing_values == 0: | ||
n_zeros_axis = np.zeros(X.shape[not axis], dtype=int) |
`not axis` should mean 1, if I am not wrong. By the way, `not axis` is really not an intuitive use here.
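For reference, a short sketch of how `X.shape[not axis]` evaluates. It relies on `bool` being a subclass of `int`, so `True` indexes like 1 and `False` like 0; the shape tuple is illustrative:

```python
shape = (3, 5)  # illustrative (n_samples, n_features)

# ``not axis`` yields a bool, and a bool indexes a tuple like 0 or 1.
picked = {axis: shape[not axis] for axis in (0, 1)}
print(picked)  # {0: 5, 1: 3}

# With axis fixed to 0, the explicit spelling is simply shape[1].
assert shape[not 0] == shape[1]
```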
OK! @glemaitre
@gilbertoolimpio You can run the tests locally with |
doc/whats_new/v0.20.rst
Outdated
- Future (and default) behavior is equivalent to ``axis=0`` | ||
(impute along columns). Row-wise imputation can be performed with | ||
FunctionTransformer (e.g., | ||
``FunctionTransformer(lambda X: Imputer().fit_transform(X.T).T)``). |
Please mention the deprecation of the `axis` parameter here and in the deprecation message of `preprocessing.Imputer`
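The transpose trick in the entry above can be sketched with NumPy alone. The helper `impute_columns_with_mean` is a hypothetical stand-in for a column-wise imputer, not a scikit-learn function, and the data is made up:

```python
import numpy as np


def impute_columns_with_mean(X):
    """Hypothetical helper: replace each NaN with its column mean."""
    X = np.array(X, dtype=float)              # work on a copy
    column_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = column_means[cols]
    return X


X = np.array([[1.0, np.nan, 3.0],
              [4.0, 6.0, np.nan]])

# Row-wise imputation: transpose, impute column-wise, transpose back.
row_imputed = impute_columns_with_mean(X.T).T
print(row_imputed)
```

Each missing entry ends up filled with the mean of its own row, which matches the old `axis=1` behaviour for the mean strategy as long as no row is entirely missing.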
OK! @qinhanmin2014
Tests are failing. Are you able to understand the log output and what to do?
@@ -376,6 +376,7 @@ Imputer | |||
- Deprecate :class:`preprocessing.Imputer` and move the corresponding module to | |||
:class:`impute.SimpleImputer`. :issue:`9726` by :user:`Kumar Ashutosh | |||
<thechargedneutron>`. | |||
- The ``axis`` parameter was deprecated in this version. |
deprecated in -> dropped from (or removed from)
Please remove the `-` at the beginning, and in the next line also. We want readers to see all these points as one.
You can also credit yourself for this work, and reference this pull request
@gilbertoolimpio Are you able to make those changes? |
@gilbertoolimpio Are you still working on this, or are you having any difficulties? Otherwise we might let someone else take it. |
@qinhanmin2014 I think that somebody should take over to get the imputer fixed asap |
@gilbertoolimpio Thanks for your great work so far :)
@glemaitre Agree, marking as help wanted. Not sure whether there's conflict with MICE. Maybe we should get MICE in first and then take care of this? |
Either way. But MICE is ready to go on my side and the one of @jnothman as
well I think.
I might have some time in this week to take over this issue in fact.
…On 16 March 2018 at 14:43, Hanmin Qin ***@***.***> wrote:
@gilbertoolimpio Thanks for your great work so far :)
@qinhanmin2014 I think that somebody should take over to get the imputer fixed asap
@glemaitre Agree, marking as help wanted. Not sure whether there's conflict with MICE. Maybe we should get MICE in first and then take care of this?
--
Guillaume Lemaitre
INRIA Saclay - Parietal team
Center for Data Science Paris-Saclay
https://glemaitre.github.io/
@glemaitre Thanks. Please remove the labels if you've decided to take it yourself :) |
Reference Issues/PRs
Fixes #10636
What does this implement/fix? Explain your changes.
Issue #10636 requested removing the `axis` parameter from both the code and the documentation. This pull request also adds a note about the removal of the `axis` parameter to the documentation and to the what's new file.
Any other comments?
No