[MRG] Remove SimpleImputer axis parameter (#10636) #10652

@giba0 giba0 commented Feb 17, 2018

Reference Issues/PRs

Fixes #10636

What does this implement/fix? Explain your changes.

Issue #10636 requested removing the axis parameter, both in the code and in the documentation. This PR also adds a note about the removal to the documentation and to the what's new file.

Any other comments?

No

@sklearn-lgtm

This pull request introduces 2 alerts - view on lgtm.com

new alerts:

  • 2 for Wrong number of arguments in a call

Comment posted by lgtm.com

Member

@jnothman jnothman left a comment

Thanks

@@ -376,6 +376,13 @@ Imputer
- Deprecate :class:`preprocessing.Imputer` and move the corresponding module to
:class:`impute.SimpleImputer`. :issue:`9726` by :user:`Kumar Ashutosh
<thechargedneutron>`.
- The `sklearn.preprocessing.Imputer` has been renamed to
Member

There should already be an entry here about the renaming

@@ -98,28 +92,23 @@ class SimpleImputer(BaseEstimator, TransformerMixin):
a new copy will always be made, even if `copy=False`:

- If X is not an array of floating values;
- If X is sparse and `missing_values=0`;
- If `axis=0` and X is encoded as a CSR matrix;
Member

This is still relevant. We just assume axis=0 always


Notes
-----
- When ``axis=0``, columns which only contained missing values at `fit`
Member

This still applies

- When ``axis=1``, an exception is raised if there are rows for which it is
not possible to fill in the missing values (e.g., because they only
contain missing values).
- The sklearn.preprocessing.Imputer has been renamed to
Member

I'm not sure this is helpful. People reading here have already found SimpleImputer. It might be helpful to just say that to do imputation across rows, they can use...

@@ -63,7 +63,8 @@ def _most_frequent(array, extra_value, n_repeat):

@deprecated("Imputer was deprecated in version 0.20 and will be "
"removed in 0.22. Import impute.SimpleImputer from "
-"sklearn instead.")
+"sklearn instead."
Member

Missing a space at the end
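
Why the trailing space matters: Python joins adjacent string literals with nothing in between, so a fragment that ends without a space runs straight into the next one. A minimal sketch follows; the continuation sentence is hypothetical, since the next line of the real message is not shown in this thread.

```python
# Adjacent string literals are concatenated with no separator.
# Only the first two fragments come from the diff under review;
# the third fragment is a made-up placeholder.
without_space = ("removed in 0.22. Import impute.SimpleImputer from "
                 "sklearn instead."
                 "Hypothetical next sentence.")

with_space = ("removed in 0.22. Import impute.SimpleImputer from "
              "sklearn instead. "
              "Hypothetical next sentence.")

print(without_space)
print(with_space)
```

Printing both strings makes the run-together "instead.Hypothetical" in the first version easy to spot.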


# Count the elements != 0
mask_non_zeros = sparse.csc_matrix(
(mask_valids.astype(np.float64),
X.indices,
X.indptr), copy=False)
-s = mask_non_zeros.sum(axis=0)
+s = mask_non_zeros.sum()
Member

And here

n_non_missing = np.add(n_non_missing, s)

else:
sums = X.sum(axis=axis)
Member

=0

@@ -255,7 +218,7 @@ def _dense_fit(self, X, strategy, missing_values, axis):

# Mean
if strategy == "mean":
-mean_masked = np.ma.mean(masked_X, axis=axis)
+mean_masked = np.ma.mean(masked_X)
Member

=0

@@ -270,7 +233,7 @@ def _dense_fit(self, X, strategy, missing_values, axis):
# recent versions of numpy, which we want to mimic
masked_X.mask = np.logical_or(masked_X.mask,
np.isnan(X))
-median_masked = np.ma.median(masked_X, axis=axis)
+median_masked = np.ma.median(masked_X)
Member

Here too
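
The repeated "=0" remarks come down to the difference between dropping the `axis` argument and fixing it at 0. A small sketch on made-up data, using only NumPy's masked-array functions, shows that omitting `axis` pools the whole matrix into one number, whereas `axis=0` keeps one statistic per column:

```python
import numpy as np

# Toy data: two features (columns), one missing entry in the second.
X = np.array([[1.0, np.nan],
              [3.0, 4.0],
              [5.0, 6.0]])
masked_X = np.ma.masked_invalid(X)

# axis=0: one statistic per column, which column-wise imputation needs.
mean_per_column = np.ma.mean(masked_X, axis=0)
median_per_column = np.ma.median(masked_X, axis=0)

# No axis: every unmasked entry is pooled into a single value.
mean_overall = np.ma.mean(masked_X)

print(mean_per_column.shape, np.ndim(mean_overall))
```

With a single pooled value, every feature would be filled with the same number, which is not what the `axis=0` behaviour did before the change.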

@@ -363,13 +320,10 @@ def transform(self, X):
X = X.toarray()

mask = _get_mask(X, self.missing_values)
n_missing = np.sum(mask, axis=self.axis)
Member

Here
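
For the missing-value count in `transform`, fixing the axis at 0 means counting per column. The sketch below is an illustration on toy data, with `np.isnan` standing in for the internal mask helper (an assumption, not the library's code). It also shows how a column that holds only missing values can be detected from the same counts.

```python
import numpy as np

X = np.array([[np.nan, 1.0, np.nan],
              [np.nan, 2.0, 3.0]])

# np.isnan stands in for the library's internal mask helper here.
mask = np.isnan(X)

# Per-column count of missing entries (axis fixed at 0).
n_missing = np.sum(mask, axis=0)

# A column is entirely missing when its count equals the number of rows.
all_missing = n_missing == X.shape[0]

# Without axis, the per-column information is lost in a single total.
total_missing = np.sum(mask)

print(n_missing, all_missing, total_missing)
```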

@giba0
Author

giba0 commented Feb 18, 2018

@jnothman Tks!

@sklearn-lgtm

This pull request introduces 2 alerts - view on lgtm.com

new alerts:

  • 2 for Wrong number of arguments in a call

Comment posted by lgtm.com

Member

@jnothman jnothman left a comment

Otherwise this looks good!

@@ -142,8 +135,7 @@ def fit(self, X, y=None):
raise ValueError("Can only use these strategies: {0} "
" got strategy={1}".format(allowed_strategies,
self.strategy))

-if self.axis not in [0, 1]:
+if self.axis:
Member

This should be removed. The attribute no longer exists and the error is no longer relevant

@giba0
Author

giba0 commented Feb 19, 2018

=) tks!

@jnothman jnothman changed the title from "Fix Remove SimpleImputer axis parameter (#10636)" to "[MRG+1] Remove SimpleImputer axis parameter (#10636)" on Feb 19, 2018
@sklearn-lgtm

This pull request introduces 2 alerts - view on lgtm.com

new alerts:

  • 2 for Wrong number of arguments in a call

Comment posted by lgtm.com

@jnothman
Member

Sorry for my poor reviewing. Tests are not passing. You've forgotten to remove the axis parameter from _sparse_fit and _dense_fit.

self.statistics_ = self._sparse_fit(X,
self.strategy,
self.missing_values,
self.axis)
Member

!!!!!

Please search the file for any remaining mentions of axis
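
A leftover `self.axis` at a call site fails as soon as the helper's signature and its callers disagree, and this is presumably also what the "Wrong number of arguments in a call" alerts point at. The sketch below uses a hypothetical `ToyImputer` class (not the scikit-learn implementation) to show the resulting `TypeError`:

```python
class ToyImputer:
    """Hypothetical stand-in; not the scikit-learn implementation."""

    def __init__(self, strategy="mean"):
        self.strategy = strategy

    # The helper has been updated and no longer accepts an axis argument.
    def _dense_fit(self, X, strategy):
        columns = list(zip(*X))
        return [sum(col) / len(col) for col in columns]

    def fit_stale(self, X):
        # Stale call site: still passes an extra positional value.
        return self._dense_fit(X, self.strategy, 0)

    def fit(self, X):
        # Updated call site: argument list matches the new signature.
        return self._dense_fit(X, self.strategy)


data = [[1.0, 2.0], [3.0, 4.0]]
imputer = ToyImputer()
print(imputer.fit(data))

try:
    imputer.fit_stale(data)
except TypeError as exc:
    print("TypeError:", exc)
```

The same failure occurs in the opposite direction, when the signature still expects `axis` but the caller has stopped passing it, so the helper definitions and every call to them need to change together.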

@jnothman
Member

You have another flake8 error, and it looks like tests are still failing. Please check the travis logs.

@jnothman
Member

Tests still failing. Let us know if you need help

@giba0
Author

giba0 commented Feb 24, 2018

I'm sorry, I'm still a newbie here! I believe it will be all right now. Thank you for your help.

@giba0
Author

giba0 commented Feb 24, 2018

@jnothman I need your help! How can I reproduce the Travis and AppVeyor checks locally? I ran the test file and all tests passed. Can you help me? For example, Travis CI is reporting an error in the Base.py file, which I have not even touched.

@glemaitre
Member

The difference between Travis and AppVeyor is the OS. The former runs Linux while the latter runs Windows. If you want to replicate a failure, you need to create a Python environment with the same versions of the required packages.


# Count the zeros
if missing_values == 0:
n_zeros_axis = np.zeros(X.shape[not axis], dtype=int)
Member

`not axis` should mean 1 here, if I am not wrong. By the way, using `not axis` as an index is really not intuitive.
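
To spell out the `not axis` trick: with `axis = 0`, `not axis` is `True`, and because `bool` is a subclass of `int`, indexing a shape tuple with `True` selects element 1. A plain-Python sketch with a made-up shape:

```python
# A shape tuple as X.shape would return for 3 samples and 5 features.
shape = (3, 5)

axis = 0
index = not axis            # True; bool is a subclass of int, so True == 1
n_columns = shape[index]    # same element as shape[1]

print(index, n_columns, shape[1])
```

With the axis fixed at 0, the expression can therefore be written directly as `X.shape[1]`, which is easier to read.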

Author

@giba0 giba0 Feb 25, 2018

@qinhanmin2014
Member

@gilbertoolimpio You can run the tests locally with `pytest sklearn` (also see http://scikit-learn.org/dev/developers/advanced_installation.html#testing). Generally you get the same result across different environments, so if you get no errors with `pytest sklearn`, you'll pass Travis and AppVeyor (except for some special cases).
For the current PR, please double-check your diff to make sure that you didn't change any behaviour for axis=0 (e.g., see the review above).

- Future (and default) behavior is equivalent to ``axis=0``
(impute along columns). Row-wise imputation can be performed with
FunctionTransformer (e.g.,
``FunctionTransformer(lambda X: Imputer().fit_transform(X.T).T)``).
Member

Please mention the deprecation of the axis parameter here and in the deprecation message of preprocessing.Imputer.
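
The row-wise workaround quoted in the docstring above can be tried directly. The sketch below assumes a scikit-learn version that ships `sklearn.impute.SimpleImputer` and uses it in place of the `Imputer()` named in the docstring; the toy matrix is made up for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import FunctionTransformer

X = np.array([[1.0, np.nan, 3.0],
              [4.0, 5.0, np.nan]])

# Default behaviour: each missing entry is filled with its column mean.
column_wise = SimpleImputer(strategy="mean").fit_transform(X)

# Row-wise: transpose, impute along columns, transpose back.
row_wise_imputer = FunctionTransformer(
    lambda X: SimpleImputer(strategy="mean").fit_transform(X.T).T
)
row_wise = row_wise_imputer.fit_transform(X)

print(column_wise)
print(row_wise)
```

This wrapper is stateless: each row is filled from its own observed values at transform time, so nothing is learned during `fit`. A row with no observed values would be dropped by the inner imputer and change the output shape, so such rows need separate handling.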

Author

Member

@jnothman jnothman left a comment

Tests are failing. Are you able to understand the log output and what to do?

@@ -376,6 +376,7 @@ Imputer
- Deprecate :class:`preprocessing.Imputer` and move the corresponding module to
:class:`impute.SimpleImputer`. :issue:`9726` by :user:`Kumar Ashutosh
<thechargedneutron>`.
- The ``axis`` parameter was deprecated in this version.
Member

deprecated in -> dropped from (or removed from)

Please remove the - at the beginning, and in the next line also. We want readers to see all these points as one.

Member

You can also credit yourself for this work, and reference this pull request.

@glemaitre
Member

@gilbertoolimpio Are you able to make those changes?

@qinhanmin2014
Member

@gilbertoolimpio Are you still working on it, or are you having any difficulties? Otherwise we might let someone else take it.

@glemaitre
Member

@qinhanmin2014 I think that somebody should take over to get the imputer fixed asap

@qinhanmin2014
Member

@gilbertoolimpio Thanks for your great work so far :)

> @qinhanmin2014 I think that somebody should take over to get the imputer fixed asap

@glemaitre Agreed, marking as help wanted. Not sure whether there's a conflict with MICE. Maybe we should get MICE in first and then take care of this?

@qinhanmin2014 qinhanmin2014 added the Easy, Stalled, and help wanted labels on Mar 16, 2018
@glemaitre
Member

glemaitre commented Mar 16, 2018 via email

@qinhanmin2014
Member

@glemaitre Thanks. Please remove the labels if you've decided to take it yourself :)
I saw your ping in MICE so if it's not merged after a couple of days, I'll try to have a look at the algorithm and give my review there.

Labels: Blocker, Easy, help wanted, Stalled
Development

Successfully merging this pull request may close these issues.

Remove SimpleImputer's axis parameter
5 participants