
[MRG+1] Add 'axis' argument to sparsefuncs.mean_variance_axis #3622


Closed. untom wants to merge 3 commits.


untom (Contributor) commented Sep 2, 2014

This PR adds an 'axis' argument to sparsefuncs.mean_variance_axis, making it easier to calculate the columnwise mean/variance of sparse matrices.
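To illustrate what computing the mean/variance "along an axis" means for a sparse matrix, here is a minimal sketch. It is not scikit-learn's implementation. It uses only public scipy.sparse operations and assumes the population variance formula Var[x] = E[x^2] - E[x]^2 (the same convention as NumPy's default `ddof=0`). The helper name `sparse_mean_variance` is hypothetical. Axis 0 gives one value per column (feature); axis 1 gives one value per row (sample).

```python
import numpy as np
import scipy.sparse as sp


def sparse_mean_variance(X, axis):
    """Return (means, variances) of a CSR or CSC matrix X along `axis`.

    Hypothetical helper for illustration only. Variance is computed as
    E[x^2] - E[x]^2, i.e. the population variance (like np.var's default).
    """
    # Sparse .mean() returns a 1 x n (or n x 1) np.matrix; flatten it to 1-D.
    means = np.asarray(X.mean(axis=axis)).ravel()
    # Element-wise square stays sparse, so implicit zeros are never materialised,
    # yet they still count in the denominator of .mean().
    mean_of_squares = np.asarray(X.multiply(X).mean(axis=axis)).ravel()
    variances = mean_of_squares - means ** 2
    return means, variances


if __name__ == "__main__":
    X = sp.csr_matrix(np.array([[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]))
    print(sparse_mean_variance(X, axis=0))  # per-column statistics
    print(sparse_mean_variance(X, axis=1))  # per-row statistics
```

The one-pass E[x^2] - E[x]^2 formula can lose precision when the mean is large relative to the spread; a production version might prefer a two-pass or Welford-style computation.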

While switching the codebase over to the new functionality, I noticed that VarianceThreshold needlessly converted CSC matrices to CSR. This PR also includes a commit to fix this, as the change fits naturally with the other changes.
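Why no CSR conversion is needed for column statistics: in CSC format the stored values of column j sit in `X.data[X.indptr[j]:X.indptr[j + 1]]`, so per-feature variances can be read off column by column directly. The sketch below assumes a CSC input and uses a hypothetical helper name, `csc_column_variances`; it is not the VarianceThreshold code itself.

```python
import numpy as np
import scipy.sparse as sp


def csc_column_variances(X):
    """Per-column (feature-wise) population variances of a CSC matrix.

    Hypothetical illustration: works directly on the CSC layout,
    without converting to CSR.
    """
    if X.format != "csc":
        raise TypeError("expected a CSC matrix, got %s" % X.format)
    X = X.copy()
    X.sum_duplicates()  # ensure each (row, col) entry is stored only once
    n_rows, n_cols = X.shape
    variances = np.empty(n_cols)
    for j in range(n_cols):
        # Stored (non-zero) values of column j.
        col = X.data[X.indptr[j]:X.indptr[j + 1]]
        # Implicit zeros add nothing to the sums but still count in n_rows.
        mean = col.sum() / n_rows
        variances[j] = (col ** 2).sum() / n_rows - mean ** 2
    return variances
```

Features could then be kept with something like `X[:, np.flatnonzero(variances > threshold)]`, and column slicing is cheap in CSC. The explicit loop is there for readability rather than speed.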

(If the immediate need for the new functionality is not obvious: this PR is the first in a series that tries to split up #2514 into a few smaller, easy-to-digest PRs. I plan to submit other PRs that will make use of this one.)

coveralls:

Coverage increased (+0.01%) when pulling 8036a2e on untom:sparse_mean_variance_axis into c09a4ca on scikit-learn:master.

untom force-pushed the sparse_mean_variance_axis branch from 8036a2e to f09a94f on September 2, 2014 10:45

coveralls:

Coverage increased (+0.01%) when pulling f09a94f on untom:sparse_mean_variance_axis into c09a4ca on scikit-learn:master.

    if axis < 0:
        axis += 2
    if (axis != 0) and (axis != 1):
        raise ValueError("Invalid axis, use 0 for rows, or 1 for columns")
Member:

Can you tell the user which axis was given?

This is useful for introspecting code and debugging.

Contributor Author:

done

coveralls:

Coverage increased (+0.01%) when pulling e1b1200 on untom:sparse_mean_variance_axis into c09a4ca on scikit-learn:master.

@@ -61,6 +61,9 @@ def mean_variance_axis0(X):
    X: CSR or CSC sparse matrix, shape (n_samples, n_features)
        Input data.

    axis: int (either 0 or 1)
        Axis along which the axis should be computed.
Member:

Apparently, you also accept -1 and -2.

Contributor Author:

True; that's for consistency with other functions in sklearn (and scipy in general) that handle the axis argument the same way (e.g. count_nonzero in the same file), although those functions don't document that usage either. I assumed this is an sklearn convention.
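For reference, the `axis += 2` line in the diff mirrors NumPy's convention for 2-D arrays: -1 refers to the last axis and -2 to the first, so -1 behaves like 1 and -2 like 0. A small sketch of that normalization. The helper `normalize_axis_2d` is hypothetical, and unlike the diff it keeps the caller's original value for the error message.

```python
import numpy as np


def normalize_axis_2d(axis):
    """Map a NumPy-style axis for a 2-D matrix onto 0 or 1.

    Hypothetical helper: -2 -> 0 and -1 -> 1, as for any 2-D NumPy array.
    """
    normalized = axis + 2 if axis < 0 else axis
    if normalized not in (0, 1):
        # Report what the caller actually passed, not the shifted value.
        raise ValueError("Invalid axis %r for a 2-D matrix" % (axis,))
    return normalized


if __name__ == "__main__":
    D = np.arange(6.0).reshape(2, 3)
    # NumPy treats axis=-1 as the last axis, i.e. the same as axis=1 here.
    print(D.mean(axis=-1), D.mean(axis=normalize_axis_2d(-1)))
```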

Contributor Author:

e.g. see also https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py, which uses the same convention throughout but never documents it.

Member:

Whereas the numpy.matrix.std has a docstring that says:

Refer to `numpy.std` for full documentation.

Contributor Author:

Grepping through the numpy and scipy codebases, the most common way seems to be to describe this as "axis : int" without specifying which values are allowed (which makes sense for numpy, given that an ndarray can have any number of axes), while the scipy.sparse module explicitly lists 0 and 1 as valid arguments (never -1 and -2, although the functions in question accept those values as well). Personally I think the way I documented it makes sense, as it's consistent with scipy.sparse.

Member:

count_nonzero is a backport from NumPy. We don't generally accept funny axes, since data is assumed to be 2-d almost everywhere.

Contributor Author:

So what do you suggest would be the right thing to do? Remove -2/-1 as accepted values?

Member:

Yes, I'd get rid of those. They're unlikely to be more useful than confusing.

Member:

Perhaps more to the point, unlike scipy.sparse, utils here are not public.

(Posted by email in reply to Lars Buitinck's comment of 3 September 2014, 04:21, on sklearn/utils/sparsefuncs.py.)

untom force-pushed the sparse_mean_variance_axis branch from 99c8f48 to 50e20a0 on September 3, 2014 08:51

untom (Contributor Author) commented Sep 3, 2014

I've removed the support for axis = -1 and axis = -2, and rebased the commit so it doesn't clutter up the log.

untom (Contributor Author) commented Sep 3, 2014

The Travis CI error was due to an out-of-memory error on Python 3.4; I don't think it's related to my patch.

larsmans (Member) commented Sep 3, 2014

+1 for merge. Also, thank you for taking the time to split up #2514. I would still like to see (most of) that merged.

larsmans changed the title from "ENH Add 'axis' argument to sparsefuncs.mean_variance_axis" to "[MRG+1] Add 'axis' argument to sparsefuncs.mean_variance_axis" on Sep 3, 2014
@@ -71,10 +74,20 @@ def mean_variance_axis0(X):
        Feature-wise variances

    """
    if axis != 0 and axis != 1:
Member:

Style nitpick: this could be written as `if axis not in (0, 1):`

But ok, it's not really important.
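Putting the review feedback in this thread together (accept only 0 or 1, use the `not in` spelling, and report the value the caller passed), the validation could look roughly like this. `check_axis` is a hypothetical name, and the exact message merged into scikit-learn may differ.

```python
def check_axis(axis):
    """Accept only axis 0 or 1 and name the offending value otherwise.

    Hypothetical helper illustrating the review feedback: no negative axes,
    `axis not in (0, 1)`, and an informative error message.
    """
    if axis not in (0, 1):
        raise ValueError("Unsupported axis: %r (expected 0 or 1)" % (axis,))
    return axis


if __name__ == "__main__":
    try:
        check_axis(-1)
    except ValueError as exc:
        print(exc)
```

Note that `in` compares with `==`, so `True` and `1.0` would also pass this check; an explicit integer type check could be added if that matters.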

arjoly (Member) commented Sep 3, 2014

looks good to merge

untom (Contributor Author) commented Sep 3, 2014

I'll fix arjoly's last comment; that should also kick off Travis CI again, just to be sure.

coveralls:

Coverage increased (+0.01%) when pulling 3cec589 on untom:sparse_mean_variance_axis into 8dc8995 on scikit-learn:master.

larsmans (Member) commented Sep 3, 2014

Merged as 52adb5c and 842d80a with a tiny PEP8 fix (a missing blank line). Thanks!

larsmans closed this Sep 3, 2014
untom deleted the sparse_mean_variance_axis branch September 3, 2014 12:00