[MRG+1] Add 'axis' argument to sparsefuncs.mean_variance_axis #3622

untom · 2014-09-02T10:25:27Z

This PR adds an 'axis' argument to sparsefuncs.mean_variance_axis, making it easier to calculate the columnwise mean/variance of sparse matrices.

While switching the codebase over to the new functionality, I noticed that VarianceThreshold needlessly converted CSC matrices to CSR. This PR also includes a commit to fix this, as the change fits naturally with the other changes.

(If the immediate need for the new functionality is not obvious: this PR is the first in a series that splits #2514 into a few smaller, easier-to-digest PRs. I plan to submit further PRs that build on this one.)
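To make the column-wise versus row-wise statistics described above concrete, here is a minimal pure-Python sketch. It is not the code from this PR: the function name `csr_mean_variance`, the use of plain lists for the CSR arrays, and the choice of population variance are all assumptions made for illustration. It also assumes the NumPy-style convention that `axis=0` yields one value per column, and that `axis` has already been validated.

```python
def csr_mean_variance(data, indices, indptr, shape, axis):
    """Mean and population variance of a CSR matrix along `axis`.

    Hypothetical helper for illustration only. Assumes axis is 0 or 1:
    axis=0 gives one value per column, axis=1 one value per row.
    """
    n_rows, n_cols = shape
    n_out = n_cols if axis == 0 else n_rows      # number of results
    n_reduced = n_rows if axis == 0 else n_cols  # entries averaged per result
    sums = [0.0] * n_out
    sq_sums = [0.0] * n_out
    for row in range(n_rows):
        # Stored entries of this row live in data[indptr[row]:indptr[row + 1]].
        for k in range(indptr[row], indptr[row + 1]):
            out = indices[k] if axis == 0 else row
            sums[out] += data[k]
            sq_sums[out] += data[k] * data[k]
    # Implicit zeros add nothing to the sums but still count in n_reduced.
    means = [s / n_reduced for s in sums]
    variances = [sq / n_reduced - m * m for sq, m in zip(sq_sums, means)]
    return means, variances


# The 2 x 3 matrix [[1, 0, 2], [0, 0, 3]] in CSR form.
data, indices, indptr, shape = [1.0, 2.0, 3.0], [0, 2, 2], [0, 2, 3], (2, 3)
col_means, col_vars = csr_mean_variance(data, indices, indptr, shape, axis=0)
row_means, row_vars = csr_mean_variance(data, indices, indptr, shape, axis=1)
```

The variance is computed as E[x^2] - E[x]^2, which keeps the loop to a single pass over the stored entries. A production implementation would likely prefer a numerically more stable formulation.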

coveralls · 2014-09-02T10:39:05Z

Coverage increased (+0.01%) when pulling 8036a2e on untom:sparse_mean_variance_axis into c09a4ca on scikit-learn:master.

coveralls · 2014-09-02T10:58:15Z

Coverage increased (+0.01%) when pulling f09a94f on untom:sparse_mean_variance_axis into c09a4ca on scikit-learn:master.

arjoly · 2014-09-02T11:10:09Z

sklearn/utils/sparsefuncs.py

+    if axis < 0:
+        axis += 2
+    if (axis != 0) and (axis != 1):
+        raise ValueError("Invalid axis, use 0 for rows, or 1 for columns")


Can you tell the user which axis was given?

This is useful for introspecting code and debugging.
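One way the request above could be addressed is sketched below. This is an illustration rather than the change that was actually committed: the function name `normalize_axis` is hypothetical, and the sketch keeps the negative-axis handling from the hunk under review. It reports the value the caller passed, not the shifted one.

```python
def normalize_axis(axis):
    """Map -2/-1 to 0/1 and reject anything else (hypothetical helper)."""
    given = axis  # remember the caller's value for the error message
    if axis < 0:
        axis += 2
    if axis != 0 and axis != 1:
        raise ValueError(
            "Invalid axis %r, use 0 for rows, or 1 for columns" % (given,))
    return axis
```

Reporting `given` rather than the shifted `axis` means that a call such as `normalize_axis(-5)` mentions `-5` in the message, which is the value the caller will be looking for in their own code.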

coveralls · 2014-09-02T11:43:04Z

Coverage increased (+0.01%) when pulling e1b1200 on untom:sparse_mean_variance_axis into c09a4ca on scikit-learn:master.

arjoly · 2014-09-02T11:54:42Z

sklearn/utils/sparsefuncs.py

@@ -61,6 +61,9 @@ def mean_variance_axis0(X):
    X: CSR or CSC sparse matrix, shape (n_samples, n_features)
        Input data.

+    axis: int (either 0 or 1)
+        Axis along which the axis should be computed.


Apparently, you also accept -1 and -2.

True, for consistency with other functions in sklearn (and scipy in general) that handle the axis argument this way (e.g. count_nonzero in the same file), but those functions don't document that usage either. I assumed this was an sklearn convention.

See also, e.g., https://github.com/scipy/scipy/blob/master/scipy/sparse/compressed.py, which uses the same convention throughout but never documents it.

By contrast, numpy.matrix.std has a docstring that says:

Refer to `numpy.std` for full documentation.

Grepping through the numpy and scipy codebases, the most common way seems to be to describe this as "axis : int" without specifying which values are allowed (which makes sense for numpy, given that an ndarray can have any number of axes). The scipy.sparse module, on the other hand, explicitly lists 0 and 1 as valid arguments (never -1 and -2, although the functions in question accept those values as well). Personally, I think the way I documented it makes sense, as it's consistent with scipy.sparse.

count_nonzero is a backport from NumPy. We don't generally accept funny axes, since data is assumed to be 2-d almost everywhere.

So what do you suggest would be the right thing to do? Remove -2/-1 as accepted values?

Yes, I'd get rid of those. They're unlikely to be more useful than confusing.

Perhaps more to the point, unlike scipy.sparse, utils here are not public.

On 3 September 2014 04:21, Lars Buitinck wrote:

Yes, I'd get rid of those. They're unlikely to be more useful than confusing.

untom · 2014-09-03T08:52:42Z

I've removed the support for axis = -1 and axis = -2, and rebased the commit so it doesn't clutter up the log.

untom · 2014-09-03T09:32:54Z

The Travis CI error was due to out-of-error on Python 3.4; I don't think it is related to my patch.

larsmans · 2014-09-03T09:37:35Z

+1 for merge. Also, thank you for taking the time to split up #2514. I would still like to see (most of) that merged.

arjoly · 2014-09-03T09:55:49Z

sklearn/utils/sparsefuncs.py

@@ -71,10 +74,20 @@ def mean_variance_axis0(X):
        Feature-wise variances

    """
+    if axis != 0 and axis != 1:


Style / nitpick: this could be written as if axis not in (0, 1):

But ok, it's not really important.
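As a quick check of the style suggestion above, the following sketch compares the chained comparison with the membership test on a handful of integers. The helper names and the list of candidate values are hypothetical, chosen only for illustration.

```python
def is_valid_axis_chained(axis):
    # Logical negation of the check in the hunk: axis != 0 and axis != 1.
    return not (axis != 0 and axis != 1)


def is_valid_axis_membership(axis):
    # Suggested form: a single membership test against the allowed values.
    return axis in (0, 1)


candidates = [-2, -1, 0, 1, 2]
agreement = all(
    is_valid_axis_chained(a) == is_valid_axis_membership(a)
    for a in candidates
)
```

Both forms accept only 0 and 1, so negative axes are rejected either way. The membership test is simply shorter and extends more naturally if the set of allowed values ever changes.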

arjoly · 2014-09-03T09:57:24Z

looks good to merge

untom · 2014-09-03T10:21:44Z

I'll address arjoly's last comment; that should also kick off Travis CI again, just to be sure.

coveralls · 2014-09-03T10:34:14Z

Coverage increased (+0.01%) when pulling 3cec589 on untom:sparse_mean_variance_axis into 8dc8995 on scikit-learn:master.

larsmans · 2014-09-03T11:48:34Z

Merged as 52adb5c and 842d80a with a tiny PEP8 fix (a missing blank line). Thanks!

untom force-pushed the sparse_mean_variance_axis branch from 8036a2e to f09a94f Compare September 2, 2014 10:45

arjoly reviewed Sep 2, 2014

untom mentioned this pull request Sep 2, 2014

[MRG+1-1] Refactoring and expanding sklearn.preprocessing scaling #2514

Closed

arjoly reviewed Sep 2, 2014

Thomas Unterthiner added 2 commits September 3, 2014 10:51

ENH Add 'axis' argument to sparsefuncs.mean_variance_axis

eb1de0e

ENH improved CSC matrix handling in VarianceThreshold

50e20a0

untom force-pushed the sparse_mean_variance_axis branch from 99c8f48 to 50e20a0 Compare September 3, 2014 08:51

larsmans changed the title ~~ENH Add 'axis' argument to sparsefuncs.mean_variance_axis~~ [MRG+1] Add 'axis' argument to sparsefuncs.mean_variance_axis Sep 3, 2014

arjoly reviewed Sep 3, 2014

COSMIT better check for valid axis argument

3cec589

larsmans closed this Sep 3, 2014

untom deleted the sparse_mean_variance_axis branch September 3, 2014 12:00
