[MRG+1] ENH: Swap rows in sparsefuncs #3104

MechCoder · 2014-04-24T11:51:01Z

Numpy version of #3087

jnothman · 2014-04-24T12:18:53Z

You might as well define swap_column too, seeing as CSR and CSC are transposes.

jnothman · 2014-04-24T12:20:02Z

sklearn/utils/sparsefuncs.py

+    # If non zero rows are equal in mth and nth row, then swapping becomes
+    # easy.
+    if nz_m == nz_n:
+        mask = X.indices[m_ptr1: m_ptr2].copy()


Please remove the space after : on this and similar lines.

larsmans · 2014-04-24T15:20:17Z

> you might as well define swap_column as well, seeing as CSR and CSC are transposes.

I'm not for implementing unused utility functions.

MechCoder · 2014-04-24T15:27:12Z

@larsmans It is not exactly unused: there are times when one has to do column-wise swapping in lars_path. Please look at the latest commit.

a] Replaced numpy slicing with concatenate b] Added swap_sparse_column
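For readers following the diff, here is a minimal sketch of an in-place CSR row swap along the lines discussed in this thread. It is not the PR's code: the function name `inplace_swap_row_csr_sketch` is hypothetical, and it assumes a `scipy.sparse.csr_matrix` whose rows each occupy a contiguous slice of `indices` and `data`. When both rows store the same number of entries the slices are exchanged directly; otherwise `indices` and `data` are rebuilt with `np.concatenate` and the `indptr` entries between the two rows are shifted by the size difference.

```python
import numpy as np
import scipy.sparse as sp


def inplace_swap_row_csr_sketch(X, m, n):
    """Swap rows m and n of a CSR matrix in place (hypothetical helper)."""
    n_rows = X.shape[0]
    # Normalize negative indices by rebinding, not with +=.
    if m < 0:
        m = m + n_rows
    if n < 0:
        n = n + n_rows
    if m == n:
        return
    if m > n:
        m, n = n, m  # make row m the one that comes first in memory

    m_start, m_stop = X.indptr[m], X.indptr[m + 1]
    n_start, n_stop = X.indptr[n], X.indptr[n + 1]
    nz_m = m_stop - m_start
    nz_n = n_stop - n_start

    if nz_m == nz_n:
        # Same number of stored entries: exchange the two slices directly.
        for arr in (X.indices, X.data):
            tmp = arr[m_start:m_stop].copy()
            arr[m_start:m_stop] = arr[n_start:n_stop]
            arr[n_start:n_stop] = tmp
        return

    # Row m now holds nz_n entries, so every row boundary after it,
    # up to and including the start of row n, moves by nz_n - nz_m.
    X.indptr[m + 1:n + 1] += nz_n - nz_m

    def reorder(arr):
        # Offsets below are the old ones, captured before indptr changed.
        return np.concatenate([
            arr[:m_start],
            arr[n_start:n_stop],  # old row n moves into slot m
            arr[m_stop:n_start],  # rows strictly between m and n
            arr[m_start:m_stop],  # old row m moves into slot n
            arr[n_stop:],
        ])

    X.indices = reorder(X.indices)
    X.data = reorder(X.data)
```

Because whole row segments are moved as units, column indices that were sorted within each row stay sorted.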

larsmans · 2014-04-24T15:46:44Z

Did you forget to push? The GH diff only shows it being used in the tests...

arjoly · 2014-04-24T17:00:40Z

Like @jnothman, I am curious whether it is faster than the scipy implementation.

MechCoder · 2014-04-24T18:03:31Z

@larsmans I've pushed it. Please look at L142. Swapping columns of a CSC matrix is the same as swapping the rows of a CSR matrix represented with the same indices, indptr and data, right?

@arjoly Could you tell me which implementation you are talking about? I'm not sure what you mean.
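The CSC/CSR claim above can be checked directly: the `(data, indices, indptr)` arrays of a CSC matrix of shape `(r, c)`, read as a CSR matrix of shape `(c, r)`, describe its transpose, so swapping rows of the reinterpreted CSR matrix corresponds to swapping columns of the original. A small sketch, using fancy indexing only as a reference swap:

```python
import numpy as np
import scipy.sparse as sp

A = np.array([[1., 0., 2., 0.],
              [0., 3., 0., 4.],
              [5., 0., 6., 0.]])
X = sp.csc_matrix(A)

# Read the CSC arrays as CSR with the transposed shape: this is X.T.
Y = sp.csr_matrix((X.data, X.indices, X.indptr), shape=X.shape[::-1])
assert np.array_equal(Y.toarray(), A.T)

# Swapping rows m and n of Y corresponds to swapping columns m and n of X.
m, n = 0, 2
perm = np.arange(Y.shape[0])
perm[m], perm[n] = perm[n], perm[m]
Y_swapped = Y[perm, :]  # reference swap via fancy indexing (a copy)
assert np.array_equal(Y_swapped.T.toarray(), A[:, perm])
```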

arjoly · 2014-04-24T19:35:11Z

I mean swapping via a matrix (dot) product versus your implementation. @jnothman's comment might clarify:

> How much faster is this than X[permutation_of_range] in the dev version of scipy? That currently involves a matrix multiplication and performs two copies. Still, I'm curious.
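For comparison, the matrix-product approach can be sketched by building a sparse permutation matrix `P` (the identity with rows `m` and `n` exchanged) and returning `P @ X`. This allocates a new matrix rather than working in place; `swap_rows_by_product` is a hypothetical name, not a scipy function.

```python
import numpy as np
import scipy.sparse as sp


def swap_rows_by_product(X, m, n):
    """Return a copy of X with rows m and n swapped, via P @ X (hypothetical)."""
    n_rows = X.shape[0]
    perm = np.arange(n_rows)
    perm[m], perm[n] = perm[n], perm[m]
    # Row i of P has a single 1 in column perm[i], so (P @ X)[i] == X[perm[i]].
    P = sp.csr_matrix((np.ones(n_rows), (np.arange(n_rows), perm)),
                      shape=(n_rows, n_rows))
    return P @ X
```

The input `X` is left untouched, which is the extra copying the in-place version avoids.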

MechCoder · 2014-04-25T13:47:44Z

@jnothman Any more comments or can we get this in?

jnothman · 2014-04-26T09:11:08Z

sklearn/utils/sparsefuncs.py

+        m, n = n, m
+
+    indptr = X.indptr
+    m_ptr1 = indptr[m]


I don't particularly like the m_ptr1/2 nomenclature. Perhaps m_start, m_stop?

jnothman · 2014-04-26T09:16:00Z

As @arjoly asked, can we get a benchmark comparison with:

import numpy as np

def swap_rows(X, m, n):
    # Baseline: build a row permutation and fancy-index (returns a copy).
    idx = np.arange(X.shape[0])
    idx[m], idx[n] = idx[n], idx[m]
    return X[idx, :]

MechCoder · 2014-04-26T18:18:36Z

@jnothman @arjoly Considerably faster. Here swap_rows is the fancy-indexing version above and swap_row is the implementation in this PR:
In [8]: sp = sparse.rand(1000, 1000, 0.1)

In [9]: csr = sp.tocsr()

In [10]: %timeit swap_rows(csr, 500, 1)
100 loops, best of 3: 4.6 ms per loop

In [11]: %timeit swap_row(csr, 500, 1)
10000 loops, best of 3: 176 us per loop

In [13]: %timeit swap_rows(csc, 500, 1)
100 loops, best of 3: 4.88 ms per loop

In [14]: %timeit swap_row(csc, 500, 1)
1000 loops, best of 3: 988 us per loop

coveralls · 2014-04-26T18:22:13Z

Coverage remained the same when pulling 31d438e on MechCoder:swap_sparse into 7c140a1 on scikit-learn:master.

jnothman · 2014-04-26T22:10:03Z

Yes, I think that's fair enough.

+1 from me.

On 27 April 2014 04:22, Coveralls wrote:

Coverage Status: https://coveralls.io/builds/715616

MechCoder · 2014-04-29T17:00:36Z

Ping @arjoly @larsmans @agramfort ?

GaelVaroquaux · 2014-04-29T17:04:03Z

sklearn/utils/sparsefuncs.py

+    n : int, index of second sample
+    """
+    if m < 0:
+        m += X.shape[0]


Just in case someone passes in a mutable (for instance a numpy int instead of a Python int), I would do 'm = m + X.shape[0]' here to avoid side effects.

Just out of curiosity (and for learning), could you tell me how the behaviour of the code I've written would change for a numpy int?

I don't get that:

>>> x = y = np.int(0)
>>> y += 1
>>> x, y
(0, 1)

Gael, were you thinking of 0-d arrays?

>>> x = y = array(0)
>>> x += 1
>>> x, y
(array(1), array(1))

I'm not sure if guarding for this is worthwhile. It's very likely that a later refactoring round undoes this; callers should take care not to produce these values.

Yes, I was.

Maybe I am paranoid, but each time I see an in-place modification I check that it does not apply to input arguments. So I think I might flag the same potential problem again in a later code review.

I think it is good practice not to leave such code. If m is an integer, there is no computational benefit to doing the +=.
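The side effect being discussed can be reproduced in a few lines: for a 0-d NumPy array, `+=` updates the object in place, so the caller's variable changes too, whereas `m = m + k` only rebinds the local name. The two function names below are hypothetical.

```python
import numpy as np


def normalize_inplace(m, n_rows):
    if m < 0:
        m += n_rows  # mutates m in place when m is a 0-d array
    return m


def normalize_rebind(m, n_rows):
    if m < 0:
        m = m + n_rows  # creates a new object; the caller's m is untouched
    return m


caller_index = np.array(-1)
normalize_rebind(caller_index, 10)
print(caller_index)   # -1

normalize_inplace(caller_index, 10)
print(caller_index)   # 9
```

For a plain Python int both versions behave identically, since ints are immutable and `+=` rebinds.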

-------- Original message --------
From: Lars Buitinck
Date: 29/04/2014 23:37 (GMT+01:00)
To: scikit-learn/scikit-learn
Cc: Gael Varoquaux
Subject: Re: [scikit-learn] [MRG+1] ENH: Swap rows in sparsefuncs (#3104)
In sklearn/utils/sparsefuncs.py:

@@ -56,3 +57,109 @@ def inplace_column_scale(X, scale):
     else:
         raise TypeError(
             "Unsupported type; expected a CSR or CSC sparse matrix.")
+
+
+def swap_row_csc(X, m, n):
+    """
+    Swaps two rows of a CSC matrix in-place.
+
+    Parameters
+    ----------
+    X : scipy.sparse.csc_matrix, shape=(n_samples, n_features)
+
+    m : int, index of first sample
+
+    n : int, index of second sample
+    """
+    if m < 0:
+        m += X.shape[0]

It's a very Pythonic shorthand, though. I think it would be cleaner to typecheck and raise a TypeError for a 0-d array:

>>> isinstance(np.array(1), (np.integer, Integral))
False

@GaelVaroquaux Is it just 0-d NumPy arrays, or are there other cases too? If so, it might be better to do m = m + X.shape[0].


If you wish.

1. Minor changes to docs 2. Replaced swap with inplace_swap

MechCoder · 2014-04-30T18:13:05Z

@GaelVaroquaux @larsmans I can haz merge?

GaelVaroquaux · 2014-04-30T18:58:55Z

Yes you can :). Merging

[MRG+1] ENH: Swap rows in sparsefuncs

ogrisel · 2014-05-09T15:22:21Z

sklearn/utils/sparsefuncs.py

+        return inplace_swap_row_csr(X, m, n)
+    else:
+        raise TypeError(
+            "Unsupported type; expected a CSR or CSC sparse matrix.")


Coming late to the party, but it would be nice to report the actual type of sparse matrix with X.getformat() in the error message.
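A sketch of the suggested error message, assuming a check on the sparse format string. It reads the `format` attribute of a scipy sparse matrix (the value `getformat()` returns) and falls back to the Python type name for non-sparse input; `check_csr_or_csc` is a hypothetical name.

```python
import numpy as np
import scipy.sparse as sp


def check_csr_or_csc(X):
    """Return 'csr' or 'csc', or raise a TypeError naming the actual input."""
    if sp.issparse(X) and X.format in ("csr", "csc"):
        return X.format
    actual = X.format if sp.issparse(X) else type(X).__name__
    raise TypeError(
        "Unsupported type; expected a CSR or CSC sparse matrix, got %s."
        % actual)


print(check_csr_or_csc(sp.csr_matrix(np.eye(2))))  # csr
```

A COO input would then fail with a message ending in "got coo.", and a dense array with "got ndarray.".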

MechCoder · 2014-05-10T06:01:59Z

@ogrisel I've sent a PR at #3140

I would also appreciate it if you could look at my continuation of your work at #3102.

ENH: Swap rows in sparsefuncs

5119d68

jnothman reviewed Apr 24, 2014

Made the following changes

924db9b

a] Replaced numpy slicing with concatenate b] Added swap_sparse_column

jnothman reviewed Apr 26, 2014

COSMIT: Replaced ptr1/2 with start/stop

31d438e

jnothman changed the title ~~ENH: Swap rows in sparsefuncs~~ [MRG+1] ENH: Swap rows in sparsefuncs Apr 26, 2014

GaelVaroquaux reviewed Apr 29, 2014

Made the following changes

fc634f9

1. Minor changes to docs 2. Replaced swap with inplace_swap

GaelVaroquaux added a commit that referenced this pull request Apr 30, 2014

Merge pull request #3104 from MechCoder/swap_sparse

b842d4d

[MRG+1] ENH: Swap rows in sparsefuncs

GaelVaroquaux merged commit b842d4d into scikit-learn:master Apr 30, 2014

MechCoder deleted the swap_sparse branch April 30, 2014 18:59

ogrisel reviewed May 9, 2014