[MRG+1] ENH: Swap rows in sparsefuncs #3104

Merged
merged 4 commits into from
Apr 30, 2014

Conversation

MechCoder
Member

Numpy version of #3087

cc @jaidevd @jnothman

@jnothman
Member

? you might as well define swap_column as well, seeing as CSR and CSC are transposes.

# If non zero rows are equal in mth and nth row, then swapping becomes
# easy.
if nz_m == nz_n:
    mask = X.indices[m_ptr1: m_ptr2].copy()
Member

Please remove the space after : on this and similar lines.

@larsmans
Member

you might as well define swap_column as well, seeing as CSR and CSC are transposes.

I'm not for implementing unused utility functions.

@MechCoder
Member Author

@larsmans It is not exactly unused; there are times when one has to do column-wise swapping in lars_path. Please look at the latest commit.

a] Replaced numpy slicing with concatenate
b] Added swap_sparse_column
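
For readers following along, the concatenate-based approach named in the commit notes above can be sketched roughly as below. This is an illustrative reconstruction and not the code from the PR: the function name `swap_rows_csr_sketch` is hypothetical, and the sketch assumes a canonical `scipy.sparse.csr_matrix` whose `indices`, `data` and `indptr` arrays can be reassigned or modified directly.

```python
import numpy as np
from scipy import sparse


def swap_rows_csr_sketch(X, m, n):
    """Swap rows m and n of a CSR matrix in place (illustrative sketch)."""
    # Normalise negative indices without touching the caller's objects.
    if m < 0:
        m = m + X.shape[0]
    if n < 0:
        n = n + X.shape[0]
    if m == n:
        return
    if m > n:
        m, n = n, m

    m_start, m_stop = X.indptr[m], X.indptr[m + 1]
    n_start, n_stop = X.indptr[n], X.indptr[n + 1]
    nz_m = m_stop - m_start
    nz_n = n_stop - n_start

    # Reorder the stored entries as
    # [before row m | row n | rows between | row m | after row n].
    for name in ("indices", "data"):
        arr = getattr(X, name)
        setattr(X, name, np.concatenate([
            arr[:m_start], arr[n_start:n_stop], arr[m_stop:n_start],
            arr[m_start:m_stop], arr[n_stop:]]))

    # Row boundaries after m, up to and including n, shift by the
    # difference in the number of stored entries of the two rows.
    X.indptr[m + 1:n + 1] += nz_n - nz_m


rng = np.random.RandomState(0)
dense = rng.rand(6, 5) * (rng.rand(6, 5) < 0.4)
X = sparse.csr_matrix(dense)
swap_rows_csr_sketch(X, 1, 4)

expected = dense.copy()
expected[[1, 4]] = expected[[4, 1]]
assert np.array_equal(X.toarray(), expected)
```

The sketch only touches the entries between the two rows, which is the likely reason such an approach can beat a full fancy-indexing copy.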
@larsmans
Member

Did you forget to push? The GH diff only shows it being used in the tests...

@arjoly
Member

arjoly commented Apr 24, 2014

Like @jnothman, I am curious to know whether it is faster than the scipy implementation.

@MechCoder
Member Author

@larsmans I've pushed it. Please look at L142. Swapping columns of a CSC matrix is the same as swapping the rows of a CSR matrix represented with the same indices, indptr and data, right?

@arjoly Could you tell what implementation you are talking about? I'm not clear.
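
The CSC/CSR equivalence asked about above can be checked with a small example. This is only a sketch with made-up data: it reinterprets the three arrays of a CSC matrix as a CSR matrix of transposed shape and confirms that permuting rows of that view corresponds to permuting columns of the original.

```python
import numpy as np
from scipy import sparse

dense = np.array([[1, 0, 2, 0],
                  [0, 3, 0, 4],
                  [5, 0, 0, 6]])
csc = sparse.csc_matrix(dense)

# Reinterpret the same data, indices and indptr as CSR; the shape is transposed.
as_csr = sparse.csr_matrix((csc.data, csc.indices, csc.indptr),
                           shape=(csc.shape[1], csc.shape[0]))

# Row j of the CSR view holds column j of the CSC matrix.
assert np.array_equal(as_csr.toarray(), dense.T)

# Swapping rows 0 and 2 of the CSR view matches swapping columns 0 and 2.
perm = [2, 1, 0, 3]
assert np.array_equal(as_csr[perm, :].toarray().T, dense[:, perm])
```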

@arjoly
Member

arjoly commented Apr 24, 2014

I mean comparing a swap performed with a dot product against your implementation. @jnothman's comment might clarify:

How much faster is this than X[permutation_of_range] in the dev version of scipy? That currently involves a matrix multiplication, and performs two copies. Still, I'm curious.

@MechCoder
Member Author

@jnothman Any more comments or can we get this in?

m, n = n, m

indptr = X.indptr
m_ptr1 = indptr[m]
Member

I don't particularly like the m_ptr1/2 nomenclature. Perhaps m_start, m_stop?

@jnothman
Member

As per arjoly, can we get a benchmark comparison with:

def swap_rows(X, m, n):
    idx = np.arange(X.shape[0])
    idx[m], idx[n] = idx[n], idx[m]
    return X[idx, :]

@MechCoder
Member Author

@jnothman @arjoly Considerably faster.
In [8]: sp = sparse.rand(1000, 1000, 0.1)

In [9]: csr = sp.tocsr()

In [10]: %timeit swap_rows(csr, 500, 1)
100 loops, best of 3: 4.6 ms per loop

In [11]: %timeit swap_row(csr, 500, 1)
10000 loops, best of 3: 176 us per loop

In [13]: %timeit swap_rows(csc, 500, 1)
100 loops, best of 3: 4.88 ms per loop

In [14]: %timeit swap_row(csc, 500, 1)
1000 loops, best of 3: 988 us per loop

@coveralls

Coverage Status

Coverage remained the same when pulling 31d438e on MechCoder:swap_sparse into 7c140a1 on scikit-learn:master.

@jnothman
Member

Yes, I think that's fair enough.

+1 from me.

@jnothman jnothman changed the title ENH: Swap rows in sparsefuncs [MRG+1] ENH: Swap rows in sparsefuncs Apr 26, 2014
@MechCoder
Member Author

Ping @arjoly @larsmans @agramfort ?

    n : int, index of second sample
    """
    if m < 0:
        m += X.shape[0]
Member

Just in case someone passes in a mutable (for instance a numpy int instead of a Python int), I would do 'm = m + X.shape[0]' here to avoid side effects.

Member Author

Just out of curiosity (and for learning), could you please tell me how the behaviour of the code that I've written would change for a numpy int?

Member

I don't get that:

>>> x = y = np.int(0)
>>> y += 1
>>> x, y
(0, 1)

Gael, were you thinking of 0-d arrays?

>>> x = y = array(0)
>>> x += 1
>>> x, y
(array(1), array(1))

I'm not sure if guarding for this is worthwhile. It's very likely that a later refactoring round undoes this; callers should take care not to produce these values.
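
To make the concern concrete, here is a small sketch of how the side effect reaches the caller. The helper names `normalize_inplace` and `normalize_rebind` are hypothetical and exist only for this illustration; the first mirrors the `m += X.shape[0]` pattern under review, the second the suggested `m = m + X.shape[0]` form.

```python
import numpy as np


def normalize_inplace(m, n_rows):
    # Augmented assignment: mutates `m` if it is a mutable object.
    if m < 0:
        m += n_rows
    return m


def normalize_rebind(m, n_rows):
    # Plain assignment: always binds `m` to a new object.
    if m < 0:
        m = m + n_rows
    return m


a = np.array(-1)          # 0-d array, mutable
normalize_inplace(a, 10)  # the caller's `a` is modified as a side effect

b = np.array(-1)
normalize_rebind(b, 10)   # the caller's `b` is left alone

c = np.int64(-1)          # NumPy scalar, immutable
normalize_inplace(c, 10)  # the caller's `c` is left alone

assert a == 9
assert b == -1
assert c == -1
```

Only the 0-d array case leaks the change back to the caller; Python ints and NumPy scalars are immutable, so `+=` simply rebinds the local name.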

Member

Yes, I was.

Maybe I am paranoid, but each time I see an in-place modification I check that it does not apply to input arguments. Thus I think that I might see the potential problem again in a later code review.

I think that it is good practice not to leave such code in. If m is an integer, there is no computational benefit to doing the +=.

Member

It's a very Pythonic shorthand, though. I think it would be cleaner to typecheck and raise a TypeError for a 0-d array:

>>> isinstance(np.array(1), (np.integer, Integral))
False
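
A type check along these lines might look like the sketch below. The helper name `check_row_index` and the error message wording are assumptions made for illustration; only the `isinstance` test itself comes from the comment above.

```python
from numbers import Integral

import numpy as np


def check_row_index(m, n_rows):
    """Return a non-negative row index, rejecting non-integer inputs."""
    # 0-d arrays are neither np.integer nor Integral, so they are rejected
    # before any in-place arithmetic can touch them.
    if not isinstance(m, (np.integer, Integral)):
        raise TypeError("index must be an integer, got %r" % type(m))
    if m < 0:
        m += n_rows
    return m


assert check_row_index(-1, 5) == 4
assert check_row_index(np.int64(-2), 5) == 3

try:
    check_row_index(np.array(-1), 5)
except TypeError:
    rejected = True
else:
    rejected = False
assert rejected
```

With the check in place, the `+=` shorthand is safe because only immutable integer types get past it.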

Member Author

@GaelVaroquaux Is it just 0-D numpy arrays, or are there other cases also? Then it might be better to do m = m + X.shape[0]

Member

It's a very Pythonic shorthand, though. I think it would be cleaner to
typecheck and raise a TypeError for a 0-d array:

isinstance(np.array(1), (np.integer, Integral))
False

If you wish.

1. Minor changes to docs
2. Replaced swap with inplace_swap
@MechCoder
Member Author

@GaelVaroquaux @larsmans I can haz merge?

@GaelVaroquaux
Member

Yes you can :). Merging

GaelVaroquaux added a commit that referenced this pull request Apr 30, 2014
[MRG+1] ENH: Swap rows in sparsefuncs
@GaelVaroquaux GaelVaroquaux merged commit b842d4d into scikit-learn:master Apr 30, 2014
@MechCoder MechCoder deleted the swap_sparse branch April 30, 2014 18:59
    return inplace_swap_row_csr(X, m, n)
else:
    raise TypeError(
        "Unsupported type; expected a CSR or CSC sparse matrix.")
Member

Coming late to the party but it would be nice to report the actual type of sparse matrix with X.getformat() in the error message.
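
One way the suggested message could look is sketched below. The helper name `check_csr_or_csc` and the exact wording are hypothetical, not what was later merged. The sketch reads the `format` attribute of the sparse matrix, which holds the same short string (`'csr'`, `'csc'`, `'coo'`, ...) that `X.getformat()` returns, and falls back to the class name for inputs that are not sparse.

```python
from scipy import sparse


def check_csr_or_csc(X):
    """Return 'csr' or 'csc', or raise a TypeError naming the actual format."""
    # Fall back to the class name for inputs without a `format` attribute.
    fmt = getattr(X, "format", type(X).__name__)
    if fmt not in ("csr", "csc"):
        raise TypeError(
            "Unsupported type; expected a CSR or CSC sparse matrix, "
            "got %r instead." % fmt)
    return fmt


try:
    check_csr_or_csc(sparse.coo_matrix((3, 3)))
except TypeError as exc:
    message = str(exc)
else:
    message = ""

assert "'coo'" in message
assert check_csr_or_csc(sparse.csr_matrix((3, 3))) == "csr"
```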

@MechCoder
Member Author

@ogrisel I've sent a PR at #3140

I would also appreciate it if you could look at my continuation of your work at #3102
