[MRG+1] ENH: Swap rows in sparsefuncs #3104
? you might as well define |
# If non zero rows are equal in mth and nth row, then swapping becomes
# easy.
if nz_m == nz_n:
    mask = X.indices[m_ptr1: m_ptr2].copy()
Please remove the space after : on this and similar lines.
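For readers following the diff, here is a minimal sketch of why the equal-count case is easy, assuming the matrix is CSR and the two rows store the same number of nonzeros. The helper name `swap_equal_nnz_rows_csr` is hypothetical and is not the function from this PR; it only illustrates that `indptr` can stay untouched while the `indices` and `data` slices are exchanged.

```python
import numpy as np
from scipy import sparse


def swap_equal_nnz_rows_csr(X, m, n):
    """Swap rows m and n of a CSR matrix in place (hypothetical helper).

    Only handles the easy case: both rows store the same number of
    nonzeros, so indptr does not need to change.
    """
    indptr = X.indptr
    m_start, m_stop = indptr[m], indptr[m + 1]
    n_start, n_stop = indptr[n], indptr[n + 1]
    if m_stop - m_start != n_stop - n_start:
        raise ValueError("rows store a different number of nonzeros")
    # Exchange the column indices and the values of the two rows.
    for arr in (X.indices, X.data):
        tmp = arr[m_start:m_stop].copy()
        arr[m_start:m_stop] = arr[n_start:n_stop]
        arr[n_start:n_stop] = tmp


X = sparse.csr_matrix(np.array([[1., 0., 2.],
                                [0., 3., 0.],
                                [0., 4., 5.]]))
swap_equal_nnz_rows_csr(X, 0, 2)
swapped = X.toarray()
```

When the counts differ, the slices no longer line up and `indptr` has to be rebuilt, which is the harder branch of the PR.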
I'm not for implementing unused utility functions.
@larsmans It is not exactly unused, there are times in which one has to do column wise swapping in lars_path. Please look at the latest commit.
a] Replaced numpy slicing with concatenate b] Added swap_sparse_column
Did you forget to push? The GH diff only shows it being used in the tests...
As @jnothman, I am curious to know if it is faster than with the scipy implementation.
Swapping using dot product to perform the swap and your implementation. The comment of @jnothman might enlighten you
@jnothman Any more comments or can we get this in?
m, n = n, m

indptr = X.indptr
m_ptr1 = indptr[m]
I don't particularly like the m_ptr1/2 nomenclature. Perhaps m_start, m_stop?
As per arjoly, can we get a benchmark comparison with:
def swap_rows(X, m, n):
    idx = np.arange(X.shape[0])
    idx[m], idx[n] = idx[n], idx[m]
    return X[idx, :]
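A self-contained version of the baseline above, in case anyone wants to reproduce the comparison. The test matrix is an arbitrary assumption; no timings are shown because none are recorded in this thread.

```python
import numpy as np
from scipy import sparse


def swap_rows(X, m, n):
    # Build a row permutation that exchanges m and n, then index with it.
    # This returns a new matrix; X itself is left unchanged.
    idx = np.arange(X.shape[0])
    idx[m], idx[n] = idx[n], idx[m]
    return X[idx, :]


# Small illustrative matrix (assumed data, not from the PR).
X = sparse.csr_matrix(np.arange(12, dtype=float).reshape(4, 3))
Y = swap_rows(X, 0, 3)
```

Wrapping `swap_rows(X, 0, 3)` and the in-place function in `timeit` on a larger matrix would give the comparison asked for. Note that the baseline allocates a full copy, while the PR's version modifies `X` in place.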
@jnothman @arjoly Considerably faster.
Yes, I think that's fair enough. +1 from me.
Ping @arjoly @larsmans @agramfort ?
    n : int, index of second sample
    """
    if m < 0:
        m += X.shape[0]
Just in case someone passes in a mutable (for instance a numpy int instead of a Python int), I would do 'm = m + X.shape[0]' here to avoid side effects.
Just out of curiosity (and for learning), could you please tell me how the behaviour of the code that I've written would change for a numpy int?
I don't get that:
>>> x = y = np.int(0)
>>> y += 1
>>> x, y
(0, 1)
Gael, were you thinking of 0-d arrays?
>>> x = y = array(0)
>>> x += 1
>>> x, y
(array(1), array(1))
I'm not sure if guarding for this is worthwhile. It's very likely that a later refactoring round undoes this; callers should take care not to produce these values.
Yes I was.
Maybe I am paranoid, but each time I see an in-place modification I check that it does not apply to input arguments. Thus I think that I might see the potential problem again in a later code review.
I think that it is good practice not to leave such code in. If m is an integer there is no computational benefit to doing the +=.
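To make the side effect under discussion concrete, here is a small sketch. The two function names are hypothetical and only isolate the index normalisation step; the point is that `+=` mutates a 0-d array passed by the caller, while `m = m + ...` rebinds a local name.

```python
import numpy as np


def normalize_inplace(m, n_rows):
    # Augmented assignment: mutates the caller's object if m is a 0-d array.
    if m < 0:
        m += n_rows
    return m


def normalize_rebind(m, n_rows):
    # Plain assignment: binds a new object, the caller's value is untouched.
    if m < 0:
        m = m + n_rows
    return m


a = np.array(-1)
normalize_inplace(a, 5)   # a is modified as a side effect

b = np.array(-1)
normalize_rebind(b, 5)    # b keeps its original value

c = -1
normalize_inplace(c, 5)   # Python ints are immutable, so c is unchanged
```

For plain Python ints and numpy scalar types the two forms behave identically, which is why the guard only matters for 0-d arrays.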
@@ -56,3 +57,109 @@ def inplace_column_scale(X, scale):
     else:
         raise TypeError(
             "Unsupported type; expected a CSR or CSC sparse matrix.")
+
+
+def swap_row_csc(X, m, n):
+    """
+    Swaps two rows of a CSC matrix in-place.
+    Parameters
+    X : scipy.sparse.csc_matrix, shape=(n_samples, n_features)
+    m : int, index of first sample
+    n : int, index of second sample
+    """
+    if m < 0:
+        m += X.shape[0]
It's a very Pythonic shorthand, though. I think it would be cleaner to typecheck and raise a TypeError for a 0-d array:
>>> isinstance(np.array(1), (np.integer, Integral))
False
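A rough sketch of what that typecheck could look like, assuming a standalone validator. The name `check_row_index` and the error wording are hypothetical and not part of the PR. Once 0-d arrays are rejected up front, the `+=` shorthand is safe because every accepted type is immutable.

```python
from numbers import Integral

import numpy as np


def check_row_index(m, n_rows):
    """Reject non-integer indices, then wrap negative ones (hypothetical)."""
    if not isinstance(m, (np.integer, Integral)):
        raise TypeError(
            "row index must be an integer, got %s" % type(m).__name__)
    if m < 0:
        # Safe: Python ints and numpy integer scalars are immutable.
        m += n_rows
    return m


last = check_row_index(-1, 5)
wrapped = check_row_index(np.int64(-2), 5)

try:
    check_row_index(np.array(1), 5)
    rejected = False
except TypeError:
    rejected = True
```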
@GaelVaroquaux Is it just 0-D numpy arrays, or are there other cases also? Then it might be better to do m = m + X.shape[0]
> It's a very Pythonic shorthand, though. I think it would be cleaner to
> typecheck and raise a TypeError for a 0-d array:
> >>> isinstance(np.array(1), (np.integer, Integral))
> False

If you wish.
1. Minor changes to docs 2. Replaced swap with inplace_swap
@GaelVaroquaux @larsmans I can haz merge?
Yes you can :). Merging
[MRG+1] ENH: Swap rows in sparsefuncs
    return inplace_swap_row_csr(X, m, n)
else:
    raise TypeError(
        "Unsupported type; expected a CSR or CSC sparse matrix.")
Coming late to the party but it would be nice to report the actual type of sparse matrix with X.getformat() in the error message.
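One way this suggestion could look, as a sketch only. The helper `check_csr_or_csc` and the exact message wording are assumptions. It reads the `format` attribute, which holds the same string that `getformat()` returns on scipy sparse matrices.

```python
from scipy import sparse


def check_csr_or_csc(X):
    """Return the format of X, or raise if it is not CSR/CSC (hypothetical)."""
    fmt = X.format  # e.g. "csr", "csc", "coo"
    if fmt not in ("csr", "csc"):
        raise TypeError(
            "Unsupported type; expected a CSR or CSC sparse matrix, "
            "got format %r." % fmt)
    return fmt


ok_format = check_csr_or_csc(sparse.csr_matrix((2, 2)))

try:
    check_csr_or_csc(sparse.coo_matrix((2, 2)))
    message = ""
except TypeError as exc:
    message = str(exc)
```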
Numpy version of #3087
cc @jaidevd @jnothman