[WIP] Faster manhattan distance for sparse and dense matrices #14986
Conversation
jnothman
left a comment
Mostly cosmetics, thanks!
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
if i < X_indptr[px+1]: ix = X_indices[i]
if j < Y_indptr[py+1]: iy = Y_indices[j]

if ix==iy:
```
Spaces around binary operators please
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
j = Y_indptr[py]
d = 0.0
while i < X_indptr[px+1] and j < Y_indptr[py+1]:
    if i < X_indptr[px+1]: ix = X_indices[i]
```
Don't put two statements on a line
See pep8
```cython
    """
    cdef double[::1] row = np.empty(n_features)
    cdef np.npy_intp ix, iy, j
    """Pairwise L1 distances for CSR matrices.
```
The algorithm may be simpler for CSC...?
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
    Usage:
    >>> D = np.zeros(X.shape[0], Y.shape[0])
    >>> cython_manhattan(X.data, X.indices, X.indptr,
    ...                  Y.data, Y.indices, Y.indptr,
```
I think you're assuming that X.has_sorted_indices
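The sorted-index assumption matters because the core of the sparse path is a merge over two sorted column-index arrays. For reference, that traversal can be sketched in pure Python for a single pair of rows (a hypothetical helper, not the PR's actual Cython code; it assumes CSR inputs with sorted indices):

```python
import numpy as np
from scipy import sparse
from scipy.spatial.distance import cdist

def sparse_manhattan_row(X, px, Y, py):
    """L1 distance between row px of X and row py of Y.

    Assumes CSR matrices whose per-row column indices are sorted
    (call sort_indices() first if has_sorted_indices is False).
    """
    i, i_end = X.indptr[px], X.indptr[px + 1]
    j, j_end = Y.indptr[py], Y.indptr[py + 1]
    d = 0.0
    # Merge the two sorted index lists, as when merging sorted arrays.
    while i < i_end and j < j_end:
        ix, iy = X.indices[i], Y.indices[j]
        if ix == iy:
            d += abs(X.data[i] - Y.data[j])
            i += 1
            j += 1
        elif ix < iy:
            d += abs(X.data[i])   # nonzero only in X at this column
            i += 1
        else:
            d += abs(Y.data[j])   # nonzero only in Y at this column
            j += 1
    # One of the two rows may still have trailing nonzeros.
    d += np.abs(X.data[i:i_end]).sum() + np.abs(Y.data[j:j_end]).sum()
    return d
```

If the indices are not sorted, the merge silently produces wrong sums, which is why the assumption needs to be either documented or enforced.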
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
s = 0
for k in range(x.shape[1]):
    s = s + fabs(x[i,k]-y[j,k])
out[i,j]=s
```
The file should end with a newline character
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
    j = j+1

if i== X_indptr[px+1]:
    while j < Y_indptr[py+1]:
```
You can use a `for` loop here
sklearn/metrics/pairwise.py
Outdated
```python
D = X[:, np.newaxis, :] - Y[np.newaxis, :, :]
D = np.abs(D, D)
return D.reshape((-1, X.shape[1]))
D = np.empty(shape=(X.shape[0],Y.shape[0]))
```
Spaces after commas please
```cython
cdef np.npy_intp px, py, i, j, ix, iy
cdef double d = 0.0

cdef int m = D.shape[0]
```
You don't need these variables.
Removed useless variable. Removed stray part of docstring.
sort_indices() in-place on sparse matrices
We haven't made a decision yet :/
Tests related to this are in
jeremiedbb
left a comment
A few comments
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
#cython: boundscheck=False
#cython: cdivision=True
#cython: wraparound=False
# distutils: extra_compile_args=-fopenmp
```
you don't need that. We take care of the OpenMP flags in the setup
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
    row[X_indices[j]] = X_data[j]
for j in range(Y_indptr[iy], Y_indptr[iy + 1]):
    row[Y_indices[j]] -= Y_data[j]
for px in prange(m):
```
you can avoid 1 indentation by grouping nogil and prange:

```cython
for px in prange(m, nogil=True):
    ...
```
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
iy = Y_indices[j]

if ix == iy:
    d = d + fabs(X_data[i] - Y_data[j])
```
inplace operation is preferable:

```cython
d += fabs(X_data[i] - Y_data[j])
```

Have you tried using just `abs`?
+1 for inplace operations everywhere (`i += 1` etc.)
When using prange, the inplace operator signals a reduction (at the level of the for statement where the prange is used), which is not what I want here. Maybe I should add a comment in the code to make that clear.
You're right, we want a thread-local variable here
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
cdef double s = 0.0
cdef np.npy_intp i, j, k
with nogil:
    for i in prange(x.shape[0]):
```
Could you benchmark this function with the environment variable `OMP_NUM_THREADS=1` (set before any import) against `scipy.spatial.distance.cdist(metric='cityblock')`?

It would be nice if you could also provide a small benchmark vs master (in both single and multi threaded modes).
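A minimal single-process benchmark along these lines might look as follows (sizes are arbitrary illustrative choices, and pinning the thread count would require `OMP_NUM_THREADS=1` to be exported before any import):

```python
import time
import numpy as np
from scipy.spatial.distance import cdist

rng = np.random.RandomState(0)
X = rng.rand(300, 64)
Y = rng.rand(400, 64)

# Reference: scipy's C implementation of the L1 ("cityblock") distance.
t0 = time.perf_counter()
D_ref = cdist(X, Y, metric="cityblock")
t_scipy = time.perf_counter() - t0

# Naive NumPy broadcasting version (memory-hungry; useful mainly as a
# correctness check, not as a serious competitor).
t0 = time.perf_counter()
D = np.abs(X[:, None, :] - Y[None, :, :]).sum(axis=-1)
t_numpy = time.perf_counter() - t0

assert np.allclose(D, D_ref)
print(f"scipy: {t_scipy:.4f}s  numpy broadcast: {t_numpy:.4f}s")
```

Timing the PR's Cython function would slot into the same template in place of the broadcasting version.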
sklearn/metrics/pairwise_fast.pyx
Outdated
```cython
    row[X_indices[j]] = X_data[j]
for j in range(Y_indptr[iy], Y_indptr[iy + 1]):
    row[Y_indices[j]] -= Y_data[j]
for px in prange(m):
```
Maybe add a few comments describing in English what this implementation does, as was done previously.
General comment: if you had not noticed already, I am not quite up to speed with this development process. I read around and try to do the right thing, but please do not hesitate to set me on the right path.
@jeremiedbb, below is a preliminary benchmark/regression test.
I have no problem with that.
If you made changes that impact the lines a conversation is about, the conversation will be marked as outdated. You don't need to mark it as resolved. And it's better to let the person who started the conversation confirm that it has been addressed correctly :)
Regarding the benchmarks
About the dense case. It's a bit subtle to get the best performance. The reduction is not vectorized by gcc. The way to vectorize it is to manually unroll the loop. Here's how to do it:

```cython
def _dense_manhattan(floating[:, :] X, floating[:, :] Y, floating[:, :] out):
    cdef:
        int n_samples_x = X.shape[0]
        int n_samples_y = Y.shape[0]
        int n_features = X.shape[1]
        np.npy_intp i, j

    for i in prange(n_samples_x, nogil=True):
        for j in range(n_samples_y):
            out[i, j] = _manhattan_1d(&X[i, 0], &Y[j, 0], n_features)


cdef floating _manhattan_1d(floating *x, floating *y, int n_features) nogil:
    cdef:
        int i
        int n = n_features // 4
        int rem = n_features % 4
        floating result = 0

    # Unroll by 4 so gcc can vectorize the reduction.
    for i in range(n):
        result += (fabs(x[0] - y[0])
                   + fabs(x[1] - y[1])
                   + fabs(x[2] - y[2])
                   + fabs(x[3] - y[3]))
        x += 4
        y += 4
    for i in range(rem):
        result += fabs(x[i] - y[i])
    return result
```

With this you can equal (even slightly outperform) scipy. And scalability is good. However
@jeremiedbb, thanks for looking into this and for pointing out the vectorization opportunities, of which I was not aware. Regarding the motivation for this PR: I was actually trying to improve the performance of the laplacian kernel calculation (in metrics/pairwise.py), which uses the manhattan_distances() function:
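For context, the laplacian kernel is just an elementwise exponential of the Manhattan distance matrix, so any speedup of the distance computation carries over directly. A reference sketch (using scipy's cityblock metric in place of sklearn's manhattan_distances; the helper name is illustrative):

```python
import numpy as np
from scipy.spatial.distance import cdist

def laplacian_kernel_ref(X, Y, gamma):
    # K(x, y) = exp(-gamma * ||x - y||_1)
    return np.exp(-gamma * cdist(X, Y, metric="cityblock"))
```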
Ok makes sense. In that case, what would be even faster is to re-implement the whole kernel computation, because the current approach is sub-optimal: it involves 2 extra loops over the distance matrix. Re-using the above code, it would be more efficient to do something like this:

```cython
from libc.math cimport exp

def _dense_laplacian_kernel(floating[:, :] X, floating[:, :] Y,
                            floating[:, :] out, floating gamma):
    cdef:
        int n_samples_x = X.shape[0]
        int n_samples_y = Y.shape[0]
        int n_features = X.shape[1]
        np.npy_intp i, j
        floating tmp

    for i in prange(n_samples_x, nogil=True):
        for j in range(n_samples_y):
            # Fuse the distance and the exponential in a single pass.
            tmp = _manhattan_1d(&X[i, 0], &Y[j, 0], n_features)
            out[i, j] = exp(-gamma * tmp)
```
Anyway, I think these considerations can be moved to another PR. It will be easier to review if you stick to speeding up sparse manhattan in this PR, to actually fix the original issue.
OK, I will reinstate the existing implementation of
Regarding the sparse case, there is one detail that I am not sure about.
I created PR #15049 to cover just the sparse matrix case.
Reference Issues/PRs
Fixes enhancement request #14304 "manhattan_distances for sparse matrices is slow"
What does this implement/fix? Explain your changes.
This PR affects primarily metrics/pairwise_fast.pyx.
The new version provides a faster Cython implementation of _sparse_manhattan(), but requires that the matrices have sorted indices.
I also improved the implementation of the dense matrix case, making it less memory demanding.
In both cases, the implementation is now multithreaded, using all cores available (it uses Cython's prange).
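Because the new sparse path requires sorted indices, callers would need an up-front check along these lines (a sketch with a hypothetical helper name, not the PR's actual wrapper code):

```python
from scipy import sparse

def ensure_sorted_indices(A):
    """Sort the column indices of a CSR matrix in place if needed.

    sort_indices() is a no-op when has_sorted_indices is already True,
    so this check is cheap to apply unconditionally.
    """
    if sparse.issparse(A) and not A.has_sorted_indices:
        A.sort_indices()
    return A
```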
I ran existing tests via:

```shell
pytest -v metrics/tests/test_pairwise.py
```
There are 2 issues, but they have nothing to do with what I touched.
Any other comments?
This is a first attempt, so I welcome comments and suggestions.
Questions