-
-
Notifications
You must be signed in to change notification settings - Fork 26.6k
[MRG] Use GEMM in _update_dict #11420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Use GEMM in _update_dict #11420
Conversation
23228fe to
e48f0f1
Compare
This fuses the multiplication and addition together into the same computation. Also includes the sign change as well. Not to mention that SciPy linalg routines return Fortran ordered arrays (even if they were C-ordered originally) unlike NumPy's `dot`. So this avoids a copy as well.
e48f0f1 to
78c0570
Compare
jnothman
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth showing a benchmark?
| ger, = linalg.get_blas_funcs(('ger',), (dictionary, code)) | ||
| # Residuals, computed with BLAS for speed and efficiency | ||
| # R <- -1.0 * U * V^T + 1.0 * Y | ||
| R = gemm(-1.0, dictionary, code, 1.0, Y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This produces a Fortran array I presume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be worth making it explicit that we expect a fortran array in the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, the SciPy linalg functions correctly handle C ordered arrays as input.
In [1]: import numpy as np
In [2]: import scipy.linalg as linalg
In [3]: np.random.seed(0)
In [4]: a = np.random.random((2, 3))
In [5]: b = np.random.random((3, 4))
In [6]: c = np.random.random((2, 4))
In [7]: gemm, = linalg.get_blas_funcs(('gemm',), (a, b, c))
In [8]: gemm(1.0, a, b, 1.0, c)
Out[8]:
array([[1.62736178, 1.79020759, 1.92593582, 2.17344607],
[1.08121316, 1.54678646, 0.89707181, 1.77876957]])
In [9]: gemm(1.0, a, b, 1.0, c).flags
Out[9]:
C_CONTIGUOUS : False
F_CONTIGUOUS : True
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : FalseThat said, we appear to already be forcing dictionary and code to Fortran ordered arrays anywhere _update_dict is called.
|
This is the main take away. In [1]: import numpy as np
In [2]: a = np.random.random((5, 6)).copy(order="F")
In [3]: b = np.random.random((6, 7)).copy(order="F")
In [4]: np.isfortran(np.dot(a, b))
Out[4]: FalseMeaning calling |
|
Ah, I get it now. LGTM. But I'm no blas expert, so let's get a second opinion. |
|
Any ideas who might be a good person to ask to look at this? |
|
Would you be able to review, @ogrisel? |
|
Friendly nudge 😉 |
ogrisel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good. Out of curiosity could you please run a quick benchmark to evaluate the performance impact of that fix on a typical (smallish) problem of yours?
| ger, = linalg.get_blas_funcs(('ger',), (dictionary, code)) | ||
| # Residuals, computed with BLAS for speed and efficiency | ||
| # R <- -1.0 * U * V^T + 1.0 * Y | ||
| R = gemm(-1.0, dictionary, code, 1.0, Y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would be worth making it explicit that we expect a fortran array in the comment.
|
Benchmarking shows the time change is pretty close to negligible, which is expected since the same operations are done in either case. Though memory usage should be reduced as we are dealing only in Fortran arrays now instead of converting as before thus avoiding a copy. |
|
@ogrisel I added the comment you suggested. Merging. |
|
Thanks @agramfort. Sorry @ogrisel, thought you were concerned about the requirements of input arrays. |
Reference Issues/PRs
What does this implement/fix? Explain your changes.
Avoids a copy in
_update_dictand fuses two operations together using the BLAS GEMM operation.Any other comments?