Replace np.einsum with np.tensordot in _upsampled_dft #3710
_upsampled_dft uses np.einsum to compute the upsampled DFT of an image. This is not easily generalizable and in numpy < 1.12 is (extremely) slow because no optimize flag is available. np.einsum can be replaced with sequential calls to np.tensordot which is generalizable to higher dimensions.
@taylor-scott very nice! I'm not super concerned with supporting NumPy 1.11, which is getting a bit long in the tooth, but the code is significantly cleaner now so I think this is a worthwhile change either way! Regarding benchmarks, I think it's a good idea to add a couple (one for 2D and one for 3D), since performance is part of the motivation for this PR. I recommend you look at https://github.com/scikit-image/scikit-image/blob/8135c2f3505d77134162846abf1b06c0b18d5a56/benchmarks/benchmark_morphology.py for how to parameterise tests, and then add a |
By the way, if this work sounds too onerous, we can do it the old-fashioned way: run a benchmark script on your machine and share with us the script and the results. =) |
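For anyone following along, a parameterised benchmark in the asv style could look roughly like the sketch below. This is not the benchmark file from this PR: the class name `UpsampledDFTSuite`, the 32-sample axis length, and the plain (non-upsampled) DFT kernel are all assumptions made for illustration. The only convention it relies on is asv's `params` / `param_names` / `setup` / `time_*` layout.

```python
import numpy as np


class UpsampledDFTSuite:
    """Hypothetical asv-style suite: one timing per entry in `params`."""

    # asv calls setup() and every time_* method once per parameter value.
    params = [2, 3]
    param_names = ['ndim']

    def setup(self, ndim):
        rng = np.random.default_rng(0)
        size = 32  # assumed axis length, chosen only to keep the run short
        shape = (size,) * ndim
        self.data = (rng.standard_normal(shape)
                     + 1j * rng.standard_normal(shape))
        k = np.arange(size)
        # Plain DFT matrix, standing in for the upsampled kernel.
        self.kernel = np.exp(-2j * np.pi * np.outer(k, k) / size)

    def time_tensordot(self, ndim):
        out = self.data
        # Contract the last axis once per dimension; tensordot moves the
        # transformed axis to the front, so the axes cycle back into order.
        for _ in range(ndim):
            out = np.tensordot(self.kernel, out, axes=(1, -1))
        return out
```

One suite with `ndim` as a parameter covers both the 2D and the 3D case without duplicating the timing code.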
Honestly, I wouldn't worry about numpy 1.11 much. It is almost 3 years old now, so not so important. I should write these out somewhere:
or
Then to test your benchmark
To compare your results to master
You might have to swap the order of the branch names.
Hello @taylor-scott! Thanks for updating the PR.
Comment last updated on February 08, 2019 at 17:53 Hours UTC |
So much more readable; thank you!
@@ -27,7 +27,7 @@ def _upsampled_dft(data, upsampled_region_size,

     Parameters
     ----------
-    data : 2D or 3D ndarray
+    data : nD ndarray
Just ndarray?
Thanks for the help on benchmarking @jni and @hmaarrfk. I've pushed a benchmark file to this PR. It works with I've run this script several times and I've gotten pretty consistent results (Python 3.7, numpy 1.16.1). (Sorry it's a little long -- the easiest way to compare einsum and tensordot was to include the original and modified _upsampled_dft functions in the file.) Results are here. Highlights:
Thoughts on using tensordot in the 2D case? It's slower, but the code is cleaner with no special cases.
On Mon, 04 Feb 2019 21:24:22 -0800, Taylor D. Scott wrote:
Thoughts on using tensordot in the 2D case? It's slower, but the code
cleaner with no special cases.
I think 50 µsec is nothing to be concerned about at all, especially
given that the code is clear and expressive.
I think I'm nearly done with this... A couple more things I noticed:
I think a good way to handle this is to add an optional flag The
On Tue, 05 Feb 2019 18:04:00 +0000, Taylor D. Scott wrote:
1. `_upsampled_dft` calculates the DFT but we actually want the inverse DFT. That's why the function call is `_upsampled_dft(image_product.conj(), ...).conj()`.
I think a good way to handle this is to add an optional flag
`inverse=False` to `_upsampled_dft`. If true, use positive `im2pi`
when calculating the kernel (to give the inverse DFT). If false, use
`-im2pi` as it is now.
Could you expand a bit more on this item?
2. [L256](https://github.com/scikit-image/scikit-image/blob/a3d2557209fb5913609749f06e70b367cef00617/skimage/feature/register_translation.py#L256):
This might be returning the right thing by coincidence but max() sorts
lexicographically for complex numbers. Looking at the matlab code,
it's supposed to be the maximum cross_correlation sorted by absolute
value. I'll fix this one tonight.
Thanks!
The MATLAB code is here:
It takes the absolute value of the cross-correlation, finds the argmax, then returns the cross-correlation at that point. The code in register_translation, So:
gives See the discussion here: numpy/numpy#8151 |
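To make the difference concrete, here is a small sketch with made-up values (the array is illustrative only, not data from register_translation). It compares `max()` on a complex array, which NumPy orders lexicographically by real part and then imaginary part, with picking the entry of largest magnitude via `argmax` of the absolute value.

```python
import numpy as np

# Made-up complex "cross-correlation" values, for illustration only.
cross_correlation = np.array([1 + 0j, 0.5 + 10j, -3 + 0j])

# max() compares real parts first, so it picks the entry with real part 1.
lexicographic_max = cross_correlation.max()

# Peak by magnitude: argmax of |x|, then index the original complex array.
peak_index = np.argmax(np.abs(cross_correlation))
peak_value = cross_correlation[peak_index]

print(lexicographic_max)  # (1+0j)
print(peak_value)         # (0.5+10j)
```

The two only agree when the entry with the largest real part also happens to have the largest magnitude.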
@taylor-scott Sorry if my response was confusing; I was asking about item nr 1. |
@stefanv I'm sorry, I completely misread your question. To find the cross-correlation, we take the inverse DFT of the image product. In pixel-level registration, this is just This almost gives the cross-correlation, but we actually need the inverse DFT of the larger image. So to calculate the cross-correlation, For large images, the time to call To avoid this call to This is easy to accomplish while generating the kernel:
kernel = ((np.arange(upsampled_region_size[0]) - ax_offset)[:, None]
          * np.fft.fftfreq(n_items, upsample_factor))
if inverse:
    # We can also add the normalizing factor here if desired
    kernel = np.exp(im2pi * kernel)
else:
    kernel = np.exp(-im2pi * kernel)
Also, after thinking about it a bit more, this may be an issue better handled in a separate pull request after some more discussion. I think |
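The conjugation trick and the proposed flag can be checked on a 1D signal. The sketch below is an illustration under assumptions, not the code in register_translation: `dft_kernel` is a hypothetical helper that mirrors the kernel expression above with no upsampling and no offset, and the 8-sample random signal is arbitrary. It shows that conjugating the input and output of the forward transform gives the same result as building the kernel with a positive exponent, and that both equal the unnormalised inverse DFT.

```python
import numpy as np


def dft_kernel(n_out, n_items, upsample_factor=1, ax_offset=0, inverse=False):
    """Hypothetical helper: DFT matrix with a selectable exponent sign."""
    im2pi = 1j * 2 * np.pi
    kernel = ((np.arange(n_out) - ax_offset)[:, None]
              * np.fft.fftfreq(n_items, upsample_factor))
    sign = 1 if inverse else -1
    return np.exp(sign * im2pi * kernel)


rng = np.random.default_rng(0)
n = 8
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)

forward = dft_kernel(n, n)                 # matches np.fft.fft
backward = dft_kernel(n, n, inverse=True)  # matches n * np.fft.ifft

via_flag = backward @ x
via_conj = np.conj(forward @ np.conj(x))   # the conj(...).conj() trick

print(np.allclose(via_flag, via_conj))           # True
print(np.allclose(via_flag, n * np.fft.ifft(x)))  # True
```

Neither route applies the 1/N normalisation; that would be the "normalizing factor" mentioned in the code comment above.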
@taylor-scott Thank you for that very clear explanation. Yikes, such subtle tricks are very easy to miss, and I agree with you that making it explicit would be an improvement. If you are willing to make the change in another PR, then great—let's get this one merged so long. Feel free to add your name to CONTRIBUTORS.txt if you want. @scikit-image/core Ready for final review. |
7458567 to 1f8530a
Ok, great. I've removed the inverse flag from this PR and I'll open a separate PR to address that. Thanks @stefanv ! |
This is beautiful right now. Thank you!
Thanks, @taylor-scott! |
Description
In feature/register_translation.py, _upsampled_dft uses np.einsum to compute the upsampled DFT of an image. This is not easily generalizable and in numpy 1.11 is (extremely) slow because no optimize flag is available.
np.einsum can be replaced with sequential calls to np.tensordot which (should) generalize to higher dimensions.
E.g.,
It should also be faster in numpy 1.11 and, from testing on my laptop, not slower than np.einsum in numpy 1.16. It seems to be slightly faster for some inputs.
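As a rough illustration of the idea (not the code from this PR), the sketch below applies a full DFT along every axis with one `np.tensordot` call per axis and compares the result with a 2D `np.einsum` contraction and with `np.fft.fftn`. The helper names `dft_matrix` and `dft_by_tensordot`, the einsum subscripts, and the array shapes are assumptions chosen for the example; the real function builds upsampled kernels instead of plain DFT matrices.

```python
import numpy as np


def dft_matrix(n):
    """Plain n-point DFT matrix (stand-in for the upsampled kernel)."""
    k = np.arange(n)
    return np.exp(-2j * np.pi * np.outer(k, k) / n)


def dft_by_tensordot(data):
    """Apply a DFT along every axis using one tensordot per axis."""
    out = data
    # Work from the last axis to the first. Each call contracts the current
    # last axis and puts the transformed axis in front, so after data.ndim
    # calls the axes are back in their original order.
    for n in data.shape[::-1]:
        out = np.tensordot(dft_matrix(n), out, axes=(1, -1))
    return out


rng = np.random.default_rng(0)
image = rng.standard_normal((4, 5))
volume = rng.standard_normal((3, 4, 5))

# 2D-only einsum version: rows transformed by one kernel, columns by another.
via_einsum = np.einsum('ij,jk,lk->il', dft_matrix(4), image, dft_matrix(5))

print(np.allclose(dft_by_tensordot(image), via_einsum))            # True
print(np.allclose(dft_by_tensordot(volume), np.fft.fftn(volume)))  # True
```

The einsum subscripts have to be rewritten for each dimensionality, while the tensordot loop is the same for 2D, 3D, or higher.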
Is there a standard way to benchmark functions? I see there's a benchmarks folder but I'm not sure how to use it.
Todo
and numpy 1.11
Checklist
Gallery example in ./doc/examples (new features only)
./benchmarks, if your changes aren't covered by an existing benchmark