Replace np.einsum with np.tensordot in _upsampled_dft #3710


Merged

Conversation

taylor-scott
Contributor

@taylor-scott taylor-scott commented Feb 3, 2019

Description

In feature/register_translation.py, _upsampled_dft uses np.einsum to compute the upsampled DFT of an image. This does not generalize easily to higher dimensions, and in numpy 1.11 it is (extremely) slow because the optimize flag is not available.

np.einsum can be replaced with sequential calls to np.tensordot, which (should) generalize to higher dimensions.

E.g.,

import numpy as np
image = np.random.uniform(10, size=(10, 15, 20))
kernels = [np.random.uniform(10, size=(2*i, i)) for i in image.shape]
einsum_result = np.einsum('ijk,li,mj,nk->lmn', image, *kernels, optimize=True)
tensordot_result = np.copy(image)
for kernel in kernels[::-1]:
    tensordot_result = np.tensordot(kernel, tensordot_result, axes=(1, -1))

print(np.abs(tensordot_result - einsum_result).max()) # 0.0

It should also be faster in numpy 1.11 and, from testing on my laptop, no slower than np.einsum in numpy 1.16. It seems to be slightly faster for some inputs.

Is there a standard way to benchmark functions? I see there's a benchmarks folder but I'm not sure how to use it.

Todo

  • Benchmark in numpy 1.16 (current) and numpy 1.11
  • Extend to nD images
  • Write tests for nD images
  • Clean up kernel statement

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

_upsampled_dft uses np.einsum to compute the upsampled DFT
of an image. This is not easily generalizable and in numpy < 1.12 is
(extremely) slow because no optimize flag is available.

np.einsum can be replaced with sequential calls to np.tensordot which is
generalizable to higher dimensions.
@jni
Member

jni commented Feb 3, 2019

@taylor-scott very nice! I'm not super concerned with supporting NumPy 1.11, which is getting a bit long in the tooth, but the code is significantly cleaner now so I think this is a worthwhile change either way!

Regarding benchmarks, I think it's a good idea to add a couple (one for 2D and one for 3D), since performance is part of the motivation for this PR. I recommend you look at https://github.com/scikit-image/scikit-image/blob/8135c2f3505d77134162846abf1b06c0b18d5a56/benchmarks/benchmark_morphology.py for how to parameterise tests, and then add a class RegisterTranslation to benchmark_feature.py. As long as the class has a setup method and a time_register_translation method, it'll get picked up by airspeed velocity for benchmarks.
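
A minimal sketch of such a benchmark class, following the asv conventions described above; the parameter choices and the register_translation call are illustrative assumptions, not code from this PR:

import numpy as np

from skimage.feature import register_translation


class RegisterTranslation:
    # asv runs setup() and each time_* method once per parameter combination
    param_names = ['ndim', 'size', 'upsample_factor']
    params = [[2, 3], [32, 64], [1, 10]]

    def setup(self, ndim, size, upsample_factor):
        rng = np.random.RandomState(0)
        self.reference = rng.random_sample((size,) * ndim)
        # Target image shifted by a few pixels along the first axis
        self.shifted = np.roll(self.reference, 3, axis=0)

    def time_register_translation(self, ndim, size, upsample_factor):
        register_translation(self.reference, self.shifted,
                             upsample_factor=upsample_factor)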

@jni
Member

jni commented Feb 3, 2019

By the way, if this work sounds too onerous, we can do it the old-fashioned way: run a benchmark script on your machine and share with us the script and the results. =)

@hmaarrfk
Member

hmaarrfk commented Feb 4, 2019

Honestly, I wouldn't worry about numpy 1.11 much. It is almost 3 years old now, so not so important.

I should write these out somewhere:

pip install asv

or

conda install asv

Then to test your benchmark

asv run -E existing -b RankSuite

To compare your results to master

asv continuous -b RankSuite -E conda:3.7 master simplify_benchmarks 

You might have to swap the order of the branch names.
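
For this PR, the -b pattern would presumably be the new benchmark class rather than RankSuite, e.g. (branch name taken from this PR; the exact invocation is a guess):

asv continuous -E conda:3.7 -b RegisterTranslation master tensordot-in-register-translation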

@pep8speaks

pep8speaks commented Feb 5, 2019

Hello @taylor-scott! Thanks for updating the PR.

Line 57:80: E501 line too long (88 > 79 characters)

Comment last updated on February 08, 2019 at 17:53 Hours UTC

Member

@stefanv stefanv left a comment

So much more readable; thank you!

@@ -27,7 +27,7 @@ def _upsampled_dft(data, upsampled_region_size,

     Parameters
     ----------
-    data : 2D or 3D ndarray
+    data : nD ndarray
Member

Just ndarray?

@taylor-scott
Contributor Author

Thanks for the help on benchmarking @jni and @hmaarrfk. I've pushed a benchmark file to this PR. It works with asv run -E existing but asv chokes while setting up new environments on my computer so I can't compare it to master.

I've run this script several times and I've gotten pretty consistent results (Python 3.7, numpy 1.16.1). (Sorry it's a little long -- the easiest way to compare einsum and tensordot was to include both the original and modified _upsampled_dft functions in the file.)

Results are here.

Highlights:

  • 2D: np.tensordot is ~50 usec slower than np.dot
  • 3D: np.tensordot is faster than np.einsum, particularly in the case of a larger image/upshift

Thoughts on using tensordot in the 2D case? It's slower, but the code is cleaner with no special cases.
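
For reference, a small sketch showing that the 2D special case (two matrix products) and the axis-by-axis tensordot loop compute the same thing; the shapes and random kernels here are illustrative, not the actual DFT kernels from _upsampled_dft:

import numpy as np

rng = np.random.RandomState(0)
data = rng.random_sample((64, 64)) + 1j * rng.random_sample((64, 64))
row_kernel = rng.random_sample((96, 64)) + 1j * rng.random_sample((96, 64))
col_kernel = rng.random_sample((96, 64)) + 1j * rng.random_sample((96, 64))

# 2D special case: a chain of matrix products
dot_result = row_kernel.dot(data).dot(col_kernel.T)

# General nD path: one tensordot per axis, contracting the last axis each time
tensordot_result = data
for kernel in (row_kernel, col_kernel)[::-1]:
    tensordot_result = np.tensordot(kernel, tensordot_result, axes=(1, -1))

print(np.allclose(dot_result, tensordot_result))  # True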

@stefanv
Member

stefanv commented Feb 5, 2019 via email

@taylor-scott
Contributor Author

I think I'm nearly done with this...

A couple more things I noticed:

  1. _upsampled_dft calculates the DFT but we actually want the inverse DFT. That's why the function call is _upsampled_dft(image_product.conj(), ...).conj().

I think a good way to handle this is to add an optional flag inverse=False to _upsampled_dft. If true, use positive im2pi when calculating the kernel (to give the inverse DFT). If false, use -im2pi as it is now.

The conj calls can get expensive for large images (~10 msec for a 1040x1392 image) so I think it's good to get rid of them if possible.

  2. L256: This might be returning the right thing by coincidence, but max() sorts lexicographically for complex numbers. Looking at the MATLAB code, it's supposed to be the maximum of cross_correlation sorted by absolute value. I'll fix this one tonight.

@stefanv
Member

stefanv commented Feb 5, 2019 via email

@taylor-scott
Contributor Author

The MATLAB code is here:

CCabs = abs(CC);
[rloc, cloc] = find(CCabs == max(CCabs(:)),1,'first');
CCmax = CC(rloc,cloc);

It takes the absolute value of the cross-correlation, finds the argmax, then returns the cross-correlation at that point.

The code in register_translation, CCmax = cross_correlation.max(), returns the maximum value based on how numpy sorts complex numbers, which is not by absolute value: it sorts by real part first, then by imaginary part.

So:

np.array([1+1j, 0+10j]).max()

gives 1 + 1j even though abs(0+10j) > abs(1 + 1j)

See the discussion here: numpy/numpy#8151
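
One way to mirror the MATLAB logic in numpy is a sketch along these lines (variable names are illustrative, not necessarily the exact change made here):

import numpy as np

cross_correlation = np.array([[1 + 1j, 0 + 10j],
                              [2 - 1j, 0 + 0j]])

# Locate the peak by magnitude, then read off the complex value there
maxima = np.unravel_index(np.argmax(np.abs(cross_correlation)),
                          cross_correlation.shape)
CCmax = cross_correlation[maxima]
print(CCmax)                    # 10j
print(cross_correlation.max())  # (2-1j), sorted by real part first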

@taylor-scott taylor-scott changed the title from "[WIP] Replace np.einsum with np.tensordot in _upsampled_dft" to "Replace np.einsum with np.tensordot in _upsampled_dft" on Feb 7, 2019
@stefanv
Member

stefanv commented Feb 7, 2019

@taylor-scott Sorry if my response was confusing; I was asking about item nr 1.

@taylor-scott
Contributor Author

@stefanv I'm sorry, I completely misread your question.

To find the cross-correlation, we take the inverse DFT of the image product. In pixel-level registration, this is just cross_correlation = ifftn(image_product) but for subpixel registration the function calls _upsampled_dft which embeds the image product into a large image and calculates the DFT of the larger image.

This almost gives the cross-correlation, but we actually need the inverse DFT of the larger image. So to calculate the cross-correlation, register_translation uses the identity fft(image.conj()).conj()/N = ifft(image), where N is the image size.
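
That identity is easy to sanity-check numerically; a quick sketch:

import numpy as np

rng = np.random.RandomState(0)
x = rng.random_sample((8, 8)) + 1j * rng.random_sample((8, 8))

# ifft(x) == fft(x.conj()).conj() / N, with N the total number of samples
lhs = np.fft.fftn(x.conj()).conj() / x.size
rhs = np.fft.ifftn(x)
print(np.allclose(lhs, rhs))  # True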

For large images, the time to call conj is not trivial. For example, calling image.conj() on one of my images (1040 x 1392) takes ~11 msec and registering two images takes ~230 msec total.

To avoid this call to conj, I proposed adding an optional flag inverse=False to _upsampled_dft. When false, return the DFT as the function does now. When true, return the inverse DFT which is what we want in this case.

This is easy to accomplish while generating the kernel:

    kernel = ((np.arange(upsampled_region_size[0]) - ax_offset)[:, None]
              * np.fft.fftfreq(n_items, upsample_factor))
    if inverse:
        # We can also add the normalizing factor here if desired
        kernel = np.exp(im2pi * kernel)
    else:
        kernel = np.exp(-im2pi * kernel)
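
With that flag in place, the call site could presumably drop both conjugations, e.g. (argument names after upsampled_region_size are illustrative, not taken from the function):

# Hypothetical call site if the flag existed, instead of
# _upsampled_dft(image_product.conj(), ...).conj()
cross_correlation = _upsampled_dft(image_product, upsampled_region_size,
                                   upsample_factor, sample_region_offset,
                                   inverse=True)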

@taylor-scott
Contributor Author

Also, after thinking about it a bit more, this may be an issue better handled in a separate pull request after some more discussion. I think _upsampled_dft could be generally useful, but IMO this aspect and how the image is embedded in the upsampled image are non-intuitive.

@stefanv
Member

stefanv commented Feb 8, 2019

@taylor-scott Thank you for that very clear explanation. Yikes, such subtle tricks are very easy to miss, and I agree with you that making it explicit would be an improvement. If you are willing to make the change in another PR, then great—let's get this one merged so long. Feel free to add your name to CONTRIBUTORS.txt if you want.

@scikit-image/core Ready for final review.

@taylor-scott taylor-scott force-pushed the tensordot-in-register-translation branch from 7458567 to 1f8530a on February 8, 2019 at 17:53
@taylor-scott
Contributor Author

Ok, great. I've removed the inverse flag from this PR and I'll open a separate PR to address that. Thanks @stefanv !

Member

@jni jni left a comment

This is beautiful right now. Thank you!

@stefanv stefanv merged commit 4846bd0 into scikit-image:master Feb 9, 2019
@stefanv
Member

stefanv commented Feb 9, 2019

Thanks, @taylor-scott!

@taylor-scott taylor-scott deleted the tensordot-in-register-translation branch February 9, 2019 04:53