Replace np.einsum with np.tensordot in _upsampled_dft #3710


Merged

Conversation

taylor-scott
Contributor

@taylor-scott taylor-scott commented Feb 3, 2019

Description

In feature/register_translation.py, _upsampled_dft uses np.einsum to compute the upsampled DFT of an image. This does not generalize easily to higher dimensions, and in numpy 1.11 it is (extremely) slow because the optimize flag is not available.

np.einsum can be replaced with sequential calls to np.tensordot, which (should) generalize to higher dimensions.

E.g.,

import numpy as np
image = np.random.uniform(10, size=(10, 15, 20))
kernels = [np.random.uniform(10, size=(2*i, i)) for i in image.shape]
einsum_result = np.einsum('ijk,li,mj,nk->lmn', image, *kernels, optimize=True)
tensordot_result = np.copy(image)
for kernel in kernels[::-1]:
    tensordot_result = np.tensordot(kernel, tensordot_result, axes=(1, -1))

print(np.abs(tensordot_result - einsum_result).max()) # 0.0

It should also be faster in numpy 1.11 and, from testing on my laptop, no slower than np.einsum in numpy 1.16. It seems to be slightly faster for some inputs.

Is there a standard way to benchmark functions? I see there's a benchmarks folder but I'm not sure how to use it.

Todo

  • Benchmark in numpy 1.16 (current) and numpy 1.11
  • Extend to nD images
  • Write tests for nD images
  • Clean up kernel statement

Checklist

For reviewers

  • Check that the PR title is short, concise, and will make sense 1 year
    later.
  • Check that new functions are imported in corresponding __init__.py.
  • Check that new features, API changes, and deprecations are mentioned in
    doc/release/release_dev.rst.
  • Consider backporting the PR with @meeseeksdev backport to v0.14.x

_upsampled_dft uses np.einsum to compute the upsampled DFT
of an image. This is not easily generalizable and in numpy < 1.12 is
(extremely) slow because no optimize flag is available.

np.einsum can be replaced with sequential calls to np.tensordot which is
generalizable to higher dimensions.
@jni
Member

jni commented Feb 3, 2019

@taylor-scott very nice! I'm not super concerned with supporting NumPy 1.11, which is getting a bit long in the tooth, but the code is significantly cleaner now so I think this is a worthwhile change either way!

Regarding benchmarks, I think it's a good idea to add a couple (one for 2D and one for 3D), since performance is part of the motivation for this PR. I recommend you look at https://github.com/scikit-image/scikit-image/blob/8135c2f3505d77134162846abf1b06c0b18d5a56/benchmarks/benchmark_morphology.py for how to parameterise tests, and then add a class RegisterTranslation to benchmark_feature.py. As long as the class has a setup method and a time_register_translation method, it'll get picked up by airspeed velocity for benchmarks.
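
A minimal sketch of such a benchmark class, following the asv conventions described above; the parameter choices and the register_translation call are illustrative assumptions, not code from this PR:

import numpy as np

from skimage.feature import register_translation


class RegisterTranslation:
    # asv runs setup() and each time_* method once per parameter combination
    param_names = ['ndim', 'size', 'upsample_factor']
    params = [[2, 3], [32, 64], [1, 10]]

    def setup(self, ndim, size, upsample_factor):
        rng = np.random.RandomState(0)
        self.reference = rng.random_sample((size,) * ndim)
        # Target image shifted by a few pixels along the first axis
        self.shifted = np.roll(self.reference, 3, axis=0)

    def time_register_translation(self, ndim, size, upsample_factor):
        register_translation(self.reference, self.shifted,
                             upsample_factor=upsample_factor)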

@jni
Member

jni commented Feb 3, 2019

By the way, if this work sounds too onerous, we can do it the old-fashioned way: run a benchmark script on your machine and share with us the script and the results. =)

@hmaarrfk
Member

hmaarrfk commented Feb 4, 2019

Honestly, I wouldn't worry about numpy 1.11 much. It is almost 3 years old now, so not so important.

I should write these out somewhere:

pip install asv

or

conda install asv

Then to test your benchmark

asv run -E existing -b RankSuite

To compare your results to master

asv continuous -b RankSuite -E conda:3.7 master simplify_benchmarks 

You might have to swap the order of the branch names.
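
For this PR, the -b pattern would presumably be the new benchmark class rather than RankSuite, e.g. (branch name taken from this PR; the exact invocation is a guess):

asv continuous -E conda:3.7 -b RegisterTranslation master tensordot-in-register-translation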

@pep8speaks

pep8speaks commented Feb 5, 2019

Hello @taylor-scott! Thanks for updating the PR.

Line 57:80: E501 line too long (88 > 79 characters)

Comment last updated on February 08, 2019 at 17:53 Hours UTC

Member

@stefanv stefanv left a comment

So much more readable; thank you!

@@ -27,7 +27,7 @@ def _upsampled_dft(data, upsampled_region_size,

     Parameters
     ----------
-    data : 2D or 3D ndarray
+    data : nD ndarray
Member

Just ndarray?

@taylor-scott
Contributor Author

Thanks for the help on benchmarking @jni and @hmaarrfk. I've pushed a benchmark file to this PR. It works with asv run -E existing but asv chokes while setting up new environments on my computer so I can't compare it to master.

I've run this script several times and I've gotten pretty consistent results (Python 3.7, numpy 1.16.1). (Sorry it's a little long -- the easiest way to compare einsum and tensordot was to include both the original and modified _upsampled_dft functions in the file.)

Results are here.

Highlights:

  • 2D: np.tensordot is ~50 usec slower than np.dot
  • 3D: np.tensordot is faster than np.einsum, particularly in the case of a larger image/upshift

Thoughts on using tensordot in the 2D case? It's slower, but the code is cleaner with no special cases.
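
For reference, a small sketch showing that the 2D special case (two matrix products) and the axis-by-axis tensordot loop compute the same thing; the shapes and random kernels here are illustrative, not the actual DFT kernels from _upsampled_dft:

import numpy as np

rng = np.random.RandomState(0)
data = rng.random_sample((64, 64)) + 1j * rng.random_sample((64, 64))
row_kernel = rng.random_sample((96, 64)) + 1j * rng.random_sample((96, 64))
col_kernel = rng.random_sample((96, 64)) + 1j * rng.random_sample((96, 64))

# 2D special case: a chain of matrix products
dot_result = row_kernel.dot(data).dot(col_kernel.T)

# General nD path: one tensordot per axis, contracting the last axis each time
tensordot_result = data
for kernel in (row_kernel, col_kernel)[::-1]:
    tensordot_result = np.tensordot(kernel, tensordot_result, axes=(1, -1))

print(np.allclose(dot_result, tensordot_result))  # True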

@stefanv
Member

stefanv commented Feb 5, 2019 via email

@taylor-scott
Contributor Author

I think I'm nearly done with this...

A couple more things I noticed:

  1. _upsampled_dft calculates the DFT but we actually want the inverse DFT. That's why the function call is _upsampled_dft(image_product.conj(), ...).conj().

I think a good way to handle this is to add an optional flag inverse=False to _upsampled_dft. If true, use positive im2pi when calculating the kernel (to give the inverse DFT). If false, use -im2pi as it is now.

The conj calls can get expensive for large images (~10 msec for a 1040x1392 image) so I think it's good to get rid of them if possible.

  2. L256: This might be returning the right thing by coincidence, but max() sorts lexicographically for complex numbers. Looking at the MATLAB code, it's supposed to be the maximum of cross_correlation sorted by absolute value. I'll fix this one tonight.

@stefanv
Member

stefanv commented Feb 5, 2019 via email

@taylor-scott
Contributor Author

The MATLAB code is here:

CCabs = abs(CC);
[rloc, cloc] = find(CCabs == max(CCabs(:)),1,'first');
CCmax = CC(rloc,cloc);

It takes the absolute value of the cross-correlation, finds the argmax, then returns the cross-correlation at that point.

The code in register_translation, CCmax = cross_correlation.max(), returns the maximum value based on how numpy sorts complex numbers, which is not by absolute value: it sorts by real part first, then by imaginary part.

So:

np.array([1+1j, 0+10j]).max()

gives 1 + 1j even though abs(0+10j) > abs(1 + 1j)

See the discussion here: numpy/numpy#8151
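
One way to mirror the MATLAB logic in numpy is a sketch along these lines (variable names are illustrative, not necessarily the exact change made here):

import numpy as np

cross_correlation = np.array([[1 + 1j, 0 + 10j],
                              [2 - 1j, 0 + 0j]])

# Locate the peak by magnitude, then read off the complex value there
maxima = np.unravel_index(np.argmax(np.abs(cross_correlation)),
                          cross_correlation.shape)
CCmax = cross_correlation[maxima]
print(CCmax)                    # 10j
print(cross_correlation.max())  # (2-1j), sorted by real part first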

@taylor-scott taylor-scott changed the title from "[WIP] Replace np.einsum with np.tensordot in _upsampled_dft" to "Replace np.einsum with np.tensordot in _upsampled_dft" on Feb 7, 2019
@stefanv
Member

stefanv commented Feb 7, 2019

@taylor-scott Sorry if my response was confusing; I was asking about item nr 1.

@taylor-scott
Contributor Author

@stefanv I'm sorry, I completely misread your question.

To find the cross-correlation, we take the inverse DFT of the image product. In pixel-level registration, this is just cross_correlation = ifftn(image_product) but for subpixel registration the function calls _upsampled_dft which embeds the image product into a large image and calculates the DFT of the larger image.

This almost gives the cross-correlation, but we actually need the inverse DFT of the larger image. So to calculate the cross-correlation, register_translation uses the identity fft(image.conj()).conj()/N = ifft(image), where N is the image size.
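
That identity is easy to sanity-check numerically; a quick sketch:

import numpy as np

rng = np.random.RandomState(0)
x = rng.random_sample((8, 8)) + 1j * rng.random_sample((8, 8))

# ifft(x) == fft(x.conj()).conj() / N, with N the total number of samples
lhs = np.fft.fftn(x.conj()).conj() / x.size
rhs = np.fft.ifftn(x)
print(np.allclose(lhs, rhs))  # True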

For large images, the time to call conj is not trivial. For example, calling image.conj() on one of my images (1040 x 1392) takes ~11 msec and registering two images takes ~230 msec total.

To avoid this call to conj, I proposed adding an optional flag inverse=False to _upsampled_dft. When false, return the DFT as the function does now. When true, return the inverse DFT which is what we want in this case.

This is easy to accomplish while generating the kernel:

    kernel = ((np.arange(upsampled_region_size[0]) - ax_offset)[:, None]
              * np.fft.fftfreq(n_items, upsample_factor))
    if inverse:
        # We can also add the normalizing factor here if desired
        kernel = np.exp(im2pi * kernel)
    else:
        kernel = np.exp(-im2pi * kernel)
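
With that flag in place, the call site could presumably drop both conjugations, e.g. (argument names after upsampled_region_size are illustrative, not taken from the function):

# Hypothetical call site if the flag existed, instead of
# _upsampled_dft(image_product.conj(), ...).conj()
cross_correlation = _upsampled_dft(image_product, upsampled_region_size,
                                   upsample_factor, sample_region_offset,
                                   inverse=True)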

@taylor-scott
Contributor Author

Also, after thinking about it a bit more, this may be an issue better handled in a separate pull request after some more discussion. I think _upsampled_dft could be generally useful, but IMO this aspect and how the image is embedded in the upsampled image are non-intuitive.

@stefanv
Member

stefanv commented Feb 8, 2019

@taylor-scott Thank you for that very clear explanation. Yikes, such subtle tricks are very easy to miss, and I agree with you that making it explicit would be an improvement. If you are willing to make the change in another PR, then great—let's get this one merged so long. Feel free to add your name to CONTRIBUTORS.txt if you want.

@scikit-image/core Ready for final review.

@taylor-scott taylor-scott force-pushed the tensordot-in-register-translation branch from 7458567 to 1f8530a on February 8, 2019 at 17:53
@taylor-scott
Contributor Author

Ok, great. I've removed the inverse flag from this PR and I'll open a separate PR to address that. Thanks @stefanv !

Member

@jni jni left a comment

This is beautiful right now. Thank you!

@stefanv stefanv merged commit 4846bd0 into scikit-image:master Feb 9, 2019
@stefanv
Member

stefanv commented Feb 9, 2019

Thanks, @taylor-scott!

@taylor-scott taylor-scott deleted the tensordot-in-register-translation branch February 9, 2019 04:53