
Simplify logic for determining image comparison directories #5858


Closed

Conversation

maxalbert
Contributor

I took another look at the logic in _image_directories (because it still didn't work for my use case) and realised that pretty much nothing in there is actually exercised by the current test directory structure in matplotlib, yet it imposes fairly arbitrary constraints that make re-using the @image_comparison decorator in third-party projects hard and fragile.

The way the image comparison tests are laid out in matplotlib is that all the baseline images are contained in lib/matplotlib/tests/baseline_images/, and this folder contains a bunch of sub-directories corresponding to the individual test files in lib/matplotlib/tests/. Most of the logic in _image_directories tries to be clever about guessing the correct path to the baseline image in case the tests live in nested directory hierarchies, but this is not used at all.
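
Schematically, the layout looks like this (using a couple of real test modules as examples):

lib/matplotlib/tests/
    test_axes.py
    test_lines.py
    baseline_images/
        test_axes/
            ... (expected .png/.pdf/.svg images)
        test_lines/
            ...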

So I decided to remove all that logic, which makes the helper function much simpler and more explicit. It now works as follows: if there is a test file called /path/to/test_dir/test_script.py, then the baseline directory for the tests in that file is assumed to be /path/to/test_dir/baseline_images/test_script/ and the result directory will be /path/to/test_dir/result_images/test_script/. This is exactly how it currently works in matplotlib, so no other changes were needed.
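
A minimal sketch of that lookup (assumed helper name; the real change lives in matplotlib.testing.decorators._image_directories):

import sys
from pathlib import Path

def image_directories_sketch(func):
    # For /path/to/test_dir/test_script.py this yields
    # /path/to/test_dir/baseline_images/test_script and
    # /path/to/test_dir/result_images/test_script.
    module_path = Path(sys.modules[func.__module__].__file__)
    baseline_dir = module_path.parent / 'baseline_images' / module_path.stem
    result_dir = module_path.parent / 'result_images' / module_path.stem
    return baseline_dir, result_dir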

I also added a couple of safety checks to make sure that the number of figures created in each test equals the number of baseline images provided. This actually exposed a minor bug in one of the tests, which drew twice into the same figure (rather than into two different figures). That is now also fixed.
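
The safety check boils down to iterating with zip_longest instead of zip (see the diff hunk further down), so a count mismatch surfaces as a failure instead of extra figures or baselines being silently skipped. A self-contained sketch:

from itertools import zip_longest
import matplotlib.pyplot as plt

def check_figure_count(baseline_images):
    # baseline_images: the list passed to @image_comparison.
    # If the figure count and the baseline count differ, zip_longest
    # pads the shorter side with None and the assertion fires.
    for fignum, baseline in zip_longest(plt.get_fignums(), baseline_images):
        assert fignum is not None and baseline is not None, (
            'Number of figures does not match number of baseline images')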

@maxalbert
Contributor Author

On a mostly unrelated note, the @image_comparison decorator does not work when applied to a class method rather than a standalone function (the decorated test method is then simply ignored by nosetests). This is slightly unfortunate because it means you can't easily group image comparison tests in classes that derive from unittest.TestCase. If anyone has any ideas why this might happen I'd be glad to hear them and can maybe include a fix in this PR. Thanks!

@tacaswell
Member

I do not understand why that test image changed.

It looks like the image comparison decorator is promoting the decorated test function to a test class, which probably explains why it no longer looks like a method hanging off the parent test class for nose to find.
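
Roughly what I mean (a simplified, assumed illustration, not the actual decorator code): once the decorator replaces the method with a class instance, nose no longer sees a callable test method and silently skips it.

import unittest

class _ComparisonRunner(object):
    # Stand-in for what the decorator turns the test into.
    def __init__(self, func):
        self._func = func

def image_comparison_like(func):
    return _ComparisonRunner(func)  # function -> class instance

class TestPlots(unittest.TestCase):
    @image_comparison_like
    def test_scatter(self):
        # After decoration this attribute is a _ComparisonRunner
        # instance, not a bound method, so nose never collects it.
        pass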

@tacaswell tacaswell added this to the proposed next point release (2.1) milestone Jan 15, 2016
@maxalbert
Contributor Author

Good point about the test function being promoted to a class. I'll have a think about whether this is easily fixable (but if so I will submit it in a separate PR).

The test image only changed ever so slightly, and the reason is that the test plotted the exact same data twice into the same figure. This results in slightly thicker outlines around the points than if the points had been plotted only once (not sure if this is a bug, but it's the way matplotlib currently behaves). Below is an example of a "failed diff" image (scatter_svg-failed-diff.png) that illustrates this.

[image: scatter_svg-failed-diff.png]

@maxalbert
Contributor Author

Btw, here is a standalone code snippet to reproduce the issue which resulted in the change of the test image:

import matplotlib.pyplot as plt

xvals = [1, 2, 3, 4]
yvals = [1, 3, 2, 2.5]

fig, ax = plt.subplots()
ax.set_xlim(0, 5)
ax.set_ylim(0, 4)

# First pass: plot the points once and save the result.
ax.plot(xvals, yvals, 'o', color='blue')
fig.savefig('plot_1.png')

# Second pass: plot the *same* data again into the *same* axes; the
# overlapping markers make the outlines slightly thicker in plot_2.png.
ax.plot(xvals, yvals, 'o', color='blue')
fig.savefig('plot_2.png')

If you have perceptualdiff installed you can then run the following command to see the difference:

$ perceptualdiff -output diff.png plot_1.png plot_2.png

@tacaswell
Member

That makes perfect sense. Looks like another case where we were passing before due to high tolerances and then re-generated slightly wrong images.

@tacaswell
Member

The pdf probably changed too, but we rasterize it at too low a DPI to tell in the comparison tests; can you include that one as well?

Also, it amuses me that you can see the paths removed from the svg.

@maxalbert
Contributor Author

I've added the PDF as well as you suggested. There is a Travis failure but it seems unrelated. Might be worth re-starting the Travis build to be sure?

@tacaswell
Member

Looks like ipython blew up exiting from the doc build. Restarted, will merge when it passes.

@jenshnielsen
Member

The IPython read-only thing is a known issue in IPython and should be fixed in the next release: ipython/ipython#8850

We have recently started to see linkchecker complain about the link to _static/CHANGELOG, but I am not really sure why.

@@ -200,7 +205,16 @@ def remove_text(figure):
     def test(self):
         baseline_dir, result_dir = _image_directories(self._func)

-        for fignum, baseline in zip(plt.get_fignums(), self._baseline_images):
+        for fignum, baseline in zip_longest(plt.get_fignums(), self._baseline_images):
Member

Rather than using zip_longest, couldn't you just check that the lengths of plt.get_fignums() and self._baseline_images are the same before the loop? That way we don't do a bunch of work only to fail later. (And if either of those is an iterator, they are certainly very short iterators, so the cost of calling list() on them should be negligible.)
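
Something like this, inside test() (sketch):

fignums = list(plt.get_fignums())
baselines = list(self._baseline_images)
# Fail up front on a count mismatch, before any image comparison work.
assert len(fignums) == len(baselines), (
    '%d figures created but %d baseline images given'
    % (len(fignums), len(baselines)))
for fignum, baseline in zip(fignums, baselines):
    ...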

@mdboom
Member

mdboom commented Jan 18, 2016

The test image only changed ever so slightly, and the reason is that the test plotted the exact same data twice into the same figure. This results in slightly thicker outlines around the points than if the points had been plotted only once (not sure if this is a bug, but it's the way matplotlib currently behaves). Below is an example of a "failed diff" image (scatter_svg-failed-diff.png) that illustrates this.

I understand the image changes here -- but I'm struggling to find why this PR made those changes. I know we are reusing the same image file name there, which has been the source of some race conditions / nondeterminacy in the past...

Also, I agree that nothing in matplotlib is using nested subdirectories of tests, but are we sure third party tools that use the matplotlib testing infrastructure don't? git blame tells me that @pelson submitted the big chunk of code removed by this PR, so I suspect it's required by cartopy...

@mdboom
Member

mdboom commented Jan 18, 2016

Got it -- I understand why the test needed to be updated now (because there was a mismatch between the number of figures and the number of filenames listed). That aspect of this now makes sense to me.

@jenshnielsen
Member

Restarted the Docs test on Travis which failed due to the IPython Traitlets issue fixed earlier in the week by IPython 4.0.3

@maxalbert
Contributor Author

@mdboom Many thanks for the comments, and apologies for the long silence. I'm currently swamped, so it will take a bit more time before I can tackle them. I agree with the inline comments you made and will re-work the PR to address them.

Regarding the use of @image_comparison in third-party code, that's a valid point. To be honest, though, the existing code made a lot of assumptions about the layout of the test suite and did quite a bit of "implicit magic", which made it hard to use in different contexts and made its behaviour non-obvious to users (I had to look at the actual implementation a few times because several of the "magic" aspects of where it looks for baseline images aren't explained in the docs). I wonder whether a simple, explicit API (e.g. an additional argument to specify the root directory of the test suite, or the directory in which to look for baseline images) would be clearer. I'll have a think about it when I get a chance, but if anyone has suggestions I'd be happy to hear them.
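
For example (hypothetical signature; baseline_dir is not an existing parameter of @image_comparison here, just a sketch of what an explicit override could look like):

from matplotlib.testing.decorators import image_comparison

# baseline_dir is the hypothetical new argument; baseline_images and
# extensions are the decorator's existing parameters.
@image_comparison(baseline_images=['scatter'], extensions=['png'],
                  baseline_dir='/path/to/my_project/tests/baseline_images')
def test_scatter():
    ...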

I'd also be interested in @pelson's comments, or anyone else's who needs the more flexible behaviour. (I checked the cartopy code and it doesn't use @image_comparison at all, but of course that doesn't mean that nobody else is relying on the functionality.)

@pelson
Member

pelson commented Jan 26, 2016

Hi @maxalbert.

I've not really had much of a chance to look at the changes proposed here, but the original changes to the image comparison capability were indeed made to aid cartopy. Unfortunately, either the release was too slow or it didn't quite fit the purpose (I can't remember which), which meant that cartopy grew its own image testing capability based on the image_comparison decorator (https://github.com/SciTools/cartopy/blob/master/lib/cartopy/tests/mpl/__init__.py#L32). This test class should be more widely applicable beyond cartopy, and could happily be pulled out into its own package if you're interested...
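
For reference, the class in the linked file is used as a decorator along these lines (approximate; see the cartopy source for the exact signature and test names):

import matplotlib.pyplot as plt
import cartopy.crs as ccrs
from cartopy.tests.mpl import ImageTesting

@ImageTesting(['global_map'])
def test_global_map():
    # Draw something; the decorator compares the resulting figure
    # against the named baseline image.
    ax = plt.axes(projection=ccrs.PlateCarree())
    ax.coastlines()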

Cheers,

@maxalbert
Contributor Author

Thanks for the explanation @pelson! Do I understand correctly that cartopy re-implements the functionality rather than building on top of matplotlib's @image_comparison decorator? I.e., cartopy would be unaffected by the changes proposed here? In that case I wonder whether it would be ok to merge this PR (after I have incorporated the changes suggested by @mdboom). Or do people anticipate problems for other third-party software?

It would be great to separate out cartopy's image comparison functionality into a separate package, but I personally won't have time to look into doing this in the next couple of months. I'll certainly keep it in mind for the longer term, though.

@jankatins
Contributor

ggplot also has its own testing infrastructure based on mpl, but it also only uses the default paths, and the only thing it imports from matplotlib.testing is from matplotlib.testing.compare import compare_images (though a few other things are copied).

It also has an assert_same_figure_images(fig, name) helper: https://github.com/yhat/ggplot/blob/master/ggplot/tests/__init__.py#L80

which is used like this:

from ggplot import ggplot, aes, geom_line
from . import get_assert_same_ggplot, cleanup

assert_same_ggplot = get_assert_same_ggplot(__file__)

# df is a pandas DataFrame with 'x' and 'y' columns, defined elsewhere.
gg = ggplot(aes(x="x", y="y"), data=df) + geom_line()
# Internally the next line does `fig = gg.draw(); assert_same_figure_images(fig, name)`.
assert_same_ggplot(gg, 'scale_without_log')

It needs the setup per test file, but on the other hand the failing asserts point to the line number...

@tacaswell tacaswell modified the milestones: 2.1 (next point release), 2.2 (next next feature release) Sep 24, 2017
@dstansby
Member

Looks like this needs a big ol' rebase.

@tacaswell tacaswell removed this from the needs sorting milestone Apr 14, 2018
@tacaswell tacaswell added this to the v3.0 milestone Apr 14, 2018
@anntzer
Contributor

anntzer commented Apr 14, 2018

This can now be made as simple as

diff --git a/lib/matplotlib/testing/decorators.py b/lib/matplotlib/testing/decorators.py
index faba2e247..18aa5b243 100644
--- a/lib/matplotlib/testing/decorators.py
+++ b/lib/matplotlib/testing/decorators.py
@@ -435,38 +435,15 @@ def _image_directories(func):
     """
     Compute the baseline and result image directories for testing *func*.
     Create the result directory if it doesn't exist.
-    """
-    module_name = func.__module__
-    if module_name == '__main__':
-        # FIXME: this won't work for nested packages in matplotlib.tests
-        warnings.warn(
-            'Test module run as script. Guessing baseline image locations.')
-        module_path = Path(sys.argv[0]).resolve()
-        subdir = module_path.stem
-    else:
-        module_path = Path(sys.modules[func.__module__].__file__)
-        mods = module_name.split('.')
-        if len(mods) >= 3:
-            mods.pop(0)
-            # mods[0] will be the name of the package being tested (in
-            # most cases "matplotlib") However if this is a
-            # namespace package pip installed and run via the nose
-            # multiprocess plugin or as a specific test this may be
-            # missing. See https://github.com/matplotlib/matplotlib/issues/3314
-        if mods.pop(0) != 'tests':
-            warnings.warn(
-                "Module {!r} does not live in a parent module named 'tests'. "
-                "This is probably ok, but we may not be able to guess the "
-                "correct subdirectory containing the baseline images. If "
-                "things go wrong please make sure that there is a parent "
-                "directory named 'tests' and that it contains a __init__.py "
-                "file (can be empty).".format(module_name))
-        subdir = os.path.join(*mods)
 
-    baseline_dir = module_path.parent / 'baseline_images' / subdir
-    result_dir = Path().resolve() / 'result_images' / subdir
+    For test module ``foo.bar.test_baz``, the baseline images are at
+    ``foo/bar/baseline_images/test_baz`` and the result images at
+    ``$(pwd)/result_images/test_baz``.
+    """
+    module_path = Path(sys.modules[func.__module__].__file__)
+    baseline_dir = module_path.parent / "baseline_images" / module_path.stem
+    result_dir = Path().resolve() / "result_images" / module_path.stem
     result_dir.mkdir(parents=True, exist_ok=True)
-
     return str(baseline_dir), str(result_dir)

(Now that we're using pytest, we can't directly run the test modules from outside of pytest anyway.)

@anntzer anntzer mentioned this pull request Apr 18, 2018
@jklymak jklymak modified the milestones: v3.0, needs sorting Jul 9, 2018
@jklymak
Member

jklymak commented Aug 16, 2018

I'm closing as likely obsolete. Feel free to re-open if more work will go into this!
