[SVG] Introduce sequential ID-generation scheme for clip-paths. #27833

jayaddison · 2024-02-28T23:25:16Z

PR summary

This pull request is intended to improve the reproducibility of SVG output from matplotlib, by removing variability from the ID generation scheme for the identifiers of <clipPath> XML elements (and references to them).

In particular, use of the Python built-in id(...) function, which retrieves an integer identifier for an object in memory at runtime (not necessarily a memory address, but often one), is removed and replaced by a monotonically increasing counter value.
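As a minimal illustration of the difference (a standalone sketch, not the backend code; the `SequentialIds` class and `assign_ids` function are hypothetical): `id()` values depend on where objects happen to be allocated, whereas a counter created fresh for each render yields the same sequence every time the same objects are visited in the same order.

```python
import itertools


class SequentialIds:
    """Hypothetical helper: hands out 0, 1, 2, ... in request order."""

    def __init__(self):
        # A fresh counter per instance, e.g. one per SVG render.
        self._counter = itertools.count()

    def next_id(self):
        return next(self._counter)


def assign_ids(objects):
    # Unlike id(obj), the assigned values depend only on the order in
    # which objects are visited, not on their memory locations.
    ids = SequentialIds()
    return [ids.next_id() for _ in objects]


first = assign_ids([object(), object(), object()])
second = assign_ids([object(), object(), object()])
print(first, second)  # [0, 1, 2] [0, 1, 2]
```

The trade-off is that the output is only repeatable if the objects are processed in a deterministic order.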

Closes #27831.

PR checklist

"closes #0000" is in the body of the PR description to link the related issue
new and changed code is tested
- Minimal test coverage has been added.
- Adding some previously-recommended test coverage would be an improvement.
Plotting related features are demonstrated in an example
New Features and API Changes are noted with a ~~directive and~~ release note
Documentation complies with general and docstring guidelines

Edit: use a more-direct hyperlink to the test coverage recommendation.

github-actions

Thank you for opening your first PR into Matplotlib!

If you have not heard from us in a week or so, please leave a new comment below and that should bring it to our attention. Most of our reviewers are volunteers and sometimes things fall through the cracks.

You can also join us on gitter for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide

We strive to be a welcoming and open project. Please follow our Code of Conduct.

jayaddison · 2024-03-11T20:51:40Z

Comment-pinging to check for possible review on this PR - thank you!

tacaswell · 2024-03-12T16:57:22Z

lib/matplotlib/backends/backend_svg.py

@@ -590,7 +596,7 @@ def _get_clip_attrs(self, gc):
        clippath, clippath_trans = gc.get_clip_path()
        if clippath is not None:
            clippath_trans = self._make_flip_transform(clippath_trans)
-            dictkey = (id(clippath), str(clippath_trans))
+            dictkey = (self._get_next_clip_id(), str(clippath_trans))


I think this breaks the re-use of a clip path if we have multiple artists that use the same clip path.

We need to keep a second dictionary that maps id(clippath) -> incrementing int so that on line 608 when we make the oid we can use that instead.

It won't let me comment on line 608, but I think the logic there should be something like

    if clip is None:
        stable_id = self._get_next_clip_id()
        oid = self._make_id('p', stable_id)

so I don't think we actually need a second dictionary to keep the mapping? That seems good to not pick up extra state.
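A standalone sketch of the lookup being discussed, assuming (as in the diff above) a cache keyed on `(id(clippath), str(clippath_trans))`: the `id()` value is only used to recognize a clip path that has already been seen, while the identifier written to the output comes from a sequential counter, so artists sharing one clip path still share one `<clipPath>` element. The `ClipIdCache` class and `get_oid` method are hypothetical; in the suggestion above the counter value is additionally passed through `self._make_id('p', ...)`, which this sketch skips for readability.

```python
import itertools


class ClipIdCache:
    """Hypothetical stand-in for the SVG renderer's clip-path bookkeeping."""

    def __init__(self):
        self._clipd = {}  # (id(clippath), transform string) -> (clippath, oid)
        self._next_clip_id = itertools.count()

    def get_oid(self, clippath, transform_str):
        dictkey = (id(clippath), transform_str)
        entry = self._clipd.get(dictkey)
        if entry is None:
            # First use of this clip path: allocate a sequential id.
            # Storing the clippath object keeps it alive, so its id()
            # cannot be recycled by another object while the cache exists.
            oid = f"p{next(self._next_clip_id)}"
            self._clipd[dictkey] = (clippath, oid)
            return oid
        # Re-use: every artist sharing this clip path gets the same oid.
        return entry[1]


cache = ClipIdCache()
shared, other = object(), object()
print(cache.get_oid(shared, "T1"), cache.get_oid(shared, "T1"), cache.get_oid(other, "T1"))
# p0 p0 p1
```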

Ok, yep. It seems like some extra test coverage would be useful in that case, to confirm whether keeping an id-mapping is required (and to test any fixes if so).

I'm not sure the test coverage I've added makes semantic/usage sense, but it does now cover sharing a clip-path (technically a Patch) object across multiple artists.

I've added some test coverage on ID-uniqueness in the generated SVG at the same time after finding some code/issue history mentioning that it's important to maintain distinct identifiers.

(however, as noted in #27831 - I missed a note that recommended a particular style of test coverage - it's not yet added here)

I think this breaks the re-use of a clip path if we have multiple artists that use the same clip path.

We need to keep a second dictionary that maps id(clippath) -> incrementing int so that on line 608 when we make the oid we can use that instead.

This pull request has been updated to implement this in a way that (I hope!) follows the intended behaviour, after re-reading the issue thread and details like the above. There's no second-level dictionary required, but we do make use of one dictionary to store the clip-to-incrementing-id mapping.

I don't think the changes are ready quite yet though; the 2x2 grid test case is yet to be added.

so I don't think we actually need a second dictionary to keep the mapping? That seems good to not pick up extra state.

The implication of this is that further refactoring in this part of the code could be helpful -- and I agree that there seem to be opportunities to simplify the logic here. However: I think it's important to confirm that the test coverage is sufficient first.

tacaswell · 2024-03-12T17:02:12Z

sorry for the delay.

jayaddison · 2024-03-12T17:30:05Z

No problem, thank you for the comments!

tacaswell · 2024-04-29T15:54:30Z

Can you rebase this on main and force-push to your branch (rather than merging)?

jayaddison · 2024-05-05T11:03:28Z

I realized that I hadn't visually inspected the results of rendering the 2x2 star grid example in the test cases -- and related to that, worried that perhaps the re-use of clip path IDs could cause problems in that case. However, the results do appear correct to me:

And included below is the 'before' image -- where I've rendered the same diagram but without the changes to backend_svg.py from this branch:

(note: I screenshotted these using manually-selected rectangular screen regions, so the screenshots are not bit-for-bit identical; and indeed neither are the SVG files they were displayed from, as expected -- but re-rendering the files with the backend_svg.py changes in place does produce bit-for-bit identical SVG files)

jayaddison · 2024-05-05T19:43:01Z

Perhaps this is an unlikely scenario, or perhaps it is a non-issue, but before merge I would like to confirm that the following situation is handled reasonably:

Two separate SVG diagrams are constructed from matplotlib code using the same hashsalt and SOURCE_DATE_EPOCH value, as could realistically occur when (re)building the contents of a report from source.
Both of the SVG diagrams make use of distinct clipping paths.
Both of the SVG diagrams appear in a single HTML page as output.

My specific concern here is that the id values of the output SVG diagrams could collide. Unfortunately my understanding is that when SVG diagrams are embedded within HTML documents, they do not have separate namespace scoping.

I'd forgotten about that particular problem until recently, but it may be relevant here.

tacaswell · 2024-05-06T03:25:35Z

Collisions between elements in different SVGs in the same HTML document are a real problem (we have had bug reports about this in the past). The hash key includes details about the clip path transform, so we may avoid collisions (or, if we do have collisions, we may still be OK!), but that should be tested. I think the ultimate fix may be to include something deterministic for the figure layer (like mixing fig.get_label() or fig.get_gid() into the hash).
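To make the suggestion concrete, a hypothetical sketch (an idea from the comment above, not something this thread shows being merged): mixing a per-figure scope string, such as a figure label or gid, into the salted hash so that two figures rendered with the same salt produce different ids for otherwise identical clip paths. The `make_scoped_id` function and its parameters are illustrative only.

```python
import hashlib


def make_scoped_id(prefix, salt, scope, content):
    """Hypothetical: mix a per-figure scope (e.g. a label or gid) into the hash."""
    m = hashlib.sha256()
    for part in (salt, scope, str(content)):
        m.update(part.encode("utf8"))
        m.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return prefix + m.hexdigest()[:10]


same_salt = "fixed-salt"
a = make_scoped_id("p", same_salt, "figure-A", (0, "transform"))
b = make_scoped_id("p", same_salt, "figure-B", (0, "transform"))
# Same salt and content, different scope: the ids differ, yet each one is
# still reproducible on every rebuild.
```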

jayaddison · 2024-05-06T17:32:32Z

I have to admit, I'd forgotten (I am a goldfish) that the string-representation of the clippath is included in the hash key -- that mostly reassures me that the problem should be infrequent, if it occurs at all. I'll move the PR back into ready-for-review state because most of that concern is resolved, but even so I'll try to build some more confidence about this by running some more checks locally.

jayaddison · 2024-05-06T21:30:06Z

The following testing methodology for ID collisions isn't exhaustive, but is what I'm starting with:

I've used some editor scripting to configure matplotlib.rcParams['svg.hashsalt'] to a fixed value at the start of each of the .py files in the examples dir of this repository.
Subsequently I've added a call to plt.savefig(f"{__file__}.svg") at the end of each of those files.
I've temporarily removed the embedding_webagg_sgskip.py and ginput_manual_clabel_sgskip.py files from the directory tree because they appear to be interactive/blocking processes.
Now I'm running all of those files with a fixed SOURCE_DATE_EPOCH for the batch, and inspecting the results as they appear.
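The collision check described above could be scripted roughly as follows. This is a sketch under the assumption that clip paths appear as `<clipPath ... id="...">` elements in the generated SVG text; the function names are hypothetical. It reports files that repeat an id internally, and ids that are shared across files.

```python
import collections
import pathlib
import re

# Matches the id attribute of <clipPath> elements, e.g. <clipPath id="p209e94b0da">.
CLIP_ID_RE = re.compile(r'<clipPath\b[^>]*\bid="([^"]+)"')


def clip_ids_by_file(directory):
    """Map each .svg file under `directory` to the clipPath ids it defines."""
    result = {}
    for path in sorted(pathlib.Path(directory).rglob("*.svg")):
        result[path] = CLIP_ID_RE.findall(path.read_text(encoding="utf8"))
    return result


def summarize(ids_by_file):
    """Return (files with an internal duplicate id, {id: number of files using it})."""
    internal_dupes = [
        path for path, ids in ids_by_file.items() if len(ids) != len(set(ids))
    ]
    # Count each id at most once per file, then keep ids seen in 2+ files.
    files_per_id = collections.Counter(
        clip_id for ids in ids_by_file.values() for clip_id in set(ids)
    )
    shared = {clip_id: n for clip_id, n in files_per_id.items() if n > 1}
    return internal_dupes, shared
```

Sorting `shared` by count would surface the most-repeated clip path first.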

So far every clipPath id="p<....>" value is unique within each individual SVG file, however some duplicates do appear; I believe these refer to common/shared path shapes that are re-used across different diagrams.

In particular, one clipPath with id="p209e94b0da" has appeared in more than 100 different output diagrams so far, so I'd like to determine what it represents. 35-or-so other non-unique clipPaths exist, generally with single-digit re-use across files.

jayaddison · 2024-05-06T21:34:06Z

(As a side note: stylesheet-related name collisions were in fact the cause of the problem I'd remembered from the past, not ID collisions. It would be nice to confirm whether or not IDs are namespace-internal to embedded SVG elements; there is almost certainly a smart way to do that with some simple local test cases. Even so, I've started some of this testing, so I'm going to try to learn a few facts from it.)
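One such local test case could be generated like this: a sketch that builds an HTML page with two inline SVGs that deliberately define the same `clipPath` id with different rectangles. The shapes, colors, and the `build_collision_page` name are arbitrary choices for illustration. Opening the page in a browser shows which rectangle the second drawing is clipped by: if ids are document-global, the blue square would be clipped to the first SVG's top-left rectangle instead of its own bottom-right one.

```python
SVG_TEMPLATE = """<svg xmlns="http://www.w3.org/2000/svg" width="120" height="120">
  <defs>
    <clipPath id="{clip_id}">
      <rect x="{x}" y="{y}" width="{w}" height="{h}"/>
    </clipPath>
  </defs>
  <rect width="120" height="120" fill="{color}" clip-path="url(#{clip_id})"/>
</svg>"""


def build_collision_page(clip_id="p0"):
    # Both SVGs reuse the same id, but each defines a different clip rectangle.
    first = SVG_TEMPLATE.format(clip_id=clip_id, x=0, y=0, w=60, h=60, color="red")
    second = SVG_TEMPLATE.format(clip_id=clip_id, x=60, y=60, w=60, h=60, color="blue")
    return f"<!DOCTYPE html>\n<html><body>\n{first}\n{second}\n</body></html>\n"
```

The result can be written out with, for example, `pathlib.Path("clip_id_collision.html").write_text(build_collision_page())`.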

jayaddison · 2024-05-06T21:39:50Z

In particular, one clipPath with id="p209e94b0da" has appeared in more than 100 different output diagrams so far, so I'd like to determine what it represents. 35-or-so other non-unique clipPaths exist, generally with single-digit re-use across files.

The 100+ repetition case is from a rectangular clip-path defined as <rect x="57.6" y="41.472" width="357.12" height="266.112"/> that appears in many of the gallery examples when rendered to SVG.

jayaddison · 2024-05-06T21:45:11Z

Based on the results so far, I'm fairly confident about the state of the updated ID-generation scheme. I also re-checked the code and confirmed that it is using an acceptably-strong hash function during _make_id, SHA256. Currently we don't retain the entire hash digest for use in the path ID, but that could be extended if required, trading-off against output filesizes.
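For reference, a standalone sketch of a salted, truncated SHA-256 identifier in the style described here. It assumes the digest is truncated to 10 hex characters, which matches the length of ids such as `p209e94b0da` quoted above; the `make_id` function is a hypothetical re-creation, not code copied from the backend.

```python
import hashlib


def make_id(prefix, content, salt):
    """Salted SHA-256 of `content`, truncated to 10 hex digits (40 bits)."""
    m = hashlib.sha256()
    m.update(salt.encode("utf8"))
    m.update(str(content).encode("utf8"))
    return prefix + m.hexdigest()[:10]
```

With 40 bits kept, the birthday bound puts even odds of some accidental collision at roughly a million distinct ids; keeping more of the digest pushes that further out at the cost of longer ids, which is the filesize trade-off mentioned above.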

jayaddison · 2024-06-13T12:50:37Z

Please let me know if there's anything else I can do to make progress on this pull request; thank you.

tacaswell · 2024-06-13T13:12:21Z

lib/matplotlib/tests/test_determinism.py



-def _save_figure(objects='mhi', fmt="pdf", usetex=False):
+class PathClippedImagePatch(PathPatch):


Why do we need a custom class here?

That code is borrowed from the demo_text_path.py gallery example - it's a shortcut I took when checking the examples for some suitable test coverage code.

It may be possible to refactor this class out and achieve the same test coverage - I'll look into cleaning that up.

(this is going to take me a while to get around to, but I'll confirm results when possible)

Oh, I did not realize this was pulled from an example. If that is the case, can you move it into the function where it is used and add a note "lifted from example"?

Class definition relocated, and an explanatory comment added alongside.

I haven't yet attempted to simplify/refactor the code itself to remove the class entirely - it may be possible; figuring out whether the bbox redraw is required (and how to catch that event without implementing a custom pathpatch, if so) seems to be the main question there.

tacaswell · 2024-06-13T13:18:40Z

Sorry this fell off my radar.

The implementation looks good, but I do not understand the custom class in the tests. As you note, caching anything about the size/location of things in the output space is fraught and needs to be re-generated on each draw (or have careful cache invalidation). Is that class exercising something that no existing Artist class does?

The 100+ repetition case is from a rectangular clip-path defined as <rect x="57.6" y="41.472" width="357.12" height="266.112"/> that appears in many of the gallery examples when rendered to SVG.

That makes sense. I suspect that because the gallery figures are all the same size, many contain only one Axes, and they do not use any auto-layout, the location of the Axes bounding box in the output is the same for all of them, and we clip almost everything to the bounding box of the Axes.
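The numbers in the repeated rectangle can be reproduced from the default figure geometry. A short check, assuming default rcParams (`figure.figsize` of 6.4 x 4.8 inches; `figure.subplot.left`/`right`/`bottom`/`top` of 0.125/0.9/0.11/0.88), a single Axes with no layout engine, and SVG output measured in points (72 per inch):

```python
# Assumed defaults; see the lead-in above.
FIG_W_IN, FIG_H_IN = 6.4, 4.8
LEFT, RIGHT, BOTTOM, TOP = 0.125, 0.9, 0.11, 0.88
PT_PER_IN = 72

fig_w = FIG_W_IN * PT_PER_IN  # 460.8 pt
fig_h = FIG_H_IN * PT_PER_IN  # 345.6 pt

x = LEFT * fig_w
y = (1 - TOP) * fig_h  # SVG's y axis points down, so measure from the top edge
width = (RIGHT - LEFT) * fig_w
height = (TOP - BOTTOM) * fig_h

print(round(x, 3), round(y, 3), round(width, 3), round(height, 3))
# 57.6 41.472 357.12 266.112
```

Any figure that keeps these defaults therefore produces the same clip rectangle, and so the same hashed clip-path id.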

jayaddison · 2024-06-13T13:28:33Z

As you note, caching anything about the size/location of things in the output space is fraught and needs to be re-generated on each draw

A small clarification, so that you don't overestimate my understanding of the code: if that note refers to the "-JJ" comment, then that comes from the existing gallery sample code. Even so, it does make sense to me that clipping would need to be re-evaluated after changes to intersecting objects in the diagram/scene.

jayaddison · 2024-06-13T14:09:04Z

One more response re: a PathClippedImagePatch question after a re-read:

Is that class exercising something that no existing Artist class does?

As best as I can remember, my thinking with that case was that adding coverage for text-based paths might be worthwhile since they're relatively geometrically complicated (so another way to attempt to catch problems). Aside from that, though, I don't think it tests anything fundamentally different.

tacaswell · 2024-06-13T20:12:47Z

@jayaddison Ah, your user name starts with 'J', so I thought 'JJ' was you; that is probably actually @leejjoon.

tacaswell · 2024-06-14T16:54:43Z

This is looking good to me!

@jayaddison are you willing to squash this to one or two commits?

tacaswell

Should be squashed either by OP or by merge.

jayaddison · 2024-06-14T17:57:07Z

Thank you very much @tacaswell!

Re: squashing commits: yep, I'm happy to squash this down to a single commit. I'll review the dev docs and some mainline commit messages before doing that. Even so a double-check afterwards from you and/or the person merging could be helpful.

jayaddison · 2024-06-14T22:53:38Z

lib/matplotlib/tests/test_determinism.py

@@ -67,12 +145,13 @@ def _save_figure(objects='mhi', fmt="pdf", usetex=False):
        ("m", "pdf", False),
        ("h", "pdf", False),
        ("i", "pdf", False),
-        ("mhi", "pdf", False),
-        ("mhi", "ps", False),
+        ("p", "svg", False),  # (clipping) paths are only relevant for SVG output


A late self-nitpick here: this comment seems poorly phrased and could potentially be misleading to future readers.

If I understand correctly, Matplotlib clipping is a feature that is output-format-agnostic (that is, it should work for all output formats).

The fix in this pull request only affects SVG elements named clipPath -- but they're a different concept.

I do think it could make sense to perform an isolated SVG-format-only path test (p), alongside the complete-functionality test (mhip) -- but I think the comment attempting to explain should either be improved, or omitted entirely.

At the moment I'm leaning towards removing it entirely, and perhaps relocating the line so that future readers don't consider it a possible typo/accidental difference from the preceding pdf test parameters.

(maybe an overly verbose explanation for a small detail, but I want to try to explain my thinking)

(done, and no further changes planned on this branch)

jayaddison · 2024-06-24T18:33:00Z

Re-pinging to keep this thread active; please let me know if there's anything further I should adjust here.

This change enables more diagrams to emit deterministic (repeatable) SVG format output -- provided that the prerequisite ``hashsalt`` rcParams option has been configured, and also that the clip paths themselves are added to the diagram(s) in deterministic order. Previously, the Python built-in ``id(...)`` function was used to provide a convenient but runtime-varying (and therefore non-deterministic) mechanism to uniquely identify each clip path instance; instead here we introduce an in-memory dictionary to store and lookup sequential integer IDs that are assigned to each clip path.

jayaddison · 2024-07-05T18:02:53Z

It's been a few weeks since the previous rebase, so I'm going to perform another rebase of these changes against the latest main branch to re-confirm test results.

tacaswell · 2024-07-05T19:54:23Z

@jayaddison Thank you for following up!

jayaddison · 2024-07-05T21:50:07Z

No problem - I'll admit that I'm eager for this to be merged, but I also get that release prep and branch co-ordination requires patience :)

greglucas · 2024-07-06T13:54:28Z

Congratulations on your first merged PR to matplotlib @jayaddison 🎉 We hope to see more contributions from you in the future.

jayaddison · 2024-07-06T15:33:23Z

Thank you very much @greglucas @tacaswell! I'll make sure to be around to watch for any potentially-related bugs in the bugtracker when v3.10.0 is released.

I do have one other potential issue/bugreport that I'm still researching; when I can figure out more about that I'll open a bugreport and perhaps a fix alongside if it's within my ability.

jayaddison · 2024-12-19T19:12:26Z

This thread is slightly stale now, but even so, some brief updates:

Thank you very much @greglucas @tacaswell! I'll make sure to be around to watch for any potentially-related bugs in the bugtracker when v3.10.0 is released.

Given the 3.10 release recently, I've been checking for any SVG / clipPath related bugreports in the issue tracker (and PRs, just in case). So far, so good (no reports).

I do have one other potential issue/bugreport that I'm still researching; when I can figure out more about that I'll open a bugreport and perhaps a fix alongside if it's within my ability.

I haven't been able to track that down - it was a nondeterminism issue and seemed very similar to #28574 (tick axes changing within a multi-chart grid) -- so optimistically it may be solved, but I'll revisit this if I encounter it again.

github-actions bot added the backend: svg label Feb 28, 2024

github-actions bot reviewed Feb 28, 2024

jayaddison changed the title ~~[SVG] Implement monotonically-increasing counter for clipPath identifiers~~ [SVG] Use monotonically-increasing counter for non-rectangular clip-path identifiers Feb 29, 2024

jayaddison marked this pull request as ready for review February 29, 2024 14:13

tacaswell added this to the v3.10.0 milestone Feb 29, 2024

tacaswell reviewed Mar 12, 2024

jayaddison mentioned this pull request Mar 12, 2024

[Bug]: Nondeterminism in SVG clipPath element id attributes #27831

Closed

jayaddison marked this pull request as draft April 24, 2024 18:54

github-actions bot added the backend: ps label Apr 24, 2024

jayaddison commented Apr 24, 2024

lib/matplotlib/backends/backend_ps.py (outdated, resolved)

This comment was marked as outdated.

jayaddison commented Apr 24, 2024

lib/matplotlib/backends/backend_svg.py (resolved)

jayaddison commented Apr 24, 2024

lib/matplotlib/backends/backend_svg.py (resolved)

jayaddison changed the title ~~[SVG] Use monotonically-increasing counter for non-rectangular clip-path identifiers~~ [SVG] Introduce repeatable ID-generation scheme for clip-path identifiers. Apr 25, 2024

jayaddison changed the title ~~[SVG] Introduce repeatable ID-generation scheme for clip-path identifiers.~~ [SVG] Introduce repeatable ID-generation scheme for clip-paths. Apr 25, 2024

This comment was marked as outdated.

github-actions bot removed the backend: ps label Apr 25, 2024

This comment was marked as outdated.

jayaddison changed the title ~~[SVG] Introduce repeatable ID-generation scheme for clip-paths.~~ [SVG] Introduce sequential ID-generation scheme for clip-paths. Apr 28, 2024

tacaswell reviewed Apr 29, 2024

lib/matplotlib/tests/test_determinism.py (outdated, resolved)

jayaddison marked this pull request as ready for review April 29, 2024 22:12

jayaddison commented May 5, 2024

lib/matplotlib/backends/backend_svg.py (resolved)

jayaddison marked this pull request as draft May 5, 2024 19:43

jayaddison marked this pull request as ready for review May 6, 2024 17:32

jayaddison mentioned this pull request Jun 10, 2024

[Doc]: code-of-conduct URL in git repository may be outdated #27834

Closed

tacaswell reviewed Jun 13, 2024

tacaswell approved these changes Jun 14, 2024

jayaddison commented Jun 14, 2024

greglucas approved these changes Jul 6, 2024

greglucas merged commit 2d1db48 into matplotlib:main Jul 6, 2024
40 of 41 checks passed

jayaddison deleted the issue-27831/deterministic-svg-clippath-identifiers branch July 6, 2024 15:33

jayaddison mentioned this pull request Jan 30, 2025

matplotlib 3.3.0 debian amd64: tests errors #11324

Closed