[SVG] Introduce sequential ID-generation scheme for clip-paths. #27833
Thank you for opening your first PR into Matplotlib!
If you have not heard from us in a week or so, please leave a new comment below and that should bring it to our attention. Most of our reviewers are volunteers and sometimes things fall through the cracks.
You can also join us on gitter for real-time discussion.
For details on testing, writing docs, and our review process, please see the developer guide
We strive to be a welcoming and open project. Please follow our Code of Conduct.
Comment-pinging to check for possible review on this PR - thank you!
@@ -590,7 +596,7 @@ def _get_clip_attrs(self, gc):
         clippath, clippath_trans = gc.get_clip_path()
         if clippath is not None:
             clippath_trans = self._make_flip_transform(clippath_trans)
-            dictkey = (id(clippath), str(clippath_trans))
+            dictkey = (self._get_next_clip_id(), str(clippath_trans))
I think this breaks the re-use of a clip path if we have multiple artists that use the same clip path.
We need to keep a second dictionary that maps id(clippath) -> incrementing int
so that on line 608 when we make the oid we can use that instead.
It won't let me comment on line 608, but I think the logic there should be something like
    if clip is None:
        stable_id = self._get_next_clip_id()
        oid = self._make_id('p', stable_id)
so I don't think we actually need a second dictionary to keep the mapping? That seems good to not pick up extra state.
Ok, yep. It seems like some extra test coverage would be useful in that case, to confirm whether keeping an id-mapping is required (and to test any fixes if so).
I'm not sure the test coverage I've added makes semantic/usage sense, but it does now cover sharing a clip-path (technically a Patch) object across multiple artists.
I've added some test coverage on ID-uniqueness in the generated SVG at the same time after finding some code/issue history mentioning that it's important to maintain distinct identifiers.
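For illustration, an ID-uniqueness check of the kind described above could look roughly like the following. This is a minimal sketch using only the standard library, not the test added in this pull request; the helper name `duplicate_ids` and the sample SVG are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

SVG_NS = "http://www.w3.org/2000/svg"


def duplicate_ids(svg_text):
    """Return the sorted list of id attribute values used more than once."""
    root = ET.fromstring(svg_text)
    counts = Counter(
        el.get("id") for el in root.iter() if el.get("id") is not None
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical document: two artists reference the same clipPath (p0),
# which is fine, because the id itself is only defined once.
sample = f"""<svg xmlns="{SVG_NS}">
  <defs>
    <clipPath id="p0"><rect width="10" height="10"/></clipPath>
    <clipPath id="p1"><rect width="5" height="5"/></clipPath>
  </defs>
  <g clip-path="url(#p0)"><rect width="20" height="20"/></g>
  <g clip-path="url(#p0)"><circle r="4"/></g>
</svg>"""

# Deliberately broken variant in which both clipPath elements share an id.
broken = sample.replace('id="p1"', 'id="p0"')

print(duplicate_ids(sample))
print(duplicate_ids(broken))
```

Sharing one clip path between several artists shows up as repeated `url(#...)` references, not as repeated `id` definitions, so the check only counts definitions.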
(however, as noted in #27831 - I missed a note that recommended a particular style of test coverage - it's not yet added here)
I think this breaks the re-use of a clip path if we have multiple artists that use the same clip path.
We need to keep a second dictionary that maps id(clippath) -> incrementing int
so that on line 608 when we make the oid we can use that instead.
This pull request has been updated to implement this in a way that (I hope!) follows the intended behaviour, after re-reading the issue thread and details like the above. There's no second-level dictionary required, but we do make use of one dictionary to store the clip-to-incrementing-id mapping.
I don't think the changes are ready quite yet though; the 2x2 grid test case is yet to be added.
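The clip-to-incrementing-id mapping described above can be sketched as follows. This is a simplified, standalone illustration and not the code in this pull request; the class name `ClipIdRegistry` and its `get` method are hypothetical.

```python
import itertools


class ClipIdRegistry:
    """Hypothetical helper: hand out sequential integers per clip object."""

    def __init__(self):
        self._counter = itertools.count()
        # Maps id(obj) -> (sequential int, obj). Holding a reference to obj
        # keeps it alive, so its id() cannot be recycled for another object
        # while the registry exists.
        self._ids = {}

    def get(self, clippath):
        key = id(clippath)
        if key not in self._ids:
            self._ids[key] = (next(self._counter), clippath)
        return self._ids[key][0]


registry = ClipIdRegistry()
shared = object()   # stands in for a clip path shared by several artists
other = object()    # stands in for a second, distinct clip path

first = registry.get(shared)
second = registry.get(other)
again = registry.get(shared)  # same object, so the same integer comes back

print(first, second, again)
```

The integers depend only on the order in which clip paths are first seen, which is why the output is repeatable only when clip paths are added in a deterministic order.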
so I don't think we actually need a second dictionary to keep the mapping? That seems good to not pick up extra state.
The implication of this is that further refactoring in this part of the code could be helpful -- and I agree that there seem to be opportunities to simplify the logic here. However: I think it's important to confirm that the test coverage is sufficient first.
Sorry for the delay.
No problem, thank you for the comments!
This comment was marked as outdated.
Can you rebase [and force push to your branch] this on main (rather than merging)?
I realized that I hadn't visually inspected the results of rendering the 2x2 star grid example in the test cases -- and related to that, worried that perhaps the re-use of clip path IDs could cause problems in that case. However, the results do appear correct to me: And included below is the 'before' image -- where I've rendered the same diagram but without the changes to (note: I screenshotted these using manually-selected rectangular screen regions, so the screenshots are not bit-for-bit identical; and indeed neither are the SVG files they were displayed from, as expected -- but re-rendering the files with the |
Perhaps this is an unlikely scenario, or perhaps it is a non-issue, but before merge I would like to confirm that the following situation is handled reasonably:
My specific concern here is that the I'd forgotten about that particular problem until recently, but it may be relevant here. |
Collisions between elements in different svgs in the same html document is a real problem (we have had bug reports about this in the past). The hash key includes details about the clip path transform so we may avoid collision (or if we do have collisions we may still be OK!), but that should be tested. I think the ultimate fix may be to include something deterministic for the figure layer (like mixing |
I have to admit, I'd forgotten (I am a goldfish) that the string-representation of the clippath is included in the hash key -- that mostly reassures me that the problem should be infrequent, if it occurs at all. I'll move the PR back into ready-for-review state because most of that concern is resolved, but even so I'll try to build some more confidence about this by running some more checks locally.
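One way to run such a local check is to collect the clipPath IDs from several generated SVG files and report any ID that appears in more than one of them. The following is a rough sketch under that assumption; the function name `shared_clip_ids`, the file names, and the sample documents are all hypothetical.

```python
import re
from collections import defaultdict

# Matches the id attribute of a <clipPath ...> opening tag.
ID_PATTERN = re.compile(r'<clipPath[^>]*\bid="([^"]+)"')


def shared_clip_ids(svg_documents):
    """Return {clip id: [document names]} for ids found in 2+ documents."""
    seen = defaultdict(set)
    for name, text in svg_documents.items():
        for clip_id in ID_PATTERN.findall(text):
            seen[clip_id].add(name)
    return {
        clip_id: sorted(names)
        for clip_id, names in seen.items()
        if len(names) > 1
    }


# Hypothetical inputs standing in for SVG files embedded in one HTML page.
documents = {
    "a.svg": '<svg><defs><clipPath id="p1234"><rect/></clipPath></defs></svg>',
    "b.svg": (
        '<svg><defs>'
        '<clipPath id="p1234"><rect/></clipPath>'
        '<clipPath id="p9999"><rect/></clipPath>'
        '</defs></svg>'
    ),
}

collisions = shared_clip_ids(documents)
print(collisions)
```

A reported collision is not automatically a rendering bug: if the colliding clip paths have identical geometry, a viewer resolving the reference to either definition produces the same result.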
The following testing methodology for ID collisions isn't exhaustive, but is what I'm starting with:
So far every In particular, one |
(as a sidenote: stylesheet-related name collisions are in fact the cause of the problem I'd remembered in the past, not ID collisions -- it would be nice to confirm that IDs either are, or are not, namespace-internal to embedded SVG elements; there is almost certainly a smart way to do that with some simple local testcases. even so, I've started some of this testing, so I'm going to try to learn a few facts from it)
The 100+ repetition case is from a rectangular clip-path defined as |
Based on the results so far, I'm fairly confident about the state of the updated ID-generation scheme. I also re-checked the code and confirmed that it is using an acceptably-strong hash function during |
Please let me know if there's anything else I can do to make progress on this pull request; thank you.
def _save_figure(objects='mhi', fmt="pdf", usetex=False):
    class PathClippedImagePatch(PathPatch):
Why do we need a custom class here?
That code is borrowed from the demo_text_path.py gallery example - it's a shortcut I took when checking the examples for some suitable test coverage code.
It may be possible to refactor this class out and achieve the same test coverage - I'll look into cleaning that up.
(this is going to take me a while to get around to, but I'll confirm results when possible)
Oh, I did not realize this was pulled from an example. If that is the case, can you move it into the function where it is used and add a note "lifted from example"?
Class definition relocated, and an explanatory comment added alongside.
I haven't yet attempted to simplify/refactor the code itself to remove the class entirely - it may be possible; figuring out whether the bbox redraw is required (and how to catch that event without implementing a custom pathpatch, if so) seems to be the main question there.
Sorry this fell off my radar. The implementation looks good, but I do not understand the custom class in the tests. As you note, caching anything about the size/location of things in the output space is fraught and needs to be re-generated on each draw (or have careful cache invalidation). Is that class exercising something that no existing Artist class does?
That makes sense: I suspect that because the gallery figures are all the same size, many contain only one Axes, and they do not have any auto-layout, the location of the bounding box of the axes in the output is going to be the same for all of them, and we clip almost everything to the bounding box of the axes.
A small clarification, so that you don't over-estimate my understanding of the code: if that note mention refers to the |
One more response re: a
As best as I can remember, my thinking with that case was that adding coverage for text-based paths might be worthwhile since they're relatively geometrically complicated (so another way to attempt to catch problems). Aside from that, though, I don't think it tests anything fundamentally different.
@jayaddison Ah, your username started with 'J' so I thought 'JJ' was you, that is probably actually @leejjoon .
This is looking good to me! @jayaddison are you willing to squash this to one or two commits?
Should be squashed either by OP or by merge.
Thank you very much @tacaswell! Re: squashing commits: yep, I'm happy to squash this down to a single commit. I'll review the dev docs and some mainline commit messages before doing that. Even so, a double-check afterwards from you and/or the person merging could be helpful.
@@ -67,12 +145,13 @@ def _save_figure(objects='mhi', fmt="pdf", usetex=False):
         ("m", "pdf", False),
         ("h", "pdf", False),
         ("i", "pdf", False),
         ("mhi", "pdf", False),
         ("mhi", "ps", False),
+        ("p", "svg", False),  # (clipping) paths are only relevant for SVG output
A late self-nitpick here: this comment seems poorly phrased and could potentially be misleading to future readers.
If I understand correctly, Matplotlib clipping is a feature that is output-format-agnostic (that is, it should work for all output formats).
The fix in this pull request only affects SVG elements named clipPath -- but they're a different concept.
I do think it could make sense to perform an isolated SVG-format-only path test (p), alongside the complete-functionality test (mhip) -- but I think the comment attempting to explain should either be improved, or omitted entirely.
At the moment I'm leaning towards removing it entirely, and perhaps relocating the line so that future readers don't consider it a possible typo/accidental difference from the preceding pdf test parameters.
(maybe an overly verbose explanation for a small detail, but I want to try to explain my thinking)
(done, and no further changes planned on this branch)
Re-pinging to keep this thread active; please let me know if there's anything further I should adjust here.
This change enables more diagrams to emit deterministic (repeatable) SVG format output -- provided that the prerequisite ``hashsalt`` rcParams option has been configured, and also that the clip paths themselves are added to the diagram(s) in deterministic order. Previously, the Python built-in ``id(...)`` function was used to provide a convenient but runtime-varying (and therefore non-deterministic) mechanism to uniquely identify each clip path instance; instead here we introduce an in-memory dictionary to store and lookup sequential integer IDs that are assigned to each clip path.
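To illustrate why a fixed salt plus sequential IDs gives repeatable identifiers, here is a minimal standalone sketch. It is not Matplotlib's implementation: the function names, the truncation to 10 hex characters, and the transform strings are assumptions made for the example.

```python
import hashlib


def make_id(prefix, content, salt):
    """Hypothetical ID builder: prefix + truncated salted SHA-256 digest."""
    digest = hashlib.sha256()
    digest.update(salt.encode("utf8"))
    digest.update(str(content).encode("utf8"))
    return f"{prefix}{digest.hexdigest()[:10]}"


def clip_ids_for_run(transforms, salt):
    """Assign IDs from (sequential number, transform string) pairs."""
    return [
        make_id("p", (seq, transform), salt)
        for seq, transform in enumerate(transforms)
    ]


# Made-up transform descriptions; only their string form matters here.
transforms = ["Affine2D(scale=1)", "Affine2D(scale=2)"]

run_a = clip_ids_for_run(transforms, salt="example-salt")
run_b = clip_ids_for_run(transforms, salt="example-salt")
run_c = clip_ids_for_run(transforms, salt="other-salt")

print(run_a == run_b)  # same salt and same order give the same IDs
print(run_a == run_c)  # a different salt gives different IDs
```

Had the hashed content included `id(obj)` in place of the sequence number, `run_a` and `run_b` could differ between interpreter runs, since `id()` values are not stable across processes.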
It's been a few weeks since the previous rebase, so I'm going to perform another rebase of these changes against the latest |
@jayaddison Thank you for following up!
No problem - I'll admit that I'm eager for this to be merged, but I also get that release prep and branch co-ordination require patience :)
Congratulations on your first merged PR to matplotlib @jayaddison 🎉 We hope to see more contributions from you in the future.
Thank you very much @greglucas @tacaswell! I'll make sure to be around to watch for any potentially-related bugs in the bugtracker when v3.10.0 is released. I do have one other potential issue/bugreport that I'm still researching; when I can figure out more about that I'll open a bugreport and perhaps a fix alongside if it's within my ability.
This thread is slightly stale now, but even so, some brief updates:
Given the 3.10 release recently, I've been checking for any SVG / clipPath related bugreports in the issue tracker (and PRs, just in case). So far, so good (no reports).
I haven't been able to track that down - it was a nondeterminism issue and seemed very similar to #28574 (tick axes changing within a multi-chart grid) -- so optimistically it may be solved, but I'll revisit this if I encounter it again.
PR summary
This pull request is intended to improve the reproducibility of SVG output from matplotlib, by removing variability from the ID generation scheme for the identifiers of <clipPath> XML elements (and references to them).
In particular, use of the Python built-in id(...) function, that retrieves an integer identifier for an object in memory at runtime -- not necessarily a memory address, but often so -- is removed and replaced by a monotonically increasing counter value.
Closes #27831.
PR checklist
Plotting related features are demonstrated in an example
and API Changes are noted with a directive and release note
Edit: use a more-direct hyperlink to the test coverage recommendation.