Thanks to visit codestin.com
Credit goes to github.com

Skip to content

Break Artist._remove_method reference cycle #28861

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Oct 23, 2024

Conversation

justinjhendrick
Copy link
Contributor

@justinjhendrick justinjhendrick commented Sep 21, 2024

PR summary

I think I've found a reference cycle that can consume quite a bit of memory. The size of the reachable bytes from the reference cycle seems to grow linearly with the number of points I've plotted and Figure.clear() does not remove them. gc.collect() would remove this memory, but in performance critical applications, waiting for the GC is no fun.

The reference cycle looks like this:

Axes -> _children -> Artist (Line2D for example)
            ^           |
             \          | field: "_remove_method"
              \         v
             _children.remove               

(credit to refcycle for helping me find it)

This PR breaks the cycle by emptying out the Axes._children list instead of replacing it with an empty list. This removes the reference from the list to the Artist, allowing the reference count to reach 0 on the Artist.

I've found this method to be an effective workaround (without modifying matplotlib)

def real_clear(fig):
    artists = fig.get_children()
    for ax in fig.axes:
        artists.extend(ax.get_children())
    for artist in artists:
        try:
            artist.remove()
        except NotImplementedError:
            pass
    fig.clear()

But I would like to contribute this memory usage improvement back into matplotlib. Though I'm not sure the best way. This PR is the smallest and simplest change, but I'm not certain if it covers enough similar reference cycles or just the one that showed up in my profiling. I also don't know if this is a clean way to solve the problem because I'm not very familiar with matplotlib internals. I considered making Artist._remove_method a weakref but that seems like a larger change that I'm not ready for (yet?). How do you think we should approach this reference cycle?

Note that this PR branches from a47e26b (not latest main) because of the issues I encountered with the next commit (See #28866 and 597554d#commitcomment-147024800).

Here is the benchmark script that I've been using to evaluate the effectiveness of these changes:

import gc

import matplotlib
matplotlib.use("Agg")

from matplotlib.figure import Figure
from numpy.random import rand
import psutil

def remove_artists(fig):
    artists = fig.get_children()
    for ax in fig.axes:
        artists.extend(ax.get_children())
    for artist in artists:
        try:
            artist.remove()
        except NotImplementedError:
            pass

def plot(num_points, rm_artists):
    fig = Figure()
    ax = fig.add_subplot()
    ax.plot(rand(num_points))
    if rm_artists:
        remove_artists(fig)
    fig.clear()

def delta_mb_after_plot(num_points, rm_artists):
    proc = psutil.Process()
    before_mb = proc.memory_info().rss / 1e6
    plot(num_points, rm_artists)
    after_mb = proc.memory_info().rss / 1e6
    gc.collect()
    # after_gc_mb = proc.memory_info().rss / 1e6
    return after_mb - before_mb

def main():
    gc.disable()

    baseline = delta_mb_after_plot(1_000_000, False)
    with_rm_artists = delta_mb_after_plot(1_000_000, True)
    print(f"1 million points: {round(baseline - with_rm_artists, 3)}MB")

    baseline = delta_mb_after_plot(10_000_000, False)
    with_rm_artists = delta_mb_after_plot(10_000_000, True)
    print(f"10 million points: {round(baseline - with_rm_artists, 3)}MB")

if __name__ == "__main__":
    main()

Running the benchmark before this change prints (showing that the workaround breaks the cycle)

1 million points: 48.263MB
10 million points: 479.572MB

And after (showing that the PR behaves similarly to the workaround)

1 million points: 0.242MB
10 million points: -0.606MB

PR checklist

  • [N/A I don't have an issue to link to] "closes #0000" is in the body of the PR description to link the related issue
  • [Test added] new and changed code is tested
  • [N/A No new features] Plotting related features are demonstrated in an example
  • [N/A No new features or API changes] New Features and API Changes are noted with a directive and release note
  • [N/A No documentation changes needed] Documentation complies with general and docstring guidelines

Copy link

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for opening your first PR into Matplotlib!

If you have not heard from us in a week or so, please leave a new comment below and that should bring it to our attention. Most of our reviewers are volunteers and sometimes things fall through the cracks.

You can also join us on gitter for real-time discussion.

For details on testing, writing docs, and our review process, please see the developer guide

We strive to be a welcoming and open project. Please follow our Code of Conduct.

@tacaswell
Copy link
Member

Thank you for this. The number of loops we have is known problem that we are always open to improve! The reason for this bit of code is exactly this (to break the cycle between the Artist and its parents).

I think we are doing it with the swap like this to reduce the amount of time are are in an invalid state (we say that we are not thread safe, but that does not mean we should intentionally drive to invalid states). The invalid state I'm worried about here is the case where there are artists in the children that have their parents cleared which will cause draws to fail. I also have a concern that there maybe other references to the list object floating around.

I would propose a slight variation on this change where we keep the swap, but add old_children.clear() which I think still breaks the cycle that needs to be broken with a smaller change.

@justinjhendrick
Copy link
Contributor Author

Should we also set chld._remove_method = None inside the loop before we clear the list? I'm asking because someone could still have a reference to the child and try to call remove on it. Which will raise ValueError after the Axes have been cleared, which is new behavior.

@tacaswell
Copy link
Member

Should we also set chld._remove_method = None inside the loop before we clear the list?

Yes please.

@justinjhendrick
Copy link
Contributor Author

Looks like the new test is failing on python v3.13.

Relevant logs:

Fatal Python error: Segmentation fault

Current thread 0x00007fefd9bdab80 (most recent call first):
  File "/home/runner/work/matplotlib/matplotlib/lib/matplotlib/tests/test_axes.py", line 9201 in is_in_reference_cycle
  File "/home/runner/work/matplotlib/matplotlib/lib/matplotlib/tests/test_axes.py", line 9226 in test_axes_clear_reference_cycle

I wonder if the GC is running at the same time as is_in_reference_cycle? I'll try gc.disable() during the graph traversal and see if that changes anything

@tacaswell
Copy link
Member

I think you have found a CPython bug as you should not be able to segfault Python from Python.

@tacaswell
Copy link
Member

and it looks like the scatter got dropped from the test at some point.

@justinjhendrick
Copy link
Contributor Author

and it looks like the scatter got dropped from the test at some point.

I still see ax.scatter on line 9216. I wonder if Github is having trouble syncing?

@justinjhendrick
Copy link
Contributor Author

justinjhendrick commented Sep 25, 2024

I think you have found a CPython bug as you should not be able to segfault Python from Python.

I'll work on a minimal repro to submit to CPython then! What should we do with this test and this pull request while that's happening?

@justinjhendrick
Copy link
Contributor Author

I decided to skip the test on non-final versions of 3.13.0 because python/cpython#124538 was marked as a release blocker. So, we should be good to re-enable this test on 3.13.0-final and later.

Co-authored-by: Elliott Sales de Andrade <[email protected]>
@justinjhendrick
Copy link
Contributor Author

There are two failures on the most recent test run:

  • appveyor + micromamba seems unrelated to my change. Is that failing on main too?
  • code coverage check is complaining that I added an unreached line in the test. The unreached line is the return True in is_in_reference_cycle that would cause the test to fail. So it's a good thing that line is not reached! This seems like a benign thing to ignore, to me.

@justinjhendrick
Copy link
Contributor Author

What are we waiting for to merge this? Is there anything I need to do? The only failing test (appVeyor) seems unrelated to my changes and I can see it failing on other PRs. Has it been fixed on main yet? Should I pull latest main again?

Co-authored-by: Elliott Sales de Andrade <[email protected]>
@tacaswell tacaswell merged commit 82acbd8 into matplotlib:main Oct 23, 2024
35 of 40 checks passed
@QuLogic QuLogic added this to the v3.10.0 milestone Oct 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

3 participants