[Bug]: matplotlib default backend crashes when used with embedded python interpreter #23419

Open
rohany opened this issue Jul 12, 2022 · 18 comments

Comments

@rohany

rohany commented Jul 12, 2022

Bug summary

I'm using an embedded python interpreter with a green threading library underneath it (https://github.com/nv-legate/legate.core), and I get an error when trying to create a plot from matplotlib using the non-default interpreter.

Code for reproduction

import matplotlib.pyplot as plt
fig = plt.plot()

Actual outcome

2022-07-12 13:41:19.040 legion_python[64903:5195159] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'NSWindow drag regions should only be invalidated on the Main Thread!'
*** First throw call stack:
(
	0   CoreFoundation                      0x00007fff323dc4d7 __exceptionPreprocess + 250
	1   libobjc.A.dylib                     0x00007fff6ae395bf objc_exception_throw + 48
	2   CoreFoundation                      0x00007fff32404cbc -[NSException raise] + 9
	3   AppKit                              0x00007fff2f601c1c -[NSWindow(NSWindow_Theme) _postWindowNeedsToResetDragMarginsUnlessPostingDisabled] + 310
	4   AppKit                              0x00007fff2f5e9682 -[NSWindow _initContent:styleMask:backing:defer:contentView:] + 1416
	5   AppKit                              0x00007fff2f5e90f3 -[NSWindow initWithContentRect:styleMask:backing:defer:] + 42
	6   _macosx.cpython-39-darwin.so        0x000000015a2555ac -[Window initWithContentRect:styleMask:backing:defer:withManager:] + 76
	7   _macosx.cpython-39-darwin.so        0x000000015a258a49 FigureManager_init + 281
	8   libpython3.9.dylib                  0x000000010ab3a91c wrap_init + 12
	9   libpython3.9.dylib                  0x000000010aabe35f wrapperdescr_call + 911
	10  libpython3.9.dylib                  0x000000010abcde27 _PyEval_EvalFrameDefault + 52903
	11  libpython3.9.dylib                  0x000000010aab1094 _PyFunction_Vectorcall + 420
	12  libpython3.9.dylib                  0x000000010ab3a27a slot_tp_init + 346
	13  libpython3.9.dylib                  0x000000010ab3fa50 type_call + 272
	14  libpython3.9.dylib                  0x000000010abcde27 _PyEval_EvalFrameDefault + 52903
	15  libpython3.9.dylib                  0x000000010aab1094 _PyFunction_Vectorcall + 420
	16  libpython3.9.dylib                  0x000000010aab641e method_vectorcall + 158
	17  libpython3.9.dylib                  0x000000010abcaa8c _PyEval_EvalFrameDefault + 39692
	18  libpython3.9.dylib                  0x000000010aab13c2 _PyFunction_Vectorcall + 1234
	19  libpython3.9.dylib                  0x000000010aab641e method_vectorcall + 158
	20  libpython3.9.dylib                  0x000000010abcf319 _PyEval_EvalFrameDefault + 58265
	21  libpython3.9.dylib                  0x000000010aab13c2 _PyFunction_Vectorcall + 1234
	22  libpython3.9.dylib                  0x000000010abcf319 _PyEval_EvalFrameDefault + 58265
	23  libpython3.9.dylib                  0x000000010aab13c2 _PyFunction_Vectorcall + 1234
	24  libpython3.9.dylib                  0x000000010abc7e68 _PyEval_EvalFrameDefault + 28392
	25  libpython3.9.dylib                  0x000000010aab1094 _PyFunction_Vectorcall + 420
	26  libpython3.9.dylib                  0x000000010abc7e68 _PyEval_EvalFrameDefault + 28392
	27  libpython3.9.dylib                  0x000000010aab13c2 _PyFunction_Vectorcall + 1234
	28  libpython3.9.dylib                  0x000000010abc7e68 _PyEval_EvalFrameDefault + 28392
	29  libpython3.9.dylib                  0x000000010aab13c2 _PyFunction_Vectorcall + 1234
	30  libpython3.9.dylib                  0x000000010abcaa8c _PyEval_EvalFrameDefault + 39692
	31  libpython3.9.dylib                  0x000000010abbf337 _PyEval_EvalCode + 663
	32  libpython3.9.dylib                  0x000000010abba5a9 builtin_exec + 329
	33  libpython3.9.dylib                  0x000000010ab0f067 cfunction_vectorcall_FASTCALL + 103
	34  libpython3.9.dylib                  0x000000010abc7e68 _PyEval_EvalFrameDefault + 28392
	35  libpython3.9.dylib                  0x000000010aab1094 _PyFunction_Vectorcall + 420
	36  libpython3.9.dylib                  0x000000010abc961b _PyEval_EvalFrameDefault + 34459
	37  libpython3.9.dylib                  0x000000010aab13c2 _PyFunction_Vectorcall + 1234
	38  libpython3.9.dylib                  0x000000010abc961b _PyEval_EvalFrameDefault + 34459
	39  libpython3.9.dylib                  0x000000010aab1094 _PyFunction_Vectorcall + 420
	40  libpython3.9.dylib                  0x000000010abc961b _PyEval_EvalFrameDefault + 34459
	41  libpython3.9.dylib                  0x000000010aab13c2 _PyFunction_Vectorcall + 1234
	42  libpython3.9.dylib                  0x000000010aab641e method_vectorcall + 158
	43  libpython3.9.dylib                  0x000000010abc7dc2 _PyEval_EvalFrameDefault + 28226
	44  libpython3.9.dylib                  0x000000010aab1094 _PyFunction_Vectorcall + 420
	45  libpython3.9.dylib                  0x000000010abc7e68 _PyEval_EvalFrameDefault + 28392
	46  libpython3.9.dylib                  0x000000010aab1094 _PyFunction_Vectorcall + 420
	47  librealm.dylib                      0x0000000109694f89 _ZN5Realm20LocalPythonProcessor12execute_taskEjRKNS_12ByteArrayRefE + 1673
	48  librealm.dylib                      0x0000000109447457 _ZN5Realm4Task20execute_on_processorENS_9ProcessorE + 471
	49  librealm.dylib                      0x000000010944e8a6 _ZN5Realm25KernelThreadTaskScheduler12execute_taskEPNS_4TaskE + 22
	50  librealm.dylib                      0x0000000109690a20 _ZN5Realm25PythonThreadTaskScheduler12execute_taskEPNS_4TaskE + 48
	51  librealm.dylib                      0x000000010944cbfb _ZN5Realm21ThreadedTaskScheduler14scheduler_loopEv + 2651
	52  librealm.dylib                      0x0000000109690da7 _ZN5Realm25PythonThreadTaskScheduler21python_scheduler_loopEv + 775
	53  librealm.dylib                      0x0000000109437d1c _ZN5Realm12KernelThread13pthread_entryEPv + 620
	54  libsystem_pthread.dylib             0x00007fff6c1e6109 _pthread_start + 148
	55  libsystem_pthread.dylib             0x00007fff6c1e1b8b thread_start + 15
)

Expected outcome

I expect this to not error out.

Additional information

No response

Operating system

OS/X

Matplotlib Version

3.5.1

Matplotlib Backend

MacOSX

Python version

3.9.31

@jklymak
Member

jklymak commented Jul 12, 2022

Does the macosx backend work in normal interpreter for you? Do other backends (qt5agg) work for you? How did you install matplotlib?

@rohany
Author

rohany commented Jul 12, 2022

matplotlib works in the normal interpreter! Within the embedded interpreter the agg backend works, I can try some others. I installed matplotlib through conda.

@tacaswell
Member

I would not expect any UI toolkit to run under such conditions.

@rohany If you know a reliable way to check that we are being run under a non-standard interpreter, we will consider merging a patch with it (it cannot have any additional dependencies and cannot put a significant time burden on the plain CPython interpreter). However, in the short term I suggest either setting the backend to something non-interactive or simply not using plt at all (you can create matplotlib.figure.Figure objects and bootstrap from there if you know that you will never need a UI).
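A minimal sketch of the "no plt at all" route, assuming a Matplotlib version in which `Figure` can be created directly and paired with the Agg canvas. The helper name `render_png` is hypothetical; the import is done inside the function only so that defining the helper does not load Matplotlib.

```python
import io


def render_png(xs, ys):
    """Render a simple line plot to PNG bytes without importing pyplot."""
    # Imported lazily so that defining this helper does not load Matplotlib.
    from matplotlib.backends.backend_agg import FigureCanvasAgg
    from matplotlib.figure import Figure

    fig = Figure(figsize=(4, 3))
    FigureCanvasAgg(fig)  # attach a non-interactive Agg canvas to the figure
    ax = fig.add_subplot(1, 1, 1)
    ax.plot(xs, ys)

    buf = io.BytesIO()
    fig.savefig(buf, format="png")  # no GUI window is ever created
    return buf.getvalue()
```

Because pyplot is never imported, no figure manager or GUI window is created, so the thread the code runs on should not matter for the GUI toolkit.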

@rohany
Author

rohany commented Jul 13, 2022

I'll get back to you on whether there's a way to check for the non-standard interpreter, but it would be great if there were a more user-friendly error message in situations like this, suggesting a non-interactive backend.

@tacaswell
Member

2022-07-12 13:41:19.040 legion_python[64903:5195159] *** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'NSWindow drag regions should only be invalidated on the Main Thread!'

The error is that you cannot create GUI windows on background threads. I guess we could also catch this in the Objective-C code and turn it into a normal Python error. I think we would be happy to review a patch that did that.

That said, we would only catch that after it was too late to do anything about it. The best place to put a check would be some code in backends/backend_osx.py that raises ImportError if you can detect this situation.

Have you reported this to legate-python? If this is unavoidable on their side, then they should document that non-interactive backends must be used.

@lightsighter

lightsighter commented Jul 13, 2022

If you know a reliable way to check that we are being run under a non-standard interpreter, we will consider merging a patch with it

To be clear this is not "a non-standard interpreter". The interpreter is literally an unmodified CPython interpreter and is being embedded in a way that fully abides by all the rules and requirements specified in the CPython documentation. If you can point out a way that we are not abiding by the CPython embedding rules we'll be happy to fix it.
https://docs.python.org/3/extending/embedding.html

Rather than using the kernel thread ID, you should be using PyThreadState objects to check whether you are on the same logical "thread" from the Python interpreter's perspective. Comparing PyThreadState objects will work correctly regardless of the threading model or embedding mode of the Python interpreter.
https://docs.python.org/3/c-api/init.html#c.PyThreadState_Get

@lightsighter

lightsighter commented Jul 13, 2022

The error is that you can not create GUI windows on background threads.

If you check, I think you'll find that we're trying to make a window on the same logical PyThreadState as the one that "starts" the interpreter but on a different kernel thread.

@tacaswell
Member

To be clear this is not "a non-standard interpreter".

I read "embedded" in the OP as "embedded in hardware" where I thought all bets are off. I apologize for my misunderstanding.

Rather than using the kernel thread ID,

If you check, I think you'll find that we're trying to make a window on the same logical PyThreadState as the one that "starts" the interpreter but on a different kernel thread.

However, I do not think Matplotlib is doing this check or raising the error. If I am reading the traceback right,

matplotlib/src/_macosx.m

Lines 1405 to 1408 in 4f5cacf

    self = [super initWithContentRect: rect
                            styleMask: mask
                              backing: bufferingType
                                defer: deferCreation];
is the last place that we are in the Matplotlib code before the failure. I think that this exception is coming out of the underlying OSX GUI toolkit, which is naive to Python threads. From the traceback (and inferring from context from other GUI toolkits), it appears to demand that all GUI work be done on the process "Main" thread from the point of view of the kernel.

Have you gotten any of the GUI toolkits to work from inside of legate? From my understanding of GUI main loops and my very rough understanding of greenlet threads (and a reasonable understanding of asyncio), I would guess that the need to have a blocking event loop for the UI to be responsive is going to play very badly with any kind of cooperative multitasking.

@anntzer
Contributor

anntzer commented Jul 13, 2022

May also be worth checking why the thread check at

    def _warn_if_gui_out_of_main_thread():
        if (_get_required_interactive_framework(_get_backend_mod())
                and threading.current_thread() is not threading.main_thread()):
            _api.warn_external(
                "Starting a Matplotlib GUI outside of the main thread will likely "
                "fail.")


    # This function's signature is rewritten upon backend-load by switch_backend.
    def new_figure_manager(*args, **kwargs):
        """Create a new figure manager instance."""
        _warn_if_gui_out_of_main_thread()
        return _get_backend_mod().new_figure_manager(*args, **kwargs)
does not trigger? Perhaps it's related to the green threads and could be modified to take them into account?

@lightsighter

I think that this exception is coming out of the underlying OSX GUI toolkit, which is naive to Python threads. From the traceback (and inferring from context from other GUI toolkits), it appears to demand that all GUI work be done on the process "Main" thread from the point of view of the kernel.

@rohany is going to try on Linux too to see what the error looks like there.

Have you gotten any of the GUI toolkits to work from inside of legate? From my understanding of GUI main loops and my very rough understanding of greenlet threads (and a reasonable understanding of asyncio), I would guess that the need to have a blocking event loop for the UI to be responsive is going to play very badly with any kind of cooperative multitasking.

The underlying runtime that manages the "task" executions guarantees forward progress of an application even if a thread running a task goes off into an event loop. It's how we support execution even in the presence of interactive interpreter sessions. We can handle a GUI doing the same thing.

May also be worth checking why the thread check at does not trigger? Perhaps it's related to the green threads and could be modified to take them into account?

See my comment above about checking the PyThreadState objects. The threading module internally checks the equivalence of threads using their PyThreadState objects, which is the Pythonic way of doing it. In this particular case they are the same: even though these are different kernel threads, they are the same logical thread to the Python interpreter. That is why we make it through that check and then die later on, when the actual kernel threads are compared.

@lightsighter

On Linux, we get the following error:

QObject::~QObject: Timers cannot be stopped from another thread

A little bit of introspection in gdb shows me that the PyThreadState object is the same as the "main" thread, but it is a different kernel thread where this error occurs. The thing that I keep coming back to is that CPython clearly decoupled itself from the threading of the application using the PyThreadState objects, so applications using Python can adopt whatever threading model they desire. While matplotlib is a Python library, it seems to be making stronger assumptions about the underlying threading model than CPython guarantees to Python programs.

@tacaswell
Member

Again, this error is not coming out of Matplotlib. From that snippet I think it is the destructor of a C++ object from Qt which, as with OSX above, is naive to the Python threading model. In both cases the errors are happening in layers of the code that we cannot control, other than by not calling them.

Can you test that any GUI work (e.g. using PyQt) done directly, without Matplotlib involved, works as you expect? I do not expect that to work reliably.

I think there are three reasonable solutions to this:

  1. someone from legate puts in a PR that can detect when we are being run under legate and raise an ImportError at the top of each of the GUI backends (assuming it is simple, cheap, and dependency free)
  2. when legate launches the embedded interpreter, you can set sys.modules['matplotlib.backends.backend_qt'] = None etc. for the problematic modules (which will cause Matplotlib to fall back to Agg if it cannot import any of the interactive backends).
  3. legate documents "you can not use mpl with interactive backends under legate" in the legate docs
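Option 2 relies on a standard import-system rule: a `None` entry in `sys.modules` makes a later import of that name raise `ImportError`. A minimal sketch of the mechanism follows; `block_modules` and the module name `hypothetical_gui_backend` are made up for illustration, so the demo does not depend on Matplotlib being installed.

```python
import importlib
import sys


def block_modules(names):
    """Make later imports of the given module names fail with ImportError."""
    for name in names:
        sys.modules[name] = None


# Hypothetical module name standing in for a GUI backend module.
block_modules(["hypothetical_gui_backend"])

try:
    importlib.import_module("hypothetical_gui_backend")
except ImportError as exc:
    blocked_error = exc  # the import was halted by the None entry
else:
    blocked_error = None
```

For the case discussed here, the embedding application would presumably pass names such as `matplotlib.backends.backend_qt`, and would have to do so before Matplotlib first tries to import a backend.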

@lightsighter

In both cases the errors are happening in layers of the code that we can not control, other than not calling them.

Most Python libraries don't have any restrictions on the underlying threading model. If you're depending on libraries that require stronger assumptions I feel like users should have to opt-in to that behavior rather than making it the default. You've clearly already got the hooks for switching the backends.

I think there are three reasonable solutions to this:

We can do one of those things, but that doesn't fix the underlying issue. Anybody that uses a green threading library under Python will have the same problems with matplotlib. I'm going to add a fourth option: if there is a requirement that all rendering work is done on the same thread (as it seems to be across these backends), why doesn't matplotlib make its own thread for doing that work and send rendering commands to it? That way you can guarantee that all rendering always happens on the same thread and you don't have to rely on the client to provide you a thread that abides by your restrictions.

@tacaswell
Member

I feel like users should have to opt-in to that behavior rather than making it the default.

Obviously no default is perfect for everyone. By default we will try to fall back through the available backends until we find one that works. The order is set so that if we can use a GUI toolkit we will (because a major target of Matplotlib is interactive use from a shell); see https://matplotlib.org/stable/users/explain/backends.html for details. If the default does not work or the user wants a particular backend, we provide a number of ways for a user to specify exactly which backend they want (and we will use it or fail).

We are able to gracefully handle cases like the toolkits not being installed, or detecting that we are on a headless Linux server, as part of the fallback logic. Options 1 and 2 above are variations on this (detect a case where we know GUIs do not work and skip them). Option 3 above is a version of "users who have a need can override the default" (via config file, environment variable, or code).
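As one illustration of the environment-variable override, a launcher could set `MPLBACKEND` before Matplotlib is first imported. This is a sketch under that assumption; the helper name `prefer_noninteractive_backend` is hypothetical, and the demo runs on a plain dict so the real process environment is left untouched.

```python
import os


def prefer_noninteractive_backend(environ=os.environ):
    """Select Agg via MPLBACKEND unless a backend was already chosen."""
    # setdefault keeps any value the user has set explicitly.
    return environ.setdefault("MPLBACKEND", "Agg")


# Demonstrated on a plain dict instead of the real environment.
demo_env = {}
chosen = prefer_noninteractive_backend(demo_env)
```

Using `setdefault` rather than plain assignment keeps an explicit user choice intact, which matches the "override the default" intent described above.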

As for option 4, the issue is not the rendering; it is creating, managing, and interacting with the GUI toolkit objects. Trying to move that to an additional thread would not work (because, as noted in this thread, those libraries are picky about which threads they run on). Further, we are a library and should not have an opinion about our users' choices about threading; however, it is on the users both to manage thread safety (Matplotlib is very much not thread safe in general) and to ensure compatibility with the GUI toolkits.

@anntzer
Contributor

anntzer commented Jul 17, 2022

If I understand the situation correctly, the problem comes from the fact that GUI toolkits want to run on the OS-level main thread, but we check Python-level threads, which don't map one-to-one to OS threads on Legate.
As a possible workaround, does threading.get_native_id() == threading.main_thread().native_id check the right thing on Legate?
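A minimal standard-library sketch of that comparison. `threading.get_native_id()` and `Thread.native_id` are available from Python 3.8; the helper names `on_os_main_thread` and `check_from_worker` are hypothetical and not part of Matplotlib.

```python
import threading


def on_os_main_thread():
    """Return True when the calling OS thread is the main thread's OS thread."""
    return threading.get_native_id() == threading.main_thread().native_id


def check_from_worker():
    """Run the check on a freshly started thread and return its answer."""
    results = []
    worker = threading.Thread(target=lambda: results.append(on_os_main_thread()))
    worker.start()
    worker.join()
    return results[0]


in_caller = on_os_main_thread()
in_worker = check_from_worker()
```

In a regular interpreter the worker reports `False`, while code running on the main thread reports `True`. Unlike `threading.current_thread() is threading.main_thread()`, this compares kernel-level thread IDs rather than Python-level thread objects.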

@lightsighter

As for option 4, the issue is not the rendering; it is creating, managing, and interacting with the GUI toolkit objects. Trying to move that to an additional thread would not work (because, as noted in this thread, those libraries are picky about which threads they run on).

Right, that is precisely why matplotlib should make its own thread (probably using the threading module) to manage all interactions with the backend, if the backend has such restrictions. Pretty much every implementation that I've seen of OpenGL or DirectX makes its own thread for exactly the same reason: to be able to manage all interactions with the window managers, because they have requirements similar to those of the backends that matplotlib is offloading to.

Further, we are a library and should not have an opinion about our users choices about threading

I agree and I'm not saying that you should. I'm saying you should explicitly create your own thread internally to interact with the backend (whatever it is). The thread would be one that you own and control the semantics of, not anything that the matplotlib client gives you. If you use the threading module in Python to create a new thread, that must be a new kernel thread. That way you can guarantee that all backend interactions come from the same thread regardless of what threads are invoking matplotlib functionality.
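The pattern being proposed can be sketched with the standard library as a worker thread that executes every submitted call. This only illustrates the general technique; the class name `BackendThread` is hypothetical, nothing like it exists in Matplotlib, and it does not address the objection raised earlier in the thread that some toolkits insist on the process main thread specifically.

```python
import queue
import threading


class BackendThread:
    """Run every submitted callable on one dedicated worker thread."""

    def __init__(self):
        self._tasks = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            item = self._tasks.get()
            if item is None:  # sentinel: shut the worker down
                break
            func, args, reply = item
            try:
                reply.put((True, func(*args)))
            except Exception as exc:  # hand the error back to the caller
                reply.put((False, exc))

    def call(self, func, *args):
        """Execute func(*args) on the worker and return its result."""
        reply = queue.Queue(maxsize=1)
        self._tasks.put((func, args, reply))
        ok, value = reply.get()
        if not ok:
            raise value
        return value

    def close(self):
        self._tasks.put(None)
        self._thread.join()


backend = BackendThread()
# Every call reports the identity of the thread it actually ran on.
worker_ids = {backend.call(threading.get_ident) for _ in range(3)}
backend.close()
```

All three calls run on the same worker thread, regardless of which thread invoked `call`.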

however it is on the users to both manage thread safety (Matplotlib is very not thread safe in general) and ensuring compatibility with the GUI toolkits.

That is orthogonal to my point above. It's fine to expect clients to manage the thread safety of calling into matplotlib, but that doesn't prevent you from having an architecture that offloads all interactions with the backend onto an internal thread that you control.

If I understand the situation correctly, the problem comes from the fact that GUI toolkits want to run on the OS-level main thread, but we check Python-level threads, which don't map one-to-one to OS threads on Legate.

That only matters to the question of why matplotlib isn't giving a nice error message. If you fix that, all that changes is that we get a nicer error message, but one that still seems like matplotlib is blaming the user for choices that it made for its internal dependencies.

As a possible workaround, does threading.get_native_id() == threading.main_thread().native_id check the right thing on Legate?

Yes, in this case, you'll see the actual kernel thread ID and your error message will fire.

@zoeouyang2543

Hello, I got this error when I use matplotlib:
Starting a Matplotlib GUI outside of the main thread will likely fail.
fig = mpyplot.figure(figid + numfig)
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'NSWindow should only be instantiated on the main thread!'
Operating system

Mac OS m1

Matplotlib Version

3.9.4

Matplotlib Backend

MacOSX

Python version

3.9.21

please give me a hand, thanks @tacaswell @anntzer @jklymak @lightsighter @rohany

@timhoffm
Member

@zoeouyang2543
Starting a Matplotlib GUI from outside the main thread is not possible. This is typically a limitation of the underlying GUI frameworks. See https://matplotlib.org/3.9.3/users/faq.html#work-with-threads
