
NOTRACE_DISPATCH breaks tracing #96636


Closed
brandtbucher opened this issue Sep 7, 2022 · 10 comments
Labels
3.11 only security fixes 3.12 only security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) performance Performance or resource usage release-blocker type-bug An unexpected behavior, bug, or error type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@brandtbucher
Member

Most of our quickened instructions start by asserting that tracing isn't enabled, and end by calling NOTRACE_DISPATCH(), which skips tracing checks. This is safe for many of our specializations (like those for BINARY_OP and COMPARE_OP, which operate on known safe types), but not all of them. The problem is that any time we decref an unknown object, arbitrary finalizers could run and enable tracing.

It seems that our current practice is predicated on the idea that it's okay for tracing to start "sometime in the future", since we make no guarantees about when finalizers run. There are two issues with this:

  • Trace functions may temporarily trace some frames, but not others.
  • As mentioned, the interpreter loop is littered with asserts that will fail if this happens.
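To make the hazard concrete, here is a pure-Python sketch of the mechanism (the names `Sneaky`, `f`, `after`, and `tracer` are made up for illustration; this shows the finalizer-enables-tracing pattern, not the C-level assertion failure itself):

```python
import sys

events = []

def tracer(frame, event, arg):
    events.append((frame.f_code.co_name, event))
    return tracer

class Sneaky:
    def __del__(self):
        # A finalizer can enable tracing at an arbitrary point.
        sys.settrace(tracer)

def f():
    Sneaky()        # the decref here runs __del__, which turns tracing on...
    return after()  # ...so this call is already traced

def after():
    return 1

f()
sys.settrace(None)
print(("after", "call") in events)  # → True
```

From the interpreter's point of view, tracing switched on in the middle of `f`'s frame, between two instructions — exactly the situation the `NOTRACE_DISPATCH()` fast path assumes cannot happen.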

Here's a sort-of-minimal reproducer of what this can look like on debug builds:

Python 3.11.0rc1+ (heads/3.11:bb0dab5c48, Sep  6 2022, 22:29:10) [GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> class C:
...     def __init__(self, x):
...         self.x = x
...     def __del__(self):
...         if self.x:
...             breakpoint()
... 
>>> def f(x):
...     C(x).x, C
... 
>>> for _ in range(8):
...     f(False)
... 
>>> f(True)
--Return--
> <stdin>(6)__del__()->None
(Pdb) next
python: Python/ceval.c:3076: _PyEval_EvalFrameDefault: Assertion `cframe.use_tracing == 0' failed.
Aborted

In this case LOAD_ATTR_INSTANCE_VALUE's NOTRACE_DISPATCH() invalidates LOAD_GLOBAL_MODULE's assert(cframe.use_tracing == 0).

One option is to just rip out the asserts, which solves the more serious issue of crashing debug builds. However, I think that the best solution may be to stop using NOTRACE_DISPATCH() altogether. It's really hard to get right, and only saves us a single memory load and bitwise |. I benchmarked a branch that uses DISPATCH() for all instructions, and the result was "1.00x slower" than main. @markshannon tried the same thing, and got "1.01x slower". So going this route isn't free, but also not a huge penalty for correctness and peace-of-mind.

Flagging as a release blocker, since 3.11 is affected.

@markshannon @sweeneyde @pablogsal

@brandtbucher brandtbucher added type-bug An unexpected behavior, bug, or error performance Performance or resource usage interpreter-core (Objects, Python, Grammar, and Parser dirs) 3.11 only security fixes type-crash A hard crash of the interpreter, possibly with a core dump 3.12 only security fixes labels Sep 7, 2022
@markshannon
Member

markshannon commented Sep 7, 2022

We can keep NOTRACE_DISPATCH() for the LOAD_FAST and LOAD_CONST superinstructions. That was the original use of NOTRACE_DISPATCH() and is correct.

I agree, the other uses are just too risky.

We can fix this more efficiently for 3.12 using PEP 669 (or a similar approach internally if the PEP is rejected).

@markshannon
Member

markshannon commented Sep 7, 2022

Thinking about this a bit more, the LOAD_FAST may not be safe either.
(C(0).x, local1, local2) would translate into:

LOAD_ATTR
LOAD_FAST__LOAD_FAST

which has the same problem.

@brandtbucher
Member Author

Hm, I don't see how, since LOAD_ATTR_WHATEVER; LOAD_FAST__LOAD_FAST will deopt to LOAD_ATTR; LOAD_FAST; LOAD_FAST under tracing. And LOAD_FAST__LOAD_FAST can't run arbitrary code.

Regardless, I think that the fact that we're having trouble proving the correctness of even simple instructions like these is a strong point in favor of just using DISPATCH() everywhere.

markshannon added a commit that referenced this issue Sep 8, 2022
markshannon added a commit to faster-cpython/cpython that referenced this issue Sep 8, 2022
@sweeneyde
Member

sweeneyde commented Sep 8, 2022

A question about the removal of NOTRACE_DISPATCH_SAME_OPARG: couldn't that lead to duplicate traces for a single instruction? As in, the adaptive instruction attempts to specialize and fails, does DO_TRACING, then re-does the adaptive instruction, then does DO_TRACING again in the same spot?

@brandtbucher
Member Author

I don't think so, since DO_TRACING deopts the next instruction. So the execution would go BINARY_OP_ADAPTIVE; DO_TRACING; BINARY_OP, for example.

@sweeneyde
Member

Okay, I see. I was thinking that a sequence like BINARY_OP_ADAPTIVE; DO_TRACING; BINARY_OP; DO_TRACING could be problematic and duplicate a line trace, but if I'm understanding correctly:

  • The only way that sequence could happen is if _Py_Specialize_BinaryOp(...) itself somehow called sys.settrace(), which looks to be impossible.
  • Tracing fires for the instruction that is about to execute, so that sequence would trace the BINARY_OP and then whatever comes next.

So I believe it is impossible for an adaptive instruction to be followed by DO_TRACING. (?)

@brandtbucher
Member Author

So I believe it is impossible for an adaptive instruction to be followed by DO_TRACING. (?)

In practice, yeah, it doesn't really happen, since the specialization code needs to be extremely careful not to change the state of the program it's currently trying to optimize.

But it's not impossible, and catching it is really tricky (as we've seen). For example, the LOAD_ATTR specialization can call PyType_Ready, which can do all sorts of nonsense (like enable tracing). So I think it's safest to just use DISPATCH().

@gvanrossum
Member

There seem to be only three people who understand DO_TRACING. I can't find any docs for it, not even a single comment explaining it. Maybe we should have a write-up somewhere? I would volunteer but I don't think I understand enough of it.

So far I believe this: when tracing is turned on (in quickened code only?) the opcode variable (but not the opcode byte in the bytecode string) is OR-ed with 255, and that makes us hit the case DO_TRACING: in the switch (which is really case 255:). The actual logic is mostly obscured by macros. The OR-ing happens in DISPATCH() and in PREDICT() using opcode |= cframe.use_tracing OR_DTRACE_LINE. When tracing is turned on, cframe.use_tracing is set to 255, in _PyThreadState_UpdateTracingState(), and when it's turned off, that variable is set to 0 again. There are also numerous macros named TRACE_FUNCTION_*() that check for cframe.use_tracing.

@brandtbucher
Member Author

So far I believe this: when tracing is turned on (in quickened code only?)

The path is the same for both quickened and unquickened code. DO_TRACING calls trace functions, then re-dispatches to the vanilla, un-quickened version of the actual opcode. So it doesn't matter if code has been quickened - when tracing, only the un-quickened version of each instruction will run. See below for a walkthrough of what happens.

the opcode variable (but not the opcode byte in the bytecode string) is OR-ed with 255, and that makes us hit the case DO_TRACING: in the switch (which is really case 255:). The actual logic is mostly obscured by macros. The OR-ing happens in DISPATCH() and in PREDICT() using opcode |= cframe.use_tracing OR_DTRACE_LINE. When tracing is turned on, cframe.use_tracing is set to 255, in _PyThreadState_UpdateTracingState(), and when it's turned off, that variable is set to 0 again.

Yep. The or-ing is done to avoid branching on the tracing flag: instead of an explicit check, we reuse the normal instruction dispatch mechanism with a branchless modification.

So, to give a concrete example, let's assume that I have the quickened instructions LOAD_ATTR_INSTANCE_VALUE; BINARY_OP_ADD_INT. While executing LOAD_ATTR_INSTANCE_VALUE, tracing is turned on. This is what happens during the next DISPATCH():

  • opcode and oparg are loaded, and opcode == BINARY_OP_ADD_INT.
  • opcode is or-ed with cframe.use_tracing. Now opcode == 255 == DO_TRACING.
  • We jump to the DO_TRACING case, and fire off some tracing events.
  • At the bottom of DO_TRACING, in TRACING_NEXTOPARG(), we load the next opcode and oparg, which is really the current opcode and oparg, since DO_TRACING doesn't get the INSTRUCTION_START header that advances the instruction pointer. At this point, opcode == BINARY_OP_ADD_INT again.
    • Minor note: this reloads the 8-bit, un-extended oparg. This is okay, since EXTENDED_ARG historically skips doing tracing checks for the next instruction, so we never need to care about extended opargs here.
  • TRACING_NEXTOPARG then deoptimizes the opcode (this is the opcode = _PyOpcode_Deopt[opcode]; assignment it performs), leaving opcode == BINARY_OP.
  • Then we do PRE_DISPATCH_GOTO() and DISPATCH_GOTO(). This is equivalent to normal instruction dispatch, except we skip the part where we | the result with cframe.use_tracing (which would put us in an infinite loop where we just keep tracing the current instruction until the trace function disables itself).
  • We begin executing BINARY_OP, the instruction pointer gets advanced like normal, and, if tracing is still active, the whole dance happens again for the next opcode on the next DISPATCH().

So you can really think of tracing as executing two opcode cases per instruction: first DO_TRACING, then whatever the real, un-quickened instruction is.
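The branchless trick above can be condensed into a toy Python model. The opcode values here are made up for illustration; only the DO_TRACING == 255 convention mirrors ceval.c, and `dispatch` is a hypothetical stand-in for the DISPATCH() macro:

```python
# Toy model of 3.11's branchless tracing dispatch.
# Opcode values are illustrative; only DO_TRACING == 255 mirrors ceval.c.

DO_TRACING = 255
BINARY_OP = 7              # un-quickened instruction (made-up value)
BINARY_OP_ADD_INT = 200    # quickened specialization (made-up value)

DEOPT = {BINARY_OP_ADD_INT: BINARY_OP, BINARY_OP: BINARY_OP}

def dispatch(raw_opcode, use_tracing):
    """use_tracing is 0 (off) or 255 (on), as in cframe.use_tracing."""
    opcode = raw_opcode | use_tracing   # branchless: no `if tracing:` check
    if opcode == DO_TRACING:
        # The DO_TRACING case: fire trace events, then reload and
        # de-optimize the real opcode before executing it.
        opcode = DEOPT[raw_opcode]
    return opcode

print(dispatch(BINARY_OP_ADD_INT, 0))    # → 200: quickened path runs as-is
print(dispatch(BINARY_OP_ADD_INT, 255))  # → 7: traced, falls back to BINARY_OP
```

Because `x | 255 == 255` for any byte, every instruction funnels into the DO_TRACING case while tracing is on — which is why the dispatch at the bottom of DO_TRACING must skip the or-ing step to avoid looping.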

There are also numerous macros named TRACE_FUNCTION_*() that check for cframe.use_tracing.

Those are for more "predictable" tracing events like call, return, and exception. DO_TRACING is only used for the line and opcode events, since those can happen literally anywhere (or, in the case of opcode, everywhere).

@gvanrossum
Member

Thanks! Let’s put that in a file and check it in.

Repository owner moved this from Todo to Done in Release and Deferred blockers 🚫 Sep 9, 2022
gvanrossum pushed a commit that referenced this issue Sep 9, 2022