gh-136459: Add perf trampoline support for macOS #136461
@@ -2,34 +2,35 @@
 .. _perf_profiling:

-==============================================
-Python support for the Linux ``perf`` profiler
-==============================================
+========================================================
+Python support for the ``perf map`` compatible profilers
+========================================================

 :author: Pablo Galindo

-`The Linux perf profiler <https://perf.wiki.kernel.org>`_
-is a very powerful tool that allows you to profile and obtain
-information about the performance of your application.
-``perf`` also has a very vibrant ecosystem of tools
-that aid with the analysis of the data that it produces.
+`The Linux perf profiler <https://perf.wiki.kernel.org>`_ and
+`samply <https://github.com/mstange/samply>`_ are powerful tools that allow you to
+profile and obtain information about the performance of your application.
+Both tools have vibrant ecosystems that aid with the analysis of the data they produce.

-The main problem with using the ``perf`` profiler with Python applications is that
-``perf`` only gets information about native symbols, that is, the names of
+The main problem with using these profilers with Python applications is that
+they only get information about native symbols, that is, the names of
 functions and procedures written in C. This means that the names and file names
-of Python functions in your code will not appear in the output of ``perf``.
+of Python functions in your code will not appear in the profiler output.

 Since Python 3.12, the interpreter can run in a special mode that allows Python
-functions to appear in the output of the ``perf`` profiler. When this mode is
+functions to appear in the output of compatible profilers. When this mode is
 enabled, the interpreter will interpose a small piece of code compiled on the
-fly before the execution of every Python function and it will teach ``perf`` the
+fly before the execution of every Python function and it will teach the profiler the
 relationship between this piece of code and the associated Python function using
 :doc:`perf map files <../c-api/perfmaps>`.
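For background: a perf map file is a plain-text file named ``/tmp/perf-<pid>.map`` in which each line maps a region of executable memory to a symbol name, in the form ``START SIZE name`` with hexadecimal addresses. The trampoline entries that Python writes for its functions look roughly like this (the addresses, sizes and script path below are illustrative only):

    7f3a2c001000 100 py::foo:/home/user/my_script.py
    7f3a2c001100 100 py::bar:/home/user/my_script.py
    7f3a2c001200 100 py::baz:/home/user/my_script.py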
 .. note::

-    Support for the ``perf`` profiler is currently only available for Linux on
-    select architectures. Check the output of the ``configure`` build step or
+    Support for profiling is available on Linux and macOS on select architectures.
+    ``perf`` is available on Linux, while ``samply`` can be used on both Linux and macOS.
+    ``samply`` support on macOS is available starting from Python 3.14.
+    Check the output of the ``configure`` build step or
     check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
     to see if your system is supported.
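The same check can also be done from within Python; this mirrors what the new test module below does via ``sysconfig``:

    import sysconfig

    # "1" when the interpreter was built with perf trampoline support
    print(sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE"))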
@@ -148,6 +149,26 @@ Instead, if we run the same experiment with ``perf`` support enabled we get:
+Using the ``samply`` profiler
+-----------------------------
+
+``samply`` is a modern profiler that can be used as an alternative to ``perf``.
+It uses the same perf map files that Python generates, making it compatible
+with Python's profiling support. ``samply`` is particularly useful on macOS,
+where ``perf`` is not available.
+
+To use ``samply`` with Python, first install it following the instructions at
+https://github.com/mstange/samply, then run::
+
+    $ PYTHONPERFSUPPORT=1 samply record python my_script.py
+
+This will open a web interface where you can analyze the profiling data
+interactively. The advantage of ``samply`` is that it provides a modern
+web-based interface for analyzing profiling data and works on both Linux
+and macOS.
+
+On macOS, ``samply`` support requires Python 3.14 or later.
+

Review discussion on this section:

We are going to need a bit more here. For example, samply supports both perf modes, so we need clarification on when to use them and what the recommendations are. How to read the flamegraphs, etc.

Would it make sense to break these discussions out into a separate PR? It doesn't seem useful to delay landing trampoline support for this.

Is there any rush? This will go into 3.15 anyway, and that's going to be released in October 2026. We still need to figure out the buildbot situation, which will take some time...

I am happy to separate this into a different PR, though.

Oh, I was hoping that we could maybe enable it on 3.14, considering that the code has been there since 3.12 and it's mostly putting lots of ifdefs here and there (minus the samply and documentation part). I suspect that updating the documentation will take longer. But I'm not familiar with the release process.

No way unfortunately, as we are 3 betas past beta freeze. It's up to the release manager to decide (CC @hugovk), but we have a strict policy for this, I'm afraid, and no new features can be added past beta freeze.

@hugovk Checking just in case, although I assume the answer is "no": would you consider adding this to 3.14, given that this is a new platform and the code is gated by ifdefs? This would allow people on macOS to profile their code using a native profiler, which would be very useful for investigating performance in Python plus compiled code.

Some context for this: this would allow people on macOS to profile free-threaded Python using samply, so maybe there is a case to allow it in 3.14, but I am still unsure. Up to you @hugovk.
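As an aside, the new test module below drives ``samply`` in a headless mode instead of opening the web UI; based on those invocations, an equivalent manual run that only saves the profile to disk looks roughly like this (the output path is arbitrary):

    $ PYTHONPERFSUPPORT=1 samply record --save-only -o profile.json.gz python my_script.py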
 How to enable ``perf`` profiling support
 ----------------------------------------
@@ -0,0 +1,244 @@
import unittest
import subprocess
import sys
import sysconfig
import os
import pathlib
from test import support
from test.support.script_helper import (
    make_script,
)
from test.support.os_helper import temp_dir


if not support.has_subprocess_support:
    raise unittest.SkipTest("test module requires subprocess")

if support.check_sanitizer(address=True, memory=True, ub=True, function=True):
    # gh-109580: Skip the test because it does crash randomly if Python is
    # built with ASAN.
    raise unittest.SkipTest("test crash randomly on ASAN/MSAN/UBSAN build")


def supports_trampoline_profiling():
    perf_trampoline = sysconfig.get_config_var("PY_HAVE_PERF_TRAMPOLINE")
    if not perf_trampoline:
        return False
    return int(perf_trampoline) == 1


if not supports_trampoline_profiling():
    raise unittest.SkipTest("perf trampoline profiling not supported")
def samply_command_works():
    # First make sure the samply binary is available and runs at all.
    try:
        cmd = ["samply", "--help"]
        subprocess.check_output(cmd, text=True)
    except (subprocess.SubprocessError, OSError):
        return False

    # Check that we can run a simple samply run
    with temp_dir() as script_dir:
        try:
            output_file = script_dir + "/profile.json.gz"
            cmd = (
                "samply",
                "record",
                "--save-only",
                "--output",
                output_file,
                sys.executable,
                "-c",
                'print("hello")',
            )
            env = {**os.environ, "PYTHON_JIT": "0"}
            stdout = subprocess.check_output(
                cmd, cwd=script_dir, text=True, stderr=subprocess.STDOUT, env=env
            )
        except (subprocess.SubprocessError, OSError):
            return False

        if "hello" not in stdout:
            return False

    return True


def run_samply(cwd, *args, **env_vars):
    env = os.environ.copy()
    if env_vars:
        env.update(env_vars)
    env["PYTHON_JIT"] = "0"
    output_file = cwd + "/profile.json.gz"
    base_cmd = (
        "samply",
        "record",
        "--save-only",
        "-o", output_file,
    )
    proc = subprocess.run(
        base_cmd + args,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        env=env,
    )
    if proc.returncode:
        print(proc.stderr, file=sys.stderr)
        raise ValueError(f"Samply failed with return code {proc.returncode}")

    # The saved profile is a gzipped JSON document; return it as text so
    # callers can search it for symbol names.
    import gzip
    with gzip.open(output_file, mode="rt", encoding="utf-8") as f:
        return f.read()
@unittest.skipUnless(samply_command_works(), "samply command doesn't work")
class TestSamplyProfilerMixin:
    def run_samply(self, script_dir, script, activate_trampoline=True):
        raise NotImplementedError()

    def test_python_calls_appear_in_the_stack_if_perf_activated(self):
        with temp_dir() as script_dir:
            code = """if 1:
                def foo(n):
                    x = 0
                    for i in range(n):
                        x += i

                def bar(n):
                    foo(n)

                def baz(n):
                    bar(n)

                baz(10000000)
                """
            script = make_script(script_dir, "perftest", code)
            output = self.run_samply(script_dir, script)

            self.assertIn(f"py::foo:{script}", output)
            self.assertIn(f"py::bar:{script}", output)
            self.assertIn(f"py::baz:{script}", output)

    def test_python_calls_do_not_appear_in_the_stack_if_perf_deactivated(self):
        with temp_dir() as script_dir:
            code = """if 1:
                def foo(n):
                    x = 0
                    for i in range(n):
                        x += i

                def bar(n):
                    foo(n)

                def baz(n):
                    bar(n)

                baz(10000000)
                """
            script = make_script(script_dir, "perftest", code)
            output = self.run_samply(
                script_dir, script, activate_trampoline=False
            )

            self.assertNotIn(f"py::foo:{script}", output)
            self.assertNotIn(f"py::bar:{script}", output)
            self.assertNotIn(f"py::baz:{script}", output)
@unittest.skipUnless(samply_command_works(), "samply command doesn't work")
class TestSamplyProfiler(unittest.TestCase, TestSamplyProfilerMixin):
    def run_samply(self, script_dir, script, activate_trampoline=True):
        if activate_trampoline:
            return run_samply(script_dir, sys.executable, "-Xperf", script)
        return run_samply(script_dir, sys.executable, script)

    def setUp(self):
        super().setUp()
        self.perf_files = set(pathlib.Path("/tmp/").glob("perf-*.map"))

    def tearDown(self) -> None:
        super().tearDown()
        files_to_delete = (
            set(pathlib.Path("/tmp/").glob("perf-*.map")) - self.perf_files
        )
        for file in files_to_delete:
            file.unlink()

    def test_pre_fork_compile(self):
        code = """if 1:
            import sys
            import os
            import sysconfig
            from _testinternalcapi import (
                compile_perf_trampoline_entry,
                perf_trampoline_set_persist_after_fork,
            )

            def foo_fork():
                pass

            def bar_fork():
                foo_fork()

            def foo():
                import time; time.sleep(1)

            def bar():
                foo()

            def compile_trampolines_for_all_functions():
                perf_trampoline_set_persist_after_fork(1)
                for _, obj in globals().items():
                    if callable(obj) and hasattr(obj, '__code__'):
                        compile_perf_trampoline_entry(obj.__code__)

            if __name__ == "__main__":
                compile_trampolines_for_all_functions()
                pid = os.fork()
                if pid == 0:
                    print(os.getpid())
                    bar_fork()
                else:
                    bar()
            """

        with temp_dir() as script_dir:
            script = make_script(script_dir, "perftest", code)
            env = {**os.environ, "PYTHON_JIT": "0"}
            with subprocess.Popen(
                [sys.executable, "-Xperf", script],
                universal_newlines=True,
                stderr=subprocess.PIPE,
                stdout=subprocess.PIPE,
                env=env,
            ) as process:
                stdout, stderr = process.communicate()

            self.assertEqual(process.returncode, 0)
            self.assertNotIn("Error:", stderr)
            child_pid = int(stdout.strip())
            perf_file = pathlib.Path(f"/tmp/perf-{process.pid}.map")
            perf_child_file = pathlib.Path(f"/tmp/perf-{child_pid}.map")
            self.assertTrue(perf_file.exists())
            self.assertTrue(perf_child_file.exists())

            perf_file_contents = perf_file.read_text()
            self.assertIn(f"py::foo:{script}", perf_file_contents)
            self.assertIn(f"py::bar:{script}", perf_file_contents)
            self.assertIn(f"py::foo_fork:{script}", perf_file_contents)
            self.assertIn(f"py::bar_fork:{script}", perf_file_contents)

            child_perf_file_contents = perf_child_file.read_text()
            self.assertIn(f"py::foo_fork:{script}", child_perf_file_contents)
            self.assertIn(f"py::bar_fork:{script}", child_perf_file_contents)

            # Pre-compiled perf-map entries of a forked process must be
            # identical in both the parent and child perf-map files.
            perf_file_lines = perf_file_contents.split("\n")
            for line in perf_file_lines:
                if f"py::foo_fork:{script}" in line or f"py::bar_fork:{script}" in line:
                    self.assertIn(line, child_perf_file_contents)


if __name__ == "__main__":
    unittest.main()
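For reference, a test module like this would normally be exercised through the regression test runner; the module name below is an assumption, since the file name is not shown in this view:

    # "test_samply_profiler" is a guessed module name; substitute the real one
    $ ./python -m test test_samply_profiler -v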
@@ -0,0 +1,3 @@
Add support for the perf trampoline on macOS, to allow profilers with JIT map
support to read Python calls. While profiling, ``PYTHONPERFSUPPORT=1`` can
be set to enable the trampoline.
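In practice, either of the following invocations enables the trampoline while profiling with ``samply`` (the script name is just an example; ``-X perf`` is the equivalent command-line option):

    $ PYTHONPERFSUPPORT=1 samply record python my_script.py
    $ samply record python -X perf my_script.py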
I added a shameless plug to samply 😄 Disclaimer: I'm not the maintainer of the project, but the maintainer is my colleague. That doesn't change the fact that it's an awesome profiler! But I can revert it if you prefer not to include it :)
I am happy with the plug, but these docs are going to need much more than this then. If ``samply`` is the main way to use this on macOS, then we will need to update https://docs.python.org/3/howto/perf_profiling.html with full instructions for samply :)