From d18f804c16fac06a4404be04b2191c580a904c60 Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Tue, 8 Mar 2022 22:51:53 +0800 Subject: [PATCH 01/18] Faster-cpython whatsnew initial draft --- Doc/whatsnew/3.11.rst | 151 ++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 151 insertions(+) diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index 4a64e044c4a167..8e1b1b61977f7b 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -62,6 +62,7 @@ Summary -- Release highlights .. This section singles out the most important changes in Python 3.11. Brevity is key. +- Python 3.11 is 20% faster than Python 3.10. See `Faster CPython`_ for details. .. PEP-sized items next. @@ -410,6 +411,156 @@ Optimizations (Contributed by Inada Naoki in :issue:`46845`.) +Faster CPython +============== + +Python 3.11 is on average `1.20x faster `_ +than Python 3.10 when measured with the +`pyperformance `_ benchmark suite, on +Ubuntu Linux. + +The project focuses on two major areas in Python: faster startup and faster +runtime. Other optimizations not under this project are listed in `Optimizations`_. + +Faster Startup +-------------- + +Static objects +~~~~~~~~~~~~~~ + +Freezing imports +~~~~~~~~~~~~~~~~ + + +Faster Runtime +-------------- + +Cheaper, lazy Python frames +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +Python frames are created whenever Python calls a Python function. This frame +holds execution information. The following are new frame optimizations: + +- Streamlined the frame creation process. +- Avoided memory allocation by generously re-using frame space on the C stack. +- Streamlined the internal frame object to only contain essential information. + Frames previously held extra debugging and memory management information. + Old-style frames are now created only when required by debuggers. For most + user code, no frames are created at all. + +Nearly all Python functions calls have sped up significantly. 
+This resulted in a 3-7% improvement in pyperformance. +(Contributed by Mark Shannon in :issue:`44590`.) + +.. _inline-calls: + +Inlined Python function calls +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +During a Python function call, Python will call a C function to interpret that +function's code, similar to how a user would call ``eval`` to run arbitrary +code. + +In 3.11, when Python detects code calling another Python function, +it sets up a new frame, and "jumps" to the new code inside the new frame. This +avoids calling the C interpreting function altogether. + +Python function calls now consume almost no C stack space. This speeds up Python +to Python function calls. In simple recursive functions like fibonacci or +factorial, a 1.7x speedup was observed. This also means recursive functions +can recurse significantly deeper, assuming the recursion limit and memory limit +is not exceeded. This resulted in a 1-3% improvement in pyperformance. +(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.) + +PEP 659: Specializing Adaptive Interpreter +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +:pep:`659` is one of the key parts of the faster CPython project. The general +idea is that while Python is a dynamic language, most code have regions where +objects and types rarely change. This concept is known as *type stability*. + +At runtime, Python will try to look for common patterns and type stability +in the executing code. Python will then replace the current operation with a +more specialized one. This specialized operation use fast paths available only +to those use cases/types and will generally outperform their generic +counterparts. This also brings in another concept called *inline caching*, where +Python caches the results of expensive operations directly in the bytecode. + +Since this exta information requires more memory, Python will only specialize +when it sees code that is "hot" (executed multiple times). 
It can also +de-specialize when code is too dynamic. This is attempted every so often, and +specialization attempts are not too expensive. + +(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler. +See :pep:`659` for more information.) + +.. + If I missed out anyone, please add them. + ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Operation | Form | Specialization | Operation speedup | Contributor(s) | +| | | | (up to) | | ++===============+====================+=======================================================+===================+===================+ +| Binary | ``o+o; o*o; o-o;`` | Binary add, multiply and subtract for common types | 10% | Mark Shannon, | +| operations | | such as ``int``, ``float``, and ``str`` take custom | | Dong-hee Na, | +| | | fast paths for their underlying types. | | Brandt Bucher, | +| | | | | Dennis Sweeney | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Subscript | ``o[i]`` | Subscripting container types such as ``list``, | 10-30% | Irit Katriel, | +| | | ``tuple`` and ``dict``directly index the underlying | | Mark Shannon | +| | | data structures. Subscripting custom ``__getitem__`` | | | +| | | is also inlined similar to :ref:`inline-calls`. | | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Store | ``o[i] = z`` | Similar to subscripting specialization above. | ? | | +| subscript | | | | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Calls | ``f(arg)`` | Calls to common builtin (C) functions such as ``len`` | 20% | Mark Shannon, | +| | | and ``isinstance`` directly call their underlying C | | Ken Jin | +| | ``T(arg)`` | version. 
This avoids going through the internal | 170% | | +| | | calling convention. | | | +| | | | ? | | +| | | Calls to certain Python functions are inlined similar | | | +| | | to :ref:`inline-calls`. | | | +| | | | | | +| | | ``__init__`` is also inlined when creating | | | +| | | Python classes. | | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Load | ``print; len`` | The object's index in the globals/builtins namespace | 0% [1]_ | Mark Shannon | +| global | | is cached. Loading globals and builtins requires | | | +| variable | | no namespace lookups. | | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Load | ``o.attr`` | Similar to loading global variables. The attribute's | ? | Mark Shannon | +| attribute | | index inside the class/object's namespace is cached. | | | +| | | In most cases, attribute loading will require zero | | | +| | | namespace lookups. | | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Load | ``o.meth()`` | The actual address of the method is cached. Method | 10-20% [2]_ | Ken Jin | +| methods for | | loading now has no namespace lookups -- even for | | | +| call | | classes with long inheritance chains. | | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Store | ``o.attr = z`` | Similar to load attribute optimization. | ? | Mark Shannon | +| attribute | | | | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ +| Unpack | ``*seq`` | Specialized for common containers such as ``list`` | ? | ? | +| Sequence | | and ``tuple``. Avoids internal calling convention. 
| | | ++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ + +.. [1] Note that a similar optimization already existed since Python 3.8. + +.. [2] Classes with longer inheritance chains will see greater speedups. + This optimization effectively makes method lookup constant time + regardless of inheritance. + +About +----- + +Faster CPython explores optimizations for :term:`CPython`. The main team is +funded by Microsoft to work on this full-time. The team also collaborates +extensively with volunteer contributors in the community. The following list of +people involved is non-exhaustive: + +* Faster CPython team: Guido van Rossum, Mark Shannon, Eric Snow, Brandt Bucher +* External collaborators: Irit Katriel, Pablo Galindo +* Additional contributors: Dennis Sweeney, Dong-hee Na, Ken Jin, Jelle Zijlstra, Kumar Aditya +* There are too many people contributing ideas to list! + CPython bytecode changes ======================== From 77701feba07ed673948bdfe8c64d8c5a44444540 Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Fri, 25 Mar 2022 17:34:26 +0800 Subject: [PATCH 02/18] Add section on static code objects/freezing imports Co-Authored-By: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com> --- Doc/tutorial/modules.rst | 2 ++ Doc/whatsnew/3.11.rst | 26 ++++++++++++++++++++++---- 2 files changed, 24 insertions(+), 4 deletions(-) diff --git a/Doc/tutorial/modules.rst b/Doc/tutorial/modules.rst index f1d4957e37eb11..342a66be728aea 100644 --- a/Doc/tutorial/modules.rst +++ b/Doc/tutorial/modules.rst @@ -209,6 +209,8 @@ directory. This is an error unless the replacement is intended. See section .. % Do we need stuff on zip files etc. ? DUBOIS +.. 
_tut-pycache: + "Compiled" Python files ----------------------- diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index 8e1b1b61977f7b..72c16a054371fd 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -425,11 +425,29 @@ runtime. Other optimizations not under this project are listed in `Optimizations Faster Startup -------------- -Static objects -~~~~~~~~~~~~~~ +Frozen imports / Static code objects +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Freezing imports -~~~~~~~~~~~~~~~~ +Python caches bytecode in the :ref:`__pycache__` directory to +speed up module loading. + +Previously in 3.10, Python module execution looked like this: + +.. code-block:: text + + Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate + +In Python 3.11, the core modules essential for Python startup are "frozen". +This means that their code objects (and bytecode) are statically allocated +by the interpreter. This reduces the steps in module execution process to this: + +.. code-block:: text + + Statically allocated code object -> Evaluate + +Interpreter startup is now 10-15% faster in Python 3.11. + +(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.) Faster Runtime From 2d4171f0a09d77939f730829fc77c594a48a423a Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Fri, 25 Mar 2022 17:41:35 +0800 Subject: [PATCH 03/18] clarify where the benefits are --- Doc/whatsnew/3.11.rst | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index 72c16a054371fd..fe37d3bde350a1 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -445,7 +445,9 @@ by the interpreter. This reduces the steps in module execution process to this: Statically allocated code object -> Evaluate -Interpreter startup is now 10-15% faster in Python 3.11. +Interpreter startup is now 10-15% faster in Python 3.11. 
This positively +impacts short-running programs using Python. Such as ``python -m venv ...``, or +``python -m pip ...```. (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.) From 2e7647f06234d0e7f2dbd1e64e383c3434451c3b Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Sat, 26 Mar 2022 15:28:51 +0800 Subject: [PATCH 04/18] Add an FAQ section --- Doc/whatsnew/3.11.rst | 38 ++++++++++++++++++++++++++++++++++---- 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index fe37d3bde350a1..6dbff803f4d364 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -414,8 +414,8 @@ Optimizations Faster CPython ============== -Python 3.11 is on average `1.20x faster `_ -than Python 3.10 when measured with the +CPython 3.11 is on average `1.20x faster `_ +than CPython 3.10 when measured with the `pyperformance `_ benchmark suite, on Ubuntu Linux. @@ -551,8 +551,8 @@ See :pep:`659` for more information.) | | | In most cases, attribute loading will require zero | | | | | | namespace lookups. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -| Load | ``o.meth()`` | The actual address of the method is cached. Method | 10-20% [2]_ | Ken Jin | -| methods for | | loading now has no namespace lookups -- even for | | | +| Load | ``o.meth()`` | The actual address of the method is cached. Method | 10-20% [2]_ | Ken Jin, | +| methods for | | loading now has no namespace lookups -- even for | | Mark Shannon | | call | | classes with long inheritance chains. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | Store | ``o.attr = z`` | Similar to load attribute optimization. | ? | Mark Shannon | @@ -568,6 +568,36 @@ See :pep:`659` for more information.) 
This optimization effectively makes method lookup constant time regardless of inheritance. +FAQ +--- + +| Q: How should I write my code to utilize these speedups? +| +| A: You don't have to change your code. Write Pythonic code that follows common + best practices. The Faster CPython project optimizes for common code + patterns we observe. +| +| +| Q: Will CPython 3.11 use more memory? +| +| A: Yes. However, how much exactly depends on how much code is "hot". We don't + expect memory use to exceed 20% more versus 3.10. This may be offset by + memory optimizations for frame objects and object dictionaries as mentioned + above. +| +| +| Q: I don't see any speedups in my workload. Why? +| +| A: Certain code won't have noticeable benefits. If your code spends most of + its time on IO(Input/Output) operations, or already does most of its + computation in a C extension library like numpy, there won't be significant + speedup. This project currently benefits pure-Python workloads the most. +| +| +| Q: Is there a JIT compiler? +| +| A: No. We're still exploring other optimizations. + About ----- From 07646a6e50efd3a1c44ba8775a783065cf192125 Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Fri, 1 Apr 2022 23:51:06 +0800 Subject: [PATCH 05/18] Update FAQ, add performance figures and footnotes --- Doc/whatsnew/3.11.rst | 29 ++++++++++++++++++----------- 1 file changed, 18 insertions(+), 11 deletions(-) diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index 82d391a0b5cd68..489e42f1038ab2 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -62,7 +62,7 @@ Summary -- Release highlights .. This section singles out the most important changes in Python 3.11. Brevity is key. -- Python 3.11 is 20% faster than Python 3.10. See `Faster CPython`_ for details. +- Python 3.11 is 22% faster than Python 3.10. See `Faster CPython`_ for details. .. PEP-sized items next. 
@@ -484,7 +484,7 @@ Optimizations Faster CPython ============== -CPython 3.11 is on average `1.20x faster `_ +CPython 3.11 is on average `1.22x faster `_ than CPython 3.10 when measured with the `pyperformance `_ benchmark suite, on Ubuntu Linux. @@ -573,6 +573,9 @@ to those use cases/types and will generally outperform their generic counterparts. This also brings in another concept called *inline caching*, where Python caches the results of expensive operations directly in the bytecode. +The specializer will also combine certain common instruction pairs into one +super instruction. This reduces the overhead during execution. + Since this exta information requires more memory, Python will only specialize when it sees code that is "hot" (executed multiple times). It can also de-specialize when code is too dynamic. This is attempted every so often, and @@ -605,36 +608,36 @@ See :pep:`659` for more information.) | | | and ``isinstance`` directly call their underlying C | | Ken Jin | | | ``T(arg)`` | version. This avoids going through the internal | 170% | | | | | calling convention. | | | -| | | | ? | | +| | | | | | | | | Calls to certain Python functions are inlined similar | | | | | | to :ref:`inline-calls`. | | | | | | | | | -| | | ``__init__`` is also inlined when creating | | | -| | | Python classes. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | Load | ``print; len`` | The object's index in the globals/builtins namespace | 0% [1]_ | Mark Shannon | | global | | is cached. Loading globals and builtins requires | | | | variable | | no namespace lookups. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -| Load | ``o.attr`` | Similar to loading global variables. The attribute's | ? | Mark Shannon | +| Load | ``o.attr`` | Similar to loading global variables. 
The attribute's | 0% [2]_ | Mark Shannon | | attribute | | index inside the class/object's namespace is cached. | | | | | | In most cases, attribute loading will require zero | | | | | | namespace lookups. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -| Load | ``o.meth()`` | The actual address of the method is cached. Method | 10-20% [2]_ | Ken Jin, | +| Load | ``o.meth()`` | The actual address of the method is cached. Method | 10-20% [3]_ | Ken Jin, | | methods for | | loading now has no namespace lookups -- even for | | Mark Shannon | | call | | classes with long inheritance chains. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -| Store | ``o.attr = z`` | Similar to load attribute optimization. | ? | Mark Shannon | -| attribute | | | | | +| Store | ``o.attr = z`` | Similar to load attribute optimization. | 2% | Mark Shannon | +| attribute | | | in pyperformance | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -| Unpack | ``*seq`` | Specialized for common containers such as ``list`` | ? | ? | +| Unpack | ``*seq`` | Specialized for common containers such as ``list`` | 8% | Brandt Bucher | | Sequence | | and ``tuple``. Avoids internal calling convention. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ .. [1] Note that a similar optimization already existed since Python 3.8. -.. [2] Classes with longer inheritance chains will see greater speedups. +.. [2] Note that a similar optimization has already existed since Python 3.10. + +.. [3] Classes with longer inheritance chains will see greater speedups. This optimization effectively makes method lookup constant time regardless of inheritance. 
@@ -663,6 +666,10 @@ FAQ computation in a C extension library like numpy, there won't be significant speedup. This project currently benefits pure-Python workloads the most. | +| Furthermore, the pyperformance figures are a geometric mean. Even within the + pyperformance benchmarks, certain benchmarks have slowed down slightly, while + others have sped up by nearly 1.9x! +| | | Q: Is there a JIT compiler? | From 8063a44f9ce90e6d3a97bafac1d0e50c317b05ca Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Fri, 1 Apr 2022 23:56:21 +0800 Subject: [PATCH 06/18] Add NEWS --- .../next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst diff --git a/Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst b/Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst new file mode 100644 index 00000000000000..8c400841ca1d14 --- /dev/null +++ b/Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst @@ -0,0 +1,2 @@ +Add a What's New in Python 3.11 entry for the Faster CPython project. +Documentation by Ken Jin and Kumar Aditya. From 33b04860e80f1035dd7734b7e9a189075971472b Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Sat, 2 Apr 2022 00:01:28 +0800 Subject: [PATCH 07/18] Fix doc errors --- Doc/whatsnew/3.11.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index 16068bd7ceff53..ce13af1331d4a5 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -530,7 +530,7 @@ by the interpreter. This reduces the steps in module execution process to this: Interpreter startup is now 10-15% faster in Python 3.11. This positively impacts short-running programs using Python. Such as ``python -m venv ...``, or -``python -m pip ...```. 
+``python -m pip ...``. (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.) @@ -610,7 +610,7 @@ See :pep:`659` for more information.) | | | | | Dennis Sweeney | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | Subscript | ``o[i]`` | Subscripting container types such as ``list``, | 10-30% | Irit Katriel, | -| | | ``tuple`` and ``dict``directly index the underlying | | Mark Shannon | +| | | ``tuple`` and ``dict`` directly index the underlying | | Mark Shannon | | | | data structures. Subscripting custom ``__getitem__`` | | | | | | is also inlined similar to :ref:`inline-calls`. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ From 3cb89aaece851f9b4655f0cab76334512df83411 Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Sat, 2 Apr 2022 00:26:56 +0800 Subject: [PATCH 08/18] Add entry for namespace dictionar, make wording consistent --- Doc/whatsnew/3.11.rst | 13 +++++++++++-- 1 file changed, 11 insertions(+), 2 deletions(-) diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index ce13af1331d4a5..f58b838b291891 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -627,8 +627,8 @@ See :pep:`659` for more information.) | | | | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | Load | ``print; len`` | The object's index in the globals/builtins namespace | 0% [1]_ | Mark Shannon | -| global | | is cached. Loading globals and builtins requires | | | -| variable | | no namespace lookups. | | | +| global | | is cached. Loading globals and builtins require | | | +| variable | | zero namespace lookups. 
| | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ | Load | ``o.attr`` | Similar to loading global variables. The attribute's | 0% [2]_ | Mark Shannon | | attribute | | index inside the class/object's namespace is cached. | | | @@ -654,6 +654,15 @@ See :pep:`659` for more information.) This optimization effectively makes method lookup constant time regardless of inheritance. + +Misc +---- + +* Objects now require less memory due to lazily created object namespaces. Their + namespace dictionaries now also share keys more freely. + (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.) + + FAQ --- From 121b24d32ecadae877f302d079390bebd651bf2f Mon Sep 17 00:00:00 2001 From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com> Date: Sat, 2 Apr 2022 12:11:36 +0800 Subject: [PATCH 09/18] Apply suggestions by Mark and Jelle Co-Authored-By: Jelle Zijlstra --- Doc/whatsnew/3.11.rst | 48 ++++++++++++++++++++++--------------------- 1 file changed, 25 insertions(+), 23 deletions(-) diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst index f58b838b291891..3fadf97318b2b1 100644 --- a/Doc/whatsnew/3.11.rst +++ b/Doc/whatsnew/3.11.rst @@ -62,7 +62,8 @@ Summary -- Release highlights .. This section singles out the most important changes in Python 3.11. Brevity is key. -- Python 3.11 is 22% faster than Python 3.10. See `Faster CPython`_ for details. +- Python 3.11 is 10-60% faster than Python 3.10. On average, we measured a 1.22x + speedup on the standard benchmark suite. See `Faster CPython`_ for details. .. PEP-sized items next. @@ -500,7 +501,7 @@ Faster CPython CPython 3.11 is on average `1.22x faster `_ than CPython 3.10 when measured with the `pyperformance `_ benchmark suite, on -Ubuntu Linux. +Ubuntu Linux. Depending on your workload, the speedup could be 10-60% faster. The project focuses on two major areas in Python: faster startup and faster runtime. 
Other optimizations not under this project are listed in `Optimizations`_. @@ -576,23 +577,25 @@ is not exceeded. This resulted in a 1-3% improvement in pyperformance. PEP 659: Specializing Adaptive Interpreter ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ :pep:`659` is one of the key parts of the faster CPython project. The general -idea is that while Python is a dynamic language, most code have regions where +idea is that while Python is a dynamic language, most code has regions where objects and types rarely change. This concept is known as *type stability*. At runtime, Python will try to look for common patterns and type stability in the executing code. Python will then replace the current operation with a -more specialized one. This specialized operation use fast paths available only -to those use cases/types and will generally outperform their generic +more specialized one. This specialized operation uses fast paths available only +to those use cases/types, which generally outperform their generic counterparts. This also brings in another concept called *inline caching*, where Python caches the results of expensive operations directly in the bytecode. The specializer will also combine certain common instruction pairs into one -super instruction. This reduces the overhead during execution. +superinstruction. This reduces the overhead during execution. -Since this exta information requires more memory, Python will only specialize -when it sees code that is "hot" (executed multiple times). It can also -de-specialize when code is too dynamic. This is attempted every so often, and -specialization attempts are not too expensive. +This extra information requires more memory. Python will only specialize +when it sees code that is "hot" (executed multiple times). This prevents Python +from wasting time on run-once code. Python can also de-specialize when code is +too dynamic or when the use changes. 
Specialization is attempted periodically, +and specialization attempts are not too expensive. This allows specialization +to adapt to new circumstances. (PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler. See :pep:`659` for more information.) @@ -626,11 +629,11 @@ See :pep:`659` for more information.) | | | to :ref:`inline-calls`. | | | | | | | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -| Load | ``print; len`` | The object's index in the globals/builtins namespace | 0% [1]_ | Mark Shannon | +| Load | ``print; len`` | The object's index in the globals/builtins namespace | - [1]_ | Mark Shannon | | global | | is cached. Loading globals and builtins require | | | | variable | | zero namespace lookups. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -| Load | ``o.attr`` | Similar to loading global variables. The attribute's | 0% [2]_ | Mark Shannon | +| Load | ``o.attr`` | Similar to loading global variables. The attribute's | - [2]_ | Mark Shannon | | attribute | | index inside the class/object's namespace is cached. | | | | | | In most cases, attribute loading will require zero | | | | | | namespace lookups. | | | @@ -646,9 +649,12 @@ See :pep:`659` for more information.) | Sequence | | and ``tuple``. Avoids internal calling convention. | | | +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+ -.. [1] Note that a similar optimization already existed since Python 3.8. +.. [1] Note that a similar optimization already existed since Python 3.8. 3.11 + specializes for more forms and reduces some overhead. .. [2] Note that a similar optimization has already existed since Python 3.10. + 3.11 specializes for more forms. 
Furthermore, all attribute loads should + be sped up by :issue:`45947`. .. [3] Classes with longer inheritance chains will see greater speedups. This optimization effectively makes method lookup constant time @@ -675,8 +681,8 @@ FAQ | | Q: Will CPython 3.11 use more memory? | -| A: Yes. However, how much exactly depends on how much code is "hot". We don't - expect memory use to exceed 20% more versus 3.10. This may be offset by +| A: Maybe not. However, how much exactly depends on how much code is "hot". We + don't expect memory use to exceed 20% more versus 3.10. This is offset by memory optimizations for frame objects and object dictionaries as mentioned above. | @@ -684,31 +690,27 @@ FAQ | Q: I don't see any speedups in my workload. Why? | | A: Certain code won't have noticeable benefits. If your code spends most of - its time on IO(Input/Output) operations, or already does most of its + its time on I/O operations, or already does most of its computation in a C extension library like numpy, there won't be significant speedup. This project currently benefits pure-Python workloads the most. | | Furthermore, the pyperformance figures are a geometric mean. Even within the pyperformance benchmarks, certain benchmarks have slowed down slightly, while - others have sped up by nearly 1.9x! + others have sped up by nearly 2x! | | | Q: Is there a JIT compiler? | | A: No. We're still exploring other optimizations. + About ----- Faster CPython explores optimizations for :term:`CPython`. The main team is funded by Microsoft to work on this full-time. The team also collaborates -extensively with volunteer contributors in the community. The following list of -people involved is non-exhaustive: +extensively with volunteer contributors in the community. 
-* Faster CPython team: Guido van Rossum, Mark Shannon, Eric Snow, Brandt Bucher
-* External collaborators: Irit Katriel, Pablo Galindo
-* Additional contributors: Dennis Sweeney, Dong-hee Na, Ken Jin, Jelle Zijlstra, Kumar Aditya
-* There are too many people contributing ideas to list!
 
 CPython bytecode changes
 ========================

From 48dd46d714c88bf4e430e58283a3958aebf0b72f Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 2 Apr 2022 12:14:28 +0800
Subject: [PATCH 10/18] Credit Dennis Sweeney for STORE_SUBSCR

---
 Doc/whatsnew/3.11.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 3fadf97318b2b1..9892a681f17e95 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -617,7 +617,7 @@ See :pep:`659` for more information.)
 |               |                    | data structures. Subscripting custom ``__getitem__``  |                   |                   |
 |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | ?                 |                   |
+| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
 | subscript     |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |

From ff34d6ffd19dc8e208c8983661bbb35a31b45d8d Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 2 Apr 2022 19:09:32 +0800
Subject: [PATCH 11/18] remove misleading sentence about memory usage and "hot" code

---
 Doc/whatsnew/3.11.rst | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 9892a681f17e95..33c52620e2a97a 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -681,10 +681,9 @@ FAQ
 |
 | Q: Will CPython 3.11 use more memory?
 |
-| A: Maybe not. However, how much exactly depends on how much code is "hot". We
-  don't expect memory use to exceed 20% more versus 3.10. This is offset by
-  memory optimizations for frame objects and object dictionaries as mentioned
-  above.
+| A: Maybe not. We don't expect memory use to exceed 20% more versus 3.10.
+  This is offset by memory optimizations for frame objects and object
+  dictionaries as mentioned above.
 |
 |
 | Q: I don't see any speedups in my workload. Why?
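
Review note: the FAQ entry shown as context above ("I don't see any speedups in my workload") relies on the pyperformance headline figure being a geometric mean of per-benchmark ratios. As a minimal sketch of why an overall speedup can coexist with individual slowdowns, the snippet below uses made-up ratios. The benchmark names and numbers are hypothetical and are not pyperformance results.

```python
from statistics import geometric_mean

# Hypothetical per-benchmark speedup ratios (3.10 time / 3.11 time).
# These values are invented for illustration only.
speedups = {
    "bench_a": 0.97,  # slightly slower on 3.11
    "bench_b": 1.10,
    "bench_c": 1.25,
    "bench_d": 1.90,  # close to 2x faster
}

# The headline number is the geometric mean of the individual ratios.
overall = geometric_mean(list(speedups.values()))

# Individual benchmarks can still regress even when the mean improves.
slower = [name for name, ratio in speedups.items() if ratio < 1.0]

print(f"overall speedup: {overall:.2f}x, slower benchmarks: {slower}")
```

A workload that resembles ``bench_a`` more than ``bench_d`` would see little or no benefit even though the aggregate figure is well above 1.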
From acef908aba3dcc8983efd1fefac1170cbc76363a Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sun, 3 Apr 2022 00:22:32 +0800
Subject: [PATCH 12/18] Tighten wording, specify GCC

---
 Doc/whatsnew/3.11.rst | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 33c52620e2a97a..a98c1a37ee2f39 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -500,10 +500,11 @@ Faster CPython
 
 CPython 3.11 is on average `1.22x faster `_
 than CPython 3.10 when measured with the
-`pyperformance `_ benchmark suite, on
-Ubuntu Linux. Depending on your workload, the speedup could be 10-60% faster.
+`pyperformance `_ benchmark suite,
+and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
+could be 10-60% faster.
 
-The project focuses on two major areas in Python: faster startup and faster
+This project focuses on two major areas in Python: faster startup and faster
 runtime. Other optimizations not under this project are listed in `Optimizations`_.
 
 Faster Startup
@@ -546,13 +547,14 @@ holds execution information. The following are new frame optimizations:
 
 - Streamlined the frame creation process.
 - Avoided memory allocation by generously re-using frame space on the C stack.
-- Streamlined the internal frame object to only contain essential information.
+- Streamlined the internal frame object to contain only essential information.
   Frames previously held extra debugging and memory management information.
-  Old-style frames are now created only when required by debuggers. For most
-  user code, no frames are created at all.
 
-Nearly all Python functions calls have sped up significantly.
-This resulted in a 3-7% improvement in pyperformance.
+Old-style frames are now created only when required by debuggers. For most
+user code, no frames are created at all. As a result, nearly all Python
+functions calls have sped up significantly. We measured a 3-7% speedup
+in pyperformance.
+
 (Contributed by Mark Shannon in :issue:`44590`.)
 
 .. _inline-calls:
@@ -571,7 +573,8 @@ Python function calls now consume almost no C stack space. This speeds up Python
 to Python function calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
 can recurse significantly deeper, assuming the recursion limit and memory limit
-is not exceeded. This resulted in a 1-3% improvement in pyperformance.
+is not exceeded. We measured a 1-3% improvement in pyperformance.
+
 (Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
 
 PEP 659: Specializing Adaptive Interpreter
@@ -649,11 +652,11 @@ See :pep:`659` for more information.)
 | Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 
-.. [1] Note that a similar optimization already existed since Python 3.8. 3.11
+.. [1] A similar optimization already existed since Python 3.8. 3.11
        specializes for more forms and reduces some overhead.
 
-.. [2] Note that a similar optimization has already existed since Python 3.10.
-       3.11 speicalizes for more forms. Furthermore, all attribute loads should
+.. [2] A similar optimization already existed since Python 3.10.
+       3.11 specializes for more forms. Furthermore, all attribute loads should
        be sped up by :issue:`45947`.
 
 .. [3] Classes with longer inheritance chains will see greater speedups.
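
Review note: the footnotes in this hunk say that 3.11 specializes more forms than earlier releases. One way a reader might observe this is sketched below, assuming the ``adaptive`` keyword of ``dis.get_instructions`` (available from Python 3.11). The helper name ``specialized_opnames`` and the warm-up count are illustrative choices, and the opcode names reported differ between CPython versions.

```python
import dis
import sys


def add_ints(a, b):
    # A type-stable binary operation: both operands are always ints here.
    return a + b


def specialized_opnames(func, warmup=1000):
    """Warm up func, then list the opcode names the interpreter currently uses."""
    for _ in range(warmup):
        func(1, 2)  # repeated calls make the code "hot"
    if sys.version_info >= (3, 11):
        # adaptive=True asks dis to show specialized instructions, if any.
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        # Older versions have no adaptive view; fall back to plain bytecode.
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


names = specialized_opnames(add_ints)
print(names)
```

On an interpreter with the specializing adaptive interpreter, the generic binary-operation opcode is expected to appear under a more specific name after warm-up. On older versions the list simply shows the generic bytecode.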
From bc84c89138e555e89f8742392c1812e155fa9dc4 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sun, 3 Apr 2022 14:44:03 +0800
Subject: [PATCH 13/18] Apply suggestions by Alex

Co-Authored-By: Alex Waygood
---
 Doc/whatsnew/3.11.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index a98c1a37ee2f39..31107f97e657f1 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -530,9 +530,9 @@ by the interpreter. This reduces the steps in module execution process to this:
 
    Statically allocated code object -> Evaluate
 
-Interpreter startup is now 10-15% faster in Python 3.11. This positively
-impacts short-running programs using Python. Such as ``python -m venv ...``, or
-``python -m pip ...``.
+Interpreter startup is now 10-15% faster in Python 3.11. This has a big
+impact for short-running programs using Python such as ``python -m venv ...``,
+or ``python -m pip ...``.
 
 (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
 

From 1f1e565aea1d09efd659c5d4c8714866c4a0812f Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sun, 3 Apr 2022 14:46:27 +0800
Subject: [PATCH 14/18] Remove outdated whatsnew section

---
 Doc/whatsnew/3.11.rst | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 31107f97e657f1..9844b317805403 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -474,13 +474,6 @@ Optimizations
   almost eliminated when no exception is raised.
   (Contributed by Mark Shannon in :issue:`40222`.)
 
-* Method calls with keywords are now faster due to bytecode
-  changes which avoid creating bound method instances. Previously, this
-  optimization was applied only to method calls with purely positional
-  arguments.
-  (Contributed by Ken Jin and Mark Shannon in :issue:`26110`, based on ideas
-  implemented in PyPy.)
-
 * Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
   (Contributed by Dong-hee Na in :issue:`44987`.)
 

From 8b182ecf4fed520e5276e979e808294de6cf0be3 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Tue, 5 Apr 2022 18:54:05 +0800
Subject: [PATCH 15/18] Address Guido's and Irit's reviews

Co-Authored-By: Guido van Rossum
Co-Authored-By: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
---
 Doc/whatsnew/3.11.rst | 62 +++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 9844b317805403..e263941135bb1f 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -62,8 +62,8 @@ Summary -- Release highlights
 .. This section singles out the most important changes in Python 3.11.
    Brevity is key.
 
-- Python 3.11 is 10-60% faster than Python 3.10. On average, we measured a 1.22x
-  speedup on the standard benchmark suite. See `Faster CPython`_ for details.
+- Python 3.11 is up to 10-60% faster than Python 3.10. On average, we measured a
+  1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.
 
 .. PEP-sized items next.
 
@@ -495,7 +495,7 @@ CPython 3.11 is on average `1.22x faster `_
 than CPython 3.10 when measured with the
 `pyperformance `_ benchmark suite,
 and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
-could be 10-60% faster.
+could be up to 10-60% faster.
 
 This project focuses on two major areas in Python: faster startup and faster
 runtime. Other optimizations not under this project are listed in `Optimizations`_.
@@ -524,8 +524,7 @@ by the interpreter. This reduces the steps in module execution process to this:
    Statically allocated code object -> Evaluate
 
 Interpreter startup is now 10-15% faster in Python 3.11. This has a big
-impact for short-running programs using Python such as ``python -m venv ...``,
-or ``python -m pip ...``.
+impact for short-running programs using Python.
 
 (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
 
@@ -540,12 +539,12 @@ holds execution information. The following are new frame optimizations:
 
 - Streamlined the frame creation process.
 - Avoided memory allocation by generously re-using frame space on the C stack.
-- Streamlined the internal frame object to contain only essential information.
+- Streamlined the internal frame struct to contain only essential information.
   Frames previously held extra debugging and memory management information.
 
-Old-style frames are now created only when required by debuggers. For most
-user code, no frames are created at all. As a result, nearly all Python
-functions calls have sped up significantly. We measured a 3-7% speedup
+Old-style frame objects are now created only when required by debuggers. For
+most user code, no frame objects are created at all. As a result, nearly all
+Python function calls have sped up significantly. We measured a 3-7% speedup
 in pyperformance.
 
 (Contributed by Mark Shannon in :issue:`44590`.)
@@ -554,19 +553,19 @@ in pyperformance.
 
 Inlined Python function calls
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-During a Python function call, Python will call a C function to interpret that
-function's code, similar to how a user would call ``eval`` to run arbitrary
-code.
+During a Python function call, Python will call an evaluating C function to
+interpret that function's code. This effectively limits pure Python recursion to
+what's safe for the C stack
 
-In 3.11, when Python detects code calling another Python function,
+In 3.11, when Python detects Python code calling another Python function,
 it sets up a new frame, and "jumps" to the new code inside the new frame. This
 avoids calling the C interpreting function altogether.
 
-Python function calls now consume almost no C stack space. This speeds up Python
-to Python function calls. In simple recursive functions like fibonacci or
+Python function calls now consume almost no C stack space. This speeds up
+most of such calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
-can recurse significantly deeper, assuming the recursion limit and memory limit
-is not exceeded. We measured a 1-3% improvement in pyperformance.
+can recurse significantly deeper, (if the user increases the recursion limit).
+We measured a 1-3% improvement in pyperformance.
 
 (Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
 
@@ -586,7 +585,7 @@ Python caches the results of expensive operations directly in the bytecode.
 The specializer will also combine certain common instruction pairs into one
 superinstruction. This reduces the overhead during execution.
 
-This extra information requires more memory. Python will only specialize
+Python will only specialize
 when it sees code that is "hot" (executed multiple times). This prevents Python
 from wasting time for run-once code. Python can also de-specialize when code is
 too dynamic or when the use changes. Specialization is attempted periodically,
@@ -603,30 +602,32 @@ See :pep:`659` for more information.)
 | Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
 |               |                    |                                                       | (up to)           |                   |
 +===============+====================+=======================================================+===================+===================+
-| Binary        | ``o+o; o*o; o-o;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
+| Binary        | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
 | operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
 |               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
 |               |                    |                                                       |                   | Dennis Sweeney    |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Subscript     | ``o[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
+| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
 |               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
-|               |                    | data structures. Subscripting custom ``__getitem__``  |                   |                   |
+|               |                    | data structures.                                      |                   |                   |
+|               |                    |                                                       |                   |                   |
+|               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
 |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
+| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
 | subscript     |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |
 |               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
-|               | ``T(arg)``         | version. This avoids going through the internal       | 170%              |                   |
+|               | ``C(arg)``         | version. This avoids going through the internal       | 170%              |                   |
 |               |                    | calling convention.                                   |                   |                   |
 |               |                    |                                                       |                   |                   |
 |               |                    | Calls to certain Python functions are inlined similar |                   |                   |
 |               |                    | to :ref:`inline-calls`.                               |                   |                   |
 |               |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``print; len``     | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
-| global        |                    | is cached. Loading globals and builtins require       |                   |                   |
+| Load          | ``print``          | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
+| global        | ``len``            | is cached. Loading globals and builtins require       |                   |                   |
 | variable      |                    | zero namespace lookups.                               |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | - [2]_            | Mark Shannon      |
@@ -634,7 +635,7 @@ See :pep:`659` for more information.)
 |               |                    | In most cases, attribute loading will require zero    |                   |                   |
 |               |                    | namespace lookups.                                    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [3]_       | Ken Jin,          |
+| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20%            | Ken Jin,          |
 | methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
 | call          |                    | classes with long inheritance chains.                 |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
@@ -652,10 +653,6 @@ See :pep:`659` for more information.)
        3.11 specializes for more forms. Furthermore, all attribute loads should
        be sped up by :issue:`45947`.
 
-.. [3] Classes with longer inheritance chains will see greater speedups.
-       This optimization effectively makes method lookup constant time
-       regardless of inheritance.
-
 
 Misc
 ----
@@ -664,6 +661,9 @@ Misc
   namespace dictionaries now also share keys more freely.
   (Contributed Mark Shannon in :issue:`45340` and :issue:`40116`.)
 
+* A more concise representation of exceptions in the interpreter reduced the
+  time required for catching an exception by about 10%.
+  (Contributed by Irit Katriel in :issue:`45711`.)
 
 FAQ
 ---
@@ -677,7 +677,7 @@ FAQ
 |
 | Q: Will CPython 3.11 use more memory?
 |
-| A: Maybe not. We don't expect memory use to exceed 20% more versus 3.10.
+| A: Maybe not. We don't expect memory use to exceed 20% more than 3.10.
   This is offset by memory optimizations for frame objects and object
   dictionaries as mentioned above.
 |

From 63fa529d72c8e42b48b9456211f3c37b7eae3b39 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Tue, 5 Apr 2022 19:39:52 +0800
Subject: [PATCH 16/18] Remove "almost"

---
 Doc/whatsnew/3.11.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index e263941135bb1f..78af37c37212f1 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -561,7 +561,7 @@ In 3.11, when Python detects Python code calling another Python function,
 it sets up a new frame, and "jumps" to the new code inside the new frame. This
 avoids calling the C interpreting function altogether.
 
-Python function calls now consume almost no C stack space. This speeds up
+Most Python function calls now consume no C stack space. This speeds up
 most of such calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
 can recurse significantly deeper, (if the user increases the recursion limit).

From f460c93447155b6e870e33aa28b79ea38f615d40 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Tue, 5 Apr 2022 22:07:31 +0800
Subject: [PATCH 17/18] Add Pablo's employer

---
 Doc/whatsnew/3.11.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 78af37c37212f1..2c86090cf0fe91 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -703,8 +703,9 @@ About
 -----
 
 Faster CPython explores optimizations for :term:`CPython`. The main team is
-funded by Microsoft to work on this full-time. The team also collaborates
-extensively with volunteer contributors in the community.
+funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also
+funded by Bloomberg LP to work on the project part-time. Finally, many
+contributors are volunteers from the community.
 
 
 CPython bytecode changes

From 584dafafb19c25a43c10ec1cabc589864000e609 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Wed, 6 Apr 2022 11:36:42 +0800
Subject: [PATCH 18/18] Address Guido's review

Co-Authored-By: Guido van Rossum
---
 Doc/whatsnew/3.11.rst | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 2c86090cf0fe91..fce7efd5613007 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -555,16 +555,16 @@ Inlined Python function calls
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 During a Python function call, Python will call an evaluating C function to
 interpret that function's code. This effectively limits pure Python recursion to
-what's safe for the C stack
+what's safe for the C stack.
 
-In 3.11, when Python detects Python code calling another Python function,
+In 3.11, when CPython detects Python code calling another Python function,
 it sets up a new frame, and "jumps" to the new code inside the new frame. This
 avoids calling the C interpreting function altogether.
 
 Most Python function calls now consume no C stack space. This speeds up
 most of such calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
-can recurse significantly deeper, (if the user increases the recursion limit).
+can recurse significantly deeper (if the user increases the recursion limit).
 We measured a 1-3% improvement in pyperformance.
 
 (Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
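
Review note: the paragraph above says pure-Python recursion can go deeper once the user raises the recursion limit. A minimal sketch of that usage follows. The helper names ``depth`` and ``deep_call``, the headroom value, and the chosen depth are hypothetical; how deep a given build can safely recurse still depends on the interpreter version and platform.

```python
import sys


def depth(n):
    """Recurse n levels through pure-Python calls and return n."""
    return 0 if n == 0 else 1 + depth(n - 1)


def deep_call(levels, headroom=500):
    """Temporarily raise the recursion limit, recurse, then restore the limit."""
    old_limit = sys.getrecursionlimit()
    # Leave some headroom for frames that are already on the stack.
    sys.setrecursionlimit(max(old_limit, levels + headroom))
    try:
        return depth(levels)
    finally:
        sys.setrecursionlimit(old_limit)


result = deep_call(3000)
print(result)
```

The recursion limit is restored in a ``finally`` block so that a failure inside the deep call does not leave the process with a permanently raised limit.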
@@ -607,30 +607,27 @@ See :pep:`659` for more information.)
 |               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
 |               |                    |                                                       |                   | Dennis Sweeney    |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
+| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-25%            | Irit Katriel,     |
 |               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
 |               |                    | data structures.                                      |                   |                   |
 |               |                    |                                                       |                   |                   |
 |               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
 |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
+| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-25%            | Dennis Sweeney    |
 | subscript     |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |
-|               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
-|               | ``C(arg)``         | version. This avoids going through the internal       | 170%              |                   |
+| Calls         | ``f(arg)``         | Calls to common builtin (C) functions and types such  | 20%               | Mark Shannon,     |
+|               | ``C(arg)``         | as ``len`` and ``str`` directly call their underlying |                   | Ken Jin           |
+|               |                    | C version. This avoids going through the internal     |                   |                   |
 |               |                    | calling convention.                                   |                   |                   |
 |               |                    |                                                       |                   |                   |
-|               |                    | Calls to certain Python functions are inlined similar |                   |                   |
-|               |                    | to :ref:`inline-calls`.                               |                   |                   |
-|               |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``print``          | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
+| Load          | ``print``          | The object's index in the globals/builtins namespace  | [1]_              | Mark Shannon      |
 | global        | ``len``            | is cached. Loading globals and builtins require       |                   |                   |
 | variable      |                    | zero namespace lookups.                               |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | - [2]_            | Mark Shannon      |
+| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | [2]_              | Mark Shannon      |
 | attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
 |               |                    | In most cases, attribute loading will require zero    |                   |                   |
 |               |                    | namespace lookups.                                    |                   |                   |