From d18f804c16fac06a4404be04b2191c580a904c60 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Tue, 8 Mar 2022 22:51:53 +0800
Subject: [PATCH 01/18] Faster-cpython whatsnew initial draft

---
 Doc/whatsnew/3.11.rst | 151 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 151 insertions(+)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 4a64e044c4a167..8e1b1b61977f7b 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -62,6 +62,7 @@ Summary -- Release highlights
 .. This section singles out the most important changes in Python 3.11.
    Brevity is key.
 
+- Python 3.11 is 20% faster than Python 3.10. See `Faster CPython`_ for details.
 
 .. PEP-sized items next.
 
@@ -410,6 +411,156 @@ Optimizations
   (Contributed by Inada Naoki in :issue:`46845`.)
 
 
+Faster CPython
+==============
+
+Python 3.11 is on average `1.20x faster <https://speed.python.org/comparison/?exe=12%2BL%2B3.10%2C12%2BL%2Bmaster&ben=741%2C801%2C742%2C743%2C744%2C745%2C746%2C747%2C748%2C749%2C799%2C800%2C750%2C751%2C752%2C753%2C754%2C755%2C756%2C757%2C758%2C759%2C760%2C761%2C762%2C763%2C764%2C765%2C766%2C767%2C768%2C769%2C770%2C771%2C772%2C773%2C774%2C775%2C776%2C777%2C778%2C779%2C780%2C781%2C782%2C783%2C784%2C785%2C786%2C788%2C787%2C789%2C790%2C791%2C792%2C793%2C794%2C797%2C796%2C795%2C798&env=4&hor=true&bas=12%2BL%2B3.10&chart=normal+bars>`_
+than Python 3.10 when measured with the
+`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite, on
+Ubuntu Linux.
+
+The project focuses on two major areas in Python: faster startup and faster
+runtime. Other optimizations not under this project are listed in `Optimizations`_.
+
+Faster Startup
+--------------
+
+Static objects
+~~~~~~~~~~~~~~
+
+Freezing imports
+~~~~~~~~~~~~~~~~
+
+
+Faster Runtime
+--------------
+
+Cheaper, lazy Python frames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Python frames are created whenever Python calls a Python function. This frame
+holds execution information. The following are new frame optimizations:
+
+- Streamlined the frame creation process.
+- Avoided memory allocation by generously re-using frame space on the C stack.
+- Streamlined the internal frame object to only contain essential information.
+  Frames previously held extra debugging and memory management information.
+  Old-style frames are now created only when required by debuggers. For most
+  user code, no frames are created at all.
+
+Nearly all Python function calls have sped up significantly.
+This resulted in a 3-7% improvement in pyperformance.
+(Contributed by Mark Shannon in :issue:`44590`.)
+
+.. _inline-calls:
+
+Inlined Python function calls
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+During a Python function call, Python will call a C function to interpret that
+function's code, similar to how a user would call ``eval`` to run arbitrary
+code.
+
+In 3.11, when Python detects code calling another Python function,
+it sets up a new frame, and "jumps" to the new code inside the new frame. This
+avoids calling the C interpreting function altogether.
+
+Python function calls now consume almost no C stack space. This speeds up Python
+to Python function calls. In simple recursive functions like fibonacci or
+factorial, a 1.7x speedup was observed. This also means recursive functions
+can recurse significantly deeper, assuming the recursion limit and memory limit
+are not exceeded. This resulted in a 1-3% improvement in pyperformance.
+(Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
+
+PEP 659: Specializing Adaptive Interpreter
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+:pep:`659` is one of the key parts of the faster CPython project. The general
+idea is that while Python is a dynamic language, most code have regions where
+objects and types rarely change. This concept is known as *type stability*.
+
+At runtime, Python will try to look for common patterns and type stability
+in the executing code. Python will then replace the current operation with a
+more specialized one. This specialized operation use fast paths available only
+to those use cases/types and will generally outperform their generic
+counterparts. This also brings in another concept called *inline caching*, where
+Python caches the results of expensive operations directly in the bytecode.
+
+Since this exta information requires more memory, Python will only specialize
+when it sees code that is "hot" (executed multiple times). It can also
+de-specialize when code is too dynamic. This is attempted every so often, and
+specialization attempts are not too expensive.
+
+(PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
+See :pep:`659` for more information.)
+
+..
+   If I missed out anyone, please add them.
+
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
+|               |                    |                                                       | (up to)           |                   |
++===============+====================+=======================================================+===================+===================+
+| Binary        | ``o+o; o*o; o-o;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
+| operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
+|               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
+|               |                    |                                                       |                   | Dennis Sweeney    |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Subscript     | ``o[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
+|               |                    | ``tuple`` and ``dict``directly index the underlying   |                   | Mark Shannon      |
+|               |                    | data structures. Subscripting custom ``__getitem__``  |                   |                   |
+|               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | ?                 |                   |
+| subscript     |                    |                                                       |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |
+|               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
+|               | ``T(arg)``         | version. This avoids going through the internal       | 170%              |                   |
+|               |                    | calling convention.                                   |                   |                   |
+|               |                    |                                                       | ?                 |                   |
+|               |                    | Calls to certain Python functions are inlined similar |                   |                   |
+|               |                    | to :ref:`inline-calls`.                               |                   |                   |
+|               |                    |                                                       |                   |                   |
+|               |                    | ``__init__`` is also inlined when creating            |                   |                   |
+|               |                    | Python classes.                                       |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load          | ``print; len``     | The object's index in the globals/builtins namespace  | 0% [1]_           | Mark Shannon      |
+| global        |                    | is cached. Loading globals and builtins requires      |                   |                   |
+| variable      |                    | no namespace lookups.                                 |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | ?                 | Mark Shannon      |
+| attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
+|               |                    | In most cases, attribute loading will require zero    |                   |                   |
+|               |                    | namespace lookups.                                    |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [2]_       | Ken Jin           |
+| methods for   |                    | loading now has no namespace lookups -- even for      |                   |                   |
+| call          |                    | classes with long inheritance chains.                 |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Store         | ``o.attr = z``     | Similar to load attribute optimization.               | ?                 | Mark Shannon      |
+| attribute     |                    |                                                       |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+| Unpack        | ``*seq``           | Specialized for common containers such as ``list``    | ?                 | ?                 |
+| Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
++---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
+
+.. [1] Note that a similar optimization already existed since Python 3.8.
+
+.. [2] Classes with longer inheritance chains will see greater speedups.
+       This optimization effectively makes method lookup constant time
+       regardless of inheritance.
+
+About
+-----
+
+Faster CPython explores optimizations for :term:`CPython`. The main team is
+funded by Microsoft to work on this full-time. The team also collaborates
+extensively with volunteer contributors in the community. The following list of
+people involved is non-exhaustive:
+
+* Faster CPython team: Guido van Rossum, Mark Shannon, Eric Snow, Brandt Bucher
+* External collaborators: Irit Katriel, Pablo Galindo
+* Additional contributors: Dennis Sweeney, Dong-hee Na, Ken Jin, Jelle Zijlstra, Kumar Aditya
+* There are too many people contributing ideas to list!
+
 CPython bytecode changes
 ========================
 

From 77701feba07ed673948bdfe8c64d8c5a44444540 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Fri, 25 Mar 2022 17:34:26 +0800
Subject: [PATCH 02/18] Add section on static code objects/freezing imports

Co-Authored-By: Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
---
 Doc/tutorial/modules.rst |  2 ++
 Doc/whatsnew/3.11.rst    | 26 ++++++++++++++++++++++----
 2 files changed, 24 insertions(+), 4 deletions(-)

diff --git a/Doc/tutorial/modules.rst b/Doc/tutorial/modules.rst
index f1d4957e37eb11..342a66be728aea 100644
--- a/Doc/tutorial/modules.rst
+++ b/Doc/tutorial/modules.rst
@@ -209,6 +209,8 @@ directory. This is an error unless the replacement is intended.  See section
 .. %
     Do we need stuff on zip files etc. ? DUBOIS
 
+.. _tut-pycache:
+
 "Compiled" Python files
 -----------------------
 
diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 8e1b1b61977f7b..72c16a054371fd 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -425,11 +425,29 @@ runtime. Other optimizations not under this project are listed in `Optimizations
 Faster Startup
 --------------
 
-Static objects
-~~~~~~~~~~~~~~
+Frozen imports / Static code objects
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Freezing imports
-~~~~~~~~~~~~~~~~
+Python caches bytecode in the :ref:`__pycache__<tut-pycache>` directory to
+speed up module loading.
+
+Previously in 3.10, Python module execution looked like this:
+
+.. code-block:: text
+
+   Read __pycache__ -> Unmarshal -> Heap allocated code object -> Evaluate
+
+In Python 3.11, the core modules essential for Python startup are "frozen".
+This means that their code objects (and bytecode) are statically allocated
+by the interpreter. This reduces the steps in module execution process to this:
+
+.. code-block:: text
+
+   Statically allocated code object -> Evaluate
+
+Interpreter startup is now 10-15% faster in Python 3.11.
+
+(Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
 
 
 Faster Runtime

From 2d4171f0a09d77939f730829fc77c594a48a423a Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Fri, 25 Mar 2022 17:41:35 +0800
Subject: [PATCH 03/18] clarify where the benefits are

---
 Doc/whatsnew/3.11.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 72c16a054371fd..fe37d3bde350a1 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -445,7 +445,9 @@ by the interpreter. This reduces the steps in module execution process to this:
 
    Statically allocated code object -> Evaluate
 
-Interpreter startup is now 10-15% faster in Python 3.11.
+Interpreter startup is now 10-15% faster in Python 3.11. This positively
+impacts short-running programs using Python. Such as ``python -m venv ...``, or
+``python -m pip ...```.
 
 (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
 

From 2e7647f06234d0e7f2dbd1e64e383c3434451c3b Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 26 Mar 2022 15:28:51 +0800
Subject: [PATCH 04/18] Add an FAQ section

---
 Doc/whatsnew/3.11.rst | 38 ++++++++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index fe37d3bde350a1..6dbff803f4d364 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -414,8 +414,8 @@ Optimizations
 Faster CPython
 ==============
 
-Python 3.11 is on average `1.20x faster <https://speed.python.org/comparison/?exe=12%2BL%2B3.10%2C12%2BL%2Bmaster&ben=741%2C801%2C742%2C743%2C744%2C745%2C746%2C747%2C748%2C749%2C799%2C800%2C750%2C751%2C752%2C753%2C754%2C755%2C756%2C757%2C758%2C759%2C760%2C761%2C762%2C763%2C764%2C765%2C766%2C767%2C768%2C769%2C770%2C771%2C772%2C773%2C774%2C775%2C776%2C777%2C778%2C779%2C780%2C781%2C782%2C783%2C784%2C785%2C786%2C788%2C787%2C789%2C790%2C791%2C792%2C793%2C794%2C797%2C796%2C795%2C798&env=4&hor=true&bas=12%2BL%2B3.10&chart=normal+bars>`_
-than Python 3.10 when measured with the
+CPython 3.11 is on average `1.20x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
+than CPython 3.10 when measured with the
 `pyperformance <https://github.com/python/pyperformance>`_ benchmark suite, on
 Ubuntu Linux.
 
@@ -551,8 +551,8 @@ See :pep:`659` for more information.)
 |               |                    | In most cases, attribute loading will require zero    |                   |                   |
 |               |                    | namespace lookups.                                    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [2]_       | Ken Jin           |
-| methods for   |                    | loading now has no namespace lookups -- even for      |                   |                   |
+| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [2]_       | Ken Jin,          |
+| methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
 | call          |                    | classes with long inheritance chains.                 |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Store         | ``o.attr = z``     | Similar to load attribute optimization.               | ?                 | Mark Shannon      |
@@ -568,6 +568,36 @@ See :pep:`659` for more information.)
        This optimization effectively makes method lookup constant time
        regardless of inheritance.
 
+FAQ
+---
+
+| Q: How should I write my code to utilize these speedups?
+|
+| A: You don't have to change your code. Write Pythonic code that follows common
+  best practices. The Faster CPython project optimizes for common code
+  patterns we observe.
+|
+|
+| Q: Will CPython 3.11 use more memory?
+|
+| A: Yes. However, how much exactly depends on how much code is "hot". We don't
+  expect memory use to exceed 20% higher than 3.10. This may be offset by
+  memory optimizations for frame objects and object dictionaries as mentioned
+  above.
+|
+|
+| Q: I don't see any speedups in my workload. Why?
+|
+| A: Certain code won't have noticeable benefits. If your code spends most of
+  its time on I/O (input/output) operations, or already does most of its
+  computation in a C extension library like numpy, there won't be significant
+  speedup. This project currently benefits pure-Python workloads the most.
+|
+|
+| Q: Is there a JIT compiler?
+|
+| A: No. We're still exploring other optimizations.
+
 About
 -----
 

From 07646a6e50efd3a1c44ba8775a783065cf192125 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Fri, 1 Apr 2022 23:51:06 +0800
Subject: [PATCH 05/18] Update FAQ, add performance figures and footnotes

---
 Doc/whatsnew/3.11.rst | 29 ++++++++++++++++++-----------
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 82d391a0b5cd68..489e42f1038ab2 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -62,7 +62,7 @@ Summary -- Release highlights
 .. This section singles out the most important changes in Python 3.11.
    Brevity is key.
 
-- Python 3.11 is 20% faster than Python 3.10. See `Faster CPython`_ for details.
+- Python 3.11 is 22% faster than Python 3.10. See `Faster CPython`_ for details.
 
 .. PEP-sized items next.
 
@@ -484,7 +484,7 @@ Optimizations
 Faster CPython
 ==============
 
-CPython 3.11 is on average `1.20x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
+CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
 than CPython 3.10 when measured with the
 `pyperformance <https://github.com/python/pyperformance>`_ benchmark suite, on
 Ubuntu Linux.
@@ -573,6 +573,9 @@ to those use cases/types and will generally outperform their generic
 counterparts. This also brings in another concept called *inline caching*, where
 Python caches the results of expensive operations directly in the bytecode.
 
+The specializer will also combine certain common instruction pairs into one
+super instruction. This reduces the overhead during execution.
+
 Since this exta information requires more memory, Python will only specialize
 when it sees code that is "hot" (executed multiple times). It can also
 de-specialize when code is too dynamic. This is attempted every so often, and
@@ -605,36 +608,36 @@ See :pep:`659` for more information.)
 |               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
 |               | ``T(arg)``         | version. This avoids going through the internal       | 170%              |                   |
 |               |                    | calling convention.                                   |                   |                   |
-|               |                    |                                                       | ?                 |                   |
+|               |                    |                                                       |                   |                   |
 |               |                    | Calls to certain Python functions are inlined similar |                   |                   |
 |               |                    | to :ref:`inline-calls`.                               |                   |                   |
 |               |                    |                                                       |                   |                   |
-|               |                    | ``__init__`` is also inlined when creating            |                   |                   |
-|               |                    | Python classes.                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Load          | ``print; len``     | The object's index in the globals/builtins namespace  | 0% [1]_           | Mark Shannon      |
 | global        |                    | is cached. Loading globals and builtins requires      |                   |                   |
 | variable      |                    | no namespace lookups.                                 |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | ?                 | Mark Shannon      |
+| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | 0% [2]_           | Mark Shannon      |
 | attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
 |               |                    | In most cases, attribute loading will require zero    |                   |                   |
 |               |                    | namespace lookups.                                    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [2]_       | Ken Jin,          |
+| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [3]_       | Ken Jin,          |
 | methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
 | call          |                    | classes with long inheritance chains.                 |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Store         | ``o.attr = z``     | Similar to load attribute optimization.               | ?                 | Mark Shannon      |
-| attribute     |                    |                                                       |                   |                   |
+| Store         | ``o.attr = z``     | Similar to load attribute optimization.               | 2%                | Mark Shannon      |
+| attribute     |                    |                                                       | in pyperformance  |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Unpack        | ``*seq``           | Specialized for common containers such as ``list``    | ?                 | ?                 |
+| Unpack        | ``*seq``           | Specialized for common containers such as ``list``    | 8%                | Brandt Bucher     |
 | Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 
 .. [1] Note that a similar optimization already existed since Python 3.8.
 
-.. [2] Classes with longer inheritance chains will see greater speedups.
+.. [2] Note that a similar optimization has already existed since Python 3.10.
+
+.. [3] Classes with longer inheritance chains will see greater speedups.
        This optimization effectively makes method lookup constant time
        regardless of inheritance.
 
@@ -663,6 +666,10 @@ FAQ
   computation in a C extension library like numpy, there won't be significant
   speedup. This project currently benefits pure-Python workloads the most.
 |
+| Furthermore, the pyperformance figures are a geometric mean. Even within the
+  pyperformance benchmarks, certain benchmarks have slowed down slightly, while
+  others have sped up by nearly 1.9x!
+|
 |
 | Q: Is there a JIT compiler?
 |

From 8063a44f9ce90e6d3a97bafac1d0e50c317b05ca Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Fri, 1 Apr 2022 23:56:21 +0800
Subject: [PATCH 06/18] Add NEWS

---
 .../next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst | 2 ++
 1 file changed, 2 insertions(+)
 create mode 100644 Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst

diff --git a/Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst b/Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst
new file mode 100644
index 00000000000000..8c400841ca1d14
--- /dev/null
+++ b/Misc/NEWS.d/next/Documentation/2022-04-01-23-56-13.bpo-47189.Nss0Y3.rst
@@ -0,0 +1,2 @@
+Add a What's New in Python 3.11 entry for the Faster CPython project.
+Documentation by Ken Jin and Kumar Aditya.

From 33b04860e80f1035dd7734b7e9a189075971472b Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 2 Apr 2022 00:01:28 +0800
Subject: [PATCH 07/18] Fix doc errors

---
 Doc/whatsnew/3.11.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 16068bd7ceff53..ce13af1331d4a5 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -530,7 +530,7 @@ by the interpreter. This reduces the steps in module execution process to this:
 
 Interpreter startup is now 10-15% faster in Python 3.11. This positively
 impacts short-running programs using Python. Such as ``python -m venv ...``, or
-``python -m pip ...```.
+``python -m pip ...``.
 
 (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
 
@@ -610,7 +610,7 @@ See :pep:`659` for more information.)
 |               |                    |                                                       |                   | Dennis Sweeney    |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Subscript     | ``o[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
-|               |                    | ``tuple`` and ``dict``directly index the underlying   |                   | Mark Shannon      |
+|               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
 |               |                    | data structures. Subscripting custom ``__getitem__``  |                   |                   |
 |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+

From 3cb89aaece851f9b4655f0cab76334512df83411 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 2 Apr 2022 00:26:56 +0800
Subject: [PATCH 08/18] Add entry for namespace dictionary, make wording
 consistent

---
 Doc/whatsnew/3.11.rst | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index ce13af1331d4a5..f58b838b291891 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -627,8 +627,8 @@ See :pep:`659` for more information.)
 |               |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Load          | ``print; len``     | The object's index in the globals/builtins namespace  | 0% [1]_           | Mark Shannon      |
-| global        |                    | is cached. Loading globals and builtins requires      |                   |                   |
-| variable      |                    | no namespace lookups.                                 |                   |                   |
+| global        |                    | is cached. Loading globals and builtins require       |                   |                   |
+| variable      |                    | zero namespace lookups.                               |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | 0% [2]_           | Mark Shannon      |
 | attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
@@ -654,6 +654,15 @@ See :pep:`659` for more information.)
        This optimization effectively makes method lookup constant time
        regardless of inheritance.
 
+
+Misc
+----
+
+* Objects now require less memory due to lazily created object namespaces. Their
+  namespace dictionaries now also share keys more freely.
+  (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)
+
+
 FAQ
 ---
 

From 121b24d32ecadae877f302d079390bebd651bf2f Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 2 Apr 2022 12:11:36 +0800
Subject: [PATCH 09/18] Apply suggestions by Mark and Jelle

Co-Authored-By: Jelle Zijlstra <jelle.zijlstra@gmail.com>
---
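Reviewer note on the PEP 659 paragraphs touched below: the "type stability"
idea can be illustrated with a small script. This is only a sketch under
stated assumptions. The names ``add_all`` and ``adaptive_listing`` are made up
for illustration and are not part of the patch. The ``adaptive=True`` argument
of ``dis.dis`` is assumed to be available only on 3.11 and later, so the helper
returns ``None`` on older versions. Which specialized instruction names show up
(if any) depends on the interpreter version, so none are asserted here.

```python
import dis
import io
import sys


def add_all(values):
    """Add up a sequence; the loop body is one binary add per item."""
    total = 0
    for value in values:
        total = total + value
    return total


def adaptive_listing(func):
    """Return the adaptive disassembly of func, or None before 3.11.

    Assumption: dis.dis() accepts adaptive=True only on 3.11+.
    """
    if sys.version_info < (3, 11):
        return None
    buffer = io.StringIO()
    dis.dis(func, file=buffer, adaptive=True)
    return buffer.getvalue()


if __name__ == "__main__":
    # Type-stable input: every operand of the add is an int, so the
    # operation keeps seeing the same types on each iteration.
    stable = list(range(1000))
    for _ in range(50):  # run the code several times so it counts as "hot"
        add_all(stable)
    print(adaptive_listing(add_all))
```

Running the script on 3.11 and comparing the listing with a plain
``dis.dis(add_all)`` is one way to see whether the add was specialized.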
 Doc/whatsnew/3.11.rst | 48 ++++++++++++++++++++++---------------------
 1 file changed, 25 insertions(+), 23 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index f58b838b291891..3fadf97318b2b1 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -62,7 +62,8 @@ Summary -- Release highlights
 .. This section singles out the most important changes in Python 3.11.
    Brevity is key.
 
-- Python 3.11 is 22% faster than Python 3.10. See `Faster CPython`_ for details.
+- Python 3.11 is 10-60% faster than Python 3.10. On average, we measured a 1.22x
+  speedup on the standard benchmark suite. See `Faster CPython`_ for details.
 
 .. PEP-sized items next.
 
@@ -500,7 +501,7 @@ Faster CPython
 CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
 than CPython 3.10 when measured with the
 `pyperformance <https://github.com/python/pyperformance>`_ benchmark suite, on
-Ubuntu Linux.
+Ubuntu Linux. Depending on your workload, the speedup could be 10-60% faster.
 
 The project focuses on two major areas in Python: faster startup and faster
 runtime. Other optimizations not under this project are listed in `Optimizations`_.
@@ -576,23 +577,25 @@ is not exceeded. This resulted in a 1-3% improvement in pyperformance.
 PEP 659: Specializing Adaptive Interpreter
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 :pep:`659` is one of the key parts of the faster CPython project. The general
-idea is that while Python is a dynamic language, most code have regions where
+idea is that while Python is a dynamic language, most code has regions where
 objects and types rarely change. This concept is known as *type stability*.
 
 At runtime, Python will try to look for common patterns and type stability
 in the executing code. Python will then replace the current operation with a
-more specialized one. This specialized operation use fast paths available only
-to those use cases/types and will generally outperform their generic
+more specialized one. This specialized operation uses fast paths available only
+to those use cases/types, which generally outperform their generic
 counterparts. This also brings in another concept called *inline caching*, where
 Python caches the results of expensive operations directly in the bytecode.
 
 The specializer will also combine certain common instruction pairs into one
-super instruction. This reduces the overhead during execution.
+superinstruction. This reduces the overhead during execution.
 
-Since this exta information requires more memory, Python will only specialize
-when it sees code that is "hot" (executed multiple times). It can also
-de-specialize when code is too dynamic. This is attempted every so often, and
-specialization attempts are not too expensive.
+This extra information requires more memory. Python will only specialize
+when it sees code that is "hot" (executed multiple times). This prevents Python
+from wasting time on run-once code. Python can also de-specialize when code is
+too dynamic or when the use changes. Specialization is attempted periodically,
+and specialization attempts are not too expensive. This allows specialization
+to adapt to new circumstances.
 
 (PEP written by Mark Shannon, with ideas inspired by Stefan Brunthaler.
 See :pep:`659` for more information.)
@@ -626,11 +629,11 @@ See :pep:`659` for more information.)
 |               |                    | to :ref:`inline-calls`.                               |                   |                   |
 |               |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``print; len``     | The object's index in the globals/builtins namespace  | 0% [1]_           | Mark Shannon      |
+| Load          | ``print; len``     | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
 | global        |                    | is cached. Loading globals and builtins require       |                   |                   |
 | variable      |                    | zero namespace lookups.                               |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | 0% [2]_           | Mark Shannon      |
+| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | - [2]_            | Mark Shannon      |
 | attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
 |               |                    | In most cases, attribute loading will require zero    |                   |                   |
 |               |                    | namespace lookups.                                    |                   |                   |
@@ -646,9 +649,12 @@ See :pep:`659` for more information.)
 | Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 
-.. [1] Note that a similar optimization already existed since Python 3.8.
+.. [1] Note that a similar optimization already existed since Python 3.8.  3.11
+       specializes for more forms and reduces some overhead.
 
 .. [2] Note that a similar optimization has already existed since Python 3.10.
+       3.11 speicalizes for more forms. Furthermore, all attribute loads should
+       be sped up by :issue:`45947`.
 
 .. [3] Classes with longer inheritance chains will see greater speedups.
        This optimization effectively makes method lookup constant time
@@ -675,8 +681,8 @@ FAQ
 |
 | Q: Will CPython 3.11 use more memory?
 |
-| A: Yes. However, how much exactly depends on how much code is "hot". We don't
-  expect memory use to exceed 20% more versus 3.10. This may be offset by
+| A: Maybe not. However, how much exactly depends on how much code is "hot". We
+  don't expect memory use to exceed 20% more versus 3.10. This is offset by
   memory optimizations for frame objects and object dictionaries as mentioned
   above.
 |
@@ -684,31 +690,27 @@ FAQ
 | Q: I don't see any speedups in my workload. Why?
 |
 | A: Certain code won't have noticeable benefits. If your code spends most of
-  its time on IO(Input/Output) operations, or already does most of its
+  its time on I/O operations, or already does most of its
   computation in a C extension library like numpy, there won't be significant
   speedup. This project currently benefits pure-Python workloads the most.
 |
 | Furthermore, the pyperformance figures are a geometric mean. Even within the
   pyperformance benchmarks, certain benchmarks have slowed down slightly, while
-  others have sped up by nearly 1.9x!
+  others have sped up by nearly 2x!
 |
 |
 | Q: Is there a JIT compiler?
 |
 | A: No. We're still exploring other optimizations.
 
+
 About
 -----
 
 Faster CPython explores optimizations for :term:`CPython`. The main team is
 funded by Microsoft to work on this full-time. The team also collaborates
-extensively with volunteer contributors in the community. The following list of
-people involved is non-exhaustive:
+extensively with volunteer contributors in the community.
 
-* Faster CPython team: Guido van Rossum, Mark Shannon, Eric Snow, Brandt Bucher
-* External collaborators: Irit Katriel, Pablo Galindo
-* Additional contributors: Dennis Sweeney, Dong-hee Na, Ken Jin, Jelle Zijlstra, Kumar Aditya
-* There are too many people contributing ideas to list!
 
 CPython bytecode changes
 ========================

From 48dd46d714c88bf4e430e58283a3958aebf0b72f Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 2 Apr 2022 12:14:28 +0800
Subject: [PATCH 10/18] Credit Dennis Sweeney for STORE_SUBSCR

---
 Doc/whatsnew/3.11.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 3fadf97318b2b1..9892a681f17e95 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -617,7 +617,7 @@ See :pep:`659` for more information.)
 |               |                    | data structures. Subscripting custom ``__getitem__``  |                   |                   |
 |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | ?                 |                   |
+| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
 | subscript     |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |

From ff34d6ffd19dc8e208c8983661bbb35a31b45d8d Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sat, 2 Apr 2022 19:09:32 +0800
Subject: [PATCH 11/18] remove misleading sentence about memory usage and "hot"
 code

---
 Doc/whatsnew/3.11.rst | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 9892a681f17e95..33c52620e2a97a 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -681,10 +681,9 @@ FAQ
 |
 | Q: Will CPython 3.11 use more memory?
 |
-| A: Maybe not. However, how much exactly depends on how much code is "hot". We
-  don't expect memory use to exceed 20% more versus 3.10. This is offset by
-  memory optimizations for frame objects and object dictionaries as mentioned
-  above.
+| A: Maybe not. We don't expect memory use to exceed 20% more versus 3.10.
+  This is offset by memory optimizations for frame objects and object
+  dictionaries as mentioned above.
 |
 |
 | Q: I don't see any speedups in my workload. Why?

From acef908aba3dcc8983efd1fefac1170cbc76363a Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sun, 3 Apr 2022 00:22:32 +0800
Subject: [PATCH 12/18] Tighten wording, specify GCC

---
 Doc/whatsnew/3.11.rst | 27 +++++++++++++++------------
 1 file changed, 15 insertions(+), 12 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 33c52620e2a97a..a98c1a37ee2f39 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -500,10 +500,11 @@ Faster CPython
 
 CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/ideas/blob/main/main-vs-310.rst>`_
 than CPython 3.10 when measured with the
-`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite, on
-Ubuntu Linux. Depending on your workload, the speedup could be 10-60% faster.
+`pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
+and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
+could be 10-60% faster.
 
-The project focuses on two major areas in Python: faster startup and faster
+This project focuses on two major areas in Python: faster startup and faster
 runtime. Other optimizations not under this project are listed in `Optimizations`_.
 
 Faster Startup
@@ -546,13 +547,14 @@ holds execution information. The following are new frame optimizations:
 
 - Streamlined the frame creation process.
 - Avoided memory allocation by generously re-using frame space on the C stack.
-- Streamlined the internal frame object to only contain essential information.
+- Streamlined the internal frame object to contain only essential information.
   Frames previously held extra debugging and memory management information.
-  Old-style frames are now created only when required by debuggers. For most
-  user code, no frames are created at all.
 
-Nearly all Python functions calls have sped up significantly.
-This resulted in a 3-7% improvement in pyperformance.
+Old-style frames are now created only when required by debuggers. For most
+user code, no frames are created at all. As a result, nearly all Python
+functions calls have sped up significantly. We measured a 3-7% speedup
+in pyperformance.
+
 (Contributed by Mark Shannon in :issue:`44590`.)
 
 .. _inline-calls:
@@ -571,7 +573,8 @@ Python function calls now consume almost no C stack space. This speeds up Python
 to Python function calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
 can recurse significantly deeper, assuming the recursion limit and memory limit
-is not exceeded. This resulted in a 1-3% improvement in pyperformance.
+is not exceeded. We measured a 1-3% improvement in pyperformance.
+
 (Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
 
 PEP 659: Specializing Adaptive Interpreter
@@ -649,11 +652,11 @@ See :pep:`659` for more information.)
 | Sequence      |                    | and ``tuple``. Avoids internal calling convention.    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 
-.. [1] Note that a similar optimization already existed since Python 3.8.  3.11
+.. [1] A similar optimization already existed since Python 3.8.  3.11
        specializes for more forms and reduces some overhead.
 
-.. [2] Note that a similar optimization has already existed since Python 3.10.
-       3.11 speicalizes for more forms. Furthermore, all attribute loads should
+.. [2] A similar optimization already existed since Python 3.10.
+       3.11 specializes for more forms. Furthermore, all attribute loads should
        be sped up by :issue:`45947`.
 
 .. [3] Classes with longer inheritance chains will see greater speedups.

From bc84c89138e555e89f8742392c1812e155fa9dc4 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sun, 3 Apr 2022 14:44:03 +0800
Subject: [PATCH 13/18] Apply suggestions by Alex

Co-Authored-By: Alex Waygood <Alex.Waygood@Gmail.com>
---
 Doc/whatsnew/3.11.rst | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index a98c1a37ee2f39..31107f97e657f1 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -530,9 +530,9 @@ by the interpreter. This reduces the steps in module execution process to this:
 
    Statically allocated code object -> Evaluate
 
-Interpreter startup is now 10-15% faster in Python 3.11. This positively
-impacts short-running programs using Python. Such as ``python -m venv ...``, or
-``python -m pip ...``.
+Interpreter startup is now 10-15% faster in Python 3.11. This has a big
+impact for short-running programs using Python such as ``python -m venv ...``,
+or ``python -m pip ...``.
 
 (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
 

From 1f1e565aea1d09efd659c5d4c8714866c4a0812f Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Sun, 3 Apr 2022 14:46:27 +0800
Subject: [PATCH 14/18] Remove outdated whatsnew section

---
 Doc/whatsnew/3.11.rst | 7 -------
 1 file changed, 7 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 31107f97e657f1..9844b317805403 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -474,13 +474,6 @@ Optimizations
   almost eliminated when no exception is raised.
   (Contributed by Mark Shannon in :issue:`40222`.)
 
-* Method calls with keywords are now faster due to bytecode
-  changes which avoid creating bound method instances. Previously, this
-  optimization was applied only to method calls with purely positional
-  arguments.
-  (Contributed by Ken Jin and Mark Shannon in :issue:`26110`, based on ideas
-  implemented in PyPy.)
-
 * Pure ASCII strings are now normalized in constant time by :func:`unicodedata.normalize`.
   (Contributed by Dong-hee Na in :issue:`44987`.)
 

From 8b182ecf4fed520e5276e979e808294de6cf0be3 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Tue, 5 Apr 2022 18:54:05 +0800
Subject: [PATCH 15/18] Address Guido's and Irit's reviews

Co-Authored-By: Guido van Rossum <gvanrossum@users.noreply.github.com>
Co-Authored-By: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>
---
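Reviewer note on the "Inlined Python function calls" wording below: the claim
that recursion can go deeper "if the user increases the recursion limit" can be
tried with a short script. This is a sketch, not part of the patch. The helper
names ``countdown`` and ``call_with_limit`` are hypothetical, and the depth and
limit are kept small on purpose so the script is also safe on older versions
where each Python call still consumes C stack space.

```python
import sys


def countdown(n):
    """Recurse n levels deep in pure Python and return the depth reached."""
    return 0 if n == 0 else 1 + countdown(n - 1)


def call_with_limit(func, arg, limit):
    """Call func(arg) under a temporary recursion limit, then restore it."""
    previous = sys.getrecursionlimit()
    sys.setrecursionlimit(limit)
    try:
        return func(arg)
    finally:
        sys.setrecursionlimit(previous)


if __name__ == "__main__":
    # 2000 levels exceeds the usual default limit of 1000, so the limit
    # has to be raised explicitly for this call to succeed.
    print(call_with_limit(countdown, 2000, 4000))
```

The limit is restored in a ``finally`` block so a failed call does not leave
the process with a raised limit.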
 Doc/whatsnew/3.11.rst | 62 +++++++++++++++++++++----------------------
 1 file changed, 31 insertions(+), 31 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 9844b317805403..e263941135bb1f 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -62,8 +62,8 @@ Summary -- Release highlights
 .. This section singles out the most important changes in Python 3.11.
    Brevity is key.
 
-- Python 3.11 is 10-60% faster than Python 3.10. On average, we measured a 1.22x
-  speedup on the standard benchmark suite. See `Faster CPython`_ for details.
+- Python 3.11 is up to 10-60% faster than Python 3.10. On average, we measured a
+  1.22x speedup on the standard benchmark suite. See `Faster CPython`_ for details.
 
 .. PEP-sized items next.
 
@@ -495,7 +495,7 @@ CPython 3.11 is on average `1.22x faster <https://github.com/faster-cpython/idea
 than CPython 3.10 when measured with the
 `pyperformance <https://github.com/python/pyperformance>`_ benchmark suite,
 and compiled with GCC on Ubuntu Linux. Depending on your workload, the speedup
-could be 10-60% faster.
+could be up to 10-60%.
 
 This project focuses on two major areas in Python: faster startup and faster
 runtime. Other optimizations not under this project are listed in `Optimizations`_.
@@ -524,8 +524,7 @@ by the interpreter. This reduces the steps in module execution process to this:
    Statically allocated code object -> Evaluate
 
 Interpreter startup is now 10-15% faster in Python 3.11. This has a big
-impact for short-running programs using Python such as ``python -m venv ...``,
-or ``python -m pip ...``.
+impact for short-running programs using Python.
 
 (Contributed by Eric Snow, Guido van Rossum and Kumar Aditya in numerous issues.)
 
@@ -540,12 +539,12 @@ holds execution information. The following are new frame optimizations:
 
 - Streamlined the frame creation process.
 - Avoided memory allocation by generously re-using frame space on the C stack.
-- Streamlined the internal frame object to contain only essential information.
+- Streamlined the internal frame struct to contain only essential information.
   Frames previously held extra debugging and memory management information.
 
-Old-style frames are now created only when required by debuggers. For most
-user code, no frames are created at all. As a result, nearly all Python
-functions calls have sped up significantly. We measured a 3-7% speedup
+Old-style frame objects are now created only when required by debuggers. For
+most user code, no frame objects are created at all. As a result, nearly all
+Python function calls have sped up significantly. We measured a 3-7% speedup
 in pyperformance.
 
 (Contributed by Mark Shannon in :issue:`44590`.)
@@ -554,19 +553,19 @@ in pyperformance.
 
 Inlined Python function calls
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-During a Python function call, Python will call a C function to interpret that
-function's code, similar to how a user would call ``eval`` to run arbitrary
-code.
+During a Python function call, Python will call an evaluating C function to
+interpret that function's code. This effectively limits pure Python recursion to
+what's safe for the C stack
 
-In 3.11, when Python detects code calling another Python function,
+In 3.11, when Python detects Python code calling another Python function,
 it sets up a new frame, and "jumps" to the new code inside the new frame. This
 avoids calling the C interpreting function altogether.
 
-Python function calls now consume almost no C stack space. This speeds up Python
-to Python function calls. In simple recursive functions like fibonacci or
+Python function calls now consume almost no C stack space. This speeds up
+most of such calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
-can recurse significantly deeper, assuming the recursion limit and memory limit
-is not exceeded. We measured a 1-3% improvement in pyperformance.
+can recurse significantly deeper, (if the user increases the recursion limit).
+We measured a 1-3% improvement in pyperformance.
 
 (Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
 
@@ -586,7 +585,7 @@ Python caches the results of expensive operations directly in the bytecode.
 The specializer will also combine certain common instruction pairs into one
 superinstruction. This reduces the overhead during execution.
 
-This extra information requires more memory. Python will only specialize
+Python will only specialize
 when it sees code that is "hot" (executed multiple times). This prevents Python
 from wasting time on run-once code. Python can also de-specialize when code is
 too dynamic or when the use changes. Specialization is attempted periodically,
@@ -603,30 +602,32 @@ See :pep:`659` for more information.)
 | Operation     | Form               | Specialization                                        | Operation speedup | Contributor(s)    |
 |               |                    |                                                       | (up to)           |                   |
 +===============+====================+=======================================================+===================+===================+
-| Binary        | ``o+o; o*o; o-o;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
+| Binary        | ``x+x; x*x; x-x;`` | Binary add, multiply and subtract for common types    | 10%               | Mark Shannon,     |
 | operations    |                    | such as ``int``, ``float``, and ``str`` take custom   |                   | Dong-hee Na,      |
 |               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
 |               |                    |                                                       |                   | Dennis Sweeney    |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Subscript     | ``o[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
+| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
 |               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
-|               |                    | data structures. Subscripting custom ``__getitem__``  |                   |                   |
+|               |                    | data structures.                                      |                   |                   |
+|               |                    |                                                       |                   |                   |
+|               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
 |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Store         | ``o[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
+| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
 | subscript     |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |
 |               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
-|               | ``T(arg)``         | version. This avoids going through the internal       | 170%              |                   |
+|               | ``C(arg)``         | version. This avoids going through the internal       | 170%              |                   |
 |               |                    | calling convention.                                   |                   |                   |
 |               |                    |                                                       |                   |                   |
 |               |                    | Calls to certain Python functions are inlined similar |                   |                   |
 |               |                    | to :ref:`inline-calls`.                               |                   |                   |
 |               |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``print; len``     | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
-| global        |                    | is cached. Loading globals and builtins require       |                   |                   |
+| Load          | ``print``          | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
+| global        | ``len``            | is cached. Loading globals and builtins requires      |                   |                   |
 | variable      |                    | zero namespace lookups.                               |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
 | Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | - [2]_            | Mark Shannon      |
@@ -634,7 +635,7 @@ See :pep:`659` for more information.)
 |               |                    | In most cases, attribute loading will require zero    |                   |                   |
 |               |                    | namespace lookups.                                    |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20% [3]_       | Ken Jin,          |
+| Load          | ``o.meth()``       | The actual address of the method is cached. Method    | 10-20%            | Ken Jin,          |
 | methods for   |                    | loading now has no namespace lookups -- even for      |                   | Mark Shannon      |
 | call          |                    | classes with long inheritance chains.                 |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
@@ -652,10 +653,6 @@ See :pep:`659` for more information.)
        3.11 specializes for more forms. Furthermore, all attribute loads should
        be sped up by :issue:`45947`.
 
-.. [3] Classes with longer inheritance chains will see greater speedups.
-       This optimization effectively makes method lookup constant time
-       regardless of inheritance.
-
 
 Misc
 ----
@@ -664,6 +661,9 @@ Misc
   namespace dictionaries now also share keys more freely.
   (Contributed by Mark Shannon in :issue:`45340` and :issue:`40116`.)
 
+* A more concise representation of exceptions in the interpreter reduced the
+  time required for catching an exception by about 10%.
+  (Contributed by Irit Katriel in :issue:`45711`.)
 
 FAQ
 ---
@@ -677,7 +677,7 @@ FAQ
 |
 | Q: Will CPython 3.11 use more memory?
 |
-| A: Maybe not. We don't expect memory use to exceed 20% more versus 3.10.
+| A: Maybe not. We don't expect memory use to rise more than 20% over 3.10.
   This is offset by memory optimizations for frame objects and object
   dictionaries as mentioned above.
 |

From 63fa529d72c8e42b48b9456211f3c37b7eae3b39 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Tue, 5 Apr 2022 19:39:52 +0800
Subject: [PATCH 16/18] Remove "almost"

---
 Doc/whatsnew/3.11.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index e263941135bb1f..78af37c37212f1 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -561,7 +561,7 @@ In 3.11, when Python detects Python code calling another Python function,
 it sets up a new frame, and "jumps" to the new code inside the new frame. This
 avoids calling the C interpreting function altogether.
 
-Python function calls now consume almost no C stack space. This speeds up
+Most Python function calls now consume no C stack space. This speeds up
 most of such calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
 can recurse significantly deeper, (if the user increases the recursion limit).

From f460c93447155b6e870e33aa28b79ea38f615d40 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Tue, 5 Apr 2022 22:07:31 +0800
Subject: [PATCH 17/18] Add Pablo's employer

---
 Doc/whatsnew/3.11.rst | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 78af37c37212f1..2c86090cf0fe91 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -703,8 +703,9 @@ About
 -----
 
 Faster CPython explores optimizations for :term:`CPython`. The main team is
-funded by Microsoft to work on this full-time. The team also collaborates
-extensively with volunteer contributors in the community.
+funded by Microsoft to work on this full-time. Pablo Galindo Salgado is also
+funded by Bloomberg LP to work on the project part-time. Finally, many
+contributors are volunteers from the community.
 
 
 CPython bytecode changes

From 584dafafb19c25a43c10ec1cabc589864000e609 Mon Sep 17 00:00:00 2001
From: Fidget-Spinner <28750310+Fidget-Spinner@users.noreply.github.com>
Date: Wed, 6 Apr 2022 11:36:42 +0800
Subject: [PATCH 18/18] Address Guido's review

Co-Authored-By: Guido van Rossum <gvanrossum@users.noreply.github.com>
---
 Doc/whatsnew/3.11.rst | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/Doc/whatsnew/3.11.rst b/Doc/whatsnew/3.11.rst
index 2c86090cf0fe91..fce7efd5613007 100644
--- a/Doc/whatsnew/3.11.rst
+++ b/Doc/whatsnew/3.11.rst
@@ -555,16 +555,16 @@ Inlined Python function calls
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 During a Python function call, Python will call an evaluating C function to
 interpret that function's code. This effectively limits pure Python recursion to
-what's safe for the C stack
+what's safe for the C stack.
 
-In 3.11, when Python detects Python code calling another Python function,
+In 3.11, when CPython detects Python code calling another Python function,
 it sets up a new frame, and "jumps" to the new code inside the new frame. This
 avoids calling the C interpreting function altogether.
 
 Most Python function calls now consume no C stack space. This speeds up
 most of such calls. In simple recursive functions like fibonacci or
 factorial, a 1.7x speedup was observed. This also means recursive functions
-can recurse significantly deeper, (if the user increases the recursion limit).
+can recurse significantly deeper (if the user increases the recursion limit).
 We measured a 1-3% improvement in pyperformance.
 
 (Contributed by Pablo Galindo and Mark Shannon in :issue:`45256`.)
@@ -607,30 +607,27 @@ See :pep:`659` for more information.)
 |               |                    | fast paths for their underlying types.                |                   | Brandt Bucher,    |
 |               |                    |                                                       |                   | Dennis Sweeney    |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-30%            | Irit Katriel,     |
+| Subscript     | ``a[i]``           | Subscripting container types such as ``list``,        | 10-25%            | Irit Katriel,     |
 |               |                    | ``tuple`` and ``dict`` directly index the underlying  |                   | Mark Shannon      |
 |               |                    | data structures.                                      |                   |                   |
 |               |                    |                                                       |                   |                   |
 |               |                    | Subscripting custom ``__getitem__``                   |                   |                   |
 |               |                    | is also inlined similar to :ref:`inline-calls`.       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-90%            | Dennis Sweeney    |
+| Store         | ``a[i] = z``       | Similar to subscripting specialization above.         | 10-25%            | Dennis Sweeney    |
 | subscript     |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Calls         | ``f(arg)``         | Calls to common builtin (C) functions such as ``len`` | 20%               | Mark Shannon,     |
-|               |                    | and ``isinstance`` directly call their underlying C   |                   | Ken Jin           |
-|               | ``C(arg)``         | version. This avoids going through the internal       | 170%              |                   |
+| Calls         | ``f(arg)``         | Calls to common builtin (C) functions and types such  | 20%               | Mark Shannon,     |
+|               | ``C(arg)``         | as ``len`` and ``str`` directly call their underlying |                   | Ken Jin           |
+|               |                    | C versions. This avoids going through the internal    |                   |                   |
 |               |                    | calling convention.                                   |                   |                   |
 |               |                    |                                                       |                   |                   |
-|               |                    | Calls to certain Python functions are inlined similar |                   |                   |
-|               |                    | to :ref:`inline-calls`.                               |                   |                   |
-|               |                    |                                                       |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``print``          | The object's index in the globals/builtins namespace  | - [1]_            | Mark Shannon      |
+| Load          | ``print``          | The object's index in the globals/builtins namespace  | [1]_              | Mark Shannon      |
 | global        | ``len``            | is cached. Loading globals and builtins require       |                   |                   |
 | variable      |                    | zero namespace lookups.                               |                   |                   |
 +---------------+--------------------+-------------------------------------------------------+-------------------+-------------------+
-| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | - [2]_            | Mark Shannon      |
+| Load          | ``o.attr``         | Similar to loading global variables. The attribute's  | [2]_              | Mark Shannon      |
 | attribute     |                    | index inside the class/object's namespace is cached.  |                   |                   |
 |               |                    | In most cases, attribute loading will require zero    |                   |                   |
 |               |                    | namespace lookups.                                    |                   |                   |