From e319adbe6931ff8759c3778e64525b064b255c17 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Wed, 5 Mar 2025 17:35:57 +0100 Subject: [PATCH 1/3] Link token names in Lexical analysis --- Doc/reference/lexical_analysis.rst | 66 +++++++++++++++++++----------- 1 file changed, 41 insertions(+), 25 deletions(-) diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index f7167032ad7df9..10f46e45cdef73 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -8,8 +8,8 @@ Lexical analysis .. index:: lexical analysis, parser, token A Python program is read by a *parser*. Input to the parser is a stream of -*tokens*, generated by the *lexical analyzer*. This chapter describes how the -lexical analyzer breaks a file into tokens. +:term:`tokens `, generated by the *lexical analyzer*. +This chapter describes how the lexical analyzer breaks a file into tokens. Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8, see :pep:`3120` @@ -34,11 +34,11 @@ Logical lines .. index:: logical line, physical line, line joining, NEWLINE token -The end of a logical line is represented by the token NEWLINE. Statements -cannot cross logical line boundaries except where NEWLINE is allowed by the -syntax (e.g., between statements in compound statements). A logical line is -constructed from one or more *physical lines* by following the explicit or -implicit *line joining* rules. +The end of a logical line is represented by the token :data:`~token.NEWLINE`. +Statements cannot cross logical line boundaries except where :data:`!NEWLINE` +is allowed by the syntax (e.g., between statements in compound statements). +A logical line is constructed from one or more *physical lines* by following +the explicit or implicit *line joining* rules. .. _physical-lines: @@ -159,11 +159,12 @@ Blank lines .. index:: single: blank line A logical line that contains only spaces, tabs, formfeeds and possibly a -comment, is ignored (i.e., no NEWLINE token is generated). During interactive -input of statements, handling of a blank line may differ depending on the -implementation of the read-eval-print loop. In the standard interactive -interpreter, an entirely blank logical line (i.e. one containing not even -whitespace or a comment) terminates a multi-line statement. +comment, is ignored (i.e., no :data:`~token.NEWLINE` token is generated). +During interactive input of statements, handling of a blank line may differ +depending on the implementation of the read-eval-print loop. +In the standard interactive interpreter, an entirely blank logical line (that +is, one containing not even whitespace or a comment) terminates a multi-line +statement. .. _indentation: @@ -201,19 +202,20 @@ the space count to zero). .. index:: INDENT token, DEDENT token -The indentation levels of consecutive lines are used to generate INDENT and -DEDENT tokens, using a stack, as follows. +The indentation levels of consecutive lines are used to generate +:data:`~token.INDENT` and :data:`~token.DEDENT` tokens, using a stack, +as follows. Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line's indentation level is compared to the top of the stack. If it is equal, nothing happens. 
If it is larger, it is pushed on the stack, and -one INDENT token is generated. If it is smaller, it *must* be one of the +one :data:`!INDENT` token is generated. If it is smaller, it *must* be one of the numbers occurring on the stack; all numbers on the stack that are larger are -popped off, and for each number popped off a DEDENT token is generated. At the -end of the file, a DEDENT token is generated for each number remaining on the -stack that is larger than zero. +popped off, and for each number popped off a :data:`!DEDENT` token is generated. +At the end of the file, a :data:`!DEDENT` token is generated for each number +remaining on the stack that is larger than zero. Here is an example of a correctly (though confusingly) indented piece of Python code:: @@ -253,8 +255,18 @@ Whitespace between tokens Except at the beginning of a logical line or in string literals, the whitespace characters space, tab and formfeed can be used interchangeably to separate tokens. Whitespace is needed between two tokens only if their concatenation -could otherwise be interpreted as a different token (e.g., ab is one token, but -a b is two tokens). +could otherwise be interpreted as a different token. For example, ``ab`` is one +token, but ``a b`` is two tokens. For another example, ``+a`` is two tokens, +and is equivalent to ``+ a``. + + +.. _endmarker-token: + +End marker +---------- + +At the end of non-interactive input, the lexical analyzer generates an +:data:`~token.ENDMARKER` token. .. _other-tokens: @@ -262,11 +274,15 @@ a b is two tokens). Other tokens ============ -Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist: -*identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace -characters (other than line terminators, discussed earlier) are not tokens, but -serve to delimit tokens. Where ambiguity exists, a token comprises the longest -possible string that forms a legal token, when read from left to right. +Besides :data:`~token.NEWLINE`, :data:`~token.INDENT` and :data:`~token.DEDENT`, +the following categories of tokens exist: +*identifiers* and *keywords* (:data:`~token.NAME`), *literals* (such as +:data:`~token.NUMBER` and :data:`~token.STRING`), and other symbols +(*operators* and *delimiters*, :data:`~token.OP`). +Whitespace characters (other than logical line terminators, discussed earlier) +are not tokens, but serve to delimit tokens. +Where ambiguity exists, a token comprises the longest possible string that +forms a legal token, when read from left to right. .. _identifiers: From 8ea8576663eeb79377f04e9c45231e3e2ebe495d Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Thu, 8 May 2025 11:21:54 +0200 Subject: [PATCH 2/3] Update Doc/reference/lexical_analysis.rst --- Doc/reference/lexical_analysis.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 356eecf7c12524..2cef1d8fead8a5 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -257,8 +257,8 @@ Except at the beginning of a logical line or in string literals, the whitespace characters space, tab and formfeed can be used interchangeably to separate tokens. Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token. For example, ``ab`` is one -token, but ``a b`` is two tokens. For another example, ``+a`` is two tokens, -and is equivalent to ``+ a``. +token, but ``a b`` is two tokens. 
However, ``+a`` and ``+ a`` both produce +two tokens, ``a`` and ``b``, as ``+a`` is not a valid token. .. _endmarker-token: From f8044f78cc2177a10864f9091817ecc16d7892e0 Mon Sep 17 00:00:00 2001 From: Petr Viktorin Date: Thu, 8 May 2025 11:32:58 +0200 Subject: [PATCH 3/3] Fix Doc/reference/lexical_analysis.rst Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> --- Doc/reference/lexical_analysis.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 2cef1d8fead8a5..c465cb16870d56 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -258,7 +258,7 @@ characters space, tab and formfeed can be used interchangeably to separate tokens. Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token. For example, ``ab`` is one token, but ``a b`` is two tokens. However, ``+a`` and ``+ a`` both produce -two tokens, ``a`` and ``b``, as ``+a`` is not a valid token. +two tokens, ``+`` and ``a``, as ``+a`` is not a valid token. .. _endmarker-token:
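
A small sketch may help when reviewing the indentation wording in the first
patch. The following is a minimal, hypothetical model of the stack algorithm
described there; the function name ``indent_tokens`` and the idea of feeding
it a pre-computed list of per-line indentation levels are illustrative
assumptions, not how the real tokenizer is structured::

    def indent_tokens(indent_levels):
        """Yield INDENT/DEDENT token names for per-line indentation levels.

        A hypothetical sketch of the algorithm described in the docs,
        not the actual CPython tokenizer.
        """
        stack = [0]                       # a single zero; never popped off
        for level in indent_levels:
            if level > stack[-1]:         # larger: push, emit one INDENT
                stack.append(level)
                yield 'INDENT'
            else:
                while level < stack[-1]:  # smaller: pop, emit one DEDENT each
                    stack.pop()
                    yield 'DEDENT'
                if level != stack[-1]:    # must match a number on the stack
                    raise IndentationError('unindent does not match any '
                                           'outer indentation level')
        while stack[-1] > 0:              # end of file: close open blocks
            stack.pop()
            yield 'DEDENT'

For example, ``list(indent_tokens([0, 4, 8, 4]))`` yields
``['INDENT', 'INDENT', 'DEDENT', 'DEDENT']``: one ``INDENT`` per push, one
``DEDENT`` per pop, with the final ``DEDENT`` generated at end of input.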
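
The token stream discussed in the whitespace hunk can also be inspected
directly with the standard-library :mod:`tokenize` module; the sample source
string below is made up for the demonstration::

    import io
    import token
    import tokenize

    source = "if x:\n    y = +a\n"
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # tok_name maps the numeric token type to its name, e.g. NAME, OP
        print(token.tok_name[tok.type], repr(tok.string))

This prints one line per token, including ``NEWLINE``, ``INDENT``, ``DEDENT``
and a final ``ENDMARKER``, and shows ``+a`` arriving as the two tokens
``OP '+'`` and ``NAME 'a'``, which is the behaviour the last two patches
converge on.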