diff --git a/Doc/reference/expressions.rst b/Doc/reference/expressions.rst index 17f39aaf5f57cd..0803fadd0eeb37 100644 --- a/Doc/reference/expressions.rst +++ b/Doc/reference/expressions.rst @@ -133,13 +133,18 @@ Literals Python supports string and bytes literals and various numeric literals: -.. productionlist:: python-grammar - literal: `stringliteral` | `bytesliteral` | `NUMBER` +.. grammar-snippet:: + :group: python-grammar + + literal: `strings` | `NUMBER` Evaluation of a literal yields an object of the given type (string, bytes, integer, floating-point number, complex number) with the given value. The value may be approximated in the case of floating-point and imaginary (complex) -literals. See section :ref:`literals` for details. +literals. +See section :ref:`literals` for details. +See section :ref:`string-concatenation` for details on ``strings``. + .. index:: triple: immutable; data; type @@ -152,6 +157,45 @@ occurrence) may obtain the same object or a different object with the same value. +.. _string-concatenation: + +String literal concatenation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Multiple adjacent string or bytes literals (delimited by whitespace), possibly +using different quoting conventions, are allowed, and their meaning is the same +as their concatenation. Thus, ``"hello" 'world'`` is equivalent to +``"helloworld"``. + +Formally: + +.. grammar-snippet:: + :group: python-grammar + + strings: ( `STRING` | fstring | tstring)+ + +Note that this feature is defined at the syntactical level, so it only works +with literals. +To concatenate string expressions at run time, the '+' operator may be used:: + + greeting = "Hello" + space = " " + name = "Blaise" + print(greeting + space + name) # not: print(greeting space name) + +Also note that literal concatenation can freely mix raw strings, +triple-quoted strings, and formatted or template string literals. +However, bytes literals may not be combined with string literals of any kind. + +This feature can be used to reduce the number of backslashes +needed, to split long strings conveniently across long lines, or even to add +comments to parts of strings, for example:: + + re.compile("[A-Za-z_]" # letter or underscore + "[A-Za-z0-9_]*" # letter, digit or underscore + ) + + .. _parenthesized: Parenthesized forms diff --git a/Doc/reference/grammar.rst b/Doc/reference/grammar.rst index 55c148801d8559..1037feb691f6bc 100644 --- a/Doc/reference/grammar.rst +++ b/Doc/reference/grammar.rst @@ -10,11 +10,8 @@ error recovery. The notation used here is the same as in the preceding docs, and is described in the :ref:`notation ` section, -except for a few extra complications: +except for an extra complication: -* ``&e``: a positive lookahead (that is, ``e`` is required to match but - not consumed) -* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match) * ``~`` ("cut"): commit to the current alternative and fail the rule even if this fails to parse diff --git a/Doc/reference/introduction.rst b/Doc/reference/introduction.rst index 444acac374a690..c62240b18cfe55 100644 --- a/Doc/reference/introduction.rst +++ b/Doc/reference/introduction.rst @@ -145,15 +145,23 @@ The definition to the right of the colon uses the following syntax elements: * ``e?``: A question mark has exactly the same meaning as square brackets: the preceding item is optional. * ``(e)``: Parentheses are used for grouping. + +The following notation is only used in +:ref:`lexical definitions `. + * ``"a"..."z"``: Two literal characters separated by three dots mean a choice of any single character in the given (inclusive) range of ASCII characters. - This notation is only used in - :ref:`lexical definitions `. * ``<...>``: A phrase between angular brackets gives an informal description of the matched symbol (for example, ````), or an abbreviation that is defined in nearby text (for example, ````). - This notation is only used in - :ref:`lexical definitions `. + +.. _lexical-lookaheads: + +Some definitions also use *lookaheads*, which indicate that an element +must (or must not) match at a given position, but without consuming any input: + +* ``&e``: a positive lookahead (that is, ``e`` is required to match) +* ``!e``: a negative lookahead (that is, ``e`` is required *not* to match) The unary operators (``*``, ``+``, ``?``) bind as tightly as possible; the vertical bar (``|``) binds most loosely. diff --git a/Doc/reference/lexical_analysis.rst b/Doc/reference/lexical_analysis.rst index 567c70111c20ec..e3d0bab8942ced 100644 --- a/Doc/reference/lexical_analysis.rst +++ b/Doc/reference/lexical_analysis.rst @@ -39,7 +39,8 @@ The end of a logical line is represented by the token :data:`~token.NEWLINE`. Statements cannot cross logical line boundaries except where :data:`!NEWLINE` is allowed by the syntax (e.g., between statements in compound statements). A logical line is constructed from one or more *physical lines* by following -the explicit or implicit *line joining* rules. +the :ref:`explicit ` or :ref:`implicit ` +*line joining* rules. .. _physical-lines: @@ -47,17 +48,28 @@ the explicit or implicit *line joining* rules. Physical lines -------------- -A physical line is a sequence of characters terminated by an end-of-line -sequence. In source files and strings, any of the standard platform line -termination sequences can be used - the Unix form using ASCII LF (linefeed), -the Windows form using the ASCII sequence CR LF (return followed by linefeed), -or the old Macintosh form using the ASCII CR (return) character. All of these -forms can be used equally, regardless of platform. The end of input also serves -as an implicit terminator for the final physical line. +A physical line is a sequence of characters terminated by one the following +end-of-line sequences: -When embedding Python, source code strings should be passed to Python APIs using -the standard C conventions for newline characters (the ``\n`` character, -representing ASCII LF, is the line terminator). +* the Unix form using ASCII LF (linefeed), +* the Windows form using the ASCII sequence CR LF (return followed by linefeed), +* the old Macintosh form using the ASCII CR (return) character. + +Regardless of platform, each of these sequences is replaced by a single +ASCII LF (linefeed) character. +(This is done even inside :ref:`string literals `.) +Each line can use any of the sequences; they do not need to be consistent +within a file. + +The end of input also serves as an implicit terminator for the final +physical line. + +Formally: + +.. grammar-snippet:: + :group: python-grammar + + newline: | | .. _comments: @@ -106,6 +118,16 @@ If an encoding is declared, the encoding name must be recognized by Python encoding is used for all lexical analysis, including string literals, comments and identifiers. +All lexical analysis, including string literals, comments +and identifiers, works on Unicode text decoded using the source encoding. +Any Unicode code point, except the NUL control character, can appear in +Python source. + +.. grammar-snippet:: + :group: python-grammar + + source_character: + .. _explicit-joining: @@ -474,80 +496,110 @@ Literals Literals are notations for constant values of some built-in types. +In terms of lexical analysis, Python has :ref:`string, bytes ` +and :ref:`numeric ` literals. + +Other “literals” are lexically denoted using :ref:`keywords ` +(``None``, ``True``, ``False``) and the special +:ref:`ellipsis token ` (``...``). + .. index:: string literal, bytes literal, ASCII single: ' (single quote); string literal single: " (double quote); string literal - single: u'; string literal - single: u"; string literal .. _strings: String and Bytes literals -------------------------- +========================= -String literals are described by the following lexical definitions: +String literals are text enclosed in single quotes (``'``) or double +quotes (``"``). For example: -.. productionlist:: python-grammar - stringliteral: [`stringprefix`](`shortstring` | `longstring`) - stringprefix: "r" | "u" | "R" | "U" | "f" | "F" | "t" | "T" - : | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF" - : | "tr" | "Tr" | "tR" | "TR" | "rt" | "rT" | "Rt" | "RT" - shortstring: "'" `shortstringitem`* "'" | '"' `shortstringitem`* '"' - longstring: "'''" `longstringitem`* "'''" | '"""' `longstringitem`* '"""' - shortstringitem: `shortstringchar` | `stringescapeseq` - longstringitem: `longstringchar` | `stringescapeseq` - shortstringchar: - longstringchar: - stringescapeseq: "\" +.. code-block:: python -.. productionlist:: python-grammar - bytesliteral: `bytesprefix`(`shortbytes` | `longbytes`) - bytesprefix: "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB" - shortbytes: "'" `shortbytesitem`* "'" | '"' `shortbytesitem`* '"' - longbytes: "'''" `longbytesitem`* "'''" | '"""' `longbytesitem`* '"""' - shortbytesitem: `shortbyteschar` | `bytesescapeseq` - longbytesitem: `longbyteschar` | `bytesescapeseq` - shortbyteschar: - longbyteschar: - bytesescapeseq: "\" - -One syntactic restriction not indicated by these productions is that whitespace -is not allowed between the :token:`~python-grammar:stringprefix` or -:token:`~python-grammar:bytesprefix` and the rest of the literal. The source -character set is defined by the encoding declaration; it is UTF-8 if no encoding -declaration is given in the source file; see section :ref:`encodings`. - -.. index:: triple-quoted string, Unicode Consortium, raw string + "spam" + 'eggs' + +The quote used to start the literal also terminates it, so a string literal +can only contain the other quote (except with escape sequences, see below). +For example: + +.. code-block:: python + + 'Say "Hello", please.' + "Don't do that!" + +Except for this limitation, the choice of quote character (``'`` or ``"``) +does not affect how the literal is parsed. + +Inside a string literal, the backslash (``\``) character introduces an +:dfn:`escape sequence`, which has special meaning depending on the character +after the backslash. +For example, ``\"`` denotes the double quote character, and does *not* end +the string: + +.. code-block:: python + + >>> print("Say \"Hello\" to everyone!") + Say "Hello" to everyone! + +See :ref:`escape sequences ` below for a full list of such +sequences, and more details. + + +.. index:: triple-quoted string single: """; string literal single: '''; string literal -In plain English: Both types of literals can be enclosed in matching single quotes -(``'``) or double quotes (``"``). They can also be enclosed in matching groups -of three single or double quotes (these are generally referred to as -*triple-quoted strings*). The backslash (``\``) character is used to give special -meaning to otherwise ordinary characters like ``n``, which means 'newline' when -escaped (``\n``). It can also be used to escape characters that otherwise have a -special meaning, such as newline, backslash itself, or the quote character. -See :ref:`escape sequences ` below for examples. +Triple-quoted strings +--------------------- -.. index:: - single: b'; bytes literal - single: b"; bytes literal +Strings can also be enclosed in matching groups of three single or double +quotes. +These are generally referred to as :dfn:`triple-quoted strings`:: + + """This is a triple-quoted string.""" + +In triple-quoted literals, unescaped quotes are allowed (and are +retained), except that three unescaped quotes in a row terminate the literal, +if they are of the same kind (``'`` or ``"``) used at the start:: + + """This string has "quotes" inside.""" + +Unescaped newlines are also allowed and retained:: + + '''This triple-quoted string + continues on the next line.''' -Bytes literals are always prefixed with ``'b'`` or ``'B'``; they produce an -instance of the :class:`bytes` type instead of the :class:`str` type. They -may only contain ASCII characters; bytes with a numeric value of 128 or greater -must be expressed with escapes. .. index:: - single: r'; raw string literal - single: r"; raw string literal + single: u'; string literal + single: u"; string literal -Both string and bytes literals may optionally be prefixed with a letter ``'r'`` -or ``'R'``; such constructs are called :dfn:`raw string literals` -and :dfn:`raw bytes literals` respectively and treat backslashes as -literal characters. As a result, in raw string literals, ``'\U'`` and ``'\u'`` -escapes are not treated specially. +String prefixes +--------------- + +String literals can have an optional :dfn:`prefix` that influences how the +content of the literal is parsed, for example: + +.. code-block:: python + + b"data" + f'{result=}' + +The allowed prefixes are: + +* ``b``: :ref:`Bytes literal ` +* ``r``: :ref:`Raw string ` +* ``f``: :ref:`Formatted string literal ` ("f-string") +* ``t``: :ref:`Template string literal ` ("t-string") +* ``u``: No effect (allowed for backwards compatibility) + +See the linked sections for details on each type. + +Prefixes are case-insensitive (for example, ``B`` works the same as ``b``). +The ``r`` prefix can be combined with ``f``, ``t`` or ``b``, so ``fr``, +``rf``, ``tr``, ``rt``, ``br`` and ``rb`` are also valid prefixes. .. versionadded:: 3.3 The ``'rb'`` prefix of raw bytes literals has been added as a synonym @@ -557,18 +609,35 @@ escapes are not treated specially. to simplify the maintenance of dual Python 2.x and 3.x codebases. See :pep:`414` for more information. -.. index:: - single: f'; formatted string literal - single: f"; formatted string literal -A string literal with ``'f'`` or ``'F'`` in its prefix is a -:dfn:`formatted string literal`; see :ref:`f-strings`. The ``'f'`` may be -combined with ``'r'``, but not with ``'b'`` or ``'u'``, therefore raw -formatted strings are possible, but formatted bytes literals are not. +Formal grammar +-------------- + +String literals, except :ref:`"F-strings" ` and +:ref:`"T-strings" `, are described by the +following lexical definitions. -In triple-quoted literals, unescaped newlines and quotes are allowed (and are -retained), except that three unescaped quotes in a row terminate the literal. (A -"quote" is the character used to open the literal, i.e. either ``'`` or ``"``.) +These definitions use :ref:`negative lookaheads ` (``!``) +to indicate that an ending quote ends the literal. + +.. grammar-snippet:: + :group: python-grammar + + STRING: [`stringprefix`] (`stringcontent`) + stringprefix: <("r" | "u" | "b" | "br" | "rb"), case-insensitive> + stringcontent: + | "'" ( !"'" `stringitem`)* "'" + | '"' ( !'"' `stringitem`)* '"' + | "'''" ( !"'''" `longstringitem`)* "'''" + | '"""' ( !'"""' `longstringitem`)* '"""' + stringitem: `stringchar` | `stringescapeseq` + stringchar: + longstringitem: `stringitem` | newline + stringescapeseq: "\" + +Note that as in all lexical definitions, whitespace is significant. +In particular, the prefix (if any) must be immediately followed by the starting +quote. .. index:: physical line, escape sequence, Standard C, C single: \ (backslash); escape sequence @@ -587,63 +656,57 @@ retained), except that three unescaped quotes in a row terminate the literal. ( .. _escape-sequences: - Escape sequences -^^^^^^^^^^^^^^^^ +---------------- Unless an ``'r'`` or ``'R'`` prefix is present, escape sequences in string and bytes literals are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are: -+-------------------------+---------------------------------+-------+ -| Escape Sequence | Meaning | Notes | -+=========================+=================================+=======+ -| ``\``\ | Backslash and newline ignored | \(1) | -+-------------------------+---------------------------------+-------+ -| ``\\`` | Backslash (``\``) | | -+-------------------------+---------------------------------+-------+ -| ``\'`` | Single quote (``'``) | | -+-------------------------+---------------------------------+-------+ -| ``\"`` | Double quote (``"``) | | -+-------------------------+---------------------------------+-------+ -| ``\a`` | ASCII Bell (BEL) | | -+-------------------------+---------------------------------+-------+ -| ``\b`` | ASCII Backspace (BS) | | -+-------------------------+---------------------------------+-------+ -| ``\f`` | ASCII Formfeed (FF) | | -+-------------------------+---------------------------------+-------+ -| ``\n`` | ASCII Linefeed (LF) | | -+-------------------------+---------------------------------+-------+ -| ``\r`` | ASCII Carriage Return (CR) | | -+-------------------------+---------------------------------+-------+ -| ``\t`` | ASCII Horizontal Tab (TAB) | | -+-------------------------+---------------------------------+-------+ -| ``\v`` | ASCII Vertical Tab (VT) | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\\\{ooo}` | Character with octal value | (2,4) | -| | *ooo* | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\x{hh}` | Character with hex value *hh* | (3,4) | -+-------------------------+---------------------------------+-------+ - -Escape sequences only recognized in string literals are: - -+-------------------------+---------------------------------+-------+ -| Escape Sequence | Meaning | Notes | -+=========================+=================================+=======+ -| :samp:`\\N\\{{name}\\}` | Character named *name* in the | \(5) | -| | Unicode database | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\u{xxxx}` | Character with 16-bit hex value | \(6) | -| | *xxxx* | | -+-------------------------+---------------------------------+-------+ -| :samp:`\\U{xxxxxxxx}` | Character with 32-bit hex value | \(7) | -| | *xxxxxxxx* | | -+-------------------------+---------------------------------+-------+ - -Notes: - -(1) +.. list-table:: + :widths: auto + :header-rows: 1 + + * * Escape Sequence + * Meaning + * * ``\``\ + * :ref:`string-escape-ignore` + * * ``\\`` + * :ref:`Backslash ` + * * ``\'`` + * :ref:`Single quote ` + * * ``\"`` + * :ref:`Double quote ` + * * ``\a`` + * ASCII Bell (BEL) + * * ``\b`` + * ASCII Backspace (BS) + * * ``\f`` + * ASCII Formfeed (FF) + * * ``\n`` + * ASCII Linefeed (LF) + * * ``\r`` + * ASCII Carriage Return (CR) + * * ``\t`` + * ASCII Horizontal Tab (TAB) + * * ``\v`` + * ASCII Vertical Tab (VT) + * * :samp:`\\\\{ooo}` + * :ref:`string-escape-oct` + * * :samp:`\\x{hh}` + * :ref:`string-escape-hex` + * * :samp:`\\N\\{{name}\\}` + * :ref:`string-escape-named` + * * :samp:`\\u{xxxx}` + * :ref:`Hexadecimal Unicode character ` + * * :samp:`\\U{xxxxxxxx}` + * :ref:`Hexadecimal Unicode character ` + +.. _string-escape-ignore: + +Ignored end of line +^^^^^^^^^^^^^^^^^^^ + A backslash can be added at the end of a line to ignore the newline:: >>> 'This string will not include \ @@ -653,54 +716,177 @@ Notes: The same result can be achieved using :ref:`triple-quoted strings `, or parentheses and :ref:`string literal concatenation `. +.. _string-escape-escaped-char: + +Escaped characters +^^^^^^^^^^^^^^^^^^ + + To include a backslash in a non-:ref:`raw ` Python string + literal, it must be doubled. The ``\\`` escape sequence denotes a single + backslash character:: + + >>> print('C:\\Program Files') + C:\Program Files + + Similarly, the ``\'`` and ``\"`` sequences denote the single and double + quote character, respectively:: + + >>> print('\' and \"') + ' and " + +.. _string-escape-oct: + +Octal character +^^^^^^^^^^^^^^^ + + The sequence :samp:`\\\\{ooo}` denotes a *character* with the octal (base 8) + value *ooo*:: + + >>> '\120' + 'P' + + Up to three octal digits (0 through 7) are accepted. -(2) - As in Standard C, up to three octal digits are accepted. + In a bytes literal, *character* means a *byte* with the given value. + In a string literal, it means a Unicode character with the given value. .. versionchanged:: 3.11 - Octal escapes with value larger than ``0o377`` produce a + Octal escapes with value larger than ``0o377`` (255) produce a :exc:`DeprecationWarning`. .. versionchanged:: 3.12 - Octal escapes with value larger than ``0o377`` produce a - :exc:`SyntaxWarning`. In a future Python version they will be eventually - a :exc:`SyntaxError`. + Octal escapes with value larger than ``0o377`` (255) produce a + :exc:`SyntaxWarning`. + In a future Python version they will raise a :exc:`SyntaxError`. -(3) - Unlike in Standard C, exactly two hex digits are required. +.. _string-escape-hex: -(4) - In a bytes literal, hexadecimal and octal escapes denote the byte with the - given value. In a string literal, these escapes denote a Unicode character - with the given value. +Hexadecimal character +^^^^^^^^^^^^^^^^^^^^^ -(5) - .. versionchanged:: 3.3 - Support for name aliases [#]_ has been added. + The sequence :samp:`\\x{hh}` denotes a *character* with the hex (base 16) + value *hh*:: -(6) - Exactly four hex digits are required. + >>> '\x50' + 'P' -(7) - Any Unicode character can be encoded this way. Exactly eight hex digits - are required. + Unlike in Standard C, exactly two hex digits are required. + + In a bytes literal, *character* means a *byte* with the given value. + In a string literal, it means a Unicode character with the given value. + +.. _string-escape-named: + +Named Unicode character +^^^^^^^^^^^^^^^^^^^^^^^ + + The sequence :samp:`\\N\\{{name}\\}` denotes a Unicode character + with the given *name*:: + + >>> '\N{LATIN CAPITAL LETTER P}' + 'P' + >>> '\N{SNAKE}' + '🐍' + + This sequence cannot appear in :ref:`bytes literals `. + + .. versionchanged:: 3.3 + Support for `name aliases `__ + has been added. + +.. _string-escape-long-hex: + +Hexadecimal Unicode characters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + + These sequences :samp:`\\u{xxxx}` and :samp:`\\U{xxxxxxxx}` denote the + Unicode character with the given hex (base 16) value. + Exactly four digits are required for ``\u``; exactly eight digits are + required for ``\U``. + The latter can encode any Unicode character. + + .. code-block:: python + + >>> '\u1234' + 'ሴ' + >>> '\U0001f40d' + '🐍' + + These sequences cannot appear in :ref:`bytes literals `. .. index:: unrecognized escape sequence -Unlike Standard C, all unrecognized escape sequences are left in the string -unchanged, i.e., *the backslash is left in the result*. (This behavior is -useful when debugging: if an escape sequence is mistyped, the resulting output -is more easily recognized as broken.) It is also important to note that the -escape sequences only recognized in string literals fall into the category of -unrecognized escapes for bytes literals. +Unrecognized escape sequences +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Unlike in Standard C, all unrecognized escape sequences are left in the string +unchanged, that is, *the backslash is left in the result*:: + + >>> print('\q') + \q + >>> list('\q') + ['\\', 'q'] + +Note that for bytes literals, the escape sequences only recognized in string +literals (``\N...``, ``\u...``, ``\U...``) fall into the category of +unrecognized escapes. .. versionchanged:: 3.6 Unrecognized escape sequences produce a :exc:`DeprecationWarning`. .. versionchanged:: 3.12 - Unrecognized escape sequences produce a :exc:`SyntaxWarning`. In a future - Python version they will be eventually a :exc:`SyntaxError`. + Unrecognized escape sequences produce a :exc:`SyntaxWarning`. + In a future Python version they will raise a :exc:`SyntaxError`. + + +.. index:: + single: b'; bytes literal + single: b"; bytes literal + + +.. _bytes-literal: + +Bytes literals +-------------- + +:dfn:`Bytes literals` are always prefixed with ``'b'`` or ``'B'``; they produce an +instance of the :class:`bytes` type instead of the :class:`str` type. +They may only contain ASCII characters; bytes with a numeric value of 128 +or greater must be expressed with escape sequences (typically +:ref:`string-escape-hex` or :ref:`string-escape-oct`): + +.. code-block:: python + + >>> b'\x89PNG\r\n\x1a\n' + b'\x89PNG\r\n\x1a\n' + >>> list(b'\x89PNG\r\n\x1a\n') + [137, 80, 78, 71, 13, 10, 26, 10] + +Similarly, a zero byte must be expressed using an escape sequence (typically +``\0`` or ``\x00``). + + +.. index:: + single: r'; raw string literal + single: r"; raw string literal + +.. _raw-strings: + +Raw string literals +------------------- + +Both string and bytes literals may optionally be prefixed with a letter ``'r'`` +or ``'R'``; such constructs are called :dfn:`raw string literals` +and :dfn:`raw bytes literals` respectively and treat backslashes as +literal characters. +As a result, in raw string literals, :ref:`escape sequences ` +are not treated specially: + +.. code-block:: python + + >>> r'\d{4}-\d{2}-\d{2}' + '\\d{4}-\\d{2}-\\d{2}' Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, ``r"\""`` is a valid string @@ -712,29 +898,6 @@ that a single backslash followed by a newline is interpreted as those two characters as part of the literal, *not* as a line continuation. -.. _string-concatenation: - -String literal concatenation ----------------------------- - -Multiple adjacent string or bytes literals (delimited by whitespace), possibly -using different quoting conventions, are allowed, and their meaning is the same -as their concatenation. Thus, ``"hello" 'world'`` is equivalent to -``"helloworld"``. This feature can be used to reduce the number of backslashes -needed, to split long strings conveniently across long lines, or even to add -comments to parts of strings, for example:: - - re.compile("[A-Za-z_]" # letter or underscore - "[A-Za-z0-9_]*" # letter, digit or underscore - ) - -Note that this feature is defined at the syntactical level, but implemented at -compile time. The '+' operator must be used to concatenate string expressions -at run time. Also note that literal concatenation can use different quoting -styles for each component (even mixing raw strings and triple quoted strings), -and formatted string literals may be concatenated with plain string literals. - - .. index:: single: formatted string literal single: interpolated string literal @@ -742,6 +905,8 @@ and formatted string literals may be concatenated with plain string literals. single: string; interpolated literal single: f-string single: fstring + single: f'; formatted string literal + single: f"; formatted string literal single: {} (curly brackets); in formatted string literal single: ! (exclamation); in formatted string literal single: : (colon); in formatted string literal @@ -912,11 +1077,24 @@ include expressions. See also :pep:`498` for the proposal that added formatted string literals, and :meth:`str.format`, which uses a related format string mechanism. +.. _t-strings: +.. _template-string-literals: + +t-strings +--------- + +A :dfn:`template string literal` or :dfn:`t-string` is a string literal that +is prefixed with ``'t'`` or ``'T'``. +These strings have internal structure similar to :ref:`f-strings`, +but are evaluated as Template objects instead of strings. + +.. versionadded:: 3.14 + .. _numbers: Numeric literals ----------------- +================ .. index:: number, numeric literal, integer literal floating-point literal, hexadecimal literal @@ -949,7 +1127,7 @@ actually an expression composed of the unary operator '``-``' and the literal .. _integers: Integer literals -^^^^^^^^^^^^^^^^ +---------------- Integer literals denote whole numbers. For example:: @@ -1022,7 +1200,7 @@ Formally, integer literals are described by the following lexical definitions: .. _floating: Floating-point literals -^^^^^^^^^^^^^^^^^^^^^^^ +----------------------- Floating-point (float) literals, such as ``3.14`` or ``1.5``, denote :ref:`approximations of real numbers `. @@ -1084,7 +1262,7 @@ lexical definitions: .. _imaginary: Imaginary literals -^^^^^^^^^^^^^^^^^^ +------------------ Python has :ref:`complex number ` objects, but no complex literals. @@ -1172,14 +1350,26 @@ The following tokens serve as delimiters in the grammar: ( ) [ ] { } , : ! . ; @ = + +The period can also occur in floating-point and imaginary literals. + +.. _lexical-ellipsis: + +A sequence of three periods has a special meaning as an +:py:data:`Ellipsis` literal: + +.. code-block:: none + + ... + +The following *augmented assignment operators* serve +lexically as delimiters, but also perform an operation: + +.. code-block:: none + -> += -= *= /= //= %= @= &= |= ^= >>= <<= **= -The period can also occur in floating-point and imaginary literals. A sequence -of three periods has a special meaning as an ellipsis literal. The second half -of the list, the augmented assignment operators, serve lexically as delimiters, -but also perform an operation. - The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: @@ -1194,7 +1384,3 @@ occurrence outside string literals and comments is an unconditional error: $ ? ` - -.. rubric:: Footnotes - -.. [#] https://www.unicode.org/Public/16.0.0/ucd/NameAliases.txt