Commit efa5a39

Issue #19405: Fixed outdated comments in the _sre module.
1 parent 246eb11 commit efa5a39

2 files changed

Lines changed: 6 additions & 7 deletions

Lib/sre_compile.py

Lines changed: 5 additions & 5 deletions
@@ -276,10 +276,10 @@ def _mk_bitmap(bits):
 # set is constructed. Then, this bitmap is sliced into chunks of 256
 # characters, duplicate chunks are eliminated, and each chunk is
 # given a number. In the compiled expression, the charset is
-# represented by a 16-bit word sequence, consisting of one word for
-# the number of different chunks, a sequence of 256 bytes (128 words)
+# represented by a 32-bit word sequence, consisting of one word for
+# the number of different chunks, a sequence of 256 bytes (64 words)
 # of chunk numbers indexed by their original chunk position, and a
-# sequence of chunks (16 words each).
+# sequence of 256-bit chunks (8 words each).
 
 # Compression is normally good: in a typical charset, large ranges of
 # Unicode will be either completely excluded (e.g. if only cyrillic
@@ -292,9 +292,9 @@ def _mk_bitmap(bits):
 # less significant byte is a bit index in the chunk (just like the
 # CHARSET matching).
 
-# In UCS-4 mode, the BIGCHARSET opcode still supports only subsets
+# The BIGCHARSET opcode still supports only subsets
 # of the basic multilingual plane; an efficient representation
-# for all of UTF-16 has not yet been developed. This means,
+# for all of Unicode has not yet been developed. This means,
 # in particular, that negated charsets cannot be represented as
 # bigcharsets.
 
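
The layout described in the updated comment (one count word, a 256-byte chunk map packed into 64 words, then 8 words per distinct 256-bit chunk) can be illustrated with a small standalone sketch. This is not the code in `Lib/sre_compile.py`; the names `build_bigcharset`, `to_words`, and `contains` are hypothetical, and the packing uses native byte order as a simplifying assumption.

```python
import struct

WORD_BITS = 32      # code word width stated in the updated comment
WORDS_PER_MAP = 64  # 256 one-byte chunk numbers / 4 bytes per word
WORDS_PER_CHUNK = 8 # 256 bits per chunk / 32 bits per word


def build_bigcharset(chars):
    """Split a set of BMP code points into deduplicated 256-bit chunks.

    Returns (count, chunk_map, chunks): the number of distinct chunks,
    256 bytes mapping each high byte to a chunk number, and the chunks
    themselves as 256-bit integers. Hypothetical helper for illustration.
    """
    if any(ch > 0xFFFF for ch in chars):
        raise ValueError("only subsets of the basic multilingual plane fit")
    chunk_numbers = {}          # chunk contents -> assigned chunk number
    chunk_map = bytearray(256)  # indexed by original chunk position
    chunks = []
    for hi in range(256):
        bits = 0
        for lo in range(256):
            if (hi << 8 | lo) in chars:
                bits |= 1 << lo
        if bits not in chunk_numbers:   # eliminate duplicate chunks
            chunk_numbers[bits] = len(chunks)
            chunks.append(bits)
        chunk_map[hi] = chunk_numbers[bits]
    return len(chunks), bytes(chunk_map), chunks


def to_words(count, chunk_map, chunks):
    """Flatten the pieces into a sequence of 32-bit words."""
    words = [count]
    words.extend(struct.unpack("=64I", chunk_map))  # 256 bytes -> 64 words
    for bits in chunks:
        for i in range(WORDS_PER_CHUNK):
            words.append((bits >> (WORD_BITS * i)) & 0xFFFFFFFF)
    return words


def contains(words, ch):
    """Look up a code point: high byte picks the chunk, low byte the bit."""
    if ch > 0xFFFF:
        return False
    hi, lo = ch >> 8, ch & 0xFF
    chunk_map = struct.pack("=64I", *words[1:1 + WORDS_PER_MAP])
    base = 1 + WORDS_PER_MAP + WORDS_PER_CHUNK * chunk_map[hi]
    return bool((words[base + (lo >> 5)] >> (lo & 31)) & 1)


# A charset with one ASCII letter plus the whole Cyrillic block.
chars = set(range(0x400, 0x500)) | {ord("a")}
count, chunk_map, chunks = build_bigcharset(chars)
words = to_words(count, chunk_map, chunks)
```

In this example only three distinct chunks survive deduplication (the one holding `a`, the all-empty chunk, and the all-full Cyrillic chunk), which is the kind of compression the comment describes for typical charsets.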

Modules/_sre.c

Lines changed: 1 addition & 2 deletions
@@ -2749,8 +2749,7 @@ _compile(PyObject* self_, PyObject* args)
 \_________\_____/ /
 \____________/
 
-It also helps that SRE_CODE is always an unsigned type, either 2 bytes or 4
-bytes wide (the latter if Python is compiled for "wide" unicode support).
+It also helps that SRE_CODE is always an unsigned type.
 */
 
 /* Defining this one enables tracing of the validator */
