Commit 1985f7b

Issue #19405: Fixed outdated comments in the _sre module.
2 parents b9dcfea + efa5a39 commit 1985f7b

2 files changed

Lines changed: 6 additions & 7 deletions

Lib/sre_compile.py

Lines changed: 5 additions & 5 deletions
@@ -270,10 +270,10 @@ def _mk_bitmap(bits):
 # set is constructed. Then, this bitmap is sliced into chunks of 256
 # characters, duplicate chunks are eliminated, and each chunk is
 # given a number. In the compiled expression, the charset is
-# represented by a 16-bit word sequence, consisting of one word for
-# the number of different chunks, a sequence of 256 bytes (128 words)
+# represented by a 32-bit word sequence, consisting of one word for
+# the number of different chunks, a sequence of 256 bytes (64 words)
 # of chunk numbers indexed by their original chunk position, and a
-# sequence of chunks (16 words each).
+# sequence of 256-bit chunks (8 words each).

 # Compression is normally good: in a typical charset, large ranges of
 # Unicode will be either completely excluded (e.g. if only cyrillic
@@ -286,9 +286,9 @@ def _mk_bitmap(bits):
 # less significant byte is a bit index in the chunk (just like the
 # CHARSET matching).

-# In UCS-4 mode, the BIGCHARSET opcode still supports only subsets
+# The BIGCHARSET opcode still supports only subsets
 # of the basic multilingual plane; an efficient representation
-# for all of UTF-16 has not yet been developed. This means,
+# for all of Unicode has not yet been developed. This means,
 # in particular, that negated charsets cannot be represented as
 # bigcharsets.
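
The layout described in the updated comment (one count word, 64 words of chunk numbers, then 8 words per distinct chunk) can be illustrated with a small sketch. This is not the code in Lib/sre_compile.py: the helper names `build_bigcharset`, `pack_words`, and `contains` are hypothetical, and the little-endian byte packing and least-significant-bit-first ordering are assumptions made only so the example is self-consistent.

```python
# Hypothetical illustration of the BIGCHARSET layout; not the CPython code.
WORD_BITS = 32                               # code word width, per the updated comment
CHUNK_BITS = 256                             # characters covered by one chunk
WORDS_PER_CHUNK = CHUNK_BITS // WORD_BITS    # 8 words per chunk
BMP_CHUNKS = 0x10000 // CHUNK_BITS           # 256 chunk positions in the BMP
MAPPING_WORDS = BMP_CHUNKS * 8 // WORD_BITS  # 256 bytes = 64 words


def build_bigcharset(chars):
    """Slice a BMP bitmap into 256-character chunks and deduplicate them."""
    bitmap = bytearray(0x10000)
    for ch in chars:
        if ch > 0xFFFF:
            raise ValueError("only the basic multilingual plane is supported")
        bitmap[ch] = 1
    chunk_ids = {}                    # chunk contents -> chunk number
    mapping = bytearray(BMP_CHUNKS)   # chunk number for each chunk position
    chunks = []
    for pos in range(BMP_CHUNKS):
        chunk = bytes(bitmap[pos * CHUNK_BITS:(pos + 1) * CHUNK_BITS])
        if chunk not in chunk_ids:
            chunk_ids[chunk] = len(chunks)
            chunks.append(chunk)
        mapping[pos] = chunk_ids[chunk]
    return len(chunks), bytes(mapping), chunks


def pack_words(count, mapping, chunks):
    """Flatten to 32-bit words: count, 64 mapping words, 8 words per chunk."""
    words = [count]
    # Assumption: mapping bytes are packed little-endian, four per word.
    for i in range(0, BMP_CHUNKS, 4):
        words.append(int.from_bytes(mapping[i:i + 4], "little"))
    # Assumption: bit i of a word stands for the i-th character it covers.
    for chunk in chunks:
        for w in range(WORDS_PER_CHUNK):
            bits = chunk[w * WORD_BITS:(w + 1) * WORD_BITS]
            words.append(sum(bit << i for i, bit in enumerate(bits)))
    return words


def contains(words, ch):
    """High byte selects the chunk, low byte is the bit index inside it."""
    if ch > 0xFFFF:
        return False                  # outside the BMP: not representable
    hi, lo = ch >> 8, ch & 0xFF
    block = (words[1 + hi // 4] >> (8 * (hi % 4))) & 0xFF
    base = 1 + MAPPING_WORDS + block * WORDS_PER_CHUNK
    return bool((words[base + lo // WORD_BITS] >> (lo % WORD_BITS)) & 1)


# Example: the Cyrillic block U+0400..U+04FF plus the letter "a".
chars = set(range(0x0400, 0x0500)) | {ord("a")}
count, mapping, chunks = build_bigcharset(chars)
words = pack_words(count, mapping, chunks)
```

In this example only three distinct chunks survive deduplication (the chunk holding "a", the all-zero chunk, and the all-one Cyrillic chunk), so the whole set fits in 1 + 64 + 3 * 8 words, which is the kind of compression the comment refers to.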

Modules/_sre.c

Lines changed: 1 addition & 2 deletions
@@ -1348,8 +1348,7 @@ _compile(PyObject* self_, PyObject* args)
 \_________\_____/ /
 \____________/

-It also helps that SRE_CODE is always an unsigned type, either 2 bytes or 4
-bytes wide (the latter if Python is compiled for "wide" unicode support).
+It also helps that SRE_CODE is always an unsigned type.
 */

 /* Defining this one enables tracing of the validator */
