@@ -270,10 +270,10 @@ def _mk_bitmap(bits):
 # set is constructed. Then, this bitmap is sliced into chunks of 256
 # characters, duplicate chunks are eliminated, and each chunk is
 # given a number. In the compiled expression, the charset is
-# represented by a 16-bit word sequence, consisting of one word for
-# the number of different chunks, a sequence of 256 bytes (128 words)
+# represented by a 32-bit word sequence, consisting of one word for
+# the number of different chunks, a sequence of 256 bytes (64 words)
 # of chunk numbers indexed by their original chunk position, and a
-# sequence of chunks (16 words each).
+# sequence of 256-bit chunks (8 words each).
 
 # Compression is normally good: in a typical charset, large ranges of
 # Unicode will be either completely excluded (e.g. if only cyrillic
@@ -286,9 +286,9 @@ def _mk_bitmap(bits):
 # less significant byte is a bit index in the chunk (just like the
 # CHARSET matching).
 
-# In UCS-4 mode, the BIGCHARSET opcode still supports only subsets
+# The BIGCHARSET opcode still supports only subsets
 # of the basic multilingual plane; an efficient representation
-# for all of UTF-16 has not yet been developed. This means,
+# for all of Unicode has not yet been developed. This means,
 # in particular, that negated charsets cannot be represented as
 # bigcharsets.
 
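The layout described in these comments can be illustrated with a small Python sketch. This is not CPython's actual `sre_compile` code, just a hypothetical model of the scheme the comments describe: a 65536-bit bitmap over the basic multilingual plane is sliced into 256-bit chunks, duplicate chunks are shared, and lookup splits a code point into a high byte (chunk index) and a low byte (bit index within the chunk).

```python
# Illustrative sketch (not CPython's implementation) of the BIGCHARSET
# compression described above.

def make_bigcharset(codepoints):
    """Return (count, index, chunks) for a set of BMP code points."""
    bits = bytearray(65536 // 8)          # one bit per BMP code point
    for cp in codepoints:
        assert cp <= 0xFFFF, "BIGCHARSET only covers the BMP"
        bits[cp >> 3] |= 1 << (cp & 7)

    chunks = []                           # unique 256-bit (32-byte) chunks
    seen = {}                             # chunk bytes -> chunk number
    index = bytearray(256)                # chunk number per original position
    for pos in range(256):
        chunk = bytes(bits[pos * 32:(pos + 1) * 32])
        if chunk not in seen:
            seen[chunk] = len(chunks)
            chunks.append(chunk)
        index[pos] = seen[chunk]
    return len(chunks), index, chunks

def bigcharset_contains(table, cp):
    # The more significant byte of the code point selects a chunk via
    # the index; the less significant byte is a bit index in the chunk.
    _, index, chunks = table
    chunk = chunks[index[cp >> 8]]
    return bool(chunk[(cp >> 3) & 31] & (1 << (cp & 7)))

table = make_bigcharset({0x41, 0x42, 0x0410, 0x0411})  # A, B, А, Б
```

Because most of the 256 positions map to a single shared all-zero chunk, a typical charset compresses to only a handful of unique chunks, which is the compression the comments call "normally good".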