Commit 38d03a5

Standalone emoji support (#202)
Closes #133 and #154. This decreases general performance by ~10% for complex CJK/combining and emoji strings.

- Regional Indicator pairs are measured as one 2-cell flag; an unpaired Regional Indicator is measured individually.
- Fitzpatrick skin tone modifiers are zero-width when following an emoji base, and wide (2 cells) when "standalone".
1 parent 8c8eac8 commit 38d03a5
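
The Regional Indicator rule in the commit message can be illustrated with a small, self-contained sketch. This is not the library's implementation: `ri_widths` and `is_regional_indicator` are hypothetical names, the helper only accepts strings made entirely of Regional Indicator symbols, and it assumes pairing proceeds left to right, which is consistent with the three- and four-indicator tests in this commit.

```python
from __future__ import annotations

RI_FIRST, RI_LAST = 0x1F1E6, 0x1F1FF  # REGIONAL INDICATOR SYMBOL LETTER A..Z


def is_regional_indicator(ch: str) -> bool:
    """Return True when ch is one of the 26 Regional Indicator symbols."""
    return RI_FIRST <= ord(ch) <= RI_LAST


def ri_widths(text: str) -> list[int]:
    """Per-code-point widths for a run of Regional Indicators (hypothetical helper).

    The first indicator of each pair counts 2 cells and the second counts 0,
    so a complete flag totals 2. A trailing unpaired indicator counts 2.
    """
    widths = []
    pair_open = False  # True when the previous indicator opened a pair
    for ch in text:
        if not is_regional_indicator(ch):
            raise ValueError(f"not a Regional Indicator: U+{ord(ch):04X}")
        widths.append(0 if pair_open else 2)
        pair_open = not pair_open
    return widths


print(ri_widths('\U0001F1FA\U0001F1F8'))                  # one flag
print(sum(ri_widths('\U0001F1FA\U0001F1F8\U0001F1E6')))   # flag + single
```
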

10 files changed

Lines changed: 217 additions & 30 deletions

.github/workflows/ci.yml

Lines changed: 6 additions & 2 deletions
@@ -108,14 +108,18 @@ jobs:
 
           python -Im pip install tox
 
-      - name: Build wheel
+      - name: Prepare sdist and source-dir
         shell: bash
         run: |
           python -Im pip install build
-          python -Im build --wheel
+          python -Im build
+
+          mkdir source-dir
+          tar -xzvf dist/wcwidth-*.tar.gz -C source-dir --strip-components=1
 
       - name: Fetch test data files
         if: matrix.python-version == '3.14' && matrix.os == 'ubuntu-latest'
+        continue-on-error: true
         shell: bash
         run: |
           python -Im tox -e fetch

docs/intro.rst

Lines changed: 5 additions & 4 deletions
@@ -455,10 +455,10 @@ languages.
 History
 =======
 
-0.5.2 *unreleased*
-  * **Bugfix** Specification and result of category ``Mc`` (`Spacing Combining Mark`_), approx. 443
-    codepoints, has a more nuanced specification_, and may be categorized as both zero or wide.
-    `PR #200`.
+0.5.2 *2026-01-29*
+  * **Bugfix** Measurement of category ``Mc`` (`Spacing Combining Mark`_), approx. 443, has a more
+    nuanced specification_, and may be categorized as either zero or wide. `PR #200`_.
+  * **Bugfix** Measurement of "standalone" modifiers and regional indicators, `PR #202`_.
   * **Updated** Data files used in some automatic tests are no longer distributed. `PR #199`_
 
 0.5.1 *2026-01-27*
@@ -666,6 +666,7 @@ https://www.cl.cam.ac.uk/~mgk25/ucs/wcwidth.c::
 .. _`PR #196`: https://github.com/jquast/wcwidth/pull/196
 .. _`PR #199`: https://github.com/jquast/wcwidth/pull/199
 .. _`PR #200`: https://github.com/jquast/wcwidth/pull/200
+.. _`PR #202`: https://github.com/jquast/wcwidth/pull/202
 .. _`Issue #101`: https://github.com/jquast/wcwidth/issues/101
 .. _`Issue #190`: https://github.com/jquast/wcwidth/issues/190
 .. _`jquast/blessed`: https://github.com/jquast/blessed

docs/specs.rst

Lines changed: 29 additions & 4 deletions
@@ -33,13 +33,22 @@ Any characters defined by `General Category`_ codes in `DerivedGeneralCategory.t
   `Prepended_Concatenation_Mark`_ characters, aprox. 147 characters.
 - 'Zl': `U+2028`_ LINE SEPARATOR only
 - 'Zp': `U+2029`_ PARAGRAPH SEPARATOR only
-- 'Sk': `Modifier Symbol`_, aprox. 4 characters of only those where phrase
-  ``'EMOJI MODIFIER'`` is present in comment of `UnicodeData.txt`_.
+- 'Sk': `Modifier Symbol`_, aprox. 1 character with ``'FULLWIDTH'`` in comment
+  of `UnicodeData.txt`_ (see `Width of 2`_). `Emoji Modifier`_ Fitzpatrick
+  symbols (`U+1F3FB`_ through `U+1F3FF`_) are zero-width only when following
+  an emoji base character in sequence; see `Width of 2`_ for standalone.
 
 The NULL character (`U+0000`_).
 
-Any character following ZWJ (`U+200D`_) when in sequence by
-function :func:`wcwidth.wcswidth`.
+Any character following ZWJ (`U+200D`_) when preceded by an emoji
+(`Extended_Pictographic`_ property) or `Regional Indicator`_ in sequence by
+function :func:`wcwidth.wcswidth`. When ZWJ follows a non-emoji character
+(including CJK), only the ZWJ itself is zero-width; the following character
+is measured normally.
+
+The second `Regional Indicator`_ symbol (`U+1F1E6`_ through `U+1F1FF`_) in a
+consecutive pair, when measured in sequence by :func:`wcwidth.wcswidth` or
+:func:`wcwidth.width`. The first indicator of the pair is `Width of 2`_.
 
 `Hangul Jamo`_ Jungseong and "Extended-B" code blocks, `U+1160`_ through
 `U+11FF`_ and `U+D7B0`_ through `U+D7FF`_.
@@ -62,6 +71,15 @@ Any character defined by `East Asian`_ Fullwidth (``F``) or Wide (``W``)
 properties in `EastAsianWidth.txt`_ files, except those that are defined by the
 Category code of `Nonspacing Mark`_ (``Mn``).
 
+`Regional Indicator`_ symbols (`U+1F1E6`_ through `U+1F1FF`_). Though
+classified as Neutral in `EastAsianWidth.txt`_, terminals universally render
+these as double-width. A consecutive pair of Regional Indicators forms a flag
+emoji and is measured as width 2 total (first indicator is 2, second is 0).
+
+`Emoji Modifier`_ Fitzpatrick symbols (`U+1F3FB`_ through `U+1F3FF`_) when
+measured standalone (not following an emoji base character). When following
+an emoji base, they combine with the base and add 0 to total width.
+
 Any characters of `Modifier Symbol`_ category, ``'Sk'`` where ``'FULLWIDTH'`` is
 present in comment of `UnicodeData.txt`_, aprox. 3 characters.
 
@@ -105,4 +123,11 @@ by a Nukta (``Mn``) and then a vowel sign (``Mc``) is measured as base + 1.
 .. _`U+D7FF`: https://codepoints.net/U+D7FF
 .. _`UnicodeData.txt`: https://www.unicode.org/Public/UCD/latest/ucd/UnicodeData.txt
 .. _`East Asian`: https://www.unicode.org/reports/tr11/
+.. _`U+1F1E6`: https://codepoints.net/U+1F1E6
+.. _`U+1F1FF`: https://codepoints.net/U+1F1FF
+.. _`U+1F3FB`: https://codepoints.net/U+1F3FB
+.. _`U+1F3FF`: https://codepoints.net/U+1F3FF
+.. _`Regional Indicator`: https://www.unicode.org/charts/PDF/U1F100.pdf
+.. _`Emoji Modifier`: https://unicode.org/reports/tr51/#Emoji_Modifiers
+.. _`Extended_Pictographic`: https://www.unicode.org/reports/tr51/#def_extended_pictographic
 .. _`Nonspacing Mark`: https://www.unicode.org/versions/latest/core-spec/chapter-4/#G134153
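
The context-dependent width of Fitzpatrick modifiers described in the specification changes above can be sketched in a few lines. This is a simplified illustration, not the library's code: `modifier_aware_width` and `is_emoji_base` are hypothetical names, and the "emoji base" test is a deliberately crude assumption (two pictographic code point ranges) standing in for a real emoji property lookup.

```python
FITZPATRICK = range(0x1F3FB, 0x1F400)  # U+1F3FB..U+1F3FF inclusive


def is_emoji_base(ch: str) -> bool:
    """Crude stand-in for an emoji-base check (assumption, not a Unicode property)."""
    cp = ord(ch)
    return 0x1F400 <= cp <= 0x1F64F or 0x1F900 <= cp <= 0x1F9FF


def modifier_aware_width(text: str) -> int:
    """Sum cell widths, treating a modifier as 0 after an emoji base, else 2."""
    total = 0
    prev = None
    for ch in text:
        if ord(ch) in FITZPATRICK:
            follows_base = prev is not None and is_emoji_base(prev)
            total += 0 if follows_base else 2
        else:
            # Simplification: emoji bases are 2 cells, everything else 1.
            total += 2 if is_emoji_base(ch) else 1
        prev = ch
    return total


print(modifier_aware_width('\U0001F3FB'))            # standalone modifier
print(modifier_aware_width('\U0001F469\U0001F3FB'))  # WOMAN + modifier
```

Tracking only the previous code point is enough for this rule, because a modifier combines with the character immediately before it.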

tests/test_benchmarks.py

Lines changed: 25 additions & 0 deletions
@@ -61,6 +61,31 @@ def test_wcswidth_emoji_sequence(benchmark):
     benchmark(wcwidth.wcswidth, text)
 
 
+# Regional Indicator benchmarks - paired flags and unpaired RI
+RI_FLAGS_PAIRED = '🇺🇸🇬🇧🇫🇷🇩🇪🇯🇵' * 100
+RI_FLAGS_UNPAIRED = '🇺🇸🇬🇧🇫' * 100
+
+
+def test_wcswidth_ri_flags_paired(benchmark):
+    """Benchmark wcswidth() with paired regional indicator flags."""
+    benchmark(wcwidth.wcswidth, RI_FLAGS_PAIRED)
+
+
+def test_wcswidth_ri_flags_unpaired(benchmark):
+    """Benchmark wcswidth() with mixed paired and unpaired regional indicators."""
+    benchmark(wcwidth.wcswidth, RI_FLAGS_UNPAIRED)
+
+
+def test_width_ri_flags_paired(benchmark):
+    """Benchmark width() with paired regional indicator flags."""
+    benchmark(wcwidth.width, RI_FLAGS_PAIRED)
+
+
+def test_width_ri_flags_unpaired(benchmark):
+    """Benchmark width() with mixed paired and unpaired regional indicators."""
+    benchmark(wcwidth.width, RI_FLAGS_UNPAIRED)
+
+
 # NFC vs NFD comparison - text with combining marks
 DIACRITICS_COMPOSED = 'café résumé naïve ' * 100
 DIACRITICS_DECOMPOSED = unicodedata.normalize('NFD', DIACRITICS_COMPOSED)
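
To see what the two Regional Indicator benchmark inputs contain, the following sketch rebuilds them from ASCII letters and counts code points. `regional_indicators` is a hypothetical helper, and the pair count assumes indicators are paired greedily from left to right across an unbroken run, as the three-indicator test in this commit suggests.

```python
RI_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def regional_indicators(letters: str) -> str:
    """Map ASCII letters A-Z to Regional Indicator symbols (hypothetical helper)."""
    return ''.join(chr(RI_A + ord(c) - ord('A')) for c in letters)


# Same content as the benchmark constants, spelled with ASCII letters.
RI_FLAGS_PAIRED = regional_indicators('USGBFRDEJP') * 100
RI_FLAGS_UNPAIRED = regional_indicators('USGBF') * 100

for name, text in (('paired', RI_FLAGS_PAIRED), ('unpaired', RI_FLAGS_UNPAIRED)):
    # Each string is a single unbroken run of indicators, so under greedy
    # left-to-right pairing the number of leftovers is just len(text) % 2.
    pairs, leftover = divmod(len(text), 2)
    print(name, len(text), pairs, leftover)
```

Under that pairing assumption, the repeated five-indicator string forms one run of 500 indicators, so the trailing indicator of each repetition would pair with the leading indicator of the next. If the intent is to exercise the unpaired path, separating repetitions with a non-indicator character may be worth considering.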

tests/test_emojis.py

Lines changed: 55 additions & 5 deletions
@@ -31,7 +31,7 @@ def emoji_zwj_sequence():
               "\u200d" # Joiner, Category Cf, East Asian Width property 'N' -- ZERO WIDTH JOINER
               "\U0001f4bb") # Fused, Category So, East Asian Width property 'W' -- PERSONAL COMPUTER
     # This test adapted from https://www.unicode.org/L2/L2023/23107-terminal-suppt.pdf
-    expect_length_each = (2, 0, 0, 2)
+    expect_length_each = (2, 2, 0, 2)
     expect_length_phrase = 2
 
     # exercise,
@@ -49,7 +49,7 @@ def test_unfinished_zwj_sequence():
     phrase = ("\U0001f469" # Base, Category So, East Asian Width property 'W' -- WOMAN
               "\U0001f3fb" # Modifier, Category Sk, East Asian Width property 'W' -- EMOJI MODIFIER FITZPATRICK TYPE-1-2
               "\u200d") # Joiner, Category Cf, East Asian Width property 'N' -- ZERO WIDTH JOINER
-    expect_length_each = (2, 0, 0)
+    expect_length_each = (2, 2, 0)
     expect_length_phrase = 2
 
     # exercise,
@@ -67,7 +67,7 @@ def test_non_recommended_zwj_sequence():
     phrase = ("\U0001f469" # Base, Category So, East Asian Width property 'W' -- WOMAN
               "\U0001f3fb" # Modifier, Category Sk, East Asian Width property 'W' -- EMOJI MODIFIER FITZPATRICK TYPE-1-2
               "\u200d") # Joiner, Category Cf, East Asian Width property 'N' -- ZERO WIDTH JOINER
-    expect_length_each = (2, 0, 0)
+    expect_length_each = (2, 2, 0)
     expect_length_phrase = 2
 
     # exercise,
@@ -87,7 +87,7 @@ def test_another_emoji_zwj_sequence():
               "\u200D" # ZERO WIDTH JOINER
               "\u2640" # FEMALE SIGN
               "\uFE0F") # VARIATION SELECTOR-16
-    expect_length_each = (1, 0, 0, 1, 0)
+    expect_length_each = (1, 2, 0, 1, 0)
     expect_length_phrase = 2
 
     # exercise,
@@ -120,7 +120,7 @@ def test_longer_emoji_zwj_sequence():
               "\U0001F3FD" # 'Sk', 'W' -- EMOJI MODIFIER FITZPATRICK TYPE-4
               ) * 2
     # This test adapted from https://www.unicode.org/L2/L2023/23107-terminal-suppt.pdf
-    expect_length_each = (2, 0, 0, 1, 0, 0, 2, 0, 2, 0) * 2
+    expect_length_each = (2, 2, 0, 1, 0, 0, 2, 0, 2, 2) * 2
     expect_length_phrase = 4
 
     # exercise,
@@ -191,6 +191,56 @@ def measure_all():
     assert len(sequences) >= 742
 
 
+@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
+def test_regional_indicator_single():
+    """Single Regional Indicator symbol is width 2."""
+    assert wcwidth.wcwidth('\U0001F1FA') == 2
+    assert wcwidth.wcswidth('\U0001F1FA') == 2
+
+
+@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
+def test_regional_indicator_pair():
+    """Flag pair (two Regional Indicators) is width 2, not 4."""
+    assert wcwidth.wcswidth('\U0001F1FA\U0001F1F8') == 2
+
+
+@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
+def test_regional_indicator_three():
+    """Three Regional Indicators: one pair (2) + one single (2) = 4."""
+    assert wcwidth.wcswidth('\U0001F1FA\U0001F1F8\U0001F1E6') == 4
+
+
+@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
+def test_regional_indicator_four():
+    """Four Regional Indicators: two pairs = 2 + 2 = 4."""
+    assert wcwidth.wcswidth(
+        '\U0001F1FA\U0001F1F8\U0001F1E6\U0001F1FA') == 4
+
+
+@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
+def test_zwj_after_non_emoji():
+    """ZWJ after non-emoji unconditionally consumes next character."""
+    # This does *not* match most terminal behavior -- it is a negative test,
+    # they fail because our library doesn't handle 'glitch' emoji as an
+    # optimization. Non-emoji + ZWJ is undefined per Unicode UAX #29 GB11.
+    assert wcwidth.wcswidth('xx\u200d\U0001F384') == 2
+    assert wcwidth.wcswidth('a\u200d\U0001F600') == 1
+    assert wcwidth.wcswidth('\u4e16\u200d\U0001F600') == 2
+
+
+@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
+def test_fitzpatrick_standalone():
+    """Standalone Fitzpatrick modifier is width 2."""
+    assert wcwidth.wcwidth('\U0001F3FB') == 2
+    assert wcwidth.wcswidth('\U0001F3FB') == 2
+
+
+@pytest.mark.skipif(NARROW_ONLY, reason="Test cannot verify on python 'narrow' builds")
+def test_fitzpatrick_after_emoji():
+    """Fitzpatrick modifier after emoji base combines, total width 2."""
+    assert wcwidth.wcswidth('\U0001F469\U0001F3FB') == 2
+
+
 def test_vs16_effect():
     """Verify effect of VS-16 (always active with latest Unicode version)."""
     phrase = ("\u2640" # FEMALE SIGN
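
The behavior asserted by `test_zwj_after_non_emoji` above, where a ZWJ contributes nothing and the code point after it is not counted, can be reproduced with a short standalone sketch. It is an illustration only: `zwj_consuming_width` and `base_width` are hypothetical names, and the base width is a simplification that uses only the East Asian Width property from the standard `unicodedata` module.

```python
import unicodedata

ZWJ = '\u200d'  # ZERO WIDTH JOINER


def base_width(ch: str) -> int:
    """Simplified width: 2 for East Asian Wide/Fullwidth, otherwise 1."""
    return 2 if unicodedata.east_asian_width(ch) in ('W', 'F') else 1


def zwj_consuming_width(text: str) -> int:
    """Sum widths, counting ZWJ as 0 and skipping the code point after it."""
    total = 0
    skip_next = False
    for ch in text:
        if skip_next:
            # The code point joined by the preceding ZWJ is not counted.
            skip_next = False
            continue
        if ch == ZWJ:
            skip_next = True
            continue
        total += base_width(ch)
    return total


print(zwj_consuming_width('xx\u200d\U0001F384'))
print(zwj_consuming_width('\u4e16\u200d\U0001F600'))
```

A single boolean flag is enough here because the rule only ever looks one code point ahead of the joiner.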

tests/test_width.py

Lines changed: 12 additions & 0 deletions
@@ -437,3 +437,15 @@ def test_soft_hyphen_exception():
     """U+00AD SOFT HYPHEN remains width 1 for ISO-8859-1 compatibility."""
     result = wcwidth.wcwidth('\u00AD')
     assert result == 1
+
+
+def test_fitzpatrick_modifier_after_emoji():
+    """Fitzpatrick modifier following emoji base adds zero-width in width()."""
+    result = wcwidth.width('\U0001F469\U0001F3FB')
+    assert result == 2
+
+
+def test_fitzpatrick_modifier_standalone_width():
+    """Standalone Fitzpatrick modifier, however, is wide character in width()."""
+    result = wcwidth.width('\U0001F3FB')
+    assert result == 2

tox.ini

Lines changed: 1 addition & 1 deletion
@@ -107,7 +107,7 @@ commands = python {toxinidir}/bin/update-tables.py {posargs:--fetch-all-versions
 basepython = python3.14
 usedevelop = true
 deps = -r requirements-update.txt
-commands = python {toxinidir}/bin/update-tables.py {posargs:--only-fetch}
+commands = - python {toxinidir}/bin/update-tables.py {posargs:--only-fetch}
 
 [testenv:autopep8]
 basepython = python3.14

wcwidth/table_wide.py

Lines changed: 2 additions & 0 deletions
@@ -90,6 +90,7 @@
         (0x1f0cf, 0x1f0cf,), # Playing Card Black Joker
         (0x1f18e, 0x1f18e,), # Negative Squared Ab
         (0x1f191, 0x1f19a,), # Squared Cl ..Squared Vs
+        (0x1f1e6, 0x1f1ff,), # Regional Indicator Symbo..Regional Indicator Symbo
         (0x1f200, 0x1f202,), # Square Hiragana Hoka ..Squared Katakana Sa
         (0x1f210, 0x1f23b,), # Squared Cjk Unified Ideo..Squared Cjk Unified Ideo
         (0x1f240, 0x1f248,), # Tortoise Shell Bracketed..Tortoise Shell Bracketed
@@ -104,6 +105,7 @@
         (0x1f3e0, 0x1f3f0,), # House Building ..European Castle
         (0x1f3f4, 0x1f3f4,), # Waving Black Flag
         (0x1f3f8, 0x1f3fa,), # Badminton Racquet And Sh..Amphora
+        (0x1f3fb, 0x1f3ff,), # Emoji Modifier Fitzpatri..Emoji Modifier Fitzpatri
         (0x1f400, 0x1f43e,), # Rat ..Paw Prints
         (0x1f440, 0x1f440,), # Eyes
         (0x1f442, 0x1f4fc,), # Ear ..Videocassette
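
A sorted table of inclusive `(start, end)` ranges like the one above lends itself to binary search. The sketch below shows one way to query such a table with the standard `bisect` module; `in_table` and `WIDE_SAMPLE` are hypothetical names, the sample holds only a handful of the ranges visible in this diff, and this is not presented as the lookup the library itself uses.

```python
from bisect import bisect_right

# A small excerpt of sorted, non-overlapping inclusive ranges from the diff.
WIDE_SAMPLE = (
    (0x1f18e, 0x1f18e),
    (0x1f191, 0x1f19a),
    (0x1f1e6, 0x1f1ff),  # Regional Indicator symbols (added in this commit)
    (0x1f200, 0x1f202),
    (0x1f3f8, 0x1f3fa),
    (0x1f3fb, 0x1f3ff),  # Fitzpatrick modifiers (added in this commit)
    (0x1f400, 0x1f43e),
)
_STARTS = [start for start, _ in WIDE_SAMPLE]


def in_table(cp: int, table=WIDE_SAMPLE, starts=_STARTS) -> bool:
    """Return True when code point cp falls inside any range of the table."""
    # Find the last range whose start is <= cp, then check its end.
    idx = bisect_right(starts, cp) - 1
    return idx >= 0 and table[idx][0] <= cp <= table[idx][1]


print(in_table(0x1F1FA))  # REGIONAL INDICATOR SYMBOL LETTER U
print(in_table(0x1F203))  # just past the 0x1f200..0x1f202 range
```

Precomputing the list of range starts keeps each query at O(log n) without rebuilding the key list on every call.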

wcwidth/table_zero.py

Lines changed: 3 additions & 1 deletion
@@ -345,7 +345,9 @@
         (0x1e6f5, 0x1e6f5,), # Tai Yo Sign Om
         (0x1e8d0, 0x1e8d6,), # Mende Kikakui Combining ..Mende Kikakui Combining
         (0x1e944, 0x1e94a,), # Adlam Alif Lengthener ..Adlam Nukta
-        (0x1f3fb, 0x1f3ff,), # Emoji Modifier Fitzpatri..Emoji Modifier Fitzpatri
+        # Emoji Modifier Fitzpatrick types (U+1F3FB..U+1F3FF) excluded:
+        # standalone they display as wide (2 cells), only zero-width
+        # when following an emoji base character in sequence.
         (0xe0000, 0xe0fff,), # (nil)
     ),
 }
