gh-66428 Stop including all bidirectional "B" characters in line breakers #132369

Open: wants to merge 6 commits into base: main
6 changes: 0 additions & 6 deletions Doc/library/stdtypes.rst
@@ -2323,12 +2323,6 @@ expression support in the :mod:`re` module).
 +-----------------------+-----------------------------+
 | ``\f`` or ``\x0c``    | Form Feed                   |
 +-----------------------+-----------------------------+
-| ``\x1c``              | File Separator              |
-+-----------------------+-----------------------------+
-| ``\x1d``              | Group Separator             |
-+-----------------------+-----------------------------+
-| ``\x1e``              | Record Separator            |
-+-----------------------+-----------------------------+
 | ``\x85``              | Next Line (C1 Control Code) |
 +-----------------------+-----------------------------+
 | ``\u2028``            | Line Separator              |
@@ -0,0 +1,4 @@
+Remove Unicode characters that have the bidirectional B property but are not
+mandatory line breakers (U+001C, U+001D and U+001E) from the list of
+line-breaking characters. ``str.splitlines()`` will not break on these
+characters any more.
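
Because this pull request is still open, whether ``str.splitlines()`` breaks on U+001C, U+001D and U+001E depends on the interpreter in use. The following is a minimal sketch for checking that at run time. The names `MANDATORY`, `REMOVED_BY_THIS_CHANGE`, `breaks_on` and `observed_breakers` are hypothetical helpers for illustration and are not part of the patch.

```python
# Candidate characters: the seven single-character mandatory breakers from
# the documented table, plus the three separators this change removes.
MANDATORY = ["\n", "\r", "\v", "\f", "\x85", "\u2028", "\u2029"]
REMOVED_BY_THIS_CHANGE = ["\x1c", "\x1d", "\x1e"]


def breaks_on(ch: str) -> bool:
    """Return True if str.splitlines() treats ch as a line boundary here."""
    return len(f"a{ch}b".splitlines()) == 2


def observed_breakers() -> list[str]:
    """List the candidates that the running interpreter actually splits on."""
    return [ch for ch in MANDATORY + REMOVED_BY_THIS_CHANGE if breaks_on(ch)]


if __name__ == "__main__":
    for ch in MANDATORY + REMOVED_BY_THIS_CHANGE:
        verdict = "breaks" if breaks_on(ch) else "does not break"
        print(f"U+{ord(ch):04X}: {verdict}")
```

On an interpreter without this change, all ten candidates should be reported as breakers; with the change applied, only the seven mandatory ones should remain.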
8 changes: 2 additions & 6 deletions Objects/unicodetype_db.h

Some generated files are not rendered by default.

5 changes: 2 additions & 3 deletions Tools/unicode/makeunicodedata.py
@@ -437,7 +437,7 @@ def makeunicodetype(unicode, trace):
                 flags |= ALPHA_MASK
             if "Lowercase" in properties:
                 flags |= LOWER_MASK
-            if 'Line_Break' in properties or bidirectional == "B":
+            if 'Line_Break' in properties:
                 flags |= LINEBREAK_MASK
                 linebreaks.append(char)
             if category == "Zs" or bidirectional in ("WS", "B", "S"):
@@ -603,8 +603,7 @@ def makeunicodetype(unicode, trace):

         # Generate code for _PyUnicode_IsLinebreak()
         fprint("/* Returns 1 for Unicode characters having the line break")
-        fprint(" * property 'BK', 'CR', 'LF' or 'NL' or having bidirectional")
-        fprint(" * type 'B', 0 otherwise.")
+        fprint(" * property 'BK', 'CR', 'LF' or 'NL', 0 otherwise.")
         fprint(" */")
         fprint('int _PyUnicode_IsLinebreak(const Py_UCS4 ch)')
         fprint('{')
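
The effect of dropping the `bidirectional == "B"` clause can be sketched with the standard `unicodedata` module. This is an illustration only, not the generator's code: `unicodedata` does not expose the Line_Break property, so the hard-coded `MANDATORY_BREAKS` set stands in for the classes BK, CR, LF and NL as an assumption, and the function names are hypothetical.

```python
import sys
import unicodedata

# Assumption: stand-in for the Line_Break classes BK, CR, LF and NL, which
# the unicodedata module does not expose.
MANDATORY_BREAKS = frozenset("\n\v\f\r\x85\u2028\u2029")


def old_is_linebreak(ch: str) -> bool:
    """Pre-change rule: mandatory break, or bidirectional type 'B'."""
    return ch in MANDATORY_BREAKS or unicodedata.bidirectional(ch) == "B"


def new_is_linebreak(ch: str) -> bool:
    """Post-change rule: mandatory break only."""
    return ch in MANDATORY_BREAKS


def reclassified() -> list[str]:
    """Code points the old rule accepts and the new rule rejects."""
    return [
        f"U+{cp:04X}"
        for cp in range(sys.maxunicode + 1)
        if old_is_linebreak(chr(cp)) and not new_is_linebreak(chr(cp))
    ]


if __name__ == "__main__":
    print(reclassified())
```

Under these assumptions the two rules differ only on the three separators named in the news entry; the other characters with bidirectional type B (such as U+000A and U+2029) are already mandatory breakers.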