Commit f98c3c5

redshiftzero authored and csabella committed
docs 36789: resolve incorrect note regarding UTF-8 (GH-13111)
1 parent af8646c commit f98c3c5

1 file changed

Lines changed: 10 additions & 5 deletions

Doc/howto/unicode.rst

@@ -135,17 +135,22 @@ used than UTF-8.) UTF-8 uses the following rules:
 UTF-8 has several convenient properties:

 1. It can handle any Unicode code point.
-2. A Unicode string is turned into a sequence of bytes containing no embedded zero
-   bytes. This avoids byte-ordering issues, and means UTF-8 strings can be
-   processed by C functions such as ``strcpy()`` and sent through protocols that
-   can't handle zero bytes.
+2. A Unicode string is turned into a sequence of bytes that contains embedded
+   zero bytes only where they represent the null character (U+0000). This means
+   that UTF-8 strings can be processed by C functions such as ``strcpy()`` and sent
+   through protocols that can't handle zero bytes for anything other than
+   end-of-string markers.
 3. A string of ASCII text is also valid UTF-8 text.
 4. UTF-8 is fairly compact; the majority of commonly used characters can be
    represented with one or two bytes.
 5. If bytes are corrupted or lost, it's possible to determine the start of the
    next UTF-8-encoded code point and resynchronize. It's also unlikely that
    random 8-bit data will look like valid UTF-8.
-
+6. UTF-8 is a byte oriented encoding. The encoding specifies that each
+   character is represented by a specific sequence of one or more bytes. This
+   avoids the byte-ordering issues that can occur with integer and word oriented
+   encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending
+   on the hardware on which the string was encoded.


 References
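
The two reworded properties (2 and 6) can be checked directly in Python. The following is a minimal sketch and not part of the commit; the helper name `zero_byte_offsets`, the sample strings, and the `encoded` dictionary are illustrative assumptions. It relies only on `str.encode()` with the standard codec names `utf-8`, `utf-16-le`, `utf-16-be`, `utf-32-le`, and `utf-32-be`.

```python
def zero_byte_offsets(text, encoding="utf-8"):
    """Return the offsets of zero bytes in the encoded form of text."""
    return [i for i, byte in enumerate(text.encode(encoding)) if byte == 0]


# Property 2: UTF-8 emits a zero byte only for U+0000 itself.
print(zero_byte_offsets("caf\u00e9 \u20ac"))   # []
print(zero_byte_offsets("a\x00b"))             # [1]

# UTF-16 embeds zero bytes even in plain ASCII text.
print(zero_byte_offsets("ab", "utf-16-le"))    # [1, 3]

# Property 6: UTF-8 defines one byte sequence per code point, while the
# UTF-16 and UTF-32 byte sequences depend on the chosen byte order.
EURO = "\u20ac"  # EURO SIGN
codec_names = ("utf-8", "utf-16-le", "utf-16-be", "utf-32-le", "utf-32-be")
encoded = {name: EURO.encode(name) for name in codec_names}

for name, data in encoded.items():
    print(f"{name:10} {data.hex(' ')}")
```

The little-endian and big-endian variants are named explicitly here so that the output does not depend on the machine running the example; the plain `utf-16` and `utf-32` codecs would instead prepend a byte order mark and use the platform's native order.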
