
Commit bfa36f5

committed
Marc-Andre Lemburg <[email protected]>:
Updated to version 1.5. Includes typo fixes by Andrew Kuchling and a new section on the default encoding.
1 parent 59a044b commit bfa36f5

1 file changed

Lines changed: 18 additions & 18 deletions

Misc/unicode.txt

@@ -19,11 +19,11 @@ due to the many different aspects of the Unicode-Python integration.

 The latest version of this document is always available at:

-http://starship.skyport.net/~lemburg/unicode-proposal.txt
+http://starship.python.net/~lemburg/unicode-proposal.txt

 Older versions are available as:

-http://starship.skyport.net/~lemburg/unicode-proposal-X.X.txt
+http://starship.python.net/~lemburg/unicode-proposal-X.X.txt


 Conventions:
@@ -101,7 +101,7 @@ of the source file (e.g. '# source file encoding: latin-1'). If you
 only use 7-bit ASCII then everything is fine and no such notice is
 needed, but if you include Latin-1 characters not defined in ASCII, it
 may well be worthwhile including a hint since people in other
-countries will want to be able to read you source strings too.
+countries will want to be able to read your source strings too.


 Unicode Type Object:
@@ -169,7 +169,7 @@ during coercion of strings to Unicode should not be masked and passed
 through to the user.

 In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
-should be coerced to Unicode before applying the test. Errors occuring
+should be coerced to Unicode before applying the test. Errors occurring
 during coercion (e.g. None in u'abc') should not be masked.


@@ -184,7 +184,7 @@ always coerce to the more precise format, i.e. Unicode objects.
 s + u := unicode(s) + u

 All string methods should delegate the call to an equivalent Unicode
-object method call by converting all envolved strings to Unicode and
+object method call by converting all involved strings to Unicode and
 then applying the arguments to the Unicode method of the same name,
 e.g.

@@ -199,7 +199,7 @@ Formatting Markers.
 Exceptions:
 -----------

-UnicodeError is defined in the exceptions module as subclass of
+UnicodeError is defined in the exceptions module as a subclass of
 ValueError. It is available at the C level via PyExc_UnicodeError.
 All exceptions related to Unicode encoding/decoding should be
 subclasses of UnicodeError.
@@ -268,7 +268,7 @@ Python should provide a few standard codecs for the most relevant
 encodings, e.g.

 'utf-8': 8-bit variable length encoding
-'utf-16': 16-bit variable length encoding (litte/big endian)
+'utf-16': 16-bit variable length encoding (little/big endian)
 'utf-16-le': utf-16 but explicitly little endian
 'utf-16-be': utf-16 but explicitly big endian
 'ascii': 7-bit ASCII codepage
@@ -284,7 +284,7 @@ Note: 'utf-16' should be implemented by using and requiring byte order
 marks (BOM) for file input/output.

 All other encodings such as the CJK ones to support Asian scripts
-should be implemented in seperate packages which do not get included
+should be implemented in separate packages which do not get included
 in the core Python distribution and are not a part of this proposal.


@@ -324,14 +324,14 @@ class Codec:
 """
 def encode(self,input,errors='strict'):

-""" Encodes the object intput and returns a tuple (output
+""" Encodes the object input and returns a tuple (output
 object, length consumed).

 errors defines the error handling to apply. It defaults to
 'strict' handling.

 The method may not store state in the Codec instance. Use
-SteamCodec for codecs which have to keep state in order to
+StreamCodec for codecs which have to keep state in order to
 make encoding/decoding efficient.

 """
@@ -350,7 +350,7 @@ class Codec:
 'strict' handling.

 The method may not store state in the Codec instance. Use
-SteamCodec for codecs which have to keep state in order to
+StreamCodec for codecs which have to keep state in order to
 make encoding/decoding efficient.

 """
@@ -490,7 +490,7 @@ class StreamReader(Codec):
 the line breaking knowledge from the underlying stream's
 .readline() method -- there is currently no support for
 line breaking using the codec decoder due to lack of line
-buffering. Sublcasses should however, if possible, try to
+buffering. Subclasses should however, if possible, try to
 implement this method using their own knowledge of line
 breaking.

@@ -527,7 +527,7 @@ class StreamReader(Codec):
 """ Resets the codec buffers used for keeping state.

 Note that no stream repositioning should take place.
-This method is primarely intended to be able to recover
+This method is primarily intended to be able to recover
 from decoding errors.

 """
@@ -553,7 +553,7 @@ interfaces, though.

 It is not required by the Unicode implementation to use these base
 classes, only the interfaces must match; this allows writing Codecs as
-extensions types.
+extension types.

 As guideline, large mapping tables should be implemented using static
 C data in separate (shared) extension modules. That way multiple
@@ -628,8 +628,8 @@ Private Code Point Areas:
 -------------------------

 Support for these is left to user land Codecs and not explicitly
-intergrated into the core. Note that due to the Internal Format being
-implemented, only the area between \uE000 and \uF8FF is useable for
+integrated into the core. Note that due to the Internal Format being
+implemented, only the area between \uE000 and \uF8FF is usable for
 private encodings.


@@ -649,14 +649,14 @@ provides access to about 64k characters and covers all characters in
 the Basic Multilingual Plane (BMP) of Unicode.

 It is the Codec's responsibility to ensure that the data they pass to
-the Unicode object constructor repects this assumption. The
+the Unicode object constructor respects this assumption. The
 constructor does not check the data for Unicode compliance or use of
 surrogates.

 Future implementations can extend the 32 bit restriction to the full
 set of all UTF-16 addressable characters (around 1M characters).

-The Unicode API should provide inteface routines from <PythonUnicode>
+The Unicode API should provide interface routines from <PythonUnicode>
 to the compiler's wchar_t which can be 16 or 32 bit depending on the
 compiler/libc/platform being used.

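
For readers on a current Python 3, here is a minimal sketch of three behaviours described in the changed text: the (output object, length consumed) return shape of a stateless encoder, the byte order mark written by 'utf-16' but not by 'utf-16-le', and UnicodeError being a subclass of ValueError. It relies on today's standard codecs module, which is an assumption layered on top of this proposal and not part of the commit. The sample strings and the helper name unicode_error_is_value_error are illustrative only.

```python
import codecs

# Stateless encoder with the (input, errors) -> (output, length consumed) shape.
encode_utf8 = codecs.getencoder("utf-8")
output, consumed = encode_utf8("h\u00e9llo", "strict")

# 'utf-16' prepends a byte order mark in native order; 'utf-16-le' does not.
with_bom = "a".encode("utf-16")
without_bom = "a".encode("utf-16-le")


def unicode_error_is_value_error():
    """Trigger a strict ASCII encoding failure and inspect the exception."""
    try:
        codecs.getencoder("ascii")("\u00e9", "strict")
    except UnicodeError as exc:
        # Assumption checked here: the raised error is also a ValueError.
        return isinstance(exc, ValueError)
    return False


if __name__ == "__main__":
    print(output, consumed)
    print(with_bom.startswith(codecs.BOM_UTF16), without_bom)
    print(unicode_error_is_value_error())
```

The length consumed counts input characters, not output bytes, so it can differ from len(output) whenever a character encodes to more than one byte.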
