Commit 10dfd4c

committed
M.-A. Lemburg <[email protected]>:
Updated to version 1.4.
1 parent e0243e2 commit 10dfd4c

1 file changed

Lines changed: 73 additions & 7 deletions

Misc/unicode.txt

@@ -1,5 +1,5 @@
 =============================================================================
-Python Unicode Integration Proposal Version: 1.3
+Python Unicode Integration Proposal Version: 1.4
 -----------------------------------------------------------------------------


@@ -162,6 +162,17 @@ encoding>.
 For the same reason, Unicode objects should return the same hash value
 as their UTF-8 equivalent strings.

+When compared using cmp() (or PyObject_Compare()) the implementation
+should mask TypeErrors raised during the conversion to remain in synch
+with the string behavior. All other errors such as ValueErrors raised
+during coercion of strings to Unicode should not be masked and should
+be passed through to the user.
+
+In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
+should be coerced to Unicode before applying the test. Errors occurring
+during coercion (e.g. None in u'abc') should not be masked.
+
+
 Coercion:
 ---------

@@ -380,6 +391,13 @@ class StreamWriter(Codec):
         data, consumed = self.encode(object,self.errors)
         self.stream.write(data)

+    def writelines(self, list):
+
+        """ Writes the concatenated list of strings to the stream
+            using .write().
+        """
+        self.write(''.join(list))
+
     def reset(self):

         """ Flushes and resets the codec buffers used for keeping state.
@@ -463,6 +481,47 @@ class StreamReader(Codec):
         else:
             return object

+    def readline(self, size=None):
+
+        """ Read one line from the input stream and return the
+            decoded data.
+
+            Note: Unlike the .readlines() method, this method inherits
+            the line breaking knowledge from the underlying stream's
+            .readline() method -- there is currently no support for
+            line breaking using the codec decoder due to lack of line
+            buffering. Subclasses should however, if possible, try to
+            implement this method using their own knowledge of line
+            breaking.
+
+            size, if given, is passed as size argument to the stream's
+            .readline() method.
+
+        """
+        if size is None:
+            line = self.stream.readline()
+        else:
+            line = self.stream.readline(size)
+        return self.decode(line)[0]
+
+    def readlines(self, sizehint=None):
+
+        """ Read all lines available on the input stream
+            and return them as list of lines.
+
+            Line breaks are implemented using the codec's decoder
+            method and are included in the list entries.
+
+            sizehint, if given, is passed as size argument to the
+            stream's .read() method.
+
+        """
+        if sizehint is None:
+            data = self.stream.read()
+        else:
+            data = self.stream.read(sizehint)
+        return self.decode(data)[0].splitlines(1)
+
     def reset(self):

         """ Resets the codec buffers used for keeping state.
@@ -482,9 +541,6 @@ class StreamReader(Codec):
         """
         return getattr(self.stream,name)

-XXX What about .readline(), .readlines() ? These could be implemented
-using .read() as generic functions instead of requiring their
-implementation by all codecs. Also see Line Breaks.

 Stream codec implementors are free to combine the StreamWriter and
 StreamReader interfaces into one class. Even combining all these with
@@ -692,9 +748,10 @@ Format markers are used in Python format strings. If Python strings
 are used as format strings, the following interpretations should be in
 effect:

-  '%s': '%s' does str(u) for Unicode objects embedded
-        in Python strings, so the output will be
-        u.encode(<default encoding>)
+  '%s': For Unicode objects this will cause coercion of the
+        whole format string to Unicode. Note that
+        you should use a Unicode format string to start
+        with for performance reasons.

 In case the format string is a Unicode object, all parameters are coerced
 to Unicode first and then put together and formatted according to the format
@@ -922,6 +979,9 @@ For comparison:
 Introducing Unicode to ECMAScript --
     http://www-4.ibm.com/software/developer/library/internationalization-support.html

+IANA Character Set Names:
+    ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
+
 Encodings:

 Overview:
@@ -944,6 +1004,12 @@ Encodings:

 History of this Proposal:
 -------------------------
+1.4: Added note about mixed type comparisons and contains tests.
+     Changed treatment of Unicode objects in format strings (if used
+     with '%s' % u they will now cause the format string to be
+     coerced to Unicode, thus producing a Unicode object on return).
+     Added link to IANA charset names (thanks to Lars Marius Garshol).
+     Added new codec methods .readline(), .readlines() and .writelines().
 1.3: Added new "es" and "es#" parser markers
 1.2: Removed POD about codecs.open()
 1.1: Added note about comparisons and hash values. Added note about
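
The three codec methods added in 1.4 can be tried out with the codecs module that ships with current Python, which exposes StreamReader and StreamWriter classes of the same names. The sketch below is only an illustration under that assumption: it runs against today's standard library rather than the draft interface in this proposal, and the sample text and variable names are made up for the example.

```python
import codecs
import io

# Hypothetical sample data: two lines of UTF-8 encoded text.
raw = "gr\u00fc\u00df\nworld\n".encode("utf-8")

# A StreamReader wraps a byte stream and decodes on read.
reader = codecs.getreader("utf-8")(io.BytesIO(raw))
first = reader.readline()    # one decoded line, line break included
rest = reader.readlines()    # remaining decoded lines, as a list

# A StreamWriter encodes on write; writelines() concatenates the
# strings and hands the result to .write().
sink = io.BytesIO()
writer = codecs.getwriter("utf-8")(sink)
writer.writelines([first] + rest)
round_tripped = sink.getvalue()
```

Reading the lines and writing them back through the matching writer should reproduce the original bytes, since the line breaks are kept in the list entries.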
