11=============================================================================
2- Python Unicode Integration Proposal Version: 1.3
2+ Python Unicode Integration Proposal Version: 1.4
33-----------------------------------------------------------------------------
44
55
@@ -162,6 +162,17 @@ encoding>.
162162For the same reason, Unicode objects should return the same hash value
163163as their UTF-8 equivalent strings.
164164
165+ When compared using cmp() (or PyObject_Compare()) the implementation
166+ should mask TypeErrors raised during the conversion to remain in sync
167+ with the string behavior. All other errors, such as ValueErrors raised
168+ during coercion of strings to Unicode, should not be masked but passed
169+ through to the user.
170+
171+ In containment tests ('a' in u'abc' and u'a' in 'abc') both sides
172+ should be coerced to Unicode before applying the test. Errors occurring
173+ during coercion (e.g. None in u'abc') should not be masked.
174+
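The comparison and containment rules above can be modelled with a small sketch. It is written for current Python, where bytes stands in for the 8-bit string type and str for Unicode. The helper names coerce_to_unicode(), contains() and compare(), the UTF-8 default, and the None fallback used when a TypeError is masked are all assumptions made for illustration; the proposal does not define them.

```python
def coerce_to_unicode(obj, encoding="utf-8"):
    """Hypothetical helper: return obj as str, decoding bytes if needed."""
    if isinstance(obj, str):
        return obj
    if isinstance(obj, bytes):
        # May raise UnicodeDecodeError, which is a subclass of ValueError.
        return obj.decode(encoding)
    raise TypeError("cannot coerce %s to Unicode" % type(obj).__name__)


def contains(container, item):
    """Containment test: coerce both sides first; no error is masked."""
    return coerce_to_unicode(item) in coerce_to_unicode(container)


def compare(left, right):
    """cmp()-style comparison that masks TypeError only.

    Returning None for "not comparable as Unicode" is an assumption of
    this sketch; the proposal does not say what the fallback is.
    """
    try:
        a = coerce_to_unicode(left)
        b = coerce_to_unicode(right)
    except TypeError:
        # Masked, mirroring the behavior of plain strings.
        return None
    # ValueError (e.g. undecodable bytes) is deliberately not caught.
    return (a > b) - (a < b)
```

With these definitions, contains("abc", None) raises TypeError, while compare("abc", None) returns the fallback value and compare("abc", b"\xff") lets the decoding error reach the caller.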
175+
165176Coercion:
166177---------
167178
@@ -380,6 +391,13 @@ class StreamWriter(Codec):
380391 data, consumed = self.encode(object,self.errors)
381392 self.stream.write(data)
382393
394+ def writelines(self, list):
395+
396+ """ Writes the concatenated list of strings to the stream
397+ using .write().
398+ """
399+ self.write(''.join(list))
400+
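As a usage illustration, the codecs module in current Python provides a StreamWriter whose .writelines() joins the list and hands it to .write(), matching the method added here. The sketch below assumes a UTF-8 codec and an in-memory io.BytesIO stream; both are choices made for the example only.

```python
import codecs
import io

# In-memory byte stream standing in for a file opened in binary mode.
raw = io.BytesIO()

# Wrap the stream in the UTF-8 StreamWriter supplied by the codecs module.
writer = codecs.getwriter("utf-8")(raw)

# The strings are concatenated and encoded in a single .write() call.
writer.writelines(["h\u00e9llo\n", "w\u00f6rld\n"])

encoded = raw.getvalue()
```

Note that, as with file objects, .writelines() does not add line separators; each entry must carry its own.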
383401 def reset(self):
384402
385403 """ Flushes and resets the codec buffers used for keeping state.
@@ -463,6 +481,47 @@ class StreamReader(Codec):
463481 else:
464482 return object
465483
484+ def readline(self, size=None):
485+
486+ """ Read one line from the input stream and return the
487+ decoded data.
488+
489+ Note: Unlike the .readlines() method, this method inherits
490+ the line breaking knowledge from the underlying stream's
491+ .readline() method -- there is currently no support for
492+ line breaking using the codec decoder due to lack of line
493+ buffering. Subclasses should however, if possible, try to
494+ implement this method using their own knowledge of line
495+ breaking.
496+
497+ size, if given, is passed as size argument to the stream's
498+ .readline() method.
499+
500+ """
501+ if size is None:
502+ line = self.stream.readline()
503+ else:
504+ line = self.stream.readline(size)
505+ return self.decode(line)[0]
506+
507+ def readlines(self, sizehint=None):
508+
509+ """ Read all lines available on the input stream
510+ and return them as a list of lines.
511+
512+ Line breaks are implemented using the codec's decoder
513+ method and are included in the list entries.
514+
515+ sizehint, if given, is passed as size argument to the
516+ stream's .read() method.
517+
518+ """
519+ if sizehint is None:
520+ data = self.stream.read()
521+ else:
522+ data = self.stream.read(sizehint)
523+ return self.decode(data)[0].splitlines(1)
524+
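A short usage sketch of the two reader methods, using the StreamReader that the codecs module in current Python supplies for UTF-8. The in-memory stream and the sample text are assumptions made for the example. Note that the present-day implementation buffers decoded data internally, so it is not a line-for-line match of the code in this proposal, but the interface is the same.

```python
import codecs
import io

# Three UTF-8 encoded lines in an in-memory byte stream.
data = "first\ncaf\u00e9\nthird\n".encode("utf-8")
reader = codecs.getreader("utf-8")(io.BytesIO(data))

# .readline() returns one decoded line, line break included.
first = reader.readline()

# .readlines() decodes the remaining input and splits it into lines,
# keeping the line breaks in the list entries.
rest = reader.readlines()
```

After these calls, first holds the first line and rest holds the remaining two, each still ending in its line break.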
466525 def reset(self):
467526
468527 """ Resets the codec buffers used for keeping state.
@@ -482,9 +541,6 @@ class StreamReader(Codec):
482541 """
483542 return getattr(self.stream,name)
484543
485- XXX What about .readline(), .readlines() ? These could be implemented
486- using .read() as generic functions instead of requiring their
487- implementation by all codecs. Also see Line Breaks.
488544
489545Stream codec implementors are free to combine the StreamWriter and
490546StreamReader interfaces into one class. Even combining all these with
@@ -692,9 +748,10 @@ Format markers are used in Python format strings. If Python strings
692748are used as format strings, the following interpretations should be in
693749effect:
694750
695- '%s': '%s' does str(u) for Unicode objects embedded
696- in Python strings, so the output will be
697- u.encode(<default encoding>)
751+ '%s': For Unicode objects this will cause coercion of the
752+ whole format string to Unicode. Note that
753+ you should use a Unicode format string to start
754+ with for performance reasons.
698755
699756In case the format string is a Unicode object, all parameters are coerced
700757to Unicode first and then put together and formatted according to the format
@@ -922,6 +979,9 @@ For comparison:
922979 Introducing Unicode to ECMAScript --
923980 http://www-4.ibm.com/software/developer/library/internationalization-support.html
924981
982+ IANA Character Set Names:
983+ ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets
984+
925985Encodings:
926986
927987 Overview:
@@ -944,6 +1004,12 @@ Encodings:
9441004
9451005History of this Proposal:
9461006-------------------------
1007+ 1.4: Added note about mixed type comparisons and contains tests.
1008+ Changed treatment of Unicode objects in format strings (if used
1009+ with '%s' % u they will now cause the format string to be
1010+ coerced to Unicode, thus producing a Unicode object on return).
1011+ Added link to IANA charset names (thanks to Lars Marius Garshol).
1012+ Added new codec methods .readline(), .readlines() and .writelines().
94710131.3: Added new "es" and "es#" parser markers
94810141.2: Removed POD about codecs.open()
94910151.1: Added note about comparisons and hash values. Added note about