Commit 9c1aed8

Close #7475: Restore binary & text transform codecs
The codecs themselves were restored in Python 3.2; this commit completes the restoration by adding back the convenience aliases. The aliases were originally left out because using them with the text-encoding-specific convenience methods produced confusing errors. Python 3.4 includes several improvements to those errors, which permits the aliases to be restored as well.
1 parent 12820c0 commit 9c1aed8

4 files changed

Lines changed: 142 additions & 80 deletions

Doc/library/codecs.rst

Lines changed: 70 additions & 46 deletions
@@ -1188,6 +1188,9 @@ common use case for codecs, the underlying codec infrastructure supports
 arbitrary data transforms rather than just text encodings). For asymmetric
 codecs, the stated purpose describes the encoding direction.
 
+Text Encodings
+^^^^^^^^^^^^^^
+
 The following codecs provide :class:`str` to :class:`bytes` encoding and
 :term:`bytes-like object` to :class:`str` decoding, similar to the Unicode text
 encodings.
@@ -1234,62 +1237,83 @@ encodings.
 |                    |         | .. deprecated:: 3.3       |
 +--------------------+---------+---------------------------+
 
-The following codecs provide :term:`bytes-like object` to :class:`bytes`
-mappings.
-
-
-.. tabularcolumns:: |l|L|L|
-
-+----------------------+------------------------------+------------------------------+
-| Codec                | Purpose                      | Encoder / decoder            |
-+======================+==============================+==============================+
-| base64_codec [#b64]_ | Convert operand to MIME      | :meth:`base64.b64encode` /   |
-|                      | base64 (the result always    | :meth:`base64.b64decode`     |
-|                      | includes a trailing          |                              |
-|                      | ``'\n'``)                    |                              |
-|                      |                              |                              |
-|                      | .. versionchanged:: 3.4      |                              |
-|                      |    accepts any               |                              |
-|                      |    :term:`bytes-like object` |                              |
-|                      |    as input for encoding and |                              |
-|                      |    decoding                  |                              |
-+----------------------+------------------------------+------------------------------+
-| bz2_codec            | Compress the operand         | :meth:`bz2.compress` /       |
-|                      | using bz2                    | :meth:`bz2.decompress`       |
-+----------------------+------------------------------+------------------------------+
-| hex_codec            | Convert operand to           | :meth:`base64.b16encode` /   |
-|                      | hexadecimal                  | :meth:`base64.b16decode`     |
-|                      | representation, with two     |                              |
-|                      | digits per byte              |                              |
-+----------------------+------------------------------+------------------------------+
-| quopri_codec         | Convert operand to MIME      | :meth:`quopri.encodestring` /|
-|                      | quoted printable             | :meth:`quopri.decodestring`  |
-+----------------------+------------------------------+------------------------------+
-| uu_codec             | Convert the operand using    | :meth:`uu.encode` /          |
-|                      | uuencode                     | :meth:`uu.decode`            |
-+----------------------+------------------------------+------------------------------+
-| zlib_codec           | Compress the operand         | :meth:`zlib.compress` /      |
-|                      | using gzip                   | :meth:`zlib.decompress`      |
-+----------------------+------------------------------+------------------------------+
+.. _binary-transforms:
+
+Binary Transforms
+^^^^^^^^^^^^^^^^^
+
+The following codecs provide binary transforms: :term:`bytes-like object`
+to :class:`bytes` mappings.
+
+
+.. tabularcolumns:: |l|L|L|L|
+
++----------------------+------------------+------------------------------+------------------------------+
+| Codec                | Aliases          | Purpose                      | Encoder / decoder            |
++======================+==================+==============================+==============================+
+| base64_codec [#b64]_ | base64, base_64  | Convert operand to MIME      | :meth:`base64.b64encode` /   |
+|                      |                  | base64 (the result always    | :meth:`base64.b64decode`     |
+|                      |                  | includes a trailing          |                              |
+|                      |                  | ``'\n'``)                    |                              |
+|                      |                  |                              |                              |
+|                      |                  | .. versionchanged:: 3.4      |                              |
+|                      |                  |    accepts any               |                              |
+|                      |                  |    :term:`bytes-like object` |                              |
+|                      |                  |    as input for encoding and |                              |
+|                      |                  |    decoding                  |                              |
++----------------------+------------------+------------------------------+------------------------------+
+| bz2_codec            | bz2              | Compress the operand         | :meth:`bz2.compress` /       |
+|                      |                  | using bz2                    | :meth:`bz2.decompress`       |
++----------------------+------------------+------------------------------+------------------------------+
+| hex_codec            | hex              | Convert operand to           | :meth:`base64.b16encode` /   |
+|                      |                  | hexadecimal                  | :meth:`base64.b16decode`     |
+|                      |                  | representation, with two     |                              |
+|                      |                  | digits per byte              |                              |
++----------------------+------------------+------------------------------+------------------------------+
+| quopri_codec         | quopri,          | Convert operand to MIME      | :meth:`quopri.encodestring` /|
+|                      | quotedprintable, | quoted printable             | :meth:`quopri.decodestring`  |
+|                      | quoted_printable |                              |                              |
++----------------------+------------------+------------------------------+------------------------------+
+| uu_codec             | uu               | Convert the operand using    | :meth:`uu.encode` /          |
+|                      |                  | uuencode                     | :meth:`uu.decode`            |
++----------------------+------------------+------------------------------+------------------------------+
+| zlib_codec           | zip, zlib        | Compress the operand         | :meth:`zlib.compress` /      |
+|                      |                  | using gzip                   | :meth:`zlib.decompress`      |
++----------------------+------------------+------------------------------+------------------------------+
 
 .. [#b64] In addition to :term:`bytes-like objects <bytes-like object>`,
    ``'base64_codec'`` also accepts ASCII-only instances of :class:`str` for
    decoding
 
+.. versionadded:: 3.2
+   Restoration of the binary transforms.
 
-The following codecs provide :class:`str` to :class:`str` mappings.
+.. versionchanged:: 3.4
+   Restoration of the aliases for the binary transforms.
 
-.. tabularcolumns:: |l|L|
 
-+--------------------+---------------------------+
-| Codec              | Purpose                   |
-+====================+===========================+
-| rot_13             | Returns the Caesar-cypher |
-|                    | encryption of the operand |
-+--------------------+---------------------------+
+.. _text-transforms:
+
+Text Transforms
+^^^^^^^^^^^^^^^
+
+The following codec provides a text transform: a :class:`str` to :class:`str`
+mapping.
+
+.. tabularcolumns:: |l|l|L|
+
++--------------------+---------+---------------------------+
+| Codec              | Aliases | Purpose                   |
++====================+=========+===========================+
+| rot_13             | rot13   | Returns the Caesar-cypher |
+|                    |         | encryption of the operand |
++--------------------+---------+---------------------------+
 
 .. versionadded:: 3.2
-   bytes-to-bytes and str-to-str codecs.
+   Restoration of the ``rot_13`` text transform.
+
+.. versionchanged:: 3.4
+   Restoration of the ``rot13`` alias.
 
 
 :mod:`encodings.idna` --- Internationalized Domain Names in Applications

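The reorganised tables split the non-text codecs by operand type: binary transforms map bytes-like input to bytes, while the text transform maps str to str. A minimal sketch of that distinction, assuming Python 3.4 or later so that the restored aliases are registered; the sample input b"hello" is arbitrary:

```python
import codecs

# Binary transforms: bytes-like input, bytes output.
hex_encoded = codecs.encode(b"hello", "hex")      # alias of hex_codec
b64_encoded = codecs.encode(b"hello", "base64")   # alias of base64_codec

# Text transform: str input, str output.
rot_encoded = codecs.encode("hello", "rot13")     # alias of rot_13

print(hex_encoded)  # b'68656c6c6f'
print(b64_encoded)  # b'aGVsbG8=\n' (trailing newline, as the table notes)
print(rot_encoded)  # uryyb

# Each transform is reversible; rot13 is its own inverse.
assert codecs.decode(hex_encoded, "hex") == b"hello"
assert codecs.decode(b64_encoded, "base_64") == b"hello"
assert codecs.decode(rot_encoded, "rot13") == "hello"
```

The base64 result ends in a newline, matching the base64_codec entry in the Binary Transforms table.
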
Doc/whatsnew/3.4.rst

Lines changed: 34 additions & 16 deletions
@@ -103,7 +103,8 @@ New expected features for Python implementations:
 * :ref:`PEP 446: Make newly created file descriptors non-inheritable <pep-446>`.
 * command line option for :ref:`isolated mode <using-on-misc-options>`,
   (:issue:`16499`).
-* improvements to handling of non-Unicode codecs
+* :ref:`improvements <codec-handling-improvements>` in the handling of
+  codecs that are not text encodings
 
 Significantly Improved Library Modules:
 
@@ -173,8 +174,10 @@ PEP 446: Make newly created file descriptors non-inheritable
 PEP written and implemented by Victor Stinner.
 
 
-Improvements to handling of non-Unicode codecs
-==============================================
+.. _codec-handling-improvements:
+
+Improvements to codec handling
+==============================
 
 Since it was first introduced, the :mod:`codecs` module has always been
 intended to operate as a type-neutral dynamic encoding and decoding
@@ -186,7 +189,7 @@ fact.
 As a key step in clarifying the situation, the :meth:`codecs.encode` and
 :meth:`codecs.decode` convenience functions are now properly documented in
 Python 2.7, 3.3 and 3.4. These functions have existed in the :mod:`codecs`
-module and have been covered by the regression test suite since Python 2.4,
+module (and have been covered by the regression test suite) since Python 2.4,
 but were previously only discoverable through runtime introspection.
 
 Unlike the convenience methods on :class:`str`, :class:`bytes` and
@@ -199,43 +202,58 @@ In Python 3.4, the interpreter is able to identify the known non-text
 encodings provided in the standard library and direct users towards these
 general purpose convenience functions when appropriate::
 
-    >>> import codecs
-
-    >>> b"abcdef".decode("hex_codec")
+    >>> b"abcdef".decode("hex")
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
-    LookupError: 'hex_codec' is not a text encoding; use codecs.decode() to handle arbitrary codecs
+    LookupError: 'hex' is not a text encoding; use codecs.decode() to handle arbitrary codecs
 
-    >>> "hello".encode("rot_13")
+    >>> "hello".encode("rot13")
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
-    LookupError: 'rot_13' is not a text encoding; use codecs.encode() to handle arbitrary codecs
+    LookupError: 'rot13' is not a text encoding; use codecs.encode() to handle arbitrary codecs
 
 In a related change, whenever it is feasible without breaking backwards
 compatibility, exceptions raised during encoding and decoding operations
 will be wrapped in a chained exception of the same type that mentions the
 name of the codec responsible for producing the error::
 
-    >>> codecs.decode(b"abcdefgh", "hex_codec")
+    >>> import codecs
+
+    >>> codecs.decode(b"abcdefgh", "hex")
     binascii.Error: Non-hexadecimal digit found
 
     The above exception was the direct cause of the following exception:
 
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
-    binascii.Error: decoding with 'hex_codec' codec failed (Error: Non-hexadecimal digit found)
+    binascii.Error: decoding with 'hex' codec failed (Error: Non-hexadecimal digit found)
 
-    >>> codecs.encode("hello", "bz2_codec")
+    >>> codecs.encode("hello", "bz2")
     TypeError: 'str' does not support the buffer interface
 
     The above exception was the direct cause of the following exception:
 
     Traceback (most recent call last):
       File "<stdin>", line 1, in <module>
-    TypeError: encoding with 'bz2_codec' codec failed (TypeError: 'str' does not support the buffer interface)
+    TypeError: encoding with 'bz2' codec failed (TypeError: 'str' does not support the buffer interface)
+
+Finally, as the examples above show, these improvements have permitted
+the restoration of the convenience aliases for the non-Unicode codecs that
+were themselves restored in Python 3.2. This means that encoding binary data
+to and from its hexadecimal representation (for example) can now be written
+as::
+
+    >>> from codecs import encode, decode
+    >>> encode(b"hello", "hex")
+    b'68656c6c6f'
+    >>> decode(b"68656c6c6f", "hex")
+    b'hello'
+
+The binary and text transforms provided in the standard library are detailed
+in :ref:`binary-transforms` and :ref:`text-transforms`.
 
-(Contributed by Nick Coghlan in :issue:`17827`, :issue:`17828` and
-:issue:`19619`)
+(Contributed by Nick Coghlan in :issue:`7475`, , :issue:`17827`,
+:issue:`17828` and :issue:`19619`)
 
 .. _pep-451:
 

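The What's New entry above describes two behaviours: the str and bytes convenience methods reject transform codecs with a LookupError that points at the codecs module, and errors raised inside a codec keep their original exception type. A sketch that checks both, assuming Python 3.4 or later. The exact wording of the wrapped codec error may differ between releases, so the sketch relies only on exception types and on the "is not a text encoding" phrase:

```python
import binascii
import codecs

bytes_decode_error = str_encode_error = invalid_hex_error = None

# str.encode() and bytes.decode() accept only text encodings; naming a
# transform codec there raises LookupError that points at codecs.*.
try:
    b"68656c6c6f".decode("hex")
except LookupError as exc:
    bytes_decode_error = exc
    print("bytes.decode:", exc)

try:
    "hello".encode("rot13")
except LookupError as exc:
    str_encode_error = exc
    print("str.encode:", exc)

# The general-purpose functions accept any registered codec or alias.
decoded = codecs.decode(b"68656c6c6f", "hex")
rotated = codecs.encode("hello", "rot13")
print(decoded, rotated)  # b'hello' uryyb

# A failure inside the codec keeps its original type, so existing handlers
# for binascii.Error (a ValueError subclass) continue to work.
try:
    codecs.decode(b"abcdefgh", "hex")
except binascii.Error as exc:
    invalid_hex_error = exc
    print("codecs.decode:", exc)
```
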
Lib/encodings/aliases.py

Lines changed: 18 additions & 18 deletions
@@ -33,9 +33,9 @@
     'us'                 : 'ascii',
     'us_ascii'           : 'ascii',
 
-    ## base64_codec codec
-    #'base64'             : 'base64_codec',
-    #'base_64'            : 'base64_codec',
+    # base64_codec codec
+    'base64'             : 'base64_codec',
+    'base_64'            : 'base64_codec',
 
     # big5 codec
     'big5_tw'            : 'big5',
@@ -45,8 +45,8 @@
     'big5_hkscs'         : 'big5hkscs',
     'hkscs'              : 'big5hkscs',
 
-    ## bz2_codec codec
-    #'bz2'                : 'bz2_codec',
+    # bz2_codec codec
+    'bz2'                : 'bz2_codec',
 
     # cp037 codec
     '037'                : 'cp037',
@@ -248,8 +248,8 @@
     'cp936'              : 'gbk',
     'ms936'              : 'gbk',
 
-    ## hex_codec codec
-    #'hex'                : 'hex_codec',
+    # hex_codec codec
+    'hex'                : 'hex_codec',
 
     # hp_roman8 codec
     'roman8'             : 'hp_roman8',
@@ -450,13 +450,13 @@
     'cp154'              : 'ptcp154',
     'cyrillic_asian'     : 'ptcp154',
 
-    ## quopri_codec codec
-    #'quopri'             : 'quopri_codec',
-    #'quoted_printable'   : 'quopri_codec',
-    #'quotedprintable'    : 'quopri_codec',
+    # quopri_codec codec
+    'quopri'             : 'quopri_codec',
+    'quoted_printable'   : 'quopri_codec',
+    'quotedprintable'    : 'quopri_codec',
 
-    ## rot_13 codec
-    #'rot13'              : 'rot_13',
+    # rot_13 codec
+    'rot13'              : 'rot_13',
 
     # shift_jis codec
     'csshiftjis'         : 'shift_jis',
@@ -518,12 +518,12 @@
     'utf8_ucs2'          : 'utf_8',
     'utf8_ucs4'          : 'utf_8',
 
-    ## uu_codec codec
-    #'uu'                 : 'uu_codec',
+    # uu_codec codec
+    'uu'                 : 'uu_codec',
 
-    ## zlib_codec codec
-    #'zip'                : 'zlib_codec',
-    #'zlib'               : 'zlib_codec',
+    # zlib_codec codec
+    'zip'                : 'zlib_codec',
+    'zlib'               : 'zlib_codec',
 
     # temporary mac CJK aliases, will be replaced by proper codecs in 3.1
     'x_mac_japanese'     : 'shift_jis',

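Because the aliases.py change only uncomments dictionary entries, the restored names can be read back from encodings.aliases.aliases, the mapping that the standard encodings search function consults. A sketch that inverts that mapping for the transform codecs; the transform_codecs set is written out by hand for illustration and is not part of the commit:

```python
from encodings.aliases import aliases

# Canonical transform codec modules whose aliases this commit uncommented
# (hand-written list for illustration).
transform_codecs = {
    "base64_codec", "bz2_codec", "hex_codec", "quopri_codec",
    "rot_13", "uu_codec", "zlib_codec",
}

# Invert the alias -> codec mapping for just those codecs.
restored = {}
for alias, target in sorted(aliases.items()):
    if target in transform_codecs:
        restored.setdefault(target, []).append(alias)

for target, names in sorted(restored.items()):
    print(f"{target}: {', '.join(names)}")
```

On an interpreter without this change, the transform codecs would be missing from the output entirely, since their entries were commented out.
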
Lib/test/test_codecs.py

Lines changed: 20 additions & 0 deletions
@@ -2320,18 +2320,29 @@ def test_seek0(self):
     "quopri_codec",
     "hex_codec",
 ]
+
+transform_aliases = {
+    "base64_codec": ["base64", "base_64"],
+    "uu_codec": ["uu"],
+    "quopri_codec": ["quopri", "quoted_printable", "quotedprintable"],
+    "hex_codec": ["hex"],
+    "rot_13": ["rot13"],
+}
+
 try:
     import zlib
 except ImportError:
     pass
 else:
     bytes_transform_encodings.append("zlib_codec")
+    transform_aliases["zlib_codec"] = ["zip", "zlib"]
 try:
     import bz2
 except ImportError:
     pass
 else:
     bytes_transform_encodings.append("bz2_codec")
+    transform_aliases["bz2_codec"] = ["bz2"]
 
 class TransformCodecTest(unittest.TestCase):
 
@@ -2445,6 +2456,15 @@ def test_custom_hex_error_is_wrapped(self):
     # Unfortunately, the bz2 module throws OSError, which the codec
     # machinery currently can't wrap :(
 
+    # Ensure codec aliases from http://bugs.python.org/issue7475 work
+    def test_aliases(self):
+        for codec_name, aliases in transform_aliases.items():
+            expected_name = codecs.lookup(codec_name).name
+            for alias in aliases:
+                with self.subTest(alias=alias):
+                    info = codecs.lookup(alias)
+                    self.assertEqual(info.name, expected_name)
+
 
 # The codec system tries to wrap exceptions in order to ensure the error
 # mentions the operation being performed and the codec involved. We

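test_aliases checks only that each alias resolves to the same codec as its canonical name. A companion round-trip check might look like the following hypothetical test, which is not part of this commit. ROUND_TRIP_ALIASES, SAMPLE and TransformAliasRoundTrip are illustrative names, and the uu and bz2 codecs are left out to keep the sketch short:

```python
import codecs
import unittest

# Hypothetical alias groups, mirroring the shape of transform_aliases above.
ROUND_TRIP_ALIASES = {
    "base64_codec": ["base64", "base_64"],
    "hex_codec": ["hex"],
    "quopri_codec": ["quopri", "quoted_printable", "quotedprintable"],
    "zlib_codec": ["zip", "zlib"],
}

# Arbitrary sample containing spaces, NUL, a high byte and a newline.
SAMPLE = b"binary \x00\xff data\n"


class TransformAliasRoundTrip(unittest.TestCase):
    def test_binary_aliases_round_trip(self):
        for codec_name, aliases in ROUND_TRIP_ALIASES.items():
            for alias in aliases:
                with self.subTest(alias=alias):
                    encoded = codecs.encode(SAMPLE, alias)
                    # An alias must produce the same output as its canonical name.
                    self.assertEqual(encoded, codecs.encode(SAMPLE, codec_name))
                    # Decoding through the alias must restore the input.
                    self.assertEqual(codecs.decode(encoded, alias), SAMPLE)

    def test_rot13_alias_round_trip(self):
        # rot13 is a str-to-str transform and is its own inverse.
        self.assertEqual(codecs.encode("hello", "rot13"), "uryyb")
        self.assertEqual(codecs.decode("uryyb", "rot13"), "hello")
```

Saved to a file, it can be run with python -m unittest <file>.
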