Commit d9dfaa9

Issue #6137: The pickle module now translates module names when loading
or dumping pickles with a 2.x-compatible protocol, in order to make data sharing and migration easier. This behaviour can be disabled using the new `fix_imports` optional argument.
1 parent 751899a commit d9dfaa9
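
The behaviour described in the commit message can be observed from the public `pickle` API. The snippet below is a minimal sketch, assuming a current Python 3 interpreter; the variable names are illustrative and not part of the commit.

```python
import pickle

# Protocol 2 is readable by Python 2.x, so module-name translation applies.
translated = pickle.dumps(len, protocol=2)  # fix_imports defaults to True
untranslated = pickle.dumps(len, protocol=2, fix_imports=False)

# The translated stream refers to the Python 2.x module name.
print(b"__builtin__" in translated)
print(b"builtins" in untranslated)

# Protocol 3 is specific to Python 3.x, so the flag makes no difference there.
same_at_proto3 = (
    pickle.dumps(len, protocol=3)
    == pickle.dumps(len, protocol=3, fix_imports=False)
)
print(same_at_proto3)

# Both protocol 2 streams still load back to the same object on Python 3.
print(pickle.loads(translated) is len, pickle.loads(untranslated) is len)
```

Passing `fix_imports=False` is mainly useful when a protocol 2 pickle will only ever be read by Python 3.x and the 3.x module names should be kept in the stream.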

8 files changed

Lines changed: 532 additions & 157 deletions

Doc/library/pickle.rst

Lines changed: 36 additions & 15 deletions
@@ -141,7 +141,7 @@ an unpickler, then you call the unpickler's :meth:`load` method. The
 The :mod:`pickle` module provides the following functions to make the pickling
 process more convenient:

-.. function:: dump(obj, file[, protocol])
+.. function:: dump(obj, file[, protocol, \*, fix_imports=True])

    Write a pickled representation of *obj* to the open file object *file*. This
    is equivalent to ``Pickler(file, protocol).dump(obj)``.
@@ -158,7 +158,11 @@ process more convenient:
    argument. It can thus be a file object opened for binary writing, a
    io.BytesIO instance, or any other custom object that meets this interface.

-.. function:: dumps(obj[, protocol])
+   If *fix_imports* is True and *protocol* is less than 3, pickle will try to
+   map the new Python 3.x names to the old module names used in Python 2.x,
+   so that the pickle data stream is readable with Python 2.x.
+
+.. function:: dumps(obj[, protocol, \*, fix_imports=True])

    Return the pickled representation of the object as a :class:`bytes`
    object, instead of writing it to a file.
@@ -171,7 +175,11 @@ process more convenient:
    supported. The higher the protocol used, the more recent the version of
    Python needed to read the pickle produced.

-.. function:: load(file, [\*, encoding="ASCII", errors="strict"])
+   If *fix_imports* is True and *protocol* is less than 3, pickle will try to
+   map the new Python 3.x names to the old module names used in Python 2.x,
+   so that the pickle data stream is readable with Python 2.x.
+
+.. function:: load(file, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

    Read a pickled object representation from the open file object *file* and
    return the reconstituted object hierarchy specified therein. This is
@@ -187,11 +195,14 @@ process more convenient:
    for reading, a BytesIO object, or any other custom object that meets this
    interface.

-   Optional keyword arguments are encoding and errors, which are used to decode
-   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
-   'strict', respectively.
+   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
+   which are used to control compatibility support for pickle streams generated
+   by Python 2.x. If *fix_imports* is True, pickle will try to map the old
+   Python 2.x names to the new names used in Python 3.x. The *encoding* and
+   *errors* tell pickle how to decode 8-bit string instances pickled by Python
+   2.x; these default to 'ASCII' and 'strict', respectively.

-.. function:: loads(bytes_object, [\*, encoding="ASCII", errors="strict"])
+.. function:: loads(bytes_object, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

    Read a pickled object hierarchy from a :class:`bytes` object and return the
    reconstituted object hierarchy specified therein
@@ -200,9 +211,12 @@ process more convenient:
    argument is needed. Bytes past the pickled object's representation are
    ignored.

-   Optional keyword arguments are encoding and errors, which are used to decode
-   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
-   'strict', respectively.
+   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
+   which are used to control compatibility support for pickle streams generated
+   by Python 2.x. If *fix_imports* is True, pickle will try to map the old
+   Python 2.x names to the new names used in Python 3.x. The *encoding* and
+   *errors* tell pickle how to decode 8-bit string instances pickled by Python
+   2.x; these default to 'ASCII' and 'strict', respectively.


 The :mod:`pickle` module defines three exceptions:
@@ -233,7 +247,7 @@ The :mod:`pickle` module defines three exceptions:
 The :mod:`pickle` module exports two classes, :class:`Pickler` and
 :class:`Unpickler`:

-.. class:: Pickler(file[, protocol])
+.. class:: Pickler(file[, protocol, \*, fix_imports=True])

    This takes a binary file for writing a pickle data stream.

@@ -249,6 +263,10 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
    argument. It can thus be a file object opened for binary writing, a
    io.BytesIO instance, or any other custom object that meets this interface.

+   If *fix_imports* is True and *protocol* is less than 3, pickle will try to
+   map the new Python 3.x names to the old module names used in Python 2.x,
+   so that the pickle data stream is readable with Python 2.x.
+
    .. method:: dump(obj)

       Write a pickled representation of *obj* to the open file object given in
@@ -277,7 +295,7 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
       Use :func:`pickletools.optimize` if you need more compact pickles.


-.. class:: Unpickler(file, [\*, encoding="ASCII", errors="strict"])
+.. class:: Unpickler(file, [\*, fix_imports=True, encoding="ASCII", errors="strict"])

    This takes a binary file for reading a pickle data stream.

@@ -290,9 +308,12 @@ The :mod:`pickle` module exports two classes, :class:`Pickler` and
    for reading, a BytesIO object, or any other custom object that meets this
    interface.

-   Optional keyword arguments are encoding and errors, which are used to decode
-   8-bit string instances pickled by Python 2.x. These default to 'ASCII' and
-   'strict', respectively.
+   Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
+   which are used to control compatibility support for pickle streams generated
+   by Python 2.x. If *fix_imports* is True, pickle will try to map the old
+   Python 2.x names to the new names used in Python 3.x. The *encoding* and
+   *errors* tell pickle how to decode 8-bit string instances pickled by Python
+   2.x; these default to 'ASCII' and 'strict', respectively.

    .. method:: load()
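
The loading-side keyword arguments documented in this file can be tried out on small hand-written streams. This is a sketch under stated assumptions: the two byte strings are illustrative stand-ins for Python 2.x output (they are not taken from the commit), and the interpreter is assumed to be a current Python 3.

```python
import pickle

# A minimal stream of the kind Python 2.x produces for the built-in
# ``xrange`` type: GLOBAL opcode, module line, name line, STOP.
PY2_GLOBAL = b"c__builtin__\nxrange\n."

# A Python 2.x 8-bit ``str`` holding Latin-1 bytes:
# SHORT_BINSTRING opcode, one length byte, four payload bytes, STOP.
PY2_STR = b"U\x04caf\xe9."

# Default fix_imports=True: ('__builtin__', 'xrange') resolves to builtins.range.
resolved = pickle.loads(PY2_GLOBAL)

# With translation disabled, the 2.x module name is imported as-is and fails.
try:
    pickle.loads(PY2_GLOBAL, fix_imports=False)
    import_failed = False
except ImportError:
    import_failed = True

# The default encoding="ASCII" cannot decode byte 0xE9, so name the real one.
decoded = pickle.loads(PY2_STR, encoding="latin-1")

print(resolved is range, import_failed, decoded)
```

As with any pickle data, streams from untrusted sources should not be loaded; the flags above only affect name translation and string decoding.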

Lib/_compat_pickle.py

Lines changed: 81 additions & 0 deletions
@@ -0,0 +1,81 @@
+# This module is used to map the old Python 2 names to the new names used in
+# Python 3 for the pickle module. This is needed to make pickle streams
+# generated with Python 2 loadable by Python 3.
+
+# This is a copy of lib2to3.fixes.fix_imports.MAPPING. We cannot import
+# lib2to3 and use the mapping defined there, because lib2to3 uses pickle.
+# Thus, this could cause the module to be imported recursively.
+IMPORT_MAPPING = {
+    'StringIO': 'io',
+    'cStringIO': 'io',
+    'cPickle': 'pickle',
+    '__builtin__' : 'builtins',
+    'copy_reg': 'copyreg',
+    'Queue': 'queue',
+    'SocketServer': 'socketserver',
+    'ConfigParser': 'configparser',
+    'repr': 'reprlib',
+    'FileDialog': 'tkinter.filedialog',
+    'tkFileDialog': 'tkinter.filedialog',
+    'SimpleDialog': 'tkinter.simpledialog',
+    'tkSimpleDialog': 'tkinter.simpledialog',
+    'tkColorChooser': 'tkinter.colorchooser',
+    'tkCommonDialog': 'tkinter.commondialog',
+    'Dialog': 'tkinter.dialog',
+    'Tkdnd': 'tkinter.dnd',
+    'tkFont': 'tkinter.font',
+    'tkMessageBox': 'tkinter.messagebox',
+    'ScrolledText': 'tkinter.scrolledtext',
+    'Tkconstants': 'tkinter.constants',
+    'Tix': 'tkinter.tix',
+    'ttk': 'tkinter.ttk',
+    'Tkinter': 'tkinter',
+    'markupbase': '_markupbase',
+    '_winreg': 'winreg',
+    'thread': '_thread',
+    'dummy_thread': '_dummy_thread',
+    'dbhash': 'dbm.bsd',
+    'dumbdbm': 'dbm.dumb',
+    'dbm': 'dbm.ndbm',
+    'gdbm': 'dbm.gnu',
+    'xmlrpclib': 'xmlrpc.client',
+    'DocXMLRPCServer': 'xmlrpc.server',
+    'SimpleXMLRPCServer': 'xmlrpc.server',
+    'httplib': 'http.client',
+    'htmlentitydefs' : 'html.entities',
+    'HTMLParser' : 'html.parser',
+    'Cookie': 'http.cookies',
+    'cookielib': 'http.cookiejar',
+    'BaseHTTPServer': 'http.server',
+    'SimpleHTTPServer': 'http.server',
+    'CGIHTTPServer': 'http.server',
+    'test.test_support': 'test.support',
+    'commands': 'subprocess',
+    'UserString' : 'collections',
+    'UserList' : 'collections',
+    'urlparse' : 'urllib.parse',
+    'robotparser' : 'urllib.robotparser',
+    'whichdb': 'dbm',
+    'anydbm': 'dbm'
+}
+
+
+# This contains rename rules that are easy to handle. We ignore the more
+# complex stuff (e.g. mapping the names in the urllib and types modules).
+# These rules should be run before import names are fixed.
+NAME_MAPPING = {
+    ('__builtin__', 'xrange'): ('builtins', 'range'),
+    ('__builtin__', 'reduce'): ('functools', 'reduce'),
+    ('__builtin__', 'intern'): ('sys', 'intern'),
+    ('__builtin__', 'unichr'): ('builtins', 'chr'),
+    ('__builtin__', 'basestring'): ('builtins', 'str'),
+    ('__builtin__', 'long'): ('builtins', 'int'),
+    ('itertools', 'izip'): ('builtins', 'zip'),
+    ('itertools', 'imap'): ('builtins', 'map'),
+    ('itertools', 'ifilter'): ('builtins', 'filter'),
+    ('itertools', 'ifilterfalse'): ('itertools', 'filterfalse'),
+}
+
+# Same, but for 3.x to 2.x
+REVERSE_IMPORT_MAPPING = dict((v, k) for (k, v) in IMPORT_MAPPING.items())
+REVERSE_NAME_MAPPING = dict((v, k) for (k, v) in NAME_MAPPING.items())
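
One property of building the reverse tables by inverting the forward ones is worth keeping in mind: when several 2.x names map to the same 3.x name (for example `StringIO` and `cStringIO`, which both map to `io`), the inverted dictionary keeps only one of them, and which one depends on dict iteration order. The sketch below illustrates this with a three-entry excerpt copied from the table above; `FORWARD`, `REVERSE` and `find_collisions` are hypothetical names used only for this illustration.

```python
from collections import defaultdict

# Small excerpt of the forward table from the diff (2.x name -> 3.x name).
FORWARD = {
    "StringIO": "io",
    "cStringIO": "io",
    "httplib": "http.client",
}

# Same inversion idiom as the module uses for its reverse tables.
REVERSE = dict((v, k) for (k, v) in FORWARD.items())


def find_collisions(mapping):
    """Return {new_name: [old_names]} for targets shared by several sources."""
    sources = defaultdict(list)
    for old, new in mapping.items():
        sources[new].append(old)
    return {new: olds for new, olds in sources.items() if len(olds) > 1}


# The inverted table is shorter than the forward one: an entry was dropped.
print(len(FORWARD), len(REVERSE))
print(find_collisions(FORWARD))
```

A helper like this can be run against the full tables to list every 3.x name whose 2.x counterpart is ambiguous when dumping.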

Lib/pickle.py

Lines changed: 40 additions & 16 deletions
@@ -34,6 +34,7 @@
 import re
 import io
 import codecs
+import _compat_pickle

 __all__ = ["PickleError", "PicklingError", "UnpicklingError", "Pickler",
            "Unpickler", "dump", "dumps", "load", "loads"]
@@ -171,12 +172,11 @@ def __init__(self, value):

 __all__.extend([x for x in dir() if re.match("[A-Z][A-Z0-9_]+$",x)])

-
 # Pickling machinery

 class _Pickler:

-    def __init__(self, file, protocol=None):
+    def __init__(self, file, protocol=None, *, fix_imports=True):
         """This takes a binary file for writing a pickle data stream.

         The optional protocol argument tells the pickler to use the
@@ -193,6 +193,10 @@ def __init__(self, file, protocol=None):
         bytes argument. It can thus be a file object opened for binary
         writing, a io.BytesIO instance, or any other custom object that
         meets this interface.
+
+        If fix_imports is True and protocol is less than 3, pickle will try to
+        map the new Python 3.x names to the old module names used in Python
+        2.x, so that the pickle data stream is readable with Python 2.x.
         """
         if protocol is None:
             protocol = DEFAULT_PROTOCOL
@@ -208,6 +212,7 @@ def __init__(self, file, protocol=None):
         self.proto = int(protocol)
         self.bin = protocol >= 1
         self.fast = 0
+        self.fix_imports = fix_imports and protocol < 3

     def clear_memo(self):
         """Clears the pickler's "memo".
@@ -698,6 +703,11 @@ def save_global(self, obj, name=None, pack=struct.pack):
             write(GLOBAL + bytes(module, "utf-8") + b'\n' +
                   bytes(name, "utf-8") + b'\n')
         else:
+            if self.fix_imports:
+                if (module, name) in _compat_pickle.REVERSE_NAME_MAPPING:
+                    module, name = _compat_pickle.REVERSE_NAME_MAPPING[(module, name)]
+                if module in _compat_pickle.REVERSE_IMPORT_MAPPING:
+                    module = _compat_pickle.REVERSE_IMPORT_MAPPING[module]
             try:
                 write(GLOBAL + bytes(module, "ascii") + b'\n' +
                       bytes(name, "ascii") + b'\n')
@@ -766,7 +776,8 @@ def whichmodule(func, funcname):

 class _Unpickler:

-    def __init__(self, file, *, encoding="ASCII", errors="strict"):
+    def __init__(self, file, *, fix_imports=True,
+                 encoding="ASCII", errors="strict"):
         """This takes a binary file for reading a pickle data stream.

         The protocol version of the pickle is detected automatically, so no
@@ -779,15 +790,21 @@ def __init__(self, file, *, encoding="ASCII", errors="strict"):
         reading, a BytesIO object, or any other custom object that
         meets this interface.

-        Optional keyword arguments are encoding and errors, which are
-        used to decode 8-bit string instances pickled by Python 2.x.
-        These default to 'ASCII' and 'strict', respectively.
+        Optional keyword arguments are *fix_imports*, *encoding* and *errors*,
+        which are used to control compatibility support for pickle streams
+        generated by Python 2.x. If *fix_imports* is True, pickle will try to
+        map the old Python 2.x names to the new names used in Python 3.x. The
+        *encoding* and *errors* tell pickle how to decode 8-bit string
+        instances pickled by Python 2.x; these default to 'ASCII' and
+        'strict', respectively.
         """
         self.readline = file.readline
         self.read = file.read
         self.memo = {}
         self.encoding = encoding
         self.errors = errors
+        self.proto = 0
+        self.fix_imports = fix_imports

     def load(self):
         """Read a pickled object representation from the open file.
@@ -838,6 +855,7 @@ def load_proto(self):
         proto = ord(self.read(1))
         if not 0 <= proto <= HIGHEST_PROTOCOL:
             raise ValueError("unsupported pickle protocol: %d" % proto)
+        self.proto = proto
     dispatch[PROTO[0]] = load_proto

     def load_persid(self):
@@ -1088,7 +1106,12 @@ def get_extension(self, code):
         self.append(obj)

     def find_class(self, module, name):
-        # Subclasses may override this
+        # Subclasses may override this.
+        if self.proto < 3 and self.fix_imports:
+            if (module, name) in _compat_pickle.NAME_MAPPING:
+                module, name = _compat_pickle.NAME_MAPPING[(module, name)]
+            if module in _compat_pickle.IMPORT_MAPPING:
+                module = _compat_pickle.IMPORT_MAPPING[module]
         __import__(module, level=0)
         mod = sys.modules[module]
         klass = getattr(mod, name)
@@ -1327,27 +1350,28 @@ def decode_long(data):

 # Shorthands

-def dump(obj, file, protocol=None):
-    Pickler(file, protocol).dump(obj)
+def dump(obj, file, protocol=None, *, fix_imports=True):
+    Pickler(file, protocol, fix_imports=fix_imports).dump(obj)

-def dumps(obj, protocol=None):
+def dumps(obj, protocol=None, *, fix_imports=True):
     f = io.BytesIO()
-    Pickler(f, protocol).dump(obj)
+    Pickler(f, protocol, fix_imports=fix_imports).dump(obj)
     res = f.getvalue()
     assert isinstance(res, bytes_types)
     return res

-def load(file, *, encoding="ASCII", errors="strict"):
-    return Unpickler(file, encoding=encoding, errors=errors).load()
+def load(file, *, fix_imports=True, encoding="ASCII", errors="strict"):
+    return Unpickler(file, fix_imports=fix_imports,
+                     encoding=encoding, errors=errors).load()

-def loads(s, *, encoding="ASCII", errors="strict"):
+def loads(s, *, fix_imports=True, encoding="ASCII", errors="strict"):
     if isinstance(s, str):
         raise TypeError("Can't load pickle from unicode string")
     file = io.BytesIO(s)
-    return Unpickler(file, encoding=encoding, errors=errors).load()
+    return Unpickler(file, fix_imports=fix_imports,
+                     encoding=encoding, errors=errors).load()

 # Doctest
-
 def _test():
     import doctest
     return doctest.testmod()
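
Because the 2.x-to-3.x translation is applied inside `find_class`, a subclass that overrides `find_class` receives the untranslated 2.x names and only gets the mapping if it delegates to the base implementation. The sketch below illustrates this against the public `pickle.Unpickler` on a current Python 3; both subclasses and the sample stream are hypothetical and not part of the commit.

```python
import importlib
import io
import pickle

# Minimal 2.x-style stream: GLOBAL opcode, module line, name line, STOP.
PY2_GLOBAL = b"c__builtin__\nxrange\n."


class RecordingUnpickler(pickle.Unpickler):
    """Hypothetical subclass: logs each lookup, then delegates."""

    def __init__(self, file, **kwargs):
        super().__init__(file, **kwargs)
        self.seen = []

    def find_class(self, module, name):
        self.seen.append((module, name))  # still the 2.x names here
        return super().find_class(module, name)  # mapping happens in the base


class NonDelegatingUnpickler(pickle.Unpickler):
    """Hypothetical subclass: resolves names itself, bypassing the mapping."""

    def find_class(self, module, name):
        return getattr(importlib.import_module(module), name)


recorder = RecordingUnpickler(io.BytesIO(PY2_GLOBAL))
result = recorder.load()

try:
    NonDelegatingUnpickler(io.BytesIO(PY2_GLOBAL)).load()
    bypass_failed = False
except ImportError:
    bypass_failed = True

print(result is range, recorder.seen, bypass_failed)
```

Subclasses that restrict which globals may be loaded should therefore decide whether their allow-list is expressed in 2.x names, 3.x names, or both.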
