pathlib.PurePath.__reduce__() currently accesses and returns the parts tuple. Pathlib ensures that the strings therein are interned.
There's a good reason to do this: it ensures that the pickled data is as small as possible, with maximum re-use of small string objects.
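As a rough illustration of why interning helps: pickle memoises objects by identity, so equal strings that are the same object are written once and referenced afterwards. The sketch below uses made-up file names and plain tuples rather than pathlib itself; it only compares the pickled size of interned and non-interned parts.

```python
import pickle
import sys

# Hypothetical sample data: 100 paths sharing the same three leading parts.
raw = [f"project/src/package/file{i}.txt" for i in range(100)]

# split() creates fresh string objects for every path, so equal parts
# such as "project" are distinct objects here.
plain = [tuple(r.split("/")) for r in raw]

# Interning makes equal parts the very same object across all tuples.
interned = [tuple(sys.intern(p) for p in r.split("/")) for r in raw]

# pickle writes a memoised object once and emits short back-references
# for later occurrences, so the interned variant is smaller.
size_plain = len(pickle.dumps(plain))
size_interned = len(pickle.dumps(interned))
print(size_plain, size_interned)
```

The exact byte counts depend on the pickle protocol, but the interned variant should come out smaller because the shared parts are stored only once.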
However, it comes with some disadvantages:
- When normalising any path, we need to call sys.intern(str(part)) on each part.
- When pickling a path, we must join, parse and normalise, and then generate the parts tuple.
We could instead make __reduce__() return the raw paths fed to the constructor (the _raw_paths attribute). This would be faster but less space efficient. With the cost of storage and bandwidth falling at a faster rate than compute, I suspect this trade-off is worth making.
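A minimal sketch of the two strategies, assuming a toy class rather than pathlib's real implementation. ToyPath, PartsReduce, RawReduce and rebuild are hypothetical names, and the normalisation here is deliberately simplistic (it only drops empty and "." segments).

```python
import sys


class ToyPath:
    """Hypothetical stand-in for a pure path; not pathlib's actual code."""

    def __init__(self, *args):
        # Store the constructor arguments untouched, like _raw_paths.
        self._raw_paths = [str(a) for a in args]

    @property
    def parts(self):
        # Join, split and normalise, interning each part (the costly route).
        joined = "/".join(self._raw_paths)
        return tuple(sys.intern(p) for p in joined.split("/") if p and p != ".")

    def __eq__(self, other):
        return isinstance(other, ToyPath) and self.parts == other.parts

    def __hash__(self):
        return hash(self.parts)


class PartsReduce(ToyPath):
    def __reduce__(self):
        # Current approach: normalise first, then hand over the parts tuple.
        return (self.__class__, self.parts)


class RawReduce(ToyPath):
    def __reduce__(self):
        # Proposed approach: hand over the raw constructor arguments as-is.
        return (self.__class__, tuple(self._raw_paths))


def rebuild(obj):
    # Mimic what unpickling does with the (callable, args) pair.
    func, args = obj.__reduce__()
    return func(*args)


old = PartsReduce("usr//lib/./python", "site-packages")
new = RawReduce("usr//lib/./python", "site-packages")
old_args = old.__reduce__()[1]
new_args = new.__reduce__()[1]
```

In this sketch RawReduce.__reduce__() never touches parts, so no joining, parsing or interning happens at pickling time. The cost is that the unnormalised strings are what get stored, and both variants still rebuild to equal paths.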
Linked PRs
- pathlib.PurePath pickling #112856
- pathlib.PurePath pickling #113243