Leverage the new PEP 574 for no-copy pickling of contiguous arrays #11161

@ogrisel

PEP 574 (scheduled for Python 3.8) introduces pickle protocol 5 with support for no-copy pickling of large mutable buffers.
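
A minimal sketch of what protocol 5 adds. It assumes Python 3.8+, where PickleBuffer and the buffer_callback / buffers arguments are part of the standard pickle module; the pickle5 backport provides the same names. A large bytearray is pickled once in-band, where its data is copied into the stream, and once out-of-band, where the stream only records a placeholder and the buffer is handed to the callback.

```python
import pickle
from pickle import PickleBuffer  # Python 3.8+; the pickle5 backport exposes the same class

payload = bytearray(b"x" * 1_000_000)  # a large, mutable buffer

# In-band (no buffer_callback): the 1 MB of data is copied into the pickle stream.
in_band = pickle.dumps(PickleBuffer(payload), protocol=5)

# Out-of-band: the PickleBuffer is passed to buffer_callback (which returns None,
# meaning "keep it out-of-band"), so the stream only records a placeholder.
buffers = []
out_of_band = pickle.dumps(PickleBuffer(payload), protocol=5,
                           buffer_callback=buffers.append)

# The loader must receive the same buffers, in the same order.
restored = pickle.loads(out_of_band, buffers=buffers)

print(len(in_band), len(out_of_band), len(buffers))  # ~1 MB, a handful of bytes, 1
```

Because restored still refers to payload's memory, a later change to payload is visible through memoryview(restored), whereas pickle.loads(in_band) returns an independent bytearray copy.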

I made a small proof-of-concept benchmark script using @pitrou's pickle5 backport of his draft implementation of PEP 574.

See: https://gist.github.com/ogrisel/a2b0e5ae4987a398caa7f9277cb3b90a

The meat lies in the following reducer:

```python
import numpy as np

try:
    from pickle import PickleBuffer  # Python 3.8+ (PEP 574)
except ImportError:
    from pickle5 import PickleBuffer  # backport for older Python versions


def _array_from_buffer(buffer, dtype, shape):
    # Wrap the buffer without copying it, then restore the original shape.
    return np.frombuffer(buffer, dtype=dtype).reshape(shape)


def reduce_ndarray_pickle5(a):
    # This reducer assumes protocol 5 as currently there is no way to register
    # a protocol-aware reduce function in the global copyreg dispatch table.
    if not a.dtype.hasobject and a.flags.c_contiguous:
        # No-copy pickling for C-contiguous arrays and protocol 5
        return _array_from_buffer, (PickleBuffer(a), a.dtype, a.shape), None
    else:
        # Fall back to the generic (bytes-based) method
        return a.__reduce__()
```

This works as expected (no extra copy when dumping and loading) and also fixes the in-memory speed overhead reported by @mrocklin in #7544.
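
One way to check the "no extra copy" claim, sketched under the assumption of Python 3.8+ and a recent NumPy. The reducer above is installed only in a private dispatch_table of a pickle.Pickler subclass, so pickling at protocols below 5 never sees it. The array is dumped out-of-band, and np.shares_memory confirms that the loaded array reuses the original memory. NDArrayPickler and dumps5 are hypothetical helper names used only for this illustration.

```python
import copyreg
import io
import pickle
from pickle import PickleBuffer  # Python 3.8+; the pickle5 backport exposes the same class

import numpy as np


def _array_from_buffer(buffer, dtype, shape):
    return np.frombuffer(buffer, dtype=dtype).reshape(shape)


def reduce_ndarray_pickle5(a):
    if not a.dtype.hasobject and a.flags.c_contiguous:
        return _array_from_buffer, (PickleBuffer(a), a.dtype, a.shape), None
    return a.__reduce__()


class NDArrayPickler(pickle.Pickler):
    """Hypothetical pickler that uses the protocol-5 reducer for plain ndarrays."""

    # Private copy: the global copyreg table (also used by protocols < 5) is untouched.
    dispatch_table = copyreg.dispatch_table.copy()
    dispatch_table[np.ndarray] = reduce_ndarray_pickle5


def dumps5(obj, buffer_callback=None):
    """Hypothetical helper: pickle obj with NDArrayPickler at protocol 5."""
    f = io.BytesIO()
    NDArrayPickler(f, protocol=5, buffer_callback=buffer_callback).dump(obj)
    return f.getvalue()


a = np.arange(1_000_000, dtype=np.float64).reshape(1000, 1000)  # 8 MB, C-contiguous
buffers = []
data = dumps5(a, buffer_callback=buffers.append)
b = pickle.loads(data, buffers=buffers)
print(len(data), np.shares_memory(a, b))  # small stream; True means no copy was made

# A Fortran-ordered array is not C-contiguous, so it takes the bytes-based fallback.
fa = np.asfortranarray(a)
fb = pickle.loads(dumps5(fa))
print(np.shares_memory(fa, fb))  # False: the data went through the pickle stream
```

The per-pickler table is a workaround for the limitation noted in the reducer's comment: a global copyreg registration would also be used by protocols that cannot serialize a PickleBuffer.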

To get this into numpy, we would need a protocol-aware reduce function, that is, ndarray would need to implement a __reduce_ex__ method that accepts a protocol argument instead of relying on the existing bytes-based implementation from array_reduce in https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/methods.c#L1577. That bytes-based implementation should probably be kept as the fallback when protocol < 5.
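
The proposal can be sketched in pure Python with a hypothetical ndarray subclass. Pickle5Array and _rebuild are illustrative names, not NumPy API, and an actual implementation would presumably live in C next to array_reduce. The sketch assumes Python 3.8+. Its __reduce_ex__ returns a PickleBuffer for C-contiguous, non-object arrays when protocol >= 5 and otherwise falls back to the existing bytes-based __reduce__.

```python
import pickle
from pickle import PickleBuffer  # Python 3.8+; the pickle5 backport exposes the same class

import numpy as np


def _rebuild(cls, buffer, dtype, shape):
    # Hypothetical reconstructor: wrap the buffer without copying, keep the subclass.
    return np.frombuffer(buffer, dtype=dtype).reshape(shape).view(cls)


class Pickle5Array(np.ndarray):
    """Hypothetical subclass sketching a protocol-aware __reduce_ex__."""

    def __reduce_ex__(self, protocol):
        if protocol >= 5 and not self.dtype.hasobject and self.flags.c_contiguous:
            # Protocol 5: expose the array memory as a PickleBuffer (no copy).
            return _rebuild, (type(self), PickleBuffer(self), self.dtype, self.shape), None
        # Protocol < 5, object dtypes, non-C-contiguous layouts: keep the
        # existing bytes-based ndarray.__reduce__ as the fallback.
        return self.__reduce__()


a = np.arange(6, dtype=np.int32).reshape(2, 3).view(Pickle5Array)

buffers = []
b5 = pickle.loads(pickle.dumps(a, protocol=5, buffer_callback=buffers.append),
                  buffers=buffers)
b2 = pickle.loads(pickle.dumps(a, protocol=2))

print(np.shares_memory(a, b5), np.shares_memory(a, b2))  # True False
```

Dispatching on the protocol argument inside __reduce_ex__ is what a plain __reduce__ (or a global copyreg entry) cannot do, which is why the bytes-based path remains reachable for older protocols.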
