Copying bytes object to shared memory list truncates trailing zeros #106939

pinkhamr-fb opened this issue Jul 21, 2023 · 4 comments
Labels
topic-multiprocessing type-bug An unexpected behavior, bug, or error

Comments

@pinkhamr-fb
pinkhamr-fb commented Jul 21, 2023

Bug report

tl;dr: see the Stack Overflow post.

When a bytes object is copied into a ShareableList, any trailing zero bytes are stripped, causing data loss. As far as I can tell this is not documented, and it appears to be unintended behavior that falls out of the implementation.

Example code:

from multiprocessing import shared_memory as shm

shmList = shm.ShareableList([bytes(50)])
testBytes = bytes.fromhex("00112233445566778899aabbccddeeff0000")

shmList[0] = testBytes
print(testBytes)
print(shmList[0])

shmList.shm.close()
shmList.shm.unlink()

Output:

b'\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff\x00\x00'
b'\x00\x11"3DUfw\x88\x99\xaa\xbb\xcc\xdd\xee\xff'

Offending portion of CPython code:

_back_transforms_mapping = {
    0: lambda value: value,                   # int, float, bool
    1: lambda value: value.rstrip(b'\x00').decode(_encoding),  # str
    2: lambda value: value.rstrip(b'\x00'),   # bytes
    3: lambda _value: None,                   # None
}
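
The effect of that `rstrip` can be reproduced without any shared memory. This is only a sketch: it assumes each bytes slot is stored through a null-padded `struct` format, and the 24-byte slot width is picked for illustration rather than read from the CPython source.

```python
import struct

payload = bytes.fromhex("00112233445566778899aabbccddeeff0000")  # 18 bytes

# Assumed slot width for illustration: 24 bytes, null-padded by struct.
slot = struct.pack("24s", payload)
(raw,) = struct.unpack("24s", slot)

# Padding and genuine trailing zeros are indistinguishable at this point,
# so stripping removes both.
recovered = raw.rstrip(b"\x00")

print(len(payload), len(recovered))  # 18 16
```

Because the original length is never stored, nothing on the read side can tell the two real zero bytes apart from the six padding bytes.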

@pinkhamr-fb pinkhamr-fb added the type-bug An unexpected behavior, bug, or error label Jul 21, 2023
@corona10 corona10 added the docs Documentation in the Doc dir label Jul 21, 2023
@corona10
Member

@pitrou @gpshead Would you like to take a look at this issue?

@gpshead
Member

gpshead commented Jul 21, 2023

While that is "surprising" behavior, the implementation of that ShareableList does not appear to make good guarantees.

  1. We should document this behavior / bug today for existing <=3.12 releases. Regardless of whether we backport a bugfix, people need to know that their code may be running on impacted versions.
  2. We should fix this for any impacted types for 3.13+. Both str and bytes are impacted. Trailing \x00 characters are valid in both.

Workaround: unconditionally append a single non-0 character or byte to any shared data when putting items in and unconditionally ignore the final character (truncation or memoryview) on the consuming side.
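
A minimal sketch of that workaround follows. The helper names (`add_sentinel`, `strip_sentinel`, `roundtrip_via_shareable_list`) and the choice of `b"\x01"` as the sentinel are hypothetical, picked here for illustration; only `ShareableList` and its `shm.close()` / `shm.unlink()` calls come from the standard library.

```python
from multiprocessing import shared_memory

SENTINEL = b"\x01"  # hypothetical choice; any single non-zero byte works


def add_sentinel(data: bytes) -> bytes:
    """Append a non-zero byte so trailing zeros in data survive stripping."""
    return data + SENTINEL


def strip_sentinel(stored: bytes) -> bytes:
    """Drop the final sentinel byte on the consuming side."""
    if not stored.endswith(SENTINEL):
        raise ValueError("stored value does not end with the sentinel byte")
    return stored[:-1]


def roundtrip_via_shareable_list(data: bytes) -> bytes:
    """Write data to a one-item ShareableList and read it back intact."""
    # Reserve a slot large enough for the payload plus the sentinel.
    shm_list = shared_memory.ShareableList([bytes(len(data) + 8)])
    try:
        shm_list[0] = add_sentinel(data)
        return strip_sentinel(shm_list[0])
    finally:
        shm_list.shm.close()
        shm_list.shm.unlink()
```

Calling `roundtrip_via_shareable_list(bytes.fromhex("ff0000"))` should hand back all three bytes, because the stripping stops at the sentinel instead of eating the payload's own zeros.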

There are other constraints worth documenting as well. Those "int"s are struct packed into a maximum of 8 bytes, without the docs specifying whether they are signed or not. https://docs.python.org/3/library/multiprocessing.shared_memory.html#multiprocessing.shared_memory.ShareableList needs improvement.
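
To make the 8-byte limit concrete, here is a small check written against `struct` directly. It assumes the slot uses the signed `q` format, which this thread does not confirm, and `fits_in_int_slot` is a hypothetical helper name.

```python
import struct


def fits_in_int_slot(value: int) -> bool:
    """Return True if value packs into an 8-byte signed struct field."""
    try:
        struct.pack("q", value)  # assumed format: 8-byte signed integer
    except struct.error:
        return False
    return True


print(fits_in_int_slot(2**63 - 1))  # True
print(fits_in_int_slot(2**63))      # False
```

Under that assumption, anything outside the range -2**63 to 2**63 - 1 cannot be stored as an `int` item, even though Python itself handles such values without complaint.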

@pinkhamr-fb
Author

FWIW, the workaround you proposed is what I ended up doing in my code to get around this.

gpshead added a commit to gpshead/cpython that referenced this issue Jul 25, 2023
gpshead added a commit that referenced this issue Jul 25, 2023
* gh-106939: document ShareableList nul-strip quirk.
* Mention the `int` size constraint.
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Jul 25, 2023
…7266)

* pythongh-106939: document ShareableList nul-strip quirk.
* Mention the `int` size constraint.
(cherry picked from commit 70dc009)

Co-authored-by: Gregory P. Smith <[email protected]>
@gpshead gpshead added topic-multiprocessing and removed docs Documentation in the Doc dir labels Jul 25, 2023
gpshead added a commit that referenced this issue Jul 25, 2023
…#107270)

gh-106939: document ShareableList nul-strip quirk. (GH-107266)

* gh-106939: document ShareableList nul-strip quirk.
* Mention the `int` size constraint.
(cherry picked from commit 70dc009)

Co-authored-by: Gregory P. Smith <[email protected]>
gpshead added a commit that referenced this issue Jul 25, 2023
…#107269)

gh-106939: document ShareableList nul-strip quirk. (GH-107266)

* gh-106939: document ShareableList nul-strip quirk.
* Mention the `int` size constraint.
(cherry picked from commit 70dc009)

Co-authored-by: Gregory P. Smith <[email protected]>
jtcave pushed a commit to jtcave/cpython that referenced this issue Jul 27, 2023
* pythongh-106939: document ShareableList nul-strip quirk.
* Mention the `int` size constraint.
@zdelv
zdelv commented Dec 16, 2023

I'm willing to work on a fix for this. Would building the workaround mentioned above into ShareableList be an acceptable solution, or are we looking for something more involved?

To me, it seems like the issue is that we're padding all str and bytes to an 8 byte alignment, but we're forgetting to save the actual data length. Adding a sentinel value to the end of the str or bytes (like the workaround does) seems like the most reasonable way to fix it without changing the underlying encoding to add the actual data length.
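
For comparison, the alternative of saving the actual data length could look roughly like the sketch below. This is not how ShareableList lays out its memory; the 8-byte little-endian length prefix and the names `encode_with_length` / `decode_with_length` are assumptions made purely to illustrate the idea.

```python
import struct

ALIGNMENT = 8  # padding granularity described in the comment above


def encode_with_length(data: bytes) -> bytes:
    """Prefix the payload with its length, then null-pad it to the alignment."""
    padded_len = ALIGNMENT * (len(data) // ALIGNMENT + 1)
    # "<Q" is a hypothetical 8-byte unsigned length header.
    return struct.pack(f"<Q{padded_len}s", len(data), data)


def decode_with_length(blob: bytes) -> bytes:
    """Recover exactly the original bytes using the stored length."""
    (length,) = struct.unpack_from("<Q", blob, 0)
    return blob[8:8 + length]


payload = bytes.fromhex("00112233445566778899aabbccddeeff0000")
assert decode_with_length(encode_with_length(payload)) == payload
```

The cost is a changed storage format and a few extra bytes per item, which is why the sentinel approach is the less invasive of the two.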
