bpo-18819: tarfile: only set device fields for device files #18080
The GNU docs describe the `devmajor` and `devminor` fields of the tar header struct only in the context of character and block special files, suggesting that in other cases they are not populated. Typical utilities behave accordingly; this patch teaches `tarfile` to do the same.

Test Plan:
No tests added because none appear to exist for this module. Manually verified that this enables output that is bit-for-bit compatible with GNU tar. In particular, this program now passes on my Ubuntu 16.04, whereas it failed before this patch:

```python
import os
import subprocess
import tarfile
import tempfile

filename = "important_data"
contents = b"The quick brown fox jumps over the lazy dog"

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)
    with open(filename, "wb") as outfile:
        outfile.write(contents)
    with tarfile.open("py.tar", "x", format=tarfile.GNU_FORMAT) as outfile:
        outfile.add(filename)
    subprocess.check_call(["tar", "cf", "gnu.tar", filename])
    subprocess.check_call(["shasum", "-a", "256", "py.tar", "gnu.tar"])
    subprocess.check_call(["cmp", "-b", "py.tar", "gnu.tar"])
```

(The exact hashes depend on the calling user and current time but should always be the same across both output archives.)

wchargin-branch: tarfile-limit-device-headers
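For readers without GNU tar at hand, the header fields in question can also be inspected directly. The following is a minimal sketch, not part of the patch: it assumes the standard ustar/GNU header layout, in which `devmajor` and `devminor` are two 8-byte fields at offsets 329 and 337, and the helper names `device_fields` and `parse_octal` are hypothetical.

```python
import tarfile

# Assumed offsets of the 8-byte devmajor and devminor fields inside a
# 512-byte ustar/GNU header block.
DEVMAJOR = slice(329, 337)
DEVMINOR = slice(337, 345)


def device_fields(info):
    """Return the raw (devmajor, devminor) header bytes for a TarInfo."""
    header = info.tobuf(format=tarfile.GNU_FORMAT)
    return header[DEVMAJOR], header[DEVMINOR]


def parse_octal(field):
    """Decode a NUL/space-terminated octal field; an empty field counts as 0."""
    digits = field.rstrip(b"\0 ")
    return int(digits, 8) if digits else 0


regular = tarfile.TarInfo("important_data")  # type defaults to a regular file

chardev = tarfile.TarInfo("null")  # hypothetical character-device entry
chardev.type = tarfile.CHRTYPE
chardev.devmajor = 1
chardev.devminor = 3

print("regular file:", device_fields(regular))
print("char device: ", device_fields(chardev))
```

On an interpreter that includes this patch, the regular-file fields should come back as NUL bytes, whereas an unpatched interpreter is expected to write octal zeros there. The device entry should carry its numbers in both cases.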
The code looks good. Question, though: why use both `cmp` and `shasum`? If `cmp` passes aren't the files identical? Also, the tests for `tarfile` are in `Lib/test/test_tarfile.py`.
This looks good. Add your test and we should be ready to commit.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase `I have made the requested changes; please review again`.
> Question, though: why use both `cmp` and `shasum`? If `cmp` passes aren't the files identical?

Yes; I had the `shasum` just as a sanity check that the files weren’t empty, but you’re right that the `cmp` is certainly sufficient. I’ll go ahead and remove this from the description to avoid confusion.

> Also, the tests for `tarfile` are in `Lib/test/test_tarfile.py`.

Great, thanks. Not sure how I missed this.

> Add your test and we should be ready to commit.

Will add a test to `test_tarfile.py`, probably more like the existing tests in that file than like the end-to-end test in the description (which would only work on systems with GNU `tar` installed).
Thanks for the speedy review! I’ll add a test and get back to you.
Test Plan: This passes as written, and fails if the previous commit is reverted. wchargin-branch: tarfile-limit-device-headers
I’ve put the test in a new test class because it needs to run only in …

I have made the requested changes; please review again.
Thanks for making the requested changes! @ethanfurman: please review the changes made to this pull request.
Looking good. Nice tests.
Please add yourself to Misc/ACKS and we should be good to go. (Sorry for not noticing that earlier.)
(Per request of Ethan Furman.) wchargin-branch: tarfile-limit-device-headers
Done; thanks! I have made the requested changes; please review again.
I’m not sure how to interpret the coverage check failure. As far as …

@ethanfurman, is there any action that I should take here?
bpo-18819: tarfile: only set device fields for device files (pythonGH-18080)
Awesome; thanks for merging, and thanks for your help on this PR! :-)
python/cpython#18080 hasn't been back-ported to python3.8, which happens to be the default interpreter version in a lot of popular base docker images. This change in python's tarfile implementation affects the behavior of https://github.com/bazelbuild/rules_docker/blob/bdea23404dd256c7ae2f8e02d346428044cc7d91/container/archive.py#L144 used by a popular container_image rule. This means that images built by bazel in environments with different python3 versions will result in different image digests. Until the hermeticity of python toolchains is guaranteed, it may be useful to have a workaround for users who can provide their own build_tar tool, akin to https://github.com/kubernetes/repo-infra/blob/72a6f5d05659f7d255b51f152e0725adfe970718/tools/build_tar/buildtar.go
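One possible shape for such a workaround, sketched here under stated assumptions rather than taken from rules_docker or CPython, is to post-process each header block so that older interpreters emit the same bytes as patched ones. The helper `normalize_header` is hypothetical. It assumes the standard 512-byte ustar/GNU layout (type flag at offset 156, checksum at 148 to 155, device fields at 329 to 344) and handles a single header block only; walking a whole archive, long-name entries, and pax headers are out of scope.

```python
import tarfile

BLOCKSIZE = 512
CHKSUM = slice(148, 156)         # 8-byte checksum field
DEVICE_FIELDS = slice(329, 345)  # devmajor (8 bytes) + devminor (8 bytes)
DEVICE_TYPES = (tarfile.CHRTYPE, tarfile.BLKTYPE)


def normalize_header(block):
    """Blank the device fields of a non-device header and fix its checksum."""
    if len(block) != BLOCKSIZE:
        raise ValueError("expected one 512-byte header block")
    if block[156:157] in DEVICE_TYPES:
        return bytes(block)  # device entries keep their numbers
    buf = bytearray(block)
    buf[DEVICE_FIELDS] = bytes(16)
    # The checksum is the byte sum of the block with the checksum field
    # itself counted as eight spaces.
    buf[CHKSUM] = b" " * 8
    # Six octal digits plus NUL; the eighth byte stays a space.
    buf[148:155] = b"%06o\0" % sum(buf)
    return bytes(buf)


info = tarfile.TarInfo("important_data")
header = normalize_header(info.tobuf(format=tarfile.GNU_FORMAT))

# frombuf verifies the checksum, so a successful parse shows that the
# rewritten block is still a valid header.
parsed = tarfile.TarInfo.frombuf(header, tarfile.ENCODING, "surrogateescape")
print(parsed.name, header[DEVICE_FIELDS])
```

Whether this yields byte-identical archives across interpreter versions in a given build depends on the other header fields matching as well, so treat it as a starting point for experimentation rather than a guaranteed fix.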
The GNU docs describe the `devmajor` and `devminor` fields of the tar header struct only in the context of character and block special files, suggesting that in other cases they are not populated. Typical utilities behave accordingly; this patch teaches `tarfile` to do the same.

Test Plan:
Unit tests added; they fail before this commit and pass after it. Also verified that this enables output that is bit-for-bit compatible with GNU `tar`. In particular, this program now passes on my Ubuntu 16.04, whereas it failed before this patch:

(The exact contents of the files depend on the calling user and current time, but should always be the same across both output archives.)

wchargin-branch: tarfile-limit-device-headers
https://bugs.python.org/issue18819