bpo-18819: tarfile: only set device fields for device files #18080

Merged

wchargin
Contributor

@wchargin wchargin commented Jan 20, 2020

The GNU docs describe the devmajor and devminor fields of the tar
header struct only in the context of character and block special files,
suggesting that in other cases they are not populated. Typical utilities
behave accordingly; this patch teaches tarfile to do the same.
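As a rough illustration of what "populated" means here, the sketch below writes one regular file to an in-memory GNU-format archive and prints the two device fields from its header block. The offsets 329 and 337 come from the ustar header layout; the helper name `first_header` is made up for this example and is not part of `tarfile`. The output depends on the interpreter: with this patch both fields should be eight NUL bytes, without it they should hold an octal zero.

```python
import io
import tarfile

# Offsets of the two 8-byte device fields within a 512-byte ustar/GNU header.
DEVMAJOR = slice(329, 337)
DEVMINOR = slice(337, 345)


def first_header(name, contents):
    """Hypothetical helper: write one regular file to an in-memory GNU tar
    archive and return the 512-byte header block of that member."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(contents)
        tar.addfile(info, io.BytesIO(contents))
    return buf.getvalue()[:512]


header = first_header("important_data", b"The quick brown fox")
# With this patch applied, both fields are eight NUL bytes for a regular file;
# without it, tarfile writes an octal zero (b"0000000\x00") instead.
print(header[DEVMAJOR], header[DEVMINOR])
```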

Test Plan:
Unit tests added; they fail before this commit and pass after it. Also
verified that this enables output that is bit-for-bit compatible with
GNU tar. In particular, this program now passes on my Ubuntu 16.04,
whereas it failed before this patch:

```python
import os
import subprocess
import tarfile
import tempfile

filename = "important_data"
contents = b"The quick brown fox jumps over the lazy dog"

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)
    with open(filename, "wb") as outfile:
        outfile.write(contents)
    with tarfile.open("py.tar", "x", format=tarfile.GNU_FORMAT) as outfile:
        outfile.add(filename)
    subprocess.check_call(["tar", "cf", "gnu.tar", filename])
    subprocess.check_call(["cmp", "-b", "py.tar", "gnu.tar"])
```

(The exact contents of the files depend on the calling user and current
time, but should always be the same across both output archives.)

wchargin-branch: tarfile-limit-device-headers

https://bugs.python.org/issue18819

The GNU docs describe the `devmajor` and `devminor` fields of the tar
header struct only in the context of character and block special files,
suggesting that in other cases they are not populated. Typical utilities
behave accordingly; this patch teaches `tarfile` to do the same.

Test Plan:
No tests added because none appear to exist for this module. Manually
verified that this enables output that is bit-for-bit compatible with
GNU tar. In particular, this program now passes on my Ubuntu 16.04,
whereas it failed before this patch:

```python
import os
import subprocess
import tarfile
import tempfile

filename = "important_data"
contents = b"The quick brown fox jumps over the lazy dog"

with tempfile.TemporaryDirectory() as tmpdir:
    os.chdir(tmpdir)
    with open(filename, "wb") as outfile:
        outfile.write(contents)
    with tarfile.open("py.tar", "x", format=tarfile.GNU_FORMAT) as outfile:
        outfile.add(filename)
    subprocess.check_call(["tar", "cf", "gnu.tar", filename])
    subprocess.check_call(["shasum", "-a", "256", "py.tar", "gnu.tar"])
    subprocess.check_call(["cmp", "-b", "py.tar", "gnu.tar"])
```

(The exact hashes depend on the calling user and current time but should
always be the same across both output archives.)

wchargin-branch: tarfile-limit-device-headers
@ethanfurman
Member

The code looks good. Question, though: why use both cmp and shasum? If cmp passes aren't the files identical?

Also, the tests for tarfile are in Lib/test/test_tarfile.py. A quick perusal suggests that MemberReadTest might be a good place for your test above (appropriately modified, of course). Feel free to suggest a different class in that file, though.

@ethanfurman ethanfurman self-assigned this Jan 22, 2020
Member

@ethanfurman ethanfurman left a comment

This looks good. Add your test and we should be ready to commit.

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

Contributor Author

@wchargin wchargin left a comment

> Question, though: why use both cmp and shasum? If cmp passes aren't the files identical?

Yes; I had the shasum just as a sanity check that the files weren’t
empty, but you’re right that the cmp is certainly sufficient. I’ll go
ahead and remove this from the description to avoid confusion.

> Also, the tests for tarfile are in Lib/test/test_tarfile.py.

Great, thanks. Not sure how I missed this.

> Add your test and we should be ready to commit.

Will add a test to test_tarfile.py, probably more like the existing
tests in that file than like the end-to-end test in the description
(which would only work on systems with GNU tar installed).

Thanks for the speedy review! I’ll add a test and get back to you.

Test Plan:
This passes as written, and fails if the previous commit is reverted.

wchargin-branch: tarfile-limit-device-headers
@wchargin
Contributor Author

I’ve put the test in a new test class because it needs to run only in
the uncompressed case (or, we’d need special handling to decompress the
compressed archives, but I don’t see any benefit to doing that).
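For readers without the diff at hand, a header-level check along these lines might look like the sketch below. It is not the test that was merged: the class name `DeviceHeaderSketchTest` and its helper are hypothetical, and it builds the header with `TarInfo.tobuf` rather than writing an archive, which also sidesteps the compression question. The regular-file case assumes an interpreter that includes this patch.

```python
import tarfile
import unittest


class DeviceHeaderSketchTest(unittest.TestCase):
    """Hypothetical sketch; not the test class added in this PR."""

    @staticmethod
    def device_fields(info):
        # Build the raw 512-byte header block and slice out the two 8-byte
        # device fields (devmajor at offset 329, devminor at offset 337).
        header = info.tobuf(format=tarfile.GNU_FORMAT)
        return header[329:337], header[337:345]

    def test_char_device_has_device_fields(self):
        info = tarfile.TarInfo("dev_node")
        info.type = tarfile.CHRTYPE
        info.devmajor = 1
        info.devminor = 3
        # Device numbers are written as NUL-terminated octal strings.
        self.assertEqual(
            self.device_fields(info), (b"0000001\x00", b"0000003\x00")
        )

    def test_regular_file_has_empty_device_fields(self):
        # Assumes the patch is applied; an unpatched tarfile writes
        # b"0000000\x00" into both fields instead.
        info = tarfile.TarInfo("plain_file")
        self.assertEqual(
            self.device_fields(info), (b"\x00" * 8, b"\x00" * 8)
        )
```

Saved to a file, this can be run with `python -m unittest`.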

I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@ethanfurman: please review the changes made to this pull request.

Member

@ethanfurman ethanfurman left a comment

Looking good. Nice tests.

Please add yourself to Misc/ACKS and we should be good to go. (Sorry for not noticing that earlier.)

@bedevere-bot

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

(Per request of Ethan Furman.)

wchargin-branch: tarfile-limit-device-headers
@wchargin
Contributor Author

Done; thanks!

I have made the requested changes; please review again.

@bedevere-bot

Thanks for making the requested changes!

@ethanfurman: please review the changes made to this pull request.

@wchargin
Contributor Author

I’m not sure how to interpret the coverage check failure. As far as
I can tell, this PR doesn’t add any uncovered lines or branches. Each
file in the Codecov “diff” has positive coverage delta, but the
“coverage changes” tab shows 363 files, almost all unrelated.

@ethanfurman, is there any action that I should take here?

@ethanfurman ethanfurman merged commit 674935b into python:master Feb 12, 2020
sthagen added a commit to sthagen/python-cpython that referenced this pull request Feb 12, 2020
bpo-18819: tarfile: only set device fields for device files (pythonGH-18080)
@wchargin
Contributor Author

Awesome; thanks for merging, and thanks for your help on this PR! :-)

@wchargin wchargin deleted the wchargin-tarfile-limit-device-headers branch February 12, 2020 19:57
dataoleg added a commit to dataoleg/rules_docker that referenced this pull request Sep 15, 2021
python/cpython#18080 hasn't been back-ported to python3.8, which happens to be
the default interpreter version in a lot of popular base docker images.

This change in python's tarfile implementation affects the behavior of
https://github.com/bazelbuild/rules_docker/blob/bdea23404dd256c7ae2f8e02d346428044cc7d91/container/archive.py#L144
used by a popular container_image rule. This means that images built by bazel
in environments with different python3 versions will result in different image
digests.

Until the hermeticity of python toolchains is guaranteed, it may be useful to
have a workaround for users who can provide their own build_tar tool, akin to
https://github.com/kubernetes/repo-infra/blob/72a6f5d05659f7d255b51f152e0725adfe970718/tools/build_tar/buildtar.go
dataoleg added a commit to dataoleg/rules_docker that referenced this pull request Oct 4, 2021
gravypod pushed a commit to bazelbuild/rules_docker that referenced this pull request Dec 6, 2021