Thanks to visit codestin.com
Credit goes to github.com

Skip to content

ignore .pyc files in py_library() targets created in pip_install_dependencies() #825

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

lazcamus
Copy link
Contributor

@lazcamus lazcamus commented Sep 12, 2022

.pyc files are not hermetic and ruin hashes for remote-caching. Observed with pypi__pip and pypi__setuptools

PR Checklist

Please check if your PR fulfills the following requirements:

  • Tests for the changes have been added (for bug fixes / features)
  • Docs have been added / updated (for bug fixes / features)

PR Type

What kind of change does this PR introduce?

  • [ X ] Bugfix

What is the current behavior?

Issue Number: N/A

The py_library()ies created by pip_install_requirements() have checksums for .pyc files included. These files don't match from one install to the next, and because they are build inputs they guarantee a remote-cache miss.

I noticed this because the requirements_test target that's set up by compile_pip_requirements() is never a cache hit in my CI pipeline.

Running bazel with --execution_log_json_file= allowed me to debug. When run twice on the same machine with a remote-cache, even with a bazel clean in between still yields a cache hit.

But when running bazel on 2 different executors (docker containers), then diffing the runs yields only file hash changes:

--- exec1.json  2022-09-12 15:29:38.000000000 +0900
+++ exec1-1.json        2022-09-12 15:34:03.000000000 +0900
@@ -2343,21 +2343,21 @@
   }, {
     "path": "external/pypi__pip/pip/_internal/utils/__pycache__/compat.cpython-310.pyc",
     "digest": {
-      "hash": "72f8ee456dd18c7b6b370e7705c1f4a254b56e9291182cf40445e2be6fbd0ef5",
+      "hash": "6691abe643cd9e5aae626749720fa8c49d2b3bb3ed3e653e3a90378336b096ad",
       "sizeBytes": "1534",
       "hashFunctionName": "SHA-256"
     }
   }, {
     "path": "external/pypi__pip/pip/_internal/utils/__pycache__/compat.cpython-39.pyc",
     "digest": {
-      "hash": "d1933621a8710b6b017e6b8c6516e81bf5c99a5f22e2aaac45161f13e5b9a501",
+      "hash": "0078e3ce171c2162f656db8fe2b40bcfc27ae6d076f6194366ae85b027fd1603",
       "sizeBytes": "1535",
       "hashFunctionName": "SHA-256"
     }
   }, {
     "path": "external/pypi__pip/pip/_internal/utils/__pycache__/compatibility_tags.cpython-310.pyc",
     "digest": {
-      "hash": "6b7d2b0c22492c289a93fbc34e11fea1f4c8941916b314074e515af2f0063ace",
+      "hash": "7854acaf76bb430d0e57dcc878824114b5fcf1f44735c129bdcfc85298d82b0a",
       "sizeBytes": "4103",
       "hashFunctionName": "SHA-256"
     }
...

It seems to only include hashes from pypi__pip and pypi__setuptools.

What is the new behavior?

With my patch applied, the remote-cache is a hit on differing CI executors when previously it was not.

Does this PR introduce a breaking change?

  • Yes
  • [ X ] No

Other information

…ndencies()

.pyc files are not hermetic and ruin hashes for remote-caching. Observed with
pypi__pip and pypi__setuptools
@hrfuller
Copy link
Contributor

hrfuller commented Sep 20, 2022

I'm nearly certain these wheels don't ship with bytecode. I think the byte code is getting generated when we use these external repos during repository rule execution. Can you try adding -B to this args list

@f0rmiga
Copy link
Member

f0rmiga commented Oct 10, 2022

This is related to #749. Please, see the related PRs in that description.

@f0rmiga f0rmiga closed this Oct 10, 2022
@f0rmiga
Copy link
Member

f0rmiga commented Oct 10, 2022

Please, reopen if running as non-root doesn't solve your problem.

@lazcamus
Copy link
Contributor Author

I'm not running bazel as root, but I'll double back and debug this again

@lazcamus
Copy link
Contributor Author

I verified I'm not running this as root or with any god-like container powers. It's a builder user id 1337 in circleci's x86 container deployment.

WRT #749 the chown on lib in the hermetic python install isn't going to affect the wheel install directory under pypi__setuptools/ ... so I don't think it's related?

@hrfuller passing -B does fix this for me. I copied the "ignore *.pyc" pattern from https://github.com/bazelbuild/rules_python/blob/main/python/pip_install/extract_wheels/bazel.py#L140 , but I could do a PR to whack this mole in the -B way if it's preferred.

@f0rmiga I don't know how to "re-open" this. Do I submit another PR, file an issue, both?

@f0rmiga f0rmiga reopened this Oct 12, 2022
@f0rmiga
Copy link
Member

f0rmiga commented Oct 12, 2022

I reopened. You can just update this PR so we keep the history.

@lazcamus
Copy link
Contributor Author

Looking at the -B solution again, it works for my circleci use case where the test is run with a fresh cache and nothing is run before bazel test, but upon further inspection it doesn't work for subsequent runs of the same target.

This is because the first run builds the .pyc files, and then they're subsequently included in the dep.

No remote cache, I rm -rf'd the local cache dir. A bazel query with the -B change applied yields:

laz@builder-git_bando ~/git/bando$ bazel query 'deps(@pypi__pip//:lib)' | grep pyc
Loading: 0 packages loaded
Loading: 0 packages loaded
Loading: 0 packages loaded

But if I run a target that has it listed as a dependency, the .pyc files get created. The file glob picks them up the next time:

laz@builder-git_bando ~/git/bando$ bazel test //build:requirements_test
(23:53:46) INFO: Current date is 2022-10-12
(23:53:58) INFO: Analyzed target //build:requirements_test (67 packages loaded, 4020 targets configured).
(23:53:58) INFO: Found 1 test target...
Target //build:requirements_test up-to-date:
  bazel-bin/build/requirements_test
(23:56:33) INFO: Elapsed time: 167.735s, Critical Path: 154.48s
(23:56:33) INFO: 5 processes: 3 internal, 2 processwrapper-sandbox.
(23:56:33) INFO: Build completed successfully, 5 total actions
//build:requirements_test                                                PASSED in 153.6s

(23:56:33) INFO: Build completed successfully, 5 total actions
laz@builder-git_bando ~/git/bando$ bazel query 'deps(@pypi__pip//:lib)' | grep pyc
Loading: 0 packages loaded
@pypi__pip//:pip/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/__pycache__/__main__.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/build_env.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/cache.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/configuration.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/exceptions.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/pyproject.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/self_outdated_check.cpython-310.pyc
@pypi__pip//:pip/_internal/__pycache__/wheel_builder.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/__init__.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/autocompletion.cpython-310.pyc
@pypi__pip//:pip/_internal/cli/__pycache__/base_command.cpython-310.pyc

So for local caching, adding -B breaks test execution caching between executions 1 and 2, but the .pyc files don't change so the cache will be a hit on subsequent runs.

For remote caching, -B fixes the first invocation of the test b/c the .pyc files aren't included as deps just after the wheel install, but it will break the cache on subsequent runs because they get deposited on the FS when code is run.

Unless you would prefer doing readonly chowns on every wheel install, like we do with the hermetic python, I think ignoring the .pyc files in the glob is the proper fix and this PR stands as-is?

@f0rmiga
Copy link
Member

f0rmiga commented Nov 2, 2022

I don't think making these .pyc files read-only makes sense as it does with the hermetic toolchain. Probably ignoring them from the glob is better in this case.

@mattem what do you think?

@f0rmiga f0rmiga requested review from mattem and removed request for brandjon and lberki November 2, 2022 18:40
@f0rmiga
Copy link
Member

f0rmiga commented Dec 1, 2022

@lazcamus could you resolve the conflicts, please? I'll go ahead and merge this.

@lazcamus
Copy link
Contributor Author

lazcamus commented Dec 6, 2022

the conflicts are from it being fixed by #881 ... guess I should have added more comments ;)

closing this PR :)

@lazcamus lazcamus closed this Dec 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants