GH-80789: Bundle ensurepip wheels at build time #109130

Closed
wants to merge 12 commits into from

Member

@AA-Turner commented Sep 8, 2023

Based on the discussion in #12791, this is a sketch of a different approach (building very heavily on @webknjaz's work).

We add a fairly simple bundler script at Tools/build/bundle_ensurepip_wheels.py, with two changes to ensurepip itself to adapt. This PR doesn't attempt to switch to the one-project model, as I found the diff was too large to reasonably review.

One question: should we make this part of the default build process (i.e. integration into make all / PCBuild)?

A


📚 Documentation preview 📚: https://cpython-previews--109130.org.readthedocs.build/

@AA-Turner AA-Turner requested a review from a team as a code owner September 8, 2023 10:56
@AA-Turner
Member Author

0aaf135 was an attempt to integrate this into the Makefile/PCBuild more 'properly', but I'm not confident in making wider-scale changes here (said commit didn't work, so I've reverted it).

Member

@zware left a comment

This is going to need something for buildbots to avoid breaking them all, but I like the direction here.

@ned-deily
Member

Adding @ambv and @Yhg1s for review of impact on release process and supply chain security.

.cirrus.yml Outdated
@@ -20,6 +20,8 @@ freebsd_task:
pythoninfo_script:
- cd build
- make pythoninfo
bundle_ensurepip_script:
Member

Suggested change
bundle_ensurepip_script:
# Download wheels for the venv step
bundle_ensurepip_script:

Is this the reason for adding it here?

Contributor

Anything using venv or ensurepip during testing needs this provisioning step. If nothing uses them, the step would be unnecessary.

Member Author

The actual reason to include it is that Cirrus CI failed without this and passed with it. I don't know why FreeBSD requires the venv step though, as all other tests pass normally -- only documentation and hypothesis (which use venvs) seem to have an explicit venv step.

A

Contributor

@AA-Turner It's not because of venv but ensurepip itself. The call chain looks as follows:

  1. test.test_tools.test_freeze.TestFreeze.test_freeze_simple_script calls a CPython build preparation helper @ https://github.com/python/cpython/blob/cbb3a6f/Lib/test/test_tools/test_freeze.py#L28
  2. That, in turn, calls make install here https://github.com/python/cpython/blob/cbb3a6f/Tools/freeze/test/freeze.py#L184.
  3. make install calls ensurepip at https://github.com/python/cpython/blob/cbb3a6f/Makefile.pre.in#L1932-L1933 to provision it into the new Python install directory.
  4. ensurepip uses the bundled pip wheel to run bootstrapping so it attempts to unpack it and freaks out when there's no dist for it: https://github.com/python/cpython/blob/cbb3a6f/Lib/ensurepip/__init__.py#L176.
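The failure in step 4 comes down to a missing file. A minimal sketch of that kind of check follows, assuming a hypothetical helper `find_bundled_wheel` (it is not part of ensurepip, whose real lookup differs), the `pip-<version>-py3-none-any.whl` naming that appears elsewhere in this PR, and a placeholder version number:

```python
import tempfile
from pathlib import Path


def find_bundled_wheel(bundled_dir: Path, name: str, version: str) -> Path:
    """Return the bundled wheel's path, or raise if it was never provisioned.

    Hypothetical helper for illustration only; not ensurepip's actual code.
    """
    wheel = bundled_dir / f"{name}-{version}-py3-none-any.whl"
    if not wheel.is_file():
        raise FileNotFoundError(
            f"no bundled wheel for {name} {version} in {bundled_dir}"
        )
    return wheel


with tempfile.TemporaryDirectory() as tmp:
    bundled = Path(tmp)
    try:
        # An unprovisioned tree: the lookup fails before any unpacking starts.
        find_bundled_wheel(bundled, "pip", "0.0.0")
    except FileNotFoundError as exc:
        missing_error = str(exc)
    # Once a provisioning step has placed the wheel, the same lookup succeeds.
    (bundled / "pip-0.0.0-py3-none-any.whl").write_bytes(b"")
    found_name = find_bundled_wheel(bundled, "pip", "0.0.0").name
```

This is why any CI job that ends up running `make install` would need the wheels in place first, whether or not it creates a venv.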

Contributor

why FreeBSD requires

It's not that FreeBSD or Cirrus requires the test explicitly, it's rather that this test is skipped on most platforms via https://github.com/python/cpython/blob/cbb3a6f/Lib/test/test_tools/test_freeze.py#L11-L20.

@webknjaz
Contributor

This is going to need something for buildbots to avoid breaking them all, but I like the direction here.

I identified the place needing to be changed in #12791 (comment).

@AA-Turner FYI

@AA-Turner
Member Author

I identified the place needing to be changed

I tried to explore a different approach of fully integrating this into make all in 0aaf135, which I think may be a better approach as it doesn't require downstream distributors to change anything in their processes. I defer to those with much more experience with the build systems, though.

A

@webknjaz
Contributor

I tried to explore a different approach of fully integrating this into make all

I tried doing that in my PR and was getting weird behavior, I wasn't sure why. Maybe I was misusing something in the build machinery or was just confused...

If that works, it's probably fine. OTOH, you may want to take into account that many distributions prefer their build envs to be disconnected from the internet so it might still be wise to provide them with means to pre-provision the wheel separately.

Tagging @encukou @hroncok @mgorny for downstream opinions.

@webknjaz
Contributor

This PR doesn't attempt to switch to the one-project model, as I found the diff was too large to reasonably review.

Would #109245 help?

@mgorny
Contributor

mgorny commented Sep 11, 2023

If that works, it's probably fine. OTOH, you may want to take into account that many distributions prefer their build envs to be disconnected from the internet so it might still be wise to provide them with means to pre-provision the wheel separately.

In Gentoo we aren't actually using the wheels originally bundled with CPython itself but providing the newest versions of them separately, so as long as that continues working, I suppose that's fine with us. However, it is paramount that:

  1. The build process and test suite fully respect --with-wheel-pkg-dir and do not attempt to download anything.
  2. This is entirely restricted to CPython build time; in particular, creating a venv doesn't attempt any fetching.
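A rough sketch of the first requirement, assuming the configured directory is exposed through the `WHEEL_PKG_DIR` build variable and using two hypothetical helpers (`configured_wheel_dir` and `should_download` are illustrative names, not part of this PR):

```python
import sysconfig
from pathlib import Path
from typing import Optional


def configured_wheel_dir() -> Optional[Path]:
    """Return the wheel directory chosen at configure time, if any."""
    # Assumption: --with-wheel-pkg-dir is recorded as WHEEL_PKG_DIR.
    value = sysconfig.get_config_var("WHEEL_PKG_DIR")
    return Path(value) if value else None


def should_download(wheel_dir: Optional[Path]) -> bool:
    """Only fetch from the network when no wheel directory was configured."""
    return wheel_dir is None
```

Under this sketch, a distributor build that sets the directory never reaches any download code path.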

However, I can imagine it could be mildly annoying to users who aren't used to the new workflow that having a complete git clone would no longer suffice to actually build CPython offline.

@hroncok
Contributor

hroncok commented Sep 11, 2023

Same for Fedora, with the addition that we do sometimes keep the bundled wheels for too new or too old Pythons, so having a way to pre-populate the wheels instead of downloading them would still be needed.

@pradyunsg
Member

@AA-Turner could you resolve the conflicts with this PR?

I think it would be good to have this be in Python 3.13's cycle as early as possible, to allow any potential redistributors who care about the details of ensurepip to have time to adjust things as well as provide us feedback on this as early as feasible.

spec = importlib.util.spec_from_file_location("ensurepip", ENSURE_PIP_INIT)
ensurepip = importlib.util.module_from_spec(spec)
spec.loader.exec_module(ensurepip)
return ensurepip._PROJECTS
Contributor

@AA-Turner I believe this would need to be updated after #109245, and it'll probably let you work with simpler structures.

    whl = response.read()
except URLError as exc:
    print_error(f"Failed to download {wheel_url!r}: {exc}")
    errors = 1
Contributor

Was this meant to be a counter?

Suggested change
errors = 1
errors += 1
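To illustrate what the suggestion changes, here is a small sketch with hypothetical names (`run_downloads` and the project labels are made up for this example). With plain assignment the tally could never exceed one, which only matters if the count is reported rather than used as a boolean exit status:

```python
def run_downloads(outcomes: dict[str, bool]) -> int:
    """Return how many projects failed; 0 means every download succeeded."""
    errors = 0
    for name, ok in outcomes.items():
        if not ok:
            errors += 1  # `errors = 1` here would cap the tally at one
    return errors


failed = run_downloads({"project-a": False, "project-b": False, "project-c": True})
```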

continue
else:
print_error(f"An invalid '{name}' wheel exists.")
os.remove(wheel_path)
Contributor

Any reason not to use pathlib's method? Seeing that it's already used everywhere..
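For reference, the pathlib equivalent of `os.remove()` is `Path.unlink()`. A minimal sketch, assuming `wheel_path` is already a `Path` (the file name below is a placeholder):

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    wheel_path = Path(tmp) / "pip-0.0.0-py3-none-any.whl"
    wheel_path.write_bytes(b"not a real wheel")

    wheel_path.unlink()  # same effect as os.remove(wheel_path)
    removed = not wheel_path.exists()

    # Python 3.8+: tolerate a file that is already gone.
    wheel_path.unlink(missing_ok=True)
```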

from urllib.error import URLError
from urllib.request import urlopen

HOST = 'https://files.pythonhosted.org'
Contributor

Can we name this something like a PYPI_CDN_URL for maintainability?


def print_notice(message: str) -> None:
    if GITHUB_ACTIONS:
        print(f"::notice::{message}", end="\n\n")
Contributor

Nit: this also works if output to stderr FYI.
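A sketch of what routing the notice to stderr could look like, following the reviewer's note that the workflow command is still recognised there. How the script defines `GITHUB_ACTIONS` and what it prints outside of CI are not shown in this excerpt, so both are assumptions here, and `format_notice` is a hypothetical helper:

```python
import os
import sys

# Assumption: the flag mirrors the GITHUB_ACTIONS variable that GitHub sets to "true".
GITHUB_ACTIONS = os.environ.get("GITHUB_ACTIONS") == "true"


def format_notice(message: str, *, github_actions: bool) -> str:
    """Prefix the message with the ::notice:: workflow command when on CI."""
    return f"::notice::{message}" if github_actions else message


def print_notice(message: str) -> None:
    text = format_notice(message, github_actions=GITHUB_ACTIONS)
    print(text, end="\n\n", file=sys.stderr)
```

Splitting the formatting from the printing also makes the GitHub Actions branch easy to test without patching the environment.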


try:
    projects = _get_projects()
except (AttributeError, TypeError):
Contributor

I think there's an abstraction leak here: it's hard to guess from looking at the _get_projects() function, what in it might trigger these exceptions. I'd recommend processing them inside that function and making obvious such places where the exceptions may occur. Instead, I'd convert both of the exceptions into something like an ImportError and handle just that. This would contribute to transparency of how that function works.
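One way the suggestion could look, as a sketch only: the loader catches the low-level exceptions where they can occur and re-raises a single `ImportError`. The function name `load_projects`, the module name passed to the spec, and the shape of `_PROJECTS` are assumptions for illustration:

```python
import importlib.util
import tempfile
from pathlib import Path


def load_projects(init_path: Path):
    """Load ``_PROJECTS`` from a module file, raising ImportError on failure."""
    spec = importlib.util.spec_from_file_location("ensurepip_sketch", init_path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot build an import spec for {init_path}")
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
        return module._PROJECTS  # AttributeError if the name is missing
    except (AttributeError, TypeError) as exc:
        raise ImportError(f"cannot read _PROJECTS from {init_path}") from exc


with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.py"
    good.write_text('_PROJECTS = [("pip", "0.0.0", "deadbeef")]\n')
    projects = load_projects(good)

    bad = Path(tmp) / "bad.py"
    bad.write_text("# no _PROJECTS defined here\n")
    try:
        load_projects(bad)
    except ImportError as exc:
        failure = str(exc)
```

The caller then handles only `ImportError`, and the places where things can go wrong are visible inside the function.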

return 1

errors = 0
for name, version, checksum in projects:
Contributor

FWIW with #109245, looping will stop making sense here. So maybe you could start working with just projects[0] already?

Member

No, let's keep them separate.

Contributor

I was kinda assuming that the simplification PR would get merged first, in which case, this one would have to adapt..

@@ -907,6 +907,10 @@ Build Changes
* Building CPython now requires a compiler with support for the C11 atomic
library, GCC built-in atomic functions, or MSVC interlocked intrinsics.

* Wheels for :mod:`ensurepip` are no longer bundled in the CPython source
tree. Distributors should bundle these as part of the build process by
running :file:`Tools/build/bundle_ensurepip_wheels.py`.
Contributor

This would probably be clearer:

Suggested change
running :file:`Tools/build/bundle_ensurepip_wheels.py`.
running :file:`Tools/build/bundle_ensurepip_wheels.py` with no arguments.

# ensure the pip wheel exists
pip_filename = os.path.join(test.support.STDLIB_DIR, 'ensurepip', '_bundled',
f'pip-{ensurepip._PIP_VERSION}-py3-none-any.whl')
if not os.path.isfile(pip_filename):
Contributor

I feel like if it didn't exist, there should be some code to clean it up after testing so that the dummy file doesn't get forgotten on disk. Tests should avoid side effects...

import bundle_ensurepip_wheels as bew


# Disable fancy GitHub actions output during the tests
Contributor

There should be a test for enabled GHA, just for those helpers, then. It'd be sad not to get coverage there.



def _is_valid_wheel(content: bytes, *, checksum: str) -> bool:
    return checksum == sha256(content, usedforsecurity=False).hexdigest()
Contributor

@AA-Turner I already know that this effort is going to be rejected for now, but out of curiosity — what's the motivation for setting usedforsecurity=False? The other verification script doesn't have that: https://github.com/python/cpython/blob/3d18034/Tools/build/verify_ensurepip_wheels.py#L81.
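For context on the question, a small sketch showing that the `usedforsecurity` keyword (available since Python 3.9) does not change the digest itself; as far as the hashlib documentation describes it, the flag only affects whether restricted environments permit the algorithm. The function below is a simplified stand-in, not the PR's code, and the payload is made up:

```python
from hashlib import sha256


def is_valid_wheel(content: bytes, *, checksum: str) -> bool:
    """Compare the SHA-256 hex digest of the content with the expected value."""
    return checksum == sha256(content).hexdigest()


payload = b"example wheel bytes"
expected = sha256(payload).hexdigest()

# The keyword does not alter the computed hash.
same_digest = sha256(payload, usedforsecurity=False).hexdigest() == expected
```

Since the checksum here guards against a corrupted or tampered download, leaving the keyword at its default arguably describes the intent more accurately.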

@pradyunsg
Member

pradyunsg commented Oct 12, 2023

Closing this per #80789 (comment). Thanks @AA-Turner for filing this, even though we're not going ahead with this (despite me implying that earlier in the discussion).

@pradyunsg pradyunsg closed this Oct 12, 2023
10 participants