Full pep 508 / pypa dependency specifier support in starlark #2826

Open
12 of 16 tasks
rickeylev opened this issue Apr 25, 2025 · 19 comments
@rickeylev
Collaborator

rickeylev commented Apr 25, 2025

This issue is to track a starlark implementation for the pypa-spec for requirements line parsing and metadata Requires-Dist parsing.

Right now, we have a partial implementation in pep508_evaluate.bzl.

Some relevant docs:

TODO

  • python_version - use the X.Y python version
  • python_full_version - use the X.Y.Z python version
  • os_name - use a value derived from @platforms//os:<name> values
  • sys_platform - use a value derived from @platforms//os:<name> values
  • platform_release - use "" for the value
  • platform_system - use a value derived from @platforms//os:<name> values
  • platform_version - currently uses "" for the value; this misses that the spec says it should default to 0
  • platform_machine - use a value derived from @platforms//cpu:<name> values
  • platform_python_implementation - Use CPython constant.
  • implementation_name - Use cpython constant.
  • Support * in version comparison (see https://peps.python.org/pep-0440/#compatible-release)
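
A rough sketch of how the list above could fit together, written in plain Python rather than Starlark so it can be run directly. The `marker_env` function and the `_OS_VALUES` table are hypothetical names for illustration only, not what pep508_evaluate.bzl does; the table keys are assumed to mirror `@platforms//os:<name>`, and the cpu name is passed through unchanged, which is a simplification.

```python
# Hypothetical lookup table keyed by @platforms//os:<name>; illustration only.
_OS_VALUES = {
    "linux": {"os_name": "posix", "sys_platform": "linux", "platform_system": "Linux"},
    "osx": {"os_name": "posix", "sys_platform": "darwin", "platform_system": "Darwin"},
    "windows": {"os_name": "nt", "sys_platform": "win32", "platform_system": "Windows"},
}


def marker_env(os_name, cpu_name, python_full_version):
    """Build a marker environment dict for a target platform (sketch)."""
    major, minor = python_full_version.split(".")[:2]
    env = dict(_OS_VALUES[os_name])
    env.update({
        "python_version": major + "." + minor,       # X.Y
        "python_full_version": python_full_version,  # X.Y.Z
        "platform_machine": cpu_name,                # assumed to match the cpu name as-is
        "platform_release": "",                      # unknown: empty string
        "platform_version": "0",                     # unknown version: 0 per the spec
        "platform_python_implementation": "CPython",
        "implementation_name": "cpython",
    })
    return env
```

For example, `marker_env("osx", "aarch64", "3.11.4")` yields `python_version` of `3.11` and `sys_platform` of `darwin`.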

Some keys seem tedious to support

platform_release, platform_version

For pip, this is simply the host machine value. For bazel, it would mean whatever the target platform would return. Using the bazel host machine value wouldn't be correct for cross builds, but will probably have to suffice as the first impl and in the short term.

Long term, I think we'll need a --platform_{release,version} flag (or toolchain?), which points to a target that can provide the appropriate value.

Notable bits from the spec:


Variables whose value cannot be calculated on a given Python implementation should evaluate to 0 for versions, and an empty string for all other variables.

src


cc @aignas @groodt

@rickeylev
Collaborator Author

(I didn't see an existing issue that captures a starlark impl of 508 in detail, so created this; close as a dupe if I just didn't see it)

@aignas
Collaborator

aignas commented Apr 25, 2025

The === and ~= operator should work. The tests are here: https://github.com/bazel-contrib/rules_python/blob/main/tests/pypi/pep508/evaluate_tests.bzl#L120

I'll mark these as done in the issue description.

@groodt
Collaborator

groodt commented Apr 26, 2025

Does there need to be an item for version detection? You need to parse values first (there's a regex) to know if version comparison should happen or fallback behaviour should happen.

Basically, to implement the version comparison ops, you first need to know whether both the LHS and the RHS are versions.

@aignas
Collaborator

aignas commented Apr 26, 2025

There is a check that checks the name of the marker. All of the version markers have version in them. At the end of the day, there is a finite number of version markers that we can just have in a list. That list is used in https://github.com/bazel-contrib/rules_python/blob/main/python/private/pypi/pep508_evaluate.bzl#L329

@groodt
Collaborator

groodt commented Apr 26, 2025

Rather than sniffing values for things or having them passed via flags, I'd consider ways to define a static "platform" and these are then used.

Most of the values used in marker evaluation come from "sysconfig" and it's easy to get a static json from that. Indeed, in future versions of CPython, there will be a static version as part of a CPython installation that downstream tools can use without invoking Python. But, for now, we can just make it easy for users to prepare "structs" for their expected target platforms and use these when evaluating markers in Starlark.
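
As one way to prepare such a static description, the sketch below dumps the marker values of the interpreter it runs under, using the standard-library calls that PEP 508 names for each marker. The function name `current_marker_environment` is made up for this example; running it on the target machine and checking the JSON in is an assumption about workflow, not something rules_python provides.

```python
import json
import os
import platform
import sys


def current_marker_environment():
    """Collect the PEP 508 marker values for the running interpreter."""
    return {
        "implementation_name": sys.implementation.name,
        "os_name": os.name,
        "platform_machine": platform.machine(),
        "platform_python_implementation": platform.python_implementation(),
        "platform_release": platform.release(),
        "platform_system": platform.system(),
        "platform_version": platform.version(),
        "python_full_version": platform.python_version(),
        "python_version": ".".join(platform.python_version_tuple()[:2]),
        "sys_platform": sys.platform,
    }


# Emit a static JSON description that could be turned into a Starlark struct.
print(json.dumps(current_marker_environment(), indent=2, sort_keys=True))
```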

@aignas
Collaborator

aignas commented Apr 26, 2025

(copied list to OP - rickeylev)

@aignas
Collaborator

aignas commented Apr 26, 2025

Rather than sniffing values for things or having them passed via flags, I'd consider ways to define a static "platform" and these are then used.

Yeah, I am thinking that I need to work on a better bzlmod API to define target platforms to allow users to configure the values in here.

@aignas
Collaborator

aignas commented Apr 26, 2025

Regarding the original ticket and the following list of TODO items:

TODO

I think we only need to implement the version comparison so that we can compare platform_version and python_version. That means that only a subset of PEP 508 needs to be supported.

As for version-specifiers, I think we don't need to worry about them because they are for package versions and not the versions that platform_version and python_version talk about? Am I wrong here?

What would the >=, <=, <, > operators be used for in string comparisons?

@aignas
Collaborator

aignas commented Apr 26, 2025

Ah, I missed these comments: #2821 (comment) Now I see....

@groodt
Collaborator

groodt commented Apr 26, 2025

There is a check that checks the name of the marker. All of the version markers have version in them.

That's probably pragmatic, but it's not to spec. I think driving the behaviour off the operator is more correct than driving it off the values. Indeed, platform_release can have version like semantics.

When evaluation occurs, what happens is that the marker values are plugged in and then evaluated as true or false. It actually doesn't matter what the value names are.
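
To make the value-driven fallback concrete, here is a minimal Python sketch: the operator is applied with version semantics only when both operands look like versions, and with plain string semantics otherwise. The names `compare` and `_as_release` are hypothetical, and the regex only accepts a bare release segment (`N(.N)*`), which is a deliberate simplification of the full PEP 440 grammar.

```python
import operator
import re

# Simplification: only the release segment N(.N)* counts as a version here.
_RELEASE = re.compile(r"^\d+(\.\d+)*$")

_OPS = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
    "!=": operator.ne,
}


def _as_release(value):
    """Return a tuple of ints if value looks like a version, else None."""
    if _RELEASE.match(value):
        return tuple(int(part) for part in value.split("."))
    return None


def compare(lhs, op, rhs):
    """Compare as versions when both sides parse, otherwise as strings."""
    left, right = _as_release(lhs), _as_release(rhs)
    if left is not None and right is not None:
        width = max(len(left), len(right))
        left += (0,) * (width - len(left))    # zero-pad so 3.10 == 3.10.0
        right += (0,) * (width - len(right))
        return _OPS[op](left, right)
    return _OPS[op](lhs, rhs)                 # fallback: plain string comparison
```

With this, `compare("3.10", ">=", "3.9")` uses version ordering, while a value such as `"5.15.0-generic"` does not parse and is compared as a string.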

@rickeylev
Collaborator Author

I created #2832 as a WIP for an analysis-time flag that evaluates the pep 508 expressions. It's just a WIP; I created it so other maintainers can directly edit that branch and easily grab it if they want.

I think we should have a CLI flag for each of these:

  • platform_release
  • platform_version

By default, we'll point the flag to some best guess for a value, or use whatever the "unknown" value the spec says. I think the rest of the values we can auto-determine based on existing flags.

We could put all the specifiers where a user can override them (flag or toolchain), but I think for a first impl, we can just populate it with what we know.

@rickeylev
Collaborator Author

os_name

This one is a bit weird. I was going to base it on @platforms//os, too, but I think this is actually a property of the runtime, not target platform.

os.name docs say

The name of the operating system dependent module imported. The following names have currently been registered: 'posix', 'nt', 'java'

Some light research seems to indicate that os.name (originally?) represents how python is talking to the OS, and is more a python compile time setting than the runtime OS. "java" would be returned because, under jython, python is using java interfaces to talk to the OS.

Given that Jython is effectively defunct, I suspect this is a distinction without any meaning today, though, and don't see why platforms:os wouldn't suffice

@rickeylev
Collaborator Author

Bah. It looks like pypa defines their own version format, and it's not entirely compatible with semantic versioning.

From https://packaging.python.org/en/latest/specifications/version-specifiers/#version-scheme

  • Public version: [N!]N(.N)*[{a|b|rc}N][.postN][.devN]
  • local version: <public version identifier>[+<local version label>]
  • local version label: [a-zA-Z0-9.]+

Aha, this is specified by pep 440, which we have a parser for already! https://github.com/bazel-contrib/rules_python/blob/main/python/private/py_wheel_normalize_pep440.bzl
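
For reference, the scheme quoted above can be turned into a small parser. This is a Python sketch with a hypothetical `parse_version` function; it only accepts the canonical spelling shown in the bullets and does none of the normalization that PEP 440 also describes, so it is not a stand-in for py_wheel_normalize_pep440.bzl.

```python
import re

# Canonical form only: [N!]N(.N)*[{a|b|rc}N][.postN][.devN][+local]
_VERSION = re.compile(
    r"^(?:(?P<epoch>\d+)!)?"
    r"(?P<release>\d+(?:\.\d+)*)"
    r"(?:(?P<pre_kind>a|b|rc)(?P<pre_num>\d+))?"
    r"(?:\.post(?P<post>\d+))?"
    r"(?:\.dev(?P<dev>\d+))?"
    r"(?:\+(?P<local>[a-zA-Z0-9.]+))?$"
)


def _int_or_none(text):
    return int(text) if text is not None else None


def parse_version(text):
    """Split a canonical PEP 440 version into its parts, or return None."""
    match = _VERSION.match(text)
    if match is None:
        return None
    pre = None
    if match.group("pre_kind"):
        pre = (match.group("pre_kind"), int(match.group("pre_num")))
    return {
        "epoch": int(match.group("epoch") or 0),
        "release": tuple(int(p) for p in match.group("release").split(".")),
        "pre": pre,
        "post": _int_or_none(match.group("post")),
        "dev": _int_or_none(match.group("dev")),
        "local": match.group("local"),
    }
```

Strings that are not in canonical form, such as `1.0-beta` or `3.10.*`, return `None` here and would need separate handling.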

@aignas
Collaborator

aignas commented Apr 28, 2025

Nice find, however, these versions are for package versions but not necessarily the python interpreter versions. What is important here is that the PEP508 spec includes all of the following elements of the grammar:

<dep> <dep_version_specifier> ; <marker_expr>

The dep_version_specifier is the lowest priority in rules_python, IMHO, if we are relying on tools like uv. The <marker_expr> does not need all of the versions spec to be implemented here, I think.

github-merge-queue bot pushed a commit that referenced this issue Apr 29, 2025
…uation (#2827)

Right now, if two strings are compared, it results in an error.

Per spec, strings are supposed to "use the python behavior". Starlark is going to use Java semantics underneath, but it should behave close enough for the (almost exclusively) ASCII input that will be used.

Work towards #2826
@rickeylev
Collaborator Author

Ah, I see what you mean. Yes, simple semver parsing of the values in marker_expr will probably work for most cases. The big one being the python version, obviously.

One unknown is the poorly specified fields, e.g. platform_release. We had a user report of platform_release < '9.0', which, according to the spec, should use version_cmp semantics, but only if platform_release also evaluates to a version-like string.

@aignas
Collaborator

aignas commented Apr 29, 2025

Yeah, that one is the most problematic.

Nevertheless, I think the version normalization rules may be good to have anyway. The version is one thing, but the expressions are another: Python is the wild west and people may come up with very inventive ways to write them, so, given my experience over the past month, I would err on the side of caution. :D

@rickeylev
Collaborator Author

re: providing user-configurable knobs for the env marker values

Here's my proposal: have a single "env marker config" target that is set via a flag.
By default, it points to //python/private/pypi:env_marker_default_config, which does most of the env-dict building that env_marker_setting currently does itself in the PR, e.g.:

//python/config_settings:pip_env_marker_config
 label_flag default=//python/private/pypi:env_marker_default_config

//python/private/pypi:env_marker_default_config
  provides PyPiEnvMarkerInfo, which basically has the non-toolchain parts of the env dict

This allows users to set e.g. --@rules_python//python/config_settings:pip_env_marker_config=//some:target and have near total control over what values are used. If you have an esoteric platform/runtime/etc, you can now make the pypi-resolution step work for you, though it might require more wiring. Which is fine -- it's at least doable.


If we want, we can make the :env_marker_default_config target customizable using command line flags. I mention this because bazel build //my:bin --platform_machine=whatever sounds convenient? Honestly, I'd rather not do this unless requested / there's a clear user need/demand for it. So much can be auto-computed using the target platform info and toolchain that I'm not sure how much value overriding a specific env marker has.

In any case, this is fairly easy to implement: just make :env_marker_default_config have flags as implicit dependencies.

@aignas
Collaborator

aignas commented May 2, 2025

+1 for providing the flags the way you suggest. I think our config setting could be the default value of that pip_env_marker_config and would act as the auto-detecting one.

@aignas
Collaborator

aignas commented May 2, 2025

FYI Keith wrote in the issue an interesting python_full_version == "3.10.*", which I have never seen.

#2847 (comment)

EDIT: added a line to the TODO to support better version comparison with * in the versions. I think we can do something about it in a reasonable way since there is an equivalence principle:

~= 2.2
>= 2.2, == 2.*

If we have correct support for 2.* then we could re-express the implementation of the ~= operators. Since we do have working ~= operators, I think the 2.* support is almost there.
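
The equivalence can be sketched in a few lines of Python. All function names here are hypothetical, and only plain release segments (`N(.N)*`) are handled, so this shows the idea rather than the Starlark implementation: `~= V` is expressed as `>= V` combined with a `== prefix.*` match, where the prefix is V with its last component dropped.

```python
def _release(text):
    return tuple(int(part) for part in text.split("."))


def _pad(left, right):
    """Zero-pad two release tuples to the same length."""
    width = max(len(left), len(right))
    return left + (0,) * (width - len(left)), right + (0,) * (width - len(right))


def matches_prefix(version, pattern):
    """Evaluate `version == pattern` where pattern ends in '.*', e.g. '3.10.*'."""
    prefix = _release(pattern[: -len(".*")])
    candidate, _ = _pad(_release(version), prefix)
    return candidate[: len(prefix)] == prefix


def greater_equal(version, other):
    left, right = _pad(_release(version), _release(other))
    return left >= right


def compatible_release(version, spec):
    """Evaluate `version ~= spec` as `>= spec` and `== prefix.*`."""
    parts = spec.split(".")
    if len(parts) < 2:
        raise ValueError("~= needs at least two release components")
    prefix_pattern = ".".join(parts[:-1]) + ".*"
    return greater_equal(version, spec) and matches_prefix(version, prefix_pattern)
```

The same `matches_prefix` helper covers the `python_full_version == "3.10.*"` case mentioned above.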

github-merge-queue bot pushed a commit that referenced this issue May 2, 2025
…ers (#2832)

wip/prototype to help bootstrap the impl of an analysis-time flag that evaluates the pep508 dep specs

Creating a PR to make collab easier (maintainers can directly edit)

TODO:
* Remove the todo markers after discussion


Work towards #2826

---------

Co-authored-by: Ignas Anikevicius <[email protected]>
aignas added a commit that referenced this issue May 3, 2025
@aignas aignas added this to the v1.5 milestone May 3, 2025
github-merge-queue bot pushed a commit that referenced this issue May 5, 2025
This is a flag to start leveraging the new code paths. The Starlark implementation was added in 1.4 and then reverted in the latest release candidates. The `env` variable will be a good way to roll it out more gradually and get more testing.

For now we are switching only the `whl_library` internals, as the `requirements.txt` files from `uv` may use `*` in `python_full_version` and `platform_version`, which is not yet fully supported (#2826).

The main goal is to start using the Starlark implementation so that we don't have any hidden variables. What is more, having this in Starlark is the most maintainable long-term solution for supporting cross-platform builds.

Work towards #260

---------

Co-authored-by: Richard Levasseur <[email protected]>
github-merge-queue bot pushed a commit that referenced this issue May 6, 2025
Fixes:
```
ERROR: /Users/fmeum/git/rules_python/tests/pypi/env_marker_setting/BUILD.bazel:3:30: Illegal ambiguous match on configurable attribute "platform_machine" in //tests/pypi/env_marker_setting:test_expr_python_full_version_lt_negative_subject:
@@platforms//cpu:aarch64
@@platforms//cpu:arm64
Multiple matches are not allowed unless one is unambiguously more specialized or they resolve to the same value. See https://bazel.build/reference/be/functions#select.
```

Work towards #2850.
Work towards #2826.
github-merge-queue bot pushed a commit that referenced this issue May 6, 2025
…2853)

This factors creation of (most of) the env marker dict into a separate target and provides a label flag to allow customizing the target that provides it.

This makes it easier for users to override how env marker values are computed. The `env_marker_setting` rule will still, if necessary, compute values from the toolchain, but existing keys (computed from the env marker config target) have precedence.

The `EnvMarkerInfo` provider is the interface for implementing a custom env marker config target; it will be publicly exposed in a subsequent PR.

Along the way, unify how the env dict and defaults are set.

Work towards #2826