CEP XXXX: MatchSpec minilanguage
#82
I keep finding myself referring to the "MatchSpec" interface in other CEPs, yet it is not standardized, so here we go. Let's open that can of worms.
This will probably need another CEP on
cep-??.md
Outdated
### Exact matches

To fully-specify a package record with a full, exact spec, these fields must be given as exact values: `channel` (preferably by URL), `subdir`, `name`, `version`, `build`. Alternatively, an exact spec can also be given by `*[md5=12345678901234567890123456789012]` or `*[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e]`.
When matching by checksum, should you also add the subdir? If I'm not mistaken, it's possible for two subdirs to contain a package with the same checksum right? Or is this a corner case?
These checksums come from the compressed artifacts, so in principle they should be unique (even with identical contents, the `index.json` file should have `"subdir": <subdir>`, I think?).
The hash that conda-build uses for the build_string doesn't consider the subdir, indeed (and maybe it should).
Just FYI, rattler does not currently support this. There we require that at least the package name is still specified.
Thanks for the great write-up, @jaimergp!
cep-??.md
Outdated
The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any version specifier. `build` can be any string matcher. See "Match conventions" below.

The positional syntax also allows the `=` character as a separator, instead of a space. When this is the case, versions are interpreted differently. `pkg=1.8` will be taken as `1.8.*` (fuzzy), but `pkg 1.8` will give `1.8` (exact). To have fuzzy matches with the space syntax, you need to use `pkg =1.8`. This nuance does not apply if a `build` string is present; both `foo==1.0=*` and `foo=1.0=*` are equivalent (they both understand the version as `1.0`, exact).
I know this is just reporting the current state of affairs but, yucky.
In rattler, this form is no longer allowed when parsing in strict mode (it is still accepted in lenient parsing mode).
@baszalmstra which form is not allowed?
IIRC in mamba `pkg 1.8` and `pkg =1.8` are the same.
The form `foo=1.0=bla` is disallowed! (in strict mode only, used in rattler-build)
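For reference, the separator-dependent version handling quoted in this thread can be sketched in a few lines of plain Python. This is only an illustration of the prose rules, not conda's parser: `interpret_positional` and `_POSITIONAL` are hypothetical names, and the sketch only handles simple versions without comparison operators.

```python
import re
from typing import Optional, Tuple

# Hypothetical pattern for the positional forms discussed above:
# name, then a separator (whitespace, "=" or "=="), then version, then build.
_POSITIONAL = re.compile(
    r"^(?P<name>[^\s=]+)"
    r"(?:(?P<sep>\s+|\s*==?)(?P<version>[^\s=]+)"
    r"(?:(?:\s+|=)(?P<build>\S+))?)?$"
)


def interpret_positional(spec: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (name, version, build) following the rules quoted in the thread."""
    match = _POSITIONAL.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported spec for this sketch: {spec!r}")
    name, version, build = match.group("name", "version", "build")
    operator = (match.group("sep") or "").strip()
    # A single "=" without a build string means a fuzzy match (1.8 -> 1.8.*).
    # With a build string, or with "==" or a plain space, the version is exact.
    if version is not None and operator == "=" and build is None and "*" not in version:
        version += ".*"
    return name, version, build
```

Under these assumptions, `pkg=1.8` and `pkg =1.8` both yield the fuzzy `1.8.*`, `pkg 1.8` stays exact, and `foo==1.0=*` and `foo=1.0=*` give the same result.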
cep-9999.md
Outdated
6. If `channel` is an exact value and `subdir` is an exact value, `subdir` is appended to `channel` with a `/` separator. Otherwise, `subdir` is included in the key-value brackets.
How does this relate to label channels, e.g. `pytorch/label/nightly::libfaiss`?
With the separator logic, `nightly` would be assumed to be a subdir.
The logic in conda is to take the last component and compare it against known subdirs. As a result, channels cannot be named like subdirs; e.g. I can't register a channel named `linux-64`.
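A minimal sketch of that "last component" check, to make the label-channel case concrete. `KNOWN_SUBDIRS` is an illustrative subset chosen for this example and `split_channel_subdir` is a hypothetical helper, not conda's API.

```python
from typing import Optional, Tuple

# Illustrative subset only; the real list of known subdirs is longer.
KNOWN_SUBDIRS = {"noarch", "linux-64", "linux-aarch64", "osx-64", "osx-arm64", "win-64"}


def split_channel_subdir(channel: str) -> Tuple[Optional[str], Optional[str]]:
    """Split 'channel[/subdir]' by testing the last path component."""
    channel = channel.rstrip("/")
    head, _, tail = channel.rpartition("/")
    if tail in KNOWN_SUBDIRS:
        # The last component is a known subdir, so it is never read as a channel name.
        return (head or None), tail
    return channel, None
```

With this logic `pytorch/label/nightly` stays a channel because `nightly` is not a known subdir, while a bare `linux-64` is always read as a subdir, which is why it cannot be used as a channel name.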
Not sure about the current status of this CEP, but before moving forward with it, we should maybe consider finalizing this one if we think it could be of interest?
cep-9999.md
Outdated
These are also accepted but have reduced utility. Their usage is discouraged:

- `url`
Note that a full URL can be parsed into a MatchSpec object so... should a URL be considered a valid form? In those cases, a note: parsers need to account for %-decoding. See xref conda/conda#14481.
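To illustrate the %-decoding point, here is a rough sketch of extracting fields from a package URL with the standard library. It assumes the usual `name-version-build` artifact filename with a `.conda` or `.tar.bz2` extension; `fields_from_url` is a hypothetical helper and the URL in the usage note is only illustrative.

```python
from urllib.parse import unquote, urlsplit


def fields_from_url(url: str) -> dict:
    """Derive subdir, name, version and build from a package URL (sketch)."""
    # Decode percent-escapes first, e.g. "%21" -> "!" for version epochs.
    path = unquote(urlsplit(url).path)
    *_, subdir, filename = path.split("/")
    for extension in (".conda", ".tar.bz2"):
        if filename.endswith(extension):
            stem = filename[: -len(extension)]
            break
    else:
        raise ValueError(f"not a recognized package filename: {filename!r}")
    # Version and build never contain "-", so split from the right.
    name, version, build = stem.rsplit("-", 2)
    return {"subdir": subdir, "name": name, "version": version, "build": build}
```

For example, `fields_from_url("https://conda.anaconda.org/conda-forge/linux-64/x264-1%21164.3095-h166bdaf_2.tar.bz2")` gives the version `1!164.3095`; without the decoding step the epoch separator would be kept as `%21`.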
Co-authored-by: Bas Zalmstra
Co-authored-by: JeanChristopheMorinPerso
@baszalmstra, @beckermr, @AntoinePrv, @ruben-arts, @JeanChristopheMorinPerso, I've tackled some of the pending items if you want to take a look. Perhaps more critically, the version strings and ordering conversation is now part of #132. I think I'll rewrite part of the Specification so we don't lose time with historical details and go straight for the syntax, since it's all intertwined anyway... This is valid 🤦:
>>> str(MS("channel:namespace:pkg 1 2[subdir=linux-63,channel=XX,name=jaime]"))
'XX/linux-63::pkg==1=2'
MatchSpec minilanguage
This looks great!
- `version: str | VersionSpec`. Optional.
- `build: str`. Optional. It requires `version` to be present.
- All keyword expressions are optional. If present, they MUST be enclosed in a single set of square brackets, after the positional expressions. The following rules apply:
  - Keyword expressions are written as key-value pairs. They MUST be built by joining the name of the target field (key) and the expression string (value) with a single `=` character. The expression string MUST be quoted with single `'` or double `"` quotes if it contains spaces.
> The expression string MUST be quoted with single `'` or double `"` quotes if it contains spaces.

Is this referring to the value of the expression as a whole? "The expression string" feels unclear to me. I could interpret this as `['key=value']`, for example.
- `build: str`. Optional. It requires `version` to be present.
- All keyword expressions are optional. If present, they MUST be enclosed in a single set of square brackets, after the positional expressions. The following rules apply:
  - Keyword expressions are written as key-value pairs. They MUST be built by joining the name of the target field (key) and the expression string (value) with a single `=` character. The expression string MUST be quoted with single `'` or double `"` quotes if it contains spaces.
  - Target-expression pairs MUST be separated by a single comma character `,`.
This is the first reference to "target-expression" in this document so far. Maybe we could replace this with "Keyword expression pairs" since that's what line 58 calls them?
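One way to read the quoted keyword rules (a single bracket set, `key=value` pairs, comma separators, quotes around the value only) is sketched below. `parse_keywords` and `_PAIR` are hypothetical names; this is one interpretation of the draft text, not the behavior of any existing parser.

```python
import re

# One key=value pair; the value is single-quoted, double-quoted, or bare.
_PAIR = re.compile(
    r"\s*(?P<key>[A-Za-z_][\w-]*)\s*=\s*"
    r"(?:'(?P<single>[^']*)'"
    r'|"(?P<double>[^"]*)"'
    r"|(?P<bare>[^,\s\]]+))"
    r"\s*(?:,|$)"
)


def parse_keywords(spec: str) -> dict:
    """Extract keyword expressions from the trailing [...] section (sketch)."""
    start = spec.find("[")
    if start == -1:
        return {}
    if not spec.endswith("]"):
        raise ValueError("keyword expressions must come last, in one bracket set")
    body = spec[start + 1 : -1]
    result, position = {}, 0
    while position < len(body):
        match = _PAIR.match(body, position)
        if match is None:
            raise ValueError(f"cannot parse keyword expression at offset {position}")
        values = match.group("single", "double", "bare")
        result[match.group("key")] = next(v for v in values if v is not None)
        position = match.end()
    return result
```

Under this reading, the quotes wrap only the value, as in `pkg[license="BSD 3-Clause",subdir=linux-64]`, and `['key=value']` would be rejected.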
The canonical string representation of a `MatchSpec` expression follows these rules:

1. `name` is required and MUST be written as a positional expression. Its value MAY be `*` if necessary.
Should cases when * is necessary for the name be documented?
1. `name` is required and MUST be written as a positional expression. Its value MAY be `*` if necessary.
2. If `version` describes an exact equality expression, it MUST be written as a positional expression, prepended by `==`. If `version` denotes fuzzy equality (e.g. `1.11.*`), it MUST be written as a positional expression with the `.*` suffix left off and prepended by `=`. Otherwise `version` MUST be included inside the key-value brackets.
3. If `version` is an exact equality expression, and `build` does not contain asterisks, `build` MUST be written as a positional expression, prepended by `=`. Otherwise, `build` MUST go inside the key-value brackets.
4. If `channel` is defined and does not contain asterisks, a `::` separator is used between `channel`
Maybe reorder this so that 4 and 5 become 1 and 2? This would feel more natural I think, following the left to right order.
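As a sanity check of rules 1 to 3 as quoted (the channel and subdir rules are left out because the hunk is cut off), here is a small sketch of the rendering. `canonical`, `_is_plain` and `_quote` are hypothetical helpers. Treating a bare version like `1.0` as exact and quoting values that contain commas are assumptions of this sketch, not statements from the draft.

```python
from typing import Optional

_SPECIAL = "<>=~|,*"


def _is_plain(value: str) -> bool:
    """True for a bare version such as '1.0' with no operators or globs."""
    return bool(value) and not value.startswith("!=") and not any(c in value for c in _SPECIAL)


def _quote(value: str) -> str:
    # Spaces require quotes per the draft; quoting commas is this sketch's assumption.
    return f"'{value}'" if (" " in value or "," in value) else value


def canonical(name: str = "*", version: Optional[str] = None, build: Optional[str] = None) -> str:
    positional, keywords, exact = name, [], False
    if version is not None:
        if version.startswith("==") and _is_plain(version[2:]):
            exact, positional = True, positional + version
        elif _is_plain(version):
            exact, positional = True, f"{positional}=={version}"
        elif version.endswith(".*") and _is_plain(version[:-2]):
            positional += "=" + version[:-2]  # fuzzy: drop ".*", prepend "="
        elif version.startswith("=") and _is_plain(version[1:]):
            positional += version  # already in "=1.11" form
        else:
            keywords.append(("version", version))
    if build is not None:
        if exact and "*" not in build:
            positional += "=" + build
        else:
            keywords.append(("build", build))
    if keywords:
        positional += "[" + ",".join(f"{k}={_quote(v)}" for k, v in keywords) + "]"
    return positional
```

With these assumptions, name `pkg`, version `1` and build `2` render as `pkg==1=2`, which matches the name, version and build part of the `'XX/linux-63::pkg==1=2'` output shown earlier in the thread.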
#### Version matching

Expressions targeting the `version` field MUST be handled with additional rules. These expressions are referred to as _version specifiers_. A _version identifier_ will refer to version strings as described in [CEP PR #132](https://github.com/conda/ceps/pull/132). For ordering-aware comparisons, the implied ordering is also described in [CEP PR #132](https://github.com/conda/ceps/pull/132).
> These expressions are referred to as version specifiers. A version identifier will refer to version strings as described in CEP PR #132.

That referenced CEP doesn't contain the words "version identifier".
Should this be clarified?
- `,` denotes the logical AND.
- `,` (AND) has higher precedence than `|` (OR).

A _version clause_ MUST follow one of these conventions:
I find version specifier, version identifier and version clause confusing. Maybe we could give a small example like: "in `1.2|==4.0.0`, `1.2|==4.0.0` is a version specifier, `1.2` and `==4.0.0` are version clauses, and `1.2` and `4.0.0` are version identifiers" or something like that?
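The terminology proposed in this comment, together with the quoted precedence rule (`,` binds tighter than `|`), could be illustrated with a short sketch. `split_specifier` and `identifier_of` are hypothetical helpers, the operator list is an assumption, and parentheses or other grouping are deliberately ignored.

```python
import re
from typing import List

# Assumed operator set for this sketch; longer operators are listed first.
_CLAUSE = re.compile(r"^(?P<operator>==|!=|>=|<=|~=|=|>|<)?(?P<identifier>.+)$")


def split_specifier(specifier: str) -> List[List[str]]:
    """Split a version specifier into OR-groups of AND-ed version clauses."""
    # "|" has the lowest precedence, so split on it first, then on ",".
    return [[clause.strip() for clause in group.split(",")] for group in specifier.split("|")]


def identifier_of(clause: str) -> str:
    """Strip the leading operator, if any, leaving the version identifier."""
    return _CLAUSE.match(clause).group("identifier")
```

Here `split_specifier(">=1.8,<2|==4.0.0")` reads as "(`>=1.8` AND `<2`) OR `==4.0.0`", and `identifier_of("==4.0.0")` is `4.0.0`.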
Finally got this up on GitHub. If anyone's interested, I started a collection of strings that That repo also contains a Lark EBNF-type grammar for
@chenghlee shared this gem yesterday 😂 😭
>>> from conda.models.match_spec import MatchSpec
>>> MatchSpec("foo")
MatchSpec("foo")
>>> MatchSpec("foo=")
MatchSpec("foo")
>>> MatchSpec("foo # comment")
MatchSpec("foo")
>>> MatchSpec("foo=# comment")
MatchSpec("foo")
>>> MatchSpec("foo # comment")
MatchSpec("foo")
>>> MatchSpec("foo= # comment")
Traceback (most recent call last):
File "/opt/conda/lib/python3.11/site-packages/conda/models/version.py", line 44, in __call__
return cls._cache_[arg]
~~~~~~~~~~~^^^^^
...
conda.exceptions.InvalidMatchSpec: Invalid spec 'foo= # comment': Invalid version '=': invalid operator
We also re-discovered that
There's also this gem: How
Perhaps we can keep the general logic of the language and then add a section collecting "known allowed exceptions to the previous rules" with everything that is a strong candidate for deprecation (in a future CEP).
We should write only what we want in the CEP now. The deprecation of unsupported syntax is a separate issue to manage directly in conda.
Checklist for submitter

- `cep-0000.md` named `cep-XXXX.md` in the root level.

Checklist for CEP approvals

- `${greatest-number-in-main} + 1`.
- `cep-XXXX.md` file has been renamed accordingly.
- `# CEP XXXX -` header has been edited accordingly.
- `pre-commit` checks are passing.

Closes #80