@jaimergp
Contributor

@jaimergp jaimergp commented Jun 4, 2024

Checklist for submitter

  • I am submitting a new CEP: MatchSpec minilanguage.
    • I am using the CEP template by creating a copy cep-0000.md named cep-XXXX.md in the root level.
  • I am submitting modifications to CEP XX.
  • Something else: (add your description here).

Checklist for CEP approvals

  • The vote period has ended and the vote has passed the necessary quorum and approval thresholds.
  • A new CEP number has been minted. Usually, this is ${greatest-number-in-main} + 1.
  • The cep-XXXX.md file has been renamed accordingly.
  • The # CEP XXXX - header has been edited accordingly.
  • The CEP status in the table has been changed to approved.
  • The last modification date in the table has been updated accordingly.
  • The pre-commit checks are passing.

Closes #80

@jaimergp
Contributor Author

jaimergp commented Jun 4, 2024

I find myself referring to the "MatchSpec" interface in other CEPs, yet it is not standardized, so here we go. Let's open that can of worms.

@jaimergp jaimergp mentioned this pull request Jun 4, 2024
2 tasks
@jaimergp
Contributor Author

jaimergp commented Jun 5, 2024

This will probably need another CEP on PackageRecord, which will probably ask for Repodata counterparts and... channel structure. Yay. I like how packaging.python.org does this btw. I'll probably copy some of that structure.

cep-??.md Outdated

### Exact matches

To fully specify a package record with a full, exact spec, these fields must be given as exact values: `channel` (preferably by URL), `subdir`, `name`, `version`, `build`. Alternatively, an exact spec can also be given by `*[md5=12345678901234567890123456789012]` or `*[sha256=f453db4ffe2271ec492a2913af4e61d4a6c118201f07de757df0eff769b65d2e]`.

When matching by checksum, should you also add the subdir? If I'm not mistaken, it's possible for two subdirs to contain a package with the same checksum, right? Or is this a corner case?

Contributor Author

These checksums are computed from the compressed artifacts, so in principle they should be unique (even with identical contents, the `index.json` file should have `"subdir": <subdir>`, I think?).

The hash that conda-build uses for the build_string doesn't consider the subdir, indeed (and maybe it should).

Contributor

Just FYI, rattler does not currently support this. There we require that at least the package name is still specified.
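For reference, a checksum-based spec like the ones quoted in this thread could be produced from an artifact on disk roughly as follows. This is only a minimal sketch: `checksum_spec` is a hypothetical helper (not part of conda or rattler), and it hashes the compressed artifact file, as described in the reply above.

```python
import hashlib
from pathlib import Path


def checksum_spec(artifact: Path, algorithm: str = "sha256") -> str:
    """Return a `*[<algorithm>=<hexdigest>]` spec for an artifact file.

    Hypothetical helper for illustration; only the two checksum fields
    mentioned in the quoted CEP text are accepted.
    """
    if algorithm not in ("md5", "sha256"):
        raise ValueError(f"unsupported checksum field: {algorithm}")
    digest = hashlib.new(algorithm)
    with artifact.open("rb") as fh:
        # Read in 1 MiB chunks so large artifacts are not loaded at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return f"*[{algorithm}={digest.hexdigest()}]"
```

Since the digest covers the whole compressed file, two artifacts only collide if their bytes are identical, which is the point made above about `index.json` carrying the subdir.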

Contributor

@baszalmstra baszalmstra left a comment

Thanks for the great write up @jaimergp !

cep-??.md Outdated

The simplest form merely consists of up to three positional arguments: `name [version [build]]`. Only `name` is required. `version` can be any version specifier. `build` can be any string matcher. See "Match conventions" below.

The positional syntax also allows the `=` character as a separator, instead of a space. When this is the case, versions are interpreted differently. `pkg=1.8` will be taken as `1.8.*` (fuzzy), but `pkg 1.8` will give `1.8` (exact). To have fuzzy matches with the space syntax, you need to use `pkg =1.8`. This nuance does not apply if a `build` string is present; both `foo==1.0=*` and `foo=1.0=*` are equivalent (they both understand the version as `1.0`, exact).
Contributor

I know this is just reporting the current state of affairs, but yuck.

In rattler, this form is no longer allowed when parsing in strict mode (it is still accepted in lenient parsing mode).

@AntoinePrv AntoinePrv Apr 15, 2025

@baszalmstra which form is not allowed?
IIRC, in mamba `pkg 1.8` and `pkg =1.8` are the same.

Contributor

@baszalmstra baszalmstra Apr 15, 2025

The form `foo=1.0=bla` is disallowed! (in strict mode only, used in rattler build)
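The `=` versus space rule from the paragraph quoted at the top of this thread can be sketched as follows. `interpret_positional` is a hypothetical helper written only to mirror that paragraph. It is not the parser used by conda, mamba or rattler, it ignores channels and brackets, and it deliberately rejects version operators such as `>=`.

```python
import re

# Assumption: names and bare versions contain no whitespace or operator characters.
_NAME = re.compile(r"^([^\s=<>!~]+)\s*(.*)$")
_REST = re.compile(r"^(==|=)?\s*([^\s=<>!~]+)(?:(?:\s+|=)(\S+))?$")


def interpret_positional(spec: str):
    """Return (name, version, build) following the quoted positional rules."""
    m = _NAME.match(spec.strip())
    if m is None:
        raise ValueError(f"cannot find a package name in {spec!r}")
    name, rest = m.groups()
    if not rest:
        return name, None, None
    m = _REST.match(rest)
    if m is None:
        raise ValueError(f"{rest!r} is outside the scope of this sketch")
    operator, version, build = m.groups()
    # A single '=' means a fuzzy match, but only when no build string is present.
    if operator == "=" and build is None and not version.endswith("*"):
        version += ".*"
    return name, version, build


print(interpret_positional("pkg=1.8"))     # ('pkg', '1.8.*', None)
print(interpret_positional("pkg 1.8"))     # ('pkg', '1.8', None)
print(interpret_positional("foo=1.0=*"))   # ('foo', '1.0', '*')
```

In this sketch `pkg =1.8` gives the same result as `pkg=1.8`, and `foo==1.0=*` gives the same result as `foo=1.0=*`, as the quoted text describes.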

cep-9999.md Outdated
Comment on lines 76 to 77
6. If `channel` is an exact value and `subdir` is an exact value, `subdir` is appended to
`channel` with a `/` separator. Otherwise, `subdir` is included in the key-value brackets.

How does this relate to label channels, e.g. `pytorch/label/nightly::libfaiss`?
With the separator logic, `nightly` will be assumed to be a subdir.

Contributor Author

The logic in conda is to take the last component and compare it against known subdirs. As a result, channels cannot be named like subdirs; e.g., I can't register a channel named `linux-64`.
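A minimal sketch of that "compare the last component against known subdirs" logic, assuming a small illustrative set of subdir names. `KNOWN_SUBDIRS` and `split_channel_subdir` are hypothetical names, and the set is not conda's actual list.

```python
# Assumption: an illustrative subset of platform subdirs, not conda's real list.
KNOWN_SUBDIRS = frozenset(
    {"noarch", "linux-64", "linux-aarch64", "osx-64", "osx-arm64", "win-64"}
)


def split_channel_subdir(channel: str):
    """Split `channel[/subdir]` into (channel, subdir or None)."""
    channel = channel.rstrip("/")
    head, sep, last = channel.rpartition("/")
    # Only treat the last component as a subdir if it is a known subdir name.
    if sep and last in KNOWN_SUBDIRS:
        return head, last
    return channel, None


print(split_channel_subdir("pytorch/label/nightly"))  # ('pytorch/label/nightly', None)
print(split_channel_subdir("conda-forge/linux-64"))   # ('conda-forge', 'linux-64')
```

Under this approach a label such as `nightly` is left as part of the channel, because it is not in the known set.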

@Hind-M
Contributor

Hind-M commented Dec 16, 2024

Not sure about the current status of this CEP, but before moving forward with it, we should maybe consider finalizing this one if we think it could be of interest?

cep-9999.md Outdated

These are also accepted but have reduced utility. Their usage is discouraged:

- `url`
Contributor Author

Note that a full URL can be parsed into a MatchSpec object so... should a URL be considered a valid form? In those cases, a note: parsers need to account for %-decoding. See xref conda/conda#14481.
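To illustrate the %-decoding point, here is a minimal sketch of turning a package URL into MatchSpec-like fields. `fields_from_url` and the example URL are hypothetical. The sketch assumes the usual `<channel>/<subdir>/<name>-<version>-<build>.<ext>` layout and is not how conda itself parses URLs.

```python
from urllib.parse import unquote, urlsplit

_EXTENSIONS = (".conda", ".tar.bz2")


def fields_from_url(url: str) -> dict:
    """Extract channel, subdir, name, version and build from a package URL."""
    parts = urlsplit(url)
    # Split first, decode second: an encoded '/' must not create new segments.
    segments = [unquote(s) for s in parts.path.strip("/").split("/")]
    if len(segments) < 3:
        raise ValueError(f"expected <channel>/<subdir>/<file> in {url!r}")
    *channel_path, subdir, filename = segments
    for ext in _EXTENSIONS:
        if filename.endswith(ext):
            stem = filename[: -len(ext)]
            break
    else:
        raise ValueError(f"unknown package extension in {filename!r}")
    # Name may contain '-', so split version and build off from the right.
    name, version, build = stem.rsplit("-", 2)
    channel = f"{parts.scheme}://{parts.netloc}/" + "/".join(channel_path)
    return {
        "channel": channel,
        "subdir": subdir,
        "name": name,
        "version": version,
        "build": build,
    }


# Hypothetical URL: '%21' is the encoded '!' of an epoch-prefixed version.
print(fields_from_url("https://example.invalid/some-channel/linux-64/pkg-1%212.0-h0_0.conda"))
```

Without the `unquote` step the version would come out as `1%212.0` rather than `1!2.0`.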

jaimergp and others added 2 commits September 26, 2025 18:42
Co-authored-by: JeanChristopheMorinPerso
@jaimergp
Contributor Author

@baszalmstra, @beckermr, @AntoinePrv, @ruben-arts, @JeanChristopheMorinPerso, I've tackled some of the pending items if you want to take a look. Perhaps more critically, the version strings and ordering conversation is now part of #132.

I think I'll rewrite part of the Specification so we don't lose time with historical details and go straight for the syntax, since it's all intertwined anyway... This is valid 🤦:

>>> str(MS("channel:namespace:pkg 1 2[subdir=linux-63,channel=XX,name=jaime]"))
'XX/linux-63::pkg==1=2'

@jaimergp jaimergp changed the title Add CEP for MatchSpec minilanguage CEP XXXX: MatchSpec minilanguage Sep 26, 2025
@JeanChristopheMorinPerso JeanChristopheMorinPerso left a comment

This looks great!

- `version: str | VersionSpec`. Optional.
- `build: str`. Optional. It requires `version` to be present.
- All keyword expressions are optional. If present, they MUST be enclosed in a single set of square brackets, after the positional expressions. The following rules apply:
- Keyword expressions are written as key-value pairs. They MUST be built by joining the name of the target field (key) and the expression string (value) with a single `=` character. The expression string MUST be quoted with single `'` or double `"` quotes if it contains spaces.

The expression string MUST be quoted with single ' or double " quotes if it contains spaces.

Is this referring to the value of the expression as a whole? "The expression string" feels unclear to me. I could interpret this as `['key=value']`, for example.

- `build: str`. Optional. It requires `version` to be present.
- All keyword expressions are optional. If present, they MUST be enclosed in a single set of square brackets, after the positional expressions. The following rules apply:
- Keyword expressions are written as key-value pairs. They MUST be built by joining the name of the target field (key) and the expression string (value) with a single `=` character. The expression string MUST be quoted with single `'` or double `"` quotes if it contains spaces.
- Target-expression pairs MUST be separated by a single comma character `,`.

This is the first reference to "target-expression" in this document so far. Maybe we could replace this with "Keyword expression pairs" since that's what line 58 calls them?
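A minimal sketch of the bracket rules quoted above, reading "the expression string" as the value only. `parse_keyword_section` is a hypothetical helper, not conda's parser. It splits pairs on commas outside quotes and splits each pair on its first `=`.

```python
def parse_keyword_section(spec: str):
    """Split `positional[key=value,...]` into (positional, {key: value})."""
    spec = spec.strip()
    if not spec.endswith("]"):
        return spec, {}
    positional, bracket, body = spec[:-1].partition("[")
    if not bracket:
        raise ValueError(f"closing bracket without an opening one in {spec!r}")

    # Walk the body once, splitting on commas that are not inside quotes.
    items, current, quote = [], [], None
    for ch in body:
        if quote:
            if ch == quote:
                quote = None
            else:
                current.append(ch)
        elif ch in "'\"":
            quote = ch
        elif ch == ",":
            items.append("".join(current))
            current = []
        else:
            current.append(ch)
    if quote:
        raise ValueError(f"unterminated quote in {spec!r}")
    items.append("".join(current))

    pairs = {}
    for item in items:
        # Only the first '=' separates key and value; values may contain '='.
        key, eq, value = item.partition("=")
        if not eq or not key.strip():
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key.strip()] = value
    return positional.strip(), pairs


print(parse_keyword_section("foo[version='>=1,<2',build=py_0]"))
# ('foo', {'version': '>=1,<2', 'build': 'py_0'})
```

Note that the quoted value here contains both a comma and `=`, which is why a plain `split(",")` would not be enough.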


The canonical string representation of a `MatchSpec` expression follows these rules:

1. `name` is required and MUST be written as a positional expression. Its value MAY be `*` if necessary.

Should cases when * is necessary for the name be documented?

1. `name` is required and MUST be written as a positional expression. Its value MAY be `*` if necessary.
2. If `version` describes an exact equality expression, it MUST be written as a positional expression, prepended by `==`. If `version` denotes fuzzy equality (e.g. `1.11.*`), it MUST be written as a positional expression with the `.*` suffix left off and prepended by `=`. Otherwise `version` MUST be included inside the key-value brackets.
3. If `version` is an exact equality expression, and `build` does not contain asterisks, `build` MUST be written as a positional expression, prepended by `=`. Otherwise, `build` MUST go inside the key-value brackets.
4. If `channel` is defined and does not contain asterisks, a `::` separator is used between `channel`

Maybe reorder this so that 4 and 5 become 1 and 2? This would feel more natural I think, following the left to right order.
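For discussion, rules 1 to 3 of the quoted canonical form can be sketched as below. `canonical_positional` is a hypothetical helper, the `_PLAIN` pattern is an assumption about what a bare version looks like, and bracket values are always single-quoted for simplicity. The channel rules are left out because the quoted hunk is cut off at rule 4.

```python
import re

# Assumption: characters that may appear in a bare version (no operators, no '*').
_PLAIN = re.compile(r"^[A-Za-z0-9_.!+]+$")


def canonical_positional(name: str, version: str = None, build: str = None) -> str:
    """Render name, version and build following quoted rules 1 to 3."""
    out = name  # Rule 1: name is always positional (it may be '*').
    brackets = []
    exact = False
    if version is not None:
        bare = version[2:] if version.startswith("==") else version
        if _PLAIN.match(bare):
            # Rule 2a: exact equality is positional, prefixed with '=='.
            exact = True
            out += f"=={bare}"
        elif bare == version and bare.endswith(".*") and _PLAIN.match(bare[:-2]):
            # Rule 2b: fuzzy equality drops the '.*' suffix and uses '='.
            out += f"={bare[:-2]}"
        else:
            # Rule 2c: anything else goes inside the brackets.
            brackets.append(f"version='{version}'")
    if build is not None:
        if exact and "*" not in build:
            # Rule 3: positional build only with an exact version and no '*'.
            out += f"={build}"
        else:
            brackets.append(f"build='{build}'")
    if brackets:
        out += "[" + ",".join(brackets) + "]"
    return out


print(canonical_positional("pkg", "1", "2"))      # pkg==1=2
print(canonical_positional("numpy", "1.11.*"))    # numpy=1.11
print(canonical_positional("foo", ">=1,<2"))      # foo[version='>=1,<2']
```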


#### Version matching

Expressions targeting the `version` field MUST be handled with additional rules. These expressions are referred to as _version specifiers_. A _version identifier_ will refer to version strings as described in [CEP PR #132](https://github.com/conda/ceps/pull/132). For ordering-aware comparisons, the implied ordering is also described in [CEP PR #132](https://github.com/conda/ceps/pull/132).

These expressions are referred to as version specifiers. A version identifier will refer to version strings as described in CEP PR #132.

That referenced CEP doesn't contain the words "version identifier".

Should this be clarified?

- `,` denotes the logical AND.
- `,` (AND) has higher precedence than `|` (OR).

A _version clause_ MUST follow one of these conventions:

I find version specifier, version identifier and version clause confusing. Maybe we could give a small example like: "in `1.2|==4.0.0`, `1.2|==4.0.0` is a version specifier, `1.2` and `==4.0.0` are version clauses, and `1.2` and `4.0.0` are version identifiers" or something like that?
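The three terms can also be illustrated with a small sketch. Both functions are hypothetical, the set of operator characters stripped by `identifier_of` is an assumption, and grouping with parentheses is not handled. Because `,` (AND) binds tighter than `|` (OR), the specifier is split on `|` first and on `,` second.

```python
def split_version_specifier(specifier: str):
    """Return OR-alternatives, each one a list of AND-ed version clauses."""
    alternatives = []
    for alternative in specifier.split("|"):
        clauses = [clause.strip() for clause in alternative.split(",")]
        if any(not clause for clause in clauses):
            raise ValueError(f"empty clause in {specifier!r}")
        alternatives.append(clauses)
    return alternatives


def identifier_of(clause: str) -> str:
    """Strip a leading comparison operator to leave the version identifier."""
    # Assumption: operators are built only from these characters.
    return clause.lstrip("=<>!~")


specifier = "1.2|==4.0.0"
clauses = split_version_specifier(specifier)
print(clauses)                                           # [['1.2'], ['==4.0.0']]
print([identifier_of(c) for alt in clauses for c in alt])  # ['1.2', '4.0.0']
```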

@chenghlee
Contributor

chenghlee commented Oct 8, 2025

Finally got this up on GitHub. If anyone's interested, I started a collection of strings that conda currently does and does not accept as arguments to the MatchSpec constructor.

That repo also contains a Lark EBNF-type grammar for MatchSpec. Currently very incomplete and/or broken, but happy to continue refining it and contributing to the conda org once it's more mature.

@jaimergp
Contributor Author

@chenghlee shared this gem yesterday 😂 😭

>>> from conda.models.match_spec import MatchSpec
>>> MatchSpec("foo")
MatchSpec("foo")
>>> MatchSpec("foo=")
MatchSpec("foo")
>>> MatchSpec("foo # comment")
MatchSpec("foo")
>>> MatchSpec("foo=# comment")
MatchSpec("foo")
>>> MatchSpec("foo  # comment")
MatchSpec("foo")
>>> MatchSpec("foo= # comment")
Traceback (most recent call last):
  File "/opt/conda/lib/python3.11/site-packages/conda/models/version.py", line 44, in __call__
    return cls._cache_[arg]
           ~~~~~~~~~~~^^^^^
...
conda.exceptions.InvalidMatchSpec: Invalid spec 'foo= # comment': Invalid version '=': invalid operator

We also re-discovered that pkg=version[key=value](optional=True) is a valid spec according to conda's parser, but we really want to deprecate those parentheses.

@chenghlee
Contributor

There's also this gem:

>>> MatchSpec('foo >=1,<2')
MatchSpec("foo[version='>=1,<2']")
>>> MatchSpec('foo >=1, <2')
MatchSpec("foo[version='>=1,<2']")
>>> MatchSpec('foo >=1, < 2')
MatchSpec("foo[version='>=1,<2']")
>>> MatchSpec('foo >=1,  < 2')
MatchSpec("foo[version='>=1,<2']")
>>> MatchSpec('foo >=1,  <  2')
Traceback (most recent call last):
  File "/Users/clee/Applications/miniconda3/envs/matchspec-grammar/lib/python3.13/site-packages/conda/models/version.py", line 44, in __call__
    return cls._cache_[arg]
           ~~~~~~~~~~~^^^^^
KeyError: '>=1,<'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/clee/Applications/miniconda3/envs/matchspec-grammar/lib/python3.13/site-packages/conda/models/version.py", line 44, in __call__
    return cls._cache_[arg]
           ~~~~~~~~~~~^^^^^
KeyError: '<'
...
conda.exceptions.InvalidMatchSpec: Invalid spec 'foo >=1,  <  2': Invalid version '<': invalid operator
>>> MatchSpec('foo >=1,  <  2,!=3')
MatchSpec("foo[version='>=1,<2,!=3']")

How conda handles whitespace in MatchSpecs is... inconsistent. Having played around with it, I'm now inclined to disallow whitespace within each "subspec" (package name, version, build string).

@AntoinePrv

Perhaps we can keep the general logic of the language and then add a section collecting "known allowed exceptions to the previous rules" with everything that is a strong candidate for deprecation (in a future CEP).

@beckermr
Contributor

We should write only what we want in the CEP now. The deprecation of unsupported syntax is a separate issue to manage directly on conda.
