importlib.metadata.Distribution equality check not working #107220

Closed
megies opened this issue Jul 25, 2023 · 7 comments · Fixed by python/importlib_metadata#498
Labels: stdlib (Python modules in the Lib dir), topic-importlib, type-feature (a feature request or enhancement)

Comments

megies commented Jul 25, 2023

Bug report

While migrating our package from pkg_resources to importlib.metadata, I ran into some odd behavior with importlib.metadata.Distribution objects. Specifically, the class seems to lack a custom __eq__ method and simply inherits object.__eq__. As a consequence, checking distributions for equality seems to always fail:

from importlib.metadata import distribution

distribution('pip') == distribution('pip')  # False

This in turn makes it impossible to use the dist kwarg in entry point selection, so users have to fall back to manually comparing the distribution's plain-text name attribute (or something similar or more sophisticated):

from importlib.metadata import entry_points, EntryPoints

entry_points(group='console_scripts', dist='pip')  # empty EntryPoints list
entry_points(group='console_scripts', dist=distribution('pip'))  # empty EntryPoints list
EntryPoints(ep for ep in entry_points(group='console_scripts') if ep.dist == distribution('pip'))  # empty EntryPoints list
EntryPoints(ep for ep in entry_points(group='console_scripts') if ep.dist.name == 'pip')  # expected list of entry points

I think comparing distributions for equality should be fixed. It might even make sense to allow comparison with a plain string containing the distribution's name (e.g. "pip"), although I'm not aware of the potential drawbacks or conflicts that could arise if multiple distributions share the same name, which intuitively seems odd to me.

I believe this is the right place for this report, I'm very sorry if it isn't.

Your environment

  • CPython versions tested on: 3.11.4
  • Operating system and architecture: Debian Linux 11, linux-64

conda env:

# packages in environment:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
ca-certificates           2023.7.22            hbcca054_0    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
libexpat                  2.5.0                hcb278e6_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 13.1.0               he5830b7_0    conda-forge
libgomp                   13.1.0               he5830b7_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libsqlite                 3.42.0               h2797004_0    conda-forge
libuuid                   2.38.1               h0b41bf4_0    conda-forge
libzlib                   1.2.13               hd590300_5    conda-forge
ncurses                   6.4                  hcb278e6_0    conda-forge
openssl                   3.1.1                hd590300_1    conda-forge
pip                       23.2.1             pyhd8ed1ab_0    conda-forge
python                    3.11.4          hab00c5b_0_cpython    conda-forge
readline                  8.2                  h8228510_1    conda-forge
setuptools                68.0.0             pyhd8ed1ab_0    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
tzdata                    2023c                h71feb2d_0    conda-forge
wheel                     0.41.0             pyhd8ed1ab_0    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge

miniconda base env:

            shell level : 4
          conda version : 23.3.1
         python version : 3.9.15.final.0
       virtual packages : __archspec=1=x86_64
                          __glibc=2.31=0
                          __linux=5.10.0=0
                          __unix=0=0
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
               platform : linux-64
             user-agent : conda/23.3.1 requests/2.29.0 CPython/3.9.15 Linux/5.10.0-21-amd64 debian/11 glibc/2.31 solver/libmamba conda-libmamba-solver/22.8.1 libmambapy/1.1.0

megies added the type-bug label Jul 25, 2023
hugovk added the stdlib and topic-importlib labels Jul 25, 2023
jaraco self-assigned this Sep 21, 2023
Member

jaraco commented Sep 21, 2023

Thanks for the report. Happy to explore the issue here.

I'm not at all confident that equality by name is correct, because it's possible for multiple distributions of the same name to exist in the environment. For example:

 ~ @ pip-run 'importlib_metadata<5' -- -m pip-run 'importlib_metadata>5' -- -q
>>> import importlib.metadata as md
>>> ds = [dist for dist in md.distributions() if dist.name == 'importlib-metadata']
>>> ds
[<importlib.metadata.PathDistribution object at 0x10305aed0>, <importlib.metadata.PathDistribution object at 0x10306f5d0>]
>>> ds[0].version
'6.8.0'
>>> ds[1].version
'4.13.0'

This example uses pip-run to install two different versions of importlib_metadata (I could have used any PyPI package) onto two different places in sys.path and then filters on those two distribution objects.

In this case, I wouldn't expect these two distributions to compare equal, even though they have the same name. Moreover, if two distributions had the same name and version but were located in different places on sys.path, those should probably not compare equal either.

Moreover, metadata providers are expected to implement their own Distribution subclasses, so it's conceivable that two Distribution objects might be distinguished by some factor not yet known.

Another thing to consider - should a distribution compare equal to a string of the name? In one example, you imply that's expected:

entry_points(group='console_scripts', dist='pip')  # empty EntryPoints list

That suggests that distribution('pip') == 'pip' should evaluate to True. That's possible, but I'm not sure it's desirable. What if someone wants instead to compare for a specific version (e.g. distribution('pip') == 'pip==23.1')? That gets messy and isn't even implementable without behavior only available outside the stdlib (packaging).

I agree that entry point selection is a compelling use case for such a comparison.

I wouldn't characterize the described behavior as a bug, but rather an unaddressed feature.

jaraco added the type-feature label and removed the type-bug label Sep 21, 2023
Author

megies commented Sep 21, 2023

Thanks for looking at this @jaraco

because it's possible for multiple distributions of the same name to exist in the environment

That seems weird to me; at least I have never seen or heard of anything like that.
But then again, if the consensus is that filtering by distribution name is too ambiguous and might actually be harmful, I'd argue that the current implementation should still change, or at least that this behavior of the dist kwarg as a filter should be mentioned in the docs, since filtering by dist currently always produces an empty result for the reasons laid out above.
If there is too much concern about ambiguous behavior in proper equality comparisons, then I would propose to either:

  • raise an exception if the user tries to use dist for filtering entry points, or
  • at least show a warning that using dist in filtering will always produce empty results

Currently dist is accepted for filtering, so users assume it can be used and then spend time trying to figure out why it isn't working.

Author

megies commented Sep 21, 2023

To be constructive, I set up a quick PR with what I would have wished to see in the docs when I was transitioning our package from pkg_resources to this new recommended stdlib API.

Member

jaraco commented Mar 21, 2024

Sorry for the delay in reviewing this. Can you share more about how pkg_resources previously supported your use-case?

Author

megies commented Mar 21, 2024

What we did (still do actually) with pkg_resources (using pip just as an example here):

from pkg_resources import load_entry_point
load_entry_point('pip', 'console_scripts', 'pip')

What I thought would do the trick with importlib.metadata doesn't work, though, and it took quite some time to figure out why:

import importlib.metadata
# doesn't work, returns an empty list
importlib.metadata.entry_points(dist='pip', group='console_scripts', name='pip')[0].load()

The only way I could replicate this with importlib.metadata:

import importlib.metadata
list(ep for ep in importlib.metadata.entry_points(group='console_scripts', name='pip') if ep.dist.name == 'pip')[0].load()

As mentioned above, my main issue is that dist is allowed as a kwarg to filter on (as opposed to some arbitrary other kwarg, which raises an exception), implying that it can be used to filter by distribution name, or at least with a Distribution object. It is fairly easy to work around this once one has figured out what's going on, but debugging it takes time. I can imagine more people spending time trying to figure out what is happening, and right now the dist kwarg is simply unusable as a filter.

Member

jaraco commented Aug 1, 2024

It's been quite a year, and I'm just now digging deep enough into my emails to follow up on this. Sorry for the delay.

I can't recall, but I may have failed to connect the dots on the PR until just now. Thanks for that proposal.

Thinking about the use-case, I believe the recommended approach for getting entry points for a specific distribution would be:

importlib.metadata.distribution('pip').entry_points.select(name='pip', group='console_scripts')

e.g.:

 🐚 py -c "import importlib.metadata; ep, = importlib.metadata.distribution('pip').entry_points.select(name='pip', group='console_scripts'); print(ep)"
EntryPoint(name='pip', value='pip._internal.cli.main:main', group='console_scripts')

my main issue with it is that dist is allowed as a kwarg to filter

I think I agree we should update the guidance or maybe issue warnings to reduce the risk of this unintended use, though I'd also consider adding support for it if we can figure out how richly to do so.

Member

jaraco commented Aug 1, 2024

Another option, now that I think about it, is to simply query for the "pip" "console_scripts":

importlib.metadata.entry_points().select(name='pip', group='console_scripts')

And not bother filtering by distribution at all, since it's unlikely another package would implement the pip console script (and if they did, you have bigger problems).

3 participants