Allow direct resolution of Distribution from EntryPoint #266

jaraco · 2020-12-06T19:02:23Z

Backport of python/cpython#23334, Fixes #265.

jaraco · 2020-12-06T19:04:36Z

In this branch, I've ported the first commit from the upstream pull request, but it's causing flake8 checks to fail:

___________________________________________________________ FLAKE8-check ___________________________________________________________
.tox/python/lib/python3.9/site-packages/pluggy/hooks.py:286: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
.tox/python/lib/python3.9/site-packages/pluggy/manager.py:93: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
.tox/python/lib/python3.9/site-packages/pluggy/manager.py:84: in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
.tox/python/lib/python3.9/site-packages/_pytest/runner.py:171: in pytest_runtest_call
    raise e
.tox/python/lib/python3.9/site-packages/_pytest/runner.py:163: in pytest_runtest_call
    item.runtest()
.tox/python/lib/python3.9/site-packages/pytest_flake8.py:120: in runtest
    found_errors, out, err = call(
.tox/python/lib/python3.9/site-packages/py/_io/capture.py:150: in call
    res = func(*args, **kwargs)
.tox/python/lib/python3.9/site-packages/pytest_flake8.py:200: in check_file
    app.find_plugins(config_finder)
.tox/python/lib/python3.9/site-packages/flake8/main/application.py:153: in find_plugins
    self.check_plugins = plugin_manager.Checkers(local_plugins.extension)
.tox/python/lib/python3.9/site-packages/flake8/plugins/manager.py:356: in __init__
    self.manager = PluginManager(
.tox/python/lib/python3.9/site-packages/flake8/plugins/manager.py:238: in __init__
    self._load_entrypoint_plugins()
.tox/python/lib/python3.9/site-packages/flake8/plugins/manager.py:258: in _load_entrypoint_plugins
    for entry_point in sorted(frozenset(eps)):
E   TypeError: '<' not supported between instances of 'PathDistribution' and 'PathDistribution'

It seems that we've already identified a regression with the expectation that entry points are sortable.

Probably we should fix that first.
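To illustrate the failure above, here is a minimal sketch. It assumes the distribution ends up as a field of a tuple-based entry point; `FakeDistribution`, `EntryPointWithDist`, `PlainEntryPoint`, and `try_sort` are hypothetical names for this example and are not part of importlib_metadata.

```python
import collections


class FakeDistribution:
    """Hypothetical stand-in for PathDistribution: it defines no ordering."""

    def __init__(self, name):
        self.name = name


# Hypothetical tuple-based entry point that carries the distribution as a field.
EntryPointWithDist = collections.namedtuple(
    "EntryPointWithDist", "name value group dist"
)

# Baseline: the same tuple without the dist field.
PlainEntryPoint = collections.namedtuple("PlainEntryPoint", "name value group")


def try_sort(items):
    """Return the sorted list, or the TypeError raised while comparing."""
    try:
        return sorted(items)
    except TypeError as exc:
        return exc


plain = [
    PlainEntryPoint("b", "pkg.b:main", "console_scripts"),
    PlainEntryPoint("a", "pkg.a:main", "console_scripts"),
]

# Two entries that tie on name, value, and group, so the comparison
# falls through to the dist field.
with_dist = [
    EntryPointWithDist("x", "pkg:main", "console_scripts", FakeDistribution("pkg")),
    EntryPointWithDist("x", "pkg:main", "console_scripts", FakeDistribution("pkg")),
]

plain_result = try_sort(plain)     # sorts fine on the string fields
dist_result = try_sort(with_dist)  # TypeError: the dist objects are not orderable
```

The string-only tuples sort normally. The error only appears when two entries tie on every earlier field, which is why it can go unnoticed until duplicate entry points are present.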

s0undt3ch · 2020-12-06T19:53:44Z

How do we carry dist through the pickle/unpickle process?

jaraco · 2020-12-06T19:57:05Z

How do we carry dist through the pickle/unpickle process?

Indeed, this implementation doesn't yet attempt to address that concern. Do we know if there's a use-case that would demand restoration of the dist?

jaraco · 2020-12-06T20:11:28Z

In 0d31cab, I illustrate how one might consider including the Distribution when pickling.

I can imagine another scenario where the pickle resolves the name of the distribution, pickles that, and then _new_for could use Distribution.from_name to revive it. I'm hesitant to add this support, however, unless there's a good case that it's needed.
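A rough sketch of that second scenario, for discussion only: the class `NamedDistEntryPoint`, the helper `_revive`, and `StubDistribution` are hypothetical names, not the code in this branch. The only real API used is `Distribution.from_name` and `PackageNotFoundError` from `importlib.metadata`.

```python
import pickle
from importlib import metadata


def _revive(name, value, group, dist_name):
    """Rebuild the entry point, looking the distribution up again by name."""
    dist = None
    if dist_name is not None:
        try:
            dist = metadata.Distribution.from_name(dist_name)
        except metadata.PackageNotFoundError:
            # The distribution may have been uninstalled since pickling.
            dist = None
    return NamedDistEntryPoint(name, value, group, dist)


class NamedDistEntryPoint:
    """Hypothetical entry point that pickles only the distribution's name."""

    def __init__(self, name, value, group, dist=None):
        self.name = name
        self.value = value
        self.group = group
        self.dist = dist

    def __reduce__(self):
        # Store the name rather than the Distribution object itself.
        dist_name = None if self.dist is None else self.dist.metadata["Name"]
        return (_revive, (self.name, self.value, self.group, dist_name))


class StubDistribution:
    """Stand-in exposing only the metadata mapping that this sketch reads."""

    metadata = {"Name": "hypothetical-missing-dist"}


original = NamedDistEntryPoint(
    "demo", "pkg:main", "console_scripts", StubDistribution()
)
restored = pickle.loads(pickle.dumps(original))
# The stub's name is assumed not to be installed, so the lookup on
# unpickling fails and the restored entry point has no dist.
```

Note that the revived object is whatever the name resolves to at unpickling time, which may differ from the Distribution that originally produced the entry point.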

I see the original request for pickleability was made by @asottile in #96. Anthony, can you elaborate on what use-case demands the pickleability of EntryPoint objects and whether retaining an attached Distribution instance is important in that use case?

asottile · 2020-12-06T20:19:22Z

flake8 forks multiprocessing workers and used to contain importlib-metadata objects in the object graph -- I no longer need it as we're working around it and passing along our own objects instead

s0undt3ch · 2020-12-06T20:41:06Z

CHANGES.rst

+======
+
+* #265: ``EntryPoint`` objects now expose a ``.dist`` object
+  referencing the ``Distribution`` when constructed from a


When?
An entry point exists based on a distribution defining it right?
Or, in which case does an entrypoint exist without a distribution?
So, why "when"?

jaraco · 2020-12-06

The test suite in particular constructs EntryPoints without a distribution. I wouldn't expect library consumers to construct EntryPoints this way, so I'm not describing it here, and instead focusing on the use-case that users would expect to see.

s0undt3ch · 2020-12-06T20:58:19Z

Bear with me.

How expensive is it to create a Distribution?
Based on the missing feature, we almost never need a Distribution, right?
So why keep an instance of Distribution always attached to an EntryPoint?

If we adopt a lazy approach, we only load a distribution instance when we actually need it.
This does mean an extra attribute (the distribution name) and a cached method to resolve the distribution.

How bad would this approach be?

Could dist be the distribution name (or dist_name, to better reflect the intent), combined with either a cached property (.distribution) or a cached method (.resolve()?), as a lighter and faster approach?

I mean, if loading a distribution instance isn't a great penalty, then let's just avoid this lazy approach.

The idea of getting bug reports that there's a noticeable performance penalty is what's triggering my Q's here.
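A minimal sketch of the lazy idea, under stated assumptions: `LazyEntryPoint`, the `resolver` parameter, and `counting_resolver` are hypothetical and exist only so the example runs without depending on any installed package. The default resolver is the real `importlib.metadata.distribution` function.

```python
import functools
from importlib import metadata


class LazyEntryPoint:
    """Hypothetical entry point that stores only the distribution name."""

    def __init__(self, name, value, group, dist_name=None,
                 resolver=metadata.distribution):
        self.name = name
        self.value = value
        self.group = group
        self.dist_name = dist_name
        self._resolver = resolver  # injectable so the lookup can be observed

    @functools.cached_property
    def distribution(self):
        # Resolved on first access only; later accesses reuse the cached value.
        if self.dist_name is None:
            return None
        return self._resolver(self.dist_name)


lookups = []


def counting_resolver(dist_name):
    """Record each lookup and return a placeholder for a real Distribution."""
    lookups.append(dist_name)
    return object()


ep = LazyEntryPoint("demo", "pkg:main", "console_scripts", "pkg", counting_resolver)
first = ep.distribution   # triggers the one and only lookup
second = ep.distribution  # served from the cache
```

With the default resolver, the first access would perform the full name-based discovery, so the cost is deferred rather than avoided.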

jaraco · 2020-12-06T22:09:22Z

Bear with me.

Gladly. I'm grateful for every bit of input.

1. How expensive is it to create a `Distribution`?

I guess it depends. The Distribution object is actually an abstract base class (though I notice it's not declared as such). PathDistribution is the only concrete class implementation supplied by the package. Neither is particularly expensive to create, but resolving a Distribution from a name is expensive, requiring path traversal and discovery with arbitrarily complex behavior (imagine a worst-case scenario where an object on sys.meta_path is designed to search and load packages from PyPI or over a pigeon IP stack).

Moreover, it's not necessarily the case that a distribution that was constructed a second ago would be constructed from the same name in the next second, if, for example, pip were to have uninstalled or upgraded that distribution in the meantime.

It's also important to note that third-parties can implement their own Distribution subclasses and there's nothing in the protocol or spec that demands that they implement something that's pickleable or able to be reconstructed from the same name.
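As a sketch of that last point, consider a hypothetical third-party subclass that serves its metadata from memory. `InMemoryDistribution` and the name `in-memory-demo` are made up for this example; `read_text` and `locate_file` are the two methods a `Distribution` subclass is expected to provide.

```python
from importlib import metadata


class InMemoryDistribution(metadata.Distribution):
    """Hypothetical distribution backed by a dict instead of the filesystem."""

    def __init__(self, files):
        self._files = files

    def read_text(self, filename):
        # Return the file's text, or None when the file is absent.
        return self._files.get(filename)

    def locate_file(self, path):
        # Nothing exists on disk for this sketch to point at.
        raise NotImplementedError("in-memory distribution has no files on disk")


dist = InMemoryDistribution(
    {"METADATA": "Metadata-Version: 2.1\nName: in-memory-demo\nVersion: 1.0\n"}
)
name = dist.metadata["Name"]

# Nothing registers this object with the import system, so a lookup by
# name cannot find it again (assuming no such package is installed).
try:
    metadata.Distribution.from_name(name)
    found_by_name = True
except metadata.PackageNotFoundError:
    found_by_name = False
```

The object works as a Distribution, yet there is no route from its name back to the instance.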

2. Based on the missing feature, we almost never need a `Distribution`, right?

Perhaps not. And you bring up a good point - I'm not aware of what use-cases drive the demand for resolving a distribution from an entry point. I've asked in bpo-42382 for some clarification as to the requirement.

3. So why keep an instance of `Distribution` always attached to an `EntryPoint`?

My primary motivation for keeping an instance is because that instance links directly to the object that created the EntryPoint. Also, because that instance already exists, it avoids complication around possible changes to the state. That is, it reflects the exact Distribution that was present when the EntryPoint was read.

If we adopt a lazy approach, we only load a distribution instance when we actually need it.
This does mean an extra attribute (the distribution name) and a cached method to resolve the distribution.

How bad would this approach be?

Could dist be the distribution name (or dist_name, to better reflect the intent), combined with either a cached property (.distribution) or a cached method (.resolve()?), as a lighter and faster approach?

I mean, if loading a distribution instance isn't a great penalty, then let's just avoid this lazy approach.

The idea of getting bug reports that there's a noticeable performance penalty is what's triggering my Q's here.

I guess what I'm wondering is why bother resolving a Distribution to a name only to lazily resolve that name back to a Distribution? If the concern is pickleability, and we determine that users do in fact have good reason to want the Distribution object to be pickled and restored as well, then I would recommend adding pickleability as a constraint on the Distribution interface.

s0undt3ch · 2020-12-06T23:10:16Z

How expensive is it to create a Distribution?

I guess it depends. The Distribution object is actually an abstract base class (though I notice it's not declared as such). PathDistribution is the only concrete class implementation supplied by the package. Neither is particularly expensive to create, but resolving a Distribution from a name is expensive, requiring path traversal and discovery with arbitrarily complex behavior (imagine a worst-case scenario where an object on sys.meta_path is designed to search and load packages from PyPI or over a pigeon IP stack).

It's the path traversal that worries me.

Moreover, it's not necessarily the case that a distribution that was constructed a second ago would be constructed from the same name in the next second, if, for example, pip were to have uninstalled or upgraded that distribution in the meantime.

Fair point.

It's also important to note that third-parties can implement their own Distribution subclasses and there's nothing in the protocol or spec that demands that they implement something that's pickleable or able to be reconstructed from the same name.

Hmm, but if we add the .dist attribute, which is an instance of Distribution, then we probably need to make sure all implementations are pickleable. Or we remove pickle support.

Based on the missing feature, we almost never need a Distribution, right?

Perhaps not. And you bring up a good point - I'm not aware of what use-cases drive the demand for resolving a distribution from an entry point. I've asked in bpo-42382 for some clarification as to the requirement.

Replied.

So why keep an instance of Distribution always attached to an EntryPoint?

My primary motivation for keeping an instance is because that instance links directly to the object that created the EntryPoint. Also, because that instance already exists, it avoids complication around possible changes to the state. That is, it reflects the exact Distribution that was present when the EntryPoint was read.

Yeah, that's why I was setting ._distribution when creating the entry point instance in the original PR.

If we adopt a lazy approach, we only load a distribution instance when we actually need it.
This does mean an extra attribute (the distribution name) and a cached method to resolve the distribution.
How bad would this approach be?
Could dist be the distribution name (or dist_name, to better reflect the intent), combined with either a cached property (.distribution) or a cached method (.resolve()?), as a lighter and faster approach?
I mean, if loading a distribution instance isn't a great penalty, then let's just avoid this lazy approach.
The idea of getting bug reports that there's a noticeable performance penalty is what's triggering my Q's here.

I guess what I'm wondering is why bother resolving a Distribution to a name only to lazily resolve that name back to a Distribution? If the concern is pickleability, and we determine that users do in fact have good reason to want the Distribution object to be pickled and restored as well, then I would recommend adding pickleability as a constraint on the Distribution interface.

My personal use case does not require pickleability. But dropping support for that now seems counterintuitive. Deprecate its support to see if there's an actual need these days?

jaraco · 2020-12-13T20:20:14Z

My personal use case does not require pickleability. But dropping support for that now seems counterintuitive. Deprecate its support to see if there's an actual need these days?

Well, the tests for pickleability continue to work. They just lose the Distribution object in the process, but my feeling is that this gap is okay for now, especially considering that the sole known case for pickleability is no longer a need. I'd like to proceed with this approach for now, validate that it meets your needs, and defer the Distribution pickleability concern until it's raised.
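For readers following along, the behavior described here can be sketched as follows. This is an illustration under assumptions, not necessarily the merged code: `SketchEntryPoint` and its `_for` helper are hypothetical names, and the distribution is kept as a plain attribute outside the tuple fields.

```python
import collections
import pickle


class SketchEntryPoint(
    collections.namedtuple("EntryPointBase", "name value group")
):
    """Hypothetical entry point with an optional dist outside the tuple fields."""

    dist = None  # class-level default, used when no distribution is attached

    def _for(self, dist):
        # Attach the distribution as a plain instance attribute.
        self.dist = dist
        return self

    def __reduce__(self):
        # Only the tuple fields are pickled; the dist attribute is dropped.
        return (self.__class__, (self.name, self.value, self.group))


attached = object()  # placeholder for a real Distribution instance
ep = SketchEntryPoint("demo", "pkg:main", "console_scripts")._for(attached)

restored = pickle.loads(pickle.dumps(ep))
# restored equals ep as a tuple, but restored.dist falls back to None.
```

Because the distribution is not one of the tuple fields in this sketch, equality, hashing, and sorting of entry points do not involve it.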

s0undt3ch · 2020-12-13T21:11:08Z

Should I update the opened Python PR with these changes then?

jaraco · 2020-12-13T22:24:00Z

I'll do it, since it needs to incorporate the changes from 3.1 and 3.2 also.

jaraco mentioned this pull request Dec 6, 2020

EntryPoints treated as sortable but not tested #267

Closed

s0undt3ch and others added 2 commits December 6, 2020 14:44

Make sure each EntryPoint carries it's Distribution information

bd5cac5

Make 'dist' a single, optional attribute on an EntryPoint.

ad7c371

jaraco force-pushed the bugfix/bpo-42382 branch from 34134da to ad7c371 Compare December 6, 2020 19:45

jaraco closed this Dec 6, 2020

jaraco reopened this Dec 6, 2020

The distribution name can be egginfo too

84d6961

Update changelog.

c5dbfdb

s0undt3ch reviewed Dec 6, 2020

jaraco marked this pull request as ready for review December 6, 2020 22:41

jaraco changed the base branch from master to main December 13, 2020 18:34

Merge branch 'main' into bugfix/bpo-42382

c19a2d0

jaraco added the automerge label Dec 13, 2020

github-actions bot merged commit 1816266 into main Dec 13, 2020

jaraco deleted the bugfix/bpo-42382 branch December 13, 2020 20:54

neutrinoceros mentioned this pull request Mar 21, 2023

MNT: drop runtime dependency on pkg_resources (setuptools) glue-viz/glue#2365

Merged
