Allow direct resolution of Distribution from EntryPoint #266

Merged
merged 5 commits into from
Dec 13, 2020

Conversation

jaraco
Member

@jaraco jaraco commented Dec 6, 2020

Backport of python/cpython#23334, Fixes #265.

@jaraco
Member Author

jaraco commented Dec 6, 2020

In this branch, I've ported the first commit from the upstream pull request, but it's causing flake8 checks to fail:

___________________________________________________________ FLAKE8-check ___________________________________________________________
.tox/python/lib/python3.9/site-packages/pluggy/hooks.py:286: in __call__
    return self._hookexec(self, self.get_hookimpls(), kwargs)
.tox/python/lib/python3.9/site-packages/pluggy/manager.py:93: in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
.tox/python/lib/python3.9/site-packages/pluggy/manager.py:84: in <lambda>
    self._inner_hookexec = lambda hook, methods, kwargs: hook.multicall(
.tox/python/lib/python3.9/site-packages/_pytest/runner.py:171: in pytest_runtest_call
    raise e
.tox/python/lib/python3.9/site-packages/_pytest/runner.py:163: in pytest_runtest_call
    item.runtest()
.tox/python/lib/python3.9/site-packages/pytest_flake8.py:120: in runtest
    found_errors, out, err = call(
.tox/python/lib/python3.9/site-packages/py/_io/capture.py:150: in call
    res = func(*args, **kwargs)
.tox/python/lib/python3.9/site-packages/pytest_flake8.py:200: in check_file
    app.find_plugins(config_finder)
.tox/python/lib/python3.9/site-packages/flake8/main/application.py:153: in find_plugins
    self.check_plugins = plugin_manager.Checkers(local_plugins.extension)
.tox/python/lib/python3.9/site-packages/flake8/plugins/manager.py:356: in __init__
    self.manager = PluginManager(
.tox/python/lib/python3.9/site-packages/flake8/plugins/manager.py:238: in __init__
    self._load_entrypoint_plugins()
.tox/python/lib/python3.9/site-packages/flake8/plugins/manager.py:258: in _load_entrypoint_plugins
    for entry_point in sorted(frozenset(eps)):
E   TypeError: '<' not supported between instances of 'PathDistribution' and 'PathDistribution'

It seems that we've already identified a regression with the expectation that entry points are sortable.

Probably we should fix that first.
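The failure above can be reproduced in isolation. The following is a minimal sketch, assuming the entry point is a named tuple that gained a trailing `dist` field; `FakeDist` and `EntryPointWithDist` are hypothetical stand-ins, not the library's classes. Two entries that agree on every other field push the tuple comparison down to the distribution objects, which define no ordering.

```python
from collections import namedtuple


class FakeDist:
    """Hypothetical stand-in for a Distribution; it defines no ordering."""

    def __init__(self, name):
        self.name = name


# Assumption: a tuple-based entry point with the distribution as the last field.
EntryPointWithDist = namedtuple("EntryPointWithDist", "name value group dist")

a = EntryPointWithDist("example", "pkg.mod:func", "example.group", FakeDist("dist-a"))
b = EntryPointWithDist("example", "pkg.mod:func", "example.group", FakeDist("dist-b"))


def try_sort(entry_points):
    """Mimic the sorted(frozenset(...)) call from the traceback."""
    try:
        return sorted(frozenset(entry_points))
    except TypeError as exc:
        return exc


# The first three fields are equal, so sorting has to compare the FakeDist objects.
result = try_sort([a, b])
print(type(result).__name__, result)
```

In this sketch the error only surfaces when otherwise identical entries carry different distribution objects; entries that differ in name, value, or group are ordered before the `dist` field is ever compared.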

@jaraco jaraco closed this Dec 6, 2020
@jaraco jaraco reopened this Dec 6, 2020
@s0undt3ch
Contributor

How do we carry dist through the pickle/unpickle process?

@jaraco
Member Author

jaraco commented Dec 6, 2020

How do we carry dist through the pickle/unpickle process?

Indeed, this implementation doesn't yet attempt to address that concern. Do we know if there's a use-case that would demand restoration of the dist?

@jaraco
Member Author

jaraco commented Dec 6, 2020

In 0d31cab, I illustrate how one might consider including the Distribution when pickling.

I can imagine another scenario where the pickle resolves the name of the distribution, pickles that, and then _new_for could use Distribution.from_name to revive it. I'm hesitant to add this support, however, unless there's a good case that it's needed.
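To make the name-based scenario concrete, here is a rough sketch of what it might look like. Everything in it is an assumption for illustration: `FakeDistribution`, `SketchEntryPoint`, and `_revive` are hypothetical names (`_revive` plays the role the comment assigns to `_new_for`), and the class-level registry stands in for a real by-name lookup. Instead of invoking `pickle` directly, the sketch calls `__reduce__` and then the returned callable, which is the same protocol pickle follows when saving and restoring an object.

```python
class FakeDistribution:
    """Hypothetical stand-in for a Distribution; not the library's class."""

    _installed = {}

    def __init__(self, name, version):
        self.name = name
        self.version = version
        FakeDistribution._installed[name] = self

    @classmethod
    def from_name(cls, name):
        # Stands in for a by-name lookup; a real lookup may be expensive.
        return cls._installed[name]


class SketchEntryPoint:
    """Hypothetical entry point that serializes only the distribution's name."""

    def __init__(self, name, value, group, dist=None):
        self.name, self.value, self.group, self.dist = name, value, group, dist

    @classmethod
    def _revive(cls, name, value, group, dist_name):
        # Resolve the distribution again from its name when restoring.
        dist = FakeDistribution.from_name(dist_name) if dist_name else None
        return cls(name, value, group, dist)

    def __reduce__(self):
        dist_name = self.dist.name if self.dist is not None else None
        return self._revive, (self.name, self.value, self.group, dist_name)


dist = FakeDistribution("example-dist", "1.0")
ep = SketchEntryPoint("example", "pkg.mod:func", "example.group", dist)

# Only plain strings travel through serialization; the dist is looked up again.
factory, args = ep.__reduce__()
restored = factory(*args)
```

The cost of this design is that restoring depends on the name still resolving to something, and to the same thing, at load time.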

I see the original request for pickleability was made by @asottile in #96. Anthony, can you elaborate on what use-case demands the pickleability of EntryPoint objects and whether retaining an attached Distribution instance is important in that use case?

@asottile
Contributor

asottile commented Dec 6, 2020

flake8 forks multiprocessing workers and used to contain importlib-metadata objects in the object graph -- I no longer need it as we're working around it and passing along our own objects instead

======

* #265: ``EntryPoint`` objects now expose a ``.dist`` object
referencing the ``Distribution`` when constructed from a
Contributor


When?
An entry point exists because a distribution defines it, right?
Or, in which case does an entry point exist without a distribution?
So, why "when"?

Member Author


The test suite in particular constructs EntryPoints without a distribution. I wouldn't expect library consumers to construct EntryPoints this way, so I'm not describing it here, and instead focusing on the use-case that users would expect to see.

@s0undt3ch
Contributor

Bear with me.

  1. How expensive is it to create a Distribution?
  2. Based on the missing feature, we almost never need a Distribution, right?
  3. So why keep an instance of Distribution always attached to an EntryPoint?

If we adopt a lazy approach, we only load a distribution instance when we actually need it.
This does mean an extra attribute, the distribution name, and a (cached) method to resolve the distribution.

How bad would this approach be?

Could dist hold the distribution name (or, to better reflect the intent, be named dist_name), with either a cached property (.distribution) or a cached method (.resolve()?) doing the resolution, as a lighter and faster approach to this?

I mean, if loading a distribution instance isn't a great penalty, then let's just avoid this lazy approach.

The idea of getting bug reports that there's a noticeable performance penalty is what's triggering my Q's here.
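For reference, the lazy approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `LazyEntryPoint`, `dist_name`, and `resolve_distribution` are hypothetical names, and the resolver is a placeholder for whatever by-name lookup would actually be used. `functools.cached_property` requires Python 3.8 or later.

```python
from functools import cached_property

LOOKUPS = []  # records each resolution so the caching is observable


def resolve_distribution(name):
    """Hypothetical, potentially expensive lookup (stands in for path traversal)."""
    LOOKUPS.append(name)
    return {"name": name}


class LazyEntryPoint:
    """Hypothetical entry point that stores a name and resolves it on demand."""

    def __init__(self, name, value, group, dist_name):
        self.name = name
        self.value = value
        self.group = group
        self.dist_name = dist_name

    @cached_property
    def distribution(self):
        # Runs at most once per instance; later accesses reuse the result.
        return resolve_distribution(self.dist_name)


ep = LazyEntryPoint("example", "pkg.mod:func", "example.group", "example-dist")
first = ep.distribution   # triggers the lookup
second = ep.distribution  # served from the per-instance cache
```

No lookup happens until `.distribution` is first accessed, so entry points that never need their distribution pay nothing beyond storing the name.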

@jaraco
Member Author

jaraco commented Dec 6, 2020

Bear with me.

Gladly. I'm grateful for every bit of input.

1. How expensive is it to create a `Distribution`?

I guess it depends. The Distribution object is actually an abstract base class (though I notice it's not declared as such). PathDistribution is the only concrete class implementation supplied by the package. Neither is particularly expensive to create, but resolving a Distribution from a name is expensive, requiring path traversal and discovery with arbitrarily complex behavior (imagine a worst-case scenario where an object on sys.meta_path is designed to search and load packages from PyPI or over a pigeon IP stack).

Moreover, it's not necessarily the case that a distribution that was constructed a second ago would be constructed from the same name in the next second, if, for example, pip were to have uninstalled or upgraded that distribution in the meantime.

It's also important to note that third-parties can implement their own Distribution subclasses and there's nothing in the protocol or spec that demands that they implement something that's pickleable or able to be reconstructed from the same name.

2. Based on the missing feature, we almost never need a `Distribution`, right?

Perhaps not. And you bring up a good point - I'm not aware of what use-cases drive the demand for resolving a distribution from an entry point. I've asked in bpo-42382 for some clarification as to the requirement.

3. So why keep an instance of `Distribution` always attached to an `EntryPoint`?

My primary motivation for keeping an instance is because that instance links directly to the object that created the EntryPoint. Also, because that instance already exists, it avoids complication around possible changes to the state. That is, it reflects the exact Distribution that was present when the EntryPoint was read.
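A small sketch may help illustrate the difference between keeping the creating instance and looking it up again by name. The classes and the `installed` mapping below are hypothetical stand-ins, not the library's implementation; the mapping simulates what a by-name lookup would see after an upgrade.

```python
class SketchEntryPoint:
    """Hypothetical entry point holding a reference to its creating distribution."""

    def __init__(self, name, value, group, dist=None):
        self.name, self.value, self.group, self.dist = name, value, group, dist


class SketchDistribution:
    """Hypothetical stand-in for a Distribution."""

    def __init__(self, name, version, declared):
        self.name = name
        self.version = version
        self._declared = declared  # list of (name, value, group) tuples

    @property
    def entry_points(self):
        # Each entry point is handed the instance that produced it.
        return [SketchEntryPoint(n, v, g, dist=self) for n, v, g in self._declared]


installed = {}  # simulates what a by-name lookup would currently find

old = SketchDistribution(
    "example-dist", "1.0", [("example", "pkg.mod:func", "example.group")]
)
installed[old.name] = old
(ep,) = old.entry_points

# Simulate an upgrade happening after the entry point was read.
installed[old.name] = SketchDistribution("example-dist", "2.0", [])

attached_version = ep.dist.version                     # state when the entry point was read
looked_up_version = installed[ep.dist.name].version    # state seen by a later lookup
```

Here the attached instance still reports the version that was present when the entry point was read, while the later lookup by name reports the upgraded one.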

If we adopt a lazy approach, we only load a distribution instance when we actually need it.
This does mean an extra attribute, the distribution name, and a (cached) method to resolve the distribution.

How bad would this approach be?

Could dist hold the distribution name (or, to better reflect the intent, be named dist_name), with either a cached property (.distribution) or a cached method (.resolve()?) doing the resolution, as a lighter and faster approach to this?

I mean, if loading a distribution instance isn't a great penalty, then let's just avoid this lazy approach.

The idea of getting bug reports that there's a noticeable performance penalty is what's triggering my Q's here.

I guess what I'm wondering is why bother resolving a Distribution to a name only to lazily resolve that name back to a Distribution? If the concern is pickleability, and we determine that users do in fact have good reason to want the Distribution object pickled and restored as well, then I would recommend adding pickleability as a constraint on the Distribution interface.

@jaraco jaraco marked this pull request as ready for review December 6, 2020 22:41
@s0undt3ch
Contributor

  1. How expensive is it to create a Distribution?

I guess it depends. The Distribution object is actually an abstract base class (though I notice it's not declared as such). PathDistribution is the only concrete class implementation supplied by the package. Neither is particularly expensive to create, but resolving a Distribution from a name is expensive, requiring path traversal and discovery with arbitrarily complex behavior (imagine a worst-case scenario where an object on sys.meta_path is designed to search and load packages from PyPI or over a pigeon IP stack).

It's the path traversal that worries me.

Moreover, it's not necessarily the case that a distribution that was constructed a second ago would be constructed from the same name in the next second, if, for example, pip were to have uninstalled or upgraded that distribution in the meantime.

Fair point.

It's also important to note that third-parties can implement their own Distribution subclasses and there's nothing in the protocol or spec that demands that they implement something that's pickleable or able to be reconstructed from the same name.

Hmm, but if we add the .dist attribute, which is an instance of Distribution, then we probably need to make sure all implementations are pickleable. Or, we remove pickle support.

  2. Based on the missing feature, we almost never need a Distribution, right?

Perhaps not. And you bring up a good point - I'm not aware of what use-cases drive the demand for resolving a distribution from an entry point. I've asked in bpo-42382 for some clarification as to the requirement.

Replied.

  3. So why keep an instance of Distribution always attached to an EntryPoint?

My primary motivation for keeping an instance is because that instance links directly to the object that created the EntryPoint. Also, because that instance already exists, it avoids complication around possible changes to the state. That is, it reflects the exact Distribution that was present when the EntryPoint was read.

Yeah, that's why I was setting ._distribution when creating the entry point instance in the original PR.

If we adopt a lazy approach, we only load a distribution instance when we actually need it.
This does mean an extra attribute, the distribution name, and a (cached) method to resolve the distribution.
How bad would this approach be?
Could dist hold the distribution name (or, to better reflect the intent, be named dist_name), with either a cached property (.distribution) or a cached method (.resolve()?) doing the resolution, as a lighter and faster approach to this?
I mean, if loading a distribution instance isn't a great penalty, then let's just avoid this lazy approach.
The idea of getting bug reports that there's a noticeable performance penalty is what's triggering my Q's here.

I guess what I'm wondering is why bother resolving a Distribution to a name only to lazily resolve that name back to a Distribution? If the concern is pickleability, and we determine that users do in fact have good reason to want the Distribution object pickled and restored as well, then I would recommend adding pickleability as a constraint on the Distribution interface.

My personal use case does not require pickleability. But dropping support for that now seems counterintuitive. Deprecate its support to see if there's an actual need these days?

@jaraco jaraco changed the base branch from master to main December 13, 2020 18:34
@jaraco
Member Author

jaraco commented Dec 13, 2020

My personal use case does not require pickleability. But dropping support for that now seems counterintuitive. Deprecate its support to see if there's an actual need these days?

Well, the tests for pickleability continue to work. They just lose the Distribution object in the process, but my feeling is that this gap is okay for now, especially considering that the sole known use case for pickleability is no longer a need. I'd like to proceed with this approach for now, validate that it meets your needs, and defer the Distribution pickleability concern unless it's raised.

@github-actions github-actions bot merged commit 1816266 into main Dec 13, 2020
@jaraco jaraco deleted the bugfix/bpo-42382 branch December 13, 2020 20:54
@s0undt3ch
Contributor

s0undt3ch commented Dec 13, 2020

Should I update the opened Python PR with these changes then?

@jaraco
Member Author

jaraco commented Dec 13, 2020

I'll do it, since it needs to incorporate the changes from 3.1 and 3.2 also.


Successfully merging this pull request may close these issues.

Resolve EntryPoint to Distribution
3 participants