Introduce the 'tree' module to allow traversal of packages for resources in namespace packages - [merged] #164

Closed
jaraco opened this issue Oct 21, 2020 · 51 comments

Comments

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 6, 2019, 15:55

Merges feature/traversable -> master

Ref #68. May also address #58.

jaraco added this to the 1.1 milestone Oct 21, 2020

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 6, 2019, 15:56

added 1 commit

  • 3978ce1 - Bump to zipp 0.4 to support zip files without directory entries.

jaraco commented Oct 21, 2020

In GitLab by @warsaw on May 29, 2019, 13:25

@jaraco We have only a few days left before 3.8 beta 1. Should we try to get this feature in? I also want to work on issue #58.

Maybe we can get an exception if we don't get them in on time, but I'd like to try anyway.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 29, 2019, 14:09

I'll try to spend a few hours on this in the next two days.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 29, 2019, 14:09

assigned to @jaraco

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 29, 2019, 18:00

So the Python 2 code is failing because I never applied the changes for _py3 to _py2. :(

2020 can't come fast enough.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 30, 2019, 09:25

added 1 commit

  • 3de48de - Revert "Regenerate zipdata01/ziptestdata.zip ensuring that 'subdirectory/' is...

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 30, 2019, 09:28

After reviewing the code, I'm surprised that the Python 2 functionality was affected, since only the _py3 module was changed. The only file implicated in the Python 2 behavior was the test fixture, a zip file with folder entries. With zipp 0.4 and later, those entries aren't necessary, so I've removed them, and the Python 2 tests pass again. It's not a proper fix, but it should at least establish baseline functionality for the Python 3 code.

@warsaw How do you feel about releasing this functionality for Python 3 only, with Python 2 keeping the old implementation?
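To illustrate why the fixture's folder entries became unnecessary: zipp (and the standard library's zipfile.Path on Python 3.8+, which provides the same behavior) infers directories from the file names stored in an archive. The sketch below builds a zip with no explicit 'subdirectory/' entry and still traverses it. The archive contents and helper names are hypothetical, loosely modeled on the test fixture.

```python
import io
import zipfile


def build_zip_without_dir_entries() -> io.BytesIO:
    """Build an in-memory zip with file entries only (no 'subdirectory/' entry)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("subdirectory/binary.file", b"\x00\x01\x02\x03")
        zf.writestr("utf-8.file", "Hello, UTF-8 world!\n")
    buf.seek(0)
    return buf


def list_dir(buf: io.BytesIO, subdir: str):
    """Return (is_dir, child names) for a directory implied only by file paths."""
    root = zipfile.Path(zipfile.ZipFile(buf))
    target = root / subdir
    return target.is_dir(), sorted(child.name for child in target.iterdir())


print(list_dir(build_zip_without_dir_entries(), "subdirectory"))
# → (True, ['binary.file'])
```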

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 30, 2019, 09:29

added 9 commits

jaraco commented Oct 21, 2020

In GitLab by @codecov on May 30, 2019, 09:33

Codecov Report

❗ No coverage uploaded for pull request base (master@9f4cc35).
The diff coverage is 100%.

@@           Coverage Diff           @@
##             master    #76   +/-   ##
=======================================
  Coverage          ?   100%           
=======================================
  Files             ?      2           
  Lines             ?    147           
  Branches          ?     12           
=======================================
  Hits              ?    147           
  Misses            ?      0           
  Partials          ?      0

Impacted Files                  Coverage Δ
importlib_resources/_py3.py     100% <100%> (ø)
importlib_resources/trees.py    100% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9f4cc35...79da86d.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 30, 2019, 09:48

I just re-read #68. I'm glad I documented my thoughts there because I'd forgotten what remains to be done:

  • Provide a new, public API to return traversable objects.
  • For namespace packages, return a multiplexer of traversable objects for each path of the namespace.
  • Complete port to Python 2 (maybe).

The first task is a big one, and the first two are what's needed to get this feature into importlib.resources for Python 3.8. I'm not sure I'll find the time for them before the cutoff, but I'll see what I can do.
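A minimal sketch of the second task, assuming a namespace package's portions are plain directories (for a real namespace package they would come from its __path__). MultiplexedDirs is a hypothetical name, not an importlib_resources API; it merges children across all portions and lets the first portion win when names collide.

```python
import pathlib
from typing import Iterator, List


class MultiplexedDirs:
    """Hypothetical sketch: present several directories as one traversable root."""

    def __init__(self, *paths: pathlib.Path) -> None:
        self._paths: List[pathlib.Path] = [pathlib.Path(p) for p in paths]
        if not self._paths or not all(p.is_dir() for p in self._paths):
            raise NotADirectoryError("every portion must be an existing directory")

    def iterdir(self) -> Iterator[pathlib.Path]:
        # Yield children of each portion in order; the first occurrence of a name wins.
        seen = set()
        for path in self._paths:
            for child in path.iterdir():
                if child.name not in seen:
                    seen.add(child.name)
                    yield child

    def joinpath(self, child: str) -> pathlib.Path:
        # Return the first portion containing the child; otherwise use the first portion.
        for path in self._paths:
            candidate = path / child
            if candidate.exists():
                return candidate
        return self._paths[0] / child

    __truediv__ = joinpath

    def is_dir(self) -> bool:
        return True

    def is_file(self) -> bool:
        return False
```

Returning plain pathlib.Path children keeps the sketch small; a fuller version would also need to merge same-named subdirectories that appear in more than one portion.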

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 31, 2019, 10:39

added 1 commit

  • dbb69a6a - Update trees module for Python 2.7 compatibility

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 31, 2019, 10:51

(Never mind, found the solution in bfaee17.) @warsaw, do you have any idea why the tests are failing here? The error message is:

qa runtests: commands[1] | mypy importlib_resources
importlib_resources/trees.py:8: error: Name 'pathlib' already defined (by an import)

But this sort of import technique is used in importlib_metadata without triggering the error.

Any idea what accounts for the difference?
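The change in bfaee17 moved the shims into _compat, and its details aren't shown here. One alternative pattern that mypy accepts is to gate the fallback import on sys.version_info, which mypy evaluates statically, instead of on try/except ImportError (assumed here to be the pattern behind the error). pathlib2 (the PyPI backport) and the resource_location helper are illustrative assumptions.

```python
import sys

# mypy treats sys.version_info comparisons as static checks: when type checking
# for Python 3 it skips the else branch, so no redefinition is reported.
if sys.version_info >= (3,):
    import pathlib
else:  # Python 2 only; pathlib2 is the PyPI backport
    import pathlib2 as pathlib


def resource_location(package_dir: str, name: str) -> "pathlib.PurePosixPath":
    """Join a package directory and a resource name with whichever pathlib was imported."""
    return pathlib.PurePosixPath(package_dir) / name
```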

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 31, 2019, 11:04

added 1 commit

  • 9f4cc35 - Update trees module for Python 2.7 compatibility

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 31, 2019, 11:33

added 1 commit

  • bfaee17 - Rely on _compat for Python 2.7 compatibility shims

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 31, 2019, 11:47

added 2 commits

  • 1150cd6 - Add support for loading a package spec on Python 2
  • 295c699 - Rely on trees for detecting resources on Python 2.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on May 31, 2019, 12:00

Additionally, something I want to do:

  • Add a context manager to Traversable objects that ensures a path on a file system, something like Traversable.as_file.
  • In importlib_resources.path, rely on Traversable.as_file.
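A minimal sketch of the as_file idea, written as a standalone context manager under the assumption that any Traversable offers name and read_bytes(). as_file_sketch is a hypothetical name. Objects already on the file system are yielded as-is; anything else (such as a zipfile.Path) is copied to a temporary file that is removed on exit.

```python
import contextlib
import os
import pathlib
import tempfile
from typing import Iterator


@contextlib.contextmanager
def as_file_sketch(traversable) -> Iterator[pathlib.Path]:
    """Hypothetical sketch: yield a real file system path for a Traversable."""
    if isinstance(traversable, pathlib.Path):
        # Already on the file system: use it directly, nothing to clean up.
        yield traversable
        return
    # Otherwise (e.g. a zipfile.Path), copy the bytes into a temporary file.
    fd, raw_path = tempfile.mkstemp(suffix="-" + traversable.name)
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(traversable.read_bytes())
        yield pathlib.Path(raw_path)
    finally:
        # Remove the temporary copy once the caller's with-block exits.
        os.remove(raw_path)
```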

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jun 2, 2019, 10:28

added 1 commit

  • dba77ef - Rely on trees for 'path' operation. Passes all but one operation.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jun 2, 2019, 10:49

As you can see, in this last commit, I've started work on having path rely on the trees module. One test fails, and it's taken me a while to understand what's happening:

The test test_resource_opener creates a resource that raises FileNotFoundError from resource_path() but returns its bytes from open_resource(). For a moment, I considered removing that test (and the related support), but the path function probably needs to continue supporting this mode.
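The mode that test exercises can be sketched as a fallback: ask the reader for resource_path() first and, on FileNotFoundError, copy the bytes from open_resource() into a temporary file. resource_path() and open_resource() are methods of importlib.abc.ResourceReader; path_via_reader and the fake reader used in the test are hypothetical, and this is not the project's actual implementation.

```python
import contextlib
import os
import pathlib
import tempfile
from typing import Iterator


@contextlib.contextmanager
def path_via_reader(reader, resource: str) -> Iterator[pathlib.Path]:
    """Hypothetical sketch: prefer resource_path(), fall back to open_resource()."""
    try:
        fs_path = reader.resource_path(resource)
    except FileNotFoundError:
        fs_path = None
    if fs_path is not None:
        # The reader can point at a real file; no copy is needed.
        yield pathlib.Path(fs_path)
        return
    # No real file: materialize the bytes from open_resource() in a temp file.
    with reader.open_resource(resource) as src:
        data = src.read()
    fd, raw_path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as out:
            out.write(data)
        yield pathlib.Path(raw_path)
    finally:
        os.remove(raw_path)
```

Only the resource_path() call sits inside the try block, so a FileNotFoundError raised by the caller's own with-body is not mistaken for a missing path.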

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jun 2, 2019, 11:40

I estimate there's only a small chance I'm going to have this work done today.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jun 2, 2019, 11:57

added 2 commits

  • 1526d8a - Extract _tempfile context manager
  • 78f927bf - Also try open_resource for path operation.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jun 2, 2019, 12:10

added 3 commits

  • adea250 - Also try open_resource for path operation.
  • 2a4b00d - Extract _path_from_reader helper for cleaner implementation
  • 2e9a79f - Move check for path.is_file to _py2, as it's only encountered there :(

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jun 2, 2019, 12:16

As implemented in this latest commit, the "new, public API" for retrieving Path objects is importlib_resources.get(filename). I haven't yet updated the documentation to reflect this proposed change, though.

More importantly, I have a new concern. While working on the port of path, I've realized that the "resource reader" API isn't invoked by this .get() API. Instead, .get() relies entirely on assumptions about the loader, namely that the package is backed by a zip file or by files on the file system.

I'm beginning to think now that what is needed is a Traversable object that wraps the "resource reader" API.
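A rough sketch of such a wrapper, assuming only the importlib.abc.ResourceReader methods contents(), is_resource() and open_resource(). Because that API describes resources directly inside a package, this hypothetical ReaderTraversable supports just one level below the package root.

```python
from typing import BinaryIO, Iterator


class ReaderTraversable:
    """Hypothetical adapter exposing a ResourceReader as a Traversable-like object.

    ResourceReader only describes resources directly inside the package, so this
    sketch supports a single level: the package root and its immediate resources.
    """

    def __init__(self, reader, name: str = "") -> None:
        self._reader = reader
        self.name = name  # "" denotes the package root

    def is_dir(self) -> bool:
        return not self.name

    def is_file(self) -> bool:
        return bool(self.name) and self._reader.is_resource(self.name)

    def iterdir(self) -> Iterator["ReaderTraversable"]:
        if self.name:
            raise NotADirectoryError(self.name)
        for entry in self._reader.contents():
            yield ReaderTraversable(self._reader, entry)

    def joinpath(self, child: str) -> "ReaderTraversable":
        if self.name:
            raise NotADirectoryError("nested resources are not supported")
        return ReaderTraversable(self._reader, child)

    __truediv__ = joinpath

    def open(self, mode: str = "rb") -> BinaryIO:
        if mode != "rb":
            raise ValueError("only binary reading is supported in this sketch")
        return self._reader.open_resource(self.name)

    def read_bytes(self) -> bytes:
        with self.open() as stream:
            return stream.read()
```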

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jan 16, 2020, 19:18

added 15 commits

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jan 17, 2020, 16:30

added 2 commits

  • 5fe5e85 - Doubly ignore the type
  • 344fb6a - Fix type indication on `_py3.path` now that context manager is in a subroutine.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jan 17, 2020, 20:37

unmarked as a Work In Progress

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jan 18, 2020, 10:37

added 7 commits

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jan 18, 2020, 10:44

added 1 commit

  • 4ef7916 - Rely on zipfile.Path as found in stdlib on Python 3.8

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Jan 18, 2020, 11:08

added 1 commit

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 9, 2020, 10:41

added 1 commit

  • 67abc60 - Rename 'get' to 'files' and no longer solicit a resource.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 14, 2020, 08:47

I'd like to proceed with merging this change as proposed and iterate from there. If there are no objections, I plan to do that on Sunday.

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Feb 14, 2020, 17:41

Commented on importlib_resources/trees.py line 48

join_path()?

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Feb 14, 2020, 17:41

Commented on importlib_resources/trees.py line 18

I guess iterdir() is meant to iterate over the files in a directory? What about __iter__()? Or if you want to keep the non-special method spelling, iter_dir()?

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Feb 14, 2020, 17:42

LGTM, with some comments.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 14, 2020, 19:09

Commented on importlib_resources/trees.py line 18

pathlib uses iterdir, which is why I chose it: that way, pathlib.Path objects (and thus zipp.Path objects) are Traversables.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 14, 2020, 19:34

Commented on importlib_resources/trees.py line 48

pathlib uses joinpath, so I've mirrored that behavior for compatibility (so that pathlib.Path and zipp.Path objects are Traversable).
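Because iterdir and joinpath follow pathlib's spelling, code written against that small surface works with pathlib.Path and zipfile.Path alike. The sketch below expresses the surface as a typing.Protocol; TraversableLike and walk_files are hypothetical names, and the project's actual Traversable definition may differ.

```python
from typing import Iterator, Protocol, runtime_checkable


@runtime_checkable
class TraversableLike(Protocol):
    """Hypothetical structural type covering the pathlib-style methods discussed above."""

    @property
    def name(self) -> str: ...

    def iterdir(self) -> Iterator["TraversableLike"]: ...

    def joinpath(self, child: str) -> "TraversableLike": ...

    def is_dir(self) -> bool: ...

    def is_file(self) -> bool: ...

    def read_bytes(self) -> bytes: ...


def walk_files(node: TraversableLike, prefix: str = "") -> Iterator[str]:
    """Yield slash-separated relative paths of every file under node."""
    for child in node.iterdir():
        relative = prefix + child.name
        if child.is_dir():
            yield from walk_files(child, relative + "/")
        else:
            yield relative
```

Neither pathlib.Path nor zipfile.Path inherits from anything here; they match the protocol purely by having the same method names.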

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 14, 2020, 19:34

resolved all threads

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 16, 2020, 13:40

Today, I decided to update the documentation to reflect a preference for this new files() behavior, and I stumbled on a possibly serious concern. The "using" docs state:

Even when this hierarchical structure isn't represented by physical files and
directories, the model still holds. So zip files can contain packages and
resources, as could databases or other storage medium. In fact, while
importlib_resources supports physical file systems and zip files by
default, anything that can be loaded with a Python import system loader can
provide resources, as long as the loader implements the ResourceReader
abstract base class.

However, this new files() implementation does not rely on the ResourceReader interface and thus can't support arbitrary loaders. One can call files() and get a Traversable object as long as the package is in one of the supported forms (on the file system or in a zip file), or one can call path(), which uses ResourceReader but returns a Traversable object that isn't guaranteed to be traversable, because it was constructed around the result of resource_path(), which is no longer necessarily associated with the package.

This creates an awkward divergence in behavior. If you want to support packages hosted by arbitrary loaders, you need to use the old API, but if you want to support package resources not directly contained by a package, you need to use the files() API. So files() doesn't fully supersede the old behavior.

I'd like for this new files() behavior to supersede the existing behavior, but I don't think it can if this constraint needs to be maintained. Ultimately, I think it boils down to there not being a suitable low-level interface (like Traversable) for loading resources from the child of a package.

I don't think this issue can be addressed in short order, so here's what I'm going to recommend:

  1. The documentation will advertise and encourage files() as the primary invocation.
  2. The documentation will declare the previous API as discouraged (maybe deprecated).
  3. The documentation may explain the tradeoffs above.
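As a point of reference, the files() API later reached the standard library as importlib.resources.files() in Python 3.9. A minimal sketch of the files()-first style from item 1, using that stdlib function; read_package_text is a hypothetical helper, and the stdlib json package only serves as a convenient target.

```python
from importlib.resources import files  # standard library, Python 3.9+


def read_package_text(package: str, *parts: str) -> str:
    """Read a text resource by walking from the package root with '/'."""
    resource = files(package)
    for part in parts:
        # Each step may descend into plain subdirectories, not only subpackages.
        resource = resource / part
    return resource.read_text(encoding="utf-8")


print(read_package_text("json", "__init__.py")[:4])
```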

Thoughts?

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 16, 2020, 14:21

added 3 commits

  • 0727ad2 - Sort everything alphabetically on separate lines.
  • 68c7db1 - Add files to the exported names.
  • 0a84bb5 - Update documentation to reflect the primacy of 'files()' and remove the...

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 16, 2020, 14:21

In 0a84bb5, I updated the documentation according to my thoughts above.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:10

added 2 commits

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:10

I believe this updated documentation and implementation are the best available, and we need to get these changes in to keep the ball rolling.

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:10

enabled an automatic merge when the pipeline for 79da86d succeeds

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:11

Closes #58

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:12

canceled the automatic merge

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:12

enabled an automatic merge when the pipeline for 79da86d succeeds

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:14

merged

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 09:14

mentioned in commit 10f7b72

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Feb 17, 2020, 13:50

I'd like for this new files() behavior to supersede the existing behavior, but I don't think it can if this constraint needs to be maintained. Ultimately, I think it boils down to there not being a suitable low-level interface (like Traversable) for loading resources from the child of a package.

I guess my question is, will it be possible for people to customize and extend the package resource loading mechanism, and if so, how? ResourceReader was the way this was supposed to work, but that was within the constraints of the previous semantics.

If we got that wrong, so be it. But I think the basic concept of extensibility, allowing other systems to play along, is important to keep. For example, how would something like PyOxidizer support loading resources from subdirectories that aren't packages?

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 14:39

will it be possible for people to customize and extend the package resource loading mechanism?

Good question. I think we'll want to ask loaders to implement a new interface to supersede the ResourceReader, something like:

from abc import ABCMeta, abstractmethod

class TraversableResources(metaclass=ABCMeta):
    @abstractmethod
    def files(self):
        """Return a Traversable object for the loaded package."""

I called the method files to match the importlib_resources API, but it could be something very different.
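To sketch how a third-party loader might plug into the proposed interface: it only has to return some Traversable from files(). The example assumes the ABC exactly as sketched above; DirectoryResources and read_resource are hypothetical, and the method name could still change.

```python
import pathlib
from abc import ABCMeta, abstractmethod


class TraversableResources(metaclass=ABCMeta):
    @abstractmethod
    def files(self):
        """Return a Traversable object for the loaded package."""


class DirectoryResources(TraversableResources):
    """Hypothetical reader for a loader whose package data lives in one directory."""

    def __init__(self, directory) -> None:
        self._root = pathlib.Path(directory)

    def files(self) -> pathlib.Path:
        # pathlib.Path already provides iterdir()/joinpath()/read_bytes(),
        # so it can serve directly as the Traversable.
        return self._root


def read_resource(reader: TraversableResources, *parts: str) -> bytes:
    """Navigate from the reader's root, including non-package subdirectories."""
    node = reader.files()
    for part in parts:
        node = node.joinpath(part)
    return node.read_bytes()
```

A loader backed by something other than the file system (an in-memory store, for instance) would return its own Traversable implementation from files() instead of a pathlib.Path.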

jaraco commented Oct 21, 2020

In GitLab by @warsaw on Feb 17, 2020, 14:45

Yes, something like that. Have to think about whether files() is the right name. Maybe open a separate issue to discuss the topic?

jaraco commented Oct 21, 2020

In GitLab by @jaraco on Feb 17, 2020, 14:46

Yes. #77.

jaraco closed this as completed Oct 21, 2020