MAINT: Hide internals of np.lib to only show submodules #18447

Closed
wants to merge 1 commit into from

Conversation

seberg
Member

@seberg seberg commented Feb 19, 2021

This commit creates a new module to "conveniently" import all
symbols into the main namespace, while hiding them in np.lib.


What this does and does not do:

  • np.lib.<tab> is now very lean
  • np.lib.<symbol> will keep working for all symbols (but we can deprecate that)
  • It still uses __all__ to define what ends up in the main namespace. That is a bit orthogonal to hiding np.lib though. This means for example that from numpy.lib.stride_tricks import * will not import as_strided.

It would not be hard to fix the __all__ usage (as the comment says). I see two options:

  • Use a decorator to add functions to the main namespace, since we already use that to "fix" the module anyway.
  • Make a second list besides __all__ (which all can add to, or ignore) and manually import all of those names (this is slightly ugly, but not really tricky, I had the code ready but felt it is better taken in steps)

Keeping this as a draft because I think it requires a decision. I expect it is ready for review code-wise, apart from a test for what is in np.lib.__dir__...

@seberg seberg force-pushed the hide-lib-namespace branch from 4621823 to 91bfcfa Compare February 20, 2021 01:20
@seberg
Member Author

seberg commented Feb 20, 2021

Hmm, I guess I may have overcomplicated this. I tried to circumvent the way we currently use __all__, which is great if we are OK with from numpy.lib import * not importing much anymore. Otherwise, it would be even more minimal: just add a __dir__ with the list of things we like (the current __all__).
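The minimal variant could look roughly like this. It is a sketch on a stand-in module (the hypothetical `fake_lib`, not numpy.lib), relying on the module-level `__dir__` hook from PEP 562:

```python
import types

# Source for a stand-in module; `fake_lib` and its contents are hypothetical.
SOURCE = '''
__all__ = ["emath", "NumpyVersion"]

emath = object()
NumpyVersion = object()
tile = object()  # stands in for a function re-exported to the main namespace


def __dir__():
    # PEP 562: dir(module) calls this instead of listing the module dict.
    return __all__
'''

fake_lib = types.ModuleType("fake_lib")
exec(SOURCE, fake_lib.__dict__)

print(dir(fake_lib))              # only the names in __all__
print(hasattr(fake_lib, "tile"))  # hidden from dir(), still accessible
```

Tab completion that relies on dir() would then see only the short list, while existing attribute access keeps working.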

@rgommers
Member

Thanks for this @seberg!

Make a second list besides __all__ (which all can add to, or ignore) and manually import all of those names (this is slightly ugly, but not really tricky, I had the code ready but felt it is better taken in steps)

I'd prefer this option, just an alphabetically ordered explicit list seems like the cleanest solution - and easiest to modify for contributors that aren't familiar with how this works exactly.

@seberg
Member Author

seberg commented Feb 22, 2021

Well, I think we need to clarify what we want from numpy.lib import * to do, because that decides what the solution should be here. The current version de facto disables it, because what you get is almost identical to from numpy import *. We could even do the weird "deprecation warning on * import" trick, which is not entirely ridiculous, considering that lib is 95% an implementation detail.
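One way such a star-import warning could work is sketched below. This is an assumption about the trick rather than code from the PR: a sentinel name (the hypothetical `_star_import_marker`) is listed in __all__ but never defined, so `import *` has to resolve it through the module-level `__getattr__`, which emits the warning. The module `fake_lib` is a stand-in.

```python
import sys
import types
import warnings

SOURCE = '''
import warnings

# "_star_import_marker" is a hypothetical sentinel that exists only in __all__.
__all__ = ["emath", "_star_import_marker"]
emath = object()


def __getattr__(name):
    # `from module import *` looks up every name in __all__; the sentinel
    # is not a real attribute, so the lookup lands here (PEP 562).
    if name == "_star_import_marker":
        warnings.warn("star import from fake_lib is deprecated",
                      DeprecationWarning, stacklevel=2)
        return None
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''

fake_lib = types.ModuleType("fake_lib")
exec(SOURCE, fake_lib.__dict__)
sys.modules["fake_lib"] = fake_lib  # make the stand-in importable

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    namespace = {}
    exec("from fake_lib import *", namespace)

print([str(w.message) for w in caught])
```

A side effect of this sketch is that the sentinel name is also bound (to None) in the importing namespace, and ordinary attribute access such as fake_lib.emath does not trigger the warning.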

@rgommers
Member

There's maybe a cleaner alternative here:

  1. Move the current numpy/lib/ to numpy/_main_namespace/.
  2. Create a new empty numpy/lib/ and re-add only the few subsubmodules that we want to be public.

Well, I think we need to clarify what we want from numpy.lib import * to do. Because that decides on what the solution should be here. The current version de-facto disables it, because what you get is almost identical to from numpy import *, we could even do the weird "deprecation warning on * import" trick. Which is not entirely ridiculous, considering that lib is 95% an implementation detail.

If we can deprecate it, that would be great.

@rossbar
Contributor

rossbar commented Feb 23, 2021

There's maybe a cleaner alternative here:

  1. Move the current numpy/lib/ to numpy/_main_namespace/.
  2. Create a new empty numpy/lib/ and re-add only the few subsubmodules that we want to be public.

In this case, what would happen with the submodules in lib that export some objects but not others? For example, stride_tricks has a few functions, e.g. broadcast_to, that are available from the main numpy namespace, but several others (as_strided, sliding_window_view) that aren't.

@rgommers
Member

In this case, what would happen with the submodules in lib that export some objects but not others.

That's wrong to begin with. Functions must be public API in exactly one place. So we do what we have to for backwards compat with the "hiding trick", but there's no conflict here.

@rossbar
Contributor

rossbar commented Feb 23, 2021

I am just confused by the file/directory renaming. It would be strange to me (though not necessarily any better/worse than the current situation) to have a numpy/_main_namespace/stride_tricks.py, which had some objects imported into the main numpy namespace, but not others.

@rgommers
Member

... but not others.

There should not be any of those; that does not make sense. Only things that end up in the main namespace need moving. stride_tricks is one of the very few public namespaces, so you wouldn't move that. Things like twodim_base.py or nanfunctions.py are not meant as separate public namespaces; they're just randomly named files that happen to miss an underscore. So there's no problem moving them.

@BvB93
Member

BvB93 commented Feb 23, 2021

If the goal here is to hide all re-exported functions from the np.lib namespace, then it would also be worthwhile to clear out the annotations in numpy/lib/__init__.pyi (except for __all__ I suppose, that one can stay).

# through `__getattr__` (see `__dir__`).
__all__ = {'emath', 'Arrayterator', 'tracemalloc_domain', 'NumpyVersion'}

__lib_submodules_ = {
Member

Out of curiosity, is there a reason why this isn't just a normal single-underscored variable (e.g. _lib_submodules)?

Member Author

No, I guess I had started with dunders, and then thought that dunders are sometimes special, so no dunder might be better. I suppose it looked very private, like in classes with name mangling :).

I have to think about the approach a bit more. I am not certain that actually renaming files helps much, this achieves almost the same thing.

@BvB93 yeah, if we do this, the pyi file should just be removed, only leaving the things that are currently there also. If that doesn't include modules, that only leaves __all__ and 3 other symbols or so, I guess. I am almost surprised we have a pyi at all...

Member

@BvB93 BvB93 Feb 24, 2021

For the pyi file: there have been issues in the past with certain circular imports crashing mypy (#17316), issues that were alleviated by adding the likes of numpy/core/__init__.pyi. Because of this I'm a bit reluctant to remove the pyi file here in its entirety, as I'm afraid it might break something.

@leofang
Contributor

leofang commented Feb 24, 2021

cc: @kmaehashi

Base automatically changed from master to main March 4, 2021 02:05
@stefanv
Contributor

stefanv commented Mar 13, 2021

In this case, what would happen with the submodules in lib that export some objects but not others.

That's wrong to begin with. Functions must be public API in exactly one place. So we do what we have to for backwards compat with the "hiding trick", but there's no conflict here.

Ultimately, I think that would be true if we had done a better job at keeping the main namespace clean. But, given that we haven't, a sensible topic-based organization of the lib namespace may have to duplicate functions that are already in the numpy namespace. This is not ideal, but I'd find it surprising if all broadcasting related functions (except for the ones in the numpy namespace) live in np.lib.broadcast, e.g.

Ideally, we'd move many functions from the main namespace to np.lib.xyz, and deprecate the main namespace functions, but that's going to be slow.

Side question: do we need the lib? Why not numpy.index instead of numpy.lib.index?

@rgommers
Member

rgommers commented Mar 13, 2021

This is not ideal, but I'd find it surprising if all broadcasting related functions (except for the ones in the numpy namespace) live in np.lib.broadcast, e.g.

That would be a major change. I'm not saying I disagree with the principle (it does make sense), but then we'd have to do it consistently - e.g. add np.linalg.dot and other missing functions in the linalg namespace (that'd actually be nice, cause the split is completely arbitrary now).

Side question: do we need the lib? Why not numpy.index instead of numpy.lib.index?

We definitely need .lib imho. The whole point is to have a single decent place to put things like utilities that we would otherwise not consider important enough for inclusion in NumPy.

The "core array library" and recommended/essential parts of numpy are:

  • main namespace
  • numpy.linalg
  • numpy.fft
  • numpy.random

All the other numpy.xxx namespaces are now largely niche and unmaintained. We can clean that up and message consistently that we have the four "good" namespaces above, clean up the rest, and have a single new place for more narrow functionality and utilities, numpy.lib.xxx, where the bar for inclusion is lower. IDEs will also then make the linalg, fft and random namespaces more prominent (that's what triggered this discussion in the first place).

@stefanv
Contributor

stefanv commented Mar 13, 2021

That all makes sense. Would you be okay with the bigger change if Sebastian were to embark on that mission?

@rgommers
Member

That all makes sense. Would you be okay with the bigger change if Sebastian were to embark on that mission?

I think so. But note that it may have consequences for downstream libraries that aim for compatibility with numpy - it would be good to make a proposal on the mailing list.

The current broadcasting/stride-tricks questions are completely hypothetical I think, so I'd suggest finishing this PR first before attempting to mess with linalg.

# Note that it does _not_ hide symbols from `np.lib.__dict__()`, to do
# that, we would have to delete them from the locals here and hide them
# into `__getattr__`. But this seems sufficient, e.g. for tab completion.
return __all__
Member

Suggested change
- return __all__
+ return sorted(globals().keys() | set(__all__))

More or less equivalent to sorted(super().__dir__() | set(__all__)) in this case,
but module-level __getattr__ lacks a self argument.

The important thing here is that you're missing quite a few types.ModuleType-based attributes if you just return __all__ here (and even the __all__ attribute itself).
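The difference between the two return values can be seen on a stand-in module. This is a sketch: `fake_lib`, `dir_all_only` and `dir_suggested` are hypothetical names used only to compare the two expressions side by side.

```python
import types

SOURCE = '''
__all__ = ["emath"]
emath = object()


def dir_all_only():
    # What the PR's __dir__ returns: just the public list.
    return __all__


def dir_suggested():
    # The suggested change: module globals plus the public list.
    return sorted(globals().keys() | set(__all__))
'''

mod = types.ModuleType("fake_lib")
exec(SOURCE, mod.__dict__)

print("__name__" in mod.dir_all_only())   # module attribute is missing
print("__name__" in mod.dir_suggested())  # module attribute is listed
print("__all__" in mod.dir_suggested())
```

Note that in this stand-in the suggested expression also lists every other global, so it would only hide names that are absent from the module dict.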

@seberg
Member Author

seberg commented Feb 4, 2022

Going to close this for now, since I am not going to make it a priority to finish it off, and whatever we do may look a bit different anyway.

@seberg seberg closed this Feb 4, 2022
@stefanv
Contributor

stefanv commented Feb 4, 2022

Thanks for the attempt, Sebastian.

Here's the current list of submodules:

__config__
_distributor_init
_globals
_mat
_pytesttester
_version
char
compat
core
ctypeslib
emath
fft
lib
linalg
ma
math
matrixlib
os
polynomial
random
rec
sys
testing
version
warnings

For something we can do right now, without much controversy, how about we override __dir__ to only show:

__config__
ctypeslib
distutils
fft
lib
linalg
ma
polynomial
random
testing

The other submodules will still be accessible, but they won't be in lists of auto-completions etc. to confuse users.
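To see what the proposal would hide, the two lists above can be compared directly. This is a small sketch that only copies the names from this comment and does not import numpy:

```python
# Names copied from the two lists in the comment above.
current = {
    "__config__", "_distributor_init", "_globals", "_mat", "_pytesttester",
    "_version", "char", "compat", "core", "ctypeslib", "emath", "fft", "lib",
    "linalg", "ma", "math", "matrixlib", "os", "polynomial", "random", "rec",
    "sys", "testing", "version", "warnings",
}
proposed = {
    "__config__", "ctypeslib", "distutils", "fft", "lib", "linalg", "ma",
    "polynomial", "random", "testing",
}

hidden = sorted(current - proposed)      # would drop out of dir(numpy)
only_proposed = sorted(proposed - current)  # listed in the proposal only

print(len(hidden), hidden)
print(only_proposed)
```

The comparison shows that distutils appears only in the proposed list, not in the current one.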

6 participants