MAINT: Remove unsafe unions and ABCs from return-annotations #18885

Merged
merged 12 commits into from
May 4, 2021

Conversation

Member

@BvB93 BvB93 commented May 2, 2021

closes #18305.

Per the title, this PR removes unsafe unions and abstract baseclasses from the return annotations,
e.g. functions that currently have one of the following patterns.

from __future__ import annotations
from typing import Any, Sequence, Any as A, Any as B

def func1(*args: Any, **kwargs: Any) -> A | B:
    pass

def func2(*args: Any, **kwargs: Any) -> Sequence[Any]:
    pass

The Problem

The problem with returning a Union (or, almost equivalently, an abstract-ish base class such as generic)
is that any and all operations performed on a union must be compatible with all of its members.
For example, operations that are exclusive to either np.float64 or np.ndarray are not allowed
on np.float64 | np.ndarray, unless the union is first narrowed down via an explicit isinstance
check.

While returning a union thus adds a form of simplicity to the annotations (as we don't have to
distinguish between 0D and ND array-likes), the Union type is simply not suited for what we're trying
to describe here (xref python/mypy#1693). This would be a different story
for a hypothetical UnsafeUnion type, one where operations need only be compatible with at least one
member of the union. Such a type does not exist, though.

from __future__ import annotations
from typing import Any, TYPE_CHECKING
import numpy as np

array: np.ndarray[Any, np.dtype[np.float64]]
out = np.isneginf(array)

if TYPE_CHECKING:
    # note: Revealed type is 'Union[numpy.bool_, numpy.ndarray[Any, numpy.dtype[numpy.bool_]]]'
    reveal_type(out)

for i in out:
    # error: Item "bool_" of "Union[bool_, ndarray[Any, dtype[bool_]]]" has no attribute "__iter__" (not iterable)
    print(i)

# Only after narrowing the union with isinstance does the loop type-check
if isinstance(out, np.ndarray):
    for i in out:
        print(i)
else:
    print(out)

The Solution

The solutions implemented herein fall into one of the following two categories:

  • Simply set the return type to Any. This is a simple but non-ideal solution, as it removes any type
    safety for objects returned by the aforementioned functions. Nevertheless, there is a group of functions
    that still needs to be updated for dtype support anyway (e.g. those in np.core.fromnumeric),
    so setting their return type to Any is by no means the worst thing that can happen.
    A second group of functions is those where the use of Any is simply a necessity, as the output type is,
    for example, determined by the value of string literals (see np.einsum). As a silver lining: the einsum
    problem in particular does seem like it can be resolved with relative ease via a future mypy plugin.
  • Add an additional overload for 0D array-likes. This is the more thorough and permanent fix
    for the unsafe-union issue; it has been applied to the more recently annotated modules such as
    np.lib.ufunclike.
    There is however one important caveat here: as we currently lack shape support (Typing support for shapes #16544) it is
    currently impossible to distinguish between 0D and ND ndarrays, and thus we are unable to describe
    the 0D-to-scalar casting that numpy aggressively performs on 0D arrays. While this should change
    once PEP 646 is live, in the meantime users will have to settle for a typing.cast call or a # type: ignore
    comment if it is known in advance that a 0D-to-scalar cast will be performed.
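
The second category can be sketched roughly as follows. This is a minimal illustration only: the wrapper name `isneginf_typed` and its two signatures are hypothetical and are not the stubs added by this PR. It assumes a NumPy version in which `np.ndarray` and `np.dtype` are generic in annotations.

```python
from __future__ import annotations

from typing import Any, overload

import numpy as np


# Hypothetical wrapper used for illustration; not the actual NumPy stub.
@overload
def isneginf_typed(x: float) -> np.bool_: ...
@overload
def isneginf_typed(x: np.ndarray[Any, Any]) -> np.ndarray[Any, np.dtype[np.bool_]]: ...
def isneginf_typed(x: Any) -> Any:
    # The runtime behaviour is delegated to NumPy unchanged;
    # only the annotations differ from a single Union return type.
    return np.isneginf(x)


scalar_out = isneginf_typed(float("-inf"))          # checker sees np.bool_
array_out = isneginf_typed(np.array([-np.inf, 0.0]))  # checker sees an ndarray

# No isinstance narrowing is needed before iterating over the array result.
for item in array_out:
    print(item)
```

Because each overload has a single concrete return type, a type checker can pick the matching signature from the argument type instead of forcing callers to narrow a union.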

Comment on lines +143 to +156
0D arrays
~~~~~~~~~

During runtime numpy aggressively casts any passed 0D arrays into their
corresponding `~numpy.generic` instance. Until the introduction of shape
typing (see :pep:`646`) it is unfortunately not possible to make the
necessary distinction between 0D and >0D arrays. While thus not strictly
correct, all operations that can potentially perform a 0D-array -> scalar
cast are currently annotated as exclusively returning an `ndarray`.

If it is known in advance that an operation _will_ perform a
0D-array -> scalar cast, then one can consider manually remedying the
situation with either `typing.cast` or a ``# type: ignore`` comment.
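
A minimal sketch of the `typing.cast` workaround described above, assuming `np.isneginf` is annotated as returning an `ndarray` for `ndarray` input as the note states. The variable names are illustrative only.

```python
from typing import cast

import numpy as np

zero_d = np.array(-np.inf)  # a 0D array
out = np.isneginf(zero_d)   # at runtime this is a scalar, not an ndarray

# typing.cast has no runtime effect; it only tells the type checker
# which type the caller knows the value to have.
flag = cast(np.bool_, out)
print(type(flag).__name__, bool(flag))
```

The cast is only safe when the caller really knows the input is 0D; for ND input the annotation would then be wrong in the other direction.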

Member Author

TLDR: The new return annotations will now just be somewhat inconvenient for 0D arrays, rather than 0D and ND arrays.

Member Author

Also a second question: are there any parts of the numpy documentation that describe 0D arrays and/or numpy's aggressive 0D-to-scalar casting? If so, then it might be useful to place a link here.

Member

The casts are scattered through the C code as PyArray_Return. The policy, such as it is, is documented as

.. c:function:: PyObject* PyArray_Return(PyArrayObject* arr)

    This function steals a reference to *arr*.

    This function checks to see if *arr* is a 0-dimensional array and,
    if so, returns the appropriate array scalar. It should be used
    whenever 0-dimensional arrays could be returned to Python.
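
From the Python side, the effect of this policy can be observed with a small experiment. This is a sketch under the assumption that `np.add` is one of the code paths that converts 0D results to scalars; whether it goes through `PyArray_Return` specifically is not verified here.

```python
import numpy as np

zero_d = np.array(1.0)   # 0D array input
one_d = np.array([1.0])  # 1D array input

scalar_out = np.add(zero_d, 1.0)  # 0D result is handed back as an array scalar
array_out = np.add(one_d, 1.0)    # 1D result stays an ndarray

print(type(scalar_out).__name__)
print(type(array_out).__name__)
```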

@BvB93
Member Author

BvB93 commented May 2, 2021

A question: could we exclude the numpy/typing/tests/data directory from the lint tests?
For those test data we're reliant on single-line comments that are frequently longer than the prescribed line-length limit.

Edit: Done as of 15420c8 and 6fa34d4.

Bas van Beek added 2 commits May 2, 2021 21:10
With the current tests system we cannot reasonably enforce E501 (maximum line length)
E704 (multiple statements on one line (def)) is a style rule not prescribed by PEP8. Furthermore, because it demands a function body it is needlessly inconvenient for static type checking, i.e. situations where there is no function body.
@BvB93
Member Author

BvB93 commented May 2, 2021

Any idea why pycodestyle is still checking numpy/typing/tests/data while it has just been added to exclude?

@charris
Member

charris commented May 4, 2021

Any idea why pycodestyle is still checking numpy/typing/tests/data

Maybe it is picking up lint_diff.ini before the PR? Or maybe it doesn't work :) We won't know if it is working for numpy/__config__.py until we try to change it.

@charris charris merged commit 4d753a0 into numpy:main May 4, 2021
@charris
Member

charris commented May 4, 2021

Thanks Bas. The long lines are no worse than before.

@BvB93
Member Author

BvB93 commented May 4, 2021

Maybe it is picking up lint_diff.ini before the PR? Or maybe it doesn't work :) We won't know if it is working for numpy/__config__.py until we try to change it.

Running pycodestyle locally with the config file does seem to work, so I suspect (and hope) it's the former.

@BvB93 BvB93 deleted the unsafe branch May 4, 2021 17:52
@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label May 5, 2021
Successfully merging this pull request may close these issues.

np.clip typing is not as specific as ideal; existing code doesn't type-check
2 participants