TYP: np.argmin and np.argmax overload changes #28906
base: main
Conversation
Could you explain why you chose to remove some of the overloads, and why you modified the parameter types and return types of the other ones?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The NumPy type-tests are located at numpy/typing/tests/data/reveal (acceptance tests / true negatives) and numpy/typing/tests/data/fail (rejection tests / true positives).
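As a point of reference, the reveal-style tests assert the statically inferred type of an expression. A hedged sketch of what such a test can look like (not an exact excerpt from numpy/typing/tests/data/reveal):

```python
# Hedged sketch of a reveal-style typing test. typing.assert_type is a
# no-op at runtime; it only matters to static checkers like mypy/pyright.
try:
    from typing import assert_type  # Python 3.11+
except ImportError:
    def assert_type(val, typ):  # runtime no-op fallback for older Pythons
        return val

import numpy as np
import numpy.typing as npt

a: npt.NDArray[np.float64] = np.array([3.0, 1.0, 2.0])
assert_type(np.argmin(a), np.intp)  # the stubs declare np.intp for this call
```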
OK, I will add tests in the right location.
I am going to reinsert the ones I removed; I removed them initially out of ignorance. But I wanted to remove this one:
@overload
def argmin(
a: ArrayLike,
axis: SupportsIndex | None = ...,
out: None = ...,
*,
keepdims: bool = ...,
) -> Any: ...
Wouldn't a return type of Any potentially allow a float64? And since we only want to return integers, shouldn't this overload be removed?
Thanks for the review, I'll keep working on this.
No, that one should stay. It handles the case where it cannot be determined statically whether the call returns a scalar or an array.
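A minimal sketch of why that overload is needed: a plain Python list is ArrayLike, but its dimensionality is invisible to the type checker, so the scalar-vs-array result is undecidable at type-check time.

```python
# Why the `Any` fallback overload is needed for `a: ArrayLike` with an axis:
# the checker cannot see the input's dimensionality, which decides the result.
import numpy as np

r1 = np.argmin([[1, 2], [3, 4]], axis=0)  # 2-D input: returns an ndarray
r2 = np.argmin([1, 2, 3], axis=0)         # 1-D input: returns a scalar (np.intp)

print(type(r1), type(r2))  # different runtime types for the same overload
```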
_IndexArray = NDArray[np.signedinteger] | NDArray[np.unsignedinteger] | NDArray[np.bool_]  # type alias for argmin / argmax
_OutT = TypeVar("_OutT", bound=_IndexArray)  # type variable, must be assignable to _IndexArray
Since _IndexArray is only used once, it would be a bit cleaner to inline it. It's also fine if you want to keep it, but in that case it should be annotated as TypeAlias.
Also, _IndexArray is currently too restrictive, because it would reject valid types like NDArray[bool_ | int_], NDArray[int8 | uint8], and NDArray[integer].
Additionally, it could help to have the name of the type parameter reflect the restriction on its upper bound. The name shows up in type-checker errors, e.g. when someone passes NDArray[np.float64], and if the user then sees _OutT, it won't be very informative.
So given that, I'd personally probably go for something like:
_BoolOrIntArrayT = TypeVar("_BoolOrIntArrayT", bound=NDArray[np.integer | np.bool])
But there are probably many other good options too 🤷🏻
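For context on what such a bound describes at runtime: argmin/argmax write index values into out, so it must be an index-typed array. A minimal sketch (the variable names here are illustrative, not from the PR):

```python
# argmin with an `out` buffer: the result is written into and returned as `out`.
import numpy as np

a = np.array([[3, 1], [2, 4]])
out = np.empty(2, dtype=np.intp)     # index-typed out buffer
res = np.argmin(a, axis=0, out=out)

assert res is out                    # argmin returns the `out` array itself
print(out)                           # indices of the column minima
```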
Diff from mypy_primer, showing the effect of this PR on type check results on a corpus of open source code:

xarray (https://github.com/pydata/xarray)
+ xarray/core/resample_cftime.py: note: In member "first_items" of class "CFTimeGrouper":
+ xarray/core/resample_cftime.py:155: error: Need type annotation for "first_items" [var-annotated]
dedupe (https://github.com/dedupeio/dedupe)
+ dedupe/clustering.py:89: error: No overload variant of "max" matches argument types "ndarray[tuple[int, ...], dtype[signedinteger[_64Bit]]]", "int" [call-overload]
+ dedupe/clustering.py:89: note: Possible overload variants:
+ dedupe/clustering.py:89: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] max(SupportsRichComparisonT, SupportsRichComparisonT, /, *_args: SupportsRichComparisonT, key: None = ...) -> SupportsRichComparisonT
+ dedupe/clustering.py:89: note: def [_T] max(_T, _T, /, *_args: _T, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ dedupe/clustering.py:89: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] max(Iterable[SupportsRichComparisonT], /, *, key: None = ...) -> SupportsRichComparisonT
+ dedupe/clustering.py:89: note: def [_T] max(Iterable[_T], /, *, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ dedupe/clustering.py:89: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any], _T] max(Iterable[SupportsRichComparisonT], /, *, key: None = ..., default: _T) -> SupportsRichComparisonT | _T
+ dedupe/clustering.py:89: note: def [_T1, _T2] max(Iterable[_T1], /, *, key: Callable[[_T1], SupportsDunderLT[Any] | SupportsDunderGT[Any]], default: _T2) -> _T1 | _T2
optuna (https://github.com/optuna/optuna)
+ optuna/samplers/_nsgaiii/_elite_population_selection_strategy.py:213: error: Incompatible types in assignment (expression has type "integer[Any]", variable has type "ndarray[Any, Any]") [assignment]
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: error: No overload variant of "min" matches argument types "int", "ndarray[tuple[int, ...], dtype[signedinteger[_32Bit | _64Bit]]]" [call-overload]
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: note: Possible overload variants:
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(SupportsRichComparisonT, SupportsRichComparisonT, /, *_args: SupportsRichComparisonT, key: None = ...) -> SupportsRichComparisonT
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: note: def [_T] min(_T, _T, /, *_args: _T, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(Iterable[SupportsRichComparisonT], /, *, key: None = ...) -> SupportsRichComparisonT
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: note: def [_T] min(Iterable[_T], /, *, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any], _T] min(Iterable[SupportsRichComparisonT], /, *, key: None = ..., default: _T) -> SupportsRichComparisonT | _T
+ optuna/importance/_ped_anova/scott_parzen_estimator.py:71: note: def [_T1, _T2] min(Iterable[_T1], /, *, key: Callable[[_T1], SupportsDunderLT[Any] | SupportsDunderGT[Any]], default: _T2) -> _T1 | _T2
static-frame (https://github.com/static-frame/static-frame)
+ static_frame/core/index.py:1371: error: Unused "type: ignore" comment [unused-ignore]
spark (https://github.com/apache/spark)
+ python/pyspark/ml/linalg/__init__.py:844: error: Incompatible return value type (got "ndarray[tuple[int, ...], dtype[float64]]", expected "float64") [return-value]
+ python/pyspark/mllib/linalg/__init__.py:964: error: Incompatible return value type (got "ndarray[tuple[int, ...], dtype[float64]]", expected "float64") [return-value]
Attempts to close #28641