TYP: Many typing constructs are invariant #46535
I don't think the approach with TypeVar will work. Using this PR, with this simple test program, you will get errors from mypy on the second call to read_csv:
from io import StringIO
import pandas as pd
dt = { "a": float, "b": "int"}
sio = "a,b\n3.4, 2\n"
okdf = pd.read_csv(StringIO(sio), dtype={ "a": float, "b": "int"})
df = pd.read_csv(StringIO(sio), dtype=dt)
print(df)
The issue seems to be how mypy infers the type of dt: that inferred type is not compatible with the Dict that uses HashableT as its key.
This is what I'm seeing with rename() as well.
In this case it seems to be an issue with the values, not the keys:
NpDtype = Union[str, np.dtype, type_t[Union[str, float, int, complex, bool, object]]]
Dtype = Union["ExtensionDtype", NpDtype]
DtypeArg = Union[Dtype, Dict[HashableT, Dtype]]
But there will be an issue when multiple HashableT are used within the same function:
from typing import Hashable, TypeVar, List, Dict
HashableT = TypeVar("HashableT", bound=Hashable)
def test(a: Dict[HashableT, str], b: List[HashableT]) -> None:
    ...
a: Dict[str, str] = {}
b: List[int] = []
test(a, b)  # error
If TypeVar is a way to get covariant behavior with these typing containers, then we might need many TypeVars:
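A sketch of what using separate TypeVars could look like. The names HashableT1, HashableT2 and count_entries are hypothetical, chosen only for illustration. With two independent TypeVars, the key type of the dict and the element type of the list no longer have to match, so a type checker such as mypy should accept the call:

```python
from typing import Dict, Hashable, List, TypeVar

# Hypothetical names: one TypeVar per independent "hashable" parameter.
HashableT1 = TypeVar("HashableT1", bound=Hashable)
HashableT2 = TypeVar("HashableT2", bound=Hashable)


def count_entries(a: Dict[HashableT1, str], b: List[HashableT2]) -> int:
    """Hypothetical helper: return the combined number of entries."""
    return len(a) + len(b)


a: Dict[str, str] = {"x": "y"}
b: List[int] = [1, 2]

# HashableT1 can be solved as str and HashableT2 as int independently.
total = count_entries(a, b)
```

The cost is that every function with several hashable parameters needs its own set of TypeVars.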
Based on the discussions that happened as a result of me raising the issue with
I agree, this is definitely the better option if the implementation actually works with any Sequence (tuples are often handled very differently than lists in pandas).
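As a sketch of why Sequence helps here: unlike List, Sequence is covariant in its element type, so a List[str] is accepted where a Sequence[Hashable] is expected. The function name count_labels is hypothetical and not part of pandas.

```python
from typing import Hashable, List, Sequence, Tuple


def count_labels(labels: Sequence[Hashable]) -> int:
    """Hypothetical helper that only reads from its argument."""
    return len(labels)


as_list: List[str] = ["a", "b"]
as_tuple: Tuple[int, ...] = (1, 2, 3)

# Both calls should type-check: Sequence is read-only and therefore covariant.
n_list = count_labels(as_list)
n_tuple = count_labels(as_tuple)
```

Note that the annotation then also admits tuples, so it is only appropriate when the implementation really treats tuples and lists alike.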
I have not yet understood why we cannot use a TypeVar for keys. As far as I understand, it is the only way to achieve covariant keys (assuming that the TypeVar is used once in the definition). Introducing many Anys in the code base reduces the benefit of type checking the pandas code itself. I didn't test this: can Protocols be used as dict keys? That might be another way to deal with this.
edit: Hashable is a protocol :( https://github.com/python/typeshed/blob/e8fe316a74b49fae43efe36859c294888160d399/stdlib/typing.pyi#L666
I don't fully understand it either, but here's another way to look at it. A dictionary or mapping key must be hashable. For example, you cannot create a
Where things get harder is where you want to restrict the keys to be a union of 2 types, e.g.,
That works for dict, but Mapping doesn't seem to have this requirement based on typeshed. I'll close this PR. I think we need to address this on a case-by-case basis and test what works and what doesn't.
Indexes and columns can be of any hashable type: we often have functions accepting a list/dict/callables of hashable types.
Unfortunately, List/Dict/Mapping/Callable are all invariant in their (first) argument, which leads to unintuitive behavior:
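A minimal sketch of the problem, assuming a hypothetical helper select_columns (not a pandas function). The call runs fine at runtime, but a type checker such as mypy is expected to reject it, because List[str] is not a List[Hashable]:

```python
from typing import Hashable, List


def select_columns(columns: List[Hashable]) -> int:
    """Hypothetical helper: pretend to select columns, return how many."""
    return len(columns)


names: List[str] = ["a", "b"]

# Works at runtime, but mypy is expected to flag this call:
# List is invariant, so List[str] is not accepted for List[Hashable].
n_selected = select_columns(names)
```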
One (honestly not well-promoted) use case of TypeVars is to allow subclasses in these cases:
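A sketch of the TypeVar variant, assuming a hypothetical helper select_columns (not a pandas function). The TypeVar is bound to Hashable, so the checker can solve it as str or int for each call instead of requiring exactly List[Hashable]:

```python
from typing import Hashable, List, TypeVar

HashableT = TypeVar("HashableT", bound=Hashable)


def select_columns(columns: List[HashableT]) -> int:
    """Hypothetical helper: pretend to select columns, return how many."""
    return len(columns)


names: List[str] = ["a", "b"]
ids: List[int] = [1, 2, 3]

# Both calls should type-check: HashableT is solved per call (str, then int).
n_names = select_columns(names)
n_ids = select_columns(ids)
```

Some type checkers warn when a TypeVar appears only once in a signature, so whether this pattern passes cleanly depends on the checker in use.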
If we had type tests, we would have noticed this earlier - Thanks to @Dr-Irv for finding this!
xref #46428 (comment)
It might be worth preventing Hashable from being used as the first argument to the above typing containers by enforcing that in pandas-dev-flaker.