TYP: Many typing constructs are invariant #46535

Closed
wants to merge 2 commits into from

twoertwein
Member

@twoertwein twoertwein commented Mar 27, 2022

Indexes and columns can be of any hashable type: we often have functions accepting a list/dict/callables of hashable types.

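Unfortunately, List and Dict are invariant in their element and key types, Mapping is invariant in its key type, and Callable is contravariant in its argument types, so none of them accepts a subtype in that position. This leads to unintuitive behavior: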

from typing import Hashable

def test(x: list[Hashable]) -> None:
    ...

x1: list[str] = ["a"]  # str is Hashable
test(x1)  # but this still errors

x2: list[Hashable] = ["a"]
test(x2)  # works

One (honestly not well-promoted) use case of TypeVars is to allow subclasses in these cases:

from typing import Hashable, TypeVar

HashableT = TypeVar("HashableT", bound=Hashable)

def test(x: list[HashableT]) -> None:
    ...

x1: list[str] = ["a"]  # str is Hashable
test(x1)  # works :)

x2: list[Hashable] = ["a"]
test(x2)  # still works

If we had type tests, we would have noticed this earlier - Thanks to @Dr-Irv for finding this!

xref #46428 (comment)

It might be worth preventing Hashable being the first argument to the above typing containers by enforcing that in pandas-dev-flaker

Contributor

@Dr-Irv Dr-Irv left a comment

I don't think the approach with TypeVar will work. Using this PR, with this simple test program, you will get errors from mypy on the second call to read_csv:

from io import StringIO
import pandas as pd

dt = { "a": float, "b": "int"}
sio = "a,b\n3.4, 2\n"

okdf = pd.read_csv(StringIO(sio), dtype={ "a": float, "b": "int"})

df = pd.read_csv(StringIO(sio), dtype=dt)
print(df)

The issue seems to be how mypy is interpreting the type of dt and that inferred type is not compatible with the Dict that uses HashableT as its key.

This is what I'm seeing with rename() as well.

@twoertwein
Member Author

The issue seems to be how mypy is interpreting the type of dt and that inferred type is not compatible with the Dict that uses HashableT as its key.

In this case it seems to be an issue with the values, not the keys:

error: Argument "dtype" to "read_csv" has incompatible type "Dict[str, object]"; expected "Optional[Union[Union[ExtensionDtype, Union[str, dtype[Any], Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object]]], Dict[Any, Union[ExtensionDtype, Union[str, dtype[Any], Type[str], Type[float], Type[int], Type[complex], Type[bool], Type[object]]]]]]" [arg-type]

NpDtype = Union[str, np.dtype, type_t[Union[str, float, int, complex, bool, object]]]
Dtype = Union["ExtensionDtype", NpDtype]
DtypeArg = Union[Dtype, Dict[HashableT, Dtype]]
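The error above reports the argument type as Dict[str, object], which suggests the value type is lost when the dict literal is assigned to an unannotated variable. A minimal sketch of that effect and of one possible workaround (an explicit annotation) follows. `DtypeLike` and `describe` are hypothetical stand-ins, not pandas APIs, and the comments about mypy reflect the error message quoted above rather than a fresh mypy run:

```python
from typing import Dict, Type, Union

# Hypothetical, much smaller stand-in for the value part of pandas' Dtype alias.
DtypeLike = Union[str, Type[float], Type[int]]


def describe(dtype: Dict[str, DtypeLike]) -> Dict[str, str]:
    """Return a readable name for each column's dtype."""
    return {
        col: dt if isinstance(dt, str) else dt.__name__
        for col, dt in dtype.items()
    }


# Unannotated: the values `float` and "int" have no common type narrower than
# `object`, so the variable is inferred as Dict[str, object].
inferred = {"a": float, "b": "int"}
# describe(inferred)  # rejected by the type checker, although it runs fine

# Annotated: the declared value type is kept, so the call type-checks.
annotated: Dict[str, DtypeLike] = {"a": float, "b": "int"}
result = describe(annotated)
```

Passing the literal directly, as in the first read_csv call, avoids the problem because the checker can use the parameter type as context for the literal.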

But there will be an issue when the same HashableT is used multiple times within one function:

from typing import Hashable, TypeVar, List, Dict

HashableT = TypeVar("HashableT", bound=Hashable)

def test(a: Dict[HashableT, str], b: List[HashableT]) -> None:
    ...

a: Dict[str, str] = {}
b: List[int] = []

test(a, b)  # error

If TypeVar is a way to get covariant behavior with these typing containers, then we might need many TypeVars: HashableT1, HashableT2, ... Honestly, I would be inclined to wait until we have type tests, so that we know empirically how many we need.
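As a rough sketch of the "many TypeVars" idea, assuming the names HashableT1 and HashableT2 from the paragraph above and a hypothetical function `count_items`: giving each parameter its own TypeVar lets the checker solve them independently, so a str-keyed dict and an int list no longer conflict.

```python
from typing import Dict, Hashable, List, TypeVar

# One TypeVar per parameter, so each can be bound to a different type.
HashableT1 = TypeVar("HashableT1", bound=Hashable)
HashableT2 = TypeVar("HashableT2", bound=Hashable)


def count_items(a: Dict[HashableT1, str], b: List[HashableT2]) -> int:
    """Hypothetical helper: total number of entries in both containers."""
    return len(a) + len(b)


a: Dict[str, str] = {"x": "y"}
b: List[int] = [1, 2]

# HashableT1 is solved as str and HashableT2 as int, independently.
total = count_items(a, b)
```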

@twoertwein twoertwein marked this pull request as draft March 28, 2022 13:08
@Dr-Irv
Contributor

Dr-Irv commented Mar 28, 2022

If TypeVar is a way to get covariant behavior with these typing containers, then we might need many TypeVars: HashableT1, HashableT2, ... Honestly, I would be inclined to wait until we have type tests, so that we know empirically how many we need.

Based on the discussions that followed my raising this issue with mypy, I don't think that TypeVar is the way to get the covariant behavior. I think we need to use Sequence[sometype] instead of List[sometype], where sometype is a Union or a parent of multiple types (e.g., Hashable), as Sequence is covariant. List[str] is OK, but replace List[str|int] with Sequence[str|int] and replace List[Hashable] with Sequence[Hashable]. For Mapping and Dict, we can't use Hashable or HashableT as a key and have to switch to Any. I think Hashable is OK as the value of a Mapping and Dict.
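A minimal sketch of the Sequence suggestion, using a hypothetical function `count_labels`. Because Sequence is covariant in its element type, a List[str] is accepted where Sequence[Hashable] is expected, with no TypeVar needed:

```python
from typing import Hashable, List, Sequence


def count_labels(labels: Sequence[Hashable]) -> int:
    """Hypothetical helper: accept any read-only sequence of hashable labels."""
    return len(labels)


names: List[str] = ["a", "b"]

# List[str] is compatible with Sequence[Hashable] because Sequence is covariant.
n_list = count_labels(names)

# Tuples are sequences too, so they are accepted as well.
n_tuple = count_labels(("a", 1))
```

One caveat of this approach: str is itself a Sequence[str], so a bare string such as `count_labels("ab")` would also pass the type checker.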

@twoertwein
Member Author

twoertwein commented Mar 29, 2022

I think we need to use Sequence[sometype] instead of List[sometype]

I agree, this is definitely the better option if the implementation actually works with any Sequence (tuples are often handled very differently than lists in pandas).

For Mapping and Dict, we can't use Hashable or HashableT as a key - have to switch to Any.

I have not yet understood why we cannot use a TypeVar for keys. As far as I understand, it is the only way to achieve covariant keys (assuming that the TypeVar is used once in the definition). Introducing many Anys in the code base reduces the benefit of type checking the pandas code itself.

I didn't test this: can Protocols be used as dict keys? That might be another way to deal with this.

edit: Hashable is a protocol :( https://github.com/python/typeshed/blob/e8fe316a74b49fae43efe36859c294888160d399/stdlib/typing.pyi#L666

@Dr-Irv
Contributor

Dr-Irv commented Mar 29, 2022

I have not yet understood why we cannot use a TypeVar for keys. As far as I understand, it is the only way to achieve covariant keys (assuming that the TypeVar is used once in the definition). Introducing many Anys in the code base reduces the benefit of type checking the pandas code itself.

I don't fully understand it either, but here's another way to look at it.

A dictionary or mapping key must be hashable. For example, you cannot create a dict that has keys that are lists. So if we think of the universe as consisting of two disjoint sets of types, Hashable and NonHashable, which together correspond to Any, then it is impossible to create a dict object that has keys of type NonHashable. So if a method expects an argument that is a dict with keys that are Hashable, and we declare that type as dict[Any, Hashable], then the typing is doing what it is supposed to do (allow a dict with any valid type for the keys) because it is impossible to have a non-hashable key.
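The argument above can be sketched as follows, with a hypothetical function `key_count`. The value type is plain str here rather than Hashable, to keep the example about keys only (Dict is also invariant in its value type):

```python
from typing import Any, Dict


def key_count(mapping: Dict[Any, str]) -> int:
    """Hypothetical helper: accept a dict with any key type."""
    return len(mapping)


by_name: Dict[str, str] = {"a": "x"}
by_position: Dict[int, str] = {0: "x", 1: "y"}

# Any as the key type is compatible with both str keys and int keys.
counts = (key_count(by_name), key_count(by_position))

# A dict with an unhashable key cannot be constructed in the first place,
# so Any does not admit anything that Hashable would have excluded at runtime.
try:
    bad = {["a"]: "x"}
    raised = False
except TypeError:
    raised = True
```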

Where things get harder is where you want to restrict the keys to be a union of 2 types, e.g., dict[int | str, str]. I think a TypeVar might work there (more testing would be required), but Hashable is so special that we're better off not using it as the type of a key in a dict or Mapping.

@twoertwein
Member Author

A dictionary or mapping key must be hashable.

That works for dict, but Mapping doesn't seem to have this requirement based on typeshed.
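To illustrate the point about Mapping, here is a small sketch of a Mapping implementation whose keys are lists. `PairMapping` is a hypothetical class written for this example; it looks keys up by equality instead of hashing, which the Mapping interface allows:

```python
from typing import Iterable, Iterator, List, Mapping, Tuple


class PairMapping(Mapping[List[str], int]):
    """Hypothetical Mapping backed by a list of (key, value) pairs."""

    def __init__(self, pairs: Iterable[Tuple[List[str], int]]) -> None:
        self._pairs = list(pairs)

    def __getitem__(self, key: List[str]) -> int:
        # Linear search by equality, so keys never need to be hashed.
        for candidate, value in self._pairs:
            if candidate == key:
                return value
        raise KeyError(key)

    def __iter__(self) -> Iterator[List[str]]:
        return (key for key, _ in self._pairs)

    def __len__(self) -> int:
        return len(self._pairs)


m = PairMapping([(["a", "b"], 1)])
value = m[["a", "b"]]
has_key = ["a", "b"] in m
```

So a parameter typed as Mapping with Any keys could, in principle, receive keys that are not hashable, unlike a parameter typed as dict.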

I'll close this PR. I think we need to address this on a case-by-case basis and test what works and what doesn't.

@twoertwein twoertwein closed this Mar 29, 2022
@twoertwein twoertwein deleted the HashableT branch April 1, 2022 01:36