Description
It is proposed to add special methods __subclass_base__ and __class_getitem__ to CPython. These will allow generics to be non-classes, which simplifies them and significantly improves their performance.
@gvanrossum, the main question now is should this be a PEP?
Motivation:
There are three main points of motivation: the performance of the typing module, metaclass conflicts, and the large number of hacks currently used in typing.
Performance:
The typing module is one of the heaviest and slowest modules in the stdlib, even with all the optimizations made. This is mainly because subscripted generics are classes. See also #432. Performance will improve in three main ways:
- Creation of generic classes is slow because GenericMeta.__new__ is very slow; we will not need it anymore.
- Very long MROs for generic classes will be half as long. They are present because we duplicate the collections.abc inheritance chain in typing.
- Instantiation time of generic classes will improve (this is minor, however).
Metaclass conflicts:
All generic types are instances of GenericMeta, so if a user uses a custom metaclass, it is hard to make the corresponding class generic. This is particularly hard for library classes that a user doesn't control. A workaround is to always mix in GenericMeta:
    class AdHocMeta(GenericMeta, LibraryMeta):
        pass

    class UserClass(LibraryBase, Generic[T], metaclass=AdHocMeta):
        ...

but this is not always practical or even possible.
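To make the conflict concrete, here is a minimal sketch. GenericMetaStandIn and GenericStandIn are hypothetical stand-ins for GenericMeta and Generic (so the snippet does not depend on a particular typing version), and LibraryMeta/LibraryBase play the role of a third-party library:

```python
# Hypothetical stand-ins; none of these are the real typing classes.
class GenericMetaStandIn(type):
    pass


class LibraryMeta(type):
    pass


class GenericStandIn(metaclass=GenericMetaStandIn):
    pass


class LibraryBase(metaclass=LibraryMeta):
    pass


# Without a common metaclass, the class statement itself fails.
try:
    class Broken(LibraryBase, GenericStandIn):
        pass
except TypeError as exc:
    conflict_message = str(exc)
else:
    conflict_message = None


# The workaround: an ad hoc metaclass deriving from both metaclasses.
class AdHocMeta(GenericMetaStandIn, LibraryMeta):
    pass


class UserClass(LibraryBase, GenericStandIn, metaclass=AdHocMeta):
    pass
```

The workaround only works when both metaclasses can be combined cooperatively, which is exactly what a user often cannot guarantee for a library metaclass.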
Hacks and bugs that will be removed by this proposal:
- _generic_new hack that exists since __init__ is not called on instances with a type differing from the type whose __new__ was called, C[int]().__class__ is C.
- _next_in_mro speed hack will not be necessary since subscription will not create new classes.
- Ugly sys._getframe hack. This one is particularly nasty, since it looks like we can't remove it without changes outside typing.
- Currently generics do "dangerous" things with private ABC caches to fix large memory consumption that grows at least as O(N**2), see Optimize ABC caches #383. This point is also important because I would like to re-implement ABCMeta in C. This will reduce Python start-up time and also start-up times for many programs that extensively use ABCs. (My implementation passes all tests except test_typing, because I want to make _abc_cache etc. read-only, so that one can't do something like MyABC._abc_cache = "Surprise when updating caches!")
- Problems with sharing attributes between subscripted generics, see Subscripted generic classes should not have independent class variables #392. The current solution already uses __getattr__ and __setattr__, but it is still incomplete, and solving this without the current proposal will be hard and will need __getattribute__.
- _no_slots_copy hack, where we clean up the class dictionary on every subscription, thus allowing generics with __slots__.
- General complexity of the typing module. The new proposal will not only allow removing the above-mentioned hacks/bugs, but also simplify the implementation, so that it will be easier to maintain.
Details of the proposal:
New methods API:
- The idea of __class_getitem__ is very simple: it is an exact analog of __getitem__, except that it is called on the class that defines it, not on its instances. This allows us to avoid GenericMeta.__getitem__.
- If an object that is not a class object appears in the bases of a class definition, __subclass_base__ is searched on it. If found, it is called with the original tuple of bases as an argument. If the result of the call is not None, then it is substituted for this object. Otherwise, the base is just removed. This is necessary to avoid inconsistent MRO errors, which are currently prevented by manipulations in GenericMeta.__new__. After creating the class, the original bases are saved in __orig_bases__ (currently this is also done by the metaclass).
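The two hooks can be sketched roughly as follows. The first class assumes an interpreter that already dispatches subscription of a class to __class_getitem__ (CPython 3.7 and later do). The second part is a pure-Python emulation of the __subclass_base__ lookup described above, not the actual CPython change; resolve_bases, make_class, Alias and Registry are hypothetical names used only for illustration, and leaving a hook-less non-class base untouched is my assumption.

```python
class Registry:
    def __class_getitem__(cls, key):
        # Called for Registry[key], i.e. subscription of the class itself.
        return (cls, key)

    def __getitem__(self, key):
        # Called for Registry()[key], i.e. subscription of an instance.
        return ("instance", key)


def resolve_bases(bases):
    """Emulate the proposed __subclass_base__ substitution step."""
    resolved = []
    for base in bases:
        if isinstance(base, type):
            resolved.append(base)  # ordinary classes are kept as-is
            continue
        hook = getattr(base, "__subclass_base__", None)
        if hook is None:
            resolved.append(base)  # assumption: leave it for type() to reject
            continue
        replacement = hook(bases)  # the hook receives the original tuple
        if replacement is not None:
            resolved.append(replacement)
        # a None result means the base is simply dropped
    return tuple(resolved)


def make_class(name, bases, namespace=None):
    """Create a class from possibly non-class bases, saving the originals."""
    cls = type(name, resolve_bases(bases), dict(namespace or {}))
    cls.__orig_bases__ = bases
    return cls


class Alias:
    """Hypothetical non-class object standing in for something like C[int]."""

    def __init__(self, origin):
        self.origin = origin

    def __subclass_base__(self, bases):
        # Drop ourselves if the origin class is already listed explicitly.
        return None if self.origin in bases else self.origin


class Base:
    pass


alias = Alias(Base)
Child = make_class("Child", (alias,))
Other = make_class("Other", (Base, alias))
```

Here Child ends up with Base as its only real base while __orig_bases__ still records the alias object, and Other shows the "base is just removed" case.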
Changes necessary in typing module:
The key point is that instead of the GenericMeta metaclass, we will have a GenericAlias class.
Generic will have:
- a __class_getitem__ that will return instances of GenericAlias which keep track of the original class and type arguments.
- __init_subclass__ that will properly initialize the subclasses, and perform necessary bookkeeping.
GenericAlias will have:
- a normal __getitem__ so that it can be further subscripted, thus preserving the current API.
- __call__, __getattr__, and __setattr__ that will simply pass everything to the original class object.
- __subclass_base__ that will return the original class (or None in some special cases).
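A rough sketch of how these pieces could fit together is below. It is only an illustration of the outline above, not the proposed implementation: SketchGeneric and Box are hypothetical names, type-variable substitution and __init_subclass__ bookkeeping are omitted, re-subscription simply replaces the arguments, and it assumes an interpreter that supports __class_getitem__ (CPython 3.7+).

```python
class GenericAlias:
    """Non-class object remembering the original class and its type arguments."""

    def __init__(self, origin, args):
        # Use object.__setattr__ because our own __setattr__ forwards to origin.
        object.__setattr__(self, "_origin", origin)
        object.__setattr__(self, "_args", args)

    def __getitem__(self, params):
        # Further subscription yields a new alias (simplified: no substitution).
        if not isinstance(params, tuple):
            params = (params,)
        return GenericAlias(self._origin, params)

    def __call__(self, *args, **kwargs):
        # Instantiation goes straight to the original class.
        return self._origin(*args, **kwargs)

    def __getattr__(self, name):
        # Only reached when normal lookup on the alias fails.
        return getattr(self._origin, name)

    def __setattr__(self, name, value):
        # Class attributes are shared because they live on the original class.
        setattr(self._origin, name, value)

    def __subclass_base__(self, bases):
        # Proposed hook: use the original class as the real base.
        return self._origin


class SketchGeneric:
    """Hypothetical stand-in for Generic."""

    def __class_getitem__(cls, params):
        if not isinstance(params, tuple):
            params = (params,)
        return GenericAlias(cls, params)


class Box(SketchGeneric):
    count = 0

    def __init__(self, item):
        self.item = item


alias = Box[int]
box = alias(3)      # same as Box(3); no new class was created
alias.count = 5     # forwarded to Box, so every alias of Box sees it
```

Because no class is created on subscription, type(box) is Box directly, and class attributes are shared between Box[int] and Box[str] without any __getattribute__ tricks.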
The generic versions of collections.abc classes will be simple subclasses like this:
    class Sequence(collections.abc.Sequence, Generic[T_co]):
        pass

(typeshed of course will track that Sequence[T_co] inherits from Iterable[T_co] etc.)
Transition plan:
- Merge the changes into CPython (ideally before the end of September).
- Branch a separate version of typing for Python 3.7 and simplify it by removing backward compatibility hacks.
- Update the 3.7 version to use the dedicated CPython API (this might be done in a few separate PRs).
Backwards compatibility and impact on users who don't use typing:
This proposal will allow practically 100% backwards compatibility with the current public typing API. Actually, the whole idea of introducing two special methods came from the desire to preserve backwards compatibility while solving the problems listed above.
The only two exceptions that I see now are the following. Currently issubclass(List[int], List) returns True; with this proposal it will raise TypeError. Also, issubclass(collections.abc.Iterable, typing.Iterable) will return False, which I think is actually good, since currently we have a (virtual) inheritance cycle between them.
With my implementation, see https://github.com/ilevkivskyi/cpython/pull/2/files, I measured negligible effects (under 1%) for regular (non-generic) classes.