Description
Feature or enhancement
Re-use the runtime-global singleton strings inside the interned_strings dict to reduce duplication of
singleton strings (improving performance where such strings are used).
Pitch
bpo-46430 (#30683) caused an interesting side effect: the expression
    x = 'a'; x[0] is x
no longer evaluates to True. This is because there are two different cached versions of 'a':
- One that was cached when code in frozen modules was compiled (and is stored in the interned_dict)
- One that is stored as a runtime-global object that is used during function calls (and is stored in _Py_SINGLETON(strings))
However, some characters do not have this behaviour (for example,
'g', 'u', and 'z'). I suspect this is because these characters are not
used in the co_consts of frozen modules.
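To see which characters are affected on a given build, a small probe along the following lines may help. This is only a sketch: the helper name probe_single_char_identity is hypothetical, and it assumes that eval(repr(ch)) yields the same object a literal in source code would. The result depends on the CPython version, so no expected output is shown.

```python
import string


def probe_single_char_identity(chars=string.ascii_lowercase):
    """Map each character to whether x[0] is x holds for its compiled literal.

    Hypothetical helper, not part of CPython.
    """
    results = {}
    for ch in chars:
        # Compile a fresh constant, roughly as a literal in source code would be
        # (assumption: this goes through the same interning path).
        literal = eval(repr(ch))
        # Indexing a one-character string is expected to return the
        # runtime-global singleton for that character.
        results[ch] = literal[0] is literal
    return results
```

Calling probe_single_char_identity() and filtering for False values lists the characters whose interned copy differs from the runtime-global singleton.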
The interned_dict is per interpreter, and is initialized by
init_interned_dict(PyInterpreterState *). Currently, it is
initialized to an empty dict, which allows code in frozen modules
to use their (different and per interpreter) singleton strings
instead of the runtime-global one.
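The effect of starting from an empty dict versus a pre-populated one can be illustrated with a toy model in Python. This is not CPython code: FakeStr, SINGLETONS, intern_string and build_interned are all hypothetical stand-ins, and the model only assumes that interning keeps whichever object is registered first for a given value.

```python
class FakeStr:
    """Stand-in for a string object; identity matters, not just value."""

    def __init__(self, value):
        self.value = value


# Toy version of the runtime-global singletons: one object per character.
SINGLETONS = {ch: FakeStr(ch) for ch in "abc"}


def intern_string(interned, obj):
    """Return the canonical object for obj.value, registering obj if absent."""
    return interned.setdefault(obj.value, obj)


def build_interned(seed_with_singletons):
    """Create a per-interpreter interned dict, optionally seeded with singletons."""
    return dict(SINGLETONS) if seed_with_singletons else {}


def frozen_constant_is_singleton(seed_with_singletons):
    """Check whether interning a frozen module's 'a' yields the global singleton."""
    interned = build_interned(seed_with_singletons)
    frozen_a = FakeStr("a")  # separate object created for a frozen module constant
    canonical = intern_string(interned, frozen_a)
    return canonical is SINGLETONS["a"]
```

In this model, frozen_constant_is_singleton(False) is False because the frozen module's own object is registered first, while frozen_constant_is_singleton(True) is True because the lookup finds the singleton already in the dict.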
Using the synthetic test case:

    import timeit

    def test():
        total = 0
        for ch in 'abc':
            # Set membership hashes and compares each single-character string
            if ch in {'a', 'c'}:
                total += 1
        return total

    timeit.timeit('test()', globals=globals())

I get a ~5.43% improvement when the interned_strings dict reuses the runtime-global singleton strings.