PEP: 550
Title: Execution Context
Version:
This PEP proposes a new mechanism to manage execution state -- the logical environment in which a function, a thread, a generator, or a coroutine executes.
A few examples of where having a reliable state storage is required:

- Context managers like decimal contexts, numpy.errstate, and warnings.catch_warnings;
- Storing request-related data such as security tokens and request data in web applications, implementing i18n;
- Profiling, tracing, and logging in complex and large code bases.
The usual solution for storing state is to use a Thread-local Storage
(TLS), implemented in the standard library as threading.local().
Unfortunately, TLS does not work for the purpose of state isolation
for generators or asynchronous code, because such code executes
concurrently in a single thread.
Traditionally, Thread-local Storage (TLS) is used to store such state. However, the major flaw of TLS is that it works only for multi-threaded code: it is not possible to reliably contain state within a generator or a coroutine. For example, consider the following generator:
def calculate(precision, ...):
    with decimal.localcontext() as ctx:
        # Set the precision for decimal calculations
        # inside this block
        ctx.prec = precision

        yield calculate_something()
        yield calculate_something_else()
The decimal context uses TLS to store its state, and because TLS is
not aware of generators, the state can leak. If a user iterates over
the calculate() generator with different precisions one by one
using the zip() built-in, the above code will not work correctly.
For example:
For example:
g1 = calculate(precision=100)
g2 = calculate(precision=50)

items = list(zip(g1, g2))

# items[0] will be a tuple of:
#   first value from g1 calculated with 100 precision,
#   first value from g2 calculated with 50 precision.
#
# items[1] will be a tuple of:
#   second value from g1 calculated with 50 precision (!!!),
#   second value from g2 calculated with 50 precision.
An even scarier example would be using decimals to represent money in an async/await application: decimal calculations can suddenly lose precision in the middle of processing a request. Currently, bugs like this are extremely hard to find and fix.
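This failure mode is reproducible with today's decimal module. A runnable sketch (the division by 3 below is a stand-in for calculate_something(), chosen so that the active precision is visible in the length of the result):

```python
import decimal

def calculate(precision):
    # A pseudo-"calculation" whose textual length reveals the
    # precision it was computed with.
    with decimal.localcontext() as ctx:
        ctx.prec = precision
        yield decimal.Decimal(1) / decimal.Decimal(3)
        yield decimal.Decimal(1) / decimal.Decimal(3)

g1 = calculate(precision=100)
g2 = calculate(precision=50)
items = list(zip(g1, g2))

# g1's first value was computed with 100 digits, but its second
# value only with 50: g2's decimal context leaked into g1,
# because both generators share the same thread-local context.
print(len(str(items[0][0])))   # 102  ("0." + 100 digits)
print(len(str(items[1][0])))   # 52   ("0." + 50 digits)
```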
Another common need for web applications is to have access to the current request object, or security context, or, simply, the request URL for logging or submitting performance tracing data:
async def handle_http_request(request):
    context.current_http_request = request

    await ...
    # Invoke your framework code, render templates,
    # make DB queries, etc, and use the global
    # 'current_http_request' in that code.

    # This isn't currently possible to do reliably
    # in asyncio out of the box.
These examples are just a few of the many cases where a reliable way to store context data is needed.

The inability to use TLS for asynchronous code has led to a proliferation of ad-hoc solutions, which are limited in scope and do not support all required use cases.

The current status quo is that any library, including the standard library, that uses TLS will likely not work as expected in asynchronous code or with generators (see [3] for an example issue.)
Some languages that have coroutines or generators recommend
manually passing a context object to every function; see [1]
describing the pattern for Go. This approach, however, has limited
use for Python, where there is a huge ecosystem that was built to work
with a TLS-like context. Moreover, passing the context explicitly
does not work at all for libraries like decimal or numpy,
which use operator overloading.
The .NET runtime, which supports async/await, has a generic
solution to this problem called ExecutionContext (see [2]).
On the surface, working with it is very similar to working with TLS,
but the former explicitly supports asynchronous code.
The goal of this PEP is to provide a more reliable alternative to
threading.local(). It should be explicitly designed to work with
the Python execution model, equally supporting threads, generators, and
coroutines.
An acceptable solution for Python should meet the following requirements:

- Transparent support for code executing in threads, coroutines, and generators with an easy to use API.
- Negligible impact on the performance of the existing code or the code that will be using the new mechanism.
- Fast C API for packages like decimal and numpy.
Explicit is still better than implicit, hence the new APIs should only be used when there is no acceptable way of passing the state explicitly.
Execution Context is a mechanism of storing and accessing data specific
to a logical thread of execution. We consider OS threads,
generators, and chains of coroutines (such as asyncio.Task)
to be variants of a logical thread.
In this specification, we will use the following terminology:
- Logical Context, or LC, is a key/value mapping that stores the context of a logical thread.
- Execution Context, or EC, is an OS-thread-specific dynamic stack of Logical Contexts.
- Context Key, or CK, is an object used to set and get values from the Execution Context.
Please note that throughout the specification we use simple pseudo-code to illustrate how the EC machinery works. The actual algorithms and data structures that we will use to implement the PEP are discussed in the Implementation Strategy section.
The sys.new_context_key(name) function creates a new ContextKey
object. The name parameter is a str needed to render a
representation of the ContextKey object for introspection and
debugging purposes.
ContextKey objects have the following methods and attributes:

- .name: read-only name;
- .set(o) method: set the value o for the context key in the execution context.
- .get() method: return the current EC value for the context key. Context keys return None when the key is missing, so the method never fails.
Below is an example of how context keys can be used:
my_context = sys.new_context_key('my_context')
my_context.set('spam')
# Later, to access the value of my_context:
print(my_context.get())
Execution Context is implemented on top of Thread-local Storage.
For every thread there is a separate stack of Logical Contexts --
mappings of ContextKey objects to their values in the LC.
New threads always start with an empty EC.
For CPython:
PyThreadState:
    execution_context: ExecutionContext([
        LogicalContext({ci1: val1, ci2: val2, ...}),
        ...
    ])
The ContextKey.get() and .set() methods are defined as
follows (in pseudo-code):
class ContextKey:

    def get(self):
        tstate = PyThreadState_Get()

        for logical_context in reversed(tstate.execution_context):
            if self in logical_context:
                return logical_context[self]

        return None

    def set(self, value):
        tstate = PyThreadState_Get()

        if not tstate.execution_context:
            tstate.execution_context = [LogicalContext()]

        tstate.execution_context[-1][self] = value
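The pseudo-code above can be exercised as a small pure-Python toy model (the classes below are hypothetical stand-ins; the real machinery lives inside the interpreter):

```python
class LogicalContext(dict):
    """Toy LC: just a mapping of keys to values."""

class ContextKey:
    def __init__(self, name, ec):
        self.name = name
        self._ec = ec          # stands in for the per-thread EC

    def get(self):
        # Search the stack of LCs from top to bottom.
        for lc in reversed(self._ec):
            if self in lc:
                return lc[self]
        return None

    def set(self, value):
        # Writes always go to the topmost LC.
        if not self._ec:
            self._ec.append(LogicalContext())
        self._ec[-1][self] = value

ec = []                        # one thread's Execution Context
key = ContextKey('key', ec)
assert key.get() is None       # missing keys return None
key.set('a')
ec.append(LogicalContext())    # push a nested LC
assert key.get() == 'a'        # outer values are visible
key.set('b')                   # shadows the outer value
assert key.get() == 'b'
ec.pop()                       # pop the nested LC
assert key.get() == 'a'        # the outer value is restored
```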
With the semantics defined so far, the Execution Context can already
be used as an alternative to threading.local():
def print_foo():
    print(ci.get() or 'nothing')

ci = sys.new_context_key('ci')
ci.set('foo')

# Will print "foo":
print_foo()

# Will print "nothing":
threading.Thread(target=print_foo).start()
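The snippet above intentionally mirrors the behaviour of today's threading.local(), which can be demonstrated with a runnable analog:

```python
import threading

local = threading.local()
results = []

def print_foo():
    results.append(getattr(local, 'x', 'nothing'))

local.x = 'foo'

# Runs in the current thread: sees 'foo'.
print_foo()

# New threads start with empty thread-local storage,
# just as new threads start with an empty EC.
t = threading.Thread(target=print_foo)
t.start()
t.join()

print(results)   # ['foo', 'nothing']
```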
Execution Context is generally managed by the Python interpreter, but sometimes it is desirable for the user to take control of it. A few examples of when this is needed:
- running a computation in concurrent.futures.ThreadPoolExecutor with the current EC;
- reimplementing generators with iterators (more on that later);
- managing contexts in asynchronous frameworks (implementing proper EC support in asyncio.Task and asyncio.loop.call_soon.)
For these purposes we add a set of new APIs (they will be used in later sections of this specification):
- sys.new_logical_context(): create an empty LogicalContext object.

- sys.new_execution_context(): create an empty ExecutionContext object.

  Both LogicalContext and ExecutionContext objects are opaque to Python code, and there are no APIs to modify them.

- sys.get_execution_context(): return a copy of the current EC: an ExecutionContext instance.

  The runtime complexity of the actual implementation of this function can be O(1), but for the purposes of this section it is equivalent to:

      def get_execution_context():
          tstate = PyThreadState_Get()
          return copy(tstate.execution_context)

- sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs): run func(*args, **kwargs) in the provided execution context:

      def run_with_execution_context(ec, func, *args, **kwargs):
          tstate = PyThreadState_Get()

          old_ec = tstate.execution_context

          tstate.execution_context = ExecutionContext(
              ec.logical_contexts + [LogicalContext()]
          )

          try:
              return func(*args, **kwargs)
          finally:
              tstate.execution_context = old_ec

  Any changes func makes to the Logical Context will be ignored. This allows reusing one ExecutionContext object for multiple invocations of different functions, without them being able to affect each other's environment:

      ci = sys.new_context_key('ci')
      ci.set('spam')

      def func():
          print(ci.get())
          ci.set('ham')

      ec = sys.get_execution_context()

      sys.run_with_execution_context(ec, func)
      sys.run_with_execution_context(ec, func)

      # Will print:
      #   spam
      #   spam

- sys.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs): run func(*args, **kwargs) in the current execution context using the specified logical context.

  Any changes that func makes to the logical context will be persisted in lc. This behaviour is different from the run_with_execution_context() function, which always creates a new throw-away logical context. In pseudo-code:

      def run_with_logical_context(lc, func, *args, **kwargs):
          tstate = PyThreadState_Get()

          old_ec = tstate.execution_context

          tstate.execution_context = ExecutionContext(
              old_ec.logical_contexts + [lc]
          )

          try:
              return func(*args, **kwargs)
          finally:
              tstate.execution_context = old_ec

  Using the previous example:

      ci = sys.new_context_key('ci')
      ci.set('spam')

      def func():
          print(ci.get())
          ci.set('ham')

      ec = sys.get_execution_context()
      lc = sys.new_logical_context()

      sys.run_with_logical_context(lc, func)
      sys.run_with_logical_context(lc, func)

      # Will print:
      #   spam
      #   ham
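The difference between the two run_with_* functions can be modelled in plain Python (all names below are hypothetical stand-ins: the EC is a list of dict LCs, and only a single thread is modelled):

```python
# run_with_execution_context() pushes a throw-away LC, so the
# callee's writes are discarded; run_with_logical_context()
# pushes the caller-provided LC, so writes persist in it.
_ec = [{}]   # the "current thread's" EC: a stack of dict LCs

def run_with_execution_context(ec, func):
    global _ec
    old, _ec = _ec, list(ec) + [{}]   # fresh throw-away LC on top
    try:
        return func()
    finally:
        _ec = old

def run_with_logical_context(lc, func):
    global _ec
    old, _ec = _ec, _ec + [lc]        # caller-owned LC on top
    try:
        return func()
    finally:
        _ec = old

def get(key):
    for lc in reversed(_ec):
        if key in lc:
            return lc[key]

def func():
    seen.append(get('k'))
    _ec[-1]['k'] = 'ham'              # write to the topmost LC

seen = []
_ec[-1]['k'] = 'spam'
snapshot = list(_ec)

run_with_execution_context(snapshot, func)
run_with_execution_context(snapshot, func)
# Both calls saw 'spam': the throw-away LC discarded 'ham'.

lc = {}
run_with_logical_context(lc, func)
run_with_logical_context(lc, func)
# The second call saw the 'ham' persisted in lc.

print(seen)   # ['spam', 'spam', 'spam', 'ham']
```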
As an example, let's make a subclass of
concurrent.futures.ThreadPoolExecutor that preserves the execution
context for scheduled functions:
class Executor(concurrent.futures.ThreadPoolExecutor):

    def submit(self, fn, *args, **kwargs):
        context = sys.get_execution_context()

        fn = functools.partial(
            sys.run_with_execution_context, context,
            fn, *args, **kwargs)

        return super().submit(fn)
Generators in Python are producers of data, and yield expressions
are used to suspend/resume their execution. When generators suspend
execution, their local state will "leak" to the outside code if they
store it in a TLS or in a global variable:
local = threading.local()

def gen():
    old_x = local.x
    local.x = 'spam'
    try:
        yield
        ...
        yield
    finally:
        local.x = old_x
The above code will not work as many Python users expect it to work.
A simple next(gen()) will set local.x to "spam" and it will
never be reset back to its original value.
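A runnable variant of the generator above demonstrates the leak (the body between the yields is collapsed for brevity):

```python
import threading

local = threading.local()
local.x = 'original'

def gen():
    old_x = local.x
    local.x = 'spam'
    try:
        yield
        yield
    finally:
        local.x = old_x

g = gen()
next(g)          # enters the try block and suspends

# The generator is suspended, but its "local" state has
# already leaked into the calling thread:
print(local.x)   # spam

g.close()        # only now does the finally block run
print(local.x)   # original
```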
One of the goals of this proposal is to provide a mechanism to isolate local state in generators.
To achieve this, we make a small set of modifications to the generator object:
- New __logical_context__ attribute. This attribute is readable and writable for Python code.

- When a generator object is instantiated, its __logical_context__ is initialized with an empty LogicalContext.

- The generator's .send() and .throw() methods are modified as follows (in pseudo-C):

      if gen.__logical_context__ is not NULL:
          tstate = PyThreadState_Get()

          tstate.execution_context.push(gen.__logical_context__)

          try:
              # Perform the actual `Generator.send()` or
              # `Generator.throw()` call.
              return gen.send(...)
          finally:
              gen.__logical_context__ = \
                  tstate.execution_context.pop()
      else:
          # Perform the actual `Generator.send()` or
          # `Generator.throw()` call.
          return gen.send(...)

  If a generator has a non-NULL __logical_context__, it will be pushed onto the EC, and the generator will use it to accumulate its local state.

  If a generator has no __logical_context__, it will use whatever LC it is being run in.
Every generator object has its own Logical Context that stores only its own local modifications of the context. When a generator is being iterated, its logical context will be put in the EC stack of the current thread. This means that the generator will be able to access keys from the surrounding context:
local = sys.new_context_key("local")
global_ = sys.new_context_key("global_")

def generator():
    local.set('inside gen:')
    while True:
        print(local.get(), global_.get())
        yield

g = generator()

local.set('hello')
global_.set('spam')
next(g)

local.set('world')
global_.set('ham')
next(g)

# Will print:
#   inside gen: spam
#   inside gen: ham
Any changes to the EC in nested generators are invisible to the outer generator:
local = sys.new_context_key("local")

def inner_gen():
    local.set('spam')
    yield

def outer_gen():
    local.set('ham')
    yield from inner_gen()
    print(local.get())

list(outer_gen())

# Will print:
#   ham
If __logical_context__ is set to None for a generator,
it will simply use the outer Logical Context.
The @contextlib.contextmanager decorator uses this mechanism to
allow its generator to affect the EC:
item = sys.new_context_key('item')

@contextmanager
def context(x):
    old = item.get()
    item.set(x)
    try:
        yield
    finally:
        item.set(old)

with context('spam'):
    with context('ham'):
        print(1, item.get())
    print(2, item.get())

# Will print:
#   1 ham
#   2 spam
The Execution Context API makes it possible to fully replicate the EC behaviour of generators with a regular Python iterator class:
class Gen:

    def __init__(self):
        self.logical_context = sys.new_logical_context()

    def __iter__(self):
        return self

    def __next__(self):
        return sys.run_with_logical_context(
            self.logical_context, self._next_impl)

    def _next_impl(self):
        # Actual __next__ implementation.
        ...
Prior to PEP 492, yield from was used as one of the mechanisms
to implement coroutines in Python. PEP 492 is built on top
of yield from machinery, and it is even possible to make a
generator compatible with async/await code by decorating it with
@types.coroutine (or @asyncio.coroutine).
Generators decorated with these decorators follow the Execution Context semantics described in the EC Semantics for Coroutines section below.
Another yield from use is to compose generators. Essentially,
yield from gen() is a better version of
for v in gen(): yield v (read more about many subtle details
in PEP 380.)
A crucial difference between await coro and yield value is
that the former expression guarantees that coro will be
executed fully, while the latter produces value and
suspends the generator until it gets iterated again.
Therefore, this proposal does not special case yield from
expression for regular generators:
item = sys.new_context_key('item')

def nested():
    assert item.get() == 'outer'
    item.set('inner')
    yield

def outer():
    item.set('outer')
    yield from nested()
    assert item.get() == 'outer'
Python PEP 492 coroutines are used to implement cooperative multitasking. For a Python end-user they are similar to threads, especially when it comes to sharing resources or modifying the global state.
An event loop is needed to schedule coroutines. Coroutines that
are explicitly scheduled by the user are usually called Tasks.
When a coroutine is scheduled, it can schedule other coroutines using
an await expression. In async/await world, awaiting a coroutine
is equivalent to a regular function call in synchronous code. Thus,
Tasks are similar to threads.
By drawing a parallel between regular multithreaded code and async/await, it becomes apparent that any modification of the execution context within one Task should be visible to all coroutines scheduled within it. Any execution context modifications, however, must not be visible to other Tasks executing within the same OS thread.
Similar to generators, coroutines have the new __logical_context__
attribute and same implementations of .send() and .throw()
methods. The key difference is that coroutines start with
__logical_context__ set to NULL (generators start with
an empty LogicalContext.)
This means that it is expected that the asynchronous library and its Task abstraction will control how exactly coroutines interact with Execution Context.
In asynchronous frameworks like asyncio, coroutines are run by
an event loop, and need to be explicitly scheduled (in asyncio
coroutines are run by asyncio.Task.)
To enable correct Execution Context propagation into Tasks, the asynchronous framework needs to assist the interpreter:
- When create_task is called, it should capture the current execution context with sys.get_execution_context() and save it on the Task object.

- The __logical_context__ of the wrapped coroutine should be initialized to a new empty logical context.

- When the Task object runs its coroutine object, it should execute .send() and .throw() methods within the captured execution context, using the sys.run_with_execution_context() function.
For asyncio.Task:
class Task:

    def __init__(self, coro):
        ...
        self.exec_context = sys.get_execution_context()
        coro.__logical_context__ = sys.new_logical_context()

    def _step(self, val):
        ...
        sys.run_with_execution_context(
            self.exec_context,
            self.coro.send, val)
        ...
This makes any changes to the execution context made by nested coroutine calls within a Task visible throughout the Task:
ci = sys.new_context_key('ci')

async def nested():
    ci.set('nested')

async def main():
    ci.set('main')
    print('before:', ci.get())
    await nested()
    print('after:', ci.get())

asyncio.get_event_loop().run_until_complete(main())

# Will print:
#   before: main
#   after: nested
New Tasks, started within another Task, will run in the correct execution context too:
current_request = sys.new_context_key('current_request')

async def child():
    print('current request:', repr(current_request.get()))

async def handle_request(request):
    current_request.set(request)
    event_loop.create_task(child())

run(top_coro())
The above snippet will run correctly, and the child()
coroutine will be able to access the current request object
through the current_request Context Key.
Any of the above examples would work if one of the coroutines
was a generator decorated with @asyncio.coroutine.
Similarly to Tasks, functions like asyncio's loop.call_soon()
should capture the current execution context with
sys.get_execution_context() and execute callbacks
within it with sys.run_with_execution_context().
This way the following code will work:
current_request = sys.new_context_key('current_request')

def log():
    request = current_request.get()
    print(request)

async def request_handler(request):
    current_request.set(request)
    get_event_loop().call_soon(log)
Asynchronous Generators (AG) interact with the Execution Context similarly to regular generators.
They have a __logical_context__ attribute, which, as with
regular generators, can be set to None to make them use the outer
Logical Context. This is used by the new
contextlib.asynccontextmanager decorator.
Greenlet is an alternative implementation of cooperative scheduling for Python. Although the greenlet package is not part of CPython, popular frameworks like gevent rely on it, and it is important that greenlet can be modified to support execution contexts.

In a nutshell, the greenlet design is very similar to the design of
generators. The main difference is that for generators, the stack
is managed by the Python interpreter. Greenlet works outside of the
Python interpreter, and manually saves some PyThreadState
fields and pushes/pops the C-stack. Thus the greenlet package
can be easily updated to use the new low-level C API to enable
full support of EC.
Python APIs were designed to completely hide the internal implementation details, but at the same time provide enough control over EC and LC to re-implement all of Python built-in objects in pure Python.
- sys.new_context_key(name: str='...'): create a ContextKey object used to access/set values in the EC.

- ContextKey:

  - .name: read-only attribute.
  - .get(): return the current value for the key.
  - .set(o): set the current value in the EC for the key.
- sys.get_execution_context(): return the current ExecutionContext.
- sys.new_execution_context(): create a new empty ExecutionContext.
- sys.new_logical_context(): create a new empty LogicalContext.
- sys.run_with_execution_context(ec: ExecutionContext, func, *args, **kwargs).
- sys.run_with_logical_context(lc: LogicalContext, func, *args, **kwargs).
- PyContextKey * PyContext_NewKey(char *desc): create a PyContextKey object.
- PyObject * PyContext_GetKey(PyContextKey *): get the current value for the context key.
- int PyContext_SetKey(PyContextKey *, PyObject *): set the current value for the context key.
- PyLogicalContext * PyLogicalContext_New(): create a new empty PyLogicalContext.
- PyExecutionContext * PyExecutionContext_New(): create a new empty PyExecutionContext.
- PyExecutionContext * PyExecutionContext_Get(): get the EC for the active thread state.
- int PyExecutionContext_Set(PyExecutionContext *): set the passed EC object as the current one for the active thread state.
- int PyExecutionContext_SetWithLogicalContext(PyExecutionContext *, PyLogicalContext *): allows implementing the sys.run_with_logical_context Python API.
Using a weak key mapping for LogicalContext implementation
enables the following properties with regards to garbage
collection:
- ContextKey objects are strongly referenced only from the application code, not from any of the Execution Context machinery or the values they point to. This means that there are no reference cycles that could extend their lifespan longer than necessary, or prevent their garbage collection.

- Values put in the Execution Context are guaranteed to be kept alive while there is a ContextKey referencing them in the thread.

- If a ContextKey is garbage collected, all of its values will be removed from all contexts, allowing them to be GCed if needed.

- If a thread has ended its execution, its thread state will be cleaned up along with its ExecutionContext, cleaning up all values bound to all Context Keys in the thread.
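The key property can be observed with the standard library's weakref.WeakKeyDictionary, which is the semantics a weak-key LC implementation would follow (the class below is a hypothetical stand-in for a ContextKey):

```python
import gc
import weakref

class ContextKeyStandIn:
    """Hypothetical stand-in for a ContextKey object."""

lc = weakref.WeakKeyDictionary()   # toy weak-key logical context

key = ContextKeyStandIn()
lc[key] = 'some value'
assert len(lc) == 1

# Dropping the last strong reference to the key removes its
# value from the weak-key logical context automatically:
del key
gc.collect()
print(len(lc))   # 0
```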
We can add new fields to the PyThreadState and
PyInterpreterState structs:

- uint64_t PyThreadState->unique_id: a globally unique thread state identifier (we can add a counter to PyInterpreterState and increment it when a new thread state is created.)

- uint64_t ContextKey->version: every time the key is updated in any logical context or thread, this field will be incremented.
The above two fields allow implementing a fast cache path in
ContextKey.get(), in pseudo-code:
class ContextKey:

    def set(self, value):
        ...  # implementation
        self.version += 1

    def get(self):
        tstate = PyThreadState_Get()

        if (self.last_tstate_id == tstate.unique_id and
                self.last_version == self.version):
            return self.last_value

        value = None
        for mapping in reversed(tstate.execution_context):
            if self in mapping:
                value = mapping[self]
                break

        self.last_value = value  # borrowed ref
        self.last_tstate_id = tstate.unique_id
        self.last_version = self.version

        return value
Note that last_value is a borrowed reference. The assumption
is that if the current-thread and key-version checks pass, the object
is still alive. This allows the CK values to be properly GCed.
This is similar to the trick that decimal C implementation uses for caching the current decimal context, and will have the same performance characteristics, but available to all Execution Context users.
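A runnable pure-Python sketch of the version-tag fast path (the thread-id check is omitted for brevity, and all names are hypothetical stand-ins; the proposal implements this in C):

```python
class Key:
    def __init__(self):
        self.version = 0
        self._cached_version = -1
        self._cached_value = None

stack = [{}]                   # stand-in for the per-thread LC stack

def set_value(key, value):
    stack[-1][key] = value
    key.version += 1           # invalidates every cached lookup

def get_value(key):
    if key._cached_version == key.version:
        return key._cached_value        # O(1) fast path
    value = None
    for lc in reversed(stack):          # slow path: walk the stack
        if key in lc:
            value = lc[key]
            break
    key._cached_value = value
    key._cached_version = key.version
    return value

k = Key()
set_value(k, 'a')
assert get_value(k) == 'a'     # slow path, fills the cache
assert get_value(k) == 'a'     # fast path, no stack walk
set_value(k, 'b')              # version bump invalidates the cache
print(get_value(k))            # b
```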
The straightforward way of implementing the proposed EC
mechanisms is to create a WeakKeyDict on top of Python
dict type.
To implement the ExecutionContext type we can use Python
list (or a custom stack implementation with some
pre-allocation optimizations).
This approach will have the following runtime complexity:
- O(M) for ContextKey.get(), where M is the number of Logical Contexts in the stack.

  It is important to note that ContextKey.get() will implement a cache, making the operation O(1) for packages like decimal and numpy.

- O(1) for ContextKey.set().

- O(N) for sys.get_execution_context(), where N is the total number of keys/values in the current execution context.
Languages like Clojure and Scala use Hash Array Mapped Tries (HAMT) to implement high performance immutable collections [5], [6].
Immutable mappings implemented with HAMT have O(log32 N)
performance for set(), get(), and merge() operations,
which is essentially O(1) for relatively small mappings
(read about HAMT performance in CPython in the
Appendix: HAMT Performance section.)
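The property that makes HAMT attractive here is structural sharing: a "modified" trie reuses every subtree the change did not touch. An illustrative toy with path copying (not the proposed C implementation; it ignores full 32-bit hash collisions for brevity):

```python
BITS, MASK = 5, 31   # 32-way branching, 5 hash bits per level

def _h(key):
    # Normalize to an unsigned 32-bit hash.
    return hash(key) & 0xFFFFFFFF

class Leaf:
    def __init__(self, key, value):
        self.key, self.value = key, value

class Branch:
    def __init__(self, slots):
        self.slots = slots       # index -> Leaf | Branch

def assoc(node, key, value, shift=0):
    """Return a new trie with key set; the old trie is untouched."""
    if node is None:
        return Leaf(key, value)
    if isinstance(node, Leaf):
        if node.key == key:
            return Leaf(key, value)
        # Push the existing leaf one level down, then insert.
        idx = (_h(node.key) >> shift) & MASK
        return assoc(Branch({idx: node}), key, value, shift)
    idx = (_h(key) >> shift) & MASK
    slots = dict(node.slots)     # copy only the nodes on the path
    slots[idx] = assoc(node.slots.get(idx), key, value, shift + BITS)
    return Branch(slots)

def find(node, key, shift=0):
    if node is None:
        return None
    if isinstance(node, Leaf):
        return node.value if node.key == key else None
    return find(node.slots.get((_h(key) >> shift) & MASK),
                key, shift + BITS)

m1 = assoc(None, 'spam', 1)
m2 = assoc(m1, 'ham', 2)       # m1 is not modified
assert find(m1, 'ham') is None
assert find(m2, 'spam') == 1 and find(m2, 'ham') == 2
```

Only the O(log32 N) nodes along the hash path are copied per update; everything else is shared between versions.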
In this approach we use the same design of the ExecutionContext
as in Approach #1, but with a HAMT-backed weak-key Logical Context
implementation. With that we will have the following runtime
complexity:
- O(M * log32 N) for ContextKey.get(), where M is the number of Logical Contexts in the stack, and N is the number of keys/values in the EC. The operation will essentially be O(M), because execution contexts are normally not expected to have more than a few dozen keys/values.

  (ContextKey.get() will have the same caching mechanism as in Approach #1.)

- O(log32 N) for ContextKey.set(), where N is the number of keys/values in the current logical context. This will essentially be an O(1) operation most of the time.

- O(log32 N) for sys.get_execution_context(), where N is the total number of keys/values in the current execution context.
Essentially, using HAMT for Logical Contexts instead of Python dicts
brings the complexity of sys.get_execution_context() down
from O(N) to O(log32 N), because of the more efficient
merge algorithm.
We can make an alternative ExecutionContext design by using
a linked list. Each LogicalContext in the ExecutionContext
object will be wrapped in a linked-list node.
LogicalContext objects will use an HAMT backed weak key
implementation described in the Approach #2.
Every modification to the current LogicalContext will produce a
new version of it, which will be wrapped in a new linked-list
node. Essentially this means that ExecutionContext is an
immutable forest of LogicalContext objects, and can be safely
copied by reference in sys.get_execution_context() (eliminating
the expensive "merge" operation.)
With this approach, sys.get_execution_context() will be a
constant time O(1) operation.
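The linked-list design can be sketched in a few lines (hypothetical stand-ins; LCs are plain dicts here instead of HAMTs):

```python
# The EC is an immutable chain of LC nodes, so capturing it is
# an O(1) reference copy rather than a merge.
class Node:
    __slots__ = ('lc', 'prev')
    def __init__(self, lc, prev=None):
        self.lc, self.prev = lc, prev

def push(ec, lc):
    """Return a new EC with lc on top; the old EC is shared, not copied."""
    return Node(lc, ec)

def lookup(ec, key):
    # Walk the chain from the most recent LC to the oldest.
    node = ec
    while node is not None:
        if key in node.lc:
            return node.lc[key]
        node = node.prev
    return None

ec1 = push(None, {'key': 'outer'})
snapshot = ec1                   # sys.get_execution_context() analog: O(1)
ec2 = push(ec1, {'key': 'inner'})

assert lookup(ec2, 'key') == 'inner'
print(lookup(snapshot, 'key'))   # outer
```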
In case we decide to apply additional optimizations, such as flattening ECs with too many Logical Contexts, the HAMT-backed immutable mapping will have O(log32 N) merge complexity.
We believe that approach #3 enables an efficient and complete Execution Context implementation, with excellent runtime performance.
ContextKey.get() Cache enables fast retrieval of context keys for performance critical libraries like decimal and numpy.
Fast sys.get_execution_context() enables efficient management
of execution contexts in asynchronous libraries like asyncio.
PyThreadState_GetDict is a TLS, and some of its existing users
might depend on it being just a TLS. Changing its behaviour to follow
the Execution Context semantics would break backwards compatibility.
PEP 521 proposes an alternative solution to the problem:
enhance Context Manager Protocol with two new methods: __suspend__
and __resume__. To make it compatible with async/await,
the Asynchronous Context Manager Protocol will also need to be
extended with __asuspend__ and __aresume__.
This makes it possible to implement context managers like the decimal
context and numpy.errstate for generators and coroutines.
The following code:
class Context:

    def __init__(self):
        self.key = new_context_key('key')

    def __enter__(self):
        self.old_x = self.key.get()
        self.key.set('something')

    def __exit__(self, *err):
        self.key.set(self.old_x)
would become this:
local = threading.local()

class Context:

    def __enter__(self):
        self.old_x = getattr(local, 'x', None)
        local.x = 'something'

    def __suspend__(self):
        local.x = self.old_x

    def __resume__(self):
        local.x = 'something'

    def __exit__(self, *err):
        local.x = self.old_x
Besides complicating the protocol, the implementation will likely negatively impact performance of coroutines, generators, and any code that uses context managers, and will notably complicate the interpreter implementation.
PEP 521 also does not provide any mechanism to propagate state in a logical context, like storing a request object in an HTTP request handler to have better logging. Nor does it solve the leaking state problem for greenlet/gevent.
Because async/await code needs an event loop to run it, an EC-like solution can be implemented in a limited way for coroutines.
Generators, on the other hand, do not have an event loop or
trampoline, making it impossible to intercept their yield points
outside of the Python interpreter.
APIs like redirecting stdout by overwriting sys.stdout, or
specifying new exception display hooks by overwriting the
sys.displayhook function, affect the whole Python process
by design. Their users assume that the effect of changing
them will be visible across OS threads. Therefore we cannot
simply make these APIs use the new Execution Context.
That said, we think it is possible to design new APIs that will be context aware, but that is outside of the scope of this PEP.
This proposal preserves 100% backwards compatibility.
While investigating possibilities of how to implement an immutable
mapping in CPython, we were able to improve the efficiency
of dict.copy() by up to 5 times: [4]. One caveat is that the
improved dict.copy() does not resize the dict, which is
necessary when items get deleted from it.
This means that we can make dict.copy() faster only for dicts
that don't need to be resized; the ones that do will use
a slower version.
To assess if HAMT can be used for Execution Context, we implemented it in CPython [7].
Figure 1. Benchmark code can be found here: [9].
The chart illustrates the following:
- HAMT displays near O(1) performance for all benchmarked dictionary sizes.
- If we can use the optimized dict.copy() implementation ([4]), the performance of an immutable mapping implemented with a Python dict is good up until 100 items.
- A dict with an unoptimized dict.copy() becomes very slow around 100 items.
Figure 2. Benchmark code can be found here: [10].
Figure 2 shows a comparison of lookup costs between a Python dict and an HAMT immutable mapping. HAMT lookup time is 30-40% worse than Python dict lookups on average, which is a very good result, considering how well Python dicts are optimized.

Note that according to [8], the HAMT design can be further improved.
The bottom line is that it is possible to imagine a scenario when an application has more than 100 items in the Execution Context, in which case the dict-backed implementation of an immutable mapping becomes a subpar choice.
HAMT, on the other hand, guarantees that its set(), get(),
and merge() operations will execute in O(log32 N) time,
which makes it a more future-proof solution.
I thank Elvis Pranskevichus and Victor Petrovykh for countless discussions around the topic and PEP proof reading and edits.
Thanks to Nathaniel Smith for proposing the ContextKey design
[17] [18], for pushing the PEP towards a more complete design, and
coming up with the idea of having a stack of contexts in the thread
state.
Thanks to Nick Coghlan for numerous suggestions and ideas on the mailing list, and for coming up with a case that caused the complete rewrite of the initial PEP version [19].
Posted on 11-Aug-2017, view it here: [20].
Posted on 15-Aug-2017, view it here: [21].
The fundamental limitation that caused a complete redesign of the first version was that it was not possible to implement an iterator that would interact with the EC in the same way as generators (see [19].)
Version 2 was a complete rewrite, introducing new terminology (Local Context, Execution Context, Context Item) and new APIs.
Posted on 18-Aug-2017: the current version.
Updates:
- Local Context was renamed to Logical Context. The term "local" was ambiguous and conflicted with local name scopes.
- Context Item was renamed to Context Key, see the thread with Nick Coghlan, Stefan Krah, and Yury Selivanov [22] for details.
- Context Item get cache design was adjusted, per Nathaniel Smith's idea in [24].
- Coroutines are created without a Logical Context; the ceval loop no longer needs to special-case the await expression (proposed by Nick Coghlan in [23].)
- The Appendix: HAMT Performance section was updated with more details about the proposed dict.copy() optimization and its limitations.
[1] https://blog.golang.org/context
[2] https://msdn.microsoft.com/en-us/library/system.threading.executioncontext.aspx
[3] numpy/numpy#9444
[4] (1, 2) http://bugs.python.org/issue31179
[5] https://en.wikipedia.org/wiki/Hash_array_mapped_trie
[6] http://blog.higher-order.net/2010/08/16/assoc-and-clojures-persistenthashmap-part-ii.html
[7] https://github.com/1st1/cpython/tree/hamt
[8] https://michael.steindorfer.name/publications/oopsla15.pdf
[9] https://gist.github.com/1st1/9004813d5576c96529527d44c5457dcd
[10] https://gist.github.com/1st1/dbe27f2e14c30cce6f0b5fddfc8c437e
[11] https://github.com/1st1/cpython/tree/pep550
[12] https://www.python.org/dev/peps/pep-0492/#async-await
[13] https://github.com/MagicStack/uvloop/blob/master/examples/bench/echoserver.py
[14] https://github.com/MagicStack/pgbench
[15] https://github.com/python/performance
[16] https://gist.github.com/1st1/6b7a614643f91ead3edf37c4451a6b4c
[17] https://mail.python.org/pipermail/python-ideas/2017-August/046752.html
[18] https://mail.python.org/pipermail/python-ideas/2017-August/046772.html
[19] (1, 2) https://mail.python.org/pipermail/python-ideas/2017-August/046775.html
[20] https://github.com/python/peps/blob/e8a06c9a790f39451d9e99e203b13b3ad73a1d01/pep-0550.rst
[21] https://github.com/python/peps/blob/e3aa3b2b4e4e9967d28a10827eed1e9e5960c175/pep-0550.rst
[22] https://mail.python.org/pipermail/python-ideas/2017-August/046801.html
[23] https://mail.python.org/pipermail/python-ideas/2017-August/046790.html
[24] https://mail.python.org/pipermail/python-ideas/2017-August/046786.html
This document has been placed in the public domain.