We have to be able to use the primary/user-controlled event loop for running (some) activities/operations anyway, so maybe this is a workflow-worker-only thing?
Workflow workers would have to drop async and go threaded, which costs threads. Today the blocking poll call is async, so it does not occupy a Python thread. When work is received, the code backgrounds it, again using async. That work then runs on a thread from a pool so that a deadlock timeout can be imposed on it, which means every workflow activation uses one thread from the thread pool today. We would either have to double the in-use thread count, or run a single thread for the life of the workflow worker that runs its own asyncio event loop so it can background activations as they are received (not to be confused with the next point).
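To make the second option concrete, here is a minimal sketch of a single long-lived thread that owns its own asyncio event loop and still runs each activation on a pool thread so a timeout can be imposed. This is not the SDK's implementation: `LoopThread`, `run_activation`, `demo`, and the timeout values are hypothetical names and numbers chosen for illustration.

```python
import asyncio
import concurrent.futures
import threading
import time


class LoopThread:
    """Hypothetical helper: one thread that owns an event loop for its lifetime."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self._thread.start()

    def submit(self, coro) -> concurrent.futures.Future:
        # Thread-safe handoff of a coroutine to the dedicated loop.
        return asyncio.run_coroutine_threadsafe(coro, self.loop)

    def stop(self) -> None:
        self.loop.call_soon_threadsafe(self.loop.stop)
        self._thread.join()
        self.loop.close()


async def run_activation(pool, fn, deadlock_timeout: float) -> str:
    # Run the activation on a pool thread so a timeout can be imposed on it.
    loop = asyncio.get_running_loop()
    try:
        return await asyncio.wait_for(loop.run_in_executor(pool, fn), deadlock_timeout)
    except asyncio.TimeoutError:
        # The pool thread cannot be interrupted; we only stop waiting for it.
        return "deadlock suspected"


def demo() -> list:
    worker = LoopThread()
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        fast = worker.submit(run_activation(pool, lambda: "done", 1.0))
        stuck = worker.submit(
            run_activation(pool, lambda: time.sleep(0.3) or "late", 0.05)
        )
        results = [fast.result(timeout=5), stuck.result(timeout=5)]
    worker.stop()
    return results
```

In this sketch the thread cost is one extra thread per worker plus one pool thread per in-flight activation, rather than two threads per activation.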
Workflow workers will always need to use the primary/user-controlled event loop for user codecs anyway (e.g. via asyncio.run_coroutine_threadsafe), which means they are subject to the same blocking concerns. So any benefit applies only to those not using codecs.
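A rough sketch of that codec handoff, assuming a hypothetical async `encode` codec that must run on the user's loop and a hypothetical `worker_thread` function standing in for the threaded workflow worker:

```python
import asyncio
import threading


async def encode(payload: bytes) -> bytes:
    # Hypothetical user codec; it runs on the primary/user-controlled loop.
    await asyncio.sleep(0)
    return payload[::-1]


def worker_thread(main_loop: asyncio.AbstractEventLoop, results: list) -> None:
    # From the worker thread, schedule the codec on the primary loop and wait.
    future = asyncio.run_coroutine_threadsafe(encode(b"abc"), main_loop)
    results.append(future.result(timeout=5))


async def main() -> list:
    results: list = []
    thread = threading.Thread(
        target=worker_thread, args=(asyncio.get_running_loop(), results)
    )
    thread.start()
    # Join off-loop so the primary loop stays free to run the codec.
    await asyncio.to_thread(thread.join)
    return results
```

The `future.result(...)` call only returns once the primary loop gets a chance to run `encode`, so anything blocking that loop also stalls the worker thread at this point.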
We will have to decide whether the benefit is worth the costs here.
Currently, when workflow and activity workers run in the same process, they share the same thread (same event loop):
sdk-python/temporalio/worker/_worker.py
Lines 489 to 492 in b0dfaef
Running these (and Nexus workers when they arrive) in different threads would prevent an async activity that incorrectly does blocking I/O from blocking workflow progress. In addition to limiting blast radius it should make it easier to understand the failure modes.
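The failure mode can be reproduced without Temporal at all. In the sketch below, `blocking_activity` and `workflow_tick` are hypothetical stand-ins for an async activity that wrongly calls blocking I/O and for workflow progress; the second variant runs the activity on its own event loop in another thread.

```python
import asyncio
import time


async def blocking_activity() -> None:
    # Incorrect: time.sleep blocks the whole event loop it runs on.
    time.sleep(0.3)


async def workflow_tick() -> float:
    # Stand-in for workflow progress: how long a 10 ms timer took to fire.
    start = time.monotonic()
    await asyncio.sleep(0.01)
    return time.monotonic() - start


async def shared_loop() -> float:
    # Both run on the same loop, so the tick waits for the blocking call.
    tick = asyncio.create_task(workflow_tick())
    await asyncio.sleep(0)  # let the tick start its timer
    await blocking_activity()
    return await tick


async def separate_loops() -> float:
    # The activity gets its own loop on another thread; the tick is unaffected.
    activity = asyncio.to_thread(asyncio.run, blocking_activity())
    elapsed, _ = await asyncio.gather(workflow_tick(), activity)
    return elapsed


if __name__ == "__main__":
    print("shared loop tick delay:", asyncio.run(shared_loop()))
    print("separate loops tick delay:", asyncio.run(separate_loops()))
```

On a shared loop the tick is delayed by roughly the full blocking duration, while with separate loops it fires close to on time.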