Description
While writing documentation for asyncio.TaskGroup I discovered something fishy.
When a task fails with an exception (other than asyncio.CancelledError), the remaining tasks are cancelled. There is also a rule that this is only done once, implemented using self._aborted, which is set by self._abort().
But what should happen if new tasks are created after self._abort() has been called, and one of those tasks fails? In that case, the remaining new tasks are not cancelled. To reproduce this, we need something that creates two tasks, where the first one fails and the second catches asyncio.CancelledError and, when it does, creates two more tasks. The third task then fails, and the fourth task may wait forever, never getting cancelled.
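The scenario can be modeled without relying on the internals of any particular released asyncio.TaskGroup, whose behavior here may differ from the version discussed in this issue. The sketch below uses a hypothetical, stripped-down ToyTaskGroup that only reproduces the one-shot _aborted rule; it is not CPython's implementation. The hypothetical repro() follows the four-task recipe above and, after giving the event loop time to run, checks whether the fourth task is still pending.

```python
import asyncio


class ToyTaskGroup:
    """Hypothetical model of a task group whose cancel-the-siblings step
    runs at most once (a one-shot _aborted flag). NOT CPython's code."""

    def __init__(self):
        self._tasks = set()    # strong references to unfinished tasks
        self._aborted = False  # set by _abort(); never reset
        self.errors = []

    def create_task(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        task.add_done_callback(self._on_task_done)
        return task

    def _abort(self):
        self._aborted = True
        for t in list(self._tasks):
            if not t.done():
                t.cancel()

    def _on_task_done(self, task):
        self._tasks.discard(task)
        if task.cancelled() or task.exception() is None:
            return
        self.errors.append(task.exception())
        if not self._aborted:  # later failures cancel nothing
            self._abort()


async def repro():
    tg = ToyTaskGroup()
    spawned = {}

    async def first():
        await asyncio.sleep(0)  # let second() start waiting
        raise ValueError("task 1 failed")

    async def third():
        await asyncio.sleep(0)
        raise ValueError("task 3 failed")

    async def fourth():
        await asyncio.Event().wait()  # only ends if cancelled

    async def second():
        try:
            await asyncio.Event().wait()
        except asyncio.CancelledError:
            # Cleanup creates new tasks after the group has already aborted.
            spawned["third"] = tg.create_task(third())
            spawned["fourth"] = tg.create_task(fourth())
            raise

    tg.create_task(first())
    tg.create_task(second())
    await asyncio.sleep(0.1)  # plenty of time for all of the above to run

    fourth_task = spawned["fourth"]
    stuck = not fourth_task.done()
    fourth_task.cancel()  # clean up by hand so the demo terminates
    await asyncio.wait([fourth_task])
    return stuck, [str(e) for e in tg.errors]


stuck, errors = asyncio.run(repro())
print(stuck, errors)  # True ['task 1 failed', 'task 3 failed']
```

With a one-shot flag, the second failure is recorded but nothing is cancelled, so a group that waits for all of its tasks would hang on the fourth one.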
Is this a bug (or a design flaw)? I think we decided to support task creation during the wait (even though EdgeDB's TaskGroup disallowed it) so that it's possible to dynamically create new tasks forever; this seems useful. But I'm not sure we thought deeply about whether to allow creating new tasks once we're waiting for all cancelled tasks to finish.
How would we fix it? Disallowing new task creation once self._aborted is set seems excessive, since it would also prevent legitimate creation of new tasks during cleanup. We could instead keep a weak set of tasks that we haven't cancelled yet, and if one of those fails, cancel all the others in that set (and remove them from the set). This would essentially create successive "generations" of tasks that live or die together, with a new generation starting once any member of the current generation dies. Is this worth it?
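One way to sketch the "generations" idea, reusing the same four-task scenario. GenerationalTaskGroup, join() and scenario() are hypothetical names, and this is a rough model of the proposal rather than a patch against CPython: a strong set keeps tasks alive so the group can wait for them, while a weakref.WeakSet holds the current generation of not-yet-cancelled tasks. When a member fails, the rest of that set is cancelled and the set is emptied, so tasks created during cleanup form the next generation.

```python
import asyncio
import weakref


class GenerationalTaskGroup:
    """Hypothetical sketch of the 'generations' proposal (not CPython code).
    _uncancelled is the current generation: tasks not yet cancelled."""

    def __init__(self):
        self._tasks = set()                   # strong refs, for join()
        self._uncancelled = weakref.WeakSet()  # current generation
        self.errors = []

    def create_task(self, coro):
        task = asyncio.create_task(coro)
        self._tasks.add(task)
        self._uncancelled.add(task)
        task.add_done_callback(self._on_task_done)
        return task

    def _on_task_done(self, task):
        self._tasks.discard(task)
        self._uncancelled.discard(task)
        if task.cancelled() or task.exception() is None:
            return
        self.errors.append(task.exception())
        # Cancel the rest of this generation and empty the set; tasks
        # created from now on (e.g. during cleanup) start the next one.
        victims = list(self._uncancelled)
        self._uncancelled.clear()
        for t in victims:
            t.cancel()

    async def join(self):
        # Tasks may be added while we wait, so loop until none are left.
        while self._tasks:
            await asyncio.wait(list(self._tasks))


async def scenario():
    tg = GenerationalTaskGroup()
    spawned = {}

    async def first():
        await asyncio.sleep(0)
        raise ValueError("task 1 failed")

    async def third():
        await asyncio.sleep(0)
        raise ValueError("task 3 failed")

    async def fourth():
        await asyncio.Event().wait()  # only ends if cancelled

    async def second():
        try:
            await asyncio.Event().wait()
        except asyncio.CancelledError:
            spawned["third"] = tg.create_task(third())
            spawned["fourth"] = tg.create_task(fourth())
            raise

    tg.create_task(first())
    tg.create_task(second())
    await asyncio.wait_for(tg.join(), timeout=1)  # times out if task 4 hangs
    return spawned["fourth"].cancelled(), [str(e) for e in tg.errors]


fourth_cancelled, errors = asyncio.run(scenario())
print(fourth_cancelled, errors)  # True ['task 1 failed', 'task 3 failed']
```

Here the third task's failure cancels the fourth, so join() returns instead of waiting forever. The extra bookkeeping is a single weak set per group.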
CC: @njs @Tinche @agronholm. I assume this problem doesn't exist in Trio because of its level-triggered cancellation, but maybe one of you still has a useful insight.