From ad484fc0939d301c48575ca438e886f69df04baf Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 19:58:35 +0530 Subject: [PATCH 01/14] add asyncio implementation docs --- InternalDocs/asyncio.md | 123 ++++++++++++++++++++++++++++++++++++++++ 1 file changed, 123 insertions(+) create mode 100644 InternalDocs/asyncio.md diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md new file mode 100644 index 00000000000000..93187379e4082b --- /dev/null +++ b/InternalDocs/asyncio.md @@ -0,0 +1,123 @@ +asyncio +======= + +Author: Kumar Aditya + + +This document describes the working and implementation details of C implementation of the +[`asyncio`](https://docs.python.org/3/library/asyncio.html) module. + + +## Pre-Python 3.14 implementation + +Until Python 3.13, the C implementation of `asyncio` used a [`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) to store all the tasks created by the event loop [^1]. `WeakSet` was used so that the event loop +doesn't hold strong references to the tasks, allowing them to be garbage collected when they are no longer needed. +The current task of the event loop was stored in dict mapping the event loop to the current task [^2]. + +```c + /* Dictionary containing tasks that are currently active in + all running event loops. {EventLoop: Task} */ + PyObject *current_tasks; + + /* WeakSet containing all tasks scheduled to run on event loops. */ + PyObject *scheduled_tasks; +``` + +This implementation had a few drawbacks: +1. **Performance**: Using a `WeakSet` for storing tasks is inefficient as it requires maintaining a full set of weak references to tasks along with corresponding weakref callback to cleanup the tasks when they are garbage collected. +This increases the work done by the garbage collector and in applications with a large number of tasks, this becomes a bottle neck, with increased memory usage and lower performance. 
Looking up the current task was slow as it required a dictionary lookup on the `current_tasks` dict. + +2. **Thread safety**: Until Python 3.14, concurrent iterations over `WeakSet` was not thread safe[^3]. This meant calling APIs like `asyncio.all_tasks()` could lead to inconsistent results or even `RuntimeError` if used in multiple threads[^4]. + +3. **Poor scaling in free-threading**: Using global `WeakSet` for storing all tasks across all threads lead to contention when adding and removing tasks from the set which is a frequent operation. As such it performed poorly in free-threading and did not scale well with the number of threads. Similarly accessing the current task in multiple threads did not scale due to contention on the global `current_tasks` dictionary. + +## Python 3.14 implementation + +To address these issues, Python 3.14 implements several changes to improve the performance and thread safety of tasks management. + +- **Per-thread double linked list for tasks**: Python 3.14 introduces a per-thread circular double linked list implementation for storing tasks. This allows each thread to maintain its own list of tasks and allows for lock free addition and removal of tasks. This is designed to be efficient, and thread-safe and scales well with the number of threads in free-threading. This was implemented as part of [Audit asyncio thread safety](https://github.com/python/cpython/issues/128002). + +- **Per-thread current task**: Python 3.14 stores the current task on the current thread state instead of a global dictionary. This allows for faster access to the current task without the need for a dictionary lookup. Each thread maintains its own current task, which is stored in the `PyThreadState` structure. This was implemented in https://github.com/python/cpython/issues/129898. + + +## Per-thread double linked list for tasks + +This implementation uses a circular doubly linked list to store tasks on the thread states. 
This is used for all tasks which are instances of `asyncio.Task` or subclasses of it, for third-party tasks a fallback `WeakSet` implementation is used. The linked list is implemented using an embedded `llist_node` structure within each `TaskObj`. By embedding the list node directly into the task object, the implementation avoids additional memory allocations for linked list nodes. + +The `PyThreadState` structure gained a new field `asyncio_tasks_head`, which serves as the head of the circular linked list of tasks. This allows for lock free addition and removal of tasks from the list. + + It is possible that when a thread state is deallocated, there are lingering tasks in it's list, this can happen if another thread has references to the tasks of this thread as such the `PyInterpreterState` structure also gains a new `asyncio_tasks_head` field to store any lingering tasks. When a thread state is deallocated, any remaining lingering tasks are moved to the interpreter state tasks list, and the thread state tasks list is cleared. +The `asyncio_tasks_lock` is used protect the interpreter's tasks list from concurrent modifications. + + +```c +typedef struct TaskObj { + ... + struct llist_node asyncio_node; +} TaskObj; + +typedef struct PyThreadState { + ... + struct llist_node asyncio_tasks_head; +} PyThreadState; +typedef struct PyInterpreterState { + ... + struct llist_node asyncio_tasks_head; + PyMutex asyncio_tasks_lock; +} PyInterpreterState; + +``` + +When a task is created, it is added to the current thread's list of tasks by the `register_task` function. When the task is done, it is removed from the list by the `unregister_task` function. In free-threading, the thread id of thread which +created the task is stored in `task_tid` field of the `TaskObj`. This is used to check if the task is being removed from the correct thread's task list. 
If the current thread is same as the thread which created it then no locking is required, otherwise in free-threading, the `stop-the-world` pause is used to pause all other threads and then safely remove the task from the tasks list. + +```mermaid +flowchart TD + subgraph one["Executing Thread"] + A["task = asyncio.create_task(coro())"] -->B("register_task(task)") + B --> C{"task->task_state"} + C -->|pending| D["task_step(task)"] + C -->|done| F["unregister_task(task)"] + C -->|cancelled| F["unregister_task(task)"] + D --> C + F --> G{"free-threading"} + G --> |false| H["unregister_task_safe(task)"] + G --> |true| J{"check correct thread
task->task_tid == _Py_ThreadId()"}
+        J --> |true| H
+        J --> |false| I["stop the world <br> pause all threads"]
+        I --> H["unregister_task_safe(task)"]
+    end
+    subgraph two["Thread deallocating"]
+        A1{"check thread's task list is empty <br> llist_empty(tstate->asyncio_tasks_head)"}
+        A1 --> |true| B1["deallocate thread<br>free_theadstate(tstate)"]
+        A1 --> |false| C1["add tasks to interpreter's task list
llist_concat(&tstate->interp->asyncio_tasks_head,tstate->asyncio_tasks_head)"] + C1 --> B1 + end + + one --> two + +``` + +`asyncio.all_tasks` now iterates over the per-thread task lists of all threads and the interpreter's task list to get all the tasks. In free-threading this is done by pausing all the threads using the `stop-the-world` pause to ensure that no tasks are being added or removed while iterating over the lists. This allows for a consistent view of all task lists across all threads and is thread safe. + +This design allows for lock free execution and scales well in free-threading with multiple event loops running in different threads. + +## Per-thread current task +This implementation stores the current task in the `PyThreadState` structure, which allows for faster access to the current task without the need for a dictionary lookup. + +```c +typedef struct PyThreadState { + ... + PyObject *asyncio_current_loop; + PyObject *asyncio_current_task; +} PyThreadState; +``` + +When a task is entered or left, the current task is updated in the thread state using `enter_task` and `leave_task` functions. When `current_task(loop)` is called where `loop` is the current running event loop of the current thread, no locking is required as the current task is stored in the thread state and is returned directly. Otherwise, if the `loop` is not current running event loop, the `stop-the-world` pause is used to pause all threads in free-threading and then by iterating over all the thread states and checking if the `loop` matches with `tstate->asyncio_current_loop`, the current task is found and returned. If no matching thread state is found, `None` is returned. 
+ + + +[^1]: https://github.com/python/cpython/blob/9a10b734f164ca5a253ae3a05f4960e3fcbeef2b/Modules/_asynciomodule.c#L42 +[^2]: https://github.com/python/cpython/blob/9a10b734f164ca5a253ae3a05f4960e3fcbeef2b/Modules/_asynciomodule.c#L39 +[^3]: https://github.com/python/cpython/issues/123089 +[^4]: https://github.com/python/cpython/issues/80788 \ No newline at end of file From 5c0dc5af6435ffe4aa1245b5ff2ed6f5b252da6b Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 20:07:00 +0530 Subject: [PATCH 02/14] fix footnotes --- InternalDocs/asyncio.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 93187379e4082b..ed379de9066e6e 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -10,9 +10,9 @@ This document describes the working and implementation details of C implementati ## Pre-Python 3.14 implementation -Until Python 3.13, the C implementation of `asyncio` used a [`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) to store all the tasks created by the event loop [^1]. `WeakSet` was used so that the event loop +Until Python 3.13, the C implementation of `asyncio` used a [`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) to store all the tasks created by the event loop. `WeakSet` was used so that the event loop doesn't hold strong references to the tasks, allowing them to be garbage collected when they are no longer needed. -The current task of the event loop was stored in dict mapping the event loop to the current task [^2]. +The current task of the event loop was stored in dict mapping the event loop to the current task. ```c /* Dictionary containing tasks that are currently active in @@ -27,7 +27,7 @@ This implementation had a few drawbacks: 1. 
**Performance**: Using a `WeakSet` for storing tasks is inefficient as it requires maintaining a full set of weak references to tasks along with corresponding weakref callback to cleanup the tasks when they are garbage collected. This increases the work done by the garbage collector and in applications with a large number of tasks, this becomes a bottle neck, with increased memory usage and lower performance. Looking up the current task was slow as it required a dictionary lookup on the `current_tasks` dict. -2. **Thread safety**: Until Python 3.14, concurrent iterations over `WeakSet` was not thread safe[^3]. This meant calling APIs like `asyncio.all_tasks()` could lead to inconsistent results or even `RuntimeError` if used in multiple threads[^4]. +2. **Thread safety**: Until Python 3.14, concurrent iterations over `WeakSet` was not thread safe[^1]. This meant calling APIs like `asyncio.all_tasks()` could lead to inconsistent results or even `RuntimeError` if used in multiple threads[^2]. 3. **Poor scaling in free-threading**: Using global `WeakSet` for storing all tasks across all threads lead to contention when adding and removing tasks from the set which is a frequent operation. As such it performed poorly in free-threading and did not scale well with the number of threads. Similarly accessing the current task in multiple threads did not scale due to contention on the global `current_tasks` dictionary. 
@@ -117,7 +117,5 @@ When a task is entered or left, the current task is updated in the thread state -[^1]: https://github.com/python/cpython/blob/9a10b734f164ca5a253ae3a05f4960e3fcbeef2b/Modules/_asynciomodule.c#L42 -[^2]: https://github.com/python/cpython/blob/9a10b734f164ca5a253ae3a05f4960e3fcbeef2b/Modules/_asynciomodule.c#L39 -[^3]: https://github.com/python/cpython/issues/123089 -[^4]: https://github.com/python/cpython/issues/80788 \ No newline at end of file +[^1]: https://github.com/python/cpython/issues/123089 +[^2]: https://github.com/python/cpython/issues/80788 \ No newline at end of file From d2ac060e8d08ae947d5afd0f5d057b11f571cf7e Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 20:09:20 +0530 Subject: [PATCH 03/14] add details about external introspection --- InternalDocs/asyncio.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index ed379de9066e6e..c4878f79dab213 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -35,7 +35,7 @@ This increases the work done by the garbage collector and in applications with a To address these issues, Python 3.14 implements several changes to improve the performance and thread safety of tasks management. -- **Per-thread double linked list for tasks**: Python 3.14 introduces a per-thread circular double linked list implementation for storing tasks. This allows each thread to maintain its own list of tasks and allows for lock free addition and removal of tasks. This is designed to be efficient, and thread-safe and scales well with the number of threads in free-threading. This was implemented as part of [Audit asyncio thread safety](https://github.com/python/cpython/issues/128002). +- **Per-thread double linked list for tasks**: Python 3.14 introduces a per-thread circular double linked list implementation for storing tasks. 
This allows each thread to maintain its own list of tasks and allows for lock free addition and removal of tasks. This is designed to be efficient, and thread-safe and scales well with the number of threads in free-threading. This also allows external introspection tools such as `python -m asyncio pstree` to inspect tasks running in all threads and was implemented as part of [Audit asyncio thread safety](https://github.com/python/cpython/issues/128002). - **Per-thread current task**: Python 3.14 stores the current task on the current thread state instead of a global dictionary. This allows for faster access to the current task without the need for a dictionary lookup. Each thread maintains its own current task, which is stored in the `PyThreadState` structure. This was implemented in https://github.com/python/cpython/issues/129898. From e4c5ca9b6cf42158a5d4f49e988b75bb09039ff7 Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 20:11:04 +0530 Subject: [PATCH 04/14] typo --- InternalDocs/asyncio.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index c4878f79dab213..7c607e829d48d5 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -89,7 +89,7 @@ flowchart TD end subgraph two["Thread deallocating"] A1{"check thread's task list is empty
llist_empty(tstate->asyncio_tasks_head)"}
-        A1 --> |true| B1["deallocate thread<br>free_theadstate(tstate)"]
+        A1 --> |true| B1["deallocate thread<br>free_threadstate(tstate)"]
         A1 --> |false| C1["add tasks to interpreter's task list
llist_concat(&tstate->interp->asyncio_tasks_head,tstate->asyncio_tasks_head)"] C1 --> B1 end From 2a5a2b177dba60d5577864bd5391bd801805795b Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 20:15:02 +0530 Subject: [PATCH 05/14] add to readme --- InternalDocs/README.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/InternalDocs/README.md b/InternalDocs/README.md index 4502902307cd5c..c20aa015c5bb74 100644 --- a/InternalDocs/README.md +++ b/InternalDocs/README.md @@ -41,3 +41,9 @@ Program Execution - [Garbage Collector Design](garbage_collector.md) - [Exception Handling](exception_handling.md) + + +Modules +--- + +- [asyncio](asyncio.md) \ No newline at end of file From 88fca2023d120bb74c66d55b35fd389f1fe020c4 Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 20:20:57 +0530 Subject: [PATCH 06/14] add free-threading note --- InternalDocs/asyncio.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 7c607e829d48d5..829b7566cdfd75 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -95,7 +95,6 @@ flowchart TD end one --> two - ``` `asyncio.all_tasks` now iterates over the per-thread task lists of all threads and the interpreter's task list to get all the tasks. In free-threading this is done by pausing all the threads using the `stop-the-world` pause to ensure that no tasks are being added or removed while iterating over the lists. This allows for a consistent view of all task lists across all threads and is thread safe. @@ -113,8 +112,9 @@ typedef struct PyThreadState { } PyThreadState; ``` -When a task is entered or left, the current task is updated in the thread state using `enter_task` and `leave_task` functions. When `current_task(loop)` is called where `loop` is the current running event loop of the current thread, no locking is required as the current task is stored in the thread state and is returned directly. 
Otherwise, if the `loop` is not current running event loop, the `stop-the-world` pause is used to pause all threads in free-threading and then by iterating over all the thread states and checking if the `loop` matches with `tstate->asyncio_current_loop`, the current task is found and returned. If no matching thread state is found, `None` is returned. +When a task is entered or left, the current task is updated in the thread state using `enter_task` and `leave_task` functions. When `current_task(loop)` is called where `loop` is the current running event loop of the current thread, no locking is required as the current task is stored in the thread state and is returned directly (general case). Otherwise, if the `loop` is not current running event loop, the `stop-the-world` pause is used to pause all threads in free-threading and then by iterating over all the thread states and checking if the `loop` matches with `tstate->asyncio_current_loop`, the current task is found and returned. If no matching thread state is found, `None` is returned. +In free-threading, it avoids contention on a global dictionary as threads can access the current task of thier running loop without any locking. [^1]: https://github.com/python/cpython/issues/123089 From b3b5a95e10427e4152f3d9c2da2703fdc990ee69 Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 23:26:17 +0530 Subject: [PATCH 07/14] get rid of author --- InternalDocs/asyncio.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 829b7566cdfd75..9e69180147d4ae 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -1,8 +1,6 @@ asyncio ======= -Author: Kumar Aditya - This document describes the working and implementation details of C implementation of the [`asyncio`](https://docs.python.org/3/library/asyncio.html) module. 
From 32f36e2d0eddb75dd833b3f59f220ce9137c7380 Mon Sep 17 00:00:00 2001
From: Kumar Aditya
Date: Fri, 13 Jun 2025 23:28:21 +0530
Subject: [PATCH 08/14] Apply suggestions from code review

Co-authored-by: Guido van Rossum
---
 InternalDocs/asyncio.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md
index 9e69180147d4ae..4cd568ed181358 100644
--- a/InternalDocs/asyncio.md
+++ b/InternalDocs/asyncio.md
@@ -10,7 +10,7 @@ This document describes the working and implementation details of C implementati
 
 Until Python 3.13, the C implementation of `asyncio` used a [`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) to store all the tasks created by the event loop. `WeakSet` was used so that the event loop
 doesn't hold strong references to the tasks, allowing them to be garbage collected when they are no longer needed.
-The current task of the event loop was stored in dict mapping the event loop to the current task.
+The current task of the event loop was stored in a dict mapping the event loop to the current task.
 
 ```c
     /* Dictionary containing tasks that are currently active in
@@ -44,7 +44,7 @@ This implementation uses a circular doubly linked list to store tasks on the thr
 
 The `PyThreadState` structure gained a new field `asyncio_tasks_head`, which serves as the head of the circular linked list of tasks. This allows for lock free addition and removal of tasks from the list.
 
- It is possible that when a thread state is deallocated, there are lingering tasks in it's list, this can happen if another thread has references to the tasks of this thread as such the `PyInterpreterState` structure also gains a new `asyncio_tasks_head` field to store any lingering tasks. When a thread state is deallocated, any remaining lingering tasks are moved to the interpreter state tasks list, and the thread state tasks list is cleared.
+ It is possible that when a thread state is deallocated, there are lingering tasks in its list; this can happen if another thread has references to the tasks of this thread. Therefore, the `PyInterpreterState` structure also gains a new `asyncio_tasks_head` field to store any lingering tasks. When a thread state is deallocated, any remaining lingering tasks are moved to the interpreter state tasks list, and the thread state tasks list is cleared. The `asyncio_tasks_lock` is used protect the interpreter's tasks list from concurrent modifications. @@ -66,27 +66,27 @@ typedef struct PyInterpreterState { ``` -When a task is created, it is added to the current thread's list of tasks by the `register_task` function. When the task is done, it is removed from the list by the `unregister_task` function. In free-threading, the thread id of thread which +When a task is created, it is added to the current thread's list of tasks by the `register_task` function. When the task is done, it is removed from the list by the `unregister_task` function. In free-threading, the thread id of the thread which created the task is stored in `task_tid` field of the `TaskObj`. This is used to check if the task is being removed from the correct thread's task list. If the current thread is same as the thread which created it then no locking is required, otherwise in free-threading, the `stop-the-world` pause is used to pause all other threads and then safely remove the task from the tasks list. ```mermaid flowchart TD subgraph one["Executing Thread"] A["task = asyncio.create_task(coro())"] -->B("register_task(task)") - B --> C{"task->task_state"} + B --> C{"task->task_state?"} C -->|pending| D["task_step(task)"] C -->|done| F["unregister_task(task)"] C -->|cancelled| F["unregister_task(task)"] D --> C - F --> G{"free-threading"} + F --> G{"free-threading?"} G --> |false| H["unregister_task_safe(task)"] - G --> |true| J{"check correct thread
task->task_tid == _Py_ThreadId()"}
+        G --> |true| J{"correct thread? <br>task->task_tid == _Py_ThreadId()"}
         J --> |true| H
         J --> |false| I["stop the world <br> pause all threads"]
         I --> H["unregister_task_safe(task)"]
     end
     subgraph two["Thread deallocating"]
-        A1{"check thread's task list is empty <br> llist_empty(tstate->asyncio_tasks_head)"}
+        A1{"thread's task list empty? <br> llist_empty(tstate->asyncio_tasks_head)"}```
         A1 --> |true| B1["deallocate thread<br>free_threadstate(tstate)"]
         A1 --> |false| C1["add tasks to interpreter's task list
llist_concat(&tstate->interp->asyncio_tasks_head,tstate->asyncio_tasks_head)"] C1 --> B1 From 5c25673cb96d3991a1e2f16f89f1a101320fff21 Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 23:30:17 +0530 Subject: [PATCH 09/14] line fixes --- InternalDocs/asyncio.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 9e69180147d4ae..a820c6d24975a1 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -58,12 +58,12 @@ typedef struct PyThreadState { ... struct llist_node asyncio_tasks_head; } PyThreadState; + typedef struct PyInterpreterState { ... struct llist_node asyncio_tasks_head; PyMutex asyncio_tasks_lock; } PyInterpreterState; - ``` When a task is created, it is added to the current thread's list of tasks by the `register_task` function. When the task is done, it is removed from the list by the `unregister_task` function. In free-threading, the thread id of thread which From b78c68b25824ff676d0b3413786d32313eca4ea7 Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Fri, 13 Jun 2025 23:56:36 +0530 Subject: [PATCH 10/14] explain why storing data per thread was chosen over per loop --- InternalDocs/asyncio.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 095207a6f439d5..232edd6305c2ed 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -37,6 +37,9 @@ To address these issues, Python 3.14 implements several changes to improve the p - **Per-thread current task**: Python 3.14 stores the current task on the current thread state instead of a global dictionary. This allows for faster access to the current task without the need for a dictionary lookup. Each thread maintains its own current task, which is stored in the `PyThreadState` structure. This was implemented in https://github.com/python/cpython/issues/129898. 
+Storing the current task and list of all tasks per-thread instead of storing it per-loop was chosen primarily to support external introspection tools such as `python -m asyncio pstree` as looking up arbitrary attributes on the loop object +is not possible externally. Storing data per-thread also makes it easy to support third party event loop implementations such as `uvloop` and is more efficient for single threaded asyncio use-case as it avoids the overhead of attribute lookups on the loop object and several other calls on the performance critical path of adding and removing tasks from the per-loop task list. + ## Per-thread double linked list for tasks From 1c475c94a145a4afc9a577f9faf221830500b933 Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Sat, 14 Jun 2025 00:09:39 +0530 Subject: [PATCH 11/14] fix mermaid flowchart --- InternalDocs/asyncio.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 232edd6305c2ed..010e53dc0cd635 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -73,6 +73,7 @@ When a task is created, it is added to the current thread's list of tasks by the created the task is stored in `task_tid` field of the `TaskObj`. This is used to check if the task is being removed from the correct thread's task list. If the current thread is same as the thread which created it then no locking is required, otherwise in free-threading, the `stop-the-world` pause is used to pause all other threads and then safely remove the task from the tasks list. ```mermaid + flowchart TD subgraph one["Executing Thread"] A["task = asyncio.create_task(coro())"] -->B("register_task(task)") @@ -89,7 +90,7 @@ flowchart TD I --> H["unregister_task_safe(task)"] end subgraph two["Thread deallocating"] - A1{"thread's task list empty?
llist_empty(tstate->asyncio_tasks_head)"}```
+        A1{"thread's task list empty? <br> llist_empty(tstate->asyncio_tasks_head)"}
         A1 --> |true| B1["deallocate thread<br>free_threadstate(tstate)"]
         A1 --> |false| C1["add tasks to interpreter's task list
llist_concat(&tstate->interp->asyncio_tasks_head,tstate->asyncio_tasks_head)"] C1 --> B1 From bf02a8b340bbf0b2e55eb221fb3238edc828881f Mon Sep 17 00:00:00 2001 From: Kumar Aditya Date: Sun, 15 Jun 2025 12:18:52 +0530 Subject: [PATCH 12/14] minor edits --- InternalDocs/asyncio.md | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 010e53dc0cd635..62990ad28eb8be 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -8,7 +8,7 @@ This document describes the working and implementation details of C implementati ## Pre-Python 3.14 implementation -Until Python 3.13, the C implementation of `asyncio` used a [`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) to store all the tasks created by the event loop. `WeakSet` was used so that the event loop +Before Python 3.14, the C implementation of `asyncio` used a [`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) to store all the tasks created by the event loop. `WeakSet` was used so that the event loop doesn't hold strong references to the tasks, allowing them to be garbage collected when they are no longer needed. The current task of the event loop was stored in a dict mapping the event loop to the current task. @@ -22,10 +22,10 @@ The current task of the event loop was stored in a dict mapping the event loop t ``` This implementation had a few drawbacks: -1. **Performance**: Using a `WeakSet` for storing tasks is inefficient as it requires maintaining a full set of weak references to tasks along with corresponding weakref callback to cleanup the tasks when they are garbage collected. -This increases the work done by the garbage collector and in applications with a large number of tasks, this becomes a bottle neck, with increased memory usage and lower performance. Looking up the current task was slow as it required a dictionary lookup on the `current_tasks` dict. +1. 
**Performance**: Using a `WeakSet` for storing tasks is inefficient, as it requires maintaining a full set of weak references to tasks along with corresponding weakref callback to cleanup the tasks when they are garbage collected. +This increases the work done by the garbage collector, and in applications with a large number of tasks, this becomes a bottle neck, with increased memory usage and lower performance. Looking up the current task was slow as it required a dictionary lookup on the `current_tasks` dict. -2. **Thread safety**: Until Python 3.14, concurrent iterations over `WeakSet` was not thread safe[^1]. This meant calling APIs like `asyncio.all_tasks()` could lead to inconsistent results or even `RuntimeError` if used in multiple threads[^2]. +2. **Thread safety**: Before Python 3.14, concurrent iterations over `WeakSet` was not thread safe[^1]. This meant calling APIs like `asyncio.all_tasks()` could lead to inconsistent results or even `RuntimeError` if used in multiple threads[^2]. 3. **Poor scaling in free-threading**: Using global `WeakSet` for storing all tasks across all threads lead to contention when adding and removing tasks from the set which is a frequent operation. As such it performed poorly in free-threading and did not scale well with the number of threads. Similarly accessing the current task in multiple threads did not scale due to contention on the global `current_tasks` dictionary. @@ -38,7 +38,7 @@ To address these issues, Python 3.14 implements several changes to improve the p - **Per-thread current task**: Python 3.14 stores the current task on the current thread state instead of a global dictionary. This allows for faster access to the current task without the need for a dictionary lookup. Each thread maintains its own current task, which is stored in the `PyThreadState` structure. This was implemented in https://github.com/python/cpython/issues/129898. 
Storing the current task and list of all tasks per-thread instead of storing it per-loop was chosen primarily to support external introspection tools such as `python -m asyncio pstree` as looking up arbitrary attributes on the loop object -is not possible externally. Storing data per-thread also makes it easy to support third party event loop implementations such as `uvloop` and is more efficient for single threaded asyncio use-case as it avoids the overhead of attribute lookups on the loop object and several other calls on the performance critical path of adding and removing tasks from the per-loop task list. +is not possible externally. Storing data per-thread also makes it easy to support third party event loop implementations such as `uvloop`, and is more efficient for single threaded asyncio use-case as it avoids the overhead of attribute lookups on the loop object and several other calls on the performance critical path of adding and removing tasks from the per-loop task list. ## Per-thread double linked list for tasks @@ -47,7 +47,7 @@ This implementation uses a circular doubly linked list to store tasks on the thr The `PyThreadState` structure gained a new field `asyncio_tasks_head`, which serves as the head of the circular linked list of tasks. This allows for lock free addition and removal of tasks from the list. - It is possible that when a thread state is deallocated, there are lingering tasks in its list; this can happen if another thread has references to the tasks of this thread. Therefore, the `PyInterpreterState` structure also gains a new `asyncio_tasks_head` field to store any lingering tasks. When a thread state is deallocated, any remaining lingering tasks are moved to the interpreter state tasks list, and the thread state tasks list is cleared. +It is possible that when a thread state is deallocated, there are lingering tasks in its list; this can happen if another thread has references to the tasks of this thread. 
Therefore, the `PyInterpreterState` structure also gains a new `asyncio_tasks_head` field to store any lingering tasks. When a thread state is deallocated, any remaining lingering tasks are moved to the interpreter state tasks list, and the thread state tasks list is cleared. The `asyncio_tasks_lock` is used protect the interpreter's tasks list from concurrent modifications. @@ -69,8 +69,7 @@ typedef struct PyInterpreterState { } PyInterpreterState; ``` -When a task is created, it is added to the current thread's list of tasks by the `register_task` function. When the task is done, it is removed from the list by the `unregister_task` function. In free-threading, the thread id of the thread which -created the task is stored in `task_tid` field of the `TaskObj`. This is used to check if the task is being removed from the correct thread's task list. If the current thread is same as the thread which created it then no locking is required, otherwise in free-threading, the `stop-the-world` pause is used to pause all other threads and then safely remove the task from the tasks list. +When a task is created, it is added to the current thread's list of tasks by the `register_task` function. When the task is done, it is removed from the list by the `unregister_task` function. In free-threading, the thread id of the thread which created the task is stored in `task_tid` field of the `TaskObj`. This is used to check if the task is being removed from the correct thread's task list. If the current thread is same as the thread which created it then no locking is required, otherwise in free-threading, the `stop-the-world` pause is used to pause all other threads and then safely remove the task from the tasks list. ```mermaid @@ -99,7 +98,7 @@ flowchart TD one --> two ``` -`asyncio.all_tasks` now iterates over the per-thread task lists of all threads and the interpreter's task list to get all the tasks. 
In free-threading this is done by pausing all the threads using the `stop-the-world` pause to ensure that no tasks are being added or removed while iterating over the lists. This allows for a consistent view of all task lists across all threads and is thread safe. +`asyncio.all_tasks` now iterates over the per-thread task lists of all threads and the interpreter's task list to get all the tasks. In free-threading, this is done by pausing all the threads using the `stop-the-world` pause to ensure that no tasks are being added or removed while iterating over the lists. This allows for a consistent view of all task lists across all threads and is thread safe. This design allows for lock free execution and scales well in free-threading with multiple event loops running in different threads. From ce48a131bd32270b3a007ed67e5f850b61d83476 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Tue, 17 Jun 2025 08:28:14 -0700 Subject: [PATCH 13/14] Apply Carol's edits Co-authored-by: Carol Willing --- InternalDocs/asyncio.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index 62990ad28eb8be..fff57555d4bddf 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -23,11 +23,11 @@ The current task of the event loop was stored in a dict mapping the event loop t This implementation had a few drawbacks: 1. **Performance**: Using a `WeakSet` for storing tasks is inefficient, as it requires maintaining a full set of weak references to tasks along with corresponding weakref callback to cleanup the tasks when they are garbage collected. -This increases the work done by the garbage collector, and in applications with a large number of tasks, this becomes a bottle neck, with increased memory usage and lower performance. Looking up the current task was slow as it required a dictionary lookup on the `current_tasks` dict. 
+This increases the work done by the garbage collector, and in applications with a large number of tasks, this becomes a bottleneck, with increased memory usage and lower performance. Looking up the current task was slow as it required a dictionary lookup on the `current_tasks` dict. 2. **Thread safety**: Before Python 3.14, concurrent iterations over `WeakSet` was not thread safe[^1]. This meant calling APIs like `asyncio.all_tasks()` could lead to inconsistent results or even `RuntimeError` if used in multiple threads[^2]. -3. **Poor scaling in free-threading**: Using global `WeakSet` for storing all tasks across all threads lead to contention when adding and removing tasks from the set which is a frequent operation. As such it performed poorly in free-threading and did not scale well with the number of threads. Similarly accessing the current task in multiple threads did not scale due to contention on the global `current_tasks` dictionary. +3. **Poor scaling in free-threading**: Using global `WeakSet` for storing all tasks across all threads lead to contention when adding and removing tasks from the set which is a frequent operation. As such it performed poorly in free-threading and did not scale well with the number of threads. Similarly, accessing the current task in multiple threads did not scale due to contention on the global `current_tasks` dictionary. ## Python 3.14 implementation @@ -38,7 +38,7 @@ To address these issues, Python 3.14 implements several changes to improve the p - **Per-thread current task**: Python 3.14 stores the current task on the current thread state instead of a global dictionary. This allows for faster access to the current task without the need for a dictionary lookup. Each thread maintains its own current task, which is stored in the `PyThreadState` structure. This was implemented in https://github.com/python/cpython/issues/129898. 
Storing the current task and list of all tasks per-thread instead of storing it per-loop was chosen primarily to support external introspection tools such as `python -m asyncio pstree` as looking up arbitrary attributes on the loop object -is not possible externally. Storing data per-thread also makes it easy to support third party event loop implementations such as `uvloop`, and is more efficient for single threaded asyncio use-case as it avoids the overhead of attribute lookups on the loop object and several other calls on the performance critical path of adding and removing tasks from the per-loop task list. +is not possible externally. Storing data per-thread also makes it easy to support third party event loop implementations such as `uvloop`, and is more efficient for the single threaded asyncio use-case as it avoids the overhead of attribute lookups on the loop object and several other calls on the performance critical path of adding and removing tasks from the per-loop task list. ## Per-thread double linked list for tasks From 95d2954cff9d68480fed17071ff24c3a58f0dd55 Mon Sep 17 00:00:00 2001 From: Guido van Rossum Date: Tue, 17 Jun 2025 08:54:22 -0700 Subject: [PATCH 14/14] Reflow lines to 80 chars (except URLs and Mermaid) --- InternalDocs/asyncio.md | 152 +++++++++++++++++++++++++++++++--------- 1 file changed, 120 insertions(+), 32 deletions(-) diff --git a/InternalDocs/asyncio.md b/InternalDocs/asyncio.md index fff57555d4bddf..b60fe70478a6bc 100644 --- a/InternalDocs/asyncio.md +++ b/InternalDocs/asyncio.md @@ -2,15 +2,20 @@ asyncio ======= -This document describes the working and implementation details of C implementation of the +This document describes the working and implementation details of C +implementation of the [`asyncio`](https://docs.python.org/3/library/asyncio.html) module. 
## Pre-Python 3.14 implementation -Before Python 3.14, the C implementation of `asyncio` used a [`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) to store all the tasks created by the event loop. `WeakSet` was used so that the event loop -doesn't hold strong references to the tasks, allowing them to be garbage collected when they are no longer needed. -The current task of the event loop was stored in a dict mapping the event loop to the current task. +Before Python 3.14, the C implementation of `asyncio` used a +[`WeakSet`](https://docs.python.org/3/library/weakref.html#weakref.WeakSet) +to store all the tasks created by the event loop. `WeakSet` was used +so that the event loop doesn't hold strong references to the tasks, +allowing them to be garbage collected when they are no longer needed. +The current task of the event loop was stored in a dict mapping the +event loop to the current task. ```c /* Dictionary containing tasks that are currently active in @@ -22,34 +27,87 @@ The current task of the event loop was stored in a dict mapping the event loop t ``` This implementation had a few drawbacks: -1. **Performance**: Using a `WeakSet` for storing tasks is inefficient, as it requires maintaining a full set of weak references to tasks along with corresponding weakref callback to cleanup the tasks when they are garbage collected. -This increases the work done by the garbage collector, and in applications with a large number of tasks, this becomes a bottleneck, with increased memory usage and lower performance. Looking up the current task was slow as it required a dictionary lookup on the `current_tasks` dict. -2. **Thread safety**: Before Python 3.14, concurrent iterations over `WeakSet` was not thread safe[^1]. This meant calling APIs like `asyncio.all_tasks()` could lead to inconsistent results or even `RuntimeError` if used in multiple threads[^2]. - -3. 
**Poor scaling in free-threading**: Using global `WeakSet` for storing all tasks across all threads lead to contention when adding and removing tasks from the set which is a frequent operation. As such it performed poorly in free-threading and did not scale well with the number of threads. Similarly, accessing the current task in multiple threads did not scale due to contention on the global `current_tasks` dictionary. +1. **Performance**: Using a `WeakSet` for storing tasks is +inefficient, as it requires maintaining a full set of weak references +to tasks along with corresponding weakref callbacks to clean up the +tasks when they are garbage collected. This increases the work done +by the garbage collector, and in applications with a large number of +tasks, this becomes a bottleneck, with increased memory usage and +lower performance. Looking up the current task was slow as it required +a dictionary lookup on the `current_tasks` dict. + +2. **Thread safety**: Before Python 3.14, concurrent iteration over +`WeakSet` was not thread safe[^1]. This meant calling APIs like +`asyncio.all_tasks()` could lead to inconsistent results or even +`RuntimeError` if used in multiple threads[^2]. + +3. **Poor scaling in free-threading**: Using a global `WeakSet` for +storing all tasks across all threads led to contention when adding +and removing tasks from the set, which is a frequent operation. As such +it performed poorly in free-threading and did not scale well with the +number of threads. Similarly, accessing the current task in multiple +threads did not scale due to contention on the global `current_tasks` +dictionary. ## Python 3.14 implementation -To address these issues, Python 3.14 implements several changes to improve the performance and thread safety of tasks management. - -- **Per-thread double linked list for tasks**: Python 3.14 introduces a per-thread circular double linked list implementation for storing tasks.
This allows each thread to maintain its own list of tasks and allows for lock free addition and removal of tasks. This is designed to be efficient, and thread-safe and scales well with the number of threads in free-threading. This also allows external introspection tools such as `python -m asyncio pstree` to inspect tasks running in all threads and was implemented as part of [Audit asyncio thread safety](https://github.com/python/cpython/issues/128002). - -- **Per-thread current task**: Python 3.14 stores the current task on the current thread state instead of a global dictionary. This allows for faster access to the current task without the need for a dictionary lookup. Each thread maintains its own current task, which is stored in the `PyThreadState` structure. This was implemented in https://github.com/python/cpython/issues/129898. - -Storing the current task and list of all tasks per-thread instead of storing it per-loop was chosen primarily to support external introspection tools such as `python -m asyncio pstree` as looking up arbitrary attributes on the loop object -is not possible externally. Storing data per-thread also makes it easy to support third party event loop implementations such as `uvloop`, and is more efficient for the single threaded asyncio use-case as it avoids the overhead of attribute lookups on the loop object and several other calls on the performance critical path of adding and removing tasks from the per-loop task list. - +To address these issues, Python 3.14 implements several changes to +improve the performance and thread safety of task management. + +- **Per-thread double linked list for tasks**: Python 3.14 introduces + a per-thread circular double linked list implementation for + storing tasks. This allows each thread to maintain its own list of + tasks and allows for lock free addition and removal of tasks. This + is designed to be efficient and thread-safe, and scales well with + the number of threads in free-threading.
This also allows external + introspection tools such as `python -m asyncio pstree` to inspect + tasks running in all threads and was implemented as part of [Audit + asyncio thread + safety](https://github.com/python/cpython/issues/128002). + +- **Per-thread current task**: Python 3.14 stores the current task on + the current thread state instead of a global dictionary. This + allows for faster access to the current task without the need for + a dictionary lookup. Each thread maintains its own current task, + which is stored in the `PyThreadState` structure. This was + implemented in https://github.com/python/cpython/issues/129898. + +Storing the current task and list of all tasks per-thread instead of +storing it per-loop was chosen primarily to support external +introspection tools such as `python -m asyncio pstree` as looking up +arbitrary attributes on the loop object is not possible +externally. Storing data per-thread also makes it easy to support +third party event loop implementations such as `uvloop`, and is more +efficient for the single threaded asyncio use-case as it avoids the +overhead of attribute lookups on the loop object and several other +calls on the performance critical path of adding and removing tasks +from the per-loop task list. ## Per-thread double linked list for tasks -This implementation uses a circular doubly linked list to store tasks on the thread states. This is used for all tasks which are instances of `asyncio.Task` or subclasses of it, for third-party tasks a fallback `WeakSet` implementation is used. The linked list is implemented using an embedded `llist_node` structure within each `TaskObj`. By embedding the list node directly into the task object, the implementation avoids additional memory allocations for linked list nodes. - -The `PyThreadState` structure gained a new field `asyncio_tasks_head`, which serves as the head of the circular linked list of tasks. 
This allows for lock free addition and removal of tasks from the list. - -It is possible that when a thread state is deallocated, there are lingering tasks in its list; this can happen if another thread has references to the tasks of this thread. Therefore, the `PyInterpreterState` structure also gains a new `asyncio_tasks_head` field to store any lingering tasks. When a thread state is deallocated, any remaining lingering tasks are moved to the interpreter state tasks list, and the thread state tasks list is cleared. -The `asyncio_tasks_lock` is used protect the interpreter's tasks list from concurrent modifications. - +This implementation uses a circular doubly linked list to store tasks +on the thread states. This is used for all tasks which are instances +of `asyncio.Task` or subclasses of it; for third-party tasks a +fallback `WeakSet` implementation is used. The linked list is +implemented using an embedded `llist_node` structure within each +`TaskObj`. By embedding the list node directly into the task object, +the implementation avoids additional memory allocations for linked +list nodes. + +The `PyThreadState` structure gained a new field `asyncio_tasks_head`, +which serves as the head of the circular linked list of tasks. This +allows for lock free addition and removal of tasks from the list. + +It is possible that when a thread state is deallocated, there are +lingering tasks in its list; this can happen if another thread has +references to the tasks of this thread. Therefore, the +`PyInterpreterState` structure also gains a new `asyncio_tasks_head` +field to store any lingering tasks. When a thread state is +deallocated, any remaining lingering tasks are moved to the +interpreter state tasks list, and the thread state tasks list is +cleared. The `asyncio_tasks_lock` is used to protect the interpreter's +tasks list from concurrent modifications.
```c typedef struct TaskObj { @@ -69,7 +127,16 @@ typedef struct PyInterpreterState { } PyInterpreterState; ``` -When a task is created, it is added to the current thread's list of tasks by the `register_task` function. When the task is done, it is removed from the list by the `unregister_task` function. In free-threading, the thread id of the thread which created the task is stored in `task_tid` field of the `TaskObj`. This is used to check if the task is being removed from the correct thread's task list. If the current thread is same as the thread which created it then no locking is required, otherwise in free-threading, the `stop-the-world` pause is used to pause all other threads and then safely remove the task from the tasks list. +When a task is created, it is added to the current thread's list of +tasks by the `register_task` function. When the task is done, it is +removed from the list by the `unregister_task` function. In +free-threading, the thread id of the thread which created the task is +stored in the `task_tid` field of the `TaskObj`. This is used to check if +the task is being removed from the correct thread's task list. If the +current thread is the same as the thread which created it, no locking +is required; otherwise in free-threading, the `stop-the-world` pause +is used to pause all other threads and then safely remove the task +from the tasks list. ```mermaid @@ -98,12 +165,21 @@ flowchart TD one --> two ``` -`asyncio.all_tasks` now iterates over the per-thread task lists of all threads and the interpreter's task list to get all the tasks. In free-threading, this is done by pausing all the threads using the `stop-the-world` pause to ensure that no tasks are being added or removed while iterating over the lists. This allows for a consistent view of all task lists across all threads and is thread safe. +`asyncio.all_tasks` now iterates over the per-thread task lists of all +threads and the interpreter's task list to get all the tasks.
In +free-threading, this is done by pausing all the threads using the +`stop-the-world` pause to ensure that no tasks are being added or +removed while iterating over the lists. This allows for a consistent +view of all task lists across all threads and is thread safe. -This design allows for lock free execution and scales well in free-threading with multiple event loops running in different threads. +This design allows for lock free execution and scales well in +free-threading with multiple event loops running in different threads. ## Per-thread current task -This implementation stores the current task in the `PyThreadState` structure, which allows for faster access to the current task without the need for a dictionary lookup. + +This implementation stores the current task in the `PyThreadState` +structure, which allows for faster access to the current task without +the need for a dictionary lookup. ```c typedef struct PyThreadState { @@ -113,9 +189,21 @@ typedef struct PyThreadState { } PyThreadState; ``` -When a task is entered or left, the current task is updated in the thread state using `enter_task` and `leave_task` functions. When `current_task(loop)` is called where `loop` is the current running event loop of the current thread, no locking is required as the current task is stored in the thread state and is returned directly (general case). Otherwise, if the `loop` is not current running event loop, the `stop-the-world` pause is used to pause all threads in free-threading and then by iterating over all the thread states and checking if the `loop` matches with `tstate->asyncio_current_loop`, the current task is found and returned. If no matching thread state is found, `None` is returned. - -In free-threading, it avoids contention on a global dictionary as threads can access the current task of thier running loop without any locking. 
+When a task is entered or left, the current task is updated in the +thread state using `enter_task` and `leave_task` functions. When +`current_task(loop)` is called where `loop` is the current running +event loop of the current thread, no locking is required as the +current task is stored in the thread state and is returned directly +(general case). Otherwise, if the `loop` is not the current running event +loop, the `stop-the-world` pause is used to pause all threads in +free-threading and then by iterating over all the thread states and +checking if the `loop` matches with `tstate->asyncio_current_loop`, +the current task is found and returned. If no matching thread state is +found, `None` is returned. + +In free-threading, it avoids contention on a global dictionary as +threads can access the current task of their running loop without any +locking. [^1]: https://github.com/python/cpython/issues/123089