v2.0: Process-per-context architecture with performance improvements #10
Open
Implement a general-purpose worker thread pool that eliminates per-request GIL acquisition overhead. Each worker holds the GIL (or has its own subinterpreter with OWN_GIL on Python 3.12+) and processes requests from a shared MPSC queue.
Key features:
- Sync API: call, apply, eval, exec, asgi_run, wsgi_run
- Async API: all *_async variants returning request_id for non-blocking calls
- await/1,2 for waiting on async results
- Per-worker module caching to avoid reimport overhead
- Support for FREE_THREADED (3.13+), SUBINTERP (3.12+), and FALLBACK modes
- Fix potential crash when locals_term is uninitialized (check for 0)
- Add benchmark results directory with baseline comparisons
Known issue: ~0.5-1% of concurrent sync calls may time out under high load (100+ concurrent callers). The async API is unaffected.
1. Use-after-free on request_id: Save request_id BEFORE enqueueing the request to the worker pool. Once enqueued, a worker can process and free the request at any time. Accessing req->request_id after py_pool_enqueue() is undefined behavior.
2. Double-free of msg_env: After a successful enif_send(), the message environment is consumed/invalidated by the Erlang runtime. We must set req->msg_env = NULL to prevent py_pool_request_free() from calling enif_free_env() on an already-freed environment.
These bugs caused ~0.5-1% of concurrent calls to time out under high load because request IDs could be corrupted, leading to message/response mismatch.
Also adds debug counters (responses_sent, responses_failed) to pool stats for monitoring send success rate.
Changed py_pool_process_asgi to call run_asgi(module_name, callable_name, scope, body) instead of run(app, scope, body), matching hornbeam's hornbeam_asgi_runner interface. Also updated extract_asgi_response to handle both dict and tuple return formats, supporting hornbeam's dict-based response.
- Add compile-time detection of PyInterpreterConfig_OWN_GIL (Python 3.12+)
- Add mutex to py_subinterp_worker_t for thread-safe parallel access
- Add nif_subinterp_asgi_run for ASGI on subinterpreters
- Add py_resource_pool module with lock-free round-robin scheduling
- Benchmark shows 8-10x improvement with subinterpreters enabled
Replace worker pool with process-per-context model where each Python context is owned by a dedicated Erlang process. Enables reentrant callbacks via suspension-based mechanism without deadlock.
- Add py_context.erl with recursive receive pattern for inline callback handling
- Add py_context_router.erl for scheduler-affinity based routing
- Add nif_context_resume for Python replay with cached callback results
- Support sequential callbacks via callback_results array accumulation
- Remove old pool modules (py_pool, py_worker, py_worker_pool, etc.)
- Pass timeout parameter through py:eval/3 and do_call/5
- Add py:contexts_started/0 and py_context_router:is_started/0
- Fix test_timeout to use time.sleep for reliable delay
- Fix thread callback suite to check existing contexts
When the application restarts, py_thread_handler registers as the new coordinator, but existing thread workers in the NIF-level pool still had has_handler=true from the previous run. This caused them to skip spawning new handler processes and write to dead pipes. Reset has_handler=false on all existing workers when a new coordinator is registered.
Two fixes:
1. suspended_context_state_destructor: For subinterpreters with OWN_GIL, use PyThreadState_Swap to switch to the correct interpreter before releasing Python objects. PyGILState_Ensure only works for the main interpreter and causes memory corruption with subinterpreter objects.
2. thread_worker_set_coordinator: Reset has_handler=false on all existing workers when a new coordinator registers (e.g., after app restart). Old workers kept has_handler=true but their handler processes were dead.
- Rename priv/erlang/ to priv/_erlang_impl/ to avoid C module shadowing
- Add _extend_erlang_module() helper in py_callback.c to re-export Python package functions (run, new_event_loop, EventLoopPolicy, etc.)
- Update py_event_loop.erl to call extension during initialization
- Delete buggy erlang_asyncio.py (blocking sleep replaced by proper asyncio.sleep backed by Erlang timers via call_later)
- Add test infrastructure in priv/tests/ for event loop integration
The unified erlang module now provides a uvloop-compatible API:
- erlang.run(coro) - run async code with Erlang event loop
- erlang.new_event_loop() - create ErlangEventLoop instance
- erlang.install() - install ErlangEventLoopPolicy (deprecated 3.12+)
- erlang.call() / erlang.async_call() - call Erlang functions
- asyncio.sleep() works via Erlang timers
- Update py_erlang_sleep_SUITE to use erlang.run() with standard asyncio instead of the removed erlang_asyncio module
- Skip py_asyncio_compat_SUITE: tests create standalone ErlangEventLoop instances via erlang.new_event_loop() and call loop.run_forever(). Timer scheduling for standalone loops needs work: timers fire immediately instead of after the scheduled delay.
- Add isolated parameter to ErlangEventLoop.__init__() that creates a per-loop capsule via _loop_new() for proper event routing
- Update all loop methods (call_at, _run_once, stop, close, add_reader, remove_reader, add_writer, remove_writer) to use per-loop capsule APIs when running as isolated instance
- new_event_loop() now passes isolated=True by default
- Fix run_forever() to honor stop() called before run_forever() by not resetting _stopping flag at start
- Simplify async_test_runner to run tests synchronously without erlang.run() wrapper, avoiding nested event loop issues
- Add timeout fallback to test_add_remove_writer to prevent hanging
- Remove skip from py_asyncio_compat_SUITE to enable tests
Test results: 46 tests run, 42 passed, 4 failures (edge cases)
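The run_forever() fix above brings ErlangEventLoop in line with what the standard asyncio loop does when stop() is requested before the loop starts: it runs a single pass and returns. A minimal sketch of that reference behavior, using the stdlib loop rather than ErlangEventLoop (the function name run_one_pass is illustrative only):

```python
import asyncio


def run_one_pass():
    """Call stop() before run_forever() and report which callbacks ran."""
    loop = asyncio.new_event_loop()
    ran = []
    try:
        # Scheduled before the loop starts; it should still run once.
        loop.call_soon(ran.append, "already-scheduled")
        loop.stop()          # stop requested while the loop is not running
        loop.run_forever()   # returns after one pass instead of blocking
    finally:
        loop.close()
    return ran


if __name__ == "__main__":
    print(run_one_pass())
```

If the loop reset its stopping flag at the top of run_forever(), this call would block forever, which is the failure mode the fix addresses.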
The pthread+usleep polling async workers have been replaced with an event-driven model using py_event_loop and enif_select:
- Add _run_and_send wrapper in Python for result delivery via erlang.send()
- Add nif_event_loop_run_async NIF for direct coroutine submission
- Add py_event_loop:run_async/2 Erlang API
- Add py_event_loop_pool.erl for managing event loop-based async execution
- Rewrite py_async_pool.erl to delegate to event_loop_pool
- Update supervisor tree to include py_event_loop_pool
- Remove py_async_worker.erl and py_async_worker_sup.erl
- Stub deprecated async_worker NIFs to return errors
- Remove async_event_loop_thread and async_future_callback C code
Performance improvements:
- Latency: ~10-20ms polling -> <1ms (enif_select)
- CPU idle: 100 wakeups/sec -> Zero
- Threads: N pthreads -> 0 extra threads
API unchanged: py:async_call/3,4 and py:await/1,2 work the same.
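For reviewers unfamiliar with the wrapper idea, here is a rough sketch of what a "run the coroutine, then send the outcome" helper can look like. This is not the code in the PR: the send callable stands in for erlang.send(), and the ("async_result", ref, ...) message shape, the caller value, and the function names are assumptions made for illustration.

```python
import asyncio


async def run_and_send(coro, send, caller, ref):
    """Await coro, then deliver the outcome through send(caller, message).

    `send` is a stand-in for erlang.send(); the message layout is assumed.
    """
    try:
        result = await coro
    except Exception as exc:
        send(caller, ("async_result", ref, ("error", repr(exc))))
    else:
        send(caller, ("async_result", ref, ("ok", result)))


async def add(a, b):
    await asyncio.sleep(0)  # yield to the loop once, like real async work
    return a + b


async def fail():
    raise ValueError("boom")


def demo():
    outbox = []

    def fake_send(pid, msg):
        # Records messages instead of sending them to an Erlang process.
        outbox.append((pid, msg))

    asyncio.run(run_and_send(add(1, 2), fake_send, "pid-1", 7))
    asyncio.run(run_and_send(fail(), fake_send, "pid-1", 8))
    return outbox


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

Because the result is pushed to the caller when the coroutine finishes, nothing needs to poll for completion, which is where the latency and idle-CPU numbers above come from.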
Replace global variables with module state structure stored in the Python module, enabling proper per-interpreter/per-context event loop isolation.
Changes:
- Add py_event_loop_module_state_t struct containing event_loop, shared_router, shared_router_valid, and isolation_mode
- Update PyModuleDef to allocate module state (m_size)
- Update get_interpreter_event_loop() to read from module state
- Update set_interpreter_event_loop() to write to module state
- Update nif_set_python_event_loop() to use module state
- Update nif_set_isolation_mode() to use module state
- Update nif_set_shared_router() to use module state
- Update py_get_isolation_mode() to read from module state
- Update py_loop_new() to read shared_router from module state
- Update event_loop_destructor() to clear module state
- Update create_default_event_loop() to use module state
- Remove g_python_event_loop, g_shared_router, g_shared_router_valid, and g_isolation_mode global variables
- Remove erlang_loop.py, use _erlang_impl as the single implementation
- Add get_event_loop_policy() export to _erlang_impl and erlang module
- Fix signal tests: ErlangEventLoop has limited signal support (SIGINT, SIGTERM, SIGHUP only), other signals raise ValueError
- Skip subprocess tests for Erlang (not yet implemented)
- Update all imports to use erlang module (public API) with _erlang_impl as internal fallback
- Update docs and examples to use erlang module imports
- test_run_until_complete_nested_raises: Use asyncio.sleep(0.1) to ensure timer path (not fast path), properly close coroutine in finally block
- test_run_until_complete_on_closed_raises: Store coroutine in variable and close it in finally block
- tearDown: Cancel pending tasks and shutdown async generators before closing loop to prevent resource leaks
- Add test_asyncio_sleep_zero_fast_path: Verify sleep(0) uses fast path
- test_add_remove_writer: Use socketpair for reliable write readiness
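The "store the coroutine and close it in finally" pattern from the list above can be sketched against the stdlib loop as follows. The helper names are illustrative and the sketch uses asyncio's default loop, not ErlangEventLoop:

```python
import asyncio


async def noop():
    return None


def run_on_closed_loop():
    """Try to run a coroutine on a closed loop and return the error text."""
    loop = asyncio.new_event_loop()
    loop.close()
    coro = noop()  # keep a reference so the coroutine can be closed below
    try:
        loop.run_until_complete(coro)
    except RuntimeError as exc:
        return str(exc)
    finally:
        # The loop never started the coroutine; closing it explicitly
        # avoids the "coroutine ... was never awaited" RuntimeWarning.
        coro.close()
    return None


if __name__ == "__main__":
    print(run_on_closed_loop())
```

Passing noop() inline to run_until_complete() would leave no handle to close, so the warning would surface when the coroutine object is garbage collected.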
- Share fd_resource per fd to prevent enif_select stealing errors
- Add NIF functions for fd resource management
- Use send() instead of sendto() for connected UDP sockets
- Fix TCP EOF handling to call connection_lost properly
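As background for the send()/sendto() item: once a datagram socket has been connect()ed, the peer is fixed, and POSIX allows sendto() with an explicit destination to fail with EISCONN on such a socket (whether it actually does is platform-dependent). A small loopback sketch with the stdlib socket module, where udp_roundtrip is a name made up for this example:

```python
import socket


def udp_roundtrip(payload):
    """Send payload over a connected UDP socket on loopback and read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
        receiver.settimeout(2.0)          # do not hang if the datagram is lost
        sender.connect(receiver.getsockname())
        sender.send(payload)              # no address: connect() fixed the peer
        data, _addr = receiver.recvfrom(1024)
        return data
    finally:
        sender.close()
        receiver.close()


if __name__ == "__main__":
    print(udp_roundtrip(b"ping"))
```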
await coro() runs in shared context (changes visible to caller), while create_task(coro()) runs in copied context (changes isolated). Updated test_context_in_task and test_multiple_context_vars to reflect correct Python behavior.
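The distinction can be reproduced with plain asyncio and contextvars, independent of the Erlang event loop. A minimal sketch (variable and function names are illustrative):

```python
import asyncio
import contextvars

var = contextvars.ContextVar("var", default="initial")


async def mutate(value):
    var.set(value)


async def main():
    # Directly awaited: runs in the caller's context, so the set() is visible.
    await mutate("set-by-await")
    after_await = var.get()

    # Wrapped in a task: runs in a copy of the context, so the set() stays
    # inside the task and the caller keeps its previous value.
    await asyncio.create_task(mutate("set-by-task"))
    after_task = var.get()
    return after_await, after_task


if __name__ == "__main__":
    print(asyncio.run(main()))  # ('set-by-await', 'set-by-await')
```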
Subprocess is not supported because Python's subprocess module uses fork() which corrupts the Erlang VM when called from within the NIF. Users should use Erlang ports directly via erlang.call() instead, which provides superior subprocess management with built-in supervision, monitoring, and fault tolerance.
Changes:
- Replace _subprocess.py with NotImplementedError stub and docs
- Remove subprocess event handling from _loop.py
- Remove subprocess functions from py_event_loop.c
- Update tests to verify NotImplementedError is raised
- Set HAS_SUBPROCESS_SUPPORT = False in test base
ETF encoding for pids and references:
- Add decode_etf_string() helper in py_callback.c to convert __etf__:base64 encoded strings back to Erlang terms
- Add ETF encoding in term_to_python_repr for pids and refs in py_context.erl and py_thread_handler.erl
Test fixes:
- Skip ProcessPoolExecutor test inside Erlang NIF (fork issues)
- Use 'spawn' multiprocessing context instead of 'fork'
- Accept OSError in addition to TimeoutError for connect timeout test
Cleanup:
- Remove obsolete multi_loop test files
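To make the __etf__:base64 convention concrete, the sketch below shows the string-level half of the round trip in Python: recognize the prefix, base64-decode the payload, and check the external term format version byte (131). The real helper, decode_etf_string(), lives in C and goes on to rebuild the Erlang term; this sketch stops at the raw bytes, and the function name split_etf_string is an assumption.

```python
import base64

ETF_PREFIX = "__etf__:"
ETF_VERSION = 131  # first byte of every Erlang external term format payload


def split_etf_string(value):
    """Return the raw ETF bytes if value carries the prefix, else None."""
    if not value.startswith(ETF_PREFIX):
        return None
    raw = base64.b64decode(value[len(ETF_PREFIX):], validate=True)
    if not raw or raw[0] != ETF_VERSION:
        raise ValueError("payload is not in external term format")
    return raw


if __name__ == "__main__":
    # The integer 42 as ETF: version byte, SMALL_INTEGER_EXT tag (97), value.
    sample = ETF_PREFIX + base64.b64encode(bytes([131, 97, 42])).decode("ascii")
    print(sample, split_etf_string(sample))
    print(split_etf_string("plain string"))
```

Pids and references have no natural Python repr, which is why they travel as opaque ETF bytes instead of being rendered as text.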
Implement low-level fd-based API where Erlang handles I/O scheduling via enif_select and Python handles protocol logic.
- Add priv/_erlang_impl/_reactor.py with Protocol base class and registry
- Add src/py_reactor_context.erl for Erlang reactor context process
- Expose erlang.reactor via sys.modules for 'import erlang.reactor' syntax
- Add test suite (py_reactor_SUITE.erl) with 6 tests
- Add Python tests (py_test_reactor.py) with 3 tests
- Add examples/reactor_echo.erl as usage example
Works with any fd: TCP, UDP, Unix sockets, pipes, etc.
- Add _sandbox.py with Python audit hooks (PEP 578) to block dangerous operations: fork, exec, spawn, subprocess, os.system, os.popen
- Install sandbox automatically when running inside Erlang VM
- Remove signal handling support (not applicable in Erlang context)
- Update policy to always return ErlangEventLoop
- Fix ExecutionMode test to check correct enum values
- Remove signal tests and AIO subprocess tests from test suite
New documentation:
- docs/security.md: Document audit hook sandbox, blocked operations (fork, exec, subprocess), and Erlang port alternatives
- docs/reactor.md: Document erlang.reactor module for FD-based protocol handling with Protocol base class and examples
Updated documentation:
- docs/asyncio.md: Update for unified erlang module, mark erlang.install() as deprecated in 3.12+, add Limitations section for subprocess/signal handling, add ExecutionMode documentation
- docs/getting-started.md: Add Security Considerations section, update asyncio section to use erlang.run()
- README.md: Add security sandbox to features, add doc links
Also fixed edoc errors in source files:
- src/py_nif.erl: Fix angle bracket syntax in reactor function docs
- src/py_context_router.erl: Replace markdown code blocks with <pre>
API change: py:call_async/3,4 renamed to py:cast/3,4 following gen_server convention (call=sync, cast=async).
Add benchmark_compare.erl for comparing performance between versions. Current version shows ~2-3x improvement over v1.8.1:
- Sync calls: 0.011ms -> 0.004ms (2.9x faster)
- Cast single: 0.011ms -> 0.004ms (2.8x faster)
- Throughput: ~90K -> ~250K calls/sec
Covers:
- py:call_async -> py:cast rename
- py:bind/unbind removal (use py_context_router)
- py:ctx_* removal (use py_context directly)
- erlang_asyncio -> erlang module consolidation
- Subprocess removal (use Erlang ports)
- Signal handler removal (use Erlang level)
- New features: context router, reactor, erlang.send()
- Performance comparison table
- Add suspended return type to context_call/5 and context_eval/3 specs - Fix context_create pattern match to include InterpId
Bind context in init_per_testcase to ensure py:exec and py:eval use the same Python namespace. Scheduler migration could cause different contexts to be selected between calls.
Change exception type from atom to string in error tuples.
Since atoms are never garbage collected and Python exception
names are unbounded (custom exceptions, third-party libraries),
using atoms can exhaust the atom table.
Breaking change: error tuples change from {error, {TypeError, "msg"}}
to {error, {"TypeError", "msg"}}
Change trace status from atom to string to prevent atom table exhaustion from arbitrary Python status strings.
Add NULL checks after enif_make_new_binary calls in:
- build_suspended_result
- build_suspended_context_result
- blocking callback path
Without these checks, a memcpy to NULL would cause a crash if allocation fails.
Add NULL checks for binary allocation in extract_asgi_response():
- Header name allocation
- Header value allocation
- Body allocation
Falls back to generic py_to_term() conversion on allocation failure rather than crashing on memcpy to NULL.
Add NULL checks for binary allocation in:
- recvfrom_test_udp host conversion
- reactor_on_read_ready action string
- reactor_on_write_ready action string
- nif_version Python version string
Returns appropriate errors on allocation failure rather than crashing on memcpy to NULL.
Three fixes:
1. Set shutdown=true in cleanup path so py_pool_init loop sees the failure and doesn't spin forever on !running && !shutdown
2. Add 30 second timeout to init wait loop as safety net
3. Detect worker init failure (shutdown=true, running=false) and clean up properly
Previously, if worker init failed, the init wait loop would spin forever waiting for running to become true.
Replace pthread_cond_wait with pthread_cond_timedwait (30 second timeout) in sync_sleep to prevent indefinite hangs if the Erlang process dies or never signals completion. Previously, if the Erlang side never signaled sleep completion, the Python thread would hang forever. Now it will raise a TimeoutError after 30 seconds.
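The fix itself is in C, but the same bounded-wait pattern can be sketched in Python with threading.Condition, for readers who want to see the shape of it. The SleepWaiter class and its method names are hypothetical and the timeouts in the demo are shortened for illustration; the PR uses 30 seconds.

```python
import threading


class SleepWaiter:
    """Wait for a completion signal, but give up after a timeout."""

    def __init__(self):
        self._cond = threading.Condition()
        self._done = False

    def signal(self):
        # Called by the side that completes the sleep.
        with self._cond:
            self._done = True
            self._cond.notify_all()

    def wait(self, timeout):
        # wait_for() re-checks the predicate after every wakeup, so spurious
        # wakeups are handled, and it returns False once the timeout expires.
        with self._cond:
            if not self._cond.wait_for(lambda: self._done, timeout=timeout):
                raise TimeoutError("sleep completion was never signalled")


if __name__ == "__main__":
    signalled = SleepWaiter()
    threading.Timer(0.01, signalled.signal).start()
    signalled.wait(timeout=2.0)
    print("signalled in time")

    abandoned = SleepWaiter()
    try:
        abandoned.wait(timeout=0.05)
    except TimeoutError as exc:
        print("timed out:", exc)
```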
PyTuple_Pack does NOT steal references - it increments refcounts of its arguments. Passing PyUnicode_FromString or PyLong_FromLong directly leaks those temporary objects.
Fixed in:
- py_wsgi.c: wsgi_version_tuple (1, 0) creation
- py_event_loop.c: async_result tuple creation
Now properly creates temporaries, passes to PyTuple_Pack, then decrefs the temporaries.
Update tests to expect strings instead of atoms for:
- Python exception type names (e.g., "NameError" not 'NameError')
- Trace span status (e.g., "ok" not ok atom)
These changes match the security fixes that prevent atom table exhaustion from unbounded Python exception/status values.
- README.md: Update error handling examples to use strings
- docs/migration.md: Add security-related breaking changes section
- c_src/py_nif.h: Update make_py_error documentation
- c_src/py_logging.c: Use cached ATOM_OK/ATOM_ERROR for trace status
- test/py_logging_SUITE.erl: Revert trace status to atoms
Trace span status remains as atoms (ok/error) since these are known, bounded values that use cached atoms for efficiency.
- Add per-interpreter exception lookup for ProcessError and SuspensionRequired to fix exception catching in subinterpreters
- Extend erlang module with event loop functions (run, new_event_loop) when creating py_context to ensure availability in all interpreters
Fixes test failures in py_erlang_sleep_SUITE, py_pid_send_SUITE, and py_reentrant_SUITE.
Summary
Process-per-context architecture with 2-3x performance improvement over v1.8.1.
Added
- py_context_router with scheduler-affinity routing
- erlang.reactor - FD-based protocol handling for custom servers
- erlang.send(pid, term) - Fire-and-forget message passing
Changed
- py:call_async → py:cast
- erlang_asyncio consolidated into erlang module
Removed
- py:bind/unbind and py:ctx_* (use py_context_router)
Performance
Docs
- docs/migration.md - v1.8.x to v2.0 guide
- docs/security.md - Sandbox documentation
- docs/reactor.md - Protocol I/O handling