Let's consider a scenario in which an application crash at a specific moment can break data consistency. For example, a developer forgot to wrap their code in a transaction:
```python
def transfer(from_user: UserID, to_user: UserID, amount: int) -> None:
    conn = db.connect()  # Autocommit mode
    # The developer forgot to wrap the queries in `with conn.transaction()`
    conn.execute(
        "UPDATE users SET balance = balance - ? WHERE id = ?",
        (amount, from_user),
    )
    # If execution is interrupted here, we lose the money
    conn.execute(
        "UPDATE users SET balance = balance + ? WHERE id = ?",
        (amount, to_user),
    )
```

(To keep things simple, we'll assume transfer is guaranteed not to run concurrently, so race conditions aren't a concern here. We'll also leave data validation out of scope for now.)
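The fix itself is a single context manager away. Here's a sketch with the stdlib sqlite3 module standing in for the hypothetical `db` API above (a sqlite3 connection used as a context manager commits on success and rolls back on an exception):

```python
import sqlite3


def transfer_fixed(conn: sqlite3.Connection, from_user: int, to_user: int, amount: int) -> None:
    # `with conn:` commits both UPDATEs together on success and rolls them
    # back on an exception, so an interruption between the two statements
    # can no longer lose money.
    with conn:
        conn.execute(
            "UPDATE users SET balance = balance - ? WHERE id = ?",
            (amount, from_user),
        )
        conn.execute(
            "UPDATE users SET balance = balance + ? WHERE id = ?",
            (amount, to_user),
        )
```
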
Bugs like this are rather sneaky because they:
- aren't easily caught by linters
- aren't easily caught by tests
- might only show up a year later under very specific circumstances
But what if we really want to catch such problems with tests? Let's try to write a test that will emulate code interruption at different points in time.
Python 3.12 introduced an API that allows you to do just that: sys.monitoring. This is a project from the Faster CPython team, aimed at providing cheap (low-impact) instrumentation – primarily for profilers and debuggers. The details are described in PEP 669.
Among other things, sys.monitoring allows you to attach a callback to the execution of bytecode instructions of an arbitrary code object. Yes, technically you could do this before with sys.settrace, but without granularity — the trace was always global, which made things noticeably heavy for the interpreter.
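For contrast, here is roughly what the old approach looked like: a minimal sketch (names are mine) using sys.settrace plus f_trace_opcodes, where the trace function is installed for the whole thread and gets called for every frame:

```python
import sys

n_opcodes = 0


def tracer(frame, event, arg):
    # This function fires for every Python frame created while the trace
    # is installed; that's what made the pre-3.12 approach heavy.
    global n_opcodes
    frame.f_trace_opcodes = True  # opt this frame into per-opcode events
    if event == "opcode":
        n_opcodes += 1
    return tracer


def add(a: int, b: int) -> int:
    return a + b


sys.settrace(tracer)
add(1, 2)
sys.settrace(None)
print(n_opcodes)  # number of opcodes traced inside add()
```
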
Here’s a "hello-world" example:
```python
# should run as is on python >= 3.12
import dis
from sys import monitoring
from types import CodeType


def sum_numbers(lst: list[int]) -> int:
    match lst:
        case [x, *xs]:
            return x + sum_numbers(xs)
        case _:
            return 0


def on_opcode(code: CodeType, offset: int) -> None:
    opcode = code.co_code[offset]
    opname = dis.opname[opcode]
    print(offset, f"Opcode: {opname} at {code}")


monitoring.use_tool_id(0, "opcode_tracer")
monitoring.register_callback(0, monitoring.events.INSTRUCTION, on_opcode)
monitoring.set_local_events(0, sum_numbers.__code__, monitoring.events.INSTRUCTION)

print(sum_numbers([1, 2, 3]))
```

Looks quite convenient. But instead of building useful tooling as the PEP authors intended, we're going to abuse it. If you throw an exception in the callback, execution will be interrupted at the current opcode. This way, we can emulate a stop at any arbitrary instruction, like so:
```python
class Stop(Exception): ...


def on_opcode(code: CodeType, offset: int) -> None:
    global counter
    counter += 1
    if counter == stop_at:
        raise Stop
```

Now we can write our test:
- Credit user 1 with $100
- Start a transfer to user 2
- Interrupt the transfer execution at the Nth instruction
- Check that the total amount of money in the DB is still $100
- Repeat for every N from 1 to the total number of instructions in the code
In pseudocode:
```python
n_instructions = 0
while True:
    n_instructions += 1
    set_user_balances({1: 100, 2: 0})
    was_terminated = terminate_after(n_instructions, lambda: transfer(1, 2, 100))
    if not was_terminated:
        break
    if total_balance() != 100:
        print(f"Bug detected when terminated after {n_instructions} instructions")
```

The full version with a proper API is in simulator.py. Features:
- To enable tracing only for specific code, you can pass a list of modules, functions, or classes
- Besides opcodes, you can simulate interruptions at code lines (`mode="line"`) – it's less granular, but cheaper
Missing features:
- We could analyze the bytecode and skip instructions that are known to be safe
- Async support
- Parallelization