Share one poll loop across sibling event triggers #66584
uranusjr
left a comment
Not sure about the parameter name, but I don't have a concrete opinion.
return None

@classmethod
async def open_shared_stream(cls, kwargs: dict[str, Any]) -> AsyncIterator[Any]:
Not sure if it's a good idea, but we could introduce a method-level generic so that users can define a TypedDict or an internal data model themselves and share it between open_shared_stream and filter_shared_stream.
e.g. async def open_shared_stream(cls, kwargs: dict[str, Any]) -> AsyncIterator[MyType]:
The raw stream is a private contract between open_shared_stream and filter_shared_stream — T never crosses an API boundary, so the TypeVar would serve only as documentation for the subclass author.
I'd rather skip it unless users ask for it, which I doubt anyone would ever do.
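To illustrate the point about the raw stream being a private contract, here is a minimal sketch in which a subclass simply annotates its own raw-event type, with no generic on the base class. `BaseEventTriggerStub`, `FileDeleted`, `DeleteWatcher`, and the `"deleted"` kwarg are all hypothetical stand-ins, and the filter yields plain strings where the real hook would yield `TriggerEvent`s.

```python
import asyncio
from collections.abc import AsyncIterator
from typing import Any, TypedDict


class BaseEventTriggerStub:
    """Hypothetical stand-in for BaseEventTrigger (not the Airflow class)."""

    def shared_stream_key(self):
        return None  # default: no sharing


class FileDeleted(TypedDict):
    """Hypothetical raw-event shape, known only to DeleteWatcher."""

    path: str


class DeleteWatcher(BaseEventTriggerStub):
    def __init__(self, directory: str, suffix: str) -> None:
        self.directory = directory
        self.suffix = suffix

    def shared_stream_key(self):
        # Equal keys would put sibling triggers in the same group.
        return ("delete-watcher", self.directory)

    @classmethod
    async def open_shared_stream(cls, kwargs: dict[str, Any]) -> AsyncIterator[FileDeleted]:
        # Canned data stands in for a real directory scan.
        for name in kwargs["deleted"]:
            yield {"path": f"{kwargs['directory']}/{name}"}

    async def filter_shared_stream(
        self, shared_stream: AsyncIterator[FileDeleted]
    ) -> AsyncIterator[str]:
        # Per-subscriber filtering; the real hook would yield TriggerEvents.
        async for raw in shared_stream:
            if raw["path"].endswith(self.suffix):
                yield raw["path"]


async def demo() -> list[str]:
    trigger = DeleteWatcher("/data", ".csv")
    raw = DeleteWatcher.open_shared_stream(
        {"directory": "/data", "deleted": ["a.csv", "b.txt"]}
    )
    return [path async for path in trigger.filter_shared_stream(raw)]
```

The subclass narrows `AsyncIterator[Any]` to `AsyncIterator[FileDeleted]` on both hooks, which is the documentation benefit the TypeVar would have provided.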
class _PollTerminated(Exception):
Do these exceptions need to inherit from AirflowException?
_PollTerminated and _SubscriberOverflow are internal sentinels — they never escape SharedStreamManager and the trigger-failure path catches Exception regardless of base.
I'd rather not add more AirflowException subclasses for purely internal types.
When several AssetWatcher instances back triggers that read from the same upstream resource (one SQS queue, one Kafka topic, etc.), the triggerer spins up N independent poll loops today, one per trigger. Issue apache#66476 asks for one shared poller serving all of them.

Add an opt-in path on BaseEventTrigger via three new hooks (`shared_stream_key`, `open_shared_stream`, `filter_shared_stream`) and a new SharedStreamManager that runs one poll task per distinct key and broadcasts events to per-subscriber queues. The key is read once when run_trigger starts and identifies the group for the trigger's lifetime. Per-trigger cleanup runs in run_trigger's finally; SharedStreamManager.stop_all() runs in the triggerer's shutdown path as a safety net. Triggers whose shared_stream_key() returns None (the default) keep their existing run() loop unchanged.

The per-subscriber buffer size is exposed as [triggerer] shared_stream_subscriber_queue_size (default 1024) so deployments with a fast upstream can raise it without code changes.
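As a rough illustration of the "one poll task per distinct key, broadcast to per-subscriber queues" idea, here is a toy sketch. `MiniSharedStreamManager` is hypothetical and is not the PR's `SharedStreamManager`: it omits unsubscribe, group eviction, and the overflow handling described elsewhere in this PR.

```python
import asyncio
from collections.abc import AsyncIterator, Callable, Hashable
from typing import Any


class MiniSharedStreamManager:
    """Toy stand-in: one poll task per key, fan-out to bounded queues."""

    def __init__(self, queue_size: int = 1024) -> None:
        self._queue_size = queue_size
        self._subscribers: dict[Hashable, list[asyncio.Queue]] = {}
        self._tasks: dict[Hashable, asyncio.Task] = {}

    def subscribe(
        self, key: Hashable, open_stream: Callable[[], AsyncIterator[Any]]
    ) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue(maxsize=self._queue_size)
        self._subscribers.setdefault(key, []).append(queue)
        if key not in self._tasks:
            # Only the first subscriber for a key starts the poll loop.
            self._tasks[key] = asyncio.create_task(self._poll(key, open_stream))
        return queue

    async def _poll(
        self, key: Hashable, open_stream: Callable[[], AsyncIterator[Any]]
    ) -> None:
        async for event in open_stream():
            for queue in self._subscribers[key]:
                # In this toy, asyncio.QueueFull would end the whole poll task.
                queue.put_nowait(event)

    async def stop_all(self) -> None:
        for task in self._tasks.values():
            task.cancel()
        await asyncio.gather(*self._tasks.values(), return_exceptions=True)
        self._tasks.clear()
        self._subscribers.clear()


async def demo() -> tuple[int, list[int], list[int]]:
    opened = 0

    async def source() -> AsyncIterator[int]:
        nonlocal opened
        opened += 1  # counts how many times the upstream was opened
        for i in range(3):
            yield i
            await asyncio.sleep(0)

    manager = MiniSharedStreamManager(queue_size=8)
    first = manager.subscribe("dir:/tmp/inbox", source)
    second = manager.subscribe("dir:/tmp/inbox", source)
    got_first = [await first.get() for _ in range(3)]
    got_second = [await second.get() for _ in range(3)]
    await manager.stop_all()
    return opened, got_first, got_second
```

Running `asyncio.run(demo())` shows both subscribers receiving every event while the upstream is opened once.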
- triggerer: compute shared_stream_key after render_template_fields so templated attributes resolve before keying
- shared_stream: extract _drain_and_offer_failure helper; reuse from terminal broadcast and overflow paths
- DirectoryFileDeleteTrigger: normalise directory via realpath so relative/absolute/symlink/trailing-slash variants share one scan
- DirectoryFileDeleteTrigger.open_shared_stream: raise on PermissionError / NotADirectoryError / IsADirectoryError so config bugs surface in the UI instead of silently spinning a warning loop; keep swallow + retry for the rest of OSError (transient I/O)
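The realpath normalisation mentioned in these commits can be sketched as follows. `directory_key` is a hypothetical helper name, not the one used in the provider; the only real API involved is `os.path.realpath`.

```python
import os
import tempfile


def directory_key(directory: str) -> str:
    """Collapse spelling variants of one directory into a canonical path."""
    return os.path.realpath(directory)


def demo() -> set[str]:
    with tempfile.TemporaryDirectory() as root:
        target = os.path.join(root, "inbox")
        os.mkdir(target)
        link = os.path.join(root, "inbox-link")
        os.symlink(target, link)  # may need extra privileges on Windows
        variants = [
            target,                                    # absolute
            target + os.sep,                           # trailing slash
            os.path.join(target, os.pardir, "inbox"),  # redundant ".."
            os.path.relpath(target),                   # relative to cwd
            link,                                      # symlink
        ]
        return {directory_key(variant) for variant in variants}
```

All five spellings should collapse to one key, so sibling triggers configured with any of them would land in the same group.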
Round-3 doc cleanup for jason's review (C1/C2/C6):
- Drop Kafka/SQS recommendations from event-scheduling.rst and BaseEventTrigger class docstring; the producer-side ack channel is out of scope this iteration.
- Document the deterministic-key requirement on shared_stream_key and add a Slow-subscriber overflow mitigations section (raise subscriber queue size, redesign the key to narrow groups).
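For context on the slow-subscriber overflow case, here is one way a bounded per-subscriber buffer can fail loudly instead of dropping events silently. This is a guess at the general shape, not the PR's `_drain_and_offer_failure` helper: `SubscriberOverflow` and `offer` are hypothetical names.

```python
import asyncio
from typing import Any


class SubscriberOverflow(Exception):
    """Hypothetical stand-in for the PR's internal _SubscriberOverflow."""


def offer(queue: asyncio.Queue, event: Any) -> bool:
    """Enqueue an event; on overflow, replace the backlog with a failure marker."""
    try:
        queue.put_nowait(event)
        return True
    except asyncio.QueueFull:
        # Drain the backlog so the failure marker is guaranteed to fit.
        while not queue.empty():
            queue.get_nowait()
        queue.put_nowait(
            SubscriberOverflow(f"subscriber buffer of {queue.maxsize} events overran")
        )
        return False
```

With a buffer of two, a third undelivered event turns the subscriber's queue into a single failure marker; other subscribers' queues are separate objects and are not touched. Raising the queue size makes this less likely for bursty upstreams.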
Address jason's C4 suggestion: collapse the get-and-None-check on SharedStreamManager.subscribe into a single walrus expression.
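The shape of that change is roughly the following. This is an illustrative sketch only; the function names and the dict-of-lists structure are hypothetical and not taken from `SharedStreamManager.subscribe`.

```python
from collections.abc import Hashable


def get_group_before(groups: dict[Hashable, list], key: Hashable) -> list:
    # Two statements: fetch, then check for None.
    group = groups.get(key)
    if group is None:
        group = groups[key] = []
    return group


def get_group_after(groups: dict[Hashable, list], key: Hashable) -> list:
    # Same logic with the fetch folded into the condition via the walrus operator.
    if (group := groups.get(key)) is None:
        group = groups[key] = []
    return group
```

Both versions behave identically; the walrus form only removes the separate assignment line.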
Why

When multiple `AssetWatcher` instances back triggers that read from the same upstream resource (one SQS queue, one Kafka topic, one directory), today's triggerer spins up N independent poll loops, one per trigger. For directory scans and similar idempotent / read-only sources this is wasted I/O and load on the upstream.

What

- `BaseEventTrigger`:
  - `shared_stream_key()`: return a hashable; triggers with equal keys share one poll loop. Returning `None` (default) keeps the existing `run()` loop unchanged.
  - `@classmethod open_shared_stream(kwargs)`: runs once per group; yields raw events for the lifetime of the group, or raises. Declared as a classmethod (not staticmethod) so subclasses can chain via `super().open_shared_stream(kwargs)`.
  - `filter_shared_stream(shared_stream)`: per-subscriber; converts raw events into `TriggerEvent`s.
- `airflow.triggers.shared_stream` with `SharedStreamManager`: one poll task per distinct key, broadcasts to per-subscriber bounded queues, evicts groups synchronously before any `await` to close lifecycle race windows. The manager is single-event-loop and not thread-safe; `TriggerRunner` is its sole owner.
- `TriggerRunner.run_trigger`: triggers with a non-`None` key route through the manager; `stop_all()` runs in shutdown as a safety net.
- `DirectoryFileDeleteTrigger` in the standard provider: sibling triggers watching the same directory share a single scan.
- `[triggerer] shared_stream_subscriber_queue_size` (default 1024): per-subscriber buffer size; a slow subscriber that overruns its buffer fails loudly with `_SubscriberOverflow` rather than dropping events silently. Sibling subscribers are unaffected.

Scope
This PR targets idempotent / read-only / subscriber-side-effect upstreams only: directory scans, polling REST APIs, Kafka with `enable.auto.commit=true`, sources where per-event cleanup lives on the subscriber.

Manual-commit Kafka, SQS delete-on-process, Pub/Sub `ack_id`, and Service Bus peek-lock are out of scope: they need a producer-side ack channel from each subscriber's accept/reject decision back to the producer's handle, which `filter_shared_stream` does not provide today. A follow-up issue will track that work.

closes: #66476
Was generative AI tooling used to co-author this PR?
Generated-by: [Claude] following the guidelines