feat: add zombie DAG run detection to scheduler #1163
Jesus, I have not seen an open-source project move so fast. I am truly blown away. ❤️
One question: how expensive is this?
@jonasban It probably won’t be too expensive with only a handful of DAG runs (say, around 10) on file-based storage, though that can vary with the hardware. The heartbeat timeout is currently 45s, so that’s how long it would take to catch a zombie run anyway. I think zombies are relatively rare under normal circumstances, so I’m not sure it’s worth shortening the check interval to something like 2–5s. What do you think? Still, it would be nice to make the interval configurable for users who need more control.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #1163 +/- ##
==========================================
+ Coverage 65.37% 65.42% +0.05%
==========================================
Files 126 127 +1
Lines 18769 18893 +124
==========================================
+ Hits 12270 12361 +91
- Misses 5516 5540 +24
- Partials 983 992 +9
... and 3 files with indirect coverage changes
Overview
Implements automatic detection and cleanup of zombie DAG runs: runs marked as running whose underlying process is no longer alive.
Feedback-by: @jonasban
Issue: #1130
Changes
- ZombieDetector that periodically checks running DAG runs
- zombieDetectionInterval setting (default 45s, 0 to disable)
- DAGU_SCHEDULER_ZOMBIE_DETECTION_INTERVAL environment variable