Description
Update:
This issue has been resolved by adding a new pipeline option to the Java, Python, and Go SDKs. When this option is specified and the runner supports bundle-retry logic, the SDK harness automatically restarts if an element takes too long to process, and the runner retries the same work item.
The pipeline option for each SDK:
- Java: `elementProcessingTimeoutMinutes`, takes an integer value, e.g., `--elementProcessingTimeoutMinutes=5`
- Python: `element_processing_timeout_minutes`, takes an integer value, e.g., `--element_processing_timeout_minutes=5`
- Go: `element_processing_timeout`, takes a duration string, e.g., `--element_processing_timeout=5m`
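To illustrate the input formats above (integer minutes for Java/Python, a duration string for Go), here is a stand-alone sketch using `argparse`; the real SDKs parse these flags through their own `PipelineOptions` machinery, so this is only a demonstration of the expected values, not the actual parsing code.

```python
import argparse

# Stand-alone sketch of the option's input format. The flag name matches
# the Python SDK option listed above; everything else is illustrative.
parser = argparse.ArgumentParser()
parser.add_argument("--element_processing_timeout_minutes", type=int)

args = parser.parse_args(["--element_processing_timeout_minutes=5"])
print(args.element_processing_timeout_minutes)  # prints 5
```

The Go flag instead accepts values like `5m` or `90s`, which Go's standard duration parsing understands.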
What would you like to happen?
The document "Beam Configurable SDK Restart Proposal" proposes adding a pipeline option called ptransform_timeout_duration to Apache Beam's Dataflow runner. This option lets users set a per-element timeout for PTransforms: if processing a single element exceeds the timeout, the SDK harness restarts. The goal is to detect and address "hung" SDK workers much faster than the current inactivity timeout of 100+ minutes.
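The core idea, detecting an element that exceeds a timeout and forcing a restart so the runner can retry, can be sketched with a simple watchdog. This is a minimal illustration, not Beam internals: the function name, the use of a thread, and raising `TimeoutError` in place of an actual process restart are all assumptions made for the sketch.

```python
import threading
import time

TIMEOUT_SECONDS = 2  # illustrative default, not a Beam constant


def process_with_watchdog(element, fn, timeout=TIMEOUT_SECONDS):
    """Run fn(element) on a worker thread, failing if it exceeds the timeout."""
    done = threading.Event()
    result = {}

    def worker():
        result["value"] = fn(element)
        done.set()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    if not done.wait(timeout):
        # The real SDK harness would restart the whole worker process here
        # and rely on the runner to retry the work item; raising stands in
        # for that restart in this sketch.
        raise TimeoutError(f"element {element!r} exceeded {timeout}s")
    return result["value"]


# A fast element completes normally; a hung one triggers the watchdog.
print(process_with_watchdog(3, lambda x: x * 2))  # prints 6
```

The key design point mirrored here is that the timeout is enforced outside the element-processing code itself, so even a fully hung user function cannot block detection.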
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner