
SameerMesiah97 · 2026-01-24T17:26:03Z

Description

Added best-effort cleanup to EmrCreateJobFlowOperator to terminate EMR clusters when failures occur after successful cluster creation. Cleanup is controlled by a flag, terminate_job_flow_on_failure, and is enabled by default.

In certain failure modes, the operator could previously create a cluster via create_job_flow and then fail during later execution steps (for example, while waiting for completion when DescribeCluster permissions are missing). In these cases, the task failed while leaving the cluster running. The operator now attempts to terminate the created job flow if an exception is raised after creation. Cleanup is best-effort and does not override or mask the original exception.

This change applies a similar failure-handling approach to the one recently introduced for EC2CreateInstanceOperator in PR #60904. However, cleanup is triggered only for post-start EMR job flow failures (including waiter-related errors). This ensures that termination is attempted only when a job flow was successfully created, and that non-AWS exceptions are not intercepted.
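The control flow described above can be sketched as follows. This is a minimal, self-contained illustration, not the operator's actual code: `create_job_flow_with_cleanup` is a hypothetical helper, `WaiterError` and `ClientError` are local stand-ins for the botocore exceptions of the same names (so the snippet runs without AWS dependencies), and the exact tuple of exceptions that triggers cleanup is an assumption. In a real operator, `terminate` would presumably call the EMR client's `terminate_job_flows(JobFlowIds=[...])`.

```python
import logging
from typing import Callable

log = logging.getLogger(__name__)


class WaiterError(Exception):
    """Local stand-in for botocore.exceptions.WaiterError (assumption)."""


class ClientError(Exception):
    """Local stand-in for botocore.exceptions.ClientError (assumption)."""


# Assumption: only AWS-side failures after creation trigger cleanup; anything
# else (e.g. a coding error raising ValueError) propagates without termination.
CLEANUP_ERRORS = (WaiterError, ClientError)


def create_job_flow_with_cleanup(
    create: Callable[[], str],
    wait: Callable[[str], None],
    terminate: Callable[[str], None],
    terminate_job_flow_on_failure: bool = True,
) -> str:
    """Hypothetical helper mirroring the post-creation cleanup flow."""
    # If creation itself fails, no cluster exists, so there is nothing to clean up.
    job_flow_id = create()
    try:
        wait(job_flow_id)
    except CLEANUP_ERRORS:
        if terminate_job_flow_on_failure:
            try:
                terminate(job_flow_id)
            except Exception:
                # Best effort: record the cleanup failure without letting it
                # replace the error that actually failed the task.
                log.exception("Failed to terminate job flow %s", job_flow_id)
        # A bare raise re-raises the original waiter/client error.
        raise
    return job_flow_id
```

Catching a narrow tuple rather than `Exception` is what keeps unrelated failures from triggering termination. In Python 3, the bare `raise` after the inner handler still refers to the original error, so a failed termination does not hide the root cause.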

Rationale

EmrCreateJobFlowOperator is responsible for provisioning and coordinating an external, stateful service whose lifecycle extends beyond task execution. If the task fails after cluster creation, Airflow can no longer reliably manage or observe the cluster’s state. Adding opportunistic cleanup in these scenarios reduces the risk of orphaned EMR clusters and unexpected infrastructure costs, while preserving existing failure semantics. Cleanup errors are logged and do not affect the task’s final failure state.

Restricting cleanup to post-creation EMR job flow failures prevents unintended termination in unrelated failure paths while still addressing orphaned job flows created during execution.

Tests

Added a unit test covering failure after cluster creation and verifying that termination is attempted.
Added a unit test ensuring cleanup failures do not mask the original exception.
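The two tests could look roughly like the sketch below. It is not the test code from this PR: `FakeEmrCreateJobFlowOperator` is a hypothetical stub that talks to a `unittest.mock.MagicMock` shaped like a boto3 EMR client (`run_job_flow`, `get_waiter("cluster_running")`, `terminate_job_flows`), and `WaiterError` is a local stand-in for `botocore.exceptions.WaiterError`.

```python
import logging
import unittest
from unittest import mock


class WaiterError(Exception):
    """Local stand-in for botocore.exceptions.WaiterError (assumption)."""


class FakeEmrCreateJobFlowOperator:
    """Hypothetical, trimmed-down stand-in for the operator under test."""

    def __init__(self, client, terminate_job_flow_on_failure=True):
        self.client = client  # boto3-shaped EMR client; a MagicMock in the tests
        self.terminate_job_flow_on_failure = terminate_job_flow_on_failure
        self.log = logging.getLogger("fake_emr_operator")

    def execute(self):
        job_flow_id = self.client.run_job_flow(Name="example")["JobFlowId"]
        try:
            self.client.get_waiter("cluster_running").wait(ClusterId=job_flow_id)
        except WaiterError:
            if self.terminate_job_flow_on_failure:
                try:
                    self.client.terminate_job_flows(JobFlowIds=[job_flow_id])
                except Exception:
                    self.log.exception("Cleanup of job flow %s failed", job_flow_id)
            raise
        return job_flow_id


class TestPostCreateCleanup(unittest.TestCase):
    def _failing_client(self):
        # Creation succeeds, then the waiter fails (e.g. DescribeCluster denied).
        client = mock.MagicMock()
        client.run_job_flow.return_value = {"JobFlowId": "j-123"}
        client.get_waiter.return_value.wait.side_effect = WaiterError("DescribeCluster denied")
        return client

    def test_terminates_after_post_create_failure(self):
        client = self._failing_client()
        with self.assertRaises(WaiterError):
            FakeEmrCreateJobFlowOperator(client).execute()
        client.terminate_job_flows.assert_called_once_with(JobFlowIds=["j-123"])

    def test_cleanup_failure_does_not_mask_original_error(self):
        client = self._failing_client()
        client.terminate_job_flows.side_effect = RuntimeError("TerminateJobFlows denied")
        # The cleanup error is logged, but the waiter error is what propagates.
        with self.assertLogs("fake_emr_operator", level="ERROR"), self.assertRaises(WaiterError):
            FakeEmrCreateJobFlowOperator(client).execute()
        client.terminate_job_flows.assert_called_once()
```

The module has no `unittest.main()` guard, so it can be run with pytest or `python -m unittest <module>`.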

Documentation

The docstring for EmrCreateJobFlowOperator has been updated with a brief description of the new flag terminate_job_flow_on_failure.

Backwards Compatibility

A new flag, terminate_job_flow_on_failure, has been added to EmrCreateJobFlowOperator and defaults to True. As a result, cleanup is now attempted on a best-effort basis when a WaiterError is encountered. Setting the flag to False keeps the previous behavior.
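For DAG authors, opting out might look like the snippet below. It assumes an installed apache-airflow-providers-amazon release that includes this change. The `job_flow_overrides` value is a hypothetical placeholder: a real cluster also needs instance, release, and role settings, either in the overrides or from the EMR connection defaults.

```python
from airflow.providers.amazon.aws.operators.emr import EmrCreateJobFlowOperator

create_cluster = EmrCreateJobFlowOperator(
    task_id="create_emr_cluster",
    # Hypothetical placeholder; real clusters also need instance, release and role settings.
    job_flow_overrides={"Name": "example-cluster"},
    aws_conn_id="aws_default",
    # Waiting is the post-creation step where failures such as a denied DescribeCluster can surface.
    wait_for_completion=True,
    # Opt out of the new default (True): leave the cluster running if a later step fails.
    terminate_job_flow_on_failure=False,
)
```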

Reproducibility

The failure scenario could not be reproduced directly because of the permissions available on my personal AWS account. However, based on the current control flow of EmrCreateJobFlowOperator, cluster creation can succeed while a later step fails, leaving the EMR cluster running without cleanup. This change defensively addresses that case. Contributors reading this PR are welcome to provide a reproduction of this failure mode.

SameerMesiah97 · 2026-01-25T15:06:55Z

@vincbeck

No pressure to review, but I'm tagging you here because this follows the same theme as the EC2 PR (#60904), which you merged.

SameerMesiah97 · 2026-01-26T13:10:21Z

@eladkal

This follows the same theme as #61051

uranusjr

One nit

Attempt best-effort termination of EMR clusters when failures occur after successful job flow creation. Cleanup does not mask the original exception.

shahar1 · 2026-02-10T13:34:08Z

@SameerMesiah97 Could you please resolve conflicts?

Edit: I took care of it so I could include it in the upcoming release (used Gen. AI, GitHub Copilot + Claude Sonnet 4.6)

# Conflicts:
# providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py
# providers/amazon/tests/unit/amazon/aws/operators/test_emr_create_job_flow.py

SameerMesiah97 · 2026-02-10T15:11:10Z

So you no longer want me to fix it?

shahar1 · 2026-02-10T15:20:31Z


No need, I took care of it as I'm starting the release process very soon :)
The failed test seemed unrelated, so I just ran it again.
Great job anyway!


SameerMesiah97 requested a review from o-nikolas as a code owner January 24, 2026 17:26

boring-cyborg Bot added area:providers provider:amazon AWS/Amazon - related issues labels Jan 24, 2026

SameerMesiah97 mentioned this pull request Jan 25, 2026

EcsRunTaskOperator leaks ECS task on failure with partial IAM permissions #61050

Closed

2 tasks

vincbeck approved these changes Jan 26, 2026


uranusjr approved these changes Jan 27, 2026


Comment thread providers/amazon/src/airflow/providers/amazon/aws/operators/emr.py Outdated

SameerMesiah97 force-pushed the EmrCreateJobFlowOperator-Cleanup branch from 9968272 to e494b8b Compare January 27, 2026 14:51

SameerMesiah97 mentioned this pull request Jan 27, 2026

EksCreateNodegroupOperator leaks EKS nodegroup on failure with partial IAM permissions #61142

Closed

2 tasks

Add post-create cleanup to EmrCreateJobFlowOperator

3cab55f

Attempt best-effort termination of EMR clusters when failures occur after successful job flow creation. Cleanup does not mask the original exception.

SameerMesiah97 force-pushed the EmrCreateJobFlowOperator-Cleanup branch from e494b8b to 3cab55f Compare January 29, 2026 20:05

SameerMesiah97 mentioned this pull request Jan 29, 2026

Add best-effort cleanup to EksCreateNodegroupOperator on post-create failure #61145

Merged

SameerMesiah97 requested a review from vincbeck January 29, 2026 20:48

vincbeck approved these changes Jan 29, 2026


SameerMesiah97 mentioned this pull request Jan 30, 2026

Restrict EC2CreateInstanceOperator cleanup to waiter failures and add guard flag #61272

Merged

Merge branch 'main' into EmrCreateJobFlowOperator-Cleanup

79f86cc


shahar1 merged commit 0df2f59 into apache:main Feb 10, 2026
169 of 170 checks passed

shahar1 mentioned this pull request Feb 11, 2026

Status of testing Providers that were prepared on February 10, 2026 #61766

Closed

81 tasks

Alok-kumar-priyadarshi pushed a commit to Alok-kumar-priyadarshi/airflow that referenced this pull request Feb 11, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

aae34b6

Ratasa143 pushed a commit to Ratasa143/airflow that referenced this pull request Feb 15, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

36a1b43

choo121600 pushed a commit to choo121600/airflow that referenced this pull request Feb 22, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

f0e2d1f

potiuk mentioned this pull request Feb 26, 2026

Status of testing Providers that were prepared on February 26, 2026 #62537

Closed

AkshayArali pushed a commit to AkshayArali/airflow_630 that referenced this pull request Feb 27, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

74f81c0

AkshayArali pushed a commit to AkshayArali/airflow_630 that referenced this pull request Feb 27, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

db0e97f

potiuk mentioned this pull request Mar 3, 2026

Status of testing Providers that were prepared on March 03, 2026 #62794

Closed

55 tasks

Subham-KRLX pushed a commit to Subham-KRLX/airflow that referenced this pull request Mar 4, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

0f00fcc

dominikhei pushed a commit to dominikhei/airflow that referenced this pull request Mar 11, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

0fe83c9

Ankurdeewan pushed a commit to Ankurdeewan/airflow that referenced this pull request Mar 15, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

c6eb7f7

radhwene pushed a commit to radhwene/airflow that referenced this pull request Mar 21, 2026

Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure (apache#61010)

a249fbb


Add best-effort cleanup to EmrCreateJobFlowOperator on post-creation failure #61010

shahar1 merged 2 commits into apache:main from SameerMesiah97:EmrCreateJobFlowOperator-Cleanup


4 participants
