Contributor

@braczka braczka commented Aug 1, 2025

Description:

Applies a cap of 2 resubmissions to WRF ensemble member submissions during the assim_advance step within driver.csh.
This prevents rare cases of 'zombie' jobs that need to be killed based on PID.
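
The capped-resubmit idea can be sketched as follows. This is only an illustration in Python, not the csh code in driver.csh: `advance_member`, `submit`, and `MAX_RESUBMITS` are hypothetical names, and the sketch assumes the cap counts resubmissions after the first attempt (so at most 3 attempts in total).

```python
MAX_RESUBMITS = 2  # cap from the description; the counting convention is assumed


def advance_member(submit, max_resubmits=MAX_RESUBMITS):
    """Run submit() until it succeeds or the resubmit cap is exhausted.

    submit is a hypothetical callable that returns True when the member's
    WRF advance succeeded. Returns the number of attempts made, or raises
    RuntimeError once the cap is reached instead of resubmitting forever.
    """
    attempts = 0
    while attempts <= max_resubmits:
        attempts += 1
        if submit():
            return attempts
    raise RuntimeError(
        f"member failed after {attempts} attempts "
        f"({max_resubmits} resubmits); check the WRF logs before rerunning"
    )
```

With a submission that always fails, the function stops after the third attempt and raises, rather than leaving a job that has to be killed by hand.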

Fixes issue

Fixes #923

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Documentation changes needed?

The change outputs a clear error message and provides instructions on how to diagnose the problem.

Tests

Ran the WRF-DART tutorial and forced a WRF failure to test the fix.
See /glade/derecho/scratch/bmraczka/WRFv4.5_maxretry

Checklist for release

  • Merge into main
  • Create release from the main branch with appropriate tag
  • Delete feature-branch

Testing Datasets

  • Dataset needed for testing available upon request
  • Dataset download instructions included
  • No dataset needed

@braczka braczka requested a review from hkershaw-brown August 1, 2025 23:31
@braczka braczka added the Emailed Issue (was originally emailed to DAReS team), Derecho (issues related to running on NCAR's new supercomputer), and wrf (Weather Research & Forecasting Model) labels Aug 1, 2025
Member

@hkershaw-brown hkershaw-brown left a comment

Hi Brett,

This caps the number of resubmissions, which fixes #923.
I've left one suggestion on changing a confusing comment, and one definite change.
The definite change is to have driver.csh exit with an error code if a WRF member fails.

As Moha brought up at standup, there are several ways to modernize/improve the workflow, but these are out of scope for this fix.

Cheers,
Helen

braczka and others added 2 commits August 5, 2025 11:51
Co-authored-by: Helen Kershaw <[email protected]>
Co-authored-by: Helen Kershaw <[email protected]>
Member

@hkershaw-brown hkershaw-brown left a comment

Approved!

@hkershaw-brown hkershaw-brown added the release! (bundle with next release) label Aug 5, 2025
@hkershaw-brown hkershaw-brown added the release+1 (bundle with release after next) and release! (bundle with next release) labels and removed the release! and release+1 labels Aug 13, 2025
@hkershaw-brown hkershaw-brown merged commit b5f24ee into main Aug 19, 2025
4 checks passed
@hkershaw-brown hkershaw-brown deleted the WRF_retries branch August 19, 2025 14:08

Development

Successfully merging this pull request may close these issues.

No cap on WRF-DART resbumit tries

3 participants