Apply cap on WRF-DART resubmits #941
Hi Brett,
This caps the number of resubmissions to fix #923.
I've left one suggestion on changing a confusing comment, and one definite change.
The definite change is to have driver.csh exit with an error code if a WRF member fails.
As Moha brought up at standup, there are several ways to modernize/improve the workflow, but these are out of scope for this fix.
Cheers,
Helen
Co-authored-by: Helen Kershaw <[email protected]>
Approved!
Description:
Applies a cap of 2 resubmits to WRF ensemble member submissions during the assim_advance step in driver.csh.
This prevents rare cases of 'zombie' jobs that have to be killed by PID.
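For readers who want the control flow at a glance, here is a minimal Python sketch of a capped resubmit loop. It is not the code in driver.csh: `advance_member`, `run_wrf`, and `MAX_RESUBMITS` are hypothetical names, and it assumes the cap means one initial submission plus at most 2 resubmits before giving up with an error.

```python
import sys
from typing import Callable

MAX_RESUBMITS = 2  # cap stated in the description; the name is hypothetical


def advance_member(member: int, run_wrf: Callable[[int], bool],
                   max_resubmits: int = MAX_RESUBMITS) -> int:
    """Run one member, resubmitting at most max_resubmits times.

    run_wrf is a hypothetical callable that returns True on success.
    Returns the number of resubmits used; raises RuntimeError if the
    member still fails once the cap is reached.
    """
    resubmits = 0
    while not run_wrf(member):
        if resubmits >= max_resubmits:
            raise RuntimeError(
                f"WRF member {member} failed after {resubmits} resubmits; "
                "check that member's WRF logs before rerunning"
            )
        resubmits += 1
    return resubmits


def main() -> int:
    # Stand-in for a WRF advance that always fails (assumption for the demo).
    def always_fails(member: int) -> bool:
        return False

    try:
        advance_member(1, always_fails)
    except RuntimeError as err:
        print(err, file=sys.stderr)
        return 1  # nonzero status so the calling workflow stops
    return 0
```

A script would call `sys.exit(main())` so the failure reaches the caller as a nonzero exit status instead of looping indefinitely.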
Fixes issue
Fixes #923
Types of changes
Documentation changes needed?
The change outputs a clear error message and provides instructions on how to diagnose the problem.
Tests
Ran the WRF-DART tutorial and forced a WRF failure to test the fix.
See /glade/derecho/scratch/bmraczka/WRFv4.5_maxretry
Checklist for release
Testing Datasets