-
Notifications
You must be signed in to change notification settings - Fork 121
fixes an issue with macro directives for !$acc kernels
#926
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
PR Reviewer Guide 🔍Here are some key observations to aid the review process:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR replaces the custom GPU_PARALLEL() macro with raw OpenACC kernels directives as a stopgap for NVHPC compatibility.
- Substituted
#:call GPU_PARALLEL()/#:endcall GPU_PARALLELwith!$acc kernels/!$acc end kernels - Applied the change in both the time-stepping and data-output modules
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| src/simulation/m_time_steppers.fpp | Replaced GPU_PARALLEL macro around minval(max_dt) with kernels |
| src/simulation/m_data_output.fpp | Replaced GPU_PARALLEL macro around maxval/minval calls |
Comments suppressed due to low confidence (3)
src/simulation/m_time_steppers.fpp:996
- Wrapping a scalar
minvalcall in akernelsregion may incur unnecessary kernel launch overhead and may not generate a reduction on the device. Consider using!$acc parallel loop reduction(min:dt_local)around the explicit loop overmax_dtwith a collapse if multiple dimensions are involved.
!$acc kernels
src/simulation/m_data_output.fpp:319
- Enclosing
maxval(icfl_sf)in akernelsregion may not produce an efficient reduction; consider converting this to a!$acc parallel loop reduction(max:icfl_max_loc)over the underlying array to leverage device-side reductions.
!$acc kernels
src/simulation/m_data_output.fpp:323
- This
kernelsregion wraps two scalar reductions (vcfl_max_loc,Rc_min_loc); you may get better performance by using a combined!$acc parallel loop reduction(max:vcfl_max_loc) reduction(min:Rc_min_loc)over the loop indices instead ofkernels.
!$acc kernels
PR Code Suggestions ✨Explore these optional code suggestions:
|
|||||||||
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #926 +/- ##
==========================================
+ Coverage 43.68% 43.71% +0.02%
==========================================
Files 68 68
Lines 18363 18360 -3
Branches 2295 2292 -3
==========================================
+ Hits 8022 8026 +4
+ Misses 8949 8945 -4
+ Partials 1392 1389 -3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
User description
The #883 created an issue that substituted
!$acc parallelfor!$acc kernelsin all of 3 whole places (!!). It turns out this does not work on NVHPC. This is a stopgap fix. @prathi-wind will fix it up more properly.PR Type
Bug fix
Description
Replace GPU_PARALLEL macro with
!$acc kernelsdirectivesFix NVHPC compiler compatibility issues
Update OpenACC directives in data output and time stepping modules
Changes diagram
Changes walkthrough 📝
m_data_output.fpp
Update OpenACC directives in data output modulesrc/simulation/m_data_output.fpp
#:call GPU_PARALLEL()with!$acc kernelsforicfl_max_loccalculation
!$acc kernelsdirectives forvcfl_max_locandRc_min_locm_time_steppers.fpp
Fix OpenACC directives in time stepping modulesrc/simulation/m_time_steppers.fpp
#:call GPU_PARALLEL()with!$acc kernelsfordt_localcalculation