a2b_ord4 with fewer stencils, still validating and not slower #493
base: master
This reverts commit 17f45f9.
launch jenkins
qin[1, 1, 0],
)
qout = (ec1 + ec2 + ec3) * (1.0 / 3.0)
tmp = 0.0
Is it still an issue to have regions first?
If we need this, should we put a TODO comment here?
It is still an issue: a horizontalIf error is raised. I added a TODO.
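The `qout = (ec1 + ec2 + ec3) * (1.0 / 3.0)` line in the diff averages three corner estimates. A minimal plain-Python sketch of that idea, assuming each estimate `ec*` is a linear extrapolation from two cell-centre values toward the corner. The helper names are hypothetical, and flat distances stand in for whatever metric (e.g. great-circle distance) the real stencil uses.

```python
def extrapolate_linear(qa, qb, da, db):
    """Hypothetical helper: linearly extrapolate to the corner (distance 0)
    from values qa and qb sampled at distances da < db along one line."""
    # Slope between the two samples, then extend back from qa to distance 0.
    slope = (qb - qa) / (db - da)
    return qa - slope * da


def corner_value(ec1, ec2, ec3):
    """Average three independent corner estimates, mirroring
    qout = (ec1 + ec2 + ec3) * (1.0 / 3.0)."""
    return (ec1 + ec2 + ec3) * (1.0 / 3.0)


# For a linear field q(d) = 2 + 3 * d, extrapolating from d = 0.5 and d = 1.5
# recovers q(0) = 2 at the corner.
ec = extrapolate_linear(3.5, 6.5, 0.5, 1.5)
print(corner_value(ec, ec, ec))
```

Averaging three estimates taken along different directions reduces the bias any single one-sided extrapolation has at a cube-sphere corner, where the grid lines do not meet orthogonally.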
fv3core/stencils/a2b_ord4.py (outdated)
from __externals__ import i_end, i_start

with computation(PARALLEL), interval(...):
    # ppm_volume_mean_x
Can we make this a docstring?
I now use a gtscript function instead.
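Moving the commented `# ppm_volume_mean_x` block into a named function lets each region call the same computation. Inside the stencil this would be a `@gtscript.function`. The sketch below is a NumPy analogue for illustration only. It assumes `ppm_volume_mean_x` evaluates the standard fourth-order interpolation from cell means to a cell edge, with coefficients 7/12 and -1/12. Those values are an assumption here, not taken from this PR.

```python
import numpy as np

# Assumed fourth-order coefficients for interpolating cell means to an edge.
B1 = 7.0 / 12.0
B2 = -1.0 / 12.0


def ppm_volume_mean_x(q: np.ndarray, i: int) -> float:
    """Edge value between cells i-1 and i, built from the four cell means
    q[i-2], q[i-1], q[i], q[i+1] along x (1D NumPy analogue)."""
    return B2 * (q[i - 2] + q[i + 1]) + B1 * (q[i - 1] + q[i])


# Cell means of f(x) = x on unit cells [k, k+1] are k + 0.5;
# the edge between cells 2 and 3 sits at x = 3.
q_lin = np.arange(8) + 0.5
print(ppm_volume_mean_x(q_lin, 3))
```

A formula of this order is exact for low-degree polynomials. Checking it against linear and quadratic cell means is a cheap sanity test before wiring it into regions.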
… of memory, but is slower (5 seconds) for some reason
blarg is 10x slower?!
Purpose
a2b_ord4 was made into a class, but its stencils were only partially merged toward the single-stencil PR version. This PR takes another pass at merging them, to improve speed and GPU utilization. The computation still appears to need multiple stencils to validate on the gt backends. However, the blocking computation dependency is removed. The different regions no longer rely on each other, because each one recomputes the components it needs instead of reading a previously computed value at an offset. The qxx and qyy temporaries are removed. Corner computations are done in a larger stencil rather than one at a time. Putting two corner computations together works and no longer hits the NaN issue that previously occurred when using gtscript functions in regions. Putting all four together results in an error with the gtcuda backend:
excessive recursion at instantiation of class "gridtools::meta::lazy::lfold<gridtools::meta::dedup_step_impl
Code changes:
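The main idea in the Purpose section is to trade extra arithmetic for independence. Computing a temporary in one stage and reading it at an offset in the next forces a stage boundary. Instead, each output point can recompute the neighbouring values it needs. The NumPy sketch below illustrates that trade-off. It assumes a separable fourth-order interpolation applied first along x and then along y, with the same assumed 7/12 and -1/12 coefficients. `interp4`, `two_pass`, and `recompute` are hypothetical names, not fv3core functions.

```python
import numpy as np

B1 = 7.0 / 12.0   # assumed coefficient
B2 = -1.0 / 12.0  # assumed coefficient


def interp4(a, b, c, d):
    # Fourth-order edge interpolation from four consecutive cell means.
    return B2 * (a + d) + B1 * (b + c)


def two_pass(q: np.ndarray) -> np.ndarray:
    """Stage 1 stores a temporary qx; stage 2 reads qx at j offsets.
    Output [k, l] corresponds to point (i, j) = (k + 2, l + 2)."""
    qx = interp4(q[:-3, :], q[1:-2, :], q[2:-1, :], q[3:, :])
    return interp4(qx[:, :-3], qx[:, 1:-2], qx[:, 2:-1], qx[:, 3:])


def recompute(q: np.ndarray, i: int, j: int) -> float:
    """Same value at (i, j), but each needed qx is recomputed from q
    rather than read from a stored temporary at an offset."""
    def qx_at(jj):
        return interp4(q[i - 2, jj], q[i - 1, jj], q[i, jj], q[i + 1, jj])
    return interp4(qx_at(j - 2), qx_at(j - 1), qx_at(j), qx_at(j + 1))


q = np.random.default_rng(0).random((8, 9))
print(two_pass(q)[1, 2], recompute(q, 3, 4))  # same value both ways
```

Both paths produce the same values. The recomputed form does roughly four times as many x-interpolations per output point. That extra work is the cost of removing the dependency between stages.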
Checklist
Before submitting this PR, please make sure: