Add required backend arg to make_storage methods #40
launch jenkins
Tests passed apart from 2 failures on 54 ranks that appear to be due to distributed compilation (the "{name} has no attribute {name}" error where {name} is an auto-generated stencil module).
# Only used if reading the grid from savepoints on disk
# Default is to generate a baroclinic initialization
def read_grid(serializer: serialbox.Serializer, rank: int = 0) -> Grid:
    """Uses the serializer to generate a Grid object from serialized data"""
    grid_savepoint = serializer.get_savepoint("Grid-Info")[0]
    grid_data = {}
    grid_fields = serializer.fields_at_savepoint(grid_savepoint)
    for field in grid_fields:
        grid_data[field] = serializer.read(field, grid_savepoint)
        if len(grid_data[field].flatten()) == 1:
            grid_data[field] = grid_data[field][0]
    return fv3core.testing.TranslateGrid(grid_data, rank).python_grid()
Oliver and I discussed this in the PR that made fv3core use metric terms, and opted to leave it for now until we are sure the new grid methodology is stable for use here. But it is indeed currently unused.
Would you say, after these tests have run for a few days, that it's stable enough to remove this code from the file and use the git history to recover it if we need it? If not, I can put it back, in which case when should we re-assess this?
Let's remove it, yeah.
 # set backend
-fv3core.set_backend("numpy")
+backend = "numpy"
+fv3core.set_backend(backend)
So we are leaving the 'global_config' global state? Does this still need to be set?
Yes, it's going to be a bigger task to fully remove the global config. After this PR I'll go back to my PR running a timestep of the dynamics with no global state access, and once that test is done I can copy the pattern used to these runfiles. After that, we'll still need to refactor the unit tests to pass Grid explicitly to the tests instead of moving through global state. After all that, we should be able to remove this.
In this specific case, we use spec.grid below, and Grid still makes use of global_config to get the backend.
thanks!
def make_storage():
    return utils.make_storage_from_shape(
        self._idx.max_shape, backend=self._stencil_config.backend
    )
yay!
elynnwu
left a comment
Wow! That's a lot of make_storage. Thanks for going through this!
@@ -1,3 +1,5 @@
.dace.conf
Thank you! I've been meaning to add this...
 @property
-def lat(self):
+def lat(self) -> fv3util.Quantity:
👍
def make_storage():
    return utils.make_storage_from_shape(
        self._idx.max_shape, backend=self._stencil_config.backend
    )
I like that it looks much cleaner having make_storage(), but it just seems awkward that we need to duplicate this function in every module.
An alternative design on this would be to have a make_storage method on the StencilConfig and StencilFactory classes, which makes use of its backend. @rheacangeo and I settled on this design so that you can still make storages without the larger configuration needs of StencilConfig, but it's possible we could enable both. Then here we could use stencil_factory.make_storage(self._idx.max_shape).
Perhaps something to discuss in a team meeting.
That makes sense. I'll put this on the agenda for the team meeting next week.
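The alternative raised in this thread, a make_storage method on StencilConfig and StencilFactory that reuses the configured backend, could look roughly like the sketch below. Everything here is a stand-in written for illustration: Storage, the simplified make_storage_from_shape, and the two minimal classes are assumptions, not the actual fv3core definitions. The free function stays usable on its own, which is the property the current design was chosen to keep.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Storage:
    """Hypothetical stand-in for a storage; records only shape and backend."""

    shape: Tuple[int, ...]
    backend: str


def make_storage_from_shape(shape: Tuple[int, ...], *, backend: str) -> Storage:
    """Stand-in for the free function: backend is a required keyword argument."""
    return Storage(shape=tuple(shape), backend=backend)


@dataclass(frozen=True)
class StencilConfig:
    """Hypothetical minimal config that only carries the backend name."""

    backend: str

    def make_storage(self, shape: Tuple[int, ...]) -> Storage:
        # Reuse the backend this config was built with.
        return make_storage_from_shape(shape, backend=self.backend)


@dataclass(frozen=True)
class StencilFactory:
    """Hypothetical minimal factory that delegates to its config."""

    config: StencilConfig

    def make_storage(self, shape: Tuple[int, ...]) -> Storage:
        return self.config.make_storage(shape)


stencil_factory = StencilFactory(StencilConfig(backend="numpy"))

# Via the factory: no backend argument at the call site.
tmp_ke = stencil_factory.make_storage((12, 12, 79))

# Via the free function: still works without any StencilConfig.
standalone = make_storage_from_shape((12, 12, 79), backend="numpy")
```

With both entry points available, a module that already holds a stencil factory would not need its own local make_storage helper, while code without one could keep calling the free function.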
self._tmp_ke = make_storage()
self._tmp_vort = make_storage()
self._tmp_fx = make_storage()
self._tmp_fx1 = make_storage()
self._tmp_fx2 = make_storage()
I'd prefer the long line, since the helper just calls make_storage_from_shape above anyway. With explicit arguments it is clear what differs between the storages. It's probably not too bad if you shorten the variable names to just max_shape and backend.
Curious to see what others think...
I don't know if there's one standard that's best for every module. It depends a lot on how many allocations there are, how many of them use exactly the same arguments (as these do), how many use different args and which ones are different, whether origin is being set, whether backend already exists as its own variable in the init, how long the variable names being set are, etc. In some modules, I did go the explicit argument route, but when it would multiply a large number of allocation lines by 3, I opted for this pattern.
The balance would also be different if e.g. we renamed make_storage_from_shape to make_storage.
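To make the trade-off in this thread concrete, here is a minimal sketch of the two allocation styles side by side. Storage and this make_storage_from_shape are stand-ins invented for the example, not the fv3core implementations; only the call pattern is meant to match the diff.

```python
from typing import NamedTuple, Tuple


class Storage(NamedTuple):
    """Hypothetical stand-in for a storage; records only shape and backend."""

    shape: Tuple[int, ...]
    backend: str


def make_storage_from_shape(shape, *, backend):
    """Stand-in for the real helper: backend is a required keyword argument."""
    return Storage(tuple(shape), backend)


max_shape = (13, 13, 80)
backend = "numpy"

# Style A: explicit arguments on every allocation line.
tmp_ke_a = make_storage_from_shape(max_shape, backend=backend)
tmp_vort_a = make_storage_from_shape(max_shape, backend=backend)


# Style B: a local helper that closes over the shared arguments.
def make_storage():
    return make_storage_from_shape(max_shape, backend=backend)


tmp_ke_b = make_storage()
tmp_vort_b = make_storage()
```

Both styles produce identical storages. Style A shows every argument at the call site, which helps when allocations differ. Style B keeps each line short when many allocations share the same arguments.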
launch jenkins
rheacangeo
left a comment
LGTM, but Elynn also brings up good questions!
Currently, make_storage helper methods use the globally-set backend. This PR updates these methods to explicitly take in the backend used for the storage.
Changes:
dyncore_temporaries, TranslateGrid, storage_dict, make_grid, process_grid_savepoint and functions that call it in conftest.py, make_storage_from_shape, make_storage_from_shape_uncached, make_storage_data, get_quantity_halo_spec, get_halo_update_spec, and make_storage_dict
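As a rough illustration of the change described above, the sketch below contrasts a helper that reads a globally set backend with one that requires the backend as an argument. All names here (_GLOBAL_BACKEND, set_backend, make_storage_implicit, make_storage_explicit) are hypothetical stand-ins written for this example; they are not the fv3core functions, and the dictionaries only mimic a storage.

```python
from typing import Optional, Tuple

# Hypothetical stand-in for the global configuration state.
_GLOBAL_BACKEND: Optional[str] = None


def set_backend(name: str) -> None:
    """Set the process-wide backend (the pattern this PR moves away from)."""
    global _GLOBAL_BACKEND
    _GLOBAL_BACKEND = name


def make_storage_implicit(shape: Tuple[int, ...]) -> dict:
    """Old style: the backend comes from hidden global state."""
    if _GLOBAL_BACKEND is None:
        raise RuntimeError("backend has not been set globally")
    return {"shape": tuple(shape), "backend": _GLOBAL_BACKEND}


def make_storage_explicit(shape: Tuple[int, ...], *, backend: str) -> dict:
    """New style: the caller must say which backend the storage uses."""
    return {"shape": tuple(shape), "backend": backend}


# Old style depends on call order: set_backend must run first.
set_backend("numpy")
old = make_storage_implicit((4, 4, 8))

# New style is independent of global state.
new = make_storage_explicit((4, 4, 8), backend="numpy")

# Omitting the required keyword fails at the call site.
try:
    make_storage_explicit((4, 4, 8))
    missing_backend_rejected = False
except TypeError:
    missing_backend_rejected = True
```

Making the argument required means a forgotten backend surfaces as an immediate TypeError at the call site instead of silently picking up whatever the global state happens to be.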