Add Additional Metadata and Data Model Conversions #201

TylerWixtrom-NOAA · 2025-08-20T16:47:53Z

Adds an option in the xarray engine to parse metadata following CF conventions. This is still a work in progress and needs unit tests and testing against multiple datasets.

TylerWixtrom-NOAA · 2025-09-02T14:48:10Z

@eengl @AdamSchnapp This is now ready for review.

Add support for additional coordinate and metadata information in xarray dataset. Add optional data_model argument to translate coordinates and attributes to a defined data model specification.

EricEngle-NOAA · 2025-09-08T18:21:27Z

src/grib2io/cf_standard_names.py

I will likely move this file into tables/

EricEngle-NOAA · 2025-09-08T18:24:47Z

src/grib2io/cf_standard_names.py

+# Data extracted from the markdown tables
+# taken from CF conventions standard names table
+# https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
+data = [


I will likely reformat this into a dictionary where the key is the shortName and value is a dict with keys:

cf_standard_name

cf_cell_method

So for example,

data = { 'ABSV': {'cf_standard_name': 'atmosphere_absolute_vorticity', 'cf_cell_method': None}, }

EricEngle-NOAA · 2025-09-08

Hmmm...wait...I see why you formatted as such...to get into a DataFrame.

TylerWixtrom-NOAA · 2025-09-15

That is correct. I don't think we need to run this through pandas. I think a dict lookup would suffice if that is preferred.

EricEngle-NOAA · 2025-09-15

If anything, it would be consistent with the rest of the tables. Do you foresee the value expanding beyond just cf_standard_name and cf_cell_method?

TylerWixtrom-NOAA · 2025-09-15

No, I do not see that expanding. cf_standard_name is the only required value, while cf_cell_method becomes relevant for certain elements (e.g. max temperature). I do not believe any additional fields would be relevant.
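
For reference, the dict layout discussed in this thread could look roughly like the sketch below. It is an illustration only, not the contents of cf_standard_names.py: the ABSV entry comes from the comment above, while the TMAX entry, its "maximum" cell method value, and the helper name cf_attrs are assumptions made for the example.

```python
from typing import Dict, Optional

# Hypothetical lookup table keyed by GRIB2 shortName.
# Only 'ABSV' is taken from the review thread; 'TMAX' is an assumed
# example of an element where a cell method becomes relevant.
CF_TABLE: Dict[str, Dict[str, Optional[str]]] = {
    "ABSV": {
        "cf_standard_name": "atmosphere_absolute_vorticity",
        "cf_cell_method": None,
    },
    "TMAX": {
        "cf_standard_name": "air_temperature",
        "cf_cell_method": "maximum",
    },
}


def cf_attrs(short_name: str) -> Dict[str, str]:
    """Return the CF entries for a shortName, dropping unset (None) values.

    Unknown shortNames yield an empty dict rather than raising.
    """
    entry = CF_TABLE.get(short_name, {})
    return {key: value for key, value in entry.items() if value is not None}
```

A plain dict lookup like this avoids building a pandas DataFrame just to find one row, which appears to be the trade-off raised in the thread.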

* Fix for duplicate variables in xarray DataTree
  Added logic to make sure that GRIB2 metadata are of type int, not Grib2Message, when adding these as separate columns to the pandas df for datatree processing.
  Added entries for 195, 196, 197 in level table for datatree. These are custom levels used by AWC (and thus NBM). [skip ci]
* Significant Update for xarray_backend.py
  This commit allows the backend to provide a common dimension name for GRIB2 messages where more than 1 attribute would provide the dimensionality. For example, probability messages (pdtn 5 or 9) have the following attrs used as dimensions: thresholdLowerLimit, thresholdUpperLimit, but these are valid for 1 message, so basically we are adding an extra dimension. So here we introduce a "threshold" dimension (and coordinate), and thresholdLowerLimit and thresholdUpperLimit serve as coordinates, dimensioned by "threshold".
  Also, this commit adds the ability to add another node for Xarray DataTrees by the typeOfProbability attribute, providing node name "prob_<typeOfProbability>".
* Update xarray_backend.py
  Fixed bug from previous commit where for mfdatasets the dimension ordering was wrong when creating the DataArray. The fix is reversing ordered_meta in make_variables().
* Update xarray_backend.py
  Cleaned up *Dim class naming for DimCube.
* Update xarray_backend.py
  For *Dim classes, using typing.Tuple[] instead of tuple[]. This maintains compatibility with Python 3.8.
* Update for xarray_backend.py
  This commit comments out the call to create_dataset_from_df() and instead calls try_process_by_variables(). This leads to variables for a level/pdtn tree getting their own var_<shortName> DataTree group. The next step here is to collect DataArrays where they have the same level value. For example, collect temps where 2m height above ground and winds where 10m height above ground.
  With these changes, all GRIB2 messages are resolved when reading a NBM Core F001 CONUS GRIB2 file. The multitude of diagnostic print statements are still present.
* additions to pr:200 duplicate elements with open datatree (#203)
  * this appears to be a working checkpoint
  * revisions and cleanup for adding extra coordinates for dimensions that are not indexes; level, threshold
  ---------
  Co-authored-by: Adam.Schnapp <[email protected]>
* Update templates.py
  Added check in __get__ for valueOfFirstFixedSurface and valueOfSecondFixedSurface to check for scale_factor < 0, to return 0.0 [skip ci]
* 200 duplicate elements with open datatree (#204)
  * this appears to be a working checkpoint
  * revisions and cleanup for adding extra coordinates for dimensions that are not indexes; level, threshold
  * Update templates.py
    Added check in __get__ for valueOfFirstFixedSurface and valueOfSecondFixedSurface to check for scale_factor < 0, to return 0.0 [skip ci]
  * adjust a test for adjusted dimension/coordinate behavior and fix bug parsing grib index
  ---------
  Co-authored-by: Adam.Schnapp <[email protected]>
  Co-authored-by: Eric Engle <[email protected]>
* Add Additional Metadata and Data Model Conversions (#201)
  * Add support for additional coordinate and metadata information in xarray dataset. Add optional data_model argument to translate coordinates and attributes to defined data model specification.
  * integrate aspects of the cf_metadata into the default data model
  * Move ptype threshold decoding to data model function
  * Fix string type conversion in metadata parsing
  * Add comment with reference to CF conventions standard names table
  * Fix error in ptype parsing and remove forced conversion to string for metadata values
  * Add else to rename all coordinate names
  * Change leadtime to lead_time per [email protected]
  ---------
  Co-authored-by: Adam.Schnapp <[email protected]>
* Updating grib2io tables
  Added reworked CF tables. NCEP GRIB2 tables updated to v35. [skip ci]
* Updates to xarray backend.
  Moved table in vertical_coordinate_surfaces.py to inside xarray_backend.py. Code cleanup. [skip ci]
* Update for tables and table generation scripts. [skip ci]
* Update GRIB2 tables
  Updated make_grib2_tables/get-ncep-grib2-originating-centers.py to perform more explicit table scraping and encoding due to pandas read_html not properly encoding characters with accent marks. [skip ci]
* Update for xarray_backend.py
  Changes here for the Xarray DataTree. As of this commit, all GRIB2 messages for a Core NBM GRIB2 file are accounted for. [skip ci]
* Update xarray_backend.py
  Clean up of DataTree names. [skip ci]
* Update tests due to NCEP GRIB2 table updates. [skip ci]
* Update tests/test_xarray_datatree_backend.py
  Xarray DataTree .ds is never None; it returns a DatasetView of an empty Dataset when there is none.
* Clean up xarray_backend.py
  Remove test prints. [skip ci]
* More datatree updates [skip ci]
* Update xarray_backend.py [skip ci]
* Update tests for xarray DataTree backend
  Added file blend.t00z.core.f001.co_4x_reduce.grib2, which is a NBM Core forecast GRIB2 file for CONUS where the 2.5km grid has been reduced 4x to roughly 10km to make the filesize smaller, but all GRIB2 metadata are preserved. Added test to resolve all messages in the file.
* Update DataTree tests
  Commented out some checks due to changes to the DataTree structure. [skip ci]
* Update tests.
---------
Co-authored-by: Eric Engle <[email protected]>
Co-authored-by: Adam Schnapp <[email protected]>
Co-authored-by: Adam.Schnapp <[email protected]>
Co-authored-by: TylerWixtrom-NOAA <[email protected]>
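
The shared "threshold" dimension described in the commit message above can be sketched in plain Python. This is not the xarray_backend.py implementation: the function name build_threshold_coords and the message dicts are hypothetical, and only the attribute names thresholdLowerLimit and thresholdUpperLimit come from the commit text.

```python
from typing import Dict, Iterable, List, Tuple


def build_threshold_coords(messages: Iterable[Dict[str, float]]) -> Dict[str, list]:
    """Collapse (lower, upper) limit pairs onto one "threshold" dimension.

    Each unique pair gets a single index along "threshold"; the two limits
    become coordinates dimensioned by that index instead of each spanning
    a dimension of their own.
    """
    pairs: List[Tuple[float, float]] = []
    for msg in messages:
        pair = (msg["thresholdLowerLimit"], msg["thresholdUpperLimit"])
        if pair not in pairs:  # keep first-seen order, skip duplicates
            pairs.append(pair)
    return {
        "threshold": list(range(len(pairs))),
        "thresholdLowerLimit": [lower for lower, _ in pairs],
        "thresholdUpperLimit": [upper for _, upper in pairs],
    }


# Made-up probability messages; the third repeats the first pair.
example = [
    {"thresholdLowerLimit": 0.0, "thresholdUpperLimit": 273.15},
    {"thresholdLowerLimit": 0.0, "thresholdUpperLimit": 283.15},
    {"thresholdLowerLimit": 0.0, "thresholdUpperLimit": 273.15},
]
coords = build_threshold_coords(example)
```

Treating the two limits as independent dimensions would allocate a lower-by-upper grid even though each message fills only one cell; indexing by pair keeps one slot per message.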

TylerWixtrom-NOAA force-pushed the cf_metadata branch from ea094be to 6fc88b6 Compare September 2, 2025 14:21

TylerWixtrom-NOAA changed the base branch from master to 200-duplicate-elements-with-open_datatree September 2, 2025 14:22

TylerWixtrom-NOAA force-pushed the cf_metadata branch from 6fc88b6 to 5623ebc Compare September 2, 2025 14:26

TylerWixtrom-NOAA marked this pull request as ready for review September 2, 2025 14:47

TylerWixtrom-NOAA changed the title ~~Add CF Metadata~~ Add Additional Metadata and Data Model Conversions Sep 2, 2025

Add support for additional coordinate and metadata information in xarray dataset. Add optional data_model argument to translate coordinates and attributes to defined data model specification.

2d0ad6f

TylerWixtrom-NOAA force-pushed the cf_metadata branch from 5623ebc to 2d0ad6f Compare September 2, 2025 16:30

Adam.Schnapp and others added 2 commits September 3, 2025 14:13

integrate aspects of the cf_metadata into the default data model

54a5e47

Move ptype threshold decoding to data model function

79e25d4

EricEngle-NOAA reviewed Sep 4, 2025

src/grib2io/cf_standard_names.py

Fix string type conversion in metadata parsing

283cd5c

TylerWixtrom-NOAA force-pushed the cf_metadata branch from fd94f59 to 283cd5c Compare September 4, 2025 12:02

TylerWixtrom-NOAA added 2 commits September 4, 2025 12:03

Add comment with reference to CF conventions standard names table

85ad0cf

Fix error in ptype parsing and remove forced conversion to string for metadata values

f92df4d

EricEngle-NOAA reviewed Sep 8, 2025

EricEngle-NOAA reviewed Sep 8, 2025

TylerWixtrom-NOAA added 2 commits September 17, 2025 14:51

Add else to rename all coordinate names

cd651ce

Change leadtime to lead_time per [email protected]

458a548

TylerWixtrom-NOAA force-pushed the cf_metadata branch from 3de2508 to 458a548 Compare September 18, 2025 19:24

EricEngle-NOAA merged commit f9af967 into NOAA-MDL:200-duplicate-elements-with-open_datatree Sep 22, 2025
12 checks passed
