@TylerWixtrom-NOAA
Contributor

Adds an option in the xarray engine to parse metadata following CF conventions. This is still a work in progress and needs unit tests and testing against multiple datasets.

@TylerWixtrom-NOAA changed the base branch from master to 200-duplicate-elements-with-open_datatree September 2, 2025 14:22
@TylerWixtrom-NOAA marked this pull request as ready for review September 2, 2025 14:47
@TylerWixtrom-NOAA
Contributor Author

@eengl @AdamSchnapp This is now ready for review.

@TylerWixtrom-NOAA changed the title from "Add CF Metadata" to "Add Additional Metadata and Data Model Conversions" Sep 2, 2025

Add support for additional coordinate and metadata information
in xarray dataset. Add optional data_model argument
to translate coordinates and attributes to
defined data model specification.
Collaborator

I will likely move this file into tables/

# Data extracted from the markdown tables
# taken from CF conventions standard names table
# https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html
data = [
Collaborator

I will likely reformat this into a dictionary where the key is the shortName and value is a dict with keys:

  • cf_standard_name
  • cf_cell_method

So for example,

data = {
    'ABSV': {'cf_standard_name': 'atmosphere_absolute_vorticity', 'cf_cell_method': None},
}

Collaborator

Hmmm...wait...I see why you formatted as such...to get into a DataFrame.

Contributor Author

That is correct. I don't think we need to run this through pandas. I think a dict lookup would suffice if that is preferred.

Collaborator

If anything, it would be consistent with the rest of the tables. Do you foresee the value expanding beyond just cf_standard_name and cf_cell_method?

Contributor Author

No, I do not see that expanding. cf_standard_name is the only required value, while cf_cell_method becomes relevant for certain elements (e.g. max temperature). I do not believe any additional fields would be relevant.
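
A minimal sketch of the dict lookup discussed in this thread, assuming the table is keyed by shortName. Only the `ABSV` entry comes from the comments above; the `TMAX` entry, the `CF_TABLE` name, and the `cf_attrs` helper are hypothetical and only illustrate how `cf_cell_method` might be filled in for an element such as maximum temperature.

```python
from typing import Dict, Optional

# Hypothetical table keyed by GRIB2 shortName. Only the ABSV entry is taken
# from the review thread; the TMAX entry is an illustrative assumption.
CF_TABLE: Dict[str, Dict[str, Optional[str]]] = {
    "ABSV": {"cf_standard_name": "atmosphere_absolute_vorticity", "cf_cell_method": None},
    "TMAX": {"cf_standard_name": "air_temperature", "cf_cell_method": "maximum"},
}


def cf_attrs(short_name: str) -> Dict[str, Optional[str]]:
    """Return CF attributes for a shortName, or an empty dict if it is unknown."""
    entry = CF_TABLE.get(short_name)
    if entry is None:
        return {}
    # cf_standard_name is the only required value.
    attrs: Dict[str, Optional[str]] = {"standard_name": entry["cf_standard_name"]}
    # cf_cell_method is attached only for the elements that define one.
    if entry["cf_cell_method"] is not None:
        attrs["cell_method"] = entry["cf_cell_method"]
    return attrs


print(cf_attrs("ABSV"))
print(cf_attrs("TMAX"))
```

A plain dict keeps the lookup to a single `.get()` call and avoids building a DataFrame just to select one row.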

@EricEngle-NOAA merged commit f9af967 into NOAA-MDL:200-duplicate-elements-with-open_datatree Sep 22, 2025
12 checks passed
EricEngle-NOAA added a commit that referenced this pull request Oct 8, 2025
* Fix for duplicate variables in xarray DataTree

Added logic to make sure that GRIB2 metadata are of type int, not
Grib2Message, when adding these as separate columns to the pandas df
for datatree processing

Added entries for 195, 196, 197 in level table for datatree. These
are custom levels used by AWC (and thus NBM).

[skip ci]

* Significant Update for xarray_backend.py

This commit allows for the backend to provide a common dimension name
for GRIB2 messages where more than 1 attribute would provide the dimensionality.

For example, probability messages (pdtn 5 or 9) have two attrs used as
dimensions: thresholdLowerLimit and thresholdUpperLimit. Both are valid for
the same message, so using each as its own dimension adds an extra dimension.

So here we introduce a "threshold" dimension (and coordinate), and thresholdLowerLimit
and thresholdUpperLimit serve as coordinates, dimensioned by "threshold".

Also, this commit adds the ability to add another node for Xarray DataTrees by the
typeOfProbability attribute, providing node name "prob_<typeOfProbability>".
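
The idea can be sketched in plain Python, without xarray: each distinct (thresholdLowerLimit, thresholdUpperLimit) pair gets one index along a single "threshold" dimension, and the two limits become parallel coordinates of that length. The message dictionaries, the `build_threshold_coords` and `prob_node_name` helpers, and the sample values are all hypothetical; the actual backend logic may differ.

```python
from typing import Dict, List, Tuple


def build_threshold_coords(messages: List[dict]) -> Tuple[Dict[str, List[float]], List[int]]:
    """Map (lower, upper) limit pairs onto one shared "threshold" dimension.

    Returns the two coordinate lists (both dimensioned by "threshold") and,
    for each message, its index along that dimension.
    """
    pairs: List[Tuple[float, float]] = []
    index_of_message: List[int] = []
    for msg in messages:
        pair = (msg["thresholdLowerLimit"], msg["thresholdUpperLimit"])
        if pair not in pairs:
            pairs.append(pair)
        index_of_message.append(pairs.index(pair))
    coords = {
        "thresholdLowerLimit": [lower for lower, _ in pairs],
        "thresholdUpperLimit": [upper for _, upper in pairs],
    }
    return coords, index_of_message


def prob_node_name(type_of_probability) -> str:
    """Build the DataTree node name in the form prob_<typeOfProbability>."""
    return f"prob_{type_of_probability}"


# Made-up sample metadata for three probability messages.
messages = [
    {"thresholdLowerLimit": 0.0, "thresholdUpperLimit": 1.0, "typeOfProbability": 1},
    {"thresholdLowerLimit": 0.0, "thresholdUpperLimit": 5.0, "typeOfProbability": 1},
    {"thresholdLowerLimit": 0.0, "thresholdUpperLimit": 1.0, "typeOfProbability": 1},
]
coords, threshold_index = build_threshold_coords(messages)
print(coords, threshold_index, prob_node_name(messages[0]["typeOfProbability"]))
```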

* Update xarray_backend.py

Fixed bug from previous commit where for mfdatasets the dimension ordering
was wrong when creating the DataArray.

The fix is reversing ordered_meta in make_variables().

* Update xarray_backend.py

Cleaned up *Dim class naming for DimCube.

* Update xarray_backend.py

For *Dim classes, using typing.Tuple[] instead of tuple[].  This maintains
compatibility with Python 3.8.
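
As an illustration of the compatibility point: subscripting the built-in `tuple` in an annotation that is evaluated at class-definition time raises `TypeError` on Python 3.8, while `typing.Tuple` can be subscripted on 3.8 and later. The `ExampleDim` class below is a hypothetical stand-in, not one of the actual `*Dim` classes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ExampleDim:
    """Hypothetical stand-in for a *Dim class."""

    name: str
    # typing.Tuple is subscriptable on Python 3.8; `tuple[str, ...]` is only
    # subscriptable at runtime from Python 3.9 onward.
    keys: Tuple[str, ...]


dim = ExampleDim(name="threshold", keys=("thresholdLowerLimit", "thresholdUpperLimit"))
print(dim)
```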

* Update for xarray_backend.py

This commit comments out the call to create_dataset_from_df() and instead
calls try_process_by_variables().

This leads to variables for a level/pdtn tree getting their own var_<shortName>
DataTree group.

The next step here is to collect DataArrays where they have the same level value.
For example, collect temps where 2m height above ground and winds where 10m
height above ground.

With these changes, all GRIB2 messages are resolved when reading an NBM Core F001
CONUS GRIB2 file.

The multitude of diagnostic print statements are still present.

* additions to pr:200 duplicate elements with open datatree (#203)

* this appears to be a working checkpoint

* revisions and cleanup for adding extra coordinates for dimensions that are not indexes; level, threshold

---------

Co-authored-by: Adam.Schnapp <[email protected]>

* Update templates.py

Added check in __get__ for valueOfFirstFixedSurface and valueOfSecondFixedSurface
to check for scale_factor < 0, to return 0.0

[skip ci]
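
A rough sketch of the guard described in the templates.py commit above, written as a descriptor. It assumes the usual GRIB2 decoding of a fixed-surface value as scaled value divided by 10 raised to the scale factor; the `FixedSurfaceValue` and `ExampleMessage` classes and their attribute names are hypothetical and are not the code in templates.py.

```python
class FixedSurfaceValue:
    """Hypothetical descriptor that unscales a fixed-surface value."""

    def __init__(self, scale_attr: str, value_attr: str):
        self.scale_attr = scale_attr
        self.value_attr = value_attr

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        scale_factor = getattr(obj, self.scale_attr)
        scaled_value = getattr(obj, self.value_attr)
        # Guard described in the commit: a negative scale factor yields 0.0.
        if scale_factor < 0:
            return 0.0
        # Assumed decoding: value = scaled_value / 10**scale_factor.
        return scaled_value / (10.0 ** scale_factor)


class ExampleMessage:
    """Hypothetical holder for the raw scale factor and scaled value."""

    value_of_first_fixed_surface = FixedSurfaceValue("scale_factor", "scaled_value")

    def __init__(self, scale_factor: int, scaled_value: int):
        self.scale_factor = scale_factor
        self.scaled_value = scaled_value


print(ExampleMessage(1, 20).value_of_first_fixed_surface)
print(ExampleMessage(-1, 20).value_of_first_fixed_surface)
```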

* 200 duplicate elements with open datatree (#204)

* this appears to be a working checkpoint

* revisions and cleanup for adding extra coordinates for dimensions that are not indexes; level, threshold

* Update templates.py

Added check in __get__ for valueOfFirstFixedSurface and valueOfSecondFixedSurface
to check for scale_factor < 0, to return 0.0

[skip ci]

* adjust a test for adjusted dimension/coordinate behavior and fix bug parsing grib index

---------

Co-authored-by: Adam.Schnapp <[email protected]>
Co-authored-by: Eric Engle <[email protected]>

* Add Additional Metadata and Data Model Conversions (#201)

* Add support for additional coordinate and metadata information
in xarray dataset. Add optional data_model argument
to translate coordinates and attributes to
defined data model specification.

* integrate aspects of the cf_metadata into the default data model

* Move ptype threshold decoding to data model function

* Fix string type conversion in metadata parsing

* Add comment with reference to CF conventions standard names table

* Fix error in ptype parsing and remove forced conversion to string for metadata values

* Add else to rename all coordinate names

* Change leadtime to lead_time per
[email protected]

---------

Co-authored-by: Adam.Schnapp <[email protected]>

* Updating grib2io tables

Added reworked CF tables

NCEP GRIB2 tables updated to v35

[skip ci]

* Updates to xarray backend.

Moved table in vertical_coordinate_surfaces.py to inside
xarray_backend.py.

Code cleanup.

[skip ci]

* Update for tables and table generation scripts.

[skip ci]

* Update GRIB2 tables

Updated make_grib2_tables/get-ncep-grib2-originating-centers.py to
perform more explicit table scraping and encoding due to pandas
read_html not properly encoding characters with accent marks.

[skip ci]

* Update for xarray_backend.py

Changes here for the Xarray DataTree.  As of this commit, all GRIB2
messages in a Core NBM GRIB2 file are accounted for.

[skip ci]

* Update xarray_backend.py

Clean up of DataTree names.

[skip ci]

* Update tests due to NCEP GRIB2 table updates.

[skip ci]

* Update tests/test_xarray_datatree_backend.py

Xarray DataTree .ds is never None; when a node has no data, it returns
a DatasetView of an empty Dataset.

* Clean up xarray_backend.py

Remove test prints.

[skip ci]

* More datatree updates

[skip ci]

* Update xarray_backend.py

[skip ci]

* Update tests for xarray DataTree backend

Added file blend.t00z.core.f001.co_4x_reduce.grib2, which is an NBM
Core forecast GRIB2 file for CONUS where the 2.5km grid has been
reduced 4x to roughly 10km to make the file size smaller, but all GRIB2
metadata are preserved.

Added test to resolve all messages in the file.

* Update DataTree tests

Commented out some checks due to changes to the DataTree structure.

[skip ci]

* Update tests.

---------

Co-authored-by: Eric Engle <[email protected]>
Co-authored-by: Adam Schnapp <[email protected]>
Co-authored-by: Adam.Schnapp <[email protected]>
Co-authored-by: TylerWixtrom-NOAA <[email protected]>