Issue #1536 reduce values calls splitting #1537

Merged
merged 15 commits into master from issue_#1536_reduce_values_calls_splitting
May 27, 2025

Conversation

JoerivanEngelen
Contributor

@JoerivanEngelen JoerivanEngelen commented May 23, 2025

Fixes #1536

Description

  • Remove unnecessary calls to .values
  • Fix bug in _skip_dataarray where the spatial dimension of unstructured grids was hardcoded
  • Refactor exchange creation logic to use xr.Dataset().to_dataframe(): variables are first merged into a dataset with matching dimensions, which is then converted to a pandas dataframe.

I can't yet verify whether this improves things for Teun's example, as I do not have his scripts. It does fix a bug, though, and should improve performance somewhat when using dask, as it reduces unnecessary loads into memory.
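As a rough illustration of why avoiding .values matters with dask, the sketch below contrasts .data and .values on a small array. The variable names and contents are made up for illustration and are not taken from the PR; dask is treated as optional, so the example falls back to a NumPy array when it is not installed.

```python
import numpy as np
import xarray as xr

try:
    # dask is optional here; with it, the array below stays lazy.
    import dask.array as dask_array

    raw = dask_array.arange(6, chunks=3)
except ImportError:
    # Fallback so the sketch still runs without dask (no laziness to show).
    raw = np.arange(6)

# Hypothetical variable; "cdist" and "index" are illustrative names only.
cdist = xr.DataArray(raw, dims=("index",), name="cdist")

# .data hands back the underlying array unchanged (a dask array stays lazy).
lazy_payload = cdist.data

# .values always returns a numpy.ndarray, which forces a dask array to compute.
eager_payload = cdist.values
```

With a dask-backed array, passing .data around defers the computation, whereas every .values call loads the full array into memory.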

Checklist

  • Links to correct issue
  • Update changelog, if changes affect users
  • PR title starts with Issue #nr, e.g. Issue #737
  • Unit tests were added
  • If feature added: Added/extended example

@JoerivanEngelen JoerivanEngelen requested a review from Manangka May 23, 2025 14:55
with raise_if_dask_computes():
    assert _skip_dataarray(grid) is False
    assert _skip_dataarray(xr.DataArray(True)) is True
    assert _skip_dataarray(layer_da) is True
Collaborator

Lines 22 and 23 are independent of the cases provided to this test. You could put them in a separate test.
Maybe name the tests something along the lines of:
test_skip_dataarray_grids_types
test_skip_dataarray_non_grid_types

Contributor Author

Good point, I decided to add separate cases for the layered constants and constants to GridCases, which I renamed to DataArrayCases.

)

all_geometric_vars = ["ihc", "cl1", "cl2", "hwva", "angldegx", "cdist"]
for var in all_geometric_vars:
Collaborator

You can combine these lines:
for var in all_geometric_vars if var in self.dataset.data_vars:

Contributor Author

This doesn't work without a list/dict comprehension, so it would end up like this:

geometric_vars = ["ihc", "cl1", "cl2", "hwva", "angldegx", "cdist"]
vars_to_render.update({
    var: (index_dim, self.dataset[var].data)
    for var in geometric_vars if var in self.dataset.data_vars
})

Which looks more complicated than the current code.
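To make the comparison concrete, here is a minimal sketch showing that the loop and the dict comprehension build the same mapping. The `available` dictionary is a hypothetical stand-in for self.dataset, and its contents are invented for illustration.

```python
geometric_vars = ["ihc", "cl1", "cl2", "hwva", "angldegx", "cdist"]
index_dim = "index"

# Hypothetical stand-in for self.dataset: only some variables are present.
available = {"ihc": [1, 0], "cl1": [0.5, 0.5], "hwva": [10.0, 10.0]}

# Loop form: add each geometric variable only if it is present.
loop_result = {}
for var in geometric_vars:
    if var in available:
        loop_result[var] = (index_dim, available[var])

# Comprehension form: same filter, expressed inline.
comprehension_result = {
    var: (index_dim, available[var])
    for var in geometric_vars
    if var in available
}
```

Both forms yield the same dictionary, so the choice comes down to readability.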

for var in all_geometric_vars:
    if var in self.dataset.data_vars:
        vars_to_render[var] = (index_dim, self.dataset[var].data)
datablock = xr.merge([vars_to_render], join="exact").to_dataframe()
Collaborator

What does merge on a single object do?

Contributor Author

xr.merge merges all dictionaries that map variable names to DataArrays into a single xr.Dataset. So with one element in the list, the dictionary is converted to an xr.Dataset. xr.merge doesn't support providing a single dictionary directly.
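A minimal sketch of this pattern, assuming a single shared dimension: the dictionary maps variable names to (dimension, data) tuples, is wrapped in a list for xr.merge, and the resulting dataset is converted to a dataframe. The variable names and values below are hypothetical and only serve as an illustration.

```python
import numpy as np
import xarray as xr

index_dim = "index"

# Hypothetical variables; each maps a name to a (dimension, data) tuple.
vars_to_render = {
    "cell_id1": (index_dim, np.array([1, 2, 3])),
    "cell_id2": (index_dim, np.array([4, 5, 6])),
    "cdist": (index_dim, np.array([10.0, 10.0, 12.5])),
}

# A one-element list: xr.merge turns the single dictionary into a Dataset.
merged = xr.merge([vars_to_render], join="exact")

# One row per entry along "index", one column per variable.
datablock = merged.to_dataframe()
```

Using join="exact" asks xarray to raise rather than silently reindex if the variables' indexes do not match.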

@JoerivanEngelen JoerivanEngelen enabled auto-merge May 27, 2025 11:37
@JoerivanEngelen JoerivanEngelen added this pull request to the merge queue May 27, 2025
Merged via the queue into master with commit 4ff9db9 May 27, 2025
7 checks passed
@JoerivanEngelen JoerivanEngelen deleted the issue_#1536_reduce_values_calls_splitting branch May 27, 2025 14:54
Development

Successfully merging this pull request may close these issues.

Reduce amount of calls to .values in model splitting
2 participants