Issue #1436 work around dask issue #1450
Conversation
@@ -91,11 +91,14 @@ def generate_index_array(self):
        active = self.dataset["active"]

        isactive = area.where(active).notnull()
        svat = xr.full_like(area, fill_value=0, dtype=np.int64).rename("svat")
I don't understand how this fixes the issue.
I looked at the GitHub issue you linked to. There the problematic statement seems to be a[mask] = a[mask]. In this case, is that svat.data[isactive.data] = np.arange(1, index.sum() + 1)? It seems the bug is acknowledged there. Wouldn't it be better to pin dask to an older version until this issue is resolved?
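To make the comparison concrete, here is a minimal numpy-only sketch of the two patterns being compared. The shapes and data are made up and the variable names are only borrowed from the diff; this is not the actual project code.

import numpy as np

# The pattern from the linked dask issue, as I understand it:
a = np.zeros((3, 3))
mask = np.zeros((3, 3), dtype=bool)
mask[0, :2] = True
a[mask] = a[mask]  # assign the masked selection back onto itself

# The pattern used in this change:
svat = np.zeros((3, 3), dtype=np.int64)
isactive = mask
svat[isactive] = np.arange(1, isactive.sum() + 1)  # number the selected cells 1..n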
Dask has slightly different behavior here than numpy. I think we are treading into the more obscure features of numpy, and I'm not certain whether they are in scope for Dask or not.
From what I read, the dask developer was surprised that a[mask] = b worked in the first place, which is what we do here. Furthermore, it seems 2025.1.0 also shows some weird behavior with masks. I therefore think it is safer to load these arrays into memory (turning them into regular numpy arrays) for now. That is less hassle than pinning to older versions, as that entails patching the conda-forge repodata-patches feedstock, etc.
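As a rough sketch of what loading into memory first could look like, reusing the variable names from the diff (the surrounding setup here is invented, not the project code):

import numpy as np
import xarray as xr

# Hypothetical dask-backed grids standing in for the real ones; only the .load() step matters.
area = xr.DataArray(np.ones((3, 3)), dims=("y", "x")).chunk({"y": 3, "x": 3})
active = xr.DataArray(np.ones((3, 3), dtype=bool), dims=("y", "x")).chunk({"y": 3, "x": 3})

isactive = area.where(active).notnull().load()  # now backed by a plain numpy array
svat = xr.full_like(area, fill_value=0, dtype=np.int64).load().rename("svat")

# Boolean-mask assignment is well defined on the underlying numpy arrays.
svat.data[isactive.data] = np.arange(1, int(isactive.sum()) + 1)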
Just looked further into this: I think our issue is unrelated to the Dask issue, so what I did never worked with dask in the first place.
import numpy as np

a = np.zeros((3, 3))
mask = np.array(
    [
        [True, True, False],
        [False, False, True],
        [False, True, False],
    ]
)
n_count = np.sum(mask)
a[mask] = np.arange(1, n_count + 1)
print(a)  # With numpy it works!

# %%
import dask.array

da = dask.array.from_array(a, chunks=(3, 3))
dmask = dask.array.from_array(mask, chunks=(3, 3))
da[dmask] = np.arange(1, n_count + 1)  # Error
This throws an error:
ValueError: Boolean index assignment in Dask expects equally shaped arrays.
Example: da1[da2] = da3 where da1.shape == (4,), da2.shape == (4,) and da3.shape == (4,).
Alternatively, you can use the extended API that supports indexing with tuples.
Example: da1[(da2,)] = da3.
Therefore loading these specific grids into memory is an acceptable, and the quickest, solution.
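For completeness, a small continuation of the snippet above showing the in-memory route, which is essentially what this PR does at the xarray level (sketch only):

# Materialise the dask array first, then assign with the boolean mask.
da_np = da.compute()                     # back to a plain numpy array
da_np[mask] = np.arange(1, n_count + 1)  # works, same as the pure-numpy case above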
Fixes #1436
Description
Work around the Dask issue by forcing a load of idomain and svat into memory. These grids will never be as large as the boundary condition grids with a time axis, so I think that is acceptable; at least for the LHM it is. I couldn't get it to work with a where: numpy only accepts a 1D array in a boolean-indexed assignment on a 3D array as long as the 1D array has the length of one of the dimensions, in which case it does what we want to do here. However, when this is not the case, which is 95% of the time, numpy throws an error. I therefore had to resort to loading the arrays into memory.
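One way to see that behaviour in plain numpy, with hypothetical shapes rather than the model's actual grids: a 1D value only broadcasts against the boolean-indexed selection when its length matches the trailing dimension, otherwise the assignment raises.

import numpy as np

a = np.zeros((2, 3, 4))
mask2d = np.ones((2, 3), dtype=bool)  # boolean index over the first two dimensions
a[mask2d] = np.arange(4)              # fine: length 4 broadcasts against the last dimension
a[mask2d] = np.arange(5)              # ValueError: cannot be broadcast to the indexing result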
Furthermore, I refactored test_coupler_mapping.py to reduce some code duplication and to add a test case with a dask array. For this I had to make the well package used consistent (from 2 wells to 3 wells, where the second well now lies on y=2.0 instead of y=1.0), so I updated some of the assertions.
Checklist
Issue #nr, e.g. Issue #737