[Bug] - Derivation MetaSWAP mappings fails with Dask >= 2025.2.0 #1436

Closed · 2 tasks
JoerivanEngelen opened this issue Feb 19, 2025 · 1 comment · Fixed by #1450
Labels: bug (Something isn't working)

JoerivanEngelen (Contributor) commented on Feb 19, 2025

Bug description

By sheer coincidence, @WouterSwierstra and I ran into a bug that arose with the latest Dask release. The problematic lines look like this:

mod_id.data[idomain_active.data] = np.arange(1, n_mod + 1)

If we load these into memory as numpy arrays, everything is fine. However, keeping them as lazy objects either causes values not to be set or raises an error like: ValueError: cannot broadcast shape (10,) to shape (nan,). I think I'm doing something here that was never intended to work with Dask and only happened to work by accident.
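As a minimal illustration (not taken from the issue itself) of where the nan in that error comes from: boolean indexing of a lazy dask array yields an array whose size is unknown until computed, which dask represents as nan, so a right-hand side of known length cannot be broadcast against it.

import dask.array

x = dask.array.zeros((4, 4), chunks=(2, 2))
mask = x == 0
print(x[mask].shape)  # (nan,) -- the size is unknown until computed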

And even worse:

svat.values[isactive.values] = np.arange(1, index.sum() + 1)

Here it tries to assign into a read-only array when Dask is used: with a dask-backed object, .values returns a read-only array.
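A small sketch that emulates this failure mode with plain numpy (the read-only flag is set by hand here; with dask-backed data it is .values that hands back the non-writeable array):

import numpy as np

arr = np.arange(9.0).reshape(3, 3)
arr.setflags(write=False)  # emulate the read-only array returned by .values
mask = arr > 4.0
try:
    arr[mask] = 0.0
except ValueError as exc:
    print(exc)  # assignment destination is read-only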

Related issue: dask/dask#11753, which suggests a potential solution:

In the meantime, dask-ml should use da.where instead (which is what da.Array.setitem calls internally) unless there is a use case about applying the same functions to numpy arrays, which would take a performance hit?

I therefore think using a where is a lot safer here. The operations are on 2D grids without a time dimension, so I don't foresee large performance issues from doing this.
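As a rough sketch of what such a where-based derivation could look like (illustrative names, not the actual imod code): a cumulative sum over the flattened mask yields 1, 2, ..., n at the active cells in row-major order, which matches what the fancy-indexed np.arange assignment produces.

import dask.array
import numpy as np

mask = dask.array.from_array(
    np.array([[True, False, True], [False, True, False]])
)
grid = dask.array.zeros(mask.shape, chunks=mask.shape)
# Number the active cells 1..n without item assignment.
numbering = dask.array.cumsum(mask.ravel(), axis=0).reshape(mask.shape)
mod_id = dask.array.where(mask, numbering, grid)
print(mod_id.compute())  # [[1. 0. 2.], [0. 3. 0.]]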

Furthermore, the fact that we missed this is probably because our test bench loads everything into memory; otherwise this bug would have surfaced earlier. Dask 2025.2.0 is in the current dev environment.

Refinement

  • In the MetaSWAP mapping derivations that use fancy indexing, use where instead.
  • Add tests where data is not loaded into memory and run with dask.
JoerivanEngelen added the bug label on Feb 19, 2025
github-project-automation bot moved this to 📯 New in iMOD Suite on Feb 19, 2025
JoerivanEngelen moved this from 📯 New to 📝 Refined in iMOD Suite on Feb 19, 2025
JoerivanEngelen moved this from 📝 Refined to 🏗 In Progress in iMOD Suite on Feb 21, 2025
JoerivanEngelen self-assigned this on Feb 21, 2025
JoerivanEngelen moved this from 🏗 In Progress to 🧐 In Review in iMOD Suite on Feb 24, 2025
JoerivanEngelen (Contributor, Author) commented:

I just looked further into this; I think our issue is unrelated to the Dask issue, so what I did never worked with Dask in the first place.

import numpy as np

a = np.zeros((3,3))
mask = np.array(
    [
        [True, True, False], 
        [False, False, True], 
        [False, True, False]
    ]
)
n_count = np.sum(mask)
a[mask] = np.arange(1, n_count+1)
print(a) # With numpy it works!

# %%
import dask.array

da = dask.array.from_array(a, chunks=(3,3))
dmask = dask.array.from_array(mask, chunks=(3,3))
da[dmask] = np.arange(1, n_count+1) # Error

This throws the error:

ValueError: Boolean index assignment in Dask expects equally shaped arrays.
Example: da1[da2] = da3 where da1.shape == (4,), da2.shape == (4,) and da3.shape == (4,).
Alternatively, you can use the extended API that supports indexing with tuples.
Example: da1[(da2,)] = da3.
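A hedged sketch of the "equally shaped arrays" form that this error message asks for: hand dask a full-shape value array instead of a 1D arange, using the same cumulative-sum numbering as in the sketch earlier in this issue.

import dask.array
import numpy as np

mask = np.array(
    [[True, True, False], [False, False, True], [False, True, False]]
)
da = dask.array.zeros((3, 3), chunks=(3, 3))
dmask = dask.array.from_array(mask, chunks=(3, 3))
# Full-shape values: the masked positions carry 1..n in row-major order.
full_values = dask.array.cumsum(dmask.ravel(), axis=0).reshape(dmask.shape)
da[dmask] = full_values  # allowed: da, dmask and full_values share one shape
print(da.compute())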

github-merge-queue bot pushed a commit that referenced this issue Feb 25, 2025
Fixes #1436 

# Description
Work around the Dask issue by forcing a load of idomain and svat into memory.
These grids will never be as large as the boundary condition grids with a time
axis, so I think that is acceptable; at least for the LHM it is. I couldn't get
it to work with a ``where``: numpy only accepts a 1D array on a boolean-indexed
3D array as long as the 1D array has the length of one of the dimensions, in
which case it does what we want to do here. However, when this is not the case,
which is 95% of the time, numpy throws an error. I therefore had to resort to
loading the arrays into memory.
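A hedged sketch of that workaround with illustrative names (not the actual imod code): compute the small 2D grids into numpy arrays first, after which ordinary numpy boolean assignment is well defined and writeable.

import dask.array
import numpy as np

lazy_mask = dask.array.from_array(
    np.array([[True, False, True], [False, True, False]])
)
lazy_grid = dask.array.from_array(np.zeros(lazy_mask.shape))

mask = lazy_mask.compute()                 # force into memory
grid = lazy_grid.compute()
grid[mask] = np.arange(1, mask.sum() + 1)  # plain numpy fancy assignment
print(grid)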

Furthermore, I refactored ``test_coupler_mapping.py`` to reduce some code
duplication and to add a test case with a dask array. For this I had to make
the well package used consistent (from 2 wells to 3 wells, where the second
well now lies at y=2.0 instead of y=1.0), so I updated some of the assertions.

# Checklist
- [x] Links to correct issue
- [x] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [x] Unit tests were added
- [ ] **If feature added**: Added/extended example
github-project-automation bot moved this from 🧐 In Review to ✅ Done in iMOD Suite on Feb 25, 2025