[Bug] - Derivation MetaSWAP mappings fails with Dask >= 2025.2.0 #1436

Closed · 2 tasks
JoerivanEngelen opened this issue Feb 19, 2025 · 1 comment · Fixed by #1450
Labels: bug (Something isn't working)

JoerivanEngelen (Contributor) commented on Feb 19, 2025

Bug description

By sheer coincidence, @WouterSwierstra and I ran into a bug that arose with the latest Dask release. The problematic lines look like this:

mod_id.data[idomain_active.data] = np.arange(1, n_mod + 1)

If we load these into memory as numpy arrays, everything is fine. However, keeping them as lazy objects either causes values not to be set or raises an error like: ValueError: cannot broadcast shape (10,) to shape (nan,). I think I'm doing something here that was never intended to work with Dask and only happened to work by accident.
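As a minimal illustration (not taken from the issue itself) of where the nan in that error comes from: boolean indexing of a lazy dask array yields an array whose size is unknown until computed, which dask represents as nan, so a right-hand side of known length cannot be broadcast against it.

import dask.array

x = dask.array.zeros((4, 4), chunks=(2, 2))
mask = x == 0
print(x[mask].shape)  # (nan,) -- the size is unknown until computed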

And even worse:

svat.values[isactive.values] = np.arange(1, index.sum() + 1)

Here it tries to assign into a read-only array when Dask is used: with a dask-backed object, .values returns a read-only array.
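A small sketch that emulates this failure mode with plain numpy (the read-only flag is set by hand here; with dask-backed data it is .values that hands back the non-writeable array):

import numpy as np

arr = np.arange(9.0).reshape(3, 3)
arr.setflags(write=False)  # emulate the read-only array returned by .values
mask = arr > 4.0
try:
    arr[mask] = 0.0
except ValueError as exc:
    print(exc)  # assignment destination is read-only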

Related issue: dask/dask#11753, which suggests a potential solution:

In the meantime, dask-ml should use da.where instead (which is what da.Array.setitem calls internally) unless there is a use case about applying the same functions to numpy arrays, which would take a performance hit?

I therefore think using a where is a lot safer here. The operations are on 2D grids without a time dimension, so I don't foresee large performance issues from doing this.
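As a rough sketch of what such a where-based derivation could look like (illustrative names, not the actual imod code): a cumulative sum over the flattened mask yields 1, 2, ..., n at the active cells in row-major order, which matches what the fancy-indexed np.arange assignment produces.

import dask.array
import numpy as np

mask = dask.array.from_array(
    np.array([[True, False, True], [False, True, False]])
)
grid = dask.array.zeros(mask.shape, chunks=mask.shape)
# Number the active cells 1..n without item assignment.
numbering = dask.array.cumsum(mask.ravel(), axis=0).reshape(mask.shape)
mod_id = dask.array.where(mask, numbering, grid)
print(mod_id.compute())  # [[1. 0. 2.], [0. 3. 0.]]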

Furthermore, the fact that we missed this is probably because our test bench loads everything into memory; otherwise this bug would have surfaced earlier. Dask 2025.2.0 is in the current dev environment.

Refinement

  • In the MetaSWAP mapping derivations that use fancy indexing, use where instead.
  • Add tests where data is not loaded into memory and run with dask.
JoerivanEngelen added the bug label on Feb 19, 2025
github-project-automation bot moved this to 📯 New in iMOD Suite on Feb 19, 2025
JoerivanEngelen moved this from 📯 New to 📝 Refined in iMOD Suite on Feb 19, 2025
JoerivanEngelen moved this from 📝 Refined to 🏗 In Progress in iMOD Suite on Feb 21, 2025
JoerivanEngelen self-assigned this on Feb 21, 2025
JoerivanEngelen moved this from 🏗 In Progress to 🧐 In Review in iMOD Suite on Feb 24, 2025
JoerivanEngelen (Contributor, Author) commented:

I just looked further into this; I think our issue is unrelated to the Dask issue, so what I did never worked with Dask in the first place.

import numpy as np

a = np.zeros((3,3))
mask = np.array(
    [
        [True, True, False], 
        [False, False, True], 
        [False, True, False]
    ]
)
n_count = np.sum(mask)
a[mask] = np.arange(1, n_count+1)
print(a) # With numpy it works!

# %%
import dask.array

da = dask.array.from_array(a, chunks=(3,3))
dmask = dask.array.from_array(mask, chunks=(3,3))
da[dmask] = np.arange(1, n_count+1) # Error

This throws the error:

ValueError: Boolean index assignment in Dask expects equally shaped arrays.
Example: da1[da2] = da3 where da1.shape == (4,), da2.shape == (4,) and da3.shape == (4,).
Alternatively, you can use the extended API that supports indexing with tuples.
Example: da1[(da2,)] = da3.
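A hedged sketch of the "equally shaped arrays" form that this error message asks for: hand dask a full-shape value array instead of a 1D arange, using the same cumulative-sum numbering as in the sketch earlier in this issue.

import dask.array
import numpy as np

mask = np.array(
    [[True, True, False], [False, False, True], [False, True, False]]
)
da = dask.array.zeros((3, 3), chunks=(3, 3))
dmask = dask.array.from_array(mask, chunks=(3, 3))
# Full-shape values: the masked positions carry 1..n in row-major order.
full_values = dask.array.cumsum(dmask.ravel(), axis=0).reshape(dmask.shape)
da[dmask] = full_values  # allowed: da, dmask and full_values share one shape
print(da.compute())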

github-merge-queue bot pushed a commit that referenced this issue Feb 25, 2025
Fixes #1436 

# Description
Work around the Dask issue by forcing a load of idomain and svat into memory.
These grids will never be as large as the boundary condition grids with a time
axis, so I think that is acceptable; at least for the LHM it is. I couldn't get
it to work with a ``where``: numpy only accepts a 1D array on a boolean-indexed
3D array as long as the 1D array has the length of one of the dimensions, in
which case it does what we want to do here. However, when this is not the case,
which is 95% of the time, numpy throws an error. I therefore had to resort to
loading the arrays into memory.
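A hedged sketch of that workaround with illustrative names (not the actual imod code): compute the small 2D grids into numpy arrays first, after which ordinary numpy boolean assignment is well defined and writeable.

import dask.array
import numpy as np

lazy_mask = dask.array.from_array(
    np.array([[True, False, True], [False, True, False]])
)
lazy_grid = dask.array.from_array(np.zeros(lazy_mask.shape))

mask = lazy_mask.compute()                 # force into memory
grid = lazy_grid.compute()
grid[mask] = np.arange(1, mask.sum() + 1)  # plain numpy fancy assignment
print(grid)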

Furthermore, I refactored ``test_coupler_mapping.py`` to reduce some code
duplication and to add a test case with a dask array. For this I had to make
the well package used consistent (from 2 wells to 3 wells, where the second
well now lies at y=2.0 instead of y=1.0), so I updated some of the assertions.

# Checklist
- [x] Links to correct issue
- [x] Update changelog, if changes affect users
- [x] PR title starts with ``Issue #nr``, e.g. ``Issue #737``
- [x] Unit tests were added
- [ ] **If feature added**: Added/extended example
github-project-automation bot moved this from 🧐 In Review to ✅ Done in iMOD Suite on Feb 25, 2025