Description
At the outset, I'm not sure whether this is a user error, a bug in ncdata, or a bug in one of the libraries it works with (xarray, iris or netCDF4). I'm raising it here because it shows up in ncdata, but feel free to send me elsewhere if that makes more sense.
I was experimenting with writing strings into a netCDF file. There seem to be several ways to do this; some work fine, while others raise errors.
To run all these demos, I used a Python 3.11 virtual environment with the following requirements.txt file. I'm working on a Mac.
Requirements
ncdata==0.1.1
netCDF4==1.7.2
scitools-iris==3.11.1
xarray==2025.1.2
Passing example
If you write the strings as a character array (converting them with netCDF4.stringtochar before writing), everything seems to work.
import iris
import netCDF4
import numpy as np
from ncdata.iris import from_iris
from ncdata.iris_xarray import cubes_to_xarray
from ncdata.netcdf4 import from_nc4
iris.FUTURE.save_split_attrs = True
with netCDF4.Dataset("demo.nc", "w") as ds:
    regions_l = ["Australia", "New Zealand", "England"]
    regions_max_length = max(len(v) for v in regions_l)
    regions = np.array(regions_l, dtype=f"S{regions_max_length}")
    ds.createDimension("lbl", len(regions))
    ds.createDimension("strlen", regions_max_length)
    ds.createVariable("region", "S1", ("lbl", "strlen"))
    ds["region"][:] = netCDF4.stringtochar(regions)
# None of these raise any errors
from_nc4("demo.nc")
cube = iris.load("demo.nc")
from_iris(cube)
cubes_to_xarray(cube)
The output netCDF file also looks sensible:
ncdump demo.nc
netcdf demo {
dimensions:
    lbl = 3 ;
    strlen = 11 ;
variables:
    char region(lbl, strlen) ;
data:
 region =
  "Australia",
  "New Zealand",
  "England" ;
}
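For reference, the conversion that netCDF4.stringtochar performs in the passing example can be sketched in plain numpy. This is a minimal sketch assuming ASCII data: it reinterprets each fixed-width "S11" string as 11 single-byte "S1" characters (null-padded), which is the (lbl, strlen) layout the char variable expects, and then reverses the view. It is an illustration of the layout, not netCDF4's own implementation.

```python
import numpy as np

# Fixed-width byte strings, as in the passing example (dtype "S11").
regions_l = ["Australia", "New Zealand", "England"]
width = max(len(v) for v in regions_l)
regions = np.array(regions_l, dtype=f"S{width}")

# Reinterpret each 11-byte string as 11 single characters ("S1"),
# giving a (lbl, strlen) array. Shorter names are padded with b"\x00".
# Assumption: for ASCII data this matches netCDF4.stringtochar's output.
chars = regions.view("S1").reshape(regions.shape + (regions.itemsize,))
print(chars.shape)  # (3, 11)

# Reverse the view: (3, 11) "S1" -> (3, 1) "S11" -> (3,) "S11".
back = chars.view(regions.dtype)[:, 0]
print((back == regions).all())  # True
```

The round trip only works because every row has the same width; that fixed width is what the strlen dimension records in the file.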
Failing example 1 - something to do with encoding
If you create the same fixed-width byte-string array but set the _Encoding attribute on the variable and let netCDF4 do the conversion to characters, the encoded strings seem to break when you load the file with iris and then convert with ncdata (which suggests the bug is in iris?).
import iris
import netCDF4
import numpy as np
from ncdata.iris import from_iris
from ncdata.iris_xarray import cubes_to_xarray
from ncdata.netcdf4 import from_nc4
iris.FUTURE.save_split_attrs = True
with netCDF4.Dataset("demo.nc", "w") as ds:
    regions_l = ["Australia", "New Zealand", "England"]
    regions_max_length = max(len(v) for v in regions_l)
    regions = np.array(regions_l, dtype=f"S{regions_max_length}")
    ds.createDimension("lbl", len(regions))
    ds.createDimension("strlen", regions_max_length)
    ds.createVariable("region", "S1", ("lbl", "strlen"))
    ds["region"]._Encoding = "ascii"
    ds["region"][:] = regions
from_nc4("demo.nc")
cube = iris.load("demo.nc")
from_iris(cube)
"""
The line above gives the following error
...
File ".../venv/lib/python3.11/site-packages/ncdata/dataset_like.py", line 284, in _get_fillvalue
fv = netCDF4.default_fillvals[dtype_code]
~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^
KeyError: 'U11'
"""
cubes_to_xarray(cube)
The underlying netCDF file looks sensible, though:
netcdf demo {
dimensions:
    lbl = 3 ;
    strlen = 11 ;
variables:
    char region(lbl, strlen) ;
        region:_Encoding = "ascii" ;
data:
 region =
  "Australia",
  "New Zealand",
  "England" ;
}
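My guess at what goes wrong here, sketched in plain numpy: with _Encoding set, netCDF4 appears to decode the char variable on read into a unicode array (dtype U11), and that dtype code is what ends up in the netCDF4.default_fillvals lookup shown in the traceback. The fill_table dict below is a hypothetical stand-in holding only a few of the real table's keys, and the dtype_code expression is an assumption about how the code in the traceback is derived, not ncdata's actual code.

```python
import numpy as np

# Assumption: with _Encoding = "ascii", netCDF4 decodes the char variable
# on read and returns a unicode array like this one.
decoded = np.array(["Australia", "New Zealand", "England"], dtype="U11")

# Hypothetical dtype code in the style of the traceback: "<U11" -> "U11".
dtype_code = decoded.dtype.str[1:]

# Hypothetical stand-in for a few entries of netCDF4.default_fillvals,
# which is keyed by codes such as "S1", "i4" and "f8".
fill_table = {"S1": "\x00", "i4": -2147483647, "f8": 9.969209968386869e36}

print(dtype_code)                # U11
print(dtype_code in fill_table)  # False, i.e. the KeyError seen above

# The undecoded char data from the passing example has dtype "S1",
# which the table does contain.
char_code = np.dtype("S1").str[1:]
print(char_code in fill_table)   # True
```

If this guess is right, the failure depends only on the decoded dtype reaching the fill-value lookup, which would explain why the file itself looks fine in ncdump.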
Failing example 2 - variable length strings
If you write the data as variable-length strings, the error appears to come from ncdata (more precisely, from the dask call that ncdata makes). However, iris can't load the file either, so maybe this just isn't a supported use case.
import iris
import netCDF4
import numpy as np
from ncdata.iris import from_iris
from ncdata.iris_xarray import cubes_to_xarray
from ncdata.netcdf4 import from_nc4
iris.FUTURE.save_split_attrs = True
with netCDF4.Dataset("demo.nc", "w") as ds:
    regions_l = ["Australia", "New Zealand", "England"]
    regions_max_length = max(len(v) for v in regions_l)
    regions = np.array(regions_l, dtype="O")
    ds.createDimension("lbl", len(regions))
    ds.createVariable("region", str, ("lbl",))
    ds["region"][:] = regions
from_nc4("demo.nc")
"""
The line above gives the following error
Traceback (most recent call last):
File ".../demo-variable-str-failing.py", line 20, in <module>
from_nc4("demo.nc")
File ".../venv/lib/python3.11/site-packages/ncdata/netcdf4.py", line 308, in from_nc4
ncdata = _from_nc4_group(nc4ds)
^^^^^^^^^^^^^^^^^^^^^^
File ".../venv/lib/python3.11/site-packages/ncdata/netcdf4.py", line 264, in _from_nc4_group
var.data = da.from_array(
^^^^^^^^^^^^^^
File ".../venv/lib/python3.11/site-packages/dask/array/core.py", line 3523, in from_array
chunks = normalize_chunks(
^^^^^^^^^^^^^^^^^
File ".../venv/lib/python3.11/site-packages/dask/array/core.py", line 3130, in normalize_chunks
chunks = auto_chunks(chunks, shape, limit, dtype, previous_chunks)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../venv/lib/python3.11/site-packages/dask/array/core.py", line 3304, in auto_chunks
raise ValueError(
ValueError: auto-chunking with dtype.itemsize == 0 is not supported, please pass in `chunks` explicitly
"""
cube = iris.load("demo.nc")
from_iris(cube)
cubes_to_xarray(cube)
The underlying netCDF file seems to be valid, but maybe I'm missing something:
ncdump demo.nc
netcdf demo {
dimensions:
    lbl = 3 ;
variables:
    string region(lbl) ;
data:
 region = "Australia", "New Zealand", "England" ;
}
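A similar guess at the dask error: as far as I understand, netCDF4 reports the Python type str as the dtype of a variable-length string variable, and numpy maps that to a unicode dtype with itemsize 0, which is exactly what dask's auto-chunking rejects. The choose_chunks helper below is hypothetical (it is not part of ncdata or dask); it only illustrates the "pass in chunks explicitly" suggestion from the error message by falling back to a single whole-array chunk.

```python
import numpy as np

# Assumption: netCDF4 reports `str` as the dtype of a vlen string variable.
# numpy turns that into a zero-width unicode dtype.
vlen_dtype = np.dtype(str)
print(vlen_dtype.itemsize)  # 0, the case dask's auto-chunking rejects


def choose_chunks(shape, dtype):
    """Hypothetical helper: use one whole-array chunk when the itemsize
    is unknown (0), otherwise leave the choice to dask ("auto")."""
    if np.dtype(dtype).itemsize == 0:
        return tuple(shape)
    return "auto"


print(choose_chunks((3,), str))      # (3,)
print(choose_chunks((3, 11), "S1"))  # auto
```

The result could then be passed as the chunks argument of dask.array.from_array; a single chunk is a crude but safe choice for small string variables like this one.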
@pp-mo, do you have any thoughts on this?