Support extension array indexes #9671

ilan-gold · 2024-10-24T15:37:01Z

Identical to kmuehlbauer#1. Probably not very helpful in terms of changes, since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 contains most of it.

Closes #9661 (Failure in pandas TestDataFrameToXArray.test_to_xarray_index_types)
Tests added
User visible changes (including notable bug fixes) are documented in whats-new.rst
New functions/methods are listed in api.rst

xarray/core/formatting.py

dcherian · 2025-04-22T14:37:59Z

xarray/core/indexing.py

@@ -1902,6 +1926,10 @@ def copy(self, deep: bool = True) -> Self:
        array = self.array.copy(deep=True) if deep else self.array
        return type(self)(array, self._dtype)

+    @property


This change breaks a lot of tests. I would do it in a separate PR.

I have something for this :)

dcherian · 2025-04-22T14:41:10Z

xarray/core/variable.py

@@ -192,8 +192,6 @@ def _maybe_wrap_data(data):
    if isinstance(data, pd.Index):
        return PandasIndexingAdapter(data)
    if isinstance(data, pd.api.extensions.ExtensionArray):
-        if isinstance(data.dtype, pd.Int64Dtype | pd.Float64Dtype | pd.StringDtype):


This casting was intentional. We'd like these dtypes to be converted to numpy arrays.

OK, I had lost track of who did this. I thought I had checked the blame but must not have. Could you explain a bit why, so I can add a comment?

I think it's mostly backwards compatibility, and we aren't sure how to handle the possibility of both numpy and pandas handling very similar dtypes. So we are opting not to change behaviour.

ilan-gold · 2025-04-22T16:51:23Z

I think the failing test is coming from zarr==3.0.7 but otherwise everything seems to pass

keewis

Unless we want to special-case it in `.data` or anywhere else that gives access to the underlying array, I don't think there's a way around exposing the extension array wrapper (the special-casing would make the code a bit more complicated, though). I believe that's fine, since that would just be a special kind of array that is compatible with the array API.

xarray/core/extension_array.py

xarray/core/utils.py

ilan-gold · 2025-04-23T09:52:48Z

> that is compatible with the array API.

It duck-types but is not array-API compatible. That being said, I would not be opposed to removing the visible wrapper from `.data`. I can at least try locally and see what comes of it.

ilan-gold · 2025-04-23T14:12:19Z

@keewis Excellent suggestion: ilan-gold#1. It seems to be crazy straightforward

dcherian

Thanks for sticking with this, and apologies for the slow progress! This kind of fundamental change is hard to review.

* main: (76 commits)
  Update how-to-add-new-backend.rst (#10240)
  Support extension array indexes (#9671)
  Switch documentation to pydata-sphinx-theme (#8708)
  Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (#10239)
  Fix mypy, min-versions CI, xfail Zarr tests (#10255)
  Remove `test_dask_layers_and_dependencies` (#10242)
  Fix: Docs generation create temporary files that are not cleaned up. (#10238)
  opendap / dap4 support for pydap backend (#10182)
  Add RangeIndex (#10076)
  Fix mypy (#10232)
  Fix doctests (#10230)
  Fix broken Sphinx Roles (#10225)
  `DatasetView.map` fix `keep_attrs` (#10219)
  Add datatree repr asv (#10214)
  CI: Automatic PR labelling is back (#10201)
  Fixes dimension order in `xarray.Dataset.to_stacked_array` (#10205)
  Fix references to core classes in docs (#10207)
  Update pre-commit hooks (#10208)
  add `scipy-stubs` as extra `[types]` dependency (#10202)
  Fix sparse dask repr test (#10200)
  ...

* main:
  Fix convert calendar on non-temporal data in datasets (pydata#10268)
  BinGrouper: reduce indirection (pydata#10270)
  Fix reduction by subset of grouper dimensions (pydata#10258)
  Shorten text repr for ``DataTree`` (pydata#10139)
  Fix benchmarks runners (pydata#10265)
  Fix infinite recursion when calling `np.fix` (pydata#10248)
  BinGrouper: Support setting labels when provided with IntervalIndex (pydata#10259)
  Avoid stacking when grouping by chunked array (pydata#10254)
  Improve alignment checks (pydata#10251)
  Update how-to-add-new-backend.rst (pydata#10240)
  Support extension array indexes (pydata#9671)
  Switch documentation to pydata-sphinx-theme (pydata#8708)
  Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (pydata#10239)

benbovy · 2025-05-09T12:28:13Z

xarray/core/indexing.py

+        if dtype is None and is_valid_numpy_dtype(self.dtype):
+            dtype = cast(np.dtype, self.dtype)
+        else:
+            dtype = get_valid_numpy_dtype(self.array)


@ilan-gold this ignores any given dtype that is not None, is it intended?

(For more context, I'm cleaning-up PandasIndexingAdapter classes in #10296)

I refactored this in 80f496f.

That is likely not intended!

My intention was definitely not to ignore that. There should be another check on the dtype itself for its validity as a numpy dtype
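
The dtype-resolution problem raised in this thread can be illustrated with a minimal pure-Python sketch. The helper names `is_valid_numpy_dtype` and `get_valid_numpy_dtype` appear in the diff, but the versions below are simplified stand-ins that model dtypes as plain strings (an assumption for illustration, not xarray's implementation). `resolve_as_in_diff` and `resolve_respecting_argument` are hypothetical names, and the second one shows only one possible fix, not necessarily what 80f496f does.

```python
# Simplified stand-ins: the real xarray helpers operate on numpy/pandas
# dtypes and arrays. Here dtypes are modelled as plain strings (assumption).
NUMPY_DTYPE_NAMES = {"int64", "float64", "bool", "object"}


def is_valid_numpy_dtype(dtype):
    return dtype in NUMPY_DTYPE_NAMES


def get_valid_numpy_dtype(array_dtype):
    # Fall back to "object" for anything numpy cannot represent natively.
    return array_dtype if is_valid_numpy_dtype(array_dtype) else "object"


def resolve_as_in_diff(dtype, own_dtype):
    """Mirror the if/else structure of the reviewed hunk."""
    if dtype is None and is_valid_numpy_dtype(own_dtype):
        dtype = own_dtype
    else:
        # Also reached whenever a dtype WAS passed, so the caller's
        # argument is silently overwritten.
        dtype = get_valid_numpy_dtype(own_dtype)
    return dtype


def resolve_respecting_argument(dtype, own_dtype):
    """Hypothetical alternative: honour a requested dtype if it is valid."""
    if dtype is not None and is_valid_numpy_dtype(dtype):
        return dtype
    # No usable request: derive the dtype from the array itself (assumption).
    return get_valid_numpy_dtype(own_dtype)


print(resolve_as_in_diff("float64", "int64"))           # int64: request ignored
print(resolve_respecting_argument("float64", "int64"))  # float64: request honoured
print(resolve_respecting_argument(None, "category"))    # object: fallback
```

The first call shows the behaviour benbovy pointed out: any explicitly requested dtype falls into the `else` branch and is replaced by one derived from the array.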

kmuehlbauer and others added 30 commits October 18, 2024 07:31

implement default_precision_timestamp, refactor coding/times.py and c…

7b5f323

…ore/variable.py to use any-precision datetime/timedelta with autmatic inferring of resolution

align tests with new time resolution behaviour

8784f33

timedelta decoding, fsspec handling

b45ab23

fixes in coding/times.py

39086ef

add docs on time coding

df49a40

attempt fixing doc tests

adb8ca3

fix issue where out-of-bounds floating point values slipped in the pr…

266b1ed

…ocessing, raise now early

convert to UTC first before stripping of tz in _unpack_time_units_and…

6d5f13b

…_ref_date

reorganize pandas compatibility code, remove unneeded code, attempt t…

5d68bfe

…o fix mypy

another attempt to finally fix mypy

07bba69

refactor out _check_date_is_after_shift

6e7f0bb

refactor out _maybe_strip_tz_from_timestamp

b4a49bb

more refactoring in coding.times.py

2e1ff4f

more refactoring in coding.times.py

d5a7da0

minor fix in time-coding.rst

821b68d

set default resolution to "s", which actually means, use pandas lowes…

d066edf

…t resolution, fix code and tests to allow this

Add section for default units, fix options

ed22da1

attempt to fix typing

8bf23f4

attempt to fix typing

c3a2b39

fix scalar datetime/timedelta

3c44aed

fix user docs

48be73a

[pre-commit.ci] auto fixes from pre-commit.com hooks

7ac9983

for more information, see https://pre-commit.ci

Fix variable tests, mostly datetime/timedelta is inittialized with us…

d86ad04

…-resolution

revert changes in _possible_convert_objects, this needs to be checked…

b5d0795

… more carefully, for now using pd.Series to covert `OMm` type datetimes/timedeltas (will result in ns precision)

fix doc link

60324f0

(fix): allow all extension array data types in pandas adapters

c2bc4df

(fix): dataframes have no array attr

84569bc

(fix): allow chunked numpy extension arrays because of `test_pandas_a…

90e390d

…rray` series creating an extension array when `.array` is accessed

(fix): dtypes for PandasIndex

7c32bd0

(chore): remove test for unnecessary conversion

795ecf6

github-actions bot added the topic-NamedArray Lightweight version of Variable label Apr 9, 2025

dcherian reviewed Apr 9, 2025

xarray/core/formatting.py (outdated)

ilan-gold added 3 commits April 22, 2025 13:14

Merge branch 'main' into ig/fix_extension_indexer

02c887c

(fix): typing issues

bddccce

(fix): remove unnecessary casting + nbytes

6134d10

dcherian reviewed Apr 22, 2025

(fix): nbytes value

cfca8f5

dcherian reviewed Apr 22, 2025

(fix): handling of non-unique dtype with extension arrays

18b78d1

ilan-gold added 2 commits April 22, 2025 18:52

(fix): mypy?

cfd099f

Merge branch 'main' into ig/fix_extension_indexer

b147efb

keewis reviewed Apr 22, 2025

xarray/core/extension_array.py

xarray/core/utils.py (outdated)

ilan-gold added 2 commits April 23, 2025 11:25

(fix): return get_valid_numpy_dtype

78cc243

Merge branch 'ig/fix_extension_indexer' of github.com:ilan-gold/xarra…

96a1ddd

…y into ig/fix_extension_indexer

dcherian and others added 3 commits April 26, 2025 10:53

Merge branch 'main' into ig/fix_extension_indexer

4d2ecc6

fix doctest

9883ab0

Update whats-new

e7f681c

dcherian approved these changes Apr 26, 2025

Merge branch 'main' into ig/fix_extension_indexer

d8a6b5d

dcherian enabled auto-merge (squash) April 26, 2025 17:32

dcherian merged commit 2f1751d into pydata:main Apr 26, 2025
31 checks passed

ilan-gold mentioned this pull request May 1, 2025

Add back getattr for ExtensionArrays #10278

Merged

benbovy reviewed May 9, 2025

This was referenced May 14, 2025

Failure in pandas TestDataFrameToXArray.test_to_xarray_index_types #9661

Closed

Cannot export dataset with categorical index in 2025.4.0 #10312

Open
