Support extension array indexes #9671

Merged: 246 commits, Apr 26, 2025

@ilan-gold (Contributor) commented Oct 24, 2024

Identical to kmuehlbauer#1. The diff is probably not very helpful for seeing what changed, since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 already contains most of it....

kmuehlbauer and others added 30 commits October 18, 2024 07:31
…ore/variable.py to use any-precision datetime/timedelta with automatic inference of resolution
…t resolution, fix code and tests to allow this
… more carefully, for now using pd.Series to convert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
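The commit messages above describe keeping any-precision datetime/timedelta data and obtaining an extension array from a Series via `.array`. A small sketch of that round trip, assuming pandas >= 2.0 (the first release that keeps second/milli/microsecond resolutions); `series_extension_array` is a hypothetical helper, not a function from this PR.

```python
import numpy as np
import pandas as pd


def series_extension_array(values: np.ndarray) -> pd.api.extensions.ExtensionArray:
    """Hypothetical helper: route a datetime64/timedelta64 array through pd.Series."""
    # With pandas >= 2.0 a non-nanosecond resolution (s, ms, us) is preserved
    # instead of being coerced to nanoseconds.
    series = pd.Series(values)
    # `.array` hands back a pandas ExtensionArray (DatetimeArray/TimedeltaArray).
    return series.array


values = np.array(["2000-01-01", "2000-01-02"], dtype="datetime64[s]")
ext = series_extension_array(values)
print(type(ext).__name__, ext.dtype)
```

Converting back with `np.asarray(ext)` should yield the original second-resolution NumPy array rather than a nanosecond one.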
github-actions (bot) added the topic-NamedArray label (Lightweight version of Variable) on Apr 9, 2025
@@ -1902,6 +1926,10 @@ def copy(self, deep: bool = True) -> Self:
        array = self.array.copy(deep=True) if deep else self.array
        return type(self)(array, self._dtype)

    @property
Contributor

This change breaks a lot of tests. I would do it in a separate PR.

Contributor Author

I have something for this :)

Contributor Author

@@ -192,8 +192,6 @@ def _maybe_wrap_data(data):
    if isinstance(data, pd.Index):
        return PandasIndexingAdapter(data)
    if isinstance(data, pd.api.extensions.ExtensionArray):
        if isinstance(data.dtype, pd.Int64Dtype | pd.Float64Dtype | pd.StringDtype):
Contributor

This casting was intentional. We'd like these dtypes to be converted to numpy arrays.

@ilan-gold (Contributor Author) commented Apr 22, 2025

OK, I had lost track of who made this change. I thought I had checked the blame, but I must not have. Could you explain the reasoning so I can add a code comment?

@dcherian (Contributor) commented Apr 22, 2025

I think it's mostly backwards compatibility: we aren't sure how to handle numpy and pandas both supporting very similar dtypes, so we are opting not to change behaviour.
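A minimal sketch of the policy described here, assuming the rule is simply "convert nullable Int64/Float64/string extension arrays to NumPy, keep every other extension array as-is". The name `maybe_cast_extension_array` is hypothetical; the real logic sits in xarray's `_maybe_wrap_data` (quoted in the diff above) and does more than this.

```python
import numpy as np
import pandas as pd

# Hypothetical constant: the dtypes from the isinstance check in the diff,
# i.e. pandas dtypes that closely overlap with plain NumPy dtypes.
NUMPY_LIKE_EXTENSION_DTYPES = (pd.Int64Dtype, pd.Float64Dtype, pd.StringDtype)


def maybe_cast_extension_array(data):
    """Hypothetical helper: cast NumPy-like extension arrays, keep the rest."""
    if isinstance(data, pd.api.extensions.ExtensionArray):
        if isinstance(data.dtype, NUMPY_LIKE_EXTENSION_DTYPES):
            # Backwards-compatible path: hand back a plain numpy.ndarray.
            return data.to_numpy()
        # Other extension arrays (e.g. Categorical) stay pandas objects.
        return data
    # Anything else (numpy arrays, scalars, ...) passes through untouched.
    return data
```

With `pd.array([1, 2], dtype="Int64")` the helper returns a `numpy.ndarray`, while a `pd.Categorical` comes back unchanged. The exact NumPy dtype of the converted array depends on the pandas version, so the sketch does not pin it down.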

@ilan-gold (Contributor Author) commented Apr 22, 2025

I think the failing test comes from zarr==3.0.7; otherwise everything seems to pass.

@keewis (Collaborator) left a comment

Unless we want to special-case it in `.data` (or anywhere else that gives access to the underlying array), I don't think there's a way around exposing the extension array wrapper, and that special-casing would make the code a bit more complicated. I believe exposing it is fine, though, since the wrapper would just be a special kind of array that is compatible with the array API.

@ilan-gold (Contributor Author)

> that is compatible with the array API.

It duck-types, but it is not array-API compatible. That being said, I would not be opposed to removing the visible wrapper from `.data`. I can at least try it locally and see what comes of it.
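A rough sketch of the distinction being drawn, using the hypothetical names `ExtensionArrayWrapper` and `MiniVariable` (neither is xarray's class). The wrapper exposes `shape`, `dtype`, `ndim`, indexing and `__array__`, so it duck-types as an array, but it defines no `__array_namespace__` and so makes no array-API claim. The container's `.data` property unwraps it, which is one assumed way the visible wrapper could be hidden from `.data`.

```python
import numpy as np
import pandas as pd


class ExtensionArrayWrapper:
    """Hypothetical duck array around a pandas ExtensionArray."""

    def __init__(self, array: pd.api.extensions.ExtensionArray):
        self.array = array

    @property
    def dtype(self):
        return self.array.dtype

    @property
    def shape(self) -> tuple[int, ...]:
        return self.array.shape

    @property
    def ndim(self) -> int:
        return self.array.ndim

    def __getitem__(self, key):
        result = self.array[key]
        # Slices/fancy indexing return another ExtensionArray: keep it wrapped.
        if isinstance(result, pd.api.extensions.ExtensionArray):
            return type(self)(result)
        return result  # scalar indexing returns a plain scalar

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray() materialise the data; `copy` is accepted for NumPy 2.
        return np.asarray(self.array, dtype=dtype)

    # Deliberately no __array_namespace__: this duck-types but is not an
    # array-API-compatible object.


class MiniVariable:
    """Hypothetical container whose `.data` hides the wrapper from users."""

    def __init__(self, data):
        if isinstance(data, pd.api.extensions.ExtensionArray):
            data = ExtensionArrayWrapper(data)
        self._data = data

    @property
    def data(self):
        # Unwrap so callers see the original pandas ExtensionArray.
        if isinstance(self._data, ExtensionArrayWrapper):
            return self._data.array
        return self._data
```

Internally the wrapper is still used for indexing; only the public `.data` accessor returns the raw pandas object.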

@ilan-gold (Contributor Author)

@keewis Excellent suggestion: ilan-gold#1. It seems to be crazy straightforward

@dcherian (Contributor) left a comment

Thanks for sticking with this, and apologies for the slow progress. This kind of fundamental change is hard to review.

@dcherian enabled auto-merge (squash) April 26, 2025 17:32
@dcherian merged commit 2f1751d into pydata:main Apr 26, 2025
31 checks passed
dcherian added a commit that referenced this pull request Apr 27, 2025
* main: (76 commits)
  Update how-to-add-new-backend.rst (#10240)
  Support extension array indexes (#9671)
  Switch documentation to pydata-sphinx-theme (#8708)
  Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (#10239)
  Fix mypy, min-versions CI, xfail Zarr tests (#10255)
  Remove `test_dask_layers_and_dependencies` (#10242)
  Fix: Docs generation create temporary files that are not cleaned up. (#10238)
  opendap / dap4 support for pydap backend (#10182)
  Add RangeIndex (#10076)
  Fix mypy (#10232)
  Fix doctests (#10230)
  Fix broken Sphinx Roles (#10225)
  `DatasetView.map` fix `keep_attrs` (#10219)
  Add datatree repr asv (#10214)
  CI: Automatic PR labelling is back (#10201)
  Fixes dimension order in `xarray.Dataset.to_stacked_array` (#10205)
  Fix references to core classes in docs (#10207)
  Update pre-commit hooks (#10208)
  add `scipy-stubs` as extra `[types]` dependency (#10202)
  Fix sparse dask repr test (#10200)
  ...
dcherian added a commit to dcherian/xarray that referenced this pull request Apr 29, 2025
* main:
  Fix convert calendar on non-temporal data in datasets (pydata#10268)
  BinGrouper: reduce indirection (pydata#10270)
  Fix reduction by subset of grouper dimensions (pydata#10258)
  Shorten text repr for ``DataTree`` (pydata#10139)
  Fix benchmarks runners (pydata#10265)
  Fix infinite recursion when calling `np.fix` (pydata#10248)
  BinGrouper: Support setting labels when provided with IntervalIndex (pydata#10259)
  Avoid stacking when grouping by chunked array (pydata#10254)
  Improve alignment checks (pydata#10251)
  Update how-to-add-new-backend.rst (pydata#10240)
  Support extension array indexes (pydata#9671)
  Switch documentation to pydata-sphinx-theme (pydata#8708)
  Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (pydata#10239)
Comment on lines +1781 to +1784
        if dtype is None and is_valid_numpy_dtype(self.dtype):
            dtype = cast(np.dtype, self.dtype)
        else:
            dtype = get_valid_numpy_dtype(self.array)
Member

@ilan-gold this ignores any given dtype that is not None. Is that intended?

Member

(For more context, I'm cleaning up the PandasIndexingAdapter classes in #10296.)

Member

I refactored this in 80f496f.

Contributor Author

That is likely not intended!

Contributor Author

My intention was definitely not to ignore that. There should be an additional check that the given dtype itself is a valid numpy dtype.
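A sketch of that additional check, assuming the intended behaviour is: use an explicitly passed dtype when NumPy understands it, otherwise infer one from the wrapped array. `resolve_adapter_dtype` is hypothetical, `is_valid_numpy_dtype` is a local stand-in for the helper named in the snippet (its body here is an assumption), and the final `object` fallback is a simplification in place of the snippet's `get_valid_numpy_dtype` call. None of this is taken from the refactor in 80f496f.

```python
import numpy as np
import pandas as pd


def is_valid_numpy_dtype(dtype) -> bool:
    """Stand-in helper: True if NumPy can interpret `dtype`."""
    try:
        np.dtype(dtype)
    except (TypeError, ValueError):
        # pandas extension dtypes such as Int64Dtype() or CategoricalDtype() land here.
        return False
    return True


def resolve_adapter_dtype(array: pd.Index, dtype=None) -> np.dtype:
    """Hypothetical helper: choose the NumPy dtype an index adapter reports."""
    # 1. Honour an explicit dtype if NumPy understands it. The `is not None`
    #    guard matters because np.dtype(None) silently means float64.
    if dtype is not None and is_valid_numpy_dtype(dtype):
        return np.dtype(dtype)
    # 2. Otherwise use the wrapped array's own dtype when NumPy can represent it.
    if is_valid_numpy_dtype(array.dtype):
        return np.dtype(array.dtype)
    # 3. Simplified fallback for extension dtypes NumPy cannot represent.
    return np.dtype(object)
```

Unlike the quoted branch, a valid explicit dtype such as `"float32"` is no longer overwritten, while an extension dtype passed as `dtype` falls through to inference.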

9 participants