Support extension array indexes #9671
…ore/variable.py to use any-precision datetime/timedelta with automatic inference of resolution
…ocessing, now raise early
…t resolution, fix code and tests to allow this
for more information, see https://pre-commit.ci
… more carefully, for now using pd.Series to convert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
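The `OMm` in the commit message above refers to NumPy dtype kinds: `"O"` (object), `"M"` (datetime64) and `"m"` (timedelta64). Here is a minimal sketch that assumes only NumPy. The resolution of a datetime64/timedelta64 array can be read off its dtype with `np.datetime_data`. Object arrays carry no resolution in their dtype, so they have to be converted before one can be inferred. The helper name `infer_time_resolution` is hypothetical and is not xarray's API.

```python
from typing import Optional

import numpy as np


def infer_time_resolution(values: np.ndarray) -> Optional[str]:
    """Hypothetical helper: report the unit of a datetime64/timedelta64 array.

    Returns None for object ("O" kind) arrays, whose resolution cannot be
    read from the dtype and must be inferred after conversion.
    """
    kind = values.dtype.kind
    if kind not in "OMm":
        raise TypeError(f"expected an 'O', 'M' or 'm' kind array, got {kind!r}")
    if kind == "O":
        return None
    # np.datetime_data returns (unit, step), e.g. ("s", 1) for datetime64[s]
    unit, _step = np.datetime_data(values.dtype)
    return unit
```

For example, a `datetime64[s]` array reports `"s"` and a `timedelta64[ms]` array reports `"ms"`. In both cases the unit comes from the dtype, without any conversion to nanoseconds.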
```diff
@@ -1902,6 +1926,10 @@ def copy(self, deep: bool = True) -> Self:
         array = self.array.copy(deep=True) if deep else self.array
         return type(self)(array, self._dtype)
 
+    @property
```
This change breaks a lot of tests. I would do it in a separate PR.
I have something for this :)
xarray/core/variable.py (outdated)
```diff
@@ -192,8 +192,6 @@ def _maybe_wrap_data(data):
     if isinstance(data, pd.Index):
         return PandasIndexingAdapter(data)
     if isinstance(data, pd.api.extensions.ExtensionArray):
-        if isinstance(data.dtype, pd.Int64Dtype | pd.Float64Dtype | pd.StringDtype):
```
This casting was intentional. We'd like these dtypes to be converted to numpy arrays.
OK, I had lost track of who did this. I thought I had checked the blame but must not have. Could you explain a bit why, so I can add a comment?
I think it's mostly backwards compatibility: we aren't sure how to handle the possibility of both NumPy and pandas handling very similar dtypes, so we are opting not to change behaviour.
I think the failing test is coming from
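Here is a minimal sketch of the intended behaviour described in this thread, assuming a recent pandas. The nullable `Int64`, `Float64` and `string` extension arrays are converted to NumPy arrays. Other extension arrays, such as categoricals, pass through unchanged. `maybe_wrap_data_sketch` and `CAST_TO_NUMPY_DTYPES` are hypothetical names. This is a simplification, not xarray's actual `_maybe_wrap_data`.

```python
import numpy as np
import pandas as pd

# Extension dtypes that, per the review discussion, keep being converted to
# plain NumPy arrays for backwards compatibility.
CAST_TO_NUMPY_DTYPES = (pd.Int64Dtype, pd.Float64Dtype, pd.StringDtype)


def maybe_wrap_data_sketch(data):
    """Hypothetical simplification of the branch under review."""
    if isinstance(data, pd.api.extensions.ExtensionArray):
        if isinstance(data.dtype, CAST_TO_NUMPY_DTYPES):
            # The resulting NumPy dtype, and how missing values are
            # represented, depends on the pandas version.
            return data.to_numpy()
        # Other extension arrays (e.g. categoricals) are kept as-is.
        return data
    return data


if __name__ == "__main__":
    converted = maybe_wrap_data_sketch(pd.array([1, 2], dtype="Int64"))
    print(type(converted) is np.ndarray)
```

An explicit allow-list like `CAST_TO_NUMPY_DTYPES` is also a natural place for the explanatory comment requested earlier in the thread.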
Unless we want to special-case it in `.data` or anywhere else that gives access to the underlying array, I don't think there's a way around exposing the extension array wrapper (I think the special-casing would make the code a bit more complicated, though). I believe that's fine, since the wrapper would just be a special kind of array that is compatible with the array API.
…y into ig/fix_extension_indexer
It duck-types, but it is not array API compatible. That being said, I would not be opposed to removing the visible wrapper from
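The distinction drawn here between duck-typing and array API compatibility can be illustrated with a hypothetical wrapper. `ExtensionArrayWrapperSketch` is not xarray's class. It supports `shape`, `dtype`, indexing and `np.asarray`, but it does not define `__array_namespace__`, which is the method the Python array API standard uses to expose an array's namespace. Its `dtype` is also a pandas `ExtensionDtype` rather than an array API dtype.

```python
import numpy as np
import pandas as pd


class ExtensionArrayWrapperSketch:
    """Hypothetical thin wrapper that duck-types as an array.

    It exposes shape, dtype, indexing and __array__, which is enough for
    np.asarray() and much duck-array code, but it does not implement
    __array_namespace__, so it is not an array API-compliant array.
    """

    def __init__(self, array: pd.api.extensions.ExtensionArray):
        self.array = array

    @property
    def shape(self) -> tuple:
        return self.array.shape

    @property
    def ndim(self) -> int:
        return self.array.ndim

    @property
    def dtype(self):
        # A pandas ExtensionDtype, not a NumPy or array API dtype.
        return self.array.dtype

    def __getitem__(self, key):
        result = self.array[key]
        # Keep array-like results wrapped; scalars pass through unchanged.
        if isinstance(result, pd.api.extensions.ExtensionArray):
            return type(self)(result)
        return result

    def __array__(self, dtype=None, copy=None):
        # `copy` is accepted for NumPy 2 compatibility but ignored here.
        return np.asarray(self.array, dtype=dtype)
```

Code that only needs `np.asarray` or indexing will accept such a wrapper. Code that calls `x.__array_namespace__()` will not.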
@keewis Excellent suggestion: ilan-gold#1. It seems to be crazy straightforward.
Thanks for sticking with this, and apologies for the slow progress. This kind of fundamental change is hard to review.
* main: (76 commits)
  - Update how-to-add-new-backend.rst (#10240)
  - Support extension array indexes (#9671)
  - Switch documentation to pydata-sphinx-theme (#8708)
  - Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (#10239)
  - Fix mypy, min-versions CI, xfail Zarr tests (#10255)
  - Remove `test_dask_layers_and_dependencies` (#10242)
  - Fix: Docs generation create temporary files that are not cleaned up. (#10238)
  - opendap / dap4 support for pydap backend (#10182)
  - Add RangeIndex (#10076)
  - Fix mypy (#10232)
  - Fix doctests (#10230)
  - Fix broken Sphinx Roles (#10225)
  - `DatasetView.map` fix `keep_attrs` (#10219)
  - Add datatree repr asv (#10214)
  - CI: Automatic PR labelling is back (#10201)
  - Fixes dimension order in `xarray.Dataset.to_stacked_array` (#10205)
  - Fix references to core classes in docs (#10207)
  - Update pre-commit hooks (#10208)
  - add `scipy-stubs` as extra `[types]` dependency (#10202)
  - Fix sparse dask repr test (#10200)
  - ...
* main:
  - Fix convert calendar on non-temporal data in datasets (pydata#10268)
  - BinGrouper: reduce indirection (pydata#10270)
  - Fix reduction by subset of grouper dimensions (pydata#10258)
  - Shorten text repr for ``DataTree`` (pydata#10139)
  - Fix benchmarks runners (pydata#10265)
  - Fix infinite recursion when calling `np.fix` (pydata#10248)
  - BinGrouper: Support setting labels when provided with IntervalIndex (pydata#10259)
  - Avoid stacking when grouping by chunked array (pydata#10254)
  - Improve alignment checks (pydata#10251)
  - Update how-to-add-new-backend.rst (pydata#10240)
  - Support extension array indexes (pydata#9671)
  - Switch documentation to pydata-sphinx-theme (pydata#8708)
  - Bump codecov/codecov-action from 5.4.0 to 5.4.2 in the actions group (pydata#10239)
```python
if dtype is None and is_valid_numpy_dtype(self.dtype):
    dtype = cast(np.dtype, self.dtype)
else:
    dtype = get_valid_numpy_dtype(self.array)
```
@ilan-gold this ignores any given `dtype` that is not `None`. Is that intended?
(For more context, I'm cleaning up `PandasIndexingAdapter` classes in #10296.)
I refactored this in 80f496f.
That is likely not intended!
My intention was definitely not to ignore that. There should be another check on the `dtype` itself for its validity as a NumPy dtype.
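One way to address the concern raised in this thread is sketched below, under stated assumptions. An explicitly requested `dtype` is validated and respected instead of being overwritten. Only when no `dtype` is given does the array's own dtype, or a fallback, get used. `is_valid_numpy_dtype_sketch` and `resolve_dtype_sketch` are hypothetical names. The `object` fallback is a simplified stand-in for `get_valid_numpy_dtype`, not its actual behaviour, and the refactor in 80f496f may look different.

```python
from typing import Any, Optional

import numpy as np


def is_valid_numpy_dtype_sketch(dtype: Any) -> bool:
    """Hypothetical check: can NumPy interpret `dtype` as a data type?"""
    try:
        np.dtype(dtype)
    except (TypeError, ValueError):
        return False
    return True


def resolve_dtype_sketch(requested: Optional[Any], array_dtype: Any) -> np.dtype:
    """Pick the NumPy dtype to materialize an array with.

    Unlike the snippet under review, an explicitly requested dtype is
    respected rather than silently replaced.
    """
    if requested is not None:
        # Validate the caller's dtype instead of ignoring it.
        if not is_valid_numpy_dtype_sketch(requested):
            raise TypeError(f"{requested!r} is not a valid NumPy dtype")
        return np.dtype(requested)
    if is_valid_numpy_dtype_sketch(array_dtype):
        return np.dtype(array_dtype)
    # Simplified fallback standing in for get_valid_numpy_dtype(...):
    # dtypes NumPy cannot interpret are materialized as object.
    return np.dtype(object)
```

The `requested is not None` check has to come first because `np.dtype(None)` returns `float64` rather than failing.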
Identical to kmuehlbauer#1; probably not very helpful in terms of changes, since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 contains most of it...
whats-new.rst
api.rst