WIP: Implement oindex #6075
See also a start for an NEP at https://gist.github.com/seberg/976373b6a2b7c4188591 Of course some of the things are a bit from my perspective, I did not actually run the examples, so won't guarantee they are all correct ;) (I know that @njsmith was pondering some more restrictions in some places with boolean indices, but I do not see the reason for that right now). Same again, don't expect instant followup, it was more a way to keep me awake.... |
@seberg - the fancy indexing has me flummoxed often enough that I'd welcome any simplification! Two broad comments (not sure if you prefer them here or on your NEP; happy to repost):
|
Well, take is a single integer array index, so it does not have any of these problems; I don't really think that applies. About your point 1: a function which does this would be nice. One could of course think about doing something like indexing (to also allow slices, etc.), but I am not clear on how you would do it. But something like |
Perhaps a |
@jaimefrio - yes, probably much more sensible -- see #6078. @seberg - the name is of course an implementation detail, though it is helpful to have an obvious one. For The logic in suggesting
|
Just to note, I added a small paragraph about this problem to the NEP, I think I will mail it to the list (unless someone feels I should add something) shortly after the 1.10 release. |
Updated. It now implements everything in the NEP, oindex, vindex and lindex. However, does not throw a fit when plain indexing is potentially not clear (but that should be a trivial addition). It currently works by broadcasting the arrays, in principle could be slightly faster by instead using axes reordering in nditer, but bleh ;). |
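The broadcasting approach mentioned above can be sketched in pure Python. This is only an illustration under stated assumptions, not the PR's C implementation: `outer_index` is a hypothetical helper name, and it assumes exactly one 1-D integer index per axis (no slices, booleans, or ellipsis).

```python
import numpy as np

def outer_index(arr, *indices):
    """Hypothetical helper: outer indexing via broadcasting.

    Each 1-D index is reshaped so that it varies along its own axis only;
    ordinary fancy indexing then broadcasts them into an outer product.
    """
    n = len(indices)
    reshaped = []
    for axis, idx in enumerate(indices):
        idx = np.asarray(idx, dtype=np.intp)
        shape = [1] * n
        shape[axis] = idx.size  # vary only along this index's own axis
        reshaped.append(idx.reshape(shape))
    return arr[tuple(reshaped)]

arr = np.arange(24).reshape(2, 3, 4)
result = outer_index(arr, [0, 1], [2, 0], [3, 1, 0])
print(result.shape)  # (2, 2, 3)
```

The reshaping step is the same thing `np.ix_` does for integer sequences, so the result should match `arr[np.ix_([0, 1], [2, 0], [3, 1, 0])]`.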
Boooo! Should now also (hopefully) give the "unclear" warnings -- not sure if with the exact rules as in the NEP -- if you enable deprecation warnings always/error. |
☔ The latest upstream changes (presumably #7027) made this pull request unmergeable. Please resolve the merge conflicts. |
☔ The latest upstream changes (presumably #7667) made this pull request unmergeable. Please resolve the merge conflicts. |
☔ The latest upstream changes (presumably #8043) made this pull request unmergeable. Please resolve the merge conflicts. |
does this need a rebase/conflict resolution for the NEP discussion to progress? |
Implement `multiindex.prepared(dtype=None, shape=None, convert_booleans={"not_single", "always"})`, which gives some information. On field access (which is not possible to reach) it would return `{"type": "field-access", "orig_index": orig_index}`; on a non-field index it returns a much larger dict with most of the information.
Tests should be passing now. I added a
In [1]: class subclass(np.ndarray):
   ...:     def __getitem__(self, obj):
   ...:         if isinstance(obj, np.core.multiarray._multiindex):
   ...:             print(obj.prepared(convert_booleans="not_single"))
   ...:         return super().__getitem__(obj)
   ...:
In [2]: arr = np.zeros((2, 3)).view(subclass)
In [3]: arr.oindex[[0, 1], [0, 1]]
{'type': 'index', 'method': 'oindex', 'orig_index': ([0, 1], [0, 1]), 'view': False, 'simplified_index': (array([0, 1]), array([0, 1])), 'scalar': False, 'ellipsis_dims': None, 'result_ndim': 2}
Out[3]:
subclass([[0., 0.],
[0., 0.]])
In [4]: arr.vindex[[0, 1], [0, 1]]
{'type': 'index', 'method': 'vindex', 'orig_index': ([0, 1], [0, 1]), 'view': False, 'simplified_index': (array([0, 1]), array([0, 1])), 'scalar': False, 'ellipsis_dims': None, 'result_ndim': 1}
Out[4]: subclass([0., 0.])
In [5]: arr.oindex[arr > 0]
{'type': 'index', 'method': 'oindex', 'orig_index': subclass([[False, False, False],
[False, False, False]]), 'view': False, 'simplified_index': subclass([[False, False, False],
[False, False, False]]), 'scalar': False, 'ellipsis_dims': None, 'result_ndim': 1}
Out[5]: subclass([], dtype=float64)
In [6]: arr.oindex[True, 0, ...]
{'type': 'index', 'method': 'oindex', 'orig_index': (True, 0, Ellipsis), 'view': False, 'simplified_index': (True, 0, Ellipsis), 'scalar': False, 'ellipsis_dims': 1, 'result_ndim': 2}
Out[6]: subclass([[0., 0., 0.]])
In [7]: arr.oindex[np.array(True), 0, ...]
{'type': 'index', 'method': 'oindex', 'orig_index': (array(True), 0, Ellipsis), 'view': False, 'simplified_index': (True, 0, Ellipsis), 'scalar': False, 'ellipsis_dims': 1, 'result_ndim': 2}
Out[7]: subclass([[0., 0., 0.]])
In [8]: arr.vindex[0, 1]
{'type': 'index', 'method': 'vindex', 'orig_index': (0, 1), 'view': False, 'simplified_index': (0, 1), 'scalar': True, 'ellipsis_dims': None, 'result_ndim': 0}
Out[8]: 0.0
In [9]: arr.vindex[0, 1, ...]
{'type': 'index', 'method': 'vindex', 'orig_index': (0, 1, Ellipsis), 'view': True, 'simplified_index': (0, 1, Ellipsis), 'scalar': False, 'ellipsis_dims': 0, 'result_ndim': 0}
Out[9]: subclass(0.)
In [10]: arr.vindex[0, np.array(1), ...]
{'type': 'index', 'method': 'vindex', 'orig_index': (0, array(1), Ellipsis), 'view': False, 'simplified_index': (0, 1, Ellipsis), 'scalar': False, 'ellipsis_dims': 0, 'result_ndim': 0}
Out[10]: subclass(0.)
@hameerabbasi would this type of thing help you? Note a few oddities you cannot see... 0-D booleans are converted to False/True (for plain indexing, which is not available). Also the EDIT: To note, valid arguments for EDIT: Oops, about 0-D booleans being True/False. This can be reached of course, I just had a bug. |
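For readers without this branch: `oindex` and `vindex` are not available in released NumPy, but the difference between `In [3]` and `In [4]` above can be approximated with existing calls. The sketch below assumes `np.ix_` as a stand-in for outer indexing and plain fancy indexing as a stand-in for vectorized indexing; it uses a non-zero array so the selected elements are visible.

```python
import numpy as np

arr = np.arange(6.0).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]
rows, cols = [0, 1], [0, 1]

# Outer style: every row index is combined with every column index.
outer = arr[np.ix_(rows, cols)]
print(outer.shape)   # (2, 2)
print(outer)         # [[0. 1.] [3. 4.]]

# Vectorized style: indices are paired element by element.
vectorized = arr[rows, cols]
print(vectorized.shape)  # (2,)
print(vectorized)        # [0. 4.]
```

This matches the `result_ndim` values reported in the session: 2 for the outer case and 1 for the vectorized case.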
Another note. This does not expose how |
Ah, screw it, the EDIT: OK, one way around it would be to actually do the dimension expansion necessary to broadcast everything to a fancy index. The actual transpose would be a different issue though. But I think we could report that as well. EDIT: Or something like a tuple of arrays (note that a 1-D boolean index is ok), but without more hacks, that fully means that numpy does not understand that simplified index anymore.... |
#define PLAIN_INDEXING 1
#define OUTER_INDEXING 2
#define VECTOR_INDEXING 4
#define FANCY_INDEXING 8
This seems to imply that these can be or'd together - does that make any sense at all?
Yeah, you are right, but before going too much in depth, we might want to mostly discuss the gist. There are also larger code blocks that need to be deleted due to not supporting bools in vindex. It's pretty alpha, though with code removal and disabling that `prepared` thingy, it should probably work fine (plus making the warnings more conservative).
PyObject *index; /* The indexing object */
int indexing_method; /* See mapping.h */
/* If bound is 1, the following are information about the array */
int bound;
What purpose does binding serve? If it's useful, I'd be inclined to have separate `Index` and `BoundIndex` classes.
You do not have to pass in the shape and dtype again if you are a subclass, but I guess forcing the subclass to write `prepared(self.shape, self.dtype)` or `.prepared(self)` is OK too.
Whose job it is to do the binding seems kinda orthogonal. What I'm saying is that this would be better represented with multiple classes, rather than a single class with a bunch of flags, especially given the precedent for bound objects in Python. It sounds like the only value in binding is to make the `prepared()` function work anyway? Perhaps that could be exposed as `index.bind(shape, dtype).data`
Fair point, it's much like set and frozenset. We should probably only have the bound version anyway, expose all of these as attributes, and just allow creating a new one from an old one with a new shape (and dtype); that could be a method or not.
I got a bit lost in the fastest way to return something instead of thinking about what is nicest. My only real interest currently is whether such info can help subclasses or not.
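The two-class design discussed in this review thread could look roughly like the sketch below. Everything here is hypothetical: the class names `Index` and `BoundIndex` and the `bind(shape, dtype)` method come from the comments above, while the field names and the use of frozen dataclasses are assumptions made purely for illustration. The PR itself defines a C struct with a `bound` flag instead.

```python
from dataclasses import dataclass
from typing import Any, Tuple

@dataclass(frozen=True)
class Index:
    """Hypothetical unbound index: only the index object and the method."""
    index: Any
    method: str  # e.g. "plain", "oindex" or "vindex"

    def bind(self, shape, dtype):
        # Creating a bound index never mutates the original object.
        return BoundIndex(self.index, self.method, tuple(shape), dtype)

@dataclass(frozen=True)
class BoundIndex(Index):
    """Hypothetical bound index: additionally knows the array it applies to."""
    shape: Tuple[int, ...]
    dtype: Any

unbound = Index(([0, 1], [0, 1]), "oindex")
bound = unbound.bind((2, 3), float)
rebound = bound.bind((4, 5), int)  # new object from an old one, new shape and dtype
print(bound.shape, rebound.shape)  # (2, 3) (4, 5)
```

Because `BoundIndex` inherits `bind`, creating a new bound index from an old one with a different shape and dtype needs no extra API, which is the set/frozenset-like behaviour described above.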
@seberg Thanks for the ping! The problem here is I doubt anyone will need to know how @mattip It seems NEP-21 (which this is an implementation of) is still in the draft stage. We should ping the mailing list for acceptance before we actually merge this. This is an implementation detail for sure, but I believe that since |
@hameerabbasi, this is far from ready, since it is lacking a lot in terms of tests, etc. The API would definitely not look as it is now. What I could imagine right now is probably this, make this rather be exposed as:
I somewhat think that simply not allowing field access might be an option as well (would remove the need for that dtype). If this is supposed to really give information, then I think keeping the original index around is probably unnecessary. Also just to note: numpy will cast all vector inputs (also the booleans) to intp arrays during the preprocessing (which this exposes), is that even desired for projects like dask/xarray? |
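The conversion of boolean indices to integer positions can be observed from Python with existing NumPy calls. This is a minimal sketch of the equivalence only; it says nothing about how the PR's preprocessing is structured internally.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)
mask = np.array([True, False, True])

# A boolean index selects the same elements as the integer positions of
# its True entries; np.nonzero returns those positions as intp arrays.
(positions,) = np.nonzero(mask)
same_dtype = positions.dtype == np.dtype(np.intp)
same_result = np.array_equal(arr[mask], arr[positions])
print(same_dtype, same_result)

# Bytes per element before and after the conversion.
print(mask.itemsize, positions.itemsize)
```

For a project that keeps masks lazily or in chunks, the relevant cost is that each selected element is represented by one `intp` value rather than one boolean byte per input element.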
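The `view` thing might be useful for Dask, in particular, they might use it to determine if they should create a view, and if this created view will use any extra memory. I don't believe they do this currently. The `scalar` thing is useful to know for pydata/sparse, currently we do something hacky depending on if the indices are incomplete and the last index is an ellipsis. The `vector_index` thing should probably not be broadcasted to the output dims. I can see myself using `.flatten`, and since this would *not* produce a view for broadcasted dimensions, memory usage could blow up for something like outer. It's probably best if it was an input to `.vindex`.
The `vector_transpose` rule is useful if someone actually wants to implement legacy indexing. I can speak for myself: I don't plan to. I'm not sure if XArray or Dask will be interested in this, but given that no one has put in the effort to support it so far, and nor have there been significant requests for it: probably not. I can probably explain the rule to you now (from the docs): if not all scalars and arrays (so-called advanced indices) are next to each other, then the "advanced index dimension" goes at the start. Otherwise, it goes where these advanced indices were in the original array.
On the cast to intp arrays: AFAICT, I don't believe they'll care so long as it isn't worse than before. |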
Xarray already implements all forms of outer and vectorized style indexing.
But they work differently because we do broadcasting based on dimension
names, so unlike the case in NumPy they don't need separate interfaces.
…On Mon, Jul 30, 2018 at 6:10 AM Hameer Abbasi ***@***.***> wrote:
The view thing might be useful for Dask, in particular, they might use it
to determine if they should create a view, and if this created view will
use any extra memory. I don't believe they do this currently.
The scalar thing is useful to know for pydata/sparse, currently we do
something hacky depending on if the indices are incomplete and the last
index is an ellipsis.
The vector_index thing should probably not be broadcasted to the output
dims. I can see myself using .flatten and since this would *not* produce
a view for broadcasted dimensions... Which can make memory usage blow up
for something like outer. It's probably best if it was an input to .vindex
.
Getting the transpose rule might be a bit annoying, but probably not be so
bad, have to read the code a bit though to be sure, so if it helps
subclasses quite a bit, sounds fine. The problem is some lock-in to
supporting this output.
The vector_transpose rule is useful if someone actually wants to
implement legacy indexing. I can speak for myself: I don't plan to. I'm not
sure if XArray or Dask will be interested in this, but given that no one
has put in the effort to support it so far, and nor have there been
significant requests for it: Probably not. I can probably explain the rule
to you now (from the docs): If not all scalars and arrays (so-called
advanced indices) are next to each other, then the "advanced index
dimension" goes at the start. Otherwise, it goes where these advanced
indices were in the original array.
numpy will cast all vector inputs (also the booleans) to intp arrays
during the preprocessing (which this exposes), is that even desired for
projects like dask/xarray
AFAICT, I don't believe they'll care so long as it isn't worse than before.
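The transpose rule quoted above can be checked against released NumPy with plain (legacy) indexing. The array shapes below are arbitrary choices for illustration.

```python
import numpy as np

arr = np.zeros((5, 6, 7, 8))
idx = np.array([0, 1, 2])  # broadcast shape of the advanced indices: (3,)

# Advanced indices next to each other: the new dimension stays in place.
adjacent = arr[:, idx, idx, :]
print(adjacent.shape)     # (5, 3, 8)

# Advanced indices separated by a slice: the new dimension moves to the front.
separated = arr[:, idx, :, idx]
print(separated.shape)    # (3, 5, 7)

# A scalar counts as an advanced index here, so this is also "separated".
with_scalar = np.zeros((5, 6, 7))[0, :, idx]
print(with_scalar.shape)  # (3, 6)
```

The last case is the one that most often surprises users, since the integer `0` looks like it should simply drop the first axis and leave the result as `(6, 3)`.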
|
Well, thinking about it a little, my current take would be:
|
As long as there will be an externally exposed method that does the same thing. Remember that pydata/sparse, XArray and Dask aren't subclasses. Subclasses are things like |
@hameerabbasi yes, of course, which is why above I always had But, to get to the transpose part, I should probably clean up some technical debt in the MapIter code :(, nothing huge but also not very quick maybe. |
So, actually, to really move on here, probably the best thing is to leave it at that: the Index object is probably a good idea, and for now it could really just store the original index (or the prepared one, it hardly matters, except that if it is the prepared one, we will need to enforce the same shape and fields in the dtype, so that's a bit annoying maybe). And then continue trying to make life easier for things such as pysparse afterwards. I have pointed out what we can do reasonably or with a bit more work. But for the next 2-4 months I really shouldn't do much here :(. |
@hameerabbasi This is the old PR, which is probably largely finished. I expect the main open issues are some boolean indexing decisions (which I honestly do not think matter much either way) and how to handle subclasses (i.e. I think we may need a helper object to indicate which type of indexing is going on, so that a subclass implementing only Since there are deprecation tests up there, I guess that I already did some deprecations. I would suggest, if we pick this up, that we do not include any deprecations in the first iteration, but only add new features. That way existing tests can definitely stay untouched; changing them would create a lot of churn (and merge conflicts). This is high impact, but it stalled, probably largely because it was never reviewed. I do not want to pick it up myself, but I may be able to review it. This was written by me in 2015 or so, so even without many code changes it would effectively be a different person reviewing ;). EDIT: Anyone picking this up should probably also push for officially accepting the corresponding NEP. |
@seberg Needs rebase. |
This would still be nice to revive! It may not be an insane amount of work. One thing that still needs to be settled is how exactly to work with subclasses. It may be OK to break them, but it would be nice to make sure that a subclass like masked arrays can remain working reasonably well. There should be a few ideas for that discussed here (or in old mailing list posts). In general, this was pretty far along, with that exception and probably some code cleanup. |
Seems I was productive during travel.
Should be good enough to try around, if someone can contribute, please do, it will probably be a while before I look at it again seriously.