ENH: add cartesian() #5874
Generate the cartesian product of input arrays.
    out = out.T

    for j, arr in enumerate(arrays):
        n /= arr.size
You need to do n //= arr.size to silence the deprecation warnings that are failing the tests.
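To see why the floor division matters, here is a minimal sketch of a repeat-and-tile cartesian product. It is not the PR's implementation: the name `cartesian_sketch` and the `reps` counter are hypothetical, and it assumes at least one 1-D input. With true division, `reps` would become a float, and as far as I know NumPy first deprecated and later rejected non-integer repeat counts.

```python
import numpy as np


def cartesian_sketch(arrays):
    """Hypothetical helper: each row of the result is one element of the product."""
    arrays = [np.asarray(a) for a in arrays]
    if not arrays:
        raise ValueError("need at least one input array")
    if any(a.ndim != 1 for a in arrays):
        raise ValueError("only 1-D inputs are accepted")

    # Total number of rows in the product.
    n = 1
    for a in arrays:
        n *= a.size
    dtype = np.result_type(*arrays)
    if n == 0:
        # An empty input makes the whole product empty.
        return np.empty((0, len(arrays)), dtype=dtype)

    out = np.empty((len(arrays), n), dtype=dtype)
    reps = n
    for j, arr in enumerate(arrays):
        # Floor division keeps `reps` an int; `reps /= arr.size` would make it a float.
        reps //= arr.size
        # Repeat each value `reps` times, then tile the block to fill the row.
        out[j] = np.tile(np.repeat(arr, reps), n // (arr.size * reps))
    return out.T
```

The row order matches `itertools.product`: the last input varies fastest.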
Wherever it ends up, it will need to be mentioned in the release notes, and the docstring needs one of those
I feel pretty strongly that automatically flattening input to 1D is a bad idea. Better to just raise an error (though I do like the idea of the broadcasting across other dimensions).
- add versionadded
- improve doc
- refuse 2D input
- fix unittests
- more unittests
- add "unique" parameter
The cartesian product is defined to work on sets. Following this definition we
>>> cartesian((np.array([1, 1]), np.array([1, 1])))
array([[1, 1]])
Would this confuse people? I don't know. Alternatively one could argue that each row in a 2-D array is an element of the
>>> cartesian((np.array([[1, 1]]), np.array([[1, 1]])))
array([[1, 1, 1, 1]])
Would we still use the unique rows then?
>>> cartesian((np.array([[1, 1], [1, 1]]), np.array([[1, 1], [2, 2]])))
array([[1, 1, 1, 1], [1, 1, 2, 2]])
Just as reference Python's
>>> list(itertools.product(range(2), range(2)))
[(0, 0), (0, 1), (1, 0), (1, 1)]
>>> list(itertools.product([1, 1], range(2)))
[(1, 0), (1, 1), (1, 0), (1, 1)]
If we refuse 2D input we still have to decide if we use
Here is my suggestion which represents the current pull request:
I think a
For the name of this function, how about using the most explicit
I agree with @shoyer. I would expect it to work like
OK. Sounds good to me. But we'll refuse 2D input, right?! It's easy to flatten the input.
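The two behaviours debated above can be made concrete with a small sketch that uses only `itertools.product` and `np.unique`, not the PR's `cartesian`. The variable names are illustrative, and the 2-D input is flattened explicitly by the caller with `ravel()`, which is the "easy to flatten" step mentioned above.

```python
import itertools

import numpy as np

a = np.array([[1, 1], [1, 1]])  # 2-D input that the function would refuse
b = np.array([1, 1])

# Sequence semantics, as in itertools.product: duplicates are kept.
# The caller flattens the 2-D array before taking the product.
rows = np.array(list(itertools.product(a.ravel(), b)))

# Set semantics: collapse duplicate rows after the fact.
unique_rows = np.unique(rows, axis=0)
```

Here `rows` has one row per pair of input elements, while `unique_rows` keeps a single `[1, 1]` row. Leaving deduplication to an explicit `np.unique` call keeps the default behaviour aligned with `itertools.product`.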
Here's a tentative version of what a "broadcast on the rest" implementation could look like. I have removed the
You could then do sick things like, computing the cartesian products over the last dimension, for the cartesian product of all subarrays, or something like that...
The logic to broadcast arrays excluding an axis probably deserves being turned into its own function, especially if we want the error message to have some relevance to the original shape of the arrays, to the remapped one passed into
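As a rough illustration of what "broadcast on the rest" could mean, the sketch below takes the cartesian product over the last axis of each input and broadcasts all leading axes against each other. This is not the code posted in the thread: the name `cartesian_last_axis`, the choice of the last axis as the core axis, and the output shape `(..., n_1 * ... * n_k, k)` are all assumptions made for the example. It relies on `np.broadcast_shapes`, which needs NumPy 1.20 or later.

```python
import math

import numpy as np


def cartesian_last_axis(*arrays):
    """Hypothetical sketch: product over the last axis, broadcasting the rest."""
    arrays = [np.asarray(a) for a in arrays]
    if not arrays or any(a.ndim < 1 for a in arrays):
        raise ValueError("need at least one input, each with at least 1 dimension")
    k = len(arrays)

    # Broadcast only the leading ("loop") dimensions against each other.
    lead = np.broadcast_shapes(*(a.shape[:-1] for a in arrays))
    core = tuple(a.shape[-1] for a in arrays)

    grids = []
    for i, a in enumerate(arrays):
        # Left-pad with size-1 axes so every input has len(lead) leading axes.
        a = a.reshape((1,) * (len(lead) - (a.ndim - 1)) + a.shape)
        # Give input i its own core axis, with size-1 axes for the others.
        shape = a.shape[:-1] + (1,) * i + (a.shape[-1],) + (1,) * (k - i - 1)
        grids.append(np.broadcast_to(a.reshape(shape), lead + core))

    # Stack the k coordinate grids and merge the core axes into one.
    stacked = np.stack(grids, axis=-1)
    return stacked.reshape(lead + (math.prod(core), k))
```

For example, `cartesian_last_axis([[1, 2], [3, 4]], [5, 6, 7])` yields an array of shape `(2, 6, 2)`: one product per row of the first input, each combined with the same second input.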
@jaimefrio I like the idea of copying generalized ufuncs, but this seems way too complicated. In particular, there are way too many ways to interpret the
The implemented solution here is, I think, (2)a, but all these seem like reasonable guesses. You also have
I don't think there's a way to make the
Unfortunately, even in that case the function signature (allowing for computed dimension sizes) is not entirely obvious. Jaime seems to be suggesting something like
Another way to vectorize across multiple dimensions is to do the cartesian product along all axes. That would imply:
I think that treating the signature of the function for 1D input (as already documented) as the core signature for a gufunc is the most consistent approach to generalizing to multiple dimensions.
One case in which an
I wonder if we should take these problems with computing output axis sizes
@shoyer In my example, I am adding a size-1 dimension to the first array, so the dimensions of the inputs are
Regarding your suggestion of doing cartesian products over all axes, I think the example I chose shows that it can very easily be achieved with the proposed functionality by adding a few size-1 axes and a call to
And yes, adding an Until we sort that out, if we don't want to stall this, I think we have two options:
I think it would be nice to allow subclasses, at least in principle (via a
My main point was, why are we flattening the output? E.g. even in the 1d case we could return
@mhvk Shouldn't
@njsmith If the inputs are all 1-D, then yes, reshaping to
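One way to picture an unflattened result for 1-D inputs is a grid with one axis per input plus a trailing axis of length k. The exact shape proposed in the thread is not preserved here, so the `(n_1, ..., n_k, k)` layout below is an assumption, built with `np.meshgrid` and `np.stack` rather than with the PR's code.

```python
import numpy as np

a = np.array([1, 2])
b = np.array([3, 4, 5])

# Unflattened product: grid[i, j] == [a[i], b[j]], shape (len(a), len(b), 2).
grid = np.stack(np.meshgrid(a, b, indexing="ij"), axis=-1)

# The flattened form is a plain reshape of the grid, in itertools.product order.
flat = grid.reshape(-1, 2)
```

Under this layout the flat `(N, k)` output is recoverable with a single reshape, while the grid form keeps track of which input each axis came from.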
Let's not start inventing new __array_priority__ special cases, there are
I have no idea what to say about the full generality version with
@mhvk https://github.com/mhvk Shouldn't __array_priority__ play some role
@njsmith https://github.com/njsmith If the inputs are all 1-D, then yes,
@jaimefrio - In principle, one might want to have a
But your comment does make me realise that this probably needs more thought about how to generalize this. Since this involves adding a
My supposedly simple pull requests have the tendency to turn into bigger discussions :) I agree with @jaimefrio that how to handle the axis argument of multi-operand functions should probably be discussed on the mailing list. To proceed with the pull request:
☔ The latest upstream changes (presumably #7742) made this pull request unmergeable. Please resolve the merge conflicts.
There does not seem to be any demand for this feature. I guess we can close this.
Generate the cartesian product of input arrays.
This is the pull request that resulted from the discussion. First, thanks to everybody who participated in the discussion!
Here is a performance comparison of different implementations.
The PR contains the implementation, docs and unittests, but...
I'm not quite sure if arraysetops.py is the right place for this function. "Set operations for 1D numeric arrays based on sorting." does not sound like cartesian. cartesian is similar to prod, but fromnumeric.py is also not the right place. Maybe core/function_base.py next to linspace? Or linalg.py?
Open questions
or should we give it an axis argument, and then "broadcast on the rest", a la generalized ufunc?" :