ENH: Getting NEP 50 behavior in the array API compat library #22341

Closed
asmeurer opened this issue Sep 27, 2022 · 8 comments

@asmeurer
Member

Proposed new feature or change:

In #21626 NEP 50 is implemented as opt-in with either a global setting or a context manager. I am working on creating an array API compatibility library (see WIP at https://github.com/data-apis/numpy-array-api-compat). Unlike numpy.array_api, this compatibility library is separate from numpy, so that it can be updated independently. It also aims to be usable for downstream libraries implementing against the array API. So in particular, it is not a strict implementation like numpy.array_api. For instance, it will not raise errors for dtype combinations that are not required by the spec but are allowed by NumPy (see also https://numpy.org/doc/stable/reference/array_api.html).

For this library, I'd like to keep the wrapping to a minimum. In particular, I'd like to avoid creating a wrapper class for arrays, as was done in numpy.array_api, and instead use np.ndarray directly. Since we aren't going for strictness, this isn't a problem, but one issue is type promotion: the spec requires no value-based type promotion, along the lines of NEP 50.
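To illustrate what is at stake, here is a minimal sketch of the difference between the two promotion rules (the exact result of the second line depends on the NumPy version and promotion state):

```python
import numpy as np

x = np.asarray([1.0], dtype=np.float32)

# Both promotion modes keep float32 for an in-range Python float:
print((x + 3.0).dtype)  # float32

# Legacy value-based promotion inspects the scalar's value: 1e300 does not
# fit in float32, so the result is upcast to float64. Under NEP 50 the
# Python float is "weak", so the result stays float32 (and becomes inf).
print((x + 1e300).dtype)
```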

The question then is how we can achieve this, so that users of the compatibility library get something that looks like NumPy but follows the array API specification. Neither of the implemented solutions looks very good for this. We could set the global flag, but that could break other libraries in the same process that use NumPy outside of the array API; in general, it seems like bad practice for a library to set a global flag. The context manager doesn't seem helpful either, since most type promotion issues come from operators. If a library (like scikit-learn) has code like

```python
import numpy_array_api_compat as np

a = np.asarray(...)
b = np.asarray(...)
# suppose a is an array and b is a scalar (0-d array), so that NEP 50 is relevant
c = a + b
```

Then the only way to make a + b do the right type promotion is for scikit-learn itself to add the context manager, and it's annoying for every implementing library to do this.

Ideally, we'd be able to set a promotion_state = 'weak' flag on arrays and scalars that gets carried around by subsequent operations. I could then wrap asarray and every other array-creating function in the compat library to set this flag, so that any (array API compatible) usage of those arrays automatically gets the correct type promotion behavior.

I'm not sure if this is feasible. If anyone has other suggestions for how we could achieve this, let me know. The options I can think of are:

  • Add a promotion_state flag to ndarray/scalars.
  • Make a wrapper ndarray class in the compat library, similar to numpy.array_api. I'd prefer to avoid this but we can consider it if it's unavoidable.
  • Require implementing libraries to use the context manager.
  • Ignore type promotion differences between NumPy and the array API (this option is more or less the same as the previous one).
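For what it's worth, the second option could look roughly like this. This is an illustrative sketch, not the compat library's actual design: all names are hypothetical, only `__add__` is shown, and only Python scalars whose kind matches the array's dtype are handled (the full NEP 50 rules for mixed-kind operands are more involved):

```python
import numpy as np

class WeakArray:
    """Hypothetical thin wrapper forcing NEP-50-style weak promotion
    for Python scalars whose kind matches the array's dtype."""

    def __init__(self, arr):
        self._arr = np.asarray(arr)

    def __add__(self, other):
        if type(other) in (int, float) and \
                np.asarray(other).dtype.kind == self._arr.dtype.kind:
            # Weak promotion: cast the scalar to the array's dtype so
            # its value cannot influence the result dtype.
            other = self._arr.dtype.type(other)
        elif isinstance(other, WeakArray):
            other = other._arr
        return WeakArray(self._arr + other)

    @property
    def dtype(self):
        return self._arr.dtype
```

The downside, as noted above, is that every operator and every function in the namespace has to be wrapped, which is exactly what numpy.array_api already does.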
@seberg
Member

seberg commented Sep 28, 2022

A context manager, and more importantly the corresponding contextvar, is the only way that I can see right now.
You would have to add a few additional hacks (i.e. keep the global flag, but only use it while we are in C, and assume/enforce that the context cannot change there).
The reason is that checking a contextvar occasionally is likely fine, but legacy ufunc promotion may need to check it a dozen times for a single function call.

I always assumed we would go with point 4, though! It actually seems the easier and more useful option for the most likely users.

Currently, I assume any adopting library is already used a lot with NumPy input. Enabling the flag would thus change the library's behavior. From a library's perspective, it seems easier if the behavior change happens when updating to NumPy 2.0, rather than earlier.
So for existing libraries, not getting the new behavior seems fine to me, and maybe even easier.

Newly written libraries might be annoyed by this, I admit. But then it might be best to require the library to use the context manager (or some option) to actually get the new behavior.
If that is desired, it seems fine to me, but much lower priority?

@asmeurer
Member Author

The point is that libraries that write against the array API will effectively get the new behavior whenever they run against an array library that doesn't have the NumPy legacy behavior. So they will have to deal with those promotion rules anyway if they want things to be consistent across array libraries.

@seberg
Member

seberg commented Sep 29, 2022

if they want things to be consistent across array libraries.

Right, but is that actually something they want or need? If you really want that fine-grained consistency, you cannot use plain NumPy arrays; you would have to always wrap them consistently (or flag them).
Importantly, you would also have to unwrap/unflag them consistently, since you don't want to affect the user's code.

This is of course a pattern that can be done and was mentioned many times: wrap, use a (less minimal) np.array_api, and unwrap again before returning to the user.
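The wrap/use/unwrap pattern could be packaged as a decorator, roughly like this (everything here is hypothetical, including the attribute lookup; `_array` happens to be where numpy.array_api's Array stores its underlying ndarray, but any wrapper would need an equivalent):

```python
import functools
import numpy as np

def returns_plain_ndarray(func):
    """Hypothetical helper: unwrap whatever array-ish object the wrapped
    array-API code returns back to a plain np.ndarray, so wrapped/flagged
    arrays never leak into user code."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # A wrapper class storing the ndarray in ._array is unwrapped;
        # plain ndarrays pass through unchanged.
        return np.asarray(getattr(result, "_array", result))
    return wrapper
```

Each public entry point of a library would then wrap its inputs on the way in and be decorated like this on the way out, which is exactly the boilerplate being called impractical below.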

That pattern still seems impractical to me, but I don't want to discourage you from exploring it if that is what you consider best.
Importantly, the sklearn PR obviously did not bother with any of these considerations at all (although it probably doesn't matter for sklearn). Also, if a library really wants fine-grained control, it may actually make sense to figure out the right promotions manually early on.

In this case I still think that trying to get promotion exactly right is making the perfect the enemy of the good: it seems like a huge amount of hassle for something that practically nobody will even notice.

@seberg
Member

seberg commented Sep 29, 2022

The only true solution for a library would be wrapping (end users you might just tell to upgrade NumPy and opt in to NEP 50).

Wrapping would require a clear pattern, which we have not had for 1.5+ years, and it may even require formalizing array_api.return_array(...) or helper decorators.

You can call these deficiencies NumPy bugs if you like, and hope that downstream pushes for fixing them and for adopting NEP 50 (e.g. this question). I would much prefer to spend our limited brain cycles on pushing NEP 50 forward rather than inventing a compatibility layer that I don't think anyone wants to keep in the mid-term?!

If we cannot do without that compatibility layer in the long term (say, because NumPy returns scalars and we don't know when we might change that), then that is a different matter. In that case, I could try to look into "marking" arrays, and you would need to look into what that compatibility layer should look like (i.e. remove the need to fully wrap the NumPy array).

@asmeurer
Member Author

@thomasjpfan were type promotion differences a concern with your scikit-learn array API work?

I'm happy to leave this as is for now, until it comes up for a real world use case.

@thomasjpfan
Contributor

For using the Array API, I did not hit any issues with value-based casting. Most of my typing-related changes had to do with the strictness of casting. For example, one cannot divide an integer array by a Python integer with the Array API:

```python
import numpy.array_api as xp

X = xp.asarray([1, 2, 3])
Y = X / 3  # raises TypeError in the strict numpy.array_api implementation
```

@Micky774 has a PR that experimented with "turning on NEP 50" and the changes to scikit-learn were small: scikit-learn/scikit-learn#23644

@asmeurer
Member Author

So it sounds like many libraries will already be implementing NEP 50 support concurrently with array API support, so it may not be necessary to worry about this so much in the compat library.

For example, one can not divide an integer array by another integer with the Array API:

The compat library won't be strict about this, but that's why it's still useful to test against the minimal numpy.array_api implementation: there's no guarantee that other libraries will allow integer division (for example) when it isn't required by the spec.

@seberg
Member

seberg commented Oct 19, 2022

Going to close this for now. I will note that I hope to have np.ufunc.resolve_dtypes() available soon, which might be interesting in similar situations (it always uses NEP 50), even if it is not relevant to the exact question here.

We can reopen if the decision to ignore the differences needs to be reconsidered.

@seberg seberg closed this as completed Oct 19, 2022