DOC: Improve the description of the `dtype` parameter in `numpy.array` docstring #10614
Comments
Yes, because we probably need to provide some way to obtain the C long type in numpy. I think the question you mean to ask is "Why is the default integer type the C long?"
@eric-wieser as I wrote, I asked this question on the ML because I thought it more suitable there than here, but there was no feedback. In fact, there are certainly a few related questions; I'm sorry for that (but I did not want to separate the questions so as not to clutter the list).
Partly yes :-) Nevertheless, I find this phrase, "If not given, then the type will be determined as the minimum type required to hold the objects in the sequence", to be incorrect. Also, I can't understand why this casts to uint64:

```
>>> (np.array([0x8fffffffffffffff]) + np.array([0x8fffffffffffffff])).dtype
dtype('uint64')
```

but this does not cast (it stays int32):

```
>>> (np.array([0x7fffffff]) + np.array([0x7fffffff])).dtype
dtype('int32')
```

Also I don't understand why this casts to float64 instead of object:

```
>>> (np.array(1, np.int32) + np.array(1, np.uint64)).dtype
dtype('float64')
```

And if C's rules are intended, why does it cast this way?
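To make the value-dependent behavior above concrete, here is a small sketch; the exact default signed type varies by platform and NumPy version, and values beyond the uint64 range historically fell back to object dtype:

```python
import numpy as np

# The discovered dtype depends on the values, not on a fixed "minimum type":
print(np.array(1).dtype)        # platform default signed integer
print(np.array(2**63).dtype)    # uint64: too large for int64

# Mixing int32 with uint64 promotes to float64, not to object:
print(np.result_type(np.int32, np.uint64))   # float64
```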
Also, I'm not sure what you mean by "type chain", but if you mean the C type-conversion rules, they are more complicated than what you said. But numpy does not use C conversion rules anyway. As to the main issue here of whether the docstring could be improved, I agree it could. I think a more precise (but less descriptive) description might be something like:
This is plain wrong:

```
>>> a = np.array(256, np.uint16)
>>> np.array(a, dtype=np.uint8)  # definitely not a downcast... /s
array(0, dtype=uint8)
```
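For context, the wraparound happens because `np.array(..., dtype=...)` casts unsafely by default; a sketch of how one might guard against it with `np.can_cast` and `astype(..., casting='safe')`:

```python
import numpy as np

a = np.array(256, np.uint16)

# Default casting in np.array is unsafe, so the value silently wraps:
print(np.array(a, dtype=np.uint8))        # 0

# can_cast reports that uint16 -> uint8 is not safe in general:
print(np.can_cast(np.uint16, np.uint8))   # False

# Requesting safe casting turns the silent wrap into an error:
try:
    a.astype(np.uint8, casting='safe')
except TypeError as e:
    print('refused:', e)
```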
@ahaldane from your link I found another one: #5745. It looks like this kind of question, with the corresponding surprise, comes up at least every two months.
I meant that... Back to the topic: to be honest, I found the current implicit treatment of the default dtype surprising.
I think there's a major disconnect between the documentation and what we actually do. What you describe above sounds like what we do already (except with `int_` instead of `int32`). Can you give an example where that's not the case?
@eric-wieser Yes, in the following respects:

p.s. Why...
I think most of us devs feel we should try to limit unintentional creation of object arrays rather than do it more often. See #5353, #7547, #368, for example. Object arrays can't perform quite a few operations that normal arrays can, and often lead to confusing bugs if the user wasn't expecting an object array. I agree the creation of the unsigned integers seems undocumented and might be surprising; I hadn't understood that part of your comment before. The code that actually does that is here (link). We should at least document it somewhere.
@ahaldane I respect the choice of the developers, and of course I do not have as much experience with numpy. #7547 is not really related; in any case it is hard to make a reasonable assumption about what the result should be for:

```
In [2]: np.asarray([1, 2, 3]) * 'a'
TypeError: ufunc 'multiply' did not contain a loop with signature matching types dtype('<U11') dtype('<U11') dtype('<U11')

In [3]: np.asarray([1, 2, 3], dtype=np.object) * 'a'
Out[3]: array(['a', 'aa', 'aaa'], dtype=object)
```

Reading through the other ones, I have a feeling that all these problems are related to the fact that...
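As an aside on unintentional object arrays: a hypothetical guard like the `as_numeric` helper below (not part of numpy) is one way to fail fast when an object array sneaks in:

```python
import numpy as np

# Non-numeric elements make np.array silently fall back to object dtype:
arr = np.array([1, None])
print(arr.dtype)   # object

def as_numeric(seq):
    """Hypothetical guard: reject a silent fall-back to object dtype."""
    out = np.asarray(seq)
    if out.dtype == object:
        raise TypeError("input does not fit a fixed-width numeric dtype")
    return out

print(as_numeric([1, 2, 3]).dtype)   # a normal numeric dtype
```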
@godaygo There are actually a number of similar related problems involving dtypes which many developers (including those more ancient and revered than me :) have been planning to fix for a long time. See the "epic dtype cleanup plan" #2899. One goal is to make it easy to add something like a "multiprecision integer" type. In fact, BIDS obtained a grant to fund two numpy developers, one of the goals being to finally implement this plan. Last I heard the position is still open and you can apply if it interests you: http://numpy-discussion.10968.n7.nabble.com/Position-at-BIDS-UC-Berkeley-to-work-on-NumPy-td45085.html
Also, your comment about a dedicated multiprecision int reminds me of this issue and comment: #3804 (comment)
I have a similar question:

```
>>> a = np.array((1, 2, 3, 4))
>>> a, a.shape, a.dtype
(array([1, 2, 3, 4]), (4,), dtype('int64'))
```

It doesn't seem like...
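A quick way to see that the docstring's "minimum type" wording doesn't match reality is to compare `np.min_scalar_type` with what `np.array` actually picks:

```python
import numpy as np

# The literal "minimum type required to hold" the values 1..4 would be uint8:
print(np.min_scalar_type(4))             # uint8

# ...but np.array actually picks the platform default integer:
print(np.array((1, 2, 3, 4)).dtype)      # int64 on most 64-bit platforms
```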
The documentation of the dtype kwarg still states
I wish we could write "If not given, then the type will be determined according to internal arbitrary rules that may change between versions." In practice it would be good to link to a part of a NEP or other documentation that describes the algorithm used.
I think "minimum" is a bit incorrect, though. We basically do... There are a couple of subtleties to this, and I don't want to list them all, especially since I am trying to just remove some of them and hope nobody notices... (especially around strings). Plus, what...
I agree especially with the following point of @godaygo:

I have been a numpy user for over 15 years now, yet I was not aware of this difference. This has bitten us in production, where a lazy piece of code actively converted a...
I am wondering if there's anything we can do here to make this issue more actionable. It looks like the situation is as follows:
Does this make sense? If not, any suggestions for some definition of "Done" for this issue?
NEP 50 doesn't actually propose to change anything here yet :), although it does explain the problem a bit. The weird subtleties I hinted at long ago are gone now, so the rules are now the same as calling... Can we just say something like: NumPy will try to use a default dtype that can represent the values (by applying promotion rules when necessary)? That glosses over the integer nonsense, but at least it doesn't say "minimal"... I don't have a great idea, but maybe it helps.
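That proposed wording matches observable behavior in simple cases: the discovered dtype agrees with promoting the element types, as this small illustrative sketch shows:

```python
import numpy as np

# The discovered dtype agrees with ordinary promotion of the elements:
print(np.array([1, 2.0]).dtype)       # float64
print(np.result_type(1, 2.0))         # float64
print(np.array([True, 2.0]).dtype)    # float64: bool promotes with float
```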
The description of the `dtype` parameter in the `numpy.array` docstring looks as follows:

> If not given, then the type will be determined as the minimum type required to hold the objects in the sequence.

which is rather misleading, because in reality the behavior is different. For example, for integers the type chain on Windows is (`int32` -> `int64` -> `uint64` -> `object`); generally it starts with `np.int_`. Also, for example, there is no `np.uint32` in this chain, and no `np.int8`, etc.

One more thought: this behavior is also inconsistent with the following snippet. Thus in `np.array` numpy follows Python's rules, while in expressions it follows C's rules.

Another point is that by default the `dtype` will be `numpy.float64` instead of `np.int_`, which contradicts the description. A related issue is #10405.

I don't know what the right way to resolve this tangle is. I opened this issue here because it seems that there is no interest in discussing this on numpy's ML.

p.s.: one more point: is there any real benefit to defining `np.int_` as C's long (instead of 8 bytes on 64-bit and 4 bytes on 32-bit), with the resulting differences between 64-bit Windows and other OSs? As for me, I have never met the advantages of this choice, only a few inconveniences and the need to provide `dtype=` everywhere.
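On the p.s. about `np.int_`: one portable workaround is to request a fixed width explicitly; a small sketch (note that newer NumPy versions changed the Windows default, so the first printed size varies):

```python
import numpy as np

# np.int_ historically followed the platform's C long:
# 4 bytes on 64-bit Windows, 8 bytes on 64-bit Linux/macOS.
print(np.dtype(np.int_).itemsize)

# Fixed-width dtypes sidestep the platform difference entirely:
a = np.array([1, 2, 3], dtype=np.int64)
print(a.dtype)  # int64 everywhere
```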