-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
DOC: add docstrings for numeric types #11858
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC: add docstrings for numeric types #11858
Conversation
Test failure is bogus. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks fine to me.
97ac6b4
to
5c6c027
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just comments, did not read all, but I think it should all be good even without going into those, just bous questions.
numpy/core/_add_newdocs.py
Outdated
|
||
add_newdoc('numpy.core.numerictypes', 'int64', | ||
"""64-bit integer. Character code 'l'. Python int compatible.""") | ||
""" | ||
64-bit integer. Character code ``'l'``. Python int compatible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering a bit about the Python int compatible part here. It is mostly valid for python2, and even then wrong on windows (and 32bit linux).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is a) not true on 32-bit windows, and b) doesn't set __doc__
correctly anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
np.dtype('int64').__doc__
is empty, but np.int64.__doc__
shows this string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My mistake, I was misinterpreting the results of #10106
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mainly imitated the existing documentation for all different int
sizes to uint
sizes. But yes, there's this aliasing going on between e.g., np.int32 is np.intc
is true on my machine. So it's actually not straightforward to document this, since documenting np.intc
will then also document np.int32
.
numpy/core/_add_newdocs.py
Outdated
@@ -8009,22 +8022,65 @@ def luf(lamdaexpr, *args, **kwargs): | |||
|
|||
add_newdoc('numpy.core.numerictypes', 'float128', | |||
""" | |||
128-bit floating-point number. Character code: 'g'. C long float | |||
128-bit floating-point number. Character code: ``'g'``. C long float | |||
compatible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
c long float (or long double?). Wonder if it should warn here that it is not an IEEE quad float type. And the same text is true for float96, also... But maybe this is not the place to get into those details. Not sure how prominent this shows up also on the online docs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Euhm, yes, that's a mistake, sorry. I'll fix this and add some extra notes on the 96-bits version, as well.
I think we need to tread carefully here - we might end up wanting different online docs to the local docs - local docs know the exact mapping of sized aliases to C types, whereas the sphinx docs shouldn't make assumptions about the user's machine. |
numpy/core/_add_newdocs.py
Outdated
|
||
add_newdoc('numpy.core.numerictypes', 'int32', | ||
"""32-bit integer. Character code 'i'. C int compatible.""") | ||
""" | ||
32-bit integer. Character code ``'i'``. C int compatible. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not true - on my machine, the character code is l
.
If we want to add these docs, I think we should do it to byte
/short
/intc
/int_
/longlong
, not the sized aliases
OK, this is an important point for good documentation. I will take a second stab, keeping this in mind. To get a better picture: are these docstrings meant to also replace the list of scalar dtypes in the docs on the long run (i.e., with |
Perhaps the correct path for now is:
|
9b9b2f2
to
05c601c
Compare
I'm not completely happy with how this sketch of a solution looks, but it should be a step closer to the actual correct, yet platform-flexible documentation. Is this where this PR wants to go? |
numpy/core/_add_newdocs.py
Outdated
|
||
integer_aliases = type_aliases('numpy.core.numerictypes', [ | ||
('int8', '8-bit singed integer (-128 to 127)'), | ||
('int16', '16-bit singed integer (-32768 to 32767)'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: signed
numpy/core/_add_newdocs.py
Outdated
('uint16', '16-bit unsinged integer (0 to 65535)'), | ||
('uint32', '32-bit unsinged integer (0 to 4294967295)'), | ||
('uint64', '64-bit unsinged integer (0 to 18446744073709551615)'), | ||
('uintp', 'Unsigned integer large enough to fit pointer, compatible with C ``size_t``'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly it's uintptr_t
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed size_t
and ssize_t
to intptr_t
and uintptr_t
, but the reference docs are again slightly wrong, here, then: https://docs.scipy.org/doc/numpy/user/basics.types.html
Should these be fixed as well?
I like the direction this is going in - I think it will definitely be an improvement over what already exists. I think we'll need to think a little more about how to use this with server-built sphinx builds - but for local users using Right now none of these docstrings are in sphinx anyway, so we can punt that to a later PR. |
numpy/core/_add_newdocs.py
Outdated
|
||
add_newdoc('numpy.core.numerictypes', 'float_', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly this should be double
- float_
is an alias defined to be compatible with python float
, although admittedly the two are always the same.
numpy/core/_add_newdocs.py
Outdated
{} | ||
""".format(extra_alias_doc('numpy.core.numerictypes', 'float_', float_aliases))) | ||
|
||
add_newdoc('numpy.core.numerictypes', 'longfloat', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think longdouble
is the canonical name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, just followed the list here: https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html. But indeed, longfloat
seems to be an alias for longdouble
.
numpy/core/_add_newdocs.py
Outdated
{} | ||
""".format(extra_alias_doc('numpy.core.numerictypes', 'clongfloat', complex_aliases))) | ||
|
||
add_newdoc('numpy.core.numerictypes', 'complex192', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one is an alias, not a real type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does that mean that float96
is also an alias, and either float96
or float128
is matched against longdouble
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bingo
numpy/core/_add_newdocs.py
Outdated
def type_aliases(place, aliases): | ||
return [(alias_type, doc) for (alias, doc) in aliases | ||
for alias_type in (get_type(place, alias),) | ||
if alias_type is not None] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than using a hack to make a local variable in a comprehension, I think this would be clearer as a generator using yield
Update: fixed minor things, but:
But these can probably be fixed once these docstrings get included in the reference documentation? Next, stackoverflow shows a few hackish way of detecting sphinx (https://stackoverflow.com/questions/20843737/check-if-sphinx-doc-called-the-script), but something feels wrong about actually accessing and using this when generating the docs? |
Lets leave worrying about sphinx to a later PR. Both your points are correct - but I think the other docs are just subtly wrong, and I would prefer to aim for correctness not consistency - your most recent update looks good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you include both the name and doc in the list of aliases? Right now I think you only add the doc.
Also, adding float_
and complex_
to the list of aliases would be handy.
Can you show the output of help(np.intc)
, help(np.long)
, and help(np.longlong)
after this change?
I suspect there's some overlap here with #10151
Yes, forgot that, thanks!
Since these seem to be 'hardcoded' aliases, I'm adding these as part of the docstring itself, and not with this alias list construct. |
Fine by me. |
>>> print(np.intc.__doc__)
Signed integer type, compatible with C ``int``.
Character code: ``'i'``.
>>> print(np.long.__doc__)
int(x=0) -> integer
int(x, base=10) -> integer
Convert a number or string to an integer, or return 0 if no arguments
are given. If x is a number, return x.__int__(). For floating point
numbers, this truncates towards zero.
If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base. The literal can be preceded by '+' or '-' and be surrounded
by whitespace. The base defaults to 10. Valid bases are 0 and 2-36.
Base 0 means to interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
>>> print(np.longlong.__doc__)
Signed integer type, compatible with C ``long long``.
Character code: ``'q'``. I'll have a closer look at that other PR later. |
I'd specifically like to see the output of |
numpy/core/_add_newdocs.py
Outdated
try: | ||
return getattr(__import__(place, globals(), {}, [obj]), obj) | ||
except Exception: | ||
return None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be AttributeError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stolen from add_newdoc
in core/function_base.py
, where it also says Exception
, but I've adapted it.
Sorry, here you go:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't comment inline on mobile: Line 8017 should not be passing a default to getattr - the attribute will always exist for canonical names
numpy/core/_add_newdocs.py
Outdated
def add_newdoc_for_numeric_type(obj, fixed_aliases, possible_aliases, doc): | ||
o = getattr(_numerictypes, obj, None) | ||
if o is None: | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my earlier comment - change this to o = getattr(_numerictypes, obj)
, since the object is guaranteed to exist.
numpy/core/_add_newdocs.py
Outdated
for (alias, doc) in aliases: | ||
alias_type = getattr(_numerictypes, alias, None) | ||
if alias_type is not None: | ||
yield (alias_type, alias, doc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Strictly speaking this would be better as:
try:
alias_type = getattr(_numerictypes, alias)
except AttributeError:
pass
else:
yield (alias_type, alias, doc)
As that doesn't silence bugs if an alias ends up somehow being set to None
numpy/core/_add_newdocs.py
Outdated
|
||
add_newdoc('numpy.core.numerictypes', 'object_', | ||
"""Any Python object. Character code: 'O'.""") | ||
add_newdoc_for_numeric_type('object_', [], [], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I'd consider this numeric. Probably better either to rename the function to *_scalar_type
, or leave the non-numeric types as they were. I realize the module is called numerictypes
, but we're stuck with that for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made the changes I suggested below myself. Feel free to tweak if you want, but I think this is ready to go in
I've still implemented the one small change in |
Oh, and almost forgot, but I yesterday noticed that this actually doesn't work for the complex types. These types already have a docstring set in the C code, and >>> help(np.cdouble)
Help on class complex128 in module numpy:
class complex128(complexfloating, builtins.complex)
| Composed of two 64 bit floats I guess I've missed this since been focusing on the signed and unsigned integers, since they had these C type equivalence issues. The line with this docstring is here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/scalartypes.c.src#L3773. My instinctive reaction would be to make this consistent and remove the docstring from the C code, but then I don't know the reason why these docstrings are there? |
Seems sensible to me. Even if there is a reason for them to be there, it would apply to all of the docstrings anyway, not just that one, |
('complex128', 'Complex number type composed of 2 64-bit-precision floating-point numbers'), | ||
('complex192', 'Complex number type composed of 2 96-bit extended-precision floating-point numbers'), | ||
('complex256', 'Complex number type composed of 2 128-bit extended-precision floating-point numbers'), | ||
]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any real value to having these four separate lists of aliases, rather than building one big list to perform the lookup in?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Euhm, yeah, the idea was not not loop over irrelevant aliases (while. But that does seem to come at the cost of being slightly more error-prone. I'm guessing you prefer the side of the trade-off that's less error-prone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, especially since collectivey we've proven that that type of error is easy to make and hard to spot. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One less argument to add_newdoc_for_scalar_type
as well, as bonus.
…ultiarray/scalartypes.c.src, as they are now set in numpy/core/_add_newdocs.py
…ing 'np.' to fixed aliases
numpy/core/_add_newdocs.py
Outdated
try: | ||
alias_type = getattr(_numerictypes, alias) | ||
except AttributeError: | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do with a comment here like "the set of aliases that actually exist varies between platforms" or "this alias is not present on this platform" or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want more visibility for this, a release note under "improvements" mentioning that help(np.intp)
or similar now shows a list of common type aliases would seem pretty sensible
c47a0a7
to
dff5de6
Compare
Almost done! :-) Running the code from #10106 again, there are a few more undocumented numeric types that seem to be related, part of the type hierarchy of scalar types:
Futhermore, |
Perhaps let this go in as-is, and add the rest in another PR? |
I'm with @mattip on this one - documenting the abstract types seems like a separate task to documenting the concrete ones. |
Just tweaked the release notes - I plan to squash and merge in a day or two, in case anyone else decides to weigh in. |
Thanks @YannickJadoul. And thanks to Eric for helping get this knocked into shape. |
Thanks indeed, @eric-wieser, for the guidance and remarks! |
This PR adds docstrings for
np.float16
,np.uint8
,np.uint16
,np.uint32
,np.uint64
. Furthermore, the formatting of the char codes in the existing docstrings is unified to the same format.Where to generate the autosummary in sphinx, adding it to the reference documentation is not completely clear, though: the current documentation on dtypes is spread out over 2 or 3 different places:
See #10106.
(Part of EuroSciPy 2018 sprints)