DOC: add docstrings for numeric types #11858

Merged
merged 22 commits into from
Sep 18, 2018

YannickJadoul
Contributor

This PR adds docstrings for np.float16, np.uint8, np.uint16, np.uint32, np.uint64. Furthermore, the formatting of the char codes in the existing docstrings is unified to the same format.

It is not completely clear, though, where to generate the Sphinx autosummary that would add these docstrings to the reference documentation: the current documentation on dtypes is spread out over 2 or 3 different places.

See #10106.

(Part of EuroSciPy 2018 sprints)

@charris
Member

charris commented Sep 1, 2018

Test failure is bogus.

Contributor

@jeffyancey jeffyancey left a comment

this looks fine to me.

@YannickJadoul YannickJadoul force-pushed the np.core.numerictypes-doc branch from 97ac6b4 to 5c6c027 Compare September 2, 2018 20:20
Member

@seberg seberg left a comment

Just comments; I did not read everything, but I think it should all be good even without going into those. They are just bonus questions.


 add_newdoc('numpy.core.numerictypes', 'int64',
-    """64-bit integer. Character code 'l'. Python int compatible.""")
+    """
+    64-bit integer. Character code ``'l'``. Python int compatible.
Member

I am wondering a bit about the "Python int compatible" part here. It is mostly valid for Python 2, and even then it is wrong on Windows (and 32-bit Linux).
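
To make the platform dependence concrete, here is a minimal sketch (not part of the PR) that uses only the standard-library ctypes module to report how wide the C integer types are on the machine running it. The helper name c_integer_widths is made up for illustration. C long is typically 32 bits on Windows and on 32-bit Linux, and 64 bits on 64-bit Linux and macOS, which is why a fixed "64-bit, Python int compatible" claim cannot hold everywhere.

```python
import ctypes


def c_integer_widths():
    """Return the width in bits of common C integer types on this platform."""
    c_types = {
        "int": ctypes.c_int,
        "long": ctypes.c_long,
        "long long": ctypes.c_longlong,
    }
    return {name: 8 * ctypes.sizeof(ctype) for name, ctype in c_types.items()}


widths = c_integer_widths()
for name, bits in widths.items():
    # The numbers printed here depend on the OS and compiler ABI.
    print(f"C {name}: {bits} bits")
```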

Member

@eric-wieser eric-wieser Sep 3, 2018

Yeah, this is a) not true on 32-bit Windows, and b) doesn't set __doc__ correctly anyway.

Member

np.dtype('int64').__doc__ is empty, but np.int64.__doc__ shows this string.

Member

My mistake, I was misinterpreting the results of #10106.

Contributor Author

I mainly copied the existing documentation for the different int sizes over to the uint sizes. But yes, there is aliasing going on: e.g., np.int32 is np.intc evaluates to True on my machine. So it's actually not straightforward to document this, since documenting np.intc will then also document np.int32.
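
The aliasing problem can be reproduced without NumPy. The sketch below uses a stand-in class (hypothetical, not NumPy's real scalar type) to show that an alias is only a second name for the same class object, so a docstring set through one name is visible through every other name.

```python
class int32:
    """Stand-in for a sized scalar type (hypothetical, not NumPy's class)."""


# An alias is just a second name bound to the same class object.
intc = int32

# Setting the docstring through the alias changes it for the sized name too.
intc.__doc__ = "Signed integer type, compatible with C ``int``."

print(intc is int32)
print(int32.__doc__)
```

On a platform where the sized name were bound to a different C type, the same docstring would be attached to the wrong description, which is the difficulty described above.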

@@ -8009,22 +8022,65 @@ def luf(lamdaexpr, *args, **kwargs):

 add_newdoc('numpy.core.numerictypes', 'float128',
     """
-    128-bit floating-point number. Character code: 'g'. C long float
+    128-bit floating-point number. Character code: ``'g'``. C long float
     compatible.
Member

C long float (or long double?). I wonder if it should warn here that it is not an IEEE quad float type. The same text also applies to float96... But maybe this is not the place to get into those details. I am also not sure how prominently this shows up in the online docs.
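
As a rough illustration of why the bit count in the name says little about precision, the following sketch (standard library only, not part of the PR) reports how much storage C long double occupies on the current machine. On x86, long double is usually the 80-bit extended format padded to 96 or 128 bits of storage, so a "128-bit" long double is not an IEEE quad-precision number. The function name long_double_summary is made up for this example.

```python
import ctypes
import sys


def long_double_summary():
    """Describe C long double storage next to Python's float (a C double)."""
    return {
        # Storage size only: padding bits are included, so this is an upper
        # bound on the format width, not a statement about precision.
        "long_double_storage_bits": 8 * ctypes.sizeof(ctypes.c_longdouble),
        "double_storage_bits": 8 * ctypes.sizeof(ctypes.c_double),
        # Significand precision of Python float; ctypes does not expose the
        # equivalent figure for long double.
        "double_mantissa_bits": sys.float_info.mant_dig,
    }


summary = long_double_summary()
for key, value in summary.items():
    print(f"{key}: {value}")
```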

Contributor Author

Euhm, yes, that's a mistake, sorry. I'll fix this and add some extra notes on the 96-bit version as well.

@eric-wieser
Member

eric-wieser commented Sep 3, 2018

AFAIR, setting __doc__ on the types through add_newdocs doesn't actually work - but perhaps I'm mistaken.

I think we need to tread carefully here - we might end up wanting different online docs to the local docs - local docs know the exact mapping of sized aliases to C types, whereas the sphinx docs shouldn't make assumptions about the user's machine.


 add_newdoc('numpy.core.numerictypes', 'int32',
-    """32-bit integer. Character code 'i'. C int compatible.""")
+    """
+    32-bit integer. Character code ``'i'``. C int compatible.
Member

Not true - on my machine, the character code is l.

If we want to add these docs, I think we should do it to byte/short/intc/int_/longlong, not the sized aliases

@YannickJadoul
Contributor Author

we might end up wanting different online docs to the local docs - local docs know the exact mapping of sized aliases to C types, whereas the sphinx docs shouldn't make assumptions about the user's machine.

OK, this is an important point for good documentation. I will take a second stab, keeping this in mind.

To get a better picture: are these docstrings meant to also replace the list of scalar dtypes in the docs in the long run (i.e., with autosummary-generated documentation)? What level of detail is needed, and in what context should these docstrings be presented in the online documentation?

@eric-wieser
Member

eric-wieser commented Sep 6, 2018

Perhaps the correct path for now is:

  • Document the native unsized np.int_ types using add_new_docs, with something like

    Corresponds to the C long type

  • Add a generated sentence like

    On this platform, this type is aliased to np.int64 <docs for int64>
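
The two-part approach above could look roughly like the sketch below. It is only an illustration built on the standard-library ctypes module: the table C_SIGNED_TYPES and both function names are hypothetical, and it assumes that width alone decides which sized name a C type maps to, which simplifies NumPy's real aliasing rules.

```python
import ctypes

# Hypothetical table for illustration: C type name -> ctypes class used to
# measure its width on the current machine.
C_SIGNED_TYPES = {
    "signed char": ctypes.c_byte,
    "short": ctypes.c_short,
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
}


def platform_sentence(c_name):
    """Build the generated, platform-specific sentence for one C type."""
    bits = 8 * ctypes.sizeof(C_SIGNED_TYPES[c_name])
    return f"On this platform, this type is aliased to np.int{bits}."


def build_docstring(c_name):
    """Combine the fixed, platform-independent text with the generated part."""
    return f"Corresponds to the C {c_name} type.\n\n{platform_sentence(c_name)}"


print(build_docstring("long"))
```

The fixed first sentence is safe to publish in online docs, while the generated second sentence is only correct for the machine that built the docstring.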

@YannickJadoul YannickJadoul force-pushed the np.core.numerictypes-doc branch from 9b9b2f2 to 05c601c Compare September 6, 2018 21:45
@YannickJadoul
Contributor Author

I'm not completely happy with how this sketch of a solution looks, but it should be a step closer to actually correct, yet platform-flexible, documentation. Is this where this PR wants to go?


integer_aliases = type_aliases('numpy.core.numerictypes', [
    ('int8', '8-bit singed integer (-128 to 127)'),
    ('int16', '16-bit singed integer (-32768 to 32767)'),
Member

typo: signed

    ('uint16', '16-bit unsinged integer (0 to 65535)'),
    ('uint32', '32-bit unsinged integer (0 to 4294967295)'),
    ('uint64', '64-bit unsinged integer (0 to 18446744073709551615)'),
    ('uintp', 'Unsigned integer large enough to fit pointer, compatible with C ``size_t``'),
Member

Strictly it's uintptr_t

Contributor Author

Changed size_t and ssize_t to uintptr_t and intptr_t, but that means the reference docs are also slightly wrong here: https://docs.scipy.org/doc/numpy/user/basics.types.html

Should these be fixed as well?
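
For context on the size_t versus uintptr_t distinction: the C standard does not require the two to have the same width, although they coincide on mainstream platforms. The sketch below (standard library only, not from the PR) compares them on the current machine. ctypes has no uintptr_t type, so the width of c_void_p is used as a proxy for it; that substitution is an assumption of this example.

```python
import ctypes

# Width of a data pointer, used here as a proxy for uintptr_t/intptr_t.
pointer_bits = 8 * ctypes.sizeof(ctypes.c_void_p)

# Width of size_t, the type used for object sizes.
size_t_bits = 8 * ctypes.sizeof(ctypes.c_size_t)

print(f"pointer width: {pointer_bits} bits")
print(f"size_t width:  {size_t_bits} bits")
print("same width on this platform:", pointer_bits == size_t_bits)
```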

@eric-wieser
Member

eric-wieser commented Sep 7, 2018

I like the direction this is going in - I think it will definitely be an improvement over what already exists.

I think we'll need to think a little more about how to use this with server-built sphinx builds - but for local users using help(type), I think this does exactly the right thing.

Right now none of these docstrings are in sphinx anyway, so we can punt that to a later PR.


add_newdoc('numpy.core.numerictypes', 'float_',
Member

Strictly this should be double - float_ is an alias defined to be compatible with python float, although admittedly the two are always the same.

{}
""".format(extra_alias_doc('numpy.core.numerictypes', 'float_', float_aliases)))

add_newdoc('numpy.core.numerictypes', 'longfloat',
Member

I think longdouble is the canonical name.

Contributor Author

OK, just followed the list here: https://docs.scipy.org/doc/numpy/reference/arrays.scalars.html. But indeed, longfloat seems to be an alias for longdouble.

{}
""".format(extra_alias_doc('numpy.core.numerictypes', 'clongfloat', complex_aliases)))

add_newdoc('numpy.core.numerictypes', 'complex192',
Member

This one is an alias, not a real type.

Contributor Author

Does that mean that float96 is also an alias, and either float96 or float128 is matched against longdouble?

Member

Bingo

def type_aliases(place, aliases):
    return [(alias_type, doc) for (alias, doc) in aliases
            for alias_type in (get_type(place, alias),)
            if alias_type is not None]
Member

Rather than using a hack to make a local variable in a comprehension, I think this would be clearer as a generator using yield
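
A side-by-side sketch of the two styles, using a made-up namespace in place of the real module (fake_types, get_type and both function names are stand-ins for illustration only). The comprehension needs a one-element inner loop just to bind alias_type, whereas the generator can use an ordinary assignment.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the module that holds the scalar types.
fake_types = SimpleNamespace(int8="<int8 type>", int16="<int16 type>")


def get_type(namespace, name):
    """Return the named attribute, or None if it does not exist."""
    return getattr(namespace, name, None)


def type_aliases_comprehension(namespace, aliases):
    # The inner "for ... in (value,)" loop exists only to bind a local name.
    return [(alias_type, doc) for (alias, doc) in aliases
            for alias_type in (get_type(namespace, alias),)
            if alias_type is not None]


def type_aliases_generator(namespace, aliases):
    # Same filtering logic, written with a plain assignment and yield.
    for alias, doc in aliases:
        alias_type = get_type(namespace, alias)
        if alias_type is not None:
            yield alias_type, doc


aliases = [("int8", "8-bit signed integer"), ("int128", "128-bit signed integer")]
from_comprehension = type_aliases_comprehension(fake_types, aliases)
from_generator = list(type_aliases_generator(fake_types, aliases))
print(from_comprehension == from_generator)
```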

@YannickJadoul
Contributor Author

Update: fixed minor things, but:

  • clongfloat is mentioned in the existing documentation, instead of clongdouble.
  • The reference documentation on intp mentions ssize_t instead of intptr_t.

But these can probably be fixed once these docstrings get included in the reference documentation?

Next, Stack Overflow shows a few hackish ways of detecting Sphinx (https://stackoverflow.com/questions/20843737/check-if-sphinx-doc-called-the-script), but something feels wrong about actually accessing and using this when generating the docs.

@eric-wieser
Member

eric-wieser commented Sep 7, 2018

Let's leave worrying about Sphinx to a later PR.

Both your points are correct - but I think the other docs are just subtly wrong, and I would prefer to aim for correctness not consistency - your most recent update looks good

Member

@eric-wieser eric-wieser left a comment

Can you include both the name and doc in the list of aliases? Right now I think you only add the doc.

Also, adding float_ and complex_ to the list of aliases would be handy.

Can you show the output of help(np.intc), help(np.long), and help(np.longlong) after this change?

I suspect there's some overlap here with #10151

@YannickJadoul
Contributor Author

Can you include both the name and doc in the list of aliases? Right now I think you only add the doc.

Yes, forgot that, thanks!

Also, adding float_ and complex_ to the list of aliases would be handy

Since these seem to be 'hardcoded' aliases, I'm adding these as part of the docstring itself, and not with this alias list construct.

@eric-wieser
Member

I'm adding these as part of the docstring itself

Fine by me.

@YannickJadoul
Contributor Author

Can you show the output of help(np.intc), help(np.long), and help(np.longlong) after this change?

>>> print(np.intc.__doc__)
Signed integer type, compatible with C ``int``.
    Character code: ``'i'``.
>>> print(np.long.__doc__)
int(x=0) -> integer
int(x, base=10) -> integer

Convert a number or string to an integer, or return 0 if no arguments
are given.  If x is a number, return x.__int__().  For floating point
numbers, this truncates towards zero.

If x is not a number or if base is given, then x must be a string,
bytes, or bytearray instance representing an integer literal in the
given base.  The literal can be preceded by '+' or '-' and be surrounded
by whitespace.  The base defaults to 10.  Valid bases are 0 and 2-36.
Base 0 means to interpret the base from the string as an integer literal.
>>> int('0b100', base=0)
4
>>> print(np.longlong.__doc__)
Signed integer type, compatible with C ``long long``. 
    Character code: ``'q'``.

I'll have a closer look at that other PR later.

@eric-wieser
Member

I'd specifically like to see the output of help. Also I made a typo, and said np.long not np.int_.
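
The difference between printing __doc__ and calling help can be seen with the standard-library pydoc module, which is what help uses to render its text. In this sketch, Base and Scalar are hypothetical stand-in classes; the point is only that the rendered help page contains the docstring plus extra material such as the method resolution order.

```python
import pydoc


class Base:
    """Hypothetical base class for the sketch."""


class Scalar(Base):
    """Signed integer type (stand-in docstring)."""


# The raw docstring: exactly what was attached to the class.
raw_doc = Scalar.__doc__

# The text that help(Scalar) would display, rendered without terminal bolding.
help_text = pydoc.render_doc(Scalar, renderer=pydoc.plaintext)

print(raw_doc)
print(help_text)
```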

try:
    return getattr(__import__(place, globals(), {}, [obj]), obj)
except Exception:
    return None
Member

I think this should be AttributeError.

Contributor Author

Stolen from add_newdoc in core/function_base.py, where it also says Exception, but I've adapted it.
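
A minimal sketch of the narrower handler, under the assumption that only a missing attribute should be tolerated. It swaps the __import__ call for importlib.import_module for readability, and the function name get_attr_or_none is made up. A missing attribute gives None, while a failing import still raises instead of being swallowed.

```python
import importlib
import math


def get_attr_or_none(place, name):
    """Return attribute `name` of module `place`, or None if it is missing."""
    module = importlib.import_module(place)  # import errors propagate
    try:
        return getattr(module, name)
    except AttributeError:
        # Only the "attribute does not exist" case is treated as expected.
        return None


print(get_attr_or_none("math", "pi") == math.pi)
print(get_attr_or_none("math", "no_such_name"))
```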

@YannickJadoul
Contributor Author

I'd specifically like to see the output of help.

Sorry, here you go:

help(np.intc)

Help on class int32 in module numpy:

class int32(signedinteger)
 |  Signed integer type, compatible with C ``int``.
 |  Character code: ``'i'``.
 |  
 |  Method resolution order:
 |      int32
 |      signedinteger
 |      integer
 |      number
 |      generic
 |      builtins.object
 |  
 |  Methods defined here:
 |  
 |  __abs__(self, /)
 |      abs(self)
 |  
 |  __add__(self, value, /)
 |      Return self+value.
 |  
 |  __and__(self, value, /)
 |      Return self&value.
[...]
 |  astype(...)
 |      Not implemented (virtual attribute)
 |      
 |      Class generic exists solely to derive numpy scalars from, and possesses,
 |      albeit unimplemented, all the attributes of the ndarray class
 |      so as to provide a uniform API.
 |      
 |      See Also
 |      --------
 |      The corresponding attribute of the derived class of interest.
 |  
[...]

help(np.int_)

Help on class int64 in module numpy:

class int64(signedinteger)
 |  Signed integer type, compatible with Python `int` anc C ``long``.
 |  Character code: ``'l'``.
 |  
 |  Method resolution order:
 |      int64
 |      signedinteger
 |      integer
 |      number
 |      generic
 |      builtins.object
[...]

help(np.longlong)

Help on class int64 in module numpy:

class int64(signedinteger)
 |  Signed integer type, compatible with C ``long long``. 
 |  Character code: ``'q'``.
 |  
 |  Method resolution order:
 |      int64
 |      signedinteger
 |      integer
 |      number
 |      generic
 |      builtins.object
[...]

Member

@eric-wieser eric-wieser left a comment

Can't comment inline on mobile: Line 8017 should not be passing a default to getattr - the attribute will always exist for canonical names

def add_newdoc_for_numeric_type(obj, fixed_aliases, possible_aliases, doc):
    o = getattr(_numerictypes, obj, None)
    if o is None:
        return
Member

From my earlier comment - change this to o = getattr(_numerictypes, obj), since the object is guaranteed to exist.

for (alias, doc) in aliases:
    alias_type = getattr(_numerictypes, alias, None)
    if alias_type is not None:
        yield (alias_type, alias, doc)
Member

@eric-wieser eric-wieser Sep 16, 2018

Strictly speaking this would be better as:

try:
    alias_type = getattr(_numerictypes, alias)
except AttributeError:
    pass
else:
    yield (alias_type, alias, doc)

As that doesn't silence bugs if an alias ends up somehow being set to None
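
The difference between the two patterns shows up when an alias exists but is bound to None by mistake. In the sketch below, ns is a made-up namespace whose broken attribute simulates such a bug; both function names are hypothetical. The default-based version hides the bad entry, while the try/except/else version passes it through so it can be noticed.

```python
from types import SimpleNamespace

# Hypothetical namespace: 'broken' simulates an alias wrongly set to None,
# and 'float96' is absent, as it would be on some platforms.
ns = SimpleNamespace(int64="<int64 type>", broken=None)
names = ["int64", "broken", "float96"]


def aliases_with_default(namespace, alias_names):
    for name in alias_names:
        alias_type = getattr(namespace, name, None)
        if alias_type is not None:  # also drops aliases that ARE None
            yield name, alias_type


def aliases_with_try(namespace, alias_names):
    for name in alias_names:
        try:
            alias_type = getattr(namespace, name)
        except AttributeError:
            pass  # alias is not present on this platform
        else:
            yield name, alias_type


hidden = list(aliases_with_default(ns, names))
visible = list(aliases_with_try(ns, names))
print(hidden)
print(visible)
```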


-add_newdoc('numpy.core.numerictypes', 'object_',
-    """Any Python object. Character code: 'O'.""")
+add_newdoc_for_numeric_type('object_', [], [],
Member

@eric-wieser eric-wieser Sep 16, 2018

Not sure I'd consider this numeric. Probably better either to rename the function to *_scalar_type, or leave the non-numeric types as they were. I realize the module is called numerictypes, but we're stuck with that for now.

Member

@eric-wieser eric-wieser left a comment

Made the changes I suggested below myself. Feel free to tweak if you want, but I think this is ready to go in

@YannickJadoul
Contributor Author

Made the changes I suggested below myself. Feel free to tweak if you want, but I think this is ready to go in

I've still implemented the one small change in numeric_type_aliases you suggested.

@YannickJadoul
Contributor Author

Oh, and I almost forgot: I noticed yesterday that this actually doesn't work for the complex types. These types already have a docstring set in the C code, and add_newdocs will not overwrite the existing one, nor does it give any warning when that happens:

>>> help(np.cdouble)
Help on class complex128 in module numpy:

class complex128(complexfloating, builtins.complex)
 |  Composed of two 64 bit floats

I guess I missed this because I had been focusing on the signed and unsigned integers, since they had these C type equivalence issues.

The line with this docstring is here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/scalartypes.c.src#L3773. My instinctive reaction would be to make this consistent and remove the docstring from the C code, but then I don't know the reason why these docstrings are there?
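
The silent behaviour described above can be imitated with a small stand-in helper. This is not NumPy's add_newdoc; add_doc_if_missing and both classes are hypothetical, written only to show how an existing docstring can block a new one without any visible error.

```python
def add_doc_if_missing(obj, doc):
    """Attach `doc` only when `obj` has no docstring yet.

    Returns True if the docstring was set and False if an existing one was
    kept. Callers that ignore the return value never learn about the skip.
    """
    if obj.__doc__ is not None:
        return False
    obj.__doc__ = doc
    return True


class Undocumented:
    pass


class AlreadyDocumented:
    """Composed of two 64 bit floats"""


was_set = add_doc_if_missing(Undocumented, "New docstring.")
was_skipped = not add_doc_if_missing(AlreadyDocumented, "New docstring.")
print(was_set, was_skipped)
print(AlreadyDocumented.__doc__)
```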

@eric-wieser
Member

My instinctive reaction would be to make this consistent and remove the docstring from the C code

Seems sensible to me. Even if there is a reason for them to be there, it would apply to all of the docstrings anyway, not just that one.

    ('complex128', 'Complex number type composed of 2 64-bit-precision floating-point numbers'),
    ('complex192', 'Complex number type composed of 2 96-bit extended-precision floating-point numbers'),
    ('complex256', 'Complex number type composed of 2 128-bit extended-precision floating-point numbers'),
    ])
Member

@eric-wieser eric-wieser Sep 16, 2018

Is there any real value to having these four separate lists of aliases, rather than building one big list to perform the lookup in?

Contributor Author

Euhm, yeah, the idea was to not loop over irrelevant aliases. But that does seem to come at the cost of being slightly more error-prone. I'm guessing you prefer the side of the trade-off that's less error-prone?

Member

Yeah, especially since collectively we've proven that that type of error is easy to make and hard to spot. Thanks!

Contributor Author

One less argument to add_newdoc_for_scalar_type as well, as a bonus.

…ultiarray/scalartypes.c.src, as they are now set in numpy/core/_add_newdocs.py
try:
    alias_type = getattr(_numerictypes, alias)
except AttributeError:
    pass
Member

@eric-wieser eric-wieser Sep 17, 2018

Could do with a comment here like "the set of aliases that actually exist varies between platforms" or "this alias is not present on this platform" or something

Member

@eric-wieser eric-wieser left a comment

If you want more visibility for this, a release note under "improvements" mentioning that help(np.intp) or similar now shows a list of common type aliases would seem pretty sensible

@eric-wieser eric-wieser added this to the 1.16.0 release milestone Sep 17, 2018
@YannickJadoul YannickJadoul force-pushed the np.core.numerictypes-doc branch from c47a0a7 to dff5de6 Compare September 17, 2018 10:28
@YannickJadoul
Contributor Author

Almost done! :-)

Running the code from #10106 again, there are a few more undocumented numeric types that seem to be related, part of the type hierarchy of scalar types:

  • <class 'numpy.complexfloating'> (np.complexfloating)
  • <class 'numpy.flexible'> (np.flexible)
  • <class 'numpy.floating'> (np.floating)
  • <class 'numpy.inexact'> (np.inexact)
  • <class 'numpy.integer'> (np.integer)
  • <class 'numpy.number'> (np.number)
  • <class 'numpy.signedinteger'> (np.signedinteger)
  • <class 'numpy.unsignedinteger'> (np.unsignedinteger)

Furthermore, numpy.void seems like it could still be included in this PR as well, in a couple of lines just after object?

@mattip
Member

mattip commented Sep 17, 2018

Perhaps let this go in as-is, and add the rest in another PR?

@eric-wieser
Member

I'm with @mattip on this one - documenting the abstract types seems like a separate task to documenting the concrete ones.

@eric-wieser
Member

Just tweaked the release notes - I plan to squash and merge in a day or two, in case anyone else decides to weigh in.

@charris charris merged commit 9741ce2 into numpy:master Sep 18, 2018
@charris
Member

charris commented Sep 18, 2018

Thanks @YannickJadoul. And thanks to Eric for helping get this knocked into shape.

@YannickJadoul
Contributor Author

Thanks indeed, @eric-wieser, for the guidance and remarks!

6 participants