[WIP]: Added assert_consistent_docs() and related tests #10323
Conversation
Do we expect this to be reasonable for everything, or only for parts of the library? Something like ...
Or is the idea to call this with very small families, like "only linear models" or something like that?
numpydoc is not found? (Are we not using numpydoc on master... I lost track of that.)
Can you choose a few related objects from one module and add tests for their parameters as an example?
Yes, Andy, the intention is to use this for families of objects, e.g. ...
The doctest for the example is failing as I have used ...
Doing the import locally in the function is one easy solution.
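A minimal sketch of what that suggestion could look like (the function body here is illustrative, not the actual PR code):

    def assert_consistent_docs(objects):
        """Check docstring consistency across ``objects`` (sketch)."""
        # Importing numpydoc inside the function means that merely importing
        # sklearn.utils.testing succeeds even when numpydoc is not installed;
        # only calling this helper requires the dependency.
        from numpydoc import docscrape
        return [docscrape.NumpyDocString(obj.__doc__) for obj in objects]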
The doctest is still failing with UNEXPECTED EXCEPTION: SkipTest('numpydoc is required to test the docstrings, as well as python version >= 3.5',). Should we skip the doctest?
I think just skip, or remove, the doctest.
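For reference, a doctest in the Examples section can be skipped with the standard doctest directive; a hedged sketch (the example call is made up):

    def assert_consistent_docs(objects):
        """Check docstring consistency across ``objects``.

        Examples
        --------
        >>> from sklearn.utils.testing import assert_consistent_docs  # doctest: +SKIP
        >>> assert_consistent_docs([f1_score, fbeta_score])  # doctest: +SKIP
        """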
I will add more tests to improve the coverage shortly.
Nice work so far, thanks!
sklearn/utils/testing.py (outdated):

    include_returns : list, '*' or None (default)
        List of Returns to be included. '*' for including all returns.

    exclude_params : list, '*' or None (default)
Please use the same order as in the function signature
'*' is meaningless for exclusion, isn't it?
What if we have to ignore all the, let's say, attributes? exclude_attribs='*' would be a nice way, since we have to set either include or exclude.
sklearn/utils/testing.py (outdated):

    objects (classes, functions, descriptors) with docstrings that can be
    parsed as numpydoc.

    include_params : list, '*' or None (default)
It's tempting to make include_params='*' the default.
I thought so too.
""" | ||
Checks consistency between the docstring of ``objects``. | ||
|
||
Checks if types and descriptions of Parameters/Attributes/Returns are |
We need to clarify behaviour when one of the params/attribs/returns is present in one and not another. Do we just ignore it and only compare for all pairs where they are common? I think so, but this should be documented.
Yes. We compare only those with the same name, else do nothing. I will document it.
sklearn/utils/testing.py (outdated):

    @@ -882,3 +882,128 @@ def check_docstring_parameters(func, doc=None, ignore=None, class_name=None):
        if n1 != n2:
            incorrect += [func_name + ' ' + n1 + ' != ' + n2]
        return incorrect


    def check_data(doc_list, type_dict, type_name, object_name, include, exclude):
I think this should be _check_matching_docstrings or something. Definitely start with a _
I also think this deserves a succinct docstring
sklearn/utils/testing.py (outdated):

    def check_data(doc_list, type_dict, type_name, object_name, include, exclude):
        for name, type_definition, description in doc_list:
            # remove all whitespaces
            type_definition = type_definition.replace(' ', '')
Whitespace is significant. How about using ' '.join(s.split()) to normalise whitespace?
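To illustrate the difference (the string is an illustrative value):

    s = "array-like,   shape\n        (n_samples,)"
    s.replace(' ', '')   # 'array-like,shape\n(n_samples,)' -- words get glued together
    ' '.join(s.split())  # 'array-like, shape (n_samples,)' -- whitespace normalised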
sklearn/utils/testing.py (outdated):

    def check_data(doc_list, type_dict, type_name, object_name, include, exclude):
        for name, type_definition, description in doc_list:
            # remove all whitespaces
Do the include/exclude logic before this
In excluded cases just continue. Otherwise branch depending on whether it's been seen in a previous object.
sklearn/utils/testing.py (outdated):

            if name in type_dict:
                u_dict = type_dict[name]
                if (u_dict['type_definition'] != type_definition or
Use plain old assert actual == expected, msg. This way pytest can help with more or less verbosity. Msg might be "type for parameter random_state in SVC differs from in SVR".
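A sketch of the suggested style, reusing names from the patch (the message text is illustrative):

    # A bare assert lets pytest rewrite the expression and report the
    # differing values at the chosen verbosity level.
    assert u_dict['type_definition'] == type_definition, (
        "type for parameter %s in %s differs from a previously seen object"
        % (name, object_name))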
sklearn/utils/testing.py (outdated):

                    object_name + " has inconsistency.")
            else:
                if include is None:
                    if name not in exclude:
What if exclude is None?
Currently we are assuming that either include or exclude is set (not None), so if include is None then exclude is not. We might change this if we change the default for include.
sklearn/utils/testing.py (outdated):

                add_dict = {}
                add_dict['type_definition'] = type_definition
                add_dict['description'] = description
                type_dict[name] = add_dict
This would be more readable if you just defined add_dict here rather than a series of insertions
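That is, something like:

    type_dict[name] = {
        'type_definition': type_definition,
        'description': description,
    }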
sklearn/utils/tests/test_testing.py (outdated):

    @@ -491,3 +496,49 @@ def test_check_docstring_parameters():
        'type definition for param: "c " (type definition was "")',
        'sklearn.utils.tests.test_testing.f_check_param_definition There was '
        'no space between the param name and colon ("d:int")'])


    def test_assert_consistent_docs():
Here we should be testing that the assert function works properly, i.e. by inventing or copying docstrings to test ordinary and tricky cases.
Tests for metric docstrings belong with metric tests, I think
This might take some time. I will start working on it with all the other changes.
@jnothman What are we supposed to do finally about the include and exclude concept?
Let's get rid of '*' and replace it with True.
Yes, make a helper to only include a test when docstring testing is enabled.
@jnothman I have made the changes and added tests. Need your opinion on the tests.
    """
    from numpydoc import docscrape
Should validate that include and exclude make sense.
It would be helpful and do no harm, I think we should add it.
    for name, type_definition, description in doc_list:
        if exclude is not None and name in exclude:
            pass
        elif include is not True and name not in include:
This will raise TypeError if include=False
I think we are going with include and exclude validation at the very beginning, so it won't be necessary here.
sklearn/utils/testing.py (outdated):

    type_definition = " ".join(type_definition.split())
    description = [" ".join(s.split()) for s in description]
    try:
        description.remove('')
This will only remove the first. You could use list(filter(None, description))
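To illustrate (illustrative list):

    description = ['array-like', '', 'The input samples.', '']
    description.remove('')
    # -> ['array-like', 'The input samples.', '']   (only the first '' removed)
    list(filter(None, ['array-like', '', 'The input samples.', '']))
    # -> ['array-like', 'The input samples.']       (every empty string dropped)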
sklearn/utils/testing.py (outdated):

    if name in type_dict:
        u_dict = type_dict[name]
        msg1 = (type_name + " " + name + " of " + object_name +
                " has inconsistent type definition.")
Inconsistent with what?
sklearn/utils/tests/test_testing.py (outdated):

    @if_numpydoc
    def test_assert_consistent_docs():
        # Test for consistent parameters
        assert_consistent_docs([func_doc1, func_doc2, func_doc3],
I think you can just test once or twice with actual dummy functions, and then just hack the data in NumpyDocString instances to test intricacies of the implementation
I will work on it.
I have added a validation test for include_ and exclude_, but I need your opinion on it. Also, I am not sure how to test the corner cases. A few examples might help.
Hmm. Your tests aren't being run in CI currently. Hopefully #10473 will change that.
It's currently a bit hard to follow what your tests are trying to check. Testing the error message will improve this. But make sure they are systematic and commented such that each change of parameter value (e.g. include_returns=True) is clear to the reader.
How you can structure the tests: test all meaningful valid and invalid settings of {include_params, exclude_params}; then, using a NumpyDocString object, set doc['Returns'] = doc['Parameters'] and doc['Parameters'] = [], and run the same tests with {include_returns, exclude_returns} to make sure that behaviour there is identical. Same with {include_attribs, exclude_attribs}. Do so with loops or pytest.mark.parametrize to avoid repeating yourself. Then, in a separate test function, assert things about precedence: make sure that assertions about parameters happen first, then those about attribs, then those about returns.
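A rough sketch of that structure (make_dummy_docs is a hypothetical helper returning two parsed NumpyDocString objects; assert_consistent_docs and the keyword names follow this PR):

    import pytest

    @pytest.mark.parametrize('section, include_kw', [
        ('Parameters', 'include_params'),
        ('Returns', 'include_returns'),
        ('Attributes', 'include_attribs'),
    ])
    def test_section_behaviour(section, include_kw):
        doc1, doc2 = make_dummy_docs()  # hypothetical: two consistent docstrings
        for doc in (doc1, doc2):
            # Move the parsed parameter data into the section under test so
            # the same assertions exercise each section identically.
            doc[section] = doc['Parameters']
            if section != 'Parameters':
                doc['Parameters'] = []
        assert_consistent_docs([doc1, doc2], **{include_kw: True})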
sklearn/utils/testing.py (outdated):

    objects (classes, functions, descriptors) with docstrings that can be
    parsed as numpydoc.

    include_params : list, False or True (default)
Could change list -> collection. All we care about is that in works (or, in some other implementation, iteration may be used; but still, a collection is sufficient).
So I should just allow collections to be passed as arguments. Won't it break somewhere? It might need some testing as well. What do you say?
sklearn/utils/testing.py (outdated):

    AssertionError: Parameter y_true of mean_squared_error has inconsistency.

    """
    if ((isinstance(exclude_params, list) and include_params is not True) or
I think we should allow exclude_* to be a set too. And generally, isinstance should be avoided in preference for duck typing. I think in this case (exclude_params and include_params is not True) is sufficient.
I thought this was a little messy.
About exclude_, I would prefer the current scheme since it is simpler and fulfills our purpose; I will document it better if needed. Why do we need exclude_?
I'm not sure what you mean. Actually I think you've misunderstood: by exclude_* I just mean exclude_params etc. All I mean is that we should not strictly be checking for a list; a set would also be an appropriate collection.
Sorry, I get it now. As per your previous comment, we are not just allowing lists but collections and sets as well. I will make the changes.
sklearn/utils/tests/test_testing.py (outdated):

    doc1 = docscrape.NumpyDocString(inspect.getdoc(func_doc1))
    doc2 = docscrape.NumpyDocString(inspect.getdoc(func_doc2))

    assert_raises(AssertionError, assert_consistent_docs, [doc1, doc2],
We should test the message too. Please use assert_raises_regex (or pytest.raises(AssertionError, match=regex)).
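For instance (the message fragment matches the error string used elsewhere in this PR):

    import pytest

    with pytest.raises(AssertionError,
                       match="has inconsistent type definition"):
        assert_consistent_docs([doc1, doc2], include_params=True)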
I will do it.
sklearn/utils/tests/test_testing.py (outdated):

        include_attribs=True)

    # Test with actual classification metrics
    assert_consistent_docs([precision_recall_fscore_support, precision_score,
Can you please put this in a function test_docstrings() in sklearn/metrics/tests/test_classification.py? I think that's where we want it.
Should I add one for regression metrics as well?
No hurry to test all metrics, it's just an example
I have tried to change the tests accordingly. I have used ...
This is looking pretty good. Just have a think about edge cases that are not currently tested, and which might fail if the code were written differently. One case I can imagine is having three docstrings, where some param is shared by two but not all three of them.
        include_returns=False,
        exclude_params=['labels', 'average', 'beta'])

    error_str = ("Parameter 'labels' of 'precision_score' has inconsistent "
I assume we don't want this inconsistency to exist? The docs should be fixed then.
In the precision_score description there seems to be an addition:

    .. versionchanged:: 0.17 parameter *labels* improved for multiclass problem.

Should this be added in precision_recall_fscore_support? If yes, then would this PR be appropriate?
I suppose so.
I will have to work on the edge cases. I get confused sometimes about which tests would be reasonable and which won't.
It's a hard balance to get right, and you get better with practice. Writing tests before implementation, and extending them during implementation, helps think about how to assert desired functionality.
Good tests should IMO look a bit like a proof by induction. You test the base case, and then assume everything like what's been tested so far works, upon which you extend with variants (different parameters etc.).
    assert_consistent_docs([precision_recall_fscore_support, precision_score,
                            recall_score, f1_score, fbeta_score],
                           include_returns=False,
                           exclude_params=['labels', 'average', 'beta'])
why not beta?
we want to test average for precision_score, recall_score, f1_score, fbeta_score. Can use a separate assertion, I suppose.
We have an inconsistency in parameter beta.
For precision_recall_fscore_support:

    beta : float, 1.0 by default
        The strength of recall versus precision in the F-score.

For fbeta_score:

    beta : float
        Weight of precision in harmonic mean.
Have you checked that if there are multiple differences in a section, the one reported is deterministic?
Otherwise this is looking good
sklearn/utils/testing.py:

    def if_numpydoc(func):
        """
        Decorator to check if numpydoc is available and python version is
        atleast 3.5.
Typo: "atleast" should be "at least".
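For context, one plausible shape for such a decorator (a sketch, not necessarily this PR's exact implementation):

    import sys
    from functools import wraps
    from unittest import SkipTest

    def if_numpydoc(func):
        """Skip the decorated test unless numpydoc is available and
        the Python version is at least 3.5."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            try:
                import numpydoc  # noqa: F401
            except ImportError:
                raise SkipTest("numpydoc is required to test the docstrings")
            if sys.version_info < (3, 5):
                raise SkipTest("python version >= 3.5 is required")
            return func(*args, **kwargs)
        return wrapper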
    if name in type_dict:
        u_dict = type_dict[name]
        msg1 = (type_name + " '" + name + "' of '" + object_name +
Using one kind of formatting string (.format) or another (%) consistently would be clearer.
    for u in objects:
        if isinstance(u, docscrape.NumpyDocString):
            doc = u
            name = 'Object '+str(i)
space around +. I think we should allow the user to pass in names somehow...
Perhaps objects can be (name, numpydocstring) pairs
That would be appropriate for NumpyDocString objects. So, now objects can be a callable (function, class, etc.) or a (string, NumpyDocString) tuple. Am I right?
    attrib_dict = {}
    return_dict = {}

    i = 1  # sequence of object in the collection
use enumerate(objects, 1) instead
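I.e., the quoted loop becomes:

    for i, u in enumerate(objects, 1):  # counter starts at 1, no manual i += 1
        if isinstance(u, docscrape.NumpyDocString):
            doc, name = u, 'Object %d' % i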
        [doc1, doc2], include_returns=['precision'],
        include_params=False)  # type definition mismatch

    doc3 = doc1  # both doc1 and doc3 return 'recall' whereas doc2 does not
a thought inspired by this: I wonder if we should raise an error/warning if an explicitly included name is only in one of the input docstrings...
I am not sure it would be very necessary. Also, if a name is present only in a few of the objects, maybe there should be a warning for that as well.
Do you test that the error is deterministic if there are multiple inconsistencies in one section?
Apart from these, this is looking good
Well, the code would show an error on the very first inconsistency it finds. The error message would be enough to locate the exact place of the inconsistency, i.e. it shows the parameter name, the names of the concerned objects, and the error type (type definition or description).
@glemaitre @scikit-learn/documentation-team what do we think of this now? There are a few use cases; worth continuing this work?
I guess this PR was already looking in good shape, so the extra step may be worth it.
+1, I think this is useful
@lucyleeow @ArturoAmorQ would you have bandwidth to push this forward? Pretty please? 😁
I'm happy to work on this 😄
Please feel free to ping me as well if you need reviews :)
Yep, this looks like a step ahead for consistency and having the right docstring.
Fixes #9388
Added a function to check for consistency between the docstrings of objects.
In this approach there is a Python dictionary for each of Parameters, Attributes and Returns, indexed by the name of the parameter/attribute/return, with the value being a dictionary containing its type definition and description. For each object, the function checks whether the docstring entry for a given parameter/attribute/return is identical (excluding whitespace) to the one recorded in the main dictionary. If a new parameter/attribute/return is found, it is added to the main dictionary.
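A condensed sketch of that approach for a single section (illustrative, not the PR's exact code; it assumes numpydoc is installed):

    from numpydoc import docscrape

    def _check_consistent_params(objects):
        """Assert that shared parameters are documented identically (sketch)."""
        seen = {}  # parameter name -> {'type_definition': ..., 'description': ...}
        for obj in objects:
            doc = docscrape.NumpyDocString(obj.__doc__)
            for name, type_definition, description in doc['Parameters']:
                entry = {
                    # normalise whitespace before comparing
                    'type_definition': ' '.join(type_definition.split()),
                    'description': [' '.join(s.split()) for s in description],
                }
                if name in seen:
                    assert seen[name] == entry, (
                        "Parameter %r of %r has inconsistent documentation"
                        % (name, obj.__name__))
                else:
                    seen[name] = entry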