TST check if docstring items are equal between objects (functions, classes, etc.) #28678
Conversation
The incl/excl params are a bit confusing so I made this table, which hopefully helps.
Thanks for the PR @lucyleeow! I do not have much time at this moment so I only did some brief testing, and overall this looks nice. Here are some very general suggestions/concerns at first glance:
- It might be nice to have a diff output, especially for long docstrings. Maybe using `difflib.Differ` and its `compare` method? (A minimal sketch follows below this list.)
- I'm worried that the current solution is not flexible enough, but I don't have a good solution. For instance, the `average` parameter of the metrics is almost the same but differs only a bit, and we would have to exclude it from the comparison. Not sure what you and other maintainers think.
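For illustration, here is a minimal sketch of the `difflib.Differ` suggestion from the first bullet, run on two invented one-line parameter descriptions (the text is made up, not taken from scikit-learn):

```python
import difflib

# Two invented variants of the same parameter description.
desc_1 = ["If None, the scores for each class are returned."]
desc_2 = ["If None, the metrics for each class are returned."]

# Differ.compare yields lines prefixed with '  ', '- ', '+ ' or '? '.
for line in difflib.Differ().compare(desc_1, desc_2):
    print(line)
```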
Good idea, I didn't want to implement something complex myself. I will have a play with that. I think I would want to print only the line that is different (it would be a very long output for e.g., the …). I also need to deal with the situation when there are >2 different strings. I could just use the first one as reference and compare each of the others to the first. As long as I am able to only print the line/section that is different, this should be okay.
I had the same thought. There are other scenarios that would not be so easy to deal with, e.g., using a different word in the description ('metric' instead of 'score').
I've had a go at printing diffs. I've used …
This is getting a bit old but @adrinjalali do you think we're still interested in having this test?
A bit hard to follow the code, but I like the result.
sklearn/utils/_testing.py (outdated)

```python
ref_str = ""
ref_group = []
for docstring, group in gd.items():
    if not ref_str and not ref_group:
```
Does it make sense at this indentation to move things to another function? It's kinda hard for me to follow this method.
Yeah I think the later additions were sort of proof of concept (to address #28678 (review)), to see if it was worth the complexity and if the output was okay.
So printing a diff using `context_diff` works well for:

- single word changes,
- addition or deletion of a sentence/word,

but not so good for:

- a whole sentence moved to somewhere else in a paragraph,
- lots of changes in a sentence.

(I'll work on producing examples for the above to show what it would look like.)

Having a look at the `difflib` package, `context_diff` seemed to be the best solution (but it's been a while). I think this is probably acceptable, but any comments so far?
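As a rough illustration of what the `context_diff` output looks like (the docstring fragments below are made up):

```python
import difflib

doc_1 = [
    "Determines the type of averaging performed on the data.",
    "If None, the scores for each class are returned.",
]
doc_2 = [
    "Determines the type of averaging performed on the data.",
    "If None, the metrics for each class are returned.",
]

# context_diff reports only the changed lines, with surrounding context.
diff = difflib.context_diff(
    doc_1, doc_2, fromfile="Object 1", tofile="Object 2", lineterm=""
)
print("\n".join(diff))
```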
> I'm worried that the current solution is not flexible enough, but I don't have a good solution. For instance, the average parameter of the metrics is almost the same but differs only a bit, and we would have to exclude it from the comparison. Not sure what you and other maintainers think.
I think this would be difficult, see: #28678 (comment). Do you think this is worth pursuing?
This is a first round of comments on the code. I'll look at the tests in more detail now.
sklearn/utils/_testing.py (outdated)

```python
Args = namedtuple("args", ["include", "exclude", "arg_name"])
section_dict = {
    "Parameters": Args(include_params, exclude_params, "params"),
    "Attributes": Args(include_attribs, exclude_attribs, "attribs"),
    "Returns": Args(include_returns, exclude_returns, "returns"),
}
for section in list(section_dict):
    args = section_dict[section]
    if args.exclude and args.include is not True:
        raise TypeError(
            f"The 'exclude_{args.arg_name}' argument can be set only when the "
            f"'include_{args.arg_name}' argument is True."
        )
    if args.include is False:
        del section_dict[section]
```
I'm not a big fan of deleting a key here. I think that we could instead create the dictionary dynamically just by creating a small function:
Suggested change:

```python
Args = namedtuple("args", ["include", "exclude", "arg_name"])

def create_args(include, exclude, arg_name, section_name):
    if exclude and include is not True:
        raise TypeError(
            f"The 'exclude_{arg_name}' argument can be set only when the "
            f"'include_{arg_name}' argument is True."
        )
    if include is False:
        return {}
    return {section_name: Args(include, exclude, arg_name)}

section_dict = {
    **create_args(include_params, exclude_params, "params", "Parameters"),
    **create_args(include_attribs, exclude_attribs, "attribs", "Attributes"),
    **create_args(include_returns, exclude_returns, "returns", "Returns"),
}
```
I think this is a good start. We should have a subsequent PR to introduce the assertion in more places.
```python
from sklearn.metrics import (
    f1_score,
    fbeta_score,
    precision_recall_fscore_support,
    precision_score,
    recall_score,
)
```
It might be better to change the import here and have `from sklearn import metrics`, and then call `metrics.f1_score`. We will end up importing all functions of scikit-learn maybe :)
Maybe we have to see if we want to isolate all those consistency checks in the future if there are too many.
Yes, I originally put these with the metrics tests (see: #28678 (comment)), as I had the same thought - potentially we will check many docstrings, and it may make more sense to put them with their own tests (or somewhere else?). The other tests in test_docstring_parameters.py are more general and cover most of the public classes/functions.
But I am not familiar with a lot of the codebase, so I do not know how many more places we would want to use this test. We can always move them later!
From the tool that has been written here, I'm thinking we could reuse some part of the machinery to check that the types of a parameter are consistent with the … Apart from making sure that we have consistent information, we could then safely have something like https://github.com/scientific-python/docstub to use the type from the documentation to automatically generate the stubs (that, up to now, we don't maintain), useful for IDEs.
Good idea, I think I could definitely make that work. We may need to get consensus on the terms used (e.g., 'estimator object' vs 'estimator instance', 'array-like' vs 'ndarray', and whether to include 'or None'), but this is probably a good thing.
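As a quick sketch of what reusing the machinery for types could look like (this is only an illustration assuming numpydoc is installed, not the implementation in this PR): numpydoc already parses the type text separately from the description, so the type strings could be compared across objects in the same way.

```python
from numpydoc.docscrape import FunctionDoc

from sklearn.metrics import f1_score, fbeta_score

# numpydoc parses each Parameters entry into (name, type, desc), so the
# type strings can be compared across objects just like the descriptions.
for func in (f1_score, fbeta_score):
    doc = FunctionDoc(func)
    average = next(p for p in doc["Parameters"] if p.name == "average")
    print(func.__name__, "->", average.type)
```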
Looks good. Thanks @lucyleeow
Starting to use this function in #29831, I came across two parameters where it would be nice to compare only part of the description, e.g.:

scikit-learn/sklearn/ensemble/_stacking.py, lines 457 to 464 in 153205a

and

scikit-learn/sklearn/ensemble/_stacking.py, lines 875 to 878 in 153205a

Maybe a nice addition would be to be able to specify which part of the description to compare? This may be better than setting a number of words that can be different, as you won't be able to know which words end up being different. Note that when we use it, we'd probably run this function on one specific parameter and set the 'description_subset' for it specifically. WDYT @glemaitre @adrinjalali
It seems like a nice variation would be to check that a specific text is present in the docstring. It would cover this case as well, WDYT?
Interesting, hadn't thought of that! Let me summarise potential solutions:
Another idea: we could automatically ignore a difference that is a switch between any of the words "estimator", "regressor" and "classifier".
We could check against a regex; that would support a subset, and variations in a word. Do you think that'd work?
Love it, I think it would cover all use cases. I'll open a PR and see what people think?
Following on from #30854, where we discovered we need to re-work the assert function before we can use it more widely. With regards to matching descriptions where the whole description should not match, we can instead have a regex capture group (e.g., the first 2 sentences, excluding the 4th word, etc.) that will be matched between params.

This parameter could also take a dict, allowing us to also specify which param/attr/return this regex matches, e.g.:

(Not sure if we have to specify whether the item is a param/attr/return, as it is possible that e.g. a parameter and a return both have the same name.) This means that we don't need to use a 2nd assert when one of the items doesn't need to be matched completely.

With regards to matching types that differ slightly, I see 2 cases:

Possible solutions:

ping @glemaitre @StefanieSenger (and maybe @adrinjalali) WDYT?
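To make the dict idea concrete, here is a hypothetical sketch (the name `descr_regex_by_item`, the toy functions and the exact semantics are placeholders, not the API proposed in this PR): each key names an item and the value is a regex that every object's description must match, which covers both "compare only a subset" and "allow a word to vary".

```python
import re

from numpydoc.docscrape import FunctionDoc


def score_a(average):
    """Toy scorer.

    Parameters
    ----------
    average : str
        Averaging strategy for the classifier scores. See the user guide.
    """


def score_b(average):
    """Another toy scorer.

    Parameters
    ----------
    average : str
        Averaging strategy for the regressor scores. See the user guide.
    """


# Hypothetical spec: item name -> regex that every description must match;
# the \w+ lets one word vary while the rest of the sentence is still checked.
descr_regex_by_item = {"average": r"Averaging strategy for the \w+ scores\."}

for name, pattern in descr_regex_by_item.items():
    for func in (score_a, score_b):
        desc = " ".join(
            next(p.desc for p in FunctionDoc(func)["Parameters"] if p.name == name)
        )
        assert re.search(pattern, desc), f"{func.__name__}: {name!r} does not match"
```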
I like the sound of these options, but I'd need to see a PR to have a better idea of what the implications are. And a side note, as for different types, we might sometimes actually make them more consistent if we see those differences.
I have given this some thought and here are my 2 cents. Unfortunately I don't have a solution (except to not use regex at all), but I have a very big concern about maintainability:

There will be many good reasons for differing param and attribute descriptions, and thus after we have added all these new tests, they will have a lot of regex expressions, including very condensed/flexible ones. And having a lot of regex patterns means that adding documentation comes with needing to adapt the regex expressions of this test (not only if the PR author forgot to add the change to related classes, but possibly also if the addition is totally valid, because we defined the regex too narrowly).

This puts a very high barrier to any change in the docstrings and I am not sure if we can afford that. Also, doc PRs are often done by new contributors and we make it very hard for new people to add something to the docs, even possibly a typo correction. In general, I have the impression that in scikit-learn we are building a castle, which is safer but comes with the cost of inflexibility and - at some point - stagnation.

From the more technical standpoint (and totally disregarding what I wrote before):
About making this a new contributor issue: I am not sure that, with the regex, it should be a good first issue. I feel that with the need to make a judgement on which params/attributes/returns to include, and the decision on how flexible the regex should be, it is not an issue for people who don't know the project well, and it might require us to define what exactly people should include and what we can add afterwards. What about putting the labels "moderate" and "meta-issue" instead of "good first issue"?

Unprecedented idea: maybe people can pair with maintainers who then also push to their branches to finish the regex. We could offer this to people who have proven they know how to handle the git workflow and other skills in other good first issues first, so to say as a follow-up.

I know that a lot of thought and effort went into this a long time before I even became aware of it, so I feel quite guilty for coming in and voicing concerns so late in the process. Please use my comments the way that suits you best, @lucyleeow, I don't want to constrain you in any way.
I may have been looking at different objects than you, but from my experience, for the objects we want to test, parameters are mostly the same? But I agree that I can see this getting out of hand. @glemaitre did mention making a "wanted list" of objects for which we want to add this test, so maybe we could only work on those to start with? We don't want/need to add tests for everything; as I said in the issue, the list is just a starting point and some (many?) items do not warrant a test to be added. I think:

I didn't envisage a lot of regex use, but we can see after working on the "wanted list".
Sorry, I didn't make this clear (it's only obvious if you look at the code), but the description text and the type are dealt with separately in the code, because numpydoc splits these. The regex (…)
It's just what you described in item 1: if we allow >1 regex per 'test' we can have fewer 'tests'. I just used the term …
Yes, it probably is a bit more complicated than our usual good first issues; even without the regex part, they have to know how to use pytest etc. Not sure if we should have this as a general open issue or use it for a sprint etc. Let me make some draft PRs, and we can re-assess from there?
Thanks for the explanations, @lucyleeow. I had imagined a very broad use of these tests, and in Bagging* I have encountered three params (I only checked params, not attributes and returns) that needed a regex. I see you are thinking of a more selective use of the test, and I agree that (given a beginner-friendly error message) the test can be enriching. In any case, no need to directly adjust your approach to my opinions/concerns. I am not a maintainer and I just wanted to give some feedback from when I had tested this issue.
Your insights are just as useful and valid as a maintainer's, and I'm happy to take them on board. I had a look at the Bagging estimators and note:
As for …
I would tend to agree. I conservatively kept it because it once helped me pick up that we missed a … You can take a look at draft PR #30926; the regex option there would allow us to ignore a 'version...' if desired, but it is extra regex work, so it may be worth it to ignore it by default there too.
Reference Issues/PRs
closes #9388
closes #10323 (supersedes)
What does this implement/fix? Explain your changes.
Adds a test that checks that items in parameters/attributes/returns sections of objects are the same. Builds on #10323
- incl is `False` by default (I thought this was better, so the user has to explicitly turn it on; also less typing, as I think people will usually only not check all three sections, but if you want to exclude, you need to turn incl to `True`)
- incl `True` and excl `None` means check all items
- … the `labels` param of `precision_recall_fscore_support`, as I can see they were all updated together in this commit

I've also added a skip fixture to skip tests if numpydoc is not installed. Happy to change.
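For reviewers, a hedged usage sketch of the new helper (the name `assert_docstring_consistency` and the exact keyword arguments are assumed from the diff excerpts above rather than spelled out in this description):

```python
from sklearn.metrics import (
    f1_score,
    fbeta_score,
    precision_recall_fscore_support,
    precision_score,
    recall_score,
)
from sklearn.utils._testing import assert_docstring_consistency


def test_scorer_docstring_consistency():
    # Sections are not checked by default, so only Parameters is turned on;
    # 'average' is excluded here because its description differs slightly
    # between the scorers (see the review discussion above).
    assert_docstring_consistency(
        [f1_score, fbeta_score, precision_recall_fscore_support,
         precision_score, recall_score],
        include_params=True,
        exclude_params=["average"],
    )
```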
One problem still to solve: we accept `NumpyDocString`, but AFAICT there is no way to get the name of the original object. We are just naming these 'Object 1' etc. here, which is not ideal. Joel suggested that we could accept (name, numpydocstring) tuples in `objects`. This would work but is not elegant.

Another solution is to use the numpydoc subclasses `ClassDoc`, `FunctionDoc` and `ObjDoc`. These store the original object in a private attrib (e.g., `ClassDoc._cls`, `FunctionDoc._f`). We could instead only accept these subclasses and we'd be able to get the object name from the private attrib? BUT there is no specific data descriptor subclass. I don't see anywhere in scikit-learn where we have a data descriptor with param sections, so I wonder how useful the data descriptor case is?
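A small sketch of the second option, showing where the private attributes mentioned above live (these are numpydoc internals, which is part of the hesitation):

```python
from numpydoc.docscrape import ClassDoc, FunctionDoc

from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

# The numpydoc subclasses keep a reference to the original object in a
# private attribute, which would give us a displayable name.
print(ClassDoc(LogisticRegression)._cls.__name__)  # "LogisticRegression"
print(FunctionDoc(f1_score)._f.__name__)           # "f1_score"
```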
Any other comments?
Still need to add a test for the `NumpyDocString` obj type.
cc @adrinjalali @Charlie-XIAO (and @jnothman just in case, as you reviewed the original)