Add option of matching regex to assert_docstring_consistency (#29867)
Conversation
@@ -1254,7 +1254,7 @@ def f1_score(
    average : {'micro', 'macro', 'samples', 'weighted', 'binary'} or None, \
            default='binary'
        This parameter is required for multiclass/multilabel targets.
        If ``None``, the scores for each class are returned. Otherwise, this
Currently we use 'score' here but revert to 'metrics' (see below, in the same function). I am actually unsure whether to change all 'metrics' to 'scores' for `f1_score`, `precision_score` and `recall_score`, or leave them as 'metrics'?
good question
My own definition would be the following:
- "score": scalar where higher value is better
- "error": scalar where lower value is better
- "metric": term accounting for "score" and "error" to measure the statistical performance.
Therefore, I would be fine generalizing the term "score" when it comes to these functions.
Sorry, do you mean 'generalize "metric"' ?
Or more specifically, do you mean use 'metric' everywhere as it accounts for both types?
I like this.
I pretty much like how it looks.
sklearn/utils/_testing.py
@@ -760,6 +776,10 @@ def assert_docstring_consistency(
        List of returns to be excluded. If None, no returns are excluded.
        Can only be set if `include_returns` is True.

    description_regex : str, default=""
        Regular expression to match to all descriptions. If empty string, will
In this regard, I would rephrase it slightly: "Regular expression pattern to match all descriptions."
Maybe we should be more explicit about what we mean by "descriptions".
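To make the intended behavior concrete, here is a minimal sketch of what such a parameter could do; the helper name and the `descriptions` mapping are illustrative only, not the actual scikit-learn implementation:

```python
import re


def check_descriptions_match(descriptions, pattern):
    """Return the names whose description does not match `pattern`.

    Hypothetical helper: `descriptions` maps an object name to the
    extracted parameter description; descriptions are whitespace-
    normalized before matching, and an empty pattern disables the check.
    """
    if not pattern:  # empty string: skip the regex check entirely
        return []
    return [
        name
        for name, text in descriptions.items()
        if not re.fullmatch(pattern, " ".join(text.split()))
    ]


# Example: both descriptions must name one of the average types.
descrs = {
    "f1_score": "Determines the type of averaging: micro or macro.",
    "recall_score": "Determines the type of averaging: weighted.",
}
print(check_descriptions_match(descrs, r".*\b(micro|macro|weighted)\b.*"))
```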
description_regex = r"""This parameter is required for multiclass/multilabel targets\.
    If ``None``, the metrics for each class are returned\. Otherwise, this
    determines the type of averaging performed on the data:
    ``'binary'``:
        Only report results for the class specified by ``pos_label``\.
        This is applicable only if targets \(``y_\{true,pred\}``\) are binary\.
    ``'micro'``:
        Calculate metrics globally by counting the total true positives,
        false negatives and false positives\.
    ``'macro'``:
        Calculate metrics for each label, and find their unweighted
        mean\. This does not take label imbalance into account\.
    ``'weighted'``:
        Calculate metrics for each label, and find their average weighted
        by support \(the number of true instances for each label\)\. This
        alters 'macro' to account for label imbalance; it can result in an
        F-score that is not between precision and recall\.[\s\w]*\.*
    ``'samples'``:
        Calculate metrics for each instance, and find their average \(only
        meaningful for multilabel classification where this differs from
        :func:`accuracy_score`\)\."""  # noqa E501
for the purpose of the test, we could maybe pass a real regex pattern instead. For instance, we could require to only match the type of average, e.g., 'binary', 'micro', etc.
In this case, we could also parametrize the test by passing (or not) the description_regex parameter.
we could maybe pass a real regex pattern instead. For instance, we could require to only match the type of average, e.g., 'binary', 'micro', etc.
Sorry I don't follow. Do you mean take only specific average types, e.g., 'binary', 'micro', and match the extracted description between objects?
That would be another way to do it, but the current regex pattern just matches descriptions from all objects against the given regex...
What I meant was to use this for instance:
description_regex = r"\b(binary|micro|macro|weighted)\b"
instead of the entire docstring.
sklearn/utils/tests/test_testing.py
    descr_regex_pattern=" ".join(regex_full.split()),
)
# Check we can just match a few alternate words
regex_words = r"(labels|average|binary)"
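The `" ".join(regex_full.split())` idiom collapses every run of whitespace (newlines, indentation) into a single space, so a regex written as an indented multiline string can be matched against a docstring normalized the same way. A small self-contained illustration (variable names assumed, pattern text taken from the 'macro' entry above):

```python
import re

# A pattern written as an indented, multiline raw string for readability.
regex_full = r"""Calculate metrics for each label, and find their unweighted
        mean\. This does not take label imbalance into account\."""

# Collapse all whitespace runs to single spaces before matching.
normalized_pattern = " ".join(regex_full.split())

# A docstring description normalized the same way.
description = " ".join(
    """Calculate metrics for each label, and find their unweighted
    mean. This does not take label imbalance into account.""".split()
)

print(bool(re.fullmatch(normalized_pattern, description)))  # → True
```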
@glemaitre I've added a test to check the new parameter, this one is in the same vein as what you suggested above.
LGTM. @adrinjalali do you want to have another look?
Otherwise LGTM.
        Calculate metrics for each label, and find their average weighted
        by support \(the number of true instances for each label\)\. This
        alters 'macro' to account for label imbalance; it can result in an
        F-score that is not between precision and recall\.[\s\w]*\.*
can we have a comment here (and wherever we have a regex) on what it does?
Thanks @adrinjalali ! Done so in 2f981d4, hopefully looks okay?
Thanks @lucyleeow
Reference Issues/PRs
Follows from #28678, in particular: #28678 (comment)
What does this implement/fix? Explain your changes.
Adds a description_regex parameter to allow matching all descriptions against a regex.
Adds a sample test - this will probably be one of the more difficult/awkward parameter docstrings to match properly.
Any other comments?
cc @adrinjalali @glemaitre
(To add tests once we're happy with the change)