
Add option of matching regex to assert_docstring_consistency #29867


Merged: 17 commits into scikit-learn:main, Nov 21, 2024

Conversation

@lucyleeow (Member) commented Sep 17, 2024

Reference Issues/PRs

Follows from #28678, in particular: #28678 (comment)

What does this implement/fix? Explain your changes.

Adds description_regex parameter to allow matching all descriptions to a regex.
Adds a sample test - this will probably be one of the more difficult/awkward parameter docstrings to match properly.

Any other comments?

cc @adrinjalali @glemaitre

(To add tests once we're happy with the change)
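
For context, a rough sketch of the intended usage. Names here are partly assumptions: the helper is assumed to live in sklearn.utils._testing, include_params is assumed from the existing signature introduced in #28678, and the regex argument is spelled description_regex in this PR's first revision but descr_regex_pattern in the test snippet quoted later in the thread.

from sklearn.metrics import f1_score, precision_score, recall_score
from sklearn.utils._testing import assert_docstring_consistency

# Instead of requiring the `average` descriptions of the three scorers to be
# identical to each other, require each of them to match a single regex
# pattern (argument name assumed; see the note above).
assert_docstring_consistency(
    [f1_score, precision_score, recall_score],
    include_params=["average"],
    descr_regex_pattern=r"This parameter is required for multiclass/multilabel targets\.[\s\S]*",
)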


github-actions bot commented Sep 17, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 6278c15.

@@ -1254,7 +1254,7 @@ def f1_score(
average : {'micro', 'macro', 'samples', 'weighted', 'binary'} or None, \
default='binary'
This parameter is required for multiclass/multilabel targets.
If ``None``, the scores for each class are returned. Otherwise, this
@lucyleeow (Member Author) commented Sep 17, 2024

Currently we use 'score' here but revert to 'metrics' below in the same function. I am actually unsure whether to change all 'metrics' to 'scores' for f1_score, precision_score and recall_score, or leave them as 'metrics'?

@adrinjalali (Member)

good question

@glemaitre (Member)

My own definition would be the following:

  • "score": scalar where higher value is better
  • "error": scalar where lower value is better
  • "metric": term accounting for "score" and "error" to measure the statistical performance.

Therefore, I would be fine to generalize the term "score" when it comes to these functions.

@lucyleeow (Member Author)

Sorry, do you mean 'generalize "metric"'?

@lucyleeow (Member Author)

Or more specifically, do you mean use 'metric' everywhere as it accounts for both types?

@adrinjalali (Member) left a comment:

I like this.

@glemaitre requested review from adrinjalali and glemaitre and removed request for adrinjalali, September 17, 2024 16:06
@glemaitre (Member) left a comment:

I pretty much like how it looks.


@@ -760,6 +776,10 @@ def assert_docstring_consistency(
List of returns to be excluded. If None, no returns are excluded.
Can only be set if `include_returns` is True.

description_regex : str, default=""
Regular expression to match to all descriptions. If empty string, will
Member

In this regard, I would rephrase it slightly:

Regular expression pattern to match all descriptions.

Maybe we should be more explicit about what we mean by "descriptions".

Comment on lines 348 to 368
description_regex = r"""This parameter is required for multiclass/multilabel targets\.
If ``None``, the metrics for each class are returned\. Otherwise, this
determines the type of averaging performed on the data:
``'binary'``:
Only report results for the class specified by ``pos_label``\.
This is applicable only if targets \(``y_\{true,pred\}``\) are binary\.
``'micro'``:
Calculate metrics globally by counting the total true positives,
false negatives and false positives\.
``'macro'``:
Calculate metrics for each label, and find their unweighted
mean\. This does not take label imbalance into account\.
``'weighted'``:
Calculate metrics for each label, and find their average weighted
by support \(the number of true instances for each label\)\. This
alters 'macro' to account for label imbalance; it can result in an
F-score that is not between precision and recall\.[\s\w]*\.*
``'samples'``:
Calculate metrics for each instance, and find their average \(only
meaningful for multilabel classification where this differs from
:func:`accuracy_score`\)\.""" # noqa E501
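
For reference, the pattern above spells out the entire shared description of the `average` parameter, with `[\s\w]*\.*` after the 'weighted' entry so that one of the functions may carry some extra trailing wording at that point. Below is a small illustration of the matching idea with plain re (the helper's own internals are not shown in this thread); it also shows the whitespace normalization that the test quoted further down applies to its pattern via `" ".join(regex_full.split())`.

import re

# Hypothetical wrapped description, as it might be extracted from a docstring.
description = """Calculate metrics for each label, and find their unweighted
    mean. This does not take label imbalance into account."""

# The same text as a pattern, written without worrying about line wrapping.
pattern = (
    r"Calculate metrics for each label, and find their unweighted mean\. "
    r"This does not take label imbalance into account\."
)

# Collapsing runs of whitespace makes the match independent of how the
# docstring happens to be wrapped, presumably the reason the test below
# normalizes its pattern with `" ".join(regex_full.split())`.
assert re.fullmatch(pattern, " ".join(description.split()))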
@glemaitre (Member)

for the purpose of the test, we could maybe pass a real regex pattern instead. For instance, we could require to only match the type of average, e.g., 'binary', 'micro', etc.

Member

In this case, we could also parametrize the test by passing or not the description_regex parameter.

@lucyleeow (Member Author)

> we could maybe pass a real regex pattern instead. For instance, we could require to only match the type of average, e.g., 'binary', 'micro', etc.

Sorry I don't follow. Do you mean take only specific average types, e.g., 'binary', 'micro', and match the extracted description between objects?
That would be another way to do it, but the current regex pattern just matches descriptions from all objects against the given regex...

@glemaitre (Member)

What I meant was to use this for instance:

description_regex = r"\b(binary|micro|macro|weighted)\b"

instead of the entire docstring.
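
For illustration, with a plain re.search (independent of how the helper applies the pattern, which is not shown in this thread), such a pattern only requires one of the listed averaging types to appear somewhere in each description, rather than requiring the whole description to match:

import re

# A shortened description, as it might be extracted from one of the docstrings.
description = "``'micro'``: Calculate metrics globally by counting the total true positives."

# Word-boundary alternation: succeeds as soon as any of the averaging types is
# mentioned in the text being checked.
assert re.search(r"\b(binary|micro|macro|weighted)\b", description)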

descr_regex_pattern=" ".join(regex_full.split()),
)
# Check we can just match a few alternate words
regex_words = r"(labels|average|binary)"
@lucyleeow (Member Author)

@glemaitre I've added a test to check the new parameter, this one is in the same vein as what you suggested above.

@glemaitre (Member) left a comment:

LGTM. @adrinjalali, do you want to have another look?

@adrinjalali (Member) left a comment:

Otherwise LGTM.

Calculate metrics for each label, and find their average weighted
by support \(the number of true instances for each label\)\. This
alters 'macro' to account for label imbalance; it can result in an
F-score that is not between precision and recall\.[\s\w]*\.*
@adrinjalali (Member)

can we have a comment here (and wherever we have a regex) on what it does?

@lucyleeow (Member Author)

Thanks @adrinjalali ! Done so in 2f981d4, hopefully looks okay?

@adrinjalali (Member) left a comment:

Thanks @lucyleeow

@adrinjalali merged commit 03db24f into scikit-learn:main on Nov 21, 2024
30 checks passed
@lucyleeow deleted the doc_consis_regex branch on November 21, 2024 10:03
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Dec 4, 2024