DOC See Also descriptions do not match for multiple functions/classes #24464

vitaliset · 2022-09-18T05:24:03Z

Describe the issue linked to the documentation

While working on a docstring-related pull request (#24259) I noticed that, sometimes, the See Also description for the same function/class does not match. For instance, the accuracy_score description was different depending on the class I looked at:

scikit-learn/sklearn/metrics/_classification.py, lines 745 to 747 in feaf382:

    See Also
    --------
    accuracy_score : Function for calculating the accuracy score.

scikit-learn/sklearn/metrics/_classification.py, lines 956 to 960 in feaf382:

    See Also
    --------
    accuracy_score : Compute the accuracy score. By default, the function will
        return the fraction of correct predictions divided by the total number
        of predictions.

I decided to investigate further and check whether this was occurring elsewhere (see the gist where I ran regexes over the raw files). Today, at commit 2f8b8e7f1, 56 functions/classes have differing descriptions across their See Also occurrences. At first glance, I have sorted the differences into 4 categories:

Those where the description depends on the context of the current class/function;

For instance, inside the RegressorChain class, ClassifierChain is described as the "equivalent" for classification. That description differs from the one inside MultiOutputClassifier, but it makes sense in the context of the RegressorChain class.

scikit-learn/sklearn/multioutput.py, lines 965 to 967 in 2f8b8e7:

    See Also
    --------
    ClassifierChain : Equivalent for classification.

Those caused by an extra "\n" (line break) in the text;

scikit-learn/sklearn/kernel_approximation.py, line 106 in 2f8b8e7:

    SkewedChi2Sampler : Approximate feature map for "skewed chi-squared" kernel.

scikit-learn/sklearn/kernel_approximation.py, lines 292 to 293 in 2f8b8e7:

    SkewedChi2Sampler : Approximate feature map for
        "skewed chi-squared" kernel.

Those that are different per se;
- For instance, the accuracy_score descriptions mentioned earlier have no obvious reason to be different.
Those caused by the docstring not following the expected pattern.
- For instance, a space is missing between the name of the referenced class/function and the colon, so my script treats the entry as part of the previous description. (Extra question: shouldn't this fail numpydoc validation? I was expecting it to be one of the open docstrings from Ensure that functions's docstrings pass numpydoc validation #21350, but this function was already handled by DOC Ensures that dcg_score & roc_curve passes numpydoc validation #24351.)

  scikit-learn/sklearn/metrics/_ranking.py, line 961 in 2f8b8e7:

      det_curve: Compute error rates for different probability thresholds.
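Several of these categories can be detected mechanically. The following is a minimal sketch, not the regexes from the gist. It assumes numpydoc-style docstrings in which each See Also entry starts at the section's indentation, continuation lines are indented further, and the section ends at the first blank line. `parse_see_also`, `find_mismatches`, and the toy docstrings are hypothetical. Collapsing whitespace absorbs the extra "\n" category, and a deliberately loose separator regex lets a "name: description" entry be reported as malformed instead of being merged into the previous one.

```python
import inspect
import re
from collections import defaultdict

# numpydoc expects "name : description". The separator group is loose on
# purpose so that "name: description" (missing space) is still recognised as
# a new entry and can be flagged, instead of being merged into the previous one.
ENTRY_RE = re.compile(r"^(?P<name>[\w.]+)(?P<sep>\s*:\s*)(?P<desc>.*)$")


def parse_see_also(docstring):
    """Return (name, description, well_formed) tuples from a See Also section."""
    lines = inspect.cleandoc(docstring).splitlines()
    # Locate the "See Also" header followed by its dashed underline.
    for i, line in enumerate(lines[:-1]):
        if line.strip() == "See Also" and set(lines[i + 1].strip()) == {"-"}:
            start = i + 2
            break
    else:
        return []

    entries = []
    for line in lines[start:]:
        if not line.strip():
            break  # Simplification: assume the section ends at a blank line.
        if line[0].isspace() and entries:
            # Indented line: continuation of the previous description.
            name, desc, ok = entries[-1]
            entries[-1] = (name, f"{desc} {line.strip()}", ok)
            continue
        match = ENTRY_RE.match(line)
        if match:
            well_formed = match["sep"] == " : "
            entries.append((match["name"], match["desc"].strip(), well_formed))
    return entries


def find_mismatches(docstrings):
    """Map each referenced name to its distinct descriptions, if more than one."""
    seen = defaultdict(set)
    for doc in docstrings:
        for name, desc, _ in parse_see_also(doc):
            # Collapse whitespace so a wrapped line does not count as a
            # different description.
            seen[name].add(" ".join(desc.split()))
    return {name: descs for name, descs in seen.items() if len(descs) > 1}


# Toy docstrings modelled on the excerpts quoted in this issue.
doc_a = """Toy function A.

    See Also
    --------
    SkewedChi2Sampler : Approximate feature map for
        "skewed chi-squared" kernel.
    det_curve: Compute error rates for different probability thresholds.
    """

doc_b = """Toy function B.

    See Also
    --------
    SkewedChi2Sampler : Approximate feature map for "skewed chi-squared" kernel.
    accuracy_score : Function for calculating the accuracy score.
    """

doc_c = """Toy function C.

    See Also
    --------
    accuracy_score : Compute the accuracy score. By default, the function will
        return the fraction of correct predictions divided by the total number
        of predictions.
    """

mismatches = find_mismatches([doc_a, doc_b, doc_c])
for name, descs in mismatches.items():
    print(name, sorted(descs))

malformed = [name for name, _, ok in parse_see_also(doc_a) if not ok]
print("missing space before colon:", malformed)
```

Only accuracy_score is reported as a mismatch: the two SkewedChi2Sampler entries normalize to the same text, while det_curve is reported separately as malformed.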

Suggest a potential alternative/fix

I plan to take a closer look at the output table of differing descriptions and compile the list of functions/classes we should change so that the descriptions are uniform across See Also occurrences.
Then, as this is a fairly beginner-friendly issue, I would edit the issue description to add step-by-step instructions (borrowing most of what @thomasjpfan did in #21350) and let folks looking for a "good first issue" help. I would be happy to keep the task list up to date.

What do the maintainers think? Does it make sense? :)

PS: Of course, this kind of inconsistency will keep creeping back in unless we add a test (maybe adapted from my gist). The problem is that sometimes different descriptions do make sense... Maybe we could keep an "ignore list", but we can discuss this later.
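One possible shape for such a test is sketched below, assuming the See Also entries have already been extracted as (owner, referenced name, description) triples. The ignore list is keyed on the (owner, referenced name) pair, so only a specific context-dependent occurrence (such as ClassifierChain inside RegressorChain) is skipped rather than every mention of the name. `IGNORE`, `inconsistent_see_also`, `check_see_also_consistency`, `OtherEstimator`, `func_a`, and `func_b` are hypothetical, and the placeholder description is made up for illustration.

```python
from collections import defaultdict

# Hypothetical ignore list: (owner, referenced name) pairs whose See Also
# description is intentionally specific to the owner's context.
IGNORE = {("RegressorChain", "ClassifierChain")}


def inconsistent_see_also(entries, ignore=IGNORE):
    """Group descriptions per referenced name, skipping ignored pairs.

    ``entries`` is an iterable of (owner, referenced_name, description).
    Returns {referenced_name: {description: [owners]}} for every name that
    is still described in more than one way.
    """
    by_name = defaultdict(lambda: defaultdict(list))
    for owner, name, desc in entries:
        if (owner, name) in ignore:
            continue
        # Collapse whitespace so line wrapping alone is not a mismatch.
        by_name[name][" ".join(desc.split())].append(owner)
    return {
        name: dict(descs) for name, descs in by_name.items() if len(descs) > 1
    }


def check_see_also_consistency(entries, ignore=IGNORE):
    """Test-style check: fail and list the offending names and owners."""
    mismatches = inconsistent_see_also(entries, ignore)
    assert not mismatches, f"Inconsistent See Also descriptions: {mismatches}"


# Toy data: every owner except RegressorChain is hypothetical.
entries = [
    ("RegressorChain", "ClassifierChain", "Equivalent for classification."),
    ("OtherEstimator", "ClassifierChain", "Placeholder description."),
    ("func_a", "accuracy_score", "Function for calculating the accuracy score."),
    ("func_b", "accuracy_score",
     "Compute the accuracy score. By default, the function will return the "
     "fraction of correct predictions divided by the total number of "
     "predictions."),
]

print(sorted(inconsistent_see_also(entries)))  # only accuracy_score remains
```

Keying the ignore list on the pair keeps the exception narrow: if ClassifierChain later gains a third, unintended description elsewhere, the check still catches it.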


glemaitre · 2022-10-17T17:15:08Z

I am not sure we should treat this issue as a priority. Unlike other documentation issues, in most cases the description here is still correct.

vitaliset · 2022-12-07T01:27:34Z

Thanks for the comment @glemaitre! I made a small pull request related to some doc typos I found here and will close this issue.

If someone does decide to solve this minor issue in the future, one can use the gist to find the mismatches and fix them quickly.

vitaliset added the Documentation and Needs Triage labels on Sep 18, 2022

glemaitre removed the Needs Triage label on Oct 17, 2022

vitaliset mentioned this issue on Dec 7, 2022 in DOC Correcting some small documentation typos #25125 (merged)

vitaliset closed this as completed Dec 7, 2022
