RFECV cross-validation generator (cv) parameter #29554

Closed
bioruffo opened this issue Jul 24, 2024 · 6 comments

@bioruffo
Contributor

bioruffo commented Jul 24, 2024

Describe the issue linked to the documentation

Hello,

If I'm not mistaken, the RFECV documentation for the cv parameter is incorrect about when StratifiedKFold or KFold is chosen if the estimator is a classifier. The docs read:

For integer/None inputs, if y is binary or multiclass, StratifiedKFold is used. If the estimator is a classifier or if y is neither binary nor multiclass, KFold is used.

This matches the text in the _rfe.py file.

I believe the second sentence should say that KFold is used when the estimator is not a classifier. When the estimator is a classifier and y is binary or multiclass, StratifiedKFold is used.

In fact, code-wise, the cv parameter is processed by check_cv, whose docstring states:

        For integer/None inputs, if classifier is True and ``y`` is either
        binary or multiclass, :class:`StratifiedKFold` is used. In all other
        cases, :class:`KFold` is used.

(here classifier is a boolean that is True when the estimator is a classifier).

According to the code of check_cv(), if cv is an integer or None (so a cv method must be chosen), the estimator is a classifier, and y is binary or multiclass, then the cv generator will indeed be a StratifiedKFold:

    cv = 5 if cv is None else cv
    if isinstance(cv, numbers.Integral):
        if (
            classifier
            and (y is not None)
            and (type_of_target(y, input_name="y") in ("binary", "multiclass"))
        ):
            return StratifiedKFold(cv)
        else:
            return KFold(cv)
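
The branch above can be checked directly. This is a minimal sketch, assuming a scikit-learn version where check_cv is importable from sklearn.model_selection and accepts the classifier keyword; the toy target arrays are made up for illustration and are not part of the original report.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, check_cv

# Toy targets, invented for this illustration only.
y_binary = np.array([0, 1, 0, 1, 0, 1, 0, 1, 0, 1])
y_continuous = np.linspace(0.5, 9.5, 10)  # non-integer floats -> "continuous"

# Classifier and binary y: the stratified splitter is chosen.
cv_clf = check_cv(cv=5, y=y_binary, classifier=True)

# Not a classifier: plain KFold, even though y is binary.
cv_not_clf = check_cv(cv=5, y=y_binary, classifier=False)

# Classifier but y is neither binary nor multiclass: plain KFold.
cv_cont = check_cv(cv=5, y=y_continuous, classifier=True)

# cv=None falls back to 5 folds before the same decision is made.
cv_default = check_cv(cv=None, y=y_binary, classifier=True)

print(type(cv_clf).__name__)      # StratifiedKFold
print(type(cv_not_clf).__name__)  # KFold
print(type(cv_cont).__name__)     # KFold
print(type(cv_default).__name__, cv_default.n_splits)  # StratifiedKFold 5
```

Only the first and last calls return a StratifiedKFold, which is consistent with reading the docs sentence as "if the estimator is not a classifier ... KFold is used".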

Thank you!

Suggest a potential alternative/fix

I suggest changing the phrasing "If the estimator is a classifier" to "If the estimator is not a classifier".

@bioruffo added the Documentation and Needs Triage labels Jul 24, 2024

@glemaitre
Member

Yes indeed. This is a typo. @bioruffo Do you wish to do a pull request?

@bioruffo
Contributor Author

Yes, I'll be happy to!

@glemaitre
Member

Cheers.

@Higgs32584
Contributor

Is this closeable?

@bioruffo
Contributor Author

Yes, the typo was corrected in the above-mentioned merge. Thank you!

@Higgs32584
Contributor

Alright, good. You can close it yourself; you don't have to wait for a mod.
