FEA Add RepeatedStratifiedGroupKFold as a new splitter #24227

Open: wants to merge 10 commits into base: main

arvkevi commented Aug 23, 2022

closes #24247

Reference Issues/PRs

This functionality was discussed in #13621.

What does this implement/fix? Explain your changes.

This adds a splitter class to model_selection that repeats StratifiedGroupKFold n times, analogous to how RepeatedKFold and RepeatedStratifiedKFold repeat their base splitters.

Any other comments?

arvkevi changed the title "Add RepeatedStratifiedGroupKFold" to "[MRG] Add RepeatedStratifiedGroupKFold" on Oct 2, 2023
glemaitre changed the title "[MRG] Add RepeatedStratifiedGroupKFold" to "FEA Add RepeatedStratifiedGroupKFold as a new splitter" on Mar 13, 2024
glemaitre self-requested a review on March 13, 2024 at 10:33
github-actions bot commented Mar 13, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 5fcdb1c.

glemaitre (Member)

I'll review this feature. Sorry, @arvkevi, that this PR did not get any attention.

@arvkevi, are you still willing to address my upcoming review, or would you prefer to pass the torch to someone else?

arvkevi (Author) commented Mar 13, 2024

No worries, @glemaitre, I can address the review.

adrinjalali (Member)

I think it makes more sense here to add a repeat kwarg to the constructor, giving a variant of the existing class, instead of adding a new class. Note that we might also merge the existing classes together and deprecate some of them.

glemaitre added this to the 1.6 milestone on May 20, 2024
glemaitre (Member)

@adrinjalali I think we can add the class now, and the deprecation can come as a whole later, when we move to parameters instead of separate classes.

glemaitre (Member)

In general, it looks good. Just a couple of nitpicks.

glemaitre removed their request for review on May 20, 2024 at 11:55
adrinjalali (Member)

It seems silly to me to introduce a class that we already know we'd like to deprecate, when we can simply add an argument to StratifiedGroupKFold.

@@ -1242,6 +1253,39 @@ def test_repeated_stratified_kfold_determinstic_split():
next(splits)


def test_repeated_stratified_group_kfold_determinstic_split():
Member

I think we should add a test to make sure that different random states do not produce the same split.

There is a typo in "deterministic".

Author

Was this what you had in mind @glemaitre?

Fixed the typo (it was also misspelled in test_repeated_stratified_kfold_deterministic_split, so I fixed it there, too).

X = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
y = [1, 1, 1, 0, 0]
groups = [0, 0, 1, 1, 1]
random_state = 1944695409
Member

We could use 0 instead of a fancy number.

Author

We use 0 and 1 now.

* Bump versionadded
* Fixed two typos of deterministic in tests
* Added multiple random_state parameters to tests

adrinjalali (Member) left a comment

I like the functionality, but as I said before, I'd rather not introduce it via a new class.

arvkevi (Author) commented Nov 7, 2024

Thanks for the feedback. Is there a PR or issue discussing the deprecation of the RepeatedKFold and RepeatedStratifiedKFold classes? I could implement something like StratifiedGroupKFold(n_repeats=None), with None as the default, in this PR. But if more substantial changes are planned for the splitter classes, would you want them all done in a separate PR dedicated to refactoring, rather than breaking the pattern here?

adrinjalali (Member)

As for repeating, I agree that n_repeats=None on StratifiedGroupKFold makes a lot of sense.

More generally, we also want to be able to pass strata data to these objects, as discussed in #26821.

Ultimately, we could have a single KFold class that accepts groups and strata as well as an n_repeats argument; the user would set the right metadata requests and it would simply work.

glemaitre removed this from the 1.6 milestone on Nov 7, 2024
Projects: Status: In Progress
Development: successfully merging this pull request may close the issue "Add RepeatedStratifiedGroupKFold".
3 participants