GroupKFold fails in nested cross-validation (similar to #2879) #7646
Comments
In particular, the estimator.fit(X_train, y_train, **fit_params) call in model_selection._validation.py does not include "groups" in fit_params, so it defaults to None. |
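Until groups are forwarded to the inner search, one way around the problem is to write the outer loop by hand and pass the matching slice of groups to the inner search's fit. The following is a minimal sketch, not code from this thread: the synthetic data, the LogisticRegression estimator, the C grid and the group labels are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

# Synthetic data and made-up group labels (assumptions for illustration).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)
groups = np.arange(60) % 6  # six groups of ten samples each

outer_cv = GroupKFold(n_splits=3)
outer_scores = []
for train_idx, test_idx in outer_cv.split(X, y, groups=groups):
    # A fresh inner search per outer fold, also split by group.
    search = GridSearchCV(
        LogisticRegression(),
        param_grid={"C": [0.1, 1.0, 10.0]},
        cv=GroupKFold(n_splits=2),
    )
    # Pass only the groups belonging to the outer training fold,
    # so the inner splitter sees groups aligned with X[train_idx].
    search.fit(X[train_idx], y[train_idx], groups=groups[train_idx])
    # Score the refitted best estimator on the held-out outer fold.
    outer_scores.append(search.score(X[test_idx], y[test_idx]))

outer_scores = np.array(outer_scores)
print(outer_scores)
```

The slice groups[train_idx] is the important part: the inner GroupKFold needs one group label per training sample, which is exactly what the automatic nested call fails to supply.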
Thanks for the report @davidslater. I'm not sure if this is to be fixed for 0.18.1 (it only applies to 0.18, but AFAIK this kind of nested CV wasn't possible before, so it's not exactly a regression), but I've labelled as such. |
I agree we need #4497 to fix this, and I don't see how we could do it for 0.18.1 - except special-casing the |
"need contributor"? I don't think we agree on a fix, do we? |
Or just some kind of routing parameter specific to CV, seeing as we already
|
I really do think that we need #4497 to address this. It's an issue that |
Yes, I don't think this should be tagged 0.18.1... |
Bump. Is there an agreement on how to approach this? |
Is there an easy fix (special casing 'groups') for this? I feel like a
proper one is a while off yet.
|
I encountered what seems to be another form of the same error running the example code for nested CV with GroupKFold as cv technique: This is no problem, as there are other ways to implement nested CV without using Sorry if this is the wrong place, but I'm pretty new to this and didn't want to open a new Issue. |
Any update on this? It has had a WIP PR since 2017 =/ |
Yes, and the general problem is making some good progress, but it's a hard
one.
|
Would it be possible to raise a |
So unless I'm mistaken, it looks like passing |
This is a serious issue for running this kind of splitting. I don't know how it has taken so many years and is still not fixed. |
Hi All, is there any fix or work around on this one? |
Curiously, this does seem to work. I am saying curiously because normally, I'd expect an error due to shape mismatch - we'd be passing a subset of the data to the nested CV, but with the full group and not its corresponding subset. |
Description
The groups parameter in model_selection.cross_val_score() is not propagated into the RandomizedSearchCV.fit() call. This is similar to #2879 and probably best addressed in #4497.
Steps/Code to Reproduce
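A minimal sketch of the kind of call that triggers the failure, not the reporter's original snippet. The synthetic data, the LogisticRegression estimator, the C values and the group labels are assumptions, and error_score="raise" is only available in scikit-learn releases newer than the 0.18 listed under Versions. It also assumes metadata routing is left at its default (disabled).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, RandomizedSearchCV, cross_val_score

# Synthetic data and made-up group labels (assumptions for illustration).
X, y = make_classification(n_samples=40, n_features=5, random_state=0)
groups = np.arange(40) % 4  # four groups of ten samples each

# Inner search that itself needs groups for its GroupKFold splitter.
inner_search = RandomizedSearchCV(
    LogisticRegression(),
    param_distributions={"C": [0.1, 1.0, 10.0]},
    n_iter=2,
    cv=GroupKFold(n_splits=2),
    random_state=0,
)

try:
    # groups reaches the outer splitter only; the inner fit receives none.
    scores = cross_val_score(
        inner_search,
        X,
        y,
        groups=groups,
        cv=GroupKFold(n_splits=2),
        error_score="raise",  # surface the inner failure instead of NaN scores
    )
    nested_cv_failed = False
except ValueError as exc:
    nested_cv_failed = True
    print("nested CV failed:", exc)
```

Swapping both GroupKFold instances for StratifiedKFold should let the same call return an array of two scores, since that splitter does not need groups.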
Expected Results
When StratifiedKFold is used, the output is [ 0.8 0.7]. In general, it should be an array of 2 floats.
Actual Results
Versions
Darwin-15.6.0-x86_64-i386-64bit
('Python', '2.7.11 (default, Jan 22 2016, 08:29:18) \n[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)]')
('NumPy', '1.11.2')
('SciPy', '0.18.1')
('Scikit-Learn', '0.18')