ClassifierChain does not support GroupKFold #11429

Closed
uatach opened this issue Jul 4, 2018 · 3 comments

@uatach
Contributor

uatach commented Jul 4, 2018

from sklearn.datasets import make_multilabel_classification
from sklearn.multioutput import ClassifierChain
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold

X, Y = make_multilabel_classification()
ClassifierChain(LogisticRegression(), 'random', cv=GroupKFold(3)).fit(X, Y)
# ValueError: The 'groups' parameter should not be None.
@adrinjalali
Member

So here's the code with stack trace:

>>> from sklearn.datasets import make_multilabel_classification
>>> from sklearn.multioutput import ClassifierChain
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.model_selection import GroupKFold, KFold
>>> 
>>> X, Y = make_multilabel_classification()
>>> ClassifierChain(LogisticRegression(), 'random',
...         cv=GroupKFold(3)).fit(X, Y)
Traceback (most recent call last):
  File "<console>", line 2, in <module>
  File ".venv/lib/python3.6/site-packages/sklearn/multioutput.py", line 563, in fit
    super(ClassifierChain, self).fit(X, Y)
  File ".venv/lib/python3.6/site-packages/sklearn/multioutput.py", line 432, in fit
    y=y, cv=self.cv)
  File ".venv/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 771, in cross_val_predict
    for train, test in cv.split(X, y, groups))
  File ".venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 779, in __call__
    while self.dispatch_one_batch(iterator):
  File ".venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 620, in dispatch_one_batch
    tasks = BatchedCalls(itertools.islice(iterator, batch_size))
  File ".venv/lib/python3.6/site-packages/sklearn/externals/joblib/parallel.py", line 127, in __init__
    self.items = list(iterator_slice)
  File ".venv/lib/python3.6/site-packages/sklearn/model_selection/_validation.py", line 769, in <genexpr>
    prediction_blocks = parallel(delayed(_fit_and_predict)(
  File ".venv/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 331, in split
    for train, test in super(_BaseKFold, self).split(X, y, groups):
  File ".venv/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 100, in split
    for test_index in self._iter_test_masks(X, y, groups):
  File ".venv/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 112, in _iter_test_masks
    for test_index in self._iter_test_indices(X, y, groups):
  File ".venv/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 504, in _iter_test_indices
    raise ValueError("The 'groups' parameter should not be None.")
ValueError: The 'groups' parameter should not be None.

The issue is that GroupKFold expects a groups argument when it splits the data (each sample must already be assigned to a group, and those assignments are passed in as groups), the same way it is passed to get_n_splits:

>>> group_kfold.get_n_splits(X, y, groups)

ClassifierChain, or rather its parent _BaseChain, calls cross_val_predict, which accepts groups as an input and uses it if provided. However, ClassifierChain and _BaseChain are not aware of groups and therefore don't pass it to cross_val_predict.
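
For illustration, here is a minimal sketch of calling cross_val_predict directly with groups, roughly what the chain would need to do for one label column. The group labels, the choice of label column, and max_iter are assumptions made up for this example; they are not taken from the issue.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_predict

X, Y = make_multilabel_classification(random_state=0)

# Hypothetical group labels: 10 groups, assigned round-robin to the samples.
groups = np.arange(X.shape[0]) % 10

# Use the most balanced label column so every training fold sees both classes.
col = int(np.argmin(np.abs(Y.mean(axis=0) - 0.5)))
y = Y[:, col]

# Passing groups lets GroupKFold build its folds instead of raising ValueError.
preds = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, groups=groups, cv=GroupKFold(3)
)
print(preds.shape)
```

The only difference from the failing call in the traceback is the groups argument, which the chain currently has no way to receive.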

I'm not sure how this is supposed to be solved (I'm just investigating and clarifying the issue, and can then implement the proposed solution). One option is to pass groups to the fit function, which is already done in two places:

scikit-learn/sklearn/model_selection/_search.py:582:    def fit(self, X, y=None, groups=None, **fit_params):
scikit-learn/sklearn/feature_selection/rfe.py:443:    def fit(self, X, y, groups=None):
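
As a rough sketch of that existing pattern, this is how groups reaches the splitter through fit in a grid search. The dataset, parameter grid, and group labels are assumptions chosen only to keep the example small.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, GroupKFold

X, y = make_classification(n_samples=100, random_state=0)

# Hypothetical group labels: 10 groups, assigned round-robin to the samples.
groups = np.arange(X.shape[0]) % 10

search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0]},  # illustrative grid, not from the issue
    cv=GroupKFold(n_splits=3),
)

# groups is given to fit and forwarded to GroupKFold.split internally.
search.fit(X, y, groups=groups)
print(search.best_params_)
```

A ClassifierChain.fit that accepted groups in the same way could forward it to cross_val_predict.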

Any alternatives?

@jnothman
Member

jnothman commented Jul 29, 2018 via email

@adrinjalali
Member

Now supported with metadata routing.
