Exception in LogisticRegressionCV #28178
Comments
Found the same issue some years ago: #15389
@glemaitre mentioned that this can be fixed with:
Does the order of labels matter? For example, if y = [2, 2, 1, 1, 0, 0], will passing labels=[0, 1, 2] and labels=[2, 1, 0] both work correctly?
The log_loss documentation says that "The labels in y_pred are assumed to be ordered alphabetically, as done by LabelBinarizer". Does this mean lexicographical sorting, even for integer labels? For example, is [0, 10, 9] a correct ordering?
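To make the two ordering questions concrete, here is a small sketch. It assumes (this is not stated in the thread) that scikit-learn derives class order from an ascending sort of the distinct label values, as `np.unique` does. Python's built-in `sorted` is used as a stand-in for that sort.

```python
# Assumption: class order is an ascending sort of the distinct labels
# (what np.unique returns); sorted() stands in for it here.
int_labels = [0, 10, 9]
str_labels = ["0", "10", "9"]

sorted_ints = sorted(set(int_labels))  # integers compare numerically
sorted_strs = sorted(set(str_labels))  # strings compare character by character

# The order in which the label set is written down does not change the result.
same_classes = sorted([2, 1, 0]) == sorted([0, 1, 2])

print(sorted_ints)   # [0, 9, 10]
print(sorted_strs)   # ['0', '10', '9']
print(same_classes)  # True
```

Under that assumption, [0, 10, 9] would only be the expected column order if the labels were strings. For integer labels the columns of y_pred would follow [0, 9, 10].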
The issue is not the
import numpy as np
from sklearn.model_selection import StratifiedKFold
y = np.array(y)
cv = StratifiedKFold(n_splits=5)
# Print the labels that land in the train and test part of each split
for train_idx, test_idx in cv.split(X, y):
    print(y[train_idx])
    print(y[test_idx])
The first 4 splits get all the classes at fit so the different
So I am not sure that we can do much here because this is a really ill-posed problem.
@glemaitre is it possible to append zero output probabilities for classes not present in the training set?
I don't see an easy way to do so without it being a hack.
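For illustration only, this is roughly what such padding could look like when done by hand, outside the library. The helper `pad_proba` and its arguments are hypothetical names, not part of scikit-learn, and the output is assumed to use columns in ascending label order.

```python
def pad_proba(proba_rows, fitted_classes, all_classes):
    """Expand each probability row to cover all_classes.

    Classes the model never saw during fit get probability 0.0.
    Hypothetical helper, not a scikit-learn function.
    """
    ordered = sorted(all_classes)
    # Column position of each class the model was actually fitted on
    col = {c: j for j, c in enumerate(fitted_classes)}
    return [
        [row[col[c]] if c in col else 0.0 for c in ordered]
        for row in proba_rows
    ]


# A model fitted on classes 0 and 2 only; class 1 was absent from the fold.
proba = [[0.7, 0.3], [0.2, 0.8]]
padded = pad_proba(proba, fitted_classes=[0, 2], all_classes=[0, 1, 2])
print(padded)  # [[0.7, 0.0, 0.3], [0.2, 0.0, 0.8]]
```

A zero probability for a class that does appear in the validation fold gives an unbounded log loss unless the probabilities are clipped, which is part of why this reads as a workaround rather than a fix.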
Describe the bug
The code provided below raises a ValueError. I guess the problem is that minority classes may be missing from the train or validation set in some folds during internal cross-validation, even with a stratified split. This produces errors with some metrics other than the default (accuracy).
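A rough sketch of how a class can drop out of a fold even when the split is stratified. The data here is hypothetical, and `round_robin_folds` is a simplified stand-in for stratified splitting, not scikit-learn's actual algorithm.

```python
from collections import defaultdict


def round_robin_folds(y, n_splits):
    """Assign sample indices to test folds class by class, round-robin.

    Simplified stand-in for a stratified split (hypothetical helper).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for label in sorted(by_class):
        for pos, idx in enumerate(by_class[label]):
            folds[pos % n_splits].append(idx)
    return folds


# Hypothetical data: class 2 has only 4 samples, but there are 5 folds.
y = [0] * 10 + [1] * 10 + [2] * 4
folds = round_robin_folds(y, n_splits=5)

for k, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_classes = sorted({y[i] for i in range(len(y)) if i not in held_out})
    test_classes = sorted({y[i] for i in test_idx})
    print(k, train_classes, test_classes)
```

With fewer samples in a class than folds, at least one test fold cannot contain that class. The model is then fitted on three classes and scored on two.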
One solution may be to set the log-probability to -inf for classes not present in the train set, and to provide the labels argument. What is the simplest way to fix this?
Steps/Code to Reproduce
Expected Results
No exception thrown
Actual Results
ValueError: y_true and y_pred contain different number of classes 2, 3. Please provide the true labels explicitly through the labels argument. Classes found in y_true: [0 1]
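The message says y_true holds 2 classes while y_pred has 3 columns. Below is a minimal pure-Python sketch of why passing the full label set resolves the mismatch. `log_loss_with_labels` is a hypothetical helper written for this note, not scikit-learn's implementation, and the clipping constant is an assumption.

```python
import math


def log_loss_with_labels(y_true, y_pred, labels):
    """Mean negative log-likelihood of the true class.

    Columns of y_pred are assumed to follow sorted(labels).
    Hypothetical helper for illustration only.
    """
    classes = sorted(labels)
    if any(len(row) != len(classes) for row in y_pred):
        raise ValueError("y_pred column count must match the number of labels")
    col = {c: j for j, c in enumerate(classes)}
    eps = 1e-15  # assumed clipping constant, avoids log(0)
    total = 0.0
    for true_label, row in zip(y_true, y_pred):
        p = min(max(row[col[true_label]], eps), 1.0)
        total -= math.log(p)
    return total / len(y_true)


# The validation fold only contains classes 0 and 1, but the model
# was fitted on three classes, so each row has three probabilities.
y_true = [0, 1, 1, 0]
y_pred = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.6, 0.3],
    [0.5, 0.4, 0.1],
]

# Inferring the classes from y_true alone gives 2 labels for 3 columns.
try:
    log_loss_with_labels(y_true, y_pred, labels=set(y_true))
    mismatch_detected = False
except ValueError:
    mismatch_detected = True

# Supplying the full label set makes the columns line up.
loss = log_loss_with_labels(y_true, y_pred, labels=[0, 1, 2])
print(mismatch_detected, loss)
```

In scikit-learn the corresponding knob is the labels argument of log_loss that the error message refers to.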
Versions