LogisticRegressionCV does not handle sample weights as expected when using liblinear solver #29416
Comments
Thanks for the report. I agree that the reproducer makes sense: repeating the samples within the first CV group should be equivalent to weighting the samples within that same CV group.
@snath-xoc do you understand why the reproducer fails with |
Maybe you could make the test harder with a larger number of values for the grid of Cs, e.g. Then before checking the values of the np.testing.assert_allclose(est_weighted.C_, est_repeated.C_) |
Also to make the test faster to execute and harder to pass at the same time, it would make sense to generate a dataset with a lower number of data points, e.g. EDIT: I tried to edit the reproducer to only use |
Actually we can detect the problem for "lbfgs" as well by comparing the values for the np.testing.assert_allclose(est_weighted.scores_[1], est_repeated.scores_[1]) and this happens also for small values of |
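One way to see why the scores could differ is the note in the issue description that the scorer does not receive sample_weight. The following is a minimal pure-Python sketch, not scikit-learn code: the accuracy helper and the toy labels and weights are hypothetical, chosen only to show that a weighted score matches the score on repeated samples while an unweighted score does not.

```python
def accuracy(y_true, y_pred, sample_weight=None):
    """Weighted fraction of correct predictions (hypothetical helper)."""
    if sample_weight is None:
        sample_weight = [1] * len(y_true)
    correct = sum(w for t, p, w in zip(y_true, y_pred, sample_weight) if t == p)
    return correct / sum(sample_weight)


# Toy data (made up for illustration): the third sample is misclassified
# and carries a weight of 3.
y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]
weights = [1, 1, 3, 1]

# Repeat each label as many times as its integer weight.
y_true_rep = [t for t, w in zip(y_true, weights) for _ in range(w)]
y_pred_rep = [p for p, w in zip(y_pred, weights) for _ in range(w)]

score_repeated = accuracy(y_true_rep, y_pred_rep)    # 3 correct out of 6
score_weighted = accuracy(y_true, y_pred, weights)   # weight 3 correct out of 6
score_unweighted = accuracy(y_true, y_pred)          # 3 correct out of 4

print(score_repeated, score_weighted, score_unweighted)
```

If the weights are dropped at scoring time, the per-fold scores of the weighted fit correspond to score_unweighted rather than score_repeated, which would be enough to make the two scores_ arrays disagree.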
And I confirm that the changes in #29419 do fix the assertion failure on the |
Note: this is a special case of the wider problem described in:
Describe the bug
_log_reg_scoring_path used within LogisticRegressionCV with liblinear solver not returning the same coefficients when weighting samples using sample_weight versus when repeating samples based on weights.
NOTE: L801 in _log_reg_scoring_path does not pass sample_weight into scorer when scorer is not specified, needs fixing.
Steps/Code to Reproduce
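As a minimal sketch of the weighted-versus-repeated equivalence that the report relies on (this is not the reporter's original reproducer), the check below uses a plain LogisticRegression with the lbfgs solver and no cross-validation. The dataset size, the integer weights, the tight tol, and the comparison tolerances are all assumptions made for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)

# Small synthetic dataset (sizes are illustrative assumptions).
X, y = make_classification(n_samples=60, n_features=5, random_state=0)

# Integer sample weights in {1, 2, 3}.
sample_weight = rng.randint(1, 4, size=X.shape[0])

# Build the "repeated" dataset: each row appears as many times as its weight.
X_repeated = np.repeat(X, sample_weight, axis=0)
y_repeated = np.repeat(y, sample_weight)

# Tight tolerance so both fits converge to (nearly) the same optimum.
params = dict(solver="lbfgs", C=1.0, tol=1e-10, max_iter=10_000)

est_weighted = LogisticRegression(**params).fit(X, y, sample_weight=sample_weight)
est_repeated = LogisticRegression(**params).fit(X_repeated, y_repeated)

# The two fits minimize the same objective, so the coefficients should agree
# up to optimizer tolerance.
max_diff = np.max(np.abs(est_weighted.coef_ - est_repeated.coef_))
print("max abs coefficient difference:", max_diff)
```

The same comparison applied to LogisticRegressionCV (on C_, coef_ and scores_) is what the issue reports as failing; that case additionally depends on how the CV splits and the scorer handle the weights.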
Expected Results
No error is thrown
Actual Results
Versions