Thanks to visit codestin.com
Credit goes to github.com

Skip to content

[MRG+1] ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' #11905

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 25 commits into from
Aug 26, 2018

Conversation

jnothman
Copy link
Member

@jnothman jnothman commented Aug 24, 2018

Adds multi_class='auto'

        If the option chosen is 'ovr', then a binary problem is fit for each
        label. For 'multinomial' the loss minimised is the multinomial loss fit
        across the entire probability distribution, *even when the data is
        binary*. 'multinomial' is unavailable when solver='liblinear'.
        'auto' selects 'ovr' if the data is binary, or if solver='liblinear',
        and otherwise selects 'multinomial'.

'auto' will be default from 0.22. Also sets solver='lbfgs' to default from 0.22.

This is okay to merge for RC, but I've not yet done a couple of things we should do for the final version:

  • don't warn when multi_class='warn' but y is binary
  • don't explicitly specify multi_class in docs/examples/tests when warning will therefore be silenced.

I may not have time to finish this for a couple of days.

Supersedes and closes #11476
Supersedes and closes #10001
Closes #9997

@jnothman jnothman changed the title ENH add multi_class='auto' for LogisticRegression, default from 0.22 ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' Aug 24, 2018
Copy link
Member

@rth rth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I think the multi_class= 'auto' is indeed better than the earlier alternatives.

+1 for merging and fixing the remaining points mentioned in the description after the RC.

@amueller or @agramfort would you be able to have a look?

:class:`linear_model.LogisticRegression` will change respectively from
``'liblinear'`` and ``'ovr'`` in version 0.20 to ``'lbfgs'`` and
``'auto'`` in version 0.22. A FutureWarning is raised when the default
values are used. :issue:`11476` by `Tom Dupre la Tour`_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add yourself

@@ -654,7 +688,7 @@ def logistic_regression_path(X, y, pos_class=None, Cs=10, fit_intercept=True,
le = LabelEncoder()
Y_multi = le.fit_transform(y).astype(X.dtype, copy=False)

w0 = np.zeros((classes.size, n_features + int(fit_intercept)),
w0 = np.zeros((max(2, classes.size), n_features + int(fit_intercept)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amueller 's #11476 (comment) still apply here I think

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this can be safely reverted. I think we block classes.size < 2...


if multi_class == 'auto':
multi_class = ('ovr' if coefs[0].shape[0] <= 2 or solver == 'liblinear'
else 'multinomial')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we use _check_multi_class here instead of the above line, or is that redundant?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And this can be safely removed. _log_reg_scoring_path assumes pre-validation.

@jnothman
Copy link
Member Author

If you've verified that you're happy with the new tests, I think we can just merge and get the release on its way.

@jnothman
Copy link
Member Author

btw, I did change it to not warn upon binary, but I've not fixed up the examples, docs, tests to not pass redundantly.

@jnothman
Copy link
Member Author

Oh, this didn't change any examples, so...

Copy link
Member Author

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ready for merge.

@jnothman jnothman changed the title ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' [MRG+1] ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' Aug 25, 2018
y_bin = y_multi == 0
est_auto_bin = fit(X, y_bin, multi_class='auto', solver=solver)
est_ovr_bin = fit(X, y_bin, multi_class='ovr', solver=solver)
assert np.allclose(est_auto_bin.coef_, est_ovr_bin.coef_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not critical, but I think assert_allclose produces better trackbacks in the case of failure, also it has a mores strict default relative and absolute tolerances we might want to use one or the other but not mix both.

solver=solver).coef_)
assert not np.allclose(est_auto_bin.coef_,
fit(X, y_multi, multi_class='multinomial',
solver=solver).coef_)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall I think these tests are fairly thorough!

@rth
Copy link
Member

rth commented Aug 26, 2018

Merging with a +2 in the parent PR (#11476) and another +1 from #11476 (comment) for the added changes.

Thanks a lot @jnothman for all your work on making the RC happen!

@rth rth merged commit 7ed61a2 into scikit-learn:master Aug 26, 2018
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Sep 10, 2018
* tag '0.20rc1': (1109 commits)
  MNT rc version
  DOC Release dates for 0.20 (scikit-learn#11838)
  DOC Fix: require n_splits > 1 in TimeSeriesSplit (scikit-learn#11937)
  FIX xfail for MacOS LogisticRegressionCV stability (scikit-learn#11936)
  MNT: Use GEMV in enet_coordinate_descent (Pt. 1) (scikit-learn#11896)
  [MRG] TST/FIX stop optics reachability failure on 32bit (scikit-learn#11916)
  ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' (scikit-learn#11905)
  MAINT Fix test_logistic::test_dtype_match failure on 32 bit arch (scikit-learn#11899)
  DOC Updated link to Laurens van der Maaten's home page (scikit-learn#11907)
  DOC Remove stray backtick in /doc/modules/feature_extraction.rst (scikit-learn#11910)
  Deprecate min_samples_leaf and min_weight_fraction_leaf (scikit-learn#11870)
  MNT modify test_sparse_oneclasssvm to be parametrized (scikit-learn#11894)
  EXA set figure size to avoid overlaps (scikit-learn#11889)
  MRG/REL fixes /skips for 32bit tests (scikit-learn#11879)
  add durations=20 to makefile to show test runtimes locally (scikit-learn#11147)
  DOC loss='l2' is no longer accpeted in l1_min_c
  DOC add note about brute force nearest neighbors for string data (scikit-learn#11884)
  DOC Change sign of energy in RBM (scikit-learn#11156)
  RFC try to warn on iid less often (scikit-learn#11613)
  DOC reduce plot_gpr_prior_posterior.py warnings(scikit-learn#11664)
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Sep 10, 2018
* releases: (1109 commits)
  MNT rc version
  DOC Release dates for 0.20 (scikit-learn#11838)
  DOC Fix: require n_splits > 1 in TimeSeriesSplit (scikit-learn#11937)
  FIX xfail for MacOS LogisticRegressionCV stability (scikit-learn#11936)
  MNT: Use GEMV in enet_coordinate_descent (Pt. 1) (scikit-learn#11896)
  [MRG] TST/FIX stop optics reachability failure on 32bit (scikit-learn#11916)
  ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' (scikit-learn#11905)
  MAINT Fix test_logistic::test_dtype_match failure on 32 bit arch (scikit-learn#11899)
  DOC Updated link to Laurens van der Maaten's home page (scikit-learn#11907)
  DOC Remove stray backtick in /doc/modules/feature_extraction.rst (scikit-learn#11910)
  Deprecate min_samples_leaf and min_weight_fraction_leaf (scikit-learn#11870)
  MNT modify test_sparse_oneclasssvm to be parametrized (scikit-learn#11894)
  EXA set figure size to avoid overlaps (scikit-learn#11889)
  MRG/REL fixes /skips for 32bit tests (scikit-learn#11879)
  add durations=20 to makefile to show test runtimes locally (scikit-learn#11147)
  DOC loss='l2' is no longer accpeted in l1_min_c
  DOC add note about brute force nearest neighbors for string data (scikit-learn#11884)
  DOC Change sign of energy in RBM (scikit-learn#11156)
  RFC try to warn on iid less often (scikit-learn#11613)
  DOC reduce plot_gpr_prior_posterior.py warnings(scikit-learn#11664)
  ...
yarikoptic added a commit to yarikoptic/scikit-learn that referenced this pull request Sep 10, 2018
* dfsg: (1109 commits)
  MNT rc version
  DOC Release dates for 0.20 (scikit-learn#11838)
  DOC Fix: require n_splits > 1 in TimeSeriesSplit (scikit-learn#11937)
  FIX xfail for MacOS LogisticRegressionCV stability (scikit-learn#11936)
  MNT: Use GEMV in enet_coordinate_descent (Pt. 1) (scikit-learn#11896)
  [MRG] TST/FIX stop optics reachability failure on 32bit (scikit-learn#11916)
  ENH add multi_class='auto' for LogisticRegression, default from 0.22; default solver will be 'lbfgs' (scikit-learn#11905)
  MAINT Fix test_logistic::test_dtype_match failure on 32 bit arch (scikit-learn#11899)
  DOC Updated link to Laurens van der Maaten's home page (scikit-learn#11907)
  DOC Remove stray backtick in /doc/modules/feature_extraction.rst (scikit-learn#11910)
  Deprecate min_samples_leaf and min_weight_fraction_leaf (scikit-learn#11870)
  MNT modify test_sparse_oneclasssvm to be parametrized (scikit-learn#11894)
  EXA set figure size to avoid overlaps (scikit-learn#11889)
  MRG/REL fixes /skips for 32bit tests (scikit-learn#11879)
  add durations=20 to makefile to show test runtimes locally (scikit-learn#11147)
  DOC loss='l2' is no longer accpeted in l1_min_c
  DOC add note about brute force nearest neighbors for string data (scikit-learn#11884)
  DOC Change sign of energy in RBM (scikit-learn#11156)
  RFC try to warn on iid less often (scikit-learn#11613)
  DOC reduce plot_gpr_prior_posterior.py warnings(scikit-learn#11664)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

change LogisticRegression default solver to lbfgs and multiclass to multinomial?
3 participants