ENH Enable the "sufficient stats" mode of LARS #11699

Merged: 66 commits into scikit-learn:master on Mar 6, 2019

Conversation

@yukuairoy (Contributor) commented Jul 27, 2018

What does this implement/fix? Explain your changes.

We'd like to enable a "Gram and covariance matrix" based mode of the LARS algorithm in lars_path(...). As the original paper by B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani (2004) documents, the LARS algorithm can run from the sufficient statistics alone: the Gram matrix, the covariance vector (Xy), and the sample size.

We'd like to add a lars_path_gram(...) function so that users can run LARS when they know only these sufficient statistics and not the original data X and y.

Additional tests have been added to ensure the new lars_path_gram(...) function works as intended.
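
To make the intended usage concrete, here is a minimal sketch; keyword arguments are used so it does not depend on the exact positional order of the new function's signature discussed below:

```python
# Minimal usage sketch of the proposed sufficient-statistics mode.
# Keyword arguments are used so the sketch does not depend on the exact
# positional order of the lars_path_gram signature discussed in this thread.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import lars_path_gram  # the function added by this PR

X, y = make_regression(n_samples=100, n_features=10, random_state=0)

# The sufficient statistics: Gram matrix, covariance vector, sample size.
Gram = X.T @ X          # shape (n_features, n_features)
Xy = X.T @ y            # shape (n_features,)
n_samples = X.shape[0]

# No X or y needed once the sufficient statistics are known.
alphas, active, coefs = lars_path_gram(Xy=Xy, Gram=Gram, n_samples=n_samples)
```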

@agramfort (Member)

@yukuairoy you can already pass Gram and Xy as parameters to lars_path. Why is that not enough?

Also, I see a number of cosmetic changes to existing code in this PR. Avoid this when possible so that the diff is limited to the new feature code.

@agramfort (Member)

@yukuairoy can you answer my question above?

@yukuairoy (Contributor, Author) commented Jul 29, 2018

@agramfort Thank you for looking at this PR. Currently, users have to pass non-None X and y for lars_path() to work. The "sufficient stats" mode lets users skip providing X and y (passing None as placeholders) and supply the sufficient statistics instead. In addition to the Gram matrix and the covariance vector (Xy), n_samples is needed to complete the set of sufficient statistics. Since n_samples was not part of the lars_path() signature, we also need to add it for this mode to work.

I've sent a separate PR (#11703) to fix the cosmetic issues and I'll push another commit to make sure this current PR is only about the change in functionality.

@jnothman (Member) commented Jul 29, 2018 via email

@yukuairoy (Contributor, Author)

@jnothman Thank you for the comment. Could you clarify what you mean by "n_samples = Xy.shape[0]???"? I cannot find this line in my diff. Or are you suggesting that n_samples can be inferred from Xy?

Please correct me if I'm wrong, but I think Xy.shape[0] gives us n_features rather than n_samples. In fact, n_samples cannot be inferred from either Gram or Xy; it has to be supplied explicitly by the user.
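
A quick sketch of the point about shapes (illustrative only):

```python
# Illustration: Xy has length n_features, so neither its shape nor Gram's
# reveals n_samples.
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features = 200, 5
X = rng.randn(n_samples, n_features)
y = rng.randn(n_samples)

Gram = X.T @ X   # shape (5, 5)  -> (n_features, n_features)
Xy = X.T @ y     # shape (5,)    -> (n_features,)

print(Xy.shape[0])  # prints 5, i.e. n_features, not n_samples
# n_samples (200) is not recoverable from Gram or Xy; it must be supplied
# explicitly.
```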

@agramfort (Member)

Thinking about it, I think it would be cleaner to have a new lars_path_gram function that takes Gram and Xy as input (no X or y) and to deprecate the option to pass Gram and Xy to lars_path. That would simplify the API of lars_path.

@jnothman (Member) commented Jul 31, 2018 via email

@yukuairoy (Contributor, Author)

Hi @jnothman, thanks very much for bringing this up. It looks like there is indeed a documentation bug in the parameter description of Xy on the master branch. After this PR is approved and merged, I can open another PR to fix that documentation issue if it helps the community.

@yukuairoy (Contributor, Author)

Hi @agramfort, thanks very much for the suggestion. For a bit more context: in many existing client codebases we know of (and in several cases exemplified by the unit tests), users invoke the "precomputed" mode of lars_path(...), i.e. they pass X and y alongside Gram and Xy, which are literally precomputed. Under the hood, knowing X directly may let the solver use slightly more efficient numerical linear-algebra routines than when X is unknown. This is consistent with the fact that the numerical output of the "precomputed" mode is not always exactly equal to that of the raw mode (some examples can be found in the test cases; more exist in client codebases, though those are not visible from GitHub). If we deprecate the option to pass Gram and Xy to lars_path(...), the concern is that it will likely break many of those existing client codebases, which does not seem necessary.
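
For illustration, a minimal sketch of the "precomputed" call pattern described above, using the existing lars_path signature quoted later in this thread:

```python
# Sketch of the "precomputed" mode: X and y are passed together with
# precomputed Gram and Xy so lars_path does not recompute them.
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.RandomState(0)
X = rng.randn(50, 8)
y = rng.randn(50)

Gram = X.T @ X
Xy = X.T @ y

# Raw mode: Gram and Xy are computed internally.
alphas_raw, _, coefs_raw = lars_path(X, y, method='lasso')

# Precomputed mode: the same data plus the precomputed statistics.
alphas_pre, _, coefs_pre = lars_path(X, y, Xy=Xy, Gram=Gram, method='lasso')

# The two modes agree to numerical precision, though as noted above they
# are not guaranteed to be bitwise identical.
print(np.allclose(coefs_raw, coefs_pre))
```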

@agramfort (Member)

@yukuairoy you will have to change the code of your clients anyway to support what you are aiming for.

When parameters become required or optional depending on other parameters, documenting the API starts to become a mess. With the current master you always need X and y, and you can pass precomputed values to avoid recomputation. With what you propose, X and y can be None, but then we need to pass n_samples. It starts to be a mess, I think.

I would prefer to have

lars_path(X, y, precompute='auto' | True | False), which would follow the Lars API: http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lars.html

and

lars_path_gram(Gram, Xy, n_samples)

Of course, we should do this without code duplication, via a private function.
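
One possible way to structure that split without duplicating code (a sketch only; the helper name below is illustrative, not a claim about the final implementation):

```python
# Sketch of the proposed split: two thin public entry points delegating to
# one shared private solver. The name _lars_path_solver is illustrative.

def _lars_path_solver(X, y, Xy=None, Gram=None, n_samples=None, **kwargs):
    """Shared LARS solver, working from either (X, y) or (Gram, Xy, n_samples)."""
    ...  # the actual LARS iterations live here

def lars_path(X, y, Xy=None, Gram=None, **kwargs):
    """Raw-data entry point; n_samples is read off X."""
    return _lars_path_solver(X, y, Xy=Xy, Gram=Gram,
                             n_samples=X.shape[0], **kwargs)

def lars_path_gram(Gram, Xy, n_samples, **kwargs):
    """Sufficient-statistics entry point; no X or y required."""
    return _lars_path_solver(None, None, Xy=Xy, Gram=Gram,
                             n_samples=n_samples, **kwargs)
```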

@yukuairoy (Contributor, Author)

@agramfort I agree that we should keep the two modes separate. My only concern with your suggestion of lars_path(X, y, precompute='auto' | True | False) is that, to support the precompute mode, lars_path() would still need to accept Gram and Xy in addition to the precompute parameter, which makes the precompute parameter redundant.

How about we keep the original

lars_path(X, y, Xy=None, Gram=None, max_iter=500, alpha_min=0, method='lar', copy_X=True, eps=np.finfo(np.float).eps, copy_Gram=True, verbose=0, return_path=True, return_n_iter=False, positive=False)

intact and add an additional

lars_path_gram(Gram, Xy, n_samples, max_iter=500, alpha_min=0, method='lar', copy_X=True, eps=np.finfo(np.float).eps, copy_Gram=True, verbose=0, return_path=True, return_n_iter=False, positive=False)?

This way we get to keep backward compatibility. Of course we'll use a private function to avoid code duplication.
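
A minimal sketch (not the PR's actual test code) of the equivalence between the two entry points, assuming the lars_path_gram signature above and using keyword arguments:

```python
# Sketch of the equivalence check the added tests aim for: running LARS on
# (X, y) and on the sufficient statistics should trace the same path.
import numpy as np
from sklearn.linear_model import lars_path, lars_path_gram

rng = np.random.RandomState(42)
X = rng.randn(60, 6)
y = rng.randn(60)

alphas1, active1, coefs1 = lars_path(X, y, method='lasso')
alphas2, active2, coefs2 = lars_path_gram(Xy=X.T @ y, Gram=X.T @ X,
                                          n_samples=X.shape[0],
                                          method='lasso')

# Agreement up to numerical precision, per the discussion above.
assert np.allclose(alphas1, alphas2)
assert np.allclose(coefs1, coefs2)
```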

@yukuairoy (Contributor, Author)

Thanks @agramfort and @jnothman for the comments. I've updated the code. Please take a look.

@jnothman changed the title from Enable the "sufficient stats" mode of LARS to ENH Enable the "sufficient stats" mode of LARS on Jan 8, 2019
@yukuairoy (Contributor, Author)

@agramfort Do you have further comments?

@yukuairoy (Contributor, Author)

@agramfort Friendly ping

@yukuairoy (Contributor, Author)

@agramfort can you please review the current version?

@yukuairoy (Contributor, Author) commented Feb 4, 2019

@jnothman Thanks for reviewing this pull request. Is there anything we can do to make sure @agramfort reviews the latest changes?

@agramfort (Member)

@yukuairoy we need a what's new update before merging.

@yukuairoy (Contributor, Author)

@agramfort @jnothman thanks for the LGTM. I've updated the What's New entry.

@jnothman (Member) commented Mar 6, 2019

Thanks @yukuairoy!

@jnothman merged commit fec7670 into scikit-learn:master on Mar 6, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019