ENH Enable the "sufficient stats" mode of LARS #11699
@yukuairoy you can already pass Gram and Xy as parameters to lars_path. Why is that not enough? Also, I see a number of cosmetic changes to existing code in this PR. Avoid this when possible. |
@yukuairoy can you answer my question above? |
@agramfort Thank you for looking at this PR. Previously users have to pass in non-None I've sent a separate PR (#11703) to fix the cosmetic issues and I'll push another commit to make sure this current PR is only about the change in functionality. |
n_samples = Xy.shape[0]???
|
@jnothman Thank you for the comment. Could you clarify what you mean by "n_samples = Xy.shape[0]???"? I cannot find this line in my diff. Or are you suggesting that Please correct me if I'm wrong, I'd think |
Thinking about it, I think it would be cleaner to have a new lars_path_gram function that takes Gram and Xy as input (no X or y) and to deprecate the option to pass Gram and Xy to lars_path. That would simplify the API of lars_path. |
Right. The parameter documentation of Xy says n_samples, but should say n_features.
|
Hi @jnothman, thanks very much for bringing this up. It looks like on the master branch there is indeed a documentation "bug" in the parameter description of |
Hi @agramfort, thanks very much for this nice thought. To provide a bit more context, in many of the existing client codebases we know of (as well as several cases exemplified by the unit-test code), users sometimes like to invoke the so-called "precomputed" mode of the |
@yukuairoy you will have to change the code of your clients anyway to support what you are aiming for. When you start to have parameters that are necessary or optional depending on other parameters, documenting the API starts to be a mess. With the current master you always need X and y, and you can pass precomputed values to avoid recomputation. With what you propose, X and y can be None, but then we need to pass n_samples. It starts to be a mess, I think. I would prefer to have
and
of course we should do this without code duplication via a private function. |
@agramfort I agree that we should keep the two modes separate. My only concern with your suggestion of How about we keep the original
intact and add an additional
This way we get to keep backward compatibility. Of course we'll use a private function to avoid code duplication. |
Thanks @agramfort and @jnothman for the comments. I've updated the code. Please take a look. |
@agramfort Do you have further comments? |
@agramfort Friendly ping |
@agramfort can you please review the current version? |
@jnothman Thanks for reviewing this merge request. Is there anything we can do to make sure @agramfort reviews the latest changes? |
@yukuairoy we need a what's new update before merging. |
@agramfort @jnothman thanks for LGTM. I've updated the What's New. |
Thanks @yukuairoy! |
What does this implement/fix? Explain your changes.
We'd like to enable the "gram and cov matrix" based mode of the LARS algorithm in the function lars_path(...). As the original paper of B. Efron, T. Hastie, I. Johnstone, R. Tibshirani (2004) documented, as long as we know the sufficient statistics, in this case, the Gram matrix, the Cov vector (Xy) and sample size, the LARS algorithm will be able to work.
We'd like to add a lars_path_gram(...) function to allow users to run through it even if they only know the sufficient statistics but not the original data X and y.
Additional tests have been added to ensure the new lars_path_gram(...) function works as intended.
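For illustration, here is a minimal sketch of the intended usage. It assumes a scikit-learn release that ships lars_path_gram with the arguments Xy, Gram and n_samples; the synthetic data, the coefficient vector and the variable names are made up for the example. The raw X and y are used only to build the sufficient statistics and to compute a reference path for comparison.

```python
import numpy as np
from sklearn.linear_model import lars_path, lars_path_gram

# Synthetic data (an assumption for this sketch), used only to build
# the sufficient statistics and a reference path.
rng = np.random.RandomState(0)
n_samples, n_features = 50, 5
X = rng.randn(n_samples, n_features)
true_coef = np.array([1.5, 0.0, -2.0, 0.0, 0.5])
y = X @ true_coef + 0.1 * rng.randn(n_samples)

# Sufficient statistics: Gram matrix, X^T y, and the sample size.
Gram = X.T @ X  # shape (n_features, n_features)
Xy = X.T @ y    # shape (n_features,)

# Path computed from the sufficient statistics alone (no X, no y).
alphas_g, active_g, coefs_g = lars_path_gram(
    Xy=Xy, Gram=Gram, n_samples=n_samples, method="lar"
)

# Reference path computed from the raw data.
alphas, active, coefs = lars_path(X, y, method="lar")

# The two paths should agree up to floating-point error.
print(np.allclose(coefs_g, coefs, atol=1e-6))
```

The sample size has to be passed explicitly because it cannot be recovered from Gram and Xy, both of which have shapes that depend only on n_features.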