[MRG+1] Fix BIC/AIC for Lasso #9022
-    with np.errstate(divide='ignore'):
-        self.criterion_ = n_samples * np.log(mean_squared_error) + K * df
+    eps64 = np.finfo('float64').eps
+    self.criterion_ = (n_samples * mean_squared_error / (sigma2 + eps64) +
I added eps64 to avoid a division by zero when sigma2 is 0.
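The fixed computation can be sketched in plain NumPy as below. This is a minimal sketch, not the actual LassoLarsIC code: the helper name `lasso_ic_criterion` is hypothetical, the coefficient path is assumed to have shape (n_features, n_alphas) as in the diff, and `sigma2 = np.var(y)` is an assumption since its definition is not shown in the hunk. The usage line shows that eps64 keeps the criterion finite for a constant target, where the variance is exactly zero.

```python
import numpy as np


def lasso_ic_criterion(X, y, coef_path, criterion="aic"):
    """Hypothetical helper, not the scikit-learn API.

    Information criterion along a coefficient path, in the form used by
    the fix: n_samples * MSE / sigma2 + K * df. Taking sigma2 = np.var(y)
    is an assumption of this sketch.
    """
    n_samples = X.shape[0]
    if criterion == "aic":
        K = 2.0  # AIC: penalty of 2 per degree of freedom
    elif criterion == "bic":
        K = np.log(n_samples)  # BIC: penalty of log(n) per degree of freedom
    else:
        raise ValueError("criterion should be either bic or aic")

    residuals = y[:, np.newaxis] - X @ coef_path  # one column per alpha
    mean_squared_error = np.mean(residuals ** 2, axis=0)
    sigma2 = np.var(y)  # assumed noise-variance normalizer

    # df of a lasso fit: number of non-zero coefficients (Zou et al., 2007)
    tol = np.finfo(coef_path.dtype).eps
    df = np.count_nonzero(np.abs(coef_path) > tol, axis=0)

    eps64 = np.finfo("float64").eps  # keeps the division finite if sigma2 == 0
    return n_samples * mean_squared_error / (sigma2 + eps64) + K * df


# Usage: a constant target has zero variance, yet the criterion stays finite.
rng = np.random.RandomState(0)
X = rng.randn(20, 3)
coef_path = np.zeros((3, 2))
print(np.isfinite(lasso_ic_criterion(X, np.ones(20), coef_path)).all())  # True
```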
OK, it took me a while to understand that the reason we don't need the log is that n_samples * MSE / sigma2 is actually already -2 times the Gaussian log-likelihood, up to an additive constant.
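To make that point concrete: for i.i.d. Gaussian residuals with known variance sigma2, -2 log L = n log(2 pi sigma2) + RSS / sigma2, and RSS / sigma2 = n * MSE / sigma2. The check below uses made-up residuals and a made-up sigma2; it only illustrates the identity and assumes the variance is known, as in the Zou et al. setting.

```python
import numpy as np


def gaussian_neg2_loglik(residuals, sigma2):
    """-2 * log-likelihood of i.i.d. N(0, sigma2) residuals."""
    n = residuals.shape[0]
    return n * np.log(2 * np.pi * sigma2) + np.sum(residuals ** 2) / sigma2


rng = np.random.RandomState(0)
residuals = rng.randn(50)  # made-up residuals
sigma2 = 1.5  # made-up, assumed-known noise variance
n = residuals.shape[0]

scaled_mse = n * np.mean(residuals ** 2) / sigma2  # term used in the fix
offset = gaussian_neg2_loglik(residuals, sigma2) - scaled_mse
# The difference does not depend on the fit, only on n and sigma2.
print(np.isclose(offset, n * np.log(2 * np.pi * sigma2)))  # True
```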
Maybe add a comment pointing any future reader to eqn 53/54 of https://web.stanford.edu/~hastie/Papers/dflasso.pdf? It's true the paper is already in the docstring though, so feel free to just ignore this comment.
Also, maybe document somewhere that, compared to the actual value, the criterion is off by a constant and scaled by n, but that this doesn't affect relative comparisons? The Wikipedia page for AIC has a section calling this "software unreliability"...
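Why the offset and scaling are harmless for model selection: the criterion is only used to pick the alpha with the smallest value, and the argmin does not change when a constant is added or the values are multiplied by a positive factor. The criterion values below are made up for illustration.

```python
import numpy as np

criterion = np.array([12.3, 8.1, 7.4, 9.0, 11.2])  # made-up values along a path
n_samples, constant = 100, 42.0

# Rescale by a positive factor and shift by a constant.
shifted_scaled = criterion / n_samples + constant
print(np.argmin(criterion), np.argmin(shifted_scaled))  # 2 2
```

The absolute values do differ, which is why criterion_ values should not be compared across software packages.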
 X = diabetes.data
 y = diabetes.target
-X = np.c_[X, rng.randn(X.shape[0], 4)]  # add 4 bad features
+X = np.c_[X, rng.randn(X.shape[0], 5)]  # add 4 bad features
I need this so that there are enough alphas on the grid for alpha_bic > alpha_aic (otherwise they were equal).
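Background on why the two selected alphas can tie but BIC should not pick a denser model: BIC uses K = log(n_samples) instead of AIC's K = 2, a heavier penalty once n_samples > e^2 (about 7.4). Adding the two optimality inequalities gives (log n - 2)(df_bic - df_aic) <= 0, so BIC selects the same or fewer degrees of freedom, which on a lasso path usually means the same or a larger alpha. The sketch below swaps in a hypothetical nested least-squares path (first k features) for the actual LARS path, and assumes sigma2 = np.var(y).

```python
import numpy as np

rng = np.random.RandomState(42)
n_samples, n_features = 100, 8
X = rng.randn(n_samples, n_features)
true_coef = np.array([3.0, -2.0, 1.5, 0.5, 0.0, 0.0, 0.0, 0.0])
y = X @ true_coef + rng.randn(n_samples)
sigma2 = np.var(y)  # assumed normalizer, mirroring the fixed formula

# Hypothetical nested path: least squares on the first k features.
df = np.arange(n_features + 1)
mse = np.empty(n_features + 1)
for k in df:
    if k == 0:
        residuals = y
    else:
        coef, *_ = np.linalg.lstsq(X[:, :k], y, rcond=None)
        residuals = y - X[:, :k] @ coef
    mse[k] = np.mean(residuals ** 2)

aic = n_samples * mse / sigma2 + 2 * df
bic = n_samples * mse / sigma2 + np.log(n_samples) * df
print(df[np.argmin(bic)] <= df[np.argmin(aic)])  # True
```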
comment is now out of sync with code
comment is out of sync with code
X = y.reshape(-1, 1)
lars = linear_model.LassoLarsIC(normalize=False)
assert_no_warnings(lars.fit, X, y)
assert_true(np.any(np.isinf(lars.criterion_)))
Not needed anymore: there is no log anymore.
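Why the isinf assertion becomes obsolete: with the old formula a perfect fit gives log(0) = -inf, while the new formula gives a finite value for the same fit. The MSE values, degrees of freedom and sigma2 below are made up for illustration.

```python
import numpy as np

n_samples, K = 10, 2.0
mean_squared_error = np.array([4.0, 1.0, 0.0])  # last model fits perfectly
df = np.array([0, 1, 1])
sigma2 = 8.25  # made-up variance of y

with np.errstate(divide="ignore"):  # old formula: log(0) -> -inf
    old = n_samples * np.log(mean_squared_error) + K * df

eps64 = np.finfo("float64").eps
new = n_samples * mean_squared_error / (sigma2 + eps64) + K * df

print(np.isinf(old).any(), np.isinf(new).any())  # True False
```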
The information criterion calculation is not compatible with the original paper: Zou, Hui, Trevor Hastie, and Robert Tibshirani. "On the “degrees of freedom” of the lasso." The Annals of Statistics 35.5 (2007): 2173-2192.
Branch updated from 581c3d0 to ee9802f.
Good to go on my end. Travis is happy.
doc/whats_new.rst
- Fixed a memory leak in our LibLinear implementation. :issue:`9024` by
  :user:`Sergei Lebedev <superbobry>`
- Fix AIC/BIC criterion computation in LassoLarsIC by `Alexandre Gramfort`_
space before?
LGTM. +1 for merge.
@vene, can you give me the other +1 plz?
raise ValueError('criterion should be either bic or aic')

R = y[:, np.newaxis] - np.dot(X, coef_path_)  # residuals
mean_squared_error = np.mean(R ** 2, axis=0)
Here we take the mean of the sum of squares, only to multiply by n_samples afterwards. Since mean_squared_error does not seem to be used elsewhere in the local scope, we could save some cycles by summing the squared residuals directly.
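The suggested shortcut relies on n_samples * mean(R ** 2) being exactly the residual sum of squares, so the criterion could be built from np.sum alone. A quick check with random residuals (shapes chosen arbitrarily):

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_alphas = 30, 5
R = rng.randn(n_samples, n_alphas)  # residuals, one column per alpha

via_mean = n_samples * np.mean(R ** 2, axis=0)  # current code path
via_sum = np.sum(R ** 2, axis=0)  # residual sum of squares, no extra scaling
print(np.allclose(via_mean, via_sum))  # True
```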
Hmm, I'd like a stronger test, but I don't have any good ideas for how to get one... This was subtle until I found the equations in the paper; the Wikipedia page was not super helpful...
LGTM apart from the minor comments. @agramfort, if you're busy but agree with my comments, I'll be happy to make the changes myself and merge.
@vene please take over. No time anymore :(
DOC comments and docstring on criterion computation
My comments have been addressed and @GaelVaroquaux's +1 should still stand. Travis passes; the AppVeyor failures seem irrelevant and are also present on master. (AFAIK AppVeyor is just back from an outage.) Merging. Thanks @agramfort and @mehmetbasbug!
* correcting information criterion calculation in least_angle.py

  The information criterion calculation is not compatible with the original paper: Zou, Hui, Trevor Hastie, and Robert Tibshirani. "On the “degrees of freedom” of the lasso." The Annals of Statistics 35.5 (2007): 2173-2192.

* FIX : fix AIC/BIC computation in LassoLarsIC
* update what's new
* fix test
* fix test
* address comments
* DOC comments and docstring on criterion computation
Reference Issue
Taking over #6080.
What does this implement/fix? Explain your changes.
The AIC/BIC criterion in LassoLarsIC was buggy ...
Any other comments?
Nope