Description
I'm using a very large data-set. After fitting 100 x 3-fold cross-validated LASSOs at lightning speed (a few seconds total), LassoCV stalls at the final hurdle: fitting a LASSO with the chosen alpha value to the whole data-set (waiting over half an hour - it should only take approximately 50% longer than a single fold...). After a lot of head-scratching I found the reason why. In coordinate_descent.py's LinearModelCV.fit() just before calling the final model.fit() (line 1223 in Python2.7/sklearn0.19.0), there is the rather inconspicuous line
model.precompute = False
So even if you've specified precompute as True when calling LassoCV, it is ignored. Why is this? It's making the computation impractically slow (should it even be this slow without precompute?) - literally just commenting the line out makes the fit instantaneous. Am I missing something here mathematically - I can't see anything wrong with using a precomputed Gram matrix for the final fit when it was used for all of the cross-validation fits? The implementation seems to imply it should be used for performance whenever num_samples > num_features. Why hard set it to False?