Description
A fair number of estimators currently have copy=True (or copy_X=True) by default. In practice, this means that the code looks something like:
X = check_array(X, copy=copy)
and then other calculations that may or may not modify X in place. When the subsequent operations are not done in place, we have made a wasteful copy for no good reason.
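To illustrate the wasted copy, here is a minimal sketch using sklearn.utils.check_array on a small float64 array. The array contents and variable names are made up for illustration; the point is only that copy=True allocates new memory whether or not anything later writes to X.

```python
import numpy as np
from sklearn.utils import check_array

# Toy input, already a valid 2D float64 array, so no conversion is required.
X = np.arange(6, dtype=np.float64).reshape(3, 2)

# With copy=True a new buffer is allocated even if nothing modifies X afterwards.
X_copied = check_array(X, copy=True)

# With copy=False the validated array can reuse the caller's buffer.
X_not_copied = check_array(X, copy=False)

copied_shares_memory = np.shares_memory(X, X_copied)
not_copied_shares_memory = np.shares_memory(X, X_not_copied)
print(copied_shares_memory, not_copied_shares_memory)
```

If the estimator never writes to X after validation, the first call is pure overhead, which matters most for large inputs.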
As discussed in #13923, one example is Ridge(fit_intercept=False), which copies X although it is not needed. At first I couldn't find any in-place operations on X in Ridge even with fit_intercept=True, but I have since found it.
I think in general it would be better to avoid the
X = check_array(X, copy=copy)
pattern and instead make the copy explicitly where it is needed. It might be OK not to make a copy with copy=True if no copy is needed. Alternatively, we could introduce copy=None as the default.
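A rough sketch of what "copy explicitly where it is needed" could look like. The helper name center_if_needed and its logic are hypothetical and are not taken from Ridge or any other estimator; it assumes the only in-place operation is centering X when fit_intercept=True.

```python
import numpy as np


def center_if_needed(X, fit_intercept=True, copy=True):
    """Hypothetical helper: copy X only right before an in-place change."""
    # np.asarray does not copy when X is already a float64 ndarray.
    X = np.asarray(X, dtype=np.float64)
    if not fit_intercept:
        # Nothing is written to X on this path, so no copy is made
        # even though copy=True was requested.
        return X
    if copy:
        # The copy happens here, next to the in-place operation it protects.
        X = X.copy()
    X -= X.mean(axis=0)  # in-place centering
    return X


X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
X_original = X.copy()

no_intercept = center_if_needed(X, fit_intercept=False, copy=True)
with_intercept = center_if_needed(X, fit_intercept=True, copy=True)
```

In this sketch copy=True is read as "do not modify the caller's X" rather than "always allocate a new array", which is the interpretation suggested above.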
It would also help to add a common test that checks that Estimator(copy=True).fit(X, y) doesn't change X.
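One possible shape for such a check, as a sketch only. The function name check_fit_preserves_X is hypothetical and is not an existing scikit-learn common test; Ridge(copy_X=True) is used here simply as an estimator that exposes a copy parameter.

```python
import numpy as np
from sklearn.linear_model import Ridge


def check_fit_preserves_X(estimator, X, y):
    """Hypothetical check: return True if fit leaves the caller's X unchanged."""
    X_before = X.copy()  # snapshot taken before fitting
    estimator.fit(X, y)
    return np.array_equal(X, X_before)


rng = np.random.RandomState(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

ridge_preserves_X = check_fit_preserves_X(Ridge(copy_X=True), X, y)
```

A check like this only verifies the user-facing guarantee (X is left intact), so it would keep passing if an estimator stopped making unnecessary copies internally.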