Description
As stated in the changelog entry of PR #31414:
`max_samples` is now interpreted as a fraction of `sample_weight.sum()` instead of `X.shape[0]` when passed as a float.
Because `max_samples` defaults to 1.0, this changes the default behavior. I expressed concerns about this in this thread #31529 (comment), on a different PR that makes the same kind of change (for random forests).
It seems that @antoinebaker, the author of both PRs, agrees with me that making `max_samples=None` the default (with `None` then interpreted as `X.shape[0]`) would be safer and less surprising for users. (I'll let you confirm or not, @antoinebaker ^^)
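To make the difference concrete, here is a minimal sketch of how a float `max_samples` could translate into a number of draws under each interpretation. The helper names are hypothetical, and truncating with `int()` is an assumption made for illustration. This is not scikit-learn's actual implementation.

```python
import numpy as np


def n_draws_by_rows(max_samples, X):
    # Previous float semantics: a fraction of the number of rows, X.shape[0].
    return int(max_samples * X.shape[0])


def n_draws_by_weight(max_samples, sample_weight):
    # New float semantics: a fraction of the total weight, sample_weight.sum().
    return int(max_samples * sample_weight.sum())


def n_draws_proposed(max_samples, X, sample_weight):
    # Proposed default (hypothetical sketch): None falls back to X.shape[0],
    # while an explicit float uses the weight-based semantics.
    if max_samples is None:
        return X.shape[0]
    return n_draws_by_weight(max_samples, sample_weight)


X = np.zeros((100, 3))
unit_weight = np.ones(100)           # weights sum to 100
double_weight = np.full(100, 2.0)    # weights sum to 200

old = n_draws_by_rows(1.0, X)
new_unit = n_draws_by_weight(1.0, unit_weight)
new_double = n_draws_by_weight(1.0, double_weight)
proposed = n_draws_proposed(None, X, double_weight)
print(old, new_unit, new_double, proposed)
```

With unit weights the two interpretations agree (100 draws each), so the default only changes for users who pass non-unit `sample_weight`: here the weight-based reading of `max_samples=1.0` yields 200 draws instead of 100, while a `None` default keeps 100.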
So I propose not merging PR/commit #31414 into v1.8 (is this even possible? 😅 @lesteve).
Alternatively, we could merge a PR that changes the default before v1.8 (not sure this is possible either).
Why do I think this is "urgent"? Because we don't want to change the default twice in a row: these are backward-incompatible changes (the same code produces different results).
What are your thoughts? @antoinebaker @ogrisel