
RFC: Bagging estimators: avoid changing max_samples default behavior in 1.8 #32805

@cakedev0


As stated in the change log of PR #31414

max_samples is now interpreted as a fraction of sample_weight.sum() instead of X.shape[0] when passed as a float.

Because max_samples defaults to 1.0, this changes the default behavior. I expressed concerns about this in this thread #31529 (comment), on a different PR that makes the same kind of change (for random forests).
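
To make the difference concrete, here is a minimal sketch of the two readings of a float max_samples. The helper names and the rounding rule are assumptions made for illustration only; this is not scikit-learn's code.

```python
# Illustrative only: both helpers are hypothetical, not scikit-learn internals.
# The truncation with int() is an assumption; the real rounding may differ.

def n_draws_from_rows(max_samples, n_rows):
    """Old reading: a float max_samples is a fraction of X.shape[0]."""
    return max(1, int(max_samples * n_rows))


def n_draws_from_weights(max_samples, sample_weight):
    """New reading: a float max_samples is a fraction of sample_weight.sum()."""
    return max(1, int(max_samples * sum(sample_weight)))


sample_weight = [0.5, 0.5, 0.5, 0.5]  # 4 rows, total weight 2.0

old = n_draws_from_rows(1.0, n_rows=len(sample_weight))
new = n_draws_from_weights(1.0, sample_weight)
print(old, new)  # 4 2
```

With the default max_samples=1.0 and non-unit weights, the same call would draw a different number of samples under each reading, which is the behavior change in question.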

It seems @antoinebaker, the author of both PRs, agrees with me that having max_samples=None as default (and None is then re-interpreted as X.shape[0]) sounds safer/less surprising for users. (I'll let you confirm or not @antoinebaker ^^).
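
A rough sketch of what a max_samples=None default could look like, assuming None is resolved to X.shape[0] and floats keep the sample_weight-based reading. The function `resolve_max_samples` and its rounding are hypothetical, written only to illustrate the proposal.

```python
# Hypothetical helper, not scikit-learn code: shows how None could map to
# X.shape[0] while a float stays a fraction of the total sample weight.

def resolve_max_samples(max_samples, n_rows, sample_weight=None):
    if max_samples is None:
        # Proposed default: always draw n_rows samples, weights or not.
        return n_rows
    if isinstance(max_samples, int):
        # An integer is taken as an absolute number of samples.
        return max_samples
    # A float is a fraction of the total weight (or of n_rows without weights).
    total = n_rows if sample_weight is None else sum(sample_weight)
    return max(1, int(max_samples * total))  # rounding rule is an assumption


weights = [0.5, 0.5, 0.5, 0.5]
print(resolve_max_samples(None, 4, weights))  # 4
print(resolve_max_samples(1.0, 4, weights))   # 2
```

Under this sketch, code that never sets max_samples keeps drawing X.shape[0] samples, and only users who explicitly pass a float get the weight-based interpretation.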

So I propose to not merge the PR/commit #31414 in v1.8 (is this even possible? 😅 @lesteve)

Or, merge a PR to change the default before v1.8? (not sure this is possible either).

Why do I think this is "urgent"? Because we don't want to change the default twice in a row (these are backward-incompatible changes: the same code produces different results).

What are your thoughts? @antoinebaker @ogrisel
