`sklearn/tree/*` refactoring

This is a summary of what we have already discussed about the `sklearn/tree/splitter.*` cleanup.

The code uses separate classes for dense and sparse data, mainly to handle the `pre_sort=True` parameter, which is used by `GradientBoosting*` (not the new `HistGradientBoosting*`).

According to a quick benchmark done by @glemaitre, the `pre_sort=True` parameter gives a 2x speedup, which is insignificant compared to the speedup provided by `HistGradientBoosting*`.

There were also other refactorings we could do to speed up the splitter, which we realized while reviewing the `HistGradientBoosting*` code. The one I remember is that we could partition the data in place, quicksort style, and pass the start and end indices to each splitter instead of passing a mask array.
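To make the last idea concrete, here is a rough Python sketch of what I mean. It is only an illustration, not the actual Cython code in `sklearn/tree`: the function name `partition_samples` and its arguments are made up for this example. The point is that a node is described by a `[start, end)` range over a shared index array, and splitting a node just reorders that range in place.

```python
import numpy as np


def partition_samples(X, samples, start, end, feature, threshold):
    """Reorder samples[start:end] in place, quicksort-partition style.

    Hypothetical helper for illustration only. After the call, indices of
    rows with X[i, feature] <= threshold occupy samples[start:pos] and the
    remaining ones occupy samples[pos:end]. Returns pos.
    """
    lo, hi = start, end
    while lo < hi:
        if X[samples[lo], feature] <= threshold:
            # Already on the correct (left) side, move on.
            lo += 1
        else:
            # Send this index to the right side by swapping it with the
            # last index that has not been examined yet.
            hi -= 1
            samples[lo], samples[hi] = samples[hi], samples[lo]
    return lo


# Toy data: one feature, five samples.
X = np.array([[0.5], [2.0], [1.5], [0.1], [3.0]])
samples = np.arange(X.shape[0])

pos = partition_samples(X, samples, 0, len(samples), feature=0, threshold=1.0)

# The children are described by index ranges only, no boolean mask needed.
left = samples[:pos]    # would be passed on as (start=0, end=pos)
right = samples[pos:]   # would be passed on as (start=pos, end=len(samples))
```

With this layout, each child only needs the two integers `start` and `end`, so no per-node mask array of length `n_samples` has to be allocated or scanned.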

Also pinging @NicolasHug, @ogrisel

`sklearn/tree/*` refactoring #14711

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

sklearn/tree/* refactoring #14711

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

`sklearn/tree/*` refactoring #14711