sklearn/tree/* refactoring #14711

@adrinjalali

This is a summary of what we have already discussed about the sklearn/tree/splitter.* cleanup.

The code uses separate classes for dense and sparse data, mainly to handle the pre_sort=True parameter, which is used by GradientBoosting* (not the new HistGradientBoosting*).

According to a quick benchmark done by @glemaitre, the pre_sort=True parameter gives a 2x speedup, which is insignificant compared to the speedup provided by HistGradientBoosting*.

There were also other refactorings we could do to speed up the splitter, which we realized while reviewing the HistGradientBoosting* code. The one I remember is that we could partition the sample indices in place, quicksort-style, and pass the start and end indices to each splitter instead of passing a mask array.
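
To make the start/end idea concrete, here is a minimal Python sketch. It is not the existing splitter code: the function name `partition_samples` and its arguments are made up for illustration, and the real implementation would live in Cython. It reorders one shared index array in place so that each child node is described by a `(start, end)` range rather than a boolean mask.

```python
def partition_samples(X, samples, start, end, feature, threshold):
    """Reorder samples[start:end] in place, quicksort-partition style.

    Hypothetical helper for illustration only. After the call,
    samples[start:pos] holds the rows with X[i][feature] <= threshold and
    samples[pos:end] holds the rest. Returns pos.
    """
    left, right = start, end
    while left < right:
        if X[samples[left]][feature] <= threshold:
            # Already on the correct side, move on.
            left += 1
        else:
            # Swap into the right-hand block and shrink it.
            right -= 1
            samples[left], samples[right] = samples[right], samples[left]
    return left


# Toy data: one feature, five rows.
X = [[0.5], [2.0], [1.0], [3.0], [0.1]]
samples = list(range(len(X)))

pos = partition_samples(X, samples, 0, len(samples), feature=0, threshold=1.0)

# Each child is now just a pair of indices into the same array.
left_child = (0, pos)
right_child = (pos, len(samples))
```

The children can then be split recursively by calling the same function on their own `(start, end)` range, with no per-node mask allocation.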

Also ping @NicolasHug, @ogrisel
