Description
This is a summary of what we have already discussed about the sklearn/tree/splitter.* cleanup.
The code uses separate classes for dense and sparse data, mainly to handle the pre_sort=True parameter, which is used by GradientBoosting* (not the new HistGradientBoosting*).
According to a quick benchmark done by @glemaitre, the pre_sort=True parameter gives a 2x speedup, which is insignificant compared to the speedup provided by HistGradientBoosting*.
There were also other refactorings we could do to speed up the splitter, which we realized while reviewing the HistGradientBoosting* code. The one I remember is that we could reorder the sample indices in place, the way quicksort partitions an array, and pass start and end indices to each splitter instead of passing a mask array.
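To make the start/end idea concrete, here is a minimal pure-Python sketch, not the actual splitter code. The function name `partition_samples` and its arguments are hypothetical and chosen only for illustration. It reorders a shared array of sample indices in place, so that each child node can be described by a `(start, end)` range rather than a boolean mask.

```python
def partition_samples(samples, feature_values, start, end, threshold):
    """Partition samples[start:end] in place around `threshold`.

    Hypothetical helper for illustration only. After the call, the indices
    in samples[start:pos] have feature value <= threshold and the indices
    in samples[pos:end] have feature value > threshold. Returns pos.
    """
    left = start
    right = end
    while left < right:
        if feature_values[samples[left]] <= threshold:
            # Already on the correct side: advance the left cursor.
            left += 1
        else:
            # Move the index to the right part by swapping it with the
            # last position that has not been examined yet.
            right -= 1
            samples[left], samples[right] = samples[right], samples[left]
    return left


# Toy data: one feature, five samples.
feature_values = [0.9, 0.1, 0.5, 0.7, 0.3]
samples = list(range(len(feature_values)))

# Split the root node, which covers the whole range [0, 5).
pos = partition_samples(samples, feature_values, 0, len(samples), 0.5)

# The children are now just index ranges over the same array.
left_child = (0, pos)
right_child = (pos, len(samples))
```

With this layout, no per-node mask has to be allocated or scanned: a child splitter only touches `samples[start:end]`, and the same partitioning step can be applied recursively to each range.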
Also pinging @NicolasHug, @ogrisel