ENH reuse parent histograms as one of the child's histogram #27865
Conversation
Thanks for the PR! How much runtime improvement do you get with this PR?
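For reference, a minimal way to time the fit on each branch could look like the sketch below (dataset shape and parameters are assumptions mirroring the memory benchmark later in this thread, not part of the PR):

```python
# Hypothetical timing sketch: run once on main and once on this branch,
# then compare the reported fit wall times.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=10_000, n_features=400, random_state=0)

hgb = HistGradientBoostingClassifier(
    max_iter=100, max_leaf_nodes=127, learning_rate=0.1, random_state=0
)

start = time.perf_counter()
hgb.fit(X, y)
print(f"fit time: {time.perf_counter() - start:.2f} s")
```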
```python
@@ -618,9 +618,8 @@ def split_next(self):
            if child.is_leaf:
                del child.histograms

        # Release memory used by histograms as they are no longer needed for
        # internal nodes once children histograms have been computed.
        del node.histograms
```
This was included in e325f16 because of a memory issue. To be safe, can you rerun the benchmark in #18334 (comment) to make sure there are no regressions?
Main
This PR
Wow, this seems to bring back the cyclic memory references. So, the current state of the PR is worse than main. But note the large variation even for the main branch. Taken from #18334 (comment):

```python
from sklearn.datasets import make_classification
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingClassifier
from memory_profiler import memory_usage

X, y = make_classification(n_classes=2,
                           n_samples=10_000,
                           n_features=400,
                           random_state=0)

hgb = HistGradientBoostingClassifier(
    max_iter=100,
    max_leaf_nodes=127,
    learning_rate=.1,
    random_state=0,
    verbose=1,
)

mems = memory_usage((hgb.fit, (X, y)))
print(f"{max(mems):.2f}, {max(mems) - min(mems):.2f} MB")
```
I fixed the cyclic memory references again in d242a6d. Now, I get:
Results show a large variation. Runtime seems improved by roughly 10%, but memory usage seems, on average, a bit worse than main.
Interesting: if only the lines

```python
mems = memory_usage((hgb.fit, (X, y)))
print(f"{max(mems):.2f}, {max(mems) - min(mems):.2f} MB")
```

are run again in the same IPython instance, I get (run 1 is the full script, later runs only these two lines):

Main

PR

Conclusion: This PR is a clear improvement. It would be nice to better understand some gc behavior.
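One way to probe that gc behavior could be a sketch like the following, which forces a full collection before each repeated run (reusing hgb, X, and y from the benchmark script above; this is an illustration, not something that was run in this thread):

```python
# Hypothetical sketch: collect leftover reference cycles before each run so
# that garbage from the previous fit does not inflate the next measured peak.
import gc

from memory_profiler import memory_usage

for run in range(3):
    gc.collect()  # break any lingering cycles before measuring
    mems = memory_usage((hgb.fit, (X, y)))
    print(f"run {run}: peak {max(mems):.2f} MB, "
          f"delta {max(mems) - min(mems):.2f} MB")
```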
This adds a little bit of complexity, but it still looks manageable. LGTM!
LGTM.

I think most (all?) implementations of malloc (called by np.empty) now reuse blocks of memory between allocations and deallocations, so the remaining overhead might only be NumPy's wrapper code.

As a dilettante, I just have one comment regarding the potential extension of some context that might now qualify for nogil.
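To illustrate the point about allocator reuse, a hypothetical micro-benchmark could look like this (the array shape is an assumption standing in for a histogram buffer, not the actual structured dtype used in the grower):

```python
# Repeatedly allocate and drop a histogram-sized array. If the allocator
# reuses freed blocks, the per-allocation cost stays small and roughly constant.
import timeit

import numpy as np

shape = (400, 256, 3)  # assumed stand-in for (n_features, n_bins, 3 fields)

t = timeit.timeit(lambda: np.empty(shape), number=10_000)
print(f"np.empty({shape}): {t / 10_000 * 1e6:.2f} µs per allocation")
```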
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
This PR reuses the parent node's histogram in the histogram subtraction trick in HGBT (as LightGBM does). This saves a new memory allocation for one of the child nodes and also makes the histogram subtraction a tiny bit faster. (But the histogram subtraction is only a small fraction of the overall fit time, so this has basically no effect on fit time.)
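For illustration, a minimal NumPy sketch of the subtraction trick with the parent's buffer reused for the sibling might look like this (names and shapes are illustrative; the actual implementation lives in the Cython histogram code):

```python
import numpy as np

def sibling_histograms(parent_hist, smaller_child_hist):
    """Compute the larger child's histogram via the subtraction trick.

    Instead of allocating a new array, the parent's buffer is overwritten
    in place and handed to the sibling, since the parent's histogram is no
    longer needed once both children have been split off.
    """
    # parent = left + right, so the sibling is parent - smaller child.
    np.subtract(parent_hist, smaller_child_hist, out=parent_hist)
    return parent_hist  # now holds the sibling's histogram

# Toy example with shape (n_features, n_bins), counts only.
rng = np.random.default_rng(0)
left = rng.integers(0, 10, size=(3, 8))
right = rng.integers(0, 10, size=(3, 8))
parent = left + right

reused = sibling_histograms(parent, left)
assert np.array_equal(reused, right)
```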
Any other comments?