ENH reuse parent histograms as one of the child's histogram #27865
Conversation
Thanks for the PR! How much runtime improvement do you get with this PR?
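For reference, a minimal way to time the fit on each branch could look like the sketch below (dataset shape and parameters are assumptions mirroring the memory benchmark later in this thread, not part of the PR):

```python
# Hypothetical timing sketch: run once on main and once on this branch,
# then compare the reported fit wall times.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier

X, y = make_classification(n_samples=10_000, n_features=400, random_state=0)

hgb = HistGradientBoostingClassifier(
    max_iter=100, max_leaf_nodes=127, learning_rate=0.1, random_state=0
)

start = time.perf_counter()
hgb.fit(X, y)
print(f"fit time: {time.perf_counter() - start:.2f} s")
```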
```python
@@ -618,9 +618,8 @@ def split_next(self):
            if child.is_leaf:
                del child.histograms

        # Release memory used by histograms as they are no longer needed for
        # internal nodes once children histograms have been computed.
        del node.histograms
```
This was included in e325f16 because of a memory issue. To be safe, can you rerun the benchmark in #18334 (comment) to make sure there are no regressions?
Main
This PR
Wow, this seems to bring back the cyclic memory references. So, the current state of the PR is worse than main. But note the large variation even for the main branch. Taken from #18334 (comment):

```python
from sklearn.datasets import make_classification
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingClassifier
from memory_profiler import memory_usage

X, y = make_classification(n_classes=2,
                           n_samples=10_000,
                           n_features=400,
                           random_state=0)

hgb = HistGradientBoostingClassifier(
    max_iter=100,
    max_leaf_nodes=127,
    learning_rate=.1,
    random_state=0,
    verbose=1,
)

mems = memory_usage((hgb.fit, (X, y)))
print(f"{max(mems):.2f}, {max(mems) - min(mems):.2f} MB")
```
I fixed the cyclic memory references again in d242a6d. Now, I get:
Results show a large variation. Runtime seems improved by roughly 10%, but memory usage seems, on average, a bit worse than main.
Interesting: if only the lines

```python
mems = memory_usage((hgb.fit, (X, y)))
print(f"{max(mems):.2f}, {max(mems) - min(mems):.2f} MB")
```

are run again in the same IPython instance, I get (run 1 is the full script, later runs only these two lines):

Main

PR

Conclusion: This PR is a clear improvement. It would be nice to better understand some gc behavior.
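One way to probe that gc behavior could be a sketch like the following, which forces a full collection before each repeated run (reusing hgb, X, and y from the benchmark script above; this is an illustration, not something that was run in this thread):

```python
# Hypothetical sketch: collect leftover reference cycles before each run so
# that garbage from the previous fit does not inflate the next measured peak.
import gc

from memory_profiler import memory_usage

for run in range(3):
    gc.collect()  # break any lingering cycles before measuring
    mems = memory_usage((hgb.fit, (X, y)))
    print(f"run {run}: peak {max(mems):.2f} MB, "
          f"delta {max(mems) - min(mems):.2f} MB")
```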
This adds a little bit of complexity, but it still looks manageable. LGTM!
LGTM.

I think most (all?) implementations of malloc (called by np.empty) now reuse blocks of memory between allocations and deallocations, so the remaining overhead might only be NumPy's wrapper code.

As a dilettante, I just have one comment regarding the potential extension of some context that might now qualify for nogil.
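To illustrate the point about allocator reuse, a hypothetical micro-benchmark could look like this (the array shape is an assumption standing in for a histogram buffer, not the actual structured dtype used in the grower):

```python
# Repeatedly allocate and drop a histogram-sized array. If the allocator
# reuses freed blocks, the per-allocation cost stays small and roughly constant.
import timeit

import numpy as np

shape = (400, 256, 3)  # assumed stand-in for (n_features, n_bins, 3 fields)

t = timeit.timeit(lambda: np.empty(shape), number=10_000)
print(f"np.empty({shape}): {t / 10_000 * 1e6:.2f} µs per allocation")
```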
Reference Issues/PRs
None
What does this implement/fix? Explain your changes.
This PR reuses the parent node's histogram in the histogram subtraction trick in HGBT (as LightGBM does). This saves a new memory allocation for one of the child nodes and also makes the histogram subtraction a tiny bit faster. (But the histogram subtraction is only a small fraction of the overall fit time, so this has basically no effect on fit time.)
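For illustration, a minimal NumPy sketch of the subtraction trick with the parent's buffer reused for the sibling might look like this (names and shapes are illustrative; the actual implementation lives in the Cython histogram code):

```python
import numpy as np

def sibling_histograms(parent_hist, smaller_child_hist):
    """Compute the larger child's histogram via the subtraction trick.

    Instead of allocating a new array, the parent's buffer is overwritten
    in place and handed to the sibling, since the parent's histogram is no
    longer needed once both children have been split off.
    """
    # parent = left + right, so the sibling is parent - smaller child.
    np.subtract(parent_hist, smaller_child_hist, out=parent_hist)
    return parent_hist  # now holds the sibling's histogram

# Toy example with shape (n_features, n_bins), counts only.
rng = np.random.default_rng(0)
left = rng.integers(0, 10, size=(3, 8))
right = rng.integers(0, 10, size=(3, 8))
parent = left + right

reused = sibling_histograms(parent, left)
assert np.array_equal(reused, right)
```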
Any other comments?