ENH Break cyclic references in Histogram GBRT #18334

Merged 5 commits on Sep 3, 2020
7 changes: 7 additions & 0 deletions doc/whats_new/v0.24.rst
@@ -188,6 +188,13 @@ Changelog
   method `staged_predict`, which allows monitoring of each stage.
   :pr:`16985` by :user:`Hao Chun Chang <haochunchang>`.

+- |Efficiency| break cyclic references in the tree nodes used internally in
+  :class:`ensemble.HistGradientBoostingRegressor` and
+  :class:`ensemble.HistGradientBoostingClassifier` to allow for the timely
+  garbage collection of large intermediate datastructures and to improve memory
+  usage in `fit`. :pr:`18334` by `Olivier Grisel`_ `Nicolas Hug`_, `Thomas
+  Fan`_ and `Andreas Müller`_.
+
 - |API|: The parameter ``n_classes_`` is now deprecated in
   :class:`ensemble.GradientBoostingRegressor` and returns `1`.
   :pr:`17702` by :user:`Simona Maggio <simonamaggio>`.
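The changelog entry above relies on how CPython reclaims memory: objects in a reference cycle are not freed by reference counting and have to wait for the cyclic garbage collector. The following is a minimal sketch of that effect, not code from this PR. The `Node` class and `build` helper are hypothetical stand-ins for the real `TreeNode`, and the behaviour shown assumes CPython.

```python
import gc
import weakref


class Node:
    """Hypothetical stand-in for a tree node; not scikit-learn's TreeNode."""

    def __init__(self, parent=None):
        self.parent = parent
        self.left_child = None


def build(with_parent_link):
    """Build a two-node tree and return a weak reference to its root."""
    root = Node()
    child = Node(parent=root if with_parent_link else None)
    root.left_child = child
    return weakref.ref(root)


gc.disable()  # keep the cyclic collector from running on its own
try:
    acyclic_root = build(with_parent_link=False)
    cyclic_root = build(with_parent_link=True)

    # No cycle: reference counting frees the tree as soon as build() returns.
    acyclic_freed_immediately = acyclic_root() is None
    # parent <-> child cycle: the tree stays alive until the cyclic GC runs.
    cyclic_freed_immediately = cyclic_root() is None

    gc.collect()
    cyclic_freed_after_collect = cyclic_root() is None
finally:
    gc.enable()
```

If the nodes hold large arrays, as the histogram nodes do, every tree kept alive by such a cycle holds on to that memory until the next collection pass.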
23 changes: 11 additions & 12 deletions sklearn/ensemble/_hist_gradient_boosting/grower.py
@@ -39,8 +39,6 @@ class TreeNode:
         The sum of the gradients of the samples at the node.
     sum_hessians : float
         The sum of the hessians of the samples at the node.
-    parent : TreeNode, default=None
-        The parent of the node. None for root.

     Attributes
     ----------
@@ -52,8 +50,6 @@ class TreeNode:
         The sum of the gradients of the samples at the node.
     sum_hessians : float
         The sum of the hessians of the samples at the node.
-    parent : TreeNode or None
-        The parent of the node. None for root.
     split_info : SplitInfo or None
         The result of the split evaluation.
     left_child : TreeNode or None
@@ -73,8 +69,6 @@ class TreeNode:
     left_child = None
     right_child = None
     histograms = None
-    sibling = None
-    parent = None

     # start and stop indices of the node in the splitter.partition
     # array. Concretely,
@@ -88,13 +82,12 @@ class TreeNode:
     partition_stop = 0

     def __init__(self, depth, sample_indices, sum_gradients,
-                 sum_hessians, parent=None, value=None):
+                 sum_hessians, value=None):
         self.depth = depth
         self.sample_indices = sample_indices
         self.n_samples = sample_indices.shape[0]
         self.sum_gradients = sum_gradients
         self.sum_hessians = sum_hessians
-        self.parent = parent
         self.value = value
         self.is_leaf = False
         self.set_children_bounds(float('-inf'), float('+inf'))
@@ -388,19 +381,15 @@ def split_next(self):
                                    sample_indices_left,
                                    node.split_info.sum_gradient_left,
                                    node.split_info.sum_hessian_left,
-                                   parent=node,
                                    value=node.split_info.value_left,
                                    )
         right_child_node = TreeNode(depth,
                                     sample_indices_right,
                                     node.split_info.sum_gradient_right,
                                     node.split_info.sum_hessian_right,
-                                    parent=node,
                                     value=node.split_info.value_right,
                                     )

-        left_child_node.sibling = right_child_node
-        right_child_node.sibling = left_child_node
         node.right_child = right_child_node
         node.left_child = left_child_node

@@ -492,6 +481,16 @@ def split_next(self):
                 self._compute_best_split_and_push(right_child_node)
             self.total_find_split_time += time() - tic

+            # Release memory used by histograms as they are no longer needed
+            # for leaf nodes since they won't be split.
+            for child in (left_child_node, right_child_node):
+                if child.is_leaf:
+                    del child.histograms
+
+        # Release memory used by histograms as they are no longer needed for
+        # internal nodes once children histograms have been computed.
+        del node.histograms
+
         return left_child_node, right_child_node

     def _finalize_leaf(self, node):
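A note on the `del node.histograms` pattern in the hunk above: `TreeNode` declares `histograms = None` at class level, so deleting the instance attribute drops the reference to the array while later reads fall back to the class default. The sketch below illustrates this with hypothetical `Buffer` and `Node` classes that are not part of the PR. The immediate release it shows assumes CPython reference counting and no other reference to the buffer.

```python
import weakref


class Buffer:
    """Hypothetical stand-in for a large histogram array."""

    def __init__(self, n_bytes):
        self.data = bytearray(n_bytes)


class Node:
    # Class-level default, mirroring `histograms = None` on TreeNode.
    histograms = None


node = Node()
node.histograms = Buffer(1_000_000)
probe = weakref.ref(node.histograms)  # observe the buffer without owning it

del node.histograms  # removes the instance attribute only

buffer_released = probe() is None
falls_back_to_default = node.histograms is None
```

One caveat of this pattern: calling `del` on an attribute that was never set on the instance raises `AttributeError`, so it is only safe once the attribute has been assigned.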
@@ -100,30 +100,30 @@ def assert_children_values_bounded(grower, monotonic_cst):
     if monotonic_cst == MonotonicConstraint.NO_CST:
         return

-    def recursively_check_children_node_values(node):
+    def recursively_check_children_node_values(node, right_sibling=None):
         if node.is_leaf:
             return
-        if node is not grower.root and node is node.parent.left_child:
-            sibling = node.sibling  # on the right
-            middle = (node.value + sibling.value) / 2
+        if right_sibling is not None:
+            middle = (node.value + right_sibling.value) / 2
             if monotonic_cst == MonotonicConstraint.POS:
                 assert (node.left_child.value <=
                         node.right_child.value <=
                         middle)
-                if not sibling.is_leaf:
+                if not right_sibling.is_leaf:
                     assert (middle <=
-                            sibling.left_child.value <=
-                            sibling.right_child.value)
+                            right_sibling.left_child.value <=
+                            right_sibling.right_child.value)
             else:  # NEG
                 assert (node.left_child.value >=
                         node.right_child.value >=
                         middle)
-                if not sibling.is_leaf:
+                if not right_sibling.is_leaf:
                     assert (middle >=
-                            sibling.left_child.value >=
-                            sibling.right_child.value)
+                            right_sibling.left_child.value >=
+                            right_sibling.right_child.value)

-        recursively_check_children_node_values(node.left_child)
+        recursively_check_children_node_values(node.left_child,
+                                               right_sibling=node.right_child)
         recursively_check_children_node_values(node.right_child)

     recursively_check_children_node_values(grower.root)
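The test change above replaces the `parent` and `sibling` back-pointers with an argument threaded through the recursion. A minimal sketch of that traversal pattern follows. The `Node` class and `collect_sibling_pairs` helper are hypothetical and only mirror the control flow of `recursively_check_children_node_values`, not its assertions.

```python
class Node:
    """Hypothetical minimal node: child links only, no parent or sibling."""

    def __init__(self, value, left_child=None, right_child=None):
        self.value = value
        self.left_child = left_child
        self.right_child = right_child

    @property
    def is_leaf(self):
        return self.left_child is None and self.right_child is None


def collect_sibling_pairs(node, right_sibling=None, pairs=None):
    """Return (value, right sibling value) for each non-leaf left child."""
    if pairs is None:
        pairs = []
    if node.is_leaf:
        return pairs
    if right_sibling is not None:
        # Only left children are called with a right sibling.
        pairs.append((node.value, right_sibling.value))
    # The caller hands the sibling down, so no back-pointer is needed.
    collect_sibling_pairs(node.left_child, node.right_child, pairs)
    collect_sibling_pairs(node.right_child, None, pairs)
    return pairs


tree = Node(0,
            left_child=Node(1,
                            left_child=Node(3, Node(7), Node(8)),
                            right_child=Node(4)),
            right_child=Node(2, Node(5), Node(6)))

pairs = collect_sibling_pairs(tree)
print(pairs)  # [(1, 2), (3, 4)]
```

Because the sibling is supplied by the caller at each step, the node objects keep only downward links and the tree contains no reference cycle.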