Commit a34afbd

lorentzenchr, thomasjpfan, ogrisel authored and committed
FIX out of bound error in split_indices (#21130)
Co-authored-by: Thomas J. Fan <[email protected]>
Co-authored-by: Olivier Grisel <[email protected]>
1 parent e65ba33 commit a34afbd

2 files changed: +35 -6 lines changed

doc/whats_new/v1.1.rst

Lines changed: 15 additions & 0 deletions
@@ -38,6 +38,21 @@ Changelog
     :pr:`123456` by :user:`Joe Bloggs <joeongithub>`.
     where 123456 is the *pull request* number, not the issue number.
 
+:mod:`sklearn.calibration`
+..........................
+
+- |Enhancement| :func:`calibration.calibration_curve` accepts a parameter
+  `pos_label` to specify the positive class label.
+  :pr:`21032` by :user:`Guillaume Lemaitre <glemaitre>`.
+
+:mod:`sklearn.ensemble`
+...........................
+
+- |Fix| Fixed a bug that could produce a segfault in rare cases for
+  :class:`ensemble.HistGradientBoostingClassifier` and
+  :class:`ensemble.HistGradientBoostingRegressor`.
+  :pr:`21130` :user:`Christian Lorentzen <lorentzenchr>`.
+
 :mod:`sklearn.linear_model`
 ...........................
 

sklearn/ensemble/_hist_gradient_boosting/splitting.pyx

Lines changed: 20 additions & 6 deletions
@@ -388,11 +388,25 @@ cdef class Splitter:
                     &left_indices_buffer[offset_in_buffers[thread_idx]],
                     sizeof(unsigned int) * left_counts[thread_idx]
                 )
-                memcpy(
-                    &sample_indices[right_offset[thread_idx]],
-                    &right_indices_buffer[offset_in_buffers[thread_idx]],
-                    sizeof(unsigned int) * right_counts[thread_idx]
-                )
+                if right_counts[thread_idx] > 0:
+                    # If we're splitting the rightmost node of the tree, i.e. the
+                    # rightmost node in the partition array, and if n_threads >= 2, one
+                    # might have right_counts[-1] = 0 and right_offset[-1] = len(sample_indices)
+                    # leading to evaluating
+                    #
+                    #    &sample_indices[right_offset[-1]] = &samples_indices[n_samples_at_node]
+                    #                                      = &partition[n_samples_in_tree]
+                    #
+                    # which is an out-of-bounds read access that can cause a segmentation fault.
+                    # When boundscheck=True, removing this check produces this exception:
+                    #
+                    #    IndexError: Out of bounds on buffer access
+                    #
+                    memcpy(
+                        &sample_indices[right_offset[thread_idx]],
+                        &right_indices_buffer[offset_in_buffers[thread_idx]],
+                        sizeof(unsigned int) * right_counts[thread_idx]
+                    )
 
         return (sample_indices[:right_child_position],
                 sample_indices[right_child_position:],
@@ -839,7 +853,7 @@ cdef class Splitter:
         # other category. The low-support categories will always be mapped to
         # the right child. We scan the sorted categories array from left to
         # right and from right to left, and we stop at the middle.
-
+
         # Considering ordered categories A B C D, with E being a low-support
         # category: A B C D
         # ^
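
The edge case described in the new comment can be reproduced with a small pure-Python model of the per-thread bookkeeping. This is only a sketch under assumptions, not the Cython implementation: `split_indices_sketch`, `goes_left`, and the chunking scheme are hypothetical names and choices made for illustration, and a sequential loop stands in for the parallel threads. It shows that when the last thread sends no sample to the right child, its right offset equals `len(sample_indices)`, so indexing at that offset is out of bounds even though zero elements would be copied.

```python
def split_indices_sketch(sample_indices, goes_left, n_threads):
    """Partition sample_indices so that left-child samples come first.

    Hypothetical pure-Python model; the loop over t stands in for the
    parallel threads of the real splitter.
    """
    n_samples = len(sample_indices)
    # Static chunking (an assumption): each simulated thread gets a
    # contiguous slice, the first threads absorbing the remainder.
    sizes = [n_samples // n_threads + (1 if t < n_samples % n_threads else 0)
             for t in range(n_threads)]
    offset_in_buffers = [sum(sizes[:t]) for t in range(n_threads)]

    left_buffer = [None] * n_samples
    right_buffer = [None] * n_samples
    left_counts = [0] * n_threads
    right_counts = [0] * n_threads

    # Phase 1: each thread sorts its chunk into the two scratch buffers.
    for t in range(n_threads):
        start = offset_in_buffers[t]
        for i in range(start, start + sizes[t]):
            idx = sample_indices[i]
            if goes_left(idx):
                left_buffer[start + left_counts[t]] = idx
                left_counts[t] += 1
            else:
                right_buffer[start + right_counts[t]] = idx
                right_counts[t] += 1

    # Phase 2: compute where each thread writes its results back.
    right_child_position = sum(left_counts)
    left_offset = [sum(left_counts[:t]) for t in range(n_threads)]
    right_offset = [right_child_position + sum(right_counts[:t])
                    for t in range(n_threads)]

    # Phase 3: copy back, skipping empty right-hand copies as the fix does.
    result = list(sample_indices)
    for t in range(n_threads):
        src = offset_in_buffers[t]
        dst = left_offset[t]
        result[dst:dst + left_counts[t]] = left_buffer[src:src + left_counts[t]]
        if right_counts[t] > 0:
            dst = right_offset[t]
            result[dst:dst + right_counts[t]] = right_buffer[src:src + right_counts[t]]
    return result, right_child_position, right_offset, right_counts


# Thread 0 sends both of its samples right, thread 1 sends both left.
result, position, right_offset, right_counts = split_indices_sketch(
    [10, 11, 12, 13], goes_left=lambda idx: idx >= 12, n_threads=2)
```

Here `right_counts[-1]` is 0 and `right_offset[-1]` is 4, which is `len(sample_indices)`. Python slice assignment tolerates that offset, but `result[right_offset[-1]]` would raise `IndexError`, which mirrors the bounds-checked Cython behaviour quoted in the comment.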
