Closed
Labels: Bug, Easy (well-defined and straightforward way to resolve)
Description
Describe the bug
HistGradientBoostingRegressor with the least_absolute_deviation loss function raises a ValueError when fit is called with the sample_weight parameter.
Steps/Code to Reproduce
import numpy as np
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingRegressor
n = 500000
x = np.random.uniform(-1, 1, [n, 3])
y = np.random.uniform(-1, 1, n)
sample_weight = np.random.uniform(0, 1, n)
gb = HistGradientBoostingRegressor(loss='least_absolute_deviation')
gb.fit(x, y, sample_weight=sample_weight)
Expected Results
No error is thrown.
Actual Results
Traceback (most recent call last):
File "/snap/pycharm-professional/230/plugins/python/helpers/pydev/pydevd.py", line 1477, in _exec
pydev_imports.execfile(file, globals, locals) # execute the script
File "/snap/pycharm-professional/230/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
exec(compile(contents+"\n", file, 'exec'), glob, loc)
File "/home/vadim/Загрузки/tt.py", line 11, in <module>
gb.fit(x, y, sample_weight=sample_weight)
File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py", line 466, in fit
self._loss.update_leaves_values(grower, y_train,
File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/loss.py", line 264, in update_leaves_values
median_res = _weighted_percentile(y_true[indices]
File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/utils/stats.py", line 43, in _weighted_percentile
sorted_weights = _take_along_axis(sample_weight, sorted_idx, axis=0)
File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/utils/fixes.py", line 172, in _take_along_axis
return np.take_along_axis(arr=arr, indices=indices, axis=axis)
File "<__array_function__ internals>", line 5, in take_along_axis
File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 170, in take_along_axis
return arr[_make_along_axis_idx(arr_shape, indices, axis)]
File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 34, in _make_along_axis_idx
raise ValueError(
ValueError: `indices` and `arr` must have the same number of dimensions
Versions
Output from sklearn.show_versions():
python: 3.8.5 (default, Jul 28 2020, 12:59:40) [GCC 9.3.0]
executable: /home/vadim/projects/monza/venv/bin/python
machine: Linux-5.8.0-41-generic-x86_64-with-glibc2.29
Python dependencies:
pip: 20.3.3
setuptools: 51.3.3
sklearn: 0.24.0
numpy: 1.19.5
scipy: 1.6.0
Cython: None
pandas: 1.2.0
matplotlib: 3.3.3
joblib: 1.0.0
threadpoolctl: 2.1.0
Built with OpenMP: True
Possible bug location
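The bug appears to be in the LeastAbsoluteDeviation.update_leaves_values method in sklearn/ensemble/_hist_gradient_boosting/loss.py, line 264. The residuals are restricted to the leaf's samples via indices, but sample_weight is passed at full length: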
median_res = _weighted_percentile(y_true[indices]
- raw_predictions[indices],
sample_weight=sample_weight, # -> sample_weight[indices]
percentile=50)
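To illustrate the suggested fix, here is a minimal NumPy-only sketch of a per-leaf weighted median. It does not use scikit-learn's private _weighted_percentile; the weighted_median helper and the indices array standing in for one leaf's samples are hypothetical names introduced only for this example. The point is that the residuals and the weights must be sliced with the same indices so their shapes agree.

```python
import numpy as np


def weighted_median(values, weights):
    """Hypothetical helper (not scikit-learn API): return the smallest
    value whose cumulative weight reaches half of the total weight."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape != weights.shape:
        raise ValueError("values and weights must have the same shape")
    order = np.argsort(values)
    cum_weights = np.cumsum(weights[order])
    pos = np.searchsorted(cum_weights, 0.5 * cum_weights[-1])
    # Guard against floating-point overshoot at the upper end.
    return values[order][min(pos, len(values) - 1)]


rng = np.random.default_rng(0)
n = 1000
y_true = rng.uniform(-1, 1, n)
raw_predictions = np.zeros(n)
sample_weight = rng.uniform(0, 1, n)

# Assumed stand-in for the sample indices that fall into one leaf.
indices = np.arange(0, n, 4)

# Residuals and weights are both restricted to the leaf's samples.
leaf_value = weighted_median(
    y_true[indices] - raw_predictions[indices],
    sample_weight[indices],
)
```

Passing the full-length sample_weight here instead of sample_weight[indices] would trip the shape check in the helper, which mirrors the mismatch reported above.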