HistGradientBoostingRegressor with 'least_absolute_deviation' loss function and sample_weight raises ValueError: indices and arr must have the same number of dimensions #19400

@vadim-ushtanit

Describe the bug

HistGradientBoostingRegressor with the least_absolute_deviation loss function raises a ValueError when fit is called with the sample_weight parameter.

Steps/Code to Reproduce

import numpy as np
from sklearn.experimental import enable_hist_gradient_boosting
from sklearn.ensemble import HistGradientBoostingRegressor

n = 500000
x = np.random.uniform(-1, 1, [n, 3])
y = np.random.uniform(-1, 1, n)
sample_weight = np.random.uniform(0, 1, n)
gb = HistGradientBoostingRegressor(loss='least_absolute_deviation')
gb.fit(x, y, sample_weight=sample_weight)

Expected Results

No error is thrown.

Actual Results

Traceback (most recent call last):
  File "/snap/pycharm-professional/230/plugins/python/helpers/pydev/pydevd.py", line 1477, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/snap/pycharm-professional/230/plugins/python/helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/home/vadim/Загрузки/tt.py", line 11, in <module>
    gb.fit(x, y, sample_weight=sample_weight)
  File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py", line 466, in fit
    self._loss.update_leaves_values(grower, y_train,
  File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/ensemble/_hist_gradient_boosting/loss.py", line 264, in update_leaves_values
    median_res = _weighted_percentile(y_true[indices]
  File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/utils/stats.py", line 43, in _weighted_percentile
    sorted_weights = _take_along_axis(sample_weight, sorted_idx, axis=0)
  File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/sklearn/utils/fixes.py", line 172, in _take_along_axis
    return np.take_along_axis(arr=arr, indices=indices, axis=axis)
  File "<__array_function__ internals>", line 5, in take_along_axis
  File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 170, in take_along_axis
    return arr[_make_along_axis_idx(arr_shape, indices, axis)]
  File "/home/vadim/projects/monza/venv/lib/python3.8/site-packages/numpy/lib/shape_base.py", line 34, in _make_along_axis_idx
    raise ValueError(
ValueError: `indices` and `arr` must have the same number of dimensions
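
The last frames can be reproduced with NumPy alone. This is a minimal sketch of one plausible reading of the traceback: it assumes the percentile helper sorts a 2-D column of leaf residuals while the weights it receives are still the full 1-D training array. The shapes and values below are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical shapes: 4 samples fall in the leaf, 10 in the training set.
residuals = np.array([0.3, -0.1, 0.2, -0.4]).reshape(-1, 1)  # 2-D, shape (4, 1)
weights = np.linspace(0.1, 1.0, 10)                          # 1-D, shape (10,)

# Sorting the 2-D residuals yields 2-D indices.
sorted_idx = np.argsort(residuals, axis=0)


def reproduce():
    """Return the error message raised by the mismatched call, or None."""
    try:
        # arr is 1-D but indices is 2-D, so NumPy refuses the call.
        np.take_along_axis(weights, sorted_idx, axis=0)
    except ValueError as exc:
        return str(exc)
    return None


message = reproduce()
print(message)
```

If that reading is right, the error is about mismatched array ranks rather than mismatched lengths, which is consistent with weights that were never restricted to the leaf.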

Versions

Output from sklearn.show_versions():

    python: 3.8.5 (default, Jul 28 2020, 12:59:40)  [GCC 9.3.0]
executable: /home/vadim/projects/monza/venv/bin/python
   machine: Linux-5.8.0-41-generic-x86_64-with-glibc2.29

Python dependencies:
          pip: 20.3.3
   setuptools: 51.3.3
      sklearn: 0.24.0
        numpy: 1.19.5
        scipy: 1.6.0
       Cython: None
       pandas: 1.2.0
   matplotlib: 3.3.3
       joblib: 1.0.0
threadpoolctl: 2.1.0

Built with OpenMP: True

Possible bug location

The bug appears to be in the LeastAbsoluteDeviation.update_leaves_values method in sklearn/ensemble/_hist_gradient_boosting/loss.py, line 264, where the full sample_weight array is passed instead of the subset belonging to the leaf:

median_res = _weighted_percentile(y_true[indices]
                                  - raw_predictions[indices],
                                  sample_weight=sample_weight,  # -> sample_weight[indices]
                                  percentile=50)
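
To illustrate the suggested change, here is a minimal sketch that indexes the weights with the same leaf indices as the residuals. The `weighted_median` helper is a hypothetical stand-in written for this example, not scikit-learn's `_weighted_percentile`, and the arrays and leaf indices are made up.

```python
import numpy as np


def weighted_median(values, weights):
    """Hypothetical weighted 50th percentile (not scikit-learn's helper)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if values.shape != weights.shape:
        raise ValueError("values and weights must have the same shape")
    order = np.argsort(values)
    cum_weights = np.cumsum(weights[order])
    # First sorted position whose cumulative weight reaches half the total.
    pos = np.searchsorted(cum_weights, 0.5 * cum_weights[-1])
    return values[order][pos]


rng = np.random.default_rng(0)
y_true = rng.uniform(-1, 1, 10)
raw_predictions = np.zeros(10)
sample_weight = rng.uniform(0, 1, 10)
indices = np.array([1, 4, 7])  # hypothetical sample indices of one leaf

# Restrict residuals AND weights to the samples in the leaf.
residuals = y_true[indices] - raw_predictions[indices]
leaf_weights = sample_weight[indices]
median_res = weighted_median(residuals, leaf_weights)
print(median_res)
```

Passing the full `sample_weight` to this helper instead of `leaf_weights` fails its shape check, which mirrors the mismatch reported above.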

Labels: Bug, Easy (well-defined and straightforward way to resolve)