Division by Zero running GridSearchCV with HistGradientBoostingClassifier #14014

@stevennic

Description


I tried running GridSearchCV on HistGradientBoostingClassifier, but it crashed with a ZeroDivisionError. I was running with verbose logging; the last status GridSearchCV reported before the crash was:

[CV]  learning_rate=1, max_iter=500, max_leaf_nodes=None, tol=1e-08, total=16.8min
[CV] learning_rate=1, max_iter=500, max_leaf_nodes=31, tol=1e-06 .....
Binning 0.015 GB of data: 0.089 s
Fitting gradient boosted rounds:
[1/500] 1 tree, 31 leaves, max depth = 8, in 0.042s
[2/500] 1 tree, 31 leaves, max depth = 9, in 0.058s
...
[489/500] 1 tree, 2 leaves, max depth = 1, in 0.016s
[490/500] 
<crash>

Steps/Code to Reproduce

from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import GridSearchCV  # was missing from the original snippet

parameters = {
    'learning_rate': (1, 0.1, 0.01),
    'max_iter': (100, 500, 1000),
    'max_leaf_nodes': (None, 31, 50),
    'tol': (1e-6, 1e-7, 1e-8)}
hgb = HistGradientBoostingClassifier(loss='binary_crossentropy', verbose=2)
clf = GridSearchCV(hgb, parameters, cv=5, verbose=2)
clf.fit(x, y)  # x, y: my binary classification data (not attached here)
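
For anyone without my data, the failing grid point can presumably be fit directly, skipping the search. A minimal sketch, assuming make_classification as a stand-in for my x and y (the verbose log above shows learning_rate=1, max_iter=500, max_leaf_nodes=31, tol=1e-06 was the candidate being evaluated when it crashed):

from sklearn.datasets import make_classification
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingClassifier

# Synthetic stand-in for my x, y (assumption; the original data is not attached).
x, y = make_classification(n_samples=2000, n_features=20, random_state=0)

# The single grid point GridSearchCV was evaluating at the time of the crash.
hgb = HistGradientBoostingClassifier(
    loss='binary_crossentropy', learning_rate=1, max_iter=500,
    max_leaf_nodes=31, tol=1e-6, verbose=2)
hgb.fit(x, y)  # may or may not hit the ZeroDivisionError on synthetic data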

Expected Results

Not to crash.

Actual Results

ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-486-22ccf9e7ea94> in <module>
      7 clf=GridSearchCV(hgb,parameters,cv=5,verbose=2)
      8 # Order:
----> 9 clf.fit(x,y)
     10 clf.best_estimator_

d:\programming\python\3.7.2\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    685                 return results
    686 
--> 687             self._run_search(evaluate_candidates)
    688 
    689         # For multi-metric evaluation, store the best_index_, best_params_ and

d:\programming\python\3.7.2\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
   1146     def _run_search(self, evaluate_candidates):
   1147         """Search all candidates in param_grid"""
-> 1148         evaluate_candidates(ParameterGrid(self.param_grid))
   1149 
   1150 

d:\programming\python\3.7.2\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params)
    664                                for parameters, (train, test)
    665                                in product(candidate_params,
--> 666                                           cv.split(X, y, groups)))
    667 
    668                 if len(out) < 1:

d:\programming\python\3.7.2\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
    922                 self._iterating = self._original_iterator is not None
    923 
--> 924             while self.dispatch_one_batch(iterator):
    925                 pass
    926 

d:\programming\python\3.7.2\lib\site-packages\joblib\parallel.py in dispatch_one_batch(self, iterator)
    757                 return False
    758             else:
--> 759                 self._dispatch(tasks)
    760                 return True
    761 

d:\programming\python\3.7.2\lib\site-packages\joblib\parallel.py in _dispatch(self, batch)
    714         with self._lock:
    715             job_idx = len(self._jobs)
--> 716             job = self._backend.apply_async(batch, callback=cb)
    717             # A job can complete so quickly than its callback is
    718             # called before we get here, causing self._jobs to

d:\programming\python\3.7.2\lib\site-packages\joblib\_parallel_backends.py in apply_async(self, func, callback)
    180     def apply_async(self, func, callback=None):
    181         """Schedule a func to be run"""
--> 182         result = ImmediateResult(func)
    183         if callback:
    184             callback(result)

d:\programming\python\3.7.2\lib\site-packages\joblib\_parallel_backends.py in __init__(self, batch)
    547         # Don't delay the application, to avoid keeping the input
    548         # arguments in memory
--> 549         self.results = batch()
    550 
    551     def get(self):

d:\programming\python\3.7.2\lib\site-packages\joblib\parallel.py in __call__(self)
    223         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224             return [func(*args, **kwargs)
--> 225                     for func, args, kwargs in self.items]
    226 
    227     def __len__(self):

d:\programming\python\3.7.2\lib\site-packages\joblib\parallel.py in <listcomp>(.0)
    223         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    224             return [func(*args, **kwargs)
--> 225                     for func, args, kwargs in self.items]
    226 
    227     def __len__(self):

d:\programming\python\3.7.2\lib\site-packages\sklearn\model_selection\_validation.py in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, return_n_test_samples, return_times, return_estimator, error_score)
    512             estimator.fit(X_train, **fit_params)
    513         else:
--> 514             estimator.fit(X_train, y_train, **fit_params)
    515 
    516     except Exception as e:

d:\programming\python\3.7.2\lib\site-packages\sklearn\ensemble\_hist_gradient_boosting\gradient_boosting.py in fit(self, X, y)
    255                     min_samples_leaf=self.min_samples_leaf,
    256                     l2_regularization=self.l2_regularization,
--> 257                     shrinkage=self.learning_rate)
    258                 grower.grow()
    259 

d:\programming\python\3.7.2\lib\site-packages\sklearn\ensemble\_hist_gradient_boosting\grower.py in __init__(self, X_binned, gradients, hessians, max_leaf_nodes, max_depth, min_samples_leaf, min_gain_to_split, max_bins, actual_n_bins, l2_regularization, min_hessian_to_split, shrinkage)
    194         self.total_compute_hist_time = 0.  # time spent computing histograms
    195         self.total_apply_split_time = 0.  # time spent splitting nodes
--> 196         self._intilialize_root(gradients, hessians, hessians_are_constant)
    197         self.n_nodes = 1
    198 

d:\programming\python\3.7.2\lib\site-packages\sklearn\ensemble\_hist_gradient_boosting\grower.py in _intilialize_root(self, gradients, hessians, hessians_are_constant)
    259             return
    260         if sum_hessians < self.splitter.min_hessian_to_split:
--> 261             self._finalize_leaf(self.root)
    262             return
    263 

d:\programming\python\3.7.2\lib\site-packages\sklearn\ensemble\_hist_gradient_boosting\grower.py in _finalize_leaf(self, node)
    398         """
    399         node.value = -self.shrinkage * node.sum_gradients / (
--> 400             node.sum_hessians + self.splitter.l2_regularization)
    401         self.finalized_leaves.append(node)
    402 

ZeroDivisionError: float division by zero
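
Looking at the last two frames: the grower finalizes the root as a leaf when sum_hessians is below min_hessian_to_split, and the leaf value then divides by node.sum_hessians + self.splitter.l2_regularization. Since l2_regularization defaults to 0 and I didn't set it, a hessian sum of exactly 0.0 (e.g. once boosting has converged, consistent with the 2-leaf trees near round 489) makes the denominator zero. A minimal sketch of the failing arithmetic; the names mirror the traceback, and the concrete values are my assumptions about the failing fold, not taken from the run:

# Paraphrase of _finalize_leaf in grower.py (values assumed for illustration).
shrinkage = 1.0           # learning_rate=1 from the failing grid point
sum_gradients = 0.0       # root node's gradient sum
sum_hessians = 0.0        # can reach exactly 0.0 once the model has converged
l2_regularization = 0.0   # scikit-learn's default, so nothing keeps the denominator > 0

value = -shrinkage * sum_gradients / (sum_hessians + l2_regularization)
# ZeroDivisionError: float division by zero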

Versions

System:
    python: 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)]
executable: D:\Programming\Python\3.7.2\python.exe
   machine: Windows-10-10.0.17763-SP0

BLAS:
    macros:
  lib_dirs:
cblas_libs: cblas

Python deps:
       pip: 19.1.1
setuptools: 40.8.0
   sklearn: 0.21.2
     numpy: 1.16.2
     scipy: 1.2.1
    Cython: None
    pandas: 0.24.2
