From 3a4be9a289a717385d0964675aea8d5886b9fda8 Mon Sep 17 00:00:00 2001
From: ArturoAmorQ
Date: Tue, 21 May 2024 10:39:34 +0200
Subject: [PATCH 1/3] DOC Add quantile loss to user guide on HGBT regression

---
 doc/modules/ensemble.rst | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/doc/modules/ensemble.rst b/doc/modules/ensemble.rst
index 08c831431d197..c00ccc08182fa 100644
--- a/doc/modules/ensemble.rst
+++ b/doc/modules/ensemble.rst
@@ -98,14 +98,19 @@ controls the number of iterations of the boosting process::
     >>> clf.score(X_test, y_test)
     0.8965
 
-Available losses for regression are 'squared_error',
-'absolute_error', which is less sensitive to outliers, and
-'poisson', which is well suited to model counts and frequencies. For
-classification, 'log_loss' is the only option. For binary classification it uses the
-binary log loss, also known as binomial deviance or binary cross-entropy. For
-`n_classes >= 3`, it uses the multi-class log loss function, with multinomial deviance
-and categorical cross-entropy as alternative names. The appropriate loss version is
-selected based on :term:`y` passed to :term:`fit`.
+Available losses for **regression** are:
+
+- 'squared_error', which is the default loss;
+- 'absolute_error', which is less sensitive to outliers than the squared error;
+- 'poisson', which is well suited to model counts and frequencies;
+- 'quantile' that allows for estimating prediction intervals.
+
+For **classification**, 'log_loss' is the only option. For binary classification
+it uses the binary log loss, also known as binomial deviance or binary
+cross-entropy. For `n_classes >= 3`, it uses the multi-class log loss function,
+with multinomial deviance and categorical cross-entropy as alternative names.
+The appropriate loss version is selected based on :term:`y` passed to
+:term:`fit`.
 
 The size of the trees can be controlled through the ``max_leaf_nodes``,
 ``max_depth``, and ``min_samples_leaf`` parameters.
From 572555a47411221c441c4ec54ed5463a8547b2f4 Mon Sep 17 00:00:00 2001
From: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Date: Tue, 21 May 2024 18:17:24 +0200
Subject: [PATCH 2/3] Update doc/modules/ensemble.rst

Co-authored-by: Guillaume Lemaitre
---
 doc/modules/ensemble.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/modules/ensemble.rst b/doc/modules/ensemble.rst
index c00ccc08182fa..886a91b92d1e4 100644
--- a/doc/modules/ensemble.rst
+++ b/doc/modules/ensemble.rst
@@ -103,7 +103,8 @@ Available losses for **regression** are:
 - 'squared_error', which is the default loss;
 - 'absolute_error', which is less sensitive to outliers than the squared error;
 - 'poisson', which is well suited to model counts and frequencies;
-- 'quantile' that allows for estimating prediction intervals.
+- 'quantile', which allows for estimating a conditional quantile that can later
+  be used to predict confidence intervals.
 
 For **classification**, 'log_loss' is the only option. For binary classification
 it uses the binary log loss, also known as binomial deviance or binary

From 442a0a14c4096cdea642bf5620325cbec2439331 Mon Sep 17 00:00:00 2001
From: Arturo Amor <86408019+ArturoAmorQ@users.noreply.github.com>
Date: Tue, 11 Jun 2024 14:08:52 +0200
Subject: [PATCH 3/3] Apply suggestions from code review

Co-authored-by: Olivier Grisel
---
 doc/modules/ensemble.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/doc/modules/ensemble.rst b/doc/modules/ensemble.rst
index 886a91b92d1e4..3a2c85d138bfc 100644
--- a/doc/modules/ensemble.rst
+++ b/doc/modules/ensemble.rst
@@ -102,9 +102,10 @@ Available losses for **regression** are:
 
 - 'squared_error', which is the default loss;
 - 'absolute_error', which is less sensitive to outliers than the squared error;
+- 'gamma', which is well suited to model strictly positive outcomes;
 - 'poisson', which is well suited to model counts and frequencies;
 - 'quantile', which allows for estimating a conditional quantile that can later
-  be used to predict confidence intervals.
+  be used to obtain prediction intervals.
 
 For **classification**, 'log_loss' is the only option. For binary classification
 it uses the binary log loss, also known as binomial deviance or binary