Enhanced classifier comparison example with Gradient Boosting Classifier and improved comments #30476

Open
Wilbebs wants to merge 2 commits into base: main

Conversation


@Wilbebs Wilbebs commented Dec 12, 2024

  • Added a summary at the top of the file for clarity.
  • Enhanced comments for classifiers, datasets, and visualizations.
  • Added the Gradient Boosting Classifier as a new feature to improve functionality (see the sketch below).
  • Improved the script's clarity and made it more accessible for new contributors.
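
For context, adding a classifier to this example amounts to extending the parallel names and classifiers lists near the top of the script; a minimal sketch, with illustrative hyperparameters (the exact values in this PR may differ):

    from sklearn.ensemble import GradientBoostingClassifier

    # Register the new classifier alongside the existing ones; both lists
    # must stay in the same order so panel titles match decision boundaries.
    names.append("Gradient Boosting")
    classifiers.append(GradientBoostingClassifier(random_state=42))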


❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.
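
For reference, enabling the hooks is typically two commands run from the repository root: pip install pre-commit, then pre-commit install. After that, black and ruff run automatically on each commit, using the versions pinned in the repository's pre-commit configuration.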

You can see the details of the linting issues under the lint job here.


black

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=24.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/examples/classification/plot_classifier_comparison.py	2024-12-12 19:45:14.925881+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/examples/classification/plot_classifier_comparison.py	2024-12-12 19:45:23.184305+00:00
@@ -16,10 +16,11 @@
 The plots show training points in solid colors and testing points
 semi-transparent. The lower right shows the classification accuracy on the test
 set.
 
 """
+
 """
 This script compares the performance of multiple classification algorithms on
 synthetic datasets. It visualizes the decision boundaries of classifiers such as
 Logistic Regression, SVM, Decision Tree, Random Forest, and Gradient Boosting. 
 Use this as a reference to understand how different classifiers perform on simple data.
@@ -116,11 +117,13 @@
     cm_bright = ListedColormap(["#FF0000", "#0000FF"])
     ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
     if ds_cnt == 0:
         ax.set_title("Input data")
     ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
-    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6, edgecolors="k")
+    ax.scatter(
+        X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6, edgecolors="k"
+    )
     ax.set_xlim(x_min, x_max)
     ax.set_ylim(y_min, y_max)
     ax.set_xticks(())
     ax.set_yticks(())
     i += 1
@@ -135,12 +138,21 @@
 
         DecisionBoundaryDisplay.from_estimator(
             clf, X, cmap=cm, alpha=0.8, ax=ax, eps=0.5
         )
 
-        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
-        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors="k", alpha=0.6)
+        ax.scatter(
+            X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k"
+        )
+        ax.scatter(
+            X_test[:, 0],
+            X_test[:, 1],
+            c=y_test,
+            cmap=cm_bright,
+            edgecolors="k",
+            alpha=0.6,
+        )
 
         ax.set_xlim(x_min, x_max)
         ax.set_ylim(y_min, y_max)
         ax.set_xticks(())
         ax.set_yticks(())
would reformat /home/runner/work/scikit-learn/scikit-learn/examples/classification/plot_classifier_comparison.py

Oh no! 💥 💔 💥
1 file would be reformatted, 921 files would be left unchanged.

ruff

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.1.


examples/classification/plot_classifier_comparison.py:24:79: W291 Trailing whitespace
   |
22 | This script compares the performance of multiple classification algorithms on
23 | synthetic datasets. It visualizes the decision boundaries of classifiers such as
24 | Logistic Regression, SVM, Decision Tree, Random Forest, and Gradient Boosting. 
   |                                                                               ^ W291
25 | Use this as a reference to understand how different classifiers perform on simple data.
26 | """
   |
   = help: Remove trailing whitespace

examples/classification/plot_classifier_comparison.py:31:1: I001 [*] Import block is un-sorted or un-formatted
   |
29 |   # SPDX-License-Identifier: BSD-3-Clause
30 |   
31 | / import matplotlib.pyplot as plt
32 | | import numpy as np
33 | | from matplotlib.colors import ListedColormap
34 | | 
35 | | from sklearn.datasets import make_circles, make_classification, make_moons
36 | | from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
37 | | from sklearn.ensemble import (
38 | |     AdaBoostClassifier,
39 | |     RandomForestClassifier,
40 | |     GradientBoostingClassifier,  # Gradient Boosting added
41 | | )
42 | | from sklearn.gaussian_process import GaussianProcessClassifier
43 | | from sklearn.gaussian_process.kernels import RBF
44 | | from sklearn.inspection import DecisionBoundaryDisplay
45 | | from sklearn.model_selection import train_test_split
46 | | from sklearn.naive_bayes import GaussianNB
47 | | from sklearn.neighbors import KNeighborsClassifier
48 | | from sklearn.neural_network import MLPClassifier
49 | | from sklearn.pipeline import make_pipeline
50 | | from sklearn.preprocessing import StandardScaler
51 | | from sklearn.svm import SVC
52 | | from sklearn.tree import DecisionTreeClassifier
53 | | 
54 | | # Classifier descriptions
   | |_^ I001
55 |   names = [
56 |       "Nearest Neighbors",
   |
   = help: Organize imports

examples/classification/plot_classifier_comparison.py:121:89: E501 Line too long (95 > 88)
    |
119 |         ax.set_title("Input data")
120 |     ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
121 |     ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6, edgecolors="k")
    |                                                                                         ^^^^^^^ E501
122 |     ax.set_xlim(x_min, x_max)
123 |     ax.set_ylim(y_min, y_max)
    |

examples/classification/plot_classifier_comparison.py:140:89: E501 Line too long (91 > 88)
    |
138 |         )
139 | 
140 |         ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
    |                                                                                         ^^^ E501
141 |         ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors="k", alpha=0.6)
    |

examples/classification/plot_classifier_comparison.py:141:89: E501 Line too long (99 > 88)
    |
140 |         ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
141 |         ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors="k", alpha=0.6)
    |                                                                                         ^^^^^^^^^^^ E501
142 | 
143 |         ax.set_xlim(x_min, x_max)
    |

Found 5 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).

Generated for commit: 436c80f. Link to the linter CI: here

Member

@adrinjalali adrinjalali left a comment


The example also needs to become more notebook style, like other modern examples we have, e.g. plot_forest_hist_grad_boosting_comparison.py.

You also need to enable your pre-commit hooks to fix your linting issues.
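
Context for new contributors: "notebook style" refers to the sphinx-gallery cell convention used by the newer examples, where # %% separators split the script into narrated cells that render like a notebook. A minimal sketch of the layout (section titles are illustrative, not taken from this PR):

    # %%
    # Generate the synthetic datasets
    # -------------------------------
    from sklearn.datasets import make_moons

    X, y = make_moons(noise=0.3, random_state=42)

    # %%
    # Fit each classifier and plot its decision boundary
    # ---------------------------------------------------
    # Prose placed in comments after a # %% marker is rendered as text
    # between the code cells on the generated example page.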

from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    GradientBoostingClassifier,  # Gradient Boosting added
Member

I'd use HistGradientBoostingClassifier instead
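
For anyone picking this up: the swap is a small change to the import and the classifiers list; a minimal sketch (the random_state value is illustrative):

    from sklearn.ensemble import HistGradientBoostingClassifier

    # Histogram-based gradient boosting exposes the same fit/predict API as
    # GradientBoostingClassifier but is considerably faster on larger datasets.
    clf = HistGradientBoostingClassifier(random_state=42)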

Comment on lines +20 to +21
"""
"""
Member

Suggested change (remove the stray second docstring):
"""
"""

@marenwestermann
Member

Hi @Wilbebs! Would you like to continue working on this pull request?
