Enhanced classifier comparison example with Gradient Boosting Classifier and improved comments #30476

Open
Wilbebs wants to merge 2 commits into base: main

Conversation


@Wilbebs Wilbebs commented Dec 12, 2024

  • Added a summary at the top of the file for clarity.
  • Enhanced comments for classifiers, datasets, and visualizations.
  • Added the Gradient Boosting Classifier as a new feature to improve functionality (see the sketch below).
  • Improved the script's clarity and made it more accessible for new contributors.
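
For context, adding a classifier to this example amounts to extending the parallel names and classifiers lists near the top of the script; a minimal sketch, with illustrative hyperparameters (the exact values in this PR may differ):

    from sklearn.ensemble import GradientBoostingClassifier

    # Register the new classifier alongside the existing ones; both lists
    # must stay in the same order so panel titles match decision boundaries.
    names.append("Gradient Boosting")
    classifiers.append(GradientBoostingClassifier(random_state=42))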


❌ Linting issues

This PR is introducing linting issues. Here's a summary of the issues. Note that you can avoid having linting issues by enabling pre-commit hooks. Instructions to enable them can be found here.
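
For reference, enabling the hooks is typically two commands run from the repository root: pip install pre-commit, then pre-commit install. After that, black and ruff run automatically on each commit, using the versions pinned in the repository's pre-commit configuration.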

You can see the details of the linting issues under the lint job here.


black

black detected issues. Please run black . locally and push the changes. Here you can see the detected issues. Note that running black might also fix some of the issues which might be detected by ruff. Note that the installed black version is black=24.3.0.


--- /home/runner/work/scikit-learn/scikit-learn/examples/classification/plot_classifier_comparison.py	2024-12-12 19:45:14.925881+00:00
+++ /home/runner/work/scikit-learn/scikit-learn/examples/classification/plot_classifier_comparison.py	2024-12-12 19:45:23.184305+00:00
@@ -16,10 +16,11 @@
 The plots show training points in solid colors and testing points
 semi-transparent. The lower right shows the classification accuracy on the test
 set.
 
 """
+
 """
 This script compares the performance of multiple classification algorithms on
 synthetic datasets. It visualizes the decision boundaries of classifiers such as
 Logistic Regression, SVM, Decision Tree, Random Forest, and Gradient Boosting. 
 Use this as a reference to understand how different classifiers perform on simple data.
@@ -116,11 +117,13 @@
     cm_bright = ListedColormap(["#FF0000", "#0000FF"])
     ax = plt.subplot(len(datasets), len(classifiers) + 1, i)
     if ds_cnt == 0:
         ax.set_title("Input data")
     ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
-    ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6, edgecolors="k")
+    ax.scatter(
+        X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6, edgecolors="k"
+    )
     ax.set_xlim(x_min, x_max)
     ax.set_ylim(y_min, y_max)
     ax.set_xticks(())
     ax.set_yticks(())
     i += 1
@@ -135,12 +138,21 @@
 
         DecisionBoundaryDisplay.from_estimator(
             clf, X, cmap=cm, alpha=0.8, ax=ax, eps=0.5
         )
 
-        ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
-        ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors="k", alpha=0.6)
+        ax.scatter(
+            X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k"
+        )
+        ax.scatter(
+            X_test[:, 0],
+            X_test[:, 1],
+            c=y_test,
+            cmap=cm_bright,
+            edgecolors="k",
+            alpha=0.6,
+        )
 
         ax.set_xlim(x_min, x_max)
         ax.set_ylim(y_min, y_max)
         ax.set_xticks(())
         ax.set_yticks(())
would reformat /home/runner/work/scikit-learn/scikit-learn/examples/classification/plot_classifier_comparison.py

Oh no! 💥 💔 💥
1 file would be reformatted, 921 files would be left unchanged.

ruff

ruff detected issues. Please run ruff check --fix --output-format=full . locally, fix the remaining issues, and push the changes. Here you can see the detected issues. Note that the installed ruff version is ruff=0.5.1.


examples/classification/plot_classifier_comparison.py:24:79: W291 Trailing whitespace
   |
22 | This script compares the performance of multiple classification algorithms on
23 | synthetic datasets. It visualizes the decision boundaries of classifiers such as
24 | Logistic Regression, SVM, Decision Tree, Random Forest, and Gradient Boosting. 
   |                                                                               ^ W291
25 | Use this as a reference to understand how different classifiers perform on simple data.
26 | """
   |
   = help: Remove trailing whitespace

examples/classification/plot_classifier_comparison.py:31:1: I001 [*] Import block is un-sorted or un-formatted
   |
29 |   # SPDX-License-Identifier: BSD-3-Clause
30 |   
31 | / import matplotlib.pyplot as plt
32 | | import numpy as np
33 | | from matplotlib.colors import ListedColormap
34 | | 
35 | | from sklearn.datasets import make_circles, make_classification, make_moons
36 | | from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
37 | | from sklearn.ensemble import (
38 | |     AdaBoostClassifier,
39 | |     RandomForestClassifier,
40 | |     GradientBoostingClassifier,  # Gradient Boosting added
41 | | )
42 | | from sklearn.gaussian_process import GaussianProcessClassifier
43 | | from sklearn.gaussian_process.kernels import RBF
44 | | from sklearn.inspection import DecisionBoundaryDisplay
45 | | from sklearn.model_selection import train_test_split
46 | | from sklearn.naive_bayes import GaussianNB
47 | | from sklearn.neighbors import KNeighborsClassifier
48 | | from sklearn.neural_network import MLPClassifier
49 | | from sklearn.pipeline import make_pipeline
50 | | from sklearn.preprocessing import StandardScaler
51 | | from sklearn.svm import SVC
52 | | from sklearn.tree import DecisionTreeClassifier
53 | | 
54 | | # Classifier descriptions
   | |_^ I001
55 |   names = [
56 |       "Nearest Neighbors",
   |
   = help: Organize imports

examples/classification/plot_classifier_comparison.py:121:89: E501 Line too long (95 > 88)
    |
119 |         ax.set_title("Input data")
120 |     ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
121 |     ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, alpha=0.6, edgecolors="k")
    |                                                                                         ^^^^^^^ E501
122 |     ax.set_xlim(x_min, x_max)
123 |     ax.set_ylim(y_min, y_max)
    |

examples/classification/plot_classifier_comparison.py:140:89: E501 Line too long (91 > 88)
    |
138 |         )
139 | 
140 |         ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
    |                                                                                         ^^^ E501
141 |         ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors="k", alpha=0.6)
    |

examples/classification/plot_classifier_comparison.py:141:89: E501 Line too long (99 > 88)
    |
140 |         ax.scatter(X_train[:, 0], X_train[:, 1], c=y_train, cmap=cm_bright, edgecolors="k")
141 |         ax.scatter(X_test[:, 0], X_test[:, 1], c=y_test, cmap=cm_bright, edgecolors="k", alpha=0.6)
    |                                                                                         ^^^^^^^^^^^ E501
142 | 
143 |         ax.set_xlim(x_min, x_max)
    |

Found 5 errors.
[*] 1 fixable with the `--fix` option (1 hidden fix can be enabled with the `--unsafe-fixes` option).

Generated for commit: 436c80f. Link to the linter CI: here

Member

@adrinjalali adrinjalali left a comment


The example also needs to become more notebook style, like other modern examples we have, e.g. plot_forest_hist_grad_boosting_comparison.py.

You also need to enable your pre-commit hooks to fix your linting issues.
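
Context for new contributors: "notebook style" refers to the sphinx-gallery cell convention used by the newer examples, where # %% separators split the script into narrated cells that render like a notebook. A minimal sketch of the layout (section titles are illustrative, not taken from this PR):

    # %%
    # Generate the synthetic datasets
    # -------------------------------
    from sklearn.datasets import make_moons

    X, y = make_moons(noise=0.3, random_state=42)

    # %%
    # Fit each classifier and plot its decision boundary
    # ---------------------------------------------------
    # Prose placed in comments after a # %% marker is rendered as text
    # between the code cells on the generated example page.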

from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    GradientBoostingClassifier,  # Gradient Boosting added
Member

I'd use HistGradientBoostingClassifier instead
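
For anyone picking this up: the swap is a small change to the import and the classifiers list; a minimal sketch (the random_state value is illustrative):

    from sklearn.ensemble import HistGradientBoostingClassifier

    # Histogram-based gradient boosting exposes the same fit/predict API as
    # GradientBoostingClassifier but is considerably faster on larger datasets.
    clf = HistGradientBoostingClassifier(random_state=42)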

Comment on lines +20 to +21
"""
"""
Member

Suggested change (remove the stray second docstring):
"""
"""

@marenwestermann
Member

Hi @Wilbebs! Would you like to continue working on this pull request?
