Description
Describe the issue linked to the documentation
I have observed that the plotted points receive the color of their expected label, i.e., train, test, or outlier.
scikit-learn/examples/ensemble/plot_isolation_forest.py
Lines 57 to 62 in 778b119
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white',
                 s=20, edgecolor='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green',
                 s=20, edgecolor='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red',
                s=20, edgecolor='k')
Instead, I would expect the plot to show the predicted labels, i.e., each point colored according to the prediction of the method. In the published example, the variables defined below are computed but never used for the plot.
scikit-learn/examples/ensemble/plot_isolation_forest.py
Lines 45 to 47 in 778b119
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
Suggest a potential alternative/fix
The expected plot would be something like this:
b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c=['white' if x > 0 else 'red' for x in y_pred_train],
s=20, edgecolor='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c=['green' if x > 0 else 'red' for x in y_pred_test],
s=20, edgecolor='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c=['yellow' if x > 0 else 'red' for x in y_pred_outliers],
s=20, edgecolor='k')
With this change, the resulting plot would differ from the published one:
Link to the online example: https://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html
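The three list comprehensions in the suggested fix repeat the same mapping, so it could be factored into a small helper. The sketch below is only an illustration: `colors_from_predictions` and its parameter names are hypothetical and not part of scikit-learn or the example script. It relies on `IsolationForest.predict` returning +1 for inliers and -1 for outliers.

```python
def colors_from_predictions(y_pred, inlier_color, outlier_color="red"):
    """Map +1/-1 predictions to a list of color names.

    Hypothetical helper, not part of scikit-learn: points predicted as
    inliers (+1) get `inlier_color`, points predicted as outliers (-1)
    get `outlier_color`.
    """
    return [inlier_color if label > 0 else outlier_color for label in y_pred]


# Hand-written predictions for illustration only (not real model output).
example_colors = colors_from_predictions([1, -1, 1], "white")
```

In the example script this might be used as `plt.scatter(X_train[:, 0], X_train[:, 1], c=colors_from_predictions(y_pred_train, "white"), s=20, edgecolor="k")`, and likewise for `y_pred_test` and `y_pred_outliers` with their own inlier colors.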