
Misleading plot example for Isolation Forest method #16113

@pedrormjunior


Describe the issue linked to the documentation

I have observed that the plotted points are colored according to the set they are known to belong to, i.e., train, test, or outlier:

b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c='white',
                 s=20, edgecolor='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c='green',
                 s=20, edgecolor='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c='red',
                s=20, edgecolor='k')

Instead, I would expect the plot to show the predicted labels, i.e., each point colored according to the model's prediction. In the published example, the following variables are computed but never used in the plot:

y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)
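As a quick check of what these unused variables hold, the sketch below fits an `IsolationForest` on assumed toy data (loosely modeled on the example, not copied from it) and shows that `predict()` returns +1 for points judged inliers and -1 for points judged outliers. The helper `colors_from_predictions` is hypothetical, introduced only to illustrate the label-to-color mapping.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumed toy data (not the example's exact code): two Gaussian blobs of
# regular observations plus uniformly scattered abnormal points.
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

clf = IsolationForest(random_state=rng)
clf.fit(X_train)

# predict() labels each row +1 (inlier) or -1 (outlier).
y_pred_train = clf.predict(X_train)
y_pred_outliers = clf.predict(X_outliers)


def colors_from_predictions(y_pred, inlier_color):
    """Hypothetical helper: keep the set's own color for predicted
    inliers and paint predicted outliers red."""
    return np.where(y_pred == 1, inlier_color, "red")


print(np.unique(y_pred_train))
print(colors_from_predictions(np.array([1, -1, 1]), "white"))
# ['white' 'red' 'white']
```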

Suggest a potential alternative/fix

The plotting code would then look something like this:

b1 = plt.scatter(X_train[:, 0], X_train[:, 1], c=['white' if x > 0 else 'red' for x in y_pred_train],
                 s=20, edgecolor='k')
b2 = plt.scatter(X_test[:, 0], X_test[:, 1], c=['green' if x > 0 else 'red' for x in y_pred_test],
                 s=20, edgecolor='k')
c = plt.scatter(X_outliers[:, 0], X_outliers[:, 1], c=['yellow' if x > 0 else 'red' for x in y_pred_outliers],
                s=20, edgecolor='k')
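A self-contained version of this proposal might look like the sketch below. The data generation is an assumption that approximates the published example rather than reproducing it, and `np.where` stands in for the list comprehensions when building the color arrays. The figure is rendered with the headless `Agg` backend into an in-memory buffer, so nothing is written to disk.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.ensemble import IsolationForest

# Assumed data generation, approximating (not copying) the published example.
rng = np.random.RandomState(42)
X = 0.3 * rng.randn(100, 2)
X_train = np.r_[X + 2, X - 2]
X = 0.3 * rng.randn(20, 2)
X_test = np.r_[X + 2, X - 2]
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))

clf = IsolationForest(max_samples=100, random_state=rng)
clf.fit(X_train)
y_pred_train = clf.predict(X_train)
y_pred_test = clf.predict(X_test)
y_pred_outliers = clf.predict(X_outliers)

# Color by *predicted* label: the set's own color for predicted inliers (+1),
# red for predicted outliers (-1).
train_colors = np.where(y_pred_train > 0, "white", "red")
test_colors = np.where(y_pred_test > 0, "green", "red")
outlier_colors = np.where(y_pred_outliers > 0, "yellow", "red")

fig, ax = plt.subplots()
ax.scatter(X_train[:, 0], X_train[:, 1], c=train_colors, s=20, edgecolor="k")
ax.scatter(X_test[:, 0], X_test[:, 1], c=test_colors, s=20, edgecolor="k")
ax.scatter(X_outliers[:, 0], X_outliers[:, 1], c=outlier_colors, s=20, edgecolor="k")
ax.set_title("IsolationForest: points colored by predicted label")

buf = io.BytesIO()
fig.savefig(buf, format="png")
plt.close(fig)
```

Keeping the per-set base color while overriding only predicted outliers with red preserves the original legend's meaning (which set a point came from) and adds the model's verdict on top of it.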

With this change, the resulting plot would look different:
[Image: isolation_forest_doc_sklearn]

Link to the online example: https://scikit-learn.org/stable/auto_examples/ensemble/plot_isolation_forest.html
