Lack of consistency for decision_function methods in outlier detection #8693

Closed
@albertcthomas

Description

I think we could improve the consistency of the decision_function of the outlier detection algorithms implemented in scikit-learn.

  • OCSVM: decision_function is positive when the sample is an inlier and negative when it is an outlier. It takes into account the parameter nu, which can be seen as a contamination parameter.
  • IsolationForest: decision_function does not take into account the contamination parameter; it just returns the score of the samples.
  • LOF: the method is private (_decision_function) and does not take into account the contamination parameter.
  • EllipticEnvelope: decision_function takes into account the contamination parameter, and the documentation says this is meant to "ensure a compatibility with other outlier detection tools such as the One-Class SVM".

decision_function should maybe stick with the OCSVM convention and we could add a score_samples method, as for kernel density estimation, which would return the scores of the algorithms as defined in their original papers. This would be useful when performing benchmarks with ROC curves for instance. When I did a benchmark with sklearn anomaly detection algorithms I defined a subclass for each algorithm, each with a score method.
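
To make the proposal concrete, here is a minimal sketch of the split between the two methods. It is not scikit-learn code: ToyDistanceDetector, its distance-to-mean score, and the way offset_ is chosen are all assumptions made up for illustration. The only point is that score_samples ignores contamination while decision_function shifts the raw score so that its sign follows the OCSVM convention.

```python
import math


class ToyDistanceDetector:
    """Hypothetical 1-D detector, NOT part of scikit-learn.

    It only illustrates the proposed convention: a raw score_samples
    plus a decision_function whose sign encodes inlier/outlier.
    """

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X):
        n = len(X)
        self.center_ = sum(X) / n
        # Assumed thresholding rule: pick the raw score such that about a
        # `contamination` fraction of training samples score strictly below it.
        scores = sorted(self.score_samples(X))
        k = min(int(math.floor(self.contamination * n)), n - 1)
        self.offset_ = scores[k]
        return self

    def score_samples(self, X):
        # Raw score, independent of contamination: higher means more normal.
        return [-abs(x - self.center_) for x in X]

    def decision_function(self, X):
        # Shifted score: >= 0 for inliers, < 0 for outliers.
        return [s - self.offset_ for s in self.score_samples(X)]

    def predict(self, X):
        # +1 for inliers, -1 for outliers, following the sign of decision_function.
        return [1 if d >= 0 else -1 for d in self.decision_function(X)]


# Nine points near zero and one obvious outlier.
X = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05, -0.05, 0.15, -0.15, 10.0]
det = ToyDistanceDetector(contamination=0.1).fit(X)
labels = det.predict(X)
```

With this split, changing contamination moves the threshold used by decision_function and predict but leaves score_samples untouched, so the raw scores can be fed directly to a ROC curve.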

If you think this should be addressed, I can submit a PR.

See also #8677.
