Closed
Description
I think we could improve the consistency of the decision_function of the outlier detection algorithms implemented in scikit-learn.
For OCSVM, decision_function is such that a positive value means the sample is an inlier and a negative value means it is an outlier. It takes into account the parameter nu, which can be seen as a contamination parameter. The decision_function of IsolationForest does not take into account the contamination parameter; it just returns the score of the samples. For LOF, it is private (_decision_function) and does not take into account the contamination parameter. For EllipticEnvelope, decision_function takes into account the contamination parameter, and the documentation says that it is meant to "ensure a compatibility with other outlier detection tools such as the One-Class SVM".
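To make the difference concrete, here is a minimal sketch of how a contamination parameter can turn raw scores into a signed decision function in the OCSVM style. It is not scikit-learn code: the helper names contamination_offset and signed_decision are hypothetical, and I am assuming that higher raw scores mean "more normal".

```python
def contamination_offset(train_scores, contamination):
    """Return the training score that sits at the `contamination` quantile.

    Hypothetical helper: samples scoring strictly below this value are
    the ones flagged as outliers.
    """
    if not 0.0 < contamination < 1.0:
        raise ValueError("contamination must be in (0, 1)")
    ordered = sorted(train_scores)
    # Index of the first score that is still considered an inlier.
    k = int(contamination * len(ordered))
    return ordered[k]


def signed_decision(scores, offset):
    """Shift raw scores so that negative means outlier, non-negative means inlier."""
    return [s - offset for s in scores]


# Assumed convention: the higher the raw score, the more normal the sample.
train_scores = [0.9, 0.8, 0.85, 0.1, 0.95, 0.7, 0.88, 0.92, 0.2, 0.75]
offset = contamination_offset(train_scores, contamination=0.2)
decisions = signed_decision(train_scores, offset)
n_flagged = sum(d < 0 for d in decisions)
```

With this shift the sign alone carries the inlier/outlier decision, while the unshifted scores stay independent of the contamination setting.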
decision_function should maybe stick with the OCSVM convention, and we could add a score_samples method, as for kernel density estimation, which would return the scores of the algorithms as defined in their original papers. This would be useful when performing benchmarks with ROC curves, for instance. When I did a benchmark with sklearn anomaly detection algorithms, I defined a subclass for each algorithm, each with a score method.
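A rough sketch of the split I have in mind, written as a toy one-dimensional detector rather than a real estimator. MedianDistanceDetector, its offset_ attribute, and the pairwise_auc helper are all hypothetical names for illustration; the score orientation (higher means more normal) is also an assumption, since each paper defines its own.

```python
class MedianDistanceDetector:
    """Toy 1-D detector (hypothetical) illustrating the proposed API split."""

    def __init__(self, contamination=0.1):
        self.contamination = contamination

    def fit(self, X):
        ordered = sorted(X)
        self.center_ = ordered[len(ordered) // 2]
        # The offset is the only place where contamination is used.
        train_scores = sorted(self.score_samples(X))
        self.offset_ = train_scores[int(self.contamination * len(train_scores))]
        return self

    def score_samples(self, X):
        # Raw score, independent of contamination (higher = more normal).
        return [-abs(x - self.center_) for x in X]

    def decision_function(self, X):
        # OCSVM-style convention: negative = outlier, non-negative = inlier.
        return [s - self.offset_ for s in self.score_samples(X)]

    def predict(self, X):
        return [1 if d >= 0 else -1 for d in self.decision_function(X)]


def pairwise_auc(y_true, scores):
    """ROC AUC as the fraction of (inlier, outlier) pairs ranked correctly."""
    inliers = [s for y, s in zip(y_true, scores) if y == 1]
    outliers = [s for y, s in zip(y_true, scores) if y == -1]
    wins = sum((i > o) + 0.5 * (i == o) for i in inliers for o in outliers)
    return wins / (len(inliers) * len(outliers))


X = [0.0, 0.1, -0.1, 0.2, -0.2, 0.05, -0.05, 0.15, 5.0, -4.0]
y_true = [1] * 8 + [-1] * 2  # last two samples are the planted outliers
detector = MedianDistanceDetector(contamination=0.2).fit(X)
labels = detector.predict(X)
auc = pairwise_auc(y_true, detector.score_samples(X))
```

The benchmark only needs score_samples, so it can be run without choosing a contamination value, while predict and decision_function keep the thresholded behaviour.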
If you think this should be addressed, I can submit a PR.
See also #8677.