Novelty and Outlier Detection
===================================================

.. currentmodule:: sklearn

Many applications require being able to decide whether a new observation
belongs to the same distribution as existing observations (it is an
*inlier*), or should be considered as different (it is an *outlier*).

Otherwise, if they lay outside the frontier, we can say that they are
abnormal with a given confidence in our assessment.

The One-Class SVM has been introduced in [1] for that purpose and
implemented in the :ref:`svm` module in the :class:`svm.OneClassSVM`
object. It requires the choice of a kernel and a scalar parameter to
define a frontier. The RBF kernel is usually chosen, although there
exists no exact formula or algorithm to set its bandwidth parameter;
this is the default kernel in the scikit-learn implementation. The
:math:`\nu` parameter, also known as the margin of the One-Class SVM,
corresponds to the probability of finding a new, but regular,
observation outside the frontier.
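
The snippet below is only a minimal sketch of how such a frontier can
be learned and queried; the training matrix ``X`` is hypothetical toy
data, and the ``nu`` and ``gamma`` values are arbitrary illustrative
choices rather than recommendations::

    import numpy as np
    from sklearn import svm

    # Hypothetical training set: 100 regular, two-dimensional observations
    X = np.random.randn(100, 2)

    # nu bounds the fraction of regular observations left outside the
    # frontier; gamma is the bandwidth of the RBF kernel
    clf = svm.OneClassSVM(kernel='rbf', nu=0.1, gamma=0.1)
    clf.fit(X)

    # predict labels points inside the frontier +1 and points outside -1
    new_points = np.array([[0., 0.], [5., 5.]])
    print(clf.predict(new_points))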

.. topic:: Examples:

   * See :ref:`example_svm_plot_oneclass.py` for visualizing the
     frontier learned around some data by a :class:`svm.OneClassSVM`
     object.

.. figure:: ../auto_examples/svm/images/plot_oneclass_1.png
   :target: ../auto_examples/svm/plot_oneclass.html

Fitting an elliptic envelope
----------------------------

One common way of performing outlier detection is to assume that the
regular data come from a known distribution (e.g. data are Gaussian
distributed). From this assumption, we generally try to define the
"shape" of the data, and can define outlying observations as
observations which stand far enough from the fitted shape.

scikit-learn provides an object :class:`covariance.EllipticEnvelop`
that fits a robust covariance estimate to the data, and thus fits an
ellipse to the central data points, ignoring points outside the
central mode.
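
A minimal sketch of this workflow, on hypothetical contaminated data
(the ``predict`` convention of +1 for inliers and -1 for outliers is
assumed here)::

    import numpy as np
    from sklearn.covariance import EllipticEnvelop

    # Hypothetical data: Gaussian inliers plus a few uniform outliers
    X = np.r_[np.random.randn(100, 2),
              np.random.uniform(low=-6, high=6, size=(10, 2))]

    # Fit a robust location/covariance estimate, i.e. an ellipse around
    # the central data points
    envelope = EllipticEnvelop().fit(X)

    # Points far from the robust fit are flagged as outliers (-1)
    print(envelope.predict(X))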

For instance, assuming that the inlier data are Gaussian distributed, it
will estimate the inlier location and covariance in a robust way (i.e.
without being influenced by outliers). This strategy is illustrated
below.

* See :ref:`example_covariance_plot_mahalanobis_distances.py` for
  an illustration of the difference between using a standard
  (:class:`covariance.EmpiricalCovariance`) or a robust estimate
  (:class:`covariance.MinCovDet`) of location and covariance to
  assess the degree of outlyingness of an observation; a small code
  sketch of this comparison follows below.
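
The sketch below assumes hypothetical contaminated data and relies on
the ``mahalanobis`` method shared by the covariance estimators; the
sample sizes and offsets are arbitrary::

    import numpy as np
    from sklearn.covariance import EmpiricalCovariance, MinCovDet

    # Hypothetical data: 95 Gaussian inliers and 5 shifted outliers
    X = np.r_[np.random.randn(95, 2), 5. + np.random.randn(5, 2)]

    # Squared Mahalanobis distances under a standard (non-robust) fit:
    # the outliers pull the estimate toward themselves
    d_standard = EmpiricalCovariance().fit(X).mahalanobis(X)

    # Under the robust (MCD-based) fit, the outliers stand out with
    # much larger distances
    d_robust = MinCovDet().fit(X).mahalanobis(X)

    print(d_standard[-5:], d_robust[-5:])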

One-class SVM versus elliptic envelope
--------------------------------------

In such situations, modeling the inlying data is very challenging, and
a One-class SVM gives useful results.

The examples below illustrate how the performance of the
:class:`covariance.EllipticEnvelop` degrades as the data becomes less
and less unimodal, while the :class:`svm.OneClassSVM` works better on
data with multiple modes; a rough code sketch of such a comparison
follows the illustrations.

.. |outlier1| image:: ../auto_examples/covariance/images/plot_outlier_detection_1.png
   :target: ../auto_examples/covariance/plot_outlier_detection.html

.. |outlier2| image:: ../auto_examples/covariance/images/plot_outlier_detection_2.png
   :target: ../auto_examples/covariance/plot_outlier_detection.html

.. |outlier3| image:: ../auto_examples/covariance/images/plot_outlier_detection_3.png
   :target: ../auto_examples/covariance/plot_outlier_detection.html

*
  For a well-centered and elliptic inlier mode, the
  :class:`svm.OneClassSVM` is not able to benefit from the rotational
  symmetry of the inlier population. In addition, it slightly fits the
  outliers present in the training set. By contrast, the decision rule
  based on fitting an :class:`covariance.EllipticEnvelop` learns an
  ellipse, which fits the inlier distribution well.

  |outlier1|

*
  As the inlier distribution becomes bimodal, the
  :class:`covariance.EllipticEnvelop` does not fit the inliers well.
  However, we can see that the :class:`svm.OneClassSVM` tends to
  overfit: because it has no model of the inliers, it interprets a
  region where, by chance, some outliers are clustered as inliers.

  |outlier2|

*
  If the inlier distribution is strongly non-Gaussian, the
  :class:`svm.OneClassSVM` is able to recover a reasonable
  approximation of it, whereas the :class:`covariance.EllipticEnvelop`
  completely fails.

  |outlier3|
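
To make the comparison concrete, here is a rough sketch of fitting
both models to bimodal data; the dataset, parameters, and probe point
are hypothetical illustrations, not the code behind the figures
above::

    import numpy as np
    from sklearn import svm
    from sklearn.covariance import EllipticEnvelop

    # Hypothetical bimodal inliers: two well-separated Gaussian blobs
    X = np.r_[np.random.randn(100, 2) - 3., np.random.randn(100, 2) + 3.]

    ellipse = EllipticEnvelop().fit(X)
    ocsvm = svm.OneClassSVM(kernel='rbf', nu=0.1, gamma=0.1).fit(X)

    # A probe point between the two modes: the single ellipse tends to
    # accept it (+1), while the One-class SVM can reject it (-1)
    middle = np.array([[0., 0.]])
    print(ellipse.predict(middle), ocsvm.predict(middle))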

.. topic:: Examples:

   * See :ref:`example_covariance_plot_outlier_detection.py` for a
     comparison of the :class:`svm.OneClassSVM` (tuned to perform like
     an outlier detection method) and a covariance-based outlier
     detection with :class:`covariance.MinCovDet`.