
Performance Measures in Binary Classification

Article in International Journal of Statistics in Medical Research · October 2012


DOI: 10.6000/1929-6029.2012.01.01.08



International Journal of Statistics in Medical Research, 2012, 1, 79-81

Performance Measures in Binary Classification

Matthias Kohl*

Department of Mechanical and Process Engineering, Furtwangen University, Jakob-Kienzle-Str. 17, D-78054
VS-Schwenningen, Germany
Abstract: We give a brief overview of common performance measures for binary classification. We cover sensitivity,
specificity, positive and negative predictive value, positive and negative likelihood ratio, as well as the ROC curve and AUC.

Keywords: Sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio,
negative likelihood ratio, prevalence, ROC curve, AUC, informative diagnostic test.

*Address correspondence to this author at the Department of Mechanical and Process Engineering, Furtwangen University, Jakob-Kienzle-Str. 17, D-78054 VS-Schwenningen, Germany; Tel: +49 (0) 7720 307-4746; Fax: +49 (0) 7720 307-4727; E-mail: [email protected]

E-ISSN: 1929-6029/12 © 2012 Lifescience Global

PERFORMANCE MEASURES

In many cases diagnostic tests are performed to distinguish between two groups reflecting presence or absence of a relevant medical condition. In this setup let us assume a group of $N$ patients with true status $y_1, \ldots, y_N$, where $y_i = 1$ represents presence and $y_i = 0$ absence of the medical condition. A diagnostic test $T$ yields results $t_1, \ldots, t_N$, where $t_i = 1$ represents a positive and $t_i = 0$ a negative test.

The simplest approach to measure the performance of test $T$ is to use the probability of misclassification (PMC)

$$\mathrm{PMC} = \frac{\text{cardinality of } \{\, i = 1, \ldots, N \mid y_i \neq t_i \,\}}{N}$$

or, equivalently, the accuracy (ACC), $\mathrm{ACC} = 1 - \mathrm{PMC}$.

However, such a single performance measure may be misleading, because a diagnostic test can be correct in two ways and wrong in two ways: it can correctly or wrongly predict the presence of the medical condition, and it can correctly or wrongly predict its absence [1]. Thus, a pair of criteria should be used to obtain an exact description of the performance. In general, the results of a test can be summarized by the so-called confusion matrix.

Table 1: Confusion Matrix for Binary Classification

| | Test result 0 | Test result 1 |
|---|---|---|
| True situation 0 | True negative (TN) | False positive (FP) |
| True situation 1 | False negative (FN) | True positive (TP) |

The confusion matrix, whose structure is presented in Table 1, includes the information on the prevalence (Pr) for the considered group.

$$\mathrm{Pr} = \frac{\mathrm{TP} + \mathrm{FN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{TN} + \mathrm{FP}}$$

In addition, it is the basis for the definition of various performance measures. The proportion of correct positive tests for patients having the medical condition is called sensitivity (Se), whereas the proportion of correct negative tests for patients not having the medical condition is called specificity (Sp).

$$\mathrm{Se} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{Sp} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$$

The accuracy can also be expressed as a weighted sum of sensitivity and specificity

$$\mathrm{ACC} = \mathrm{Pr} \cdot \mathrm{Se} + (1 - \mathrm{Pr}) \cdot \mathrm{Sp}$$

The positive predictive value (PPV) is the probability that a patient with a positive test has the medical condition, and the negative predictive value (NPV) is the probability that a patient with a negative test does not have the medical condition.

$$\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad \mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}$$

The positive likelihood ratio (PLR) tells how likely patients with the medical condition are to have a positive test compared to patients without the medical condition. The negative likelihood ratio (NLR) tells how likely patients with the medical condition are to have a negative result compared to patients without the medical condition.
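As a minimal sketch, the measures of this section can be computed directly from the four cell counts of the confusion matrix, taking PLR = Se / (1 - Sp) and NLR = (1 - Se) / Sp. The function name `performance_measures`, the dictionary keys and the patient counts are illustrative assumptions and are not taken from the article; the sketch also assumes that no denominator is zero.

```python
def performance_measures(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute the measures of this section from the cells of Table 1.

    Hypothetical helper for illustration; it has no guard against
    zero denominators (e.g. a group without any positive test).
    """
    n = tp + fn + fp + tn
    se = tp / (tp + fn)          # sensitivity
    sp = tn / (tn + fp)          # specificity
    return {
        "Pr": (tp + fn) / n,     # prevalence in the considered group
        "PMC": (fp + fn) / n,    # probability of misclassification
        "ACC": (tp + tn) / n,    # accuracy = 1 - PMC
        "Se": se,
        "Sp": sp,
        "PPV": tp / (tp + fp),   # positive predictive value
        "NPV": tn / (tn + fn),   # negative predictive value
        "PLR": se / (1 - sp),    # positive likelihood ratio
        "NLR": (1 - se) / sp,    # negative likelihood ratio
    }


# Invented counts for illustration only: 1000 patients, 100 with the condition.
m = performance_measures(tp=90, fn=10, fp=45, tn=855)

# The accuracy agrees with the weighted sum Pr * Se + (1 - Pr) * Sp.
weighted = m["Pr"] * m["Se"] + (1 - m["Pr"]) * m["Sp"]
assert abs(m["ACC"] - weighted) < 1e-12

for name, value in m.items():
    print(f"{name:>3} = {value:.3f}")
```

With these invented counts the test has a sensitivity of 0.90 and a specificity of 0.95, yet at a prevalence of 0.10 only 90 of the 135 positive tests are correct, i.e. the PPV is about 0.67.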



$$\mathrm{PLR} = \frac{\mathrm{Se}}{1 - \mathrm{Sp}}, \qquad \mathrm{NLR} = \frac{1 - \mathrm{Se}}{\mathrm{Sp}}$$

During the development of a diagnostic test it is standard to use sensitivity and specificity for assessing the performance of the test. However, if two or more tests have to be compared, PLR and NLR should be chosen [2, 3].

It is important to note that both pairs of performance measures do not depend on the prevalence of the selected group, which may be different from the intended-use population, whereas PPV and NPV depend on prevalence.

$$\mathrm{PPV} = \frac{\mathrm{Pr} \cdot \mathrm{Se}}{\mathrm{Pr} \cdot \mathrm{Se} + (1 - \mathrm{Pr}) \cdot (1 - \mathrm{Sp})}$$

$$\mathrm{NPV} = \frac{(1 - \mathrm{Pr}) \cdot \mathrm{Sp}}{(1 - \mathrm{Pr}) \cdot \mathrm{Sp} + \mathrm{Pr} \cdot (1 - \mathrm{Se})}$$

The information provided by PPV and NPV is of great importance for physicians and patients [4]. In real-world applications, where prevalence is often below 10%, the diagnostic test must aim at substantially high values for sensitivity and specificity in order to be of utility; otherwise PPV and NPV will be unacceptably low.

ROC CURVE

Let us assume a diagnostic test $T$ giving not only 0 and 1 but a whole range of values, where large values of $T$ are more likely for patients having the medical condition and small values of $T$ are more likely for patients not having the medical condition. Hence, for the final diagnostic test, which should only return 0 or 1, we have to select a threshold to distinguish between the two categories. In this setup sensitivity and specificity are the most frequently used performance measures and are displayed by so-called receiver operating characteristic (ROC) curves. For each threshold for the values of $T$ one obtains a sensitivity and a specificity value. Plotting these values leads to the ROC curve of the diagnostic test $T$.

In Figure 1 we have selected a threshold leading to a sensitivity and a specificity of 0.75. Decreasing the threshold will increase the sensitivity and decrease the specificity, whereas increasing the threshold will decrease the sensitivity and increase the specificity. The line $\mathrm{Se} = 1 - \mathrm{Sp}$ reflects a diagnostic test which is not informative, i.e. not better than chance. If there is a second test with a sensitivity and specificity lying in region A, then it has a higher PLR and a lower NLR, which means that it is better. If the sensitivity and specificity of the second test lie in region B, then both PLR and NLR are smaller; if they lie in region C, then both PLR and NLR are larger. In these two cases it is not clear which classifier performs better; it depends on the actual situation. Finally, if the second classifier's sensitivity and specificity lie in region D, it has a lower PLR and a higher NLR, i.e. it is performing worse [2].

Figure 1: ROC curve for a diagnostic test.

ROC curves are often summarized by the area under the ROC curve (AUC), where an AUC of 0.5 means that the diagnostic test is not better than chance in predicting the categories, whereas values larger than 0.5 indicate a result better than chance. If the AUC is smaller than 0.5, the labels of the categories are misplaced and should be switched, leading to an AUC greater than 0.5. A diagnostic test with an AUC larger than 0.5 is informative; equivalent characterizations in terms of the introduced performance measures are: Se + Sp > 1, PPV + NPV > 1, PPV > Pr, NPV > 1 - Pr, PLR > 1, or NLR < 1.

ACKNOWLEDGEMENT

We would like to thank two anonymous referees for valuable comments on the manuscript.

APPENDIX OF SYMBOLS

PMC = probability of misclassification

ACC = accuracy

TN = true negative

FN = false negative

TP = true positive

FP = false positive

Pr = prevalence

Se = sensitivity

Sp = specificity

PPV = positive predictive value

NPV = negative predictive value

PLR = positive likelihood ratio

NLR = negative likelihood ratio

ROC = receiver operating characteristic

AUC = area under the ROC curve

REFERENCES

[1] Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manag 2009; 45: 427-37. http://dx.doi.org/10.1016/j.ipm.2009.03.002
[2] Biggerstaff BJ. Comparing diagnostic tests: a simple graphic using likelihood ratios. Statist Med 2000; 19: 649-63. http://dx.doi.org/10.1002/(SICI)1097-0258(20000315)19:5<649::AID-SIM371>3.0.CO;2-H
[3] Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329(7458): 168-69. http://dx.doi.org/10.1136/bmj.329.7458.168
[4] Altman DG, Bland JM. Diagnostic tests 2: predictive values. BMJ 1994; 309(6947): 102. http://dx.doi.org/10.1136/bmj.309.6947.102

Received on 15-09-2012 Accepted on 01-10-2012 Published on 08-10-2012

http://dx.doi.org/10.6000/1929-6029.2012.01.01.08
