
Kohl PerformanceMeasures2012

The article provides an overview of performance measures used in binary classification, focusing on metrics such as sensitivity, specificity, positive and negative predictive values, and likelihood ratios. It emphasizes the importance of these measures in evaluating diagnostic tests and introduces the ROC curve and AUC as tools for assessing test performance. The document highlights that understanding these metrics is crucial for their application in medical diagnostics.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/273136254

Performance Measures in Binary Classification

Article in International Journal of Statistics in Medical Research · October 2012

DOI: 10.6000/1929-6029.2012.01.01.08


1 author:

Matthias Kohl
Furtwangen University

All content following this page was uploaded by Matthias Kohl on 10 March 2015.


International Journal of Statistics in Medical Research, 2012, 1, 79-81

Performance Measures in Binary Classification

Matthias Kohl*

Department of Mechanical and Process Engineering, Furtwangen University, Jakob-Kienzle-Str. 17, D-78054
VS-Schwenningen, Germany

Abstract: We give a brief overview of common performance measures for binary classification. We cover sensitivity,
specificity, positive and negative predictive value, positive and negative likelihood ratio, as well as the ROC curve and AUC.

Keywords: Sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio,
negative likelihood ratio, prevalence, ROC curve, AUC, informative diagnostic test.

*Address correspondence to this author at the Department of Mechanical and Process Engineering, Furtwangen University, Jakob-Kienzle-Str. 17, D-78054 VS-Schwenningen, Germany; Tel: +49 (0) 7720 307-4746; Fax: +49 (0) 7720 307-4727; E-mail: [email protected]

PERFORMANCE MEASURES

In many cases diagnostic tests are performed to distinguish between two groups reflecting presence or absence of a relevant medical condition. In this setup let us assume a group of N patients with true status $y_1, \ldots, y_N$, where $y_i = 1$ represents presence and $y_i = 0$ absence of the medical condition. A diagnostic test T yields results $t_1, \ldots, t_N$, where $t_i = 1$ represents a positive and $t_i = 0$ a negative test.

The simplest approach to measure the performance of test T is to use the probability of misclassification (PMC)

$$\mathrm{PMC} = \frac{\left|\{\, i \in \{1, \ldots, N\} : y_i \neq t_i \,\}\right|}{N}$$

or, equivalently, the accuracy

$$\mathrm{ACC} = 1 - \mathrm{PMC}$$

However, such a single performance measure may be misleading, as there are two ways for the diagnostic test to be correct and two ways for it to be wrong: it can correctly or wrongly predict the presence of the medical condition, and it can correctly or wrongly predict its absence [1]. Thus, a pair of criteria should be used to obtain an accurate description of the performance. In general, the results of a test can be summarized by the so-called confusion matrix.

The confusion matrix, whose structure is presented in Table 1, includes the information on the prevalence (Pr) for the considered group.

$$\mathrm{Pr} = \frac{\mathrm{TP} + \mathrm{FN}}{\mathrm{TP} + \mathrm{FN} + \mathrm{TN} + \mathrm{FP}}$$

Table 1: Confusion Matrix for Binary Classification

                      Test result 0          Test result 1
True situation 0      True negative (TN)     False positive (FP)
True situation 1      False negative (FN)    True positive (TP)

In addition, the confusion matrix is the basis for the definition of various performance measures. The proportion of patients having the medical condition who test positive is called sensitivity (Se), whereas the proportion of patients not having the medical condition who test negative is called specificity (Sp).

$$\mathrm{Se} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}, \qquad \mathrm{Sp} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}}$$

The accuracy can also be expressed as a sum of sensitivity and specificity weighted by the prevalence

$$\mathrm{ACC} = \mathrm{Pr} \cdot \mathrm{Se} + (1 - \mathrm{Pr}) \cdot \mathrm{Sp}$$

The positive predictive value (PPV) is the probability that a patient with a positive test has the medical condition, and the negative predictive value (NPV) is the probability that a patient with a negative test does not have the medical condition.

$$\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad \mathrm{NPV} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FN}}$$

The positive likelihood ratio (PLR) tells how likely patients with the medical condition are to have a positive test compared to patients without the medical condition. The negative likelihood ratio (NLR) tells how likely patients with the medical condition are to have a negative result compared to patients without the medical condition.
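To make the count-based definitions above concrete, the following minimal Python sketch derives the four cells of the confusion matrix from true statuses and test results and then evaluates PMC, ACC, Pr, Se, Sp, PPV and NPV. The function names `confusion_counts` and `performance_measures` and the ten-patient data set are hypothetical and chosen only for illustration; the sketch assumes that every denominator is non-zero.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y: Sequence[int], t: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for true statuses y and test results t (both 0/1)."""
    if len(y) != len(t):
        raise ValueError("y and t must have the same length")
    tp = sum(1 for yi, ti in zip(y, t) if yi == 1 and ti == 1)
    fp = sum(1 for yi, ti in zip(y, t) if yi == 0 and ti == 1)
    fn = sum(1 for yi, ti in zip(y, t) if yi == 1 and ti == 0)
    tn = sum(1 for yi, ti in zip(y, t) if yi == 0 and ti == 0)
    return tp, fp, fn, tn


def performance_measures(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Count-based measures; assumes every denominator is non-zero."""
    n = tp + fp + fn + tn
    return {
        "PMC": (fp + fn) / n,   # share of misclassified patients
        "ACC": (tp + tn) / n,   # equals 1 - PMC
        "Pr": (tp + fn) / n,    # prevalence in the considered group
        "Se": tp / (tp + fn),
        "Sp": tn / (tn + fp),
        "PPV": tp / (tp + fp),
        "NPV": tn / (tn + fn),
    }


# Hypothetical data: 10 patients, 4 of whom have the medical condition.
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
t = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y, t)
m = performance_measures(tp, fp, fn, tn)
for name, value in m.items():
    print(f"{name} = {value:.3f}")

# The accuracy agrees with the prevalence-weighted sum of Se and Sp.
weighted = m["Pr"] * m["Se"] + (1 - m["Pr"]) * m["Sp"]
print(f"Pr*Se + (1-Pr)*Sp = {weighted:.3f}")
```

For these hypothetical data the counts are TP = 3, FP = 1, FN = 1 and TN = 5, so the accuracy is 0.8 whether it is computed directly or as the weighted sum of sensitivity and specificity.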

$$\mathrm{PLR} = \frac{\mathrm{Se}}{1 - \mathrm{Sp}}, \qquad \mathrm{NLR} = \frac{1 - \mathrm{Se}}{\mathrm{Sp}}$$

During the development of a diagnostic test it is standard to use sensitivity and specificity for assessing the performance of the test. However, if two or more tests have to be compared, PLR and NLR should be used [2, 3].

It is important to note that both pairs of performance measures (Se and Sp, PLR and NLR) do not depend on the prevalence of the selected group, which may be different from that of the intended-use population, whereas PPV and NPV depend on prevalence.

$$\mathrm{PPV} = \frac{\mathrm{Pr} \cdot \mathrm{Se}}{\mathrm{Pr} \cdot \mathrm{Se} + (1 - \mathrm{Pr}) \cdot (1 - \mathrm{Sp})}$$

$$\mathrm{NPV} = \frac{(1 - \mathrm{Pr}) \cdot \mathrm{Sp}}{(1 - \mathrm{Pr}) \cdot \mathrm{Sp} + \mathrm{Pr} \cdot (1 - \mathrm{Se})}$$

The information provided by PPV and NPV is of great importance for physicians and patients [4]. In real-world applications, where prevalence is often below 10%, the diagnostic test must aim at substantially high values for sensitivity and specificity in order to be of utility; otherwise PPV and NPV will be unacceptably low.

ROC CURVE

Let us assume a diagnostic test T giving not only 0 and 1 but a whole range of values, where large values of T are more likely for patients having the medical condition and small values of T are more likely for patients not having the medical condition. Hence, for the final diagnostic test, which should only return 0 or 1, we have to select a threshold to distinguish between the two categories. In this setup sensitivity and specificity are the most frequently used performance measures and are displayed by so-called receiver operating characteristic (ROC) curves. For each threshold for the values of T one obtains a sensitivity and a specificity value. Plotting these values leads to the ROC curve of the diagnostic test T.

In Figure 1 we have selected a threshold leading to a sensitivity and a specificity of 0.75. Decreasing the threshold will increase the sensitivity and decrease the specificity, whereas increasing the threshold will decrease the sensitivity and increase the specificity. The line $\mathrm{Se} = 1 - \mathrm{Sp}$ reflects a diagnostic test which is not informative, i.e. not better than chance. If there is a second test with a sensitivity and specificity lying in region A, then it has a higher PLR and a lower NLR, which means that it is better. If the second test has a sensitivity and specificity lying in region B or in region C, then PLR and NLR are both smaller or both larger, respectively. Thus, it is not clear which classifier performs better; this depends on the actual situation. Finally, if the second classifier's sensitivity and specificity lie in region D, it has a lower PLR and a higher NLR, i.e. it is performing worse [2].

Figure 1: ROC curve for a diagnostic test.

ROC curves are often summarized by the area under the ROC curve (AUC), where an AUC of 0.5 means that the diagnostic test is not better than chance in predicting the categories, whereas values larger than 0.5 indicate a result better than chance. If the AUC is smaller than 0.5, the labels of the categories are interchanged and should be switched, leading to an AUC greater than 0.5. With an AUC larger than 0.5 the diagnostic test is informative; equivalent characterizations in terms of the introduced performance measures are: Se + Sp > 1, PPV + NPV > 1, PPV > Pr, NPV > 1 - Pr, PLR > 1, or NLR < 1.

ACKNOWLEDGEMENT

We would like to thank two anonymous referees for valuable comments on the manuscript.

APPENDIX OF SYMBOLS

PMC = probability of misclassification
ACC = accuracy
TN = true negative
FN = false negative
TP = true positive
FP = false positive
Pr = prevalence
Se = sensitivity
Sp = specificity
PPV = positive predictive value
NPV = negative predictive value
PLR = positive likelihood ratio
NLR = negative likelihood ratio
ROC = receiver operating characteristic
AUC = area under the ROC curve

REFERENCES

[1] Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Inf Process Manag 2009; 45: 427-37. http://dx.doi.org/10.1016/j.ipm.2009.03.002
[2] Biggerstaff BJ. Comparing diagnostic tests: a simple graphic using likelihood ratios. Statist Med 2000; 19: 649-63. http://dx.doi.org/10.1002/(SICI)1097-0258(20000315)19:5<649::AID-SIM371>3.0.CO;2-H
[3] Deeks JJ, Altman DG. Diagnostic tests 4: likelihood ratios. BMJ 2004; 329(7458): 168-69. http://dx.doi.org/10.1136/bmj.329.7458.168
[4] Altman DG, Bland JM. Diagnostic tests 2: predictive values. BMJ 1994; 309(6947): 102. http://dx.doi.org/10.1136/bmj.309.6947.102
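As a supplementary illustration of the statement in the section on performance measures that PPV and NPV depend on prevalence while the likelihood ratios do not, the following Python sketch evaluates the prevalence-based formulas for a fixed test at several prevalences. The function names `predictive_values` and `likelihood_ratios` and the values Se = Sp = 0.9 are hypothetical; the sketch assumes 0 < Sp < 1 and 0 < Pr < 1 so that no denominator vanishes.

```python
from typing import Tuple


def predictive_values(se: float, sp: float, pr: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for sensitivity se, specificity sp and prevalence pr."""
    ppv = pr * se / (pr * se + (1 - pr) * (1 - sp))
    npv = (1 - pr) * sp / ((1 - pr) * sp + pr * (1 - se))
    return ppv, npv


def likelihood_ratios(se: float, sp: float) -> Tuple[float, float]:
    """Return (PLR, NLR); assumes 0 < sp < 1 to avoid division by zero."""
    return se / (1 - sp), (1 - se) / sp


# Hypothetical test characteristics, held fixed while the prevalence varies.
se, sp = 0.9, 0.9
plr, nlr = likelihood_ratios(se, sp)
print(f"PLR = {plr:.2f}, NLR = {nlr:.3f} (no prevalence involved)")

for pr in (0.5, 0.1, 0.01):
    ppv, npv = predictive_values(se, sp, pr)
    print(f"Pr = {pr:.2f}: PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

In this example PPV falls from 0.90 at a prevalence of 50% to 0.50 at 10% and to roughly 0.08 at 1%, although sensitivity, specificity and both likelihood ratios are unchanged.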

Received on 15-09-2012 Accepted on 01-10-2012 Published on 08-10-2012

http://dx.doi.org/10.6000/1929-6029.2012.01.01.08
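As a supplementary illustration of the ROC curve section, the following Python sketch sweeps a threshold over the values of a continuous test, records one (1 - Sp, Se) point per threshold, and summarizes the resulting curve by its area using the trapezoidal rule. The function names `roc_points` and `auc`, the convention that a value at or above the threshold counts as a positive test, and the eight test values are assumptions made only for this example; it is a sketch, not the procedure used to produce Figure 1.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], y: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (1 - Sp, Se) pairs, one per threshold, ordered along the curve.

    A patient tests positive when the score is >= the threshold.
    Assumes that both statuses (0 and 1) occur in y.
    """
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nobody tests positive
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, yi in zip(scores, y) if s >= thr and yi == 1)
        fp = sum(1 for s, yi in zip(scores, y) if s >= thr and yi == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test values T and true statuses of eight patients.
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
status = [1, 1, 0, 1, 0, 1, 0, 0]

curve = roc_points(scores, status)
area = auc(curve)
print("ROC points (1 - Sp, Se):", curve)
print(f"AUC = {area:.4f}")

# Reversing the direction of the test values mirrors the AUC around 0.5
# (there are no tied values in this example).
flipped = auc(roc_points([-s for s in scores], status))
print(f"AUC with reversed values = {flipped:.4f}")
```

With these hypothetical values the threshold 0.6 yields a sensitivity and a specificity of 0.75, the AUC is 0.8125, and reversing the test values gives 0.1875, that is, one minus the original AUC.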

E-ISSN: 1929-6029/12 © 2012 Lifescience Global
