Improving Classification with the AdaBoost
meta-algorithm
Hawking Bear
February 5, 2022
National Institute of Science Education and Research
Problem Statement
• For a classification problem (assume binary), we are given
a "weak classifier".
• Weak classifier - Classifier that performs just slightly
better than random guessing (> 50% accuracy).
• Can we combine multiple instances of the weak classifier
to obtain a strong classifier?
Meta-algorithms
• Methods that combine multiple classifiers are called
ensemble methods or meta-algorithms.
• Bagging and boosting are two common types.
Bagging and Boosting
Bagging
• Given a dataset X, we randomly sample from X (with
replacement) S times to make S new datasets, each the same
size as X.
• The weak classifier is applied to each dataset individually.
• To classify a new data point, we apply all S classifiers to
it and take a majority vote.
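The steps above can be sketched in a few lines of NumPy. This is illustrative only: the helper name `bagging_predict`, the toy mean-threshold weak learner, and the seed are our own, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def bagging_predict(X_train, y_train, x_new, fit_weak, S=5):
    """Train S weak classifiers on bootstrap resamples of the
    training set, then classify x_new by majority vote."""
    n = len(X_train)
    votes = []
    for _ in range(S):
        idx = rng.integers(0, n, size=n)       # sample with replacement
        clf = fit_weak(X_train[idx], y_train[idx])
        votes.append(clf(x_new))
    return 1 if sum(votes) > 0 else -1

def fit_weak(X, y):
    """Toy weak learner: threshold at the mean of the sample."""
    t = X[:, 0].mean()
    return lambda x: 1 if x[0] >= t else -1

X = np.array([[1.0], [2.0], [8.0], [9.0]])
y = np.array([-1, -1, 1, 1])
print(bagging_predict(X, y, np.array([9.0]), fit_weak))  # 1
```

Each bootstrap dataset is the same size as the original but omits some points and repeats others, so the S classifiers differ and the vote smooths out their individual errors.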
Boosting
• Sequential use of classifiers over T rounds.
• In each subsequent round, the data points that were
misclassified in the previous round are given higher
priority.
• AdaBoost is the most popular boosting algorithm.
AdaBoost
• To demonstrate the algorithms, we’ll use decision stumps
as the weak classifier.
• Decision stumps are decision trees of depth one which
classify data points based on just one feature and one
threshold.
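A decision stump fits in a few lines. In this sketch the function name and toy data are ours; `polarity` just flips which side of the threshold is labelled positive.

```python
import numpy as np

def stump_predict(X, feature, threshold, polarity):
    """Depth-one decision tree: label each row +1 or -1 by
    comparing a single feature against a single threshold."""
    preds = np.ones(len(X))
    if polarity == 1:
        preds[X[:, feature] < threshold] = -1
    else:
        preds[X[:, feature] >= threshold] = -1
    return preds

X = np.array([[1.0], [2.0], [3.0], [4.0]])
print(stump_predict(X, feature=0, threshold=2.5, polarity=1))  # [-1. -1.  1.  1.]
```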
Figure 1: Sample data for decision stumps.
AdaBoost Pseudocode
Figure 2: AdaBoost pseudocode [1].
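The Figure 2 loop can be sketched in NumPy with decision stumps as the weak learner. The helper names are ours and the stump search is brute force; a real implementation would vectorise it.

```python
import numpy as np

def fit_stump(X, y, w):
    """Exhaustively pick the (feature, threshold, polarity) stump
    with the lowest weighted error -- the weak learner."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for pol in (1, -1):
                pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (j, thr, pol)
    return best, best_err

def adaboost(X, y, T=10):
    """Each round: fit a stump to the weighted data, compute its
    vote weight alpha, then up-weight the misclassified points."""
    n = len(y)
    w = np.full(n, 1 / n)                  # uniform initial weights
    ensemble = []
    for _ in range(T):
        (j, thr, pol), err = fit_stump(X, y, w)
        err = max(err, 1e-10)              # avoid division by zero
        alpha = 0.5 * np.log((1 - err) / err)
        pred = np.where(pol * (X[:, j] - thr) >= 0, 1, -1)
        w *= np.exp(-alpha * y * pred)     # mistakes get larger weight
        w /= w.sum()
        ensemble.append((alpha, j, thr, pol))
    return ensemble

def predict(ensemble, X):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * np.where(p * (X[:, j] - t) >= 0, 1, -1)
                for a, j, t, p in ensemble)
    return np.sign(score)

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([-1, -1, 1, 1])
ens = adaboost(X, y, T=5)
print((predict(ens, X) == y).all())  # True
```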
AdaBoost Schematic
Figure 3: Schematic representation of AdaBoost.
Why this formula for α?
• With εt the weighted error of round t, αt = ½ ln((1 − εt)/εt);
if αt > 0, it can be shown that the training error decreases
exponentially over multiple rounds [2].
• αt ≥ 0 if and only if εt ≤ 1/2, which is why we require the weak
classifier to have greater than 50% classification accuracy.
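Plugging numbers into αt = ½ ln((1 − εt)/εt) makes the sign behaviour concrete (a quick numerical check, not from the slides):

```python
import math

def alpha(eps):
    """AdaBoost round weight: alpha = 0.5 * ln((1 - eps) / eps)."""
    return 0.5 * math.log((1 - eps) / eps)

print(round(alpha(0.3), 3))  # 0.424  -> better than chance, positive vote weight
print(alpha(0.5))            # 0.0    -> random guessing, zero vote weight
print(round(alpha(0.7), 3))  # -0.424 -> worse than chance, negative vote weight
```

A classifier at exactly 50% accuracy contributes nothing, and one below 50% would have its vote inverted.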
Class Imbalance
What is it?
• Let’s say we’re building a classifier to detect a rare brain
tumor from MRI scans.
• In the dataset, for every positive sample there are 100,000
negative samples.
• A model that simply minimizes overall classification error can
label every scan negative and still be over 99.999% accurate, so
it will perform poorly at detecting the tumor cases.
How do we detect it?
• Classification error alone doesn’t cut it; we need alternative
performance metrics.
• The confusion matrix is useful here.
Figure 4: Confusion matrix for a binary classification problem.
How do we detect it?
• Precision = TP/(TP + FP) = fraction of records that were positive
from the group that the classifier predicted to be positive.
• Recall = TP/(TP + FN) = fraction of positive examples the classifier
got right.
• Very useful when used together.
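Both metrics are one-liners given the confusion-matrix counts; the numbers below are a made-up example, not from the slides.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    return tp / (tp + fp), tp / (tp + fn)

# Made-up counts: the classifier flags 10 records, 8 truly
# positive, and misses 2 positives.
p, r = precision_recall(tp=8, fp=2, fn=2)
print(p, r)  # 0.8 0.8
```

Note that a degenerate all-negative classifier gets recall 0, which is exactly the failure that raw accuracy hides.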
How do we address it?
1. Manipulate the cost matrix.
2. Resample during training.
Figure 5: Typical (top) and
modified (bottom) cost
matrices.
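Option 2 can be as simple as random oversampling of the minority class; this sketch (function name and seed are ours) duplicates minority rows until the classes balance.

```python
import numpy as np

rng = np.random.default_rng(1)

def oversample_minority(X, y, minority_label=1):
    """Resample minority-class rows with replacement until the
    two classes are equally represented."""
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    extra = rng.choice(minority, size=len(majority), replace=True)
    idx = np.concatenate([majority, extra])
    return X[idx], y[idx]

X = np.arange(12).reshape(6, 2)
y = np.array([0, 0, 0, 0, 0, 1])
Xb, yb = oversample_minority(X, y)
print(np.bincount(yb))  # [5 5]
```

Undersampling the majority class is the mirror-image alternative when the dataset is large enough to discard rows.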
References i
1. Freund, Y., Schapire, R. & Abe, N. A short introduction to
boosting. Journal-Japanese Society For Artificial
Intelligence 14, 1612 (1999).
2. Freund, Y. & Schapire, R. E. A Decision-Theoretic
Generalization of On-Line Learning and an Application to
Boosting. Journal of Computer and System Sciences 55,
119–139. issn: 0022-0000 (1997).
https://www.sciencedirect.com/science/article/pii/S002200009791504X
Why the name?
• Let the training error εt of ht be given by εt = 1/2 − γt.
• Previous learning algorithms required that γt be known a
priori before boosting begins.
• AdaBoost adapts to the error rates of the individual weak
hypotheses, thus the name ’adaptive’.