
MODULE 7: Ensemble learning methods

Three common methods of ensemble learning are boosting, bagging and stacking.
In boosting, training is sequential: each new model attempts to correct the errors made by the previously trained models. This is done by assigning higher weights to the previously misclassified examples. Examples of boosting algorithms are AdaBoost, Gradient Boosting and XGBoost.
Bagging involves training several independent models on different random subsets of the training data drawn with replacement. The predictions of the different models are then combined by voting for classification and by averaging for regression. The most common bagging algorithm is the random forest.
Stacking is another ensemble learning method in which training happens in stages. In stacking, the final learner uses the outputs of the base learners as its inputs. The base learners are trained using different algorithms. This is also known as meta-learning, and the final model is known as a meta-model.
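Stacking is not revisited later in these notes, so a minimal sketch is given here. It is an illustration under assumptions rather than a prescribed implementation: it assumes scikit-learn's StackingClassifier, and the particular base learners (a random forest and a support vector machine) and the logistic regression meta-model are chosen only for illustration.

from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

x, y = load_iris(return_X_y=True)

# base learners trained with different algorithms
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("svm", SVC(random_state=0)),
]

# the meta-model (final learner) is trained on the base learners' outputs
stackModel = StackingClassifier(
    estimators=base_learners,
    final_estimator=LogisticRegression(max_iter=1000),
)

print(cross_val_score(stackModel, x, y, cv=5).mean())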
We will review two ensemble learning algorithms: AdaBoost and random forests.

AdaBoost
The Adaptive Boosting (AdaBoost) algorithm iteratively creates a set of weak learners. In each iteration, the weak learner being trained focuses on the samples that were wrongly classified in previous iterations. This is achieved by selecting a bootstrapped dataset consisting of samples drawn from the original dataset, where the probability that a sample is included depends on its weight. A sample that was previously misclassified has a higher weight than a sample that was correctly classified.
The following is the sequence of steps in the AdaBoost algorithm (a from-scratch sketch follows the steps):
Given the sample dataset
$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$$
(i) Assign equal weights $w_i = 1/n$ to each of the samples


(ii) Training
(1) Select a bootstrapped sample of size $n$ by random selection with replacement, with the probability of selection determined by the weight of each sample
(2) Train a weak learner ($f_t$), such as a decision stump, using the bootstrapped sample
(3) Calculate the error rate of the weak learner using the formula
$$E_{f_t} = \frac{\sum_{j=1}^{n} w_j \,\mathbb{1}\left(f_t(x_j) \neq y_j\right)}{\sum_{k=1}^{n} w_k}$$

(4) Compute the weight of the weak learner using the formula below
$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - E_{f_t}}{E_{f_t}}\right)$$
(5) Update the weights of the original samples using the formula
$$w_i = w_i \times e^{\pm\alpha_t}$$
where the exponent is negative for correctly classified examples and positive for incorrectly classified examples.
(6) Normalize the weights using the formula below
$$w_i = \frac{w_i}{\sum_{j=1}^{n} w_j}$$
(iii) Repeat step (ii) until the desired number $T$ of weak learners has been trained
To classify a new sample $x$, use the formula below:
$$F(x) = \arg\max_{c \in C} \sum_{t=1}^{T} \alpha_t \,\mathbb{1}\left(f_t(x) = c\right)$$
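The steps above can be made concrete with a short from-scratch sketch. The code below is a minimal illustration under stated assumptions, not a production implementation: it uses scikit-learn's DecisionTreeClassifier with max_depth=1 as the decision stump, and the helper names adaboost_train and adaboost_predict are chosen here only for illustration.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(x, y, T=10, seed=0):
    # (i) assign equal weights to each of the n samples
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.full(n, 1.0 / n)
    stumps, alphas = [], []
    for _ in range(T):
        # (ii)(1) bootstrapped sample of size n, drawn with probability w
        idx = rng.choice(n, size=n, replace=True, p=w)
        # (ii)(2) train a decision stump on the bootstrapped sample
        stump = DecisionTreeClassifier(max_depth=1).fit(x[idx], y[idx])
        pred = stump.predict(x)
        # (ii)(3) weighted error rate on the original samples
        err = np.sum(w * (pred != y)) / np.sum(w)
        err = np.clip(err, 1e-10, 1 - 1e-10)  # guard against division by zero
        # (ii)(4) weight of the weak learner
        alpha = 0.5 * np.log((1 - err) / err)
        # (ii)(5) raise weights of misclassified samples, lower the rest
        w = w * np.exp(np.where(pred != y, alpha, -alpha))
        # (ii)(6) normalize the weights
        w = w / np.sum(w)
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, x):
    # F(x) = argmax over classes c of the alpha-weighted votes for c
    classes = np.unique(np.concatenate([s.classes_ for s in stumps]))
    votes = np.zeros((len(x), len(classes)))
    for stump, alpha in zip(stumps, alphas):
        pred = stump.predict(x)
        for ci, c in enumerate(classes):
            votes[:, ci] += alpha * (pred == c)
    return classes[np.argmax(votes, axis=1)]

As a usage sketch, the model could be trained with stumps, alphas = adaboost_train(x_train, y_train, T=50) and evaluated with adaboost_predict(stumps, alphas, x_test) on data such as the iris split used in the implementation section below.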

Random Forest Classifier


Random forest is an ensemble learning method that was introduced by Leo Breiman (Cutler A., Cutler D., & Stevens, 2014). Random forests can be used for both classification and regression. According to Cutler A., Cutler D., & Stevens (2014), the following are the advantages of random forests:
i. They can be used for both classification and regression
ii. They are fast to train
iii. They are fast in predicting
iv. They depend on only one or two tuning parameters
v. They have a built-in estimate of the generalization error
vi. They can be used for high-dimensional problems
vii. They can be implemented in parallel
viii. They can provide measures of variable importance
The following is the sequence of steps in training a random forest classifier (a configuration sketch follows this list):
i. Randomly select M bootstrap samples, each of size n, from the dataset with replacement
ii. Train M decision trees using the process below:
At each node, given the total number of features p, randomly select k features that will compete for the creation of the node. k is usually the square root of p for classification and p/3 for regression.
Grow each tree to the greatest extent possible without pruning.
iii. Combine the trees as necessary for classification or regression. For classification, the majority class is taken as the predicted value. For regression, the outputs of all the trees are averaged.
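As a hedged illustration, the choice of k described in step ii corresponds to the max_features parameter of scikit-learn's random forest estimators; the mapping below is an assumption about how the library would be configured, not part of the original notes.

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# classification: k = sqrt(p) candidate features at each node
rf_clf = RandomForestClassifier(
    n_estimators=100,      # M trees, each grown on a bootstrap sample
    max_features="sqrt",   # k = square root of p
    random_state=0,
)

# regression: k = p/3 candidate features at each node (a float is a fraction of p)
rf_reg = RandomForestRegressor(
    n_estimators=100,
    max_features=1/3,      # roughly k = p/3
    random_state=0,
)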

IMPLEMENTATION OF ENSEMBLE ALGORITHMS USING PYTHON
You can implement AdaBoost and random forest classifiers using Python as shown below.
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn import metrics

# load the iris dataset
data = load_iris()
x = data.data
y = data.target
labels = data.target_names

# split the data into training and test sets
x_train, x_test, y_train, y_test = train_test_split(
    x, y, random_state=42, test_size=0.2, shuffle=True)

# create the classifiers
adBoostModel = AdaBoostClassifier(n_estimators=100, random_state=0)
rfModel = RandomForestClassifier(n_estimators=100, random_state=0)

# fit the models
adBoostModel.fit(x_train, y_train)
rfModel.fit(x_train, y_train)

# predict on the test set
y_pred_adBoost = adBoostModel.predict(x_test)
y_pred_rf = rfModel.predict(x_test)

# get the accuracy
AdaBoostAccuracy = metrics.accuracy_score(y_test, y_pred_adBoost)
rfAccuracy = metrics.accuracy_score(y_test, y_pred_rf)
print("AdaBoost Accuracy:", AdaBoostAccuracy)
print("Random Forest Accuracy:", rfAccuracy)
