
Trịnh Tấn Đạt

Faculty of Information Technology (Khoa CNTT) – Saigon University


Email: [email protected]
Website: https://sites.google.com/site/ttdat88/
Contents
 Introduction
 Voting
 Bagging
 Boosting
 Stacking and Blending
Introduction
Definition
 An ensemble of classifiers is a set of classifiers whose individual decisions
are combined in some way (typically, by weighted or un-weighted voting)
to classify new examples
 Ensembles are often much more accurate than the individual classifiers that
make them up.
Learning Ensembles
 Learn multiple alternative definitions of a concept using different training
data or different learning algorithms.
 Combine decisions of multiple definitions, e.g. using voting.

Training Data → Data 1, Data 2, …, Data K
Data k → Learner k → Model k (for k = 1, …, K)
Model 1, …, Model K → Model Combiner → Final Model

Necessary and Sufficient Condition
 For the idea to work, the classifiers should be
 Accurate
 Diverse
 Accurate: Has an error rate better than random guessing on new instances
 Diverse: They make different errors on new data points
Why Do They Work?
 Suppose there are 25 base classifiers
 Each classifier has an error rate, ε = 0.35
 Assume classifiers are independent
 Probability that the ensemble classifier makes a wrong prediction (i.e., at least 13 of the 25 classifiers are wrong):

P(\text{wrong}) = \sum_{i=13}^{25} \binom{25}{i} \epsilon^{i} (1-\epsilon)^{25-i} \approx 0.06

Marquis de Condorcet (1785): the majority vote is wrong with exactly this probability.
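A quick numerical check of this calculation (a minimal sketch; variable names are illustrative):

from math import comb

eps = 0.35   # error rate of each base classifier
n = 25       # number of independent base classifiers

# The majority vote is wrong when 13 or more of the 25 classifiers are wrong
p_wrong = sum(comb(n, i) * eps**i * (1 - eps)**(n - i) for i in range(13, n + 1))
print(round(p_wrong, 3))   # prints approximately 0.06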
Value of Ensembles
 When combining multiple independent and diverse decisions, each of which is at least more accurate than random guessing, random errors cancel each other out and correct decisions are reinforced.
 Human ensembles are demonstrably better
 How many jelly beans in the jar?: Individual estimates vs. group average.
A Motivating Example
 Suppose that you are a patient with a set of symptoms
 Instead of taking the opinion of just one doctor (classifier), you decide to take the opinions of several doctors!
 Is this a good idea? Indeed it is.
 By consulting many doctors and combining their diagnoses, you can get a fairly accurate idea of the diagnosis.
The Wisdom of Crowds
 The collective knowledge of a diverse and independent body of people
typically exceeds the knowledge of any single individual and can be
harnessed by voting
When Ensembles Work?
 Ensemble methods work better with ‘unstable classifiers’
 Classifiers that are sensitive to minor perturbations in the training set
 Examples:
 Decision trees
 Rule-based
 Artificial neural networks
Ensembles
 Homogeneous Ensembles : all individual models are obtained with the same learning
algorithm, on slightly different datasets
 Use a single, arbitrary learning algorithm but manipulate training data to make it
learn multiple models.
 Data 1 ≠ Data 2 ≠ … ≠ Data K
 Learner 1 = Learner 2 = … = Learner K
 Different methods for changing training data:
 Bagging: Resample training data
 Boosting: Reweight training data

 Heterogeneous Ensembles : individual models are obtained with different algorithms


 Stacking and Blending
 combining mechanism is that the output of the classifiers (Level 0 classifiers) will be used as training data
for another classifier (Level 1 classifier)
Methods of Constructing Ensembles
1. Manipulate training data set
2. Cross-validated Committees
3. Weighted Training Examples
4. Manipulating Input Features
5. Manipulating Output Targets
6. Injecting Randomness
Methods of Constructing Ensembles - 1
1. Manipulate training data set
 Bagging (bootstrap aggregation)
 On each run, Bagging presents the learning algorithm with a training set drawn randomly, with replacement, from the original training data. This process is called bootstrapping.
 Each bootstrap sample contains, on average, 63.2% of the original training data, with several examples appearing multiple times
Methods of Constructing Ensembles - 2
2. Cross-validated Committees
 Construct training sets by leaving out disjoint subsets of the training data
 Idea similar to k-fold cross validation
3. Maintain a set of weights over the training examples. At each iteration
the weights are changed to place more emphasis on misclassified
examples (Adaboost)
Methods of Constructing Ensembles - 3
4. Manipulating Input Features
 Works if the input features are highly redundant (e.g., down sampling FFT
bins)
5. Manipulating Output Targets
6. Injecting Randomness
Variance and Bias
 Bias is due to differences
between the model and the
true function.
 Variance represents the
sensitivity of the model to
individual data points
Bias-Variance tradeoff
Voting
Simple Ensemble Techniques
 Max Voting: multiple models are used to make predictions for each data point.
The predictions by each model are considered as a ‘vote’. The predictions which
we get from the majority of the models are used as the final prediction.

from sklearn.linear_model import LogisticRegression
from sklearn import tree
from sklearn.ensemble import VotingClassifier

model1 = LogisticRegression(random_state=1)
model2 = tree.DecisionTreeClassifier(random_state=1)
# Hard voting: the majority class among the base models' predictions wins
model = VotingClassifier(estimators=[('lr', model1), ('dt', model2)], voting='hard')
model.fit(x_train, y_train)
model.score(x_test, y_test)
Simple Ensemble Techniques
 Averaging: multiple predictions are made for each data point in averaging. In this method, we
take an average of predictions from all the models and use it to make the final prediction.
Averaging can be used for making predictions in regression problems or while calculating
probabilities for classification problems.

from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

model1 = tree.DecisionTreeClassifier()
model2 = KNeighborsClassifier()
model3 = LogisticRegression()
model1.fit(x_train, y_train)
model2.fit(x_train, y_train)
model3.fit(x_train, y_train)
pred1 = model1.predict_proba(x_test)
pred2 = model2.predict_proba(x_test)
pred3 = model3.predict_proba(x_test)

# Average of the predicted class probabilities
finalpred = (pred1 + pred2 + pred3) / 3
Simple Ensemble Techniques
 Weighted Average: each model is assigned a different weight that defines its importance in the final prediction

finalpred=(pred1*0.3+pred2*0.3+pred3*0.4)
Bagging and Boosting
Bagging and Boosting
 Bagging and Boosting aggregate multiple hypotheses generated by the same
learning algorithm invoked over different distributions of training data
 Bagging and Boosting generate a classifier with a smaller error on the training data because they combine multiple hypotheses which individually have a larger error

 Bagging : reduce variance


 Boosting : reduce bias
Bagging and Boosting
 Bagging replicates training sets by sampling with replacement from the
training instances
 Boosting uses all instances but weights them and therefore produces different
classifiers
 Classifiers are then combined by voting to create a composite classifier
 Bagging: classifiers have equal vote. Majority wins
 Boosting: each vote is weighted by the classifier's accuracy, giving extra weight to the opinions of more accurate classifiers
Bagging
Each learner (ML) is trained on a bootstrap sample, producing models f1, f2, …, fT, which are combined into the final model f.
Boosting
The first learner (ML) is trained on the original training sample (f1); each subsequent learner is trained on a reweighted sample (f2, …, fT); the models are combined into the final model f.
Principle of Adaboost
 Failure is the mother of success: a strong classifier is built as a weighted combination of weak classifiers applied to the feature vector.
Bagging
Bagging
 Create ensembles by repeatedly randomly resampling the training data (Breiman, 1996).
 Given a training set of size N, create K samples, each of size N, by drawing N examples from the
original data, with replacement.
 Each bootstrap sample will on average contain 63.2% of the unique training examples, the rest are
replicates.
 Combine the K resulting models using simple majority vote.
 Decreases error by decreasing the variance in the results due to unstable learners,
algorithms (like decision trees) whose output can change dramatically when the training data is
slightly changed.
Bagging
• Also known as bootstrap aggregation
Original Data 1 2 3 4 5 6 7 8 9 10
Bagging (Round 1) 7 8 10 8 2 5 10 10 5 9
Bagging (Round 2) 1 4 9 1 2 3 2 7 3 2
Bagging (Round 3) 1 8 5 10 5 5 9 6 3 7

• Sampling uniformly with replacement


• Build classifier on each bootstrap sample
• Each bootstrap sample Di contains approx. 63.2% of the original training
data
• The remaining instances (≈36.8%) can be used as a test set
Bagging
• Base classifier: decision stump (a single-level binary decision tree)
• Accuracy of a single stump: at most 70%
• Accuracy of the bagged ensemble classifier: 100% ☺

Bagging- Final Points
 Works well if the “base classifiers” are unstable
 Increased accuracy because it reduces the variance of the individual
classifier
 Does not focus on any particular instance of the training data
 Therefore, less susceptible to model over-fitting when applied to noisy
data
Bagging Algorithm: Training Phase
1. Initialize the parameters
   a. D = {} (the empty ensemble)
   b. K = number of classifiers to train
2. For k = 1 to K
   a. Take a bootstrap sample Sk from the training data
   b. Build a classifier Dk using Sk as the training set
   c. Add the classifier to the ensemble: D = D ∪ {Dk}
3. Return D
Bagging Algorithm: Classification Phase
4. Run D1, ..., DK on the input data x
5. The class with the maximum number of votes is chosen as the label for x
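A minimal sketch of these two phases with scikit-learn decision trees (assuming x_train, y_train, x_test are numpy arrays and class labels are non-negative integers; names are illustrative):

import numpy as np
from sklearn import tree

K = 10                                   # number of classifiers to train
N = len(x_train)
ensemble = []                            # D = {}

# Training phase
for k in range(K):
    idx = np.random.choice(N, size=N, replace=True)                # bootstrap sample S_k
    clf = tree.DecisionTreeClassifier().fit(x_train[idx], y_train[idx])
    ensemble.append(clf)                                           # D = D ∪ {D_k}

# Classification phase: run D_1, ..., D_K on x and take the majority vote
votes = np.array([clf.predict(x_test) for clf in ensemble])        # shape (K, n_test)
y_pred = np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)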
Why Bagging Works
 The main sources of error in learning are bias and variance.
 Bias is due to differences between the model and the true function.
 Variance represents the sensitivity of the model to individual data points.
 Does bagging minimize these errors? Yes
 Averaging over bootstrap samples can reduce the error from variance, especially in unstable classifiers
When is bagging useful?
 Bagging is bad if the models are very similar (not independent enough)
 This happens if the learning algorithm is stable
 That is, the model does not usually change much after changing a few instances
Other methods
 Bagging meta-estimator

 Random Forest

 Extremely Randomized Trees
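A sketch of how these three variants look in scikit-learn (x_train, y_train, x_test, y_test as in the earlier snippets; hyperparameters are illustrative):

from sklearn import tree
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier, ExtraTreesClassifier

# Bagging meta-estimator: bag any base estimator, here a decision tree
bagging = BaggingClassifier(tree.DecisionTreeClassifier(), n_estimators=100)
# Random Forest: bagged trees plus random feature selection at each split
rf = RandomForestClassifier(n_estimators=100)
# Extremely Randomized Trees: split thresholds are also chosen at random
et = ExtraTreesClassifier(n_estimators=100)

for model in (bagging, rf, et):
    model.fit(x_train, y_train)
    print(model.__class__.__name__, model.score(x_test, y_test))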


Summary of Bagging
 Individual models trained on bootstrap-sampled instances, predictions
are aggregated
 Bagging is useful when the algorithm to learn individual models is:
 Relatively accurate
 Relatively unstable (high variance)
 The aggregated model is then usually better than the original model
trained on full dataset
Boosting
Boosting
 Originally developed by computational learning theorists to guarantee
performance improvements on fitting training data for a weak learner
that only needs to generate a hypothesis with a training accuracy greater
than 0.5 (Schapire, 1990).
 Revised to be a practical algorithm, AdaBoost, for building ensembles
that empirically improves generalization performance (Freund & Schapire, 1996).
Boosting
 Instances are given weights. At each iteration, a new hypothesis is learned and
the instances are reweighted to focus on instances that the most-recently-
learned classifier got wrong.
 Initially, all N instances are assigned equal weights
 Unlike bagging, weights may change at the end of a boosting round
Boosting
 Equal weights are assigned to each training instance (1/N for round 1)
 After a classifier Mi is learned, the weights are adjusted to allow the subsequent classifier Mi+1 to “pay more attention” to instances that were misclassified by Mi.
 Final boosted classifier M* combines the votes of each individual classifier
 Weight of each classifier’s vote is a function of its accuracy
 Adaboost – popular boosting algorithm
Adaboost
 AdaBoost (adaptive boosting) is an ensemble learning algorithm that can
be used for classification or regression.
 AdaBoost creates the strong learner by iteratively adding weak learners
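For example, with scikit-learn (its AdaBoostClassifier uses decision stumps as the default weak learners; x_train etc. as in the earlier snippets):

from sklearn.ensemble import AdaBoostClassifier

# Iteratively adds 100 weak learners, reweighting the training data after each round
model = AdaBoostClassifier(n_estimators=100, random_state=1)
model.fit(x_train, y_train)
model.score(x_test, y_test)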
Toy Example – taken from Antonio Torralba @MIT
Each data point has a class label y_t ∈ {+1, −1} and a weight w_t = 1.
Weak learners come from the family of lines; h ⇒ p(error) = 0.5, i.e., it is at chance.
Toy example
Each data point has a class label y_t ∈ {+1, −1} and a weight w_t = 1.
This one seems to be the best.
This is a ‘weak classifier’: it performs slightly better than chance.
Toy example
Each data point has a class label y_t ∈ {+1, −1}.
We update the weights: w_t ← w_t · exp{−y_t H_t(x_t)}
We set a new problem for which the previous weak classifier performs at chance again.
Toy example (subsequent rounds)
After each round the weights are updated again, w_t ← w_t · exp{−y_t H_t(x_t)}, and a new problem is set for which the previous weak classifier performs at chance.
Toy example
The strong (non-linear) classifier is built as the combination of all the weak (linear) classifiers f1, f2, f3, f4.
Adaboost Strategy
 At each stage of the algorithm, Adaboost trains a new classifier using a data set
in which the weighting coefficients have been adjusted according to the
performance of the previously trained classifier so as to give greater weight to
misclassified instances
 Finally, when the desired number of base classifiers has been trained, their results are combined to form a committee, using different weights for different classifiers
Adaboost: Initialization
 Given a set of input vectors {x1,x2,...,xN} along with binary target values
{t1,t2,...,tN}.
 That is, t_n ∈ {-1, +1}
 Each instance is given a weight wn
 Initially, set wn = 1/N, for all n
 Assume that we have a procedure to train the base (weak) classifier. (Say, a
Perceptron)
Boosting Framework
Weight Progression
 Each base classifier y_m(x) is trained on a weighted form of the training set (blue arrows)
 The weights depend upon the performance of the previous base classifier y_{m-1}(x) (green arrows)
Adaboost Algorithm
1. Initialize the data weighting coefficients {w_n} by setting
   w_n^{(1)} = 1/N for n = 1, 2, ..., N
2. For m = 1, ..., M
   a. Fit a classifier y_m(x) to the training data by minimizing the weighted error function

      J_m = \sum_{n=1}^{N} w_n^{(m)} I(y_m(x_n) \ne t_n)

   where I(y_m(x_n) ≠ t_n) = 1 when y_m(x_n) ≠ t_n, and 0 otherwise


Indicator Function
 The I above is called the indicator function
 Notice that I = 1 when an instance is misclassified

 J_m is the “error” function of the m-th classifier: it identifies the weights associated with each misclassified training instance and adds them up
 The quantity ε_m can be thought of as the “error rate” of each base classifier on the data set
Epsilon & Alpha
b. Evaluate the quantities

   \epsilon_m = \frac{\sum_{n=1}^{N} w_n^{(m)} I(y_m(x_n) \ne t_n)}{\sum_{n=1}^{N} w_n^{(m)}}

   \alpha_m = \ln\left\{\frac{1-\epsilon_m}{\epsilon_m}\right\}
Weight Update & Prediction
c. Update the data weighting coefficients

   w_n^{(m+1)} = w_n^{(m)} \exp\{\alpha_m I(y_m(x_n) \ne t_n)\}

3. Make predictions using the final model

   Y_M(x) = \operatorname{sign}\left(\sum_{m=1}^{M} \alpha_m y_m(x)\right)
Comments
 Note that the first base classifier is trained using weights w_n^{(1)} that are all equal
 In subsequent iterations these weights are increased for data instances that are
misclassified and decreased/unchanged for those correctly classified.
 The alphas eventually give greater weight to the more accurate classifiers
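A minimal from-scratch sketch of steps 1–3 above, using decision stumps as the base classifiers (assuming numpy arrays X and targets t in {−1, +1}, with 0 < ε_m < 0.5 in every round; names are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, t, M=50):
    N = len(t)
    w = np.full(N, 1.0 / N)                    # step 1: w_n^(1) = 1/N
    learners, alphas = [], []
    for m in range(M):
        # step 2a: fit y_m(x) to the weighted training data
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, t, sample_weight=w)
        miss = (stump.predict(X) != t)         # indicator I(y_m(x_n) != t_n)
        # step 2b: weighted error rate epsilon_m and classifier weight alpha_m
        eps = np.sum(w * miss) / np.sum(w)
        alpha = np.log((1 - eps) / eps)
        # step 2c: increase the weights of misclassified instances
        w = w * np.exp(alpha * miss)
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(X, learners, alphas):
    # step 3: Y_M(x) = sign( sum_m alpha_m * y_m(x) )
    scores = sum(a * clf.predict(X) for a, clf in zip(alphas, learners))
    return np.sign(scores)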
Experimental Results on Ensembles
(Freund & Schapire, 1996; Quinlan, 1996)
 Ensembles have been used to improve generalization accuracy on a
wide variety of problems.
 On average, Boosting provides a larger increase in accuracy than
Bagging.
 Boosting on rare occasions can degrade accuracy.
 Boosting is particularly subject to over-fitting when there is
significant noise in the training data.
Issues in Ensembles
 Parallelism in Ensembles: Bagging is easily parallelized, Boosting is not.
 Variants of Boosting to handle noisy data.
 How “weak” should a base-learner for Boosting be?
 Combining Boosting and Bagging.
Adaboost
 AdaBoost.M1 and AdaBoost.M2 – original algorithms for binary and multiclass
classification
 LogitBoost – binary classification (for poorly separable classes)
 Gentle AdaBoost or GentleBoost – binary classification (for use with multilevel
categorical predictors)
 RobustBoost – binary classification (robust against label noise)
 LSBoost – least squares boosting (for regression ensembles)
Improving
 Gradient boosting (GBoosting)
 Stochastic Gradient Boosting
 Penalized Gradient Boosting
 Extreme Gradient Boosting (XGBoost)
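A brief illustration of two of these: scikit-learn's GradientBoostingClassifier (subsample < 1 gives stochastic gradient boosting), and XGBoost if the separate xgboost package is installed; parameter values are illustrative:

from sklearn.ensemble import GradientBoostingClassifier

gb = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1, subsample=0.8)
gb.fit(x_train, y_train)
gb.score(x_test, y_test)

# Extreme Gradient Boosting (requires the separate xgboost package)
# from xgboost import XGBClassifier
# xgb = XGBClassifier(n_estimators=100, learning_rate=0.1)
# xgb.fit(x_train, y_train)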
Summary:
 Can combine many weak classifiers/regressors into a stronger classifier;
voting, averaging, bagging
 if weak classifiers/regressors are better than random.
 if there is sufficient de-correlation (independence) amongst the weak
classifiers/regressors.
 Can combine many (high-bias) weak classifiers/regressors into a strong
classifier; boosting
 if weak classifiers/regressors are chosen and combined using knowledge of how
well they and others performed on the task on training data.
 The selection and combination encourages the weak classifiers to be
complementary, diverse and de-correlated.
Stacking and Blending
Stacking
 Both bagging and boosting assume we have a single “base learning”
algorithm.
 But what if we want to ensemble an arbitrary set of classifiers?
 E.g., combine the outputs of an SVM, naïve Bayes, and a nearest-neighbor model?
Stacking
A meta-model is trained on the outputs of the base (Level-0) models.
When does stacking work?
 Stacking works best when the base models have complementary strengths
and weaknesses.
 For example: combining k-nearest neighbor models with different k-
values, Naïve Bayes, and logistic regression. Each of these models has
different underlying assumptions so (hopefully) they will be
complementary.
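scikit-learn offers this directly via StackingClassifier; a sketch combining the models mentioned above (names and parameters are illustrative):

from sklearn.ensemble import StackingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

# Level-0 classifiers with complementary assumptions
estimators = [('knn3', KNeighborsClassifier(n_neighbors=3)),
              ('knn9', KNeighborsClassifier(n_neighbors=9)),
              ('nb', GaussianNB())]

# Level-1 (meta) classifier trained on cross-validated Level-0 predictions
stack = StackingClassifier(estimators=estimators, final_estimator=LogisticRegression(), cv=5)
stack.fit(x_train, y_train)
stack.score(x_test, y_test)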
Stacked learners: first attempt
Stacking
 EX:
 Step 1: The train set is split into 10 parts.
 Step 2: A base model (suppose a decision tree) is fitted on 9 parts and predictions are made for the 10th part.
Stacking
 Step 3: Using this model, predictions are made on the test set.
 Step 4: Steps 2 to 3 are repeated for another base model (say knn), resulting in another set of predictions for the train set and test set.
Stacking
 Step 5: The predictions from the train set are used as features to build a new model (e.g., logistic regression).
 Step 6: This model is used to make final predictions on the test prediction set, as sketched below.
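A compact sketch of steps 1–6, using cross_val_predict to produce the out-of-fold train-set predictions (x_train, y_train, x_test, y_test as before; 10 folds as in step 1; numeric class labels assumed):

import numpy as np
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

dt, knn = tree.DecisionTreeClassifier(), KNeighborsClassifier()

# Steps 1-4: out-of-fold predictions on the train set, full-fit predictions on the test set
train_meta = np.column_stack([cross_val_predict(dt, x_train, y_train, cv=10),
                              cross_val_predict(knn, x_train, y_train, cv=10)])
test_meta = np.column_stack([dt.fit(x_train, y_train).predict(x_test),
                             knn.fit(x_train, y_train).predict(x_test)])

# Steps 5-6: meta-model (logistic regression) built on the stacked predictions
meta = LogisticRegression().fit(train_meta, y_train)
meta.score(test_meta, y_test)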
Blending
 Blending follows the same approach as stacking but uses only a holdout
(validation) set from the train set to make predictions.
 In other words, unlike stacking, the predictions are made on the holdout
set only.
 The holdout set and the predictions are used to build a model which is
run on the test set.
Blending
 Step 1: The train set is split into training and validation sets.
 Step 2: Model(s) are fitted on the training set.
 Step 3: Predictions are made on the validation set and the test set.
 Step 4: The validation set and its predictions are used as features to build a new model.
 Step 5: This model is used to make final predictions on the test set using its meta-features.
Blending
 EX: simple code
import pandas as pd
from sklearn import tree
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

# Base model 1: decision tree
model1 = tree.DecisionTreeClassifier()
model1.fit(x_train, y_train)
val_pred1 = pd.DataFrame(model1.predict(x_val))
test_pred1 = pd.DataFrame(model1.predict(x_test))

# Base model 2: k-nearest neighbours
model2 = KNeighborsClassifier()
model2.fit(x_train, y_train)
val_pred2 = pd.DataFrame(model2.predict(x_val))
test_pred2 = pd.DataFrame(model2.predict(x_test))

# Meta-features: validation/test sets augmented with the base models' predictions
df_val = pd.concat([x_val, val_pred1, val_pred2], axis=1)
df_test = pd.concat([x_test, test_pred1, test_pred2], axis=1)

# Meta-model: logistic regression trained on the validation meta-features
model = LogisticRegression()
model.fit(df_val, y_val)
model.score(df_test, y_test)
Netflix challenge - 1 million USD (2006-2009)
 Netflix, an online DVD-rental and online video streaming service
 Task: predict user ratings to films from ratings by other users
 Goal: improve existing method by 10%
 Winner’s solution: ensemble with over 500 heterogeneous models,
aggregated with gradient boosted decision trees

 Ensembles based on blending/stacking were key approaches used in the Netflix competition
Conclusions
 Ensemble methods combine several hypotheses into one prediction.
 They work better than the best individual hypothesis from the same class
because they reduce bias or variance (or both).
 Bagging is mainly a variance-reduction technique, useful for complex
hypotheses.
 Boosting focuses on harder examples, and gives a weighted vote to the
hypotheses.
 Boosting works by reducing bias and increasing classification margin.
 Stacking is a generic approach to ensemble various models and performs very
well in practice.
Exercises
1) Implement the ensemble-model techniques for the diabetes prediction problem (Diabetes Prediction)

All patients are females at least 21 years old of Pima Indian heritage
Number of instances: 768
Number of attributes: 8 plus class
Missing attribute values: none
Class distribution (class value 1 is interpreted as "tested positive for diabetes"):
0 (tested_negative): 500 instances
1 (tested_positive): 268 instances
Exercises
❖ Apply the following techniques
 Voting
 Hard voting
 Soft voting
 Weighted voting

 Comparison of ensemble models
 Bagging
 Boosting
 Voting
Diabetes Predictions
 https://www.openml.org/d/37
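The dataset can be loaded directly from OpenML, for example (data_id=37 corresponds to the URL above):

from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split

# Pima Indians diabetes dataset (OpenML data id 37)
data = fetch_openml(data_id=37, as_frame=True)
X, y = data.data, data.target     # y: 'tested_negative' / 'tested_positive'
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)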
Voting
 Base models (scikit-learn)
 KNN
 RandomForest
 Logistic regression

 Voting
 Hard voting 0.7835497835497836
 Soft voting 0.7878787878787878
 Weighted voting 0.7922077922077922
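A sketch of the three voting variants with these base models (exact scores depend on the train/test split, preprocessing, and random seeds; the weights shown are illustrative):

from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

estimators = [('knn', KNeighborsClassifier()),
              ('rf', RandomForestClassifier(random_state=1)),
              ('lr', LogisticRegression(max_iter=1000))]

hard = VotingClassifier(estimators, voting='hard')
soft = VotingClassifier(estimators, voting='soft')
weighted = VotingClassifier(estimators, voting='soft', weights=[1, 2, 2])

for name, clf in [('hard', hard), ('soft', soft), ('weighted', weighted)]:
    clf.fit(x_train, y_train)
    print(name, clf.score(x_test, y_test))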
Comparison Ensemble models
 Base models:
 rf = RandomForestClassifier()
 et = ExtraTreesClassifier()
 knn = KNeighborsClassifier()
 svc = SVC()
 rg = RidgeClassifier()

❖ Comparison of ensemble models
▪ Bagging
▪ Boosting
▪ Voting
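A sketch of such a comparison with 5-fold cross-validation (X, y as loaded from OpenML above, assuming numeric features; default hyperparameters throughout):

from sklearn.ensemble import (BaggingClassifier, AdaBoostClassifier, VotingClassifier,
                              RandomForestClassifier, ExtraTreesClassifier)
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score

models = {
    'Bagging': BaggingClassifier(),
    'Boosting': AdaBoostClassifier(),
    'Voting': VotingClassifier([('rf', RandomForestClassifier()),
                                ('et', ExtraTreesClassifier()),
                                ('knn', KNeighborsClassifier()),
                                ('svc', SVC()),
                                ('rg', RidgeClassifier())], voting='hard'),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(name, scores.mean())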
Reference
 https://medium.com/@rrfd/boosting-bagging-and-stacking-ensemble-methods-with-sklearn-and-mlens-a455c0c982de
 https://machinelearningmastery.com/ensemble-machine-learning-algorithms-python-scikit-learn/
