Ensemble Learning
• Ensemble means a group of prediction algorithms.
• It uses multiple prediction algorithms to obtain better predictive
performance (e.g. an election – predicting who will win).
• Ensemble learning is a machine learning technique that aggregates
two or more learners (e.g. regression models, neural networks) in
order to produce better predictions.
• In other words, an ensemble model combines several individual
models to produce more accurate predictions than a single model
alone.
• Accuracy improves, but computational cost increases because multiple
prediction algorithms must be trained and evaluated.
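• A minimal sketch of this idea (assuming scikit-learn is available; the toy dataset and the three base models are illustrative choices, not part of these notes):

```python
# Minimal sketch of an ensemble that aggregates several learners.
# The toy dataset and the three base models are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Majority ("hard") voting over three different individual models.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
], voting="hard")
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```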
Bagging
• Bagging (Bootstrap Aggregating) is an ensemble learning technique
designed to improve the accuracy and stability of machine learning
algorithms. It involves the following steps:
• Data Sampling: Creating multiple subsets of the training dataset using
bootstrap sampling (random sampling with replacement).
• Model Training: Training a separate model on each subset of the data.
• Aggregation: Combining the predictions from all individual models
(averaging for regression, majority voting for classification) to
produce the final output.
Key Benefits of bagging:
• Reduces Variance: By averaging multiple predictions, bagging reduces
the variance of the model and helps prevent overfitting.
• Improves Accuracy: Combining multiple models usually leads to
better performance than individual models.
• Example of Bagging Algorithms:
• Random Forests (an extension of bagging applied to decision trees)
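• A minimal bagging sketch with scikit-learn; the base estimator and parameter values are illustrative assumptions:

```python
# Minimal sketch of bagging: bootstrap sampling, per-subset models, and
# aggregation are handled by scikit-learn's BaggingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: each of the 50 trees sees a bootstrap sample of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, random_state=0)
print("bagging      :", cross_val_score(bagging, X, y, cv=5).mean())

# Random forest: bagging of decision trees plus random feature selection.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```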
Boosting
• What is Boosting?
• Boosting is another ensemble learning technique that focuses on
creating a strong model by combining several weak models. It involves
the following steps:
• Sequential Training: Training models sequentially, each one trying to
correct the errors made by the previous models.
• Weight Adjustment: Each instance in the training set is weighted.
Initially, all instances have equal weights. After each model is trained, the
weights of misclassified instances are increased so that the next model
focuses more on difficult cases.
• Model Combination: Combining the predictions from all models to
produce the final output, typically by weighted voting or weighted
averaging.
Key Benefits of boosting:
• Reduces Bias: By focusing on hard-to-classify instances, boosting
reduces bias and improves the overall model accuracy.
• Produces Strong Predictors: Combining weak learners leads to a
strong predictive model.
• Example of Boosting Algorithms:
• AdaBoost
• Gradient Boosting Machines (GBM)
• XGBoost
• LightGBM
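• A minimal boosting sketch with scikit-learn (AdaBoost and gradient boosting); parameter values are illustrative assumptions:

```python
# Minimal sketch of boosting with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# AdaBoost: sequentially trained weak learners with instance re-weighting.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print("AdaBoost:", cross_val_score(ada, X, y, cv=5).mean())

# Gradient boosting: each new tree corrects the errors of the current ensemble.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0)
print("GBM     :", cross_val_score(gbm, X, y, cv=5).mean())
```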
AdaBoost algorithm
• Differences between bagging and boosting
• Random forest is a bagging technique and AdaBoost is a boosting
technique.
• Random forest is a parallel learning process, whereas AdaBoost is a
sequential learning process.
• In a random forest, the individual models (individual decision trees)
are built from the main data in parallel and independently of each other.
• Because multiple trees are built from the same data in parallel and no
tree depends on any other tree, this is called a parallel process.
• In a sequential process, each tree depends on the previous tree: if
there are multiple models, say ML model 1, ML model 2, ML model 3, and
so on, each model is trained after the previous one.
• A process in which each model depends on the previous model is called
sequential learning.
• Multiple models are fit, and all of these models combine to make a
bigger model, or master model.
• In a random forest, all the models have equal weight in the final model.
• In AdaBoost, the trees (models) do not have equal weights (votes).
• In a random forest, each individual model is a fully grown decision tree.
• In AdaBoost, the trees are not fully grown; each tree has just one root
and two leaves.
• In AdaBoost terminology these are called stumps: a stump is nothing but
one root node and two leaf nodes.
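• A minimal sketch of a stump, i.e. a depth-1 decision tree (the toy dataset is an illustrative assumption):

```python
# A "stump" is just a depth-1 decision tree: one root node and two leaves.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)   # one split only
stump.fit(X, y)
print("tree depth:", stump.get_depth())       # 1
print("leaf count:", stump.get_n_leaves())    # 2
```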
Stage 1:
Train a decision tree and partition it into different stumps, each with
one root node and two leaf nodes.
Select one decision stump as the first model, M1.
Evaluate the performance of M1 on the given dataset to get the predicted
output y_pred.
Calculate the weight (alpha) of M1:
error = 0.2 + 0.2 = 0.4
alpha1 = (1/2) * ln((1 - error) / error) = (1/2) * ln(0.6 / 0.4) ≈ 0.20
x1  x2  y  y_pred  Weight
3   7   1  1       0.2
2   9   0  1       0.2
1   4   1  0       0.2
9   8   0  0       0.2
3   7   0  0       0.2
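• The same Stage 1 calculation as a short sketch (plain NumPy; it reproduces the error and alpha1 values above):

```python
# Reproducing the Stage 1 numbers from the table above.
import numpy as np

y      = np.array([1, 0, 1, 0, 0])   # true labels
y_pred = np.array([1, 1, 0, 0, 0])   # predictions of stump M1
w      = np.full(5, 0.2)             # initial sample weights

error  = w[y != y_pred].sum()                 # 0.2 + 0.2 = 0.4
alpha1 = 0.5 * np.log((1 - error) / error)    # 0.5 * ln(1.5) ≈ 0.20
print(round(error, 2), round(alpha1, 2))      # 0.4 0.2
```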
• Stage 2:
• Now we pass the errors of M1 on to M2 by increasing the weights of
misclassified points and decreasing the weights of correctly classified
points.
• For misclassified rows:
new_weight = current_weight * e^(alpha1) = 0.2 * e^(0.2) ≈ 0.24
• For correctly classified rows:
new_weight = current_weight * e^(-alpha1) = 0.2 * e^(-0.2) ≈ 0.16
x1  x2  y  y_pred  Weight  Updated weight  Normalized weight
3   7   1  1       0.2     0.16            0.166
2   9   0  1       0.2     0.24            0.25
1   4   1  0       0.2     0.24            0.25
9   8   0  0       0.2     0.16            0.166
3   7   0  0       0.2     0.16            0.166
• Total of the updated weights = 0.24 + 0.24 + 0.16 + 0.16 + 0.16 = 0.96.
Each updated weight is divided by this total to get the normalized weight.
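• The weight update and normalization as a short sketch (rounding to two decimals, as in the table above):

```python
# Continuing the Stage 1 sketch: update and normalize the sample weights.
import numpy as np

y      = np.array([1, 0, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0])
w      = np.full(5, 0.2)
alpha1 = 0.2

# Misclassified rows get w * e^alpha1, correct rows get w * e^(-alpha1).
new_w = np.where(y != y_pred, w * np.exp(alpha1), w * np.exp(-alpha1)).round(2)
print(new_w)                       # [0.16 0.24 0.24 0.16 0.16]
print(round(new_w.sum(), 2))       # 0.96

norm_w = new_w / new_w.sum()       # ≈ [0.167 0.25 0.25 0.167 0.167]
                                   # (written as 0.166 in the table above)
```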
x1  x2  y  New weight  Range
3   7   1  0.166       0 – 0.166
2   9   0  0.25        0.166 – 0.416
1   4   1  0.25        0.416 – 0.666
9   8   0  0.166       0.666 – 0.832
3   7   0  0.166       0.832 – 1.0
• Generate 5 random numbers between 0 and 1: 0.13, 0.43, 0.62, 0.50, 0.8.
• Each random number selects the row whose range contains it:
0.13 → row 1, 0.43 → row 3, 0.62 → row 3, 0.50 → row 3, 0.8 → row 4,
so the selected rows are 1, 3, 3, 3, 4.
• Here row 3 has more weightage (it is picked most often). This is called
an upsampling technique.
• Rows 2 and 5 are removed; rows 1, 3, 3, 3, 4 form the new dataset,
which is passed to M2.
• Repeat this process for the next decision stumps.
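• The resampling step as a short sketch: the cumulative normalized weights define the ranges, and each random number picks the row whose range contains it (the five random numbers are taken from the worked example above):

```python
# Roulette-wheel resampling using the normalized weights from the table.
import numpy as np

norm_w = np.array([0.166, 0.25, 0.25, 0.166, 0.166])  # normalized weights
upper  = np.cumsum(norm_w)                # [0.166 0.416 0.666 0.832 0.998]

randoms = [0.13, 0.43, 0.62, 0.50, 0.8]
rows = [int(np.searchsorted(upper, r)) + 1 for r in randoms]  # 1-based rows
print(rows)   # [1, 3, 3, 3, 4] -> new (upsampled) dataset passed to M2
```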
• The final prediction is a weighted combination of all the stumps:
P(x) = alpha1*h1(x) + alpha2*h2(x) + …
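• A small sketch of this final combination; the two stumps and their alphas below are hypothetical placeholders, not values taken from this example:

```python
# Sketch of the final AdaBoost prediction: a weighted vote of the stumps,
# with stump outputs in {-1, +1}. h1, h2 and their alphas are hypothetical.
import numpy as np

def strong_classifier(x, stumps, alphas):
    """P(x) = sign(alpha1*h1(x) + alpha2*h2(x) + ...)."""
    total = sum(a * h(x) for h, a in zip(stumps, alphas))
    return int(np.sign(total))

h1 = lambda x: 1 if x[0] > 2 else -1     # hypothetical stump 1
h2 = lambda x: 1 if x[1] > 5 else -1     # hypothetical stump 2
print(strong_classifier([3, 7], [h1, h2], [0.20, 0.35]))   # 1
```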
Binary vs Multiclass Classification
No. of classes:
• Binary classification: classification into two groups, i.e. it
classifies objects into at most two classes.
• Multi-class classification: there can be any number of classes, i.e.
it classifies the object into more than two classes.
Algorithms used:
• Binary classification: the most popular algorithms are Logistic
Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machine,
and Naive Bayes.
• Multi-class classification: popular algorithms include k-Nearest
Neighbors, Decision Trees, Naive Bayes, Random Forest, and Gradient
Boosting.
Examples:
• Binary classification: email spam detection.
• Multi-class classification: face classification.
Balanced and imbalanced multiclass classification problems
• In a balanced dataset, the number of positive and negative labels is
about equal.
• In an imbalanced dataset, there is a large difference between the
number of positive and negative examples.
• Problems with an imbalanced dataset:
• Standard machine learning techniques such as decision trees and
logistic regression are biased towards the majority class and tend to
ignore the minority class.
• E.g. a fruit dataset with class labels apples, oranges, and bananas:
if the majority of data points belong to apples, then the dataset is
said to be skewed and imbalanced.
Techniques to solve the class imbalance problem
• 1. Use the right evaluation metrics
• Confusion matrix: a table showing correct predictions and the types of
incorrect predictions.
• With the help of the confusion matrix, we can calculate precision,
recall, and F1 score. Rather than concentrating on accuracy alone, we
can use these three metrics to judge whether the model is performing
well or not.
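• A short sketch of these metrics with scikit-learn; the toy label vectors are illustrative:

```python
# Evaluating an imbalanced classifier with a confusion matrix, precision,
# recall, and F1 score instead of accuracy alone.
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced: only two positives
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))                 # [[7 1] [1 1]]
print("precision:", precision_score(y_true, y_pred))    # 0.5
print("recall   :", recall_score(y_true, y_pred))       # 0.5
print("f1       :", f1_score(y_true, y_pred))           # 0.5
```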
2. Oversampling (upsampling)
• Oversampling increases the number of minority-class members in the
training set.
• The advantage is that no information from the original training set is
lost, as all observations from the minority and majority classes are kept.
• On the other hand, it is prone to overfitting (the model performs well
on training data but not on testing data).
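• A minimal oversampling sketch using sklearn.utils.resample; the toy DataFrame is an illustrative assumption:

```python
# Upsample the minority class with replacement until the classes match.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"x": range(10),
                   "label": [0] * 8 + [1] * 2})      # 8 majority, 2 minority

majority = df[df.label == 0]
minority = df[df.label == 1]

minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up])
print(balanced.label.value_counts())                 # 8 of each class
```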
3. Undersampling (downsampling)
• It reduces the number of majority-class samples to balance the class
distribution.
• But it might discard useful information.
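• A matching undersampling sketch (same toy DataFrame as in the oversampling sketch, majority class sampled down without replacement):

```python
# Downsample the majority class without replacement; some rows are discarded.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"x": range(10),
                   "label": [0] * 8 + [1] * 2})

majority_down = resample(df[df.label == 0], replace=False,
                         n_samples=len(df[df.label == 1]), random_state=0)
balanced = pd.concat([majority_down, df[df.label == 1]])
print(balanced.label.value_counts())    # 2 of each class
```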
4. Ensemble learning technique
• It mainly combines the outputs of multiple base learners.
• There are many approaches in ensemble learning, such as bagging,
boosting, and stacking.
Variants of multiclass classification
• One-vs-One and One-vs-All
Features         Class  C1  C2  C3
F1  F2  F3       C1     +1  -1  -1
F4  F11 F9       C2     -1  +1  -1
F8  F7  F6       C3     -1  -1  +1
F12 F13 F14      C1     +1  -1  -1
F15 F16 F17      C2     -1  +1  -1
F18 F19 F20      C3     -1  -1  +1
F1 F2 F3 is a tuple that belongs to class C1. We have to create classifiers.
The training dataset is passed to an algorithm, which generates a model (classifier).
In One-vs-All, the number of classifiers generated equals the number of classes in the dataset.
T -> Algo -> M
In this example, 3 training sets are required; here the C1, C2, and C3 columns define the 3 training datasets.
For the C1 classifier, tuples belonging to C1 are labelled +1 and all others -1; the same applies for C2 and C3.
• C1 generates M1, C2 generates M2, and C3 generates M3.
• E.g. if Fa1 Fa2 Fa3 is a test tuple, it is given to M1, M2, and M3; the classifier that outputs +1 (or the highest score) determines the predicted class.
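• A minimal One-vs-All sketch with scikit-learn's OneVsRestClassifier; the dataset and base model are illustrative assumptions:

```python
# One-vs-All (One-vs-Rest): one binary classifier per class, as in the
# C1/C2/C3 columns above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)

ova = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ova.fit(X, y)
print(len(ova.estimators_))   # 3 binary models, one per class (M1, M2, M3)
print(ova.predict(X[:1]))     # test tuple goes to all models; best score wins
```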