Ensemble Learning
• Ensemble means a group of prediction algorithms.
• It uses multiple prediction algorithms to obtain better predictive
performance (e.g. an election – predicting who will win).
• Ensemble learning is a machine learning technique that aggregates
two or more learners (e.g. regression models, neural networks) in
order to produce better predictions.
• In other words, an ensemble model combines several individual
models to produce more accurate predictions than a single model
alone.
• Accuracy improves, but computational cost increases because multiple
prediction algorithms must be trained and evaluated.
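• A minimal sketch of this idea (assuming scikit-learn is available; the toy dataset and the three base models are illustrative choices, not part of these notes):

```python
# Minimal sketch of an ensemble that aggregates several learners.
# The toy dataset and the three base models are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Majority ("hard") voting over three different individual models.
ensemble = VotingClassifier(estimators=[
    ("lr", LogisticRegression(max_iter=1000)),
    ("dt", DecisionTreeClassifier(max_depth=3)),
    ("nb", GaussianNB()),
], voting="hard")
ensemble.fit(X_train, y_train)
print("ensemble accuracy:", ensemble.score(X_test, y_test))
```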
Bagging
• Bagging (Bootstrap Aggregating) is an ensemble learning technique
designed to improve the accuracy and stability of machine learning
algorithms. It involves the following steps:
• Data Sampling: Creating multiple subsets of the training dataset using
bootstrap sampling (random sampling with replacement).
• Model Training: Training a separate model on each subset of the data.
• Aggregation: Combining the predictions from all individual models
(averaging for regression, majority voting for classification) to
produce the final output.
Key Benefits of bagging:
• Reduces Variance: By averaging multiple predictions, bagging reduces
the variance of the model and helps prevent overfitting.
• Improves Accuracy: Combining multiple models usually leads to
better performance than individual models.
• Example of Bagging Algorithms:
• Random Forests (an extension of bagging applied to decision trees)
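• A minimal bagging sketch with scikit-learn; the base estimator and parameter values are illustrative assumptions:

```python
# Minimal sketch of bagging: bootstrap sampling, per-subset models, and
# aggregation are handled by scikit-learn's BaggingClassifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Bagging: each of the 50 trees sees a bootstrap sample of the data.
bagging = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                            bootstrap=True, random_state=0)
print("bagging      :", cross_val_score(bagging, X, y, cv=5).mean())

# Random forest: bagging of decision trees plus random feature selection.
forest = RandomForestClassifier(n_estimators=50, random_state=0)
print("random forest:", cross_val_score(forest, X, y, cv=5).mean())
```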
Boosting
• What is Boosting?
• Boosting is another ensemble learning technique that focuses on
creating a strong model by combining several weak models. It involves
the following steps:
• Sequential Training: Training models sequentially, each one trying to
correct the errors made by the previous models.
• Weight Adjustment: Each instance in the training set is weighted.
Initially, all instances have equal weights. After each model is trained, the
weights of misclassified instances are increased so that the next model
focuses more on difficult cases.
• Model Combination: Combining the predictions from all models to
produce the final output, typically by weighted voting or weighted
averaging.
Key Benefits of boosting:
• Reduces Bias: By focusing on hard-to-classify instances, boosting
reduces bias and improves the overall model accuracy.
• Produces Strong Predictors: Combining weak learners leads to a
strong predictive model.
• Example of Boosting Algorithms:
• AdaBoost
• Gradient Boosting Machines (GBM)
• XGBoost
• LightGBM
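• A minimal boosting sketch with scikit-learn (AdaBoost and gradient boosting); parameter values are illustrative assumptions:

```python
# Minimal sketch of boosting with scikit-learn.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

# AdaBoost: sequentially trained weak learners with instance re-weighting.
ada = AdaBoostClassifier(n_estimators=100, random_state=0)
print("AdaBoost:", cross_val_score(ada, X, y, cv=5).mean())

# Gradient boosting: each new tree corrects the errors of the current ensemble.
gbm = GradientBoostingClassifier(n_estimators=100, random_state=0)
print("GBM     :", cross_val_score(gbm, X, y, cv=5).mean())
```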
AdaBoost algorithm
• Differences between bagging and boosting
• Random forest is a bagging technique and AdaBoost is a boosting
technique.
• Random forest is a parallel learning process, whereas AdaBoost is a
sequential learning process.
• In a random forest, the individual models (individual decision trees)
are built from the main data in parallel and independently of each other.
• Because multiple trees are built from the same data in parallel and no
tree depends on any other tree, this is called a parallel process.
• In a sequential process, each tree depends on the previous tree: if
there are multiple models, say ML model 1, ML model 2, ML model 3, and
so on, each model is trained after the previous one.
• A process in which each model depends on the previous model is called
sequential learning.
• Multiple models are fit, and all of these models combine to make a
bigger model, or master model.
• In a random forest, all the models have equal weight in the final model.
• In AdaBoost, the trees (models) do not have equal weights (votes).
• In a random forest, each individual model is a fully grown decision tree.
• In AdaBoost, the trees are not fully grown; each tree has just one root
and two leaves.
• In AdaBoost terminology these are called stumps: a stump is nothing but
one root node and two leaf nodes.
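• A minimal sketch of a stump, i.e. a depth-1 decision tree (the toy dataset is an illustrative assumption):

```python
# A "stump" is just a depth-1 decision tree: one root node and two leaves.
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=100, random_state=0)

stump = DecisionTreeClassifier(max_depth=1)   # one split only
stump.fit(X, y)
print("tree depth:", stump.get_depth())       # 1
print("leaf count:", stump.get_n_leaves())    # 2
```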
Stage 1:
Train a decision tree and partition it into different stumps, each with
one root node and two leaf nodes.
Select one decision stump as the first model, M1.
Evaluate the performance of M1 on the given dataset to get the predicted
output y_pred.
Calculate the weight (alpha) of M1:
error = 0.2 + 0.2 = 0.4
alpha1 = (1/2) * ln((1 - error) / error) = (1/2) * ln(0.6 / 0.4) ≈ 0.20
x1  x2  y  y_pred  Weight
3   7   1  1       0.2
2   9   0  1       0.2
1   4   1  0       0.2
9   8   0  0       0.2
3   7   0  0       0.2
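• The same Stage 1 calculation as a short sketch (plain NumPy; it reproduces the error and alpha1 values above):

```python
# Reproducing the Stage 1 numbers from the table above.
import numpy as np

y      = np.array([1, 0, 1, 0, 0])   # true labels
y_pred = np.array([1, 1, 0, 0, 0])   # predictions of stump M1
w      = np.full(5, 0.2)             # initial sample weights

error  = w[y != y_pred].sum()                 # 0.2 + 0.2 = 0.4
alpha1 = 0.5 * np.log((1 - error) / error)    # 0.5 * ln(1.5) ≈ 0.20
print(round(error, 2), round(alpha1, 2))      # 0.4 0.2
```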
• Stage 2:
• Now we pass the errors of M1 on to M2 by increasing the weights of
misclassified points and decreasing the weights of correctly classified
points.
• For misclassified rows:
new_weight = current_weight * e^(alpha1) = 0.2 * e^(0.2) ≈ 0.24
• For correctly classified rows:
new_weight = current_weight * e^(-alpha1) = 0.2 * e^(-0.2) ≈ 0.16
x1  x2  y  y_pred  Weight  Updated weight  Normalized weight
3   7   1  1       0.2     0.16            0.166
2   9   0  1       0.2     0.24            0.25
1   4   1  0       0.2     0.24            0.25
9   8   0  0       0.2     0.16            0.166
3   7   0  0       0.2     0.16            0.166
• Total of the updated weights = 0.24 + 0.24 + 0.16 + 0.16 + 0.16 = 0.96.
Each updated weight is divided by this total to get the normalized weight.
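• The weight update and normalization as a short sketch (rounding to two decimals, as in the table above):

```python
# Continuing the Stage 1 sketch: update and normalize the sample weights.
import numpy as np

y      = np.array([1, 0, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 0])
w      = np.full(5, 0.2)
alpha1 = 0.2

# Misclassified rows get w * e^alpha1, correct rows get w * e^(-alpha1).
new_w = np.where(y != y_pred, w * np.exp(alpha1), w * np.exp(-alpha1)).round(2)
print(new_w)                       # [0.16 0.24 0.24 0.16 0.16]
print(round(new_w.sum(), 2))       # 0.96

norm_w = new_w / new_w.sum()       # ≈ [0.167 0.25 0.25 0.167 0.167]
                                   # (written as 0.166 in the table above)
```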
x1  x2  y  New weight  Range
3   7   1  0.166       0 – 0.166
2   9   0  0.25        0.166 – 0.416
1   4   1  0.25        0.416 – 0.666
9   8   0  0.166       0.666 – 0.832
3   7   0  0.166       0.832 – 1.0
• Generate 5 random numbers between 0 and 1: 0.13, 0.43, 0.62, 0.50, 0.8.
• Each random number selects the row whose range contains it:
0.13 → row 1, 0.43 → row 3, 0.62 → row 3, 0.50 → row 3, 0.8 → row 4,
so the selected rows are 1, 3, 3, 3, 4.
• Here row 3 has more weightage (it is picked most often). This is called
an upsampling technique.
• Rows 2 and 5 are removed; rows 1, 3, 3, 3, 4 form the new dataset,
which is passed to M2.
• Repeat this process for the next decision stumps.
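• The resampling step as a short sketch: the cumulative normalized weights define the ranges, and each random number picks the row whose range contains it (the five random numbers are taken from the worked example above):

```python
# Roulette-wheel resampling using the normalized weights from the table.
import numpy as np

norm_w = np.array([0.166, 0.25, 0.25, 0.166, 0.166])  # normalized weights
upper  = np.cumsum(norm_w)                # [0.166 0.416 0.666 0.832 0.998]

randoms = [0.13, 0.43, 0.62, 0.50, 0.8]
rows = [int(np.searchsorted(upper, r)) + 1 for r in randoms]  # 1-based rows
print(rows)   # [1, 3, 3, 3, 4] -> new (upsampled) dataset passed to M2
```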
• The final prediction is a weighted combination of all the stumps:
P(x) = alpha1*h1(x) + alpha2*h2(x) + …
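• A small sketch of this final combination; the two stumps and their alphas below are hypothetical placeholders, not values taken from this example:

```python
# Sketch of the final AdaBoost prediction: a weighted vote of the stumps,
# with stump outputs in {-1, +1}. h1, h2 and their alphas are hypothetical.
import numpy as np

def strong_classifier(x, stumps, alphas):
    """P(x) = sign(alpha1*h1(x) + alpha2*h2(x) + ...)."""
    total = sum(a * h(x) for h, a in zip(stumps, alphas))
    return int(np.sign(total))

h1 = lambda x: 1 if x[0] > 2 else -1     # hypothetical stump 1
h2 = lambda x: 1 if x[1] > 5 else -1     # hypothetical stump 2
print(strong_classifier([3, 7], [h1, h2], [0.20, 0.35]))   # 1
```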
Binary vs Multiclass Classification
No. of classes:
• Binary classification: classification into two groups, i.e. it
classifies objects into at most two classes.
• Multi-class classification: there can be any number of classes, i.e.
it classifies the object into more than two classes.
Algorithms used:
• Binary classification: the most popular algorithms are Logistic
Regression, k-Nearest Neighbors, Decision Trees, Support Vector Machine,
and Naive Bayes.
• Multi-class classification: popular algorithms include k-Nearest
Neighbors, Decision Trees, Naive Bayes, Random Forest, and Gradient
Boosting.
Examples:
• Binary classification: email spam detection.
• Multi-class classification: face classification.
Balanced and imbalanced multiclass classification problems
• In a balanced dataset, the number of positive and negative labels is
about equal.
• In an imbalanced dataset, there is a large difference between the
number of positive and negative examples.
• Problems with an imbalanced dataset:
• Standard machine learning techniques such as decision trees and
logistic regression are biased towards the majority class and tend to
ignore the minority class.
• E.g. a fruit dataset with class labels apples, oranges, and bananas:
if the majority of data points belong to apples, then the dataset is
said to be skewed and imbalanced.
Techniques to solve the class imbalance problem
• 1. Use the right evaluation metrics
• Confusion matrix: a table showing correct predictions and the types of
incorrect predictions.
• With the help of the confusion matrix, we can calculate precision,
recall, and F1 score. Rather than concentrating on accuracy alone, we
can use these three metrics to judge whether the model is performing
well or not.
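• A short sketch of these metrics with scikit-learn; the toy label vectors are illustrative:

```python
# Evaluating an imbalanced classifier with a confusion matrix, precision,
# recall, and F1 score instead of accuracy alone.
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]   # imbalanced: only two positives
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

print(confusion_matrix(y_true, y_pred))                 # [[7 1] [1 1]]
print("precision:", precision_score(y_true, y_pred))    # 0.5
print("recall   :", recall_score(y_true, y_pred))       # 0.5
print("f1       :", f1_score(y_true, y_pred))           # 0.5
```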
2. Oversampling (upsampling)
• Oversampling increases the number of minority-class members in the
training set.
• The advantage is that no information from the original training set is
lost, as all observations from the minority and majority classes are kept.
• On the other hand, it is prone to overfitting (the model performs well
on training data but not on testing data).
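• A minimal oversampling sketch using sklearn.utils.resample; the toy DataFrame is an illustrative assumption:

```python
# Upsample the minority class with replacement until the classes match.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"x": range(10),
                   "label": [0] * 8 + [1] * 2})      # 8 majority, 2 minority

majority = df[df.label == 0]
minority = df[df.label == 1]

minority_up = resample(minority, replace=True,
                       n_samples=len(majority), random_state=0)
balanced = pd.concat([majority, minority_up])
print(balanced.label.value_counts())                 # 8 of each class
```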
3. Undersampling (downsampling)
• It reduces the number of majority-class samples to balance the class
distribution.
• But it might discard useful information.
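• A matching undersampling sketch (same toy DataFrame as in the oversampling sketch, majority class sampled down without replacement):

```python
# Downsample the majority class without replacement; some rows are discarded.
import pandas as pd
from sklearn.utils import resample

df = pd.DataFrame({"x": range(10),
                   "label": [0] * 8 + [1] * 2})

majority_down = resample(df[df.label == 0], replace=False,
                         n_samples=len(df[df.label == 1]), random_state=0)
balanced = pd.concat([majority_down, df[df.label == 1]])
print(balanced.label.value_counts())    # 2 of each class
```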
4. Ensemble learning technique
• It mainly combines the outputs of multiple base learners.
• There are many approaches in ensemble learning, such as bagging,
boosting, and stacking.
Variants of multiclass classification
• One-vs-One and One-vs-All
Features         Class  C1  C2  C3
F1  F2  F3       C1     +1  -1  -1
F4  F11 F9       C2     -1  +1  -1
F8  F7  F6       C3     -1  -1  +1
F12 F13 F14      C1     +1  -1  -1
F15 F16 F17      C2     -1  +1  -1
F18 F19 F20      C3     -1  -1  +1
F1 F2 F3 is a tuple that belongs to class C1. We have to create classifiers.
The training dataset is passed to an algorithm, which generates a model (classifier).
In One-vs-All, the number of classifiers generated equals the number of classes in the dataset.
T -> Algo -> M
In this example, 3 training sets are required; here the C1, C2, and C3 columns define the 3 training datasets.
For the C1 classifier, tuples belonging to C1 are labelled +1 and all others -1; the same applies for C2 and C3.
• C1 generates M1, C2 generates M2, and C3 generates M3.
• E.g. if Fa1 Fa2 Fa3 is a test tuple, it is given to M1, M2, and M3; the classifier that outputs +1 (or the highest score) determines the predicted class.
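• A minimal One-vs-All sketch with scikit-learn's OneVsRestClassifier; the dataset and base model are illustrative assumptions:

```python
# One-vs-All (One-vs-Rest): one binary classifier per class, as in the
# C1/C2/C3 columns above.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = make_classification(n_samples=300, n_classes=3, n_informative=4,
                           random_state=0)

ova = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ova.fit(X, y)
print(len(ova.estimators_))   # 3 binary models, one per class (M1, M2, M3)
print(ova.predict(X[:1]))     # test tuple goes to all models; best score wins
```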