Module 4 Supervised Learning

What is Ensemble Learning, with an example?


• Ensemble learning refers to a machine learning approach in which the predictions from multiple
models are merged to enhance the accuracy of the final prediction.
 E.g., Suppose you are a movie director and you have created a short movie on a very
important and interesting topic. Now, you want to take preliminary feedback (ratings) on the
movie before making it public. What are the possible ways by which you can do that?
o A: You may ask one of your friends to rate the movie for you.
o B: Another way could be by asking 5 colleagues of yours to rate the movie.
o C: How about asking 50 people to rate the movie?
o With option C, the aggregated rating from a large, diverse group is the most reliable;
ensemble learning applies the same idea by combining many models instead of many reviewers.

Simple Ensemble Techniques-


1) Max Voting
2) Averaging
3) Weighted Averaging

a) Max Voting
i) The max voting method is generally used for classification problems.
ii) In this technique, multiple models are used to make predictions for each data point.
iii) The predictions by each model are considered as a ‘vote’.
iv) The predictions which we get from the majority of the models are used as the final
prediction.
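The max voting steps above can be sketched in plain Python. The labels and model outputs below are hypothetical, and the "models" are just hard-coded prediction lists standing in for real classifiers:

```python
from collections import Counter

def max_vote(predictions):
    """Return the majority-voted label for each data point.

    predictions: list of per-model prediction lists, all the same length.
    """
    final = []
    for votes in zip(*predictions):  # one tuple of votes per data point
        final.append(Counter(votes).most_common(1)[0][0])
    return final

# Three hypothetical classifiers predicting the same 4 data points
model_a = ["cat", "dog", "dog", "cat"]
model_b = ["cat", "cat", "dog", "dog"]
model_c = ["dog", "cat", "dog", "cat"]

print(max_vote([model_a, model_b, model_c]))  # ['cat', 'cat', 'dog', 'cat']
```

Each data point takes the label predicted by the majority of the three models.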

b) Averaging
i) Similar to the max voting technique, multiple predictions are made for each data point
in averaging.
ii) In this method, we take an average of predictions from all the models and use it to
make the final prediction.
iii) Averaging can be used for making predictions in regression problems.

c) Weighted Averaging
i) This is an extension of the averaging method.
ii) All models are assigned different weights defining the importance of each model for
prediction.
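Averaging and weighted averaging can be sketched together; passing equal weights reduces the weighted form to a plain average. The ratings below are hypothetical:

```python
def weighted_average(predictions, weights):
    """Combine regression predictions as a weighted mean per data point.

    predictions: list of per-model prediction lists; weights: one weight per model.
    """
    total = sum(weights)
    return [sum(w * p for w, p in zip(weights, point)) / total
            for point in zip(*predictions)]

# Hypothetical ratings from three reviewers for two movies
ratings = [[4.0, 5.0], [4.5, 4.0], [5.0, 3.0]]
print(weighted_average(ratings, [1, 1, 1]))  # plain averaging: [4.5, 4.0]
print(weighted_average(ratings, [3, 1, 1]))  # first reviewer's vote counts triple
```

With weights [3, 1, 1] the first model's opinion dominates, which is useful when one model is known to be more reliable.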

Advanced Ensemble techniques-


1. Bagging
2. Boosting
3. Stacking
a. Bagging
i. The idea behind bagging is to combine the results of multiple models.
ii. If all the models are created on the same set of data, there is a high chance
that they will give the same result, since they are getting the same input;
bagging avoids this by training each model on a different bootstrapped subset.
iii. Bootstrapping is a sampling technique in which we create subsets of
observations from the original dataset, with replacement.
iv. Multiple subsets are created from the original dataset, selecting
observations with replacement.
v. A base model (weak model) is created on each of these subsets.
vi. The models run in parallel and are independent of each other.
vii. The final predictions are determined by combining the predictions from all
the models.
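The bagging steps above can be sketched as follows. To keep the example self-contained, the "base model" here is just the mean of its bootstrap sample (a placeholder for a real weak learner), and the combination step averages the models, as in the regression case:

```python
import random

def bootstrap_sample(data, rng):
    """Draw a subset of the same size as data, sampling with replacement."""
    return [rng.choice(data) for _ in data]

def bagging_predict(data, n_models=10, seed=0):
    """Toy bagging: fit one 'model' (here, a sample mean) per bootstrap
    subset, then combine the models by averaging their outputs."""
    rng = random.Random(seed)
    model_outputs = [sum(s) / len(s)
                     for s in (bootstrap_sample(data, rng) for _ in range(n_models))]
    return sum(model_outputs) / len(model_outputs)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
print(bagging_predict(data))  # an average of the per-model means (between 1.0 and 5.0)
```

Because every subset is drawn with replacement, the models see different inputs and their combined output is more stable than any single one.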
b. Boosting
i. Boosting is a sequential process, where each subsequent model attempts to
correct the errors of the previous model.
ii. The succeeding models are dependent on the previous model.
 Algorithm
i. A subset is created from the original dataset.
ii. Initially, all data points are given equal weights.
iii. A base model is created on this subset.
iv. This model is used to make predictions on the whole dataset.
v. Errors are calculated using the actual values and predicted values.
vi. The observations which are incorrectly predicted, are given higher
weights.
vii. Another model is created, and predictions are made on the dataset.
viii. Similarly, multiple models are created, each correcting the errors of the
previous model.
ix. The final model (strong learner) is the weighted mean of all the models
(weak learners).
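The reweighting step at the heart of the algorithm above (steps vi and vii) can be sketched in isolation. The boost factor of 2.0 is an arbitrary illustrative choice, not a value prescribed by any particular boosting algorithm:

```python
def update_weights(weights, correct, factor=2.0):
    """One boosting round: increase the weights of misclassified points
    (factor is a hypothetical boost rate), then renormalize to sum to 1."""
    boosted = [w if ok else w * factor for w, ok in zip(weights, correct)]
    total = sum(boosted)
    return [w / total for w in boosted]

# Four data points, equal weights initially; points 2 and 4 were misclassified
weights = [0.25, 0.25, 0.25, 0.25]
correct = [True, False, True, False]
weights = update_weights(weights, correct)
print(weights)  # misclassified points now carry more weight
```

The next model trained on these weights pays more attention to the points the previous model got wrong, which is what makes boosting sequential.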

c. Stacking
i. Stacking is an ensemble learning technique that uses predictions from
multiple models (for example decision tree, knn or svm) to build a new
model.
ii. This model is used for making predictions on the test set.
iii. The train set is split into 10 parts.
iv. A base model (suppose a decision tree) is fitted on 9 parts and predictions
are made for the 10th part. This is done for each part of the train set.
v. The base model (in this case, decision tree) is then fitted on the whole train
dataset.
vi. Using this model, predictions are made on the test set.
vii. Steps 2 to 4 are repeated for another base model (say knn) resulting in
another set of predictions for the train set and test set.
viii. The predictions from the train set are used as features to build a new
model.
ix. This model is used to make final predictions on the test prediction set.

Evaluating an ML model-

• How well is my model doing? Is it a useful model?


• Will training my model on more data improve its performance?
• Do I need to include more features?
Metrics-

• Classification metrics
o When performing classification predictions, there are four types of outcomes that
can occur.
 True positives are when you predict an observation belongs to a class and it does belong to that
class.

 True negatives are when you predict an observation does not belong to a class and it does not
belong to that class.

 False positives occur when you predict an observation belongs to a class when it does not.

 False negatives occur when you predict an observation does not belong to a class when in fact it
does.
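The four outcome counts can be computed directly from a pair of label lists. The labels below are hypothetical:

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count TP, TN, FP, FN for a binary classification."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1      # predicted positive, actually positive
        elif p != positive and a != positive:
            tn += 1      # predicted negative, actually negative
        elif p == positive:
            fp += 1      # predicted positive, actually negative
        else:
            fn += 1      # predicted negative, actually positive
    return tp, tn, fp, fn

actual    = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no",  "no", "yes", "yes"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```

All of the metrics below are simple ratios of these four counts.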

Accuracy-

• The most commonly used metric to judge a model, but not always a clear indicator of
performance. It is especially misleading when classes are imbalanced.

TP + TN
-----------------------
TP + FP + TN + FN
Precision-

• Percentage of positive instances out of the total predicted positive instances.


• Here the denominator (TP + FP) is the total number of instances the model predicted as positive in the whole given dataset.

TP
-------------
TP + FP
Recall / Sensitivity / True Positive Rate-

• Percentage of positive instances out of the total actual positive instances.


• Therefore the denominator (TP + FN) here is the actual number of positive instances present in the
dataset.

TP
-------------
TP + FN
Specificity-

• Percentage of negative instances out of the total actual negative instances.


• Therefore the denominator (TN + FP) here is the actual number of negative instances present in the
dataset. It is like recall, but the focus shifts to the negative instances.

TN
-------------
TN + FP

F1 score-
• It is the harmonic mean of precision and recall.
• It takes the contribution of both, so the higher the F1 score, the better.
• Because of the product in the numerator, if either precision or recall goes low, the final F1 score goes down significantly.

2 X Precision X Recall
-------------------------------
Precision + Recall
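The three formulas above translate directly to code. The counts below (tp=8, fp=2, fn=8) are hypothetical:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)                             # TP / (TP + FP)
    recall = tp / (tp + fn)                                # TP / (TP + FN)
    f1 = 2 * precision * recall / (precision + recall)     # harmonic mean
    return precision, recall, f1

p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
print(p, r, round(f1, 3))  # 0.8 0.5 0.615
```

Note how the F1 score (0.615) sits closer to the lower of the two values (recall, 0.5) than a plain average would, which is the point of using the harmonic mean.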
ROC curve-
• ROC stands for Receiver Operating Characteristic; the curve plots the true positive rate (TPR)
against the false positive rate (FPR) for various threshold values.
• As the threshold is lowered, TPR increases but FPR also increases.
• Among the candidate threshold values, we want the one that brings the curve closest to the top-left
corner (high TPR, low FPR).
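A point on the ROC curve can be computed for any threshold from predicted scores and true labels. The scores and labels below are hypothetical:

```python
def tpr_fpr(scores, labels, threshold):
    """TPR and FPR when scores >= threshold are predicted positive (label 1)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    return tp / (tp + fn), fp / (fp + tn)

scores = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1,   1,   0,   1,   0,   0]
for t in (0.7, 0.5, 0.2):
    print(t, tpr_fpr(scores, labels, t))  # lower thresholds raise both TPR and FPR
```

Sweeping the threshold and plotting the resulting (FPR, TPR) pairs traces out the ROC curve.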

Regression metrics-
• Evaluation metrics for regression models are quite different from the metrics above.
• Classification is only concerned with whether a prediction was correct or incorrect, whereas
regression metrics measure how far the prediction is from the true value.
o Explained variance: - Explained variance compares the variance within the expected
outcomes and compares that to the variance in the error of our model. This metric essentially
represents the amount of variation in the original dataset that our model can explain.
o Mean squared error: - Mean squared error is simply defined as the average of squared
differences between the predicted output and the true output. Squared error is commonly used
because it is agnostic to whether the prediction was too high or too low, it just reports that the
prediction was incorrect.
o R2 coefficient represents the proportion of variance in the outcome that our model can
predict based on its features.
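Mean squared error and the R² coefficient follow directly from their definitions above. The true and predicted values below are hypothetical:

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R^2: 1 - (residual sum of squares / total sum of squares)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.0, 7.5, 9.0]
print(mse(y_true, y_pred))  # 0.125
print(r2(y_true, y_pred))   # 0.975
```

An R² of 0.975 means the model explains 97.5% of the variance in the outcome; squaring the errors in MSE makes over- and under-predictions count equally.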

Bias vs Variance-
• In general, a machine learning model analyses the data, finds patterns in it and makes predictions.
• While training, the model learns these patterns in the dataset and applies them to test data for
prediction.
• While making predictions, a difference occurs between the values predicted by the model and the
actual/expected values; this difference is known as bias error, or error due to bias.

 Low Bias: A low bias model will make fewer assumptions about the form of the target function.
 High Bias: A model with a high bias makes more assumptions, and the model becomes unable to
capture the important features of our dataset. A high bias model also cannot perform well on new
data.
• Variance tells how much a random variable differs from its expected value.
• A model should not vary too much from one training dataset to another, which means the algorithm
should be good at capturing the hidden mapping between input and output variables.
• Variance errors are either of low variance or high variance.
 Low variance means there is a small variation in the prediction of the target function with
changes in the training data set.
 High variance shows a large variation in the prediction of the target function with changes in the
training dataset.

Different Combinations of Bias-Variance-


• Low-Bias, Low-Variance:
The combination of low bias and low variance shows an ideal machine learning model. However, it
is rarely achievable in practice.
• Low-Bias, High-Variance: With low bias and high variance, model predictions are inconsistent but
accurate on average. This case occurs when the model learns a large number of parameters and hence
leads to overfitting.
• High-Bias, Low-Variance: With high bias and low variance, predictions are consistent but inaccurate
on average. This case occurs when a model does not learn well from the training dataset or uses too
few parameters. It leads to underfitting problems in the model.
• High-Bias, High-Variance:
With high bias and high variance, predictions are inconsistent and inaccurate on average.
 In summary, a model with high bias is limited from learning the true trend and underfits
the data.
 A model with high variance learns too much from the training data and overfits the data.
 The best model sits somewhere in the middle of the two extremes.
