Department: Mathematics & Computer Science
Master of DPEIC – First year
Semester 2
OR & Artificial Intelligence
Chapter III - Supervised Machine Learning
Supervised ML Algorithms
How Supervised Learning Works?
• In supervised learning, models are trained on a labelled dataset, from which the model learns what characterizes each type of data. Once training is complete, the model is evaluated on held-out test data and then predicts the output.
• The working of supervised learning can be understood from the example and diagram below:
Steps Involved in Supervised Learning
• First, determine the type of training dataset.
• Collect/gather the labelled training data.
• Split the dataset into training, validation, and test sets (see the sketch after this list).
• Determine the input features of the training dataset, which should carry enough information for the model to accurately predict the output.
• Determine a suitable algorithm for the model, such as a support vector machine, decision tree, etc.
• Execute the algorithm on the training dataset. Sometimes a validation set (a held-out subset of the training data) is needed to tune control parameters.
• Evaluate the accuracy of the model on the test set. If the model predicts the correct outputs, the model is accurate.
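A minimal sketch of the split step, assuming scikit-learn and a synthetic dataset as a stand-in for real labelled data:

```python
# Minimal sketch: split a labelled dataset into train / validation / test.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=42)  # synthetic stand-in

# First carve out the test set (20%), then split the remainder
# into training and validation sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)
# Result: 60% train, 20% validation, 20% test.
```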
Key Concepts
To master supervised learning, you must understand the following four concepts:
1. The Dataset
2. The learning algorithm
3. The Model and its parameters
4. The Cost Function
Steps Involved in Supervised Learning
1) The Dataset
We talk about supervised learning when we provide a machine with
many examples (x, y) in order to make it learn the relationship that
connects x to y.
Steps Involved in Supervised Learning
1) The Dataset
• The variable y is called the Target. This is the
value we are trying to predict.
• The variable x is called a Feature. A Feature
influences the value of y, and we generally
have many Features (x1, x2, …) in our
Dataset, which we group together in a
matrix X.
Example: a Dataset gathers examples of
apartments with their price y as well as some of
their characteristics (Features).
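For instance, such a Dataset could be laid out as follows (the column names and values here are invented for illustration):

```python
# Hypothetical apartment Dataset: each row is one example (x, y);
# "surface", "rooms", "floor" are Features (matrix X), "price" is the Target y.
import pandas as pd

data = pd.DataFrame({
    "surface": [45, 70, 30, 95],                  # m^2
    "rooms":   [2, 3, 1, 4],
    "floor":   [1, 5, 2, 3],
    "price":   [150000, 240000, 110000, 320000],  # Target y
})
X = data[["surface", "rooms", "floor"]]  # Feature matrix X
y = data["price"]                        # Target vector y
```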
Steps Involved in Supervised Learning
2) The learning algorithm
• The main objective in Supervised Learning is to find the model parameters that
minimize the Cost Function. To do this, we use a learning algorithm, the most
common example being the Gradient Descent algorithm.
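A minimal gradient descent sketch for a linear model f(x) = a·x + b minimizing the mean squared error (the data and learning rate are illustrative assumptions):

```python
# Gradient descent on the MSE cost for a linear model f(x) = a*x + b.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])   # roughly y = 2x + 1

a, b = 0.0, 0.0          # model parameters
lr = 0.01                # learning rate
m = len(x)

for _ in range(5000):
    y_pred = a * x + b
    # Gradients of the cost J(a, b) = (1/m) * sum((y_pred - y)^2)
    grad_a = (2 / m) * np.sum((y_pred - y) * x)
    grad_b = (2 / m) * np.sum(y_pred - y)
    a -= lr * grad_a
    b -= lr * grad_b

print(a, b)  # should approach a ≈ 2, b ≈ 1
```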
Steps Involved in Supervised Learning
3) The Model and its parameters
• A model is developed from the Dataset. It can be a linear model or a non-linear one.
• We define a, b, c, etc. as the parameters of the model.
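For example (a standard formulation, not taken from the slides), a linear model with parameters a, b and a non-linear model with parameters a, b, c can be written as:

$$f(x) = ax + b \quad \text{(linear)}, \qquad f(x) = ax^2 + bx + c \quad \text{(non-linear)}$$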
Steps Involved in Supervised Learning
4) The Cost Function
A model can produce errors when making
predictions compared to the actual values in our
dataset. These errors measure how well the
model is performing: a lower error indicates
a better fit to the data.
The method by which we aggregate these errors
to measure the overall performance of the model is
known as the Cost Function or Loss Function.
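A standard example of such a function (the slide does not show the formula) is the Mean Squared Error over m examples:

$$J = \frac{1}{m} \sum_{i=1}^{m} \bigl( f(x_i) - y_i \bigr)^2$$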
Steps Involved in Supervised Learning
4) The Cost Function
• A 'good' model is generally characterized by
its ability to make accurate predictions on
new, previously unseen data.
• The smaller the value returned by the Cost
Function, the smaller the differences
between the predicted and actual values,
indicating a better-performing model.
Types of Supervised ML Algorithms
• Supervised learning can be further divided into two types of problems: regression and classification.
Regression vs. Classification in ML
Recap
Regression Algorithm vs. Classification Algorithm:
• Output: in Regression, the output variable must be continuous or a real value; in Classification, the output variable must be a discrete value.
• Task: the regression algorithm maps the input value (x) to a continuous output variable (y); the classification algorithm maps the input value (x) to a discrete output variable (y).
• Data: Regression algorithms are used with continuous data; Classification algorithms are used with discrete data.
• Goal: in Regression, we try to find the best-fit line, which can predict the output more accurately; in Classification, we try to find the decision boundary, which can divide the dataset into different classes.
• Examples: regression problems such as weather prediction and house price prediction; classification problems such as identification of spam emails, speech recognition, and identification of cancer cells.
• Subtypes: Regression can be divided into Linear and Non-linear Regression; Classification can be divided into Binary and Multi-class Classifiers.
Choosing the most appropriate algorithm
1. Problem Nature: classification or regression.
2. Data Characteristics: dataset size, feature types, feature dimensionality, data quality, …
3. Model Complexity and Interpretability: complexity, interpretability, …
4. Experience and Domain Knowledge: previous successes and expertise.
5. Model Updates and Scalability: static vs. dynamic data, scalability, …
Performance Evaluation
Generalization and overfitting
Main challenge of Supervised learning:
• It is relatively easy to train a model that "works" well (low prediction error) on
the training data. Extreme example: learning "by rote".
• Generalization: the ability of the model to make good predictions on data whose
label is unknown.
• Overfitting: when performance is better on the training data than on new data.
Over-fitting and Under-fitting
1. Over-fitting - Example
• Over-fitting occurs when the model follows the training data so closely that it
pays too much attention to noise. The model learns the relationship
between features and labels in so much detail that it also picks up the noise.
Over-fitting and Under-fitting
2. Under-fitting - Example
• Under-fitting is the opposite of over-fitting. This is when the model
does not approximate the function well enough and is therefore unable
to capture the underlying trend of the data.
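A minimal sketch contrasting the two regimes, assuming scikit-learn and synthetic data: a degree-1 polynomial under-fits, a degree-15 polynomial over-fits, and an intermediate degree balances the two.

```python
# Compare polynomial models of increasing flexibility on noisy data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 1, 60))[:, None]
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.2, 60)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    print(degree,
          mean_squared_error(y_tr, model.predict(X_tr)),   # training error
          mean_squared_error(y_te, model.predict(X_te)))   # test error
# Degree 1 under-fits (both errors high); degree 15 over-fits
# (training error low, test error high); degree 4 is a better balance.
```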
Over-fitting and Under-fitting
Training and test set
Cross validation
• Goal: use all the data for training and validation, and obtain an average
performance estimate.
• We separate the data set into K blocks (folds).
• In practice, K=5 or K=10 most often (a balance between the number of
experiments and the size of each training set).
• We use each block in turn as a validation set and the union of the others
as a training set (see the sketch below).
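A minimal K-fold cross-validation sketch with scikit-learn, K=5 (the classifier and dataset are illustrative choices):

```python
# 5-fold cross-validation: each fold serves once as the validation set.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)
print(scores)         # one accuracy score per fold
print(scores.mean())  # average performance over the K folds
```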
Model Selection: Validation Set
How do we determine the best model among those learned:
- with different learning algorithms;
- with different hyperparameter values for the same algorithm?
• Idea: select the one with the best performance on the test set.
• Problem: we can then no longer estimate the generalization error, because the test
data has already been used.
- Solution: we separate the data into 3 sets: training, validation, and test.
Model Selection: Cross-Validation
Hyper-parameters Tuning
GridSearchCV systematically works through every combination in a parameter grid,
cross-validating as it goes to determine which combination gives the
best performance. It is thorough but can be slow for large datasets and many
parameters.
RandomizedSearchCV samples a fixed number of parameter settings from specified
distributions. This approach can be faster and more efficient, especially when
dealing with a large hyper-parameter space, as it does not try every combination
but samples at random across a wide range of values. A sketch of both follows.
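A minimal sketch of both approaches (the SVM estimator and the parameter grids are illustrative assumptions):

```python
# Exhaustive grid search vs. randomized search over SVM hyper-parameters.
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Tries every combination in the grid, with 5-fold cross-validation.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)

# Samples 10 settings at random from the given distribution.
rand = RandomizedSearchCV(SVC(), {"C": loguniform(1e-2, 1e2)},
                          n_iter=10, cv=5, random_state=0)
rand.fit(X, y)
print(rand.best_params_, rand.best_score_)
```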
Hyper-parameters Tuning - Example
Evaluation of a Classification model: Confusion Matrix
• The confusion matrix is a matrix used to determine the performance of
classification models for a given set of test data. It can only be
determined if the true values of the test data are known.
• The matrix itself is easy to understand, but the related terminology
may be confusing. Because it shows the errors of the model's performance in
the form of a matrix, it is also known as an error matrix.
Confusion Matrix in Machine Learning
Some features of the confusion matrix are given below:
• For a classifier with 2 prediction classes, the matrix is a 2×2 table; for 3 classes, a
3×3 table; and so on.
• The matrix has two dimensions, predicted values and actual values, along with the
total number of predictions.
• Predicted values are the values predicted by the model, and actual values are the
true values of the given observations.
Confusion Matrix in Machine Learning
• It looks like the table below (counts reconstructed from the totals quoted on the next slide):

              Predicted: No    Predicted: Yes
Actual: No    TN = 65          FP = 8           (73)
Actual: Yes   FN = 3           TP = 24          (27)
Total         68               32               (n = 100)
Confusion Matrix in Machine Learning
From the previous example, we can conclude that:
• The table is given for a two-class classifier with two predictions, "Yes"
and "No". Here, "Yes" means the patient has the disease, and "No" means the
patient does not have the disease.
• The classifier made a total of 100 predictions. Out of 100 predictions, 89
are correct and 11 are incorrect.
• The model predicted "Yes" 32 times and "No" 68 times, whereas the actual
"Yes" occurred 27 times and the actual "No" 73 times.
Multi-class Classification: Confusion Matrix
Binary classification problem vs. Multiclass classification problem (figure; rows: actual class, columns: predicted class)
Calculations using Confusion Matrix
We can compute various performance measures for the model from this matrix,
such as the model's accuracy. These calculations are given below:
Sensitivity (Recall) = TP / (TP + FN)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Specificity = TN / (TN + FP)
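As a check, these formulas can be evaluated on the counts derived from the earlier 100-prediction example (TP = 24, TN = 65, FP = 8, FN = 3):

```python
# Metrics computed from the confusion-matrix counts of the earlier example.
TP, TN, FP, FN = 24, 65, 8, 3

sensitivity = TP / (TP + FN)                   # 24/27 ≈ 0.889
accuracy    = (TP + TN) / (TP + TN + FP + FN)  # 89/100 = 0.89
precision   = TP / (TP + FP)                   # 24/32 = 0.75
specificity = TN / (TN + FP)                   # 65/73 ≈ 0.890
print(sensitivity, accuracy, precision, specificity)
```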
ROC Curve
ROC Curve: the ROC is a graph displaying
a classifier's performance across all possible
thresholds. The curve plots the true positive
rate (on the y-axis) against the false positive
rate (on the x-axis).
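A minimal sketch of computing and plotting a ROC curve, assuming scikit-learn and an illustrative classifier and dataset:

```python
# ROC curve from predicted probabilities: one (FPR, TPR) point per threshold.
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
probs = clf.predict_proba(X_te)[:, 1]          # scores for the positive class

fpr, tpr, thresholds = roc_curve(y_te, probs)  # sweep over all thresholds
print(roc_auc_score(y_te, probs))              # area under the curve
plt.plot(fpr, tpr)
plt.xlabel("False Positive Rate"); plt.ylabel("True Positive Rate")
plt.show()
```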
Evaluation of a regression model
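The slide's content is not in the extracted text; common metrics for evaluating a regression model include the mean absolute error (MAE), mean squared error (MSE), root mean squared error (RMSE), and the coefficient of determination R². A minimal sketch with scikit-learn (the values are illustrative):

```python
# Common regression metrics computed from predicted vs. actual values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.8, 5.4, 7.0, 9.5])

print(mean_absolute_error(y_true, y_pred))           # MAE
print(mean_squared_error(y_true, y_pred))            # MSE
print(np.sqrt(mean_squared_error(y_true, y_pred)))   # RMSE
print(r2_score(y_true, y_pred))                      # R^2
```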
Some ML Algorithms
Regression solutions
Types of Regression Algorithm:
1. Simple Linear Regression
2. Multiple Linear Regression
3. Polynomial Regression
4. K-Nearest Neighbors Regression
5. Decision Tree Regression
6. Random Forest Regression
7. Artificial Neural Networks (ANN)
8. …
Classification solutions
Classification Algorithms can be further divided into the following types:
1. K-Nearest Neighbors (KNN)
2. Decision Tree
3. Random Forest
4. Support Vector Machines (SVM)
5. Artificial Neural Networks
6. Logistic Regression (LR)
7. Naïve Bayes
8. …
K-Nearest Neighbors Algorithm (KNN)
K-Nearest Neighbors Algorithm (KNN)
The K-NN (K-Nearest Neighbors) algorithm is one of the simplest
classification algorithms. It is used to identify data points that are
separated into multiple classes in order to predict the class of a new
sample.
K-NN is a non-parametric, lazy learning algorithm. It classifies new
cases based on a similarity measure (i.e., distance functions).
KNN Algorithm - Example
Input data:
A dataset D.
A distance function d.
An integer K.
For a new observation X whose output variable y we want to predict, Do:
1. Compute the distances between observation X and all other observations in
the dataset D.
2. Retain the K observations of dataset D closest to X according to the distance
function d.
3. Take the values of y of the K retained observations:
   1. If we perform a regression, compute the mean (or median) of the retained y values.
   2. If we perform a classification, compute the mode of the retained y values.
4. Return the value computed in step 3 as the value predicted by K-NN
for observation X.
End Algorithm
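Below is a minimal sketch transcribing this algorithm into Python with NumPy (the toy dataset, Euclidean distance, and K=3 are illustrative assumptions):

```python
# K-NN prediction: distances -> K nearest -> mean (regression) or mode (classification).
import numpy as np
from collections import Counter

def knn_predict(D_X, D_y, x_new, K=3, regression=False):
    # Step 1: distances from x_new to every observation in the dataset D
    dists = np.linalg.norm(D_X - x_new, axis=1)
    # Step 2: indices of the K closest observations
    nearest = np.argsort(dists)[:K]
    # Step 3: aggregate the retained y values
    y_k = D_y[nearest]
    if regression:
        return y_k.mean()                      # mean of the K retained values
    return Counter(y_k).most_common(1)[0][0]   # mode of the K retained labels

D_X = np.array([[1, 1], [1, 2], [5, 5], [6, 5]])
D_y = np.array([0, 0, 1, 1])
print(knn_predict(D_X, D_y, np.array([1.5, 1.5]), K=3))  # -> 0
```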
K-Nearest Neighbors Algorithm (KNN)
To predict the category label y of a new point x (classification):
• Find the k nearest neighbors (according to some distance metric).
• Assign the majority label to the new point.
To predict a numeric value y of a new point x (regression):
• Find the k nearest neighbors.
• "Average" the values associated with the neighbors.
If we change k, we may get a different prediction!
kNN Prediction: What Label?
Linear and Logistic Regression Algorithms
Linear Regression algorithm
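The figures for these slides are not in the extracted text. As a minimal sketch of fitting a linear model with scikit-learn (the data is illustrative):

```python
# Fit a linear model y = a*x + b and predict a new point.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.1, 4.9, 7.2, 8.8])       # roughly y = 2x + 1

model = LinearRegression().fit(X, y)
print(model.coef_, model.intercept_)     # slope a and intercept b
print(model.predict([[5.0]]))            # prediction for a new point
```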
The Math Behind LR
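The formulas on these slides are not present in the extracted text; a standard formulation of simple linear regression with gradient descent is:

$$f(x) = ax + b, \qquad J(a,b) = \frac{1}{m} \sum_{i=1}^{m} \bigl( f(x_i) - y_i \bigr)^2$$

with the gradient descent updates, for learning rate $\alpha$:

$$a \leftarrow a - \alpha \frac{\partial J}{\partial a}, \qquad b \leftarrow b - \alpha \frac{\partial J}{\partial b}$$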
LR & LR - Difference
Pr. Soufiane HAMIDA 71
LR & LR - Difference
Pr. Soufiane HAMIDA 72
LR & LR - Difference
Pr. Soufiane HAMIDA 73
LR & LR - Difference
Pr. Soufiane HAMIDA 74
LR & LR - Difference
Pr. Soufiane HAMIDA 75
LR & LR - Difference
Pr. Soufiane HAMIDA 76
LR & LR - Difference
Pr. Soufiane HAMIDA 77
LR & LR - Difference
Pr. Soufiane HAMIDA 78
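The comparison slides' content is not in the extracted text; the core difference is that linear regression fits a straight line to predict a continuous value, while logistic regression passes the same linear combination through a sigmoid to output a probability for classification:

$$\text{Linear: } \hat{y} = ax + b, \qquad \text{Logistic: } \hat{y} = \sigma(ax + b) = \frac{1}{1 + e^{-(ax+b)}} \in (0, 1)$$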
Applications of LR
Use case – Predicting Numbers
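The use-case slides' content is not in the extracted text. As one plausible illustration (an assumption, not necessarily the slides' example), logistic regression can be used to recognize handwritten digits:

```python
# Recognize handwritten digits (0-9) with logistic regression.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)          # 8x8 images, labels 0-9
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print(clf.score(X_te, y_te))                 # test accuracy
print(clf.predict(X_te[:5]), y_te[:5])       # predicted vs. actual digits
```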
Naive Bayes algorithm
Naive Bayes algorithm
• The Naive Bayes Classifier is a popular algorithm in Machine Learning. It is a
supervised learning algorithm used for classification, and it is particularly
useful for text classification problems.
• The naive Bayes classifier is based on Bayes' theorem, a classic of
probability theory built on conditional probabilities.
Naive Bayes algorithm
Conditional probabilities:
• What is the probability that an event occurs,
• given that another event has already happened?
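Bayes' theorem expresses this as follows, for events A and B:

$$P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}$$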
Naive Bayes algorithm - Example
Naive Bayes algorithm - USE CASES
The naive Bayes classifier can be applied in various scenarios; one of the
classic use cases for this learning model is document classification, which
involves determining whether a document belongs to certain categories
or not (a code sketch follows the list). It is used for:
• Spam filtering.
• Sentiment analysis.
• Recommendation systems.
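A minimal spam-filtering sketch with scikit-learn's MultinomialNB (the toy messages and labels are invented for illustration):

```python
# Text classification with naive Bayes: bag-of-words counts + MultinomialNB.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts  = ["win a free prize now", "meeting at noon tomorrow",
          "free money click here", "project report attached"]
labels = ["spam", "ham", "spam", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, labels)
print(model.predict(["free prize meeting"]))  # predicted category
```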
PW
Unsupervised Machine Learning
What is Unsupervised Learning?
• As the name suggests, unsupervised learning is a machine learning technique in
which models are not supervised using a labelled training dataset. Instead, the model
itself finds hidden patterns and insights in the given data. It can be compared to
the learning that takes place in the human brain when learning new things.
• Unsupervised learning cannot be directly applied to a regression or classification
problem because, unlike supervised learning, we have the input data but no
corresponding output data. The goal of unsupervised learning is to find the
underlying structure of the dataset, group the data according to similarities, and
represent the dataset in a compressed format.
Example - Unsupervised Learning
• Suppose an unsupervised learning algorithm is given an
input dataset containing images of different types of cats and
dogs. The algorithm is never trained on the given dataset,
which means it has no prior idea about the features of
the dataset. Its task is to identify the image features on its own.
The unsupervised learning algorithm will perform this task by
clustering the image dataset into groups according to the
similarities between images.
Why use Unsupervised Learning?
Below are some main reasons that describe the importance of unsupervised learning:
• Unsupervised learning is helpful for finding useful insights in the data.
• Unsupervised learning is much like how a human learns to think from their own
experiences, which makes it closer to real AI.
• Unsupervised learning works on unlabeled and uncategorized data, which makes
it all the more important.
• In the real world, we do not always have input data with corresponding output,
and solving such cases requires unsupervised learning.
Working of Unsupervised Learning
The working of unsupervised learning can be understood from the diagram below:
Types of Unsupervised Learning Algorithm
Below is the list of some popular unsupervised learning algorithms:
• K-means clustering
• Hierarchical clustering
• Anomaly detection
• Independent Component Analysis
• Apriori algorithm
Advantages of Unsupervised Learning
• Unsupervised learning is used for more complex tasks as compared to
supervised learning because, in unsupervised learning, we don't have labeled
input data.
• Unsupervised learning is often preferable in practice because unlabeled data is
much easier to obtain than labeled data.
Disadvantages of Unsupervised Learning
• Unsupervised learning is intrinsically more difficult than supervised learning, as
there is no corresponding output to learn from.
• The result of an unsupervised learning algorithm might be less accurate, since
the input data is not labeled and the algorithm does not know the exact output in
advance.
K-Means Clustering Algorithm
• K-Means Clustering is an unsupervised learning algorithm used to solve
clustering problems in machine learning and data science.
• It groups the unlabeled dataset into different clusters. Here K defines the
number of predefined clusters to be created in the process: if K=2, there will be
two clusters; for K=3, three clusters; and so on.
K-Means Clustering Algorithm
• It allows us to cluster the data into different groups and provides a convenient
way to discover the categories of groups in an unlabeled dataset on its own,
without the need for any training.
• It is a centroid-based algorithm, where each cluster is associated with a centroid.
The main aim of the algorithm is to minimize the sum of distances between the
data points and their corresponding cluster centroids.
K-Means Clustering Algorithm
• The algorithm takes the unlabeled dataset as input, divides the dataset into k
clusters, and repeats the process until it can no longer improve the clusters. The
value of k must be predetermined.
• The k-means clustering algorithm mainly performs two tasks:
1. Determines the best values for the K center points (centroids) by an iterative process.
2. Assigns each data point to its closest k-center. The data points near a
particular k-center form a cluster.
• Hence each cluster contains data points with some commonalities, away from the
other clusters.
How does the K-Means Algorithm Work?
• The working of the K-Means algorithm is explained in the steps below (a code sketch follows the list):
1. Step-1: Select the number K to decide the number of clusters.
2. Step-2: Select K random points or centroids (they need not come from the input dataset).
3. Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.
4. Step-4: Calculate the variance and place a new centroid for each cluster.
5. Step-5: Repeat the third step, i.e., reassign each data point to the new
closest centroid of its cluster.
6. Step-6: If any reassignment occurred, go back to Step-4; otherwise go to FINISH.
7. Step-7: The model is ready.
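A minimal sketch of these steps using scikit-learn's KMeans (synthetic 2-D data and K=2 are illustrative assumptions):

```python
# K-Means on synthetic 2-D data: fit, then inspect centroids and assignments.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=100, centers=2, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # final centroids
print(km.labels_[:10])       # cluster assigned to the first 10 points
```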
How does the K-Means Algorithm Work?
• Suppose we have two variables, M1 and M2. The x-y scatter plot of these two
variables is given below.
Let's take the number k of clusters to be K=2, to group the dataset into two
different clusters.
How does the K-Means Algorithm Work?
• We need to choose k random points or
centroids to form the clusters. These points
can be points from the dataset or any
other points. Here we select two points as
k-points that are not part of our dataset.
Consider the following image:
How does the K-Means Algorithm Work?
• Now we assign each data point of the
scatter plot to its closest k-point or centroid.
We compute this with the usual mathematics
for the distance between two points. To
visualize the assignment, we draw the median
line (perpendicular bisector) between the two
centroids.
How does the K-Means Algorithm Work?
• From the previous image, it is clear that
points on the left side of the line are closer to
the K1 (blue) centroid, and points to the right
of the line are closer to the yellow centroid.
Let's color them blue and yellow for clear
visualization.
How does the K-Means Algorithm Work?
• To find the closest clusters, we repeat the process with new centroids. To choose
the new centroids, we compute the center of gravity of the points in each cluster
and move each centroid there, as follows:
How does the K-Means Algorithm Work?
• Next, we reassign each data point to its new
closest centroid. For this, we repeat the
same process of finding a median line. The
new median looks like the following image:
How does the K-Means Algorithm Work?
• From the previous image, we can see that one
yellow point is on the left side of the line, and
two blue points are to the right of the line. So
these three points will be reassigned to the
other centroid.
How does the K-Means Algorithm Work?
• As reassignment has taken place, we go back
to Step-4, which is finding new centroids or
k-points.
• We repeat the process of finding the center of
gravity of the points in each cluster, so the new
centroids are as shown in the following image:
How does the K-Means Algorithm Work?
• With the new centroids, we again draw the
median line and reassign the data points,
giving the following image:
How does the K-Means Algorithm Work?
• We can see in the following image that there are no misassigned data points on
either side of the line, which means the clusters are stable and our model is formed.
How does the K-Means Algorithm Work?
• As our model is ready, we can now remove the assumed centroids, and the two
final clusters are as shown in the image below:
How to choose the value of K (number of clusters)
• The performance of the K-means clustering algorithm depends highly on the
quality of the clusters it forms, and choosing the optimal number of clusters
is a big task. There are several ways to find the optimal number of
clusters; here we discuss the most appropriate method for finding the
number of clusters, or the value of K:
Elbow Method
• The Elbow method is one of the most popular ways to find the optimal number of
clusters. It uses the concept of the WCSS value. WCSS stands for Within-Cluster
Sum of Squares, and it measures the total variation within the clusters. The
formula to calculate the value of WCSS (for 3 clusters) is given below:
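The formula itself is missing from the extracted slide; the standard form for 3 clusters with centroids $C_1$, $C_2$, $C_3$ is:

$$\mathrm{WCSS} = \sum_{P_i \in \text{Cluster}_1} d(P_i, C_1)^2 + \sum_{P_i \in \text{Cluster}_2} d(P_i, C_2)^2 + \sum_{P_i \in \text{Cluster}_3} d(P_i, C_3)^2$$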
Elbow Method
To find the optimal number of clusters, the elbow method follows the steps below
(a code sketch follows the list):
• Execute K-means clustering on the given dataset for different values of K
(e.g., ranging from 1 to 10).
• For each value of K, calculate the WCSS value.
• Plot a curve of the calculated WCSS values against the number of clusters K.
• The sharp point of bend, where the plot looks like an arm, is considered
the best value of K.
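A minimal sketch, assuming scikit-learn (whose inertia_ attribute is exactly the WCSS) and synthetic data:

```python
# Elbow method: run K-Means for K = 1..10 and plot the WCSS per K.
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

wcss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wcss.append(km.inertia_)   # within-cluster sum of squares for this K

plt.plot(range(1, 11), wcss, marker="o")
plt.xlabel("Number of clusters K"); plt.ylabel("WCSS")
plt.show()                     # look for the 'elbow' in the curve
```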
Elbow Method
Because the graph shows a sharp bend that looks like an elbow, the method is known
as the elbow method. The graph for the elbow method looks like the image below:
PW
The End
Any questions?