Lec-7: Intro to Machine Learning

Machine Learning (ML)

Theory and Concepts


Arthur Lee Samuel (December 5, 1901 – July 29, 1990)
• American pioneer in the field of computer gaming and artificial intelligence
• Two early game-playing programs, Samuel Checkers (self-learning) and TD-Gammon, led to breakthroughs in artificial intelligence
• MIT graduate; worked at Bell Labs and IBM; Stanford professor

2
Topics
• What is Machine Learning?
• What is Deep Learning?
• Difference between Supervised and Unsupervised Learning
• Supervised Learning Process
• Evaluating performance
• Overfitting

3
Machine Learning (ML)

• A subset/branch/subfield of Artificial Intelligence (AI)
• Machines learning to imitate human intelligence
• “Focuses on the use of data and algorithms to enable AI to imitate the way that humans learn, gradually improving its accuracy.” — IBM
• Allows computers to learn without explicit programming

(Diagram: nested circles — Deep Learning sits inside Machine Learning, which sits inside Artificial Intelligence (AI).)

4
Traditional vs ML Programming

5
ML Related Fields

Machine learning draws on many related fields: data mining, control theory, statistics, decision theory, information theory, cognitive science, databases, psychological models, evolutionary models, and neuroscience.

Machine learning is primarily concerned with the accuracy and effectiveness of the computer system.

6
ML Applications

• Fraud detection
• Web search results
• Real-time ads on web pages
• Credit scoring
• Prediction of equipment failures
• New pricing models
• Email spam filtering
• Network intrusion detection
• Recommendation engines
• Customer segmentation
• Text sentiment analysis
• Customer churn
• Pattern and image recognition

7
Types of ML

• Supervised Machine Learning
• Unsupervised Machine Learning
• Semi-Supervised Machine Learning
• Reinforcement Machine Learning

8
Supervised
Machine
Learning
• Model gets trained on a
“Labelled Dataset”
• High accuracy as they are
trained on labelled data
• Can be time-consuming
and costly as it relies on
labeled data only
• Two main categories
• Classification
• Regression
(“Labeled” is the American spelling; “labelled” is British.)
9
Unsupervised
Machine Learning

• Algorithm discovers patterns and relationships using unlabeled data
• Discover hidden patterns, similarities,
or clusters within the data
• Without using labels, it may be difficult
to predict the quality of the model’s
output
• Two main categories
• Clustering
• Association

10
Semi-Supervised Machine Learning

• Sits between supervised and unsupervised learning, so it uses both labelled and unlabelled data
• Useful when obtaining labeled data is costly, time-consuming, or resource-intensive
• Example: image and speech analysis

11
Algorithms

Supervised
• Classification
  • Logistic Regression
  • Support Vector Machine
  • Random Forest
  • Decision Tree
  • K-Nearest Neighbors (KNN)
  • Naive Bayes
• Regression
  • Linear Regression
  • Polynomial Regression
  • Ridge Regression
  • Lasso Regression
  • Decision Tree
  • Random Forest

Unsupervised
• Clustering
  • K-Means clustering algorithm
  • Mean-shift algorithm
  • DBSCAN algorithm
  • Principal Component Analysis
  • Independent Component Analysis
• Association
  • Apriori algorithm
  • Eclat
  • FP-growth algorithm

12
Reinforcement Machine
Learning
• Interacts with the environment by
producing actions and discovering
errors.
• Trial, error, and delay are the most
relevant characteristics
• Popular algorithms
• Q-learning
• SARSA (State-Action-Reward-State-
Action)
• Deep Q-learning

13
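To make the trial-and-error idea concrete, here is a minimal sketch of the tabular Q-learning update rule; the state/action counts, learning rate, and epsilon-greedy policy below are illustrative assumptions, not taken from the slides:

```python
# Minimal sketch of tabular Q-learning (hypothetical environment sizes).
import numpy as np

n_states, n_actions = 16, 4
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount, exploration
Q = np.zeros((n_states, n_actions))      # Q-table, initialized to zero
rng = np.random.default_rng(0)

def choose_action(state):
    # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
    if rng.random() < epsilon:
        return int(rng.integers(n_actions))
    return int(np.argmax(Q[state]))

def q_update(state, action, reward, next_state):
    # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
    td_target = reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (td_target - Q[state, action])
```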
14
ML Applications

15
Photo No | Tail length (cm) | Neck length (cm) | Has horn? | Is Giraffe?
1 | 5 | 8 | Yes | Yes
2 | 2 | 3 | No | No
3 | 1 | 2 | No | No
4 | 0 | 2 | No | No

Feature
• A measurable property or unique characteristic of a phenomenon
• Except for the label column, all other columns are feature data
• Labelled data: the output (label) is given
• Unlabelled data: the output is not given
16
Supervised Learning Problem

Pizza size (in inch) | Pizza price (in taka)
6 | 399
9 | 699
12 | 1000

• Pizza price prediction (regression problem)
• Finding whether an animal is a giraffe (classification problem)
• Train the machine using data, then make predictions based on the learning

17
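As a minimal sketch of this regression problem (assuming scikit-learn, which the slides do not name explicitly), we can fit a line to the three pizza examples and predict a new price:

```python
# Minimal sketch: learn pizza price from pizza size with linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[6], [9], [12]])        # pizza size in inches (feature)
y = np.array([399, 699, 1000])        # pizza price in taka (label)

model = LinearRegression().fit(X, y)  # train on the labelled data
print(model.predict([[10]]))          # predict the price of a 10-inch pizza
```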
CRITERIA | LABELED DATA | UNLABELED DATA
Definition | Data with both input features and corresponding output labels | Data with only input features and no output labels
Usage | Primarily used in supervised learning | Primarily used in unsupervised learning
Application | Train models to predict or classify based on input data | Find patterns, groupings, or structures without predefined labels
Annotation | Annotated with correct answers | No annotations or labels
Example | Images labeled with categories like "cat," "dog" | Images without any category labels
Cost and Effort | More expensive and time-consuming due to the need for manual annotation | Easier and cheaper to collect
Supervised Learning | Essential for training models | Not used directly in training models
Unsupervised Learning | Not applicable | Essential for discovering patterns and structures
Importance | Helps in learning the relationship between input and output | Helps in uncovering hidden patterns and relationships

18
Supervised Machine Learning Process (1)

Data Acquisition → Data Cleaning → Model Training & Building → Model Testing → Model Deployment
(Test data is split off after cleaning and fed into Model Testing.)

19
Supervised Machine Learning Process
(2)
• Get your data! Customers, Sensors, etc...

Data
Acquisition

20
Supervised Machine Learning Process
(3)
• Clean and format your data (using Pandas)

Data Acquisition → Data Cleaning

21
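A minimal sketch of typical cleaning steps with Pandas; the file name and column names here are hypothetical, purely for illustration:

```python
# Minimal sketch of common data-cleaning steps with Pandas.
import pandas as pd

df = pd.read_csv("raw_data.csv")                          # hypothetical file
df = df.drop_duplicates()                                 # remove duplicate rows
df["size"] = pd.to_numeric(df["size"], errors="coerce")   # fix bad types
df["size"] = df["size"].fillna(df["size"].median())       # impute a missing feature
df = df.dropna(subset=["price"])                          # drop rows missing the label
```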
Supervised Machine Learning Process (4)

Data Acquisition → Data Cleaning → Model Training & Building
(Test data is split off after cleaning and held out.)

22
Supervised Machine Learning Process (5)

Data Acquisition → Data Cleaning → Model Training & Building → Model Testing
(The held-out test data feeds into Model Testing.)

23
Supervised Machine Learning Process (6)

Data Acquisition → Data Cleaning → Model Training & Building → Model Testing
(The held-out test data feeds into Model Testing; Adjust Model Parameters loops back from testing to training.)

24
Model Parameters vs Hyperparameters

Parameters
• Internal to the model
• Essential for making predictions
• Dependent on the dataset
• Example: the weights/coefficients of the independent variables in linear regression

Hyperparameters
• Essential for optimizing model performance
• External to the model
• Set manually by the ML engineer
• Not dependent on the dataset
• Example: the kernel and slack in SVM, the value of k in KNN, the depth of a tree in a decision tree

25
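A minimal sketch contrasting the two, assuming scikit-learn; the data and values are illustrative only:

```python
# Minimal sketch: a hyperparameter is chosen by hand, parameters are learned.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_class = np.array([0, 0, 1, 1])
y_reg = np.array([1.0, 2.0, 3.0, 4.0])

knn = KNeighborsClassifier(n_neighbors=3)   # k = 3 is a hyperparameter, set manually
knn.fit(X, y_class)

reg = LinearRegression().fit(X, y_reg)      # coefficients are learned from data
print(reg.coef_, reg.intercept_)            # the model parameters
```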
Supervised Machine Learning Process
(7)

Test
Data

Model
Data Data Model Model
Training &
Acquisition Cleaning Testing Deployment
Building

26
ML Data Sources

27
Popular Data Sources

28
Why Data Cleaning/Preprocessing?
• Data in the real world is dirty
• incomplete: lacking attribute values, lacking certain attributes of
interest, or containing only aggregate data
• noisy: containing errors or outliers
• inconsistent: containing discrepancies in codes or names
• No quality data, no quality mining results!
• Quality decisions must be based on quality data
• Data warehouse needs consistent integration of quality data

Cleaning, preprocessing, and preparing data is an essential task for developing effective ML frameworks.

29
Data Reduction Strategies
• Data reduction: obtain a reduced representation of the data set that is much smaller in volume but produces the same (or almost the same) analytical results
• Why data reduction?
• A database/data warehouse may store
terabytes of data. Complex data analysis may
take a very long time to run on the complete
data set.
• Data reduction strategies
• Dimensionality reduction, e.g., remove
unimportant attributes
• Principal Components Analysis (PCA)
• Feature subset selection, feature
creation/extraction
• Compression, Sampling, Aggregation,
Filtering, Transformation, …
30
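As a minimal sketch of dimensionality reduction, here is PCA in scikit-learn on synthetic data; the shapes and component count are illustrative assumptions:

```python
# Minimal sketch: project 4-dimensional data down to 2 principal components.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))          # 100 samples, 4 features

pca = PCA(n_components=2)              # keep the 2 strongest components
X_reduced = pca.fit_transform(X)       # shape: (100, 2)
print(pca.explained_variance_ratio_)   # variance captured by each component
```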
Data Splitting
• Training Data
• Used to train model parameters
• Validation Data
• Used to determine what model hyperparameters to adjust
• Test Data
• Used to get some final performance metric

31
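A minimal sketch of such a three-way split, assuming scikit-learn's train_test_split; the 60/20/20 proportions are an illustrative choice:

```python
# Minimal sketch: carve off the test set, then split the remainder again.
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(50, 2), np.arange(50)

X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)                      # 20% test data
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=42)   # 60/20/20 overall
```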
Cross-validation

• Rahim’s exam preparation
• Exam questions come from known/unknown chapters
• Result: good/bad
• Is he a good/bad student?

32
K-Fold Cross Validation
• Divide the dataset into K chunks (i.e., folds) and train K times, using
a different fold for each time.
• E.g., Assume K=5

Iteration | Fold 1 | Fold 2 | Fold 3 | Fold 4 | Fold 5
1 | Test | Train | Train | Train | Train
2 | Train | Test | Train | Train | Train
3 | Train | Train | Test | Train | Train
4 | Train | Train | Train | Test | Train
5 | Train | Train | Train | Train | Test

33
K-Fold Cross Validation
• Final model evaluation: (S1 + S2 + S3 + S4 + S5) / 5

Iteration (1 to K) | Training Set | Test Set | Performance Score
1 | D2, D3, D4, D5 | D1 | S1
2 | D1, D3, D4, D5 | D2 | S2
3 | D1, D2, D4, D5 | D3 | S3
4 | D1, D2, D3, D5 | D4 | S4
5 | D1, D2, D3, D4 | D5 | S5
34
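A minimal sketch of the same procedure with scikit-learn; the dataset and model are illustrative choices:

```python
# Minimal sketch of 5-fold cross-validation; the final score is the
# mean of the five per-fold scores (S1..S5).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

scores = cross_val_score(model, X, y, cv=5)   # S1..S5, one per fold
print(scores, scores.mean())                  # (S1 + ... + S5) / 5
```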
Overfitting and Underfitting
• Underfitting: Poor performance on the training data and poor
generalization to other (unseen) data
• Overfitting: Good performance on training data but poor
generalization to other (unseen) data (memorizing!!)

35
Performance Metrics/Model Evaluation

Classification problems
• Confusion Matrix (not a metric)
• Accuracy
• Precision
• Recall
• F1-Score

Regression problems
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
• R-squared (R²) Value
• Root Mean Squared Error (RMSE)

Clustering
• Elbow Method (not a performance metric, but used to find the optimal number of clusters K)

36
Confusion Matrix (not a
metric)

• It is also called an error matrix


• It is an N x N matrix, where N is the
number of target labels (classes)
• It shows the number of correct and
incorrect predictions made by the classifier
compared to the actual outcomes (target
labels) in the actual data
• E.g., binary classification problem (e.g., two
classes 0|1 or T|F)

37
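A minimal sketch of building a confusion matrix for a binary classifier with scikit-learn; the labels below are hypothetical:

```python
# Minimal sketch: confusion matrix for a two-class (0|1) problem.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # actual labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # classifier predictions

# Rows are actual classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```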
Evaluating Performance
CLASSIFICATION
Model Evaluation

● But first, we should understand the reasoning


behind these metrics and how they will actually
work in the real world!
Model Evaluation

● Typically in any classification task your model


can only achieve two results:
○ Either your model was correct in its
prediction.
○ Or your model was incorrect in its prediction.
Model Evaluation

● Fortunately incorrect vs correct expands to


situations where you have multiple classes.
● For the purposes of explaining the metrics, let’s
imagine a binary classification situation, where
we only have two available classes.
Model Evaluation

● In our example, we will attempt to predict if an


image is a dog or a cat.
● Since this is supervised learning, we will first
fit/train a model on training data, then test the
model on testing data.
● Once we have the model’s predictions from the
X_test data, we compare it to the true y values
(the correct labels).
Model Evaluation

(Diagram, built up across several slides: a test image from X_test is fed into the TRAINED MODEL, which outputs a prediction, e.g., DOG. The prediction is compared to the correct label from y_test: DOG == DOG? is a correct match, while a prediction of CAT against a label of DOG is an incorrect one.)
Model Evaluation

● We repeat this process for all the images in our X_test data.
● At the end we will have a count of correct matches and a count of incorrect matches.
● The key realization we need to make is that in the real world, not all correct or incorrect matches hold equal value!
Model Evaluation

● Also in the real world, a single metric won’t tell


the complete story!
● To understand all of this, let’s bring back the 4
metrics we mentioned and see how they are
calculated.
● We could organize our predicted values
compared to the real values in a confusion
matrix.
Model Evaluation

● Accuracy
○ Accuracy in classification problems is the
number of correct predictions made by the
model divided by the total number of
predictions.
Model Evaluation

● Accuracy
○ For example, if the X_test set was 100 images
and our model correctly predicted 80 images,
then we have 80/100.
○ 0.8 or 80% accuracy.
Model Evaluation

● Accuracy
○ Accuracy is useful when target classes are
well balanced
○ In our example, we would have roughly the
same amount of cat images as we have dog
images.
Model Evaluation

● Accuracy
○ Accuracy is not a good choice with
unbalanced classes!
○ Imagine we had 99 images of dogs and 1
image of a cat.
○ If our model were simply one line of code that always predicted “dog,” we would get 99% accuracy!
Model Evaluation

● Accuracy
○ Imagine we had 99 images of dogs and 1
image of a cat.
○ If our model were simply one line of code that always predicted “dog,” we would get 99% accuracy!
○ In this situation we’ll want to understand recall
and precision
Model Evaluation

● Recall/Sensitivity
○ TP/(TP+FN)
○ Ability of a model to find all the relevant cases within a
dataset.
○ The precise definition of recall is the number of true
positives divided by the number of true positives plus
the number of false negatives.
Model Evaluation
● Assume a data set of 100 possible cancer patients, of whom only 10 actually have cancer
● Recall: TP/(TP+FN)
● Suppose our model identified 5 patients as having cancer. So, True Positive = 5, False Negative = 5
● Recall = 5/10 = 0.5 = 50%
● Recall hinges on false negatives: it reflects how many real cancer patients went unidentified

57
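A minimal sketch reproducing this recall example in code, assuming scikit-learn; the label vectors are constructed to match the numbers above:

```python
# Minimal sketch: 10 real cancer patients, the model finds only 5.
from sklearn.metrics import recall_score

y_true = [1] * 10 + [0] * 90            # 10 real cancer patients out of 100
y_pred = [1] * 5 + [0] * 5 + [0] * 90   # model catches 5, misses 5

print(recall_score(y_true, y_pred))     # 5 / (5 + 5) = 0.5
```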
Model Evaluation

● Precision
○ Ability of a classification model to identify only
the relevant data points.
○ Precision is defined as the number of true
positives divided by the number of true
positives plus the number of false positives.
Model Evaluation
● Assume a data set of 100 possible cancer patients, of whom only 5 actually have cancer
● Precision: TP/(TP+FP)
● Suppose our model predicted that all 100 patients have cancer. So, True Positive + False Positive = 100
● But only 5 patients have cancer, so True Positive = 5
● Precision = 5/100 = 0.05 = 5%
● Precision hinges on false positives: it reflects how many of the predicted positives are real cancer patients
59
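A minimal sketch reproducing this precision example, again assuming scikit-learn:

```python
# Minimal sketch: the model flags all 100 patients, but only 5 truly have cancer.
from sklearn.metrics import precision_score

y_true = [1] * 5 + [0] * 95    # only 5 real cancer patients out of 100
y_pred = [1] * 100             # model predicts cancer for everyone

print(precision_score(y_true, y_pred))   # 5 / (5 + 95) = 0.05
```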
Model Evaluation:
Recall and
Precision

• Often you have a trade-off between recall and precision.
• While recall expresses the ability to find all relevant instances in a dataset, precision expresses the proportion of the data points our model says are relevant that actually are relevant.
Model Evaluation

● F1-Score
○ In cases where we want to find an optimal
blend of precision and recall we can combine
the two metrics using what is called the F1
score.
Model Evaluation

● F1-Score
○ The F1 score is the harmonic mean of precision and recall, taking both metrics into account:
○ F1 = 2 · (precision · recall) / (precision + recall)
Model Evaluation

● F1-Score
○ We use the harmonic mean instead of a
simple average because it punishes extreme
values.
○ A classifier with a precision of 1.0 and a recall
of 0.0 has a simple average of 0.5 but an F1
score of 0.
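A minimal sketch of that extreme case in plain Python, showing why the harmonic mean (F1) punishes it while the simple average does not:

```python
# Minimal sketch: harmonic mean vs simple average of precision and recall.
def f1(precision, recall):
    # Harmonic mean; defined as 0 when both inputs are 0.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print((1.0 + 0.0) / 2)   # simple average: 0.5 (misleading)
print(f1(1.0, 0.0))      # F1 score: 0.0 (punishes the extreme value)
```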

Model Evaluation

The main point to remember with the confusion matrix and the various calculated metrics is that they are all fundamentally ways of comparing the predicted values versus the true values.

What constitutes “good” metrics will really depend on the specific situation!

(Background diagram: the data science Venn diagram of Machine Learning, Math & Statistics, Software, Research, and Domain Knowledge.)
Performance Metrics/Model Evaluation

65
Evaluating Performance
REGRESSION
Evaluating Regression
● Let’s take a moment now to discuss evaluating
Regression Models
● Regression is a task where a model attempts to predict continuous values (unlike classification, which predicts categorical values)
Evaluating Regression
● You may have heard of some evaluation
metrics like accuracy or recall.
● These sorts of metrics aren’t useful for regression problems; we need metrics designed for continuous values!
Evaluating Regression
● For example, attempting to predict the price of
a house given its features is a regression
task.
● Attempting to predict the country a house is in
given its features would be a classification
task.
Evaluating Regression
● Most common evaluation metrics:
○ Mean Absolute Error (MAE)
○ Mean Squared Error (MSE)
○ Root Mean Square Error (RMSE)
Evaluating Regression

Mean Absolute Error (MAE)
• This is the mean of the absolute value of the errors:
  MAE = (1/n) Σ |yᵢ − ŷᵢ|
• Easy to understand
Evaluating Regression
● MAE won’t punish large errors however.
Evaluating Regression

● We want our error metrics to account for


these!
Evaluating Regression

● Mean Squared Error (MSE)
○ This is the mean of the squared errors:
  MSE = (1/n) Σ (yᵢ − ŷᵢ)²
○ Larger errors are noted more than with MAE, making MSE more popular.
Evaluating Regression:
R-squared (R²) Value/Score

• R² = 1 − Σ(yᵢ − ŷᵢ)² / Σ(yᵢ − ȳ)²
• ȳ (y bar): the mean of all real values
• The closer the value is to 1, the better the model performed

76
Evaluating Regression

• Root Mean Square Error (RMSE)
  • This is the square root of the mean of the squared errors:
    RMSE = √( (1/n) Σ (yᵢ − ŷᵢ)² )
  • Most popular (has the same units as y)
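A minimal sketch computing these regression metrics, assuming scikit-learn; the true and predicted values are hypothetical:

```python
# Minimal sketch: MAE, MSE, RMSE, and R² on hypothetical predictions.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([399, 699, 1000])
y_pred = np.array([420, 680, 950])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)                 # same units as y
r2 = r2_score(y_true, y_pred)       # closer to 1 is better
print(mae, mse, rmse, r2)
```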
Machine Learning

● Most common question from students:


○ “Is this value of RMSE good?”
● Context is everything!
● An RMSE of $10 is fantastic for predicting the
price of a house, but horrible for predicting the
price of a candy bar!
Machine Learning

● Compare your error metric to the average value


of the label in your data set to try to get an
intuition of its overall performance.
● Domain knowledge also plays an important role
here!
Unsupervised Learning
Machine Learning

● We’ve covered supervised learning, where the


label was known due to historical labeled
data.
● But what happens when we don’t have historical
labels?
Machine Learning
● There are certain tasks that fall under
unsupervised learning:
○ Clustering
○ Anomaly Detection
○ Dimensionality Reduction
Machine Learning

● Clustering
○ Grouping together unlabeled data points
into categories/clusters
○ Data points are assigned to a cluster based
on similarity
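A minimal sketch of clustering with K-Means in scikit-learn; the synthetic blobs and the choice of 3 clusters are illustrative assumptions:

```python
# Minimal sketch: group unlabeled 2-D points into 3 clusters by similarity.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(30, 2))
               for c in (0.0, 5.0, 10.0)])   # three loose blobs, no labels

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:10])                   # cluster assignment per point
print(kmeans.cluster_centers_)               # learned cluster centers
```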
Machine Learning
● Anomaly Detection
○ Attempts to detect outliers in a dataset
○ For example, fraudulent transactions on a
credit card.
Machine Learning

● Dimensionality Reduction
○ Data processing techniques that reduce the number of features in a data set, either for compression or to better understand underlying trends within the data.
Machine Learning

● Unsupervised Learning
○ It’s important to note, these are situations
where we don’t have the correct answer
for historical data!
○ Which means evaluation is much harder
and more nuanced!
Unsupervised Process

Data Acquisition → Data Cleaning → Model Training & Building → Transformation → Model Deployment
(Test data is split off after cleaning and fed into the Transformation step.)
