Session 1

The document provides an overview of machine learning concepts, including supervised and unsupervised learning, with examples such as regression for housing price prediction and classification for cancer detection. It discusses various algorithms, cost functions, and gradient descent methods used in training models. Additionally, it highlights the differences between supervised and unsupervised learning, along with common applications and terminologies related to machine learning.


About Me

Email: [email protected]
Office Hour: Wednesday 19.00-20.00


Agenda

• What is Machine Learning?
• Supervised Learning: Regression, Classification
• Unsupervised Learning: Clustering
• Cost Function
• Gradient Descent
• Learning Rate
• Batch Gradient Descent

What is Machine Learning?
Machine Learning Algorithms

Supervised learning

Unsupervised learning

Recommender system

Reinforcement learning
Supervised Learning

Regression Classification
Supervised Learning
[Figure: labeled data (circles, squares, triangles) trains a model; given new test data, the model predicts a label such as "square".]
Regression: Housing Price Prediction

[Figure: scatter plot of house size (500 to 2500 square feet) on the x-axis vs. price (100 to 400) on the y-axis, with a fitted linear model.]
Classification: Cancer Detection

[Figure: tumors plotted along tumor size x (cm); each is labeled malignant or benign.]

Classification: Cancer Detection
[Figure: the same plot with three classes: benign, malignant type 1, and malignant type 2.]


Two or More Inputs

[Figure: scatter plot with x₁ = tumor size (cm) on the x-axis and x₂ = age on the y-axis; each point is labeled malignant or benign.]
Q&A
Unsupervised Learning

Clustering, Anomaly Detection, Dimensionality Reduction

[Figure: scatter plots of x₁ = tumor size vs. x₂ = age, now with unlabeled points; clustering groups the data without labels.]

Supervised learning: learn from data labeled with the 'right answer'.
Unsupervised learning: find something interesting in unlabeled data, e.g. clustering.
• What is unsupervised learning, and how does it differ from supervised learning?

• A) Unsupervised learning involves training a model with labeled data, while supervised learning uses unlabeled data.
• B) Unsupervised learning is used for classification tasks, whereas supervised learning is used for clustering.
• C) Unsupervised learning deals with unlabeled data and seeks to find patterns or structures within the data without explicit target labels.
• D) Unsupervised learning requires more computational resources compared to supervised learning.

• Which of the following is NOT a common application of unsupervised learning?

• A) Customer segmentation in marketing
• B) Handwriting recognition
• C) Anomaly detection in cybersecurity
• D) Image compression
Q&A
Linear Regression Model

[Figure: scatter plot of house size (1000 to 3000) on the x-axis vs. price (0 to 500) on the y-axis, with a fitted regression line.]

Regression model predicts numbers.
Classification model predicts categories.
Terminology

Notation:
  x = 'input' variable / feature
  y = 'output' variable
  m = number of training examples
  (x, y) = single training example
  (x^(i), y^(i)) = i-th training example

        x: Size (feet²)    y: Price
  (1)        2104             460
  (2)        1416             232
  (3)        1534             315
  (4)         852             178
  ...         ...             ...
  (47)       3210             870


Training set → Learning Algorithm → f
Given an input x, the learned function f produces a prediction ŷ.

How to represent f?

  f_{w,b}(x) = wx + b

Linear regression with one variable (univariate linear regression).
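As a minimal sketch, the model f_{w,b}(x) = wx + b translates directly to Python (the function name `f_wb` is illustrative, not from the slides):

```python
def f_wb(x, w, b):
    """Univariate linear regression model: f_{w,b}(x) = w*x + b."""
    return w * x + b

# With w = 0.5 and b = 1, an input of x = 2 is predicted as 2.0
print(f_wb(2, 0.5, 1))  # 2.0
```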
Cost Function

  f_{w,b}(x) = wx + b

Three example fits:
  w = 0,   b = 1.5:  f(x) = 0*x + 1.5
  w = 0.5, b = 0:    f(x) = 0.5*x
  w = 0.5, b = 1:    f(x) = 0.5*x + 1

Squared-error cost:

  J(w, b) = (1/2m) * Σ_{i=1}^{m} (ŷ^(i) − y^(i))²

Worked example: f(x) = 0.5*x + 1 on the three training points (1, 1), (2, 4), (3, 2):

  J(w, b) = 1/(2*3) * ((1.5 − 1)² + (2 − 4)² + (2.5 − 2)²)
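The worked example can be checked with a short sketch of the squared-error cost (the helper name `compute_cost` is an assumption; the data points follow the example):

```python
def compute_cost(x, y, w, b):
    """Squared-error cost J(w, b) = (1 / 2m) * sum((y_hat_i - y_i)^2)."""
    m = len(x)
    total = 0.0
    for xi, yi in zip(x, y):
        y_hat = w * xi + b          # model prediction f_{w,b}(x_i)
        total += (y_hat - yi) ** 2  # squared error for this example
    return total / (2 * m)

# The worked example: f(x) = 0.5*x + 1 on (1, 1), (2, 4), (3, 2)
print(compute_cost([1, 2, 3], [1, 4, 2], w=0.5, b=1))  # 0.75
```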
Visual of Cost Function
Gradient Descent

OUTLINE:

1. Initialization: We start with some initial parameter values.

2. Calculate Gradient: We calculate the gradient of the loss function J(w, b) with respect to each parameter. This gradient points in the direction of the steepest increase in the loss.

3. Update Parameters: We adjust the parameters by a small amount in the opposite direction of the gradient. This helps us move closer to the parameter values that minimize the loss.

4. Repeat: We repeat steps 2 and 3 iteratively, each time moving a bit closer to the minimum of the loss function.
Update rules, where α is the learning rate and ∂J/∂w_j is the derivative:

  w_j = w_j − α * ∂J(w, b)/∂w_j
  b   = b   − α * ∂J(w, b)/∂b

Repeat until convergence.

Correct (simultaneous update):

  temp_w = w − α * ∂J(w, b)/∂w
  temp_b = b − α * ∂J(w, b)/∂b
  w = temp_w
  b = temp_b

Incorrect (w is overwritten before b's gradient is computed):

  temp_w = w − α * ∂J(w, b)/∂w
  w = temp_w
  temp_b = b − α * ∂J(w, b)/∂b
  b = temp_b
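Putting the update rules together, here is a minimal gradient descent sketch for the univariate model (the helper names and the toy data y = 2x are assumptions for illustration). Both gradients are computed before either parameter is overwritten, i.e. a simultaneous update:

```python
def gradients(x, y, w, b):
    """Partial derivatives of J(w, b) for f(x) = w*x + b."""
    m = len(x)
    dj_dw = sum((w * xi + b - yi) * xi for xi, yi in zip(x, y)) / m
    dj_db = sum((w * xi + b - yi) for xi, yi in zip(x, y)) / m
    return dj_dw, dj_db

def gradient_descent(x, y, w, b, alpha, iters):
    for _ in range(iters):
        dj_dw, dj_db = gradients(x, y, w, b)  # both gradients first...
        temp_w = w - alpha * dj_dw
        temp_b = b - alpha * dj_db
        w, b = temp_w, temp_b                 # ...then simultaneous update
    return w, b

w, b = gradient_descent([1, 2, 3], [2, 4, 6], w=0.0, b=0.0, alpha=0.1, iters=2000)
# For this y = 2x data, w approaches 2 and b approaches 0
```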
[Figure: where the slope of J(w) is positive, the update w_j = w_j − α * ∂J(w, b)/∂w_j decreases w; where the slope is negative, it increases w. In both cases w moves toward the minimum.]
Learning Rate

  w_j = w_j − α * ∂J(w, b)/∂w_j

[Figure: with a proper learning rate, gradient descent takes steady steps toward the minimum; with a big learning rate, it overshoots and can diverge.]

At a local minimum the slope ∂J(w, b)/∂w is 0, so the update leaves w unchanged:

  w = w − α * 0
  w = w

Gradient descent can therefore reach a local minimum with a fixed learning rate.
[Figure: successive steps of w = w − α * ∂J(w, b)/∂w are large at first, then not as large, then smaller as w approaches the minimum.]

Near a local minimum:

• Derivative becomes smaller
• Update steps become smaller
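The effect of the learning rate can be seen on the toy loss J(w) = w² (an assumed example, not from the slides): a proper α shrinks w toward the minimum at 0, while a too-large α overshoots and diverges.

```python
def grad_step(w, alpha):
    """One gradient descent step on J(w) = w^2, whose derivative is 2w."""
    return w - alpha * 2 * w

for alpha in (0.1, 1.1):  # proper rate vs. too-large rate
    w = 1.0
    for _ in range(10):
        w = grad_step(w, alpha)
    print(f"alpha={alpha}: w after 10 steps = {w:.3f}")
# alpha=0.1 converges toward 0 (w ≈ 0.107); alpha=1.1 diverges (w ≈ 6.192)
```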
The Curse of Local Minima: How to Escape and
Find the Global Minimum

• Adding noise
• Momentum
• Learning rate adjustment
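Of these, momentum is easy to sketch: the update accumulates a velocity from past gradients, which can carry the parameters through flat regions or shallow local minima (the name `momentum_step` and the constants here are illustrative assumptions):

```python
def momentum_step(w, v, grad, alpha=0.1, beta=0.9):
    """One momentum update: v keeps a decaying sum of past gradient steps."""
    v = beta * v - alpha * grad  # old velocity plus new gradient step
    w = w + v                    # parameter moves by the velocity
    return w, v

# Two steps with a constant gradient of 2.0: the step size grows
w, v = momentum_step(1.0, 0.0, 2.0)  # w = 0.8,  v = -0.2
w, v = momentum_step(w, v, 2.0)      # w = 0.42, v = -0.38
```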
Batch Gradient Descent
Stochastic Gradient Descent (SGD)
Mini-batch Gradient Descent
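The three variants differ only in how many examples feed each update: batch GD uses all m, SGD uses one, and mini-batch uses a small random subset. A minimal mini-batch sketch for the univariate model (function name, toy data, and hyperparameters are assumptions):

```python
import random

def minibatch_sgd(x, y, w, b, alpha=0.05, batch_size=2, epochs=1000):
    """Mini-batch gradient descent for f(x) = w*x + b: each update uses
    the gradient of one shuffled batch instead of the full training set."""
    data = list(zip(x, y))
    for _ in range(epochs):
        random.shuffle(data)                 # new batch order each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            m = len(batch)
            dj_dw = sum((w * xi + b - yi) * xi for xi, yi in batch) / m
            dj_db = sum((w * xi + b - yi) for xi, yi in batch) / m
            w -= alpha * dj_dw
            b -= alpha * dj_db
    return w, b

random.seed(0)  # fixed seed so the noisy updates are reproducible
w, b = minibatch_sgd([1, 2, 3, 4], [2, 4, 6, 8], w=0.0, b=0.0)
# For this y = 2x data, w ends up near 2 and b near 0
```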
SUMMARY
Q&A
• Create Account
• Create repository

https://www.youtube.com/watch?v=HW29067qVWk
https://www.youtube.com/watch?v=iv8rSLsi1xo
