About Me
Email: [email protected] Office Hours: Wednesday 19.00-20.00
Agenda
• Supervised Learning (Regression, Classification)
• Unsupervised Learning (Clustering)
• Cost Function
• Learning Rate
• Gradient Descent
• Batch Gradient Descent
What is Machine Learning?
Machine Learning Algorithms
Supervised learning
Unsupervised learning
Recommender systems
Reinforcement learning
Supervised Learning
Regression · Classification
Supervised Learning
[Diagram: labeled training data (circles, triangles, squares) is fed to a model; the trained model then predicts the label of unseen test data, e.g. "square" or "circle".]
Regression: Housing Price Prediction
[Plot: price vs. house size (feet^2), with a linear fit through the data.]
Classification: Cancer Detection
[Plot: tumor size x (cm) on the horizontal axis; each example is labeled benign or malignant.]
Classification: Cancer Detection
[Plot: tumor size x (cm); three classes: benign, malignant type 1, malignant type 2.]
Two or More Inputs
[Plot: two input features, x_1 = tumor size and x_2 = age; a boundary separates the benign examples from the malignant ones.]
Q&A
Unsupervised Learning
Clustering · Anomaly Detection · Dimensionality Reduction
[Plots: the same data (x_1 = tumor size, x_2 = age) shown first without labels, then with the clusters an unsupervised algorithm discovers.]
Supervised learning: learn from data labeled with the ‘right answer’.
Unsupervised learning: find something interesting in unlabeled data, e.g. clustering.
• What is unsupervised learning, and how does it differ from supervised
learning?
• A) Unsupervised learning involves training a model with labeled data, while
supervised learning uses unlabeled data.
• B) Unsupervised learning is used for classification tasks, whereas
supervised learning is used for clustering.
• C) Unsupervised learning deals with unlabeled data and seeks to find
patterns or structures within the data without explicit target labels.
• D) Unsupervised learning requires more computational resources
compared to supervised learning.
• Which of the following is NOT a common application of unsupervised
learning?
• A) Customer segmentation in marketing
• B) Handwriting recognition
• C) Anomaly detection in cybersecurity
• D) Image compression
Q&A
Linear Regression Model
Linear regression
[Plot: price vs. house size (feet^2), with a fitted regression line through the training points.]
Regression model predicts numbers
Classification model predicts categories
Terminology
      Size x (feet^2)   Price y
(1)        2104           460
(2)        1416           232
(3)        1534           315
(4)         852           178
 …           …             …
(47)       3210           870

Notation:
x = ‘input’ variable (feature)
y = ‘output’ variable
m = number of training examples
(x, y) = single training example
(x^{(i)}, y^{(i)}) = i-th training example
Training set → Learning Algorithm → f
x → f → ŷ (prediction)

How to represent f?

f_{w,b}(x) = wx + b

Linear regression with one variable: univariate linear regression.
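As a quick illustration, here is a minimal Python sketch of this model; the parameter values and the input below are invented for the example, not fitted to anything:

```python
def f(x, w, b):
    """Univariate linear model: f_{w,b}(x) = w*x + b."""
    return w * x + b

# Hypothetical parameters: predict a price for a 1500 ft^2 house.
print(f(1500, w=0.2, b=50))  # 350.0
```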
Cost Function

f_{w,b}(x) = wx + b

[Plots: three candidate lines for f on the same axes]
f(x) = 0*x + 1.5    (w = 0,   b = 1.5)
f(x) = 0.5*x        (w = 0.5, b = 0)
f(x) = 0.5*x + 1    (w = 0.5, b = 1)
f_{w,b}(x) = wx + b

J(w,b) = \frac{1}{2m} \sum_{i=1}^{m} \left( \hat{y}^{(i)} - y^{(i)} \right)^2

Example with f(x) = 0.5*x + 1 on the three training points shown in the plot:

J(w,b) = \frac{1}{2 \cdot 3} \left( (1.5-1)^2 + (2-4)^2 + (2.5-2)^2 \right) = 0.75
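A short Python check of this arithmetic. The three training points (1, 1), (2, 4), (3, 2) are read off the slide's plot, so treat them as an assumption:

```python
def cost(w, b, xs, ys):
    """Squared-error cost J(w,b) = (1/(2m)) * sum of (f(x_i) - y_i)^2."""
    m = len(xs)
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)

xs, ys = [1, 2, 3], [1, 4, 2]   # assumed training points from the plot
print(cost(0.5, 1, xs, ys))     # 0.75, matching the computation above
```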
Visual of Cost Function
Gradient Descent

OUTLINE:
1. Initialization: We start with some initial parameter values.
2. Calculate Gradient: We calculate the gradient of the loss function with respect to each parameter. This gradient points in the direction of the steepest increase in the loss.
3. Update Parameters: We adjust the parameters by a small amount in the opposite direction of the gradient. This helps us move closer to the parameter values that minimize the loss.
4. Repeat: We repeat steps 2 and 3 iteratively, each time moving a bit closer to the minimum of the loss function.

[Plots: J(w) as a curve over w, and J(w,b) as a surface over w and b, with gradient descent stepping downhill toward the minimum.]
w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w,b)

b = b - \alpha \frac{\partial}{\partial b} J(w,b)

\alpha is the learning rate; \frac{\partial}{\partial w_j} J(w,b) is the derivative.

Repeat until convergence.
Correct: simultaneous update

temp_w = w - \alpha \frac{\partial}{\partial w} J(w,b)
temp_b = b - \alpha \frac{\partial}{\partial b} J(w,b)
w = temp_w
b = temp_b

Incorrect:

temp_w = w - \alpha \frac{\partial}{\partial w} J(w,b)
w = temp_w
temp_b = b - \alpha \frac{\partial}{\partial b} J(w,b)   (this derivative is evaluated with the already-updated w)
b = temp_b
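Putting the update rules together, a minimal sketch of gradient descent for univariate linear regression; the data, learning rate, and iteration count are illustrative choices, not values from the slides:

```python
def gradient_descent(xs, ys, alpha=0.01, iters=1000):
    """Minimize J(w,b) for f(x) = w*x + b using simultaneous updates."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(iters):
        errs = [w * x + b - y for x, y in zip(xs, ys)]    # f(x^(i)) - y^(i)
        dJ_dw = sum(e * x for e, x in zip(errs, xs)) / m  # dJ/dw
        dJ_db = sum(errs) / m                             # dJ/db
        # Tuple assignment updates w and b simultaneously: both gradients
        # above were computed from the old (w, b), as in the 'Correct' column.
        w, b = w - alpha * dJ_dw, b - alpha * dJ_db
    return w, b

print(gradient_descent([1, 2, 3], [1, 4, 2]))
```

Python's tuple assignment plays the role of temp_w and temp_b: both right-hand sides are evaluated before either parameter changes.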
[Plot: J(w) at a point with positive slope]
With a positive slope, \frac{\partial}{\partial w_j} J(w,b) > 0, so the update w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w,b) decreases w_j, moving it toward the minimum.

[Plot: J(w) at a point with negative slope]
With a negative slope, the derivative is negative, so the same update increases w_j, again moving it toward the minimum.
Learning Rate

w_j = w_j - \alpha \frac{\partial}{\partial w_j} J(w,b)

[Plots: with a proper learning rate, the steps move steadily toward the minimum; with a big learning rate, the steps overshoot the minimum and can diverge.]
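The effect is easy to see on a toy objective. A sketch on J(w) = w^2, whose derivative is 2w; the two learning rates are arbitrary picks on either side of the stability threshold:

```python
def descend(alpha, w=1.0, steps=5):
    """Run gradient descent on J(w) = w^2, so dJ/dw = 2w."""
    for _ in range(steps):
        w = w - alpha * 2 * w
        print(round(w, 4), end=" ")
    print()

descend(alpha=0.1)  # proper rate: 0.8 0.64 0.512 ... shrinks toward 0
descend(alpha=1.1)  # big rate: -1.2 1.44 -1.728 ... overshoots and diverges
```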
[Plot: J(w) with a local minimum, where the slope = 0]

At a local minimum the derivative is zero:
\frac{\partial}{\partial w} J(w,b) = 0
so the update becomes
w = w - \alpha \cdot 0
w = w
Gradient descent can reach a local minimum with a fixed learning rate: once the slope is zero, the parameters stop changing.
[Plot: J(w) with successive gradient descent steps labeled large, not as large, smaller]

w = w - \alpha \frac{\partial}{\partial w} J(w,b)

Near a local minimum:
• the derivative becomes smaller
• the update steps become smaller
The Curse of Local Minima: How to Escape and Find the Global Minimum
• Adding noise
• Momentum (see the sketch below)
• Learning rate adjustment
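Of the three, momentum is the easiest to sketch in code. A minimal version of the momentum update; the coefficient beta = 0.9 is a common default, not a value from the slides:

```python
def momentum_descent(grad, w=1.0, alpha=0.01, beta=0.9, iters=200):
    """Gradient descent with momentum: the velocity v accumulates past
    gradients, which can carry w through flat regions and shallow dips."""
    v = 0.0
    for _ in range(iters):
        v = beta * v - alpha * grad(w)  # blend old velocity with new gradient
        w = w + v
    return w

# Example on J(w) = w^2 (derivative 2w): w oscillates but settles near 0.
print(momentum_descent(lambda w: 2 * w))
```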
Batch Gradient Descent
• Each update uses the entire training set.

Stochastic Gradient Descent (SGD)
• Each update uses a single training example.

Mini-batch Gradient Descent
• Each update uses a small batch of training examples.
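A rough sketch of how one update step differs across the three variants; the helper function, the toy data, and the batch size are all illustrative:

```python
import random

def grad_step(w, b, batch, alpha=0.01):
    """One gradient descent update of (w, b) computed on `batch` only."""
    m = len(batch)
    errs = [(w * x + b - y, x) for x, y in batch]
    dJ_dw = sum(e * x for e, x in errs) / m
    dJ_db = sum(e for e, _ in errs) / m
    return w - alpha * dJ_dw, b - alpha * dJ_db

data = [(1, 1), (2, 4), (3, 2)]  # toy training set
w, b = 0.0, 0.0
w, b = grad_step(w, b, data)                    # batch: all m examples
w, b = grad_step(w, b, [random.choice(data)])   # SGD: one random example
w, b = grad_step(w, b, random.sample(data, 2))  # mini-batch: a small subset
```

In practice the SGD and mini-batch steps are repeated over shuffled passes (epochs) through the training set.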
SUMMARY
Q&A
• Create account
• Create repository
https://www.youtube.com/watch?v=HW29067qVWk
https://www.youtube.com/watch?v=iv8rSLsi1xo