Introduction to Machine / Deep Learning
Hung-yi Lee 李宏毅
Machine Learning
≈ Looking for a Function
• Speech Recognition
  f( audio clip ) = "How are you"
• Image Recognition
  f( picture of a cat ) = "Cat"
• Playing Go
  f( board position ) = "5-5" (next move)
Different types of Functions
Regression: The function outputs a scalar.
  Example: predict tomorrow's PM2.5.
  f( PM2.5 today, temperature, concentration of O3 ) = PM2.5 of tomorrow
Classification: Given options (classes), the function outputs the correct one.
  Example: spam filtering.
  f( an email ) = Yes / No
Different types of Functions
Classification: Given options (classes), the function outputs the correct one.
  Example: playing Go.
  f( a position on the board ) = next move
  Each position on the board is a class (19 × 19 classes).
Structured Learning: beyond regression and classification,
create something with structure (an image, a document).
How to find a function?
A Case Study
YouTube Channel
https://www.youtube.com/c/HungyiLeeNTU
The function we want to find …
  y = f( … ),  y: no. of views on 2/26
1. Function with Unknown Parameters
Model (based on domain knowledge):  y = b + w·x₁
  y: no. of views on 2/26
  x₁: no. of views on 2/25  (feature)
  w and b are unknown parameters (learned from data):
  w is the weight, b is the bias.
2. Define Loss from Training Data
• Loss is a function of the parameters: L(b, w).
• Loss: how good a set of values is.
Example: how good is L(0.5k, 1), i.e. y = 0.5k + 1·x₁?
Data from 2017/01/01 – 2020/12/31:
  2017/01/01  01/02  01/03  ……  2020/12/30  12/31
  4.8k        4.9k   7.5k        3.4k        9.8k
Predicting 01/02 from 01/01:  y = 0.5k + 1 × 4.8k = 5.3k,
label ŷ = 4.9k, so the error is  e₁ = |y − ŷ| = 0.4k.
Similarly, predicting 01/03 from 01/02:  y = 0.5k + 1 × 4.9k = 5.4k,
label ŷ = 7.5k, so  e₂ = |y − ŷ| = 2.1k; and so on (e₃, …) for every
day through 2020/12/31.
Collect one error per day over the whole training set and average:
  Loss:  L = (1/N) Σₙ eₙ
• e = |y − ŷ|  →  L is the mean absolute error (MAE)
• e = (y − ŷ)²  →  L is the mean squared error (MSE)
• If y and ŷ are both probability distributions → cross-entropy.
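As a concrete illustration, here is a minimal Python sketch of this loss computation for the linear model; the list of daily view counts is partly hypothetical (only the first few values come from the slide):

```python
import numpy as np

views = np.array([4.8, 4.9, 7.5, 3.4, 9.8])  # daily views in k (partly hypothetical)
x1 = views[:-1]      # x1: views on the previous day
y_hat = views[1:]    # label: views on the next day

def loss(b, w, kind="MAE"):
    y = b + w * x1                       # model prediction y = b + w*x1
    e = np.abs(y - y_hat) if kind == "MAE" else (y - y_hat) ** 2
    return e.mean()                      # L = (1/N) * sum of e_n

print(loss(0.5, 1.0))          # L(0.5k, 1) with MAE
print(loss(0.5, 1.0, "MSE"))   # same parameter values with MSE
```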
Trying many different values of w and b for the model y = b + w·x₁ and
computing L for each gives an error surface over the (w, b) plane:
some regions have small L, others large L.
[Figure: contour plot of the error surface L(b, w).]
Source of image: http://chico386.pixnet.net/album/photo/171572850
3. Optimization
  w*, b* = arg min over w, b of L
Gradient Descent (one parameter w for now):
• (Randomly) pick an initial value w⁰.
• Compute the slope ∂L/∂w |_{w=w⁰}.
  Negative slope → increase w;  positive slope → decrease w.
• Update w:  w¹ ← w⁰ − η · ∂L/∂w |_{w=w⁰}
  η is the learning rate, a hyperparameter (set by you, not learned).
• Update w iteratively: w⁰ → w¹ → w² → … → wᵀ.
A possible worry: gradient descent may stop at a local minimum rather
than the global minimum. (Do local minima truly cause the problem?)
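To make the update rule concrete, here is a minimal pure-Python sketch of 1-D gradient descent; the loss function and all parameter values are illustrative, not from the lecture:

```python
def L(w):
    return (w - 3.0) ** 2 + 1.0     # hypothetical loss, minimum at w = 3

def dL_dw(w):
    return 2.0 * (w - 3.0)          # its derivative

eta = 0.1                           # learning rate (hyperparameter)
w = 0.0                             # initial value w0
for _ in range(100):
    w = w - eta * dL_dw(w)          # w <- w - eta * dL/dw
print(w)                            # ends up close to 3.0, the minimum
```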
3. Optimization (both parameters)
  w*, b* = arg min over w, b of L
• (Randomly) pick initial values w⁰, b⁰.
• Compute both partial derivatives and update:
  w¹ ← w⁰ − η · ∂L/∂w |_{w=w⁰, b=b⁰}
  b¹ ← b⁰ − η · ∂L/∂b |_{w=w⁰, b=b⁰}
  (Computing the gradients can be done in one line in most deep
  learning frameworks.)
• Update w and b iteratively.
Model:  y = b + w·x₁
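For instance, here is a minimal PyTorch sketch of one such update; the data values echo the slide, and PyTorch is just one example of "most deep learning frameworks":

```python
import torch

x1 = torch.tensor([4.8, 4.9, 7.5, 3.4])      # views on day t (k)
y_hat = torch.tensor([4.9, 7.5, 3.4, 9.8])   # views on day t+1 (labels)

w = torch.tensor(1.0, requires_grad=True)
b = torch.tensor(0.5, requires_grad=True)

y = b + w * x1                     # model prediction
loss = (y - y_hat).abs().mean()    # MAE loss L(b, w)
loss.backward()                    # the "one line": fills in w.grad and b.grad

eta = 0.01
with torch.no_grad():
    w -= eta * w.grad              # w1 <- w0 - eta * dL/dw
    b -= eta * b.grad              # b1 <- b0 - eta * dL/db
```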
Running gradient descent on this problem gives
  w* = 0.97,  b* = 0.1k,  L(w*, b*) = 0.48k
[Figure: trajectory of (w, b) on the error surface; each step moves by
(−η·∂L/∂w, −η·∂L/∂b).]
Machine Learning is so simple ……
Step 1: function with unknown parameters → Step 2: define loss from
training data → Step 3: optimization.
Training result for y = b + w·x₁:  w* = 0.97, b* = 0.1k.
y = 0.1k + 0.97·x₁ achieves the smallest loss L = 0.48k on the data of
2017 – 2020 (training data).
How about the data of 2021 (unseen during training)?  L′ = 0.58k
[Plot, 2021/01/01 – 2021/02/14: red = real no. of views, blue = no. of
views estimated by y = 0.1k + 0.97·x₁.]
Using more past days as features:
  Model                      2017 – 2020   2021
  y = b + w·x₁               L = 0.48k     L′ = 0.58k
  y = b + Σⱼ₌₁⁷ wⱼ·xⱼ        L = 0.38k     L′ = 0.49k
  y = b + Σⱼ₌₁²⁸ wⱼ·xⱼ       L = 0.33k     L′ = 0.46k
  y = b + Σⱼ₌₁⁵⁶ wⱼ·xⱼ       L = 0.32k     L′ = 0.46k
Learned parameters for the 7-day model:
  b = 0.05k,  w₁* = 0.79, w₂* = −0.31, w₃* = 0.12, w₄* = −0.01,
  w₅* = −0.10, w₆* = 0.30, w₇* = 0.18
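As a sketch, the 7-day model with the learned weights above can be evaluated like this; the ordering convention (xⱼ = views j days ago) and most of the input values are assumptions for illustration:

```python
import numpy as np

# Learned parameters of the 7-day model (from the table above).
w = np.array([0.79, -0.31, 0.12, -0.01, -0.10, 0.30, 0.18])
b = 0.05                                     # in k views

# x_j: views j days ago; these values are illustrative.
x = np.array([9.8, 3.4, 7.5, 4.9, 4.8, 5.1, 6.2])

y = b + w @ x                                # y = b + sum_j w_j * x_j
print(f"predicted views: {y:.2f}k")
```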
Linear models
For a linear model y = b + w·x₁, different w values change the slope
and different b values shift the line, but the relation between x₁ and
y is always a straight line. This severe limitation is called model
bias. Linear models are too simple; we need a more flexible,
sophisticated model!
red curve = constant + sum of a set of hard-sigmoid curves (the blue
curves in the slides).
All piecewise linear curves = constant + sum of a set of hard
sigmoids; more pieces require more of them.
Beyond piecewise linear? Approximate a continuous curve y(x₁) by a
piecewise linear curve; to have a good approximation, we need
sufficient pieces.
red curve = constant + sum of a set of hard sigmoids. How do we
represent one hard sigmoid? Approximate it with the sigmoid function:
  y = c · 1 / (1 + e^(−(b + w·x₁))) = c · sigmoid(b + w·x₁)
• Different w: changes the slope
• Different b: shifts the curve
• Different c: changes the height
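A minimal Python sketch of this building block; the parameter values passed in are illustrative:

```python
import numpy as np

def sigmoid_unit(x1, c, b, w):
    """c * sigmoid(b + w*x1): w sets the slope, b shifts it, c sets the height."""
    return c / (1.0 + np.exp(-(b + w * x1)))

x1 = np.linspace(-5, 5, 11)
print(sigmoid_unit(x1, c=1.0, b=0.0, w=1.0))    # a basic rising curve
print(sigmoid_unit(x1, c=2.0, b=1.0, w=-0.5))   # different c, b, w reshape it
```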
red curve = constant + sum of a set of sigmoid components, e.g. three
components c₁·sigmoid(b₁ + w₁·x₁), c₂·sigmoid(b₂ + w₂·x₁), and
c₃·sigmoid(b₃ + w₃·x₁):
  y = b + Σᵢ cᵢ · sigmoid(bᵢ + wᵢ·x₁)
New Model: More Features
From one feature:
  y = b + w·x₁        →   y = b + Σᵢ cᵢ · sigmoid(bᵢ + wᵢ·x₁)
To multiple features:
  y = b + Σⱼ wⱼ·xⱼ    →   y = b + Σᵢ cᵢ · sigmoid(bᵢ + Σⱼ wᵢⱼ·xⱼ)
where j indexes the features and i indexes the sigmoids
(both 1, 2, 3 in the running example).
Writing out the sums inside the sigmoids for i, j ∈ {1, 2, 3}
(wᵢⱼ: weight on feature xⱼ for the i-th sigmoid):
  r₁ = b₁ + w₁₁·x₁ + w₁₂·x₂ + w₁₃·x₃
  r₂ = b₂ + w₂₁·x₁ + w₂₂·x₂ + w₂₃·x₃
  r₃ = b₃ + w₃₁·x₁ + w₃₂·x₂ + w₃₃·x₃
In matrix form:
  ⎡r₁⎤   ⎡b₁⎤   ⎡w₁₁ w₁₂ w₁₃⎤ ⎡x₁⎤
  ⎢r₂⎥ = ⎢b₂⎥ + ⎢w₂₁ w₂₂ w₂₃⎥ ⎢x₂⎥      i.e.  𝒓 = 𝒃 + W 𝒙
  ⎣r₃⎦   ⎣b₃⎦   ⎣w₃₁ w₃₂ w₃₃⎦ ⎣x₃⎦
[Diagram: the three sums r₁, r₂, r₃ computed from x₁, x₂, x₃, drawn as
a network and written compactly as 𝒓 = 𝒃 + W 𝒙.]
Next, pass each rᵢ through the sigmoid:
  a₁ = sigmoid(r₁) = 1 / (1 + e^(−r₁)),  and likewise a₂, a₃.
In vector form:  𝒂 = σ(𝒓)
Finally, combine the activations with the weights cᵢ and the constant b:
  y = b + c₁·a₁ + c₂·a₂ + c₃·a₃ = b + 𝒄ᵀ𝒂
Putting it together:
  𝒓 = 𝒃 + W 𝒙,   𝒂 = σ(𝒓),   y = b + 𝒄ᵀ𝒂
so the whole model is
  y = b + 𝒄ᵀ σ(𝒃 + W 𝒙)
Function with unknown parameters
  y = b + 𝒄ᵀ σ(𝒃 + W 𝒙)
𝒙 is the feature vector; the unknown parameters are the matrix W, the
vector 𝒃, the vector 𝒄, and the scalar b. Collect them all (e.g. the
rows of W, then the biases, …) into one long vector:
  𝜽 = [θ₁, θ₂, θ₃, …]ᵀ
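A minimal NumPy sketch of this forward pass; the shapes (3 sigmoids, 3 features) and all parameter values are illustrative:

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def model(x, W, b_vec, c, b):
    """y = b + c^T sigma(b_vec + W x) for one feature vector x."""
    r = b_vec + W @ x     # r = b + W x
    a = sigmoid(r)        # a = sigma(r)
    return b + c @ a      # y = b + c^T a

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))       # 3 sigmoids x 3 features
b_vec = rng.normal(size=3)
c = rng.normal(size=3)
b = 0.0
x = np.array([4.8, 4.9, 7.5])     # e.g. views on three past days (in k)
print(model(x, W, b_vec, c, b))
```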
Back to ML Framework
Step 1: function with unknown parameters → Step 2: define loss from
training data → Step 3: optimization.
  y = b + 𝒄ᵀ σ(𝒃 + W 𝒙)
Loss
• Loss is a function of the parameters: L(𝜽).
• Loss means how good a set of values is.
Given a set of values 𝜽: feed each feature vector 𝒙 into
y = b + 𝒄ᵀ σ(𝒃 + W 𝒙), compare the prediction y with the label ŷ to
get an error e, and average:
  Loss:  L = (1/N) Σₙ eₙ
Back to ML Framework
Step 1: function with unknown parameters → Step 2: define loss from
training data → Step 3: optimization.
  y = b + 𝒄ᵀ σ(𝒃 + W 𝒙)
Optimization of New Model
  𝜽* = arg min over 𝜽 of L,   𝜽 = [θ₁, θ₂, θ₃, …]ᵀ
• (Randomly) pick initial values 𝜽⁰.
• Compute the gradient 𝒈 = ∇L(𝜽⁰), whose components are
  ∂L/∂θ₁ |_{𝜽=𝜽⁰}, ∂L/∂θ₂ |_{𝜽=𝜽⁰}, …
• Update every component at once:
  𝜽¹ ← 𝜽⁰ − η𝒈,  i.e.  θᵢ¹ ← θᵢ⁰ − η · ∂L/∂θᵢ |_{𝜽=𝜽⁰}
Optimization of New Model
  𝜽* = arg min over 𝜽 of L
• (Randomly) pick initial values 𝜽⁰.
• Compute gradient 𝒈 = ∇L(𝜽⁰);  update 𝜽¹ ← 𝜽⁰ − η𝒈
• Compute gradient 𝒈 = ∇L(𝜽¹);  update 𝜽² ← 𝜽¹ − η𝒈
• Compute gradient 𝒈 = ∇L(𝜽²);  update 𝜽³ ← 𝜽² − η𝒈
  … and so on.
Optimization of New Model (mini-batches)
  𝜽* = arg min over 𝜽 of L
In practice, divide the N training examples into batches of size B:
• (Randomly) pick initial values 𝜽⁰.
• Compute gradient 𝒈 = ∇L₁(𝜽⁰) on batch 1;  update 𝜽¹ ← 𝜽⁰ − η𝒈
• Compute gradient 𝒈 = ∇L₂(𝜽¹) on batch 2;  update 𝜽² ← 𝜽¹ − η𝒈
• Compute gradient 𝒈 = ∇L₃(𝜽²) on batch 3;  update 𝜽³ ← 𝜽² − η𝒈
  … where each Lₖ is the loss computed on one batch only.
1 epoch = seeing all the batches once.
Optimization of New Model
Example 1:
• 10,000 examples (N = 10,000), batch size 10 (B = 10)
• How many updates in 1 epoch?  10,000 / 10 = 1,000 updates
Example 2:
• 1,000 examples (N = 1,000), batch size 100 (B = 100)
• How many updates in 1 epoch?  1,000 / 100 = 10 updates
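A minimal sketch of one epoch of mini-batch gradient descent on the linear model, assuming MSE loss and placeholder data (all values are illustrative):

```python
import numpy as np

N, B, eta = 10_000, 10, 0.01          # as in Example 1: 1,000 updates per epoch
rng = np.random.default_rng(0)
x = rng.normal(size=N)                # placeholder feature values
y_hat = rng.normal(size=N)            # placeholder labels

w, b = 0.0, 0.0
order = rng.permutation(N)            # split shuffled examples into batches
for start in range(0, N, B):          # one pass over all batches = 1 epoch
    idx = order[start:start + B]
    y = b + w * x[idx]                # prediction on this batch only
    # gradient of the batch MSE loss L_k = mean((y - y_hat)^2)
    g_w = 2 * np.mean((y - y_hat[idx]) * x[idx])
    g_b = 2 * np.mean(y - y_hat[idx])
    w -= eta * g_w                    # one update per batch
    b -= eta * g_b
```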
Back to ML Framework
Step 1: function with unknown parameters → Step 2: define loss from
training data → Step 3: optimization.
  y = b + 𝒄ᵀ σ(𝒃 + W 𝒙)
More variety of models …
Sigmoid → ReLU
How else can we represent the hard sigmoid? As the sum of two
Rectified Linear Units (ReLU), each of the form c·max(0, b + w·x₁):
  hard sigmoid = c·max(0, b + w·x₁) + c′·max(0, b′ + w′·x₁)
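A minimal sketch of this two-ReLU construction; the specific corner points (lo, hi) and height are assumptions for illustration:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def hard_sigmoid(x1, lo=0.0, hi=2.0, height=1.0):
    """Hard sigmoid as a sum of two ReLUs: flat at 0 below lo,
    a linear ramp between lo and hi, flat at `height` above hi."""
    slope = height / (hi - lo)
    # first ReLU starts the ramp at lo; the second (with c' = -slope)
    # cancels the slope at hi, leaving the curve flat again
    return slope * relu(x1 - lo) - slope * relu(x1 - hi)

x1 = np.linspace(-1, 3, 9)
print(hard_sigmoid(x1))   # 0 ... ramp ... 1
```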
Sigmoid → ReLU
  y = b + Σᵢ cᵢ · sigmoid(bᵢ + Σⱼ wᵢⱼ·xⱼ)
becomes (with twice as many terms, since two ReLUs make one hard
sigmoid):
  y = b + Σ₂ᵢ cᵢ · max(0, bᵢ + Σⱼ wᵢⱼ·xⱼ)
sigmoid and max(0, ·) are both called activation functions.
Which one is better?
Experimental Results
  y = b + Σ₂ᵢ cᵢ · max(0, bᵢ + Σⱼ wᵢⱼ·xⱼ)
               linear   10 ReLU   100 ReLU   1000 ReLU
  2017 – 2020   0.32k    0.32k     0.28k      0.27k
  2021          0.46k    0.45k     0.43k      0.43k
Back to ML Framework
Step 1: function with unknown parameters → Step 2: define loss from
training data → Step 3: optimization.
  y = b + 𝒄ᵀ σ(𝒃 + W 𝒙)
Even more variety of models …
Apply the same block repeatedly: the outputs 𝒂 of one layer become the
inputs of the next, with sigmoid (or ReLU) in between:
  𝒂 = σ(𝒃 + W 𝒙),   𝒂′ = σ(𝒃′ + W′ 𝒂),   ……
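A minimal sketch of this layer stacking in NumPy; the layer container, random parameter values, and input are assumptions (the shapes echo the experiment below: 100 units per layer, 56 past days as features):

```python
import numpy as np

def sigmoid(r):
    return 1.0 / (1.0 + np.exp(-r))

def deep_model(x, layers, c, b_out):
    """Repeat a' = sigma(b' + W' a) layer by layer, then y = b + c^T a.
    `layers` is a hypothetical list of (W, b) pairs."""
    a = x
    for W, b_vec in layers:
        a = sigmoid(b_vec + W @ a)   # could equally use relu(...)
    return b_out + c @ a

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(100, 56)), rng.normal(size=100)),
          (rng.normal(size=(100, 100)), rng.normal(size=100))]
c, b_out = rng.normal(size=100), 0.0
x = rng.normal(size=56)              # e.g. views in the past 56 days
print(deep_model(x, layers, c, b_out))
```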
Experimental Results
• Loss for multiple hidden layers
• 100 ReLU for each layer
• Input features are the no. of views in the past 56 days
               1 layer   2 layers   3 layers   4 layers
  2017 – 2020   0.28k     0.18k      0.14k      0.10k
  2021          0.43k     0.39k      0.38k      0.44k
[Plot, 2021/01/01 – 2021/02/14, 3-layer model: red = real no. of
views, blue = estimated no. of views.]
Back to ML Framework
Step 1: function with unknown parameters → Step 2: define loss from
training data → Step 3: optimization.
  y = b + 𝒄ᵀ σ(𝒃 + W 𝒙)
It is not fancy enough. Let's give it a fancy name!
Each sigmoid or ReLU unit is a Neuron; many connected neurons form a
Neural Network. This mimics human brains … (???)
The layers of neurons between input and output are hidden layers, and
many layers means Deep: hence Deep Learning.
Deep = Many hidden layers
  AlexNet (2012): 8 layers, 16.4% error
  VGG (2014): 19 layers, 7.3% error
  GoogleNet (2014): 22 layers, 6.7% error
(Source: http://cs231n.stanford.edu/slides/winter1516_lecture8.pdf)
Deep = Many hidden layers
  Residual Net (2015): 152 layers (special structure), 3.57% error,
  deeper than the 101 floors of Taipei 101.
  (Recall: AlexNet (2012) 16.4%, VGG (2014) 7.3%, GoogleNet (2014) 6.7%.)
Why do we want a "deep" network, not a "fat" one?
Why don’t we go deeper?
• Loss for multiple hidden layers
• 100 ReLU for each layer
• input features are the no. of views in the past 56
days
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k
Why don’t we go deeper?
• Loss for multiple hidden layers
• 100 ReLU for each layer
• input features are the no. of views in the past 56
days
1 layer 2 layer 3 layer 4 layer
2017 – 2020 0.28k 0.18k 0.14k 0.10k
2021 0.43k 0.39k 0.38k 0.44k
Better on training data, worse on unseen data
Overfitting
Let's predict no. of views today!
• If we want to select a model for predicting the no. of views today,
  which one would you use?
               1 layer   2 layers   3 layers   4 layers
  2017 – 2020   0.28k     0.18k      0.14k      0.10k
  2021          0.43k     0.39k      0.38k      0.44k
We will talk about model selection next time. ☺
To learn more ……
• Basic introduction: https://youtu.be/Dr-WRlEFefw
• Backpropagation (computing gradients in an efficient way):
  https://youtu.be/ibJpTrp5mcE