1 Machine Learning
Chapter 1: Introduction
孫民 (Min Sun)
清華大學 (National Tsing Hua University)
Credit: 林嘉文 (Chia-Wen Lin)
2/22/25
3 Polynomial Curve Fitting
Data Set Size: N = 10
4 Sum-of-Squares Error Function
$E(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2$
5 0th Order Polynomial
6 1st Order Polynomial
7 3rd Order Polynomial
8 9th Order Polynomial
9 Over-fitting
Root-Mean-Square (RMS) Error: $E_{\mathrm{RMS}} = \sqrt{2 E(\mathbf{w}^\star)/N}$
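A minimal Python sketch of this experiment (not from the slides): it assumes the Bishop-style setup of N = 10 noisy samples of sin(2πx), fits polynomials of order M with np.polyfit, and reports E_RMS on training and test data.

```python
# Hypothetical reconstruction of the curve-fitting experiment (assumed setup:
# N = 10 samples of sin(2*pi*x) plus Gaussian noise, as in Bishop Ch. 1).
import numpy as np

rng = np.random.default_rng(0)
N = 10
x_train = np.linspace(0.0, 1.0, N)
t_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.3, size=N)
x_test = np.linspace(0.0, 1.0, 100)
t_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.3, size=100)

def e_rms(w, x, t):
    """E_RMS = sqrt(2 E(w*) / N), with E(w) the sum-of-squares error."""
    E = 0.5 * np.sum((np.polyval(w, x) - t) ** 2)
    return np.sqrt(2 * E / len(x))

for M in (0, 1, 3, 9):
    w = np.polyfit(x_train, t_train, deg=M)   # least-squares fit of order M
    print(f"M={M}: train E_RMS={e_rms(w, x_train, t_train):.3f}, "
          f"test E_RMS={e_rms(w, x_test, t_test):.3f}")
```

As on the slides, the training error keeps dropping with M while the test error blows up for M = 9, which is the over-fitting effect.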
10 Polynomial Coefficients
11 Data Set Size: 𝑁 = 15
9th Order Polynomial
12 Data Set Size: 𝑁 = 100
9th Order Polynomial
13 Regularization
• Penalize large coefficient values by minimizing the regularized error
$\widetilde{E}(\mathbf{w}) = \frac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2 + \frac{\lambda}{2} \lVert \mathbf{w} \rVert^2$   (Eq. 1.4; a fitting sketch follows below)
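A sketch of the regularized fit, assuming the polynomial basis φ_j(x) = x^j and the penalty (λ/2)‖w‖², so the minimizer of Eq. (1.4) solves (λI + ΦᵀΦ)w = Φᵀt; the data and noise level are illustrative.

```python
# Regularized least-squares polynomial fit (a sketch under assumed basis/penalty).
import numpy as np

def fit_ridge(x, t, M, lam):
    """Minimize 0.5*||Phi w - t||^2 + 0.5*lam*||w||^2 in closed form."""
    Phi = np.vander(x, M + 1, increasing=True)        # columns: x^0 ... x^M
    A = lam * np.eye(M + 1) + Phi.T @ Phi
    return np.linalg.solve(A, Phi.T @ t)

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=10)

for ln_lam in (None, -18.0, 0.0):                     # None = no regularization
    lam = 0.0 if ln_lam is None else np.exp(ln_lam)
    w = fit_ridge(x, t, M=9, lam=lam)
    print(f"ln(lambda)={ln_lam}: max |w_j| = {np.abs(w).max():.2f}")
```

The printed coefficient magnitudes shrink dramatically as λ grows, which is exactly the effect shown in the coefficient table on the next slide.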
14 Regularization: ln 𝜆 = −18
15 Regularization: ln 𝜆 = 0
16 Regularization: E_RMS vs. ln λ
17 Polynomial Coefficients
18 Probability Theory
Apples and Oranges
(B)ox is (b)lue or (r)ed
(F)ruit is (a)pple or (o)range
19 Probability Theory – two random variables
• Marginal Probability
• Conditional Probability
• Joint Probability
20 Probability Theory
• Sum Rule
• Product Rule
21 The Rules of Probability
• Sum Rule: $p(X) = \sum_Y p(X, Y)$
• Product Rule: $p(X, Y) = p(Y \mid X)\, p(X)$
22 Bayes’ Theorem
Product Rule: $P(X, Y) = P(Y \mid X)\, P(X)$
Since $P(X, Y) = P(Y, X)$ and $P(Y, X) = P(X \mid Y)\, P(Y)$,
we have $P(Y \mid X)\, P(X) = P(X \mid Y)\, P(Y)$, i.e.
$P(Y \mid X) = \dfrac{P(X \mid Y)\, P(Y)}{P(X)}$
Posterior = Likelihood × Prior / Evidence
Posterior ∝ Likelihood × Prior
23 Probability Theory
Apples and Oranges
$p(B=r) = 4/10 = 2/5$,  $p(B=b) = 6/10 = 3/5$
$p(F=a \mid B=r) = 2/8 = 1/4$,  $p(F=a \mid B=b) = 3/4$

Overall probability of picking an apple, $p(F=a)$?
Use the Sum Rule: $p(F=a) = p(F=a, B=r) + p(F=a, B=b)$
Use the Product Rule:
$p(F=a, B=r) = p(F=a \mid B=r)\, p(B=r)$
$p(F=a, B=b) = p(F=a \mid B=b)\, p(B=b)$
Hence,
$p(F=a) = p(F=a \mid B=r)\, p(B=r) + p(F=a \mid B=b)\, p(B=b) = (1/4)(2/5) + (3/4)(3/5) = 2/20 + 9/20 = 11/20$
$p(F=o) = 1 - p(F=a) = 9/20$
24 Probability Theory
Apples and Oranges
$p(B=r) = 4/10 = 2/5$,  $p(B=b) = 6/10 = 3/5$
$p(F=a \mid B=r) = 2/8 = 1/4$,  $p(F=a \mid B=b) = 3/4$

If the fruit is an orange, which box did it come from, $p(B \mid F=o)$?
Use Bayes' Rule:
$p(B \mid F=o) = \dfrac{p(F=o \mid B)\, p(B)}{p(F=o)}$
$p(B=r \mid F=o) = \dfrac{(3/4)(2/5)}{9/20} = 6/9 = 2/3$
$p(B=b \mid F=o) = 1 - p(B=r \mid F=o) = 1/3$
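A numeric check of the sum rule, product rule, and Bayes' theorem for this example; the box contents (red: 2 apples + 6 oranges, blue: 3 apples + 1 orange) are assumed so as to reproduce the conditional probabilities above.

```python
# Verify the apples-and-oranges computation with exact fractions.
from fractions import Fraction as F

p_box = {'r': F(4, 10), 'b': F(6, 10)}                 # prior p(B)
p_fruit_given_box = {                                  # p(F | B)
    'r': {'a': F(2, 8), 'o': F(6, 8)},
    'b': {'a': F(3, 4), 'o': F(1, 4)},
}

# Product rule: p(F, B) = p(F|B) p(B); Sum rule: p(F) = sum_B p(F, B)
p_fruit = {f: sum(p_fruit_given_box[b][f] * p_box[b] for b in p_box)
           for f in ('a', 'o')}
print(p_fruit['a'], p_fruit['o'])                      # 11/20, 9/20

# Bayes' theorem: p(B=r | F=o) = p(F=o | B=r) p(B=r) / p(F=o)
p_r_given_o = p_fruit_given_box['r']['o'] * p_box['r'] / p_fruit['o']
print(p_r_given_o)                                     # 2/3
```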
25 Probability Densities (continuous variable)
Probability density: $p(x \in (a, b)) = \int_a^b p(x)\,dx$
Cumulative distribution function: $P(z) = \int_{-\infty}^{z} p(x)\,dx$
26 Transformed Densities
If $x = g(y)$, the density transforms as $p_y(y) = p_x(g(y))\,\lvert g'(y) \rvert$, where $g'(y) = \mathrm{d}g(y)/\mathrm{d}y$.
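A Monte-Carlo check of the change-of-variables formula, using the assumed example x ~ N(0, 1) with x = g(y) = y³ (so y = x^{1/3} and |g′(y)| = 3y²):

```python
# Compare a histogram of transformed samples with p_y(y) = p_x(g(y)) * |g'(y)|.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=200_000)          # x ~ N(0, 1)
y = np.cbrt(x)                        # y such that x = g(y) = y**3

def p_x(v):
    return np.exp(-0.5 * v**2) / np.sqrt(2 * np.pi)

hist, edges = np.histogram(y, bins=40, range=(-1.6, 1.6), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
formula = p_x(centers**3) * 3 * centers**2      # transformed density
print(np.max(np.abs(hist - formula)))           # small -> formula matches samples
```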
27 Expectations
Expectation: $\mathbb{E}[f] = \sum_x p(x)\, f(x)$ (discrete) or $\int p(x)\, f(x)\,dx$ (continuous)
Approximate Expectation (discrete and continuous): $\mathbb{E}[f] \simeq \frac{1}{N} \sum_{n=1}^{N} f(x_n)$
Conditional Expectation (discrete): $\mathbb{E}_x[f \mid y] = \sum_x p(x \mid y)\, f(x)$
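A quick sampling sketch of the approximate expectation, with the illustrative assumptions f(x) = x² and x ~ N(0, 1), for which E[f] = 1 exactly:

```python
# Approximate expectation by sampling: E[f] ~ (1/N) * sum_n f(x_n).
import numpy as np

rng = np.random.default_rng(0)
for N in (10, 1_000, 100_000):
    x = rng.normal(size=N)
    print(f"N={N}: E[f] approx {(x**2).mean():.4f} (exact 1.0)")
```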
28 Variances and Covariances
When $x$ and $y$ are independent, $\mathbb{E}[xy] = \mathbb{E}[x]\,\mathbb{E}[y]$; hence $\operatorname{cov}[x, y] = \mathbb{E}[xy] - \mathbb{E}[x]\,\mathbb{E}[y] = 0$.
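A small sketch showing that independently drawn x and y have sample covariance close to zero; the choice x ~ N(0, 1), y ~ Uniform(0, 1) is an assumption for illustration.

```python
# For independent x and y, E[xy] = E[x]E[y], so cov[x, y] should be ~ 0.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.uniform(size=100_000)
cov_xy = np.mean(x * y) - x.mean() * y.mean()
print(f"sample cov[x, y] = {cov_xy:.4f}  (close to 0 up to sampling noise)")
```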
29 The Gaussian Distribution
$\mathcal{N}(x \mid \mu, \sigma^2) = \dfrac{1}{(2\pi\sigma^2)^{1/2}} \exp\left\{ -\dfrac{(x - \mu)^2}{2\sigma^2} \right\}$
30 Gaussian Mean and Variance
$\mathbb{E}[x] = \mu$,  $\operatorname{var}[x] = \sigma^2$
31 The Multivariate Gaussian
$\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \Sigma) = \dfrac{1}{(2\pi)^{D/2} |\Sigma|^{1/2}} \exp\left\{ -\dfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathrm{T}} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}$
$\Sigma$ is the covariance matrix; its diagonal entries are the variances $\sigma_i^2$.
32 Gaussian Parameter Estimation
Likelihood function
$\mathbf{x} = (x_1, x_2, \ldots, x_N)$; if the observations are i.i.d.,
$p(\mathbf{x} \mid \mu, \sigma^2) = \prod_{n=1}^{N} \mathcal{N}(x_n \mid \mu, \sigma^2)$
33 Two Principles for Estimating Parameters
• Maximum likelihood estimation (ML)
  Choose $\boldsymbol{\theta}$ that maximizes the probability (likelihood) of the observed data:
  $\hat{\boldsymbol{\theta}}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\theta}} P(D \mid \boldsymbol{\theta})$
• Maximum a posteriori estimation (MAP)
  Choose $\boldsymbol{\theta}$ that is most probable given the prior probability and the data:
  $\hat{\boldsymbol{\theta}}_{\mathrm{MAP}} = \arg\max_{\boldsymbol{\theta}} P(\boldsymbol{\theta} \mid D) = \arg\max_{\boldsymbol{\theta}} \dfrac{P(D \mid \boldsymbol{\theta})\, P(\boldsymbol{\theta})}{P(D)}$
34 Maximum (Log) Likelihood
$\mathbf{x} = (x_1, x_2, \ldots, x_N)$, with $\mathbf{x}$ i.i.d.
$\boldsymbol{\theta}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\theta}} p(\mathbf{x} \mid \boldsymbol{\theta})$?
Equivalently, maximize the log-likelihood:
$\boldsymbol{\theta}_{\mathrm{ML}} = \arg\max_{\boldsymbol{\theta}} \ln p(\mathbf{x} \mid \boldsymbol{\theta})$
For the Gaussian this gives $\mu_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} x_n$ (sample mean) and $\sigma^2_{\mathrm{ML}} = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_{\mathrm{ML}})^2$ (sample variance).
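A sketch of the Gaussian ML estimates on simulated data (the generating parameters μ = 2.0, σ = 0.5 are assumed for illustration): μ_ML is the sample mean and σ²_ML the 1/N sample variance.

```python
# Gaussian maximum-likelihood estimation and the resulting log-likelihood.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=1_000)   # assumed true mu=2.0, sigma=0.5

mu_ml = x.mean()
sigma2_ml = np.mean((x - mu_ml) ** 2)            # note: divides by N, not N-1
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma2_ml)
                 - (x - mu_ml) ** 2 / (2 * sigma2_ml))
print(f"mu_ML={mu_ml:.3f}, sigma2_ML={sigma2_ml:.3f}, log-likelihood={log_lik:.1f}")
```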
35 Properties of $\mu_{\mathrm{ML}}$ and $\sigma^2_{\mathrm{ML}}$
$\mathbb{E}[\mu_{\mathrm{ML}}] = \mu$ (unbiased)
$\mathbb{E}[\sigma^2_{\mathrm{ML}}] = \dfrac{N-1}{N}\,\sigma^2$ (biased)
$\widetilde{\sigma}^2 = \dfrac{N}{N-1}\,\sigma^2_{\mathrm{ML}}$, with $\mathbb{E}[\widetilde{\sigma}^2] = \sigma^2$ (unbiased)
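A Monte-Carlo sketch of the bias: for samples of size N from a unit-variance Gaussian, the average of σ²_ML comes out near (N−1)/N, and the N/(N−1) correction restores σ² = 1.

```python
# Empirical check that sigma2_ML is biased by a factor (N-1)/N.
import numpy as np

rng = np.random.default_rng(0)
N, trials = 5, 200_000
samples = rng.normal(size=(trials, N))           # true sigma^2 = 1
mu_ml = samples.mean(axis=1, keepdims=True)
sigma2_ml = np.mean((samples - mu_ml) ** 2, axis=1)

print("E[sigma2_ML] approx", sigma2_ml.mean())                 # about (N-1)/N = 0.8
print("corrected estimate ", (N / (N - 1)) * sigma2_ml.mean()) # about 1.0
```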
36 Curve Fitting Re-visited
$\beta$: inverse variance (precision)
37 Maximum Likelihood
Determine $\mathbf{w}_{\mathrm{ML}}$ by minimizing the sum-of-squares error $E(\mathbf{w})$:
$\mathbf{w}_{\mathrm{ML}} = \arg\min_{\mathbf{w}} \dfrac{1}{2} \sum_{n=1}^{N} \{ y(x_n, \mathbf{w}) - t_n \}^2$
38 ML Curve Fitting
Green: actual model; Red: predicted model
39 MAP: A Step towards Bayes
Posterior ∝ Likelihood × Prior
Determine $\mathbf{w}_{\mathrm{MAP}}$ by minimizing the regularized sum-of-squares error $\widetilde{E}(\mathbf{w})$ of Eq. (1.4).
40 Bayesian Curve Fitting
Predictive Distribution:
$p(t \mid x, \mathbf{x}, \mathbf{t}) = \mathcal{N}\big(t \mid m(x), s^2(x)\big)$
For a point estimate, the fitted $\mathbf{w}$ and $\beta$ supply the mean and the precision:
$p(t \mid x, \mathbf{w}, \beta) = \mathcal{N}\big(t \mid y(x, \mathbf{w}), \beta^{-1}\big)$
(Refer to Sec. 3.3 for the detailed derivation.)
41 ML Curve Fitting vs. Bayesian Predictive Distribution
(Panels: ML Curve Fitting, Bayesian Curve Fitting)
42 Cross Validation for Model Selection
• 5-fold cross-validation: split the training data into 5 equal folds.
• Use 4 folds for training and 1 for validation, rotating the held-out fold (a sketch follows below).
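A minimal 5-fold cross-validation loop for choosing the polynomial order M; the dataset and candidate orders below are illustrative assumptions, not the slides' exact data.

```python
# 5-fold cross-validation for model selection (polynomial order M).
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(size=50)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=50)

K = 5
idx = rng.permutation(len(x))
folds = np.array_split(idx, K)                    # 5 equal(ish) folds

for M in (1, 3, 9):
    errs = []
    for k in range(K):
        val = folds[k]                                                    # 1 fold for validation
        train = np.concatenate([folds[j] for j in range(K) if j != k])   # 4 folds for training
        w = np.polyfit(x[train], t[train], deg=M)
        resid = np.polyval(w, x[val]) - t[val]
        errs.append(np.sqrt(np.mean(resid ** 2)))
    print(f"M={M}: mean validation RMS error = {np.mean(errs):.3f}")
```

The order with the smallest average validation error is the one cross-validation would select.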
43 Cross Validation
44 Curse of Dimensionality
45 Curse of Dimensionality
Polynomial curve fitting with order M = 3: the number of coefficients grows as $D^3$ (in general, $D^M$) for $D$ input variables.
Volume of a sphere of radius $r$ in $D$ dimensions: $V_D(r) = K_D r^D$
Fraction of that volume lying in a thin shell between radius $1-\epsilon$ and $1$:
$\dfrac{V_D(1) - V_D(1-\epsilon)}{V_D(1)} = 1 - (1-\epsilon)^D$
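A short sketch evaluating 1 − (1−ε)^D: even for a thin shell (ε = 0.01), almost all of the sphere's volume lies in the shell once D is large.

```python
# Fraction of a unit sphere's volume in a thin shell near the surface.
epsilon = 0.01
for D in (1, 2, 10, 100, 1000):
    frac = 1 - (1 - epsilon) ** D
    print(f"D={D:5d}: fraction in shell = {frac:.4f}")
```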
46 Decision Theory
Given training data $(\mathbf{x}, t)$, predict $t$ for a new $\mathbf{x}$.
• Inference step
  Determine either $p(\mathbf{x}, t)$ or $p(t \mid \mathbf{x})$.
• Decision step
  For a given $\mathbf{x}$, determine the optimal $t$, or the optimal decision/action based on $t$.
47 Minimum Misclassification Rate
Assume $t$ takes one of two classes, $C_1$ or $C_2$.
$p(\text{mistake}) = p(\mathbf{x} \in \mathcal{R}_1, C_2) + p(\mathbf{x} \in \mathcal{R}_2, C_1) = \int_{\mathcal{R}_1} p(\mathbf{x}, C_2)\,d\mathbf{x} + \int_{\mathcal{R}_2} p(\mathbf{x}, C_1)\,d\mathbf{x}$
Moving the decision boundary from $\hat{x}$ to $x_0$ keeps the blue and green error regions fixed but removes the red region, minimizing the misclassification rate.
48 Minimum Expected Loss
• Example: classify medical images as 'cancer' or 'normal'
Loss matrix $L_{kj}$ (row: true class, column: decision):
                  decide cancer   decide normal
  truth: cancer        0              1000
  truth: normal        1                 0
When a cancer patient is classified as normal, the loss is 1000.
49 Minimum Expected Loss
If the true class is $C_k$ but we assign class $j$ (i.e. $\mathbf{x} \in \mathcal{R}_j$), the expected loss is
$\mathbb{E}[L] = \sum_k \sum_j \int_{\mathcal{R}_j} L_{kj}\, p(\mathbf{x}, C_k)\,d\mathbf{x}$
The regions $\mathcal{R}_j$ are chosen to minimize $\sum_k L_{kj}\, p(C_k \mid \mathbf{x})$ for each $\mathbf{x}$; the common factor $p(\mathbf{x})$ is eliminated. A small numeric sketch follows below.
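A numeric sketch of the minimum-expected-loss rule for the cancer/normal example; the posterior values for the test point are assumed for illustration, while the loss matrix follows the table above.

```python
# Choose the decision j minimizing sum_k L[k, j] * p(C_k | x).
import numpy as np

# rows: true class (cancer, normal); columns: decision (cancer, normal)
L = np.array([[0.0, 1000.0],
              [1.0,    0.0]])

posterior = np.array([0.01, 0.99])       # assumed p(cancer|x), p(normal|x)
expected_loss = posterior @ L            # expected loss of each decision
decision = ["cancer", "normal"][int(np.argmin(expected_loss))]
print(expected_loss, "->", decision)     # [0.99, 10.0] -> decide 'cancer'
```

Even though the posterior probability of cancer is only 1%, the asymmetric loss makes "cancer" the minimum-expected-loss decision.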
50 Reject Option – avoid making a decision
Reject inputs $\mathbf{x}$ whose largest posterior probability $\max_k p(C_k \mid \mathbf{x})$ falls below a threshold $\theta$.
51 Generative vs Discriminative
• Generative approach:
  Model the class-conditional densities $p(\mathbf{x} \mid C_k)$ and priors $p(C_k)$ (or the joint $p(\mathbf{x}, C_k)$),
  then use Bayes' theorem to obtain the posteriors $p(C_k \mid \mathbf{x})$.
• Discriminative approach:
  Model the posteriors $p(C_k \mid \mathbf{x})$ directly.
52 Why Separate Inference and Decision?
• Minimizing risk (loss matrix may change over time)
• Reject option
• Unbalanced class priors
• Combining models
53 Decision Theory for Regression
• Inference step
  Determine $p(\mathbf{x}, t)$.
• Decision step
  For a given $\mathbf{x}$, make an optimal prediction $y(\mathbf{x})$ for $t$.
• Loss function: $\mathbb{E}[L] = \iint L\big(t, y(\mathbf{x})\big)\, p(\mathbf{x}, t)\,d\mathbf{x}\,dt$
54 The Squared Loss Function
With squared loss $L\big(t, y(\mathbf{x})\big) = \{y(\mathbf{x}) - t\}^2$, $\mathbb{E}[L]$ is minimized when $y(\mathbf{x}) = \mathbb{E}_t[t \mid \mathbf{x}]$ (the conditional mean).
55 Information Theory
56 Entropy
$h(x)$ is a monotonic function of $p(x)$ and expresses the information content ($\geq 0$):
$h(x) = -\log_2 p(x)$
If $x, y$ are independent, $p(x, y) = p(x)\,p(y)$, so
$h(x, y) = -\log_2 p(x) - \log_2 p(y) = h(x) + h(y)$
The entropy $H[x]$ is the expectation of $h(x)$:
$H[x] = -\sum_x p(x) \log_2 p(x)$
57 Entropy
Important quantity in
• coding theory
• statistical physics
• machine learning
58 Entropy
59 Entropy - coding theory
• Coding theory: $x$ is discrete with 8 possible states; how many bits are needed to transmit the state of $x$?
• If all states are equally likely, $H[x] = -8 \times \frac{1}{8} \log_2 \frac{1}{8} = 3$ bits (a sketch of the computation follows below).
  Code: 000, 001, 010, 011, 100, 101, 110, 111
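A sketch of the entropy computation: the uniform 8-state distribution gives 3 bits, while a non-uniform 8-state distribution (the example used in Bishop's text is assumed here) gives only 2 bits on average.

```python
# Entropy in bits: H[x] = -sum_i p_i * log2(p_i).
import numpy as np

def entropy_bits(p):
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # convention: 0 * log 0 = 0
    return -np.sum(p * np.log2(p))

uniform = np.full(8, 1 / 8)
nonuniform = np.array([1/2, 1/4, 1/8, 1/16, 1/64, 1/64, 1/64, 1/64])
print(entropy_bits(uniform))          # 3.0 bits
print(entropy_bits(nonuniform))       # 2.0 bits
```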
60 Entropy
61 Entropy - statistical physics
In how many ways can $N$ identical objects be allocated among $M$ bins, with $n_i$ objects in the $i$-th bin?
Number of ways to allocate (the multiplicity): $W = \dfrac{N!}{\prod_i n_i!}$
With $p_i$ the probability that an object lands in the $i$-th bin, the entropy $H = -\sum_i p_i \ln p_i$ is maximized when all $p_i = 1/M$.
64 Differential Entropy – continuous x
Put bins of width $\Delta$ along the real line; letting $\Delta \to 0$ gives the differential entropy $H[x] = -\int p(x) \ln p(x)\,dx$.
Differential entropy is maximized (for fixed $\sigma^2$ and $\mu$) when $p(x) = \mathcal{N}(x \mid \mu, \sigma^2)$,
in which case $H[x] = \frac{1}{2}\{1 + \ln(2\pi\sigma^2)\}$ (it depends only on $\sigma$).
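A Monte-Carlo check that the Gaussian's differential entropy −E[ln p(x)] matches ½{1 + ln(2πσ²)}; the value of σ is an arbitrary assumption.

```python
# Differential entropy of a Gaussian: Monte-Carlo vs. closed form.
import numpy as np

rng = np.random.default_rng(0)
sigma = 1.7
x = rng.normal(scale=sigma, size=500_000)
log_p = -0.5 * np.log(2 * np.pi * sigma**2) - x**2 / (2 * sigma**2)

print("Monte-Carlo H[x] ~", -log_p.mean())
print("closed form      =", 0.5 * (1 + np.log(2 * np.pi * sigma**2)))
```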
65 Conditional Entropy
$h(y \mid x) = -\log_2 p(y \mid x)$
Taking the expectation gives the conditional entropy $H[y \mid x]$, and $H[x, y] = H[y \mid x] + H[x]$.
66 The Kullback-Leibler Divergence (Relative Entropy)
The unknown distribution $p(\mathbf{x})$ is modeled by an approximation $q(\mathbf{x})$. The additional information required to specify $\mathbf{x}$ is the relative entropy
$\mathrm{KL}(p \parallel q) = -\int p(\mathbf{x}) \ln \dfrac{q(\mathbf{x})}{p(\mathbf{x})}\,d\mathbf{x} \geq 0$, with equality iff $p = q$.
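A small sketch of the discrete KL divergence, showing it is zero for identical distributions, positive otherwise, and not symmetric; the distributions p and q below are arbitrary examples.

```python
# KL(p || q) = sum_x p(x) * ln(p(x) / q(x)) for discrete distributions.
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, float), np.asarray(q, float)
    return np.sum(p * np.log(p / q))

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
print(kl(p, p))             # 0.0
print(kl(p, q), kl(q, p))   # both > 0 and generally unequal (not symmetric)
```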
67 Mutual Information
Recall $h(x) = -\log_2 p(x)$; if $x, y$ are independent, $p(x, y) = p(x)\,p(y)$ and $h(x, y) = h(x) + h(y)$.
If $x, y$ are not independent, the mutual information measures how far they are from being independent:
$I[x, y] = \mathrm{KL}\big(p(x, y) \parallel p(x)\,p(y)\big) = H[x] - H[x \mid y] = H[y] - H[y \mid x]$
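A sketch computing the mutual information of a 2×2 joint distribution as KL(p(x, y) ‖ p(x)p(y)); the joint table is an arbitrary dependent example.

```python
# Mutual information from a joint table: I[x, y] = KL(p(x, y) || p(x) p(y)).
import numpy as np

p_xy = np.array([[0.30, 0.10],
                 [0.15, 0.45]])               # p(x, y), entries sum to 1
p_x = p_xy.sum(axis=1, keepdims=True)         # marginal p(x)
p_y = p_xy.sum(axis=0, keepdims=True)         # marginal p(y)

I = np.sum(p_xy * np.log(p_xy / (p_x * p_y)))
print("I[x, y] =", I, "(zero only if x and y are independent)")
```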