Statistical Machine Learning (BE4M33SSU)
Lecture 1.
Czech Technical University in Prague
Course format
Teachers: Jan Drchal, Boris Flach, Vojtech Franc and Daniel Bonilla
Format: 1 lecture & 1 tutorial per week (6 credits), tutorials of two types
seminars: discussing solutions of theoretical assignments (published a week before the
class). You are expected to work on them in advance.
practical labs: explaining and discussing practical homeworks, i.e. implementation of
selected methods in Python (or Matlab). You have to submit
1. a report in PDF format (typeset preferably in LaTeX). Exception: if necessary, you
may include lengthy formula derivations as handwritten scans.
2. your code, either as a source file or as a Python notebook. The code must be
executable.
Grading: 40% homeworks + 60% written exam = 100% (+ bonus points)
Prerequisites:
probability theory and statistics (A0B01PSI)
pattern recognition and machine learning (AE4B33RPZ)
optimisation (AE4B33OPT)
More details: https://cw.fel.cvut.cz/wiki/courses/be4m33ssu/start
Goals
The aim of statistical machine learning is to develop systems (models and algorithms) for
solving prediction tasks given a set of examples and some prior knowledge about the task.
Machine learning has been successfully applied in areas such as:
text and document classification,
speech recognition and natural language processing,
computational biology (genes, proteins) and biological imaging & medical diagnosis
computer vision,
fraud detection, network intrusion,
and many others
You will gain skills to construct learning systems for typical applications by successfully
combining appropriate models and learning methods.
Characters of the play
object features x ∈ X are observable; x can be:
a categorical variable, a scalar, a real-valued vector, a tensor, a sequence of values, an
image, a labelled graph, ...
state of the object y ∈ Y is usually hidden; y can be: see above
prediction strategy (a.k.a. inference rule) h : X → Y; depending on the type of Y:
• y is a categorical variable ⇒ classification
• y is a real-valued variable ⇒ regression
training examples T = {(x, y) | x ∈ X, y ∈ Y}
loss function ℓ : Y × Y → R+ penalises wrong predictions,
i.e. ℓ(y, h(x)) is the loss for predicting y′ = h(x) when y is the true state
Goal: optimal prediction strategy h : X → Y that minimises the loss
Q: give meaningful application examples for combinations of different X, Y and
related loss functions
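To make these characters concrete, here is a minimal Python sketch; the toy thresholding strategy and the feature vector are invented purely for illustration:

```python
import numpy as np

# two common loss functions l : Y x Y -> R+
def zero_one_loss(y_true, y_pred):
    """0/1 loss for classification: 1 if the prediction is wrong, 0 otherwise."""
    return float(y_true != y_pred)

def squared_loss(y_true, y_pred):
    """Squared loss for regression."""
    return (y_true - y_pred) ** 2

# a toy prediction strategy h : X -> Y (threshold on the first feature)
def h(x):
    return 1 if x[0] > 0.5 else 0

x = np.array([0.7, 0.2])        # observable features x in X
y = 0                           # hidden true state y in Y
print(zero_one_loss(y, h(x)))   # l(y, h(x)) = 1.0, the prediction is wrong
```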
Statistical machine learning
Main assumption:
X, Y are random variables,
X, Y are related by an unknown joint p.d.f. p(x, y),
we can collect examples (x, y) drawn from p(x, y).
Typical concepts:
regression: Y = f(X) + ε, where f is unknown and ε is a random error,
classification: p(x, y) = p(y)p(x | y), where p(y) is the prior class probability and
p(x | y) the conditional feature distribution.
Consequences and problems
the inference rule h(X) and the loss ℓ(Y, h(X)) become random variables.
risk of an inference rule h(X) ⇒ expected loss
R(h) = E[ℓ(Y, h(X))] = Σ_{x∈X} Σ_{y∈Y} p(x, y) ℓ(y, h(x))
how to estimate R(h) if p(x, y) is unknown?
how to choose an optimal predictor h(x) if p(x, y) is unknown?
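When X and Y are finite and p(x, y) is fully known, the risk is just this double sum; a small illustrative Python sketch (the toy distribution and strategy below are made up for this example):

```python
import numpy as np

# toy joint distribution p(x, y) over X = {0, 1, 2}, Y = {0, 1}
p_xy = np.array([[0.30, 0.05],
                 [0.10, 0.20],
                 [0.05, 0.30]])   # rows: x, columns: y; entries sum to 1

def zero_one_loss(y, y_pred):
    return float(y != y_pred)

def h(x):
    """A toy inference rule h : X -> Y."""
    return 0 if x == 0 else 1

# risk R(h) = sum_x sum_y p(x, y) * loss(y, h(x))
risk = sum(p_xy[x, y] * zero_one_loss(y, h(x))
           for x in range(p_xy.shape[0])
           for y in range(p_xy.shape[1]))
print(risk)  # expected 0/1 loss of h under p(x, y)
```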
Statistical machine learning
Estimating R(h):
collect an i.i.d. test sample S^m = {(x^i, y^i) ∈ X × Y | i = 1, ..., m} drawn from the
distribution p(x, y),
estimate the risk R(h) of the strategy h by the empirical risk
R(h) ≈ R_{S^m}(h) = (1/m) Σ_{i=1}^m ℓ(y^i, h(x^i))
Q: how strongly can they deviate from each other? (see next lectures)
P( |R_{S^m}(h) − R(h)| > ε ) ≤ ??
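A minimal sketch of this estimate, assuming a toy distribution we can sample from (the data model, the noise level and the fixed strategy are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# toy model: X uniform on [0, 1], Y = 1{X > 0.5} with 10% label noise
def sample(m):
    x = rng.uniform(0.0, 1.0, size=m)
    y = ((x > 0.5) ^ (rng.uniform(size=m) < 0.1)).astype(int)
    return x, y

def h(x):
    """Fixed strategy to be evaluated (not learned here)."""
    return (x > 0.5).astype(int)

# empirical risk on an i.i.d. test sample S^m (0/1 loss)
x_test, y_test = sample(m=1000)
emp_risk = np.mean(y_test != h(x_test))
print(emp_risk)  # fluctuates around the true risk R(h) = 0.1
```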
Statistical machine learning
Choosing an optimal inference rule h(x)
If p(x, y) is known:
The smallest possible risk is
R* = inf_{h ∈ Y^X} R(h) = inf_{h ∈ Y^X} Σ_{x∈X} Σ_{y∈Y} p(x, y) ℓ(y, h(x)) = Σ_{x∈X} p(x) inf_{y′∈Y} Σ_{y∈Y} p(y | x) ℓ(y, y′)
The corresponding best possible inference rule is the Bayes inference rule
h*(x) = argmin_{y′∈Y} Σ_{y∈Y} p(y | x) ℓ(y, y′)
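For finite Y the Bayes rule is a simple argmin over candidate predictions; a sketch assuming an illustrative posterior p(y | x) and loss matrix (both invented for this example):

```python
import numpy as np

# illustrative posterior p(y | x) for one x over Y = {0, 1, 2},
# and an asymmetric loss matrix L[y, y'] = loss(true y, predicted y')
p_y_given_x = np.array([0.5, 0.3, 0.2])
L = np.array([[0, 1, 4],
              [1, 0, 1],
              [4, 1, 0]])

# Bayes rule: predict the y' minimising the expected (posterior) loss
expected_loss = p_y_given_x @ L        # one value per candidate y'
y_star = int(np.argmin(expected_loss))
print(expected_loss, y_star)
```

Note that with an asymmetric loss the Bayes decision (here y′ = 1) need not coincide with the most probable state argmax_y p(y | x) (here y = 0).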
But p(x, y) is not known and we can only collect examples drawn from it. We need:
Learning algorithms that use training data and prior assumptions/knowledge about the task
Learning types
Training data:
if T^m = {(x^i, y^i) ∈ X × Y | i = 1, ..., m} ⇒ supervised learning
if T^m = {x^i ∈ X | i = 1, ..., m} ⇒ unsupervised learning
if T^m = T_l^{m1} ∪ T_u^{m2}, with labelled training data T_l^{m1} and unlabelled training data T_u^{m2}
⇒ semi-supervised learning
Prior knowledge about the task:
Discriminative learning: assume that the optimal inference rule h* is in some class of
rules H ⇒ replace the true risk by the empirical risk
R_T(h) = (1/|T|) Σ_{(x,y)∈T} ℓ(y, h(x))
and minimise it w.r.t. h ∈ H, i.e. h*_T = argmin_{h∈H} R_T(h).
Q: How strongly can R(h*_T) deviate from R(h*)? How does this deviation depend on H?
P( |R(h*_T) − R(h*)| > ε ) ≤ ??
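A minimal sketch of empirical risk minimisation, assuming a toy hypothesis class of 1-D threshold classifiers (the data model and the brute-force search over H are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# toy labelled training set T (1-D features, binary labels with 10% noise)
x = rng.uniform(0.0, 1.0, size=200)
y = ((x > 0.6) ^ (rng.uniform(size=200) < 0.1)).astype(int)

# hypothesis class H: threshold classifiers h_t(x) = 1{x > t}
thresholds = np.linspace(0.0, 1.0, 101)

def empirical_risk(t):
    """R_T(h_t) with 0/1 loss."""
    return np.mean(y != (x > t).astype(int))

# empirical risk minimisation: h*_T = argmin over H of R_T(h)
risks = np.array([empirical_risk(t) for t in thresholds])
t_star = thresholds[np.argmin(risks)]
print(t_star, risks.min())
```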
Learning types
Generative learning: assume that the true p.d. p(x, y) is in some parametrised family
of distributions, i.e. p = p_{θ*} ∈ P_Θ ⇒ use the training set T to estimate θ ∈ Θ:
1. θ*_T = argmax_{θ∈Θ} log p_θ(T), i.e. the maximum likelihood estimator,
2. set h*_T = h_{θ*_T}, where h_θ denotes the Bayes inference rule for the p.d. p_θ.
Q: How strongly can θ*_T deviate from θ*? How does this deviation depend on P_Θ?
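A minimal sketch of this two-step recipe, assuming 1-D class-conditional Gaussians as the parametrised family P_Θ (the toy data and the model choice are assumptions made for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# toy training data from two classes with Gaussian features
m = 300
y = (rng.uniform(size=m) < 0.4).astype(int)          # true prior p(y=1) = 0.4
x = rng.normal(loc=np.where(y == 1, 2.0, 0.0), scale=1.0)

# step 1: maximum likelihood estimates of theta = (priors, means, variances)
prior = np.array([np.mean(y == 0), np.mean(y == 1)])
mu = np.array([x[y == 0].mean(), x[y == 1].mean()])
var = np.array([x[y == 0].var(), x[y == 1].var()])

# step 2: Bayes inference rule for the fitted model (0/1 loss => argmax posterior)
def h_theta(x_new):
    log_post = (np.log(prior) - 0.5 * np.log(var)
                - 0.5 * (x_new - mu) ** 2 / var)      # + const independent of y
    return int(np.argmax(log_post))

print(h_theta(0.3), h_theta(1.8))
```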
Possible combinations (training data vs. learning type):

             discr.   gener.
superv.      yes      yes
semi-sup.    (yes)    yes
unsuperv.    no       yes
In this course:
discriminative: Support Vector Machines, Deep Neural Networks
generative: mixture models, Hidden Markov Models
other: Bayesian learning, Ensembling
Example: Classification of handwritten digits
x ∈ X - grey-valued images of size 28×28, y ∈ Y - categorical variable with 10 values
discriminative: Specify a class of strategies H and a loss function ℓ(y, y′). How would
you estimate the optimal inference rule h* ∈ H?
generative: Specify a parametrised family p_θ(x, y), θ ∈ Θ, and a loss function ℓ(y, y′).
How would you estimate the optimal θ* by using the MLE? What is the Bayes inference
rule for p_{θ*}?
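One possible answer to the generative question, sketched on random placeholder data: a Bernoulli naive Bayes model over binarised pixels, with MLE for θ and the 0/1-loss Bayes rule. The model choice, the Laplace smoothing and the stand-in data are assumptions for illustration, not the course's prescribed solution:

```python
import numpy as np

rng = np.random.default_rng(3)

# placeholder data standing in for binarised 28x28 digit images;
# real data would come from MNIST or a similar dataset
m, d, K = 1000, 28 * 28, 10
X = (rng.uniform(size=(m, d)) < 0.2).astype(int)   # fake binary images
y = rng.integers(0, K, size=m)                     # fake labels 0..9

# generative model: p_theta(x, y) = p(y) * prod_j p(x_j | y)  (Bernoulli naive Bayes)
# MLE of theta: class priors and per-class pixel probabilities (Laplace-smoothed)
prior = np.bincount(y, minlength=K) / m
pix = np.vstack([(X[y == k].sum(axis=0) + 1) / (np.sum(y == k) + 2) for k in range(K)])

def h_theta(x_img):
    """Bayes rule for 0/1 loss: argmax_y p_theta(y | x)."""
    log_post = (np.log(prior)
                + x_img @ np.log(pix).T
                + (1 - x_img) @ np.log(1 - pix).T)
    return int(np.argmax(log_post))

print(h_theta(X[0]))
```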