
Machine Learning Lecture Guide

Lecture 2 of SYSC4906 focuses on key mathematical concepts in machine learning, including vector-matrix multiplication, random variables, and model-based vs instance-based learning. The session includes in-class exercises on probability mass functions, classification, and regression, as well as practical assignments to enhance understanding. Students are expected to review relevant literature and complete mathematical exercises to solidify their grasp of the material.


SYSC4906
Introduction to Machine Learning

Lecture 2

Prof James Green


[email protected]
Systems and Computer Engineering, Carleton University
Handwritten notes (vectors and matrices):
• x: input features (of unit u in layer l)
• Vector dot product: the two vectors must have the same dimension.
• Matrix-vector multiplication: (matrix) × (vector) = vector.
• Vector on the right (column vector): the vector's dimension must equal the number of columns of the matrix. A 2×3 matrix times a 3D vector gives a 2D vector:
y(1) = x(1)w(1,1) + x(2)w(1,2) + x(3)w(1,3)
y(2) = x(1)w(2,1) + x(2)w(2,2) + x(3)w(2,3)
• Vector on the left (row vector): if the vector is on the left side of the multiplication, it has to be transposed; its dimension must equal the number of rows of the matrix. A 2D row vector times a 2×3 matrix gives a 3D vector.
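These dimension rules are easy to check numerically. A minimal NumPy sketch (the 2×3 matrix W and the vectors are made-up values, not from the lecture):

```python
import numpy as np

# A 2x3 matrix: 2 rows, 3 columns
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Vector on the right: its dimension must match the number of columns (3)
x = np.array([1.0, 0.0, 2.0])
y = W @ x            # 3D vector in, 2D vector out
print(y)             # [ 7. 16.]

# Vector on the left (row vector): its dimension must match the number of rows (2)
v = np.array([1.0, 1.0])
z = v @ W            # 2D vector in, 3D vector out
print(z)             # [5. 7. 9.]
```

Passing a vector of the wrong dimension (e.g. a 2D vector on the right) raises a shape-mismatch error, which is exactly the rule in the notes above.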
Handwritten notes (definitions):
• X: domain; Y: codomain
• max ⇒ the maximum value of f(x); argmax ⇒ the value of x that gives the maximum f(x)
• pmf: probability mass function (discrete); Σ pmf = 1
• pdf: probability density function (continuous); ∫ pdf = 1
• Classification: assigning a label to an unlabeled example (discrete output)
• Regression: predicting a real-valued label (continuous output)
• Model-based learning: uses the training data to create a model with parameters (ex: SVM)
• Instance-based learning: uses the whole dataset as the model (ex: kNN)
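The max vs. argmax distinction above can be illustrated with a toy function (the quadratic f below is a made-up example):

```python
import numpy as np

# Toy function f(x) = -(x - 3)^2 + 4: peaks at x = 3 with value 4
xs = np.array([0, 1, 2, 3, 4, 5])
fx = -(xs - 3) ** 2 + 4

print(fx.max())          # max: the largest value of f(x) -> 4
print(xs[fx.argmax()])   # argmax: the x that achieves that value -> 3
```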
Learning Objectives for Lecture 2
• Review and understand the key mathematical concepts underlying the
course material, including:
• Vector-matrix multiplication
• Mathematical notation
• Random variables
• Unbiased estimators
• Differentiate the following pairs of concepts:
• Model parameter vs. hyperparameter
(handwritten: parameters are learned from the data, e.g. the thresholds in a decision tree, or w and b in y = wx − b; hyperparameters are set by the programmer, e.g. tree depth, kernels, k in k-nearest neighbour)
• Classification vs. regression
• Model-based learning vs. instance-based learning (ex: k-nearest neighbor)
• Shallow vs. deep learning
• Understand how to load and run a Jupyter notebook in Google Colab
• Covered in the tutorial
Lecture 2
Pre-lecture Assignment:
• Read Chapter 2 of the 100pMLB
• Review the Wikipedia page for Linear Equation. You should be competent
in all of its contents.
• Look over the "Key Concepts and Tools" section of Google's "Prerequisites
and Prework" for their Machine Learning Crash Course. (just the pre-req's,
not the entire course!)

In-class activities:
• Finish Lecture 1 material
• Complete a series of mathematical review exercises alone and in groups.
Key terms
• Scalar, vector, dimension, matrix, set, intersection, capital pi notation, dot-
product, transpose, function, domain, codomain, local minimum, interval,
open interval, global minimum, derivative, differentiation, chain rule,
gradient, partial derivatives, random variable, discrete, continuous,
discrete RV, probability distribution, probability mass function, continuous
RV, probability density function, expectation, mean, average, expected
value, statistics, standard deviation, variance, examples, sample, dataset,
unbiased estimators, sample statistic, sample mean, Bayes’ Rule (Bayes’
Theorem), prior, maximum a posteriori, classification, label, unlabeled
example, classification learning algorithm, labeled examples, model,
classes, binary classification (binomial), multiclass classification
(multinomial), regression, regression learning algorithm, parameters, kNN,
shallow learning, neural network, layer, deep neural network, deep
learning.
Handwritten annotations:
• A set is a collection of unique values.
• The codomain Y is the range of y values, e.g. Y = [−1, 1].
• f(5) = 4 is the minimum value of f(x), so min f(x) = 4.
In-class exercise (15min alone, 10 min pairs)
1. a) Sketch the probability mass function (pmf) associated with the outcomes of rolling two six-sided dice (i.e. sum of two dice). b) Explain why
you would use a pmf instead of a pdf for this example.
2. A bag contains 10 marbles, 3 blue and 7 yellow. If two marbles are drawn without replacement,
a) What is the probability that both marbles are yellow?
b) What is the probability that only one marble is yellow?
c) What is the probability that at least one marble is yellow?
3. Assume X is a binary indicator variable, such that X ∈ {0,1}. If the expected value of X, E[X], is 0.91, what is Pr(X=0)?
4. What is the dot product of the two vectors [3 20 1] and [16 17 18]?
5. What is , if
A = 6
6. Assume that you are building a classifier to distinguish rats from mice that live in Southam Hall, based on the lengths of their tails. You set up
traps in SA classrooms overnight and catch 10 mice and 15 rats. You then measure the length of each animal’s tail and arrive at the following
measurements:
Mice: 8, 5, 6, 2, 3, 9, 4, 5, 8, 3 Rats: 8, 11, 9, 17, 14, 21, 16, 18, 14, 16, 12, 19, 13, 18, 17
a) What is the prior probability of an animal being a mouse (i.e. chance of being a mouse, before measuring its tail)?
b) Assume that your tail length data are normally distributed for each animal type. That is, p(x|ω=mouse) ~ N(μm, σm), where x = tail length and ω is the true
class/species of the animal. Estimate the mean and standard deviation of tail lengths for each animal type.
c) On the same axis, sketch the distribution of tail lengths for each class.
(Recall that √(2π) ≈ 2.5 and that 95% of probability mass falls within ±2σ)

d) What is the probability density of catching an animal with a tail length of 16.5, given that the animal is a rat? (note we use “density” here, since this is a
continuous variable and the probability of measuring exactly 16.5000000 is zero) (https://www.danielsoper.com/statcalc/calculator.aspx?id=54)
e) What is the probability of a caught animal being a rat, given that its tail length is 16.5? (Hint: use Bayes’ theorem)
Handwritten answer (Q1): probability mass function (pmf) of the sum of two dice; it sums to 1:
2: 1/36   3: 2/36   4: 3/36   5: 4/36   6: 5/36   7: 6/36
8: 5/36   9: 4/36   10: 3/36   11: 2/36   12: 1/36
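This pmf can be generated by enumerating all 36 equally likely ordered rolls; a quick sketch:

```python
from fractions import Fraction
from collections import Counter

# Count how many of the 36 ordered rolls produce each sum
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {s: Fraction(n, 36) for s, n in sorted(counts.items())}

for s, p in pmf.items():
    print(s, p)              # 2 -> 1/36, ..., 7 -> 6/36, ..., 12 -> 1/36

# A pmf must sum to exactly 1
assert sum(pmf.values()) == 1
```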
Handwritten answers (Q2–Q6):
(a) P(both yellow) = (7/10)(6/9) = 0.467
(b) P(only one yellow) = (7/10)(3/9) + (3/10)(7/9) = 0.467
(c) P(at least one yellow) = 1 − (3/10)(2/9) = 0.933
Q3: P(X=0) = 1 − 0.91 = 0.09
Q4: [3 20 1] · [16 17 18] = 3(16) + 20(17) + 1(18) = 406
Q5: answer = 18
Q6 (a): P(mouse) = 10/25 = 40%
(b): mouse: μ = 5.3, σ = 2.41; rat: μ = 14.87, σ = 3.74
(d): p(x = 16.5 | ω = Rat) ≈ 0.097
(e): P(ω = Rat | x = 16.5) = (0.097 × 0.6) / (0.097 × 0.6 + 0.0 × 0.4) ≈ 1
In-class exercise
• Q1 a) Sketch the probability mass function associated with the
outcomes of rolling two six-sided dice (i.e. sum of two dice).
b) Explain why you would use a probability mass function instead of a
probability density function for this example.
In-class exercise
• Q2) A bag contains 10 marbles, 3 blue and 7 yellow. If two marbles are drawn
without replacement,
• What is the probability that both marbles are yellow?
• (7/10 * 6/9 = 0.47)
• What is the probability that only one marble is yellow?
• (7/10 * 3/9 = 0.23 or 3/10 * 7/9 = 0.23; 0.23+ 0.23 = 0.46)
• What is the probability that at least one marble is yellow?
• (1 - chance of only drawing blue... 1 – (3/10 * 2/9) = 0.93)
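The three marble answers follow directly from counting the draws; a sketch using exact fractions:

```python
from fractions import Fraction

both_yellow  = Fraction(7, 10) * Fraction(6, 9)
only_one     = Fraction(7, 10) * Fraction(3, 9) + Fraction(3, 10) * Fraction(7, 9)
at_least_one = 1 - Fraction(3, 10) * Fraction(2, 9)

print(float(both_yellow))    # ~0.467
print(float(only_one))       # ~0.467
print(float(at_least_one))   # ~0.933
```

Note that (a) and (b) both reduce to exactly 7/15; the 0.47 vs. 0.46 on the slide is only a rounding artifact of summing two values already rounded to 0.23.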
In-class exercise
Q3) If X is a binary indicator variable, such that X ∈ {0,1}, and the expected value of
X, E[X], is 0.91, what is Pr(X=0)?

E[X] = Σᵢ xᵢ Pr(X=xᵢ) = 1(0.91) + 0(0.09) = 0.91, so Pr(X=0) = 1 − 0.91 = 0.09
In-class exercise
Q4. What is the dot product of the two vectors [3 20 1] and [16 17 18]?
• 3x16 + 20x17 + 1x18 = 48 + 340 + 18 = 406

Q5. What is , if = 6
• Derivative of the sum is the sum of the derivatives (derivative is a linear transformation)
• = 18
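The Q4 dot product can be verified in one line; a minimal sketch:

```python
import numpy as np

a = np.array([3, 20, 1])
b = np.array([16, 17, 18])
print(np.dot(a, b))   # 48 + 340 + 18 = 406
```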
In-class exercise
Q6. Assume that you are building a classifier to distinguish rats from mice that
live in Southam Hall, based on the lengths of their tails. You set up traps in SA
classrooms overnight and catch 10 mice and 15 rats. You then measure the
length of each animal’s tail and arrive at the following measurements:
Mice: 8, 5, 6, 2, 3, 9, 4, 5, 8, 3 Rats: 8, 11, 9, 17, 14, 21, 16, 18, 14, 16,
12, 19, 13, 18, 17

a) What is the prior probability of an animal being a mouse (i.e. chance of being a mouse,
before measuring its tail)? =10/25=40%=0.40
b) Assume that your data are normally distributed for each animal type. That is,
p(x|ω=mouse) ~ N(μm, σm). Estimate the mean and standard deviation of tail lengths for
each animal type.
Mouse: μ = 5.3, σ = 2.41; Rat: μ = 14.87, σ = 3.74
P(x|ω) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
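The sample statistics in (b) can be reproduced with NumPy; a sketch (ddof=1 divides by n−1, giving the unbiased sample estimate mentioned in the learning objectives):

```python
import numpy as np

mice = np.array([8, 5, 6, 2, 3, 9, 4, 5, 8, 3])
rats = np.array([8, 11, 9, 17, 14, 21, 16, 18, 14, 16, 12, 19, 13, 18, 17])

# ddof=1 -> unbiased sample standard deviation (divide by n - 1)
print(mice.mean(), mice.std(ddof=1))   # 5.3, ~2.41
print(rats.mean(), rats.std(ddof=1))   # ~14.87, ~3.74
```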
In-class exercise (Q6 continued)
c) On the same axis, sketch the distribution of tail lengths for each class.
(Recall that √(2π) ≈ 2.5 and that 95% of probability mass falls within ±2σ)
Mouse centre height is 1/(2.5*2.41) = 0.165; rat = 1/(2.5*3.74) = 0.107

d) What is the probability density of catching an animal with a tail length of


16.5, given that the animal is a rat? (note we use “density” here, since this is a
continuous variable and the probability of measuring exactly 16.5000000 is
zero)
p(x=16.5|ω=rat) = (1/(3.74√(2π))) · exp(−(16.5 − 14.87)²/(2 · 3.74²)) ≈ 0.097
(https://www.danielsoper.com/statcalc/calculator.aspx?id=54)

e) What is the probability of a caught animal being a rat, given that its tail
length is 16.5? (Hint: use Bayes’ theorem)
• P(w=rat|x=16.5) = p(x=16.5|w=rat)P(w=rat)/p(x=16.5)
= (0.097*.6)/(0.097*0.6+0.0*0.4) = 1!!
• Essentially, you are guaranteed that this is a rat, given its tail length.
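Parts (d) and (e) can be computed end to end with a hand-written Gaussian density and the priors from the 10 mice and 15 rats caught; a sketch:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Normal density N(mu, sigma) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x = 16.5
lik_rat   = gauss_pdf(x, 14.87, 3.74)   # ~0.097
lik_mouse = gauss_pdf(x, 5.3, 2.41)     # ~0 (16.5 is more than 4 sigma above the mouse mean)

prior_rat, prior_mouse = 15 / 25, 10 / 25

# Bayes' theorem: posterior = likelihood * prior / evidence
evidence = lik_rat * prior_rat + lik_mouse * prior_mouse
post_rat = lik_rat * prior_rat / evidence
print(post_rat)   # ~1.0
```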
Handwritten annotation (Bayes' theorem):
P(ω|x) = p(x|ω) P(ω) / p(x), i.e. posterior = likelihood × prior / evidence
Do you really need to know math to do ML?
• 5 Derivatives to Excel in Your Machine Learning Interview
• Calculus behind Machine Learning: Review of Derivatives, Gradient, Jacobian,
and Hessian
• https://towardsdatascience.com/5-derivatives-to-excel-in-your-machine-
learning-interview-25601c3ba9fc

• If I had to start learning Data Science again, how would I do it?


• https://towardsdatascience.com/if-i-had-to-start-learning-data-science-again-
how-would-i-do-it-78a72b80fd93
