
Machine Learning Lecture Guide

Lecture 2 of SYSC4906 focuses on key mathematical concepts in machine learning, including vector-matrix multiplication, random variables, and model-based vs instance-based learning. The session includes in-class exercises on probability mass functions, classification, and regression, as well as practical assignments to enhance understanding. Students are expected to review relevant literature and complete mathematical exercises to solidify their grasp of the material.


SYSC4906
Introduction to Machine Learning

Lecture 2

Prof James Green


[email protected]
Systems and Computer Engineering, Carleton University
Handwritten notes (vectors and matrices):
• x: input features (of unit u in layer l)
• Vector dot product: the two vectors must have the same dimension.
• Matrix-vector multiplication: (matrix) × (vector) = vector.
• Vector on the right (column vector): the vector's dimension must equal the number of columns of the matrix. A 2×3 matrix times a 3D vector gives a 2D vector:
y(1) = x(1)w(1,1) + x(2)w(1,2) + x(3)w(1,3)
y(2) = x(1)w(2,1) + x(2)w(2,2) + x(3)w(2,3)
• Vector on the left (row vector): if the vector is on the left side of the multiplication, it has to be transposed; its dimension must equal the number of rows of the matrix. A 2D row vector times a 2×3 matrix gives a 3D vector.
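These dimension rules are easy to check numerically. A minimal NumPy sketch (the 2×3 matrix W and the vectors are made-up values, not from the lecture):

```python
import numpy as np

# A 2x3 matrix: 2 rows, 3 columns
W = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

# Vector on the right: its dimension must match the number of columns (3)
x = np.array([1.0, 0.0, 2.0])
y = W @ x            # 3D vector in, 2D vector out
print(y)             # [ 7. 16.]

# Vector on the left (row vector): its dimension must match the number of rows (2)
v = np.array([1.0, 1.0])
z = v @ W            # 2D vector in, 3D vector out
print(z)             # [5. 7. 9.]
```

Passing a vector of the wrong dimension (e.g. a 2D vector on the right) raises a shape-mismatch error, which is exactly the rule in the notes above.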
Handwritten notes (definitions):
• X: domain; Y: codomain
• max ⇒ the maximum value of f(x); argmax ⇒ the value of x that gives the maximum f(x)
• pmf: probability mass function (discrete); Σ pmf = 1
• pdf: probability density function (continuous); ∫ pdf = 1
• Classification: assigning a label to an unlabeled example (discrete output)
• Regression: predicting a real-valued label (continuous output)
• Model-based learning: uses the training data to create a model with parameters (ex: SVM)
• Instance-based learning: uses the whole dataset as the model (ex: kNN)
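The max vs. argmax distinction above can be illustrated with a toy function (the quadratic f below is a made-up example):

```python
import numpy as np

# Toy function f(x) = -(x - 3)^2 + 4: peaks at x = 3 with value 4
xs = np.array([0, 1, 2, 3, 4, 5])
fx = -(xs - 3) ** 2 + 4

print(fx.max())          # max: the largest value of f(x) -> 4
print(xs[fx.argmax()])   # argmax: the x that achieves that value -> 3
```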
Learning Objectives for Lecture 2
• Review and understand the key mathematical concepts underlying the
course material, including:
• Vector-matrix multiplication
• Mathematical notation
• Random variables
• Unbiased estimators
• Differentiate the following pairs of concepts:
• Model parameter vs. hyperparameter
(handwritten: parameters are learned from the data, e.g. the thresholds in a decision tree, or w and b in y = wx − b; hyperparameters are set by the programmer, e.g. tree depth, kernels, k in k-nearest neighbour)
• Classification vs. regression
• Model-based learning vs. instance-based learning (ex: k-nearest neighbor)
• Shallow vs. deep learning
• Understand how to load and run a Jupyter notebook in Google Colab
• Covered in the tutorial
Lecture 2
Pre-lecture Assignment:
• Read Chapter 2 of the 100pMLB
• Review the Wikipedia page for Linear Equation. You should be competent
in all of its contents.
• Look over the "Key Concepts and Tools" section of Google's "Prerequisites
and Prework" for their Machine Learning Crash Course. (just the pre-req's,
not the entire course!)

In-class activities:
• Finish Lecture 1 material
• Complete a series of mathematical review exercises alone and in groups.
Key terms
• Scalar, vector, dimension, matrix, set, intersection, capital pi notation, dot-
product, transpose, function, domain, codomain, local minimum, interval,
open interval, global minimum, derivative, differentiation, chain rule,
gradient, partial derivatives, random variable, discrete, continuous,
discrete RV, probability distribution, probability mass function, continuous
RV, probability density function, expectation, mean, average, expected
value, statistics, standard deviation, variance, examples, sample, dataset,
unbiased estimators, sample statistic, sample mean, Bayes’ Rule (Bayes’
Theorem), prior, maximum a posteriori, classification, label, unlabeled
example, classification learning algorithm, labeled examples, model,
classes, binary classification (binomial), multiclass classification
(multinomial), regression, regression learning algorithm, parameters, kNN,
shallow learning, neural network, layer, deep neural network, deep
learning.
Handwritten annotations:
• A set is a collection of unique values.
• The codomain Y is the range of y values, e.g. Y = [−1, 1].
• f(5) = 4 is the minimum value of f(x), so min f(x) = 4.
In-class exercise (15min alone, 10 min pairs)
1. a) Sketch the probability mass function (pmf) associated with the outcomes of rolling two six-sided dice (i.e. sum of two dice). b) Explain why
you would use a pmf instead of a pdf for this example.
2. A bag contains 10 marbles, 3 blue and 7 yellow. If two marbles are drawn without replacement,
a) What is the probability that both marbles are yellow?
b) What is the probability that only one marble is yellow?
c) What is the probability that at least one marble is yellow?
3. Assume X is a binary indicator variable, such that X ∈ {0,1}. If the expected value of X, E[X], is 0.91, what is Pr(X=0)?
4. What is the dot product of the two vectors [3 20 1] and [16 17 18]?
5. What is , if
A = 6
6. Assume that you are building a classifier to distinguish rats from mice that live in Southam Hall, based on the lengths of their tails. You set up
traps in SA classrooms overnight and catch 10 mice and 15 rats. You then measure the length of each animal’s tail and arrive at the following
measurements:
Mice: 8, 5, 6, 2, 3, 9, 4, 5, 8, 3 Rats: 8, 11, 9, 17, 14, 21, 16, 18, 14, 16, 12, 19, 13, 18, 17
a) What is the prior probability of an animal being a mouse (i.e. chance of being a mouse, before measuring its tail)?
b) Assume that your tail length data are normally distributed for each animal type. That is, p(x|ω=mouse) ~ N(μm, σm), where x = tail length and ω is the true
class/species of the animal. Estimate the mean and standard deviation of tail lengths for each animal type.
c) On the same axis, sketch the distribution of tail lengths for each class.
(Recall that √(2π) ≈ 2.5 and that 95% of probability mass falls within ±2σ)

d) What is the probability density of catching an animal with a tail length of 16.5, given that the animal is a rat? (note we use “density” here, since this is a
continuous variable and the probability of measuring exactly 16.5000000 is zero) (https://www.danielsoper.com/statcalc/calculator.aspx?id=54)
e) What is the probability of a caught animal being a rat, given that its tail length is 16.5? (Hint: use Bayes’ theorem)
Handwritten answer (Q1): probability mass function (pmf) of the sum of two dice; it sums to 1:
2: 1/36   3: 2/36   4: 3/36   5: 4/36   6: 5/36   7: 6/36
8: 5/36   9: 4/36   10: 3/36   11: 2/36   12: 1/36
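This pmf can be generated by enumerating all 36 equally likely ordered rolls; a quick sketch:

```python
from fractions import Fraction
from collections import Counter

# Count how many of the 36 ordered rolls produce each sum
counts = Counter(d1 + d2 for d1 in range(1, 7) for d2 in range(1, 7))
pmf = {s: Fraction(n, 36) for s, n in sorted(counts.items())}

for s, p in pmf.items():
    print(s, p)              # 2 -> 1/36, ..., 7 -> 6/36, ..., 12 -> 1/36

# A pmf must sum to exactly 1
assert sum(pmf.values()) == 1
```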
Handwritten answers (Q2–Q6):
(a) P(both yellow) = (7/10)(6/9) = 0.467
(b) P(only one yellow) = (7/10)(3/9) + (3/10)(7/9) = 0.467
(c) P(at least one yellow) = 1 − (3/10)(2/9) = 0.933
Q3: P(X=0) = 1 − 0.91 = 0.09
Q4: [3 20 1] · [16 17 18] = 3(16) + 20(17) + 1(18) = 406
Q5: answer = 18
Q6 (a): P(mouse) = 10/25 = 40%
(b): mouse: μ = 5.3, σ = 2.41; rat: μ = 14.87, σ = 3.74
(d): p(x = 16.5 | ω = Rat) ≈ 0.097
(e): P(ω = Rat | x = 16.5) = (0.097 × 0.6) / (0.097 × 0.6 + 0.0 × 0.4) ≈ 1
In-class exercise
• Q1 a) Sketch the probability mass function associated with the
outcomes of rolling two six-sided dice (i.e. sum of two dice).
b) Explain why you would use a probability mass function instead of a
probability density function for this example.
In-class exercise
• Q2) A bag contains 10 marbles, 3 blue and 7 yellow. If two marbles are drawn
without replacement,
• What is the probability that both marbles are yellow?
• (7/10 * 6/9 = 0.47)
• What is the probability that only one marble is yellow?
• (7/10 * 3/9 = 0.23 or 3/10 * 7/9 = 0.23; 0.23+ 0.23 = 0.46)
• What is the probability that at least one marble is yellow?
• (1 - chance of only drawing blue... 1 – (3/10 * 2/9) = 0.93)
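The three marble answers follow directly from counting the draws; a sketch using exact fractions:

```python
from fractions import Fraction

both_yellow  = Fraction(7, 10) * Fraction(6, 9)
only_one     = Fraction(7, 10) * Fraction(3, 9) + Fraction(3, 10) * Fraction(7, 9)
at_least_one = 1 - Fraction(3, 10) * Fraction(2, 9)

print(float(both_yellow))    # ~0.467
print(float(only_one))       # ~0.467
print(float(at_least_one))   # ~0.933
```

Note that (a) and (b) both reduce to exactly 7/15; the 0.47 vs. 0.46 on the slide is only a rounding artifact of summing two values already rounded to 0.23.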
In-class exercise
Q3) If X is a binary indicator variable, such that X ∈ {0,1}, and the expected value of
X, E[X], is 0.91, what is Pr(X=0)?

E[X] = Σᵢ xᵢ Pr(X=xᵢ) = 1(0.91) + 0(0.09) = 0.91, so Pr(X=0) = 1 − 0.91 = 0.09
In-class exercise
Q4. What is the dot product of the two vectors [3 20 1] and [16 17 18]?
• 3x16 + 20x17 + 1x18 = 48 + 340 + 18 = 406

Q5. What is , if = 6
• Derivative of the sum is the sum of the derivatives (derivative is a linear transformation)
• = 18
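The Q4 dot product can be verified in one line; a minimal sketch:

```python
import numpy as np

a = np.array([3, 20, 1])
b = np.array([16, 17, 18])
print(np.dot(a, b))   # 48 + 340 + 18 = 406
```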
In-class exercise
Q6. Assume that you are building a classifier to distinguish rats from mice that
live in Southam Hall, based on the lengths of their tails. You set up traps in SA
classrooms overnight and catch 10 mice and 15 rats. You then measure the
length of each animal’s tail and arrive at the following measurements:
Mice: 8, 5, 6, 2, 3, 9, 4, 5, 8, 3 Rats: 8, 11, 9, 17, 14, 21, 16, 18, 14, 16,
12, 19, 13, 18, 17

a) What is the prior probability of an animal being a mouse (i.e. chance of being a mouse,
before measuring its tail)? =10/25=40%=0.40
b) Assume that your data are normally distributed for each animal type. That is,
p(x|ω=mouse) ~ N(μm, σm). Estimate the mean and standard deviation of tail lengths for
each animal type.
Mouse: μ = 5.3, σ = 2.41; Rat: μ = 14.87, σ = 3.74
P(x|ω) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
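The sample statistics in (b) can be reproduced with NumPy; a sketch (ddof=1 divides by n−1, giving the unbiased sample estimate mentioned in the learning objectives):

```python
import numpy as np

mice = np.array([8, 5, 6, 2, 3, 9, 4, 5, 8, 3])
rats = np.array([8, 11, 9, 17, 14, 21, 16, 18, 14, 16, 12, 19, 13, 18, 17])

# ddof=1 -> unbiased sample standard deviation (divide by n - 1)
print(mice.mean(), mice.std(ddof=1))   # 5.3, ~2.41
print(rats.mean(), rats.std(ddof=1))   # ~14.87, ~3.74
```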
In-class exercise (Q6 continued)
c) On the same axis, sketch the distribution of tail lengths for each class.
(Recall that √(2π) ≈ 2.5 and that 95% of probability mass falls within ±2σ)
Mouse centre height is 1/(2.5*2.41) = 0.165; rat = 1/(2.5*3.74) = 0.107

d) What is the probability density of catching an animal with a tail length of


16.5, given that the animal is a rat? (note we use “density” here, since this is a
continuous variable and the probability of measuring exactly 16.5000000 is
zero)
p(x=16.5|ω=rat) = (1/(3.74√(2π))) · exp(−(16.5 − 14.87)²/(2 · 3.74²)) ≈ 0.097
(https://www.danielsoper.com/statcalc/calculator.aspx?id=54)

e) What is the probability of a caught animal being a rat, given that its tail
length is 16.5? (Hint: use Bayes’ theorem)
• P(w=rat|x=16.5) = p(x=16.5|w=rat)P(w=rat)/p(x=16.5)
= (0.097*.6)/(0.097*0.6+0.0*0.4) = 1!!
• Essentially, you are guaranteed that this is a rat, given its tail length.
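Parts (d) and (e) can be computed end to end with a hand-written Gaussian density and the priors from the 10 mice and 15 rats caught; a sketch:

```python
import math

def gauss_pdf(x, mu, sigma):
    """Normal density N(mu, sigma) evaluated at x."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

x = 16.5
lik_rat   = gauss_pdf(x, 14.87, 3.74)   # ~0.097
lik_mouse = gauss_pdf(x, 5.3, 2.41)     # ~0 (16.5 is more than 4 sigma above the mouse mean)

prior_rat, prior_mouse = 15 / 25, 10 / 25

# Bayes' theorem: posterior = likelihood * prior / evidence
evidence = lik_rat * prior_rat + lik_mouse * prior_mouse
post_rat = lik_rat * prior_rat / evidence
print(post_rat)   # ~1.0
```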
Handwritten annotation (Bayes' theorem):
P(ω|x) = p(x|ω) P(ω) / p(x), i.e. posterior = likelihood × prior / evidence
Do you really need to know math to do ML?
• 5 Derivatives to Excel in Your Machine Learning Interview
• Calculus behind Machine Learning: Review of Derivatives, Gradient, Jacobian,
and Hessian
• https://towardsdatascience.com/5-derivatives-to-excel-in-your-machine-
learning-interview-25601c3ba9fc

• If I had to start learning Data Science again, how would I do it?


• https://towardsdatascience.com/if-i-had-to-start-learning-data-science-again-
how-would-i-do-it-78a72b80fd93
