Lecture 2: Probability and linear algebra basics
Statistical Learning (BST 263)
Jeffrey W. Miller
Department of Biostatistics
Harvard T.H. Chan School of Public Health
Outline
Linear algebra basics
Probability basics
Random vectors
Linear algebra in this course
A little bit of linear algebra is essential for understanding
many machine learning methods.
- E.g., linear regression, logistic regression, LDA, QDA, PCA, GAMs, kernel ridge, SVMs, K-means.
Linear algebra is not a prerequisite for this course, so I made
the following slides to give you the basic concepts needed.
You will need to study this material carefully if you are not
already familiar with it.
Matrices and transposes
A is an m × n real matrix, written A ∈ R^{m×n}, if
\[ A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} \]
where a_{ij} ∈ R. The (i, j)th entry of A is A_{ij} = a_{ij}.
The transpose of A ∈ R^{m×n} is defined as
\[ A^T = \begin{pmatrix} A_{11} & A_{21} & \cdots & A_{m1} \\ A_{12} & A_{22} & \cdots & A_{m2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{mn} \end{pmatrix} \in R^{n×m}. \]
In other words, (A^T)_{ij} = A_{ji}.
Note: x ∈ R^n is considered to be a column vector in R^{n×1}.
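As a concrete illustration (a Python/NumPy sketch, not part of the original slides), the indexing, transpose, and column-vector conventions look like this in code:

    import numpy as np

    # A 2 x 3 real matrix: 2 rows, 3 columns.
    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])

    print(A.shape)      # (2, 3), i.e., A is in R^{2x3}
    print(A[0, 1])      # entry A_{12} = 2.0 (NumPy indexing is 0-based)
    print(A.T.shape)    # (3, 2): the transpose swaps rows and columns
    print(A.T[1, 0])    # (A^T)_{21} = A_{12} = 2.0

    # A vector x in R^3, treated as a 3 x 1 column vector when needed.
    x = np.array([1.0, 2.0, 3.0]).reshape(3, 1)
    print(x.shape)      # (3, 1)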
Sums and products of matrices
The sum of matrices A ∈ R^{m×n} and B ∈ R^{m×n} is the matrix A + B ∈ R^{m×n} such that
\[ (A + B)_{ij} = A_{ij} + B_{ij}. \]
The product of matrices A ∈ R^{m×n} and B ∈ R^{n×ℓ} is the matrix AB ∈ R^{m×ℓ} such that
\[ (AB)_{ij} = \sum_{k=1}^{n} A_{ik} B_{kj}. \]
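For instance (an illustrative NumPy sketch, not part of the original slides), the product formula can be checked entrywise:

    import numpy as np

    A = np.array([[1.0, 2.0],
                  [3.0, 4.0]])
    B = np.array([[5.0, 6.0, 7.0],
                  [8.0, 9.0, 10.0]])

    S = A + np.ones((2, 2))    # entrywise sum (the shapes must match)
    P = A @ B                  # matrix product: (2 x 2)(2 x 3) -> 2 x 3

    # Check (AB)_{ij} = sum_k A_{ik} B_{kj} for one entry, say i = 0, j = 2.
    manual = sum(A[0, k] * B[k, 2] for k in range(A.shape[1]))
    print(np.isclose(P[0, 2], manual))    # True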
Basic matrix properties
In the following properties, it is assumed that the matrix
dimensions are compatible. (For example, if we write A + B then
it is assumed that A and B are the same size.)
(AB)C = A(BC)
- Consequently, we can write ABC without specifying the order in which the multiplications are performed.
A(B + C) = AB + AC
(B + C)A = BA + CA
Except in special circumstances, AB is not equal to BA.
(AB)^T = B^T A^T
(A + B)^T = A^T + B^T
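These identities are easy to verify numerically; here is a rough NumPy check (illustrative only, not part of the original slides) on random matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 4))
    B = rng.standard_normal((4, 2))
    C = rng.standard_normal((2, 5))

    print(np.allclose((A @ B) @ C, A @ (B @ C)))    # associativity: (AB)C = A(BC)
    print(np.allclose((A @ B).T, B.T @ A.T))        # (AB)^T = B^T A^T

    # In general AB != BA, even when both products are defined.
    D = rng.standard_normal((3, 3))
    E = rng.standard_normal((3, 3))
    print(np.allclose(D @ E, E @ D))                # almost surely False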
Identity, inverse, and trace
The n × n identity matrix, denoted I_{n×n} or I for short, is
\[ I = I_{n×n} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix} \in R^{n×n}. \]
IA = A = AI
If it exists, the inverse of A, denoted A^{-1}, is a matrix such that A^{-1} A = I and A A^{-1} = I.
If A^{-1} exists, we say that A is invertible.
(A^{-1})^T = (A^T)^{-1}
(AB)^{-1} = B^{-1} A^{-1}
The trace of a square matrix A ∈ R^{n×n}, denoted tr(A), is defined as
\[ tr(A) = \sum_{i=1}^{n} A_{ii}. \]
tr(AB) = tr(BA) if AB is a square matrix.
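A brief numerical check of the inverse and trace facts (a NumPy sketch, not part of the original slides):

    import numpy as np

    rng = np.random.default_rng(1)
    A = rng.standard_normal((4, 4))    # a random square matrix is almost surely invertible
    B = rng.standard_normal((4, 4))
    I = np.eye(4)

    Ainv = np.linalg.inv(A)
    print(np.allclose(Ainv @ A, I) and np.allclose(A @ Ainv, I))        # A^{-1} A = I = A A^{-1}
    print(np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ Ainv))   # (AB)^{-1} = B^{-1} A^{-1}
    print(np.allclose(np.trace(A @ B), np.trace(B @ A)))                # tr(AB) = tr(BA)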
Symmetric and definite matrices
A is symmetric if A = A^T.
A is symmetric positive semi-definite (SPSD) if and only if A = B^T B for some B ∈ R^{m×n} and some m.
A is symmetric positive definite (SPD) if and only if A is SPSD and A^{-1} exists.
There are many equivalent definitions of SPSD and SPD
(which is why I wrote “if and only if”). I believe the
definitions above are the easiest to understand and use.
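As an illustration (a NumPy sketch, not part of the original slides), any matrix of the form B^T B is SPSD, which can be checked via its eigenvalues:

    import numpy as np

    rng = np.random.default_rng(2)
    B = rng.standard_normal((5, 3))
    A = B.T @ B                          # 3 x 3, SPSD by construction

    print(np.allclose(A, A.T))           # symmetric
    eigvals = np.linalg.eigvalsh(A)      # eigenvalues of a symmetric matrix
    print(np.all(eigvals >= -1e-12))     # all nonnegative (up to rounding): positive semi-definite
    print(np.all(eigvals > 0))           # here B has full column rank (almost surely), so A is SPD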
Outline
Linear algebra basics
Probability basics
Random vectors
Discrete random variables
Informally, a random variable (r.v.) is a quantity that
probabilistically takes any one of a range of values.
Notation: Uppercase for r.v.s, lowercase for values taken.
A random variable X is discrete if it takes values in a countable set \mathcal{X} = {x_1, x_2, . . .}.
Examples: Bernoulli, Binomial, Poisson, Geometric.
The density of a discrete r.v. is the function
p(x) = P(X = x) = probability that X equals x.
- Sometimes, p(x) is called the probability mass function in the discrete case, but "density" is technically correct also.
Properties (discrete case):
\[ 0 ≤ p(x) ≤ 1, \qquad \sum_{x ∈ \mathcal{X}} p(x) = 1, \qquad P(X ∈ A) = \sum_{x ∈ A} p(x). \]
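For example (a SciPy sketch, not part of the original slides), a Binomial(10, 0.3) random variable has a density that satisfies all three properties:

    import numpy as np
    from scipy.stats import binom

    n, q = 10, 0.3
    x = np.arange(n + 1)                  # the possible values {0, 1, ..., 10}
    p = binom.pmf(x, n, q)                # p(x) = P(X = x)

    print(np.all((p >= 0) & (p <= 1)))    # 0 <= p(x) <= 1
    print(np.isclose(p.sum(), 1.0))       # the probabilities sum to 1
    print(p[x <= 2].sum())                # P(X in A) for A = {0, 1, 2}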
Continuous random variables
A random variable X ∈ R is continuous if there is a function p(x) ≥ 0 such that P(X ∈ A) = \int_A p(x) dx for all A ⊆ R.
- (We will ignore measure-theoretic technicalities in this course.)
Examples: Normal, Uniform, Beta, Gamma, Exponential.
p(x) is called the density of X.
Careful! p(x) is not the probability that X equals x.
Note that \int_R p(x) dx = 1, but p(x) can be > 1.
The same definitions apply to random vectors X ∈ R^n, with R^n in place of R.
The cumulative distribution function (c.d.f.) of X ∈ R is
\[ F(x) = P(X ≤ x) = \int_{-∞}^{x} p(x') dx'. \]
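A small numerical illustration (SciPy sketch, not part of the original slides): a Normal density with a small standard deviation exceeds 1 near its mean yet still integrates to 1, and the c.d.f. is the integral of the density:

    import numpy as np
    from scipy.stats import norm
    from scipy.integrate import quad

    X = norm(loc=0.0, scale=0.25)               # Normal with standard deviation 0.25

    print(X.pdf(0.0))                            # about 1.60: a density value can exceed 1
    total, _ = quad(X.pdf, -np.inf, np.inf)
    print(np.isclose(total, 1.0))                # but the density integrates to 1
    area, _ = quad(X.pdf, -np.inf, 0.1)
    print(np.isclose(area, X.cdf(0.1)))          # F(x) = integral of p up to x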
Joint distributions of multiple random variables/vectors
p(x, y) denotes the joint density of X ∈ \mathcal{X} and Y ∈ \mathcal{Y}.
- P(X = x, Y = y) = p(x, y) if X and Y are discrete.
- P(X ∈ A, Y ∈ B) = \int_{A×B} p(x, y) dx dy if X and Y are continuous.
- P(X = x, Y ∈ B) = \int_B p(x, y) dy if X is discrete and Y is continuous.
The density of X can be recovered from the joint density by marginalizing over Y:
- p(x) = \sum_{y ∈ \mathcal{Y}} p(x, y) if Y is discrete,
- p(x) = \int_{\mathcal{Y}} p(x, y) dy if Y is continuous.
Note: It is common to use “p” to denote all densities and
follow the convention that X is taking the value x, Y is
taking the value y, etc.
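As a concrete discrete example (a NumPy sketch, not part of the original slides), a joint density can be stored as a table and marginalized by summing over one index:

    import numpy as np

    # Joint density p(x, y) for X in {0, 1} (rows) and Y in {0, 1, 2} (columns).
    joint = np.array([[0.10, 0.20, 0.10],
                      [0.25, 0.15, 0.20]])
    print(np.isclose(joint.sum(), 1.0))   # a valid joint density sums to 1

    p_x = joint.sum(axis=1)               # marginal of X: sum over y
    p_y = joint.sum(axis=0)               # marginal of Y: sum over x
    print(p_x)                            # [0.4, 0.6]
    print(p_y)                            # [0.35, 0.35, 0.3]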
Conditional densities and independence
If p(y) > 0 then the conditional density of X given Y = y is
\[ p(x | y) = \frac{p(x, y)}{p(y)}. \]
X and Y are independent if p(x, y) = p(x)p(y) for all x, y.
X1 , . . . , Xn are independent if
p(x1 , . . . , xn ) = p(x1 ) · · · p(xn )
for all x1 , . . . , xn .
X1 , . . . , Xn are conditionally independent given Y if
p(x1 , . . . , xn | y) = p(x1 |y) · · · p(xn |y)
for all x1 , . . . , xn , y.
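Continuing the discrete table from the previous example (a NumPy sketch, not part of the original slides), conditioning divides the joint table by a marginal, and independence would mean the joint equals the product of its marginals:

    import numpy as np

    joint = np.array([[0.10, 0.20, 0.10],
                      [0.25, 0.15, 0.20]])
    p_x = joint.sum(axis=1)
    p_y = joint.sum(axis=0)

    # Conditional density of X given Y = 1 (the middle column).
    p_x_given_y1 = joint[:, 1] / p_y[1]
    print(p_x_given_y1, p_x_given_y1.sum())          # a valid density: sums to 1

    # Independence check: is p(x, y) = p(x) p(y) for all x, y?
    print(np.allclose(joint, np.outer(p_x, p_y)))    # False for this table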
Expectations (a.k.a. expected values)
Suppose h(x) is a real-valued function of x.
The expectation of h(X), denoted E(h(X)), is
- E(h(X)) = \sum_{x ∈ \mathcal{X}} h(x) p(x) if X is discrete,
- E(h(X)) = \int_{\mathcal{X}} h(x) p(x) dx if X is continuous.
The conditional expectation of h(X) given Y = y is
- E(h(X) | Y = y) = \sum_{x ∈ \mathcal{X}} h(x) p(x | y) if X is discrete,
- E(h(X) | Y = y) = \int_{\mathcal{X}} h(x) p(x | y) dx if X is continuous.
E(h(X)|Y ) is defined as g(Y ) where g(y) = E(h(X)|Y = y).
Law of iterated expectations: E(E(h(X)|Y )) = E(h(X)).
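A quick Monte Carlo sanity check of the law of iterated expectations (a NumPy sketch with made-up distributions, not part of the original slides), taking Y ~ Uniform(0, 1), X | Y = y ~ Normal(y, 1), and h(x) = x^2:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 1_000_000

    y = rng.uniform(0.0, 1.0, size=n)     # Y ~ Uniform(0, 1)
    x = rng.normal(loc=y, scale=1.0)      # X | Y = y ~ Normal(y, 1)
    h = x ** 2

    # Here E(h(X) | Y) = 1 + Y^2, since E(X^2 | Y) = Var(X | Y) + E(X | Y)^2.
    inner = 1.0 + y ** 2

    print(h.mean())        # Monte Carlo estimate of E(h(X)), roughly 4/3
    print(inner.mean())    # Monte Carlo estimate of E(E(h(X) | Y)); the two agree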
Outline
Linear algebra basics
Probability basics
Random vectors
Random vectors
If Z_1, . . . , Z_n ∈ R are random variables, then
\[ Z = \begin{pmatrix} Z_1 \\ \vdots \\ Z_n \end{pmatrix} = (Z_1, \ldots, Z_n)^T \]
is a random vector in R^n.
The expectation of a random vector Z ∈ R^n is
\[ E(Z) = \begin{pmatrix} E(Z_1) \\ \vdots \\ E(Z_n) \end{pmatrix}. \]
Random vectors
The covariance matrix of a random vector Z ∈ R^n is the matrix Cov(Z) ∈ R^{n×n} with (i, j)th entry
\[ Cov(Z)_{ij} = Cov(Z_i, Z_j) \]
where
\[ Cov(Z_i, Z_j) = E[(Z_i − E(Z_i))(Z_j − E(Z_j))] = E(Z_i Z_j) − E(Z_i) E(Z_j). \]
Equivalently,
\[ Cov(Z) = E[(Z − E(Z))(Z − E(Z))^T] = E(Z Z^T) − E(Z) E(Z)^T. \]
Recall that Z ∈ R^n is considered to be a column vector in R^{n×1}, so Z Z^T is a matrix in R^{n×n}.
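A Monte Carlo sketch (NumPy, not part of the original slides) estimating E(Z) and Cov(Z) from simulated draws and checking that the two covariance formulas agree:

    import numpy as np

    rng = np.random.default_rng(4)
    n_samples = 200_000

    # Draws of a random vector Z in R^2 with dependent coordinates.
    z1 = rng.normal(size=n_samples)
    z2 = 0.5 * z1 + rng.normal(size=n_samples)
    Z = np.column_stack([z1, z2])                        # each row is one draw of Z

    mean = Z.mean(axis=0)                                # estimate of E(Z)
    centered = Z - mean
    cov1 = (centered.T @ centered) / n_samples           # E[(Z - E(Z))(Z - E(Z))^T]
    cov2 = (Z.T @ Z) / n_samples - np.outer(mean, mean)  # E(Z Z^T) - E(Z) E(Z)^T
    print(np.allclose(cov1, cov2))                       # the two formulas agree
    print(np.round(cov1, 2))                             # roughly [[1, 0.5], [0.5, 1.25]]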
Random vectors
Cov(Z) is always SPSD.
If Z ∈ R^n is a random vector, then
\[ E(AZ + b) = A E(Z) + b \qquad \text{and} \qquad Cov(AZ + b) = A Cov(Z) A^T \]
for any fixed (i.e., nonrandom) A ∈ R^{m×n} and b ∈ R^m.
If Y, Z ∈ R^n are independent random vectors, then Cov(Y + Z) = Cov(Y) + Cov(Z).
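A quick simulation check of Cov(AZ + b) = A Cov(Z) A^T (a NumPy sketch, not part of the original slides):

    import numpy as np

    rng = np.random.default_rng(5)
    n_samples = 500_000

    C = np.array([[2.0, 0.3],
                  [0.3, 1.0]])               # Cov(Z)
    A = np.array([[1.0, 2.0],
                  [0.0, 1.0],
                  [3.0, -1.0]])              # fixed A in R^{3x2}
    b = np.array([1.0, -2.0, 0.5])           # fixed b in R^3

    Z = rng.multivariate_normal(mean=[0.0, 0.0], cov=C, size=n_samples)
    W = Z @ A.T + b                          # each row is one draw of AZ + b

    print(np.round(np.cov(W, rowvar=False), 2))   # empirical Cov(AZ + b)
    print(np.round(A @ C @ A.T, 2))               # should match up to Monte Carlo error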
Multivariate normal distribution
If µ ∈ R^n and C ∈ R^{n×n} is SPSD, then Z ∼ N(µ, C) denotes that Z is multivariate normal with E(Z) = µ and Cov(Z) = C.
Standard multivariate normal: If Z_1, . . . , Z_n ∼ N(0, 1) independently and Z = (Z_1, . . . , Z_n)^T, then Z ∼ N(0, I).
Affine transformation property: If Z ∼ N(µ, C) then AZ + b ∼ N(Aµ + b, ACA^T) for any fixed A ∈ R^{m×n}, b ∈ R^m, µ ∈ R^n, and SPSD C ∈ R^{n×n}.
Any multivariate normal distribution can be obtained via an affine transformation (AZ + b) of Z ∼ N(0, I_{n×n}) for an appropriate choice of n, A, and b.
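For instance (a NumPy sketch, not part of the original slides), draws from N(µ, C) can be generated by affinely transforming standard normal draws, using a Cholesky factor A with AA^T = C:

    import numpy as np

    rng = np.random.default_rng(6)
    n_samples = 500_000

    mu = np.array([1.0, -2.0])
    C = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

    A = np.linalg.cholesky(C)                 # lower-triangular A with A A^T = C (needs C to be SPD)
    Z = rng.standard_normal((n_samples, 2))   # rows are draws from N(0, I)
    X = Z @ A.T + mu                          # rows are draws from N(mu, A A^T) = N(mu, C)

    print(np.round(X.mean(axis=0), 2))             # approximately mu
    print(np.round(np.cov(X, rowvar=False), 2))    # approximately C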
Multivariate normal distribution
Sum property: If Y ∼ N(µ_1, C_1) and Z ∼ N(µ_2, C_2) independently, then Y + Z ∼ N(µ_1 + µ_2, C_1 + C_2).
Density: If Z = (Z_1, . . . , Z_n)^T ∼ N(µ, C) and C^{-1} exists, then Z has density
\[ p(z) = \frac{1}{(2π)^{n/2} \, |\det(C)|^{1/2}} \exp\!\Big( -\tfrac{1}{2} (z − µ)^T C^{-1} (z − µ) \Big) \]
for all z ∈ R^n.
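As a final check (a SciPy/NumPy sketch, not part of the original slides), the density formula can be evaluated directly and compared with scipy.stats.multivariate_normal:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([1.0, -2.0])
    C = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
    z = np.array([0.5, -1.0])

    n = len(mu)
    diff = z - mu
    quad_form = diff @ np.linalg.inv(C) @ diff
    density = np.exp(-0.5 * quad_form) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(C)))

    print(density)
    print(multivariate_normal(mean=mu, cov=C).pdf(z))   # matches the formula above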