
Advanced Econometrics I

Jürgen Meinecke
Lecture 1 of 12
Research School of Economics, Australian National University

1 / 44
Welcome

Welcome to Advanced Econometrics I


This is a PhD-level course in econometric theory
The course makes heavy use of the following mathematical tools:

• linear algebra
• multivariate calculus
• concepts in analysis (real and functional)

If you don’t feel familiar with these, then this course will be
extremely demanding

2 / 44
Staff

You can seek help on matters academic from

• your friendly lecturer (me): Juergen Meinecke


• your amazing tutor: Shu Hu

Be nice to us!

3 / 44
Course Website

I’m not using Wattle very much (with few exceptions)


I’ve set up a public course website that contains pretty much
everything you need to know
Let’s take a look:
https://juergenmeinecke.github.io/EMET8014

4 / 44
Roadmap

Announcements

Vector Spaces, Hilbert Spaces, Projections


Vector Spaces, Banach Spaces
Inner Product Spaces, Hilbert Spaces
Projection Theorem
Linear Projections in 𝐿2

5 / 44
Definition (Vector Space)
A real vector space is a triple (𝑉, +, ⋅), in which 𝑉 is a set, and +
and ⋅ are binary operations such that, for any two elements 𝑋, 𝑌 ∈ 𝑉 and any scalar 𝜆 ∈ R:
𝑋+𝑌 ∈ 𝑉 (closure under additivity)
𝜆⋅𝑋 ∈ 𝑉 (closure under scalar product)

(Note: instead of writing 𝜆 ⋅ 𝑋 we typically just write 𝜆𝑋)

Big picture:
We define a notion of addition between any two elements of 𝑉, and
we define a notion of multiplication between a constant and an
element of 𝑉
These operations do not take us out of the vector space

6 / 44
With addition and multiplication there are some typical sensible
‘requirements’:
Let 𝑋, 𝑌, 𝑍 ∈ 𝑉, and 𝜆, 𝜇 ∈ R

• addition
(i) commutativity: 𝑋 + 𝑌 = 𝑌 + 𝑋
(ii) associativity: (𝑋 + 𝑌) + 𝑍 = 𝑋 + (𝑌 + 𝑍)
(iii) 𝑉 contains a unique element 0 such that 𝑋 + 0 = 𝑋
(iv) 𝑉 contains a unique element −𝑋 such that 𝑋 + (−𝑋) = 0
• multiplication
(i) distributivity: 𝜆 ⋅ (𝑋 + 𝑌) = 𝜆 ⋅ 𝑋 + 𝜆 ⋅ 𝑌
(ii) distributivity: (𝜆 + 𝜇) ⋅ 𝑋 = 𝜆 ⋅ 𝑋 + 𝜇 ⋅ 𝑋
(iii) associativity: 𝜆 ⋅ (𝜇 ⋅ 𝑋) = (𝜆 ⋅ 𝜇) ⋅ 𝑋
(iv) 1⋅𝑋 = 𝑋

7 / 44
Perhaps the most intuitive illustration of a real vector space:
Example (Euclidean Space (R𝑛 , +, ⋅))
• elements are quite literally vectors or arrows
• 𝑋 ∶= (𝑥1 , … , 𝑥𝑛 )′ and 𝑌 ∶= (𝑦1 , … , 𝑦𝑛 )′
• define 𝑋 + 𝑌 ∶= (𝑥1 + 𝑦1 , … , 𝑥𝑛 + 𝑦𝑛 )′
• define 𝜆 ⋅ 𝑋 ∶= (𝜆𝑥1 , … , 𝜆𝑥𝑛 )′
• let 𝑛 = 2, 𝑋 = (24, 7)′ and 𝑌 = (18, 2)′ ,
then 𝑋 + 𝑌 = (42, 9)′

When 𝑋 ∈ 𝑉, I refer to 𝑋 as an “element” of 𝑉


Some books use “vector”, one could also say “point”
A less intuitive example…

8 / 44
Example (The Space of Continuous Functions)
Denote by 𝐶 [𝑎, 𝑏] the space of all real valued univariate and
continuous functions on a closed interval [𝑎, 𝑏].

• each 𝑋 ∈ 𝐶 [𝑎, 𝑏] is a function 𝑋 ∶ [𝑎, 𝑏] → R


• the points or elements of the space are functions
• let 𝑡 ∈ [𝑎, 𝑏] and write 𝑋(𝑡) for the function value at 𝑡
• define (𝑋 + 𝑌)(𝑡) ∶= 𝑋(𝑡) + 𝑌(𝑡)
• define (𝜆 ⋅ 𝑋)(𝑡) ∶= 𝜆 ⋅ 𝑋(𝑡)
• let [𝑎, 𝑏] = [2, 3], 𝑋(𝑡) = 2 ⋅ 𝑡 and 𝑌(𝑡) = 1 + 5 ⋅ 𝑡,
then (𝑋 + 𝑌)(𝑡) = 1 + 7 ⋅ 𝑡

Vector spaces of functions are very important, more examples:

• space of differentiable functions


• space of functions that are integrable
(random variables live in this space)
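To make the Euclidean and continuous-function examples concrete in code, here is a minimal Python sketch (my illustration, not from the slides; numpy assumed, with functions represented as plain Python callables) showing that addition and scalar multiplication behave exactly as defined above:

import numpy as np

# Euclidean space (R^n, +, .): addition and scalar multiplication are componentwise
X = np.array([24.0, 7.0])
Y = np.array([18.0, 2.0])
print(X + Y)      # [42.  9.] -- closure under addition
print(3.0 * X)    # [72. 21.] -- closure under scalar multiplication

# C[a, b]: elements are functions, operations are defined pointwise
X_fun = lambda t: 2 * t          # X(t) = 2t
Y_fun = lambda t: 1 + 5 * t      # Y(t) = 1 + 5t
add = lambda f, g: (lambda t: f(t) + g(t))       # (X + Y)(t) := X(t) + Y(t)
scale = lambda lam, f: (lambda t: lam * f(t))    # (lam . X)(t) := lam X(t)
Z = add(X_fun, Y_fun)            # Z(t) = 1 + 7t
print(Z(2.5), 1 + 7 * 2.5)       # 18.5 18.5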

9 / 44
Each element of a vector space can be given a ‘length’:
Definition (Norm)
A norm on a real vector space 𝑉 is a real valued function, denoted
by ‖⋅‖, on 𝑉 with the properties

(i) ‖𝑋‖ ≥ 0
(ii) ‖𝑋‖ = 0 ⇔ 𝑋 = 0
(iii) ∥𝜆 ⋅ 𝑋∥ = |𝜆| ⋅ ‖𝑋‖
(iv) triangle inequality: ∥𝑋 + 𝑌∥ ≤ ‖𝑋‖ + ∥𝑌∥

where 𝑋 and 𝑌 are in 𝑉, and 𝜆 is a real constant

Any function 𝑉 → R that satisfies these properties is a norm


Here are useful examples…

10 / 44
Examples of norms
Example (Euclidean Space (R𝑛 , +, ⋅))
• recall that elements are quite literally vectors or arrows
• 𝑋 ∶= (𝑥1 , … , 𝑥𝑛 )′
• ‖𝑋‖ ∶= √(𝑥1² + ⋯ + 𝑥𝑛²)

Example (The Space of Continuous Functions)


• recall that each 𝑋 ∈ 𝐶 [𝑎, 𝑏] is a function 𝑋 ∶ [𝑎, 𝑏] → R
• ‖𝑋‖ ∶= max𝑡∈[𝑎,𝑏] |𝑋(𝑡)|
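As a quick numerical illustration (my sketch, not from the slides; numpy assumed, and the sup norm is only approximated by maximizing over a fine grid of [𝑎, 𝑏]):

import numpy as np

# Euclidean norm on R^n
X = np.array([3.0, 4.0])
print(np.sqrt(np.sum(X ** 2)))     # 5.0, same as np.linalg.norm(X)

# Sup norm on C[a, b], approximated on a fine grid of [a, b] = [2, 3]
t = np.linspace(2.0, 3.0, 10_001)
X_fun = lambda s: 2 * s
print(np.max(np.abs(X_fun(t))))    # 6.0 = max over [2, 3] of |2t|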

11 / 44
Definition (Normed Space)
A normed space 𝑀 is a vector space endowed with a norm ‖⋅‖.

The norm induces a metric, a notion of distance between two elements
Definition (Metric)
Given a normed space 𝑀 and 𝑋, 𝑌 ∈ 𝑀, the metric is defined by
∥𝑋 − 𝑌∥.

A notion of distance between elements is crucial for the understanding of limiting behavior of sequences inside the vector space
For example, does a sequence {𝑋𝑛 , 𝑛 = 1, 2, …} of elements of a
normed space get “close” to a point?
The metric is key in determining what we mean by “closeness”

12 / 44
Definition (Convergence in Norm)
A sequence {𝑋𝑛 , 𝑛 = 1, 2, …} of elements of a normed space 𝑀 is
said to converge in norm to 𝑋 ∈ 𝑀 if for every 𝜀 > 0 there is an 𝑁𝜀
such that ∥𝑋𝑛 − 𝑋∥ < 𝜀 for every 𝑛 > 𝑁𝜀 .

This definition requires explicit knowledge of the limit 𝑋


Oftentimes we do not know that limit
We’re looking for an alternative way to characterize convergence
Luckily, every convergent sequence is a Cauchy sequence:
Definition (Cauchy Sequence)
A sequence {𝑋𝑛 , 𝑛 = 1, 2, …} of elements of a normed space 𝑀 is
said to be a Cauchy sequence if for every 𝜀 > 0 there is an 𝑁𝜀 such
that ∥𝑋𝑚 − 𝑋𝑛 ∥ < 𝜀 for every 𝑚, 𝑛 > 𝑁𝜀 .

Idea: elements ’out there’ become arbitrarily close to each other


This is captured in the so-called Cauchy criterion ∥𝑋𝑚 − 𝑋𝑛 ∥ < 𝜀
13 / 44
Problem: while every convergent sequence is a Cauchy sequence,
not every Cauchy sequence converges (assignment 1)
We would like a space for which the Cauchy criterion is necessary
and sufficient for convergence
Definition (Complete Space)
A space 𝑀 is complete if every Cauchy sequence of elements of 𝑀
converges to an element of 𝑀, that is, every Cauchy sequence in 𝑀
has a limit which is an element of 𝑀.

We like our spaces to be complete because we can safely consider limits of elements within the space
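A classic illustration of why completeness matters, as a small Python sketch (my example, not from the slides): the Newton iterates for √2 form a Cauchy sequence of rational numbers, but their limit √2 is not rational, so within the rationals the Cauchy criterion is not sufficient for convergence; within the complete space R it is.

from fractions import Fraction

# Newton iterates x_{n+1} = (x_n + 2/x_n)/2 for sqrt(2), starting from x_0 = 2.
# Every iterate is a rational number and the sequence is Cauchy,
# yet the limit sqrt(2) lies outside the rationals.
x = Fraction(2)
for n in range(6):
    x = (x + 2 / x) / 2
    print(float(x), float(abs(x * x - 2)))   # distance to the 'missing' limit shrinks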

14 / 44
Definition (Banach Space)
A Banach space 𝐵 is a complete normed space, that is, a normed
space in which every Cauchy sequence {𝑋𝑛 , 𝑛 = 1, 2, …} converges
in norm to some element 𝑋 ∈ 𝐵.

In Banach spaces we can safely work with lengths of elements, distances between elements, and limits of sequences of elements
But something is still missing…

15 / 44
Roadmap

Announcements

Vector Spaces, Hilbert Spaces, Projections


Vector Spaces, Banach Spaces
Inner Product Spaces, Hilbert Spaces
Projection Theorem
Linear Projections in 𝐿2

16 / 44
Given a vector space or Banach space, we can add elements and
multiply them by scalars
We can also measure their length and distance via the metric
We would, in addition, like notions of

• multiplication between elements of a space


• angle, or orthogonality, or perpendicularity between elements
of a space

The inner product comes to the rescue

17 / 44
Definition (Inner Product)
An inner product on a vector space 𝑉 is a mapping, denoted by
⟨⋅, ⋅⟩, of 𝑉 × 𝑉 into R such that
⟨𝑋, 𝑌⟩ = ⟨𝑌, 𝑋⟩ (commutativity)
⟨𝑋 + 𝑌, 𝑍⟩ = ⟨𝑋, 𝑍⟩ + ⟨𝑌, 𝑍⟩ (distributivity)
⟨𝜆𝑋, 𝑌⟩ = 𝜆⟨𝑋, 𝑌⟩
⟨𝑋, 𝑋⟩ ≥ 0 (positive semi-definiteness)
⟨𝑋, 𝑋⟩ = 0 ⟺ 𝑋 = 0 (point separating)

where 𝑋, 𝑌 and 𝑍 are in 𝑉, and 𝜆 is a real constant.

18 / 44
Examples of inner products
Example (Euclidean Space (R𝑛 , +, ⋅))
• recall that elements are quite literally vectors or arrows
• 𝑋 ∶= (𝑥1 , … , 𝑥𝑛 )′ and 𝑌 ∶= (𝑦1 , … , 𝑦𝑛 )′
• ⟨𝑋, 𝑌⟩ ∶= 𝑥1 𝑦1 + ⋯ + 𝑥𝑛 𝑦𝑛

Example (The Space of Continuous Functions)


• let 𝑋 and 𝑌 be real-valued functions on [𝑎, 𝑏]
• ⟨𝑋, 𝑌⟩ ∶= ∫ₐᵇ 𝑋(𝑡)𝑌(𝑡) 𝑑𝑡
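Again a small numerical check (my sketch, not from the slides; numpy assumed, with the integral approximated by the trapezoidal rule on a grid):

import numpy as np

# Inner product on R^n: the usual dot product
X = np.array([1.0, 2.0, 3.0])
Y = np.array([4.0, 5.0, 6.0])
print(X @ Y)   # 32.0

# Inner product on C[a, b]: <X, Y> = integral of X(t) Y(t) over [a, b] = [2, 3]
t = np.linspace(2.0, 3.0, 10_001)
vals = (2 * t) * (1 + 5 * t)                              # X(t) Y(t) with X(t) = 2t, Y(t) = 1 + 5t
print(np.sum((vals[:-1] + vals[1:]) / 2 * np.diff(t)))    # ≈ 68.33, the exact value is 205/3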

19 / 44
Inner products lend themselves naturally to the creation of a norm
Definition (Induced Norm)
Let 𝑀 be a vector space with an inner product ⟨⋅, ⋅⟩. The norm induced by the inner product
is ‖𝑋‖ ∶= √⟨𝑋, 𝑋⟩, for any 𝑋 ∈ 𝑀.

Likewise there is a metric ∥𝑋 − 𝑌∥ induced by the inner product

20 / 44
Definition (Inner Product Space)
An inner product space is a vector space endowed with an inner
product ⟨⋅, ⋅⟩.

Definition (Hilbert Space)


A Hilbert space is a complete inner product space.

It is clear that completeness is with respect to the norm induced by the inner product
It follows that all Hilbert spaces are Banach spaces
(but not all Banach spaces are Hilbert spaces)
Why do we like Hilbert spaces?
They are the subset of the Banach spaces that behave very similarly to Euclidean space, while being considerably more general
A lot of intuition from Euclidean space carries over to Hilbert space
(case in point: triangle inequality)

21 / 44
Equipped with the inner product, we can now define the notion of
angle between elements of an inner product space
Definition (Orthogonality)
Two elements 𝑋, 𝑌 of a Hilbert space are orthogonal if ⟨𝑋, 𝑌⟩ = 0.
We write 𝑋 ⟂ 𝑌.

Think of vectors that are perpendicular
(this is another case in point of Euclidean geometry carrying over to the more general Hilbert space setting)

22 / 44
Hilbert spaces are useful for econometrics because they offer us
powerful tools to address the following optimization problem:

• given an element 𝑌 in a Hilbert space 𝐻,
• and a subspace 𝑆 of 𝐻,
• find the element 𝑌̂ ∈ 𝑆 closest to 𝑌 in the sense that ∥𝑌 − 𝑌̂∥ is minimal

Key questions

• is there such an element 𝑌̂?
• is it unique?
• what is it, or how can it be characterized?

The projection theorem answers these questions

23 / 44
Roadmap

Announcements

Vector Spaces, Hilbert Spaces, Projections


Vector Spaces, Banach Spaces
Inner Product Spaces, Hilbert Spaces
Projection Theorem
Linear Projections in 𝐿2

24 / 44
Definition (Subspace)
A subset 𝑆 of a Hilbert space 𝐻 is called a subspace of 𝐻 if 𝑆 itself
is a vector space.

We will focus on complete subspaces of Hilbert spaces: subspaces that contain all of their limit points
(so that, in addition to addition and scalar multiplication, taking limits does not take us out of the subspace)

25 / 44
Theorem (Projection Theorem)
Let 𝐻 be a Hilbert space and 𝑆 be a complete subspace of 𝐻.

(i) For any element 𝑌 ∈ 𝐻 there is a unique element 𝑌̂ ∈ 𝑆 such that ∥𝑌 − 𝑌̂∥ ≤ ∥𝑌 − 𝑠∥ for all 𝑠 ∈ 𝑆.
(ii) 𝑌̂ ∈ 𝑆 is the unique minimizer if and only if 𝑌 − 𝑌̂ ⟂ 𝑆.

The element 𝑌̂ is called the orthogonal projection of 𝑌 onto 𝑆, also denoted P𝑆 𝑌
P𝑆 is the projection operator of 𝐻 onto 𝑆
Existence of a unique minimizer sounds great, but how do we obtain 𝑌̂?

26 / 44
Obtaining 𝑌̂ turns out to be straightforward when the subspace onto which we are projecting is ‘generated’ by a finite set of elements of 𝐻
What do I mean?
A linear combination of elements 𝑋1 , … , 𝑋𝐾 of a vector space is an
expression of the form 𝑏1 𝑋1 + ⋯ + 𝑏𝐾 𝑋𝐾
(where the 𝑏𝑖 are real numbers)

Definition (Span)
Let 𝑋 be a nonempty subset of a vector space. The span of 𝑋 is the
set of all linear combinations of elements of 𝑋.

We simply write sp(𝑋) for the span of 𝑋


While 𝑋 might not be a subspace, sp(𝑋) is a subspace by
construction
Put differently:
if 𝑉 is a vector space, then sp(𝑋) is the smallest subspace of 𝑉
containing 𝑋
27 / 44
In econometrics, we are usually interested in the subspace spanned
by ‘regressors’ 𝑋1 , … , 𝑋𝐾
Collect them all in 𝑋 = (𝑋1 , … , 𝑋𝐾 )

Theorem (Existence of Orthonormal Basis (Gram-Schmidt))


There exists a collection 𝑋̃1 , … , 𝑋̃𝐾 such that

(i) ⟨𝑋̃𝑗 , 𝑋̃𝑙 ⟩ = 0 for 𝑗 ≠ 𝑙 and ⟨𝑋̃𝑗 , 𝑋̃𝑙 ⟩ = 1 for 𝑗 = 𝑙.
(ii) sp(𝑋̃) = sp(𝑋) for 𝑋̃ ∶= (𝑋̃1 , … , 𝑋̃𝐾 ).

The collection 𝑋̃ 1 , … , 𝑋̃ 𝐾 is called an orthonormal basis for sp(𝑋)


Why do we consider orthonormal bases?

28 / 44
Projections on sp(𝑋) are easy to characterize via orthonormal bases
Theorem
Let 𝑋1 , … , 𝑋𝐾 and 𝑌 be elements from a Hilbert space. The
projection of 𝑌 on sp(𝑋1 , … , 𝑋𝐾 ) is
𝑌̂ = Psp(𝑋) 𝑌 = ∑ᵢ₌₁ᴷ ⟨𝑋̃𝑖 , 𝑌⟩ 𝑋̃𝑖 ,

where 𝑋̃ 1 , … , 𝑋̃ 𝐾 is an orthonormal basis for sp(𝑋).

This gives us a constructive method for obtaining 𝑌̂

We will use it soon
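To see the theorem at work in the most familiar Hilbert space, here is a minimal numpy sketch (my illustration, not from the slides) that projects a vector 𝑌 ∈ R⁵ onto the span of three vectors; the QR decomposition supplies an orthonormal basis (it is essentially Gram-Schmidt), and the result agrees with the least-squares fit:

import numpy as np

rng = np.random.default_rng(0)

X = rng.normal(size=(5, 3))    # columns span the subspace S = sp(X1, X2, X3) of R^5
Y = rng.normal(size=5)

Q, _ = np.linalg.qr(X)         # columns of Q form an orthonormal basis of S

# Projection formula: Y_hat = sum_i <X~i, Y> X~i
Y_hat = sum((Q[:, i] @ Y) * Q[:, i] for i in range(Q.shape[1]))

# Sanity checks: Y_hat equals the least-squares fit, and Y - Y_hat is orthogonal to S
beta = np.linalg.lstsq(X, Y, rcond=None)[0]
print(np.allclose(Y_hat, X @ beta))        # True
print(np.allclose(X.T @ (Y - Y_hat), 0))   # True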

29 / 44
Roadmap

Announcements

Vector Spaces, Hilbert Spaces, Projections


Vector Spaces, Banach Spaces
Inner Product Spaces, Hilbert Spaces
Projection Theorem
Linear Projections in 𝐿2

30 / 44
Throughout the semester, we will be working on a particular Hilbert
space:
Definition (The Space 𝐿2 (Ω, ℱ , 𝑃))
Let (Ω, ℱ , 𝑃) be a probability space. Denote by 𝐿2 (Ω, ℱ , 𝑃) the set
of all random variables 𝑋 defined on Ω with the property
E (𝑋 2 ) < ∞.

I’ll refer to 𝐿2 (Ω, ℱ , 𝑃) simply as 𝐿2


The condition E (𝑋 2 ) < ∞ is sometimes referred to as finite second
moment or 𝑋 being square integrable; it implies that Var (𝑋) < ∞
𝐿2 is a huge space
Can it be a Hilbert space?

31 / 44
Need an inner product: let ⟨𝑋, 𝑌⟩ = E(𝑋 ⋅ 𝑌)
Let 𝑋, 𝑌 ∈ 𝐿2 with E(𝑋) = E(𝑌) = 0, then ⟨𝑋, 𝑌⟩ = Cov(𝑋, 𝑌)
In other words, the inner product we’re using here is related to the
familiar notion of covariance
Proposition
The space 𝐿2 with ⟨𝑋, 𝑌⟩ = E(𝑋 ⋅ 𝑌) is a Hilbert space.

See Brockwell and Davis, “Time Series: Theory and Methods” for proof
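A quick Monte Carlo illustration of this inner product (my sketch, not from the book or the slides; sample means stand in for population expectations):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

# Two mean-zero random variables in L2, represented by large samples of draws
X = rng.normal(size=n)
Y = 0.5 * X + rng.normal(size=n)

# <X, Y> = E(XY); with mean-zero variables this is the covariance
print(np.mean(X * Y))                              # ≈ 0.5
print(np.mean((X - X.mean()) * (Y - Y.mean())))    # ≈ 0.5 as well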

32 / 44
What is our overarching objective?
We want to “predict” one random variable (the dependent variable)
using a bunch of other random variables (independent variables,
exogenous variables, regressors)
Mapping this into the framework of the projection theorem

• the dependent variable 𝑌 is a “point” in the Hilbert space 𝐿2


• the regressors make up a subspace onto which we “project” 𝑌
• the projection theorem tells us that there exists a unique
optimal 𝑌̂

We need to be precise about the subspace created by the regressors

33 / 44
Let there be a finite collection 𝑋1 , … , 𝑋𝐾 ∈ 𝐿2

Let 𝑋 ∶= (𝑋1 , … , 𝑋𝐾 )

Proposition
sp(𝑋) is a complete subspace of 𝐿2 .

Recall our theorem a few slides earlier:


Psp(𝑋) 𝑌 = ∑ᵢ₌₁ᴷ ⟨𝑋̃𝑖 , 𝑌⟩ 𝑋̃𝑖

Using our inner product


Psp(𝑋) 𝑌 = ∑ᵢ₌₁ᴷ E(𝑋̃𝑖 ⋅ 𝑌) 𝑋̃𝑖

Now, let’s go slow and set 𝐾 = 1


That is, we only have one regressor to predict 𝑌

34 / 44
Going back to the question: What is 𝑌̂ equal to?

Answer, of course, depends on choice of subspace to project on


In the current example we have sp(𝑋1 )
(so only one random variable)
Let’s make the problem even simpler and pick 𝑋1 = 1
(the degenerate rv that is almost surely equal to 1)
What is the orthonormal basis for sp(1)?
Easy: 𝑋1 already is an orthonormal basis (because E(1 ⋅ 1) = 1)
It follows P1 𝑌 = E(1 ⋅ 𝑌) ⋅ 1 = E𝑌 = 𝜇𝑌
Of course you knew this already:
The projection of 𝑌 onto a constant results in the expected value of 𝑌

35 / 44
What if we use a more sophisticated space for the projection?
Let 𝑋1 = 1 and 𝑋2 ∈ 𝐿2 and project on sp(1, 𝑋2 )
Let’s find an orthonormal basis of sp(1, 𝑋2 )
We need to find a version of 𝑋2 that has length 1
This is easy: 𝑋̃ 2 ∶= (𝑋2 − 𝜇2 )/𝜎2 achieves this:
∥𝑋̃2 ∥ = √E( ((𝑋2 − 𝜇2 )/𝜎2 )² ) = 1

Turns out that {1, 𝑋̃ 2 } form an orthonormal basis of sp(1, 𝑋2 )


(confirm this!)
The example of 𝑋̃ 2 offers you some intuition about orthonormal
bases: it is the standardized version of 𝑋2
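You can check the claim by simulation (my sketch, not from the slides; sample means approximate the expectations, and any square-integrable 𝑋2 will do):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X2 = rng.gamma(shape=2.0, scale=3.0, size=n)     # some random variable in L2
X2_tilde = (X2 - X2.mean()) / X2.std()           # sample analog of (X2 - mu2)/sigma2

print(np.mean(1 * X2_tilde))      # ≈ 0: <1, X~2> = 0, so X~2 is orthogonal to the constant
print(np.mean(X2_tilde ** 2))     # ≈ 1: ||X~2|| = 1
print(np.mean(np.ones(n) ** 2))   # = 1: ||1|| = 1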

36 / 44
With 𝑋̃1 ∶= 1 and 𝑋̃2 = (𝑋2 − 𝜇2 )/𝜎2 , it follows

𝑌̂ = Psp(1,𝑋2 ) 𝑌 = ∑ᵢ₌₁² E(𝑋̃𝑖 ⋅ 𝑌) 𝑋̃𝑖
= E(1 ⋅ 𝑌) ⋅ 1 + E(𝑋̃2 𝑌) 𝑋̃2
= E𝑌 + E( ((𝑋2 − 𝜇2 )/𝜎2 ) 𝑌 ) (𝑋2 − 𝜇2 )/𝜎2
= E𝑌 + ( (E(𝑋2 𝑌) − 𝜇2 E𝑌) / 𝜎2² ) (𝑋2 − 𝜇2 )
= E𝑌 + ( Cov(𝑋2 , 𝑌) / 𝜎2² ) (𝑋2 − 𝜇2 )
= 𝛽∗1 + 𝛽∗2 𝑋2 ,

where
𝛽∗2 ∶= 𝜎2𝑌 /𝜎2²
𝛽∗1 ∶= E𝑌 − 𝛽∗2 E𝑋2
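A Monte Carlo check of these two formulas (my sketch, not from the slides; sample moments stand in for population moments, and the data generating process is invented purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X2 = rng.gamma(shape=2.0, scale=3.0, size=n)
Y = 1.0 + 2.0 * X2 + rng.normal(size=n)

sigma_2Y = np.mean((X2 - X2.mean()) * (Y - Y.mean()))   # Cov(X2, Y)
beta2_star = sigma_2Y / np.var(X2)                      # sigma_2Y / sigma_2^2
beta1_star = Y.mean() - beta2_star * X2.mean()          # EY - beta2* EX2
print(beta1_star, beta2_star)                           # ≈ 1.0 and ≈ 2.0

# The same coefficients come out of least squares of Y on (1, X2)
A = np.column_stack([np.ones(n), X2])
print(np.linalg.lstsq(A, Y, rcond=None)[0])             # ≈ [1.0, 2.0]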

37 / 44
Let’s generalize this once more
What if we’re projecting on sp(1, 𝑋2 , 𝑋3 )?
You might think that {1, 𝑋̃2 , 𝑋̃3 } form an orthonormal basis,
where 𝑋̃2 ∶= (𝑋2 − 𝜇2 )/𝜎2 and 𝑋̃3 ∶= (𝑋3 − 𝜇3 )/𝜎3
Not so, sorry!
Convince yourself that E(𝑋̃2 𝑋̃3 ) ≠ 0 unless 𝑋2 and 𝑋3 are uncorrelated
Intuitively, the problem is that 𝑋2 and 𝑋3 have nonzero covariance
How do we construct orthonormal bases out of two random
variables that have nonzero covariance?
Answer: Gram-Schmidt orthogonalization!

38 / 44
Gram-Schmidt orthogonalization, applied to the current context, involves these simple steps:

1. Let 𝑋̃1 ∶= 1
2. Create 𝑋̈2 ∶= 𝑋2 − E(𝑋2 𝑋̃1 )𝑋̃1 and normalize by its length: 𝑋̃2 ∶= 𝑋̈2 / ∥𝑋̈2 ∥
(notice that because we include a constant term, Var 𝑋̈2 = E(𝑋̈2²) and therefore ∥𝑋̈2 ∥ = √Var 𝑋̈2 )
3. Create 𝑋̈3 ∶= 𝑋3 − E(𝑋3 𝑋̃1 )𝑋̃1 − E(𝑋3 𝑋̃2 )𝑋̃2 and normalize by its length: 𝑋̃3 ∶= 𝑋̈3 / ∥𝑋̈3 ∥
(notice again that ∥𝑋̈3 ∥ = √Var 𝑋̈3 )

You can view Gram-Schmidt orthogonalization as an iterative algorithm to construct orthonormal bases
By the way, the order in which you are doing this does not matter
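Here is a sample-analog sketch of these steps in Python (my illustration, not from the slides): expectations E(⋅) are replaced by sample means over a large number of draws, so the output only approximates the population orthonormal basis.

import numpy as np

def gram_schmidt_L2(columns):
    # columns: list of length-n arrays, the first being the constant (a column of ones).
    # Expectations are replaced by sample means; returns X~1, ..., X~K.
    basis = []
    for x in columns:
        x_ddot = np.asarray(x, dtype=float).copy()
        for b in basis:                                   # subtract projections on earlier X~j
            x_ddot = x_ddot - np.mean(x * b) * b
        x_ddot = x_ddot / np.sqrt(np.mean(x_ddot ** 2))   # normalize to unit length
        basis.append(x_ddot)
    return basis

rng = np.random.default_rng(0)
n = 1_000_000
X2 = rng.normal(size=n)
X3 = 0.8 * X2 + rng.normal(size=n)                        # deliberately correlated with X2

B = gram_schmidt_L2([np.ones(n), X2, X3])
G = np.array([[np.mean(bi * bj) for bj in B] for bi in B])
print(np.round(G, 3))                                     # ≈ identity: the basis is orthonormal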

39 / 44
When you work this out (and you should!), you get
𝑋̃2 = (𝑋2 − 𝜇2 )/𝜎2
𝑋̈3 = (𝑋3 − 𝜇3 ) − (𝜎23 /𝜎2²)(𝑋2 − 𝜇2 )

where 𝜎23 ∶= Cov(𝑋2 , 𝑋3 )


What’s going on here?
𝑋̃ 2 is the same as before, it’s the standardized version of 𝑋2
𝑋̈3 is a particular version of 𝑋3 : it is the part of 𝑋3 that has zero covariance with 𝑋2 ; it has been orthogonalized
𝑋̃ 3 is an appropriately normalized version of 𝑋̈ 3 so that its length is 1
Also, convince yourself that if Cov(𝑋2 , 𝑋3 ) = 0 then
𝑋̃ 3 = (𝑋3 − 𝜇3 )/𝜎3

40 / 44
With the orthonormal basis it’s easy to construct the projection:
Psp(1,𝑋2 ,𝑋3 ) 𝑌 = ∑ᵢ₌₁³ E(𝑋̃𝑖 ⋅ 𝑌) 𝑋̃𝑖

It is tedious but not difficult to show that

Psp(1,𝑋2 ,𝑋3 ) 𝑌 = E𝑌 + 𝛽∗2 (𝑋2 − 𝜇2 ) + 𝛽∗3 (𝑋3 − 𝜇3 )
= 𝛽∗1 + 𝛽∗2 𝑋2 + 𝛽∗3 𝑋3

where
𝛽∗2 ∶= (𝜎2𝑌 𝜎3² − 𝜎3𝑌 𝜎23 ) / (𝜎2² 𝜎3² − 𝜎23²)
𝛽∗3 ∶= (𝜎3𝑌 𝜎2² − 𝜎2𝑌 𝜎23 ) / (𝜎2² 𝜎3² − 𝜎23²)
𝛽∗1 ∶= E𝑌 − 𝛽∗2 E𝑋2 − 𝛽∗3 E𝑋3

Looks awkward but it is an important result to internalize!
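One way to internalize it is to verify it numerically (my sketch, not from the slides; sample moments replace population moments and the data generating process is invented for illustration):

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X2 = rng.normal(size=n)
X3 = 0.6 * X2 + rng.normal(size=n)                      # nonzero Cov(X2, X3) on purpose
Y = 1.0 + 2.0 * X2 - 3.0 * X3 + rng.normal(size=n)

s22, s33 = np.var(X2), np.var(X3)
s23 = np.mean((X2 - X2.mean()) * (X3 - X3.mean()))
s2Y = np.mean((X2 - X2.mean()) * (Y - Y.mean()))
s3Y = np.mean((X3 - X3.mean()) * (Y - Y.mean()))

den = s22 * s33 - s23 ** 2
beta2 = (s2Y * s33 - s3Y * s23) / den
beta3 = (s3Y * s22 - s2Y * s23) / den
beta1 = Y.mean() - beta2 * X2.mean() - beta3 * X3.mean()
print(beta1, beta2, beta3)                              # ≈ 1.0, 2.0, -3.0

# Cross-check against least squares of Y on (1, X2, X3)
A = np.column_stack([np.ones(n), X2, X3])
print(np.linalg.lstsq(A, Y, rcond=None)[0])             # ≈ [1.0, 2.0, -3.0]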

41 / 44
Look what happens when Cov(𝑋2 , 𝑋3 ) = 0:

𝛽∗2 ∶= (𝜎2𝑌 𝜎3² − 𝜎3𝑌 𝜎23 ) / (𝜎2² 𝜎3² − 𝜎23²) = 𝜎2𝑌 /𝜎2²
𝛽∗3 ∶= (𝜎3𝑌 𝜎2² − 𝜎2𝑌 𝜎23 ) / (𝜎2² 𝜎3² − 𝜎23²) = 𝜎3𝑌 /𝜎3²

42 / 44
How would you construct an orthonormal basis for
sp(𝑋1 , 𝑋2 , … , 𝑋𝐾 ) with 𝑋1 = 1 and 𝑋𝑘 ∈ 𝐿2 for 𝑘 = 2, … , 𝐾?
Again, use Gram-Schmidt orthogonalization with the inductive
definitions:

1. 𝑋̃ 1 ∶= 1
2. 𝑋̈ 2 ∶= 𝑋2 − E(𝑋2 𝑋̃ 1 )𝑋̃ 1
𝑋̃ 2 ∶= 𝑋̈ 2 / ∥𝑋̈ 2 ∥ = 𝑋̈ 2 /√Var 𝑋̈ 2
3. 𝑋̈ 3 ∶= 𝑋3 − E(𝑋3 𝑋̃ 1 )𝑋̃ 1 − E(𝑋3 𝑋̃ 2 )𝑋̃ 2
𝑋̃ 3 ∶= 𝑋̈ 3 / ∥𝑋̈ 3 ∥ = 𝑋̈ 3 /√Var 𝑋̈ 3
4. 𝑋̈ 4 ∶= 𝑋4 − E(𝑋4 𝑋̃ 1 )𝑋̃ 1 − E(𝑋4 𝑋̃ 2 )𝑋̃ 2 − E(𝑋4 𝑋̃ 3 )𝑋̃ 3
𝑋̃ 4 ∶= 𝑋̈ 4 / ∥𝑋̈ 4 ∥ = 𝑋̈ 4 /√Var 𝑋̈ 4
5. and so forth

43 / 44
The resulting projection will have the form
𝑌̂ = P𝑋 𝑌 = ∑ᵢ₌₁ᴷ E(𝑋̃𝑖 ⋅ 𝑌) 𝑋̃𝑖 ,

where, for simplicity, we write P𝑋 for Psp(𝑋1 ,…,𝑋𝐾 )


Another symbol we use a lot is 𝑌̂ which is defined to be P𝑋 𝑌
The projection can be summarized neatly in matrix notation:
Theorem
Let 𝑋 ∶= (𝑋1 , 𝑋2 , … , 𝑋𝐾 )′ be a 𝐾 × 1 vector. Then

P𝑋 𝑌 = 𝑋 ′ 𝛽∗ ,

where 𝛽∗ ∶= (E(𝑋𝑋 ′ ))⁻¹ E(𝑋𝑌).

For simplicity we will write E(𝑋𝑋 ′ )⁻¹ for (E(𝑋𝑋 ′ ))⁻¹
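In sample-analog form the theorem is one line of linear algebra; here is a minimal numpy sketch (my illustration, not from the slides) that computes 𝛽∗ from the two moment matrices and checks the orthogonality condition E(𝑋(𝑌 − 𝑋 ′ 𝛽∗ )) = 0 implied by the projection theorem:

import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

X2 = rng.normal(size=n)
X3 = 0.6 * X2 + rng.normal(size=n)
X = np.column_stack([np.ones(n), X2, X3])               # row i holds X_i' = (1, X2_i, X3_i)
Y = 1.0 + 2.0 * X2 - 3.0 * X3 + rng.normal(size=n)

EXX = X.T @ X / n                                       # sample analog of E(XX')
EXY = X.T @ Y / n                                       # sample analog of E(XY)
beta_star = np.linalg.solve(EXX, EXY)                   # E(XX')^{-1} E(XY)
print(beta_star)                                        # ≈ [1.0, 2.0, -3.0]

Y_hat = X @ beta_star                                   # P_X Y = X' beta*, draw by draw
print(np.allclose(X.T @ (Y - Y_hat) / n, 0))            # orthogonality: E(X (Y - Y_hat)) = 0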

44 / 44
