Prediction Error Methods
David Di Ruscio
Telemark University College
N-3414 Porsgrunn, Norway
Contents
1 Introduction
6 Matlab implementation
  6.1 Tutorial: SS-PEM Toolbox for MATLAB
1 Introduction
An overview of the Prediction Error Method (PEM) for system identification is
given. Furthermore, a description of the common discrete time input and output
polynomial model structures, which are frequently used in prediction error system
identification methods, is presented in this note. In particular, the relationship between
the state space model and the input-output model is pointed out.
where ∆ = E(ek eTk ) is the covariance matrix of the innovations process and x̄1 is
the initial predicted state. Suppose now that the model is parameterized, i.e. so
that the free parameters in the model matrices (A, B, K, D, E, x1 ) are organized
into a parameter vector θ. The problem is to identify the ”best” parameter vector
from known output and input data matrices (Y, U ). The optimal predictor, i.e.
the optimal prediction, ȳk , for the output yk , is then of the form
with initial predicted state x̄1 . ȳk (θ) is the prediction of the output yk given
inputs u up to time k, outputs y up to time k − 1 and the parameter vector θ.
Note that if E = 0_{m×r}, then inputs u only up to time k − 1 are needed. The free
parameters in the system matrices are mapped into the parameter vector (or vice
versa). Note that the predictor ȳ_k(θ) is (only) optimal for the parameter vector
θ which minimizes some specified criterion. This criterion is usually a function of
the prediction errors. Note also that it is common to use the notation ȳ_{k|θ} for the
prediction. Hence, ȳ_{k|θ} = ȳ_k(θ).
Define the Prediction Error (PE)
$$\varepsilon_k(\theta) = y_k - \bar{y}_k(\theta). \qquad (5)$$
A good model is a model for which the model parameters θ result in a "small"
PE. Hence, it makes sense to use a PE criterion which measures the size of the
PE. A PE criterion is usually, in one way or another, defined as a scalar
function of the following important expression and definition
$$R_\varepsilon(\theta) = \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k(\theta)\,\varepsilon_k(\theta)^T \in \mathbb{R}^{m\times m}, \qquad (6)$$
which is the sample covariance matrix of the PE. This means that we want to find
the parameter vector which makes R_ε as small as possible. Hence, it makes sense
to use a PE criterion which measures the size of the sample covariance matrix of
the PE. A common scalar PE criterion for multivariable output systems is thus
$$V_N(\theta) = \frac{1}{N}\sum_{k=1}^{N} \mathrm{tr}\big(\varepsilon_k(\theta)\,\varepsilon_k(\theta)^T\big) = \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k(\theta)^T \varepsilon_k(\theta). \qquad (7)$$
Note that the trace of a matrix is equal to the sum of its diagonal elements and
that tr(AB) = tr(BA) for two matrices A and B of appropriate dimensions. We
will in the following give a discussion of the PE criterion as well as some variants
of it. Define a PE criterion as follows
$$V_N(\theta) = \frac{1}{N}\sum_{k=1}^{N} \ell\big(\varepsilon_k(\theta)\big), \qquad (8)$$
where `(·) is a scalar valued function, e.g. the Euclidean l2 norm, i.e.
$$\ell\big(\varepsilon_k(\theta)\big) = \|\varepsilon_k(\theta)\|_2^2 = \varepsilon_k(\theta)^T \varepsilon_k(\theta), \qquad (9)$$
or a quadratic function
$$\ell\big(\varepsilon_k(\theta)\big) = \varepsilon_k(\theta)^T \Lambda\, \varepsilon_k(\theta) = \mathrm{tr}\big(\Lambda\, \varepsilon_k(\theta)\,\varepsilon_k(\theta)^T\big), \qquad (10)$$
for some weighting matrix Λ. A common criterion for multiple output systems
(with weights) is thus also
$$V_N(\theta) = \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k(\theta)^T \Lambda\, \varepsilon_k(\theta) = \mathrm{tr}\Big(\Lambda\, \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k(\theta)\,\varepsilon_k(\theta)^T\Big), \qquad (11)$$
where we usually simply use Λ = I. The weighting matrix Λ is usually a
diagonal and positive definite matrix.
One should note that there exists an optimal weighting matrix Λ, but that this
matrix is difficult to define a-priori. The optimal weighting (when the number of
observations N is large) is defined from knowledge of the innovations noise
covariance matrix of the system, i.e., Λ = ∆^{-1} where ∆ = E(e_k e_k^T). An interesting
solution to this would be to define the weighting matrix from the subspace system
identification method DSR or DSR_e and use ∆ = F F^T, where F is computed by
the DSR algorithm.
The sample covariance matrix of the PE is positive semi-definite, i.e. R_ε ≥ 0.
The PE may be zero for deterministic systems; however, for combined deterministic
and stochastic systems we usually have that R_ε > 0, i.e. positive definite. In any
case R_ε is, of course, a symmetric matrix. The eigenvalues of a symmetric matrix
are all real. Define λ_1, . . . , λ_m as the eigenvalues of R_ε(θ) for use in the following
discussion.
Hence, a good parameter vector, θ, is one for which the sample covariance matrix
R_ε is small. The trace operator in the PE criterion (11) is a measure of the size of
the matrix R_ε(θ): the trace of a matrix is equal to the sum of its diagonal elements,
and for a symmetric matrix it is also equal to the sum of the eigenvalues, i.e.
tr(R_ε(θ)) = λ_1 + · · · + λ_m.
An alternative PE criterion which often is used is the determinant, i.e.,
$$V_N(\theta) = \det\big(R_\varepsilon(\theta)\big) = \lambda_1 \lambda_2 \cdots \lambda_m,$$
since the determinant of a matrix is equal to the product of its eigenvalues.
This leads us to a third alternative, which is to use the maximum eigenvalue of
R_ε(θ) as a measure of its size, i.e., we may use the PE criterion V_N(θ) = max_i λ_i(R_ε(θ)).
The parameter estimate is then defined as
$$\hat{\theta} = \hat{\theta}_N = \arg\min_{\theta \in D_M} V_N(\theta), \qquad (16)$$
where arg min denotes the operator which returns the argument that minimizes
the function. The subscript N is often omitted, hence θ̂ = θ̂_N and V(θ) = V_N(θ).
The definition (16) is a standard optimization problem. A simple solution is then
to use a software optimization algorithm which depends only on function
evaluations, i.e., where the user only has to define the PE criterion V(θ).
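As an illustration, the following MATLAB sketch minimizes a user-defined PE criterion using only function evaluations (here the built-in simplex search fminsearch). The criterion, the simulated data and the simple ARX(1,1) predictor inside it are illustrative placeholders, not part of the original text; only the overall pattern matters.

% Minimal sketch: minimize a PE criterion V(theta) using only function
% evaluations. The criterion is a stand-in: a scalar ARX(1,1) predictor
% y_bar(k) = theta(1)*y(k-1) + theta(2)*u(k-1).
N = 200;
u = randn(N,1);                                       % some input sequence
y = filter([0 0.5], [1 -0.8], u) + 0.05*randn(N,1);   % illustrative data

pe_criterion = @(theta) mean( ( y(2:N) ...
    - ( theta(1)*y(1:N-1) + theta(2)*u(1:N-1) ) ).^2 );   % V_N(theta)

theta0    = [0; 0];                                   % initial parameter vector
theta_hat = fminsearch(pe_criterion, theta0);         % function evaluations only
V_min     = pe_criterion(theta_hat);                  % criterion value at the minimum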
We will in the following give a discussion of how this may be done. The
minimizing parameter vector θ̂ = θ̂_N ∈ R^p has to be searched for in the parameter
space D_M by some iterative non-linear optimization method. Optimization
methods are usually constructed as variations of the Gauss-Newton method and
the Newton-Raphson method, i.e.,
$$\theta_{i+1} = \theta_i - \alpha H_i^{-1}(\theta_i)\, g_i(\theta_i), \qquad (17)$$
where α is a line search (or step length) scalar parameter chosen to ensure
convergence (i.e. chosen to ensure that V(θ_{i+1}) < V(θ_i)), and where the
gradient, g_i(θ_i), is
$$g_i(\theta_i) = \frac{dV(\theta_i)}{d\theta_i} \in \mathbb{R}^p, \qquad (18)$$
and
$$H_i(\theta_i) = \frac{dg_i(\theta_i)}{d\theta_i^T} = \frac{d}{d\theta_i^T}\Big(\frac{dV(\theta_i)}{d\theta_i}\Big) = \frac{d^2 V(\theta_i)}{d\theta_i^T d\theta_i} \in \mathbb{R}^{p\times p}, \qquad (19)$$
is the Hessian matrix. Remark that the Hessian is a symmetric matrix and that
it is positive definite at the minimum, i.e.,
$$H(\hat{\theta}) = \frac{d^2 V(\hat{\theta})}{d\hat{\theta}^T d\hat{\theta}} > 0, \qquad (20)$$
where θ̂ = θ̂N is the minimizing parameter vector. Note that the iteration scheme
(17) is identical to the Newton-Raphson method when α = 1. In practice one
often has to use a variable step length parameter α, both in order to stabilize
the algorithm and to improve the rate of convergence far from the minimum.
Once the gradient, g_i(θ_i), and the Hessian matrix H_i(θ_i) (or an approximation of
the Hessian) have been computed, we can choose the line search parameter, α, as
$$\alpha = \arg\min_\alpha V_N\big(\theta_{i+1}(\alpha)\big) = \arg\min_\alpha V_N\big(\theta_i - \alpha H_i^{-1}(\theta_i)\, g_i(\theta_i)\big). \qquad (21)$$
Equation (21) is an optimization problem for the line search parameter α. Once
α has been determined from the scalar optimization problem (21), the new
parameter vector θ_{i+1} is determined from (17).
The iteration process (17) must be initialized with an initial parameter vector,
θ_1. This was earlier (before the subspace methods) a problem. However, a good
solution is to use the parameters from a subspace identification method. Equation
(17) can then be implemented in a while or for loop, i.e., Equation (17) is iterated
for i = 1, . . . until convergence, i.e., until the gradient is sufficiently close to zero,
g_i(θ_i) ≈ 0, for some i ≥ 1. Only such a parameter vector is our estimate, i.e.
θ̂ = θ̂_N = θ_i when g(θ_i) ≈ 0.
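A minimal MATLAB sketch of the iteration (17) with a simple backtracking line search is given below. To keep the sketch self-contained and certainly correct, an illustrative quadratic test criterion with exact gradient and Hessian is used instead of a real PE criterion; the names and numbers are assumptions for illustration only.

% Sketch of the Newton-Raphson / Gauss-Newton type iteration (17) with a
% simple step-halving line search. The criterion is an illustrative
% quadratic test function V(theta) = 0.5*(theta - t0)'*Q*(theta - t0).
t0 = [1; -2];  Q = [4 1; 1 3];                 % "true" minimizer and curvature
V    = @(th) 0.5*(th - t0)'*Q*(th - t0);       % PE criterion stand-in
grad = @(th) Q*(th - t0);                      % gradient g(theta)
hess = @(th) Q;                                % Hessian H(theta)

theta = [10; 10];                              % initial parameter vector theta_1
for i = 1:50
    g = grad(theta);  H = hess(theta);
    if norm(g) < 1e-8, break, end              % stop when the gradient is ~ 0
    d = -H\g;                                  % Newton direction -H^{-1} g
    alpha = 1;                                 % try the full Newton step first
    while V(theta + alpha*d) >= V(theta) && alpha > 1e-8
        alpha = alpha/2;                       % halve the step until V decreases
    end
    theta = theta + alpha*d;                   % update (17)
end
theta_hat = theta;                             % here equal to t0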
The iteration equation (17) can be deduced from the fact that at the minimum
the gradient, g, is zero, i.e.,
$$g(\hat{\theta}) = \frac{dV(\hat{\theta})}{d\hat{\theta}} = 0. \qquad (22)$$
We can now use the Newton-Raphson method, which can be deduced as follows.
An expression for g(θ̂) is obtained from a Taylor series expansion of g around θ, i.e.,
$$0 = g(\hat{\theta}) \approx g(\theta) + \frac{dg(\theta)}{d\theta^T}\Big|_{\theta} (\hat{\theta} - \theta). \qquad (23)$$
Solving for θ̂ gives
$$\hat{\theta} = \theta - \Big(\frac{dg(\theta)}{d\theta^T}\Big|_{\theta}\Big)^{-1} g(\theta). \qquad (24)$$
This equation is the background for the iteration scheme (17), i.e., putting θ := θi
and θ̂ := θi+1 . Hence,
$$\theta_{i+1} = \theta_i - \Big(\frac{dg(\theta)}{d\theta^T}\Big|_{\theta_i}\Big)^{-1} g(\theta_i). \qquad (25)$$
Note that the parameter vector, θ, the gradient, g, and the Hessian matrix have
structures as follows
$$\theta = \begin{bmatrix} \theta_1 \\ \vdots \\ \theta_p \end{bmatrix} \in \mathbb{R}^p, \qquad (27)$$
$$g(\theta) = \frac{dV(\theta)}{d\theta} = \begin{bmatrix} g_1 \\ \vdots \\ g_p \end{bmatrix} = \begin{bmatrix} \frac{dV}{d\theta_1} \\ \vdots \\ \frac{dV}{d\theta_p} \end{bmatrix} \in \mathbb{R}^p, \qquad (28)$$
$$H = \frac{dg(\theta)}{d\theta^T} = \frac{d^2 V(\theta)}{d\theta^T d\theta} = \begin{bmatrix} \frac{dg_1}{d\theta_1} & \cdots & \frac{dg_1}{d\theta_p} \\ \vdots & \ddots & \vdots \\ \frac{dg_p}{d\theta_1} & \cdots & \frac{dg_p}{d\theta_p} \end{bmatrix} = \begin{bmatrix} \frac{d^2 V}{d\theta_1^2} & \cdots & \frac{d^2 V}{d\theta_p d\theta_1} \\ \vdots & \ddots & \vdots \\ \frac{d^2 V}{d\theta_1 d\theta_p} & \cdots & \frac{d^2 V}{d\theta_p^2} \end{bmatrix} \in \mathbb{R}^{p\times p}. \qquad (29)$$
The gradient and the Hessian can be computed numerically. However, it is usually
more efficient to compute them analytically if possible. Remark that a common
notation for the Hessian matrix is
$$H = \frac{d^2 V(\theta)}{d\theta^2}, \qquad (30)$$
and that the elements of the Hessian are given by
$$h_{ij} = \frac{\partial^2 V(\theta)}{\partial\theta_i \partial\theta_j}, \qquad (31)$$
7
with Λ = ∆^{-1} where ∆ = E(e_k e_k^T), has the same asymptotic covariance matrix
as the parameter estimate.
It can also be shown that the PEM parameter estimate for Gaussian distributed
disturbances, e_k, (33), or equivalently the PEM estimate (32) with the optimal
weighting, is identical to the Maximum Likelihood (ML) parameter estimate. The
PEM estimates (32) and (33) are both statistically optimal. The only drawback of
using (33), i.e., minimizing the determinant PE criterion V_N(θ) = det(R_ε(θ)), is
that it requires more numerical calculations than the trace criterion. On the
other hand, the evaluation of the PE criterion (32) with the optimal weighting
matrix Λ = ∆^{-1} is not (directly) realistic since the exact covariance matrix ∆
is not known in advance. However, note that an estimate of ∆ can be built up
during the iteration (optimization) process.
The parameter estimate θ̂_N is a random vector, i.e., Gaussian distributed
when e_k is Gaussian. This means that θ̂_N has a mean and a variance. We want
the mean to be as close to the true parameter vector as possible and the variance
to be as small as possible. We can show that the PEM estimate θ̂_N is consistent,
i.e., the mean of the parameter estimate, θ̂_N, converges to the true parameter
vector, θ_0, as N tends to infinity. In other words we have that E(θ̂_N) = θ_0.
Consider g(θ̂_N) = 0 expressed as a Taylor series expansion of g around the
true parameter vector θ_0, i.e.
Hence,
E(θ̂N ) = θ0 , (38)
because θ0 is deterministic.
The parameter estimates (33) and (32) with optimal weighting are efficient,
i.e., they ensure that the parameter covariance matrix
where ∆ = E(e_k e_k^T) = E(e_k^2) in the single output case, and with
$$\psi_k(\theta_0) = \frac{d\bar{y}_k(\theta)}{d\theta}\Big|_{\theta_0} \in \mathbb{R}^{p\times m}. \qquad (41)$$
Loosely speaking, Equation (40) states that the variance, P, of the parameter
estimate, θ̂_N, is "small" if the covariance matrix of ψ_k(θ_0) is large. This covariance
matrix is large if the predictor ȳ_k(θ) is "very" sensitive to (perturbations in) the
parameter vector θ.
The derivative of the scalar valued function ℓ = ℓ(ε_k(θ)) with respect to the
parameter vector θ can be expressed from the chain rule
2.3 Least Squares and the prediction error method
The optimal predictor, ȳ_k(θ), is generally a non-linear function of the unknown
parameter vector, θ. The PEM estimate can in this general case not be found
analytically. However, in some simple and special cases the predictor is a linear
function of the parameter vector. This problem has an analytical solution and the
corresponding PEM is known as the Least Squares (LS) method. We will in this
section give a short description of the solution to this problem.
$$y_k = E u_k + e_k, \qquad (46)$$
$$y_k = \varphi_k^T \theta + e_k, \qquad (47)$$
where
$$y_k = \varphi_k^T \theta_0 + e_k, \qquad (50)$$
Here φ_k is a vector of known variables (or quantities); these variables are
often called regression variables or regressors. The output variable y_k is also
called the regressed variable.
A natural predictor is as usual
$$\bar{y}_k(\theta) = \varphi_k^T \theta. \qquad (51)$$
We will in the following find the parameter estimate, θ̂N , which minimizes the
PE criterion
$$V_N(\theta) = \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k(\theta)^T \Lambda\, \varepsilon_k(\theta), \qquad (52)$$
where ε_k = y_k − φ_k^T θ is the Prediction Error (PE). The gradient matrix, ψ_k(θ),
defined in (44) is in this case given by
$$\psi_k(\theta) = \frac{\partial \bar{y}_k(\theta)}{\partial\theta} = \varphi_k \in \mathbb{R}^{p\times m}. \qquad (53)$$
Using this and the expression (45) for the gradient, g(θ), of the PE criterion,
VN (θ), gives
$$g(\theta) = \frac{\partial V_N(\theta)}{\partial\theta} = -2\,\frac{1}{N}\sum_{k=1}^{N} \varphi_k \Lambda (y_k - \varphi_k^T\theta) = -2\,\frac{1}{N}\sum_{k=1}^{N} \big(\varphi_k \Lambda y_k - \varphi_k \Lambda \varphi_k^T \theta\big). \qquad (54)$$
The OLS and the PEM estimates are here simply given by solving g(θ) = 0, i.e.,
$$\hat{\theta}_N = \Big(\sum_{k=1}^{N} \varphi_k \Lambda \varphi_k^T\Big)^{-1} \sum_{k=1}^{N} \varphi_k \Lambda y_k, \qquad (55)$$
if the indicated inverse (of the Hessian) exists. Note that the Hessian matrix in
this case is simply given by
$$H(\theta) = \frac{\partial g(\theta)}{\partial\theta^T} = \frac{2}{N}\sum_{k=1}^{N} \varphi_k \Lambda \varphi_k^T. \qquad (56)$$
The (symmetric) Hessian matrix should be positive definite, H(θ) > 0, for Eq.
(55) to be a minimum. This indicates choosing a positive definite weighting matrix Λ > 0.
The gradient vector in Eq. (54) may also be derived using the chain rule as
follows
$$g(\theta) = \frac{\partial V_N(\theta)}{\partial\theta} = \frac{1}{N}\sum_{k=1}^{N} \frac{\partial \varepsilon_k}{\partial\theta}\, \frac{\partial\big(\varepsilon_k(\theta)^T \Lambda\, \varepsilon_k(\theta)\big)}{\partial \varepsilon_k} = \frac{-2}{N}\sum_{k=1}^{N} \varphi_k \Lambda (y_k - \varphi_k^T\theta). \qquad (57)$$
2.3.3 Matrix derivation of the least squares method
For practical reasons when computing the least squares solution as well as for
the purpose of analyzing the statistical properties of the estimate it may be
convenient to write the linear regression (47) in vector/matrix form as follows
Y = Φθ0 + e, (58)
as the matrix of prediction errors. Hence we have that the PE criterion is given
by
$$V_N(\theta) = \frac{1}{N}\sum_{k=1}^{N} \varepsilon_k(\theta)^T \Lambda\, \varepsilon_k(\theta) = \frac{1}{N}\,\varepsilon^T \tilde{\Lambda}\, \varepsilon, \qquad (61)$$
The nice thing about (61) is that the summation is not present in the last
expression of the PE criterion. Hence we can obtain a more direct derivation of the
parameter estimate. The gradient in (54) can in this case simply be expressed as
$$g(\theta) = \frac{\partial V_N(\theta)}{\partial\theta} = -\frac{2}{N}\,\Phi^T \tilde{\Lambda}\,(Y - \Phi\theta).$$
The Hessian is given by
$$H(\theta) = \frac{\partial g(\theta)}{\partial\theta^T} = \frac{2}{N}\,\Phi^T \tilde{\Lambda}\,\Phi. \qquad (64)$$
Solving g(θ) = 0 gives the following expression for the estimate
$$\hat{\theta}_N = (\Phi^T \tilde{\Lambda}\,\Phi)^{-1}\Phi^T \tilde{\Lambda}\, Y, \qquad (65)$$
which is identical to (55). It can be shown that the optimal weighting is given by
Λ = ∆−1 . The solution (55) or (65) with the optimal weighting matrix Λ = ∆−1 is
known in the literature as the Best Linear Unbiased Estimate (BLUE). Choosing
Λ = Im in (55) or (65) gives the OLS solution.
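A minimal MATLAB sketch of the matrix form estimate (65); the data and the weighting below are illustrative placeholders, and W = I gives the OLS solution.

% Weighted least squares estimate (65): theta = (Phi' W Phi)^{-1} Phi' W Y.
% Single-output example with an illustrative regressor matrix.
N    = 100;
Phi  = [randn(N,1), ones(N,1)];        % regressor matrix (slope and intercept)
th0  = [2; -1];                        % "true" parameter vector
Y    = Phi*th0 + 0.1*randn(N,1);       % regressed variable with noise

W    = eye(N);                         % weighting (identity => OLS); choosing
                                       % W = inv(Delta_tilde) gives the BLUE
theta_hat = (Phi'*W*Phi) \ (Phi'*W*Y); % estimate (65), without forming an explicit inverse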
A short analysis of the parameter estimate is given in the following. Substi-
tuting (58) into (65) gives an expression of the difference between the parameter
estimate and the true parameter vector, i.e.,
θ̂N − θ0 = (ΦT Λ̃Φ)−1 ΦT Λ̃e. (66)
The parameter estimate (65) is an unbiased estimate since
E(θ̂N ) = θ0 + (ΦT Λ̃Φ)−1 ΦT Λ̃E(e) = θ0 . (67)
The covariance matrix of the parameter estimate is given by
P = E((θ̂N − θ0 )(θ̂N − θ0 )T ) = (ΦT Λ̃Φ)−1 ΦT Λ̃E(eeT )Λ̃Φ(ΦT Λ̃Φ)−1 . (68)
Suppose first that we choose the weighting matrix Λ̃ = ∆̃^{-1}, where ∆̃ = E(ee^T).
It should be noted that ∆̃ is a block diagonal matrix with ∆ = E(e_i e_i^T) on the
block diagonal. Then we have
$$P_{\mathrm{BLUE}} = E\big((\hat{\theta}_N - \theta_0)(\hat{\theta}_N - \theta_0)^T\big) = \Big(\sum_{k=1}^{N} \varphi_k \Delta^{-1} \varphi_k^T\Big)^{-1} = (\Phi^T \tilde{\Delta}^{-1}\Phi)^{-1}. \qquad (69)$$
2.3.4 Alternative matrix derivation of the least squares method
Another linear regression model formulation which is frequently used, e.g., in the
Chemometrics literature, is given by
$$Y = XB + E, \qquad (73)$$
with the OLS solution B_OLS = (X^T X)^{-1} X^T Y, where X^T X is non-singular.
Alternative approximations to the inverse of X^T X are obtained by reduced rank
analysis and the Singular Value Decomposition (SVD); a popular such approximation
is the Partial Least Squares Regression (PLS) method. In the next subsection we
mention the Ridge regression method.
When K = a this gives an ARX or linear regression model of the form
$$y_k = \varphi_k^T \theta + e_k, \qquad (78)$$
where
$$\varphi_k^T = \begin{bmatrix} y_{k-1} & u_{k-1} \end{bmatrix}, \qquad \theta = \begin{bmatrix} a \\ b \end{bmatrix}. \qquad (79)$$
This may also be written as a linear regression model as in Eq. (74) with
$$Y = \begin{bmatrix} y_2 \\ y_3 \\ \vdots \\ y_N \end{bmatrix}, \qquad X = \begin{bmatrix} y_1 & u_1 \\ y_2 & u_2 \\ \vdots & \vdots \\ y_{N-1} & u_{N-1} \end{bmatrix}, \qquad B = \begin{bmatrix} a \\ b \end{bmatrix}. \qquad (80)$$
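A minimal MATLAB sketch of forming the data matrices in (80) from recorded sequences y and u and computing the OLS estimate; the simulated first-order system and noise level are illustrative assumptions.

% Build Y and X as in (80) for a first-order ARX model and estimate [a; b]
% by ordinary least squares. The data generation is illustrative.
N  = 300;
a  = 0.8;  b = 0.5;                      % "true" parameters
u  = randn(N,1);
y  = zeros(N,1);
e  = 0.05*randn(N,1);
for k = 1:N-1
    y(k+1) = a*y(k) + b*u(k) + e(k+1);   % y_k = a*y_{k-1} + b*u_{k-1} + e_k
end

Y = y(2:N);                              % left hand side in (80)
X = [y(1:N-1), u(1:N-1)];                % regressor matrix in (80)
B_hat = X \ Y;                           % OLS estimate of [a; b]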
The acronym ARMAX comes from the fact that the term A(q)y_k is defined as
an Auto Regressive (AR) part, the term C(q)e_k is defined as a Moving Average
(MA) part, and the part B(q)u_k represents eXogenous (X) inputs. Note in
connection with this that an Auto Regressive (AR) model is of the form A(q)y_k =
e_k, i.e. with additive equation noise. The noise term, C(q)e_k, in a so-called ARMA
model A(q)y_k = C(q)e_k represents a moving average of the white noise e_k.
In general we have that the polynomials in Eq. (81) may be expressed as
when y_k, u_k and e_k are scalar signals and where na, nb and nc are the orders
of the A(q), B(q) and C(q) polynomials, respectively. Note also that the
polynomials may also be written as A(q) = A(q^{-1}), B(q) = B(q^{-1}) and C(q) =
C(q^{-1}), where q^{-1} is the shift operator such that
$$q^{-1} y_k = y_{k-1}. \qquad (85)$$
The ARMAX model structure can be deduced from a general linear SISO
state space model. MIMO systems are not considered in this section. This will be
illustrated in the following example.
Example 3.1 (Estimator canonical form to polynomial form) Consider the
following single input and single output discrete time state space model on
estimator canonical form
$$x_{k+1} = \begin{bmatrix} -a_1 & 1 \\ -a_0 & 0 \end{bmatrix} x_k + \begin{bmatrix} b_1 \\ b_0 \end{bmatrix} u_k + \begin{bmatrix} c_1 \\ c_0 \end{bmatrix} v_k \qquad (86)$$
$$y_k = \begin{bmatrix} 1 & 0 \end{bmatrix} x_k + e\, u_k + f\, v_k \qquad (87)$$
where u_k is a known deterministic input signal, v_k is an unknown white noise
process and x_k = [x_k^1 \; x_k^2]^T is the state vector.
An input output model formulation can be derived as follows
$$x_{k+1}^1 = -a_1 x_k^1 + x_k^2 + b_1 u_k + c_1 v_k \qquad (88)$$
$$x_{k+1}^2 = -a_0 x_k^1 + b_0 u_k + c_0 v_k \qquad (89)$$
$$y_k = x_k^1 + e\, u_k + f\, v_k \qquad (90)$$
Express Equation (88) with k := k + 1 and substitute for x_{k+1}^2 as given by Equation
(89). This gives an equation in terms of the first state x_k^1. Finally, eliminate x_k^1 by
using Equation (90). This gives
$$\begin{bmatrix} 1 & a_1 & a_0 \end{bmatrix} \begin{bmatrix} y_k \\ y_{k-1} \\ y_{k-2} \end{bmatrix} = \begin{bmatrix} e & b_1 + a_1 e & b_0 + a_0 e \end{bmatrix} \begin{bmatrix} u_k \\ u_{k-1} \\ u_{k-2} \end{bmatrix} + \begin{bmatrix} f & c_1 + a_1 f & c_0 + a_0 f \end{bmatrix} \begin{bmatrix} v_k \\ v_{k-1} \\ v_{k-2} \end{bmatrix} \qquad (91)$$
Let us introduce the backward shift operator q^{-1} such that q^{-1} u_k = u_{k-1}. This
gives the following polynomial (or transfer function) model
$$\underbrace{(1 + a_1 q^{-1} + a_0 q^{-2})}_{A(q)}\, y_k = \underbrace{\big(e + (b_1 + a_1 e) q^{-1} + (b_0 + a_0 e) q^{-2}\big)}_{B(q)}\, u_k + \underbrace{\big(f + (c_1 + a_1 f) q^{-1} + (c_0 + a_0 f) q^{-2}\big)}_{C(q)}\, v_k \qquad (92)$$
which is equal to an ARMAX model structure.
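As a quick numerical check of the conversion in Example 3.1, the following MATLAB sketch simulates the state space model (86)-(87) and the difference equation (91) with the same input and noise sequences and compares the outputs; the numeric parameter values are arbitrary illustrations, not from the text.

% Compare the state space model (86)-(87) with the input-output form (91)/(92).
a1 = -1.2; a0 = 0.5;  b1 = 0.4; b0 = 0.3;  c1 = 0.2; c0 = 0.1;  e = 0; f = 1;
A = [-a1 1; -a0 0];  B = [b1; b0];  C = [c1; c0];  D = [1 0];

N = 50;
u = randn(N,1);  v = 0.1*randn(N,1);

% Simulate the state space model
x = [0; 0];  y_ss = zeros(N,1);
for k = 1:N
    y_ss(k) = D*x + e*u(k) + f*v(k);
    x = A*x + B*u(k) + C*v(k);
end

% Simulate the difference equation (91): y_k = -a1*y_{k-1} - a0*y_{k-2} + ...
y_io = zeros(N,1);
for k = 1:N
    ym1 = 0; ym2 = 0; um1 = 0; um2 = 0; vm1 = 0; vm2 = 0;
    if k > 1, ym1 = y_io(k-1); um1 = u(k-1); vm1 = v(k-1); end
    if k > 2, ym2 = y_io(k-2); um2 = u(k-2); vm2 = v(k-2); end
    y_io(k) = -a1*ym1 - a0*ym2 + e*u(k) + (b1+a1*e)*um1 + (b0+a0*e)*um2 ...
              + f*v(k) + (c1+a1*f)*vm1 + (c0+a0*f)*vm2;
end

max(abs(y_ss - y_io))   % should be (numerically) zero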
Example 3.2 (Observability canonical form to polynomial form)
Consider the following single input and single output discrete time state space
model on observability canonical form
$$x_{k+1} = \begin{bmatrix} 0 & 1 \\ -a_1 & -a_0 \end{bmatrix} x_k + \begin{bmatrix} b_0 \\ b_1 \end{bmatrix} u_k + \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} v_k, \qquad (93)$$
$$y_k = \begin{bmatrix} 1 & 0 \end{bmatrix} x_k + e\, u_k + f\, v_k, \qquad (94)$$
where u_k is a known deterministic input signal, v_k is an unknown white noise
process and x_k = [x_k^1 \; x_k^2]^T is the state vector. An input and output model
can be derived by eliminating the states in (93) and (94). We have
Solve (95) for x_k^2 and substitute into (96). This gives an equation in x_k^1, i.e.,
Solve (97) for x_k^1 and substitute into (98). This gives
3.2 ARX model structure
An Auto Regressive (AR) model A(q)y_k = e_k with eXtra (or eXogenous) inputs,
i.e. an ARX model, can be expressed as follows
$$A(q)\, y_k = B(q)\, u_k + e_k, \qquad (105)$$
where the term A(q)y_k is the Auto Regressive (AR) part and the term B(q)u_k
represents the part with the eXtra (X) inputs. The eXtra variables u_k are called
eXogenous in econometrics. Note also that the ARX model is known as an
equation error model because of the additive error or white noise term e_k.
It is important to note that the parameters in an ARX model, e.g. as defined
in Equation (105) where e_k is a white noise process, can be identified directly
by the Ordinary Least Squares (OLS) method, if the inputs, u_k, are informative
enough. A problem with the ARX structure is of course that the noise model
(additive noise) is too simple in many practical cases.
Example 3.3 (ARX and State Space model structure) Comparing the ARX
model structure (105) with the more general model structure given by (92) we find
that the state space model (86) and (87) is equivalent to an ARX model if
c_1 = −a_1 f and c_0 = −a_0 f, so that C(q)v_k = f v_k = e_k. The resulting ARX model,
from the above discussion, has the following state space model equivalent
$$x_{k+1} = \begin{bmatrix} -a_1 & 1 \\ -a_0 & 0 \end{bmatrix} x_k + \begin{bmatrix} b_1 \\ b_0 \end{bmatrix} u_k + \begin{bmatrix} -a_1 \\ -a_0 \end{bmatrix} f\, v_k \qquad (108)$$
$$y_k = \begin{bmatrix} 1 & 0 \end{bmatrix} x_k + e\, u_k + f\, v_k \qquad (109)$$
It is at first sight not easy to see that this state space model is equivalent to an
ARX model. It is also important to note that the ARX model has a state space
model equivalent. The noise term e_k = f v_k is white noise if v_k is white noise. The
noise e_k appears as both process (state equation) noise and measurement (output
equation) noise. The noise is therefore filtered through the process dynamics.
3.3 OE model structure
An Output Error (OE) model structure can be represented as the following poly-
nomial model
$$y_k = \frac{B(q)}{A(q)}\, u_k + e_k. \qquad (111)$$
Example 3.4 (OE and State Space model structure) From (92) we find that
the state space model (86) and (87) is equivalent to an OE model if
$$c_1 = 0, \quad c_0 = 0, \quad f = 1, \quad v_k = e_k. \qquad (112)$$
With these values the process noise vanishes from the state equation, and the state
space model equivalent becomes
$$x_{k+1} = \begin{bmatrix} -a_1 & 1 \\ -a_0 & 0 \end{bmatrix} x_k + \begin{bmatrix} b_1 \\ b_0 \end{bmatrix} u_k,$$
$$y_k = \begin{bmatrix} 1 & 0 \end{bmatrix} x_k + e\, u_k + e_k. \qquad (115)$$
Note that the noise term e_k appears in the state space model as an equivalent
measurement noise term or output error term.
or for single output systems
$$y_k = \frac{B(q)}{F(q)}\, u_k + \frac{C(q)}{D(q)}\, e_k. \qquad (117)$$
In the BJ model the moving average noise (coloured noise) term C(q)e_k is filtered
through the dynamics represented by the polynomial D(q). Similarly, the dynamics
in the path from the inputs u_k are represented by the polynomial F(q).
However, it is important to note that the BJ model structure can be repre-
sented by an equivalent state space model structure.
3.5 Summary
The family of polynomial model structures:
BJ: Box-Jenkins
ARMAX: Auto Regressive Moving Average with eXtra inputs
ARX: Auto Regressive with eXtra inputs
ARIMAX: Auto Regressive Integrating Moving Average with eXtra inputs
OE: Output Error
(118)
When using the prediction error methods for system identification, the model
structure and the orders of the polynomials need to be specified. It is important
that the prediction error which is to be minimized is a function of as few unknown
parameters as possible. Note also that all of the model structures discussed above
are linear models. However, the optimization problem of computing the unknown
parameters is highly non-linear. Hence, we can run into numerical problems.
This is especially the case for multiple output and MIMO systems.
The subspace identification methods, e.g. DSR, are based on the general state
space model defined by (119) and (120). The subspace identification methods
are therefore flexible enough to identify systems described by all the polynomial
model structures in (118). Note also that not even the system order n needs to be
specified beforehand when using the subspace identification method.
4 Optimal one-step-ahead predictions
4.1 State Space Model
Consider the innovations model
where K is the Kalman filter gain matrix. A predictor ȳ_k for y_k can be defined
as the first two terms on the right hand side of (122), i.e.
The equations for the (optimal) predictor (Kalman filter) are therefore given by
Hence, the optimal prediction, ȳ_k, of the output y_k can simply be obtained by
simulating (126) and (127) with a specified initial predicted state x̄_1. The result
is known as the one-step-ahead prediction. The name one-step-ahead predictor
comes from the fact that the prediction of y_{k+1} is based upon all outputs up to
time k as well as all relevant inputs. This can be seen by writing (126) and (127)
as
where
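A minimal MATLAB sketch of simulating the one-step-ahead predictor written on prediction form (cf. (150)-(151) in Section 6): x̄_{k+1} = A x̄_k + B u_k + K(y_k − D x̄_k − E u_k), ȳ_k = D x̄_k + E u_k. The system matrices and data below are illustrative placeholders.

% One-step-ahead predictions from a Kalman filter (innovations) model.
% Illustrative second order single-output example.
A = [0.7 0.2; 0 0.5];  B = [1; 0.5];  D = [1 0];  E = 0;  K = [0.3; 0.1];

N  = 100;
u  = randn(N,1);
e  = 0.1*randn(N,1);

% Generate data from the innovations model
x = [0; 0];  y = zeros(N,1);
for k = 1:N
    y(k) = D*x + E*u(k) + e(k);
    x    = A*x + B*u(k) + K*e(k);
end

% One-step-ahead predictor driven by the measured outputs
xb = [0; 0];  yb = zeros(N,1);              % initial predicted state x_bar_1
for k = 1:N
    yb(k) = D*xb + E*u(k);                  % prediction of y_k
    xb    = A*xb + B*u(k) + K*(y(k) - yb(k));   % predicted state update
end

innovations = y - yb;                       % close to e (here identical by construction)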
4.2 Input-output model
The linear system can be expressed as the following input and output polynomial
model
$$y_k = G(q)\, u_k + H(q)\, e_k. \qquad (132)$$
The noise term e_k can be expressed as
$$H^{-1}(q)\, y_k = H^{-1}(q) G(q)\, u_k + e_k. \qquad (133)$$
Adding y_k to both sides and rearranging gives
$$y_k = \big(I - H^{-1}(q)\big) y_k + H^{-1}(q) G(q)\, u_k + e_k. \qquad (134)$$
The prediction of the output is given by the first two terms on the right hand
side of (134) since e_k is white and therefore cannot be predicted, i.e.,
$$\bar{y}_k(\theta) = \big(I - H^{-1}(q)\big) y_k + H^{-1}(q) G(q)\, u_k. \qquad (135)$$
Loosely speaking, the optimal prediction of y_k is given by the predictor, ȳ_k, such
that a measure of the difference e_k = y_k − ȳ_k(θ), e.g., the variance, is minimized
with respect to the parameter vector θ over the data horizon. We also assume
that a sufficient model structure is used, and that the unknown parameters are
collected in θ. Ideally, the prediction error e_k will be white.
The (white) noise vectors e_{k+1} and e_{k+2} are in the future, and they cannot
be predicted because they are white. The best prediction of y_{k+2} is then obtained
by putting e_{k+1} = 0 and e_{k+2} = 0. Hence, the predictor for j = 2 is
This can simply be generalized for j > 2 as presented in the following lemma.
where the initial state x_{k+1} = x̄_{k+1} is given from the Kalman filter state equation
where the initial state x̄_1 is known and specified. This means that all outputs up
to time k, and all relevant inputs, are used to predict the outputs at times k + 1,
. . ., k + M.
Given the Kalman filter matrices (A, B, D, K) of a strictly proper system, and
the initial predicted state x̄_k, the predictions ȳ_{k+1}, ȳ_{k+2} and ȳ_{k+3} can be written
in compact form as follows
$$\begin{bmatrix} \bar{y}_{k+1} \\ \bar{y}_{k+2} \\ \bar{y}_{k+3} \end{bmatrix} = \begin{bmatrix} D \\ DA \\ DA^2 \end{bmatrix}(A - KD)\,\bar{x}_k + \begin{bmatrix} D \\ DA \\ DA^2 \end{bmatrix} K\, y_k + \begin{bmatrix} DB & 0 & 0 \\ DAB & DB & 0 \\ DA^2B & DAB & DB \end{bmatrix} \begin{bmatrix} u_k \\ u_{k+1} \\ u_{k+2} \end{bmatrix}. \qquad (147)$$
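A minimal MATLAB sketch of the compact form (147), checked against running the predictor recursion directly; the matrices and signals are illustrative placeholders.

% Multi-step predictions (147) for a strictly proper system (E = 0).
A = [0.7 0.2; 0 0.5];  B = [1; 0.5];  D = [1 0];  K = [0.3; 0.1];
xb = [0.2; -0.1];                       % current predicted state x_bar_k
yk = 1.3;  u = [0.5; -0.2; 0.8];        % y_k and u_k, u_{k+1}, u_{k+2}

Gamma = [D; D*A; D*A^2];                % extended observability matrix
Hd    = [D*B      0     0;
         D*A*B    D*B   0;
         D*A^2*B  D*A*B D*B];           % lower triangular Toeplitz matrix in (147)
ybar  = Gamma*(A - K*D)*xb + Gamma*K*yk + Hd*u;   % [y_bar_{k+1}; y_bar_{k+2}; y_bar_{k+3}]

% Check against the predictor recursion: one Kalman update, then pure simulation
x  = (A - K*D)*xb + B*u(1) + K*yk;      % x_bar_{k+1}
y1 = D*x;  x = A*x + B*u(2);            % y_bar_{k+1}, then x_bar_{k+2}
y2 = D*x;  x = A*x + B*u(3);            % y_bar_{k+2}, then x_bar_{k+3}
y3 = D*x;                               % y_bar_{k+3}
max(abs(ybar - [y1; y2; y3]))           % should be (numerically) zero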
6 Matlab implementation
We will in this section illustrate a simple but general MATLAB implementation
of a Prediction Error Method (PEM).
>> [A,B,D,E,K,x1]=sspem(Y,U,n);
A typical drawback with PEMs is that the system order, n, has to be specified
beforehand. A good solution is to analyze and identify the system order, n, by
the subspace algorithm DSR. The identified model is represented in observability
canonical form, i.e. a state space realization with as few free parameters as
possible. This realization can be illustrated as follows. Consider a system with
m = 2 outputs, r = 2 inputs and n = 3 states, which can be represented by
a linear model. The resulting model from sspem.m will be on Kalman filter
innovations form, i.e.,
xk+1 = Axk + Buk + Kek , (148)
yk = Dxk + Euk + ek , (149)
with initial predicted state x1 given, or on prediction form, i.e.,
xk+1 = Axk + Buk + K(yk − ȳk ), (150)
ȳk (θ) = Dxk + Euk , (151)
and where the model matrices (A, B, D, E, K) and the initial predicted state x1
are given by
$$A = \begin{bmatrix} 0 & 0 & 1 \\ \theta_1 & \theta_3 & \theta_5 \\ \theta_2 & \theta_4 & \theta_6 \end{bmatrix}, \quad B = \begin{bmatrix} \theta_7 & \theta_{10} \\ \theta_8 & \theta_{11} \\ \theta_9 & \theta_{12} \end{bmatrix}, \quad K = \begin{bmatrix} \theta_{13} & \theta_{16} \\ \theta_{14} & \theta_{17} \\ \theta_{15} & \theta_{18} \end{bmatrix}, \qquad (152)$$
$$D = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \quad E = \begin{bmatrix} \theta_{19} & \theta_{21} \\ \theta_{20} & \theta_{22} \end{bmatrix}, \quad x_1 = \begin{bmatrix} \theta_{23} \\ \theta_{24} \\ \theta_{25} \end{bmatrix}. \qquad (153)$$
This model is on a so-called observability canonical form, i.e. the third order
model has as few free parameters as possible. The number of free parameters
is in this example 25. The observability canonical form is such that the D
matrix is filled with ones and zeros, D = [I_m  0_{m×(n−m)}]. Furthermore,
the concatenation of the D matrix and the first n − m rows of the A matrix is
the identity matrix, i.e.
$$\begin{bmatrix} D \\ A(1:n-m,\,:) \end{bmatrix} = I_n.$$
This may be used as a rule when writing up the first n − m rows of the A matrix.
The rest of the A matrix as well as the matrices B, K, E and x_1 are filled with
free parameters. However, the user does not have to deal with the construction of
the canonical form when using the SSPEM toolbox.
The total number of free parameters in a linear state space model, and hence in
the parameter vector θ ∈ R^p, is in general
$$p = mn + nr + nm + mr + n = (2m + r + 1)n + mr. \qquad (154)$$
An existing DSR model can also be refined as follows. Suppose now that a
state space model, i.e. the matrices (A, B, D, E, K, x1) are identified by the DSR
algorithm as follows
>> [A,B,D,E,C,F,x1]=dsr(Y,U,L);
>> K=C*inv(F);
This model can be transformed to observability canonical form and the free pa-
rameters in this model can be mapped into the parameter vector θ1 by the func-
tion ss2thp.m, as follows
>> th_1=ss2thp(A,B,D,E,K,x1);
The model parameter vector θ1 can be further refined by using these parameters
as initial values to the PEM method sspem.m. E.g. as follows
>> [A,B,D,E,K,x1,V,th]=sspem(Y,U,n,th_1);
The value of the prediction error criterion, V (θ), is returned in V . The new and
possibly better (more optimal) parameter vector is returned in th.
Note that for a given parameter vector, the state space model matrices
can be constructed by
>> [A,B,D,E,K,x1]=thp2ss(th,n,m,r);
It is also worth noting that the value, V(θ), of the prediction error criterion
is evaluated by the MATLAB function vfun_mo.m. The data matrices Y and
U and the system order, n, must first be defined as global variables, i.e.
>> global Y U n
>> V=vfun_mo(th);
where th is the parameter vector. Note that the parameter vector must be of
length p as explained above.
See also the function ss2cf.m in the D-SR Toolbox for MATLAB which re-
turns an observability form state space realization from a state space model.
7 Recursive ordinary least squares method
We start this section by a simple example of how the mean of a variable, say yk ,
may be recursively estimated.
Example 7.1
The mean of a variable y_k at present time t may be expressed as
$$\bar{y}_t = \frac{1}{t}\sum_{k=1}^{t} y_k, \qquad (155)$$
which may be computed recursively as ȳ_t = ȳ_{t−1} + (1/t)(y_t − ȳ_{t−1}).
The weighted least squares estimate (55) based on data up to time t is
$$\hat{\theta}_t = \Big(\sum_{k=1}^{t} \varphi_k \Lambda \varphi_k^T\Big)^{-1} \sum_{k=1}^{t} \varphi_k \Lambda y_k. \qquad (159)$$
Define
$$P_t = \Big(\sum_{k=1}^{t} \varphi_k \Lambda \varphi_k^T\Big)^{-1}. \qquad (160)$$
From this definition we have that
$$P_t = \Big(\sum_{k=1}^{t-1} \varphi_k \Lambda \varphi_k^T + \varphi_t \Lambda \varphi_t^T\Big)^{-1}, \qquad (161)$$
which gives
$$P_t^{-1} = P_{t-1}^{-1} + \varphi_t \Lambda \varphi_t^T. \qquad (162)$$
Step 1 Initial values for Pt=0 and θ̂t=0 . It is common practice to take Pt=0 = ρIp
with ρ a ”large” constant and θ̂t=0 = 0 without any a-priori information.
and
where
$$A = \begin{bmatrix} 0 & 1 \\ a_1 & a_2 \end{bmatrix}, \quad B = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}, \quad C = \begin{bmatrix} c_1 \\ c_2 \end{bmatrix}, \quad D = \begin{bmatrix} 1 & 0 \end{bmatrix}. \qquad (177)$$
$$y_{k+2} = a_2 y_{k+1} + a_1 y_k + b_1 u_{k+1} + (b_2 - a_2 b_1) u_k + e_{k+2} + (c_1 - a_2) e_{k+1} + (c_2 - a_2 c_1 - a_1) e_k \qquad (178)$$
This can be written as a linear regression model
$$y_t = \underbrace{\begin{bmatrix} y_{t-1} & y_{t-2} & u_{t-1} & u_{t-2} \end{bmatrix}}_{\varphi_t^T}\, \underbrace{\begin{bmatrix} a_2 \\ a_1 \\ b_1 \\ b_2 - a_2 b_1 \end{bmatrix}}_{\theta} + e_t. \qquad (180)$$
An ROLS algorithm for the estimation of the parameter vector is implemented
in the following MATLAB script.
N=200;
rand('state',0), randn('seed',0)
u=randn(N,1);
u=prbs1(N,10,40);
randn('seed',0)
Th(t,:)=th'; % Store the parameter estimates.
end
Tm(i,:)=th';
end
th
th0=[a2;a1;b1;b2-a2*b1]
figure(1)
subplot(411), plot(Th(:,1)), ylabel('a_2'), title('ROLS example')
subplot(412), plot(Th(:,2)), ylabel('a_1')
subplot(413), plot(Th(:,3)), ylabel('b_1')
subplot(414), plot(Th(:,4)), ylabel('b_2-a_2b_1')
xlabel('Discrete time: t')
figure(2)
subplot(211), plot(u), ylabel('u_k'), title('ROLS example')
subplot(212), plot(y), ylabel('y_k')
xlabel('Discrete time: t')
Using (186) in (185) gives
θ̂t = θ̂t−1 + Kt (yt − ϕTt θ̂t−1 ). (191)
Hence, θ̄_t = θ̂_{t−1}. Finally, the comparison between ROLS and the Kalman filter
also shows that
$$\Lambda = W^{-1}. \qquad (192)$$
Hence, the optimal weighting matrix in the OLS algorithm is the inverse of the
measurement noise covariance matrix.
The formulation of the Kalman filter gain presented in (184) is slightly different
from the more common formulation, viz.
$$K_t = \bar{X}_t D_t^T (W + D_t \bar{X}_t D_t^T)^{-1}. \qquad (193)$$
In order to prove that (184) and (193) are equivalent we substitute X̂_t into the
first expression in (184), i.e.,
$$\begin{aligned} K_t &= \hat{X}_t D_t^T W^{-1} \\ &= \big(\bar{X}_t - \bar{X}_t D_t^T (W + D_t \bar{X}_t D_t^T)^{-1} D_t \bar{X}_t\big) D_t^T W^{-1} \\ &= \bar{X}_t D_t^T W^{-1} - \bar{X}_t D_t^T (W + D_t \bar{X}_t D_t^T)^{-1} D_t \bar{X}_t D_t^T W^{-1} \\ &= \bar{X}_t D_t^T \big(W^{-1} - (W + D_t \bar{X}_t D_t^T)^{-1} D_t \bar{X}_t D_t^T W^{-1}\big) \\ &= \bar{X}_t D_t^T (W + D_t \bar{X}_t D_t^T)^{-1}\big((W + D_t \bar{X}_t D_t^T) W^{-1} - D_t \bar{X}_t D_t^T W^{-1}\big) \\ &= \bar{X}_t D_t^T (W + D_t \bar{X}_t D_t^T)^{-1}. \end{aligned} \qquad (194)$$
where we simply have set N = t, omitted the averaging factor 1/t, and included
a forgetting factor λ in order to weight the newest data more than old data. We
typically have 0 < λ ≤ 1, often λ = 0.99.
The least squares estimate is given by
$$\hat{\theta}_t = P_t \sum_{k=1}^{t} \lambda^{t-k} \varphi_k \Lambda y_k, \qquad (196)$$
where
$$P_t = \Big(\sum_{k=1}^{t} \lambda^{t-k} \varphi_k \Lambda \varphi_k^T\Big)^{-1}. \qquad (197)$$
7.2.1 Recursive computation of Pt
Let us derive a recursive formulation for the covariance matrix Pt . We have from
the definition of Pt that
$$P_t = \Big(\sum_{k=1}^{t} \lambda^{t-k} \varphi_k \Lambda \varphi_k^T\Big)^{-1} = \Big(\sum_{k=1}^{t-1} \lambda^{t-k} \varphi_k \Lambda \varphi_k^T + \varphi_t \Lambda \varphi_t^T\Big)^{-1} = \Big(\lambda \underbrace{\sum_{k=1}^{t-1} \lambda^{t-1-k} \varphi_k \Lambda \varphi_k^T}_{P_{t-1}^{-1}} + \varphi_t \Lambda \varphi_t^T\Big)^{-1}. \qquad (198)$$
Using that
$$P_{t-1} = \Big(\sum_{k=1}^{t-1} \lambda^{t-1-k} \varphi_k \Lambda \varphi_k^T\Big)^{-1} \qquad (199)$$
gives
$$P_t = (\lambda P_{t-1}^{-1} + \varphi_t \Lambda \varphi_t^T)^{-1}. \qquad (200)$$
Using the matrix inversion lemma (173) we have the more common covariance
update equation
$$\lambda P_t = P_{t-1} - P_{t-1}\varphi_t (\lambda \Lambda^{-1} + \varphi_t^T P_{t-1}\varphi_t)^{-1} \varphi_t^T P_{t-1}. \qquad (201)$$
Substituting (203) into (202) gives
$$\hat{\theta}_t = P_t (\lambda P_{t-1}^{-1} \hat{\theta}_{t-1} + \varphi_t \Lambda y_t). \qquad (204)$$
From (200) we have that
$$\lambda P_{t-1}^{-1} = P_t^{-1} - \varphi_t \Lambda \varphi_t^T. \qquad (205)$$
Substituting this into (204) gives
$$\hat{\theta}_t = P_t\big((P_t^{-1} - \varphi_t \Lambda \varphi_t^T)\hat{\theta}_{t-1} + \varphi_t \Lambda y_t\big) = \hat{\theta}_{t-1} + P_t \varphi_t \Lambda y_t - P_t \varphi_t \Lambda \varphi_t^T \hat{\theta}_{t-1}, \qquad (206)$$
hence,
$$\hat{\theta}_t = \hat{\theta}_{t-1} + P_t \varphi_t \Lambda (y_t - \varphi_t^T \hat{\theta}_{t-1}). \qquad (207)$$
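A minimal MATLAB sketch of the recursion defined by the covariance update (201) and the parameter update (207) for a scalar output with Λ = 1; the simulated ARX data, the forgetting factor and the initialization P_0 = ρI are illustrative choices.

% Recursive (weighted) least squares with forgetting factor lambda,
% implementing the updates (201) and (207) for a scalar output (Lambda = 1).
N      = 400;
a      = 0.8;  b = 0.5;                        % "true" ARX(1,1) parameters
u      = randn(N,1);
y      = zeros(N,1);
for k = 2:N
    y(k) = a*y(k-1) + b*u(k-1) + 0.05*randn;   % illustrative data generation
end

lambda = 0.99;                                 % forgetting factor
p      = 2;
P      = 1e4*eye(p);                           % P_0 = rho*I with rho "large"
th     = zeros(p,1);                           % theta_0 = 0
Th     = zeros(N,p);                           % storage for the estimates

for t = 2:N
    phi = [y(t-1); u(t-1)];                    % regressor phi_t
    % Covariance update (201): lambda*P_t = P_{t-1} - ...
    P   = ( P - (P*phi)*(phi'*P) / (lambda + phi'*P*phi) ) / lambda;
    % Parameter update (207)
    th  = th + P*phi*( y(t) - phi'*th );
    Th(t,:) = th';
end
th                                             % should be close to [a; b]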
1. Identify a higher order ARX model using the OLS approach. The order of
the ARX model should be chosen large enough, such that the residual is
approximately white noise. One has to ensure that the input or reference
signal experiment is rich enough with perturbations in order for the OLS
problem to be well defined.
2. Form a higher order state space model from the ARX model parameters
and then form the necessary number of impulse response matrices. The
impulse response matrices may also be formed directly from the ARX model
parameters.
3. Use Hankel matrix realization theory to compute a reduced order state space
model of correct order. A sketch of this procedure is given below.
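A minimal MATLAB sketch of the three steps above for a SISO system: a high order ARX model estimated by OLS, impulse responses of the deterministic part, and a Hankel matrix (Ho-Kalman/Kung type) realization of reduced order n. All data, orders and dimensions are illustrative, and the realization step is one standard choice, not necessarily the exact variant intended in the text.

% Step 0: illustrative data from a 2nd order system
N = 2000;  u = randn(N,1);
atrue = [1 -1.5 0.7];  btrue = [0 0.5 0.2];
y = filter(btrue, atrue, u) + 0.05*randn(N,1);

% Step 1: higher order ARX model by OLS
na = 10;  nb = 10;                                   % "high enough" ARX orders
Phi = [];
for i = 1:na, Phi = [Phi, y(na+1-i:N-i)]; end        % past outputs
for i = 1:nb, Phi = [Phi, u(na+1-i:N-i)]; end        % past inputs
th   = Phi \ y(na+1:N);                              % OLS estimate
ahat = [1; -th(1:na)]';  bhat = [0; th(na+1:na+nb)]';

% Step 2: impulse responses of the deterministic part G(q) = B(q)/A(q)
L = 40;
h = filter(bhat, ahat, [1; zeros(2*L,1)]);           % h(1)=h_0=0, h(2)=h_1, ...

% Step 3: Hankel matrix realization of order n
n  = 2;
H1 = hankel(h(2:L+1), h(L+1:2*L));                   % Hankel matrix of h_1 ... h_{2L-1}
H2 = hankel(h(3:L+2), h(L+2:2*L+1));                 % shifted Hankel matrix
[U,S,V] = svd(H1);
Un = U(:,1:n); Sn = S(1:n,1:n); Vn = V(:,1:n);
Ob = Un*sqrtm(Sn);  Co = sqrtm(Sn)*Vn';              % observability/controllability factors
A  = Ob \ (H2 / Co);                                 % reduced order system matrix
B  = Co(:,1);                                        % first column of the controllability factor
D  = Ob(1,:);                                        % first row of the observability factor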
8.1 Miscellaneous examples
Example 8.1 (Higher order ARX model)
Given a 1st order state space model in the form
xk+1 = axk + buk + cek , (208)
yk = xk + ek . (209)
This can be written as the following ARMAX model
yk + a1 yk−1 = b1 uk−1 + ek + c1 ek−1 , (210)
where a1 = −a, b1 = b and c1 = c − a.
Let us now investigate how well the parameters can be estimated by a third
order ARX model.
Putting k := k + 1 in (210) and substituting for e_k, solved from (210), into this
equation gives a second order difference equation. Repeating this gives the
following third order difference model
$$y_{k+2} + (a_1 - c_1) y_{k+1} + (c_1^2 - c_1 a_1) y_k + c_1^2 a_1 y_{k-1} = b_1 u_{k+1} - c_1 b_1 u_k + c_1^2 b_1 u_{k-1} + e_{k+2} + c_1^3 e_{k-1}. \qquad (211)$$
This can be written as the approximate linear regression model
$$y_k = \underbrace{\begin{bmatrix} y_{k-1} & y_{k-2} & y_{k-3} & u_{k-1} & u_{k-2} & u_{k-3} \end{bmatrix}}_{\varphi_k^T}\, \underbrace{\begin{bmatrix} c_1 - a_1 \\ c_1 a_1 - c_1^2 \\ -c_1^2 a_1 \\ b_1 \\ -c_1 b_1 \\ c_1^2 b_1 \end{bmatrix}}_{\theta} + e_k + c_1^3 e_{k-3}. \qquad (212)$$
The parameters in this model can be approximately estimated via an ARX model
solved by the Ordinary Least Squares (OLS) method when c_1^3 = (c − a)^3 ≈ 0. Note
also that the predictor (Kalman filter) on innovations form is stable. This means
that the magnitude of the eigenvalue of the predictor, a − cd, is less than one.
Hence, c_1^3 = (c − a)^3 ≈ 0 is a good assumption.
Note also that the noise term may be expressed as
Putting k := k − 1 in (216) and substituting into (215) gives the 2nd order ARX
model approximation
when (c − a)^2 ≈ 0. Similarly, expressing e_{k−2} from (216) and substituting into
(217) gives the 3rd order ARX model approximation
when (c − a)^3 ≈ 0.
$$\ln(p) = a - \frac{b}{T + c}. \qquad (221)$$
Multiplying (221) with T + c gives
and from eq. (222) we find the linear regression model eq. (220).
Example 9.2 (Parameters in Antoine's equation)
Given the Antoine equation for the relationship between temperature T and vapor
pressure p in pure components, such as e.g. saturated steam, as also discussed in
Example 9.1, i.e.
$$p = e^{a - \frac{b}{T + c}}. \qquad (223)$$
i     p_i [bar]    T_i [°C]
1     1            99.1
2     2            119.6
3     3            132.9
4     4            142.9
5     5            151.1
6     6            158.1
7     7            164.2
8     8            169.6
9     9            174.5
10    10           179.0
Using the results from Example 9.1 we form the linear regression matrix equa-
tion
Y = XB, (224)
or equivalently Y = Φθ.
Using eq. (220) then we may form the known data matrices Y and X as
follows
$$Y = \begin{bmatrix} y_1 \\ \vdots \\ y_i \\ \vdots \\ y_{10} \end{bmatrix} = \begin{bmatrix} T_1 \ln(p_1) \\ \vdots \\ T_i \ln(p_i) \\ \vdots \\ T_{10}\ln(p_{10}) \end{bmatrix} = \begin{bmatrix} 0 \\ 82.90 \\ 146.01 \\ 198.13 \\ 243.19 \\ 283.24 \\ 319.52 \\ 352.67 \\ 383.42 \\ 412.16 \end{bmatrix}, \qquad (225)$$
$$X = \Phi = \begin{bmatrix} \varphi_1^T \\ \vdots \\ \varphi_i^T \\ \vdots \\ \varphi_{10}^T \end{bmatrix} = \begin{bmatrix} -\ln(p_1) & T_1 & 1 \\ \vdots & \vdots & \vdots \\ -\ln(p_i) & T_i & 1 \\ \vdots & \vdots & \vdots \\ -\ln(p_{10}) & T_{10} & 1 \end{bmatrix} = \begin{bmatrix} 0 & 99.10 & 1 \\ -0.69 & 119.60 & 1 \\ -1.10 & 132.90 & 1 \\ -1.39 & 142.92 & 1 \\ -1.61 & 151.10 & 1 \\ -1.79 & 158.08 & 1 \\ -1.95 & 164.20 & 1 \\ -2.08 & 169.60 & 1 \\ -2.20 & 174.50 & 1 \\ -2.30 & 179.00 & 1 \end{bmatrix}. \qquad (226)$$
Then we find the ordinary least squares estimate of the parameter vector θ (or
equivalently the coefficient matrix B) as follows
$$B_{\mathrm{OLS}} = \hat{\theta} = (X^T X)^{-1} X^T Y = \begin{bmatrix} c \\ a \\ ac - b \end{bmatrix} = \begin{bmatrix} 226.37 \\ 11.68 \\ -1157.23 \end{bmatrix}. \qquad (227)$$
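A MATLAB sketch that reproduces the computation in Example 9.2 from the tabulated (p_i, T_i) data; the back-substitution to (a, b, c) at the end is an added illustration, and the numbers should agree with (227) up to rounding.

% OLS fit of the linearized Antoine equation,
% T*ln(p) = -c*ln(p) + a*T + (a*c - b), using the data of Example 9.2.
p = (1:10)';                                            % pressure [bar]
T = [99.1 119.6 132.9 142.9 151.1 158.1 164.2 169.6 174.5 179.0]';  % temperature [C]

Y = T.*log(p);                                          % regressed variable, eq. (225)
X = [-log(p), T, ones(10,1)];                           % regressor matrix, eq. (226)

B = X \ Y;                                              % OLS estimate, eq. (227): [c; a; a*c - b]
c = B(1);  a = B(2);  b = a*c - B(3);                   % recover the Antoine parameters

p_model = exp(a - b./(T + c));                          % fitted vapor pressures (illustration)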