EMORF/S: EM-Based Outlier-Robust Filtering and Smoothing With Correlated Measurement Noise
Aamir Hussain Chughtai, Muhammad Tahir, Senior Member, IEEE, and Momin Uppal, Senior Member, IEEE

The authors are with the Department of Electrical Engineering, Lahore University of Management Sciences, DHA Lahore Cantt., 54792, Lahore, Pakistan (email: [email protected]; [email protected]; [email protected]).

arXiv:2307.02163v1 [eess.SP] 5 Jul 2023

Abstract—In this article, we consider the problem of outlier-robust state estimation where the measurement noise can be correlated. Outliers in data arise for many reasons, such as sensor malfunctioning, environmental behaviors, and communication glitches. Moreover, noise correlation emerges in several real-world applications, e.g., sensor networks, radar data, and GPS-based systems. We consider these effects in system modeling, which is subsequently used for inference. We employ the Expectation-Maximization (EM) framework to derive both outlier-resilient filtering and smoothing methods, suitable for online and offline estimation, respectively. The standard Gaussian filtering and the Gaussian Rauch–Tung–Striebel (RTS) smoothing results are leveraged to devise the estimators. In addition, Bayesian Cramer-Rao Bounds (BCRBs) for a filter and a smoother that can perfectly detect and reject outliers are presented. These serve as useful theoretical benchmarks to gauge the error performance of different estimators. Lastly, numerical experiments for an illustrative target tracking application are carried out; they indicate performance gains compared to similarly engineered state-of-the-art outlier-rejecting state estimators. The advantages are in terms of simpler implementation, enhanced estimation quality, and competitive computational performance.

Index Terms—State-Space Models, Approximate Bayesian Inference, Nonlinear Filtering and Smoothing, Outliers, Kalman Filters, Variational Inference, Expectation-Maximization, Robust Estimation, Statistical Learning, Stochastic Dynamical Systems.

I. INTRODUCTION

State estimation is a fundamental task in analyzing dynamical systems and in the decision-making and control actions that follow, arising in a variety of fields including cybernetics, robotics, power systems, sensor fusion, positioning, and target tracking [1]–[10]. The states describing the system dynamics can evolve intricately. Moreover, they are not directly observable and manifest themselves only through external measurements. Mathematically, this means that state estimation generally requires inference over stochastic nonlinear equations, which makes it a nontrivial task.

Filtering is the common term for online state estimation, where inference is carried out at each arriving sample. The Kalman filter, with its linear and nonlinear versions [11]–[13], is considered the primary choice for filtering given its ease of implementation and estimation performance. Other options for nonlinear filtering are also available, including methods based on Monte-Carlo (MC) approximations, e.g., Particle Filters (PFs) [14] and the ensemble Kalman filter (EnKF) [15].

Smoothing, on the other hand, refers to offline state estimation, where the primary concern is not to work on a per-sample basis; rather, we are interested in state inference considering the entire batch of measurements. Different options for smoothing exist, including the well-known Rauch–Tung–Striebel (RTS) smoother and two-filter smoothers [16], [17].

The standard state estimators are devised with the assumption that the dynamical system under consideration is perfectly modeled. The estimators assume the availability of the system and observation mathematical models, including the process and measurement noise statistics. However, any modeling mismatch can result in deteriorated performance, possibly even crippling the functionality of the regular estimators completely. In this work, we are interested in coping with the modeling discrepancy and the associated estimation degradation that results from the occurrence of outliers in the measurements.

Data outliers can arise due to several factors, including data communication problems, environmental variations and effects, data preprocessing front-end malfunctioning, and inherent sensor defects and degradation [9]. We keep our consideration generic by taking into account the possibility of correlated measurement noise with a fully enumerated nominal noise covariance matrix. This is in contrast to existing approaches, where the noise in each data dimension is assumed to be independent, targeting a specific class of applications [18]. However, this leaves out several important application scenarios where noise correlation exists and should be taken into account. For example, noise correlation appears due to double differencing of the original measurements in Real Time Kinematic (RTK) systems [19]. Likewise, a significant negative correlation exists between the range and range-rate measurement noise in radar data [20]. Similarly, correlated range measurement noise arises from the use of a common reference sensor to extract the time difference of arrival (TDOA) [21]. Besides, correlated observation noise also emerges in different sensor networks [22]–[24].

The problem of neutralizing data outliers during state estimation has been approached with various proposals. The traditional way of dealing with outliers is based on assuming fixed statistics for the measurement noise or for the residuals between predicted and actual measurements. For example, different methods describe the observation noise using heavy-tailed distributions like the Student-t and Laplace densities [25], [26]. Similarly, the theory of robust statistics suggests the use of prior models for residuals to downweigh the effect of outliers during inference [27]–[29]. Moreover, some techniques are based on rejecting the data sample by comparing the normalized measurement residuals with some predefined thresholds [30], [31].

Literature survey indicates that the performance of the conventional approaches is sensitive to the tuning of the design parameters, which affect the static residual error loss functions during estimation [32]. Therefore, tuning-free learning-based techniques that make the error loss function adaptive have been justified in prior works [18], [19], [32]–[35]. These approaches consider appropriate distributions for the measurement noise and subsequently learn the parameters describing the distributions, and consequently the loss functions, during state estimation. Fig. 1 compares typical static loss functions in traditional approaches with dynamic loss functions in learning-based methods, considering a uni-dimensional model for visualization (see Section III-D of [32] for more details). As a result, learning-based robust state estimators offer more advantages by reducing user input, being more general, and being better suited to one-shot scenarios.

[Figure: (a) Static loss functions in traditional methods; (b) Adaptive loss functions in learning-based methods.]
Fig. 1: Typical loss functions for outlier-robust state estimators. In traditional approaches, the loss function is static. In learning-based methods, the loss function adapts, e.g., between a quadratic function and a constant, to weight the data during inference.

Several learning-based methods for robust state estimation have been reported in the relevant literature. As exact inference is not viable for developing these approaches, approximate inference techniques like PFs and variational Bayesian (VB) methods can be used in their design. Since PFs can be computationally prohibitive, VB-based techniques are the appealing alternative, as they can leverage the existing standard filtering and smoothing results. Our focus in this work remains on learning-based outlier mitigation approaches designed using VB.

In previous works, we observe that various outlier-robust state estimators devised using VB treat the entire measurement vector collectively during estimation owing to under-parameterized modeling [32]–[34]. Instead of treating each dimension individually, the complete vector is either considered or downweighted by scaling the noise covariance matrix with a scalar multiplicative factor. This leaves room for improvement, since useful information is unnecessarily lost during inference. In this regard, we offered a vectorial parameterization to treat each dimension individually in [18]. Therein, we also suggested a way to make the estimators in [32]–[34] selective. However, these proposals are based on the assumption of independent noise in each measurement dimension. Another learning-based outlier-resilient filter has been presented in [35], but the authors only consider linear systems and test the method with diagonal measurement noise covariance matrices. Allowing for correlated measurements, the Variational Bayes Kalman Filter (VBKF) has been devised [19] by extending the work in [33]. However, we observe that VBKF assumes a complex hierarchical model. As a result, along with updating the state densities, it involves updating the nuisance parametric distributions and their hierarchical distributions during the VB updates. This includes evaluating the digamma function to find the expectation of logarithmic expressions [19], [33]. Therefore, implementing VBKF can get complicated, e.g., within an embedded computing device where such functions are not inherently available and additional libraries are required. Moreover, extending VBKF to outlier-robust smoothing is also cumbersome. This calls for simpler state estimation approaches, for systems with correlated noise, that can weather the effect of outliers.

With this background, considering the possibility of correlated measurements in nonlinear dynamical systems, we make the following contributions in this work.
• Using a suitable model and VB (more specifically, Expectation-Maximization (EM)), we devise an outlier-robust filter availing the standard Gaussian filtering results. The results are further utilized in deriving an outlier-robust smoother based on standard Gaussian RTS smoothing. Since our proposed method is inspired by our prior work, which considers independent measurement noise [18], we also present insightful connections.
• We derive Bayesian Cramer-Rao Bounds (BCRBs) for a filter and a smoother that can perfectly detect and reject outliers. This provides a useful benchmark to assess the estimation ability of different outlier-mitigating estimators.
• We evaluate the performance of the devised estimators against other similarly devised outlier-discarding methods. Different scenarios of a relevant TDOA-based target tracking application are considered in numerical experiments, indicating the merits of the proposed methods.

The rest of the article is organized as follows. Section II provides the modeling details. In Section III, we present the derivation of the proposed filter. Thereafter, the derivation of the proposed smoother is given in Section IV. In Section
V, BCRBs for a filter and a smoother with perfect outlier detecting and rejecting capabilities are provided. Subsequently, the performance evaluation results are discussed in Section VI. The paper ends with concluding remarks in Section VII.

Notation: As a general notation in this work, $r^\top$ is the transpose of the vector $r$; $r^i$ denotes the $i$th element of a vector $r$; $r^{i-}$ is the vector $r$ with its $i$th element removed; the subscript $k$ is used for the time index; $r_k$ is the vector $r$ at time instant $k$; $r_{k-}$ is the group of vectors $r$ over the entire time horizon except the time instant $k$; $R^{i,j}$ is the element of the matrix $R$ in the $i$th row and $j$th column; $R^{-1}$ is the inverse of $R$; $|R|$ is the determinant of $R$; $\overline{R}$ is the swapped form of $R$, where the swapping operation is defined in a particular context; $\delta(.)$ represents the delta function; $\langle . \rangle_{q(\psi_k)}$ denotes the expectation of the argument with respect to the distribution $q(\psi_k)$; $\mathrm{tr}(.)$ is the trace operator; $a \bmod b$ denotes the remainder of $a/b$; the superscripts $-$ and $+$ are used for the predicted and updated filtering parameters, respectively; the superscript $s$ is used for the parameters of the marginal smoothing densities. Other symbols are defined in the context of their first usage.

[Figure: Standard SSM; Proposed SSM.]
Fig. 2: Probabilistic graphical model for the proposed method

II. STATE-SPACE MODELING

A. Standard modeling

Consider a standard nonlinear discrete-time state-space model (SSM) representing the dynamics of a physical system, given as

$$x_k = f(x_{k-1}) + q_{k-1} \tag{1}$$

$$y_k = h(x_k) + r_k \tag{2}$$

where $x_k \in \mathbb{R}^n$ and $y_k \in \mathbb{R}^m$ denote the state and measurement vectors, respectively; the nonlinear functions $f(.): \mathbb{R}^n \to \mathbb{R}^n$ and $h(.): \mathbb{R}^n \to \mathbb{R}^m$ represent the process dynamics and observation transformations, respectively; $q_k \in \mathbb{R}^n$ and $r_k \in \mathbb{R}^m$ account for the additive nominal process and measurement noise, respectively. $q_k$ and $r_k$ are assumed to be statistically independent, white, and normally distributed with zero mean and known covariance matrices $Q_k$ and $R_k$, respectively. We consider that $R_k$ can be a fully enumerated matrix capturing the correlations between the measurement noise entries.

B. Modeling outliers for inference

The model in (1)-(2) assumes that the measurements are only affected by the nominal measurement noise $r_k$. However, the observations in every dimension can be corrupted with outliers, disrupting standard state estimators, since such measurement data cannot be described by the regular model. Therefore, data outliers need to be appropriately modeled within the generative SSM with two basic objectives. Firstly, the model should sufficiently capture the effect of outlier contamination in the data. Secondly, the model should remain amenable to inference.

To model the outliers in the SSM, we consider an indicator vector $I_k \in \mathbb{R}^m$ with Bernoulli elements, where its $i$th element $I_k^i$ can assume two possible values: $\epsilon$ (close to zero) and 1. $I_k^i = \epsilon$ denotes the presence of an outlier, whereas $I_k^i = 1$ indicates no outlier in the $i$th dimension at time $k$. Since outliers can occur independently at any instant, we assume that the elements of $I_k$ are statistically independent of their past. Additionally, we assume the entries of $I_k$ to be independent of each other, since generally no knowledge of correlations between outliers is available, and such correlations are not easy to model anyway. Moreover, this choice is motivated by the goal of inferential tractability. We also consider $I_k$ and $x_k$ to be statistically independent, since the outlier occurrence does not depend on the state value. The assumed distribution of $I_k$ is given as

$$p(I_k) = \prod_{i=1}^{m} p(I_k^i) = \prod_{i=1}^{m} \Big[(1-\theta_k^i)\,\delta(I_k^i - \epsilon) + \theta_k^i\,\delta(I_k^i - 1)\Big] \tag{3}$$

where $\theta_k^i$ denotes the prior probability of no outlier in the $i$th observation at time $k$. Further, the conditional measurement likelihood given the current state $x_k$ and the indicator $I_k$ is proposed to be normally distributed as

$$p(y_k|x_k, I_k) = \mathcal{N}\big(y_k \,|\, h(x_k), R_k(I_k)\big) = \frac{1}{\sqrt{(2\pi)^m |R_k(I_k)|}} \exp\Big\{-\frac{1}{2}\big(y_k - h(x_k)\big)^\top R_k^{-1}(I_k)\big(y_k - h(x_k)\big)\Big\} \tag{4}$$

where

$$R_k(I_k) = \begin{bmatrix} R_k^{1,1}/I_k^1 & \cdots & R_k^{1,m}\,\delta(I_k^1-1)\,\delta(I_k^m-1) \\ \vdots & \ddots & \vdots \\ R_k^{m,1}\,\delta(I_k^m-1)\,\delta(I_k^1-1) & \cdots & R_k^{m,m}/I_k^m \end{bmatrix} \tag{5}$$

$R_k(I_k)$ is the modified covariance matrix of the measurements considering the effect of outliers. The effect of $R_k(I_k)$ on the data generation process can be understood by considering the possible values of $I_k^i$. In particular, $I_k^i = \epsilon$ leads
to a very large $i$th diagonal entry of $R_k(I_k)$, while placing zeros at the remaining entries of the $i$th row and column of the matrix. As a result, when an outlier occurs in the $i$th dimension, its effect on state estimation is minimized. Moreover, the $i$th dimension no longer has any correlation with any other entry and ceases to affect any other dimension during inference. This is in contrast to $I_k^i = 1$, which ensures that the diagonal element and the off-diagonal correlation entries with the other non-affected dimensions are preserved. Lastly, note that the conditional likelihood is independent of the batch of all the historical observations $y_{1:k-1}$. Fig. 2 shows how the standard probabilistic graphical model (PGM) is modified into the proposed PGM for devising outlier-robust state estimators. The suggested PGM meets the modeling aims of describing the nominal and corrupted data sufficiently while remaining tractable for statistical inference.

III. PROPOSED ROBUST FILTERING

In filtering, we are interested in the posterior distribution of $x_k$ conditioned on all the observations $y_{1:k}$ received up to time $k$. For this objective, we can employ the Bayes rule recursively. Given the proposed observation model, the joint posterior distribution of $x_k$ and $I_k$ conditioned on the set of all the observations $y_{1:k}$ is given as

$$p(x_k, I_k|y_{1:k}) = \frac{p(y_k|I_k, x_k)\,p(x_k|y_{1:k-1})\,p(I_k)}{p(y_k|y_{1:k-1})} \tag{6}$$

Theoretically, the joint posterior can further be marginalized to obtain the required posterior distribution $p(x_k|y_{1:k})$. Assuming $p(x_k|y_{1:k-1})$ to be Gaussian, we need to run $2^m$ Kalman filters, one for each combination of $I_k$, to obtain the posterior. This results in a computational complexity of around $O(2^m m^3)$, where $m^3$ appears due to matrix inversions (ignoring sparsity). Moreover, we need to resort to a debatable approximation of the resulting Gaussian mixture distribution for $p(x_k|y_{1:k})$ by a single Gaussian density for recursive tractability. Therefore, this approach is clearly impractical.

To get around the problem, we can employ the standard VB method, where a product of VB marginals is used to approximate the joint posterior. We assume the following factorization of the posterior

$$p(x_k, I_k|y_{1:k}) \approx q^f(x_k) \prod_i q^f(I_k^i) \tag{7}$$

The VB approximation minimizes the Kullback-Leibler (KL) divergence between the product approximation and the true posterior, and leads to the following marginals [36]

$$q^f(x_k) \propto \exp\Big(\big\langle \ln p(x_k, I_k|y_{1:k}) \big\rangle_{q^f(I_k)}\Big) \tag{8}$$

$$q^f(I_k^i) \propto \exp\Big(\big\langle \ln p(x_k, I_k|y_{1:k}) \big\rangle_{q^f(x_k)\,q^f(I_k^{i-})}\Big) \quad \forall\, i \tag{9}$$

Using (8)-(9) alternately, the VB marginals can be updated iteratively until convergence. The procedure provides a useful way to approximate the true marginals of the joint posterior as $p(x_k|y_{1:k}) \approx q^f(x_k)$ and $p(I_k|y_{1:k}) \approx \prod_i q^f(I_k^i)$.

For our model, (8) becomes computationally unfriendly if we use the standard VB approach. In fact, the same complexity order of $O(m^3 2^m)$ appears as with the basic marginalization approach, making this approach intractable too. We elaborate on this in the upcoming subsection.

Expectation-Maximization as a particular case of variational Bayes

To deal with the complexity issue, instead of considering distributions, we can resort to point estimates for $I_k^i$. In particular, consider $q^f(I_k^i) = \delta(I_k^i - \hat{I}_k^i)$, where $\hat{I}_k^i$ denotes the point approximation of $I_k^i$. Consequently, the variational distributions can be updated in an alternating manner in the Expectation (E) and Maximization (M) steps of the EM algorithm, given as [37]

E-Step:

$$q^f(x_k) = p(x_k|y_{1:k}, \hat{I}_k) \propto p(x_k, \hat{I}_k|y_{1:k}) \tag{10}$$

M-Step:

$$\hat{I}_k^i = \arg\max_{I_k^i} \big\langle \ln p(x_k, I_k^i, \hat{I}_k^{i-}|y_{1:k}) \big\rangle_{q^f(x_k)} \tag{11}$$

where all $\hat{I}_k^i$ in the M-Step are successively updated using the latest estimates.

A. Prediction

For filtering, we first obtain the predictive distribution $p(x_k|y_{1:k-1})$ using the posterior distribution at the previous instant, $p(x_{k-1}|y_{1:k-1})$, approximated as the Gaussian $q^f(x_{k-1}) \approx \mathcal{N}(x_{k-1}|m_{k-1}^+, P_{k-1}^+)$. Using Gaussian (Kalman) filtering results, we make the following approximation [17]

$$p(x_k|y_{1:k-1}) \approx \mathcal{N}(x_k|m_k^-, P_k^-) \tag{12}$$

where

$$m_k^- = \int f(x_{k-1})\,\mathcal{N}(x_{k-1}|m_{k-1}^+, P_{k-1}^+)\,dx_{k-1} \tag{13}$$

$$P_k^- = \int \big(f(x_{k-1}) - m_k^-\big)\big(f(x_{k-1}) - m_k^-\big)^\top\,\mathcal{N}(x_{k-1}|m_{k-1}^+, P_{k-1}^+)\,dx_{k-1} + Q_{k-1} \tag{14}$$

B. Update

Using the prior distributions from (3) and (12) along with the conditional measurement likelihood in (4), we can express the joint posterior distribution as

$$p(x_k, I_k|y_{1:k}) \propto \frac{\mathcal{N}(x_k|m_k^-, P_k^-)}{\sqrt{(2\pi)^m |R_k(I_k)|}} \exp\Big\{-\frac{1}{2}\big(y_k - h(x_k)\big)^\top R_k^{-1}(I_k)\big(y_k - h(x_k)\big)\Big\} \prod_i \Big[(1-\theta_k^i)\,\delta(I_k^i - \epsilon) + \theta_k^i\,\delta(I_k^i - 1)\Big] \tag{15}$$
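To make the role of (4)-(5) inside the joint posterior (15) concrete, the following minimal Python/NumPy sketch builds $R_k(I_k)$ from a nominal $R_k$ and an indicator vector, and evaluates $\ln p(y_k|x_k, I_k)$. The helper names `modified_cov` and `log_likelihood`, the encoding of $I_k$ as an array of indicator values (1.0 or $\epsilon$), and $\epsilon = 10^{-6}$ are illustrative assumptions rather than choices made in the paper.

```python
import numpy as np

def modified_cov(R, I):
    """R_k(I_k) from (5); I holds the indicator values (1.0 or a small eps)."""
    keep = I == 1.0
    # Off-diagonal R^{i,j} survives only when delta(I^i - 1) delta(I^j - 1) = 1,
    # i.e. when neither dimension is flagged as an outlier.
    R_mod = np.where(np.outer(keep, keep), R, 0.0)
    # Diagonal entries become R^{i,i} / I^i, which is huge when I^i = eps.
    np.fill_diagonal(R_mod, np.diag(R) / I)
    return R_mod

def log_likelihood(y, hx, R, I):
    """ln N(y | h(x), R_k(I_k)) as in (4), with hx = h(x_k)."""
    R_mod = modified_cov(R, I)
    r = y - hx
    _, logdet = np.linalg.slogdet(R_mod)
    return -0.5 * (y.size * np.log(2.0 * np.pi) + logdet + r @ np.linalg.solve(R_mod, r))

eps = 1e-6
R = np.array([[1.0, 0.5],
              [0.5, 2.0]])
I = np.array([1.0, eps])          # outlier flagged in the second dimension
print(modified_cov(R, I))         # diag(1, 2e6): the correlation entry is removed
hx = np.zeros(2)
gap = (log_likelihood(np.array([0.1, 0.0]), hx, R, I)
       - log_likelihood(np.array([0.1, 100.0]), hx, R, I))
print(gap)                        # about 0.0025: a 100-unit error barely matters
```

Because the flagged dimension is decoupled and its variance is inflated by $1/\epsilon$, even a gross error in that dimension changes the log-likelihood only marginally, which is exactly the mechanism that (15) relies on.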
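1) Derivation of $q^f(x_k)$: With the E-Step in (10), we can write

$$q^f(x_k) \propto \exp\Big\{-\frac{1}{2}\big(y_k - h(x_k)\big)^\top R_k^{-1}(\hat{I}_k)\big(y_k - h(x_k)\big) - \frac{1}{2}\big(x_k - m_k^-\big)^\top (P_k^-)^{-1}\big(x_k - m_k^-\big)\Big\} \tag{16}$$

where $R_k^{-1}(\hat{I}_k)$ assumes a particular form resulting from the inversion of $R_k(\hat{I}_k)$, as described in Appendix A.

Note that we avoid evaluating $\langle R_k^{-1}(I_k) \rangle_{q^f(I_k)}$, which would be required in the standard VB approach. This means that we evade a complexity of around $O(m^3 2^m)$, since evaluating that expectation requires a matrix inversion for each of the $2^m$ combinations. Thanks to EM, we instead work with $R_k^{-1}(\hat{I}_k)$, which can be evaluated with a maximum complexity of $O(m^3)$ (considering a fully populated matrix).

To proceed further, we use the results of the general Gaussian filter to approximate $q^f(x_k)$ with a Gaussian distribution $\mathcal{N}(x_k|m_k^+, P_k^+)$, with parameters updated as

$$m_k^+ = m_k^- + K_k(y_k - \mu_k) \tag{17}$$

$$P_k^+ = P_k^- - C_k K_k^\top \tag{18}$$

where

$$K_k = C_k\big(U_k + R_k(\hat{I}_k)\big)^{-1} = C_k\Big\{R_k^{-1}(\hat{I}_k) - R_k^{-1}(\hat{I}_k)\big(I + U_k R_k^{-1}(\hat{I}_k)\big)^{-1} U_k R_k^{-1}(\hat{I}_k)\Big\}$$

$$\mu_k = \int h(x_k)\,\mathcal{N}(x_k|m_k^-, P_k^-)\,dx_k$$

$$U_k = \int \big(h(x_k) - \mu_k\big)\big(h(x_k) - \mu_k\big)^\top\,\mathcal{N}(x_k|m_k^-, P_k^-)\,dx_k$$

$$C_k = \int \big(x_k - m_k^-\big)\big(h(x_k) - \mu_k\big)^\top\,\mathcal{N}(x_k|m_k^-, P_k^-)\,dx_k$$

2) Derivation of $\hat{I}_k^i$: With the M-Step in (11), we can write

$$\hat{I}_k^i = \arg\max_{I_k^i} \big\langle \ln p(x_k, I_k^i, \hat{I}_k^{i-}|y_{1:k}) \big\rangle_{q^f(x_k)} \tag{19}$$

Using the Bayes rule, we can proceed as

$$\hat{I}_k^i = \arg\max_{I_k^i} \big\langle \ln p(y_k|x_k, I_k^i, \hat{I}_k^{i-}, \cancel{y_{1:k-1}}) \big\rangle_{q^f(x_k)} + \ln p(I_k^i|\cancel{x_k}, \cancel{\hat{I}_k^{i-}}, \cancel{y_{1:k-1}}) + \text{const.} \tag{20}$$

where const. is some constant and a struck-out argument, as in $p(x|y, \cancel{z})$, denotes the conditional independence of $x$ and $z$ given $y$. We can further write

$$\hat{I}_k^i = \arg\max_{I_k^i} \Big\{-\frac{1}{2}\mathrm{tr}\big(W_k R_k^{-1}(I_k^i, \hat{I}_k^{i-})\big) - \frac{1}{2}\ln\big|R_k(I_k^i, \hat{I}_k^{i-})\big| + \ln\big[(1-\theta_k^i)\,\delta(I_k^i - \epsilon) + \theta_k^i\,\delta(I_k^i - 1)\big]\Big\} \tag{21}$$

where $R_k(I_k^i, \hat{I}_k^{i-})$ denotes $R_k(I_k)$ evaluated at $I_k$ with its $i$th element set to $I_k^i$ and the remaining entries set to $\hat{I}_k^{i-}$, and

$$W_k = \int \big(y_k - h(x_k)\big)\big(y_k - h(x_k)\big)^\top\,\mathcal{N}(x_k|m_k^+, P_k^+)\,dx_k$$

As a result, $\hat{I}_k^i$ can be determined as

$$\hat{I}_k^i = \begin{cases} 1 & \text{if } \hat{\tau}_k^i \le 0, \\ \epsilon & \text{if } \hat{\tau}_k^i > 0 \end{cases} \tag{22}$$

with

$$\hat{\tau}_k^i = \mathrm{tr}\big(W_k \triangle\hat{R}_k^{-1}\big) + \ln\frac{\big|R_k(I_k^i = 1, \hat{I}_k^{i-})\big|}{\big|R_k(I_k^i = \epsilon, \hat{I}_k^{i-})\big|} + 2\ln\Big(\frac{1}{\theta_k^i} - 1\Big) \tag{23}$$

where

$$\triangle\hat{R}_k^{-1} = R_k^{-1}(I_k^i = 1, \hat{I}_k^{i-}) - R_k^{-1}(I_k^i = \epsilon, \hat{I}_k^{i-}) \tag{24}$$

Using the steps outlined in Appendix B, we can further simplify $\hat{\tau}_k^i$ as

$$\hat{\tau}_k^i = \mathrm{tr}\big(W_k \triangle\hat{R}_k^{-1}\big) + \ln\Big|I - \frac{R_k^{-i,i} R_k^{i,-i} (\hat{R}_k^{-i,-i})^{-1}}{R_k^{i,i}}\Big| + \ln(\epsilon) + 2\ln\Big(\frac{1}{\theta_k^i} - 1\Big) \tag{25}$$

where $\hat{R}_k^{-i,-i}$ is the submatrix of $R_k(\hat{I}_k)$ corresponding to the entries of $\hat{I}_k^{i-}$, and $R_k^{i,-i}$ and $R_k^{-i,i}$ contain the measurement covariances between the $i$th and the remaining dimensions (as they appear in $R_k(I_k^i = 1, \hat{I}_k^{i-})$, i.e., with zeros for dimensions flagged as outliers).

Though we can directly evaluate $\triangle\hat{R}_k^{-1}$ in (24), we can save computations by avoiding repetitive calculations. To this end, we first compute

$$\overline{\triangle\hat{R}_k^{-1}} = \begin{bmatrix} \Xi^{i,i} & \Xi^{i,-i} \\ \Xi^{-i,i} & \Xi^{-i,-i} \end{bmatrix} \tag{26}$$

with

$$\Xi^{i,i} = \frac{1}{R_k^{i,i} - R_k^{i,-i}(\hat{R}_k^{-i,-i})^{-1}R_k^{-i,i}} - \frac{\epsilon}{R_k^{i,i}} \tag{27}$$

$$\Xi^{i,-i} = -\frac{R_k^{i,-i}(\hat{R}_k^{-i,-i})^{-1}}{R_k^{i,i} - R_k^{i,-i}(\hat{R}_k^{-i,-i})^{-1}R_k^{-i,i}} \tag{28}$$

$$\Xi^{-i,i} = -\frac{(\hat{R}_k^{-i,-i})^{-1}R_k^{-i,i}}{R_k^{i,i} - R_k^{i,-i}(\hat{R}_k^{-i,-i})^{-1}R_k^{-i,i}} \tag{29}$$

$$\Xi^{-i,-i} = \frac{(\hat{R}_k^{-i,-i})^{-1}R_k^{-i,i}R_k^{i,-i}(\hat{R}_k^{-i,-i})^{-1}}{R_k^{i,i} - R_k^{i,-i}(\hat{R}_k^{-i,-i})^{-1}R_k^{-i,i}} \tag{30}$$

Algorithm 1: The proposed filter: EMORF
Initialize $m_0^+, P_0^+$;
for $k = 1, 2, \ldots, K$ do
    Initialize $\theta_k^i, \hat{I}_k, Q_k, R_k$;
    Prediction:
    Evaluate $m_k^-, P_k^-$ with (13) and (14);
    Update:
    while not converged do
        Update $m_k^+$ and $P_k^+$ with (17)-(18);
        Update $\hat{I}_k^i\ \forall\, i$ with (22);
    end
end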
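Algorithm 1 can be sketched in Python/NumPy for one time step under simplifying assumptions that are ours, not the paper's: linear dynamics $f(x) = Fx$ and a linear measurement function $h(x) = Hx$, so that (13)-(14) and the integrals defining $\mu_k$, $U_k$, $C_k$ and $W_k$ have closed forms (a nonlinear model would need, e.g., a sigma-point or cubature rule instead). The indicator update evaluates (23)-(24) directly with dense inverses and log-determinants rather than the cheaper form (25)-(30). The names `modified_cov` and `emorf_step` and all defaults ($\theta_k^i = 0.5$, $\epsilon = 10^{-6}$, iteration cap, tolerance) are illustrative.

```python
import numpy as np

def modified_cov(R, I):
    """R_k(I_k) from (5); I holds indicator values (1.0 or eps)."""
    keep = I == 1.0
    R_mod = np.where(np.outer(keep, keep), R, 0.0)   # drop correlations of flagged dims
    np.fill_diagonal(R_mod, np.diag(R) / I)          # R^{i,i} / I^i on the diagonal
    return R_mod

def emorf_step(m_prev, P_prev, y, F, H, Q, R, theta=0.5, eps=1e-6,
               max_iter=50, tol=1e-8):
    """One time step of a linear-Gaussian EMORF sketch (Algorithm 1)."""
    # Prediction, (13)-(14) specialised to f(x) = F x.
    m_pred = F @ m_prev
    P_pred = F @ P_prev @ F.T + Q
    # For h(x) = H x: mu_k = H m^-, U_k = H P^- H^T, C_k = P^- H^T.
    mu, U, C = H @ m_pred, H @ P_pred @ H.T, P_pred @ H.T
    I_hat = np.ones(y.size)                        # start with no dimension flagged
    prior_term = 2.0 * np.log(1.0 / theta - 1.0)   # last term of (23); 0 for theta = 0.5
    m_old = m_pred                                 # reference for the first convergence check
    for _ in range(max_iter):
        # E-step: Gaussian update (17)-(18) using the first form of K_k.
        K = C @ np.linalg.inv(U + modified_cov(R, I_hat))
        m_new = m_pred + K @ (y - mu)
        P_new = P_pred - C @ K.T
        # Relative change of the state estimate between iterations.
        change = np.linalg.norm(m_new - m_old) / max(np.linalg.norm(m_old), 1e-12)
        m_old = m_new
        if change < tol:
            break
        # W_k = E[(y - Hx)(y - Hx)^T] under N(m^+, P^+).
        e = y - H @ m_new
        W = np.outer(e, e) + H @ P_new @ H.T
        # M-step: sequential indicator updates with (22)-(24).
        for i in range(y.size):
            I_one, I_eps = I_hat.copy(), I_hat.copy()
            I_one[i], I_eps[i] = 1.0, eps
            R_one, R_eps = modified_cov(R, I_one), modified_cov(R, I_eps)
            dR_inv = np.linalg.inv(R_one) - np.linalg.inv(R_eps)          # (24)
            log_ratio = np.linalg.slogdet(R_one)[1] - np.linalg.slogdet(R_eps)[1]
            tau = np.trace(W @ dR_inv) + log_ratio + prior_term            # (23)
            I_hat[i] = 1.0 if tau <= 0 else eps                            # (22)
    return m_new, P_new, I_hat

# Three correlated measurements of a 2-D state; the third one is an outlier.
F, Q = np.eye(2), 0.01 * np.eye(2)
H = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
R = 0.1 * np.array([[1.0, 0.3, 0.2], [0.3, 1.0, 0.1], [0.2, 0.1, 1.0]])
y = np.array([0.05, -0.02, 50.0])
m, P, I_hat = emorf_step(np.zeros(2), 0.1 * np.eye(2), y, F, H, Q, R)
print(I_hat, m)
```

In this example, the third dimension ends up flagged ($\hat{I}_k^3 = \epsilon$) while the first two are kept, and the state estimate stays near the outlier-free measurements. Since (25) contains the term $\ln(\epsilon)$, a smaller $\epsilon$ lowers $\hat{\tau}_k^i$ and therefore requires a larger residual before a dimension is rejected.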
In (26)-(30), the $i$th row/column entries of $\triangle\hat{R}_k^{-1}$ have been conveniently swapped with the first row/column entries to obtain $\overline{\triangle\hat{R}_k^{-1}}$. Swapping the first row/column entries of $\overline{\triangle\hat{R}_k^{-1}}$ back to the $i$th row/column positions reclaims $\triangle\hat{R}_k^{-1}$. Appendix C provides further details in this regard.

The resulting EM-based outlier-robust filter (EMORF) is outlined as Algorithm 1. For the convergence criterion, we suggest using the ratio of the L2 norm of the difference between the state estimates from the current and previous VB iterations to the L2 norm of the estimate from the previous iteration. This criterion has been commonly chosen in similar robust filters [18], [33].

C. VB factorization of $p(x_k, I_k|y_{1:k})$ and the associated computational overhead

Note that, for better accuracy, we can factorize $p(x_k, I_k|y_{1:k}) \approx q^f(x_k)\,q^f(I_k)$ instead of forcing independence between all $I_k^i$. In this case, the expression for evaluating $q^f(x_k)$ remains the same as in (8). However, this choice leads to the following VB marginal distribution of $I_k$

$$q^f(I_k) \propto \exp\Big(\big\langle \ln p(x_k, I_k|y_{1:k}) \big\rangle_{q^f(x_k)}\Big) \tag{31}$$

This results in a modified M-Step in the EM algorithm, given as

M-Step:

$$\hat{I}_k = \arg\max_{I_k} \big\langle \ln p(x_k, I_k|y_{1:k}) \big\rangle_{q^f(x_k)} \tag{32}$$

Proceeding further with the prediction and update steps during inference, we arrive at

$$\hat{I}_k = \arg\max_{I_k} \Big\{-\frac{1}{2}\mathrm{tr}\big(W_k R_k^{-1}(I_k)\big) - \frac{1}{2}\ln|R_k(I_k)| + \sum_{i=1}^{m}\ln\big[(1-\theta_k^i)\,\delta(I_k^i - \epsilon) + \theta_k^i\,\delta(I_k^i - 1)\big]\Big\} \tag{33}$$

It is not hard to notice that determining $\hat{I}_k$ using (33) involves tedious calculations. In fact, we run into the same computational difficulty that we have been dodging: to arrive at the result, we need to evaluate the inverses and determinants for each of the $2^m$ combinations of the entries of $I_k$, which entails the burdensome complexity of $O(m^3 2^m)$. With the proposed factorization in (7), we obtain a more practical and scalable algorithm. The resulting complexity is $O(m^4)$, following from the evaluation of matrix inverses and determinants for calculating each $\hat{I}_k^i$, $i = 1, \ldots, m$, in (22).

D. Connection between EMORF and SORF [18]

Since the construction of EMORF is motivated by the selective observations rejecting filter (SORF) [18], it is insightful to discuss their connection. We derived SORF considering a diagonal measurement covariance matrix $R_k$ [18]. In SORF, we used distributional estimates for $I_k$, since they did not induce any significant computational strain. It is instructive to remark here that if we use point estimates of $I_k$ in SORF, it becomes a special case of EMORF. In particular, the point estimates for $I_k^i\ \forall\, i$ in [18] can be obtained with the following criterion

$$\hat{I}_k^i = \begin{cases} 1 & \text{if } \bar{\tau}_k^i \le 0, \\ \epsilon & \text{if } \bar{\tau}_k^i > 0 \end{cases} \tag{34}$$

To deduce $\hat{I}_k^i = 1$, the following should hold

$$\Omega_k^i \ge 1 - \Omega_k^i \tag{35}$$

$$\Omega_k^i \ge 0.5 \tag{36}$$

where $\Omega_k^i$ denotes the posterior probability of $I_k^i = 1$. Using the expression of $\Omega_k^i$ from [18], we can write (36) as

$$\frac{1}{1 + \sqrt{\epsilon}\,\Big(\dfrac{1}{\theta_k^i} - 1\Big)\exp\Big(\dfrac{W_k^{i,i}}{2R_k^{i,i}}(1-\epsilon)\Big)} \ge 0.5 \tag{37}$$

leading to

$$\bar{\tau}_k^i \le 0 \tag{38}$$

where

$$\bar{\tau}_k^i = \frac{W_k^{i,i}}{R_k^{i,i}}(1-\epsilon) + \ln(\epsilon) + 2\ln\Big(\frac{1}{\theta_k^i} - 1\Big) \tag{39}$$

which can be recognized as the particular case of (25) in which $R_k^{i,-i}$ and $R_k^{-i,i}$ vanish, since $R_k$ is considered to be diagonal. The first term in (25) reduces to $\frac{W_k^{i,i}}{R_k^{i,i}}(1-\epsilon)$, given that $\Xi^{i,i} = \frac{1-\epsilon}{R_k^{i,i}}$ and that $\Xi^{i,-i}$, $\Xi^{-i,i}$, and $\Xi^{-i,-i}$ all reduce to zero. Moreover, the second term in (25) also disappears, resulting in (39).

E. Choice of the parameters $\theta_k^i$ and $\epsilon$

For EMORF, we propose setting the parameters $\theta_k^i$ and $\epsilon$ the same as in SORF. Specifically, we suggest choosing a neutral value of 0.5, i.e., an uninformative prior, for $\theta_k^i$. The Bayes-Laplace and the maximum entropy approaches for obtaining an uninformative prior for a finite-valued parameter both lead to the uniform prior distribution [38], [39]. Moreover, this selection has been justified in the design of outlier-resistant filters when no prior information about the outlier statistics is available [18], [32]. With $\theta_k^i = 0.5$, the last term of (23), (25), and (39) vanishes. For $\epsilon$, we recommend a value close to zero, since the exact value of 0 prevents the VB/EM updates, as in [18].

IV. PROPOSED ROBUST SMOOTHING

In smoothing, our interest lies in determining the posterior distribution of all the states $x_{1:K}$ conditioned on the batch of all the observations $y_{1:K}$. With that goal, we take a similar approach to filtering and approximate the joint posterior distribution as a product of marginals

$$p(x_{1:K}, I_{1:K}|y_{1:K}) \approx q^s(x_{1:K}) \prod_k \prod_i q^s(I_k^i) \tag{40}$$

where the true marginals are approximated as $p(x_{1:K}|y_{1:K}) \approx q^s(x_{1:K})$ and $p(I_{1:K}|y_{1:K}) \approx \prod_k \prod_i q^s(I_k^i)$. Let us assume $q^s(I_k^i) = \delta(I_k^i - \breve{I}_k^i)$, where $\breve{I}_k^i$ denotes the point approximation of $I_k^i$. Consequently, the EM steps are given as
7

E-Step: Algorithm 2: The proposed smoother: EMORS


q s (x1:K ) = p(x1:K |y1:K , Ĭ 1:K ) ∝ p(x1:K , Ĭ 1:K |y1:K ) Initialize m+ +
0 , P0 ;
(41) while not converged do
M-Step: for k = 1, 2...K do
i−
Ĭki = argmax ln(p(x1:K , Iki , Ĭ k , Ĭ k− |y1:K ) qs (x Initialize θki , Ĭ k , Qk , Rk ;


(42)
1:K ) −
Iki Evaluate m− k , P k with (45)-(46);
+ +
Evaluate mk , P k with (48)-(49);
where all Îki in the M-Step are sequentially updated using the end
latest estimates. msK = m+ K;
1) Derivation of q s (x1:K ): With the E-Step in (41) we can P sK = P + K ;
write for k = K − 1, ...1 do
q s (x1:K ) ∝ p(y1:K |x1:K , Ĭ 1:K )p(x1:K ) (43) Evaluate msk , P sk with (52)-(53);
Y end
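$$
q^s(x_{1:K}) \propto \prod_{k} p(y_k \mid x_k, \breve{I}_k)\, p(x_k \mid x_{k-1}) \tag{44}
$$

From the results of general Gaussian RTS smoothing [17], $q^s(x_{1:K})$ can be approximated as a Gaussian distribution. Using the forward and backward passes, we can determine the parameters of $q^s(x_k) = \mathcal{N}(x_k \mid m_k^s, P_k^s)$, the marginal densities of $q^s(x_{1:K})$.

Forward pass: The forward pass consists of the filtering equations

$$
m_k^- = \int f(x_{k-1})\, \mathcal{N}(x_{k-1} \mid m_{k-1}^+, P_{k-1}^+)\, dx_{k-1} \tag{45}
$$

$$
P_k^- = \int (f(x_{k-1}) - m_k^-)(f(x_{k-1}) - m_k^-)^\top\, \mathcal{N}(x_{k-1} \mid m_{k-1}^+, P_{k-1}^+)\, dx_{k-1} + Q_{k-1} \tag{46-47}
$$

$$
m_k^+ = m_k^- + K_k (y_k - \nu_k) \tag{48}
$$

$$
P_k^+ = P_k^- - C_k K_k^\top \tag{49}
$$

where

$$
\begin{aligned}
K_k &= C_k \big(U_k + R_k(\breve{I}_k)\big)^{-1} = C_k \Big\{ R_k^{-1}(\breve{I}_k) - R_k^{-1}(\breve{I}_k)\big(I + U_k R_k^{-1}(\breve{I}_k)\big)^{-1} U_k R_k^{-1}(\breve{I}_k) \Big\} \\
\nu_k &= \int h(x_k)\, \mathcal{N}(x_k \mid m_k^-, P_k^-)\, dx_k \\
U_k &= \int (h(x_k) - \nu_k)(h(x_k) - \nu_k)^\top\, \mathcal{N}(x_k \mid m_k^-, P_k^-)\, dx_k \\
C_k &= \int (x_k - m_k^-)(h(x_k) - \nu_k)^\top\, \mathcal{N}(x_k \mid m_k^-, P_k^-)\, dx_k
\end{aligned}
$$

The second expression for $K_k$ follows from the matrix inversion lemma. Note that $R_k(\breve{I}_k)$ and $R_k^{-1}(\breve{I}_k)$ can be evaluated similarly to $R_k(\hat{I}_k)$ and $R_k^{-1}(\hat{I}_k)$.

Backward pass: The backward pass is completed as

$$
L_{k+1} = \int (x_k - m_k^+)(f(x_k) - m_{k+1}^-)^\top\, \mathcal{N}(x_k \mid m_k^+, P_k^+)\, dx_k \tag{50}
$$

$$
G_k = L_{k+1} (P_{k+1}^-)^{-1} \tag{51}
$$

$$
m_k^s = m_k^+ + G_k (m_{k+1}^s - m_{k+1}^-) \tag{52}
$$

$$
P_k^s = P_k^+ + G_k (P_{k+1}^s - P_{k+1}^-) G_k^\top \tag{53}
$$

2) Derivation of $\breve{I}_k^i$: With the M-step in (42), we can write

$$
\breve{I}_k^i = \arg\max_{I_k^i} \big\langle \ln p(x_{1:K}, I_k^i, \breve{I}_k^{i-}, \breve{I}_{k-} \mid y_{1:K}) \big\rangle_{q^s(x_{1:K})} \tag{54}
$$

Using Bayes' rule, we can proceed as

$$
\breve{I}_k^i = \arg\max_{I_k^i} \big\langle \ln p(y_k \mid x_k, I_k^i, \breve{I}_k^{i-}, \breve{I}_{k-}, y_{k-}) + \ln p(x_{1:K}, I_k^i, \breve{I}_k^{i-}, \breve{I}_{k-} \mid y_{k-}) \big\rangle_{q^s(x_{1:K})} + \text{const.} \tag{55}
$$

which leads to

$$
\breve{I}_k^i = \arg\max_{I_k^i} \big\langle \ln p(y_k \mid x_k, I_k^i, \breve{I}_k^{i-}) \big\rangle_{q^s(x_k)} + \ln p(I_k^i \mid x_{1:K}, \breve{I}_k^{i-}, \breve{I}_{k-}, y_{k-}) + \text{const.} \tag{56}
$$

This is similar to (20), except that the expectation is taken with respect to the marginal smoothing distribution $q^s(x_k)$. Consequently, $\breve{I}_k^i$ can be determined as

$$
\breve{I}_k^i = \begin{cases} 1 & \text{if } \breve{\tau}_k^i \le 0, \\ 0 & \text{if } \breve{\tau}_k^i > 0 \end{cases} \tag{57}
$$

where

$$
\breve{\tau}_k^i = \mathrm{tr}\big(W_k\, \Delta\breve{R}_k^{-1}\big) + \ln\left| I - \frac{R_k^{-i,i} R_k^{i,-i} (\breve{R}_k^{-i,-i})^{-1}}{R_k^{i,i}} \right| + \ln(\epsilon) + 2 \ln\left(\frac{1}{\theta_k^i} - 1\right) \tag{58}
$$

with

$$
\Delta\breve{R}_k^{-1} = R_k^{-1}(I_k^i = 1, \breve{I}_k^{i-}) - R_k^{-1}(I_k^i = \epsilon, \breve{I}_k^{i-}) \tag{59}
$$

which can be calculated similarly to $\Delta\hat{R}_k^{-1}$ in (24). $\breve{R}_k^{-i,-i}$ denotes the submatrix of $R_k(\breve{I}_k)$ corresponding to the entries of $\breve{I}_k^{i-}$, and

$$
W_k = \int (y_k - h(x_k))(y_k - h(x_k))^\top\, \mathcal{N}(x_k \mid m_k^s, P_k^s)\, dx_k
$$

The resulting EM-based outlier-robust smoother (EMORS) is outlined as Algorithm 2. We suggest using the same convergence criterion and parameters as for robust filtering.

Algorithm 2 (last lines):
        for k = 1, 2, ..., K do
            Update $\breve{I}_k^i$ $\forall i$ with (57);
        end
    end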
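One forward-pass step (45)-(49) can be sketched in Python with NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The Gaussian integrals are approximated with the unscented transform using $\alpha = 1$, $\beta = 2$, $\kappa = 0$, the values reported for the experiments. The function names `sigma_points`, `propagate`, and `forward_step` are hypothetical. The caller is assumed to supply `R_inv` $= R_k^{-1}(\breve{I}_k)$, for example built as in Appendix A. The gain uses the matrix-inversion-lemma form of $K_k$, so only $R_k^{-1}(\breve{I}_k)$ is needed.

```python
import numpy as np


def sigma_points(m, P, alpha=1.0, beta=2.0, kappa=0.0):
    """Unscented-transform sigma points (as columns) and mean/covariance weights."""
    n = m.size
    lam = alpha**2 * (n + kappa) - n
    S = np.linalg.cholesky((n + lam) * P)  # lower-triangular square root of (n + lam) P
    X = np.column_stack([m] + [m + S[:, j] for j in range(n)] + [m - S[:, j] for j in range(n)])
    wm = np.full(2 * n + 1, 1.0 / (2.0 * (n + lam)))
    wc = wm.copy()
    wm[0] = lam / (n + lam)
    wc[0] = lam / (n + lam) + (1.0 - alpha**2 + beta)
    return X, wm, wc


def propagate(func, X):
    """Apply func to every sigma point (every column of X)."""
    return np.column_stack([func(X[:, j]) for j in range(X.shape[1])])


def forward_step(m_prev, P_prev, y, f, h, Q, R_inv):
    """One forward-pass step, (45)-(49), with UT-approximated Gaussian integrals.

    R_inv stands for R_k^{-1}(I_k). K_k is formed with the matrix inversion
    lemma, so R_k(I_k) itself (with its R/eps entries) is never built.
    """
    X, wm, wc = sigma_points(m_prev, P_prev)
    FX = propagate(f, X)
    m_pred = FX @ wm                              # (45)
    dF = FX - m_pred[:, None]
    P_pred = (dF * wc) @ dF.T + Q                 # (46)-(47)

    Z, wm, wc = sigma_points(m_pred, P_pred)
    HZ = propagate(h, Z)
    nu = HZ @ wm                                  # nu_k
    dH = HZ - nu[:, None]
    U = (dH * wc) @ dH.T                          # U_k
    C = ((Z - m_pred[:, None]) * wc) @ dH.T       # C_k

    # (U + R)^{-1} = R^{-1} - R^{-1} (I + U R^{-1})^{-1} U R^{-1}
    UR = U @ R_inv
    S_inv = R_inv - R_inv @ np.linalg.solve(np.eye(nu.size) + UR, UR)
    K = C @ S_inv                                 # K_k
    m_post = m_pred + K @ (y - nu)                # (48)
    P_post = P_pred - C @ K.T                     # (49)
    return m_pred, P_pred, m_post, P_post
```

For linear $f$ and $h$, the unscented transform reproduces the Gaussian moments exactly. This step then coincides with the standard Kalman filter update, which makes a convenient sanity check.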
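The backward pass (50)-(53) is easiest to see in the linear-Gaussian special case $f(x) = A x$, $h(x) = H x$. There every integral has a closed form (for example, $L_{k+1} = P_k^+ A^\top$), and the two passes reduce to the Kalman filter followed by the classical RTS smoother [16]. The sketch below uses a hypothetical helper `kalman_rts` and assumes this linear model with a fixed $R$. In EMORS the gain would instead use $R_k(\breve{I}_k)$, and the integrals would be approximated numerically.

```python
import numpy as np


def kalman_rts(m0, P0, A, Q, H, R, ys):
    """Linear-Gaussian instance of the forward pass (45)-(49) and backward pass (50)-(53).

    With f(x) = A x and h(x) = H x the integrals are exact:
    m_k^- = A m_{k-1}^+, P_k^- = A P_{k-1}^+ A^T + Q, C_k = P_k^- H^T,
    U_k = H P_k^- H^T, and L_{k+1} = P_k^+ A^T.
    ys holds y_1, ..., y_K; x_0 has no measurement.
    Returns the smoothed means and covariances for k = 0, ..., K.
    """
    m_f, P_f = [m0], [P0]             # filtered moments m_k^+, P_k^+
    m_p, P_p = [None], [None]         # predicted moments m_k^-, P_k^- (undefined at k = 0)
    for y in ys:
        mp = A @ m_f[-1]
        Pp = A @ P_f[-1] @ A.T + Q
        C = Pp @ H.T
        K = C @ np.linalg.inv(H @ Pp @ H.T + R)
        m_p.append(mp)
        P_p.append(Pp)
        m_f.append(mp + K @ (y - H @ mp))
        P_f.append(Pp - C @ K.T)

    ms, Ps = [None] * len(m_f), [None] * len(m_f)
    ms[-1], Ps[-1] = m_f[-1], P_f[-1]  # the smoother starts from m_K^+, P_K^+
    for k in range(len(m_f) - 2, -1, -1):
        L = P_f[k] @ A.T                                     # (50)
        G = L @ np.linalg.inv(P_p[k + 1])                    # (51)
        ms[k] = m_f[k] + G @ (ms[k + 1] - m_p[k + 1])        # (52)
        Ps[k] = P_f[k] + G @ (Ps[k + 1] - P_p[k + 1]) @ G.T  # (53)
    return ms, Ps
```

Consider a scalar random walk ($A = Q = H = R = 1$) with prior $\mathcal{N}(0, 1)$ on $x_0$ and a single measurement $y_1 = 3$. The filtered estimate at $k = 1$ is 2 with variance 2/3. The backward pass then moves the estimate of $x_0$ from 0 to 1, with variance 2/3.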

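The indicator update (57)-(59) for one dimension $i$ can be sketched as below. The following are assumptions, not taken from the paper's code:

- The helper names `build_R`, `log_det_ratio`, and `accept_dimension` are hypothetical.
- $R_k(I)$ is built with the structure described in Appendix A: rejected dimensions are decoupled and given variance $R_k^{j,j}/\epsilon$.
- The rejected hypothesis uses $I_k^i = \epsilon$, as in (59).
- $W_k$ is supplied by the caller. In EMORS it is the expectation under $\mathcal{N}(x_k \mid m_k^s, P_k^s)$.
- Following (84), $R_k^{i,i}$, $R_k^{-i,i}$, and the submatrix of the other dimensions are read from $R_k(I_k^i = 1, \breve{I}_k^{i-})$.

For clarity, $\Delta\breve{R}_k^{-1}$ is formed by direct inversion rather than with the block formulas of Appendix C.

```python
import numpy as np


def build_R(R, I, eps):
    """R_k(I) with the Appendix A structure (hypothetical helper).

    Dimensions with I[j] == 1 keep their full covariance entries; rejected
    dimensions (I[j] == eps) are decoupled and get variance R[j, j] / eps.
    """
    R_I = R.copy()
    rej = np.flatnonzero(I != 1.0)
    R_I[rej, :] = 0.0
    R_I[:, rej] = 0.0
    R_I[rej, rej] = np.diag(R)[rej] / eps
    return R_I


def log_det_ratio(R1, i, eps):
    """ln(|R_k(I^i = 1, .)| / |R_k(I^i = eps, .)|) via the closed-form term of (58)."""
    others = [j for j in range(R1.shape[0]) if j != i]
    a = R1[i, i]                              # R^{i,i}
    b = R1[others, i]                         # R^{-i,i}; R^{i,-i} is its transpose
    D = R1[np.ix_(others, others)]            # submatrix of the other dimensions
    M = np.eye(len(others)) - np.outer(b, b) @ np.linalg.inv(D) / a
    return np.linalg.slogdet(M)[1] + np.log(eps)


def accept_dimension(R, W, I, i, theta, eps=1e-6):
    """Decision rule (57): True (keep dimension i) if tau <= 0, else False (reject)."""
    I1 = np.array(I, dtype=float)
    Ie = I1.copy()
    I1[i], Ie[i] = 1.0, eps
    R1, Re = build_R(R, I1, eps), build_R(R, Ie, eps)
    dR_inv = np.linalg.inv(R1) - np.linalg.inv(Re)            # (59)
    tau = (np.trace(W @ dR_inv) + log_det_ratio(R1, i, eps)
           + 2.0 * np.log(1.0 / theta - 1.0))                  # (58)
    return tau <= 0.0
```

Take the TDOA-type covariance of (80) with all $\sigma_j^2 = 10$ and set $W_k = R_k$: dimension $i$ is kept. Adding a large residual energy (e.g., $10^4$) to $W_k^{i,i}$ flips the decision to rejection. The closed-form log-determinant term also matches a direct `slogdet` ratio, which is the identity derived in Appendix B.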
V. PERFORMANCE BOUNDS

It is useful to determine the performance bounds of outlier-discarding state estimators in the presence of correlated measurement noise. We evaluate the estimation bounds of filtering and smoothing approaches that are perfect outlier rejectors, i.e., that have complete knowledge of the instances of outlier occurrence. In particular, we assume that the measurement covariance matrix is a function of perfectly known values of $I_k$, given as $R_k(I_k)$. In this case, $I_k^i = 0$ means rejection of the corrupted $i$th dimension, whereas $I_k^i = 1$ denotes inclusion of the $i$th measurement. As a result, $R_k^{-1}(I_k)$ has zeros on the diagonals, rows, and columns corresponding to the dimensions for which $I_k^i = 0$. The remaining submatrix of $R_k^{-1}(I_k)$ can be evaluated as the inverse of the submatrix of $R_k$ for the dimensions with $I_k^i = 1$.

Note that in the proposed state estimators we set $I_k^i = \epsilon$, not exactly 0, for outlier rejection, since an exact zero hampers inference. For evaluating the performance bounds, however, $I_k^i = 0$ is appropriate, as it results in perfect outlier rejection. Also note that during robust state estimation we do not know $I_k$ a priori, and instead model it statistically for subsequent inference. Using a perfectly known $I_k$ for the estimation bounds indicates how well the state can be estimated if the outliers are somehow perfectly detected and rejected.

We evaluate the BCRBs of the perfect rejector for both filtering and smoothing, for the model in (1)-(2) corrupted with measurement outliers.

A. Filtering

For the estimation error of $x_k$ during filtering, the BCRB matrix can be written as [40]

$$
\mathrm{BCRB}_k^f \triangleq (J_k^+)^{-1}
$$

where the corresponding filtering Fisher information matrix (FIM), denoted as $J_k^+$, can be evaluated recursively as

$$
J_k^- = D_{k-1}^{22}(1) - D_{k-1}^{21}\big(J_{k-1}^+ + D_{k-1}^{11}\big)^{-1} D_{k-1}^{12} \tag{60}
$$

$$
J_k^+ = J_k^- + D_{k-1}^{22}(2) \tag{61}
$$

where $J_0^+ = \langle -\Delta_{x_0}^{x_0} \ln p(x_0) \rangle_{p(x_0)}$ and

$$
\Delta_\Psi^\Theta = \nabla_\Psi \nabla_\Theta^\top \tag{62}
$$

$$
\nabla_\Theta = \left[\frac{\partial}{\partial \Theta_1}, \ldots, \frac{\partial}{\partial \Theta_r}\right]^\top \tag{63}
$$

$$
D_k^{11} = \langle -\Delta_{x_k}^{x_k} \ln p(x_{k+1} \mid x_k) \rangle_{p(x_{k+1}, x_k)} \tag{64}
$$

$$
D_k^{12} = \langle -\Delta_{x_k}^{x_{k+1}} \ln p(x_{k+1} \mid x_k) \rangle_{p(x_{k+1}, x_k)} \tag{65}
$$

$$
D_k^{21} = \langle -\Delta_{x_{k+1}}^{x_k} \ln p(x_{k+1} \mid x_k) \rangle_{p(x_{k+1}, x_k)} = [D_k^{12}]^\top \tag{66}
$$

$$
D_k^{22} = D_k^{22}(1) + D_k^{22}(2) \tag{67}
$$

$$
D_k^{22}(1) = \langle -\Delta_{x_{k+1}}^{x_{k+1}} \ln p(x_{k+1} \mid x_k) \rangle_{p(x_{k+1}, x_k)} \tag{68}
$$

$$
D_k^{22}(2) = \langle -\Delta_{x_{k+1}}^{x_{k+1}} \ln p(y_{k+1} \mid x_{k+1}) \rangle_{p(y_{k+1}, x_{k+1})} \tag{69}
$$

The bound is valid for an asymptotically unbiased estimator, provided that the above derivatives and expectations exist [41]. For the perfect rejector, considering the system model in (1)-(2) infested with observation outliers, we can write

$$
D_k^{11} = \langle \tilde{F}^\top(x_k)\, Q_k^{-1}\, \tilde{F}(x_k) \rangle_{p(x_k)} \tag{70}
$$

$$
D_k^{12} = -\langle \tilde{F}^\top(x_k) \rangle_{p(x_k)}\, Q_k^{-1} \tag{71}
$$

$$
D_k^{22}(1) = Q_k^{-1} \tag{72}
$$

$$
D_k^{22}(2) = \langle \tilde{H}^\top(x_{k+1})\, R_{k+1}^{-1}(I_{k+1})\, \tilde{H}(x_{k+1}) \rangle_{p(x_{k+1})} \tag{73}
$$

where $\tilde{F}(\cdot)$ and $\tilde{H}(\cdot)$ are the Jacobians of the transformations $f(\cdot)$ and $h(\cdot)$, respectively.

B. Smoothing

Similarly, for the estimation error of $x_k$ during smoothing, the BCRB matrix can be written as [42]

$$
\mathrm{BCRB}_k^s \triangleq (J_k^s)^{-1} \tag{74}
$$

where $J_K^s = J_K^+$. The associated smoothing FIM, denoted as $J_k^s$, can be computed recursively as

$$
J_k^s = J_k^+ + D_k^{11} - D_k^{12}\big(D_k^{22}(1) + J_{k+1}^s - J_{k+1}^-\big)^{-1} D_k^{21} \tag{75}
$$

VI. NUMERICAL EXPERIMENTS

To test the performance of the proposed outlier-resilient state estimators, we carry out several numerical experiments. We use Matlab on a computer powered by an Intel i7-8550U processor. All experiments use SI units.

Fig. 3: Target tracking test example setup (the m range sensors and the target, starting at x0).

For performance evaluation, we consider a target tracking problem with TDOA-based range measurements, inspired by [21]. Fig. 3 shows the setup of the considered example. The TDOA observations are obtained as differences of time-of-arrival (TOA) measurements with respect to a common reference sensor, so the resulting measurement covariance matrix is fully populated.

We consider the process equation for the target, assuming an unknown turning rate, as [33]

$$
x_k = f(x_{k-1}) + q_{k-1} \tag{76}
$$
with

$$
f(x_{k-1}) = \begin{bmatrix}
1 & \frac{\sin(\omega_{k-1}\zeta)}{\omega_{k-1}} & 0 & \frac{\cos(\omega_{k-1}\zeta) - 1}{\omega_{k-1}} & 0 \\
0 & \cos(\omega_{k-1}\zeta) & 0 & -\sin(\omega_{k-1}\zeta) & 0 \\
0 & \frac{1 - \cos(\omega_{k-1}\zeta)}{\omega_{k-1}} & 1 & \frac{\sin(\omega_{k-1}\zeta)}{\omega_{k-1}} & 0 \\
0 & \sin(\omega_{k-1}\zeta) & 0 & \cos(\omega_{k-1}\zeta) & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix} x_{k-1} \tag{77}
$$

where the state vector $x_k = [a_k, \dot{a}_k, b_k, \dot{b}_k, \omega_k]^\top$ is composed of the 2D position coordinates $(a_k, b_k)$, the corresponding velocities $(\dot{a}_k, \dot{b}_k)$, and the angular velocity $\omega_k$ of the target at time instant $k$. $\zeta$ denotes the sampling period, and $q_{k-1} \sim \mathcal{N}(0, Q_{k-1})$. $Q_{k-1}$ is given in terms of the scaling parameters $\eta_1$ and $\eta_2$ as [33]

$$
Q_{k-1} = \begin{bmatrix} \eta_1 M & 0 & 0 \\ 0 & \eta_1 M & 0 \\ 0 & 0 & \eta_2 \end{bmatrix}, \qquad M = \begin{bmatrix} \zeta^3/3 & \zeta^2/2 \\ \zeta^2/2 & \zeta \end{bmatrix}
$$

Range readings are obtained using $m$ sensors installed in a zig-zag fashion, as depicted in Fig. 3. The $i$th sensor is located at $a_i^\rho = 350(i-1)$, $b_i^\rho = 350\,((i-1) \bmod 2)$ for $i = 1, \ldots, m$. We take the first sensor as the common reference sensor, resulting in $m-1$ TDOA-based measurements. The nominal measurement equation can be expressed as

$$
y_k = h(x_k) + r_k \tag{78}
$$

with

$$
h_j(x_k) = \sqrt{(a_k - a_1^\rho)^2 + (b_k - b_1^\rho)^2} - \sqrt{(a_k - a_{j+1}^\rho)^2 + (b_k - b_{j+1}^\rho)^2} \tag{79}
$$

for $j = 1, \ldots, m-1$. The corresponding nominal measurement covariance matrix is fully populated and given as [21]

$$
R_k = \begin{bmatrix} \sigma_1^2 + \sigma_2^2 & \cdots & \sigma_1^2 \\ \vdots & \ddots & \vdots \\ \sigma_1^2 & \cdots & \sigma_1^2 + \sigma_m^2 \end{bmatrix} \tag{80}
$$

where $\sigma_i^2$ is the variance contribution of the $i$th sensor to the resulting covariance matrix. To account for outliers, the measurement equation is modified as

$$
y_k = h(x_k) + r_k + o_k \tag{81}
$$

where $o_k$ produces the effect of outliers in the measurements and is assumed to obey the following distribution

$$
p(o_k) = \prod_{j=1}^{m-1} J_k^j\, \mathcal{N}\big(o_k^j \mid 0, \gamma(\sigma_1^2 + \sigma_j^2)\big)
$$

where $J_k^j$ is a Bernoulli random variable, taking values 0 and 1, that controls whether an outlier occurs in the $j$th dimension. Let $\lambda$ denote the probability that a sensor's TOA measurement is affected. We assume that the TOA measurements are affected independently, and that corruption of the first (reference) TOA observation affects all the measurements. Therefore, the probability that no outlier appears in the $j$th dimension, corresponding to $J_k^j = 0$, is $(1-\lambda)^2$. The $j$th TDOA measurement is clean only if neither the reference TOA nor the $(j+1)$th TOA is corrupted. Similarly, the parameter $\gamma$ controls the variance of the outliers in each dimension. We use this model to generate outliers in the data.

For filtering performance comparisons, we choose a hypothetical Gaussian filter that is a perfect rejector with a priori knowledge of all outlier instances. We also compare against the generalized and independent VBKF estimators [19], referred to hereafter as Gen. VBKF and Ind. VBKF. In the VBKFs, we set the design parameter $N = 1$ (essentially resorting to the EM method), since a higher $N$ increases the computational burden. Lastly, we use the derived BCRB-based filtering lower bound to benchmark the performance of all the filters. Similarly, for smoothing we use the counterparts of all the considered filters: a perfect outlier-rejecting general Gaussian RTS smoother and the generalized/independent VBKF-based RTS smoothers, denoted as Gen. VBKS and Ind. VBKS.

For the simulations, the following parameter values are used: the initial state $x_0 = [0, 1, 0, -1, -0.0524]^\top$, $\zeta = 1$, $\eta_1 = 0.1$, $\eta_2 = 1.75 \times 10^{-4}$, and $\sigma_j^2 = 10$. The estimators are initialized with $m_0^+ \sim \mathcal{N}(x_0, P_0^+)$, $P_0^+ = Q_k$, $\epsilon = 10^{-6}$, and $\theta_k^i = 0.5\ \forall i$. In all the considered methods, we use the unscented transform (UT) to approximate the Gaussian integrals [43]. The unscented Kalman filter (UKF) therefore becomes the core inferential engine of every technique. In each method, the UT parameters are set as $\alpha = 1$, $\beta = 2$, and $\kappa = 0$. Moreover, we use the same threshold of $10^{-4}$ for the convergence criterion in each algorithm. The other parameters of the VBKFs/VBKSs are assigned the values originally documented. All the simulations use a total time duration of $K = 400$ and are repeated for 100 independent Monte Carlo (MC) runs. We use box-and-whisker plots to visualize all the results.

A. Filtering Performance

We assess the relative filtering performance under different scenarios.

Fig. 4: MSE vs λ (m = 10, γ = 1000)

First, we use 10 sensors with $\gamma = 1000$ and increase the TOA contamination probability $\lambda$. Fig. 4 shows the mean squared error (MSE) of the state estimate of each filter as $\lambda$ is increased. For $\lambda = 0$, all the filters essentially work as the standard UKF and perform similarly. As $\lambda$ increases, the MSE of each method and the lower bound both increase. The hypothetical ideal UKF exhibits the
best performance, followed by the proposed EMORF, Gen. VBKF, and Ind. VBKF, respectively. The trend remains the same for each $\lambda$, and similar patterns have been observed for other combinations of $m$ and $\gamma$. The performance degradation of Ind. VBKF relative to EMORF and Gen. VBKF is expected, since Ind. VBKF ignores the measurement correlations during filtering. We find EMORF to be generally more robust than Gen. VBKF. These results are consistent with our earlier finding that the modified selective observation rejecting (mSOR)-UKF is more resilient to outliers than the modified outlier-detecting (mOD)-UKF [18]. Those filters are designed for independent measurements and have structures similar to EMORF and VBKF, respectively.

Fig. 5: MSE vs m (λ = 0.3 and γ = 200)

Next, we vary the number of sensors and assess the estimation performance of the filters. Fig. 5 shows the MSE of each method as the number of sensors is increased with $\lambda = 0.3$ and $\gamma = 200$. As expected, the error bound and the MSE of each filter decrease as the number of sensors grows, since more sources of information become available. The pattern is similar to the previous case: the hypothetical ideal UKF performs best, followed by EMORF, Gen. VBKF, and Ind. VBKF, respectively. We have observed similar trends for other values of $\lambda$ and $\gamma$ as well.

Fig. 6: Time vs m (λ = 0.3 and γ = 200)

Subsequently, we evaluate the processing overhead of each algorithm by varying the number of sensors. Fig. 6 shows the execution time taken by each algorithm as the number of sensors is increased with $\lambda = 0.2$ and $\gamma = 100$. We observe that the ideal UKF and Ind. VBKF take less execution time, having a complexity of $O(m^3)$. EMORF and Gen. VBKF induce more computational overhead, with a complexity of $O(m^4)$. This is due to the matrix inverses and determinants needed to evaluate each $I_k^i$ (EMORF) and each $z_t^{(i)}$ (Gen. VBKF), $\forall i = 1, \ldots, m$. This is the cost of achieving robustness with correlated measurement noise. Nevertheless, we find that EMORF generally takes less processing time than Gen. VBKF, as shown in Fig. 6. This can be attributed to the simpler model employed in EMORF, which reduces the computations. Similar behavior has been observed for other combinations of $\lambda$ and $\gamma$.

B. Smoothing Performance

For smoothing, we perform analogous experiments and observe similar performance.

Fig. 7: MSE vs λ (m = 15, γ = 100)

First, we use 15 sensors and increase the TOA contamination probability $\lambda$ with $\gamma = 100$. Fig. 7 shows how the MSE of the state estimate of each smoother changes as $\lambda$ is increased. As in filtering, the MSE of each estimator grows with increasing $\lambda$, and so does the BCRB-based smoothing lower bound. The hypothetical RTS smoother performs best, followed by the proposed EMORS, Gen. VBKS, and Ind. VBKS, respectively. The trend remains the same for each $\lambda$, and similar patterns have been seen for other combinations of $m$ and $\gamma$.

Fig. 8: MSE vs m (λ = 0.2 and γ = 100)

Thereafter, we assess the estimation performance of the smoothers by varying the number of sensors. Fig. 8 depicts the MSE of each estimator as the number of sensors increases with $\lambda = 0.2$ and $\gamma = 100$. The MSE of each smoother decreases as the number of sensors grows, and so does the BCRB-based lower bound. The hypothetical RTS smoother performs best, followed by EMORS, Gen. VBKS, and Ind. VBKS
respectively. We have observed similar trends for other values of $\lambda$ and $\gamma$ as well.

Fig. 9: Time vs m (λ = 0.2 and γ = 100)

Lastly, we evaluate the computational overhead of each algorithm by varying the number of sensors. Fig. 9 shows the time each method takes as the number of sensors is increased with $\lambda = 0.2$ and $\gamma = 100$. We observe patterns similar to those for filtering. The ideal RTS smoother and Ind. VBKS, with a complexity of $O(m^3)$, take less execution time. EMORS and Gen. VBKS, with a complexity of $O(m^4)$, are more time-consuming. EMORS generally induces less computing overhead than Gen. VBKS, as depicted in Fig. 9. We have observed similar patterns for different combinations of $\lambda$ and $\gamma$.

VII. CONCLUSION

We consider the problem of outlier-robust state estimation in the presence of correlated measurement noise. Given their advantages, tuning-free learning-based approaches are an attractive option in this regard. Having identified the shortcomings of existing tractable VB-based methods of this kind, we propose EMORF and EMORS. Since the standard VB approach entails significant processing complexity, we adopt EM in our algorithmic constructions. We conclude that the presented methods are simpler, and hence more practicable, than the state-of-the-art Gen. VBKF/VBKS devised for the same conditions. This is possible because the proposed uncomplicated model reduces the number of parameters to be inferred. The need for the specialized digamma function during implementation is also obviated. In addition, numerical experiments in an illustrative TDOA-based target tracking example suggest further merits of the proposed methods. We find that EMORF/EMORS generally exhibit lower errors than Gen. and Ind. VBKF/VBKS in different scenarios of the example. Moreover, although EMORF/EMORS and Gen. VBKF/VBKS have the same complexity order, the proposed estimators are generally found to be computationally more competitive under different test conditions. These merits make the proposed state estimators worthy candidates for implementation in relevant scenarios.

APPENDIX A
EVALUATING $R_k^{-1}(\hat{I}_k)$

To evaluate $R_k^{-1}(\hat{I}_k)$, we note that $R_k(\hat{I}_k)$ can easily be rearranged by swapping rows/columns depending on $\hat{I}_k$ as

$$
R_k(\hat{I}_k) = \begin{bmatrix} \bar{R}_k / \epsilon & 0 \\ 0 & \hat{R}_k \end{bmatrix} \tag{82}
$$

where $\bar{R}_k$ is a diagonal sub-matrix containing the diagonal entries of $R_k$ for the dimensions with $\hat{I}_k^i = \epsilon$. $\hat{R}_k$ is the remaining fully populated submatrix of $R_k(\hat{I}_k)$, corresponding to the entries with $\hat{I}_k^i = 1$. Inverting $R_k(\hat{I}_k)$ yields

$$
R_k^{-1}(\hat{I}_k) = \begin{bmatrix} \epsilon \bar{R}_k^{-1} & 0 \\ 0 & \hat{R}_k^{-1} \end{bmatrix} \tag{83}
$$

Finally, the rows/columns of (83) are swapped back to their original positions to obtain the required matrix $R_k^{-1}(\hat{I}_k)$.

APPENDIX B
SIMPLIFYING $\hat{\tau}_k^i$

We can swap the $i$th row/column entries of $R_k(I_k^i = 1, \hat{I}_k^{i-})$ and $R_k(I_k^i = \epsilon, \hat{I}_k^{i-})$ with the first row/column elements to obtain

$$
|R_k(I_k^i = 1, \hat{I}_k^{i-})| = \begin{vmatrix} R_k^{i,i} & R_k^{i,-i} \\ R_k^{-i,i} & \hat{R}_k^{-i,-i} \end{vmatrix} \tag{84}
$$

$$
|R_k(I_k^i = \epsilon, \hat{I}_k^{i-})| = \begin{vmatrix} R_k^{i,i}/\epsilon & 0 \\ 0 & \hat{R}_k^{-i,-i} \end{vmatrix} \tag{85}
$$

Consequently, we can write

$$
\begin{aligned}
\ln\left(\frac{|R_k(I_k^i = 1, \hat{I}_k^{i-})|}{|R_k(I_k^i = \epsilon, \hat{I}_k^{i-})|}\right)
&= \ln\left(\frac{R_k^{i,i}\, \big|\hat{R}_k^{-i,-i} - R_k^{-i,i} R_k^{i,-i} / R_k^{i,i}\big|}{(R_k^{i,i}/\epsilon)\, |\hat{R}_k^{-i,-i}|}\right) \\
&= \ln\left| I - \frac{R_k^{-i,i} R_k^{i,-i} (\hat{R}_k^{-i,-i})^{-1}}{R_k^{i,i}} \right| + \ln(\epsilon)
\end{aligned} \tag{86}
$$

where we have used the following property from matrix algebra [44]

$$
\begin{vmatrix} A & B \\ C & D \end{vmatrix} = |A|\,|D - C A^{-1} B|
$$

As a result, (23) simplifies to (25).

APPENDIX C
EVALUATING $\Delta\hat{R}_k^{-1}$

To avoid redundant calculations during the evaluation of $\Delta\hat{R}_k^{-1}$, we can first swap the $i$th row/column elements of the matrices in (24) with their first row/column entries to obtain

$$
\begin{aligned}
\Delta\hat{R}_k^{-1} &= R_k^{-1}(I_k^i = 1, \hat{I}_k^{i-}) - R_k^{-1}(I_k^i = \epsilon, \hat{I}_k^{i-}) \\
&= \begin{bmatrix} R_k^{i,i} & R_k^{i,-i} \\ R_k^{-i,i} & \hat{R}_k^{-i,-i} \end{bmatrix}^{-1} - \begin{bmatrix} R_k^{i,i}/\epsilon & 0 \\ 0 & \hat{R}_k^{-i,-i} \end{bmatrix}^{-1}
\end{aligned} \tag{87}
$$

To simplify (87), we use the following property from matrix algebra [44]

$$
\begin{bmatrix} A & B \\ C & D \end{bmatrix}^{-1} = \begin{bmatrix} S^{-1} & -S^{-1} B D^{-1} \\ -D^{-1} C S^{-1} & D^{-1} + D^{-1} C S^{-1} B D^{-1} \end{bmatrix}
$$
where $S$ is the Schur complement of $D$, given as $S = A - B D^{-1} C$. As a result, we obtain

$$
\Delta\hat{R}_k^{-1} = \begin{bmatrix} \Xi^{i,i} & \Xi^{i,-i} \\ \Xi^{-i,i} & \Xi^{-i,-i} \end{bmatrix} \tag{88}
$$

where the expressions of $\Xi^{i,i}$, $\Xi^{i,-i}$, $\Xi^{-i,i}$, and $\Xi^{-i,-i}$ are given in (27)-(30). The quantities shared by (27)-(30) can be computed once and stored for further computations. Examples are $\big(R_k^{i,i} - R_k^{i,-i}(\hat{R}_k^{-i,-i})^{-1} R_k^{-i,i}\big)^{-1}$ and $(\hat{R}_k^{-i,-i})^{-1}$. Lastly, the first row/column entries of $\Delta\hat{R}_k^{-1}$ are moved back to the actual $i$th row/column positions to obtain the required $\Delta\hat{R}_k^{-1}$.

REFERENCES

[1] J. Kuti, I. J. Rudas, H. Gao, and P. Galambos, “Computationally relaxed Unscented Kalman filter,” IEEE Transactions on Cybernetics, vol. 53, no. 3, pp. 1557–1565, 2023.
[2] B. Lian, Y. Wan, Y. Zhang, M. Liu, F. L. Lewis, and T. Chai, “Distributed Kalman consensus filter for estimation with moving targets,” IEEE Transactions on Cybernetics, vol. 52, no. 6, pp. 5242–5254, 2022.
[3] H. Zhang, X. Zhou, Z. Wang, and H. Yan, “Maneuvering target tracking with event-based mixture Kalman filter in mobile sensor networks,” IEEE Transactions on Cybernetics, vol. 50, no. 10, pp. 4346–4357, 2020.
[4] A. Donder and F. R. y. Baena, “Kalman-filter-based, dynamic 3-D shape reconstruction for steerable needles with fiber bragg gratings in multicore fibers,” IEEE Transactions on Robotics, vol. 38, no. 4, pp. 2262–2275, 2022.
[5] M. Gardner and Y.-B. Jia, “Pose and motion estimation of free-flying objects: Aerodynamics, constrained filtering, and graph-based feature tracking,” IEEE Transactions on Robotics, vol. 38, no. 5, pp. 3187–3202, 2022.
[6] C. Mishra, L. Vanfretti, and K. D. Jones, “Synchrophasor phase angle data unwrapping using an Unscented Kalman filter,” IEEE Transactions on Power Systems, vol. 36, no. 5, pp. 4868–4871, 2021.
[7] B. L. Boada, M. J. L. Boada, and H. Zhang, “Sensor fusion based on a dual Kalman filter for estimation of road irregularities and vehicle mass under static and dynamic conditions,” IEEE/ASME Transactions on Mechatronics, vol. 24, no. 3, pp. 1075–1086, 2019.
[8] M. Jahja, D. Farrow, R. Rosenfeld, and R. J. Tibshirani, “Kalman filter, sensor fusion, and constrained regression: Equivalences and insights,” Advances in Neural Information Processing Systems, vol. 32, 2019.
[9] A. H. Chughtai, M. Tahir, and M. Uppal, “A robust Bayesian approach for online filtering in the presence of contaminated observations,” IEEE Transactions on Instrumentation and Measurement, vol. 70, pp. 1–15, 2021.
[10] A. H. Chughtai, U. Akram, M. Tahir, and M. Uppal, “Dynamic state estimation in the presence of sensor outliers using MAP-based EKF,” IEEE Sensors Letters, vol. 4, no. 4, pp. 1–4, 2020.
[11] R. E. Kalman, “A new approach to linear filtering and prediction problems,” 1960.
[12] M. S. Grewal and A. P. Andrews, Kalman filtering: Theory and Practice with MATLAB. John Wiley & Sons, 2014.
[13] S. J. Julier and J. K. Uhlmann, “New extension of the Kalman filter to nonlinear systems,” in Signal processing, sensor fusion, and target recognition VI, vol. 3068. Spie, 1997, pp. 182–193.
[14] N. J. Gordon, D. J. Salmond, and A. F. Smith, “Novel approach to nonlinear/non-Gaussian Bayesian state estimation,” in IEE proceedings F (radar and signal processing), vol. 140, no. 2. IET, 1993, pp. 107–113.
[15] G. Evensen, “The ensemble Kalman filter: Theoretical formulation and practical implementation,” Ocean dynamics, vol. 53, pp. 343–367, 2003.
[16] H. E. Rauch, F. Tung, and C. T. Striebel, “Maximum likelihood estimates of linear dynamic systems,” AIAA journal, vol. 3, no. 8, pp. 1445–1450, 1965.
[17] S. Särkkä and L. Svensson, Bayesian filtering and smoothing. Cambridge university press, 2023, vol. 17.
[18] A. H. Chughtai, M. Tahir, and M. Uppal, “Outlier-robust filtering for nonlinear systems with selective observations rejection,” IEEE Sensors Journal, vol. 22, no. 7, pp. 6887–6897, 2022.
[19] H. Li, D. Medina, J. Vilà-Valls, and P. Closas, “Robust variational-based Kalman filter for outlier rejection with correlated measurements,” IEEE Transactions on Signal Processing, vol. 69, pp. 357–369, 2021.
[20] Y. Bar-Shalom, “Negative correlation and optimal tracking with Doppler measurements,” IEEE Transactions on Aerospace and Electronic Systems, vol. 37, no. 3, pp. 1117–1120, 2001.
[21] R. Kaune, J. Hörst, and W. Koch, “Accuracy analysis for TDOA localization in sensor networks,” in 14th International Conference on Information Fusion, 2011, pp. 1–8.
[22] Y. Zhu, R. Blum, Z.-Q. Luo, and K. M. Wong, “Unexpected properties and optimum-distributed sensor detectors for dependent observation cases,” IEEE Transactions on Automatic Control, vol. 45, no. 1, pp. 62–72, 2000.
[23] H. Ma and Y. Liu, “Correlation based video processing in video sensor networks,” in 2005 International Conference on Wireless Networks, Communications and Mobile Computing, vol. 2, 2005, pp. 987–992 vol.2.
[24] K. Yuen, B. Liang, and B. Li, “A distributed framework for correlated data gathering in sensor networks,” IEEE Transactions on Vehicular Technology, vol. 57, no. 1, pp. 578–593, 2008.
[25] Y. Huang, Y. Zhang, N. Li, and J. Chambers, “Robust Student’s t based nonlinear filter and smoother,” IEEE Transactions on Aerospace and Electronic Systems, vol. 52, no. 5, pp. 2586–2596, 2016.
[26] H. Wang, H. Li, W. Zhang, and H. Wang, “Laplace l1 robust Kalman filter based on majorization minimization,” in 2017 20th International Conference on Information Fusion (Fusion), 2017, pp. 1–5.
[27] C. D. Karlgaard, “Nonlinear regression Huber–Kalman filtering and fixed-interval smoothing,” Journal of guidance, control, and dynamics, vol. 38, no. 2, pp. 322–330, 2015.
[28] L. Chang and K. Li, “Unified form for the robust Gaussian information filtering based on M-estimate,” IEEE Signal Processing Letters, vol. 24, no. 4, pp. 412–416, 2017.
[29] L. Chang, B. Hu, G. Chang, and A. Li, “Multiple outliers suppression derivative-free filter based on Unscented transformation,” Journal of guidance, control, and dynamics, vol. 35, no. 6, pp. 1902–1906, 2012.
[30] S. Wang, W. Gao, and A. P. S. Meliopoulos, “An alternative method for power system dynamic state estimation based on Unscented transform,” IEEE Transactions on Power Systems, vol. 27, no. 2, pp. 942–950, 2012.
[31] A. K. Singh and B. C. Pal, “Decentralized dynamic state estimation in power systems using Unscented transformation,” IEEE Transactions on Power Systems, vol. 29, no. 2, pp. 794–804, 2014.
[32] A. Nakabayashi and G. Ueno, “Nonlinear filtering method using a switching error model for outlier-contaminated observations,” IEEE Transactions on Automatic Control, vol. 65, no. 7, pp. 3150–3156, 2020.
[33] H. Wang, H. Li, J. Fang, and H. Wang, “Robust Gaussian Kalman filter with outlier detection,” IEEE Signal Processing Letters, vol. 25, no. 8, pp. 1236–1240, 2018.
[34] R. Piché, S. Särkkä, and J. Hartikainen, “Recursive outlier-robust filtering and smoothing for nonlinear systems using the multivariate Student-t distribution,” in 2012 IEEE International Workshop on Machine Learning for Signal Processing, 2012, pp. 1–6.
[35] T. Zhang, S. Zhao, X. Luan, and F. Liu, “Bayesian inference for State-space models with Student-t mixture distributions,” IEEE Transactions on Cybernetics, vol. 53, no. 7, pp. 4435–4445, 2023.
[36] K. P. Murphy, Machine learning: a probabilistic perspective, 2012.
[37] V. Šmídl and A. Quinn, The variational Bayes method in signal processing. Springer Science & Business Media, 2006.
[38] A. H. Chughtai, A. Majal, M. Tahir, and M. Uppal, “Variational-based nonlinear Bayesian filtering with biased observations,” IEEE Transactions on Signal Processing, vol. 70, pp. 5295–5307, 2022.
[39] M. A. A. Turkman, C. D. Paulino, and P. Müller, Computational Bayesian statistics: an introduction. Cambridge University Press, 2019, vol. 11.
[40] H. L. Van Trees, Detection, estimation, and modulation theory, part I: detection, estimation, and linear modulation theory. John Wiley & Sons, 2004.
[41] P. Tichavsky, C. Muravchik, and A. Nehorai, “Posterior Cramer-Rao bounds for discrete-time nonlinear filtering,” IEEE Transactions on Signal Processing, vol. 46, no. 5, pp. 1386–1396, 1998.
[42] M. Šimandl, J. Královec, and P. Tichavský, “Filtering, predictive, and smoothing Cramér–Rao bounds for discrete-time nonlinear dynamic systems,” Automatica, vol. 37, no. 11, pp. 1703–1716, 2001.
[43] E. A. Wan, R. Van Der Merwe, and S. Haykin, “The Unscented Kalman filter,” Kalman filtering and neural networks, vol. 5, no. 2007, pp. 221–280, 2001.
[44] F. Zhang, The Schur complement and its applications. Springer Science & Business Media, 2006, vol. 4.
