Dynamic Active Subspaces: A Data-Driven Approach to Computing Time-Dependent Active Subspaces in Dynamical Systems
by Izabel Pirimai Aguiar
Master of Science
2018
This thesis entitled:
DYNAMIC ACTIVE SUBSPACES: A DATA-DRIVEN APPROACH TO COMPUTING
TIME-DEPENDENT ACTIVE SUBSPACES IN DYNAMICAL SYSTEMS
written by Izabel Pirimai Aguiar
has been approved for the Department of Computer Science
Date
The final copy of this thesis has been examined by the signatories, and we find that both
the content and the form meet acceptable presentation standards of scholarly work in the
above-mentioned discipline.
we must understand how the errors in the models’ inputs (i.e., through measurement error)
affect the output of the systems: we must quantify the uncertainty that results from these
input errors. Uncertainty quantification (UQ) becomes computationally complex when there
are many parameters in the model. In such cases it is useful to reduce the dimension of the
problem by identifying unimportant parameters and disregarding them for UQ studies. This
makes an otherwise intractable UQ problem tractable. Active subspaces extend this idea to effective dimension reduction. Although active subspaces give model insight and computational tractability for scalar-valued functions, this analysis does not extend to functions that depend on both parameters and time. In this thesis we define dynamic active subspaces, and introduce the analytical form of dynamic active subspaces for two cases. To highlight these methods we find dynamic active subspaces for a linear harmonic oscillator and an enzyme kinetics system.
Acknowledgements
I would primarily like to thank my advisor, Paul Constantine. Paul has shown me,
through the opportunities and support he’s given me, how much he believes in my ability to
succeed. In doing so, he has given me some of the most positive and productive experiences
I’ve ever had. He has continually emboldened me to continue on my own path. Professionally,
no one else has impacted and influenced me as much as he has. I am extremely grateful for
I would also like to sincerely thank the members of my research group: Zachary Grey,
Andrew Glaws, and Jeffrey Hokanson, for their engaging discussions, cups of coffee, criti-
cal feedback, and overwhelming kindness. Many thanks to my thesis committee, Elizabeth
Bradley, Gianluca Iaccarino, and Jim Curry for their support as well as to Nathan Kutz,
Steven Brunton, John Butcher, Kyle Niemeyer, and Peter Schmid, for their interest and en-
couragement. I am also extremely grateful for the dozens of kind and enthusiastic conference
attendees who have inspired new questions and driven further research. I am excited and
Certainly not least, I am thankful to my family: Aristotle Johns, Jojo Clark, Sharon
Aguiar, Laura Aguiar, Matt Aguilar, Christina Whippen, Amanda Evans, Julie Clark, and
Xenia Johns. I would be nowhere without their unparalleled support, and I would know
CONTENTS

CHAPTER

1
1.1 INTRODUCTION
1.4 BACKGROUND
1.5.2 SPARSE IDENTIFICATION OF NONLINEAR DYNAMICAL SYSTEMS

2
2.4 DISCUSSION

BIBLIOGRAPHY
TABLES

Table

2.1 The initial conditions p in (2.26) have a uniform joint density with lower and upper bounds 20% below and above the nominal values.
2.2 The parameters p have uniform joint density with upper and lower bounds 50% below and above the nominal values.

FIGURES

Figure

2.1 The trajectory of the x, y, and z components of the linear harmonic oscillator at the nominal initial conditions.
2.2 The eigenvalues (left) and first eigenvector (middle) of C at time t = 5. Note that there is only one nonzero eigenvalue. The shadow plot (right) at this time shows the one-dimensional dependence of the QoI on the active variable w_1^T p.
2.3 (Left) the first eigenvector of C computed for the function (2.26). (Right) the first eigenvector after being multiplied by $\sqrt{\lambda_0(t)}$.
2.4 The maximum absolute error of the dynamic active subspace approximated by (left) DMD and (right) SINDy. The data matrices for each algorithm have been created with varying numbers of snapshots from varying proportions of time.
2.5 The convergence of the approximation to the integral in (1.1) for the number of quadrature points per dimension in the tensor product Gauss-Legendre quadrature scheme.
2.6 (Left) the average absolute error over time between Gronwall Gradients and the second-order finite difference approximation for the gradient of f with respect to parameters. (Right) the absolute error over time between Gronwall Gradients and the second-order finite difference approximation for the gradient of f with respect to parameters for h = 10^{-3}.
2.7 The wall clock time in computing M second-order finite difference approximations to the gradient versus integrating the augmented system in (2.35), averaged over 15 runs.
2.8 The influence of varying α on the (left) cost and (right) absolute error of the SINDy approximation of the dynamic active subspace.
2.9 The maximum absolute error of the dynamic active subspace approximated by (left) DMD and (right) SINDy. The data matrices for each algorithm have been created with varying numbers of snapshots from varying proportions of time.
2.10 The (left) approximated dynamic active subspace and its (right) absolute error using (top) DMD and (bottom) SINDy. Solutions have been approximated with 10 snapshots from the first 50% of time.
1.1 INTRODUCTION
we must understand how the errors in the models’ inputs (i.e., through measurement error)
affect the output of the systems: we must quantify the uncertainty that results from these input errors. Uncertainty quantification (UQ) becomes computationally complex when there are many parameters in the model. In such cases it is useful to reduce the dimension
of the model by identifying unimportant parameters and disregarding them for UQ studies. Existing approaches to identifying such parameters, however, are limited to local metrics that do not fully explore the parameter space,
or global metrics that are either limited to a static point in time or to the sensitivity of
individual parameters.
Active subspaces [11] identify linear combinations of the parameters that change the
quantity of interest the most on average. The active subspace of a system can be exploited
to identify coordinate-based global sensitivity metrics called activity scores [13] that provide
comparable sensitivity analysis to existing metrics. The existing approach for extending
active subspaces to time-dependent systems involves recomputing the active subspace at each
time step of interest [33, 14]. This analysis requires sampling across the parameter space and recomputing the eigendecomposition at each time step. For complex time-dependent systems or for those with many parameters, this approach is computationally expensive; it limits uncertainty quantification, sensitivity analysis, and parameter estimation for computational models that have many uncertain parameters. Understanding which linear combinations of parameters change the quantity of interest over time could inform important discipline-specific insights. Such analysis would identify which linear combinations change the quantity of interest the most on average at each point in time.
The contributions of this work. In this thesis we present a novel approach for uncertainty quantification and sensitivity analysis in time-dependent systems. We prove the existence of a dynamical system describing the dynamic active subspace in the cases of three parameterized linear dynamical systems. We propose a data-driven methodology for approximating the dynamic active subspace with “data” in the form of the one-dimensional active subspace computed at snapshots in time. We approximate the dynamic active subspace using this methodology for (i) a linear harmonic oscillator and (ii) an enzyme kinetics system. Our methodology provides a means by which to identify the linear combination of parameters that will change the function the most on average, and how this combination changes in time. The concept of using data-driven recovery techniques to approximate a time-dependent sensitivity metric given snapshots of the sensitivity metric is, to the best of our knowledge, novel. Furthermore, the methodology presented in this work can be extended to develop time-dependent insight for other global sensitivity metrics. The intellectual and technical contributions of the novel ideas and methodology developed in this work lend to the advancement of uncertainty quantification and sensitivity analysis for time-dependent systems.
We begin by reviewing existing global sensitivity analysis tools and related work and follow by defining the notation
and vocabulary used in the paper. In Section 1.4 we detail the background information on
active subspaces, dynamic mode decomposition (DMD), and sparse identification for nonlin-
ear dynamical systems (SINDy). In Section 2.2 we prove the existence of the analytic form
of three dynamic active subspaces. In Section 2.3 we propose the method of approximating
dynamic active subspaces with DMD and SINDy. We discuss the benefits and drawbacks
of such methods and conduct numerical experiments on the dynamic active subspace for an
output of a linear and a nonlinear dynamical system. We finish by drawing conclusions on the methods and numerical results.
A note to the reader. The pronoun “we” will be used from the perspective of the authors. This choice is intended to directly include the reader in the instruction, analysis, and discussion.
Sensitivity analysis methods can be separated into local and global metrics. As dis-
cussed in [34], local sensitivity analysis is appropriate when the parameters of a model are
known with little uncertainty. In such cases we can evaluate the partial derivative of the
quantity of interest (QoI) with respect to its parameters to evaluate the relative change
in the QoI with respect to each parameter. However, in many biological and engineering
systems the input parameters are associated with a range of uncertainty or possibility; e.g.
glycolysis modeling [46], molecular systems [43, 3], battery chemistries [37], and HIV modeling
[38]. In such cases a global sensitivity metric is necessary to understand the sensitivity of
the system to its range of inputs. There are many existing global sensitivity analysis tools,
including “One-At-A-Time” screening methods [23] such as the Morris method [36], the full
factorial method [18], the analysis of variance (ANOVA) decomposition [29], Fourier Ampli-
tude Sensitivity Test (FAST) [17], and active subspaces [13]. For a more detailed review of global sensitivity analysis methods, we refer the reader to [23].
Local and global metrics exist for sensitivity analysis of dynamical systems, as well.
For a system of ordinary differential equations (ODEs), time-dependent local sensitivity met-
rics are given by the augmented ODE with additional terms for the partials of the states
with respect to each parameter [21]. Similarly, [47] and [7] develop a time-dependent local
sensitivity metric for evaluating the parameter dependence of dynamic population models.
These metrics, along with others reviewed in [51], are local and only assess small perturba-
tions from the nominal parameter values. Global sensitivity analyses of dynamical systems
are often limited to analyzing an aspect of the system that is not time-dependent, such as
the steady state [1] or known stages [28] of the system. For time-dependent analyses, the
Fourier Amplitude Sensitivity Test (FAST) computes the time-dependent variances of each
state of a system. In [34], the Pearson Rank Correlation Coefficient [39] is computed at
multiple points in time. These time-dependent analyses are limited to assessing the vari-
ance in or correlation between the QoI and each parameter in the system. In the case of a
In this section we define the notation and vocabulary used throughout the remainder
of the paper to create a means by which definitions and notations can easily be referenced.
Although we define these terms below, they will also be explicitly detailed in the following
sections.
p ∈ R^m  The parameters of interest. The upper and lower bounds of these parameters are given by p_ℓ ∈ R^m and p_u ∈ R^m, respectively.

ρ(p)  The joint probability density for the parameter space of f. For example, p ∼ U(p_ℓ, p_u).

C  The matrix (1.1) whose eigendecomposition yields eigenvectors w_i and eigenvalues λ_i for i = 1, 2, …, m.

u ∈ R^q  The states of the dynamical system.

X_DMD, X′_DMD ∈ R^{q×(n−1)}  Data matrices used in DMD consisting of observations of u from n points in time; observations are columns.

X_SINDy ∈ R^{n×q}  Data matrix used in SINDy consisting of observations of u ∈ R^q at n points in time; observations are rows.

X^{P_i}  Monomials of the states up to order i; for three states, X^{P_2} = [x^2 xy xz y^2 yz z^2].

DMD  Shorthand for dynamic mode decomposition and broadly used to refer to the method presented in [44] for recovering a locally linear approximation of a dynamical system.

SINDy  Shorthand for sparse identification of nonlinear dynamical systems and broadly used to refer to the method presented in [4] for recovering governing equations from data.

DyAS  Shorthand for dynamic active subspaces and broadly used to refer to the time-dependent active subspaces studied in this work.
1.4 BACKGROUND
Consider a function f that maps the parameters p ∈ R^m to a scalar output of interest (e.g., cell count, drug concentration, temperature). We assume that the
parameters p have joint probability density ρ(p) : Rm 7→ R+ . This density represents the
uncertainty in the parameters. For example, if it is equally probable that the parameters
take any value in the range of its lower and upper bounds, p` and pu , respectively, then
p ∼ U(p` , pu ).
We assume that f is differentiable and has square integrable derivatives. Active subspaces [11] are eigenspaces of the matrix
\[ C = \int \nabla f(p)\, \nabla f(p)^T \rho(p)\, dp = W \Lambda W^T. \tag{1.1} \]
The matrix C is the expectation of the outer product of the gradient of f , and is symmet-
ric and positive semidefinite. Thus the eigendecomposition of C yields real non-negative eigenvalues Λ and orthonormal eigenvectors W, and the eigenpairs satisfy the following lemma.
Lemma 1.4.1 (Lemma 3.1, Constantine, 2015 [11]). The mean-squared directional derivative of f along the eigenvector w_i satisfies
\[ \int \big( \nabla f(p)^T w_i \big)^2 \rho(p)\, dp = \lambda_i. \tag{1.2} \]

Assuming the largest eigenvalue λ_1 is unique, it indicates that w_1 is the direction in the parameter space along which f changes the most on average. Let the eigendecomposition
be partitioned by
\[ \Lambda = \begin{bmatrix} \Lambda_1 & \\ & \Lambda_2 \end{bmatrix}, \qquad W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}, \tag{1.3} \]
where Λ_1 contains the k < m largest eigenvalues and W_1 the corresponding k eigenvectors. Define the new coordinates
\[ y = W_1^T p \in \mathbb{R}^k, \qquad z = W_2^T p \in \mathbb{R}^{m-k}. \tag{1.4} \]
The new parameters y and z are linear combinations of the original parameters p. The
parameters y are those that are most important in the following sense.
Lemma 1.4.2 (Constantine, 2015). The mean-squared gradients of f with respect to y and
z satisfy
\[ \int (\nabla_y f)^T (\nabla_y f)\, \rho(p)\, dp = \lambda_1 + \dots + \lambda_k, \tag{1.5} \]
\[ \int (\nabla_z f)^T (\nabla_z f)\, \rho(p)\, dp = \lambda_{k+1} + \dots + \lambda_m. \tag{1.6} \]
Thus the parameters defined by y are those for which the mean squared gradients of
f are largest in the 2-norm. Consider, for example, the case where λk+1 = . . . = λm = 0. In
this case, the only directions in which the function changes are those defined by W 1 .
Although finding the active subspace and principal component analysis (PCA) [24]
both involve the process of eigendecomposing a matrix, it is important to note that the two
analyses are not the same. The matrix to be eigendecomposed in PCA is the covariance of
the parameters p, whereas the matrix used to find the active subspace is that defined by
(1.1). The most critical distinction between these two analyses is that PCA gives insight
into the relationship between one parameter and another; active subspaces give insight into the relationship between the parameters and the output of interest.
1.4.1.1 Practicalities
In this section we discuss the practicalities of finding and estimating the active sub-
space. We address the convention of centering and normalizing the parameters p and when
to take the log-transform of the parameters. Additionally, we discuss the methods by which
we estimate (i) the gradient of f with respect to p and (ii) the integral (1.1) defining the
active subspace.
Normalizing the parameters In the analysis we assume that p ∈ [−1, 1]m . To satisfy
this assumption in practice we often need to consider an initial map that shifts and scales
the model’s natural parameter space to the normalized space. For parameters p ∈ Rm with
upper and lower bounds pu , p` respectively, the appropriate shifted and scaled parameters
are given by
\[ p^* = \mathrm{diag}(p_u - p_\ell)^{-1} \big( 2p - (p_u + p_\ell) \big), \qquad p^* \in [-1, 1]^m. \tag{1.7} \]
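A minimal sketch of this map and its inverse in Python with NumPy; the function names are illustrative, and the example bounds are the Table 2.1 values used later:

```python
import numpy as np

def normalize_parameters(p, p_lower, p_upper):
    """Map natural parameters in [p_lower, p_upper] to [-1, 1]^m, per (1.7)."""
    return (2.0 * p - (p_upper + p_lower)) / (p_upper - p_lower)

def denormalize_parameters(p_star, p_lower, p_upper):
    """Inverse map from [-1, 1]^m back to the natural parameter space."""
    return ((p_upper - p_lower) * p_star + (p_upper + p_lower)) / 2.0

p_lower = np.array([0.96, 4.0, 8.0])
p_upper = np.array([1.44, 6.0, 12.0])
p_nominal = np.array([1.2, 5.0, 10.0])
print(normalize_parameters(p_nominal, p_lower, p_upper))  # -> [0. 0. 0.]
```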
Log-transform of parameter space When the inputs of f have units, as in the enzyme-
kinetics system we study in Section 2.3.3, we take a log-transform of the parameter space.
Such a decision is made from experience and insight with other dimensionalized systems. In [12], the authors draw a connection between computing non-dimensionalized parameters for a physical system and identifying important linear combinations of parame-
ters. Since the non-dimensional parameters are products of powers of the original parameters,
a log-transform produces a linear combination of the logs of the original parameters. This im-
plies that, for systems where the original parameters are amenable to non-dimensionalization,
the log-transformed system depends only on a few linear combinations. Such structure is precisely what active subspaces identify and exploit.
Estimating gradients In practice we do not have the analytic form of ∇f (p) and thus
must approximate it. If this is the case one may approximate the gradient using first-order finite differences,
\[ \frac{\partial f}{\partial p_i} \approx \frac{f(p + h e_i) - f(p)}{h}, \qquad i = 1, 2, \ldots, m, \tag{1.8} \]
where e_i is the i-th column of the identity matrix and h is a small step size.
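A minimal sketch of (1.8), assuming f takes a NumPy array and returns a scalar; this is an illustration, not the thesis implementation:

```python
import numpy as np

def fd_gradient(f, p, h=1e-6):
    """First-order forward-difference approximation of the gradient of f, per (1.8)."""
    g = np.zeros(len(p))
    f0 = f(p)                      # baseline function value
    for i in range(len(p)):
        e = np.zeros(len(p))
        e[i] = h                   # perturb only the i-th parameter
        g[i] = (f(p + e) - f0) / h
    return g
```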
For dynamical systems with an analytical form ∇f (t, p) can be approximated by augmenting
the original ODE system with the time derivative of the states’ partial derivatives. These
will be referred to as Gronwall Gradients [21] and are detailed in Section 1.4.2.
Estimating the expectation The expectation defined by the integral in (1.1) must be
approximated. For this paper we employ two methods for approximating this integral using
(i) Monte Carlo sampling and (ii) Gauss-Legendre quadrature. Both approximation rules are achieved by the weighted sum
\[ \hat{C} = \sum_{i=1}^{M} \omega_i\, \nabla f(p_i)\, \nabla f(p_i)^T \approx C, \tag{1.9} \]
where the nodes p_i and weights ω_i depend on the chosen rule. The choice of M depends on f and the dimension of the parameter space. With Monte Carlo sampling, the points p_i are randomly drawn according to the density ρ(p) with weights ω_i = 1/M. The estimate Ĉ converges to C like O(M^{-1/2}). When the dimension m of f is
low, and when f has sufficiently smooth derivatives we can achieve a much better approx-
imation using a Gauss quadrature rule to approximate C. Here, M nodes pi and weights
ωi are drawn per dimension according to the tensor product Gauss-Legendre quadrature
scheme. It is important to note that M nodes per each of the m dimensions leads to M^m
total function evaluations and gradient approximations. The choice of M again depends on
f and in practice one should conduct numerical convergence studies to find an appropriate
M. We implement and focus on these two methods for approximating the active subspace in this work.
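A minimal Monte Carlo sketch of (1.9) with weights ω_i = 1/M, assuming user-supplied gradient and sampling routines; the names are illustrative:

```python
import numpy as np

def monte_carlo_active_subspace(grad_f, draw_p, M):
    """Estimate C in (1.1) via (1.9) and return its eigendecomposition."""
    m = len(draw_p())
    C_hat = np.zeros((m, m))
    for _ in range(M):
        g = grad_f(draw_p())                  # gradient at a random parameter sample
        C_hat += np.outer(g, g) / M
    eigvals, eigvecs = np.linalg.eigh(C_hat)  # symmetric eigendecomposition
    return eigvals[::-1], eigvecs[:, ::-1]    # sorted descending

# Example: f(p) = exp(a^T p) is a ridge function whose active subspace is span(a).
a = np.array([1.0, 2.0, 0.5])
grad = lambda p: a * np.exp(a @ p)
draw = lambda: np.random.uniform(-1.0, 1.0, size=3)
vals, vecs = monte_carlo_active_subspace(grad, draw, M=2000)
print(vals)        # one dominant eigenvalue
print(vecs[:, 0])  # proportional (up to sign) to a / ||a||
```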
1.4.2 GRONWALL GRADIENTS

Assume that the q states of the dynamical system (1.12) are dependent upon at least
one parameter p. In [21], Gronwall derives a system of ordinary differential equations that
∂ui
includes the dynamics of both the original state variables and of their partial derivatives, .
∂p
Numerically integrating the system provides an approximation for the partial derivatives of
the state with respect to a parameter p as a function of time. In the case where there is
more than one parameter, the system is easily extended to include the dynamics of these
partial derivatives. This approach for approximating partial derivatives was developed as a
local sensitivity metric, but can be used as an efficient way to approximate the gradient of
a function.
Let the following system of q differential equations describe the dynamics of the states
\[ \frac{du_i}{dt} = g_i(t;\, u_1, \ldots, u_q;\, p_1, \ldots, p_m). \tag{1.10} \]

Let the initial conditions be u_i = u_i^0 at t = t_0, and assume that for j = 1, …, q the partials ∂g_i/∂u_j and ∂g_i/∂p_k are continuous [21].

The partial derivative of state u_i with respect to parameter p_k is written ∂u_i/∂p_k. The dynamics of these partial derivatives are given by the chain rule,
\[ \frac{d}{dt} \frac{\partial u_i}{\partial p_k} = \frac{\partial g_i}{\partial p_k} + \sum_{j=1}^{q} \frac{\partial g_i}{\partial u_j} \frac{\partial u_j}{\partial p_k}, \qquad \frac{\partial u_i}{\partial p_k}\bigg|_{t = t_0} = 0. \tag{1.11} \]
As defined, the initial conditions for all of the partial derivatives are zero. The dynamical
system resulting from augmenting (1.10) with (1.11) has q+(m×q) states. The integration of
this system will result in the solutions of the states in time and the solutions of the partials
in time. For our purposes we approximate the integration of the system using Python's SciPy integration routines. The resulting approximations of the partial derivatives form the gradients used to approximate C.
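As a concrete sketch, take the scalar logistic equation u̇ = p u(1 − u): the augmentation (1.11) for s = ∂u/∂p is ṡ = u(1 − u) + p(1 − 2u)s with s(t_0) = 0. A SciPy implementation of this hypothetical example:

```python
import numpy as np
from scipy.integrate import odeint

def augmented_rhs(state, t, p):
    """Logistic dynamics (1.10) augmented with the Gronwall sensitivity (1.11)."""
    u, s = state
    du = p * u * (1.0 - u)                        # original state dynamics
    ds = u * (1.0 - u) + p * (1.0 - 2.0 * u) * s  # dynamics of s = du/dp
    return [du, ds]

t = np.linspace(0.0, 5.0, 200)
p = 1.5
sol = odeint(augmented_rhs, [0.1, 0.0], t, args=(p,))  # s(0) = 0
u_traj, dudp_traj = sol[:, 0], sol[:, 1]  # states and du/dp over time
```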
1.5 DATA-DRIVEN RECOVERY OF DYNAMICAL SYSTEMS

Assume that a physical system of q states is described by governing equations dependent on states and parameters. The vector of states evolves according to
\[ \dot{u} = g(u;\, p). \tag{1.12} \]
In this form, u̇ denotes the time derivative of the states u, and g denotes a continuous
function of the states describing the governing time-dependence. The governing equations of dynamical systems enable in-depth analysis and computational simulations of the physical
systems they model. However, in many systems the governing equations are not known.
To address such situations, methods exist to recover the underlying dynamical system and
predict future states from observations of the system with a local linear approximation [44]; recovery of governing equations through a sparse least squares problem [4]; nonlinear regression analysis
[52]; iterative symbolic regression [45]; nonlinear adaptive projection onto a sparse basis
[42]; and with embedding dimension and global equations of motion analysis [16]. Below
we present the methods of: (i) dynamic mode decomposition (DMD) developed by Schmid
[44] and further analyzed by Kutz, et al. in [30]; and (ii) sparse identification of nonlinear
dynamical systems (SINDy) developed by Brunton et al. in [4]. We review these data-driven
recovery methods to motivate their use for recovering and approximating the dynamics of active subspaces in time.

1.5.1 DYNAMIC MODE DECOMPOSITION
Assume that we can observe a dynamic process with q state variables from which we
can collect n snapshots of data. We further assume that the dynamical system (1.12) has a locally linear approximation
\[ u_{k+1} \approx A u_k, \tag{1.17} \]
where A minimizes ‖u_{k+1} − A u_k‖_2 over the data snapshots. Although DMD can be used
to provide insight into the dynamical system, we implement the algorithm to exploit only
its ability to estimate states in between data snapshots and predict future states using the locally linear model (1.17).
We implement a simple approach to the DMD algorithm presented in [30]. The algo-
rithm as-is in [30] consists of using a low-rank projection of A onto its proper orthogonal
decomposition (POD) modes. For our application of DMD we want the full-dimensional
representation of the dynamics and thus avoid this step; however, it should be noted that for high-dimensional systems the low-rank projection provides significant computational savings.

(1) Compute the singular value decomposition of X:
\[ X = U \Sigma V^T. \tag{1.18} \]
(2) Construct matrix A from (1.17) with the pseudoinverse of X given by the above
SVD:
\[ X^{\dagger} = V \Sigma^{-1} U^T, \tag{1.19} \]
\[ A = X' V \Sigma^{-1} U^T. \tag{1.20} \]
(3) Compute the eigendecomposition of A,
\[ A W = W \Lambda, \tag{1.21} \]
where the eigenvectors of A are the columns of W and the eigenvalues λi are the
diagonal entries of Λ.
(4) Construct the matrix of DMD modes,
\[ \Phi = X' V \Sigma^{-1} W. \tag{1.22} \]

(5) Compute the vector b by solving x_1 = Φb, where x_1 is the first data snapshot. States at time t are then approximated by
\[ x(t) \approx \Phi \exp(\Omega t)\, b, \tag{1.23} \]
where Ω is the diagonal matrix of continuous-time eigenvalues ω_i = ln(λ_i)/Δt.
Thus, given snapshots of a dynamical system in time, we can approximate the dynamics
and future states with a locally linear approach given by dynamic mode decomposition.
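A compact sketch of steps (1)-(5) for uniformly spaced snapshots, assuming more snapshots than states so the full-dimensional operator is recoverable; this is an illustration, not the thesis code:

```python
import numpy as np

def dmd_predict(X, Xp, dt, times):
    """Fit u_{k+1} ~ A u_k from snapshot pairs (X, X') and predict u(t)."""
    U, S, Vt = np.linalg.svd(X, full_matrices=False)  # (1.18)
    A = Xp @ Vt.T @ np.diag(1.0 / S) @ U.T            # (1.20), full-rank A
    lam, W = np.linalg.eig(A)                         # (1.21)
    Phi = Xp @ Vt.T @ np.diag(1.0 / S) @ W            # (1.22), DMD modes
    omega = np.log(lam) / dt                          # continuous-time eigenvalues
    b = np.linalg.lstsq(Phi, X[:, 0], rcond=None)[0]  # solve x_1 = Phi b
    # Reconstruct x(t) = Phi exp(Omega t) b at the requested times.
    return np.real(np.stack([Phi @ (np.exp(omega * t) * b) for t in times], axis=1))
```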
1.5.2 SPARSE IDENTIFICATION OF NONLINEAR DYNAMICAL SYSTEMS
The method of sparse identification of nonlinear dynamical systems (SINDy) uses snap-
shots of a dynamical system to recover the governing equations of the system. We assume
that we can observe and measure a time-dependent process of one or more states. For ex-
ample, for the movement of an oscillator in three dimensional space, the state vector u ∈ R3
would have three components x, y, z. We aim to recover the governing system of equations from n snapshots of these states in time. We approximate the time derivative of these states in time using a differentiation
scheme to form Ẋ. In this paper we implement a first order forward difference scheme,
\[ \dot{u}_i \approx \frac{u(t_i + h) - u(t_i)}{h}, \tag{1.24} \]
where ui are the observations at time ti and h is the time-step used to compute derivatives.
The authors of [4] suggest that when the snapshots are collected from a system with noise,
a total variation regularized derivative scheme [8] should be used to compute Ẋ.
The data matrices of states and their derivatives are
\[ X = \begin{bmatrix} u^T(t_1) \\ u^T(t_2) \\ \vdots \\ u^T(t_n) \end{bmatrix}, \qquad \dot{X} = \begin{bmatrix} \dot{u}^T(t_1) \\ \dot{u}^T(t_2) \\ \vdots \\ \dot{u}^T(t_n) \end{bmatrix}. \tag{1.25} \]

We next construct the library matrix Θ(X) ∈ R^{n×p}. This library matrix is formed by the functions which might be present in the true governing system g. Thus we have
\[ \Theta(X) = \begin{bmatrix} 1 & X & X^{P_2} & X^{P_3} & \cdots & \sin(X) & \cos(X) & \cdots \end{bmatrix}. \tag{1.26} \]
The notation X^{P_i} indicates monomials of the states up to order i. For example, for the three states x, y, z, X^{P_2} = [x^2 xy xz y^2 yz z^2]. We seek a coefficient matrix
\[ \Xi = \begin{bmatrix} \xi_1 & \xi_2 & \cdots & \xi_q \end{bmatrix} \tag{1.27} \]
such that
\[ \dot{X} \approx \Theta(X)\, \Xi. \tag{1.28} \]
The coefficients of Ξ will thus indicate which, and how much, of the functions of Θ are in the
true governing equations, g(u). Given the various approximations present in the formulation
of (1.28) (i.e., data observation errors, approximation of Ẋ, and errors in Θ(X)), an exact
solution is unlikely or impossible. Thus we seek a solution to the minimization problem given by
\[ \Xi = \operatorname*{argmin}_{\Xi'} \; \| \dot{X} - \Theta(X)\, \Xi' \|_2 + \lambda \| \Xi' \|_1. \tag{1.29} \]

The choice of functions included in the library matrix may be informed by intuition of the underlying physical system or of the observed dynamics. The authors of
[4], however, argue that the library matrix can contain as many functional transformations
as the user chooses. This argument is based on the experience that unnecessary functions receive coefficients of (or near) zero. Conversely, if a function in the true governing equations is not included in the library matrix, it will be impossible
to recover with this algorithm. If we know nothing about the true governing equations of
the system from which we collect data, then we have no way to verify whether or not we are recovering the true dynamics.
The method of sequential thresholded least squares was developed in [4] to solve the
problem given by (1.29). At the heart of this algorithm is an assumption of sparsity in the
space of all functions. The thresholding aspect of the method imposes this sparsity and aids
in the recovery of the underlying dynamical system. As an example of how this sparsity
may be imposed, consider the function $e^x \approx 1 + x + \frac{x^2}{2} + \frac{x^3}{6}$. If we observed data from this function and formed the library matrix as $\Theta(X) = \begin{bmatrix} e^X & 1 & X & X^2 & X^3 \end{bmatrix}$, a reasonable solution could weight the polynomial terms; the solution identifying only $e^x$, however, is most sparse. The following method has been developed for
such examples and has been used to recover the governing dynamics of many systems [4].
Algorithm 3 (sequential thresholded least squares):

(1) Solve the least squares problem Ẋ = Θ(X)Ξ for an initial Ξ.

(2) Repeat until the nonzero entries of Ξ converge:

(a) For all entries ξ_{i,j} of matrix Ξ, if |ξ_{i,j}| < λ, update such that ξ_{i,j} = 0.

(b) For each column ξ_j of Ξ:

(i) Let r_1, …, r_ℓ be the rows of ξ_j whose entries are not equal to zero. Let ξ* be the column vector whose rows are the corresponding nonzero entries of ξ_j.

(ii) Solve the least squares problem Ẋ_j = Θ_{r_1,…,r_ℓ}(X) ξ* restricted to the corresponding columns of Θ(X), and update the nonzero entries of ξ_j with ξ*.
The coefficient matrix Ξ returned from Algorithm 3 indicates which functions of the library matrix are in the true governing equations, g(x). For example, in a three-state system where
\[ \Xi = \begin{bmatrix} 0 & 3 & 22 \\ 6 & 0 & 9 \\ 0 & 0.8 & 2 \end{bmatrix}, \qquad \Theta = \begin{bmatrix} \sin x & xy & z^3 \end{bmatrix}, \]
the governing equations recovered by SINDy would be
\[ \dot{x} = 6xy, \qquad \dot{y} = 3\sin x + 0.8 z^3, \qquad \dot{z} = 22\sin x + 9xy + 2z^3. \]
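A minimal sketch of sequential thresholded least squares, assuming Θ(X) and Ẋ have been formed as in (1.25)-(1.26); illustrative, not the thesis implementation:

```python
import numpy as np

def stls(Theta, X_dot, lam, n_iter=10):
    """Sequential thresholded least squares for X_dot ~ Theta @ Xi."""
    Xi = np.linalg.lstsq(Theta, X_dot, rcond=None)[0]  # initial least squares fit
    for _ in range(n_iter):
        small = np.abs(Xi) < lam         # threshold small coefficients to zero
        Xi[small] = 0.0
        for j in range(X_dot.shape[1]):  # refit each state on its active terms
            active = ~small[:, j]
            if active.any():
                Xi[active, j] = np.linalg.lstsq(
                    Theta[:, active], X_dot[:, j], rcond=None)[0]
    return Xi
```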
Practicalities
Assumption of sparsity The choice of using this method to solve the overdetermined
least squares problem is based on a few assumptions that are necessary to address. The first
assumption is that the functions composing the governing system g(x) are within the library
matrix Θ. The second assumption is that the true governing system is sparse in the space
of all functions. This means that of all possible functions, the governing equations of the
dynamical system will only be composed of a few. The validity of this assumption for the
case in which we will employ the algorithm will be discussed in Section 1.5.2.3.
Choice of λ The method of sequential thresholded least squares for solving the ℓ1 minimization problem given by (1.29) is extremely and unpredictably sensi-
tive to the choice of λ. Because the values of the coefficient matrix are thresholded according
to this parameter at each step, the choice of λ will drastically affect the recovered dynam-
ical system. This sensitivity and unpredictability of the choice of λ is also what makes
the complexity analysis of this algorithm difficult. The cost of each least squares computa-
tion at each iteration in the sequential thresholded least squares approach to the problem is
determined by the size of the matrix and vector. These sizes are dependent on the value of λ.
Although sequential thresholded least squares has been empirically shown to recover the
governing dynamical system, the practicalities noted above make it difficult to understand
how the method of SINDy can be translated to new problems. For example, to actually
recover a dynamical system about which one has no intuition (stay tuned!), choosing an
appropriate value for λ can be difficult. Alternatively, if there is no system governing the
observed dynamics, one may want to model the dynamics instead. In such a case, the notion of recovery is replaced with one of approximation. We can alternatively solve the sparse least squares problem (1.29) by taking advantage of the well-developed regression analysis
methods developed to solve the problem of the least absolute shrinkage and selection operator
(LASSO) [48, 22], defined as (1.29). For our purposes we use the Lasso function in Python's
scikit-learn package. This method for solving (1.29) involves the tuning of the parameter
α, which controls the relative importance of the 1-norm of the solution. The effect of this parameter on the cost and accuracy of the approximation is explored in Section 2.3.3. The various approximations present in the formulation of (1.28) hinder the ability to recover a solution with ordinary least squares alone; to assist in the solution we impose the ℓ1 penalty on the coefficients.
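A sketch of this alternative using scikit-learn's Lasso, solving (1.29) one state at a time; the α value is a placeholder to be tuned:

```python
import numpy as np
from sklearn.linear_model import Lasso

def sindy_lasso(Theta, X_dot, alpha=1e-7):
    """Solve (1.29) column by column with scikit-learn's Lasso."""
    Xi = np.zeros((Theta.shape[1], X_dot.shape[1]))
    for j in range(X_dot.shape[1]):
        model = Lasso(alpha=alpha, fit_intercept=False, max_iter=100000)
        model.fit(Theta, X_dot[:, j])
        Xi[:, j] = model.coef_
    return Xi
```

Larger α drives more coefficients to zero; the tradeoff between cost and accuracy under varying α is examined in Figure 2.8.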
Now that we’ve presented the necessary background information on active subspaces
and the data-driven recovery of dynamical systems techniques of DMD and SINDy, we turn to defining and approximating dynamic active subspaces.
Active subspaces enable dimension reduction and sensitivity analysis for computational
models. As is, this analysis does not explicitly extend to functions that are dependent on both
parameter inputs and time. Extending active subspaces to time-dependent systems enables global, time-dependent sensitivity analysis. Consider a parameterized dynamical system whose states u(t, p) depend on the parameters p. Let us consider a specific time-dependent quantity of interest (QoI) of these states, dependent on both the parameters and time, f(t, p). Note that f can be a nonlinear function of the states, and that these states can be nonlinearly dependent on the parameters.
The main idea Suppose we want to identify the linear combinations of the pa-
rameters that change the QoI the most on average, and we want to know how these linear
combinations change in time. Doing so would identify the dynamic active subspaces (DyAS)
and would give quantifiable insight into the parameter sensitivity of time-dependent sys-
tems. We propose and discuss the following three approaches to identifying dynamic active
subspaces.
Computing active subspaces at each time step The most immediate approach to identifying time-dependent active subspaces is to independently compute them at each step in time. Specif-
ically, we could find the QoI at time t0 , compute the active subspace, find the QoI at time
t1 , compute the active subspace, find the QoI at time t2 , compute the active subspace, and
repeat this until we reach final time tn . Such analysis and identification has been done in
[33] and [14]. This approach to finding time-dependent active subspaces, however, is ex-
tremely expensive, especially in systems with many parameters. This computation can be
reduced by only computing active subspaces at “interesting” points in time (i.e., stages of
a disease, transient dynamics, final time, time of maximum concentration). However, what
is considered an “interesting” point in time for nominal parameter values might not be for other regions of the parameter space.
Analytically defining DyAS Rather than computing the active subspace inde-
pendently throughout time, consider the possibility of defining a time-dependent C(t) with
eigenvalues λi (t) and a governing dynamical system for wi (t). The computation involved in
computing the active subspace at each time step could be bypassed by simply integrating the
dynamical system for wi (t). Classical analysis for dynamical systems could then give further
insight into these dynamic active subspaces and, in turn, the physical systems from which
they are derived. In Section 2.2 we present three cases for which we can use this approach
to define DyAS.
Approximating the DyAS In cases for which there does not exist an analytical
form for the DyAS or for which there is no analytical form for the dynamical system, we
can consider approximating the dynamics of the active subspace. Below we propose the use
of DMD and SINDy (see Section 1.5 for details) to approximate the one-dimensional DyAS from snapshots of the first eigenvector in time.
Just as we can describe the dynamics of the state space of a physical system with
an ordinary differential equation, we propose similarly describing the dynamics of the state
space of an eigenvector. An ODE describing the first eigenvector of C over time would give
an analytical form of the time derivative for the one-dimensional dynamic active subspace.
Below we prove the existence of two such dynamical systems for dynamic active subspaces.
Consider the linear homogeneous dynamical system
\[ \dot{u} = A u, \qquad u(0) = p, \tag{2.3} \]
with A ∈ R^{q×q}, so that u(t, p) = e^{At} p. Consider a time-dependent scalar output of the dynamical system, f(t, p) = φ^T u(t, p), for φ ∈ R^q. For example, where f(t, p) is the i-th state of the dynamical system, φ is the i-th column of the identity matrix. Assume that the initial conditions of the dynamical system (2.3) have joint density ρ(p). Thus the time-dependent scalar output of the parameterized system has gradient ∇_p f(t, p) = (e^{At})^T φ, and the matrix (1.1) becomes
\[ C(t) = (e^{At})^T \phi\, \phi^T e^{At}. \]
Note that Rank(φφ^T) = 1 and thus Rank(C(t)) = 1. Therefore C(t) has only one nonzero eigenvalue, and the normalized first eigenvector is
\[ v_1(t) = \frac{(e^{At})^T \phi}{\big\| (e^{At})^T \phi \big\|_2}. \tag{2.11} \]
This eigenvector is only unique up to a constant. Thus we seek a governing dynamical system for the unnormalized first eigenvector. Let w(t) = (e^{At})^T φ. Then the dynamical system for the unnormalized first eigenvector is
\[ \dot{w}(t) = A^T w(t), \qquad w(0) = \phi. \tag{2.12} \]

Consider next the inhomogeneous linear dynamical system
\[ \dot{u} = A u + p, \qquad u(0) = \eta. \tag{2.13} \]
For invertible A the solution to this dynamical system is given by Duhamel's Principle [32],
\[ u(t, p) = e^{At} \eta + A^{-1}\big(e^{At} - I\big) p. \]
Consider a time-dependent scalar output of the dynamical system, f(t, p) = φ^T u(t, p), for φ ∈ R^q. Assume that p has a joint density ρ(p). Thus the time-dependent scalar output of the parameterized system has gradient ∇_p f(t, p) = ((e^{At})^T − I) A^{-T} φ. Note that Rank(φφ^T) = 1 and thus Rank(C(t)) = 1. Therefore C(t) has only one nonzero eigenvalue, and the normalized first eigenvector is
\[ v_1(t) = \frac{\big((e^{At})^T - I\big) A^{-T} \phi}{\big\| \big((e^{At})^T - I\big) A^{-T} \phi \big\|_2}. \tag{2.21} \]
This eigenvector is only unique up to a constant. Thus we seek a governing dynamical system for the unnormalized first eigenvector. Let
\[ w(t) = \big((e^{At})^T - I\big) A^{-T} \phi. \]
Then the dynamical system of the unnormalized first eigenvector of C(t) is described by the inhomogeneous linear system
\[ \dot{w}(t) = A^T w(t) + \phi, \qquad w(0) = 0. \tag{2.22} \]
Consider A = W Λ W^{-1} and recall the definition of the matrix exponential [32], e^{At} = W e^{Λt} W^{-1}. The expression for w(t) given by Duhamel's principle is equivalent to the solution of (2.22):
\[ w(t) = \int_0^t e^{A^T (t - s)}\, \phi \, ds = A^{-T}\big(e^{A^T t} - I\big)\phi = \big((e^{At})^T - I\big) A^{-T} \phi. \]
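These governing systems are straightforward to check numerically; the sketch below, with a random test matrix purely for illustration, verifies that integrating (2.12) reproduces w(t) = (e^{At})^T φ:

```python
import numpy as np
from scipy.integrate import odeint
from scipy.linalg import expm

q = 3
A = np.random.default_rng(0).standard_normal((q, q))
phi = np.eye(q)[:, 0]  # QoI selects the first state

# Integrate w_dot = A^T w, w(0) = phi, per (2.12).
t = np.linspace(0.0, 2.0, 50)
w_ode = odeint(lambda w, t: A.T @ w, phi, t)

# Compare against the closed form w(t) = (e^{At})^T phi.
w_exact = np.stack([expm(A * ti).T @ phi for ti in t])
print(np.max(np.abs(w_ode - w_exact)))  # small integration error
```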
The analyses of sections 2.2.1 and 2.2.2 can be extended for the case where the pa-
rameters of an inhomogeneous linear dynamical system (2.13) are the initial conditions η.
Thus we prove the existence for the analytic form of the dynamic active subspace for three
cases. For these specific problems, the cost of computing the active subspace at each time
point of interest is avoided by simply evaluating a linear dynamical system which gives ex-
actly the one-dimensional active subspace of the time-dependent QoI. The existence of the
governing equations of the dynamic active subspace for these cases is directly comparable
to the existence of an analytical static active subspace for a QoI with linear dependence on
its parameters. As we saw in the above analyses, when the gradient of f is not parameter-
dependent, the outer product in (1.1) can be pulled outside of the integral, and the resulting C(t) is a rank-one matrix whose first eigenvector is known in closed form.
However exciting as this may be (we think it’s pretty cool), we must acknowledge that
these analytic forms exist (so far!) for three very specific cases. We would like to have a
form of the dynamic active subspace for any time dependent, differentiable QoI. Thus in the
following section we propose a method for finding an approximate dynamic active subspace.
In this section we propose the use of DMD (see Section 1.5.1) and SINDy (see Section 1.5.2) to approximate the dynamic active subspace.
The methods of DMD and SINDy to recover and approximate dynamics rely on a
collection of snapshots of the system. In the case of dynamic active subspaces, these snap-
shots will be components of the first eigenvector of C in time. We propose collecting these
snapshots and using DMD and SINDy to approximate the dynamic active subspace between and beyond them.
(1) Define the time-dependent quantity of interest f(t, p) and its parameters p.

(2) Augment the dynamical system with states for the partial derivatives to compute Gronwall Gradients (Section 1.4.2).

(3) Choose a set of integration nodes p_i and weights ω_i for the approximation of (1.1).

(4) For each point p_i in the set of integration nodes, integrate the dynamical system to approximate the gradient ∇f(t, p_i) in time.

(5) For each t_i, estimate C(t_i) and compute the first eigenvector w(t_i).

(a) If using SINDy to approximate the dynamics, estimate C(t_i) and w(t_i) at n additional points in time offset by h. Note that h is the time step used to approximate the time derivative for the data matrix Ẋ.
(6) Construct the data matrices with the first eigenvector w at each t_i.

For DMD:

(a) Construct X_DMD ∈ R^{m×(n−1)} with the first (n − 1) eigenvectors:
\[ X_{DMD} = \begin{bmatrix} w(t_1) & w(t_2) & \cdots & w(t_{n-2}) & w(t_{n-1}) \end{bmatrix} \]

(b) Construct X′_DMD ∈ R^{m×(n−1)} with the last (n − 1) eigenvectors:
\[ X'_{DMD} = \begin{bmatrix} w(t_2) & w(t_3) & \cdots & w(t_{n-1}) & w(t_n) \end{bmatrix} \]

For SINDy:

(a) Construct X_SINDy ∈ R^{n×m} with the first eigenvector w at [t_1, t_2, t_3, …, t_{n−1}, t_n]:
\[ X_{SINDy} = \begin{bmatrix} w(t_1)^T \\ w(t_2)^T \\ \vdots \\ w(t_{n-1})^T \\ w(t_n)^T \end{bmatrix} \]

(b) Approximate Ẋ with the forward difference (1.24), using the eigenvectors computed at the offset times t_i + h.
(7) Let T ∈ R^n be the vector of times at which to predict the dynamic active subspace.

(8) Apply DMD or SINDy to the data matrices to approximate the dynamic active subspace at the times in T.
Below we demonstrate this methodology to approximate the dynamic active subspace for
two dynamical systems: (i) a linear harmonic oscillator and (ii) an enzyme kinetics system.
The quantity of interest is the time-dependent scalar output
\[ f(t, p) = \phi^T u(t, p) = \phi^T e^{At} p. \tag{2.26} \]
The initial conditions p ∈ R^3 are uncertain between the lower and upper bounds given in Table 2.1. At nominal initial conditions, p_nominal = [1.2, 5.0, 10.0]^T, the trajectory of the linear harmonic oscillator is shown in Figure 2.1.
Figure 2.1: The trajectory of the x, y, and z components of the linear harmonic oscillator at the nominal initial conditions.
Note that for this system with the initial conditions as parameters, the governing system of
the dynamic active subspace is given by (2.12). For the sake of demonstrating the methodol-
ogy for this simple example we ignore this until it comes time to evaluate our approximation.
The gradient of this function with respect to p can be taken exactly from (2.26) for all
time,
\[ \nabla f(t, p) = e^{A^T t} \phi. \tag{2.27} \]
We will use this evaluation of the gradient to approximate C with Algorithm 1 and Monte
Carlo sampling. Although we would normally conduct a convergence study on the number
of samples M for our Monte Carlo approximation, it is not necessary in this case because we
know that the gradients (2.27) are independent of our samples pi . To have enough points
to construct the shadow plot in Figure 2.2, we let N = 50. We assume that p is uniformly
distributed with upper and lower bounds 20% above and below the nominal values (see Table
2.1).
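A sketch of this computation, with A and φ standing in for the oscillator's system matrix and the QoI selector; these names are assumptions, not the thesis code:

```python
import numpy as np
from scipy.linalg import expm

def first_eigenpair(A, phi, t):
    """For f = phi^T e^{At} p the gradient (2.27) is parameter-independent, so
    C(t) = w w^T is rank one with w = e^{A^T t} phi."""
    w = expm(A.T * t) @ phi
    lam1 = w @ w                    # the single nonzero eigenvalue of C(t)
    return lam1, w / np.sqrt(lam1)  # eigenvalue and normalized eigenvector
```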
Parameter   Lower Bound   Nominal Value   Upper Bound
p_1         0.96          1.2             1.44
p_2         4.0           5.0             6.0
p_3         8.0           10.0            12.0

Table 2.1: The initial conditions p in (2.26) have a uniform joint density with lower and upper bounds 20% below and above the nominal values.
To examine and explain the different aspects of the active subspace, we first compute
C at time t = 5.
Figure 2.2: The eigenvalues (left) and first eigenvector (middle) of C at time t = 5. Note that there is only one nonzero eigenvalue. The shadow plot (right) at this time shows the one-dimensional dependence of the QoI on the active variable w_1^T p.

The rightmost plot of Figure 2.2 shows the shadow plot for the active subspace at time t = 5.
Given the methodology from Section 2.3.1, we seek to form data matrices using snap-
shots of the eigenvector w at n points in time. We compute the first eigenvector from
t0 = 0.001 to tF = 10.0 at 8000 points in time by independently computing the active sub-
space at each time step. The time-series of this eigenvector will be used as the “truth” to
which we will compare all future approximations. Figure 2.3 shows an interesting aspect
of the computation. Using numpy’s eig function, the eigenvectors of C are normalized at
each time step. Using snapshots from the left of Figure 2.3 will be problematic if aiming at recovering the dynamics. Following analysis from Section 2.2 we observe that multiplying the
computed time series by the square root of the first computed eigenvalue of C yields the
same dynamics given by the dynamical system (2.12). Thus in the following experiments we
use this unnormalized form of the eigenvector to construct the data matrices for DMD and
SINDy.
Important to note is that the sign of an eigenvector is not unique (an eigenvector is
still an eigenvector if multiplied by negative one). Thus when computing eigenvectors for
multiple points in time, we must normalize the sign of the eigenvector at each time step. We
do so by checking
\[ w(t_i)^T\, w(t_{i+1}) < 0. \tag{2.28} \]
If (2.28) is true, w(t_{i+1}) is updated such that w(t_{i+1}) = −w(t_{i+1}). This ensures that any sign changes in the time series of eigenvectors reflect the dynamics rather than the arbitrary sign returned by the eigensolver.
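A sketch of this sign normalization applied to a time series of eigenvectors stored as columns of a matrix W:

```python
import numpy as np

def align_eigenvector_signs(W):
    """Flip signs so consecutive eigenvector snapshots correlate positively, per (2.28)."""
    W = W.copy()
    for i in range(1, W.shape[1]):
        if W[:, i - 1] @ W[:, i] < 0:  # the check in (2.28)
            W[:, i] *= -1.0
    return W
```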
Figure 2.3: (Left) the first eigenvector of C computed for the function (2.26). (Right) the first eigenvector after being multiplied by $\sqrt{\lambda_0(t)}$.
The methods of DMD and SINDy depend on data matrices formed with snapshots
of the dynamical system in time. We test the accuracy of these methods in recovering the
dynamic active subspace as a function of (i) the proportion of time the snapshots represent,
i.e., snapshots from the first two out of ten seconds of a system would represent 0.2 of the total time, and (ii) the number of snapshots used.
In the following experiments we construct the data matrices X_DMD and X_SINDy with
snapshots from proportions of the data from 0.1 to 1.0. For each proportion we also construct
the data matrices with 10, 20, 30, 40, 50, 75, 100, 200, 500, 1000, 2000, and 5000 snapshots.
The library matrix Θ is constructed with monomial combinations of up to degree 3. In Figure 2.4, the error for each experiment is computed as the maximum absolute error between the approximated dynamic active subspace and the “true” active subspace computed at all 8000 time steps.
Figure 2.4: The maximum absolute error of the dynamic active subspace approximated by (left) DMD and (right) SINDy. The data matrices for each algorithm have been created with varying numbers of snapshots from varying proportions of time.
One of the reasons DMD and SINDy poorly approximate the dynamic active subspace
when given few snapshots of data is due to the non-uniqueness of eigenvectors discussed
above. For a small time step between computed eigenvectors (i.e., there are many snapshots),
the normalization of signs by checking (2.28) ensures that the nearby eigenvectors have
the same sign. However, when the time step between computed eigenvectors is large, the
trajectory may have evolved significantly enough that (2.28) will not properly indicate what
the sign “should” be. In such cases, the DMD and SINDy data matrices will be given eigenvectors with inconsistent signs, corrupting the recovered dynamics.

We next demonstrate the methodology on a nonlinear dynamical system. The simple enzyme kinetics system given by (2.29) describes the chemical
reaction of the catalysis of enzymes [26], and the reaction is written as,
\[ S + E \;\underset{k^-}{\overset{k^+}{\rightleftharpoons}}\; ES \;\overset{k^c}{\longrightarrow}\; P + E. \tag{2.29} \]
The reaction (2.29) is described by the system of nonlinear ordinary differential equations in (2.30)–(2.33):
\[ \frac{d[S]}{dt} = -k^+ [E][S] + k^- [ES], \tag{2.30} \]
\[ \frac{d[E]}{dt} = -k^+ [E][S] + (k^- + k^c)[ES], \tag{2.31} \]
\[ \frac{d[P]}{dt} = k^c [ES], \tag{2.32} \]
\[ \frac{d[ES]}{dt} = k^+ [E][S] - (k^- + k^c)[ES]. \tag{2.33} \]
We assume that the parameters p = [k^+, k^-, k^c]^T have uniform joint density with lower and
upper bounds 50% below and above the nominal values (see Table 2.2).
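A sketch of integrating (2.30)–(2.33) with SciPy; the rate constants and initial concentrations below are hypothetical stand-ins for the nominal values of Table 2.2:

```python
import numpy as np
from scipy.integrate import odeint

def enzyme_kinetics(c, t, k_plus, k_minus, k_cat):
    """Right-hand side of (2.30)-(2.33); c = [S, E, P, ES] concentrations."""
    S, E, P, ES = c
    dS = -k_plus * E * S + k_minus * ES
    dE = -k_plus * E * S + (k_minus + k_cat) * ES
    dP = k_cat * ES
    dES = k_plus * E * S - (k_minus + k_cat) * ES
    return [dS, dE, dP, dES]

t = np.linspace(0.0, 10.0, 500)
c0 = [1.0, 0.5, 0.0, 0.0]  # hypothetical initial concentrations
traj = odeint(enzyme_kinetics, c0, t, args=(2.0, 1.0, 1.5))  # hypothetical rates
```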
Table 2.2: The parameters p have uniform joint density with upper and lower bounds 50% below and above the nominal values.
Because the parameter dimension is low and the QoI is smooth, we approximate the integral in (1.1) with the tensor product Gauss-Legendre quadrature scheme. We conduct a numerical experiment to choose the number of quadrature points per dimension (Figure 2.5).

Figure 2.5: The convergence of the approximation to the integral in (1.1) for the number of quadrature points per dimension in the tensor product Gauss-Legendre quadrature scheme.
Gronwall Gradients We augment the system of ODEs in (2.30)–(2.33) to include the Gronwall Gradients, yielding the augmented system (2.35).
We compare the solutions for the partial derivative of f with respect to the parameters
numerically integrated from (2.35) with a second-order finite difference method for decreasing
h, as well as over time for a fixed h = 10^{-3} (Figure 2.6). We compare the wall clock time for the two approaches in Figure 2.7.
Figure 2.6: (Left) the average absolute error over time between Gronwall Gradients and the
second-order finite difference approximation for the gradient of f with respect to parameters.
(Right) the absolute error over time between Gronwall Gradients and the second-order finite
difference approximation for the gradient of f with respect to parameters for h = 10^{-3}.
Figure 2.7: The wallclock time in computing M second-order finite difference approxima-
tions to the gradient versus integrating the augmented system in (2.35), averaged over 15
runs.
Given that, in this case, Gronwall Gradients are less expensive and comparably accurate
to a second-order finite difference method, we integrate the augmented system (2.35) to approximate the gradients. As in the linear case, we compute the first eigenvector at 8000 points in time by independently computing the active subspace at each time step. The time-series of
this eigenvector (with signs normalized according to (2.28)) will be used as the “truth” to
which we will compare all future approximations. For the following experiments we begin our
approximations from t0 = 0.001 because the initial condition of ∇f for Gronwall Gradients
is ∇f = 0.
We again test the accuracy of DMD and SINDy in recovering the dynamic active
subspace as a function of (i) the proportion of time the snapshots represent, and (ii) the number of snapshots.
In the following experiments we construct the data matrices X_DMD and X_SINDy with
snapshots from proportions of the data from 0.1 to 1.0. For each proportion we construct
the data matrices from 10, 20, 30, 40, 50, 75, 100, 200, 500, 1000, 2000, and 5000 snapshots.
To determine the influence of α in the accuracy of the SINDy approximation of the dynamic
active subspace, we test the absolute relative error and wall clock time for an approximation
with 30 snapshots from 60% of time for varying the α parameter in scikit-learn’s lasso
function.
Figure 2.8: The influence of varying α on the (left) cost and (right) absolute error of the SINDy approximation of the dynamic active subspace.
We determine that for these experiments α = 10^{-7} provides the most accurate approximation with not significantly more cost (see Figure 2.8). Thus in the implementation of SINDy that follows we let α = 10^{-7}, and we again compute eigenvectors at offset times with which to approximate Ẋ. In Figure 2.9, the error for each experiment is computed as the
maximum absolute error between the approximated dynamic active subspace, and the “true”
active subspace computed at all 8000 time steps. The library matrix Θ is constructed with
combinations of up to degree 3.
Figure 2.9: The maximum absolute error of the dynamic active subspace approximated by (left) DMD and (right) SINDy. The data matrices for each algorithm have been created with varying numbers of snapshots from varying proportions of time.
As a qualitative example, we approximate the dynamic active subspace with 10 samples from the first 50% of time (Figure 2.10). Recall that although there
are 10 samples in the XSIN Dy data matrix, there were 20 active subspace computations.
Figure 2.10: The (left) approximated dynamic active subspace and its (right) absolute error using (top) DMD and (bottom) SINDy. Solutions have been approximated with 10 snapshots from the first 50% of time.
2.4 DISCUSSION
Our methodology has been shown to be effective in recovering the dynamic active subspace for (i) the linear case, for which we also have the analytical form, and (ii) the nonlinear enzyme kinetics case.
In the linear case, Figure 2.4 shows that the dynamic active subspace can be recovered
with DMD to within 10^{-8} from only ten samples from the first 10% of time. Conversely
we see that SINDy is only ever able to recover the dynamic active subspace to within 10^{-2}.
Given that it is necessary to compute the active subspace at twice as many points in time for
the SINDy approximation, it is clear that, for this case, DMD recovers the dynamic active
subspace for less cost and with more accuracy than SINDy.
In the nonlinear enzyme kinetics system Figure 2.9 shows that the dynamic active
subspace can be recovered with SINDy to within 10^{-3} with 40 samples (80 including those
used to approximate the derivative) from 60% of the total time. Conversely, we see that
the approximation given by DMD is bounded below by 10^{-1}. This example given in Figure
2.10 allows us to visualize what the error in Figure 2.9 means in the context of approxi-
mating the dynamic active subspace. We see that, qualitatively, both the DMD and SINDy
approximations recover the dynamic active subspace fairly well. However, we see that the
approximation found with SINDy overall captures the dynamic active subspace better than
DMD.
DMD so accurately recovered the dynamic active subspace for the linear case due
to the fact that the true dynamic active subspace is indeed a linear dynamical system.
Furthermore, DMD accurately recovered the dynamic active subspace for the snapshots that
were in some sense post-processed when multiplied by the square root of the time-dependent
first eigenvalue. Without this scalar multiplier, the snapshots in DMD would not be from
a linear dynamical system. We only knew that this was, in some sense, the correct scalar
multiplier because of the analysis in Section 2.2. In all other cases, this insight into what
the dynamic active subspace should be or look like does not exist.
SINDy, however, has accurately recovered the dynamic active subspace for the non-
linear case. In this case, it was not necessary to have any prior knowledge of the dynamic
active subspace in order for the methodology to recover the dynamics both qualitatively and
quantitatively well from very little data. In future applications of this work to analyze and
understand other systems, it is recommended that one use the SINDy approach to estimate
the dynamic active subspace. In these applications it has shown to provide approximations
to the one-dimensional dynamic active subspace from little computation and to within 10^{-3}.
To fully assess the usefulness of this methodology on other systems, however, we must either
find a mathematical guarantee of its ability to approximate the dynamics, or test it on larger and more complex systems.
In this thesis we have presented a novel approach for uncertainty quantification and
sensitivity analysis in time-dependent systems. In the above sections we have discussed (i)
the ideas and algorithms in finding the active subspace of a function, (ii) the use of Gronwall
Gradients in approximating the gradient of a time-dependent function, and (iii) the methods
of dynamic mode decomposition and sparse identification for nonlinear dynamical systems
We have proven that, in the case where the QoI is a function of a linear dynamical system dependent on its initial conditions, or of an inhomogeneous linear dynamical system dependent on the inhomogeneous part of the system, there exists a dynamical system describing the dynamic active subspace. We have proposed a data-driven methodology to approximate the dynamic active subspace given snapshots of the active subspace. We have approximated
the dynamic active subspace using this methodology for (i) a linear harmonic oscillator
and (ii) the enzyme kinetics system. We have quantitatively and qualitatively compared
the recovered dynamic active subspace obtained with DMD and SINDy. We quantitatively
assessed the tradeoff between the proportion of time represented in the data snapshots, and
the number of snapshots, used in DMD and SINDy, and showed that for the linear harmonic
oscillator, DMD was able to recover the dynamics with more accuracy and for less cost.
Conversely, in the enzyme kinetics example, SINDy was able to recover the dynamics with greater accuracy.
The insight given from a time-dependent sensitivity metric allows for an in-depth understanding of time-dependent systems, whose parameters are often approximated with uncertainty resulting from measurement or observation error or parameter estimation. Our methodology provides a means by which to identify the linear combination of parameters that will change the function the most on average, and how this combination changes in time.
There is great potential for future work in line with this research. By properly ensuring
orthogonality between states, the methodology presented in this work could be extended to
recover the k-dimensional dynamic active subspace given by the first k eigenvectors. It is pos-
sible that the analytical form for the dynamical system of the dynamic active subspace exists
for other time-dependent systems; further research in the overlap with differential geometry, classical dynamical systems analysis, and the study of manifolds could yield promising results.
The concept of using data-driven recovery techniques (such as DMD and SINDy) to
approximate a time-dependent sensitivity metric given snapshots of the sensitivity metric is,
to the best of our knowledge, novel. Furthermore, the methodology presented in this work
can be extended to develop time-dependent insight for other global sensitivity metrics. This
intellectual and technical contribution of the novel ideas and methodology developed in this work lends to the advancement of uncertainty quantification and sensitivity analysis for time-dependent systems.
BIBLIOGRAPHY

[5] E. Candès and J. Romberg, ℓ1-magic: A collection of MATLAB routines for solving the convex optimization programs central to compressive sampling, www.acm.caltech.edu/l1magic, (2006).
[12] P. G. Constantine, Z. del Rosario, and G. Iaccarino, Many physical laws are
ridge functions, arXiv:1605.07974 [math.NA], (2016), pp. 1–20.
[13] P. G. Constantine and P. Diaz, Global sensitivity metrics from active subspaces,
Reliability Engineering & System Safety, 162 (2017), pp. 1–13.
[16] J. Crutchfield and B. McNamara, Equations of motion from a data series, Com-
plex Systems, 1 (1987), pp. 417–452.
[19] M. Elad and M. Aharon, Image denoising via sparse and redundant representations
over learned dictionaries, IEEE Transactions on Image Processing, (2006), pp. 3736–
3745.
[21] T. Gronwall, Note on the derivatives with respect to a parameter of the solutions of
a system of differential equations, Annals of Mathematics, 20 (1919), pp. 292–296.
[23] B. Iooss and P. Lemaître, A review on global sensitivity analysis methods, 2015.
[24] I. Jolliffe, Principal Component Analysis, Springer Series in Statistics, New York,
1986.
[27] J. Kim and H. Park, Fast active-set-type algorithms for ℓ1-regularized linear regression, Proceedings of the International Conference on Artificial Intelligence and Statistics, 13 (2010), pp. 397–404.
[32] R. LeVeque, Finite Difference Methods for Ordinary and Partial Differential
Equations: Steady-State and Time-Dependent Problems (Classics in Applied
Mathematics), SIAM, Society for Industrial and Applied Mathematics, 2007.
[33] T. Loudon and S. Pankavich, Mathematical analysis and dynamic active subspaces
for a long term model of HIV, Mathematical Biosciences & Engineering, 14 (2016),
pp. 709–733.
[36] M. Morris, Factorial sampling plans for preliminary computational experiments, Tech-
nometrics, 33 (1991), pp. 161–174.
[38] S. Pankavich and D. Shutt, An in-host model of HIV incorporating latent infection
and viral mutation, arXiv:1508.07616 [q-bio.PE], (2015).
[39] K. Pearson, Notes on regression and inheritance in the case of two parents, Proceed-
ings of the Royal Society of London, 58 (1895), pp. 240–242.
[40] R. Sachs, L. Hlatky, and P. Hahnfeldt, Simple ODE models of tumor growth and anti-angiogenic or radiation treatment, Mathematical and Computer Modelling, 33 (2001), pp. 1297–1305.
[44] P. Schmid, Dynamic mode decomposition of numerical and experimental data, Journal
of Fluid Mechanics, 656 (2010), pp. 5–28.
[45] M. Schmidt and H. Lipson, Distilling free-form natural laws from experimental data,
Science, 324 (2009), pp. 81–85.
[48] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal
Statistical Society, (1996), pp. 267–288.
[49] G. Tran and R. Ward, Exact recovery of chaotic systems from highly corrupted
data, arXiv:1607.01067 [math.DS], (2016).
[51] T. Turányi, Sensitivity analysis of complex kinetic systems: Tools and applications, Journal of Mathematical Chemistry, 5 (1990), pp. 203–248.