DeepMartNet: Solving High-Dimensional PDEs
Abstract. In this paper, we propose DeepMartNet, a Martingale-based deep neural network
learning method for solving Dirichlet boundary value problems (BVPs) and eigenvalue problems for
elliptic partial differential equations (PDEs) in high dimensions or domains with complex geometries.
The method is based on Varadhan's Martingale problem formulation of the BVPs/eigenvalue problems,
in which a loss function enforcing the Martingale property of the PDE solution is used for efficient
optimization by sampling the stochastic processes associated with the corresponding elliptic operators.
High-dimensional numerical results for BVPs of the linear and nonlinear Poisson-Boltzmann equation
and eigenvalue problems of the Laplace equation and a Fokker-Planck equation demonstrate the
capability of the proposed DeepMartNet learning method in solving high-dimensional PDE problems.
Key words. Martingale problem, deep neural network, boundary value problems, eigenvalue problems.
∗ December 20, 2023; the first version appeared as the arXiv preprint arXiv:2311.09456 on November 15, 2023.
† Corresponding author, Department of Mathematics, Southern Methodist University, Dallas, TX
75275 ([email protected])
‡ Department of Mathematics, Southern Methodist University, Dallas, TX 75275.
§ Department of Mathematics, Southern Methodist University, Dallas, TX 75275.
In addition to the Pardoux-Peng BSDE approach linking stochastic processes
and the solutions of PDEs, another powerful probabilistic method is Varadhan's
Martingale problem approach [23, 15], which was used to derive a probabilistic weak
form for the PDE's solution with a specific Martingale related to the PDE solution
through the Ito formula [17]. The equivalence of the classic weak solution of boundary
value problems (BVPs) of elliptic PDEs using bilinear forms and the probabilistic one
was established for the Schrodinger equation, first for the Neumann BVP [10, 11] and
then for the Robin BVP [18]. This Martingale weak form for the PDE solution, and
in fact also for a wide class of stochastic control problems [4, 3], provides a new avenue
to tackle high-dimensional PDE and stochastic control problems [2]; this paper
focuses on the case of high-dimensional PDEs only. The Martingale formulation of the
PDE solution is a result of the Ito formula and the Martingale nature of Ito integrals. As
a simple consequence of the broader Martingale property, the Feynman-Kac formula
provides the one-point solution of the PDE using an expectation over the underlying SDE
paths originating from that point. The Martingale-based DNN to be studied in this
paper, termed DeepMartNet [2], is trained with a loss function enforcing the
conditional expectation definition of the Martingale. It will be shown with extensive
numerical tests that, using the same set of SDE paths originating from one single
point, the DeepMartNet can in fact provide approximations to solutions of BVPs
and eigenvalue problems of elliptic PDEs over the whole solution domain in high
dimensions. In this sense, the DeepMartNet is able to extract more information about the
PDE solution from the SDE paths originating from one point than the (pre-machine
learning) traditional use of the one-point Feynman-Kac formula.
The rest of this paper is organized as follows. In Section 2, a brief review of
existing SDE-based DNN methods for solving PDEs is given, and Section 3 presents
the Martingale problem formulation for the BVP of a general elliptic PDE
with a third-kind (Robin) boundary condition, which includes both Dirichlet
and Neumann BVPs as special limiting cases. Section 4 presents the Martingale-based
DeepMartNet for solving high-dimensional PDE problems. Numerical results
for Dirichlet BVPs and eigenvalue problems will be presented in Section 5. The
implementation of the DeepMartNet for Neumann and Robin boundary conditions
will be addressed in a follow-up paper, as it will involve reflecting diffusion processes
in finite domains and the computation of local times of the processes. Section 6
presents conclusions and future work.
2. A review of SDE-based DNNs for solving PDEs. To set the background
for the Martingale-based DNNs, we first briefly review some existing DNN methods based
on diffusion paths from SDEs.
Let us first consider a terminal value problem for quasi-linear elliptic PDEs
(2.1) ∂t u + Lu = ϕ, x ∈ Rd ,
with a terminal condition u(T, x) = g(x), where the differential operator L is given as
(2.2) L = µ⊤∇ + (1/2) Tr(σσ⊤∇∇⊤) = µ⊤∇ + (1/2) Tr(A∇∇⊤),
and µ = µ(t, x, u, ∇u), σ = σ(t, x, u, ∇u), and the diffusion coefficient matrix A = σσ⊤.
The aim is to find the solution u(0, x) at the point x and time t = 0, and the solution of (2.1) is
related to a coupled FBSDE [19]
DeepBSDE. As the first work using SDEs to train DNNs, the Deep BSDE method [7]
trains the network with input X0 = x and output Y0 = u(0, x). Applying the
Euler–Maruyama (EM) scheme to the FBSDE (2.4) and (2.5), respectively, we have
The missing Zn+1 at tn+1 will then be approximated by a neural network (NN)
with parameters θn.
Loss function: with an ensemble average approximation, the loss function is defined as
(2.10) Lossbsde(Y0, θ) = E[ ∥u(T, XT) − g(XT)∥² ],
where u(T, XT) = YN.
Trainable parameters are {Y0 , θn , n = 1, · · · , N }.
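To make the discrete Deep BSDE training loop above concrete, the following is a minimal PyTorch sketch under simplifying assumptions (zero drift µ, identity diffusion σ, a placeholder driver ϕ and terminal condition g, and one small sub-network per time step for Zn); it illustrates the structure of the method rather than reproducing the implementation of [7].

```python
import torch

# Minimal Deep BSDE sketch (illustrative only): mu = 0, sigma = I, and the
# driver phi and terminal condition g below are hypothetical placeholders.
d, N, T, batch = 10, 50, 1.0, 256
dt = T / N
g = lambda x: torch.sum(x ** 2, dim=1, keepdim=True)        # hypothetical terminal condition
phi = lambda t, x, y, z: torch.zeros_like(y)                 # hypothetical driver

Y0 = torch.nn.Parameter(torch.zeros(1))                      # trainable value u(0, x0)
Znets = torch.nn.ModuleList(                                 # one net per time step for Z_n (parameters theta_n)
    [torch.nn.Sequential(torch.nn.Linear(d, 32), torch.nn.Tanh(), torch.nn.Linear(32, d))
     for _ in range(N)])
opt = torch.optim.Adam(list(Znets.parameters()) + [Y0], lr=1e-3)

for step in range(1000):
    X = torch.zeros(batch, d)                                # forward paths start at x0 = 0
    Y = Y0.expand(batch, 1)
    for n in range(N):
        dB = torch.randn(batch, d) * dt ** 0.5
        Z = Znets[n](X)
        # dY = phi dt + Z . dB, consistent with d_t u + Lu = phi in (2.1)
        Y = Y + phi(n * dt, X, Y, Z) * dt + torch.sum(Z * dB, dim=1, keepdim=True)
        X = X + dB                                           # Euler-Maruyama step of the forward SDE
    loss = torch.mean((Y - g(X)) ** 2)                       # terminal mismatch, cf. (2.10)
    opt.zero_grad(); loss.backward(); opt.step()
```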
FBSNN. A forward-backward neural network (FBSNN) proposed in [21] uses the
mismatch between two stochastic processes to build the loss function for the DNN,
which aims to train a DNN uθ(x, t) in the whole domain. The following is an improved
version of the approach in [24].
• Markov chain one. Starting with X0 = x, Y0 = uθ(x, 0),
(2.14) LΨ = λΨ.
The discretized backward-in-time evolution (with time step ∆t) of the backward
parabolic equation mimics the power iteration of the semi-group operator e^{−n∆tL},
which converges to the lowest eigenfunction of L as in a power method. The loss
function is set to be ∥P_T^λ Ψ − Ψ∥², while the evolution of the PDE solution is carried out by
two SDEs, instead of solving the parabolic equation directly,
The original algorithm in [9] uses a separate DNN to approximate the gradient of
Ψ(x), and the loss function for the DNN Ψθ(x) approximating the eigenfunction and
the eigenvalue is then defined by
where f(x, u) = λu, g = 0 for the case of an eigenvalue problem with eigenvalue λ
and eigenfunction u(x), and the boundary operator Γ can be of Dirichlet, Neumann,
or Robin type, or a decay condition at ∞ will be imposed if D = Rd. The following
shorthand will be used in the rest of the paper
(3.5) ∂P(y, t; x, s)/∂t = L*_y P(y, t; x, s),
where the adjoint operator
(3.6) L*_y = −∇_y⊤ µ + (1/2) Tr(∇_y ∇_y⊤ A).
(Robin Problem) Let us consider the BVP of (3.1) with a Robin type boundary
condition
(3.7) Γ(u) = γ ⊤ · ∇u + cu = g,
With the Martingale problem approach [23], the Martingale problem for the BVP
with the third-kind boundary condition (3.7) can be formulated using a reflecting diffusion
process Xref based on the process X of (3.4) through the following Skorohod problem.
with the understanding that it now stands for a reflecting diffusion process within the
closed domain D̄. Using the Ito formula [17] for the semi-martingale X(t) (namely,
Xref(t)) [10] [18],
du(X(t)) = Σ_{i=1}^{d} (∂u/∂x_i)(X(t)) dX_i(t) + (1/2) Σ_{i=1}^{d} Σ_{j=1}^{d} a_ij(X(t)) (∂²u/∂x_i∂x_j)(X(t)) dt,
with the notation of the generator L, we have for the solution u(x) of (3.1) the following
differential
du(X(t)) = Lu(X(t)) dt − γ⊺ · ∇u(X(t)) L(dt) + Σ_{i=1}^{d} Σ_{j=1}^{d} (∂u/∂x_i)(X(t)) σ_ij(X(t)) dB_j(t)
= [f(X(t), u(X(t))) − V(X(t), u(X(t)), ∇u(X(t)))] dt − [g(X(t)) − cu(X(t))] L(dt) + Σ_{i=1}^{d} Σ_{j=1}^{d} (∂u/∂x_i)(X(t)) σ_ij(X(t)) dB_j(t),
due to the Martingale nature of the Ito integral at the end of the equation above.
the underlying diffusion process is the original diffusion process (3.1), but killed at the
boundary at the first exit time
For the case of a linear PDE, i.e., f(x, u) = f(x), V = 0, by taking the expectation of
(3.17) and letting t → ∞, we will have
resulting in the well-known Feynman-Kac formula for the Dirichlet boundary value
problem
(3.19) u(x) = E[g(X_τD)] − E[ ∫_0^{τD} f(X_s) ds ], x ∈ D.
The Martingale problem of the BVPs states the equivalence of Mtu being a
Martingale (i.e. a probabilistic weak form of the BVPs) and the classic weak form:
For every test function ϕ(x) ∈ C²_∂D = {ϕ : ϕ ∈ C²(D) ∩ C¹(D), (γ′ · ∇ − µ⊺ n) ϕ = 0}, where γ′ = γ − α, α_j = Σ_{i=1}^{d} ∂a_ij/∂x_i, we have
(3.20) ∫_D u(x) L*ϕ dx = ∫_D [f(x, u(x)) − V(x, u(x), ∇u(x))] ϕ(x) dx + ∫_{∂D} ϕ(x)[g(x) − cu(x)] ds_x,
where
(3.21) L*ϕ = (1/2) Tr(∇∇⊺ A)ϕ − div(µϕ).
This equivalence has been proven for the Schrodinger operator Lu = (1/2)∆u + qu for the
Neumann problem [10] and the Robin problem [18].
4. DeepMartNet - a Martingale based neural network. In this section, we
propose a DNN method for solving BVPs and eigenvalue problems for elliptic
PDEs using the equivalence between the Martingale problem formulation and the classic
weak form of the PDEs.
For simplicity of our discussion, let us assume that s ≤ t ≤ τD. By the Martingale
property of Mt ≡ Mt^u of (3.17), we have
where
Mt − Ms = u(Xt) − u(Xs) − ∫_s^t Lu(Xz) dz
(4.5)    = u(Xt) − u(Xs) − ∫_s^t ( f(Xz, u(Xz)) − v(Xz) ) dz.
if both ti+k , ti ≥ τD .
Remark 4.2. We could define a different generator L by not including µ⊤∇ in
(2.2); then the Martingale in (4.9) will be changed to
(4.9) M*_t = u(Xt) − u(x0) − ∫_0^t ( λu(Xs) − µ⊤(Xs)∇u(Xs) − v(Xs) ) ds,
(4.10) Ω′ = {ω_m}_{m=1}^{M} ⊂ Ω
of the Ito process using the Euler-Maruyama scheme with M realizations of the
Brownian motions B^{(m)}, 1 ≤ m ≤ M,
X_i^{(m)}(ω_m) ∼ X(t_i, ω_m), 0 ≤ i ≤ N,
where
(4.11) X_{i+1}^{(m)} = X_i^{(m)} + µ(X_i^{(m)}) ∆t_i + σ(X_i^{(m)}) · ∆B_i^{(m)},
(4.12) X_0^{(m)} = x_0.
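As an illustration of the path sampling (4.10)-(4.12), a minimal NumPy sketch is given below; the function name em_paths, the per-path Python loops, and the pure-Brownian example are our own illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def em_paths(x0, mu, sigma, dt, N, M, rng=None):
    """Generate M Euler-Maruyama paths X_i^(m) of dX = mu(X)dt + sigma(X)dB, cf. (4.11)-(4.12)."""
    rng = rng or np.random.default_rng()
    d = x0.shape[0]
    X = np.empty((M, N + 1, d))
    X[:, 0] = x0                                                   # (4.12): all paths start at x0
    for i in range(N):
        dB = rng.normal(scale=np.sqrt(dt), size=(M, d))            # Brownian increments Delta B_i^(m)
        drift = np.stack([mu(x) for x in X[:, i]])                 # mu(X_i^(m))
        diff = np.einsum('mjk,mk->mj',
                         np.stack([sigma(x) for x in X[:, i]]), dB)
        X[:, i + 1] = X[:, i] + drift * dt + diff                  # (4.11)
    return X

# Example: pure Brownian motion (mu = 0, sigma = I) in d = 20, as used for the PBE tests.
paths = em_paths(np.zeros(20), lambda x: np.zeros_like(x),
                 lambda x: np.eye(len(x)), dt=0.01, N=900, M=100)
```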
We will build the loss function Loss(θ) for the neural network approximation uθ(x)
of the BVP solution using the Martingale property (4.4) and the M realizations of the
Ito diffusion (3.4).
For each ti, we randomly take a subset Ai ⊂ Ω′ with uniform sampling
(without replacement), corresponding to the mini-batch in computing the stochastic
gradient for the empirical training loss. Assuming that the mini-batch Ai
is large enough such that {X_{i+1}^{(m)}} and {X_i^{(m)}}, ω_m ∈ Ai, sample the distributions
P(t_{i+1}, ·, t_i, A) and P(t_i, ·, 0, x_0) well, respectively, then equation (4.22) gives the following
approximate identity for the solution uθ(Xt) using the Ai-ensemble average,
As E[M_{t_{i+k}}^{uθ} − M_{t_i}^{uθ}] ≈ 0 for a randomly selected Ai ⊂ Ω′, Ai ∈ F_{t_i} (mini-batches),
we can set
(4.13) Lossmart(θ) = (1/N) Σ_{i=0}^{N−1} ( M^{uθ}_{t_{i+k}∧τD} − M^{uθ}_{t_i∧τD} )²
= (1/N) Σ_{i=0}^{N−1} [ I(t_i ≤ τD)/|Ai|² ] ( Σ_{m=1}^{|Ai|} [ uθ(X_{i+k}^{(m)}) − uθ(X_i^{(m)}) − ∆t Σ_{l=0}^{k} ω_l ( f(X_{i+l}^{(m)}, uθ(X_{i+l}^{(m)})) − vθ(X_{i+l}^{(m)}) ) ] )²,
where vθ (x) is defined similarly as in (3.3). Refer to Remark 4.4 for the discussion of
the size of the mini-batch |Ai |.
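The Martingale loss (4.13) translates directly into code. The following PyTorch sketch assumes k = 1 with a simple left-endpoint quadrature (ω0 = 1) and a precomputed exit-time index per path; the names and simplifications are illustrative assumptions, not the authors' implementation.

```python
import torch

def martingale_loss(u_theta, X, f, v_theta, dt, tau_idx, Mb):
    """Sketch of the Martingale loss (4.13) with k = 1 and left-endpoint quadrature.

    X: (M, N+1, d) tensor of discrete paths; u_theta: network R^d -> R^1;
    f(x, u), v_theta(x): driver terms as in (3.1)/(3.3); tau_idx: (M,) index of the first exit time.
    """
    M, Np1, d = X.shape
    N = Np1 - 1
    loss = 0.0
    for i in range(N):
        idx = torch.randperm(M)[:Mb]                          # mini-batch A_i of paths
        Xi, Xi1 = X[idx, i], X[idx, i + 1]
        incr = (u_theta(Xi1) - u_theta(Xi)
                - dt * (f(Xi, u_theta(Xi)) - v_theta(Xi)))    # discrete Martingale increment
        alive = (tau_idx[idx] > i).float().unsqueeze(1)       # indicator I(t_i <= tau_D)
        loss = loss + torch.mean(alive * incr) ** 2           # ensemble average over A_i, then square
    return loss / N
```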
Now, we define the total loss for the boundary value problem as
(4.17) ∆t Σ_{l=0}^{k} ω_l ( λ uθ(X_{i+l}^{(m)}) − vθ(X_{i+l}^{(m)}) ) )²
and in the case of a bounded domain, the boundary loss Lossbdry(θ) will be added
for the homogeneous boundary condition g = 0, i.e., Lossbdry(θ) = ∥uθ∥₂². For
the decay condition at infinity, a mollifier will be used to enforce the decay
condition there explicitly (see (5.22)).
Also, in order to prevent the DNN eigenfunction from converging to the zero solution, we
introduce a simple normalization term using the l^p (p = 1, 2) norm of the solution at some
randomly selected locations,
(4.18) Lossnormal(θ) = ( (1/m) Σ_{i=1}^{m} |uθ(x_i)|^p − c )²,
which then provides a natural mechanism for the mini-batches in the SGD. Therefore,
the Martingale-based DNN is an ideal fit for deep learning of high-dimensional PDEs.
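For reference, the normalization term (4.18) amounts to a few lines of code; the probe points and constants in this sketch are placeholders, not the values used in any particular test.

```python
import torch

def normalization_loss(u_theta, xs, c=1.0, p=1):
    """Sketch of (4.18): keep the l^p size of u_theta at a few probe points away from zero.
    xs: (m, d) tensor of probe locations; c: target constant (illustrative defaults)."""
    vals = torch.abs(u_theta(xs)) ** p       # |u_theta(x_i)|^p at the probe locations
    return (vals.mean() - c) ** 2
```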
Remark 4.4. (Size of mini-batch Ai) The loss function of the DeepMartNet is
based on the fact that ∫_{Ai} (M_t^u − M_s^u) P(dω) = 0 for the exact solution u(x), where
the expectation will be computed by an ensemble average over selected paths from the
M paths. In theory, using the transition probability and a left-endpoint quadrature
for the integral over [s, t] in (4.5) with |t − s| ≪ 1, (4.4) can be rewritten as
where B = X_s^{−1}(Ai) and X_t^{(m)}, 1 ≤ m ≤ M, are the M sample paths of the diffusion
process X_t.
Therefore, the size of the mini-batch |Ai| should be large enough to give an accurate
sampling of the continuous distributions P(t, y, s, A), y ∈ Rd, and P(s, z, 0, x0), z ∈ B,
so in our simulations we select a sufficiently large M and set the size of the mini-batch Ai in
the following range,
where M is the total number of paths used and m1 > m2 are hyper-parameters of the
training.
The set Ai can be the same for each 0 ≤ i ≤ N − 1 per epoch or can be randomly
selected as a subset of Ω′, depending on how much stochasticity is put into the calculation
of the stochastic gradient in the SGD optimization.
For a low-memory implementation of the DeepMartNet, instead of first generating a
large number of paths upfront, we can at any time generate just |Ai| paths to be used
for some epochs of training, and then regenerate them afterwards.
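One possible way to realize this low-memory variant is a generator that produces a fresh block of paths and reuses it for a few epochs before regenerating; the sketch below assumes the em_paths routine sketched earlier and is only meant to illustrate the idea.

```python
def path_blocks(x0, mu, sigma, dt, N, block_size, epochs_per_block):
    """Yield freshly generated blocks of paths, regenerating every few epochs
    instead of storing all M paths upfront (illustrative sketch only)."""
    while True:
        block = em_paths(x0, mu, sigma, dt, N, M=block_size)   # assumes em_paths as sketched earlier
        for _ in range(epochs_per_block):
            yield block                                        # reuse the same block for a few epochs
```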
5. Numerical Results. The numerical parameters for the DeepMartNet consist
of
• M - number of total paths
• T - terminal time of the paths
• N - number of time steps over [0, T]
• ∆t = T /N
• Mb = |Ai| - size of the mini-batches of paths selected according to (4.23)
• size of the networks
The training is carried out on one A100 GPU node of an Nvidia SuperPod. The optimizer
is Adamax [16].
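As an illustration of how these parameters come together, a possible PyTorch setup mirroring the (20, 64, 32, 1) network and Adamax optimizer used in Test 1 below is sketched here; the variable names and the schematic training step in the comments are assumptions, not the authors' code.

```python
import torch

d, M, T, N = 20, 100_000, 9.0, 900            # Test 1 parameters: dt = T/N = 0.01
dt, Mb = T / N, 1000

u_theta = torch.nn.Sequential(                 # fully connected (20, 64, 32, 1) network
    torch.nn.Linear(d, 64), torch.nn.Tanh(),   # first hidden layer with Tanh
    torch.nn.Linear(64, 32), torch.nn.GELU(),  # second hidden layer with GELU
    torch.nn.Linear(32, 1))
optimizer = torch.optim.Adamax(u_theta.parameters(), lr=0.05)

# Per epoch (schematic): total_loss = loss_mart + alpha_FK * loss_FK + alpha_bdry * loss_bdry
# optimizer.zero_grad(); total_loss.backward(); optimizer.step()
```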
5.1. Dirichlet BVPs of the Poisson-Boltzmann equation. We will first
apply the DeepMartNet to solve the Dirichlet BVP of the Poisson-Boltzmann equation
(PBE) arising from the solvation of biomolecules in ionic solvents [1].
(5.1) ∆u(x) + cu(x) = f(x), x ∈ D;  u(x) = g(x), x ∈ ∂D.
In this case, the generator for the stochastic process is L = (1/2)∆, so the corresponding
diffusion is simply the Brownian motion B(t). For the M Brownian paths
B^{(j)}, j = 1, · · · , M, originating from x0, the Martingale loss (4.13) becomes
(5.3) Lossmart(θ) := (1/N) Σ_{i=0}^{N−1} (1/|Ai|²) ( Σ_{j=1}^{|Ai|} [ uθ(B_{t_{i+1}}^{(j)}) − uθ(B_{t_i}^{(j)}) − (1/2) ( f(B_{t_i}^{(j)}) − c uθ(B_{t_i}^{(j)}) ) I(t_i ≤ τD) ∆t ] )².
Meanwhile, we could use the Feynman-Kac formula [17] to compute the solution at
the point x0 by
(5.4) u(x0) ≈ (1/M) Σ_{j=1}^{M} [ g(B_{τD}^{(j)}) e^{cτD/2} + (1/2) Σ_{i=0}^{N−1} f(B_{t_i}^{(j)}) e^{c t_i/2} I(t_i ≤ τD) ∆t ],
and also define a point solution loss (or an integral identity for the solution for PDEs
with the quasi-linear terms V and f in (3.1)), termed the Feynman-Kac loss, which is added
to the total loss (4.14):
(5.5) LossF-K(θ) = (uθ(x0) − u(x0))²,
and a boundary loss is approximated as
(5.6) Lossbdry(θ) = ∥uθ − g∥₂² ≈ (1/Nbdry) Σ_{k=1}^{Nbdry} |uθ(x_k) − g(x_k)|²,
where uniformly sampled boundary points x_k, 1 ≤ k ≤ Nbdry, are used to compute the
boundary integral. Therefore, the total loss for the boundary value problem of the PB
equation is
(5.7) Lossbvp(θ) = Lossmart(θ) + αF-K LossF-K(θ) + αbdry Lossbdry(θ),
where the penalty parameter αF-K ranges from 10 to 1000 and αbdry ranges from 1000
to 10,000. An Adamax optimizer with learning rate 0.05 is applied for training, and
αF-K = 10, αbdry = 10³ are taken for the following numerical tests.
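A NumPy sketch of the Feynman-Kac estimate (5.4), assuming precomputed Brownian paths and a per-path exit-time index that approximates τD, is given below; the array layout and names are illustrative only.

```python
import numpy as np

def feynman_kac_estimate(B, tau_idx, f, g, c, dt):
    """Sketch of (5.4): B is an (M, N+1, d) array of Brownian paths from x0;
    tau_idx[j] is the time-step index approximating the first exit time of path j."""
    M, Np1, d = B.shape
    total = 0.0
    for j in range(M):
        n_exit = tau_idx[j]
        B_exit = B[j, n_exit]                                       # exit position B_{tau_D}^{(j)}
        run = sum(f(B[j, i]) * np.exp(c * i * dt / 2) * dt
                  for i in range(n_exit)) / 2                       # running-integral term of (5.4)
        total += g(B_exit) * np.exp(c * n_exit * dt / 2) + run
    return total / M
```

The resulting value can then be used as u(x0) in the Feynman-Kac loss (5.5).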
• Test 1: PBE in a d = 20 dimensional cube [−1, 1]^d. In this test, we solve
the PBE in a 20-dimensional cube with the Dirichlet boundary condition
given by the exact solution (5.2). The total number of paths is M = 100,000,
all starting from the origin, with mini-batch size Mb = |Ai| = 1000, ∆t = 0.01, and
T = 9. A fully connected network with layers (20, 64, 32, 1), with a Tanh activation
in the first hidden layer and a GELU activation in the second hidden layer, is used. A
mini-batch of Nbdry = 2000 points x_k is uniformly sampled on the
boundary for every epoch of training.
Fig. 5.1 shows the learned solution along the diagonal of the cube (top left) as well
as along the first coordinate axis (top right), together with the history of the loss (bottom
left) and the relative L2 error (computed by Monte Carlo sampling) (bottom right).
The training takes less than 5 minutes.
Fig. 5.1: DeepMartNet solution of the PBE in D = [−1, 1]^d, d = 20. (Upper left): true
and predicted value of u along the diagonal of the unit cube; (Upper right): true and
predicted value of u along the first coordinate axis of Rd; (Lower left): the loss
history; (Lower right): the history of the relative L2 error over the cube.
• Test 2: Effect of the path starting point x0. In the previous test, the Deep-
MartNet uses all diffusion paths originating from a fixed point x0 to explore
the solution domain. In this test, we consider paths starting from different points
x0 = (l, 0, · · · , 0), l = 0.1, 0.3, 0.7, to investigate the effect of the starting
point x0 on the accuracy of the DeepMartNet. A fully connected network
with layers (20, 64, 32, 1), with a Tanh activation in the first hidden layer and a GELU
activation in the second hidden layer, is used as in Test 1. The total number
of paths is M = 100,000, and for each epoch we randomly choose Mb = 1000
of the paths; the time step of the paths is ∆t = 0.01.
Fig. 5.2 shows that the three different choices of x0 produce similar numerical
results with the same numerical parameters as in Test 1.
Fig. 5.2: DeepMartNet for the PBE in D = [−1, 1]^d, d = 20, with 3 different starting points
for the paths. From top to bottom: x0 = (l, 0, · · · , 0), l = 0.1, 0.3, 0.7. (Left): u(xe1) where
e1 = d^{−1/2}(1, 1, · · · , 1); (Right): u(xe) where e = (1, 0, · · · , 0).
Moreover, the DeepMartNet can use diffusion paths starting from different
initial positions x0 in training the DNN, as long as the mini-batch of paths
Ai for a given epoch corresponds to paths originating from a common initial
point x0. In Fig. 5.3, we compare the numerical result using 120,000 total
paths starting from x0 = (0.3, 0, · · · , 0, 0) with that using the three choices of x0
of Fig. 5.2 with 40,000 paths for each x0. In both cases, for each epoch
we randomly choose a mini-batch of size Mb = 1000 from the paths; the time step
of the paths is ∆t = 0.01. An Adamax optimizer with learning rate 0.05 is
applied for training, with αF-K = 10 and αbdry = 10³.
Fig. 5.3: Comparison of DeepMartNet with paths from one starting point and from
3 starting points, with the same number of total paths in both cases. Upper left: u(xe)
where e = (1, 0, · · · , 0); Upper right: the relative error |u − utrue|/∥utrue∥∞ for the
upper left plot; Lower left: u(xe1) where e1 = d^{−1/2}(1, 1, · · · , 1); Lower right: the
relative error |u − utrue|/∥utrue∥∞ for the lower left plot.
The DeepMartNet produces similar results for these two cases, with the other numerical
parameters the same as in Test 1.
• Test 3: PBE in a d = 100 dimensional unit ball. In this test, we solve
the PBE in a 100-dimensional space. We set T = 0.25, the time step of the
paths is ∆t = 0.005, the total number of paths is M = 100,000, and for
each epoch we randomly choose Mb = 1000 of the paths. A fully connected
network with layers (100, 128, 32, 1), with a Tanh activation in the first hidden layer
and a GELU activation in the second hidden layer, is used.
Fig. 5.4 shows the learned solution. (Upper left): true and predicted value
of u along the diagonal of Rd, i.e., x versus u(xe) where e = d^{−1/2}(1, 1, · · · , 1);
(Upper right): true and predicted value of u along the first coordinate axis
of Rd, i.e., x versus u(xe1) where e1 = (1, 0, · · · , 0); (Middle left):
u(xe′) where e′ = 2^{−1/2}(1, 1, 0, 0, · · · , 0); (Middle right): u(xe′′) where e′′ =
10^{−1/2}(1, · · · , 1, 0, 0, · · · , 0) (ten 1's); (Lower left): the loss versus the number of
epochs; (Lower right): the relative L2 error versus the number of epochs.
The training takes less than 30 minutes.
• Test 4: nonlinear PBE in a d = 10 dimensional unit ball. For a 1:1
symmetric two-species ionic solvent, the electrostatic potential based on the
Debye-Huckel theory [1] is given by a nonlinear PBE before linearization, and
Fig. 5.4: DeepMartNet solution of the PBE in D = {x ∈ Rd, |x| < 1}, d = 100.
(Upper left): true and predicted value of u along the diagonal of Rd, i.e., x versus
u(xe) where e = d^{−1/2}(1, 1, · · · , 1); (Upper right): true and predicted value of u along
the first coordinate axis of Rd, i.e., x versus u(xe1) where e1 = (1, 0, · · · , 0); (Middle
left): u(xe′) where e′ = 2^{−1/2}(1, 1, 0, 0, · · · , 0); (Middle right): u(xe′′) where
e′′ = 10^{−1/2}(1, · · · , 1, 0, 0, · · · , 0) (ten 1's); (Lower left): the loss versus the number of epochs;
(Lower right): the relative L2 error versus the number of epochs.
we will consider the following model nonlinear PBE to test the capability of
the DeepMartNet for solving nonlinear PDEs,
(5.8) −∆u + sinh u = f, x ∈ D;  u = g, x ∈ ∂D,
where
g ≡ αL²,
and
f = −2αd + sinh( α Σ_{i=1}^{d} x_i² ).
(5.12) Lossmart(θ) = (1/N) Σ_{i=0}^{N−1} (1/Mb²) ( Σ_{j=1}^{Mb} I(t_i ≤ τ_∂D^{(j)}) [ uθ(W_{t_{i+1}}^{(j)}) − uθ(W_{t_i}^{(j)}) − (∆t/2) ( sinh uθ(W_{t_i}^{(j)}) − f(W_{t_i}^{(j)}) ) ] )².
A DeepMartNet with a fully connected structure (10, 20, 10, 1) and a GELU activation
function is used. The total number of paths is M = 100,000, and for each epoch,
we randomly choose Mb = 1000 of the paths; the time step of the paths is ∆t = 0.05.
A mini-batch of Nbdry = 2000 points x_k is uniformly sampled on the boundary
for every epoch of training. An Adamax optimizer with learning rate 0.05 is applied for
training. αbdry = 10³ in (4.19). The constants αnormal = 10, p = 1, m = 1, x1 = 0, c = 1
are used in (4.18).
To accelerate the convergence of the eigenvalue, we included an extra term in
the loss function (4.19),
where i is the index of the current epoch. This term corresponds to the residual of the
equation dλ/dt = 0 as the training time t → ∞ and ensures that the eigenvalue λ will
converge; the penalty constant αeig is chosen as 2.5 × 10−8 in this case. The
terminal time for the paths is T = 0.6 and the time step is ∆t = 0.01. The learning rate
starts at 0.02 and is decreased by a factor of 0.995 every 100 epochs. The training takes
less than 20 minutes.
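The stated learning-rate decay can be reproduced, for example, with a standard PyTorch step scheduler; this is one possible realization of the schedule, not necessarily the one used by the authors.

```python
import torch

# Example: start the learning rate at 0.02 and decay it by a factor of 0.995 every 100 epochs.
params = [torch.nn.Parameter(torch.zeros(1))]            # placeholder parameters for illustration
optimizer = torch.optim.Adamax(params, lr=0.02)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.995)

# for epoch in range(num_epochs):
#     ... compute the loss and call optimizer.step() ...
#     scheduler.step()                                    # one scheduler step per epoch
```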
Fig. 5.6: A numerical result for the eigenvalue problem of the Laplace equation in a cube
[−L, L]^{10} in R10. Upper left to middle right: true and predicted value of u at xe where
e = k^{−1/2}(1, · · · , 1, 0, · · · , 0) (k 1's) for k = 1, 2, 5, 10. Lower left: the predicted eigenvalue
λ versus the number of epochs; the (orange) horizontal line shows the true eigenvalue. Lower
right: the relative L2 error versus the number of epochs.
where α is a constant and α = 5/11.
The Martingale loss of (4.17) for this case will be
(5.23) Lossmart = (1/∆t) (1/N) Σ_{i=0}^{N−1} (1/|Ai|²) ( Σ_{m=1}^{|Ai|} [ uθ(x_{i+1}^{(m)}) − uθ(x_i^{(m)}) + ( (1/2)∆W(x_i^{(m)}) − (1/2)c + (1/2)λ ) uθ(x_i^{(m)}) ∆t ] )².
For the mini-batches in this case, we take the size of each Ai to be between M/200
and M/25, as a random assortment of the M total trajectories.
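A small sketch of this mini-batch selection, drawing a random size between M/200 and M/25 and sampling paths without replacement, could look as follows (illustrative code with assumed names; M is assumed large enough that M/200 ≥ 1).

```python
import numpy as np

def sample_minibatch(M, rng=None):
    """Pick a random mini-batch A_i of path indices with |A_i| between M/200 and M/25."""
    rng = rng or np.random.default_rng()
    size = rng.integers(M // 200, M // 25 + 1)           # random mini-batch size in the stated range
    return rng.choice(M, size=size, replace=False)       # random assortment of the M trajectories
```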
To speed up the convergence of the eigenvalue and eigenfunctions, we found
that taking fractional powers of each individual loss term is helpful; namely, the
total loss is now modified as
(5.24) Losstotal = ( (Lossmart)^p + αnormal (Lossnormal)^q )^r.
In our numerical tests, a typical choice is p = 3/8, q = 1, r = 3/4, and for the
normalization loss Lossnormal, we set αnormal = 50 and c = 30, m = 2, x1 = 0, x2 =
(1, 0, · · · , 0) in (4.18).
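In code, the fractional-power reweighting (5.24) is a one-liner; the default values below are the typical choices quoted above, and the function name is ours.

```python
def total_loss(loss_mart, loss_normal, alpha_normal=50.0, p=3/8, q=1.0, r=3/4):
    """Sketch of (5.24): fractional powers of the individual loss terms."""
    return (loss_mart ** p + alpha_normal * loss_normal ** q) ** r
```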
Figs. 5.7 and 5.8 show the learned eigenvalues (λ = 5 for d = 5 and λ = 200 for
d = 200) and eigenfunctions, respectively. The trapezoidal rule with k = 3 is used in the
Martingale loss Lossmart, and the other numerical parameters used are listed as follows:
• The total number of paths starting from the origin is M = 9,000 and 24,000 for
d = 5 and 200, respectively.
• The number of time steps is N = 1350 and 1300, and the terminal time is T = 9, for
d = 5 and 200, respectively.
• The learning rate is 1/150; it is halved every 500 epochs starting at epoch
500 and halved twice at epoch 7500 for stabilization.
• In the following numerical tests, a fully connected network with layers (d,
6d, 3d, 1) with a Tanh activation function is used for the eigenfunction, while
a fully connected network (1, d, 1) with a ReLU9 activation function and a
constant-value input is used to represent the eigenvalue (see the sketch after this list).
• An Adamax optimizer is applied for training.
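The eigenfunction/eigenvalue representation described in the list above could be sketched as follows; here the "ReLU9" activation is interpreted as the ninth power of ReLU, which is our assumption about the notation, and the constant input is fixed to 1.

```python
import torch

class ReLU9(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) ** 9                        # assumed meaning of the "ReLU9" activation

d = 200
eigfun = torch.nn.Sequential(                            # (d, 6d, 3d, 1) eigenfunction network with Tanh
    torch.nn.Linear(d, 6 * d), torch.nn.Tanh(),
    torch.nn.Linear(6 * d, 3 * d), torch.nn.Tanh(),
    torch.nn.Linear(3 * d, 1))
eigval = torch.nn.Sequential(                            # (1, d, 1) network fed a constant input
    torch.nn.Linear(1, d), ReLU9(), torch.nn.Linear(d, 1))
lam = eigval(torch.ones(1, 1))                           # the learned eigenvalue lambda
```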
The final relative errors in the eigenvalues after 10,000 epochs of training are
1.3 × 10−2 and 6.7 × 10−3 for d = 5 and 200, respectively, and the relative L2 errors of the
eigenfunctions calculated along the diagonal of the domain are 2.6 × 10−2 and 2.9 × 10−2 for
d = 5 and 200, respectively. The training takes 25 minutes for the case of d = 200.
Fig. 5.7: Eigenvalue problem of the Fokker-Planck equation in Rd, d = 5, for eigenvalue
λ = 5. (Top left) Convergence of the eigenvalue; (top right) history of the eigenvalue error;
(bottom left) history of the loss function; (bottom right) learned and exact eigenfunction
along the diagonal of the domain.
the value function [2]. Another important area for the application of the DeepMartNet
is to solve low-dimensional PDEs in complex geometries, as the method uses diffusion
paths to explore the domain and can therefore handle highly complex geometries such
as the interconnects in microchip designs, nano-particles in materials science, and
molecules in biology. Finally, the convergence analysis of the DeepMartNet, especially
to a given eigenvalue, and the choices of the mini-batches of diffusion paths and the various
hyper-parameters in the loss functions, which can strongly affect the convergence of the
learning, are important issues to be addressed.
Acknowledgement. W. C. would like to thank Elton Hsu and V. Papanicolaou
for the helpful discussion about their work on probabilistic solutions of Neumann and
Robin problems.
REFERENCES