
DEEPMARTNET - A MARTINGALE BASED DEEP NEURAL NETWORK LEARNING METHOD FOR DIRICHLET BVPS AND EIGENVALUE PROBLEMS OF ELLIPTIC PDES IN R^d ∗

WEI CAI†, ANDREW HE‡, AND DANIEL MARGOLIS§

Abstract. In this paper, we propose DeepMartNet - a Martingale based deep neural network
learning method for solving Dirichlet boundary value problems (BVPs) and eigenvalue problems for
elliptic partial differential equations (PDEs) in high dimensions or domains with complex geometries.
The method is based on Varadhan’s Martingale problem formulation for the BVPs/eigenvalue problems
where a loss function enforcing the Martingale property for the PDE solution is used for an efficient
optimization by sampling the stochastic processes associated with corresponding elliptic operators.
High dimensional numerical results for BVPs of the linear and nonlinear Poisson-Boltzmann equation
and eigenvalue problems of the Laplace equation and a Fokker-Planck equation demonstrate the
capability of the proposed DeepMartNet learning method in solving high dimensional PDE problems.

Key words. Martingale problem, Deep neural network, boundary value problems, Eigenvalue
problems.

AMS subject classifications. 35Q68, 65N35, 65T99, 65K10

1. Introduction. Computing eigenvalues and/or eigenfunctions of elliptic operators, solving boundary value problems (BVPs) of partial differential equations (PDEs) in high dimensions, and solving optimal stochastic control problems are among the key tasks for many scientific computing applications, e.g., ground state and band structure calculations in quantum systems, complex biochemical systems, communications, and manufacturing production. Deep neural networks (DNNs) have been utilized to solve high dimensional PDEs and stochastic control problems due to their capability of approximating high dimensional functions. The first attempt to use DNNs to solve high dimensional quasi-linear parabolic PDEs was carried out in [7] using the Pardoux-Peng theory [19] of nonlinear Feynman-Kac formulas, which connects the solution of parabolic PDEs with that of backward stochastic differential equations (BSDEs) [14]. The loss function for the training of the DeepBSDE uses the terminal condition of the PDEs. The DeepBSDE framework was also applied to solve stochastic control problems [8]. The DeepBSDE started a new approach of SDE based DNN approximations to high dimensional PDEs and stochastic control problems. The work in [21] [24] extended this idea by using a loss function based on the pathwise comparison of two stochastic processes, one from the BSDE in the Pardoux-Peng theory and one from the PDE solution; this variant of SDE based DNNs has the potential to find the solution in the whole domain, compared with the one-point solution of the original DeepBSDE. Recently, a diffusion Monte Carlo DNN method was developed [9] using the connection between stochastic processes and the solutions of elliptic equations and the backward Kolmogorov equation to build a loss function for eigenvalue calculations. The capability of SDE paths to explore high dimensional spaces makes them a good candidate for DNN learning. It should be mentioned that other approaches to solving high dimensional PDEs include the Feynman-Kac formula based Picard iteration [13] [6] and the stochastic dimension gradient descent method [12].

∗ December 20, 2023. First version appeared as arXiv preprint arXiv:2311.09456, 2023 Nov 15.
† Corresponding author, Department of Mathematics, Southern Methodist University, Dallas, TX 75275 ([email protected])
‡ Department of Mathematics, Southern Methodist University, Dallas, TX 75275.
§ Department of Mathematics, Southern Methodist University, Dallas, TX 75275.

In addition to the Pardoux-Peng BSDE approach of linking stochastic processes with the solutions of PDEs, another powerful probabilistic method is Varadhan's Martingale problem approach [23, 15], which was used to derive a probabilistic weak form for the PDE's solution with a specific Martingale related to the PDE solution through the Ito formula [17]. The equivalence of the classic weak solution of boundary value problems (BVPs) of elliptic PDEs using bilinear forms and the probabilistic one was established for the Schrodinger equation for the Neumann BVP [10, 11], and then for the Robin BVP [18]. This Martingale weak form for the PDE solution, and in fact also for a wide class of stochastic control problems [4, 3], provides a new avenue to tackle high dimensional PDE and stochastic control problems [2]; this paper focuses on the case of high dimensional PDEs only. The Martingale formulation of the PDE solution is a result of the Ito formula and the Martingale nature of Ito integrals. As a simple consequence of the broader Martingale property, the Feynman-Kac formula provides a one-point solution of the PDE using the expectation over the underlying SDE paths originating from that point. The Martingale based DNN to be studied in this paper, termed DeepMartNet [2], is trained with a loss function enforcing the conditional expectation definition of the Martingale. It will be shown with extensive numerical tests that, using the same set of SDE paths originating from one single point, the DeepMartNet can in fact provide approximations to the solutions of BVPs and eigenvalue problems of elliptic PDEs over the whole solution domain in high dimensions. In this sense, the DeepMartNet is able to extract more information about the PDE solution from the SDE paths originating from one point than the (pre-machine learning) traditional use of the one-point solution Feynman-Kac formula.
The rest of this paper is organized as follows. In Section 2, a brief review of existing SDE-based DNN methods for solving PDEs is given, and Section 3 will present the Martingale problem formulation for the BVP of a general elliptic PDE with the third kind (Robin) boundary condition, which includes both the Dirichlet and Neumann BVPs as special limiting cases. Section 4 will present the Martingale based DeepMartNet for solving high dimensional PDE problems. Numerical results for the Dirichlet BVPs and eigenvalue problems will be presented in Section 5. The implementation of the DeepMartNet for Neumann and Robin boundary conditions will be addressed in a follow-up paper, which will involve reflecting diffusion processes in finite domains and the computation of local times of the processes. Section 6 will present conclusions and future work.
2. A review of SDE based DNNs for solving PDEs. To set the background for the Martingale based DNNs, we first briefly review some existing DNNs based on diffusion paths from SDEs.
Let us first consider a terminal value problem for a parabolic PDE with a quasi-linear elliptic operator,

(2.1) ∂t u + Lu = ϕ, x ∈ R^d,

with a terminal condition u(T, x) = g(x), where the differential operator L is given as

(2.2) L = µ⊤∇ + (1/2) Tr(σσ⊤∇∇⊤) = µ⊤∇ + (1/2) Tr(A∇∇⊤),
and µ = µ(t, x, u, ∇u), σ = σ(t, x, u, ∇u), and the diffusion coefficient matrix

(2.3) A = (a_ij)_{d×d} = σσ⊤.

The aim is to find the solution u(0, x) at time t = 0, and the solution of (2.1) is
related to a coupled FBSDE [19]

(2.4) dXt = µ(t, Xt, Yt, Zt)dt + σ(t, Xt, Yt)dBt, X0 = ξ,
(2.5) dYt = ϕ(t, Xt, Yt, Zt)dt + Zt⊤σ(t, Xt, Yt)dBt, YT = g(XT),

where Xt, Yt and Zt are d-, 1- and d-dimensional stochastic processes, respectively, that are adapted to {Ft : 0 ≤ t ≤ T}, the natural filtration of the d-dimensional Brownian motion Bt. Specifically, we have the following relations,

(2.6) Yt = u(t, Xt), Zt = ∇u(t, Xt).

DeepBSDE. As the first work using SDEs to train DNNs, the DeepBSDE [7] trains the network with input X0 = x and output Y0 = u(0, x). Applying the Euler-Maruyama (EM) scheme to the FBSDEs (2.4) and (2.5), respectively, we have

(2.7) Xn+1 ≈ Xn + µ(tn, Xn, Yn, Zn)∆tn + σ(tn, Xn, Yn)∆Bn,

(2.8) Yn+1 ≈ Yn + ϕ(tn, Xn, Yn, Zn)∆tn + Zn⊤σ(tn, Xn, Yn)∆Bn.

The unknown Zn at tn is approximated by a neural network (NN) with parameters θn,

(2.9) ∇u(tn, Xn | θn) ≈ Zn = ∇u(tn, Xn).

Loss function: with an ensemble average approximation, the loss function is defined as

(2.10) Lossbsde(Y0, θ) = E[ ∥u(T, XT) − g(XT)∥² ],

where u(T, XT) = YN.
Trainable parameters are {Y0 , θn , n = 1, · · · , N }.
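As a hedged illustration (our own minimal reading of the scheme above, not the reference implementation of [7]), one Euler-Maruyama rollout of (2.7)-(2.8) together with the terminal loss (2.10) can be organized as follows; the drift µ, diffusion σ, driver ϕ, terminal data g, and network sizes are placeholder choices.

```python
# Sketch of a DeepBSDE-style rollout and loss (2.7)-(2.10); all problem data are placeholders.
import torch
import torch.nn as nn

d, N, batch, dt = 10, 50, 256, 0.02
mu    = lambda t, x, y, z: torch.zeros_like(x)        # drift mu(t, X, Y, Z)
sigma = lambda t, x, y: torch.ones_like(x)            # (diagonal) diffusion sigma(t, X, Y)
phi   = lambda t, x, y, z: torch.zeros_like(y)        # driver in (2.8)
g     = lambda x: x.pow(2).sum(dim=1, keepdim=True)   # terminal condition g(X_T)

Y0 = nn.Parameter(torch.zeros(1))                     # trainable value u(0, x0)
Z_nets = nn.ModuleList([nn.Sequential(nn.Linear(d, 32), nn.Tanh(), nn.Linear(32, d))
                        for _ in range(N)])           # Z_n ~ grad u(t_n, X_n), cf. (2.9)

def bsde_loss(x0):
    X = x0.repeat(batch, 1)
    Y = Y0.expand(batch, 1)
    for n in range(N):
        t = n * dt
        Z = Z_nets[n](X)
        dB = torch.randn(batch, d) * dt ** 0.5
        Y_new = Y + phi(t, X, Y, Z) * dt + (Z * sigma(t, X, Y) * dB).sum(dim=1, keepdim=True)
        X = X + mu(t, X, Y, Z) * dt + sigma(t, X, Y) * dB
        Y = Y_new
    return ((Y - g(X)) ** 2).mean()                   # ensemble average of (2.10)
```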
FBSNN. The forward-backward stochastic neural network (FBSNN) proposed in [21] uses the mismatch between two stochastic processes to build the loss function for the DNN, and aims to train a DNN uθ(x, t) in the whole domain. The following is an improved version of the approach in [24].
• Markov chain one. Starting with X0 = x, Y0 = uθ(x, 0),

(2.11) Xn+1 = Xn + µ(tn, Xn, Yn, Zn)∆tn + σ(tn, Xn, Yn)∆Bn,
       Yn+1 = Yn + ϕ(tn, Xn, Yn, Zn)∆tn + Zn⊤σ(tn, Xn, Yn)∆Bn,
       Zn+1 = ∇uθ(tn+1, Xn+1).

• Markov chain two

(2.12) Yn+1⋆ = uθ(tn+1, Xn+1).

• The loss function is a Monte Carlo approximation of

(2.13) E[ (1/N) Σ_{n=1}^{N} ∥Yn − Yn⋆∥² + 0.02 ∥YN⋆ − g(XN)∥² + 0.02 ∥ZN − ∇g(XN)∥² ].

Numerically, half-order convergence of uθ(x, t), matching the order of the underlying Euler-Maruyama scheme, is observed.
Diffusion Monte Carlo DNN eigensolver. In another approach, similar to the power iteration method, a diffusion Monte Carlo method was proposed in [9] through a fixed-point semigroup formulation for the eigenvalue problem of the following linear elliptic operator (i.e., µ = µ(x), σ = σ(x)),

(2.14) LΨ = λΨ.

Equation (2.14) can be reformulated as a virtual time dependent backward parabolic PDE with the sought-after eigenfunction as the terminal condition, i.e.,

(2.15) ∂t u(t, x) + Lu(t, x) − λu(t, x) = 0, u(T, x) = Ψ(x).

Thus, the following fixed-point property holds,

(2.16) u(T − t, ·) = P_t^λ Ψ, P_T^λ Ψ = Ψ,

where the semigroup for the evolutionary system is formally defined by

(2.17) P_t^λ = e^{−(T−t)L}.

The discretized backward-in-time evolution (with time step ∆t of the backward parabolic equation) mimics the power iteration of the semigroup operator e^{−n∆tL}, which will converge to the lowest eigenfunction of L as in a power method. The loss function is set to be ∥P_T^λ Ψ − Ψ∥², while the evolution of the PDE solution is carried out by two SDEs, instead of solving the parabolic equation directly,

(2.18) Xn+1 = Xn + σ∆Bn,
       un+1 = un + (λΨθ − µ⊤∇Ψθ)(Xn)∆t + ∇Ψθ(Xn)∆Bn.

The original algorithm in [9] uses a separate DNN to approximate the gradient of Ψ(x). The loss function for the DNN Ψθ(x) approximating the eigenfunction and for the eigenvalue λ is then defined by

(2.19) Losssemigroup(θ, λ) = E_{X0∼π0}[ |uN − Ψθ(XN)|² ].

3. Martingale problem formulation of elliptic PDEs. In this section, we will present the Martingale problem formulation for the BVPs and eigenvalue problems of elliptic PDEs. For this purpose, let us consider a general PDE for the linear elliptic operator L with µ = µ(x), σ = σ(x),

(3.1) Lu + V(x, u, ∇u) = f(x, u), x ∈ D ⊂ R^d,
(3.2) or Lu = f(x, u) − V(x, u, ∇u),
      Γ(u) = g, x ∈ ∂D,

where f(x, u) = λu, g = 0 for the case of an eigenvalue problem with an eigenvalue λ and eigenfunction u(x), and the boundary operator Γ can be of Dirichlet, Neumann, or Robin type, or a decay condition at ∞ will be imposed if D = R^d. The following shorthand will be used in the rest of the paper,

(3.3) v(x) = V(x, u(x), ∇u(x)).


The vector µ = µ(x) and matrix σ_{d×d} = σ_{d×d}(x) can be associated with the variable drift and diffusion, respectively, of the following stochastic Ito process Xt(ω) ∈ R^d, ω ∈ Ω (random sample space), with L as its generator,

(3.4) dXt = µ(Xt)dt + σ(Xt)·dBt, X0 = x0 ∈ D,

where Bt = (Bt^1, · · · , Bt^d)⊤ ∈ R^d is the Brownian motion in R^d.
The transition probability P(y, t; x, s) of the process Xt satisfies the following Fokker-Planck equation

(3.5) ∂P(y, t; x, s)/∂t = L∗_y P(y, t; x, s),

where the adjoint operator is

(3.6) L∗_y = −∇⊤_y µ + (1/2) Tr(∇_y∇⊤_y A).

(Robin Problem) Let us consider the BVP of (3.1) with a Robin type boundary condition

(3.7) Γ(u) = γ⊤·∇u + cu = g,

where the vector

(3.8) γ(x) = (1/2) A·n,

and n is the outward normal at x ∈ ∂D. The Dirichlet BC u = g can be considered as the limiting case c → N, g → Ng, N → ∞, and the Neumann BC as the case c = 0.

With the Martingale problem approach [23], the Martingale problem for the BVP with the third kind boundary condition (3.7) can be formulated using a reflecting diffusion process Xref based on the process X of (3.4) through the following Skorohod problem.

(Skorohod problem): Assume D is a bounded domain in R^d with a C² boundary. Let X(t) be a (continuous) path of (3.4) in R^d with X(0) ∈ D̄. A pair (Xref(t), L(t)) is a solution to the Skorohod problem S(X; D) if the following conditions are satisfied:
1. Xref is a path in D̄;
2. (local time) L(t) is a non-decreasing function which increases only when Xref ∈ ∂D, namely,

(3.9) L(t) = ∫_0^t I_{∂D}(Xref(s)) L(ds);

3. The Skorohod equation holds:

(3.10) S(X; D) : Xref(t) = X(t) − ∫_0^t γ(Xref(s)) L(ds).
Here, L(t) is the local time of the reflecting diffusion process, where an oblique reflection with respect to the direction γ at the boundary ∂D occurs once the process X(t) hits the boundary [22]. The sampling of the reflecting process and the computation of the local time L(t) can be found in [5].
As we will only use the reflecting diffusion Xref(t) for the rest of our discussion, we will keep the same notation

(3.11) X(t) ← Xref(t)

with the understanding that it now stands for a reflecting diffusion process within the closed domain D̄. Using the Ito formula [17] for the semi-martingale X(t) (namely, Xref(t)) [10] [18],

du(X(t)) = Σ_{i=1}^{d} ∂u/∂x_i(X(t)) dX_i(t) + (1/2) Σ_{i=1}^{d} Σ_{j=1}^{d} a_ij(X(t)) ∂²u/(∂x_i∂x_j)(X(t)) dt,

with the notation of the generator L, we have for the solution u(x) of (3.1) the following differential

du(X(t)) = Lu(X(t))dt − γ⊺·∇u(X(t)) L(dt) + Σ_{i=1}^{d} Σ_{j=1}^{d} σ_ij(X(t)) ∂u/∂x_i(X(t)) dB_i(t)
         = [f(X(t), u(X(t))) − V(X(t), u(X(t)), ∇u(X(t)))] dt − [g(X(t)) − cu(X(t))] L(dt)
           + Σ_{i=1}^{d} Σ_{j=1}^{d} σ_ij(X(t)) ∂u/∂x_i(X(t)) dB_i(t),

which in turn gives a Martingale M_t^u defined by

(3.12) M_t^u :=
(3.13) u(Xt) − u(X0) − ∫_0^t [f(Xs, u(Xs)) − V(Xs, u(Xs), ∇u(Xs))] ds + ∫_0^t [g(Xs) − cu(Xs)] L(ds)
     = ∫_0^t Σ_{i=1}^{d} Σ_{j=1}^{d} σ_ij(Xs) ∂u/∂x_i(Xs) dB_i(s),

due to the Martingale nature of the Ito integral at the end of the equation above.

(Dirichlet Problem) For the Dirichlet problem of (3.1) with the boundary condition

(3.14) Γ[u] = u = g, x ∈ ∂D,

the underlying diffusion process is the original diffusion process (3.4), but killed at the boundary at the first exit time

(3.15) τD = inf{t : Xt ∈ ∂D},

and it can be shown that in fact

(3.16) τD = inf{t > 0 : L(t) > 0},


and also that M^u_{t∧τD} will still be a Martingale [17], which will not involve the integral with respect to the local time L(t), i.e.,

(3.17) M^u_{t∧τD} = u(X_{t∧τD}) − u(X0) − ∫_0^{t∧τD} [f(Xs, u(Xs)) − V(Xs, u(Xs), ∇u(Xs))] ds.

For the case of a linear PDE, i.e., f(x, u) = f(x), V = 0, by taking the expectation of (3.17) and letting t → ∞, we have

(3.18) 0 = E[M_0^u] = lim_{t→∞} E[M^u_{t∧τD}] = E[M^u_{τD}]
         = E[u(X_{τD}) − u(X0)] − E[∫_0^{τD} f(Xs) ds]
         = E[g(X_{τD})] − u(x) − E[∫_0^{τD} f(Xs) ds],

resulting in the well-known Feynman-Kac formula for the Dirichlet boundary value problem

(3.19) u(x) = E[g(X_{τD})] − E[∫_0^{τD} f(Xs) ds], x ∈ D,

where the diffusion process Xt, X0 = x, is defined by (3.4).

The Martingale problem of the BVPs states the equivalence of M_t^u being a Martingale (i.e., a probabilistic weak form of the BVPs) and the classic weak form: for every test function ϕ(x) ∈ C²_{∂D} = {ϕ : ϕ ∈ C²(D) ∩ C¹(D̄), (γ′·∇ − µ⊺n)ϕ = 0}, with γ′ = γ − α, α_j = Σ_{i=1}^{d} ∂a_ij/∂x_i, we have

(3.20) ∫_D u(x) L∗ϕ(x) dx = ∫_D [f(x, u(x)) − V(x, u(x), ∇u(x))] ϕ(x) dx + ∫_{∂D} ϕ(x)[g(x) − cu(x)] ds_x,

where

(3.21) L∗ϕ = (1/2) Tr(∇∇⊺A)ϕ − div(µϕ).

This equivalence has been proven for the Schrodinger operator Lu = (1/2)∆u + qu for the Neumann problem [10] and the Robin problem [18].
4. DeepMartNet - a Martingale based neural network. In this section, we propose a DNN method for solving the BVPs and eigenvalue problems of elliptic PDEs using the equivalence between the Martingale problem formulation and the classic weak form of the PDEs.
For simplicity of our discussion, let us assume that s ≤ t ≤ τD. By the Martingale property of Mt ≡ M_t^u of (3.17), we have

(4.1) E[Mt | Fs] = Ms,

which implies for any measurable set A ∈ Fs,

(4.2) E[Mt | A] = Ms = E[Ms | A],


thus,

(4.3) E[(Mt − Ms) | A] = 0,

i.e.,

(4.4) ∫_A (Mt − Ms) P(dω) = 0,

where

(4.5) Mt − Ms = u(Xt) − u(Xs) − ∫_s^t Lu(Xz) dz
             = u(Xt) − u(Xs) − ∫_s^t (f(Xz, u(Xz)) − v(Xz)) dz.

In particular, if we take A = Ω ∈ Fs in (4.3), we have

(4.6) E[Mt − Ms] = 0,

i.e., the Martingale Mt has a constant expectation. However, it should be noted that a constant expectation by itself does not imply a Martingale; for this we have the following lemma [3].
Lemma 4.1. If E[M_S] = E[M_T] holds for any two stopping times S ≤ T, then Mt, t ≥ 0, is a Martingale, i.e., E[Mt | Fs] = Ms for s ≤ t.
For a given time interval [0, T], we define a partition

(4.7) 0 = t0 < t1 < · · · < ti < ti+1 < · · · < tN = T,

and the increment of Mt over [ti, ti+k] can be approximated by using a trapezoidal rule for the integral term,

(4.8) M_{ti+k} − M_{ti} = u(X_{i+k}) − u(X_i) − ∫_{ti}^{ti+k} Lu(Xz) dz
                       ≈ u(X_{i+k}) − u(X_i) − ∆t Σ_{l=0}^{k} ω_l Lu(X_{i+l})
                       = u(X_{i+k}) − u(X_i) − ∆t Σ_{l=0}^{k} ω_l (f(X_{i+l}, u(X_{i+l})) − v(X_{i+l})),

where for k ≥ 1, ω_0 = ω_k = 1/2 and ω_l = 1 for 1 ≤ l ≤ k − 1, while for k = 0, ω_0 = 1.
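For concreteness, the composite trapezoidal weights ω_l above can be tabulated with a small helper (a sketch; the k = 0 convention follows the sentence above):

```python
import numpy as np

def trapezoid_weights(k: int) -> np.ndarray:
    """Weights w_0, ..., w_k of the trapezoidal rule in (4.8); w_0 = 1 when k = 0."""
    if k == 0:
        return np.array([1.0])
    w = np.ones(k + 1)
    w[0] = w[-1] = 0.5
    return w

print(trapezoid_weights(3))   # k = 3 (as used in Section 5.2.2): [0.5, 1.0, 1.0, 0.5]
```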


Adding back the exit time τD, we note that

M_{ti+k∧τD} − M_{ti∧τD} = u(X_{ti+k∧τD}) − u(X_{ti∧τD}) − ∫_{ti∧τD}^{ti+k∧τD} Lu(Xz) dz = 0

if both ti+k, ti ≥ τD.
Remark 4.2. We could define a different generator L by not including the µ⊤∇ term in (2.2); then the Martingale in (3.17) (with f(x, u) = λu for the eigenvalue problem) will be changed to

(4.9) Mt∗ = u(Xt) − u(x0) − ∫_0^t (λu(Xs) − µ⊤(Xs)∇u(Xs) − v(Xs)) ds,

where the process Xt is given simply by dXt = σ·dBt, instead.


• DeepMartNet for Dirichlet BVPs
Let uθ(x) be a neural network approximating the BVP solution, with θ denoting all the weight and bias parameters of the DNN. For a given time interval [0, T], we define a partition

0 = t0 < t1 < · · · < ti < ti+1 < · · · < tN = T,

and M discrete realizations

(4.10) Ω′ = {ωm}_{m=1}^{M} ⊂ Ω

of the Ito process using the Euler-Maruyama scheme with M realizations of the Brownian motion B_i^{(m)}, 1 ≤ m ≤ M,

X_i^{(m)}(ωm) ≈ X(ti, ωm), 0 ≤ i ≤ N,

where

(4.11) X_{i+1}^{(m)} = X_i^{(m)} + µ(X_i^{(m)})∆ti + σ(X_i^{(m)})·∆B_i^{(m)},
(4.12) X_0^{(m)} = x0,

with ∆ti = ti+1 − ti and ∆B_i^{(m)} = B_{i+1}^{(m)} − B_i^{(m)}.
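A minimal sketch of generating the M discrete paths (4.11)-(4.12) with the Euler-Maruyama scheme is given below; the drift and (diagonal) diffusion functions are placeholders, and a full matrix σ would require a matrix-vector product instead of the elementwise multiplication.

```python
import torch

def simulate_paths(x0, mu, sigma, M, N, dt):
    """Euler-Maruyama realizations (4.11)-(4.12); returns X with shape (N + 1, M, d)."""
    d = x0.shape[-1]
    X = torch.empty(N + 1, M, d)
    X[0] = x0
    for i in range(N):
        dB = torch.randn(M, d) * dt ** 0.5        # Brownian increments Delta B_i^(m)
        X[i + 1] = X[i] + mu(X[i]) * dt + sigma(X[i]) * dB
    return X

# Example: pure Brownian motion (mu = 0, sigma = 1) in d = 20, as used in Section 5.1.
paths = simulate_paths(torch.zeros(20), lambda x: torch.zeros_like(x),
                       lambda x: torch.ones_like(x), M=1000, N=100, dt=0.01)
```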

We will build the loss function Loss(θ) for the neural network approximation uθ(x) of the BVP solution using the Martingale property (4.4) and the M realizations of the Ito diffusion (3.4).
For each ti, we randomly take a subset Ai ⊂ Ω′ with uniform sampling (without replacement), corresponding to the mini-batch used in computing the stochastic gradient for the empirical training loss. Assuming that each mini-batch Ai is large enough such that {X_{i+1}^{(m)}} and {X_i^{(m)}}, ωm ∈ Ai, sample the distributions P(ti+1, ·; ti, Ai) and P(ti, ·; 0, x0) well, respectively, then equation (4.22) below gives an approximate identity for the solution uθ(Xt) using the Ai-ensemble average. As E[M^{uθ}_{ti+k} − M^{uθ}_{ti}] ≈ 0 for a randomly selected mini-batch Ai ⊂ Ω′ (playing the role of a measurable set in F_{ti}), we can set

(4.13) Lossmart(θ) = (1/N) Σ_{i=0}^{N−1} (M^{uθ}_{ti+k∧τD} − M^{uθ}_{ti∧τD})²
     = (1/N) Σ_{i=0}^{N−1} [I(ti ≤ τD)/|Ai|²] ( Σ_{m=1}^{|Ai|} [ uθ(X_{i+k}^{(m)}) − uθ(X_i^{(m)}) − ∆t Σ_{l=0}^{k} ω_l (f(X_{i+l}^{(m)}, uθ(X_{i+l}^{(m)})) − vθ(X_{i+l}^{(m)})) ] )²,

where vθ(x) is defined similarly to (3.3) with u replaced by uθ. Refer to Remark 4.4 for the discussion of the size of the mini-batch |Ai|.
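A sketch of how (4.13) could be evaluated from stored paths is given below; it assumes pre-computed per-path exit steps, uses k = 1 with the trapezoidal weights ω_0 = ω_1 = 1/2, and applies the exit-time indicator path-by-path, so it is an illustration of the loss rather than the authors' exact implementation.

```python
# Sketch of the Martingale loss (4.13) for k = 1; u_theta, f, v are user-supplied callables,
# X has shape (N + 1, M, d), and exit_idx[m] is the first time-step index at which path m leaves D.
import torch

def martingale_loss(u_theta, f, v, X, exit_idx, dt, batch_size):
    Np1, M, _ = X.shape
    N = Np1 - 1
    loss = 0.0
    for i in range(N):
        A = torch.randperm(M)[:batch_size]            # mini-batch A_i of path indices
        alive = (exit_idx[A] > i).float()             # indicator I(t_i <= tau_D), per path
        Xi, Xi1 = X[i, A], X[i + 1, A]
        fv_i  = f(Xi,  u_theta(Xi))  - v(Xi)          # f - v at t_i
        fv_i1 = f(Xi1, u_theta(Xi1)) - v(Xi1)         # f - v at t_{i+1}
        incr = (u_theta(Xi1) - u_theta(Xi) - 0.5 * dt * (fv_i + fv_i1)).squeeze(-1)
        loss = loss + (alive * incr).sum().pow(2) / batch_size ** 2
    return loss / N
```

In training, this term would be combined with the boundary loss as in (4.14) below.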
Now, we define the total loss for the boundary value problem as

(4.14) Losstotal−bvp(θ) = Lossmart(θ) + αbdry Lossbdry(θ),

where αbdry is a hyper-parameter and Lossbdry(θ) = ∥uθ − g∥²₂ on ∂D, which can be approximated by evaluations at sampled boundary points.
The DeepMartNet solution is uθ∗(x), where

(4.15) θ∗ = argmin_θ Losstotal−bvp(θ).

• DeepMartNet for Dirichlet eigenvalue problems
For the eigenvalue problem

(4.16) Lu + V(x, u, ∇u) = λu, x ∈ D ⊂ R^d,
       Γ(u) = u = 0, x ∈ ∂D,

the Martingale loss becomes

(4.17) Lossmart(λ, θ) = (1/N) Σ_{i=0}^{N−1} (1/|Ai|²) ( Σ_{m=1}^{|Ai|} [ uθ(X_{i+k}^{(m)}) − uθ(X_i^{(m)}) − ∆t Σ_{l=0}^{k} ω_l (λuθ(X_{i+l}^{(m)}) − vθ(X_{i+l}^{(m)})) ] )²,

and in the case of a bounded domain, the boundary loss Lossbdry(θ) will be added for the homogeneous boundary condition g = 0, i.e., Lossbdry(θ) = ∥uθ∥²₂ on ∂D. For the decay condition at infinity, a mollifier will be used to enforce the decay condition explicitly (see (5.22)).
Also, in order to prevent the DNN eigenfunction from collapsing to the zero solution, we introduce a simple normalization term using the l_p (p = 1, 2) norm of the solution at some randomly selected locations,

(4.18) Lossnormal(θ) = ( (1/m) Σ_{i=1}^{m} |uθ(xi)|^p − c )²,

where the xi are m arbitrarily selected fixed points and c is a nonzero constant.


Finally, we have the total loss for the eigenvalue problem as

(4.19) Losstotal−eig(λ, θ) = Lossmart(λ, θ) + αbdry Lossbdry(θ) + αnormal Lossnormal(θ),

where αbdry and αnormal are hyper-parameters.
The DeepMartNet eigen-problem solution is (λ∗, uθ∗(x)), where

(4.20) (λ∗, θ∗) = argmin_{λ,θ} Losstotal−eig(λ, θ).
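A sketch of the corresponding training objects for the eigenvalue problem, with λ as a trainable parameter alongside the network weights, is given below; the network sizes, the normalization points, and the hyper-parameter values are illustrative.

```python
# Sketch of (4.17)-(4.19): a trainable eigenvalue plus the boundary and normalization penalties.
import torch
import torch.nn as nn

class EigenModel(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.u = nn.Sequential(nn.Linear(d, 6 * d), nn.Tanh(),
                               nn.Linear(6 * d, 3 * d), nn.Tanh(),
                               nn.Linear(3 * d, 1))
        self.lam = nn.Parameter(torch.tensor(0.0))     # trainable eigenvalue lambda

def total_eig_loss(model, loss_mart, x_bdry, x_norm, c=1.0,
                   alpha_bdry=1e3, alpha_normal=10.0):
    loss_bdry = model.u(x_bdry).pow(2).mean()                      # homogeneous BC, g = 0
    loss_norm = (model.u(x_norm).abs().mean() - c).pow(2)          # (4.18) with p = 1
    return loss_mart + alpha_bdry * loss_bdry + alpha_normal * loss_norm   # (4.19)
```

Here loss_mart would be computed as in the Martingale-loss sketch above with f(x, u) = λu, i.e., with model.lam entering the integrand, so that the gradient with respect to λ flows through the same loss.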

Remark 4.3. (Mini-batch in SGD training and Martingale property) Due to the equivalence between (4.4) and (4.1), the loss function defined above ensures that M^{uθ}_t of (3.17) for uθ(x) will approximately be a Martingale, provided that the mini-batches Ai explore all subsets of the sample space Ω′ during the SGD optimization process of the training, the sample size M = |Ω′| → ∞, the time step max |∆ti| → 0, and the training converges (see Fig. 4.1).
Also, if we take Ai = Ω′ for all i, there will be no stochasticity in the gradient calculation for the loss function; we then have a traditional full gradient descent method, and the full Martingale property for uθ(x) is not enforced either. Therefore, the mini-batch procedure in DNN SGD optimization corresponds naturally to the Martingale definition (4.1).
In summary, the Martingale property implies that for any measurable set A ∈ Fs, we require that

(4.21) E[Mt | A] = Ms,

which provides a native mechanism for the mini-batch in the SGD. Therefore, the Martingale based DNN is an ideal fit for deep learning of high-dimensional PDEs.

Fig. 4.1: DeepMartNet training and Martingale property

Remark 4.4. (Size of mini-batch Ai) The loss function of the DeepMartNet is based on the fact that ∫_{Ai} (M_t^u − M_s^u) P(dω) = 0 for the exact solution u(x), where the expectation will be computed by an ensemble average over paths selected from the M paths. In theory, using the transition probability and a left endpoint quadrature for the integral over [s, t] in (4.5) with |t − s| ≪ 1, (4.4) can be rewritten as

(4.22) 0 = E[(M_t^u − M_s^u) | Ai]
         ≈ ∫_{z∈B} ∫_{y∈R^d} [u(y) − u(z) − (f(z, u(z)) − v(z))(t − s)] P(t, y; s, z) P(s, z; 0, x0) dz dy
         = ∫_{y∈R^d} u(y) P(t, y; s, Ai) dy − ∫_{z∈B} [u(z) + (f(z, u(z)) − v(z))(t − s)] P(s, z; 0, x0) dz
         ≈ (1/|Ai|) Σ_{m=1}^{|Ai|} ( u(X_t^{(m)}) − [ u(X_s^{(m)}) + (f(X_s^{(m)}, u(X_s^{(m)})) − v(X_s^{(m)}))(t − s) ] ),

where B = X_s^{−1}(Ai) and X_t^{(m)}, 1 ≤ m ≤ M, are the M sample paths of the diffusion process Xt.
Therefore, the size of the mini-batch |Ai| should be large enough to give an accurate sampling of the continuous distributions P(t, y; s, Ai), y ∈ R^d, and P(s, z; 0, x0), z ∈ B, so in our simulations we select a sufficiently large M and set the size of the mini-batch Ai in the following range,

(4.23) M/m1 ≤ |Ai| ≤ M/m2,

where M is the total number of paths used and m1 > m2 are hyper-parameters of the training.
The set Ai can be the same for each 0 ≤ i ≤ N − 1 per epoch, or can be randomly selected as a subset of Ω′, depending on how much stochasticity is put into the calculation of the stochastic gradient in the SGD optimization.
For a low-memory implementation of the DeepMartNet, at any time we can generate just |Ai| paths to be used for some epochs of training, and then regenerate them, without first generating a large number of paths upfront.
5. Numerical Results. The numerical parameters for the DeepMartNet consist of
• M - number of total paths
• T - terminal time of the paths
• N - number of time steps over [0, T]
• ∆t = T/N
• Mb = |Ai| - size of the mini-batches of paths, selected according to (4.23)
• size of the networks
The training is carried out on one A100 GPU node of an Nvidia SuperPOD. The optimizer is Adamax [16].
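The small fully connected networks quoted in the tests below (e.g., (20, 64, 32, 1) with Tanh and GELU hidden activations in Test 1) and the Adamax optimizer can be set up, for instance, as:

```python
import torch
import torch.nn as nn

def make_net(sizes=(20, 64, 32, 1), acts=(nn.Tanh(), nn.GELU())):
    """Fully connected DNN with the given layer widths; no activation on the output layer."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(acts[i] if i < len(acts) else nn.GELU())
    return nn.Sequential(*layers)

u_theta = make_net()
optimizer = torch.optim.Adamax(u_theta.parameters(), lr=0.05)   # learning rate used in Test 1
```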
5.1. Dirichlet BVPs of the Poisson-Boltzmann equation. We will first apply the DeepMartNet to solve the Dirichlet BVP of the Poisson-Boltzmann equation (PBE) arising from the solvation of biomolecules in ionic solvents [1],

(5.1) ∆u(x) + cu(x) = f(x), x ∈ D,
      u(x) = g(x), x ∈ ∂D,

where c < 0 (c = −1 in the numerical tests), with a high-dimensional exact solution given by

(5.2) u(x) = Σ_{i=1}^{d} cos(ωx_i), ω = 2.

In this case, the generator for the stochastic process is L = (1/2)∆, so the corresponding diffusion is simply the Brownian motion B(t). For the M Brownian paths B^{(j)}, j = 1, · · · , M, originating from x0, the Martingale loss (4.13) becomes

(5.3) Lossmart(θ) := (1/N) Σ_{i=0}^{N−1} (1/|Ai|²) ( Σ_{j=1}^{|Ai|} [ uθ(B_{ti+1}^{(j)}) − uθ(B_{ti}^{(j)}) − (1/2)( f(B_{ti}^{(j)}) − c uθ(B_{ti}^{(j)}) ) I(ti ≤ τD) ∆t ] )².
Meanwhile, we could use the Feynman-Kac formula [17] to compute the solution at the point x0 by

(5.4) u(x0) ≈ (1/M) Σ_{j=1}^{M} [ g(B_{τD}^{(j)}) e^{cτD/2} + (1/2) Σ_{i=0}^{N−1} f(B_{ti}^{(j)}) e^{cti/2} I(ti ≤ τD) ∆t ],

and also define a point-solution loss (or an integral identity for the solution for a PDE with quasi-linear terms V and f in (3.1)), termed the Feynman-Kac loss, which is added to the total loss (4.14),

(5.5) LossF-K(θ) = (uθ(x0) − u(x0))²,
and a boundary loss approximated as

(5.6) Lossbdry(θ) = ∥uθ − g∥²₂ ≈ (1/Nbdry) Σ_{k=1}^{Nbdry} |uθ(xk) − g(xk)|²,

where uniformly sampled boundary points xk, 1 ≤ k ≤ Nbdry, are used to compute the boundary integral. Therefore, the total loss for the boundary value problem of the PB equation is

(5.7) Lossbvp(θ) = Lossmart(θ) + αF-K LossF-K(θ) + αbdry Lossbdry(θ),

where the penalty parameter αF-K ranges from 10 to 1000 and αbdry ranges from 1000 to 10,000. An Adamax optimizer with learning rate 0.05 is applied for training, and αF-K = 10, αbdry = 10³ are taken for the following numerical tests.
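A sketch of the boundary sampling for Lossbdry (5.6) on the cube [−1, 1]^d and of assembling the total loss (5.7) is shown below; the Feynman-Kac value u(x0) is assumed to have been precomputed from (5.4), and the helper names are ours.

```python
import torch

def sample_cube_boundary(n, d, L=1.0):
    """Uniform points on the boundary of [-L, L]^d: pick a coordinate to pin at +/- L."""
    x = (2.0 * torch.rand(n, d) - 1.0) * L
    face = torch.randint(0, d, (n,))
    sign = L * (2.0 * torch.randint(0, 2, (n,)).float() - 1.0)
    x[torch.arange(n), face] = sign
    return x

def total_bvp_loss(u_theta, loss_mart, g, x0, u0_fk, d, n_bdry=2000,
                   alpha_fk=10.0, alpha_bdry=1e3):
    xb = sample_cube_boundary(n_bdry, d)
    loss_bdry = (u_theta(xb) - g(xb)).pow(2).mean()      # (5.6)
    loss_fk = (u_theta(x0) - u0_fk).pow(2).sum()         # (5.5), with u0_fk from (5.4)
    return loss_mart + alpha_fk * loss_fk + alpha_bdry * loss_bdry   # (5.7)
```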
• Test 1: PBE in a d=20 dimensional cube [−1, 1]^d. In this test, we solve the PBE in a 20-dimensional cube with the Dirichlet boundary condition given by the exact solution (5.2). The total number of paths is M = 100,000, starting from the origin, with mini-batch size Mb = |Ai| = 1000, ∆t = 0.01, and T = 9. A fully connected network with layers (20, 64, 32, 1), with a Tanh activation for the first hidden layer and a GELU activation for the second hidden layer, is used. A mini-batch of Nbdry = 2000 points xk is uniformly sampled on the boundary for every epoch of training.
Fig. 5.1 shows the learned solution along the diagonal of the cube (top left) as well as along the first coordinate axis (top right), and the history of the loss (bottom left) and of the relative L2 error (computed by Monte Carlo sampling) (bottom right). The training takes less than 5 minutes.

Fig. 5.1: DeepMartNet solution of the PBE in D = [−1, 1]^d, d = 20. (Upper left): true and predicted values of u along the diagonal of the unit cube; (Upper right): true and predicted values of u along the first coordinate axis of R^d; (Lower left): the loss history; (Lower right): the history of the relative L2 error over the cube.

• Test 2: Effect of the path starting point x0. In the previous test, the DeepMartNet uses all diffusion paths originating from a fixed point x0 to explore the solution domain. In this test, we consider paths starting from different points x0 = (l, 0, · · · , 0), l = 0.1, 0.3, 0.7, to investigate the effect of the starting point x0 on the accuracy of the DeepMartNet. A fully connected network with layers (20, 64, 32, 1), with a Tanh activation for the first hidden layer and a GELU activation for the second hidden layer, is used as in Test 1. The total number of paths is M = 100,000, and for each epoch we randomly choose Mb = 1000 of the paths; the time step of the paths is ∆t = 0.01.
Fig. 5.2 shows that the three different choices of x0 produce similar numerical results with the same numerical parameters as in Test 1.

Fig. 5.2: DeepMartNet for the PBE in D = [−1, 1]^d, d = 20, with 3 different starting points for the paths. From top to bottom: x0 = (l, 0, · · · , 0), l = 0.1, 0.3, 0.7. (Left): u(xe1) where e1 = d^{−1/2}(1, 1, · · · , 1); (Right): u(xe) where e = (1, 0, · · · , 0).

Moreover, the DeepMartNet can use diffusion paths starting from different initial positions x0 in training the DNN, as long as the mini-batch of paths Ai for a given epoch corresponds to paths originating from a common initial point x0. In Fig. 5.3, we compare the numerical result using 120,000 total paths starting from x0 = (0.3, 0, · · · , 0) with that using the three choices of x0 as in Fig. 5.2, with 40,000 paths for each x0. In both cases, for each epoch, we randomly choose a mini-batch of size Mb = 1000 from the paths; the time step of the paths is ∆t = 0.01. An Adamax optimizer with learning rate 0.05 is applied for training, with αF-K = 10, αbdry = 10³. The DeepMartNet produces similar results for these two cases, with the other numerical parameters the same as in Test 1.

Fig. 5.3: Comparison of DeepMartNet with paths from one starting point and from 3 starting points, with the same number of total paths in both cases. Upper left: u(xe) where e = (1, 0, · · · , 0); Upper right: the relative error |u − utrue|/∥utrue∥∞ for the upper left plot; Lower left: u(xe1) where e1 = d^{−1/2}(1, 1, · · · , 1); Lower right: the relative error |u − utrue|/∥utrue∥∞ for the lower left plot.
• Test 3: PBE in a d=100 dimensional unit ball. In this test, we solve the PBE in a 100-dimensional unit ball. We set T = 0.25 and the time step of the paths to ∆t = 0.005; the total number of paths is M = 100,000, and for each epoch we randomly choose Mb = 1000 of the paths. A fully connected network with layers (100, 128, 32, 1), with a Tanh activation for the first hidden layer and a GELU activation for the second hidden layer, is used.
Fig. 5.4 shows the learned solution along the diagonal of R^d (e = d^{−1/2}(1, 1, · · · , 1)), along the first coordinate axis, and along the directions e′ = 2^{−1/2}(1, 1, 0, · · · , 0) and e′′ = 10^{−1/2}(1, · · · , 1, 0, · · · , 0) (ten 1's), together with the loss history and the relative L2 error history. The training takes less than 30 minutes.
Fig. 5.4: DeepMartNet solution of the PBE in D = {x ∈ R^d, |x| < 1}, d = 100. (Upper left): true and predicted values of u along the diagonal of R^d, i.e., x versus u(xe) where e = d^{−1/2}(1, 1, · · · , 1); (Upper right): true and predicted values of u along the first coordinate axis of R^d, i.e., x versus u(xe1) where e1 = d^{−1/2}(1, 0, · · · , 0); (Middle left): u(xe′) where e′ = 2^{−1/2}(1, 1, 0, · · · , 0); (Middle right): u(xe′′) where e′′ = 10^{−1/2}(1, · · · , 1, 0, · · · , 0) (ten 1's); (Lower left): the loss versus the number of epochs; (Lower right): the relative L2 error versus the number of epochs.

• Test 4: nonlinear PBE in a d=10 dimensional unit ball. For a 1:1 symmetric two-species ionic solvent, the electrostatic potential based on the Debye-Huckel theory [1] is given by a nonlinear PBE before linearization, and we will consider the following model nonlinear PBE to test the capability of the DeepMartNet for solving nonlinear PDEs,
(5.8) −∆u + sinh u = f, x ∈ D,
       u = g, x ∈ ∂D,

where

(5.9) D = {x ∈ R^d : ∥x∥₂ ≤ L}, L = 1.

We consider the case of a true solution

(5.10) u = α Σ_{i=1}^{d} x_i², α = 2,
which gives the boundary data and right-hand side of the PBE as

g ≡ αL²,

and

f = −2αd + sinh( α Σ_{i=1}^{d} x_i² ).

This time, we use the loss

(5.11) Loss(θ) = Lossmart(θ) + αbdry Lossbdry(θ),

where αbdry = 10^{−4}, and the Martingale loss (4.13) is

(5.12) Lossmart(θ) = (1/N) Σ_{i=0}^{N−1} [I(ti ≤ τD)/Mb²] ( Σ_{j=1}^{Mb} [ uθ(W_{ti+1}^{(j)}) − uθ(W_{ti}^{(j)}) − (∆t/2)( sinh(uθ(W_{ti}^{(j)})) − f(W_{ti}^{(j)}) ) ] )²,

and the boundary condition loss is again

(5.13) Lossbdry(θ) = (1/Nbdry) Σ_{k=1}^{Nbdry} (uθ(xk) − g(xk))²,

where a mini-batch of Nbdry = 2000 points xk is uniformly sampled on the boundary for every epoch of training. In this case, we do not have a Feynman-Kac loss at the starting point x0.
The numerical results in Fig. 5.5 are obtained with a DeepMartNet using the fully connected NN (10, 10, 10, 1). The activation function for the first hidden layer is Tanh, and the one for the second hidden layer is GELU with the Tanh approximation. Mtot = 10⁶ paths are used, and a mini-batch of Mb = 4000 paths is sampled in each epoch. The total number of training epochs is 3000. The terminal time for the paths is T = 0.25 and the time step is ∆t = 0.01. The learning rate starts at 0.01 and is decreased by a factor of 0.99 every 100 epochs. The training takes less than 20 minutes.

Fig. 5.5: DeepMartNet solution of the nonlinear PBE (5.8) in a 10-dimensional unit ball. Upper left: u(xe1) where e1 = (1, 0, · · · , 0); Upper right: u(xe) where e = (1, 1, · · · , 1); Lower left: loss vs. epoch; Lower right: relative L2 error vs. epoch.
5.2. Eigenvalue problems for elliptic equations. In this section, we will apply the DeepMartNet to solve elliptic eigenvalue problems in bounded and unbounded domains for both self-adjoint and non-self-adjoint operators.
5.2.1. Eigenvalue problem of the Laplace equation in a cube in R^10. First, we consider the self-adjoint eigenvalue problem

(5.14) −∆u = λu, x ∈ D = [−L, L]^d, d = 10, L = 1,
(5.15) u|∂D = 0,

with the eigenfunction

(5.16) u(x) = sin(πx1/L) · · · sin(πxd/L),

for the eigenvalue

(5.17) λ1 = d(π/L)².

A DeepMartNet with a fully connected structure (10, 20, 10, 1) and GELU activation functions is used. The total number of paths is M = 100,000, and for each epoch we randomly choose Mb = 1000 of the paths; the time step of the paths is ∆t = 0.05. A mini-batch of Nbdry = 2000 points xk is uniformly sampled on the boundary for every epoch of training. An Adamax optimizer with learning rate 0.05 is applied for training, with αbdry = 10³ in (4.19) and αnormal = 10, p = 1, m = 1, x1 = 0, c = 1 in (4.18).
To accelerate the convergence of the eigenvalue, we include an extra term in the loss function (4.19),

(5.18) αeig |λ^{(i)} − λ^{(i−100)}|²,

where i is the index of the current epoch. This term corresponds to the residual of the equation dλ/dt = 0 as the training time t → ∞ and ensures that the eigenvalue λ will converge; the penalty constant αeig is chosen as 2.5 × 10^{−8} in this case. The terminal time for the paths is T = 0.6 and the time step is ∆t = 0.01. The learning rate starts at 0.02 and is decreased by a factor of 0.995 every 100 epochs. The training takes less than 20 minutes.
Fig. 5.6: A numerical result for the eigenvalue problem of the Laplace equation in a cube [−L, L]^10 in R^10. Upper left to middle right: true and predicted values of u at xe where e = k^{−1/2}(1, · · · , 1, 0, · · · , 0) (the first k entries equal to 1) for k = 1, 2, 5, 10; Lower left: the predicted eigenvalue λ versus the number of epochs, with the (orange) horizontal line showing the true eigenvalue; Lower right: the relative L2 error versus the number of epochs.

5.2.2. A non-self-adjoint eigenvalue problem for the Fokker-Planck equation. Here we consider the following non-self-adjoint eigenvalue problem of the Fokker-Planck equation,

(5.19) −∆ψ − ∇·(ψ∇W) + cψ = −∆ψ − ∇W·∇ψ − (∆W)ψ + cψ = λψ, x ∈ R^d,

where the eigenfunction for the eigenvalue λ = c, with a zero decay condition at ∞, is

(5.20) ψ(x) = e^{−W(x)}.
Here, we will consider a quadratic potential for our numerical tests,

W(x) = ∥x∥², x ∈ R^d.

Equation (5.19) can be rewritten as

(5.21) Lψ = (1/2)∆ψ + (1/2)∇W·∇ψ = (1/2)(−∆W + c − λ)ψ.

The generator L for the SDE will have a drift and a diffusion given by

µ = (1/2)∇W and σ = I_{d×d}.
In order to enforce explicitly the decay condition of the eigenfunction at infinity, a mollifier of the following form will be used as a pre-factor for the DNN approximation uθ(x) = ρ(x)ũθ(x) of the eigenfunction,

(5.22) ρ(x) = 1 / ( 1 + (∥x∥/α)² ),

where α is a constant, taken as α = 5/11.
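The pre-factor uθ(x) = ρ(x)ũθ(x) can be built directly into the network's forward pass; a sketch is given below, where the layer sizes are those used later in this section and the value of α follows our reading of (5.22) and should be treated as an assumption.

```python
import torch
import torch.nn as nn

class MollifiedEigenfunction(nn.Module):
    """u_theta(x) = rho(x) * u_tilde_theta(x), with the rational mollifier (5.22)."""
    def __init__(self, d, alpha=5.0 / 11.0):          # alpha as read from (5.22); an assumption
        super().__init__()
        self.alpha = alpha
        self.net = nn.Sequential(nn.Linear(d, 6 * d), nn.Tanh(),
                                 nn.Linear(6 * d, 3 * d), nn.Tanh(),
                                 nn.Linear(3 * d, 1))

    def forward(self, x):
        rho = 1.0 / (1.0 + (x.norm(dim=-1, keepdim=True) / self.alpha) ** 2)
        return rho * self.net(x)
```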
The Martingale loss of (4.17) for this case will be

(5.23) Lossmart = (1/∆t) (1/N) Σ_{i=0}^{N−1} (1/|Ai|²) ( Σ_{m=1}^{|Ai|} [ uθ(X_{i+1}^{(m)}) − uθ(X_i^{(m)}) + ( (1/2)∆W(X_i^{(m)}) − (1/2)c + (1/2)λ ) uθ(X_i^{(m)}) ∆t ] )².
For the mini-batches in this case, we take the size of each Ai to be between M/200 and M/25, as a random assortment of the M total trajectories.
To speed up the convergence of the eigenvalue and eigenfunctions, we found that taking fractional powers of the individual loss terms is helpful; namely, the total loss is modified as

(5.24) Losstotal = ( (Lossmart)^p + αnormal (Lossnormal)^q )^r.

In our numerical tests, a typical choice is p = 3/8, q = 1, r = 3/4, and for the normalization loss Lossnormal we set αnormal = 50 and c = 30, m = 2, x1 = 0, x2 = (1, 0, · · · , 0) in (4.18).
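The modified total loss (5.24) is a one-liner once the individual loss terms are available; a sketch with the quoted exponents:

```python
def fractional_total_loss(loss_mart, loss_normal, alpha_normal=50.0,
                          p=3.0 / 8.0, q=1.0, r=3.0 / 4.0):
    """Total loss (5.24): fractional powers of the individual terms to speed up convergence."""
    return (loss_mart ** p + alpha_normal * loss_normal ** q) ** r
```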
Figs. 5.7 and 5.8 show the learned eigenvalues (λ = 5 for d = 5 and λ = 200 for d = 200) and eigenfunctions, respectively. The k = 3 trapezoidal rule is used in the Martingale loss Lossmart, and the other numerical parameters are listed as follows:
• The total number of paths starting from the origin is M = 9,000 and 24,000 for d = 5 and 200, respectively.
• The number of time steps is N = 1350 and 1300, and the terminal time is T = 9, for d = 5 and 200, respectively.
• The learning rate is 1/150 and is halved every 500 epochs starting at epoch 500, and halved twice at epoch 7500 for stabilization.
• A fully connected network with layers (d, 6d, 3d, 1) and a Tanh activation function is used for the eigenfunction, while a fully connected network (1, d, 1) with a ReLu9 activation function and a constant-value input is used to represent the eigenvalue.
• An Adamax optimizer is applied for training.
The final relative error in the eigenvalues after 10,000 epochs of training is 1.3 × 10^{−2} and 6.7 × 10^{−3} for d = 5 and 200, respectively, and the relative L2 error of the eigenfunction calculated along the diagonal of the domain is 2.6 × 10^{−2} and 2.9 × 10^{−2} for d = 5 and 200, respectively. The training takes 25 minutes for the case of d = 200.
Fig. 5.7: Eigenvalue problem of the Fokker-Planck equation in R^d, d = 5, for eigenvalue λ = 5. (Top left) Convergence of the eigenvalue; (top right) history of the eigenvalue error; (bottom left) history of the loss function; (bottom right) learned and exact eigenfunction along the diagonal of the domain.

Fig. 5.8: Eigenvalue problem of the Fokker-Planck equation in R^d, d = 200, for eigenvalue λ = 200. (Top left) Convergence of the eigenvalue; (top right) history of the eigenvalue error; (bottom left) history of the loss function; (bottom right) learned and exact eigenfunction along the diagonal of the domain.

6. Conclusion. In this paper, we introduced a Martingale based neural network, DeepMartNet, for solving the BVPs and eigenvalue problems of elliptic operators with Dirichlet boundary conditions. The DeepMartNet enforces the Martingale property for the PDE solutions through the stochastic gradient descent (SGD) optimization of a Martingale property loss function. The connection between the Martingale definition and the mini-batches used in the stochastic gradient computation shows a natural fit between the SGD optimization in DNN training and the Martingale problem formulation of the PDE solutions. Numerical results in high dimensions for both BVPs and eigenvalue problems show the promising potential of the method in approximating high dimensional PDE solutions. The numerical results show that the DeepMartNet extracts more information about the PDE solution over the whole solution domain from the same set of diffusion paths originating from just one point than the traditional one-point solution Feynman-Kac formula.
Future work on DeepMartNet will include PDEs with Neumann and Robin boundary conditions, where the underlying diffusion will be a reflecting one and the local time of the reflecting diffusion will be computed and included in the definition of the Martingale loss function. Another area of application is optimal stochastic control, where the Martingale optimality condition for the control and the backward SDE for the value function can be used to construct the loss functions for the control as well as the value function [2]. Another important area for the application of the DeepMartNet is solving low-dimensional PDEs in complex geometries, as the method uses diffusion paths to explore the domain and therefore can handle highly complex geometries such as the interconnects in microchip designs, nano-particles in material sciences, and molecules in biology. Finally, the convergence analysis of the DeepMartNet, especially its convergence to a given eigenvalue, and the choices of the mini-batches of diffusion paths and of the various hyper-parameters in the loss functions, which can strongly affect the convergence of the learning, are important issues to be addressed.
Acknowledgement. W. C. would like to thank Elton Hsu and V. Papanicolaou for helpful discussions about their work on probabilistic solutions of the Neumann and Robin problems.

REFERENCES

[1] Cai W. Computational Methods for Electromagnetic Phenomena: electrostatics in solvation,


scattering, and electron transport. Cambridge University Press; 2013 Jan 3.
[2] Cai W. DeepMartNet – A Martingale based Deep Neural Network Learning Algorithm for Eigenvalue/BVP Problems and Optimal Stochastic Controls. arXiv preprint arXiv:2307.11942. 2023 Jul 21.
[3] Cohen SN, Elliott RJ. Stochastic calculus and applications. New York: Birkhäuser; 2015 Nov 18.
[4] Davis MH. Martingale methods in stochastic control. InStochastic Control Theory and Stochastic
Differential Systems: Proceedings of a Workshop of the Sonderforschungsbereich 72 der
Deutschen Forschungsgemeinschaft an der Universität Bonn “which took place in January
1979 at Bad Honnef 2005 Oct 6 (pp. 85-117). Berlin, Heidelberg: Springer Berlin Heidelberg.
[5] Ding CY, Zhou YJ, Cai W, Zeng X, and Yan CH. A Path Integral Monte Carlo (PIMC) Method based on Feynman-Kac Formula for Electrical Impedance Tomography. Journal of Computational Physics, 476 (2023) 111862.
[6] E W, Han J, Jentzen A. Algorithms for solving high dimensional PDEs: from nonlinear Monte
Carlo to machine learning. Nonlinearity. 2021 Dec 9;35(1):278.
[7] Han J, Jentzen A, E W. Solving high-dimensional partial differential equations using deep
learning. Proceedings of the National Academy of Sciences. 2018 Aug 21;115(34):8505-10.
[8] Han J, E W. Deep learning approximation for stochastic control problems, Deep Reinforcement
Learning Workshop, NIPS. arXiv preprint arXiv:1611.07422. 2016.
[9] Han J, Lu J, Zhou M. Solving high-dimensional eigenvalue problems using deep neural networks:
A diffusion Monte Carlo like approach. Journal of Computational Physics. 2020 Dec
15;423:109792.
[10] Hsu, P. Reflecting Brownian Motion, Boundary Local Time and the Neumann Problem, Disser-
tation Abstracts International Part B: Science and Engineering [DISS. ABST. INT. PT. B-
SCI. ENG.], 45(6), 1984.
[11] Hsu P. Probabilistic approach to the Neumann problem. Communications on pure and applied
mathematics. 1985 Jul;38(4):445-72.
[12] Hu Z, Shukla K, Karniadakis GE, Kawaguchi K. Tackling the curse of dimensionality with
physics-informed neural networks. arXiv preprint arXiv:2307.12306. 2023 Jul 23.
[13] Hutzenthaler M, Jentzen A, Kruse, T, Nguyen TA and von Wurstemberger, P. Overcoming
the curse of dimensionality in the numerical approximation of semilinear parabolic partial
differential equations. Proceedings of the Royal Society A, 476(2244):20190630, 2020.
[14] Ji S, Peng S, Peng Y, Zhang X. Three algorithms for solving high-dimensional fully coupled FBSDEs through deep learning. IEEE Intell. Syst. 35(3) (2020) 71-84.
[15] Karatzas I, Shreve S. Brownian motion and stochastic calculus. Springer Science & Business
Media; 2012 Dec 6.
[16] Kingma DP, Ba J. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
2014 Dec 22.
[17] Klebaner FC. Introduction to stochastic calculus with applications. World Scientific Publishing
Company; 2012 Mar 21.
[18] Papanicolaou VG. The probabilistic solution of the third boundary value problem for second
order elliptic equations, Probab. Theory Relat. Fields 87 (1990) 27-77.
[19] Pardoux E, Peng S. Backward stochastic differential equations and quasilinear parabolic partial
differential equations. In Stochastic partial differential equations and their applications 1992
(pp. 200-217). Springer, Berlin, Heidelberg
[20] Pfau D, Spencer JS, Matthews AG, Foulkes WM. Ab initio solution of the many-electron
Schrödinger equation with deep neural networks. Physical Review Research. 2020 Sep
16;2(3):033429.
[21] Raissi M. Forward-backward stochastic neural networks: Deep learning of high-dimensional
partial differential equations. arXiv preprint arXiv:1804.07010. 2018 Apr 19.
[22] Schuss Z. Brownian dynamics at boundaries and interfaces. Springer-Verlag New York; 2015.
[23] Stroock DW, Varadhan SRS. Diffusion processes with boundary conditions. Commun. Pure Appl. Math. 24, 147-225 (1971).
[24] Zhang W, Cai W. FBSDE based neural network algorithms for high-dimensional quasilinear
parabolic PDEs. Journal of Computational Physics. 2022 Dec 1;470:111557.
