ERROR ANALYSIS OF DEEP PDE SOLVERS FOR OPTION PRICING
JASPER ROU
ABSTRACT. Option pricing often requires solving partial differential equations (PDEs). Although deep learning-based PDE solvers have recently emerged as quick solutions to this problem, their empirical and quantitative accuracy remains not well understood, hindering their real-world applicability. In this research, our aim is to offer actionable insights into the utility of deep PDE solvers for practical option pricing implementation. Through comparative experiments in both the Black–Scholes and the Heston model, we assess the empirical performance of two neural network algorithms to solve PDEs: the Deep Galerkin Method and the Time Deep Gradient Flow method
(TDGF). We determine their empirical convergence rates and training time as functions of (i) the number of sam-
pling stages, (ii) the number of samples, (iii) the number of layers, and (iv) the number of nodes per layer. For the
TDGF, we also consider the order of the discretization scheme and the number of time steps.
1. INTRODUCTION
Option pricing is a fundamental problem in finance. Since the seminal work of Black and Scholes [5], nu-
merous mathematical models and computational approaches have been developed to determine option prices.
One common approach formulates the price of an option as the solution to a partial differential equation (PDE),
which can be solved numerically using methods such as finite differences or finite elements. However, tradi-
tional grid-based methods suffer from the curse of dimensionality: the number of grid points grows exponen-
tially with the dimension of the problem. This challenge is particularly acute in high-dimensional settings, such
as basket options or Markovian approximations of rough volatility models [1, 18].
Neural networks provide a promising alternative by efficiently approximating solutions to PDEs. Once
trained, they can generate option prices rapidly, bypassing the limitations of conventional numerical meth-
ods. Deep learning-based approaches have been successfully applied in other financial contexts, such as risk
management [6], portfolio optimization [21], and optimal stopping [4], as well as data-driven methods to price
options [17]. Given their potential, various deep learning-based PDE solvers have been proposed [11], including
Backward Stochastic Differential Equation (BSDE) methods [12], Deep Galerkin Methods (DGMs) [19] and
Time Deep Gradient Flow (TDGF) methods [9, 18]. However, their empirical accuracy remains not sufficiently
well understood, which limits their practical adoption in financial applications.
This study focuses on two neural network methods for solving PDEs: the DGM and the TDGF. Our primary
objective is to assess their empirical accuracy. For theoretical convergence analyses, see the work of Jiang,
Sirignano, and Cohen [14] for DGM and Liu, Papapantoleon, and Rou [16] for TDGF. For empirical studies of
other deep PDE solvers, such as BSDE-based methods, see the work of Assabumrungrat, Minami, and Hirano
[3].
To provide actionable insights into the applicability of deep PDE solvers for option pricing, we systematically
analyze the impact of key parameters on accuracy and training time. First, we investigate the effect of training
by varying the number of sampling stages and the number of samples. Second, we investigate the effect of the
size of the neural network by varying the number of layers and the number of nodes per layer. For a comparison
of different architectures, see the work of Van Mieghem, Papapantoleon, and Papazoglou-Hennig [20]. Finally,
for TDGF, we also examine the discretization order and the number of time steps.
Our main findings are: the L2 -error decreases almost linearly with the number of sampling stages; the num-
ber of layers tends to decrease the error, but not with a clear rate; and increasing the number of time steps
decreases the L2 -error with the second-order method decreasing quicker than the first-order method. These
three parameters increase the training time linearly. The number of samples and the number of nodes per layer did not show a clear relationship with either the L2-error or the training time. The first three parameters concern computations that are performed sequentially, while the other two concern computations that can be performed in parallel.
2020 Mathematics Subject Classification. 91G20, 91G60, 68T07.
Key words and phrases. Option pricing, PDE, artificial neural network, deep PDE solvers.
The remainder of this paper is structured as follows. Section 2 introduces the two neural network-based PDE
solvers. Section 3 outlines the option pricing models under consideration: Black–Scholes and Heston. Section 4
details the implementation aspects. Section 5 presents the numerical results for each of the five parameters.
Finally, Section 6 summarizes our findings and conclusions.
2. NEURAL NETWORK METHODS
This section explains the two neural network methods used in this paper. Subsection 2.1 elaborates on the
TDGF and Subsection 2.2 on the DGM.
2.1. Time Deep Gradient Flow Method. The TDGF is a neural network method to efficiently solve high-
dimensional PDEs [9, 10, 18]. Consider the general PDE
∂u/∂t (t, x) + Au(t, x) + ru(t, x) = 0,   (t, x) ∈ [0, T] × Ω,
u(0, x) = Ψ(x),   x ∈ Ω,
with A a second-order differential operator of the form
Au = −∑_{i,j=1}^{d} a_{ij} ∂²u/(∂x_i ∂x_j) + ∑_{i=1}^{d} β_i ∂u/∂x_i.   (2.1)
Using the splitting method from Papapantoleon and Rou [18], A can be rewritten in the form
Au = −∇ · (A∇u) + b · ∇u, (2.2)
with a symmetric and positive semi-definite matrix A = (a_{ij})_{i,j=1}^{d} and vector b = (b_1, . . . , b_d)^T ∈ R^d.   (2.3)
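For the reader's convenience (this identity is used implicitly when passing from (2.1) to (2.2)), expanding the divergence and using the symmetry of A gives

−∇·(A∇u) = −∑_{i,j=1}^{d} a_{ij} ∂²u/(∂x_i ∂x_j) − ∑_{i=1}^{d} ( ∑_{j=1}^{d} ∂a_{ij}/∂x_j ) ∂u/∂x_i,   so that   b_i = β_i + ∑_{j=1}^{d} ∂a_{ij}/∂x_j.

This is what produces, for example, the first-order coefficient b = (σ² − r)S in the Black–Scholes model of Section 3.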
Let us divide the time interval (0, T] into K equally spaced intervals (t_{k−1}, t_k], with h = t_k − t_{k−1} = T/K for k = 1, . . . , K. Let U^k denote the approximation to the solution u(t_k, x) of the PDE at time step t_k, using either a first- or second-order discretization scheme [2]

(U^k − U^{k−1}) / h − ∇·(A∇U^k) + F(U^{k−1}) + rU^k = 0,

( (3/2) U^k − 2 U^{k−1} + (1/2) U^{k−2} ) / h − ∇·(A∇U^k) + 2F(U^{k−1}) − F(U^{k−2}) + rU^k = 0,

with F(u) = b · ∇u, U^0 = Ψ, and, in the second-order scheme, U^1 taken from the first-order scheme. Then
we can rewrite the discretized PDE as an energy functional [10, 18]
U^k = arg min_{u ∈ H_0^1} I_n^k(u),

with H_0^1 the Sobolev space in which the derivatives up to order 1 have finite L2-norm, and the energy functionals

I_1^k(u) = (1/2) ‖u − U^{k−1}‖²_{L²(Ω)} + h ∫_Ω [ (1/2)(∇u)^T A∇u + (r/2)u² + F(U^{k−1})u ] dx,

I_2^k(u) = (1/2) ‖u − (4/3)U^{k−1} + (1/3)U^{k−2}‖²_{L²(Ω)} + (2h/3) ∫_Ω [ (1/2)(∇u)^T A∇u + (r/2)u² + (2F(U^{k−1}) − F(U^{k−2}))u ] dx,   (2.4)

for the first- and second-order discretization, respectively.
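As a sanity check (a standard minimizing-movement computation, made explicit here for convenience), setting the first variation of I_1^k to zero at the minimizer u = U^k recovers the first-order scheme in weak form: for every test function φ ∈ H_0^1,

d/dε I_1^k(u + εφ) |_{ε=0} = ∫_Ω [ (u − U^{k−1})φ + h( (∇φ)^T A∇u + ruφ + F(U^{k−1})φ ) ] dx = 0,

which, after integration by parts, is the weak formulation of (U^k − U^{k−1})/h − ∇·(A∇U^k) + F(U^{k−1}) + rU^k = 0.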
Let f^k(x; θ) denote a neural network approximation of U^k with trainable parameters θ. Applying a Monte Carlo approximation to the integrals, the discretized cost functional takes the form

L_n^k(θ; x) = (|Ω| / 2M) ∑_{m=1}^{M} ( f^k(x_m; θ) + ∑_{j=1}^{n} α_n^j f^{k−j}(x_m) )² + β_n h N_n^k(θ; x),   (2.5)

with

N_n^k(θ; x) = (|Ω| / M) ∑_{m=1}^{M} [ (1/2) ∇f^k(x_m; θ)^T A ∇f^k(x_m; θ) + (r/2) f^k(x_m; θ)² + ( b · ∑_{j=1}^{n} γ_n^j ∇f^{k−j}(x_m) ) f^k(x_m; θ) ].

Here, M denotes the number of samples x_m, n ∈ {1, 2} the order of the discretization, and α_n^j, β_n and γ_n^j are the corresponding coefficients.
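Comparing (2.5) with (2.4) gives, for the first-order scheme, α_1^1 = −1, β_1 = 1 and γ_1^1 = 1 (and α_2^1 = −4/3, α_2^2 = 1/3, β_2 = 2/3, γ_2^1 = 2, γ_2^2 = −1 for the second-order scheme). For illustration, a minimal PyTorch sketch of how L_1^k can be evaluated with automatic differentiation follows; the function names, signatures and the A_fn/b_fn helpers are placeholders, not the implementation used for the experiments.

import torch

def tdgf_cost_first_order(f_k, f_prev, x, A_fn, b_fn, r, h, vol_omega):
    # Monte Carlo evaluation of L_1^k in (2.5) with alpha_1^1 = -1, beta_1 = gamma_1^1 = 1.
    # f_k: current network, (M, d) -> (M,); f_prev: previous-step network (or payoff),
    # differentiable in x; A_fn(x): (M, d, d) diffusion matrix; b_fn(x): (M, d) drift vector.
    x = x.clone().requires_grad_(True)
    u = f_k(x)
    grad_u = torch.autograd.grad(u.sum(), x, create_graph=True)[0]        # grad_x f^k
    u_prev = f_prev(x)
    grad_prev = torch.autograd.grad(u_prev.sum(), x)[0].detach()          # grad_x f^{k-1}
    u_prev = u_prev.detach()
    A, b = A_fn(x), b_fn(x)
    quad = 0.5 * torch.einsum("mi,mij,mj->m", grad_u, A, grad_u)          # 1/2 (grad f^k)^T A grad f^k
    react = 0.5 * r * u ** 2                                              # r/2 (f^k)^2
    conv = (b * grad_prev).sum(dim=1) * u                                 # (b . grad f^{k-1}) f^k
    return vol_omega * (0.5 * ((u - u_prev) ** 2).mean() + h * (quad + react + conv).mean())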
In order to minimize this cost function, we use a stochastic gradient descent-type algorithm, i.e. an iterative
scheme of the form:
θ_{n+1} = θ_n − α ∇_θ L^k(θ_n; x).   (2.6)
The hyperparameter α is the step size of our update, called the learning rate. An overview of the TDGF method
appears in Algorithm 1.
Algorithm 1 Time Deep Gradient Flow method
1: Initialize θ_0^0.
2: Set f^0(x; θ) = Ψ(x).
3: for each time step k = 1, . . . , K do
4:   Initialize θ_0^k = θ^{k−1}.
5:   for each sampling stage n = 1, . . . , N do
6:     Generate M random points x_m for training.
7:     Calculate the cost functional L^k(θ_n^k; x) for the selected points.
8:     Take a descent step θ_{n+1}^k = θ_n^k − α ∇_θ L^k(θ_n^k; x).
9:   end for
10: end for
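A compact sketch of Algorithm 1 in PyTorch, reusing the tdgf_cost_first_order sketch above and assuming helper functions make_network() and sample_points(M) (both illustrative); the warm start θ_0^k = θ^{k−1} is implemented by copying the weights of the previous network.

import torch

def train_tdgf(make_network, sample_points, payoff, A_fn, b_fn,
               r=0.05, T=1.0, K=100, N=500, M=600, lr=3e-4, vol_omega=1.0):
    # Sketch of Algorithm 1; make_network, sample_points and payoff are assumed helpers.
    h = T / K
    f_prev = payoff                                    # f^0(x; theta) = Psi(x)
    networks = []
    for k in range(1, K + 1):                          # time stepping
        f_k = make_network()
        if networks:                                   # warm start: theta_0^k = theta^{k-1}
            f_k.load_state_dict(networks[-1].state_dict())
        opt = torch.optim.Adam(f_k.parameters(), lr=lr, betas=(0.9, 0.999), weight_decay=0.0)
        for n in range(N):                             # sampling stages
            x = sample_points(M)                       # M random points in Omega
            loss = tdgf_cost_first_order(f_k, f_prev, x, A_fn, b_fn, r, h, vol_omega)
            opt.zero_grad()
            loss.backward()
            opt.step()                                 # descent step (2.6)
        networks.append(f_k)
        f_prev = f_k                                   # previous-step network, no longer trained
    return networks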
2.2. Deep Galerkin Method. We compare the TDGF method with a popular deep learning method for solving
PDEs: the DGM of Sirignano and Spiliopoulos [19]. In the DGM approach, we minimize the square L2 -error
of the PDE:
‖ ∂u/∂t − ∇·(A∇u) + b·∇u + ru ‖²_{L²([0,T]×Ω)} + ‖ u(0, x) − Ψ(x) ‖²_{L²(Ω)}.

Then the cost functional for the neural network approximation f(t, x; θ) of u takes the form

L(θ; t, x) = (T|Ω| / M_1) ∑_{m=1}^{M_1} [ ∂f/∂t (t_m, x_m; θ) − ∇·(A∇f(t_m, x_m; θ)) + b·∇f(t_m, x_m; θ) + rf(t_m, x_m; θ) ]² + (|Ω| / M_2) ∑_{m=1}^{M_2} [ f(0, x_m; θ) − Ψ(x_m) ]².
The solution of the PDE is approximated by a neural network using stochastic gradient descent as in equation
(2.6). Contrary to the TDGF, there is no time stepping. Instead of training a neural network for each time step,
there is one neural network with time as input parameter. An overview of the DGM appears in Algorithm 2.
Algorithm 2 Deep Galerkin Method
1: Initialize θ0 .
2: for each sampling stage n = 1, ..., N do
3: Generate M random points (tm , xm ) for training.
4: Calculate the cost functional L(θn ; t, x) for the selected points.
5: Take a descent step θn+1 = θn − α∇θ L(θn ; t, x).
6: end for
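For comparison, a minimal PyTorch sketch of the DGM cost and training loop (Algorithm 2) for the one-dimensional Black–Scholes case; the residual is formed with automatic differentiation, the network f is assumed to take (t, S) as inputs, and the initial-condition samples are, for simplicity, reused from the interior samples. All names are illustrative.

import torch

def dgm_loss_black_scholes(f, t, S, strike, r, sigma, T=1.0, vol_omega=1.0):
    # Monte Carlo DGM cost: squared PDE residual plus squared initial-condition error.
    t = t.clone().requires_grad_(True)
    S = S.clone().requires_grad_(True)
    u = f(t, S)
    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]
    u_S = torch.autograd.grad(u.sum(), S, create_graph=True)[0]
    u_SS = torch.autograd.grad(u_S.sum(), S, create_graph=True)[0]
    residual = u_t - 0.5 * sigma**2 * S**2 * u_SS - r * S * u_S + r * u
    init_err = f(torch.zeros_like(S), S) - torch.clamp(S - strike, min=0.0)   # Psi(S) = (S - K)^+
    return T * vol_omega * (residual**2).mean() + vol_omega * (init_err**2).mean()

def train_dgm(f, sample_t, sample_S, strike, r=0.05, sigma=0.25,
              stages=100_000, M=600, lr=3e-4):
    # Sketch of Algorithm 2; sample_t and sample_S are assumed samplers over [0, T] and Omega.
    opt = torch.optim.Adam(f.parameters(), lr=lr, betas=(0.9, 0.999), weight_decay=0.0)
    for n in range(stages):                     # sampling stages
        loss = dgm_loss_black_scholes(f, sample_t(M), sample_S(M), strike, r, sigma)
        opt.zero_grad()
        loss.backward()
        opt.step()                              # descent step
    return f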
3. OPTION PRICING MODELS
This section explains the two option pricing models in which we solve the pricing PDE. Subsection 3.1
elaborates on the Black–Scholes model and Subsection 3.2 on the Heston model.
3.1. Black–Scholes. In the Black and Scholes [5] model, the dynamics of the stock price S is a geometric
Brownian motion:
dSt = rSt dt + σSt dWt , S0 > 0,
with r, σ ∈ R+ the risk free rate and the volatility respectively.
Consider a European call option on S with payoff Ψ(ST ) = (ST − K)+ at maturity time T > 0. Using the
fundamental theorem of asset pricing and the Feynman–Kac formula, the price of this derivative can be written
as the solution to a PDE in this model. Indeed, let u : [0, T ] × Ω → R denote the price of this derivative, with
Ω ⊆ R and t the time to maturity. Then, u solves the Black–Scholes PDE:
∂u/∂t − (1/2) σ² S² ∂²u/∂S² − rS ∂u/∂S + ru = 0,   (t, S) ∈ [0, T] × Ω,
u(0, S) = Ψ(S),   S ∈ Ω.
This PDE has an exact solution:
u(t, S) = S Φ(d_1) − K e^{−rt} Φ(d_2),
with Φ the standard normal cumulative distribution function,
d_1 = ( log(S/K) + (r + σ²/2) t ) / (σ √t)   and   d_2 = d_1 − σ √t.
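Since this exact solution serves as the reference price in the experiments, a short Python sketch of it follows (using SciPy's normal CDF; the at-the-money strike in the example is illustrative, as the strike is not fixed here).

import numpy as np
from scipy.stats import norm

def black_scholes_call(S, strike, r, sigma, t):
    # Exact Black-Scholes price of a European call; t is the time to maturity.
    S, t = np.asarray(S, dtype=float), np.asarray(t, dtype=float)
    d1 = (np.log(S / strike) + (r + 0.5 * sigma**2) * t) / (sigma * np.sqrt(t))
    d2 = d1 - sigma * np.sqrt(t)
    return S * norm.cdf(d1) - strike * np.exp(-r * t) * norm.cdf(d2)

# With the Section 4 parameters r = 0.05, sigma = 0.25, T = 1 and an illustrative
# at-the-money strike: black_scholes_call(1.0, 1.0, 0.05, 0.25, 1.0) ~ 0.123.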
The operator A takes the form (2.2) with the coefficients in (2.3) provided by
a = (1/2) σ² S²,
b = (σ² − r) S.
3.2. Heston. The Heston [13] model is a popular stochastic volatility model with dynamics
dS_t = r S_t dt + √(V_t) S_t dW_t,   S_0 > 0,
dV_t = λ(κ − V_t) dt + η √(V_t) dB_t,   V_0 > 0.
Here V is the variance process, W, B are correlated (standard) Brownian motions, with correlation coefficient
ρ, and λ, κ, η ∈ R+ . The generator corresponding to these dynamics, in the form (2.1), equals
Au = −rS ∂u/∂S − λ(κ − V) ∂u/∂V − (1/2) S² V ∂²u/∂S² − (1/2) η² V ∂²u/∂V² − ρη S V ∂²u/(∂S∂V).
This PDE does not have an exact solution. The characteristic function of the Heston model does have an
analytical representation [13], from which a reference price can be determined using the COS method [8].
The operator A takes the form (2.2) with the coefficients in (2.3) provided by
a_{11} = (1/2) S² V,
a_{12} = a_{21} = (1/2) ρη S V,
a_{22} = (1/2) η² V,
b_1 = ( −r + V + (1/2) ρη ) S,
b_2 = λ(V − κ) + (1/2) η² + (1/2) ρη V.
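For illustration, the split coefficients above translate directly into code. A small PyTorch sketch that returns A and b at a batch of points with columns (S, V); the parameter values of Section 4 are used as assumed defaults.

import torch

def heston_A(x, eta=0.1, rho=0.0):
    # Diffusion matrix A(S, V) of the split operator (2.2); x has columns (S, V).
    S, V = x[:, 0], x[:, 1]
    A = torch.zeros(x.shape[0], 2, 2)
    A[:, 0, 0] = 0.5 * S ** 2 * V
    A[:, 0, 1] = 0.5 * rho * eta * S * V
    A[:, 1, 0] = A[:, 0, 1]
    A[:, 1, 1] = 0.5 * eta ** 2 * V
    return A

def heston_b(x, r=0.05, eta=0.1, rho=0.0, lam=2.0, kappa=0.01):
    # Drift vector b(S, V) of the split operator (2.2), matching the formulas above.
    S, V = x[:, 0], x[:, 1]
    b = torch.zeros(x.shape[0], 2)
    b[:, 0] = (-r + V + 0.5 * rho * eta) * S
    b[:, 1] = lam * (V - kappa) + 0.5 * eta ** 2 + 0.5 * rho * eta * V
    return b

# e.g. A_fn = lambda x: heston_A(x) and b_fn = lambda x: heston_b(x)
# can be plugged into the TDGF cost sketch of Section 2.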
4. IMPLEMENTATION DETAILS
We use the network architecture from Papapantoleon and Rou [18], including the use of information about the option price in order to facilitate the training of the neural network:
X^1 = σ_1(W^1 x + b^1),
Z^l = σ_1(U^{z,l} x + W^{z,l} X^l + b^{z,l}),   l = 1, . . . , L,
G^l = σ_1(U^{g,l} x + W^{g,l} X^l + b^{g,l}),   l = 1, . . . , L,
R^l = σ_1(U^{r,l} x + W^{r,l} X^l + b^{r,l}),   l = 1, . . . , L,
H^l = σ_1(U^{h,l} x + W^{h,l} (X^l ⊙ R^l) + b^{h,l}),   l = 1, . . . , L,
X^{l+1} = (1 − G^l) ⊙ H^l + Z^l ⊙ X^l,   l = 1, . . . , L,
f(x; θ) = (S − K e^{−rt})^+ + σ_2(W X^{L+1} + b),
with activation functions the hyperbolic tangent function, σ_1(x) = tanh(x), and the softplus function, σ_2(x) = log(e^x + 1), which guarantees that the option price remains above the no-arbitrage bound. The parameters of the network have dimensions W^1, U^{m,l} ∈ R^{D×d}; b^1, b^{m,l} ∈ R^D; W^{m,l} ∈ R^{D×D}; W ∈ R^{1×D} and b ∈ R for m = z, g, r, h and l = 1, . . . , L, with x ∈ R^d.
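A sketch of this gated architecture in PyTorch for the TDGF case, where the time to maturity t of the current time step is fixed and the payoff term (S − Ke^{−rt})^+ is added to a softplus output; the default sizes, the assumption that the first input column is the stock price, and all names are illustrative. For the DGM, time would instead be appended to the input x.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedPriceNet(nn.Module):
    # Highway-gated network with payoff-informed output, as described above.
    def __init__(self, d, D=50, L=3, strike=1.0, r=0.05, t=1.0):
        super().__init__()
        self.first = nn.Linear(d, D)                          # X^1 = sigma_1(W^1 x + b^1)
        gates = ("z", "g", "r", "h")
        self.U = nn.ModuleDict({m: nn.ModuleList(nn.Linear(d, D) for _ in range(L)) for m in gates})
        self.W = nn.ModuleDict({m: nn.ModuleList(nn.Linear(D, D, bias=False) for _ in range(L)) for m in gates})
        self.out = nn.Linear(D, 1)                            # W X^{L+1} + b
        self.L, self.discounted_strike = L, strike * math.exp(-r * t)

    def forward(self, x):                                     # x: (M, d), first column assumed to be S
        S = x[:, 0:1]
        X = torch.tanh(self.first(x))
        for l in range(self.L):
            Z = torch.tanh(self.U["z"][l](x) + self.W["z"][l](X))
            G = torch.tanh(self.U["g"][l](x) + self.W["g"][l](X))
            R = torch.tanh(self.U["r"][l](x) + self.W["r"][l](X))
            H = torch.tanh(self.U["h"][l](x) + self.W["h"][l](X * R))
            X = (1 - G) * H + Z * X                           # X^{l+1}
        bound = torch.clamp(S - self.discounted_strike, min=0.0)   # (S - K e^{-rt})^+
        return (bound + F.softplus(self.out(X))).squeeze(-1)       # f(x; theta)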
We consider the effect of five parameters on the error: the number of sampling stages, N in Algorithm 1;
the number of samples, M in equation (2.5); the number of layers L; the number of nodes per layer D; and
for the TDGF also the number of time steps, K in Algorithm 1. In the last case we consider both the first- and
second-order discretization scheme in equation (2.4). As the default parameter set we take 600 samples per
dimension in each sampling stage. To obtain the total number of samples from this number, multiply by 2 for
DGM in the Black–Scholes (time and stock price) and TDGF in the Heston model (stock price and volatility)
and multiply by 3 for the DGM in the Heston model (time, stock price and volatility). The default network size
is 3 layers and 50 nodes per layer. For the TDGF we take 100 time steps and 500 sampling stages in each time
step and for the DGM we take 100,000 sampling stages. After this many sampling stages the error does not
decrease further.
For both DGM and TDGF we use the Adam optimizer [15] with a learning rate α = 3 × 10−4 , (β1 , β2 ) =
(0.9, 0.999) and zero weight decay. The training is performed on the DelftBlue supercomputer [7], using one
seventh instance of an NVIDIA Tesla A100 GPU. We run each problem for five different random seeds and compare
the average error of the five runs.
As the modeling problem we take the price of a European call option with interest rate r = 0.05 and maturity
T = 1.0 year. We consider the Black–Scholes model with volatility σ = 0.25 and the Heston model with
η = 0.1, ρ = 0.0, κ = 0.01 and λ = 2.0. For the domain Ω we consider S ∈ [0.01, 3.0] and V ∈ [0.001, 0.1].
The solution of the Black–Scholes PDE with these parameters together with the solution produced by the TDGF
with the default training parameters is in Fig. 1.
FIGURE 1. Exact price of a European call option with r = 0.05, σ = 0.25 and T = 1.0 compared to the price computed by the TDGF.
5. RESULTS
In the next subsections we vary one of the parameters while keeping the others constant at the default value.
We compute the L2 -error on an equidistant grid of 47 points in each dimension on the domain.
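As an illustration, a small sketch of how such a grid error can be computed in the one-dimensional Black–Scholes case, reusing the black_scholes_call sketch from Subsection 3.1; the grid bounds follow Section 4, while the root-mean-square convention for the discrete L2-error is an assumption.

import numpy as np
import torch

def grid_l2_error(model, n=47, S_lo=0.01, S_hi=3.0, strike=1.0, r=0.05, sigma=0.25, t=1.0):
    # Discrete L2-error of a trained network against the exact Black-Scholes price
    # on an equidistant grid of n points.
    S = np.linspace(S_lo, S_hi, n)
    exact = black_scholes_call(S, strike, r, sigma, t)
    with torch.no_grad():
        approx = model(torch.tensor(S, dtype=torch.float32).unsqueeze(1)).numpy()
    return float(np.sqrt(np.mean((approx - exact) ** 2)))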
5.1. Sampling stages. First, we consider the number of sampling stages. For the TDGF we vary the number of
sampling stages per time step from 16 to 500. For the DGM we vary the number of sampling stages from 2048
to 100,000. After this many sampling stages, the error does not decrease anymore in the Black–Scholes model.
In the Heston model, the error seems to stop decreasing sooner for both methods. The fitted convergence rates
for both methods and both models are in Table 1. All convergence rates are slightly larger than -1. The plots
of the L2 -error on linear and log scale are in Figures 2-5 together with the training time. The training time
increases linearly with the number of sampling stages.
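A convergence rate of this kind is typically obtained as the slope of a least-squares fit of log-error against log-parameter; a minimal sketch follows (the exact fitting procedure used for Tables 1-5 is not specified, and the example numbers are purely illustrative).

import numpy as np

def fitted_convergence_rate(param_values, errors):
    # Slope of log(error) against log(parameter), e.g. the number of sampling stages.
    slope, _ = np.polyfit(np.log(param_values), np.log(errors), deg=1)
    return slope

# Illustrative numbers only: a slope near -1 means the error roughly halves
# whenever the parameter doubles.
# fitted_convergence_rate([16, 64, 250, 500], [2.0e-2, 6.0e-3, 1.7e-3, 9.0e-4])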
Method Black–Scholes Heston
TDGF -0.91 -0.63
DGM -0.73 -0.75
TABLE 1. Convergence rates for the number of sampling stages.
Method Black–Scholes Heston
TDGF -0.27 -0.11
DGM -0.12 0.2
TABLE 2. Convergence rates for the number of samples.
Method Black–Scholes Heston
TDGF -0.63 -0.47
DGM -0.33 -0.70
TABLE 3. Convergence rates for the number of layers.
Method Black–Scholes Heston
TDGF -1.11 -0.39
DGM 0.15 0.07
TABLE 4. Convergence rates for the number of nodes per layer.
5.2. Samples. Second, we consider the number of samples per dimension in each sampling stage. We vary
the number of samples per dimension from 16 to 600. The fitted convergence rates for both methods and both
models are in Table 2. In general, it is hard to draw conclusions. For the TDGF the rates are slightly negative,
but far from -0.5, which would be the expected rate of convergence for Monte Carlo sampling. For the DGM
the rates are larger and the error does not decrease as uniformly with the number of samples as for the TDGF.
The plots of the L2 -error on linear and log scale are in Figures 6-9 together with the training time. The number
of samples does not have a big impact on the training time.
5.3. Layers. Third, we consider the number of layers of the neural network. We vary the number of layers
from 1 to 4. The fitted convergence rates for both methods and both models are in Table 3. The rates vary but
are all negative, so, in general, more layers improve the result. The plots of the L2-error on linear and log scale
are in Figures 10-13 together with the training time. The training time increases linearly with the number of
layers.
5.4. Nodes per layer. Fourth, we consider the number of nodes per layer of the neural network. We vary
the number of nodes per layer from 10 to 50. The fitted convergence rates for both methods and both models are in
Table 4. The rates vary across different methods and models and are even positive for the DGM. The plots of
the L2 -error on linear and log scale are in Figures 14-17 together with the training time. The number of nodes
per layer does not have a big impact on the training time.
5.5. Time steps. Fifth and finally, we consider the number of time steps. We vary the number of time steps from
2 to 25. The fitted convergence rates for both models and for both first and second order time-stepping are in
Table 5. After 25 time steps, the second-order scheme does not improve any further, but the first-order scheme
does. The rates for the Black–Scholes model are lower than for the Heston model. In both cases O(2) outperforms O(1). The
plots of the L2 -error on linear and log scale are in Figures 18-19 together with the training time. The training
time grows linearly with the number of time steps, with the second-order method growing faster than the first-order method.
Method Black–Scholes Heston
O(1) -0.29 -0.15
O(2) -0.56 -0.25
TABLE 5. Convergence rates for the number of time steps.
6. CONCLUSION
This research analyzed the error of two neural network methods to solve option pricing PDEs: TDGF and
DGM. We determined the empirical convergence rates of the L2-error with respect to five parameters in both the Black–
Scholes and the Heston model. We also considered the effect of these parameters on the training time. Based
on the experiments we can give some recommendations that can assist anyone who wants to use the methods in
a practical setting.
• For both the TDGF and the DGM, the L2 -error decreases almost linearly with the number of sampling
stages, up to some point where it stops converging. Since the training time grows linearly with the
number of sampling stages, it would be optimal to stop at this point. Unfortunately, there is no method
to locate this point beforehand and we recommend choosing the number of sampling stages based on
whether speed or accuracy is more important in the practical setting.
• For the TDGF the L2 -error decreases slightly with the number of samples. Since the number of samples
does not influence the training time, we recommend using a large number of samples like six hundred
per dimension or even more.
• For the DGM the L2 -error does not decrease with the number of samples. Therefore, it is hard to give
any recommendation.
• For both TDGF and DGM, the number of layers tends to decrease the error, but not with a clear rate.
One layer is clearly not enough, but four layers does not improve the results compared to two or three
layers. Since the number of layers has a big influence on the training time, we recommend using two
or three layers.
• For the TDGF, the L2 -error decreases with the number of nodes per layer. Since the number of nodes
per layer does not influence the training time, we recommend choosing a large number of nodes per
layer, like forty or fifty.
• For the DGM, the L2 -error does not decrease with the number of nodes per layer. We recommend
choosing a smaller number of nodes per layer like thirty.
• For the TDGF, increasing the number of time steps decreases the L2-error. Using a second-order time-stepping
method, the error decreases quicker than using a first-order method. We recommend using the second-
order time stepping method. Since the training time increases linearly with the number of time steps,
we again recommend choosing the number of time steps based on whether speed or accuracy is more
important in the practical setting.
REFERENCES
[1] E. Abi Jaber and O. El Euch. Multifactor approximation of rough volatility models. SIAM Journal on
Financial Mathematics, 10(2):309–349, 2019.
[2] G. Akrivis and Y.-S. Smyrlis. Implicit–explicit BDF methods for the Kuramoto–Sivashinsky equation.
Applied numerical mathematics, 51(2-3):151–169, 2004.
[3] R. Assabumrungrat, K. Minami, and M. Hirano. Error analysis of option pricing via deep PDE solvers:
Empirical study. In 2024 16th IIAI International Congress on Advanced Applied Informatics (IIAI-AAI),
pages 329–336. IEEE, 2024.
[4] S. Becker, P. Cheridito, and A. Jentzen. Deep optimal stopping. Journal of Machine Learning Research,
20(74):1–25, 2019.
[5] F. Black and M. Scholes. The pricing of options and corporate liabilities. Journal of Political Economy,
81(3):637–654, 1973.
[6] H. Buehler, L. Gonon, J. Teichmann, and B. Wood. Deep hedging. Quantitative Finance, 19(8):1271–
1291, 2019.
FIGURE 2. L2-error of the TDGF for a call option in the Black–Scholes model against number of sampling stages varying from 16 to 500. (A) Linear scale; (B) logarithmic scale.
[7] Delft High Performance Computing Centre (DHPC). DelftBlue Supercomputer (Phase 1). https://www.tudelft.nl/dhpc/ark:/44463/DelftBluePhase1, 2022.
[8] F. Fang and C. W. Oosterlee. A novel pricing method for European options based on Fourier-cosine series
expansions. SIAM Journal on Scientific Computing, 31:826–848, 2009.
[9] E. H. Georgoulis, M. Loulakis, and A. Tsiourvas. Discrete gradient flow approximations of high dimen-
sional evolution partial differential equations via deep neural networks. Communications in Nonlinear
Science and Numerical Simulation, 117:106893, 2023.
[10] E. H. Georgoulis, A. Papapantoleon, and C. Smaragdakis. A deep implicit-explicit minimizing movement
method for option pricing in jump-diffusion models. arXiv preprint arXiv:2401.06740, 2024.
[11] L. Gonon, A. Jentzen, B. Kuckuck, S. Liang, A. Riekert, and P. von Wurstemberger. An overview on
machine learning methods for partial differential equations: from physics informed neural networks to
deep operator learning. arXiv preprint arXiv:2408.13222, 2024.
[12] J. Han, A. Jentzen, and W. E. Solving high-dimensional partial differential equations using deep learning.
Proceedings of the National Academy of Sciences, 115(34):8505–8510, 2018.
[13] S. L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and
currency options. The Review of Financial Studies, 6(2):327–343, 1993.
[14] D. Jiang, J. Sirignano, and S. Cohen. Global convergence of deep Galerkin and PINNs methods for solving
partial differential equations. arXiv preprint arXiv:2305.06000, 2023.
[15] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980,
2014.
[16] C. Liu, A. Papapantoleon, and J. Rou. Convergence of time-stepping deep gradient flow methods. To
appear, 2025.
[17] S. Liu, C. W. Oosterlee, and S. M. Bohte. Pricing options and computing implied volatilities using neural
networks. Risks, 7(1):16, 2019.
[18] A. Papapantoleon and J. Rou. A time-stepping deep gradient flow method for option pricing in (rough)
diffusion models. arXiv preprint arXiv:2403.00746, 2024.
[19] J. Sirignano and K. Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equa-
tions. Journal of Computational Physics, 375:1339–1364, 2018.
[20] L. Van Mieghem, A. Papapantoleon, and J. Papazoglou-Hennig. Machine learning for option pricing: an
empirical investigation of network architectures. arXiv preprint arXiv:2307.07657, 2023.
[21] Z. Zhang, S. Zohren, and S. Roberts. Deep learning for portfolio optimization. arXiv preprint
arXiv:2005.13665, 2020.
DELFT INSTITUTE OF APPLIED MATHEMATICS, EEMCS, TU DELFT, 2628CD DELFT, THE NETHERLANDS
Email address: [email protected]
FIGURE 3. L2-error of the TDGF for a call option in the Heston model against number of sampling stages varying from 16 to 500. (A) Linear scale; (B) logarithmic scale.
FIGURE 4. L2-error of the DGM for a call option in the Black–Scholes model against number of sampling stages varying from 2048 to 100,000. (A) Linear scale; (B) logarithmic scale.
FIGURE 5. L2-error of the DGM for a call option in the Heston model against number of sampling stages varying from 2048 to 100,000. (A) Linear scale; (B) logarithmic scale.
FIGURE 6. L2-error of the TDGF for a call option in the Black–Scholes model against number of samples varying from 16 to 600. (A) Linear scale; (B) logarithmic scale.
FIGURE 7. L2-error of the TDGF for a call option in the Heston model against number of samples varying from 16 to 600. (A) Linear scale; (B) logarithmic scale.
FIGURE 8. L2-error of the DGM for a call option in the Black–Scholes model against number of samples varying from 16 to 600. (A) Linear scale; (B) logarithmic scale.
FIGURE 9. L2-error of the DGM for a call option in the Heston model against number of samples varying from 16 to 600. (A) Linear scale; (B) logarithmic scale.
FIGURE 10. L2-error of the TDGF for a call option in the Black–Scholes model against number of layers varying from 1 to 4. (A) Linear scale; (B) logarithmic scale.
FIGURE 11. L2-error of the TDGF for a call option in the Heston model against number of layers varying from 1 to 4. (A) Linear scale; (B) logarithmic scale.
FIGURE 12. L2-error of the DGM for a call option in the Black–Scholes model against number of layers varying from 1 to 4. (A) Linear scale; (B) logarithmic scale.
FIGURE 13. L2-error of the DGM for a call option in the Heston model against number of layers varying from 1 to 4. (A) Linear scale; (B) logarithmic scale.
FIGURE 14. L2-error of the TDGF for a call option in the Black–Scholes model against number of nodes per layer varying from 10 to 50. (A) Linear scale; (B) logarithmic scale.
FIGURE 15. L2-error of the TDGF for a call option in the Heston model against number of nodes per layer varying from 10 to 50. (A) Linear scale; (B) logarithmic scale.
FIGURE 16. L2-error of the DGM for a call option in the Black–Scholes model against number of nodes per layer varying from 10 to 50. (A) Linear scale; (B) logarithmic scale.
FIGURE 17. L2-error of the DGM for a call option in the Heston model against number of nodes per layer varying from 10 to 50. (A) Linear scale; (B) logarithmic scale.
FIGURE 18. L2-error of the TDGF for a call option in the Black–Scholes model against number of time steps varying from 2 to 25. (A) Linear scale; (B) logarithmic scale.
FIGURE 19. L2-error of the TDGF for a call option in the Heston model against number of time steps varying from 2 to 25. (A) Linear scale; (B) logarithmic scale.