Let's Be Rational

Peter Jäckel∗

    B(F, K, σ̂, T, θ) = θ · [ F · Φ( θ·[ ln(F/K)/(σ̂·√T) + σ̂·√T/2 ] ) − K · Φ( θ·[ ln(F/K)/(σ̂·√T) − σ̂·√T/2 ] ) ]    (1.1)

where θ = 1 for call options and θ = −1 for put options.

The starting point was gaining a fundamental understanding of the difficulties involved, which lie largely with the fact that the Black function permits no Taylor expansion around σ̂ = 0 when F ≠ K. Based on the asymptotics of the Black function for small and large values of σ̂, the key components of the method published in "By Implication" were:-¹

• The highest and lowest segments are defined via non-linear transformations that ensure the correct asymptotic behaviour of the initial guess function to first order, not just dominance.

• Define three branches for the objective function, based on the reciprocal of the logarithm of the price for the lower branch, the price itself in the middle branch, and the logarithm of the distance of the price from its limit value for infinite volatility for the upper branch.

¹ E.g., see that in figure 6 in [Jäc06] the number of iterations for a relative accuracy of 10⁻⁸ in implied volatility goes as high as 10.

∗ OTC Analytics
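For concreteness, the Black formula (1.1) can be transcribed directly. The following is a minimal Python sketch (our own illustration, not the reference implementation), with Φ expressed via the complementary error function:

```python
from math import erfc, log, sqrt

def norm_cdf(z):
    # Φ(z) = erfc(-z/√2)/2
    return 0.5 * erfc(-z / sqrt(2.0))

def black(F, K, sigma_hat, T, theta):
    # Undiscounted Black option value (1.1); theta = +1 for calls, -1 for puts.
    s = sigma_hat * sqrt(T)
    d_plus = log(F / K) / s + 0.5 * s
    d_minus = d_plus - s
    return theta * (F * norm_cdf(theta * d_plus) - K * norm_cdf(theta * d_minus))
```

Put-call parity, black(F, K, σ̂, T, +1) − black(F, K, σ̂, T, −1) = F − K, provides a quick sanity check of any such transcription.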
• Use two iterations of the third order Householder iteration method which is a rational function of the objective residual and has convergence order four [Hou70].

In order to avoid any misunderstandings, we state at this point the purpose of this communication. The aim of the method presented here is not to provide 15 digits of accuracy of implied volatility for trading purposes, or for the sake of gaining an intellectual understanding of the relationship between volatility and option price. This would of course be ridiculous. For the latter, the publications by Brenner and Subrahmanyam [BS88] and Corrado and Miller [CM96, CM04] are excellent resources. Indeed, Corrado and Miller themselves emphasize that that is the aim of their own publication, and that for industrial applications, numerical solutions should be employed. The purpose of this communication, instead, is to satisfy that industrial need. One of the main reasons is that the Black formula has become an integral part of many analytical representations of other models and approximations, and is part of a range of analytical transformations. In those applications, the Black formula can end up being used with input parameters that, per se, in the context of a trading desk's purposes, would never be encountered, and all this may yet be combined with numerical calibration routines which may end up exploring even more extreme input parameters. And of course it isn't just about the mapping from volatility to option prices: the reverse, too, is needed. For the sake of brevity, we name but three such analytical mapping situations: the representation of 1) CEV volatility, 2) displaced diffusion volatility, and 3) dividend model process volatility as a Black implied volatility smile, especially for short maturities. Also, in some applications, local volatility is numerically computed from implied volatility and its derivatives up to second order² via finite differencing by the aid of their analytical relationship, e.g., [BBF02, equation 15]. It is in these applied analytical calculations that practitioners really should be able to use the Black formula and its inverse to reproduce inputs close to within machine accuracy, just as we would demand for the exponential function and the natural logarithm, or for the sine and cosine functions and their inverses. What's more, the calculation of implied volatility may be part of analytical computations that reside within modules that are executed a great many times (e.g., in local volatility precomputations on a refined grid), and, for that reason, may need to be very fast, in addition to accurate.

For the sake of at least partial completeness, we include a brief literature review. Li [Li06] gave a rational approximation for |x| ≤ 1/2 and σ > |x|/2 (though we were only informed of his work after having conducted the research presented here). This range of parameters is not even wide enough for normal trading desk purposes, and the accuracy of the approximations is only about 10⁻². Vogt [Vog07], as already mentioned, points out that the method described in our previous publication on the subject suffers from an increased required number of iterations when the strike is close to the forward and the input price is very low. Vogt gave an improved asymptotic guess for this parameter region based on a transformation to Lambert's W function, which preserves the correct asymptotic behaviour as the input price goes to zero. This was indeed the original starting point of the work presented here, only that we avoid the Lambert W function and instead express all transformations (used in aid of correct asymptotics) in terms of the cumulative normal function Φ(·) and its inverse since we already have those as part of our standard financial analytics library. Grunspan [Gru11] demonstrates impressive stamina and gives higher order asymptotic expansions derived by the aid of the formal transseries framework, but we make no use of those results here.

² Numerical differentiation inherently loses accuracy, requiring the underlying function to be of significantly higher precision. As a rule of thumb, if a function f has relative accuracy ε, then its numerical second order derivative f″ can only attain √ε, i.e., half on a logarithmic scale.

2 Preliminaries

Instead of the standard Black function (1.1), we prefer to work with the normalization

    x := ln(F/K)    (2.1)

    σ := σ̂·√T    (2.2)

    b(x, σ, θ) := B(F, K, σ̂, T, θ) / √(F·K)    (2.3)

               = θ · [ e^{x/2} · Φ( θ·[x/σ + σ/2] ) − e^{−x/2} · Φ( θ·[x/σ − σ/2] ) ]    (2.4)

The normalized Black function (2.4) satisfies the "reciprocal-strike-put-call invariance"

    b(x, σ, θ) = b(−x, σ, −θ)    (2.5)

and the "time-value-put-call invariance"

    b(x, σ, θ) − ι(x, θ) = b(x, σ, −θ) − ι(x, −θ)    (2.6)

with

    ι(x, θ) := ( b_max − b_max⁻¹ )₊    (2.7)

and

    b_max := e^{θ·x/2} .    (2.8)

From here on, we shall only deal with out-of-the-money call options, i.e., the case θ = +1 and x ≤ 0, which is without loss of generality by virtue of the invariances (2.5) and (2.6). With this restriction, we have the bounds

    0 ≤ b ≤ b_max ≤ 1    (2.9)

with

    b_max|_{θ=1} = e^{x/2} .    (2.10)
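The normalization (2.1)–(2.4) and the invariances (2.5)–(2.6) can be sketched as follows (a hedged Python illustration with our own function names; the reference implementation is the C++ code of [Jäc13]):

```python
from math import erfc, exp, sqrt

def Phi(z):
    # cumulative normal via the complementary error function
    return 0.5 * erfc(-z / sqrt(2.0))

def normalised_black(x, sigma, theta):
    # b(x,σ,θ) = θ·[ e^{x/2}·Φ(θ[x/σ + σ/2]) − e^{−x/2}·Φ(θ[x/σ − σ/2]) ]   (2.4)
    h, t = x / sigma, 0.5 * sigma
    return theta * (exp(0.5 * x) * Phi(theta * (h + t))
                    - exp(-0.5 * x) * Phi(theta * (h - t)))

def iota(x, theta):
    # normalized intrinsic value ι(x,θ) = (b_max − 1/b_max)_+ with b_max = e^{θx/2}   (2.7)-(2.8)
    b_max = exp(0.5 * theta * x)
    return max(b_max - 1.0 / b_max, 0.0)
```

Checking (2.5), (2.6) and the bounds (2.9)–(2.10) numerically for a few inputs is a useful unit test for any such transcription.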
3 Asymptotics

In [Jäc06], we gave the asymptotic behaviour of b for small and large σ, which (for θ = +1 and x ≤ 0) are:-

    lim_{σ→0} b ≈ x · ϕ(x/σ) · (σ/x)³    (3.1)

    lim_{σ→∞} b ≈ b_max − (4/σ) · ϕ(σ/2)    (3.2)

which can be derived using [AS84, (26.2.12)]. With the same formula, we can convert (3.1) and (3.2) to

    lim_{σ→0} b ≈ ( 2π|x| / (3√3) ) · Φ( −|x|/(√3·σ) )³    (3.3)

    lim_{σ→∞} b ≈ b_max − 2·Φ(−σ/2)    (3.4)

whence, having zero slope at both ends of its range, it is of sigmoid shape. In its central region, near σ_c, with that point being a turning point, it is comparatively linear. To take advantage of this near-linearity in the central section for an initial guess, we need to identify a lower and an upper limit of this as yet only vaguely defined central region. An obvious and easy choice is to draw a tangent through the point (σ_c, b_c), and let the location of the intersections of this tangent with the limit levels of b be the lower limit σ_l and the upper limit σ_u, i.e.,

    σ_l := σ_c − b_c / b′(σ_c)    (4.3)

    σ_u := σ_c + (b_max − b_c) / b′(σ_c)    (4.4)
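The limits (3.3)–(3.4) are easy to check numerically. The sketch below (our own, with deliberately generous tolerances since these are asymptotic statements) compares direct evaluation of b with both limits:

```python
from math import erfc, exp, pi, sqrt

def Phi(z):
    return 0.5 * erfc(-z / sqrt(2.0))

def b(x, sigma):
    # normalized Black (2.4) for theta = +1
    h, t = x / sigma, 0.5 * sigma
    return exp(0.5 * x) * Phi(h + t) - exp(-0.5 * x) * Phi(h - t)

# Large-sigma limit (3.4): essentially exact already at sigma = 30.
x = -0.1
upper = exp(0.5 * x) - 2.0 * Phi(-0.5 * 30.0)

# Small-sigma limit (3.3): agrees to a few percent at sigma = 0.05, x = -1.
x2, sigma2 = -1.0, 0.05
z = -abs(x2) / (sqrt(3.0) * sigma2)
lower = (2.0 * pi * abs(x2) / (3.0 * sqrt(3.0))) * Phi(z) ** 3
```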
[Figure 2: The four zones of the initial guess function: the levels b_l(x), b_c(x), b_u(x), and b_max(x) as functions of |x| ∈ [0, 10], logarithmic ordinate.]

In the two central regions [b_l, b_c] and (b_c, b_u], we know that σ(β) is linear to second order near b_c and only moderately curved at the outside ends of the two zones. Here, we take note of the extensive literature on the subject of function approximation. By far the most commonly used approach, especially for high-efficiency implementations, is to approximate the target function as a rational function, i.e., as the ratio of two polynomials, on carefully selected regions. Often, this is combined with non-linear transformations that are specific to each interpolation zone in order to match certain asymptotic features of the target function. In practical applications, this approach is behind the implementation of virtually any special function. In this context here, we mention three examples of particular relevance, namely, the cumulative normal function and its cousin the error function [Mar04, Cod69, Cod90, Mic93b], the inverse cumulative normal function [Wic88], and of course the Lambert W function [Veb09], though there are entire libraries of special functions based on rational approximations, e.g., [Mic93a]. Also, Halley's iteration method is ultimately based on a rational form, as is its generalisation to higher order, the Householder method [Hou70]. For univariate functions, extensions of the Remez algorithm can be used to find rational approximations that are numerically effectively optimal in the sense of the minimax solution, and this is how most of the above mentioned rational approximations were computed. It is in principle possible to extend this to two dimensions when there is an extra dependency (as is the case with σ(β) which also depends on x), though the resulting formulae can readily involve a significant number of coefficients, rendering it more effort.

In terms of an interpolation bracket [x_l, x_r] with values f_l, f_r and slopes f′_l, f′_r at its edges, the rational cubic interpolation f^rc(x) reads

    f^rc(x; x_l, x_r, f_l, f_r, f′_l, f′_r, r) = [ f_r·s³ + (r·f_r − h·f′_r)·s²·(1−s) + (r·f_l + h·f′_l)·s·(1−s)² + f_l·(1−s)³ ] / [ 1 + (r−3)·s·(1−s) ]    (4.10)

with

    h := x_r − x_l ,   and   s := (x − x_l)/h .    (4.11)

The parameter r is a control parameter that can be chosen freely subject to r > −1, else the interpolation would incur a pole inside [x_l, x_r]. In the limit of r → ∞, the rational cubic interpolation converges to a linear form. Delbourgo and Gregory [DG85] also provide simple conditions for r such that the interpolation preserves monotonicity [their equation (3.8)] and convexity [their equation (3.18)], when the input data permit it. Conveniently, it is easy to configure r to meet a given second derivative of f(·) at either the left hand side edge of the interpolation bracket as

    r_l(x_l, x_r, f_l, f_r, f′_l, f′_r, f″_l) = ( ½·h·f″_l + (f′_r − f′_l) ) / ( ∆ − f′_l )    (4.12)

or, respectively, at the right hand side edge via

    r_r(x_l, x_r, f_l, f_r, f′_l, f′_r, f″_r) = ( ½·h·f″_r + (f′_r − f′_l) ) / ( f′_r − ∆ )    (4.13)

with ∆ := (f_r − f_l)/h. We choose the parameter r such that, on both the centre left and the centre right segment, respectively, we obtain a rational interpolation form that matches the second derivative of σ(β) in the inflexion point b_c, subject to the aforementioned monotonicity and convexity restrictions. We compute the second derivative σ″(β) from

    dσ(β)/dβ = 1/b′    (4.14)

    d²σ(β)/dβ² = d/dβ (1/b′) = ( d/dσ (1/b′) ) · ( dσ(β)/dβ ) = −b″/b′³    (4.15)

whence

    σ″(β)|_{β=b_c} = −b″(σ_c)/b′(σ_c)³ = 0    (4.16)

due to b″(σ_c) ≡ 0. This gives us the initial guess function in the centre left region

    σ_0(β)|_{β∈[b_l,b_c]} = f^rc_cl(β)    (4.17)
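A direct Python transcription of (4.10)–(4.12) (our own sketch; the argument names d_l, d_r for the slopes are ours):

```python
def rational_cubic(x, x_l, x_r, f_l, f_r, d_l, d_r, r):
    # Delbourgo-Gregory rational cubic interpolation (4.10)-(4.11).
    # r > -1 avoids a pole inside [x_l, x_r]; r -> infinity recovers the linear form.
    h = x_r - x_l
    s = (x - x_l) / h
    numerator = (f_r * s ** 3
                 + (r * f_r - h * d_r) * s ** 2 * (1 - s)
                 + (r * f_l + h * d_l) * s * (1 - s) ** 2
                 + f_l * (1 - s) ** 3)
    return numerator / (1.0 + (r - 3.0) * s * (1.0 - s))

def r_left(x_l, x_r, f_l, f_r, d_l, d_r, f2_l):
    # (4.12): the value of r that makes the interpolant meet the
    # prescribed second derivative f2_l at the left edge x_l.
    h = x_r - x_l
    delta = (f_r - f_l) / h
    return (0.5 * h * f2_l + (d_r - d_l)) / (delta - d_l)
```

By construction, the interpolant reproduces the levels and slopes at both bracket edges and, with r taken from r_left, the given second derivative at x_l.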
with

    f^rc_cr(β) = f^rc(β; b_c, b_u, σ_c, σ_u, 1/b′_c, 1/b′_u, r_(b_c,b_u])    (4.21)

and

    r_(b_c,b_u] = r_l(b_c, b_u, σ_c, σ_u, 1/b′_c, 1/b′_u, 0) .    (4.22)

We note that the extra calculation effort required for the evaluation of the respective rational cubic interpolation formulae over and above what already has been computed is only

    b′(σ_l)   when β ∈ [b_l, b_c]
    b′(σ_u)   when β ∈ (b_c, b_u] .

The evaluation of the rational cubic form itself is very little effort. Depending on the hardware and compiler, the CPU effort is little more than, or possibly the same as, the evaluation of a single vega expression b′(σ).

In the upper region β ∈ (b_u, b_max), we use the asymptotic formula (3.4) to define a non-linear transformation f_u(β) of σ(β) that is asymptotically linear in β when β → b_max:

    f_u(β) := Φ( −σ(β)/2 )    (4.23)

We approximate this function by a rational cubic interpolation that matches its level, slope, and second derivative at the left edge of the interval, and its level and slope at the right edge, using

    f′_u(β) = −½ · e^{x²/(2σ²)}    (4.24)

    f″_u(β) = √(π/2) · (x²/σ³) · e^{x²/σ² + σ²/8}    (4.25)

and

    lim_{β→b_max} f_u(β) = 0    (4.26)

    lim_{β→b_max} f′_u(β) = −½    (4.27)

wherein σ = σ(β). This gives us

    f^rc_u(β) := f^rc(β; b_u, b_max, f_u(b_u), 0, f′_u(b_u), −1/2, r_(b_u,b_max))    (4.28)

with

    r_(b_u,b_max) = r_l(b_u, b_max, f_u(b_u), 0, f′_u(b_u), −1/2, f″_u(b_u)) .    (4.29)

The initial guess in the upper region is then composed by solving (4.23) for σ, and replacing f^rc_u for f_u:

    σ_0(β)|_{β∈(b_u,b_max)} = −2·Φ⁻¹( f^rc_u(β) ) .    (4.30)

This leaves us to define the initial guess function for the lower region β ∈ [0, b_l). Here, we make use of the asymptotic form (3.3) to define the non-linear transformation

    f_l(β) := ( 2π|x| / (3√3) ) · Φ(z)³   with   z := −|x| / (√3·σ(β))    (4.31)

which is asymptotically linear in β when β → 0. Continuing with our rational theme, we approximate this function, too, by the Delbourgo-Gregory interpolation, matching level and slope at either end of the region, and setting the control parameter r to match the second derivative at the right hand side edge. For this, we compute

    f′_l(β) = 2π · z² · Φ(z)² · e^{z² + σ²/8}    (4.32)

    f″_l(β) = (π/6) · (z²/σ³) · Φ(z) · e^{2z² + σ²/4} · [ 8√3·σ·|x| + ( 3σ²(σ²−8) − 8x² ) · Φ(z)/ϕ(z) ]    (4.33)

and

    lim_{β→0} f_l(β) = 0    (4.34)

    lim_{β→0} f′_l(β) = 1    (4.35)

wherein, as before, z = −|x|/(√3·σ) and σ = σ(β), to obtain

    f^rc_l(β) := f^rc(β; 0, b_l, 0, f_l(b_l), 1, f′_l(b_l), r_[0,b_l))    (4.36)

with

    r_[0,b_l) = r_r(0, b_l, 0, f_l(b_l), 1, f′_l(b_l), f″_l(b_l)) .    (4.37)

The initial guess in the lower region is then the result of solving (4.31) for σ, and replacing f^rc_l for f_l:

    σ_0(β)|_{β∈[0,b_l)} = x / ( √3 · Φ⁻¹( ∛( 3√3 · f^rc_l(β) / (2π|x|) ) ) ) .    (4.38)

The net function σ_0(β) divides into four branches:

    σ_0(β) =   expression (4.38)        for β ∈ [0, b_l)
               f^rc_cl(β)               for β ∈ [b_l, b_c]
               f^rc_cr(β)               for β ∈ (b_c, b_u]
               −2·Φ⁻¹( f^rc_u(β) )      for β ∈ (b_u, b_max)    (4.39)

Overall, it is of class C¹ with σ″_0(β) being discontinuous at β = b_l and β = b_u, and σ‴_0(β) being discontinuous at β = b_c. We show examples of σ_0(β) for four different values of x in figure 3. The quality of the approximation speaks for itself.

5 Rational iteration

Having established the initial guess function σ_0(β), with β being the normalized input price, we now determine the iteration procedure to obtain an accurate implied volatility figure. To specify the iteration, we need to choose a) an objective function, and b) the iteration functional.
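The derivatives (4.24)/(4.32) rest on the normalized vega b′(σ) = (1/√2π)·e^{−(h²+t²)/2}. A quick numerical cross-check of that identity, and of (4.24) in the chain-rule form f′_u = −½·ϕ(σ/2)/b′(σ) = −½·e^{x²/(2σ²)} (our own sketch):

```python
from math import erfc, exp, pi, sqrt

def Phi(z):
    return 0.5 * erfc(-z / sqrt(2.0))

def b(x, sigma):
    # normalized Black (2.4), theta = +1
    h, t = x / sigma, 0.5 * sigma
    return exp(0.5 * x) * Phi(h + t) - exp(-0.5 * x) * Phi(h - t)

def vega(x, sigma):
    # b'(sigma) = exp(-(h^2 + t^2)/2) / sqrt(2*pi)
    h, t = x / sigma, 0.5 * sigma
    return exp(-0.5 * (h * h + t * t)) / sqrt(2.0 * pi)

x, sigma, eps = -0.5, 1.0, 1e-6
numerical_vega = (b(x, sigma + eps) - b(x, sigma - eps)) / (2.0 * eps)
phi_half_sigma = exp(-0.125 * sigma * sigma) / sqrt(2.0 * pi)
f_u_prime = -0.5 * phi_half_sigma / vega(x, sigma)       # chain rule for (4.23)
closed_form = -0.5 * exp(x * x / (2.0 * sigma * sigma))  # (4.24)
```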
[Figure 3: Four examples of the initial guess function σ_0(β), for |x| = 1/16, 1, 8, and 64, each compared with σ(exact); the abscissa marks b_l, b_c, b_u, b_max, the ordinate σ_l, σ_c, σ_u.]

Assuming as before that we have used the invariances (2.5) and (2.6) to transform to the case of x ≤ 0 and θ = 1, we define the objective function in three branches according to

    g(σ) =   1/ln(b(σ)) − 1/ln(β)                    for β ∈ [0, b_l)
             b(σ) − β                                for β ∈ [b_l, b̃_u]
             ln( (b_max − β) / (b_max − b(σ)) )      for β ∈ (b̃_u, b_max)    (5.1)

with

    b̃_u := max( b_u, b_max/2 )    (5.2)

where we have again suppressed the dependence of the normalized Black function b on x. The respective transformations in (5.1) have been chosen to improve the convergence of the respective Lagrange inversion series of g(σ). In common parlance, this means that we chose the objective function branches in order to make the inverse of the objective function well approximated by a low order local rational approximation.

In order to find the sought implied volatility, we need to locate the root of g(σ). For this, we use an iterative procedure. Whilst most practitioners are familiar primarily with the Newton-Raphson method, there are in fact quite a few generic techniques for this purpose in the literature. In [Jäc06], we used Halley's method which consists of a rational function of order (1,1) of the residual g(σ), i.e., it can be written as the ratio of a polynomial of first order in g divided by another polynomial of first order in g.

Other authors have suggested the use of the Chebyshev method which is a second order polynomial in g, which means the iteration would have the form

    σ^Cheby_{n+1} = α^Cheby_n + γ^Cheby_n·g_n + δ^Cheby_n·g_n²    (5.3)

with g_n := g(σ_n) and all coefficients being functions of σ_n that, generically, do not become small or infinite as g → 0 (whence they do not affect convergence considerations). In comparison, Halley's method takes on the form

    σ^Halley_{n+1} = ( α^Halley_n + γ^Halley_n·g_n ) / ( 1 + δ^Halley_n·g_n ) .    (5.4)

In aid of clarification, we mention that both Halley's and Chebyshev's method are of the same convergence order, i.e., order three, and that, in fact, Chebyshev's method is identical to a second order Taylor expansion of Halley's method, and, in turn, Halley's method is identical to the Padé(1,1) approximant (which is a kind of rational function expansion of the same convergence order) of Chebyshev's method. The reason we chose Halley's method, and not Chebyshev's method, in [Jäc06] was that, in general, rational function approximations tend to be more flexible and are overall preferred³, though as for the convergence order there is of course no difference whatsoever.

³ To highlight this point we quote from [PTVF92] on the subject of (rational) Padé approximants: "It is sometimes quite mysterious how well this can work" and "Padé approximation has the uncanny knack of picking the function you had in mind from among all the possibilities.". This is followed by some caveats that we need not worry about here since we choose our objective function to be amenable to rational approximation, and since we are guaranteed to be close to the solution by the excellent quality of our initial guess, thus receiving the full benefit of the local rational approximation of the inverse function.

When it comes to the choice of an iteration procedure for the purpose of high accuracy solutions, we have the choice between either going for a higher convergence order, hoping to need fewer iterations, or to save the effort to compute
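To illustrate the middle branch of (5.1), here is a bare-bones Halley iteration for g(σ) = b(σ) − β (our own toy sketch; the production scheme described below uses two third-order Householder steps and all three branches):

```python
from math import erfc, exp, pi, sqrt

def Phi(z):
    return 0.5 * erfc(-z / sqrt(2.0))

def b(x, sigma):
    # normalized Black (2.4), theta = +1
    h, t = x / sigma, 0.5 * sigma
    return exp(0.5 * x) * Phi(h + t) - exp(-0.5 * x) * Phi(h - t)

def b_prime(x, sigma):
    # normalized vega
    h, t = x / sigma, 0.5 * sigma
    return exp(-0.5 * (h * h + t * t)) / sqrt(2.0 * pi)

def implied_vol_halley(x, beta, sigma, n_iter=4):
    # Halley step sigma += nu/(1 + gamma*nu/2) with nu = -g/g', gamma = g''/g',
    # using the analytical ratio b''/b' = x^2/sigma^3 - sigma/4.
    for _ in range(n_iter):
        g = b(x, sigma) - beta
        g1 = b_prime(x, sigma)
        gamma = x * x / sigma ** 3 - sigma / 4.0
        nu = -g / g1
        sigma += nu / (1.0 + 0.5 * gamma * nu)
    return sigma
```

Starting from a nearby guess, a handful of cubically convergent steps recovers the input volatility to high accuracy.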
the extra coefficients, and carry out more iterations. As a rule of thumb, when the effort to compute derivatives of the objective function is higher than the evaluation of the objective function itself, it is advisable to use a lower order method and iterate more, else, use a higher order method with fewer iterations. In our context, the higher order derivatives of the objective function are all easier to compute than the objective function itself. This is ultimately because Φ(·) is more effort to evaluate than ϕ(·). A generic iteration procedure of arbitrary order d is Householder's method [Hou70] given by

    σ_{n+1} = σ_n + d · (1/g)^{(d−1)}(σ_n) / (1/g)^{(d)}(σ_n) .    (5.5)

The first and second order versions are identical to the Newton-Raphson and Halley's method, respectively. We have chosen to use the third order method

    σ_{n+1} = σ_n + ν_n · ( 1 + ½·γ_n·ν_n ) / ( 1 + ν_n·( γ_n + ⅙·δ_n·ν_n ) )    (5.6)

with

    ν_n := −g(σ_n)/g′(σ_n) ,   γ_n := g″(σ_n)/g′(σ_n) ,   δ_n := g‴(σ_n)/g′(σ_n) ,    (5.7)

which, somewhat confusingly, whilst being the third order Householder method, is of fourth order convergence in the residual error. We remark that the third order Householder method is a rational function of order (2,2) in the residual g. We spare the reader the listing of all the involved terms of the third order Householder method for all three branches of g(σ) but mention that they are explicitly given in the source code comments of the reference implementation available at www.jaeckel.org/LetsBeRational.7z [Jäc13]. Preempting our numerical results later on somewhat, we mention that the combination of our four-branch initial guess function, with our three-branch objective function, and the third order Householder method enables us to attain the maximum achievable accuracy on standard IEEE 754 (53 bit mantissa) floating point hardware with exactly two iterations for all possible input values. The subtle point here is the maximum achievable accuracy which surprisingly strongly depends on the implementation of the (normalized) Black function that we use in our iteration, as we shall discuss in the next section.

Remark 5.1. The reason for the choice of the third order method is the balancing of comparative efforts. With a second order method (e.g., Halley's), we would often need three iterations to reach maximum accuracy. On the other hand, to obtain full attainable precision with a single iteration, we would either need to go to at least 14th order (i.e., 15th order of convergence), or improve our initial guess by at least two decimal orders of magnitude in its weakest points, which invariably would be numerically more effort than an additional iteration of the third order Householder method. As a compromise, we have settled for the initial guess function presented in section 4 and combined it with two iterations of the third order Householder method.

6 The Black function

Irrespective of any transformations we may choose in our target objective function whose root will be our sought implied volatility number, such as those given in (5.1), we inevitably need to evaluate the Black function which is conventionally implemented directly in the form in which it is written. In our case, for the normalized Black function (2.4) with x ≤ 0 and θ = 1, this means we take the numerical difference of two exponentially weighted cumulative normal functions:

    b = Φ(h+t)·φ − Φ(h−t)/φ ,    (6.1)

with

    φ = e^{x/2} ,   h := x/σ ,   and   t := σ/2 .    (6.2)

When both 0 < |x| ≪ 1 and σ ≪ 1, as is the case for almost all options that are near the money, this means that we have φ ≈ 1, and the numerical value of b is dominated by the result of the subtraction of two cumulative normal function values of nearby arguments, centred around Φ(h) ≈ ½. This poses one of the most common and standard problems of error propagation in numerical analysis: the divergence of the relative error of a function defined as a difference, also known as Subtractive Cancellation. The error analysis of this case, to first order, is as follows. First, denote by ε_i a real-valued number that is randomly⁴ somewhere in the range [−ε, ε] where ε is defined as the IEEE 64 bit constant DBL_EPSILON.

⁴ The shape of the distribution is irrelevant here: we only need to know the attainable range.

Since all numerical evaluations are only accurate to within ε on a relative scale, when numerically evaluated, the normalized Black function actually returns

    b ≈ Φ(h+t)·(1+ε₁) − Φ(h−t)·(1+ε₂)    (6.3)

where we have dropped φ since it is a number near 1 and irrelevant for our analysis. By expansion, this becomes

    b ≈ Φ(h)·(1+ε₁) + ϕ(h)·t·(1+ε₁) − Φ(h)·(1+ε₂) + ϕ(h)·t·(1+ε₂)    (6.4)

      ≈ 2·Φ(h)·ε₃ + 2·ϕ(h)·t·(1+ε₄)    (6.5)

      ≈ 2·ϕ(h)·t + 2·[ Φ(h)·ε₃ + ϕ(h)·t·ε₄ ]    (6.6)

where we have consolidated ε₁−ε₂ ≈ 2ε₃ and ε₁+ε₂ ≈ 2ε₄. This makes the relative numerical evaluation error

    b_numerical/b_exact − 1 ≈ ( Φ(h)·ε₃ + ϕ(h)·t·ε₄ ) / ( ϕ(h)·t )    (6.7)

                            ≈ (1/t) · ( Φ(h)/ϕ(h) ) · ε₃    (6.8)

for small t. As t → 0, the relative error grows like the inverse of t, and there is nothing we can do about it. Unless, that is, we don't carry out the subtraction in (6.1) in the first
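The step (5.6)–(5.7) is easy to state in isolation. Here it is applied to a toy equation (our own sketch, not the reference code; for e^s − 2 = 0 all derivatives equal e^s, which keeps the bookkeeping trivial):

```python
from math import exp, log

def householder3_step(s, g, g1, g2, g3):
    # One third-order Householder step (5.6)-(5.7); fourth-order convergence in the residual.
    nu = -g / g1
    gamma = g2 / g1
    delta = g3 / g1
    return s + nu * (1.0 + 0.5 * gamma * nu) / (1.0 + nu * (gamma + delta * nu / 6.0))

# Toy example: solve exp(s) = 2, whose root is ln(2).
s = 1.0
for _ in range(2):
    e = exp(s)
    s = householder3_step(s, e - 2.0, e, e, e)
# After just two steps, s agrees with ln(2) to near machine accuracy.
```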
place! The obvious thing to do is to use a Taylor expansion in t around zero. Don't! Remember that the main reason that computing implied volatilities is so difficult is the fact that the Black function does not permit a Taylor expansion around σ = 0 (unless x ≡ 0) since all of its derivatives in σ = 0 are identically zero! Unfortunately, this dilemma is not resolved by viewing the Black function as weighted differences of Φ(h ± t), with h and t as defined in (6.2), keeping h constant, and expanding in t. If you try this, you will find that some of the coefficients still diverge such that your numerical results are spoiled when σ is very small unless you keep increasing the expansion order to ludicrous levels. So, instead, we reformulate the normalized Black function according to

    b = Φ(h+t)·e^{ht} − Φ(h−t)·e^{−ht}    (6.9)

      = (1/√2π) · e^{−½(h²+t²)} · [ Y(h+t) − Y(h−t) ]    (6.10)

with

    Y(z) := Φ(z)/ϕ(z)    (6.11)

which we show in figure 4.

[Figure 4: The function Y(z), plotted against −z ∈ [0, 10]; Y(0) = √(π/2).]

The advantage of casting the normalized Black function in this form is that the expression

    [ Y(h+t) − Y(h−t) ]    (6.12)

permits a perfectly usable Taylor expansion in t for h ≤ 0, even when h is exactly zero, and that is how we do it. We skip the details of the actual expansion and refer the reader to the code comments in the reference implementation in [Jäc13], though we mention that we use it when t < τ_small with τ_small := 2·¹⁶√ε ≈ 0.21 (and |h| not too large).
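The equivalence of (6.9) and (6.10)–(6.11) is easy to confirm for moderate arguments (our own sketch; a production version would obtain Y via erfcx as in (6.14) below rather than as the ratio Φ/ϕ):

```python
from math import erfc, exp, pi, sqrt

def Phi(z):
    return 0.5 * erfc(-z / sqrt(2.0))

def phi(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Y(z):
    # (6.11)
    return Phi(z) / phi(z)

def b_weighted_difference(h, t):
    # (6.9): b = Phi(h+t)*e^{ht} - Phi(h-t)*e^{-ht}
    return Phi(h + t) * exp(h * t) - Phi(h - t) * exp(-h * t)

def b_via_Y(h, t):
    # (6.10): b = (1/sqrt(2*pi)) * e^{-(h^2+t^2)/2} * [Y(h+t) - Y(h-t)]
    return exp(-0.5 * (h * h + t * t)) / sqrt(2.0 * pi) * (Y(h + t) - Y(h - t))
```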
We show two examples as to how noisy the Black func- the asymptotic expansion [AS84, (26.2.12)] to write Y (z)
tion b(x, σ) can be as a function of σ, on a relative scale, as the rational function
in figure 5, in comparison with the results we obtain when (−1)n 1·3...(2n−1)
1 1 1·3
using an expansion of expression (6.12) in t. Note that the Y (z) ≈ z − z3
+ z5
+ ... + z (2n+1)
(6.13)
4e-13 4e-07
3e-13
direct evaluation
via expansion 3e-07
direct evaluation
via expansion
when z 0. Note that this is a divergent series which
2e-13 2e-07 means that for any value of z, there is a critical level for
1e-07
1e-13
0
n beyond which the approximation series worsens as you
0
-1e-13
-1e-07 increase n. In other words, there is some optimal level n
-2e-07
-2e-13
x = -0.0001
-3e-07
x = -1E-10 where the relative error of the approximation series com-
σmid = 0.001 σmid = 1E-09
-3e-13 -4e-07 pared to the exact value of Y (z) is minimal. Obviously, the
-4e-13
-2E-13 -1E-13 0
ν
1E-13 2E-13
-5e-07
-2E-07 -1E-07 0
ν
1E-07 2E-07 larger |z|, the larger the optimal level n at which the relative
Figure 5: Two examples for the noise on the Black function when eval- error is minimal. We found that for n = 17, the approxim-
uated directly, in comparison to the use of an expansion of expres- ation series has a maximum relative error of 1.64 · 10−16
sion (6.12). The abscissa ν is the relative distance to an arbitrarily
for all z ≤ −10, which makes it accurate to within the
chosen mid value σmid . The ordinate is the function value’s relative dis-
b(x,σmid ·(1+ν)) best attainable limit on 64 bit floating point hardware. In
tance from its value in the centre, i.e., b(x,σmid )
− 1.
order to avoid any subtractive cancellation in the normal-
8
2
ized Black function when h < hlarge , with hlarge := −10 e−y for sizeable y > 0 is required, instead, the product of
(and t somewhat smaller than |h|), in addition to minimiz- two exponential evaluations is computed, one aiming at the
ing the exponential noise, we proceed as follows. We use magnitude of the result, and one aiming at the fine resolu-
the asymptotic series (6.13) with n = 17 and substitute it tion according to
into [Y (h + t) − Y (h − t)] in (6.10), and analytically eval- 2 2
uate and simplify the resulting expression (which is of con- e−y = e−ỹ · e−(y−ỹ)·(y+ỹ) (6.16)
siderable length), in order to take advantage of all possible
where ỹ is chosen to give the overall magnitude (down to
analytical cancellations of terms such as +t and −t. The
one 16th) as
resulting asymptotic expression, whilst somewhat lengthy, by · 16c
turns out to give us a reliable and smooth normalized Black ỹ := . (6.17)
16
function that enables us to compute implied volatilities even
This does not completely solve the issue, but it helps a long
When neither t ≪ 1 nor h < h_large holds, then, as we mentioned above as one of the lessons we learned from George Marsaglia's excellent article, we should still avoid computing the Black function as the difference of terms involving individual exponentials, or at least minimize the number of exponentials, and so we stick with the formulation (6.10). Here, we take advantage of the fact that Y(z) is related to the little known special function called the scaled complementary error function erfcx() via the simple relationship

   Y(z) = ½ · √(2π) · erfcx( −z/√2 ) .   (6.14)

Since there is a highly accurate and efficient numerical implementation for erfcx() based on rational approximations involving at most one exponential function evaluation, we at least halve the noise level. Otherwise, if we go the conventional route to evaluate b(), we incur at least one exponential inside the implementation of the cumulative normal function, and another one for each of the scaling terms e^(x/2) and e^(−x/2), resulting in the subtraction of two terms that each involve two exponentials. What's more, for x ≥ 0.46875, Cody's implementation [Cod69, Cod90] of erfcx(x) is given purely as a rational function approximation, which means that we obtain Y(z) represented by a pure rational approximation, without any exponentials, when z ≤ −0.66291260736239.

Having emphasized the benefits of the formulation (6.10), we must, alas, make an exception when b() is dominated by the first of the two terms in the Black formula when expressed as (6.9). We then retain more relative accuracy by not attempting to combine the two terms in any way, and by sticking with the formulation (6.9). As a rule of thumb, we do this when t > 0.85 + |h| (with h ≤ 0 and θ ≡ 1 as before). We then use the equality

   Φ(z) = ½ · erfc( −z/√2 )   (6.15)

and evaluate Cody's [Cod69, Cod90] implementation of the complementary error function erfc(), which contains a round-off limiting technique specifically aimed at the inaccuracy of the exponential function for large negative argument mentioned in George Marsaglia's article, whenever x approaches the absolute lower limit⁵ on 64 bit hardware, which is about −707, anyway. It is worth mentioning that this technique is also used in other implementations, e.g., the one given in [Mic93b].

We show in figure 6 the four different evaluation zones for the normalized Black function for h ≤ 0 and θ ≡ 1.

Figure 6: The four different evaluation regimes of the normalized Black function in the (h, t)-plane with h = x/σ, t = σ/2, x ≤ 0, and θ = 1.

In summary, these are, in order of precedence:-

I. (|h| > |h_large|) ∧ (t < |h| − |h_large| + τ_small) with τ_small = 2·ε^(1/16) ≈ 0.21 (ε being DBL_EPSILON): substitute the series approximation (6.13) of order n = 17 into (6.10). Analytically simplify the sub-expression (6.12)

   [ Y(h+t) − Y(h−t) ]

after the substitution (6.13) to take advantage of exact cancellation of terms such as +t and −t. The net result gives the sub-expression (6.12) as a rational function of h, multiplied with one exponential.

II. t < τ_small: substitute a twelfth order Taylor expansion of the sub-expression (6.12)

   [ Y(h+t) − Y(h−t) ]

in t around zero in the normalized Black function formulation (6.10).

III. t > 0.85 + |h|: evaluate b() as the exponentially weighted difference of cumulative normals as given in (6.9). Use a highly accurate version of the cumulative normal such as mapping it to Cody's erfc() [Cod69, Cod90] via (6.15).

IV. everywhere else: evaluate b() in the formulation (6.10) with Y() computed via (6.14) by the aid of Cody's erfcx() [Cod69, Cod90].

⁵ The limit for x is determined by the minimum attainable ratio of F/K, which is about 10^−307, making x_min ≈ ln(10^−307) ≈ −707.
Finally, we mention that there is yet another reason for loss of accuracy, though this one pertains directly to the implied volatility, and not to the Black function. This happens when the input price is near its maximum: β ≲ b_max. In this limit, the relevant quantity of information content is the difference from the maximum, namely (b_max − β), and the relative accuracy of the output implied volatility can only be expected to be as good as

   ε · b_max / (b_max − β) .   (6.18)

This is because the input number β, when it is, say, within 10^−m (relative) of b_max, only contains approximately (N − m) decimal digits of relevant information, with N := |log₁₀(ε)|, and thus we cannot produce a result that has the full N digits of accuracy. This limit case, however, is in practice of no concern since this is the situation of volatilities and prices being so high that prices have no discernible vega. What's more, whilst we do in practical calculations encounter low volatility scenarios of any imaginable magnitude, the ultra-high volatilities just don't arise. We will, however, in our numerical charts see the gradual increase of the residual noise level in the limit of β → b_max.

7 Numerical results

We now show a number of graphs with numerical results. First, in figures 7 and 8, we show two diagrams depicting

Figure 8: Relative error |Δσ/σ| for (|x|, σ) ∈ [0, 10^−5] × [10^−5, 0.18].

Figure 9: Relative accuracy |Δσ/σ| for (|x|, σ) ∈ [0, 16] × [10^−5, 7.07].
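The information-loss bound (6.18) can be illustrated numerically. This is a sketch assuming ε = DBL_EPSILON ≈ 2.22·10^−16 (so N ≈ 15.65); the helper name is ours.

```python
import math
import sys

eps = sys.float_info.epsilon  # DBL_EPSILON, about 2.22e-16
N = abs(math.log10(eps))      # about 15.65 decimal digits

def attainable_digits(m):
    # If beta is within 10^-m (relative) of b_max, the bound (6.18) is
    # eps * b_max / (b_max - beta) = eps * 10^m, leaving about N - m
    # attainable decimal digits of implied volatility.
    b_max = 1.0               # the normalization is immaterial here
    beta = b_max * (1.0 - 10.0 ** (-m))
    bound = eps * b_max / (b_max - beta)
    return -math.log10(bound)
```

For example, a price within 10^−8 (relative) of its upper limit can pin down the implied volatility to fewer than 8 decimal digits, regardless of how the root is found.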
Figure 10: Residual relative errors for |x| = 1/2.

Figure 11: Residual relative errors for |x| = 1/2 with Halley's method.

Figure 12: Residual relative errors for |x| = 1/2 with Newton's method.

Figure 13: Residual relative errors for |x| = 32.

Figure 14: Residual relative errors for |x| = 32 with Halley's method.

Figure 15: Residual relative errors for |x| = 32 with Newton's method.

Each of these figures shows log₁₀|σ_(n iterations)/σ − 1| for n = 0, 1, 2, 3 over β/b_max and over log₁₀(β/b_max), together with log₁₀(DBL_EPSILON) and the pin locations.

ley's method is negligible in comparison to the evaluation of the normalized Black function required in each iteration.

As for its execution speed, the presented method evaluates a single implied volatility with two iterations on a standard computer in just under one microsecond, most of which is spent in the normalized Black function. This is more than 5 times faster than the algorithm of [Jäc06], which takes about 5 microseconds on the same hardware when configured to have comparable accuracy, specifically 1E-15. The speed advantage goes down to only about 30% when the algorithm of [Jäc06] is evaluated with a target precision of 2E-12 (though at that level we could here get away with just one iteration most of the time, and perhaps increase the Householder order by one). It is clear that the algorithm of "By Implication" converges relatively easily, on average, to that lesser accuracy (albeit that it needs more iterations), but requires significantly more effort to then home in on the higher precision. The difficulty of convergence to high accuracy is caused by the much noisier normalized Black function used there. The net effect is that each single iteration of "By Implication" is faster than here, which is due to the simpler, but much noisier, normalized Black function, but high accuracy is very difficult to attain for the very same reason. On a like-for-like comparison, i.e., when using the same normalized Black function, the algorithm of "By Implication" will be significantly slower overall since it will almost everywhere need more iterations, and will be much slower in all those low volatility regions where it was previously identified to need significant numbers of iterations.

8 Conclusion

We have introduced an algorithm for the calculation of Black implied volatility that can for all intents and purposes be considered to be within attainable machine accuracy, where the latter is defined to mean within what can be supported by the used normalized Black function. This has been accomplished by combining a four-branched initial guess function, based on two asymptotically correct transformations and rational function approximations, with two iterations of the Householder(3) root finding algorithm. The objective function is separated into three branches, the top and bottom of which involve non-linear transformations. Crucially, the objective function is based on a high-accuracy and low-noise implementation of the normalized Black function, which turns out to be a problem as difficult as implied volatility calculation in its own right.

We mention that the mere two steps of our Householder(3) procedure can, instead of viewing them as numerical iteration, also be seen as an analytical approximation based on the recursive definition

   σ(β) = σ_HH3( σ_HH3( σ₀(β) ) )   (8.1)

where ς → σ_HH3(ς) is defined to be the Householder(3) propagation step σ_n → σ_(n+1) given in (5.6), and σ₀() is our initial guess function (4.39). The twice-recursive formulation (8.1) of the solution presented here might help assuage
the concerns of those who intrinsically dislike numerical solutions.

A reference implementation of the discussed new method for implied Black volatility is available at www.jaeckel.org/LetsBeRational.7z [Jäc13], including a total of 187 figures demonstrating the accuracy in various parameter regions.

References

[AS84] M. Abramowitz and I.A. Stegun. Pocketbook of Mathematical Functions. Harri Deutsch, 1984. ISBN 3-87144-818-4.

[CM96] C.J. Corrado and T.W. Miller. Volatility without tears. Risk, 9(7):49–52, 1996.

[CM04] C.J. Corrado and T.W. Miller. Tweaking Implied Volatility. Technical report, Deakin University and Mississippi State University, September 2004. ssrn.com/abstract=584982.

[Li06] M. Li. You Don't Have to Bother Newton for Implied Volatility. Technical report, Georgia Institute of Technology, November 2006. ssrn.com/abstract=952727.

[Veb09] D. Veberic. Having Fun with Lambert W(x) Function. ArXiv Mathematics e-prints, June 2009. arxiv.org/pdf/1003.1628.

[Vog07] A. Vogt. Initial Estimating and Refining volatility, 2007. www.axelvogt.de/maplekram/initialVolatility_refined.pdf.

[Wic88] M.J. Wichura. Algorithm AS 241: The Percentage Points of the Normal Distribution. Journal of the Royal Statistical Society. Series C (Applied Statistics), 37(3):477–484, 1988. lib.stat.cmu.edu/apstat/241.
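To make the twice-recursive formulation (8.1) concrete, the following sketch applies a generic Householder(3) step, of convergence order four [Hou70], to the toy objective f(x) = x² − a. This is not the paper's propagation step (5.6), merely an illustration of the two-step scheme, with illustrative function names.

```python
from math import sqrt

def householder3_step(x, f, f1, f2, f3):
    # One Householder(3) step for f(x) = 0, given f and its first three
    # derivatives at x; the update has convergence order four:
    #   x - f*(6*f1^2 - 3*f*f2) / (6*f1^3 - 6*f*f1*f2 + f^2*f3)
    fx, d1, d2, d3 = f(x), f1(x), f2(x), f3(x)
    return x - fx * (6.0 * d1 * d1 - 3.0 * fx * d2) / (
        6.0 * d1 ** 3 - 6.0 * fx * d1 * d2 + fx * fx * d3
    )

# Toy objective: f(x) = x^2 - a, whose positive root is sqrt(a).
a = 2.0
f = lambda x: x * x - a
f1 = lambda x: 2.0 * x
f2 = lambda x: 2.0
f3 = lambda x: 0.0

# Mirroring (8.1): two steps from a rough guess reach machine precision.
x0 = 1.5
x2 = householder3_step(householder3_step(x0, f, f1, f2, f3), f, f1, f2, f3)
```

With the fourth-order step, the error of the rough guess (about 8.6e-2) drops to about 2e-6 after one step and below the last representable digit after the second, which is exactly the economy that makes a fixed two-step scheme viable when the initial guess is good.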