
Numer. Math. (2005) 101: 221–249
DOI 10.1007/s00211-005-0618-1

Numerische Mathematik

Steffen Börm · Lars Grasedyck

Hybrid cross approximation of integral operators

Received: 18 November 2004 / Revised: 16 February 2005 / Published online: 29 June 2005
© Springer-Verlag 2005

Abstract The efficient treatment of dense matrices arising, e.g., from the finite
element discretisation of integral operators requires special compression tech-
niques. In this article we use the H-matrix representation that approximates the
dense stiffness matrix in admissible blocks (corresponding to subdomains where
the underlying kernel function is smooth) by low-rank matrices. The low-rank
matrices are assembled by a new hybrid algorithm (HCA) that has the same proven
convergence as standard interpolation but also the same efficiency as the (heuristic)
adaptive cross approximation (ACA).
Mathematics Subject Classification (2000) 45B05 · 65N38 · 68P05

1 Introduction

The efficient treatment of dense matrices arising, e.g., from the finite element dis-
cretisation of integral operators requires special compression techniques to avoid
the quadratic cost for the assembly and storage.
The standard techniques in this field of research include but are not limited
to panel clustering [19,22], multipole expansions [21,16], interpolation [5] and
(adaptive) cross approximation (ACA) [23,1,2]. If the underlying geometry can
be described by a small number of smooth maps, wavelet techniques can be used
in order to compress the resulting dense matrix [9].
Panel clustering often uses an explicit Taylor series expansion of the kernel
function, which implies that suitable recursion formulae have to be derived ana-
lytically for any given kernel function. This disadvantage can be overcome by the
use of general interpolation formulae. However, the separation rank produced by
these methods is rather large since one neglects the special structure of the kernel
Steffen Börm · Lars Grasedyck (B)
Max-Planck-Institute for Mathematics in the Sciences, Inselstrasse 22–26, 04103 Leipzig,
Germany
E-mail: {lgr,sbo}@mis.mpg.de

function. Multipole expansion, on the other hand, exploits this special structure but
requires an explicit expansion of the kernel. Thus, this method is limited to standard
kernels for which such expansions are known.
The (adaptive) cross approximation is an algebraic method where no expansion
of the kernel is needed. However, a convergence proof exists only for Nyström
discretisations of the single layer potential. Moreover, a simple counterexample
shows that this method may fail for more general kernel functions and geometries
with edges.
Our contribution is a new hybrid method that combines the ACA algorithm
with the interpolation-based separation of the kernel function. For the new method
we are able to rigorously prove convergence, both for single layer and double layer
potentials of asymptotically smooth kernels, as well as for Nyström, collocation or
Galerkin boundary element formulations.
A convenient format to store the arising matrices is the H-matrix format [17,
18, 13,15] which is at the same time useful to construct an efficient preconditioner
or even an accurate inverse.
The rest of this article is organised as follows: in Section 2 we introduce a
simple model problem, describe in short the H-matrix format and summarise two
standard compression methods. In Section 3 we introduce the new hybrid cross
approximation (HCA) algorithm, estimate the approximation error in Section 4
and provide numerical examples that underline the theoretical results in the last
Section 5.

2 The H-matrix format

2.1 Model problem: integral equation

We consider a Fredholm integral operator of the form
$$\mathcal{G}[u](x) = \int_\Omega g(x,y)\,u(y)\,dy \qquad (1)$$
on a submanifold or subdomain Ω of R³ with a kernel function
$$g : \Omega\times\Omega \to \mathbb{R}.$$
The kernel function might be (but is not limited to) the classical single or double
layer kernel for the Laplacian on a manifold Ω:
$$g_{\mathrm{SLP}}(x,y) := \frac{1}{4\pi\|x-y\|}, \qquad
g_{\mathrm{DLP}}(x,y) := \frac{\langle x-y,\, n(y)\rangle}{4\pi\|x-y\|^3} = \frac{\partial g_{\mathrm{SLP}}}{\partial n(y)}(x,y).$$
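For concreteness, the two kernels can be evaluated pointwise as in the following minimal sketch (hypothetical Python/NumPy helpers g_slp and g_dlp, not code from the paper); x and y are points in R³ and n_y is the outer normal at y:

```python
import numpy as np

def g_slp(x, y):
    """Single layer kernel g_SLP(x, y) = 1 / (4*pi*|x - y|)."""
    return 1.0 / (4.0 * np.pi * np.linalg.norm(x - y))

def g_dlp(x, y, n_y):
    """Double layer kernel g_DLP(x, y) = <x - y, n(y)> / (4*pi*|x - y|^3)."""
    d = x - y
    return np.dot(d, n_y) / (4.0 * np.pi * np.linalg.norm(d) ** 3)
```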
H-matrices are based on the fact that the typical kernel functions can be approxi-
mated by local degenerate expansions. We assume that the kernel function g results
from applying a partial differential operator to a sufficiently smooth generator func-
tion
γ : R3 × R3 → R,

i.e., that there are coefficients
$$c^x, c^y : \Omega \to \mathbb{R}^3, \qquad c_0^x, c_0^y : \Omega \to \mathbb{R} \qquad (2)$$
such that
$$g = D_x D_y \gamma \qquad (3)$$
holds for the differential operators
$$D_x := \langle c^x, \nabla_x\rangle + c_0^x = \sum_{i=1}^{3} c_i^x\,\partial_{x_i} + c_0^x, \qquad
D_y := \langle c^y, \nabla_y\rangle + c_0^y = \sum_{j=1}^{3} c_j^y\,\partial_{y_j} + c_0^y.$$

In typical applications, the generator function γ is asymptotically smooth, i.e.,
there exist constants C_as1 and C_as2 and a singularity degree σ ≥ 0 such that for all
α, β ∈ N₀^d the inequality
$$|\partial_x^\alpha\partial_y^\beta \gamma(x,y)| \le C_{as1}\,(C_{as2}\,\|x-y\|)^{-|\alpha+\beta|-\sigma}\,(\alpha+\beta)! \qquad (4)$$
holds. In the case of the classical Laplace operator, the function γ = g_SLP satisfies
this property with C_as1 = C_as2 = 1, and setting c^x(x) = 0, c^y(y) = n(y),
c_0^x(x) = 1 and c_0^y(y) = 0 yields g_DLP = D_x D_y γ.
A standard Galerkin discretisation of 𝒢 for a basis (ϕ_i)_{i∈I}, I = {1, . . . , n},
requires the computation of the stiffness matrix G ∈ R^{n×n} given by
$$G_{ij} := \int_\Omega\int_\Omega \varphi_i(x)\,g(x,y)\,\varphi_j(y)\,dy\,dx. \qquad (5)$$
Since the support of the kernel g is in general not local, one expects a dense
matrix G. The algorithmic complexity for computing and storing a dense matrix
is quadratic in the number of degrees of freedom, so for large problem dimensions
data-sparse representations or approximations have to be used.

2.2 Low-rank approximation

Hierarchical matrices [17,18,6,15,5] are the algebraic counterpart of the local


degenerate approximations used in panel clustering and multipole techniques. On
the discrete level, replacing a function by a degenerate expansion means that blocks
of the matrix are approximated by low-rank matrices.
Definition 1 (R(k)-Matrix Format) Let X ∈ R^{n×m} and k ∈ N such that the rank of X
is bounded by k. An R(k)-matrix representation of X is a factorisation of the form
$$X = AB^T$$
with matrices A ∈ R^{n×k}, B ∈ R^{m×k}.



Fig. 1 The block t × s ⊂ I × I corresponds to a subset Ω_t × Ω_s of Ω × Ω

The storage requirements for an R(k)-matrix are k(n + m) instead of the


quadratic cost nm for standard full matrices. Since the rank k is expected to
be k ≈ log(n) this is a data-sparse representation of the matrix X. Moreover,
the matrix-vector multiplication w := Xv can be split into two multiplications
x := B T v, w := Ax so that only 2k(n + m) − k − n additions and multiplications
of real numbers are necessary to perform the exact matrix-vector multiplication.
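The effect of the factorised representation on the matrix-vector product can be illustrated by a small sketch (hypothetical Python/NumPy helper rk_matvec, not part of the paper or of the HLib package): the product w := Xv is split into x := B^T v and w := Ax, so the dense block is never formed.

```python
import numpy as np

def rk_matvec(A, B, v):
    """Multiply X = A @ B.T with a vector v without ever forming X.

    A: (n, k), B: (m, k) are the R(k) factors; v has length m.
    The cost is O(k (n + m)) instead of O(n m) for the dense product.
    """
    x = B.T @ v      # k inner products of length m
    return A @ x     # n linear combinations of k terms

# Toy usage: a rank-3 factorisation of a 1000 x 800 block.
rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 3))
B = rng.standard_normal((800, 3))
v = rng.standard_normal(800)
assert np.allclose(rk_matvec(A, B, v), (A @ B.T) @ v)
```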

2.2.1 Low-rank approximation by interpolation

Let us now investigate how a low-rank approximation of the matrix G (cf. (5)) can
be constructed. Let t × s ⊂ I × I. We set
$$\Omega_t := \bigcup_{i\in t}\mathrm{supp}(\varphi_i), \qquad \Omega_s := \bigcup_{j\in s}\mathrm{supp}(\varphi_j)$$
and fix axially parallel boxes B_t and B_s such that Ω_t ⊆ B_t and Ω_s ⊆ B_s hold. We
require that the strong η-admissibility condition
$$\max\{\mathrm{diam}(B_t), \mathrm{diam}(B_s)\} \le \eta\,\mathrm{dist}(B_t, B_s) \qquad (6)$$
holds (cf. Figure 1). A low-rank approximation of the corresponding matrix block
G|t×s can be computed by interpolation: let γ̃ be the polynomial constructed by
m-th order tensor-product Chebyshev interpolation of γ .
Let M := (m + 1)³. Since γ̃ has been constructed by tensor-product interpolation,
there are interpolation points (x_ν^t)_{ν=1}^M in B_t and (x_µ^s)_{µ=1}^M in B_s with
corresponding Lagrange polynomials (L_ν^t)_{ν=1}^M and (L_µ^s)_{µ=1}^M such that
$$\tilde\gamma(x,y) = \sum_{\nu=1}^{M}\sum_{\mu=1}^{M}\gamma(x_\nu^t, x_\mu^s)\,L_\nu^t(x)\,L_\mu^s(y) \qquad (7)$$

holds. By applying D_x and D_y to γ̃, we can construct an approximation
$$\tilde g(x,y) := D_x D_y\tilde\gamma(x,y) = \sum_{\nu=1}^{M}\sum_{\mu=1}^{M}\gamma(x_\nu^t, x_\mu^s)\,(D_x L_\nu^t)(x)\,(D_y L_\mu^s)(y) \qquad (8)$$

of g, and replacing g by g̃ in (5) yields an approximation G ≈ G̃ with entries
$$\tilde G_{ij} := \int_\Omega\int_\Omega \varphi_i(x)\,\tilde g(x,y)\,\varphi_j(y)\,dy\,dx
= \sum_{\nu=1}^{M}\sum_{\mu=1}^{M}\gamma(x_\nu^t, x_\mu^s)\int_\Omega\varphi_i(x)(D_x L_\nu^t)(x)\,dx\int_\Omega\varphi_j(y)(D_y L_\mu^s)(y)\,dy
= (U_t S_{t,s} V_s^T)_{ij} \qquad (9)$$
for i ∈ t, j ∈ s. The matrices U_t ∈ R^{t×M}, V_s ∈ R^{s×M} and S_{t,s} ∈ R^{M×M} are given by
$$(U_t)_{i\nu} := \int_\Omega\varphi_i(x)(D_x L_\nu^t)(x)\,dx, \qquad (V_s)_{j\mu} := \int_\Omega\varphi_j(y)(D_y L_\mu^s)(y)\,dy, \qquad (10)$$
$$(S_{t,s})_{\nu\mu} := \gamma(x_\nu^t, x_\mu^s). \qquad (11)$$
Since the coupling matrix S_{t,s} is of dimension M, the rank of the factorised matrix
U_t S_{t,s} V_s^T is bounded by M, i.e., the matrix block G|_{t×s} can be approximated by
the R(M)-matrices (U_t S_{t,s})V_s^T or U_t(V_s S_{t,s}^T)^T.
We note that Ut depends only on t and Vs depends only on s, while the coupling
matrix St,s depends on t, s and the generator function γ . Therefore, Ut and Vs can
be constructed by standard quadrature algorithms, while St,s contains pointwise
evaluations of the generator function.
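As an illustration of the pointwise nature of S_{t,s}, the following sketch (hypothetical helpers cheb_points, tensor_points and coupling_matrix, assuming axis-parallel boxes given by lower and upper corner vectors and an m-th order tensor Chebyshev rule) evaluates a generator function gamma at the tensor interpolation points of two boxes, cf. (11); the quadrature-based factors U_t and V_s are not computed here.

```python
import numpy as np
from itertools import product

def cheb_points(a, b, m):
    """m+1 Chebyshev points transformed from [-1, 1] to the interval [a, b]."""
    nu = np.arange(m + 1)
    xi = np.cos(np.pi * (2 * nu + 1) / (2 * m + 2))
    return 0.5 * (a + b) + 0.5 * (b - a) * xi

def tensor_points(box, m):
    """All M = (m+1)^3 tensor interpolation points of an axis-parallel box.

    box = (lower, upper) with lower, upper arrays of length 3.
    """
    lo, up = box
    axes = [cheb_points(lo[i], up[i], m) for i in range(3)]
    return np.array(list(product(*axes)))          # shape (M, 3)

def coupling_matrix(gamma, box_t, box_s, m):
    """Coupling matrix (S_{t,s})_{nu,mu} = gamma(x_nu^t, x_mu^s), cf. (11)."""
    xt = tensor_points(box_t, m)
    xs = tensor_points(box_s, m)
    return np.array([[gamma(x, y) for y in xs] for x in xt])
```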

2.2.2 Low-rank approximation by ACA

We will now introduce the ACA algorithm for a matrix

Xij := g(xi , yj ), i = 1, . . . , n, j = 1, . . . , m.

The procedure is given in detail in Algorithm 1.
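A dense toy version of Algorithm 1 might look as follows (a hedged sketch only, with the hypothetical helper name aca_partial_pivoting: it assumes the whole block is available as a NumPy array, whereas in practice only the required rows and columns are generated on demand):

```python
import numpy as np

def aca_partial_pivoting(X, eps=1e-4, max_rank=None):
    """Adaptive cross approximation with partial pivoting (cf. Algorithm 1).

    Returns factors A (n x k), B (m x k) with X approximately A @ B.T.
    """
    n, m = X.shape
    max_rank = max_rank if max_rank is not None else min(n, m)
    A, B = [], []
    i_star = 0                                     # initial pivot row
    for _ in range(max_rank):
        # residual row i_star of X - A B^T
        b = X[i_star, :].copy()
        for a_mu, b_mu in zip(A, B):
            b -= a_mu[i_star] * b_mu
        j_star = int(np.argmax(np.abs(b)))
        if b[j_star] == 0.0:                       # row already reproduced exactly
            break
        # residual column j_star, scaled by the pivot entry
        a = X[:, j_star].copy()
        for a_mu, b_mu in zip(A, B):
            a -= b_mu[j_star] * a_mu
        a /= b[j_star]
        A.append(a)
        B.append(b)
        # stopping criterion ||a_k||_2 ||b_k||_2 <= eps ||a_1||_2 ||b_1||_2
        if np.linalg.norm(a) * np.linalg.norm(b) <= \
                eps * np.linalg.norm(A[0]) * np.linalg.norm(B[0]):
            break
        # next pivot row: maximiser of |a_k| different from the current one
        nxt = int(np.argmax(np.abs(a)))
        if nxt == i_star and n > 1:
            masked = np.abs(a).copy()
            masked[i_star] = -1.0
            nxt = int(np.argmax(masked))
        i_star = nxt
    if not A:
        return np.zeros((n, 0)), np.zeros((m, 0))
    return np.column_stack(A), np.column_stack(B)
```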


For matrix entries X_ij that stem from the evaluation of an asymptotically smooth
function, as is the case for the matrix X = S_{t,s} above, [1, Theorem 4] states
$$|X_{ij} - (AB^T)_{ij}| = O\bigl(2^k\,(\eta/2)^{\sqrt[3]{k}}\bigr),$$
where η is the admissibility parameter and k is the rank. In practice, the factor
2^k does not appear and the convergence rate for the single layer kernel seems to
be O((η/2)^{√k}), so we observe that k = log(ε)² is often sufficient for a relative
approximation error of O(ε).
The ACA algorithm with partial pivoting (as it is used in [2] or modified as in
[11]) may fail if the kernel function is not asymptotically smooth with respect to
both variables. In order to prove this we give a simple counterexample.
Example 2 (Counterexample for Partial Pivoting) Let t = t1 ∪ t2 and let
s = s1 ∪ s2 with

Algorithm 1 ACA with partial pivoting

procedure ACA(X, var A, B)
  Choose an initial pivot index i_1^*
  k := 1
  repeat
    Compute the entries of the vector b_k ∈ R^m by
      (b_k)_j := X_{i_k^*, j} − Σ_{µ=1}^{k−1} (a_µ)_{i_k^*} (b_µ)_j
    Determine an index j_k^* that maximises δ := |(b_k)_{j_k^*}|
    Compute the entries of the vector a_k ∈ R^n by
      (a_k)_i := ( X_{i, j_k^*} − Σ_{µ=1}^{k−1} (a_µ)_i (b_µ)_{j_k^*} ) / (b_k)_{j_k^*}
    Determine the next pivot index i_{k+1}^* ≠ i_k^* that maximises δ := |(a_k)_{i_{k+1}^*}|
    k := k + 1
  until ‖a_k‖_2 ‖b_k‖_2 ≤ ε ‖a_1‖_2 ‖b_1‖_2

$$\Omega_{t_1} := [0,1]\times[0,1]\times\{0\}, \qquad \Omega_{t_2} := [0,1]\times\{0\}\times[0,1],$$
$$\Omega_{s_1} := [4,5]\times\{0\}\times[0,1], \qquad \Omega_{s_2} := [4,5]\times[0,1]\times\{0\}.$$
We have diam(Ω_t) = diam(Ω_s) = √3 and dist(Ω_t, Ω_s) = 3, so the domains are
admissible for all η > 1/√3.
Let us fix n, m ∈ N and points (x_i)_{i=1}^{n+m} and (y_j)_{j=1}^{n+m} satisfying
$$x_i \in \begin{cases}\Omega_{t_1} & \text{if } i \le n,\\ \Omega_{t_2} & \text{otherwise},\end{cases} \qquad
y_j \in \begin{cases}\Omega_{s_1} & \text{if } j \le n,\\ \Omega_{s_2} & \text{otherwise},\end{cases}$$

for i, j ∈ {1, . . . , n + m}. Evaluating the double layer potential g_DLP (which is
asymptotically smooth in the x variable, but not in the y variable) in the points x_i
and y_j yields the matrix X ∈ R^{(n+m)×(n+m)} given by X_{ij} := g_DLP(x_i, y_j). X is a
typical block in the Nyström discretisation of the double layer potential operator on
a cube in R³. The domains Ω_{t_1} and Ω_{s_2} as well as Ω_{t_2} and Ω_{s_1} lie in the same plane,
while the outer normal vectors of Ω_{t_1} and Ω_{s_1} and of Ω_{t_2} and Ω_{s_2}, respectively, are
perpendicular. Thus, all entries X_{ij} with i ≤ n and j > n or i > n and j ≤ n vanish,
while the diagonal blocks are non-zero. The matrix X bears the block structure
$$X = \begin{pmatrix} X_{11} & 0\\ 0 & X_{22}\end{pmatrix}, \qquad X_{11}\in\mathbb{R}^{n\times n},\ X_{22}\in\mathbb{R}^{m\times m}.$$

In [2, Algorithm 4.2], the first n row indices fulfil 1 ≤ iν∗ ≤ n for ν ∈ {1, . . . , n}.
Due to the block structure, we can infer 1 ≤ jν∗ ≤ n for all ν ∈ {1, . . . , n}. For

these pivot indices, the row and column vectors vanish outside of the block X|n×n
and the generated approximant AB^T is of the form
$$AB^T = \begin{pmatrix} A_1\\ 0\end{pmatrix}\begin{pmatrix} B_1\\ 0\end{pmatrix}^T
= \begin{pmatrix} A_1 B_1^T & 0\\ 0 & 0\end{pmatrix}$$
with A_1, B_1 ∈ R^{n×n}. As a consequence, the approximation error satisfies
$$\|X - AB^T\|_2 \ge \|X_{22}\|_2,$$
which means that no convergence will occur until the rank exceeds n. In other
words, the partially pivoted ACA algorithm works on the first block X_{11} (where
we can expect it to converge exponentially) but does not recognise the second block
X_{22}. If we chose the initial pivot index i_1^* > n, we would instead approximate the
second block but not the first. In both cases, the error estimator ‖a_µ‖_2 ‖b_µ‖_2 (as it
is proposed in [2]) converges exponentially to zero, thus suggesting good convergence
of the total error, when, in fact, no convergence occurs.
Obviously, similar statements hold for collocation or Galerkin discretisations.
Example 2 implies that asymptotic smoothness of the kernel function in only
one of the two variables does not ensure convergence of the ACA method presented
in [2]. At least for our examples, there are heuristic modifications (cf. Algorithm 4)
that seem to lead to good convergence.
We conclude that ACA works nicely for matrices X of the form Xij = g(xi , yj )
where g is asymptotically smooth with respect to both variables, while it will pos-
sibly fail if this is not the case. Later, we will apply ACA to the matrix St,s from
(11) where these requirements are fulfilled.
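The stagnation described in Example 2 can be reproduced numerically with the sketches above: the following toy setup (hypothetical point choices, uniform grids of collocation points on the four patches, reusing aca_partial_pivoting from the ACA sketch) assembles the block matrix X for the double layer kernel and shows that the relative error remains of order one even though the stopping criterion of the algorithm is satisfied.

```python
import numpy as np

def g_dlp(x, y, n_y):
    """Double layer kernel <x - y, n(y)> / (4*pi*|x - y|^3)."""
    d = x - y
    return np.dot(d, n_y) / (4.0 * np.pi * np.linalg.norm(d) ** 3)

def patch_points(offset, axis_flat, n_side):
    """Uniformly spaced points on a unit square patch embedded in R^3.

    offset: lower corner of the patch; axis_flat: the constant coordinate.
    """
    t = (np.arange(n_side) + 0.5) / n_side
    u, v = np.meshgrid(t, t, indexing="ij")
    pts = np.zeros((n_side * n_side, 3))
    free = [i for i in range(3) if i != axis_flat]
    pts[:, free[0]] = u.ravel()
    pts[:, free[1]] = v.ravel()
    return pts + offset

n_side = 8                                     # 64 points per patch
x_t1 = patch_points(np.array([0.0, 0.0, 0.0]), axis_flat=2, n_side=n_side)
x_t2 = patch_points(np.array([0.0, 0.0, 0.0]), axis_flat=1, n_side=n_side)
y_s1 = patch_points(np.array([4.0, 0.0, 0.0]), axis_flat=1, n_side=n_side)
y_s2 = patch_points(np.array([4.0, 0.0, 0.0]), axis_flat=2, n_side=n_side)

xs = np.vstack([x_t1, x_t2])
ys = np.vstack([y_s1, y_s2])
normals = np.vstack([np.tile([0.0, 1.0, 0.0], (len(y_s1), 1)),   # n(y) on Omega_s1
                     np.tile([0.0, 0.0, 1.0], (len(y_s2), 1))])  # n(y) on Omega_s2

X = np.array([[g_dlp(x, y, n) for y, n in zip(ys, normals)] for x in xs])
# aca_partial_pivoting is the sketch from Section 2.2.2
A, B = aca_partial_pivoting(X, eps=1e-8)
print(np.linalg.norm(X - A @ B.T, 2) / np.linalg.norm(X, 2))
# the relative error stays O(1) until the rank reaches n, cf. Example 2
```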

2.3 Clustering and standard H-matrix compression

The low-rank approximations apply only to matrix blocks t × s ⊂ I × I that


are admissible with respect to (6). Therefore we have to subdivide the matrix into
blocks that are admissible, preferably large blocks in order to compress the matrix
by a maximal factor. The standard way to choose the blocks (testing all blocks
would be too expensive and not lead to a useful partition) is to cluster the index
set I hierarchically in a cluster tree TI and use a canonical construction for the
partition of the matrix.
Notation 3 (Tree) Let T = (V, E) be a tree with vertex set V and edge set E. The
unique vertex v ∈ V with (w, v) ∉ E for all w ∈ V is called the root of T and
denoted by root(T). The levels of the tree are defined inductively by
$$T^0 := \{\mathrm{root}(T)\}, \qquad T^{i+1} := \{w\in V \mid \exists\, v\in T^i : (v,w)\in E\}.$$
The set of successors of a node v ∈ T^i is defined as sons(v) := {w ∈ T^{i+1} |
(v, w) ∈ E}. The set of leaves (nodes with sons(v) = ∅) is denoted by 𝓛(T). The depth of the
tree T is given by
$$p(T) := 1 + \max\{\, i : T^i \ne \emptyset \,\}.$$

We will use the short notation v ∈ T instead of v ∈ V and p instead of p(T ).



Definition 4 (Cluster Tree T_I) A cluster tree T_I for an index set I is a tree with
root root(T_I) = I where each non-leaf vertex fulfils
$$t = \dot{\bigcup}_{s\in\mathrm{sons}(t)} s \qquad\text{and}\qquad t \ne \emptyset.$$
(The accompanying figure shows an example: the root {0, . . . , 7} is split into
{0, . . . , 3} and {4, . . . , 7}, these into {0, 1}, {2, 3}, {4, 5}, {6, 7}, and finally into
the leaves {0}, . . . , {7}.)
For all practical cases the cluster tree TI is a binary tree (each node has exactly
two successors or it is a leaf), the depth p is O(log #I ) and the cardinality of TI
is O(#I ). Next, we want to construct the cluster tree TI with underlying index
set I = {1, . . . , n}. Each index i ∈ I corresponds to one of the basis functions
ϕi ∈ Vn (cf. (5)). The geometric location of the index i is given by the Chebyshev
centre xi of the support of ϕi (the Chebyshev centre is the centre of the smallest ball
containing the support of ϕi ). Instead of the Chebyshev centre one could choose
any point xi from the support of ϕi .
Construction 5 (Geometrically Balanced Clustering) We fix a bound nmin ∈ N
for the size of leaf clusters and construct the cluster tree TI recursively. If a cluster
t and a bounding box B satisfying xi ∈ B for all i ∈ t are given, we intro-
duce new bounding boxes B1 and B2 by splitting B in the coordinate direction
of maximal extent and new son clusters by setting t1 := {i ∈ t : xi ∈ B1 } and
t2 := {i ∈ t : xi ∈ B2 }. If #t1 > nmin or #t2 > nmin , we proceed by recursion.
Construction 5 terminates if and only if the number of points xi with the same
geometric position is less than nmin , and a minimal value of nmin guaranteeing ter-
mination can be determined a priori by an algorithm of complexity O(#I log #I ).
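A compact sketch of Construction 5 (hypothetical helper build_cluster_tree; the bounding box is recomputed from the points of each cluster, which is a common variant of the construction) could be:

```python
import numpy as np

def build_cluster_tree(points, indices=None, n_min=20):
    """Geometrically balanced clustering (cf. Construction 5).

    points: (N, 3) array of the Chebyshev centres x_i; returns a nested dict
    with the index set, the bounding box and the son clusters of each node.
    """
    if indices is None:
        indices = np.arange(len(points))
    box_lo = points[indices].min(axis=0)
    box_up = points[indices].max(axis=0)
    node = {"indices": indices, "box": (box_lo, box_up), "sons": []}
    if len(indices) <= n_min:
        return node                                   # leaf cluster
    axis = int(np.argmax(box_up - box_lo))            # direction of maximal extent
    mid = 0.5 * (box_lo[axis] + box_up[axis])         # split the bounding box
    left = indices[points[indices, axis] <= mid]
    right = indices[points[indices, axis] > mid]
    if len(left) == 0 or len(right) == 0:             # all points coincide on this axis
        return node
    node["sons"] = [build_cluster_tree(points, left, n_min),
                    build_cluster_tree(points, right, n_min)]
    return node
```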
Pairs of clusters (t, s) ∈ TI × TI are candidates for blocks of the matrix,
and among these blocks we can choose those that we want to approximate in the
R(k)-matrix format (cf. Definition 1). The number of all possible pairs is too large,
therefore we test only pairs of clusters on the same level of the tree. This motivates
the definition of a block cluster tree TI ×I that stores the admissible pairs of clusters
in a hierarchical form.
Definition 6 (Block Cluster Tree T_{I×I}) A block cluster tree T_{I×I} based on the
cluster tree T_I is a cluster tree for I × I such that (cf. Figure 2)
$$\forall\, v\in T_{I\times I}^{i}\ \exists\, t, s\in T_I^{i} : v = t\times s.$$

Fig. 2 Depicted are four levels of a block cluster tree based on the cluster tree from Definition
4. On level 2 of the tree there are six leaves (shaded), e.g., the block {0, 1} × {6, 7}

Remark 7 (Cluster Tree yields Partition) The leaves of a cluster tree TI form a
partition of the index set I . As a consequence, the leaves of a block cluster tree
TI ×I yield a partition of the product index set I × I .
If the underlying tree TI is a binary tree, then the block cluster tree TI ×I is
a quad-tree. Based on the cluster tree TI (Construction 5) and the η-admissibility
condition (6), we construct the canonical block cluster tree TI ×I as follows.
Construction 8 (Canonical Block Cluster Tree T_{I×I}) Let the cluster tree T_I be
given. We define the block cluster tree T_{I×I} by root(T) := I × I and, for each
vertex t × s ∈ T, the set of successors by
$$S(t\times s) := \begin{cases}\emptyset & \text{if } \min\{\#t, \#s\}\le n_{\min},\\
\emptyset & \text{if } t\times s \text{ is }\eta\text{-admissible (6)},\\
S(t)\times S(s) & \text{otherwise}.\end{cases}$$

The block cluster tree TI ×I is the basis for the hierarchical matrix format. It
defines the partition of the matrix into sub-matrices that are represented in the
R(k)-matrix format.
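Correspondingly, the admissibility test (6) and Construction 8 can be sketched on top of the cluster-tree structure above (hypothetical helpers; boxes are stored as pairs of corner vectors, and admissible blocks are the ones later filled by low-rank approximation):

```python
import numpy as np

def diam(box):
    lo, up = box
    return float(np.linalg.norm(up - lo))

def dist(box_a, box_b):
    lo_a, up_a = box_a
    lo_b, up_b = box_b
    gap = np.maximum(0.0, np.maximum(lo_a - up_b, lo_b - up_a))
    return float(np.linalg.norm(gap))

def admissible(t, s, eta):
    """Strong eta-admissibility condition (6) for two cluster nodes."""
    return max(diam(t["box"]), diam(s["box"])) <= eta * dist(t["box"], s["box"])

def build_block_tree(t, s, eta=2.0, n_min=20):
    """Canonical block cluster tree (cf. Construction 8)."""
    block = {"row": t, "col": s, "sons": [], "admissible": False}
    if min(len(t["indices"]), len(s["indices"])) <= n_min:
        return block                              # small leaf: full matrix block
    if admissible(t, s, eta):
        block["admissible"] = True                # leaf stored in R(k) format
        return block
    block["sons"] = [build_block_tree(ti, si, eta, n_min)
                     for ti in t["sons"] for si in s["sons"]]
    return block
```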
Definition 9 (H-Matrix) Let T := T_{I×I} be a block cluster tree and k : 𝓛(T) → N₀
a rank distribution. We define the set 𝓗(T, k) of hierarchical matrices (H-matrices)
by
$$\mathcal{H}(T,k) := \bigl\{\, X\in\mathbb{R}^{I\times I} \bigm| \forall\, t\times s\in\mathcal{L}(T): \mathrm{rank}(X|_{t\times s})\le k(t\times s)\,\bigr\}.$$

A matrix X ∈ H(T , k) is said to be given in H-matrix representation, if for all


leaves t × s with #t ≤ nmin or #s ≤ nmin the corresponding matrix block X|t×s is
given in full matrix representation and in R(k)-matrix representation for the other
leaves.
The result of Construction 5 followed by Construction 8 is a block cluster
tree TI ×I for which we can estimate the depth and the sparsity Csp defined next.
The sparsity is needed to estimate the storage requirements and complexity of the
matrix-vector multiplication of an H-matrix.
Definition 10 (Sparsity) Let T_{I×I} be a block cluster tree. We define the sparsity
constant C_sp of T_{I×I} by
$$C_{sp} := \max_{t\in T_I}\,\#\{\, s\in T_I \mid t\times s\in T_{I\times I}\,\}.$$

So far, we have not imposed any condition on the locality of the supports of
the basis functions ϕi . If all the supports cover the whole domain , then there is
no admissible block, so we have to demand locality of the supports in order to be
able to apply our technique.
Assumption 11 (Locality) We assume that the supports are locally separated in
the sense that there exist two constants C_sep and n_min such that
$$\max_{i\in I}\,\#\{\, j\in I \mid \mathrm{dist}(\mathrm{supp}\,\varphi_i, \mathrm{supp}\,\varphi_j)\le C_{sep}^{-1}\,\mathrm{diam}(\mathrm{supp}\,\varphi_i)\,\}\le n_{\min}. \qquad (12)$$
The left-hand side is the maximal number of basis functions with 'relatively close'
supports (see Figure 3). Note that we will apply Construction 5 with a parameter
n_min that satisfies (12).

Fig. 3 The triangle Ω_i := supp ϕ_i under consideration is dark grey. The area within a distance of
C_sep^{-1} diam(Ω_i) is light grey (C_sep := 4√2). Here, 15 triangles (including Ω_i) are 'rather close' to Ω_i

Based upon Assumption 11 the sparsity constant Csp as well as the depth p of
the block cluster tree TI ×I is estimated in [15, Lemma 4.5]. It should be noted that
Csp = O(η−3 ) and thus the choice of η in the admissibility condition (6) enters
the estimate in a critical way.
The estimates for the storage requirements and the complexity of the matrix-
vector multiplication of an H-matrix depend only on the cardinality of I and the
depth and sparsity of the block cluster tree T :
Lemma 12 (Storage) Let T be a block cluster tree based on the cluster tree TI
with sparsity constant Csp (Definition 10) and minimal block size nmin . Then the
storage requirements NH,St (T , k) for an H-matrix X ∈ H(T , k) are bounded by

NH,St (T , k) ≤ 2(1 + depth(T ))Csp max{k, nmin }#I.

Proof [15, Lemma 2.4] 




Lemma 13 (Matrix-Vector Product) Let T be a block cluster tree. The complexity


NH·v (T , k) of the matrix-vector product in the set of H-matrices can be bounded
from above and below by

NH,St (T , k) ≤ NH·v (T , k) ≤ 2NH,St (T , k).

2.4 Numerical example: ACA versus interpolation

In order to compare the algebraic method ACA and the analytic tensor interpolation
we apply the two methods to two different geometries , first the unit sphere and
second the unit cube. We consider the model problem from Section 2.1 with the
kernel function g DLP of the double layer potential. The two domains are discretised
with ncube = 30000 and nsphere = 20000 panels and basis functions ϕi that are
constant on each panel. The minimal block size is set to nmin := 20 and the admis-
sibility parameter is η := 2. The parameter ε in the stopping criterion for ACA is
chosen as 10−j for j = 2, . . . , 5.

Table 1 Comparison of ACA and tensor interpolation on the unit sphere and the unit cube (with-
out further recompression)

Sphere Cube
Time Strg. Rel. Err. Time Strg. Rel. Err.
ACA, ε = 10−2 238 9.2 3.3×10−3 474 8.1 4.7×10−2
ACA, ε = 10−3 281 11.2 7.5×10−4 553 10.7 4.4×10−2
ACA, ε = 10−4 350 14.2 3.3×10−5 658 13.9 4.2×10−2
ACA, ε = 10−5 419 17.1 3.5×10−6 771 17.2 4.2×10−2
Interpol., m = 2 267 44.1 1.3×10−2 282 39.1 6.5×10−2
Interpol., m = 3 561 82.3 6.0×10−3 568 74.8 5.2×10−3
Interpol., m = 4 1183 124.5 5.2×10−4 1164 127.6 8.1×10−4
Interpol., m = 5 2426 175.9 7.0×10−5 2082 178.8 5.0×10−5

The results in Table 1 show that ACA is advantageous for the sphere (lower
storage requirements and shorter setup time), but also that the method fails to
reduce the relative inversion error below 10^{-2} in the case of the unit cube (compare
Example 2). The column labeled “Time” gives the time in seconds for constructing
the approximation, the column “Strg.” gives the storage required for the resulting
H-matrix in kilobytes per degree of freedom (keep in mind that this amount will
be reduced by coarsening strategies) and the column “Rel. Err.” gives the relative
error ‖I − G̃^{-1}G‖_2 approximated using a power iteration.
It should be noted that the storage requirements are not an immediate difficulty,
since a simple algebraic recompression [14] eliminates the unnecessary singular
vectors and coarsens the block structure automatically. However, the initial rank
generated by eitherACA or tensor interpolation enters the complexity quadratically,
so that the time for the coarsening will be increased. For all computed examples
the time for the coarsening is less than for the assembly.
The results in Table 1 were obtained by the HLib package [4] on one 900 MHz
UltraSPARC IIIcu processor in a SunFire 6800 computer.
In the following section we present a combined method, the hybrid cross
approximation (HCA) technique, that inherits the provable approximation property
from the tensor interpolation while keeping the computational complexity close to
that of the ACA heuristic.

3 Hybrid cross approximation

3.1 First approach: Lagrange polynomials

Since the ACA algorithm works well for pointwise evaluations of an asymptotically
smooth kernel function, we will apply it to the coupling matrix St,s from (11). The
matrices Ut and Vs will neither be computed nor stored, instead we exploit the fact
that they are nested.
Let t be a cluster with a son t'. Since we use interpolation of constant order,
we have
$$\mathrm{span}\{L_\nu^t : \nu\in\{1,\ldots,M\}\} = \mathrm{span}\{L_{\nu'}^{t'} : \nu'\in\{1,\ldots,M\}\},$$

Algorithm 2 HCA(I)

procedure HCA1(S_{t,s}, var A, B)
  Call ACA(S_{t,s}, Â, B̂) with Â, B̂ ∈ R^{M×k} so that
    ‖S_{t,s} − ÂB̂^T‖_2 ≤ ε ‖S_{t,s}‖_2
  Multiply A := U_t Â and B := V_s B̂ using the backward transformation.

i.e., the different Lagrange bases span the same polynomial space. This implies
$$L_\nu^t = \sum_{\nu'=1}^{M} L_\nu^t(x_{\nu'}^{t'})\,L_{\nu'}^{t'}.$$
We collect the coefficients of the basis transformation in a transfer matrix T_{t'}
defined by
$$(T_{t'})_{\nu'\nu} := L_\nu^t(x_{\nu'}^{t'}).$$
For i ∈ t', we find
$$(U_t)_{i\nu} = \int_{\Omega_t}\varphi_i(x)\,L_\nu^t(x)\,dx
= \sum_{\nu'=1}^{M}(T_{t'})_{\nu'\nu}\int_{\Omega_t}\varphi_i(x)\,L_{\nu'}^{t'}(x)\,dx
= (U_{t'} T_{t'})_{i\nu}.$$

Obviously, the same relationship holds for the matrices Vs , so we have to compute
Ut and Vs only for leaves of the cluster tree and can construct all remaining matrices
by using the transfer matrices.
We are now able to formulate the hybrid cross approximation based on tensor
interpolation. This first variant HCA(I) (cf. Algorithm 2) is closer to interpolation
than ACA, in particular one could replace ACA by any rank revealing scheme.
In order to set up the matrices U_t and V_s, we have to compute (#t + #s)M
integrals of the form ∫_Ω ϕ_i(x) L_ν^t(x) dx. Since we use nested cluster bases, we have
to compute these integrals only for the leaves of the cluster tree, and since the
leaves form a partition of I, we only have to evaluate 2M#I integrals.
A further orthogonalisation [3] of the cluster basis can be done on-the-fly and
reduces the storage requirements for the basis. The multiplications U_t Â and V_s B̂ in
the nested structure can be performed efficiently by using the H²-matrix backward
transformation [7, 8]. Instead of the ACA algorithm with complexity O(k²M) one
could as well compute an SVD of S_{t,s} in complexity O(M³).
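For a single admissible block, HCA(I) therefore reduces to a few lines; the following hedged sketch (hypothetical helper hca1_block) reuses coupling_matrix and aca_partial_pivoting from the earlier sketches and assumes that the quadrature-based bases U_t and V_s of (10) are already available as dense arrays (the nested transfer-matrix evaluation and the H²-backward transformation are not shown):

```python
import numpy as np

def hca1_block(gamma, box_t, box_s, U_t, V_s, m=3, eps=1e-4):
    """HCA(I) for one admissible block (cf. Algorithm 2).

    gamma : generator function; box_t, box_s : admissible bounding boxes;
    U_t (#t x M), V_s (#s x M) : discretised Lagrange bases from (10).
    Returns A (#t x k), B (#s x k) with G|_{t x s} approximately A @ B.T.
    """
    S = coupling_matrix(gamma, box_t, box_s, m)      # (11), M x M
    A_hat, B_hat = aca_partial_pivoting(S, eps=eps)  # rank-revealing step
    return U_t @ A_hat, V_s @ B_hat                  # backward transformation
```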

3.2 Numerical example for HCA(I)

We consider the model problems from Section 2.4, namely the Galerkin discreti-
sation of the double layer potential operator on the unit sphere and unit cube by
piecewise constant basis functions on a quasi-uniform grid.

Table 2 Comparison of ACA and HCA(I)

Sphere Cube
ε m Time Strg. Rel. Err. Time Strg. Rel. Err.
ACA 10−2 238 9.2 3.3×10−3 474 8.1 4.7×10−2
ACA 10−3 281 11.2 7.5×10−4 553 10.7 4.4×10−2
ACA 10−4 350 14.2 3.3×10−5 658 13.9 4.2×10−2
ACA 10−5 419 17.1 3.5×10−6 771 17.2 4.2×10−2
HCA(I) 10−3 2 205 16.2 1.6×10−2 229 13.3 6.3×10−2
HCA(I) 10−4 3 338 21.6 3.3×10−3 372 18.0 1.8×10−2
HCA(I) 10−5 4 619 27.5 3.5×10−4 609 25.5 3.1×10−3
HCA(I) 10−6 5 1135 33.9 1.4×10−4 1170 29.4 2.7×10−4

Table 2 contains the results of a comparison between the ACA heuristic and
the interpolation-based HCA(I) algorithm. The column labeled “Time” gives the
time in seconds for constructing the approximation, the column “Strg.” gives the
storage required for the resulting H-matrix in kilobytes per degree of freedom
(keep in mind that this amount will be reduced by coarsening strategies) and the
column “Rel. Err.” gives the relative error ‖I − G̃^{-1}G‖_2 approximated using a
power iteration.
We can see that HCA(I) requires more time than the ACA heuristic, but also
that it converges on the unit cube while ACA fails.

3.3 Second approach: cross approximation

The idea for our second approach is to approximate the generator function γ(x, y)
in an admissible bounding box B_t × B_s by some functional skeleton
$$\tilde\gamma_1(x,y) = \gamma(x, y_{j_1})\,\gamma(x_{i_1}, y)\,/\,\gamma(x_{i_1}, y_{j_1}),$$
where x_{i_1} and y_{j_1} are appropriate interpolation points from an m-th order interpolation
scheme in B_t × B_s. The pivot elements i_1 and j_1 coincide with those from
an ACA approximation of S_{t,s}, because a rank 1 ACA approximation is just the
evaluation of γ̃_1 in (x_i, y_j)_{i,j=1}^M.
Successively, we approximate the remainder γ − Σ_{ℓ=1}^{k−1} γ̃_ℓ in the same way and
obtain in the end an approximation of the form
$$\tilde\gamma(x,y) := \sum_{\ell=1}^{k}\Bigl(\sum_{q=1}^{\ell}\gamma(x,y_{j_q})\,C_{\ell,q}\Bigr)\Bigl(\sum_{q=1}^{\ell}\gamma(x_{i_q},y)\,D_{\ell,q}\Bigr),$$
where the C_{ℓ,q}, D_{ℓ,q} are given by recursion formulae. The final degenerate kernel is
defined by
$$\tilde g := D_x D_y\tilde\gamma
= \sum_{\ell=1}^{k}\Bigl(\sum_{q=1}^{\ell} D_x\gamma(x,y_{j_q})\,C_{\ell,q}\Bigr)\Bigl(\sum_{q=1}^{\ell} D_y\gamma(x_{i_q},y)\,D_{\ell,q}\Bigr).$$

Algorithm 3 HCA(II)

procedure HCA2(S_{t,s}, var A, B)
  Call ACA(S_{t,s}, Â, B̂) with Â, B̂ ∈ R^{M×k} so that
    ‖S_{t,s} − ÂB̂^T‖_2 ≤ ε ‖S_{t,s}‖_2
  and store the pivot indices (i_ℓ)_{ℓ=1}^k, (j_ℓ)_{ℓ=1}^k
  Initialise C, D ∈ R^{k×k} and c, d ∈ R^k by zero
  for ℓ = 1, . . . , k do
    for i = 1, . . . , ℓ − 1 do
      d_i := 0, c_i := 0
      for q = 1, . . . , i do
        c_i := c_i + C_{i,q} γ(x_{i_ℓ}, y_{j_q})
        d_i := d_i + D_{i,q} γ(x_{i_q}, y_{j_ℓ})
      end for
    end for
    C_{ℓ,ℓ} := 1/√|(â_ℓ)_{i_ℓ}|,  D_{ℓ,ℓ} := sign((â_ℓ)_{i_ℓ})/√|(â_ℓ)_{i_ℓ}|
    for q = 1, . . . , ℓ − 1 do
      C_{ℓ,q} := 0, D_{ℓ,q} := 0
      for i = q, . . . , ℓ − 1 do
        C_{ℓ,q} := C_{ℓ,q} − C_{i,q} d_i C_{ℓ,ℓ}
        D_{ℓ,q} := D_{ℓ,q} − D_{i,q} c_i D_{ℓ,ℓ}
      end for
    end for
  end for
  Compute the entries of Ũ ∈ R^{t×k} and Ṽ ∈ R^{s×k} by
    Ũ_{iℓ} := ∫_{Ω_t} φ_i(x) D_x γ(x, y_{j_ℓ}) dx,   Ṽ_{jℓ} := ∫_{Ω_s} φ_j(y) D_y γ(x_{i_ℓ}, y) dy
  Multiply A := Ũ C^T and B := Ṽ D^T

 
The twofold integrals ∫_Ω ∫_Ω φ_i(x) g(x, y) φ_j(y) dy dx will now resolve into single
integrals of the form
$$\int_\Omega\varphi_i(x)\,D_x\gamma(x, y_{j_\ell})\,dx, \qquad \int_\Omega\varphi_j(y)\,D_y\gamma(x_{i_\ell}, y)\,dy,$$
and thus the complexity of a q-point quadrature per basis function in the Galerkin
discretisation is reduced from quadratic cost O((#t + #s)kq²) to linear cost O((#t +
#s)kq). For collocation discretisations the integration with respect to one of the
two variables is replaced by a simple evaluation of the kernel.
In order to compute the k² entries of C, D in Algorithm 3, we have to perform
O(k³) arithmetical operations. The total complexity for HCA(II) amounts to
O((#t + #s)k² + Mk²).
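The rank-k result of the recursion is, mathematically, a cross (pseudoskeleton) approximation of γ built from the pivot rows and columns. The sketch below therefore does not reproduce the C, D recursion of Algorithm 3 literally; it evaluates the skeleton form γ̃(x, y) = γ(x, Y_J) γ(X_I, Y_J)^{-1} γ(X_I, y) for given pivot points (hypothetical helper skeleton_approximation), which for the same pivot indices coincides with the ACA approximant in exact arithmetic. Applying D_x to the left factor and D_y to the right factor then yields exactly the single integrals described above.

```python
import numpy as np

def skeleton_approximation(gamma, x_pivots, y_pivots):
    """Cross/skeleton approximation of the generator function gamma.

    Returns a function gamma_tilde(x, y) that agrees with gamma whenever
    x is one of the x_pivots or y is one of the y_pivots (assuming the
    cross matrix is invertible).
    """
    # k x k cross matrix gamma(x_{i_q}, y_{j_p}) built from the pivot points
    cross = np.array([[gamma(xp, yp) for yp in y_pivots] for xp in x_pivots])
    cross_inv = np.linalg.inv(cross)

    def gamma_tilde(x, y):
        row = np.array([gamma(x, yp) for yp in y_pivots])   # gamma(x, Y_J)
        col = np.array([gamma(xp, y) for xp in x_pivots])   # gamma(X_I, y)
        return row @ cross_inv @ col

    return gamma_tilde
```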

3.4 Numerical example for HCA(II)

Again, we consider the model problem from Section 2.4. Table 3 contains the
results of a comparison between the ACA heuristic and the cross-approximation
based HCA(II) algorithm. The column labeled “Time” gives the time in seconds
for constructing the approximation, the column “Strg.” gives the storage required

Table 3 Comparison of ACA and HCA(II)

Sphere Cube
ε m Time Strg. Rel. Err. Time Strg. Rel. Err.
ACA 10−2 238 9.2 3.3×10−3 474 8.1 4.7×10−2
ACA 10−3 281 11.2 7.5×10−4 553 10.7 4.4×10−2
ACA 10−4 350 14.2 3.3×10−5 658 13.9 4.2×10−2
ACA 10−5 419 17.1 3.5×10−6 771 17.2 4.2×10−2
HCA(II) 10−3 2 213 17.1 3.4×10−3 246 14.4 1.5×10−2
HCA(II) 10−4 3 275 24.8 9.5×10−4 326 21.2 2.9×10−3
HCA(II) 10−5 4 368 32.5 9.9×10−5 451 28.2 6.8×10−4
HCA(II) 10−6 5 494 40.5 4.2×10−6 580 36.8 1.5×10−4

for the resulting H-matrix in kilobytes per degree of freedom (keep in mind that
this amount will be reduced by coarsening strategies) and the column “Rel. Err.”
gives the relative error ‖I − G̃^{-1}G‖_2 approximated using a power iteration.
We can see that HCA(II) requires roughly the same time as the ACA heuristic,
but additionally it converges on the unit cube while ACA fails.

4 Analysis

4.1 Interpolation error

Let I_m : C[−1, 1] → P_m for m ∈ N be a stable interpolation operator of order m
satisfying the projection property
$$I_m[p] = p \quad\text{for all } p\in P_m \qquad (13)$$
and the stability property
$$\|I_m[f]\|_\infty \le \Lambda_m\,\|f\|_\infty \quad\text{for all } f\in C[-1,1]. \qquad (14)$$
For a given set (x_{m,ν})_{ν=0}^m of interpolation points, the operator I_m is of the form
$$I_m[f](x) = \sum_{\nu=0}^{m} f(x_{m,\nu})\,L_{m,\nu}(x),$$
where the Lagrange polynomials are given by
$$L_{m,\nu}(x) := \prod_{\mu=0,\ \mu\ne\nu}^{m}\frac{x-x_{m,\mu}}{x_{m,\nu}-x_{m,\mu}}.$$
The Chebyshev interpolation operator is defined by setting x_{m,ν} = cos(π(2ν + 1)/(2m + 2)),
and for this operator the stability constant is bounded by Λ_m ≤
(2/π) ln(m + 1) + 1 ≤ m + 1 (cf. [20]).
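The one-dimensional operator and its stability constant can be checked numerically with a short sketch (hypothetical helpers cheb_nodes, lagrange_basis, chebyshev_interpolant and lebesgue_constant; the estimate of Λ_m is obtained by sampling the unit interval):

```python
import numpy as np

def cheb_nodes(m):
    """Chebyshev interpolation points cos(pi*(2*nu+1)/(2m+2)) on [-1, 1]."""
    nu = np.arange(m + 1)
    return np.cos(np.pi * (2 * nu + 1) / (2 * m + 2))

def lagrange_basis(nodes, nu, x):
    """Lagrange polynomial L_{m,nu} for the given nodes, evaluated at x."""
    others = np.delete(nodes, nu)
    return np.prod((x - others) / (nodes[nu] - others))

def chebyshev_interpolant(f, m):
    """Return the Chebyshev interpolant I_m[f] as a callable on [-1, 1]."""
    nodes = cheb_nodes(m)
    values = f(nodes)
    return lambda x: sum(values[nu] * lagrange_basis(nodes, nu, x)
                         for nu in range(m + 1))

def lebesgue_constant(m, samples=2001):
    """Sampled estimate of the stability constant Lambda_m on [-1, 1]."""
    nodes = cheb_nodes(m)
    xs = np.linspace(-1.0, 1.0, samples)
    return max(sum(abs(lagrange_basis(nodes, nu, x)) for nu in range(m + 1))
               for x in xs)

# e.g. lebesgue_constant(5) is about 2.1, below the bound (2/pi)*log(6) + 1
```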

Lemma 14 (Stability of Derivatives) The estimate
$$\|(I_m[f])'\|_\infty \le \Lambda_m\,m^2\,\|f'\|_\infty$$
holds for all f ∈ C¹[−1, 1].

Proof Let f_0 ≡ f(0). Due to f_0' = 0 and I_m[f_0] = f_0 (cf. (13)), we find
$$\|(I_m[f])'\|_\infty = \|(I_m[f]-f_0)'\|_\infty = \|(I_m[f-f_0])'\|_\infty.$$
Markov's inequality [10, Theorem 4.1.4] implies
$$\|(I_m[f-f_0])'\|_\infty \le m^2\,\|I_m[f-f_0]\|_\infty,$$
and we can use the stability (14) of I_m to find
$$\|I_m[f-f_0]\|_\infty \le \Lambda_m\,\|f-f_0\|_\infty.$$
By the standard error estimate for the zeroth-order Taylor expansion, we get
$$\|f-f_0\|_\infty \le \|f'\|_\infty$$
and can conclude
$$\|(I_m[f])'\|_\infty \le \Lambda_m\,m^2\,\|f'\|_\infty.$$
This is the desired estimate (note that this proof can be generalised for higher
derivatives by replacing f_0 by appropriate Taylor polynomials).


The multi-dimensional case can be treated by tensor arguments. Let d ∈ N
and B := J¹ × . . . × J^d for closed intervals J¹, . . . , J^d of positive length. For
each i ∈ {1, . . . , d}, we fix the (unique) affine monotonous bijective map Φ^i :
[−1, 1] → J^i and define the transformed interpolation operator
$$I_m^i : C(J^i)\to P_m, \qquad f \mapsto (I_m[f\circ\Phi^i])\circ(\Phi^i)^{-1}.$$
Lemma 14 also holds for the operators I_m^i, i ∈ {1, . . . , d}. The tensor interpolation
operator for the box B is given by
$$I_m^B := I_m^1\otimes\ldots\otimes I_m^d$$
and we can prove the following generalisation of the result of [5, Lemma 2.1]:
Lemma 15 (Multidimensional Stability) Let α ∈ {0, 1}^d. We have
$$\|\partial^\alpha I_m^B[f]\|_\infty \le \Lambda_m^d\,m^{2|\alpha|}\,\|\partial^\alpha f\|_\infty$$
for all f ∈ C¹(B).



Proof Let f ∈ C¹(B) and k, ℓ ∈ {1, . . . , d}. We denote the interpolation points
and Lagrange polynomials corresponding to J^ℓ by
$$x_{m,\nu}^\ell := \Phi^\ell(x_{m,\nu}), \qquad L_{m,\nu}^\ell := L_{m,\nu}\circ(\Phi^\ell)^{-1}$$
and define the interpolation operator in the ℓ-th component by
$$I_m^{B,\ell} := I\otimes\ldots\otimes I\otimes I_m^\ell\otimes I\otimes\ldots\otimes I.$$
We observe
$$I_m^{B,\ell}[f](x) = \sum_{\nu=0}^{m} f(x_1,\ldots,x_{\ell-1}, x_{m,\nu}^\ell, x_{\ell+1},\ldots,x_d)\,L_{m,\nu}^\ell(x_\ell).$$
Differentiating and using (14) and Lemma 14 yields
$$\|\partial_k(I_m^{B,\ell}[f])\|_\infty \le \begin{cases}\Lambda_m\,m^2\,\|\partial_k f\|_\infty & \text{if }\ell = k,\\ \Lambda_m\,\|\partial_k f\|_\infty & \text{otherwise.}\end{cases}$$
Due to
$$I_m^B = \prod_{\ell=1}^{d} I_m^{B,\ell},$$
this inequality implies the desired estimate.




In order to prove an error estimate for the derivatives of multi-dimensional


tensor interpolations, we will apply [5, Theorem 3.2] to a suitable derivative of
γ . Since the error estimate only holds for asymptotically smooth functions, we
have to demonstrate that derivatives of asymptotically smooth functions are again
asymptotically smooth.
Lemma 16 (Smoothness of Derivatives) Let α ∈ {0, 1}^d and let f ∈ C^∞(B) such
that there are constants C_0, c_0 ∈ R_{≥1} satisfying
$$\|\partial^\beta f\|_\infty \le C_0\,c_0^{|\beta|}\,\beta!$$
for all β ∈ N₀^d. Let c_1 ∈ R_{>c_0}. There is a constant C ∈ R, depending only on α, d
and c_0/c_1, such that
$$\|\partial^\beta\partial^\alpha f\|_\infty \le C_1\,c_1^{|\beta|}\,\beta!$$
holds with C_1 = C C_0 c_0^{|\alpha|} for all β ∈ N₀^d.

Proof We find
$$\|\partial^\beta\partial^\alpha f\|_\infty = \|\partial^{\alpha+\beta} f\|_\infty
\le C_0\,c_0^{|\alpha|}\,c_0^{|\beta|}\,(\alpha+\beta)!
= C_0\,c_0^{|\alpha|}\Bigl(\prod_{i=1,\,\alpha_i\ne 0}^{d}(\beta_i+1)\Bigr)c_0^{|\beta|}\,\beta!$$
$$\le C_0\,c_0^{|\alpha|}\Bigl(\prod_{i=1,\,\alpha_i\ne 0}^{d}(\beta_i+1)\Bigr)\Bigl(\frac{c_0}{c_1}\Bigr)^{|\beta|}c_1^{|\beta|}\,\beta!.$$
Due to c_1 > c_0, we have c_0/c_1 < 1 and can find a constant C ∈ R_{>0} satisfying
$$\Bigl(\prod_{i=1,\,\alpha_i\ne 0}^{d}(\beta_i+1)\Bigr)\Bigl(\frac{c_0}{c_1}\Bigr)^{|\beta|}\le C$$
for all β ∈ N₀^d. The proof closes by setting C_1 := C C_0 c_0^{|\alpha|} and observing
$$\|\partial^\beta\partial^\alpha f\|_\infty
\le C_0\,c_0^{|\alpha|}\Bigl(\prod_{i=1,\,\alpha_i\ne 0}^{d}(\beta_i+1)\Bigr)\Bigl(\frac{c_0}{c_1}\Bigr)^{|\beta|}c_1^{|\beta|}\,\beta!
\le C_0\,c_0^{|\alpha|}\,C\,c_1^{|\beta|}\,\beta! = C_1\,c_1^{|\beta|}\,\beta!.$$

Now we can proceed to prove an error estimate for the derivatives of multi-dimensional
tensor interpolants by generalising the result of [5, Theorem 3.2]:
Theorem 17 (Approximation of Derivatives) Let α, β ∈ {0, 1}³, let m ≥ 1 and let
c_1 > c_0. There is a polynomial C_apx satisfying
$$\|\partial_x^\alpha\partial_y^\beta(\gamma - I_m^{B_t\times B_s}[\gamma])\|_\infty
\le \frac{C_{apx}(m)\,\Lambda_m^{11}}{(C_{as2}\,\mathrm{dist}(B_t,B_s))^{\sigma+|\alpha+\beta|}}\left(\frac{\eta}{\eta+C_{as2}}\right)^{m}$$
for all m ∈ N and all admissible bounding boxes B_t and B_s.
Proof Let B := B_t × B_s, let d = 6 and let µ := (α, β) ∈ N₀^d. The generator function
γ is asymptotically smooth with
$$C_0 = C_{as1}\,(C_{as2}\,\mathrm{dist}(B_t,B_s))^{-\sigma} \qquad\text{and}\qquad c_0 := C_{as2}^{-1},$$
and this implies γ|_B ∈ C^∞(B). Due to Lemma 16, the function γ' := ∂_x^α∂_y^β γ =
∂^µ γ is also asymptotically smooth with
$$C_1 = C\,(C_{as2}\,\mathrm{dist}(B_t,B_s))^{-\sigma-|\mu|} \qquad\text{and}\qquad c_1 := 2c_0$$
for a constant C that does not depend on B_t, B_s or m. We can apply [5, Theorem
3.2] in order to find a tensor polynomial γ̃' of order m − 1 and a constant C̃ := 8e
satisfying
$$\|\gamma'-\tilde\gamma'\|_\infty \le \tilde C\,(1+c_1\,\mathrm{diam}(B_t\times B_s))\,C_1\,\Lambda_m^{5}\,m
\left(1+\frac{2}{c_1\,\mathrm{diam}(B_t\times B_s)}\right)^{-m}. \qquad (15)$$
Since B_t and B_s are admissible, (6) holds and we find
$$c_1\,\mathrm{diam}(B_t\times B_s)\le c_1\,(\mathrm{diam}(B_t)^2+\mathrm{diam}(B_s)^2)^{1/2}
\le \sqrt2\,c_1\,\max\{\mathrm{diam}(B_t),\mathrm{diam}(B_s)\}
\le 2c_0\,\eta\,\mathrm{dist}(B_t,B_s) = \frac{2\eta\,\mathrm{dist}(B_t,B_s)}{C_{as2}\,\mathrm{dist}(B_t,B_s)} = 2\eta/C_{as2},$$
so the error estimate (15) takes the form
$$\|\gamma'-\tilde\gamma'\|_\infty
\le \frac{C\,\tilde C\,(1+2\eta/C_{as2})\,\Lambda_m^{5}\,m}{(C_{as2}\,\mathrm{dist}(B_t,B_s))^{\sigma+|\mu|}}\left(1+\frac{C_{as2}}{\eta}\right)^{-m}
= \frac{C'(m)\,\Lambda_m^{5}}{(C_{as2}\,\mathrm{dist}(B_t,B_s))^{\sigma+|\mu|}}\left(\frac{\eta}{\eta+C_{as2}}\right)^{m}$$
with C'(m) := C C̃ (1 + 2η/C_as2) m.
Let γ̃ be an antiderivative of γ̃' satisfying
$$\partial_x^\alpha\partial_y^\beta\tilde\gamma = \tilde\gamma'.$$
Since γ̃' is a tensor polynomial of order m − 1, γ̃ is a tensor polynomial of order
m and we can apply Lemma 15 to find
$$\|\partial_x^\alpha\partial_y^\beta\gamma - \partial_x^\alpha\partial_y^\beta I_m^B[\gamma]\|_\infty
\le \|\partial_x^\alpha\partial_y^\beta\gamma - \partial_x^\alpha\partial_y^\beta\tilde\gamma\|_\infty
+ \|\partial_x^\alpha\partial_y^\beta I_m^B[\gamma-\tilde\gamma]\|_\infty
\le (1+\Lambda_m^{6}\,m^{2|\alpha+\beta|})\,\|\partial_x^\alpha\partial_y^\beta(\gamma-\tilde\gamma)\|_\infty$$
$$= (1+\Lambda_m^{6}\,m^{2|\alpha+\beta|})\,\|\gamma'-\tilde\gamma'\|_\infty
\le \frac{C'(m)\,\Lambda_m^{5}\,\Lambda_m^{6}\,(1+m^{2|\alpha+\beta|})}{(C_{as2}\,\mathrm{dist}(B_t,B_s))^{\sigma+|\alpha+\beta|}}\left(\frac{\eta}{\eta+C_{as2}}\right)^{m}.$$
Setting C_apx(m) := C'(m)(1 + m^{2|α+β|}) concludes the proof.

In order to find an error estimate for g̃, we have to bound the norms of the
coefficients. We set
$$\|c_0^x\| := \sup\{|c_0^x(x)| : x\in\Omega\}, \qquad
\|c^x\| := \sup\{|c_1^x(x)|+|c_2^x(x)|+|c_3^x(x)| : x\in\Omega\},$$
$$\|c_0^y\| := \sup\{|c_0^y(y)| : y\in\Omega\}, \qquad
\|c^y\| := \sup\{|c_1^y(y)|+|c_2^y(y)|+|c_3^y(y)| : y\in\Omega\}$$
and get the following error bound:
Corollary 18 (Separable Approximation) There is a polynomial C_g such that
$$\|g-\tilde g\|_{\infty,\Omega_t\times\Omega_s}
\le \frac{C_g(m)\,\Lambda_m^{11}}{(C_{as2}\,\mathrm{dist}(B_t,B_s))^{\sigma}}\left(\frac{\eta}{\eta+C_{as2}}\right)^{m}
\left(\|c_0^x\|+\frac{C_{as2}^{-1}\|c^x\|}{\mathrm{dist}(B_t,B_s)}\right)\left(\|c_0^y\|+\frac{C_{as2}^{-1}\|c^y\|}{\mathrm{dist}(B_t,B_s)}\right)$$
holds for all m ∈ N and all admissible bounding boxes B_t, B_s.

Proof This is a direct consequence of Theorem 17. 



We will now use this estimate to derive error bounds for the discrete matrices.
In order to do so, we fix a constant C_fe ∈ R_{>0} satisfying
$$\Bigl\|\sum_{i\in I} x_i\varphi_i\Bigr\|_{L^2}^2 \le C_{fe}\,h^{d_\Omega}\,\|x\|_2^2 \qquad (16)$$
for all vectors x ∈ R^I, a mesh size h ∈ R_{>0} and the intrinsic dimension d_Ω ∈ N
of Ω.
Lemma 19 (Low-rank Approximation) Let t × s be an admissible block, and let g̃
be an approximation of g. Let C_fe, d_Ω, h be as in (16). Then the factorised matrix
from (9) satisfies
$$\|G|_{t\times s} - U_t S_{t,s} V_s^T\|_2 \le C_{fe}\,|\Omega_t|^{1/2}|\Omega_s|^{1/2}\,h^{d_\Omega}\,\|g-\tilde g\|_{\infty,\Omega_t\times\Omega_s}.$$
Proof Let u ∈ R^t and v ∈ R^s. Let
$$\hat u := \sum_{i\in t} u_i\varphi_i, \qquad \hat v := \sum_{j\in s} v_j\varphi_j.$$
Using the short notation δ := ‖g − g̃‖_{∞,Ω_t×Ω_s} we find
$$|\langle u,\,(G|_{t\times s} - U_t S_{t,s}V_s^T)v\rangle|
= \Bigl|\int_{\Omega_t}\int_{\Omega_s}(g-\tilde g)(x,y)\,\hat u(x)\,\hat v(y)\,dy\,dx\Bigr|
\le \delta\int_{\Omega_t}|\hat u(x)|\,dx\int_{\Omega_s}|\hat v(y)|\,dy$$
$$\le \delta\,|\Omega_t|^{1/2}\|\hat u\|_{L^2}\,|\Omega_s|^{1/2}\|\hat v\|_{L^2}
\le C_{fe}\,\delta\,|\Omega_t|^{1/2}|\Omega_s|^{1/2}\,h^{d_\Omega}\,\|u\|_2\|v\|_2,$$
where the second step uses the Cauchy-Schwarz inequality and the third step uses (16).
Since this estimate holds for arbitrary u and v, it implies the desired upper bound.


Corollary 20 (Chebyshev Interpolation) Let t × s be an admissible block. Let d_Ω, h
be as in (16). If the local interpolants are constructed by Chebyshev interpolation,
the factorised matrix from (9) satisfies
$$\|G|_{t\times s} - U_t S_{t,s}V_s^T\|_2
\le \frac{C_c(m)\,|\Omega_t|^{1/2}|\Omega_s|^{1/2}\,h^{d_\Omega}}{(C_{as2}\,\mathrm{dist}(B_t,B_s))^{\sigma}}\left(\frac{\eta}{\eta+C_{as2}}\right)^{m}
\left(\|c_0^x\|+\frac{C_{as2}^{-1}\|c^x\|}{\mathrm{dist}(B_t,B_s)}\right)\left(\|c_0^y\|+\frac{C_{as2}^{-1}\|c^y\|}{\mathrm{dist}(B_t,B_s)}\right)$$
for a polynomial C_c that does not depend on t, s, m or h.
Proof Combine Lemma 19 with Corollary 18 and the fact that Λ_m ≤ m + 1 holds
for Chebyshev interpolation (cf. [20]).


4.2 HCA based on Lagrange polynomials

In HCA(I), we replace S_{t,s} by a low-rank approximation S̃_{t,s}, so the total error in
an admissible block t × s can be split into the sum
$$\|G|_{t\times s} - U_t\tilde S_{t,s}V_s^T\|_2
\le \|G|_{t\times s} - U_t S_{t,s}V_s^T\|_2 + \|U_t(S_{t,s}-\tilde S_{t,s})V_s^T\|_2
\le \|G|_{t\times s} - U_t S_{t,s}V_s^T\|_2 + \|U_t\|_2\,\|S_{t,s}-\tilde S_{t,s}\|_2\,\|V_s\|_2.$$
The first term is bounded due to Corollary 20, so we only have to consider the second.
Here, the error ‖S_{t,s} − S̃_{t,s}‖_2 introduced by the ACA algorithm is scaled by
the norms ‖U_t‖_2 and ‖V_s‖_2 of the matrices corresponding to discretised Lagrange polynomials.
The ACA error can be controlled directly, so we have a complete error estimate
once we can bound the latter two norms.
Lemma 21 (Bound of U_t) Let C_fe, d_Ω, h be as in (16). Then the matrix U_t from
(10) satisfies
$$\|U_t\|_2 \le C_{fe}^{1/2}\,\Lambda_m^{3}\,|\Omega_t|^{1/2}\,h^{d_\Omega/2}\left(\|c_0^x\|+\frac{2m^2}{\mathrm{diam}(B_t)}\|c^x\|\right).$$

Proof Let u ∈ R^t and v ∈ R^M. We set
$$\hat u := \sum_{i\in t} u_i\varphi_i, \qquad \hat v := \sum_{\nu=1}^{M} v_\nu L_\nu^t$$
and find
$$|\langle u, U_t v\rangle| = \Bigl|\int_{\Omega_t}\hat u(x)\,(D_x\hat v)(x)\,dx\Bigr|
\le \|D_x\hat v\|_\infty\,|\Omega_t|^{1/2}\,\|\hat u\|_{L^2}
\le C_{fe}^{1/2}\,|\Omega_t|^{1/2}\,h^{d_\Omega/2}\,\|D_x\hat v\|_\infty\,\|u\|_2,$$
where the last step uses (16). We have to bound
$$\|D_x\hat v\|_\infty \le \|c_0^x\|\,\|\hat v\|_\infty + \sum_{j=1}^{3}\|c_j^x\|_\infty\,\|\partial_j\hat v\|_\infty.$$
For the first term, we use the stability (14) of the interpolation in order to find
$$\|\hat v\|_\infty \le \Lambda_m^{3}\,\|v\|_\infty \le \Lambda_m^{3}\,\|v\|_2.$$
For the remaining terms, we use Markov's inequality [10, Theorem 4.1.4] to find
$$\|\partial_i\hat v\|_\infty \le \frac{2m^2}{\mathrm{diam}(B_t)}\,\|\hat v\|_\infty$$
and then use the previous estimate. We end up with the bound
$$|\langle u, U_t v\rangle| \le C_{fe}^{1/2}\,\Lambda_m^{3}\,|\Omega_t|^{1/2}\,h^{d_\Omega/2}\left(\|c_0^x\|+\frac{2m^2}{\mathrm{diam}(B_t)}\|c^x\|\right)\|v\|_2\,\|u\|_2,$$
and since this holds for all u and v, the proof is complete.

Obviously this lemma can also be used to find an upper bound for the norm
‖V_s‖_2. The difference ε_ACA := ‖S_{t,s} − S̃_{t,s}‖ is due to the ACA approximation
(see Assumption 22) but any rank revealing scheme could be used, even a singular
value decomposition of the M × M matrix S_{t,s}.

4.3 HCA based on cross approximation

In HCA(II) we replace the kernel function g(x, y) = D_x D_y γ(x, y) by the approximant
g̃ := D_x D_y γ̃(x, y) with
$$\tilde\gamma(x,y) := \sum_{\ell=1}^{k}\Bigl(\sum_{q=1}^{\ell}\gamma(x,y_{j_q})\,C_{\ell,q}\Bigr)\Bigl(\sum_{q=1}^{\ell}\gamma(x_{i_q},y)\,D_{\ell,q}\Bigr). \qquad (17)$$
The coefficient matrices C, D above are given by the following recursion formulae
(see Algorithm 3):
$$d_\iota^{(\ell)} := \sum_{q=1}^{\iota}\gamma(x_{i_q}, y_{j_\ell})\,D_{\iota,q}, \qquad
c_\iota^{(\ell)} := \sum_{q=1}^{\iota}\gamma(x_{i_\ell}, y_{j_q})\,C_{\iota,q},$$
$$C_{\ell,q} := \begin{cases} |(\hat a_\ell)_{i_\ell}|^{-1/2} & q = \ell,\\
-C_{\ell,\ell}\sum_{\iota=q}^{\ell-1} C_{\iota,q}\,d_\iota^{(\ell)} & q < \ell,\end{cases} \qquad (18)$$
$$D_{\ell,q} := \begin{cases} \mathrm{sign}((\hat a_\ell)_{i_\ell})\,|(\hat a_\ell)_{i_\ell}|^{-1/2} & q = \ell,\\
-D_{\ell,\ell}\sum_{\iota=q}^{\ell-1} D_{\iota,q}\,c_\iota^{(\ell)} & q < \ell.\end{cases} \qquad (19)$$

In order to estimate the approximation error |g − g̃| we consider the error first
in the interpolation points for the generator function γ. Since we use the adaptive
cross approximation from [1] and only the pessimistic estimate
$$\|S_{t,s} - \hat A\hat B^T\|_2 = O\bigl(2^k\,\eta^{\sqrt[3]{k}}\bigr)$$
has been proven, we have to check convergence and stability. The existence of a
low-rank cross approximation is proven in [12], which gives rise to the hope that a
convergent and stable cross approximation algorithm can be found. Until then, we
have to check this a posteriori.

Assumption 22 We assume that the coefficients for the ACA approximation are
stable in the sense
$$|C_{\ell,q}|,\ |D_{\ell,q}| \le C_\gamma,$$
where the stability constant may depend on γ, and we assume that the estimate
$$|(S_{t,s} - \hat A\hat B^T)_{ij}| \le \varepsilon_{ACA}$$
holds for all i ∈ t, j ∈ s and a prescribed accuracy ε_ACA.


If the previous Assumption 22 is not fulfilled (it is a pointwise estimate for M²
pairs of interpolation points and can be tested in O(M²k)), then we use the standard
tensor interpolation in the block t × s, i.e., the interpolant of γ with separation
rank k = M instead of γ̃ . Therefore, even if ACA fails to converge for the matrix
St,s — which we don’t expect — we still have a reliable approximation scheme.
Theorem 23 (Approximation Error in Interpolation Points) Under Assumption 22
the function γ̃ from (17) with coefficients C_{ℓ,q} and D_{ℓ,q} defined in (18) and (19)
satisfies
$$|\gamma(x,y)-\tilde\gamma(x,y)| \le \varepsilon_{ACA} \qquad (20)$$
for all interpolation points x = x_1, . . . , x_M, y = y_1, . . . , y_M.
Proof We apply ACA [1, Section 2] to the function γ(x, y). In the first step we
produce the rank 1 function
$$\gamma_1(x,y) := \gamma(x,y_{j_1})\,\gamma(x_{i_1},y)\,\gamma(x_{i_1},y_{j_1})^{-1}.$$
The ℓ-th approximation γ_ℓ(x, y), ℓ = 1, . . . , k, will be of the form
$$\gamma_\ell(x,y) = \sum_{\iota=1}^{\ell}\Bigl(\sum_{q=1}^{\iota}\gamma(x,y_{j_q})\,C_{\iota,q}\Bigr)\Bigl(\sum_{q=1}^{\iota}\gamma(x_{i_q},y)\,D_{\iota,q}\Bigr), \qquad (21)$$
which is true for the first term with
$$C_{11} = |\sigma_1|^{-1/2}, \qquad D_{11} = \mathrm{sign}(\sigma_1)\,|\sigma_1|^{-1/2}, \qquad \sigma_1 := \gamma(x_{i_1},y_{j_1}) = (\hat a_1)_{i_1}.$$
For the induction step from ℓ to ℓ + 1 we apply the ACA step
$$\gamma_{\ell+1}(x,y) = \gamma_\ell(x,y)
+ \frac{\bigl(\gamma(x,y_{j_{\ell+1}})-\gamma_\ell(x,y_{j_{\ell+1}})\bigr)\bigl(\gamma(x_{i_{\ell+1}},y)-\gamma_\ell(x_{i_{\ell+1}},y)\bigr)}{\gamma(x_{i_{\ell+1}},y_{j_{\ell+1}})-\gamma_\ell(x_{i_{\ell+1}},y_{j_{\ell+1}})}.$$
Inserting the ansatz (21) and comparing the coefficients of the functions γ(x, y_{j_q})
and γ(x_{i_q}, y) yields the desired formulae for C_{ι,q} and D_{ι,q}. The approximation error
is estimated in [1, Theorem 4] for all interpolation points x_i, y_j, i, j = 1, . . . , M,
and due to Assumption 22
$$|\gamma(x_i,y_j)-\tilde\gamma(x_i,y_j)| = |(S_{t,s}-\hat A\hat B^T)_{ij}| \le \varepsilon_{ACA}.$$





The previous theorem proves the convergence of γ̃ → γ in the interpolation
points. In order to estimate the approximation error for arbitrary points x, y, we
have to relate γ̃ and γ to their respective interpolants. For γ we have required the
asymptotic smoothness, while γ̃ inherits this property from γ.
Lemma 24 (Smoothness of γ̃) Let γ be asymptotically smooth with constants
C_as1, C_as2. The function γ̃ is asymptotically smooth with constants
$$C_{as1}(\tilde\gamma) := k^3\,C_\gamma^2\,C_{as1}^2\,(C_{as2}(1+\eta)^{-1}\,\mathrm{dist}(B_t,B_s))^{-\sigma}, \qquad
C_{as2}(\tilde\gamma) := (1+\eta)^{-1}\,C_{as2}.$$
Proof Let x ∈ B_t, y ∈ B_s. Then the (α, β)-derivative of γ̃ is estimated by
$$|\partial_x^\alpha\partial_y^\beta\tilde\gamma(x,y)|
\le \sum_{\ell=1}^{k}\Bigl(\sum_{q=1}^{\ell}|\partial_x^\alpha\gamma(x,y_{j_q})|\,|C_{\ell,q}|\Bigr)\Bigl(\sum_{q=1}^{\ell}|\partial_y^\beta\gamma(x_{i_q},y)|\,|D_{\ell,q}|\Bigr)
\le k^3\,C_\gamma^2\,\|\partial_x^\alpha\gamma(x,\cdot)\|_{\infty,B_s}\,\|\partial_y^\beta\gamma(\cdot,y)\|_{\infty,B_t}.$$
Due to the admissibility condition we conclude ‖x − y‖ ≤ ‖x − ỹ‖ + diam(B_s) ≤
‖x − ỹ‖ + η‖x − ỹ‖ and thus
$$\|\partial_x^\alpha\gamma(x,\cdot)\|_{\infty,B_s} \le C_{as1}\,(C_{as2}(1+\eta)^{-1}\|x-y\|)^{-|\alpha|-\sigma}\,\alpha!,$$
and analogously for the ∂_y^β derivative. Both together yield
$$|\partial_x^\alpha\partial_y^\beta\tilde\gamma(x,y)| \le k^3\,C_\gamma^2\,C_{as1}^2\,(C_{as2}(1+\eta)^{-1}\|x-y\|)^{-|\alpha|-|\beta|-2\sigma}\,\alpha!\,\beta!.$$


Corollary 25 (Approximation Error) Under Assumption 22 the function γ̃ with
coefficients C_{ℓ,q}, D_{ℓ,q} defined in (18) and (19) satisfies
$$\|\gamma-\tilde\gamma\|_{\infty,B_t\times B_s} \le \Lambda_m^{2d}\,\varepsilon_{ACA} + 2\varepsilon_{int}, \qquad (22)$$
where ε_int is the interpolation error that is estimated separately in Theorem 17.
Proof We apply the tensor interpolation I_m^{B_t×B_s} to the asymptotically smooth
functions γ and γ̃ and define γ_int := I_m^{B_t×B_s}[γ], γ̃_int := I_m^{B_t×B_s}[γ̃]. Both interpolants
fulfil the estimate
$$\|\gamma-\gamma_{int}\|_{\infty,B_t\times B_s}\le\varepsilon_{int}, \qquad \|\tilde\gamma-\tilde\gamma_{int}\|_{\infty,B_t\times B_s}\le\varepsilon_{int}$$
with an accuracy ε_int estimated in Theorem 17. Since both interpolants use the
same interpolation points and the difference between γ and γ̃ in the interpolation
points is bounded in Theorem 23 by ε_ACA, the stability of the interpolation scheme
yields
$$\|\gamma_{int}-\tilde\gamma_{int}\|_{\infty,B_t\times B_s}\le \Lambda_m^{2d}\,\varepsilon_{ACA}.$$



Theorem 26 (Separable Approximation) Under Assumption 22 the function
$$\tilde g(x,y) := \sum_{\ell=1}^{k}\Bigl(\sum_{q=1}^{\ell} g(x,y_{j_q})\,C_{\ell,q}\Bigr)\Bigl(\sum_{q=1}^{\ell} g(x_{i_q},y)\,D_{\ell,q}\Bigr) \qquad (23)$$
with coefficients C_{ℓ,q}, D_{ℓ,q} defined in (18) and (19) satisfies
$$\|g-\tilde g\|_{\infty,B_t\times B_s}
\le \left(\|c_0^x\|+\frac{2m^2\|c^x\|}{\mathrm{diam}(B_t)}\right)\left(\|c_0^y\|+\frac{2m^2\|c^y\|}{\mathrm{diam}(B_s)}\right)\Lambda_m^{2d}\,\varepsilon_{ACA} + 2\varepsilon_{int},$$
where ε_int is the interpolation error of the derivatives estimated separately in Corollary 18.
Proof By definition we have g = D_x D_y γ and g̃ = D_x D_y γ̃. We apply the tensor
interpolation I_m^{B_t×B_s} to the asymptotically smooth functions γ and γ̃ and define
γ_int := I_m^{B_t×B_s}[γ], γ̃_int := I_m^{B_t×B_s}[γ̃]. Both interpolants fulfil the estimate
$$\|D_xD_y\gamma - D_xD_y\gamma_{int}\|_{\infty,B_t\times B_s}\le\varepsilon_{int}, \qquad
\|D_xD_y\tilde\gamma - D_xD_y\tilde\gamma_{int}\|_{\infty,B_t\times B_s}\le\varepsilon_{int}$$
with an accuracy ε_int estimated in Corollary 18. It remains to estimate the difference
between the interpolants g_int := D_xD_yγ_int and g̃_int := D_xD_yγ̃_int: by (3),
$$\|g_{int}-\tilde g_{int}\|_{\infty,B_t\times B_s}
\le \|c_0^x\|\,\|c_0^y\|\,\|\gamma_{int}-\tilde\gamma_{int}\|_\infty
+ \|c_0^x\|\,\|\langle c^y,\nabla_y\rangle(\gamma_{int}-\tilde\gamma_{int})\|_\infty
+ \|c_0^y\|\,\|\langle c^x,\nabla_x\rangle(\gamma_{int}-\tilde\gamma_{int})\|_\infty
+ \|\langle c^x,\nabla_x\rangle\langle c^y,\nabla_y\rangle(\gamma_{int}-\tilde\gamma_{int})\|_\infty$$
and, by Markov's inequality,
$$\le \|c_0^x\|\,\|c_0^y\|\,\Lambda_m^{2d}\varepsilon_{ACA}
+ \|c_0^x\|\,\|c^y\|\,\frac{2m^2}{\mathrm{diam}(B_s)}\,\Lambda_m^{2d}\varepsilon_{ACA}
+ \|c_0^y\|\,\|c^x\|\,\frac{2m^2}{\mathrm{diam}(B_t)}\,\Lambda_m^{2d}\varepsilon_{ACA}
+ \|c^x\|\,\|c^y\|\,\frac{4m^4}{\mathrm{diam}(B_t)\,\mathrm{diam}(B_s)}\,\Lambda_m^{2d}\varepsilon_{ACA}.$$


Corollary 27 (Cross Approximation) Let t × s be an admissible block. Let C_fe,
d_Ω, h be as in (16). If the degenerate kernel approximation is based on the cross
approximation (23), the factorised matrix AB^T from Algorithm 3 satisfies
$$\|G|_{t\times s} - AB^T\|_2
\le C_{fe}\,|\Omega_t|^{1/2}|\Omega_s|^{1/2}\,h^{d_\Omega}\left(2\varepsilon_{int}
+ \Lambda_m^{2d}\,\varepsilon_{ACA}\left(\|c_0^x\|+\frac{2m^2\|c^x\|}{\mathrm{diam}(B_t)}\right)\left(\|c_0^y\|+\frac{2m^2\|c^y\|}{\mathrm{diam}(B_s)}\right)\right).$$

Fig. 4 The crank shaft geometry from NETGEN

Proof The estimate from Lemma 19 applies in the same way and with the same
estimates for g̃ constructed by cross approximation and AB T instead of Ut St,s VsT .
The error due to the degenerate kernel approximation is bounded in Theorem 26.



5 Numerical results

We will now consider a more complicated (and more realistic) geometry, namely
the crank shaft from the NETGEN package of Joachim Schöberl (see Figure 4).
For all numerical tests reported in the tables of this section, “Time” gives the time
in seconds for constructing the approximation (including the automatic coarsening
from [14]), “Strg.” gives the storage required for the resulting H-matrix in kilo-
bytes per degree of freedom and “Rel. Err.” gives the relative error ‖I − G̃^{-1}G‖_2
approximated using a power iteration and a sufficiently accurate representation of
the stiffness matrix G.
In order to compare our new approximation scheme with a working heuristic,
we have to modify Algorithm 1 (ACA). This is done by intercepting the situation
from Example 2 as well as the situations from [14, Example 2.3] depicted in Figure
5. We do this by inspecting an arbitrary row iref of the matrix block G|t×s to be
assembled. A column index jref is chosen among the minimisers (not maximisers !)
of |Giref ,j |. The row iref and column jref serve as a pivoting guide in ACA+ [14].
For the sake of completeness the ACA+ algorithm is contained in Algorithm 4. It
should be noted that the above-mentioned modification of ACA remedies the three
situations from Figure 5, but it does not prove convergence, not even the pessimistic
O(2^k η^{∛k}) estimate.

Fig. 5 Three situations where ACA fails to converge (the shaded regions are the non-zero parts
of the matrix)

Algorithm 4 ACA+ with partial pivoting

procedure ACA+(X, var A, B)
  Choose an initial reference row i_ref and compute the entries b_j^ref := X_{i_ref, j} of b^ref ∈ R^m
  Determine an index j_ref that minimises |b_{j_ref}^ref|
  Compute the entries a_i^ref := X_{i, j_ref} of a^ref ∈ R^n
  k := 1, I_ref := {1, . . . , n}, J_ref := {1, . . . , m}
  if min_i |a_i^ref| < 10^{-8} max_i |a_i^ref| and min_j |b_j^ref| < 10^{-8} max_j |b_j^ref| then
    I_ref := I_ref \ {i | |a_i^ref| > 10^8 min_i |a_i^ref|} and J_ref := J_ref \ {j | |b_j^ref| > 10^8 min_j |b_j^ref|}
  end if
  repeat
    if max_i |a_i^ref| > max_j |b_j^ref| then
      Determine an index i_k^* that maximises |a_{i_k^*}^ref| and compute b_k ∈ R^m by
        (b_k)_j := X_{i_k^*, j} − Σ_{µ=1}^{k−1} (a_µ)_{i_k^*} (b_µ)_j
      Determine an index j_k^* that maximises |(b_k)_{j_k^*}| and compute a_k ∈ R^n by
        (a_k)_i := ( X_{i, j_k^*} − Σ_{µ=1}^{k−1} (a_µ)_i (b_µ)_{j_k^*} ) / (b_k)_{j_k^*}
    else
      Determine an index j_k^* that maximises |b_{j_k^*}^ref| and compute a_k ∈ R^n by
        (a_k)_i := X_{i, j_k^*} − Σ_{µ=1}^{k−1} (a_µ)_i (b_µ)_{j_k^*}
      Determine an index i_k^* that maximises |(a_k)_{i_k^*}| and compute b_k ∈ R^m by
        (b_k)_j := ( X_{i_k^*, j} − Σ_{µ=1}^{k−1} (a_µ)_{i_k^*} (b_µ)_j ) / (a_k)_{i_k^*}
    end if
    if i_k^* = i_ref then
      Choose a new i_ref ∈ I_ref, set I_ref := I_ref \ {i_ref} and compute
        b_j^ref := X_{i_ref, j} − Σ_{ν=1}^{k−1} (a_ν)_{i_ref} (b_ν)_j
    end if
    if j_k^* = j_ref then
      Choose a new j_ref ∈ J_ref, set J_ref := J_ref \ {j_ref} and compute
        a_i^ref := X_{i, j_ref} − Σ_{ν=1}^{k−1} (a_ν)_i (b_ν)_{j_ref}
    end if
    Update the reference entries a_i^ref := a_i^ref − (a_k)_i (b_k)_{j_ref} and b_j^ref := b_j^ref − (a_k)_{i_ref} (b_k)_j
    k := k + 1
  until ‖a_k‖_2 ‖b_k‖_2 ≤ ε ‖a_1‖_2 ‖b_1‖_2

Table 4 Comparison of ACA and HCA(II) on the Crank Shaft

ACA HCA(II)
Time Strg. Rel. Err. Time Strg. Rel. Err.
333 4.9 1.9×10−1 362 5.0 2.0×10−1
449 8.9 1.8×10−2 537 8.9 1.7×10−2
837 14.5 3.4×10−3 796 13.2 1.6×10−3
2306 42.0 1.8×10−4 1151 18.7 1.5×10−4


5.1 Dependency on the accuracy of the quadrature

In our first numerical test we investigate the dependency of the compression by


ACA and HCA(II) on the accuracy of the quadrature used in the farfield. Typically,
one uses a given quadrature formula and chooses the accuracy for the compression
so that it is below the quadrature error. In the farfield one can even lower the order
of the quadrature, because the integrand is smooth compared to the diameter of
the elements. In the numerical test we try to approximate the stiffness matrix G
(assembled with a fixed quadrature) with n = 25744 degrees of freedom by ACA
and HCA(II) with increasing accuracy.
The results in Table 4 indicate that ACA suffers from the quadrature error
and produces approximations with much higher storage requirements.
Both methods yield an approximation of roughly the same quality in almost the
same time, except at the highest accuracy, where ACA degrades. We conclude
that for ACA one should choose a sufficiently accurate quadrature formula
and not lower the quadrature order in the farfield.
The situation is different for HCA. Since we separate the kernel in the bound-
ing box before applying the quadrature, the blockwise rank produced by HCA is
independent of the quadrature order, the basis functions or the element size.

5.2 Dependency on the meshsize

According to Lemma 12, we expect the storage requirements as well as the time
for the setup of an H-matrix approximation with fixed accuracy (rank) to be

Table 5 HCA(II) on the Crank Shaft with increasing number of degrees of freedom, the elapsed
time t is measured in 1000 seconds
n = 25744 n = 102976 n = 411904
t Strg. Rel. Err. t Strg. Rel. Err. t Strg. Rel. Err.
.27 4.9 1.9×10−1 1.06 7.0 1.6×10−1 4.45 8.4 1.6×10−1
.56 8.9 1.5×10−2 2.30 12.2 1.3×10−2 9.72 14.7 1.4×10−2
.83 13.2 1.1×10−3 3.43 18.2 1.2×10−3 14.77 21.9 1.7×10−3
1.73 18.6 7.4×10−5 7.09 25.4 6.6×10−5 30.04 30.6 7.0×10−5

O(n log(n)), i.e., an increase per degree of freedom as the mesh is regularly re-
fined. This behaviour can be observed in Table 5 where we approximate the stiffness
matrix on three different levels with n = 25744, 102976, 411904 degrees of free-
dom. The order of the quadrature is increased from the first to the second row and
from the third to the fourth row.

References

1. Bebendorf, M.: Approximation of boundary element matrices. Numer. Math. 86(4), 565–589
(2000)
2. Bebendorf, M., Rjasanow, S.: Adaptive Low-Rank Approximation of Collocation Matrices.
Computing 70(1), 1–24 (2003)
3. Börm, S.: Approximation of integral operators by H2 -matrices with adaptive bases. Pre-
print 18, Max Planck Institute for Mathematics in the Sciences, 2004. To appear in
Computing
4. Börm, S., Grasedyck, L.: HLib – a library for H- and H2 -matrices, 1999. Available at
http://www.hlib.org/
5. Börm, S., Grasedyck, L.: Low-rank approximation of integral operators by interpolation.
Computing 72, 325–332 (2004)
6. Börm, S., Grasedyck, L., Hackbusch, W.: Introduction to hierarchical matrices with appli-
cations. Engineering Analysis with Boundary Elements 27, 405–422 (2003)
7. Börm, S., Hackbusch, W.: Data-sparse approximation by adaptive H2 -matrices. Computing
69, 1–35 (2002)
8. Börm, S., Hackbusch, W.: H2 -matrix approximation of integral operators by interpolation.
Applied Numerical Mathematics 43, 129–143 (2002)
9. Dahmen, W., Schneider, R.: Wavelets on manifolds I: Construction and domain decomposi-
tion. SIAM Journal on Mathematical Analysis 31, 184–230 (1999)
10. DeVore, R.A., Lorentz, G.G.: Constructive Approximation. Springer-Verlag, 1993
11. Ford, J.M., Tyrtyshnikov, E.E.: Combining Kronecker product approximation with discrete
wavelet transforms to solve dense, function-related linear systems. SIAM J. Sci. Comput.
25(3), 961–981 (2003)
12. Goreinov, S.A., Tyrtyshnikov, E.E., Zamarashkin, N.L.: A theory of pseudoskeleton approx-
imations. Lin. Alg. Appl. 261, 1–22 (1997)
13. Grasedyck, L.: Theorie und Anwendungen Hierarchischer Matrizen. Doctoral thesis, Uni-
versität Kiel, 2001
14. Grasedyck, L.: Adaptive recompression of H-matrices for BEM. Technical report 17, Max
Planck Institute for Mathematics in the Sciences, 2004
15. Grasedyck, L., Hackbusch, W.: Construction and arithmetics of H-matrices. Computing
70(4), 295–334 (2003)
16. Greengard, L., Rokhlin, V.: A new version of the fast multipole method for the Laplace equation
in three dimensions. In: Acta Numerica 1997, Cambridge University Press, 1997, pp. 229–269
17. Hackbusch, W.: A sparse matrix arithmetic based on H-matrices. Part I: Introduction to
H-matrices. Computing 62, 89–108 (1999)
18. Hackbusch, W., Khoromskij, B.: A sparse matrix arithmetic based on H-matrices. Part II:
Application to multi-dimensional problems. Computing 64, 21–47 (2000)
19. Hackbusch, W., Nowak, Z.P.: On the fast matrix multiplication in the boundary element
method by panel clustering. Numerische Mathematik 54, 463–491 (1989)
20. Rivlin, T.J.: The Chebyshev Polynomials. Wiley-Interscience, New York, 1984
21. Rokhlin, V.: Rapid solution of integral equations of classical potential theory. J. Comput.
Phys. 60, 187–207 (1985)
22. Sauter, S.: Variable order panel clustering (extended version). Technical report 52, Max-
Planck-Institut für Mathematik, Leipzig, Germany, 1999
23. Tyrtyshnikov, E.: Incomplete cross approximation in the mosaic-skeleton method. Comput-
ing 64, 367–380 (2000)
