Quantum information theory (MAT4430) Spring 2021
Lecture 14: The Haar measure and Hastings' counterexample
Lecturer: Alexander Müller-Hermes
In the previous lecture, we have seen how to associate quantum channels to subspaces of
tensor product spaces. By choosing subspaces in a clever way, we can use such constructions
to show that certain operator norms are not additive under tensor products. Unfortunately,
such concrete constructions have so far been too weak to give a counterexample to the
additivity of the minimum output entropy, and hence the Holevo information. In this lecture,
we will show how random subspaces can lead to counterexamples for these conjectures. For
this, we will first introduce the Haar measure on the unitary group $\mathcal{U}(\mathbb{C}^d)$, which is used
to construct random subspaces. We will then sketch the main lines of argument behind
Hastings' proof.
1 The Haar measure on the unitary group $\mathcal{U}(\mathbb{C}^d)$
1.1 Preliminaries from measure theory
We will start with some preliminaries from measure theory, which we will specialize to
the finite-dimensional setting of complex Euclidean spaces for simplicity. Note that these
concepts are also valid in more general settings.
Definition 1.1 (Borel σ-algebras and Borel measures). Let A ⊂ H denote a subset of a
complex Euclidean space H.
1. The Borel σ-algebra Borel(A) is the σ-algebra generated by the open subsets of A in
the subspace topology inherited from the topology of H. Specifically, Borel(A) is the
smallest σ-algebra that contains the open subsets of A and is closed under taking
complements and countable unions. The elements of Borel(A) are called Borel sets.
2. A Borel measure on A is a function $\mu : \mathrm{Borel}(A) \to [0, \infty]$ such that
\[ \mu(\emptyset) = 0, \]
and
\[ \mu\Big( \bigcup_{k=1}^{\infty} S_k \Big) = \sum_{k=1}^{\infty} \mu(S_k), \]
for any family $\{S_k\}_{k=1}^{\infty} \subset \mathrm{Borel}(A)$ of pairwise disjoint Borel sets. A Borel measure
on A is called a probability measure if $\mu(A) = 1$.
Having defined the Borel σ-algebra, we may consider Borel functions (or Borel measurable
functions) $f : A \to B$ between subsets $A \subseteq H$ and $B \subseteq H'$ of complex Euclidean spaces $H$
and $H'$, i.e., functions such that $f^{-1}(S') \in \mathrm{Borel}(A)$ for any $S' \in \mathrm{Borel}(B)$. In particular, we may
consider Borel functions $f : A \to \mathbb{C}$. From the definition of the Borel σ-algebra it is easy to
see that any continuous function $f : A \to B$ is a Borel function. Given any Borel measure
µ on A and a Borel function $f : A \to B$, we may define a Borel measure ν on B by setting
$\nu(S) = \mu(f^{-1}(S))$ for any Borel set S. This measure is called the pushforward measure of µ under f.
For any subset $A \subseteq H$ and any Borel measure $\mu : \mathrm{Borel}(A) \to [0, \infty]$ we will define
integrals
\[ \int_A f(x)\, d\mu(x), \]
for certain measurable functions $f : A \to \mathbb{R}$ in the usual way:
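1. Consider a characteristic function $\chi_S : A \to \mathbb{R}$ of some Borel set $S \in \mathrm{Borel}(A)$ given
by
\[ \chi_S(x) = \begin{cases} 1, & \text{if } x \in S, \\ 0, & \text{otherwise.} \end{cases} \]
For any such function, we define
\[ \int_A \chi_S(x)\, d\mu(x) = \mu(S). \]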
2. A function $f : A \to \mathbb{R}$ is called a simple function if it can be written as
\[ f = \sum_{k=1}^{K} \alpha_k \chi_{S_k}, \]
with $\alpha_k \ge 0$ and $S_k \in \mathrm{Borel}(A)$ for each $k \in \{1, \ldots, K\}$. For such a function, we define
\[ \int_A f(x)\, d\mu(x) = \sum_{k=1}^{K} \alpha_k \mu(S_k). \]
3. For every non-negative Borel function $f : A \to [0, \infty)$ we define
\[ \int_A f(x)\, d\mu(x) = \sup_{g \le f} \int_A g(x)\, d\mu(x), \]
where the supremum goes over all simple functions $g : A \to [0, \infty)$ satisfying $g(x) \le f(x)$
for every $x \in A$. We say that f is integrable if the supremum in the last equation
is finite.
4. A Borel function $f : A \to \mathbb{R}$ is called integrable if it can be written as $f = f_+ - f_-$ for
non-negative integrable Borel functions $f_+, f_- : A \to [0, \infty)$ and in this case we define
\[ \int_A f(x)\, d\mu(x) = \int_A f_+(x)\, d\mu(x) - \int_A f_-(x)\, d\mu(x). \]
Similarly, a Borel function $f : A \to \mathbb{C}$ is called integrable if it can be written as
$f = g + ih$ for integrable Borel functions $g, h : A \to \mathbb{R}$ and in this case we define
\[ \int_A f(x)\, d\mu(x) = \int_A g(x)\, d\mu(x) + i \int_A h(x)\, d\mu(x). \]
It can be shown that the values of the integrals defined above are independent of the
decompositions chosen for the function f .
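To illustrate the third point of the construction, the following is a minimal sketch that approximates an integral from below by simple functions. The choices here are assumptions made only for illustration: we take A = [0, 1] with the Lebesgue measure, f(x) = x^2, and the dyadic simple functions g_n = floor(2^n f) / 2^n, which satisfy g_n ≤ f. The function name `lower_simple_integral` is hypothetical.

```python
import math


def lower_simple_integral(n: int) -> float:
    """Integral over [0, 1] of the simple function g_n = floor(2^n f) / 2^n
    for f(x) = x^2, with respect to the Lebesgue measure."""
    levels = 2 ** n
    total = 0.0
    for k in range(levels):
        # S_k = {x in [0, 1] : k/2^n <= x^2 < (k+1)/2^n} is an interval
        # with endpoints sqrt(k/2^n) and sqrt((k+1)/2^n).
        measure = math.sqrt((k + 1) / levels) - math.sqrt(k / levels)
        # g_n takes the value alpha_k = k/2^n on S_k.
        total += (k / levels) * measure
    return total


# The values increase with n and approach the integral of x^2 over [0, 1]
# from below.
approximations = [lower_simple_integral(n) for n in (2, 6, 10)]
```

Since f - g_n ≤ 2^{-n} pointwise, each value differs from the supremum in the definition by at most 2^{-n} in this example.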
The following lemma is an easy consequence of the third point from above:
Lemma 1.2. If $f : A \to [0, \infty)$ is an integrable non-negative Borel function, then we have
\[ \int_A f(x)\, d\mu(x) \ge 0, \]
for any Borel measure µ.
We will also need the following theorem showing that continuous functions are integrable
in the cases we are interested in.
Theorem 1.3. If $A \subset H$ is a compact subset and $\mu : \mathrm{Borel}(A) \to [0, \infty)$ a finite Borel
measure, then any continuous function $f : A \to \mathbb{C}$ is µ-integrable.
Proof. For a continuous function $f : A \to \mathbb{C}$ we define $f_{\max} = \max_{z \in A} |f(z)|$, which exists
by compactness of A and continuity of f. We may decompose
\[ f = g + ih = g_+ - g_- + i(h_+ - h_-), \]
where $g = (f + \overline{f})/2$ and $h = (f - \overline{f})/(2i)$, and $g_+ = g \cdot \chi_{g^{-1}([0,\infty))}$ and $h_+ = h \cdot \chi_{h^{-1}([0,\infty))}$,
and $g_- = -(g - g_+)$ and $h_- = -(h - h_+)$. Clearly, all involved functions are Borel functions.
Now, note that
\[ g_+(x) \le |\mathrm{Re}(f(x))| \le f_{\max}, \]
for all $x \in A$, which shows that
\[ \int_A g_+(x)\, d\mu(x) = \sup_{q \le g_+} \int_A q(x)\, d\mu(x) \le \mu(A) f_{\max} < \infty, \]
where the supremum goes over all non-negative simple functions $q : A \to [0, \infty)$ satisfying
$q(x) \le g_+(x) \le f_{\max}$ for every $x \in A$. We conclude that $g_+$ is integrable. In the same way
we can show that $g_-$, $h_+$ and $h_-$ are also integrable. This shows that f is integrable.
Finally, we will need operator-valued integrals, which in our (finite-dimensional) setting
may be defined component-wise. Specifically, given a subset $A \subseteq H$ of a complex Euclidean
space $H$, and another complex Euclidean space $H'$, we will call a function $f : A \to B(H')$
integrable if its entries $f_{kl} : A \to \mathbb{C}$ in the computational basis given by $f_{kl} = \langle k | f(\cdot) | l \rangle$ are
integrable. In this case, we denote by
\[ X = \int_A f(x)\, d\mu(x) \in B(H'), \]
the operator with entries
\[ X_{kl} = \int_A f_{kl}(x)\, d\mu(x), \]
in the computational basis. We finish this section with another useful observation concerning
the integrals of operator-valued functions mapping into a closed cone:
Theorem 1.4. Consider complex Euclidean spaces $H$ and $H'$, a compact subset $A \subset H$ with a
finite Borel measure µ on A, and a closed convex cone $C \subset B(H')_{sa}$. If $f : A \to B(H')$ is
continuous and if $f(x) \in C$ for every $x \in A$, then
\[ \int_A f(x)\, d\mu(x) \in C. \]
Proof. Consider the dual cone $C^* \subset B(H')_{sa}$ and recall that
\[ C = \{ y \in B(H')_{sa} : \langle z, y \rangle_{HS} \ge 0 \text{ for all } z \in C^* \}. \]
Since $x \mapsto \langle z, f(x) \rangle_{HS}$ is continuous and non-negative for any $z \in C^*$, we conclude by Theorem 1.3 and
Lemma 1.2 that
\[ \int_A \langle z, f(x) \rangle_{HS}\, d\mu(x) \ge 0. \]
Using the linearity of the integral we have
\[ \Big\langle z, \int_A f(x)\, d\mu(x) \Big\rangle_{HS} = \int_A \langle z, f(x) \rangle_{HS}\, d\mu(x) \ge 0, \]
for any $z \in C^*$ which, by duality, finishes the proof.
1.2 The Haar measure on $\mathcal{U}(\mathbb{C}^d)$
We will first state a definition/theorem, which can be seen as the main result of this section.
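Definition 1.5 (Haar measure on $\mathcal{U}(\mathbb{C}^d)$). For $d \in \mathbb{N}$, the unitary Haar measure η is the
unique Borel measure on $\mathcal{U}(\mathbb{C}^d)$ satisfying the following conditions:
1. Normalization:
\[ \eta(\mathcal{U}(\mathbb{C}^d)) = 1. \]
2. Unitary invariance:
\[ \eta(S) = \eta(SU) = \eta(US), \]
for any Borel subset $S \subseteq \mathcal{U}(\mathbb{C}^d)$ and any unitary $U \in \mathcal{U}(\mathbb{C}^d)$.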
We will now go through the main steps to see how the Haar measure can be constructed
and why it is unique. Identifying $\mathbb{C}$ with $\mathbb{R}^2$, we may consider the usual Lebesgue measure λ
on $\mathbb{C}$, which is a Borel measure. Next, we consider the standard Gaussian measure γ defined
by the density function
\[ z \mapsto \frac{1}{\pi} e^{-|z|^2}, \]
i.e., such that
\[ \gamma(S) = \frac{1}{\pi} \int_S e^{-|z|^2}\, d\lambda(z), \]
for any Borel set S. Note that γ is a probability measure, which quantifies the probability
of events involving standard Gaussian random variables. Using the product of $d^2$ such
standard Gaussian measures, we may define a Borel probability measure on $B(\mathbb{C}^d) \simeq \mathbb{C}^{d^2}$
by
\[ \Gamma(S) = \frac{1}{\pi^{d^2}} \int_S \prod_{k,l=1}^{d} e^{-|z_{k,l}|^2}\, d\Lambda(Z) = \frac{1}{\pi^{d^2}} \int_S e^{-\mathrm{Tr}[Z^\dagger Z]}\, d\Lambda(Z), \]
where we used the notation $Z = [z_{k,l}]_{k,l=1}^{d}$ and the trace formula $\mathrm{Tr}[Z^\dagger Z] = \sum_{k,l=1}^{d} |z_{k,l}|^2$. Here, Λ arises as the
product measure of $d^2$ Lebesgue measures λ on $\mathbb{C}$ and hence is the usual Lebesgue measure
on $\mathbb{C}^{d^2} \simeq \mathbb{R}^{2d^2}$. Again, we could have chosen a more probabilistic point of view by defining
random matrices $M \in B(\mathbb{C}^d)$ with entries $M_{kl}$ chosen i.i.d. according to the standard normal
distribution on $\mathbb{C}$. The measure Γ is the probability measure corresponding to these random
matrices. In the following, we will need two properties of the measure Γ:
Lemma 1.6. For any $d \in \mathbb{N}$ let Γ denote the Borel probability measure defined above. Then,
we have:
1. We have
\[ \Gamma(\{ M \in B(\mathbb{C}^d) : \det(M) = 0 \}) = 0. \]
2. For any Borel set $S \in \mathrm{Borel}(B(\mathbb{C}^d))$ and any unitary $U \in \mathcal{U}(\mathbb{C}^d)$ we have
\[ \Gamma(US) = \Gamma(S). \]
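The probabilistic point of view on Γ can be sketched numerically. The following is an illustration only, assuming NumPy is available; the function name `sample_gamma` is hypothetical. Real and imaginary parts of each entry are drawn with variance 1/2, which corresponds to the density (1/π) e^{-|z|^2} on the complex plane. The sketch checks empirically that the entries have second moment close to 1 and that none of the sampled matrices is singular, in line with the first statement of the lemma.

```python
import numpy as np


def sample_gamma(d, rng):
    """Draw one d x d matrix with i.i.d. standard complex Gaussian entries."""
    scale = np.sqrt(0.5)  # variance 1/2 for the real and the imaginary part
    real = rng.normal(scale=scale, size=(d, d))
    imag = rng.normal(scale=scale, size=(d, d))
    return real + 1j * imag


rng = np.random.default_rng(0)
d = 4
samples = [sample_gamma(d, rng) for _ in range(2000)]

# Empirical second moment of the entries; the exact value is E|z|^2 = 1.
mean_abs_sq = float(np.mean([np.abs(M) ** 2 for M in samples]))

# Smallest |det(M)| among the samples; singular matrices form a null set.
min_abs_det = min(abs(np.linalg.det(M)) for M in samples)
```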
Before proving the lemma, we will need a result about Lebesgue null sets:
Lemma 1.7. For a multivariate polynomial $p \in \mathbb{C}[x_1, \ldots, x_n]$ we consider the set $S = \{x \in \mathbb{C}^n : p(x) = 0\}$.
Then, the set S is Borel and we either have $\lambda(S) = 0$ or $S = \mathbb{C}^n$.
Proof. Since the function $x \mapsto p(x)$ is continuous the set S is closed and hence Borel. In the
following, we assume that $p \neq 0$ and we will show that $\lambda(S) = 0$ in this case. The proof
proceeds by induction on n. The statement is certainly true if n = 1 since p has at most
finitely many zeros, which form a Lebesgue null set. Assume that the statement is true for
some $n \in \mathbb{N}$ and consider a non-zero polynomial $p \in \mathbb{C}[x_1, \ldots, x_{n+1}]$. For $(x, x_{n+1}) \in \mathbb{C}^{n+1}$
we can write
\[ p(x, x_{n+1}) = \sum_{i=0}^{k} p_i(x)\, x_{n+1}^i, \]
with polynomials $p_i \in \mathbb{C}[x_1, \ldots, x_n]$ for $i \in \{0, \ldots, k\}$. Next, we define two sets:
\[ A = \{ (x, x_{n+1}) \in \mathbb{C}^{n+1} : p_0(x) = p_1(x) = \cdots = p_k(x) = 0 \}, \]
and
\[ B = \Big\{ (x, x_{n+1}) \in \mathbb{C}^{n+1} : \sum_{i=0}^{k} p_i(x)\, x_{n+1}^i = 0 \text{ and } p_i(x) \neq 0 \text{ for at least one } i \Big\}. \]
Clearly, we have $S = A \cup B$. By assumption at least one of the polynomials $p_0, \ldots, p_k$, say $p_l$,
is non-zero and by the induction hypothesis we can conclude that
\[ \lambda(A) \le \lambda(\{ (x, x_{n+1}) \in \mathbb{C}^{n+1} : p_l(x) = 0 \}) = \int_{\mathbb{C}} \lambda(\{ x \in \mathbb{C}^n : p_l(x) = 0 \})\, d\lambda(x_{n+1}) = 0. \]
Therefore, we have $\lambda(A) = 0$.
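Consider now the set B. For any $x \in \mathbb{C}^n$ such that $p_i(x) \neq 0$ for some i there are at most
finitely many values $x_{n+1}$ such that $\sum_{i=0}^{k} p_i(x)\, x_{n+1}^i = 0$. We conclude that
\[ \lambda(B) = \int_{\{x \in \mathbb{C}^n : p_l(x) \neq 0 \text{ for some } l\}} \lambda\Big( \Big\{ x_{n+1} \in \mathbb{C} : \sum_{i=0}^{k} p_i(x)\, x_{n+1}^i = 0 \Big\} \Big)\, d\lambda(x) = 0. \]
We conclude that
\[ \lambda(S) \le \lambda(A) + \lambda(B) = 0, \]
and hence $\lambda(S) = 0$.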
Now, we proceed with the proof of the properties of the measure Γ:
Proof of Lemma 1.6. For the first statement, note that $M \mapsto \det(M)$ is a polynomial and
therefore the set $S = \{ M \in B(\mathbb{C}^d) : \det(M) = 0 \}$ is Borel (polynomials are continuous).
By Lemma 1.7, the zero set S of a multivariate polynomial satisfies either
$\Lambda(S) = 0$ or $S = B(\mathbb{C}^d)$, where Λ denotes the usual Lebesgue measure on $B(\mathbb{C}^d)$ viewed as
$\mathbb{C}^{d^2}$. Since $S \neq B(\mathbb{C}^d)$ we have $\Lambda(S) = 0$, and we conclude that $\Gamma(S) = 0$ as well because Γ is
absolutely continuous with respect to Λ, being defined from Λ by a density function.
For the second statement consider a Borel set $S \in \mathrm{Borel}(B(\mathbb{C}^d))$ and a unitary $U \in \mathcal{U}(\mathbb{C}^d)$.
Note that the Jacobian of the transformation $X \mapsto UX$ (viewed as a transformation
of $\mathbb{C}^{d^2}$ to itself) equals $J_U = U^{\oplus d}$ such that $|\det(J_U)| = 1$. Now, we can apply the
transformation formula for the Lebesgue integral to compute
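\begin{align*}
\Gamma(US) &= \frac{1}{\pi^{d^2}} \int_{US} e^{-\mathrm{Tr}[Z^\dagger Z]}\, d\Lambda(Z) \\
&= \frac{1}{\pi^{d^2}} \int_{S} |\det(J_U)|\, e^{-\mathrm{Tr}[(UZ)^\dagger (UZ)]}\, d\Lambda(Z) \\
&= \frac{1}{\pi^{d^2}} \int_{S} e^{-\mathrm{Tr}[Z^\dagger Z]}\, d\Lambda(Z) = \Gamma(S).
\end{align*}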
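The key step in the computation above is that the Gaussian density does not change under $Z \mapsto UZ$. As a minimal numerical sketch of this step (an illustration, not part of the proof), the following code compares the density at Z and at UZ for one random matrix and one unitary. It assumes NumPy; the function name `gaussian_density` is hypothetical, and the unitary is obtained from a QR decomposition purely for convenience.

```python
import math

import numpy as np


def gaussian_density(Z):
    """Density pi^(-d^2) * exp(-Tr[Z^dagger Z]) of the measure Gamma."""
    d = Z.shape[0]
    trace = np.trace(Z.conj().T @ Z).real
    return math.exp(-trace) / math.pi ** (d * d)


rng = np.random.default_rng(1)
d = 3
Z = 0.5 * (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

# The Q factor of a QR decomposition of a complex matrix is unitary.
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

density_at_Z = gaussian_density(Z)
density_at_UZ = gaussian_density(U @ Z)
abs_det_U = abs(np.linalg.det(U))
```

Both densities agree up to rounding errors because $(UZ)^\dagger (UZ) = Z^\dagger U^\dagger U Z = Z^\dagger Z$, and `abs_det_U` equals 1 up to rounding.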
We can now introduce the Haar measure on the unitary group $\mathcal{U}(\mathbb{C}^d)$.
Definition 1.8 (The Haar measure – concrete construction). For $d \in \mathbb{N}$, we consider the
continuous function $F : B(\mathbb{C}^d) \setminus \{M : \det(M) = 0\} \to \mathcal{U}(\mathbb{C}^d)$ given by
\[ F(M) = M (M^\dagger M)^{-1/2}. \]
Then, the Haar measure $\eta : \mathrm{Borel}(\mathcal{U}(\mathbb{C}^d)) \to [0, \infty)$ is defined as the pushforward measure
of Γ under F, i.e.,
\[ \eta(S) = \Gamma(F^{-1}(S)), \]
for any $S \in \mathrm{Borel}(\mathcal{U}(\mathbb{C}^d))$.
We will first show that the previous definition gives rise to a Borel measure with the
properties of the Haar measure.
Theorem 1.9. The function $\eta : \mathrm{Borel}(\mathcal{U}(\mathbb{C}^d)) \to [0, \infty)$ from the previous definition is a
normalized and unitarily invariant Borel measure.
Proof. Note that the function F is continuous and hence a Borel function. Since Γ is a Borel
measure, we conclude that $\eta = \Gamma \circ F^{-1}$ is a Borel measure as well. It is easy to see that η is
normalized since $M (M^\dagger M)^{-1/2} \in \mathcal{U}(\mathbb{C}^d)$ if and only if M is invertible. Therefore, we have
\[ \eta(\mathcal{U}(\mathbb{C}^d)) = \Gamma(\{ M \in B(\mathbb{C}^d) : \det(M) \neq 0 \}) = 1, \]
by Lemma 1.6. Finally, we need to show that η is unitarily invariant. To see this, note that
\[ F(UM) = UM \big( (UM)^\dagger UM \big)^{-1/2} = UM (M^\dagger M)^{-1/2} = U F(M), \]
for any unitary $U \in \mathcal{U}(\mathbb{C}^d)$ and any invertible M. Therefore, we have
\[ \eta(US) = \Gamma(F^{-1}(US)) = \Gamma(U F^{-1}(S)) = \Gamma(F^{-1}(S)) = \eta(S), \]
for any Borel set S, where we used Lemma 1.6. Invariance under right multiplication follows
in the same way from $F(MU) = F(M) U$ and the analogous identity $\Gamma(SU) = \Gamma(S)$, and the
proof is finished.
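The two properties of F used in the proof can be checked numerically. The sketch below is an illustration under the assumption that NumPy is available; the function name `polar_unitary` is hypothetical. It computes $(M^\dagger M)^{-1/2}$ from an eigendecomposition of the positive definite matrix $M^\dagger M$, then verifies that F(M) is unitary and that F(UM) = U F(M) for one random instance.

```python
import numpy as np


def polar_unitary(M):
    """Return F(M) = M (M^dagger M)^(-1/2) for an invertible matrix M."""
    eigvals, eigvecs = np.linalg.eigh(M.conj().T @ M)
    # (M^dagger M)^(-1/2) = V diag(eigvals^(-1/2)) V^dagger
    inv_sqrt = (eigvecs / np.sqrt(eigvals)) @ eigvecs.conj().T
    return M @ inv_sqrt


rng = np.random.default_rng(2)
d = 4
M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))

F_M = polar_unitary(M)

# F(M) is unitary: F(M)^dagger F(M) = 1.
unitarity_error = np.linalg.norm(F_M.conj().T @ F_M - np.eye(d))

# Covariance under left multiplication: F(UM) = U F(M).
covariance_error = np.linalg.norm(polar_unitary(U @ M) - U @ F_M)
```

Both error values are of the order of the floating-point precision for well-conditioned M.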
It is not difficult to show that the Haar measure is unique if it exists:
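Lemma 1.10 (Uniqueness of the Haar measure). For $d \in \mathbb{N}$ let $\nu : \mathrm{Borel}(\mathcal{U}(\mathbb{C}^d)) \to [0, \infty)$
denote a Borel probability measure such that at least one of the following is true:
• For every Borel set $S \subseteq \mathcal{U}(\mathbb{C}^d)$ and every $U \in \mathcal{U}(\mathbb{C}^d)$ we have $\nu(US) = \nu(S)$.
• For every Borel set $S \subseteq \mathcal{U}(\mathbb{C}^d)$ and every $U \in \mathcal{U}(\mathbb{C}^d)$ we have $\nu(SU) = \nu(S)$.
Then, ν equals the Haar measure η on $\mathcal{U}(\mathbb{C}^d)$.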
Proof. We will state the proof in the case where ν satisfies the first property. The case
of the second property works in the same way. Consider a Borel set $S \in \mathrm{Borel}(\mathcal{U}(\mathbb{C}^d))$ and
denote by $\chi_S : \mathcal{U}(\mathbb{C}^d) \to \mathbb{C}$ the characteristic function of S. Note that
\[ \nu(S) = \int \chi_S(U)\, d\nu(U) = \int \chi_S(VU)\, d\nu(U), \]
for every unitary $V \in \mathcal{U}(\mathbb{C}^d)$. Since the Haar measure η is normalized and unitarily invariant,
we have
\[ \nu(S) = \int \int \chi_S(VU)\, d\nu(U)\, d\eta(V) = \int \int \chi_S(VU)\, d\eta(V)\, d\nu(U) = \int \eta(SU^\dagger)\, d\nu(U) = \eta(S), \]
where we used the Fubini-Tonelli theorem for the second equality.
2 Random constructions and extremely entangled subspaces
We have seen in the previous lecture that some of the norms $\|\cdot\|_{1 \to p}$ are not multiplicative.
The proof of this result relied on choosing nice subspaces of $\mathbb{C}^{d_E} \otimes \mathbb{C}^{d_B}$. Unfortunately,
nobody has so far come up with an explicit subspace showing non-multiplicativity of $\|\cdot\|_{1 \to p}$ for
values p close to 1, or showing non-additivity of the minimum output entropy. The constructions
known to show these results are all probabilistic, i.e., the subspace is chosen at random. In
the following, we will comment on some aspects of these constructions, but many details
exceed the scope of this course.
2.1 How can we choose subspaces at random?
To define random subspaces we need the notion of a Haar-random unitary, which is simply
a random unitary matrix distributed according to the Haar measure. The construction of
the previous section suggests a way to generate such unitaries in practice:
Theorem 2.1 (Concrete Haar-random unitaries). Consider a random unitary $U \in \mathcal{U}(\mathbb{C}^d)$
constructed as follows:
1. Consider a random matrix $M \in B(\mathbb{C}^d)$ with entries $M_{kl} = X_{kl} + iY_{kl}$, where $X_{kl}$ and
$Y_{kl}$ are chosen i.i.d. normally distributed.
2. Compute the singular value decomposition $M = WSV$ and define a unitary $U = WV \in \mathcal{U}(\mathbb{C}^d)$.
Then, U is a Haar-random unitary.
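The two steps of the theorem can be sketched in a few lines. This is a minimal illustration assuming NumPy, not a reference implementation; the function name `haar_unitary` is hypothetical. Note that `numpy.linalg.svd` returns the factors of M = W diag(S) V with V already in the position used in the theorem. As a plausibility check, the sketch also estimates the second moment of one matrix entry, which equals 1/d for a Haar-random unitary.

```python
import numpy as np


def haar_unitary(d, rng):
    """Sample U = W V from the SVD M = W S V of a complex Gaussian matrix."""
    # Step 1: i.i.d. normally distributed real and imaginary parts.
    M = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    # Step 2: singular value decomposition M = W diag(S) V.
    W, _singular_values, V = np.linalg.svd(M)
    return W @ V


rng = np.random.default_rng(3)
d = 3
U = haar_unitary(d, rng)
unitarity_error = np.linalg.norm(U.conj().T @ U - np.eye(d))

# Empirical estimate of E|U_00|^2, which is 1/d for the Haar measure.
unitaries = [haar_unitary(d, rng) for _ in range(2000)]
mean_abs_sq = float(np.mean([abs(Q[0, 0]) ** 2 for Q in unitaries]))
```

The product W V coincides with $M (M^\dagger M)^{-1/2}$, since $M^\dagger M = V^\dagger S^2 V$, which is how this recipe connects to the construction of the previous section.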
We will use the following definition of a random subspace:
Definition 2.2 (Random subspaces). A random k-dimensional subspace of $\mathbb{C}^d$ is given by
$U(\mathbb{C}^k \oplus 0_{d-k})$, where $U \in \mathcal{U}(\mathbb{C}^d)$ is a Haar-random unitary.
2.2 Many subspaces are extremely entangled
We start by defining a measure of entanglement for pure states:
Definition 2.3. For a bipartite pure state $|\psi\rangle \in H_A \otimes H_B$ we define the entropy of entanglement as
\[ E(|\psi\rangle) = H(\mathrm{Tr}_B[|\psi\rangle\langle\psi|]). \]
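As a small numerical sketch of this definition, the code below computes the entropy of entanglement from the coefficient matrix of a state vector. It assumes NumPy and uses the natural logarithm, which is a choice made here for illustration (the base of the logarithm is not fixed in this section); the function name `entropy_of_entanglement` is hypothetical. For a product state the value is 0, and for the maximally entangled state on $\mathbb{C}^3 \otimes \mathbb{C}^3$ it is log(3).

```python
import numpy as np


def entropy_of_entanglement(psi, dim_a, dim_b):
    """Von Neumann entropy (natural log) of Tr_B |psi><psi|."""
    # Coefficients c_ab of |psi> = sum_ab c_ab |a>|b> as a matrix.
    C = np.asarray(psi).reshape(dim_a, dim_b)
    rho_a = C @ C.conj().T  # reduced state on system A
    eigvals = np.linalg.eigvalsh(rho_a)
    eigvals = eigvals[eigvals > 1e-12]  # use the convention 0 log 0 = 0
    return float(-np.sum(eigvals * np.log(eigvals)))


d = 3
psi_a = np.array([1.0, 1.0j, 0.0]) / np.sqrt(2)
psi_b = np.array([0.0, 0.6, 0.8])
product_state = np.kron(psi_a, psi_b)
maximally_entangled = np.eye(d).reshape(-1) / np.sqrt(d)

entropy_product = entropy_of_entanglement(product_state, d, d)
entropy_max = entropy_of_entanglement(maximally_entangled, d, d)
```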
The entropy of entanglement measures the entanglement in pure states, and it can be
verified easily that
\[ E(|\psi_A\rangle \otimes |\psi_B\rangle) = 0, \]
for pure states $|\psi_A\rangle \in H_A$ and $|\psi_B\rangle \in H_B$. We also have
\[ E(|\Omega\rangle) = \log(\dim(H)), \]
if $|\Omega\rangle \in H \otimes H$ is the maximally entangled state. The following theorem is the key part of
the counterexample to the additivity of the minimum output entropy. We will not prove it
here. Its proof relies on the so-called Dvoretzky-Milman theorem in asymptotic geometric
analysis, which goes beyond the scope of this course.
Theorem 2.4 (Extremely entangled subspaces). There are absolute constants c, C > 0 such
that we have the following: With $d_B = d_E^2$ and $d_A = c\, d_E^2$ a random $d_A$-dimensional subspace
$S \subset \mathbb{C}^{d_E} \otimes \mathbb{C}^{d_B}$ satisfies
\[ \min_{|\psi\rangle \in S,\ \langle\psi|\psi\rangle = 1} E(|\psi\rangle) \ge \log(d_E) - \frac{C}{d_E}, \]
with high probability.
Let us comment briefly on the main idea behind the proof. The main tool used is the
following theorem by Dvoretzky and Milman:
Theorem 2.5. There are absolute constants c, c' > 0 such that we have the following: Fix
an $\epsilon \in (0, 1]$. For any circled¹ convex body $K \subset \mathbb{C}^n$, there exists an $M \in \mathbb{R}$ and a number
$d(K) \in \mathbb{R}$ such that a random k-dimensional subspace E with $k = c\, \epsilon^2 d(K)$ satisfies
\[ (1 - \epsilon) M \|x\|_2 \le \|x\|_K \le (1 + \epsilon) M \|x\|_2, \]
for all $x \in E$, with probability larger than $1 - \exp(-c' \epsilon^2 d(K))$. Here, we used the norm
\[ \|x\|_K = \inf\{ t \ge 0 : x \in tK \}. \]
In the previous theorem, the number d(K) is also called the Dvoretzky-dimension of K
and it can be computed using geometric properties of K. Geometrically, the Dvoretzky-
Milman theorem (which also has a version over the reals) states that taking intersections
of a sufficiently low-dimensional hyperplane with a sufficiently high-dimensional circled (or
symmetric in the real case) convex body results in almost Euclidean balls with high proba-
bility. Let us now bring the statement of Theorem 2.4 into a form where it becomes more
clear that the Dvoretzky-Milman theorem plays a role. We need the following lemma:
Lemma 2.6. For any quantum state $\rho \in D(H)$ with $d = \dim(H)$ we have
\[ H(\rho) \ge \log(d) - d\, \Big\| \rho - \frac{1_H}{d} \Big\|_2^2. \]
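The inequality of the lemma can be probed numerically on random states. The sketch below is only a sanity check under stated assumptions: it uses NumPy, the natural logarithm for the entropy, and random states of the form $GG^\dagger / \mathrm{Tr}[GG^\dagger]$ with a complex Gaussian matrix G. All function names are hypothetical.

```python
import numpy as np


def von_neumann_entropy(rho):
    """Entropy H(rho) with the natural logarithm."""
    eigvals = np.linalg.eigvalsh(rho)
    eigvals = eigvals[eigvals > 1e-12]  # use the convention 0 log 0 = 0
    return float(-np.sum(eigvals * np.log(eigvals)))


def lemma_lower_bound(rho):
    """Right-hand side log(d) - d * ||rho - 1/d||_2^2 of the lemma."""
    d = rho.shape[0]
    deviation = rho - np.eye(d) / d
    hs_norm_sq = np.linalg.norm(deviation, "fro") ** 2
    return float(np.log(d) - d * hs_norm_sq)


def random_state(d, rng):
    """Random density matrix G G^dagger / Tr[G G^dagger]."""
    G = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    rho = G @ G.conj().T
    return rho / np.trace(rho).real


rng = np.random.default_rng(5)
d = 5
states = [random_state(d, rng) for _ in range(200)]
gaps = [von_neumann_entropy(rho) - lemma_lower_bound(rho) for rho in states]

# For the maximally mixed state both sides equal log(d).
mixed = np.eye(d) / d
gap_mixed = von_neumann_entropy(mixed) - lemma_lower_bound(mixed)
```

All entries of `gaps` are non-negative in this experiment, and the bound is tight at the maximally mixed state.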
The following theorem implies Theorem 2.4 when combined with the previous lemma:
Theorem 2.7. There are absolute constants c, C > 0 such that we have the following: Set
$d_B = d_E^2$ and $d_A = c\, d_E^2$ and let $B_{\|\cdot\|_2} \subset B(\mathbb{C}^{d_E}, \mathbb{C}^{d_B})$ denote the Hilbert-Schmidt unit ball.
We define a function $g : B_{\|\cdot\|_2} \to \mathbb{R}$ by
\[ g(X) = \Big\| X X^\dagger - \frac{1_{d_E}}{d_E} \Big\|_2. \]
With large probability, a random $d_A$-dimensional subspace $E \subset B(\mathbb{C}^{d_E}, \mathbb{C}^{d_B})$ satisfies
\[ \sup_{X \in B_{\|\cdot\|_2} \cap E} g(X) \le \frac{C}{d_E}. \]
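How is this related to the Dvoretzky-Milman theorem? The statement of the theorem
can be reformulated as follows: For every $X \in B_{\|\cdot\|_2} \cap E$ we have
\[ \frac{C^2}{d_E^2} \ge \Big\| X X^\dagger - \frac{1_{d_E}}{d_E} \Big\|_2^2 = \mathrm{Tr}\big[|X|^4\big] - \frac{2\, \mathrm{Tr}[X^\dagger X]}{d_E} + \frac{\mathrm{Tr}[1_{d_E}]}{d_E^2} \ge 0. \]
Rearranging these inequalities for $\|X\|_2 = 1$ yields
\[ d_E^{-1/4} \|X\|_2 \le \|X\|_4 \le d_E^{-1/4} \Big( 1 + \frac{C^2}{d_E} \Big)^{1/4} \|X\|_2 \le d_E^{-1/4} \Big( 1 + \frac{C^2}{4 d_E} \Big) \|X\|_2, \]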
which is a statement as in Theorem 2.5. Unfortunately, going through the constants (and
using that $k(B_{\|\cdot\|_4}) = d_E^{1/2} d_B$) only gives an $\epsilon$ of order $1/d_E^{1/4}$, which is not good enough to prove the
previous inequalities and Theorem 2.7. Getting the correct scaling is a bit more involved
and the interested reader might want to check out the book “Alice and Bob meet Banach”
by Guillaume Aubrun and Stanislaw Szarek (from which the material in this lecture was
mostly adapted).
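The rearrangement used in the reformulation above can be spelled out as follows. This is a sketch under the assumption that X is normalized, $\|X\|_2 = 1$, so that $\mathrm{Tr}[X^\dagger X] = 1$; we also use $\mathrm{Tr}[|X|^4] = \|X\|_4^4$ and the elementary bound $(1+t)^{1/4} \le 1 + t/4$ for $t \ge 0$.

```latex
\begin{align*}
\Big\| XX^\dagger - \frac{1_{d_E}}{d_E} \Big\|_2^2
  &= \mathrm{Tr}\big[|X|^4\big] - \frac{2\,\mathrm{Tr}[X^\dagger X]}{d_E}
     + \frac{\mathrm{Tr}[1_{d_E}]}{d_E^2}
   = \|X\|_4^4 - \frac{2}{d_E} + \frac{1}{d_E}
   = \|X\|_4^4 - \frac{1}{d_E}, \\
0 \le \|X\|_4^4 - \frac{1}{d_E} \le \frac{C^2}{d_E^2}
  &\iff \frac{1}{d_E} \le \|X\|_4^4
        \le \frac{1}{d_E}\Big(1 + \frac{C^2}{d_E}\Big), \\
d_E^{-1/4} \le \|X\|_4
  &\le d_E^{-1/4}\Big(1 + \frac{C^2}{d_E}\Big)^{1/4}
   \le d_E^{-1/4}\Big(1 + \frac{C^2}{4 d_E}\Big).
\end{align*}
```

Since both $\|\cdot\|_2$ and $\|\cdot\|_4$ are homogeneous, the resulting two-sided bound extends from normalized X to all X in the subspace E.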
¹ That is, such that $e^{i\alpha} x \in K$ for every $\alpha \in \mathbb{R}$ whenever $x \in K$.
2.3 Counterexample to the additivity of the Holevo quantity
Using this theorem, we can find a counterexample to the additivity problem:
Corollary 2.8. There exist quantum channels $S, T : B(\mathbb{C}^{d_A}) \to B(\mathbb{C}^{d_B})$ such that
\[ H_{\min}(S \otimes T) < H_{\min}(S) + H_{\min}(T). \]
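Proof. Choose a quantum channel $T : B(\mathbb{C}^{d_A}) \to B(\mathbb{C}^{d_B})$ with Stinespring dilation
\[ T(X) = \mathrm{Tr}_E\big[ V X V^\dagger \big], \]
with an isometry $V : \mathbb{C}^{d_A} \to \mathbb{C}^{d_E} \otimes \mathbb{C}^{d_B}$ such that $\mathrm{Im}(V) \subset \mathbb{C}^{d_E} \otimes \mathbb{C}^{d_B}$ is extremely
entangled, as in Theorem 2.4. Moreover, we choose $S = \overline{T}$ as the conjugate channel. By the
properties of the subspace $\mathrm{Im}(V)$, we have
\[ H_{\min}(T) = H_{\min}(S) = \min_{|\psi\rangle \in \mathrm{Im}(V),\ \langle\psi|\psi\rangle = 1} E(|\psi\rangle) \ge \log(d_E) - \frac{C}{d_E}. \]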
Using Lemma ??, we find that the quantum state $(S \otimes T)(\omega_{A'A})$ has an eigenvalue larger
than
\[ \frac{d_A}{d_B d_E} = \frac{c}{d_E}. \]
By the exercises, we find that
\[ H_{\min}(S \otimes T) \le H\big( (S \otimes T)(\omega_{A'A}) \big) \le 2\log(d_E) - \frac{c \log(d_E)}{d_E} + \frac{1}{d_E}. \]
For large enough values of $d_E$ we conclude that
\[ H_{\min}(S \otimes T) < H_{\min}(S) + H_{\min}(T). \]
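To make the last step explicit, one can subtract the upper bound on $H_{\min}(S \otimes T)$ from the lower bound on $H_{\min}(S) + H_{\min}(T)$. The following short computation is a sketch using only the two bounds stated in the proof, with the constants c and C from Theorem 2.4.

```latex
\begin{align*}
H_{\min}(S) + H_{\min}(T) - H_{\min}(S \otimes T)
  &\ge \Big( 2\log(d_E) - \frac{2C}{d_E} \Big)
     - \Big( 2\log(d_E) - \frac{c\log(d_E)}{d_E} + \frac{1}{d_E} \Big) \\
  &= \frac{c\log(d_E) - 2C - 1}{d_E}.
\end{align*}
```

The right-hand side is strictly positive as soon as $c \log(d_E) > 2C + 1$, which holds for all sufficiently large $d_E$ because c and C do not depend on $d_E$.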
The previous corollary shows that additivity is violated for two different channels.
Combining it with two corollaries from the previous lecture yields the following:
Theorem 2.9. There exists a quantum channel T : B(HA ) → B(HB ) such that
χ(T ⊗ T ) > χ(T ) + χ(T ).
A consequence of this theorem is that the regularization in the HSW theorem is necessary,
and it cannot be replaced by the single-letter formula χ(T ).