0% found this document useful (0 votes)

91 views35 pages

Quantum Random Tensor Theory

This document discusses extending random matrix theory to mixtures of random product states. It considers determining the largest eigenvalue of a sum of random product states as the dimension d goes to infinity while keeping the ratio of p/d fixed. When k=1, the Marchenko-Pastur law determines the largest eigenvalue and spectral density. The document shows that for k>1, the largest eigenvalue is still approximately the same and the spectral density approaches that of the Marchenko-Pastur law, generalizing the random matrix theory result to random tensors. It introduces new techniques for analyzing mixtures of random product states that do not have independent entries or unitary invariance.

Uploaded by

Sorin Opincariu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

91 views35 pages

Quantum Random Tensor Theory

Uploaded by

Sorin Opincariu

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 35

Random tensor theory: extending random matrix theory to mixtures of random

product states
A. Ambainis
Faculty of Computing, University of Latvia, Riga, Latvia

A. W. Harrow
Department of Mathematics, University of Bristol, Bristol, U.K.

M. B. Hastings
Microsoft Research, Station Q, CNSI Building, University of California, Santa Barbara, CA, 93106
arXiv:0910.0472v2 [quant-ph] 12 Jan 2010

We consider a problem in random matrix theory that is inspired by quantum information theory:
determining the largest eigenvalue of a sum of p random product states in (Cd )⊗k , where k and p/dk
are fixed while d → ∞. When k = 1, the p Marčenko-Pastur law determines (up to small corrections)
p
not only the largest eigenvalue ((1 + p/dk )2 ) but the smallest eigenvalue (min(0, 1 − p/dk )2 )
and the spectral density in between. We use p the method of moments to show that for k > 1 the
largest eigenvalue is still approximately (1+ p/dk )2 and the spectral density approaches that of the
Marčenko-Pastur law, generalizing the random matrix theory result to the random tensor case. Our
bound on the largest eigenvalue has implications both for sampling from a particular heavy-tailed
distribution and for a recently proposed quantum data-hiding and correlation-locking scheme due
to Leung and Winter.
Since the matrices we consider have neither independent entries nor unitary invariance, we need
to develop new techniques for their analysis. The main contribution of this paper is to give three dif-
ferent methods for analyzing mixtures of random product states: a diagrammatic approach based on
Gaussian integrals, a combinatorial method that looks at the cycle decompositions of permutations
and a recursive method that uses a variant of the Schwinger-Dyson equations.

I. INTRODUCTION AND RELATED WORK

A. Background

A classical problem in probability is to throw p balls into d bins and to observe the maximum occupancy of any
bin. If we set the ratio x = p/d to a constant and take d large, this maximum occupancy is O(ln d/ ln ln d) with high
probability (in fact, this bound is tight, but we will not discuss that here). There are two natural ways to prove this,
which we call the large deviation method and the trace method. First, we describe the large deviation method. If the
occupancies of the bins are z1 , . . . , zd , then each zi is distributed approximately according to a Poisson distribution
with parameter x; i.e. Pr[zi = z] ≈ xz /ex z!. Choosing z ≫ ln d/ ln ln d implies that Pr[zi ≥ z] ≪ 1/d for each i.
Thus, the union bound implies that with high probability all of the zi are ≤ O(ln d/ ln ln d). More generally, the large
deviation method proceeds by: (1) representing a bad event (here, maximum occupancy being too large) as the union
of many simpler bad events (here, any one zi being too large), then (2) showing that each individual bad event is very
unlikely, and (3) using the union bound to conclude that with high probability none of the bad events occur. This
method has been used with great success throughout classical and quantum information theory [17, 18, 22].
This paper will discuss a problem in quantum information theory where the large deviation method fails. We will
show how instead a technique called the trace method can be effectively used. For the problem of balls into bins, the
trace method starts with the bound max zim ≤ z1m + . . . + zdm , where m is a large positive integer. Next, we take the
expectation of both sides and use convexity to show that
1 1 1
E[max zi ] ≤ (E[max zim ]) m ≤ d m (E[z1m ]) m .

Choosing m to minimize the right-hand side can be shown to yield the optimal ln d/ ln ln d + O(1) bound for the
expected maximum occupancy. In general, this approach is tight up to the factor of d1/m .
The quantum analogue of balls-into-bins problem is to choose p random unit vectors |ϕ1 i, . . . , |ϕp i from Cd and to
consider the spectrum of the matrix
p
X
Mp,d = ϕs , (1)
s=1
2

where we use ϕ to denote |ϕihϕ|. Again we are interested in the regime where x = p/d is fixed and d → ∞. We refer
to this case as the “normalized ensemble.” We also consider a slightly modified version of the problem in which the
states |ϕ̂s i are drawn from a complex Gaussian distribution with unit variance, so that the expectation of hϕ̂s |ϕ̂s i is
Pp
equal to one. Call the ensemble in the modified problem the “Gaussian ensemble” and define M̂p,d = s=1 ϕ̂s . Note
that M̂p,d = Φ̂† Φ̂, where Φ̂ = ps=1 |ϕihs| is a p × d matrix where each entry is an i.i.d. complex Gaussian variable
P
Pp Pd
with variance 1/d. That is, Φ̂ = s=1 j=1 (as,j + ibs,j )|sihj|, with as,j , bs,j i.i.d. real Gaussians each with mean
zero and variance 1/2d.
What we call the Gaussian ensemble is more conventionally known as the Wishart distribution, and has been
extensively studied. Additionally, we will see in Section II A that the normalized ensemble is nearly the same as
the Gaussian ensemble for large d. In either version of the quantum problem, the larger space from which we draw
vectors means fewer collisions than in the discrete classical case.√ The nonzero part of the spectrum of M has been
well studied[12, 32, 33], and it lies almost entirely between (1 ± x)2 as d → ∞. This can be proven using a variety
of techniques. When M is drawn according to the Gaussian ensemble, its spectrum is described by chiral random
matrix theory[32, 33]. This follows from the fact that the spectrum of M has the same distribution as the spectrum
of the square of the matrix

0 Φ̂
, (2)
Φ̂† 0

where Φ̂ is defined above. A variety of techniques have been used to compute the spectrum[5, 6, 12, 25]. The ability to
use Dyson gas methods, or to perform exact integrals over the unitary group with a Kazakov technique, has allowed
even the detailed structure of the eigenvalue spectrum near the edge to be worked out for this chiral random matrix
problem.
A large-deviation approach for the x ≪ 1 case was given in [23, appendix B]. In order to bound the spectrum of
′
Mp,d , they instead studied the Gram matrix Mp,d := ΦΦ† , which has the same spectrum as M . Next they considered
′
hφ|Mp,d |φi for a random choice of |φi. This quantity has expectation 1 and, by Levy’s lemma, is within ǫ of its
expectation with probability ≥ 1 − exp(O(dǫ2 )). On the other hand, |φi ∈ Cp , which can be covered by an ǫ-net of
′
size exp(O(p ln 1/ǫ)). Thus the entire spectrum of Mp,d (and equivalently Mp,d ) will be contained in 1 ± O(ǫ) with
high probability, where ǫ is a function of x that approaches 0 as x → 0.
In this paper, we consider a variant of the above quantum problem in which none of the techniques described above
is directly applicable. We choose our states |ϕs i to be product states in (Cd )⊗k ; i.e.

|ϕs i = |ϕ1s i ⊗ |ϕ2s i ⊗ · · · ⊗ |ϕks i,

for |ϕ1s i, . . . , |ϕks i ∈ Cd . We choose the individual states |ϕas i again either uniformly from all unit vectors in Cd
(the normalized product ensemble) or as Gaussian-distributed vectors with E[hϕ̂as |ϕ̂as i] = 1 (the Gaussian product
ensemble). The corresponding matrices are Mp,d,k and M̂p,d,k respectively. Note that k = 1 corresponds to the case
considered above; i.e. Mp,d,1 = Mp,d and M̂p,d,1 = M̂p,d . We are interested in the case when k > 1 is fixed. As above,
we also fix the parameter x = p/dk , while √ we take d → ∞. And as above, we would like to show that the spectrum
lies almost entirely within the region (1 ± x)2 with high probability.
However, the Dyson gas and Kazakov techniques[5, 6, 12, 25] that were used for k = 1 are not available for k > 1,
which may be considered a problem of random tensor theory. The difficulty is that we have a matrix with non-i.i.d.
entries and with unitary symmetry only within the k subsystems. Furthermore, large-deviation
p techniques are known
to work only in the x ≫ 1 limits. Here, Ref. [27] can prove that √ kM − xIk ≤ O( xk log(d)) with high probability,
which gives the right leading-order behavior only when x ≫ k log d. (The same bound is obtained √with different
techniques by Ref. [4].) The case when x ≫ 1 is handled by Ref. [2], which can bound kM − xIk ≤ O( x) with high
probability when k ≤ 2. (We will discuss this paper further in Section I C 2.)
However, some new techniques will be needed to cover the case when x ≤ O(1). Fortunately it turns out that
the diagrammatic techniques for k = 1 can be modified to work for general k. In Section II, we will use these
techniques to obtain an expansion in 1/d. Second, the large deviation approach of [23] achieves a concentration
bound of exp(−O(dǫ2 )) which needs to overcome an ǫ-net of size exp(O(p ln(1/ǫ))). This only functions when p ≪ d,
but we would like to take p nearly as large as dk . One approach when k = 2 is to use the fact that hψ|Mp,d,k |ψi
exhibits smaller fluctuations when |ψi is more entangled, and that most states are highly entangled. This technique
was used in an unpublished manuscript of Ambainis to prove that kMp,d,k k = O(1) with high probability when
p = O(d2 / poly ln(d)). However, the methods in this paper are simpler, more general and achieve stronger bounds.
Our strategy to bound the typical value of the largest eigenvalue of Mp,d,k will be to use a trace method: we bound
the expectation value of the trace of a high power, denoted m, of Mp,d,k . This yields an upper bound on kMp,d,k k
3

because of the following key inequality

kMp,d,k km ≤ tr(Mp,d,k
m
). (3)
m m
We then proceed to expand E[tr Mp,d,k ] (which we denote Ep,d,k ) as

X p
p X p
X X
m m
Ep,d,k = E[tr Mp,d,k ] = ... Ed [s1 , s2 , ..., sm ]k := Ed [~s]k (4)
s1 =1 s2 =1 sm =1 s∈[p]m
~

where

Ed [~s] = E [tr(ϕs1 ϕs2 . . . ϕsm )] , (5)

m M
and [p] = {1, . . . , p}. Similarly we define Êp,d,k = E[tr M̂p,d,k ] and Êd [~s] = E [tr(ϕ̂s1 ϕ̂s2 . . . ϕ̂sm )], and observe that
−k m −k m
they obey a relation analogous to (4). We also define the normalized traces em p,d,k = d Ep,d,k and êm p,d,k = d Êp,d,k ,
which will be useful for understanding the eigenvalue density.
m
The rest of the paper presents three independent proofs that for appropriate choices of m, Ep,d,k = (1 +
√ 2m √ 2
x) exp(±o(m)). This will imply that E[kMp,d,k k] ≤ (1 + x) ± o(1), which we can combine √ with standard
measure concentration results to give tight bounds on the probability that kMp,d,k k is far from (1 + x)2 . We will
m
also derive nearly matching lower bounds on Ep,d,k which show us that the limiting spectral density of Mp,d,k matches
that of the Wishart distribution (a.k.a. the k = 1 case). The reason for the multiple proofs is to introduce new
techniques to problems in quantum information that are out of reach of the previously used tools. The large-deviation
techniques used for the k = 1 case have had widely successful applicability to quantum information and we hope that
the methods introduced in this paper will be useful in the further exploration of random quantum states and processes.
Such random states, unitaries, and measurements play an important role in many area of quantum information such
as encoding quantum[1], classical[16, 19], and private[28] information over quantum channels, in other data-hiding
schemes[17], in quantum expanders[8, 14], and in general coding protocols[34], among other applications.
The first proof, in Section II, first uses the expectation over the Gaussian ensemble to upper-bound the expectation
over the normalized ensemble. Next, it uses Wick’s theorem to give a diagrammatic method for calculating the
expectations. A particular class of diagrams, called rainbow diagrams, are seen to give the leading order terms. Their
contributions to the expectation can be calculated exactly, while for m ≪ d1/2k , the terms from non-rainbow diagrams
are shown to be negligible. In fact, if we define the generating function
X
Ĝ(x, y) = y m êm
p,d,k , (6)
m≥0

then the methods of Section II can be used to calculate (6) up to 1/d corrections. Taking the analytic continuation
of G(x, y) gives an estimate of the eigenvalue density across the entire spectrum of Mp,d,k . More precisely, since we
can only calculate the generating function up to 1/d corrections, we can use convergence in moments to show that
the distribution of eigenvalues weakly converges almost surely (Corollary 6 below) to a limiting distribution.
√ For this
limiting distribution, for x < 1, the eigenvalue density of Mp,d,k vanishes for eigenvalues less than (1 − x)2 . However,
this calculation,
√ in contrast to the calculation of the largest eigenvalue, only tells us that the fraction of eigenvalues
outside (1 ± x)2 approaches zero with high probability, and cannot rule out the existence of a small number of low
eigenvalues.
The second proof, in Section III, is based on representation theory and combinatorics. It first repeatedly applies two
simplification rules to Ed [~s]: replacing occurrences of ϕ2s with ϕs and replacing E[ϕs ] with I/d whenever ϕs appears
only a single time in a string. Thus ~s is replaced by a (possibly empty) string ~s′ with no repeated characters and
with no characters occurring only a single time. To analyze Ed [~s′ ], we express E[ϕ⊗n ] as a sum over permutations
and use elementary arguments to enumerate permutations with a given number of cycles. We find that the dominant
contribution (corresponding to rainbow diagrams from Section II) comes from the case when ~s′ = ∅, and also analyze
the next leading-order contribution, corresponding to ~s′ of the form 1212, 123213, 12343214, 1234543215, etc. Thus
m
we obtain an estimate for Ep,d,k that is correct up to an o(1) additive approximation.
The third proof, in Section IV, uses the Schwinger-Dyson equations to remove one letter at a time from the string
~s. This leads to a simple recursive formula for em p,d,k that gives precise estimates.
m
All three proof techniques can be used to produce explicit calculations of Ep,d,k . Applying them for the first few
4

values of m yields
1
Ep,d,k = p
2 (p)2
Ep,d,k = p+
dk
3 (p)2 (p)3
Ep,d,k = p + 3 k + 2k
d d
4 (p)2 (p)3 (p)4 (p)2
Ep,d,k = p + 6 k + 6 2k + 3k + 2k k
d d d d (d + 1)k
5 (p)2 (p)3 (p)4 (p)5 (p)2
Ep,d,k = p + 10 k + 20 2k + 10 3k + 4k + 5 · 2k k
d d d d d (d + 1)k
6 (p)2 (p)3 (p)4 (p)5 (p)6
Ep,d,k = p + 15 k + 50 2k + 50 3k + 15 4k + 5k
d d d d d
k
(p) 2 (p) 2 (d + 3) (p)3
+15 · 2k k + 2k + 6k k ,
d (d + 1)k d (d + 1)2k d (d + 1)k (d + 2)k

where (p)t = p!/(p − t)! = p(p − 1) · · · (p − t + 1). We see that O(1) (instead of O(dk )) terms start to appear when
m ≥ 4. The combinatorial significance of these will be discussed in Section III B.

B. Statement of results

Our main result is the following theorem.

Pm 1 m m
Theorem 1. Let βm (x) = ℓ=1 N (m, ℓ)xℓ , where N (m, ℓ) =

m ℓ−1 ℓ are known as the Narayana numbers. Then,

m2 3mk+4

p 1 m p
1− βm k ≤ k E[tr(Mp,d,k )] ≤ exp 1/k
βm k , (7)
p d d xd d
√
where exp(A) := eA and the lower bound holds only when m < p.
Thus, for all m ≥ 1, k ≥ 1, x > 0 and p = xdk ,

lim em k
p,d,k = βm (p/d ),
d→∞

1
where we have used the notation em
p,d,k = dk
m
E[tr(Mp,d,k )].
Variants of the upper bound are proven separately in each of the next three sections, but the formulation used in
the Theorem is proven in Section IV. Since the lower bound is simpler to establish, we prove it only in Section III,
although the techniques of Sections II and IV would also give nearly the same bound.
For the data-hiding and correlation-locking scheme proposed in [23], it is important
√ that kM k = 1 + o(1) whenever
x = o(1). In fact, we will show that kM k is very likely to be close to (1 +√ x)2 , just as was previously known for
Wishart matrices. First we observe that for large m, βm (x) is roughly (1 + x)2m .
Lemma 2.
x √ √
√ (1 + x)2m ≤ βm (x) ≤ (1 + x)2m (8)
2m2 (1 + x)3

The proof is deferred to Section I E.

Taking m as large as possible in Theorem 1 gives us tight bounds on the typical behavior of kMp,d,k k.
Corollary 3. With Mp,d,k and x defined as above,

√ √

ln(d) ln(d)
(1 + x)2 − O √ ≤ E[kMp,d,k k] ≤ (1 + x)2 + O 1
d d 2k

and the same bounds hold with Mp,d,k replaced by M̂p,d,k .

Proof. A weaker version of the upper bound can be established by setting m ∼ d1/k(k+4) in
k 1
E[kMp,d,k k] ≤ E[kMp,d,k km ]1/m ≤ d m (em
p,d,k ) ,
m (9)

where the first inequality is from the convexity of x 7→ xm . In fact, the version stated here is proven in (42) at the
end of Section II.
The lower bound will be proven in Section I E.
Next, the reason we can focus our analysis on the expected value of kMp,d,k k is because kMp,d,k k is extremely
unlikely to be far from its mean. Using standard measure-concentration arguments (detailed in Section I E), we can
prove:
Lemma 4. For any ǫ > 0,

Pr (|kMp,d,k k − E[kMp,d,k k]| ≥ ǫ) ≤ 2 exp(−(d − 1)ǫ2 /k). (10)

For any 0 < ǫ ≤ 1,

dǫ2 (d−1)δ2

Pr kM̂p,d,k k − E[kMp,d,k k] ≥ ǫE[kMp,d,k k] + δ ≤ 2pke− 4k2 + 2e− 4k (11)

Combined with Corollary 3 we obtain:

Corollary 5.

ln d
Pr |kMp,d,k k − λ+ ]| ≥ O + ǫ ≤ 2 exp(−dǫ2 /2).
d1/2k

A similar, but more cumbersome, bound also exists for kM̂p,d,k k.

Note that for the k = 1 case, the exponent can be replaced by O(−dǫ3/2 ), corresponding to typical fluctuations on
the order of O(d−2/3 ) [21]. It is plausible that fluctuations of this size would also hold in the k > 1 case as well, but
we do not attempt to prove that in this paper.
Our asymptotic estimates for em p,d,k also imply that the limiting spectral density of Mp,d,k is given by the Marčenko-
Pastur law, just as was previously known for the k = 1 case. Specifically, let λ1 , . . . , λR be the non-zero eigenvalues of
Mp,d,k , with R = rankMp,d,k . Generically R = min(p, dk ) and the eigenvalues are all distinct. Define the eigenvalue
density to be
R
1 X
ρ(λ) = δ(λi − λ),
R i=1

then
Corollary 6. In the limit of large d at fixed x, ρ(λ) weakly converges almost surely to
p
(λ+ − λ)(λ − λ− )
I(λ− ≤ λ ≤ λ+ )
2πxλ
for any fixed k and for both the normalized and Gaussian ensembles.
√
Here λ± = (1 ± x)2 and I(λ− ≤ λ ≤ λ+ ) = 1 if λ− ≤ λ ≤ λ+ and 0 otherwise.
This corollary follows from Theorem 1 using standard arguments[10]. We believe, but are unable to prove, that in
the x ≤ 1 case, the probability of any non-zero eigenvalues existing below λ− − ǫ vanishes for any ǫ > 0 in the limit
of large d at fixed x, just as is known when k = 1.

C. Applications

1. Data hiding

One of the main motivations for this paper was to analyze the proposed data-hiding and correlation-locking scheme
of [23]. In this section, we will briefly review their scheme and explain the applicability of our results.
6
Pp
Suppose that p = d logc (d) for some constant c > 0, and we consider the k-party state ρ = p1 s=1 ϕs . We can
think of s as a message of (1 + o(1)) log d bits that is “locked” in the shared state. In [23] it was proved that any
LOCC (local operations and classical communication) protocol that uses a constant number of rounds cannot produce
an output with a non-negligible amount of mutual information with s, and [23] also proved that the parties cannot
recover a non-negligible amount of mutual information with each other that would not be revealed to an eavesdropper
on their classical communication so that the state cannot be used to produce a secret key. (They also conjecture that
the same bounds hold for an unlimited number of rounds.) However, if c log log(d) + log(1/ǫ) bits of s are revealed
then each party is left with an unknown state from a set of ǫd states in d dimensions. Since these states are randomly
chosen, it is possible for each party to correctly identify the remaining bits of s with probability 1 − O(ǫ) [24].
On the other hand, the bounds on the eigenvalues of ρ established by our Corollary 5 imply that the scheme of
Ref. [23] can be broken by a separable-once-removed quantum measurement1 : specifically the measurement given by
p
completing { kρk ϕs }s into a valid POVM. We hope that our bounds will also be of use in proving their conjecture
about LOCC distinguishability with an unbounded number of rounds. If this conjecture is established then it will
imply a dramatic separation between the strengths of LOCC and separable-once-removed quantum operations, and
perhaps could be strengthened to separate the strengths of LOCC and separable operations.

2. Sampling from heavy-tailed distributions

A second application of our result is to convex geometry. The matrix M can be thought of as the empirical
covariance matrix of a collection of random √
product vectors. These random product vectors have unit norm, and the
distribution has ψr norm on the order of 1/ dk iff r satisfies r ≤ 2/k. Here the ψr norm is defined (following [3]) for
r > 0 and for a scalar random variable X as

kXkψr = inf{C > 0 : E[exp(|X|/C)r ] ≤ 2} (12)

and for a random vector |ϕi is defined in terms of its linear forms:

kϕkψr = sup khα|ϕikψr ,

|αi

where the sup is taken over all unit vectors α. Technically, the ψr norm is not a norm for r < 1, as it does not
satisfy the triangle inequality. To work with an actual norm, we could replace (12) with supt≥1 E[|X|t ]1/t /t1/r , which
similarly captures the tail dependence. We mention also that the ψ2 norm has been called the subgaussian moment
and the ψ1 norm the subexponential moment.
Thm. 3.6 of Ref. [2] proved √ that when M is a sum of vectors from a distribution with bounded ψ1 norm and
x ≫ 1 then M is within p O( x log x) of xI with high probability. And as we have stated, Refs. [4, 27] can prove
that kM − xIk ≤ O( xk √ log(d)) with high probability, even without assumptions on r, although the bound is only
meaningful when x ≫ k log d. In the case when x ≪ 1 and 1 ≤ r ≤ 2 (i.e. k ≤ 2), Thm 3.3 of Ref. [3] proved
√
that M is within O( x log1/r (1/x)) of a rank-p projector. Aubrun has conjectured √ that their results should hold for
r > 0 and any distribution on D-dimensional unit vectors with ψr norm ≤ O(1/ D). If true, this would cover the
ensembles that we consider.
Thus, our main result bounds the spectrum of M in a setting that is both more general than that of [2, 3] (since
we allow general x > 0 and k ≥ 1, implying that r = 2/k can be arbitrarily close to 0) and more specific (since we
do not consider only products of uniform random vectors, and not general ensembles with bounded ψr norm). Our
results can be viewed as evidence in support of Aubrun’s conjecture.

D. Notation

For the reader’s convenience, we collect here the notation used throughout the paper. This section omits variables
that are used only in the section where they are defined.

1 This refers to a POVM (positive operator valued measure) in which all but one of the measurement operators are product operators.
7

Variable Definition
d local dimension of each subsystem.
k number of subsystems.
p number of random product states chosen
x p/dk .
|ϕis i unit vector chosen at random from Cd for s = 1, . . . , p and i = 1, . . . , k.
|ϕˆis i Gaussian vector from Cd with E[hϕis |ϕis i] = 1.
ϕ |ϕihϕ| (for any state |ϕi)
|ϕs i |ϕ1s i ⊗ · · · ⊗ |ϕks i
|ϕ̂s i |ϕ̂1s i ⊗ · · · ⊗ |ϕ̂ks i
Pp
Mp,d,k s=1√ϕs
λ± (1 ± x)2
m m
Ep,d,k E[tr Mp,d,k ]
1
emp,d,k dk
E[tr M m
p,d,k ]
Ed [~s] E[tr(ϕs1 · · · ϕsm )], where ~s = (s1 , . . . , sm )
y m em
P
G(x, y) p,d,k
Pm≥0m
β(x) ℓ=1 N (m, ℓ)xℓ
m
1
m 1 m m−1 m!m−1!
N (m, ℓ) Narayana number: m ℓ−1 ℓ = ℓ ℓ−1 ℓ−1 = ℓ!ℓ−1!m−ℓ!m−ℓ+1! (and N (0, 0) = 1)
ℓ m
P
F (x, y) 0≤ℓ≤m<∞ N (m, ℓ)x y

We also define |ϕ̂s i, M̂p,d,k , Êp,d,k , Ĝ(x, y) and so on by replacing |ϕis i with |ϕ̂is i.

E. Proof of large deviation bounds

In this section we prove Lemma 2, Lemma 4 and the lower bound of Corollary 3. First we review some terminology
and basic results from large deviation theory, following Ref. [22]. Consider a set X with an associated measure
µ and distance metric D. If Y ⊆ X and x ∈ X then define D(x, Y ) := inf y∈Y D(x, y). For any ǫ ≥ 0 define
Yǫ := {x ∈ X : D(x, Y ) ≤ ǫ}. Now define the concentration function αX (ǫ) for ǫ ≥ 0 to be
αX (ǫ) := max{1 − µ(Yǫ ) : µ(Y ) ≥ 1/2}.
Say that f : X → R is η-Lipschitz if |f (x) − f (y)| ≤ ηD(x, y) for any x, y ∈ X. If m is a median value of f (i.e.
µ({x : f (x) ≤ m}) = 1/2) then we can combine these definitions to obtain the concentration result
µ({x : f (x) ≥ m + ηǫ}) ≤ αX (ǫ). (13)
Proposition 1.7 of Ref. [22] proves that (13) also holds when we take m = Eµ [f ].
Typically we should think of αX (ǫ) as decreasingly exponentially with ǫ. For example, Thm 2.3 of [22] proves that
2
αS 2d−1 (ǫ) ≤ e−(d−1)ǫ , where S 2d−1 denotes the unit sphere in R2d , µ is the uniform measure and we are using the
Euclidean distance.
To analyze independent random choices, we define the ℓ1 direct product Xℓn1 to be the set of n-tuples (x1 , . . . , xn )
with distance measure Dℓ1 ((x1 , . . . , xn ), (y1 , . . . , ynp)) := D(x1 , y1 ) + . . . + D(xn , yn ). Similarly define Xℓn2 to have
distance measure Dℓ2 ((x1 , . . . , xn ), (y1 , . . . , yn )) := D(x1 , y1 )2 + . . . + D(xn , yn )2 .
Now we consider the normalized ensemble. Our random matrices are generated by taking pk independent draws
from S 2d−1 , interpreting them as elements of Cd and then constructing Mp,d,k from them. We will model this as the
space ((S 2d−1 )pℓ2 )kℓ1 . First, observe that Thm 2.4 of [22] establishes that
2
α(S 2d−1 )p2 (ǫ) ≤ e−(d−1)ǫ .
ℓ

Next, Propositions 1.14 and 1.15 of [22] imply that

2
α((S 2d−1 )p2 )k1 (ǫ) ≤ e−(d−1)ǫ /k
.
ℓ ℓ

Now we consider the map f : ((S 2d−1 )pℓ2 )kℓ1 → R that is defined by f ({|ϕis i} s=1,...,p ) = kMp,d,k k, with Mp,d,k defined
i=1,...,k
Pp
as usual as Mp,d,k = s=1 ϕ1s ⊗ · · · ⊗ ϕks . To analyze the Lipschitz constant of f , note that the function M → kM k
8
p
is 1-Lipschitz if we use the ℓ2 norm for matrices (i.e. D(A, B) = tr(A − B)† (A − B)) [20]. Next, we can use
the triangle inequality to show that the defining map from ((S 2d−1 )pℓ2 )kℓ1 to Mp,d,k is also 1-Lipschitz. Thus, f is
1-Lipschitz. Putting this together we obtain the proof of (10).
√
Next, consider the Gaussian ensemble. any Gaussian vector |ϕ̂is i can be expressed as |ϕ̂is i = rs,i |ϕis i, where |ϕis i is
a random unit vector in Cd and rs,i is distributed according to χ22d /2d. Here χ22d denotes the chi-squared distribution
with 2d degrees of freedom; i.e. the sum of the squares of 2d independent Gaussians each with unit variance.
The normalization factors are extremely likely to be close to 1. First, for any t < d one can compute

E[etrs,i ] = (1 − t/d)−d .

Combining this with Markov’s inequality implies that Pr[rs,i ≥ 1 + ǫ] = Pr[etrs,i ≥ et(1+ǫ) ] ≤ (1 − t/d)−d e−t(1+ǫ) for
any t > 0. We will set t = dǫ/(1 + ǫ) and then find that
dǫ2
Pr[rs,i ≥ 1 + ǫ] ≤ e−d(ǫ−ln(1+ǫ)) ≤ e− 4 , (14)

where the second inequality holds when ǫ ≤ 1. Similarly we can take t = −dǫ/(1 − ǫ) to show that
dǫ2
Pr[rs,i ≤ 1 − ǫ] ≤ ed(ǫ+ln(1−ǫ)) ≤ e− 2 .

Now we use the union bound to argue that with high probability none of the rs,i are far from 1. In particular the
2 2
probability that any rs,i differs from 1 by more than ǫ/k is ≤ 2pke−dǫ /4k .
In the case that all the rs,i are close to 1, we can then obtain the operator inequalities

(1 − ǫ)Mp,d,k ≤ M̂p,d,k ≤ (1 + 2ǫ)Mp,d,k . (15)

(For the upper bound we use (1 + ǫ/k)k ≤ eǫ ≤ 1 + 2ǫ for ǫ ≤ 1.) This establishes that kM̂p,d,k k is concentrated
around the expectation of E[kMp,d,k k], as claimed in (11). √
One application of these large deviation bounds is to prove the lower bound in Corollary 3, namely that (1 + x)2 −
O dln d

1/2k ≤ E[kMp,d,k k]. First observe that Theorem 1 and Lemma 2 imply that
2

1 − mp x
√ λm ≤ em p,d,k .
2m2 (1 + x)3 +

On the other hand, d−k tr M ≤ kM k and so em m

p,d,k ≤ E[kMp,d,k k ]. Define µ := E[kMp,d,k k]. Then

em m
p,d,k ≤ E[kMp,d,k k ]
Z ∞
= dλ Pr[kMp,d,k k ≥ λ]mλm−1 using integration by parts
0
Z ∞
≤ µm + m dǫ(µ + ǫ)m−1 Pr[kMp,d,k k ≥ µ + ǫ]
Z0 ∞
(d−1)ǫ2
m
≤µ +m dǫ(µ + ǫ)m−1 e− k from (10)
0
Z ∞
(d − 1)ǫ2

≤ µm 1 + m dǫ exp (m − 1)ǫ − using 1 + ǫ/µ ≤ 1 + ǫ ≤ eǫ
0 k
Z ∞ 2 !!
k 2 (m − 1)2

m d−1 k(m − 1)
≤µ 1+m dǫ exp − ǫ− + completing the square
−∞ k 2(d − 1) 4(d − 1)
r 2 !
k (m − 1)2

2πk
≤ µm 1 + m exp performing the Gaussian integral
d−1 4(d − 1)

Combining these bounds on em

p,d,k and taking the m
th
root we find that
  m1
m2
1− x p
µ ≥ λ+ 
 
2
√ 3
q
2πk
2
k (m−1)2

2m (1 + x) 1 + m d−1 exp 4(d−1)
9

Assuming that m2 ≤ p/2 and m2 k 2 ≤ d, we find that µ ≥ λ+ 1 − O ln(m)
m = λ+ 1 − O ln(d)
√
d
, which yields
the lower bound on E[kM k] stated in Corollary 3. We omit the similar, but more tedious, arguments that can be used
to lower-bound E[kM̂ k].
We conclude the section with the proof of Lemma 2.
2
Proof. For the upper bound, note that N (m, ℓ) ≤ m ≤ 2m

ℓ 2ℓ and so

m 2m
2m √ ℓ′ √
X X
ℓ
N (m, ℓ)x ≤ ′
x = (1 + x)2m .
ℓ
ℓ=1 ′ ℓ =2

For the lower bound, first observe that

2m √

2ℓ 2m(2m − 1) · · · (2m − 2ℓ + 1) ℓ!2 2ℓ ℓ!2
m 2
= · ≤ 2 · ≤ 2 ℓ ≤ 2ℓ.
m · m · (m − 1) · (m − 1) · · · (m − ℓ + 1) · (m − ℓ + 1) (2ℓ)! (2ℓ)!

ℓ

This implies that

2
ℓ m 2m
N (m, ℓ) = ≥ /2m2 . (16)
m(m − ℓ + 1) ℓ 2ℓ
2m
≤ 2m 2m

Next, we observe that 2ℓ+1 2ℓ + 2ℓ+2 , and so by comparing coefficients, we see that

√ m
x)3 X 2m √

(1 +
≥ (1 + x)2m .
x 2ℓ
ℓ=1

Combining this with (16) completes the proof of the Lemma.

II. APPROACH 1: FEYNMAN DIAGRAMS

A. Reduction to the Gaussian ensemble

We begin by showing how all the moments of the normalized ensemble are always upper-bounded by the moments
of the Gaussian ensemble. A similar argument was made in [9, Appendix B]. In both cases, the principle is that
Gaussian vectors can be thought of as normalized vectors together with some small fluctuations in their overall norm,
and that by convexity the variability in norm can only increase the variance and other higher moments.
Lemma 7. (a) For all p, d, k, m and all strings ~s ∈ [p]m ,
m2
e− 2d Êd [~s] ≤ Ed [~s] ≤ Êd [~s]. (17)

(b) For all p, d, k, m,

m2 k
e− 2d m
Êp,d,k m
≤ Ep,d,k m
≤ Êp,d,k . (18)

Proof. First note that

p Z
Y
Ed [~s] = dµ(ϕs ) hϕs1 , ϕs2 ihϕs2 , ϕs3 i...hϕsm , ϕs1 i. (19)
s=1 |ϕs |2 =1

where the integral is over |ϕs i ∈ Cd constrained to hϕs |ϕs i = 1.

Next, for a given choice of s1 , ..., sm , let µs (s1 , ..., sm ) denote the number of times the letter s appears. For
example, for s1 , ..., sm = 1, 2, 2, 1, 3 we have µ1 = 2, µ2 = 2, µ3 = 1. Then, let us introduce variables rs and use
10
R∞ d d+µs

0 drs exp(−drs2 /2)rs2d+2µs −1 (d+µs )!

= 1 to write
p Z ∞
dd+µs
Z 2
drs
Y
Ed [s1 , s2 , ..., sm ] = dµ(ϕs ) drs e− 2 rs2d−1 i
rs1 hϕs1 , ϕs2 irs2 hϕs2 , ϕs3 i...rsm hϕsm , ϕs1(20)
s=1 |ϕs |2 =1 0 (d + µs )!
p
d!dµs d
Y Z
= ( )d dϕ̂s exp(−d|ϕ̂s |2 /2) hϕ̂s1 , ϕ̂s2 ihϕ̂s2 , ϕ̂s3 i...hϕ̂sm , ϕ̂s1 i,
s=1
(d + µ s )! 2π
p
Y d!dµs
= Êd [~s]
s=1
(d + µs )!

where the integral on the second line is over all |ϕ̂s i ∈ Cd , with

|ϕ̂s i = rs |ϕs i. (21)

Then, since the integral

p d Z
Y d
Êd [~s] = dϕ̂s exp(−d|ϕ̂s |2 /2) hϕ̂s1 , ϕ̂s2 ihϕ̂s2 , ϕ̂s3 i...hϕ̂sm , ϕ̂s1 i (22)
s=1
2π

is positive, and
p
Y d!dµs 1 m(m+1)
1≥ ≥ 1
m
≥ e− 2d , (23)
s=1
(d + µs )! 1 + d ··· 1 + d

we establish (17).
m m
Since Ep,d,k (resp. Êp,d,k ) is a sum over Ed [~s]m (resp. Êd [~s]m ), each of which is nonnegative, we also obtain (18).
This completes the proof of the lemma.
From now on, we focus on this sum:
p
p X p
p h Y ik
d
X X Z
m
Êp,d,k = ... ( )d dϕ̂s exp(−d|ϕ̂s |2 /2) hϕ̂s1 , ϕ̂s2 ihϕ̂s2 , ϕ̂s3 i...hϕ̂sm , ϕ̂s1 i (24)
s1 =1 s2 =1 s =1 s=1
2π
m

We introduce a diagrammatic way of evaluating this sum.

B. Diagrammatics

This section now essentially follows standard techniques in field theory and random matrix theory as used in [12].
The main changes are: first, for k = 1, our diagrammatic notation will be the same as the usual “double-line” notation,
while for k > 1 we have a different notation with multiple lines. Second, the recursion relation (35) is usually only
evaluated at the fixed point where it is referred to as the the “Green’s function in the large-d approximation” (or
more typically the large-N approximation), while we study how the number of diagrams changes as the number of
iterations a is increased in order to verify that the sum of Eq. (34) is convergent. Third, we only have a finite number,
2m, of vertices, so we are able to control the corrections which are higher order in 1/d or 1/p. In contrast, Ref. [12],
for example, considers Green’s functions Q which are sums of diagrams with an arbitrary numbers of vertices.
Integrating Eq. (22) over ϕ̂s generates s µs ! diagrams, as shown in Fig. 1. Each diagram is built by starting with
one incoming directed line on the left and one outgoing line on the right, with m successive pairs of vertices as shown
in Fig. 1(a). We then join the lines coming out of the vertices vertically, joining outgoing lines with incoming lines, to
make all possible combinations such that, whenever a pair of lines are joined between the i-th pair of vertices and the
j-th pair of vertices, we have si = sj . Finally, we join the rightmost outgoing line to the leftmost incoming line; then
the resulting diagram forms a number of closed loops. The value of Eq. (22) is equal to the sum over such diagrams
of dl−m , where l is the number of closed loops in the diagram. Two example diagrams with closed loops are shown in
Fig. 1(b).
Similarly, the sum of Eq. (24) can also be written diagrammatically. There are k incoming lines on the left and
k outgoing lines on the right. We have m successive pairs of vertices as shown in Fig. 2(a). Each vertex has now k
pairs of lines connected vertically: either the solid lines in the pairs are outgoing and the dashed lines are incoming
11

FIG. 1: a) Vertices for Êd [s1 , s2 ]. b) Example diagrams for Êd [s1 , s2 ] with s1 = s2 . The diagram on the left has l = m = 2
while the diagram on the right (which is also present for s1 6= s2 ) has l = 1.

ξ1 s ξ
ξ2 ξs 1
s 2
s
a)

ξ1 ξ1
ξ2 s s ξ2

FIG. 2: a) Vertices for diagrams. The left vertex corresponds to Φ̂† and the right vertex corresponds to Φ̂. b) An example
diagram with m = 2 and k = 2. There are ln = 1 + 2 = 3 loops of solid lines and lp = 1 disconnected objects of dashed lines.

or vice-versa, depending on whether the vertex has incoming solid lines on the horizontal or outgoing. We label the
incoming solid lines by indices ξ1 , ..., ξk ∈ [d], which we refer to as color indices, and then alternately assign to lines
along the horizontal axis either a single index of the form s ∈ [p] for the dashed lines, which we refer to as flavor
indices, or k different color indices of the form ξ1 , ..., ξk ∈ [d] for the solid lines. Each of the k lines in a set of k
parallel solid lines is also labelled by a “copy index”, with the top line labelled as copy 1, the second as copy 2, and
so on, up to copy k.
Each of the k pairs of lines coming from a vertex is labelled with a color index ξ and a flavor index s, as well as a
copy index. The copy index on a vertical solid line is the same as the copy index of the solid line it connects to on
the horizontal, so a given vertex has k distinct copy indices, ranging from 1...k. Each diagram consists of a way of
joining different pairs of vertical lines, subject to the rule that when we join two vertical lines, both have the same
copy index; thus, if a given vertical line comes from the k ′ -th row, 1 ≤ k ′ ≤ k, then it must join to a line which also
comes from the k ′ -th row.
The value of a diagram is equal to d−m times the number of possible assignments of values to the indices, such
that whenever two lines are joined they have the same indices. The solid lines break up into some number ln different
closed loops; again, when counting the number of closed loops, we join the solid lines leaving on the right-hand side
of the diagram to those entering on the left-hand side of the diagram. Since all solid lines in a loop have the same
copy index, we have ln = ln,1 + ln,2 + ... + ln,k , where ln,k′ is the number of loops of solid lines with copy index
k ′ . The dashed lines s come together in vertices where k different lines meet. Let lp denote the number of different
disconnected sets of dashed lines. Then, the value of a diagram is equal to
d−mk dln plp . (25)
Note, we refer to disconnected sets of lines in the case of dashed lines; this is because multiple lines meet at a single
vertex; for k = 1 these sets just become loops. An example diagram is shown in Fig. 2(b) for k = 2.
Let ckm (ln , lp ) equal the number of diagrams with given ln , lp for given m, k. Then,
X X
m
Êp,d,k = ckm (ln , lp )d−mk dln plp . (26)
ln ≥1 lp ≥1
12

FIG. 3: Iterative construction of rainbow diagrams for k = 2. The solid lines with a filled circle denotes any open rainbow
diagram as does the dashed line with a filled circle.

FIG. 4: (a,b) Rainbow diagrams which require two iterations for k = 1.

C. Rainbow Diagrams

An important set of diagrams are the so-called “rainbow diagrams”, which will be the dominant contributions to
the sum (26). We define these rainbow diagrams with the following iterative construction.
We define a group of k solid lines or a single dashed line to be an open rainbow diagram as shown in Fig. 3(a). We
also define any diagram which can be contructed as in Fig. 3(b,c) to be a open rainbow diagram, where the k solid
lines or one dashed line with a filled circle may be replaced by any open rainbow diagram. We say that the rainbow
diagrams in Fig. 3(b,c) has solid and dashed external lines respectively.
In general all open rainbow diagrams can be constructed from the iterative process described in Fig. 3(b,c), with
one “iteration” consisting of replacing one of the filled circles in Fig. 3(b,c) with one of the diagrams in Fig. 3. The
diagrams in Fig. 3(a) require zero iterations, and each iteration adds one vertex. For example, in Fig. 4(a,b) we show
the two diagrams with solid external lines which require two iterations to construct for k = 1. We define a rainbow
diagram to be any open rainbow diagram where we assume that the right outgoing edge and left incoming edge are
solid lines and are connected.

D. Combinatorics of Diagrams and Number of Loops

We now go through several claims about the various diagrams. The goal will be to count the number of diagrams
for given ln , lp . First, we claim that for the rainbow diagrams

ln + klp = (m + 1)k, (27)

as may be directly verified from the construction. Next, we claim that for any diagram

ln + klp ≤ (m + 1)k. (28)

From Eq. (27) the rainbow diagrams saturate this bound (28). We claim that it suffices to show Eq. (28) for k = 1
in order to show Eq. (28) for all k. To see this, consider any diagram for k > 1. Without loss of generality, suppose
ln,1 ≥ ln,k′ for all 1 ≤ k ′ ≤ k. Then, ln + klp ≤ k(ln,1 + p). We then remove all the solid lines on the horizontal with
copy indices 2...k, as well as all pairs of lines coming from a vertex with copy indices 2...k. Having done this, both
13

the solid and the dashed lines form closed loops, since only two dashed lines meet at each vertex. The new diagram
is a diagram with k = 1. The number of loops of solid lines is ln,1 , while the number of loops of dashed lines in the
new diagram, lp′ , is greater than or equal to lp since we have removed dashed lines from the diagram. Thus, if we can
show Eq. (28) for k = 1, it will follow that ln,1 + lp′ ≤ (m + 1) and so ln + klp ≤ (m + 1)k.
To show Eq. (28) for k = 1, we take the given diagram, and make the replacement as shown between the left and
right half of Fig. 5(a): first we straighten the diagram out as shown in the middle of Fig. 5(a), then we replace the
double line by a wavy line connected the solid and dashed lines. Finally, we take the point where the solid line leaves
the right-hand side of the diagram and connects to the solid line entering the left-hand side and put a single dot on
this point for reference later as shown in Fig. 5(b,c). Having done this, the diagram consists of closed loops of solid or
dashed lines, with wavy lines that connect solid to dashed lines, and with one of the closed loops of solid lines having
a dot on it at one point.
This procedure gives an injective mapping from diagrams written as in Fig. 2 to diagrams written as in Fig. 5.
However, this mapping is not invertible; when we undo the procedure of Fig. 5(a), we find that some diagrams can
only be written as in Fig. 2 if there are two or more horizontal lines. The diagrams which are the result of applying
this procedure to a diagram as in Fig. 2 with only one horizontal line are those that are referred to in field theory as
contributions to the “quenched average,” while the sum of all diagrams, including those not in the quenched average,
is referred to as the “annealed average”. To determine if a diagram is a contribution to the quenched average, start at
the dot and then follow the line in the direction of the arrow, crossing along a wavy line every time it is encountered,
and continuing to follow solid and dashed lines in the direction of the arrow, and continuing to cross every wavy
line encountered. Then, a diagram is a contribution to the quenched average if and only if following the lines in this
manner causes one to traverse the entire diagram before returning to the starting point, while traversing wavy lines in
both directions. As an example, consider the diagram of Fig. 5(c): this diagram is not a contribution to the quenched
average, as can be seen by traversing the diagram, or by re-drawing the diagram as in Fig. 5(d) which requires two
horizontal solid lines2 . If a diagram is a contribution to the quenched average, then traversing the diagram in this
order (following solid, dashed, and wavy lines as above) corresponds to traversing the diagram writen as in Fig. 2
from left to right.
Since all diagrams are positive, we can bound the sum of diagrams which are contributions to the quenched average
by bounding the sum of all diagrams as in Fig. 5. The number of wavy lines is equal to m. The diagram is connected
so therefore the number of solid plus dashed loops, which is equal to ln,1 + lp′ , is at most equal to the number of
wavy lines plus one. Therefore, Eq. (28) follows. From this construction, the way to saturate Eq. (28) is to make a
diagram which is a tree whose nodes are closed loops of dashed and solid lines and whose edges are wavy lines; that
is, a diagram such that the removal of any wavy line breaks the diagram into two disconnected pieces. These trees
are the same as the rainbow diagrams above. In Fig. 5(b) we show the two different trees which correspond to the
rainbow diagrams of Fig. 4.
Next, we consider the diagrams which are not rainbow diagrams. First, we consider the case k = 1. Let d =
m + 1 − ln − lp ≥ 0. If d > 0, then the diagram is not a rainbow diagram. However, if d > 0, using the construction
above with closed loops connected by wavy lines, there are only ln + lp loops connected by more than ln + lp − 1
wavy lines; this implies that the diagram is not a tree (using the notation of Fig. 5) or a rainbow diagram (using the
notation of Fig. 4), and hence it is possible to remove d different lines and arrive at a diagram which is a rainbow
diagram. Thus, all diagrams with 2m vertices and d > 0 can be formed by taking rainbow diagrams with 2m − d
vertices and adding d wavy lines; these wavy lines can be added in at most [m(m − 1)]d different ways. Thus, for
k = 1 we have

m + 1 − ln − lp = d > 0 → c1m (ln , lp ) ≤ c1m−d (ln , lp )m2d (29)

We now consider the number of diagrams which are not rainbow diagrams for k > 1. We consider all diagrams,
including those which contribute to the annealed average, but we write the diagrams as in Fig. 2, possibly using
multiple horizontal lines. Consider first a restricted class of diagrams: those diagrams for which, for every vertex with
k pairs of lines leaving the vertex, all k of those pairs of lines connect with pairs of lines at the same vertex. This is
not the case for, for example, the diagram of Fig. 2(b), where of the two pairs of lines leaving the leftmost vertex,
the top pair reconnects at the second vertex from the left, while the bottom pair reconnects at the rightmost vertex.
However, for a diagram in this restricted class, the counting of diagrams is exactly the same as in the case k = 1,
since the diagrams in this restricted class are in one-to-one correspondence with those for k = 1. So, the number of

2 Such annealed diagrams are contributions to the average of the product of two (or more) traces of powers of M̂p,d,k .
14

FIG. 5: (a) Deformation of diagram. (b) Deformation of diagrams in Fig. 4(a,b). (c) Example of a diagram which contributes
to the annealed average but not the quenched average. (d) Same diagram as in (c).

diagrams in this restricted class, ckm,r , obeys

(m + 1)k − ln − klp = d > 0 → ckm,r (ln , lp ) ≤ ckm−d,r (ln , lp )[m(m − 1)]d/k (30)

Now, we consider a diagram which is not in this restricted class. Locate any vertex with incoming solid lines
on the horizontal and an outgoing dashed line on the horizontal, such that not all pairs of lines leaving this vertex
reconnect at the same vertex. Call this vertex v1 . Then, find any other vertex to which a pair of lines leaving vertex v1
reconnects. Call this vertex v2 . Let there be l pairs of lines leaving vertex v1 which do not conenct to v2 , and similarly
l pairs of lines entering v2 which do not come from v1 , with 1 ≤ l ≤ k − 1. Label these pairs of lines L11 , L12 , ..., L1k and
L21 , L22 , ..., L2k , respectively. Let these lines connect to pairs of lines M11 , M21 , ..., Mk1 and M12 , M22 , ..., Mk2 respectively.
Let v3 be the vertex just to the right of v1 , so that the dashed line entering v1 comes from v3 , and similarly let v4
be the vertex just to the left of v2 , so that the dashed line leaving v2 goes into v4 , as shown in Fig. 6(a). Then, we
determine if there is a way to re-connect pairs of lines so that now L1l′ connects to L2l′ and Ml1′ connects to Ml2′ for all l′
in some subset of {1, ..., l} such that the diagram splits into exactly two disconnected pieces. If there is, then we find
the smallest subset of {1, ..., l} with this property (making an arbitrary choice if there is more than one such subset)
and make those reconnections. Let V1 , V2 denote the two disconnected subsets of vertices after the reconnections. By
making these reconnections, then, we are reconnecting precisely the pairs of lines which originally connected vertices
in set V1 to those in V2 ; if there are lc such lines, then we increase ln by lc ≥ 1. Thus, we increase lp by one and also
increases ln by at least 1. We then modify the diagram to rejoin the two pieces: the dashed line leaving to the right
of vertex v1 connects to it some vertex w1 in the same piece, and there is some other dashed line in the other piece
which connects two vertices v1′ , w1′ ; we re-connect these dashed lines so that v1 connects to w1′ and v1′ connects to
w1 . This reduces lp by 1 back to its original value and makes the diagram connected. Thus, we succeed in increasing
ln + klp − mk by at least 1.
On the other hand, if no such subset exists, we re-connect all pairs of lines for all 1 ≤ l′ ≤ l, as shown in Fig. 6(b).
The resulting diagram must be connected (if not, then there would have been a subset of lines which could be re-
connected to split the line into exactly two disconnected pieces). Then, there are two cases: the first case is when the
dashed line leaving v2 does not connect to v1 (so that v2 6= v3 and v1 6= v4 ) and it is possible to re-connect the dashed
lines joining v1 to v3 and v2 to v4 so that now v2 is connected to v1 and v3 is connected to v4 without breaking the
diagram into two disconnected pieces. In this first case, we then also make this re-connection of dashed lines, which
increases lp by one, while keeping the diagram connected. However, in this case, the initial re-connection of pairs of
lines may have reduced ln by at most l. Thus, in this case klp + ln − mk is increased by at least 1. The second case is
when either v2 connects to v1 already or it is not possible to make the re-connection of dashed lines without splitting
the diagram into two pieces. This is the case in Fig. 6(b). In this case, however, ln must have increased by at least 1
15

a)
v1 v3 v4 v2

b)
v1 v3 v4 v2

FIG. 6: (a) Diagram of Fig. 2(b) with vertices v1 , v2 , v3 , v4 marked, for a particular choice of v1 , v4 . (b) Result of applying
re-connection procedure to diagram.

by the initial re-connection of pairs of lines3 and thus again we increase klp + ln − mk by at least 1.
Repeating this procedure, we ultimately arrive at a diagram in the restricted class above. At each step, we succeed
in reducing d = (m + 1)k − ln − klp by at least unity, either by increasing ln by at least 1 and lp by 2, or by
increasing lp by 1 and reducing ln by at most k − 1. Given a diagram in the restricted class, we can further reduce d
following Eq. (30). Then, any diagram can be found by starting with a diagram in the restricted class and undoing
this procedure; at each step in undoing the procedure we have at most m2 (m − 1)2(k−1) choices (there are at most m2
choices for v1 , v2 , and then we must re-connect at most 2(k − 1) pairs of lines). Thus, for (m + 1)k − ln − klp = d > 0
we have

ckm (ln , lp ) ≤ m2k ckm (ln + 1, lp ) (31)

+m2k ckm (ln − k + 1, lp + 1)
+m2 ckm−1 (ln , lp ).

This implies that for d > 0,

ln +lp =m+1−d ln +lp =m+1−(d−1)
X X
ckm (ln , lp )dln plp ≤ δ ckm (ln , lp )dln plp , (32)
ln ,lp ln ,lp

with
m2k m2k dk−1 m2 m2k m2
δ= + + k = (1 + x−1 ) + k. (33)
d p d d d

E. Bound on Number of Rainbow Diagrams

Finally, we provide a bound on the number of rainbow diagrams. Let us define Sva (ln , lp ) to equal to the number of
open rainbow diagrams with solid lines at the end, with v vertices, ln loops of solid lines (not counting the loop that
would be formed by connected the open ends), and lp disconnected sets of dashed lines, which may be constructed
by at most a iterations of the process shown in Fig. 3. Similarly, define Dva (ln , lp ) to equal to the number of open
rainbow diagrams with dashed lines at the end, with v vertices, ln loops of solid lines (not counting the loop that
would be formed by connected the open ends), and lp disconnected sets of dashed lines, which may be constructed
by at most a iterations of the process shown in Fig. 3. These open rainbow diagrams obey ln /k + lp = m. Define the

3 To see why ln must have been increased by at least one when reconnecting pairs of lines, in the case where making the reconnection of
the dashed line would split the diagram into two disconnected pieces, let V1 , V2 denote the vertices in these two disconnected pieces.
Then, by reconnecting the pairs of lines, there are no longer any solid lines joining V1 to V2 , so ln increases by l′ ≥ 1.
16

generating function4
XXX
G(a)
s (z, d, p) = z −v/2 d−vk/2 dln plp Sva (ln , lp ), (34)
v ln lp
(a)
XXX
Gd (z, d, p) = z −v/2 d−vk/2 dln plp Dva (ln , lp ).
v ln lp

Then, we have the recursion relations, which come from Fig. 3(b,c):
−1 (a−1)
G(a)
s (z, d, p) = 1 + z xGd (z, d, p)Gs(a−1) (z, d, p), (35)
(a) (a−1)
Gd (z, d, p) =1+ z −1 Gs(a−1) (z, d, p)Gd (z, d, p).
(a) (a)
First, consider the case x ≤ 1. From Eq. (35), Gd (z, d, p) = 1 + (Gs (z, d, p) − 1)/x for all a, so that we have
(a) (a−1) (a−1) (a−1)
the recursion Gs (z, d, p) = 1 + z −1 xGs (z, d, p)(1 + (Gs (z, d, p) − 1)/x) = 1 + z −1 (x − 1)Gs (z, d, p) +
(a−1)
z −1 Gs (z, d, p)2 . The fixed points of this recursion relation are given by
p
z −1 (1 − x) + 1 ± (z −1 (1 − x) + 1)2 − 4z −1
Gs (z, d, p) ≡ . (36)
2z −1
Define
√ −1
√

1+x−2 x
z0 = 2
= (1 + x)2 . (37)
(1 − x)
Then, for z > z0 , Eq. (35) has two real fixed points, while at z = z0 , Eq. (35) has a single fixed point at
z0 √ √
1 + z0−1 (1 − x) = 1 + x = z0 > 1.

Gs (z0 , d, p) = (38)
2
(0) (0) (a)
Since Gs (z, d, p) = Gd (z, d, p) = 1 which is smaller than the fixed point, we find that Gs (z, d, p) increases
montonically with a and remains bounded above by Gs (z, d, p). All rainbow diagrams with 2m vertices can be found
after a finite number (at most m) iterations of Fig. 3(b,c) so
ln +lp =m+1
X
ckm (ln , lp )d−mk dln plp ≤ pz0m Gs (z0 , d, p). (39)
ln ,lp

(a) (a) (a)

Alternately, if x ≥ 1, we use Gs (z, d, p) = 1 + (Gd (z, d, p) − 1)x, to get the recursion Gd (z, d, p) = 1 +
−1 (a) (a) (a−1) (a−1)
z Gd (z, d, p)(1 + (Gd (z, d, p) − 1)x) = 1 + z −1(1 − x)Gd (z, d, p) + z −1xGd (z, d, p)2 . Then, again for z = z0
(a)
this recursion has a single fixed point and Gs (z, d, p) increases monotonically with a and remains bounded by
Gs (z0 , d, p).

F. Sum of All Diagrams

We now bound the sum of all diagrams (26) using the bound on the sum of rainbow diagrams (39) and Eq. (32).

X ln +lpX
=m+1−j
X
ckm (ln , lp )d−mk dln plp ≤ pz0m Gs (z0 , d, p) δj . (40)
j≥0 ln ,lp j≥0

Then, if δ < 1 we have

m p m p m+ 21
Êp,d,k ≤ z0 Gs (z0 , d, p) = z (41)
1−δ 1−δ 0

4 The limit as a → ∞ of this generating functional is equal to, up to a factor 1/z in front, the Green’s function in the large-d limit usually
defined in field theory.
17

We can pick m of order d1/2k and still have δ ≤ 1/2. Then we can use E[kM̂p,d,k k] ≤ (Êp,d,k
m
)1/m to bound
√
√ 2 ln(2p z0 )

E[kM̂p,d,k k] ≤ (1 +x) · exp
m
√ 2

k ln(d)
= (1 + x) + O 1 , (42)
d 2k
as claimed in Corollary 3. We are assuming in the O() notation in this bound that x = Θ(1).

III. APPROACH 2: COMBINATORICS AND REPRESENTATION THEORY

This section gives a second proof of Theorem 1 that uses facts about symmetric subspaces along with elementary
combinatorics. The fundamentals of the proof resemble those of the last section in many ways, which we will discuss
at the end of this section. However, the route taken is quite different, and this approach also suggests different possible
extensions.
Recall that we would like to estimate
X
m
Ep,d,k = Ed [~s]k .
s∈[p]m
~

Our strategy will be to repeatedly reduce the string ~s to simpler forms. Below we will describe two simple methods
for reducing ~s into a possibly shorter string R(~s) such that Ed [~s] equals Ed [R(~s)], up to a possible multiplicative
factor of 1/d to some power. Next we will consider two important special cases. First are the completely reducible
strings: ~s for which the reduced string R(~s) is the empty string. These are analogous to the rainbow diagrams in
Section II and their contribution can be calculated exactly (in Section III A). The second special case is when ~s is
irreducible, meaning that R(~s) = ~s; that is, neither simplification steps can be applied to ~s. These strings are harder
to analyze, but fortunately make a smaller contribution to the final sum. In Section III B, we use representation
theory to give upper bounds for Ed [~s] for irreducible strings ~s, and thereby to bound the overall contribution from
irreducible strings. Finally, we can describe a general string as an irreducible string punctuated with some number
of repeated letters (defined below) and completely reducible strings. The overall sum can then be bounded using a
number of methods; we will choose to use a generating function approach, but inductively verifying the final answer
would also be straightforward.
Reducing the string: Recall that Ed [~s] = tr ϕs1 · · · ϕsm , where each |ϕs i is a unit vector randomly chosen from Cd .
We will use the following two reductions to simplify ~s.
1. Remove repeats. Since ϕa is a pure state, ϕ2a = ϕa and we can replace every instance of aa with a in ~s without
changing Ed [~s]. Repeatedly applying this means that if si = si+1 = · · · = sj , then Ed [~s] is unchanged by
deleting positions i + 1, . . . , j. Here we identify position i with m + i for all i, so that repeats can wrap around
the end of the string: e.g. the string 11332221 would become 321.
2. Remove unique letters. Since E[ϕa ] = I/d for any a, we can replace any letters which appear only once with
I/d. Thus, if si 6= sj for all j 6= i then Ed [~s] = Ed [~s′ ]/d, where ~s′ ∈ [p]m−1 is obtained from ~s by deleting
the position i. Repeating this process results in a string where every letter appears at least twice and with a
multiplicative factor of 1/d for each letter that has been removed. Sometimes the resulting string will be empty,
in which case we say Ed [∅] = tr I = d. Thus for strings of length one, Ed [a] = Ed [∅]/d = d/d = 1.
We will repeatedly apply these two simplification steps until no further simplifications are possible. Let R(~s) denote
the resulting (possibly empty) string. Recall from above that when R(~s) = ∅, we say ~s is completely reducible, and
when R(~s) = ~s, we say ~s is irreducible. The sums over these two special cases are described by the following two
Lemmas.
Lemma 8.
m
1 X k
X (p)ℓ p
k
Ed [~
s ] = N (m, ℓ) kl ≤ βm k ≤ λm
+. (43)
d m
d d
s∈[p]
~ ℓ=1
R(~
s)=∅
18

We will prove this Lemma and discuss its significance in Section III A. It will turn out that the completely reducible
m
strings make up the dominant contribution to Ep,d,k when m is not too large. Since (43) is nearly independent of
m
k (once we fix x and p), this means that Ep,d,k is also nearly independent of k. It remains only to show that the
sub-leading order terms do not grow too quickly with k. Note that this Lemma establishes the lower bound of
Theorem 1.
For the irreducible strings we are no longer able to give an exact expression. However, when m is sufficiently small
relative to d and p, we have the following nearly tight bounds.
1
Lemma 9. If m < min(dk/6 /21+k/2 , (p/5000) 2k+12 ) then
km
X
k e 2d1/3 m
Ed [~s] ≤ x 2 (44)
5000m2k+12 22+k m2
s∈[p]m
~ 1− p 1− k
R(~
s)=~
s d3

m m2
Additionally, when m is even, the left-hand side of (44) is ≥ x 2 e− 2p .
1
The proof is in Section III B. Observe that when m ∈ o(dk/6 ) ∩ o(p 2k+12 ) and m is even, we bound the sum on the
LHS of (44) by (1 ± o(1))xm/2 . We also mention that there is no factor of 1/dk on the LHS, so that when x = O(1)
and m satisfies the above condition, the contribution from irreducible strings is a O(1/dk ) fraction of the contribution
from completely reducible strings.
Next, we combine the above two Lemmas to bound all strings that are not covered by Lemma 8.
1
Lemma 10. If m < min(dk/6 /21+k/2 , (p/5000) 2k+12 ) then
km
X
k e 2d1/3 m+ 12
Ed [~s] ≤ 2k+12
mλ+ . (45)
22+k m2
s∈[p]m
~ 1 − 5000mp 1− k
s)6=∅
R(~ d3

The proof is in Section III C.

1
To simplify the prefactor in (45), we assume that m < min(2d1/3 /k, dk/6 /22+k/2 , (p/5000) 2k+12 /2), so that the
m+ 12 24m3 λ2+
RHS of (45) becomes simply ≤ 12mλ+ . By Lemma 2, this is ≤ x βm (x). Then we combine Lemma 8 and
Lemma 10 to obtain the bound
24m3 λ2+

em
p,d,k ≤ 1+ βm (x) (46)
p
which is a variant of the upper-bound in Theorem 1. It is tighter than (7), but holds for a more restricted set of m.
If we express the upper bound in terms of λm
+ then we can skip Lemma 8 and obtain simply
p !
m 12m λ+
ep,d,k ≤ 1 + λm
+. (47)
dk

A. Completely reducible strings

We begin by reviewing some facts about Narayana numbers from [30, 31]. The Narayana number

1 m m 1 m m−1
N (m, ℓ) = = (48)
m ℓ−1 ℓ ℓ ℓ−1 ℓ−1
counts the number of valid bracketings of m pairs of parentheses in which the sequence () appears ℓ times. A
straightforward combinatorial proof of (48) is in [30]. When we sum (48) over ℓ (e.g. if we set x = 1 in (43)) then we
1 2m
obtain the familiar Catalan numbers m+1 m .
We can now prove Lemma 8. The combinatorial techniques behind the Lemma have been observed before[30, 31],
and have been applied to the Wishart distribution in [10, 13].
Proof: For a string ~s such that R(~s) = ∅, let ℓ be the number of distinct letters in ~s. In the process of reducing ~s to
the empty string we will ultimately remove ℓ unique letters, so that Ed [~s]k = dk(1−ℓ) . It remains now only to count
the number of different ~s that sastify R(~s) = ∅ and have ℓ distinct letters.
19

Suppose the distinct letters in ~s are S1 , S2 , . . . , Sℓ ∈ [p]. We order them so that the first occurrence of Si is earlier
than the first occurrence of Si+1 for each i. Let ~σ be the string obtained from ~s by replacing each instance of Si
with i. Then ~σ has the first occurrences of 1, 2, . . . , ℓ appearing in increasing order and still satisfies R(~σ ) = ∅ and
Ed [~σ ]k = dk(1−ℓ) . Also, for each ~σ , there are p!/(p − ℓ)! ≤ pℓ corresponding ~s.
It remains only to count the number of distinct ~σ for a given choice of m and ℓ. We claim that this number is given
by N (m, ℓ). Given ~σ , define ai to be the location of the first occurrence of the letter i for i = 1, . . . , ℓ. Observe that

1 = a1 < a2 < · · · < aℓ ≤ m. (49)

Pi
Next, define µi to be the total number of occurrences of i in ~σ , and define bi = j=1 µj for i = 1, . . . , ℓ. Then

1 ≤ b1 < b2 < · · · < bℓ = m (50)

Finally, we have

ai ≤ b i for each i = 1, . . . , ℓ. (51)

Ref. [31] proved that the number of (a1 , b1 ), . . . , (aℓ , bℓ ) satisfying (49), (50) and (51) is N (m, ℓ). Thus, we need
only prove that ~σ is uniquely determined by (a1 , b1 ), . . . , (aℓ , bℓ ). The algorithm for finding ~σ is as follows.
For t = 1, . . . , m.
If t = ai then set s := i.
Set σt := s.
Set µs := µs − 1.
While (µs = 0) set s := s − 1.

In other words, we start by placing 1’s until we reach a2 . Then we start placing 2’s until we’ve either placed µ2 2’s,
in which case we go back to placing 1’s; or we’ve reached a3 , in which we case we start placing 3’s. The general rule
is that we keep placing the same letter until we either encounter the next ai or we run out of the letter we were using,
in which case we go back to the last letter we placed.
To show that ~σ couldn’t be constructed in any other way, first note that we have σai = i for each i by definition.
Now fix an i and examine the interval between ai and ai+1 . Since it is before ai+1 , it must contain only letters in
{1, . . . , i}. Using the fact that R(~σ ) = ∅, we know that ~σ cannot contain the subsequence j-i-j-i (i.e. cannot be of
the form · · · j · · · i · · · j · · · i). We now consider two cases.
Case (1) is that µi ≥ ai+1 − ai . In this case we must have σt = i whenever ai < t < ai+1 . Otherwise, this would
mean that some s ∈ {1, . . . , i − 1} appears in this interval, and since s must have appeared earlier as well (s < i so
as < ai and σas = s), then no i’s can appear later in the string. However, this contradicts the fact that µi ≥ ai+1 − ai .
Thus if µi ≥ ai+1 − ai then the entire interval between ai and ai+1 must contain i’s.
Case (2) is that µi < ai+1 − ai . This means that there exists t with ai < t < ai+1 and σt ∈ {1, . . . , i − 1}; if there
is more than one then take t to be the lowest (i.e. earliest). Note that σt′ 6= i for all t′ > t; otherwise we would have
a σt -i-σt -i subsequence. Also, by definition σt′ = i for ai ≤ t′ < t. Since this is the only place where i appears in the
string, we must have t = ai + µi . Once we have placed all of the i’s, we can proceed inductively to fill the rest of the
interval with letters from {1, . . . , i − 1}.
In both cases, ~σ is uniquely determined by a1 , . . . , aℓ and b1 , . . . , bℓ (or equivalently, µ1 , . . . , µℓ ). This completes
the proof of the equality in (43).
Before continuing, we will mention some facts about Narayana numbers that will later be useful. Like the Catalan
numbers, the Narayana numbers have a simple generating function; however, since they have two parameters the
generating function has two variables. If we define
X
F (x, y) = N (m, ℓ)xℓ y m , (52)
0≤ℓ≤m<∞

then one can show[30, 31] (but note that [30] takes the sum over m ≥ 1) that
p
1 + (1 − x)y − 1 − 2(1 + x)y + (1 − x)2 y 2
F (x, y) = . (53)
2y

We include a proof for convenience. First, by convention N (0, 0) = 1. Next, an arrangement of m pairs of parentheses
can start either with () or ((. Starting with () leaves N (m − 1, ℓ − 1) ways to complete the string. If the string starts
20

with (( then suppose the ) paired with the first ( is the ith ) in the string. We know that 2 ≤ i ≤ m and that the
first 2i characters must contain exactly i (’s and i )’s. Additionally, the 2i − 1st and 2ith characters must both be )’s.
Let j be the number of appearances of () amongst these first 2i characters. Note that j ≤ min(i − 1, ℓ), and that ()
appears ℓ − i times in the last 2m − 2i characters. Thus there are

m min(i−1,ℓ)
X X m min(i−1,ℓ)
X X
N (i − 1, j)N (m − i, ℓ − j) = −N (m − 1, ℓ) + N (i − 1, j)N (m − i, ℓ − j)
i=2 j=1 i=1 j=1

ways to complete a string starting with ((. Together, these imply that

m min(i−1,ℓ)
X X
N (m, ℓ) = N (m − 1, ℓ − 1) − N (m − 1, ℓ) + N (i − 1, j)N (m − i, ℓ − j), (54)
i=1 j=1

which we can state equivalently as an identify for the generating function (52):

F = 1 + xyF + y(F 2 − F ), (55)

which has the solution (53). (The sign in front of the square root can be established from 1 = N (0, 0) = F (x, 0).)
Connection to Section II: Observe that (53) matches (36) once we make the substitution y = z −1 . Indeed it can
be shown that rainbow diagrams have a one-to-one correspondence with valid arrangements of parentheses, and thus
can be enumerated by the Narayana numbers in the same way.
Connection to free probability: Another set counted by the Narayana numbers is the set of noncrossing partitions
of [m] into ℓ parts. The non-crossing condition means that we never have a < b < c < d with a, c in one part of
the partition and b, d in another; it is directly analogous to the property that ~σ contains no subsequence of the form
j-i-j-i.
To appreciate the significance of this, we return to the classical problem of throwing p balls into d bins. The
occupancy of a single bin is z = z1 + . . .+ zp where z1 , . . . , zp are i.i.d. and have Pr[zi = 0] = 1 − 1/d, Pr[zi = 1] = 1/d.
One can readily verify that
m
X (p)ℓ
E[z m ] = | Par(m, ℓ)| ,
dℓ
ℓ=1

where Par(m, ℓ) is the set of (unrestricted) partitions of m into ℓ parts.

This is an example of a more general phenomenon in which convolution of classical random variables involves
partitions the same way that convolution of free random variables involves non-crossing partitions. See Ref. [29] for
more details.

B. Irreducible strings

As with the completely reducible strings, we will break up the sum based on the powers of p and d which appear.
However, while in the last section p and 1/dk both depended on the single parameter ℓ, here we will find that some
terms are smaller by powers of 1/p and/or 1/d. Our strategy will be to identify three parameters—ℓ, c2 , and µ̂2 , all
defined below—for
√ m which the leading contribution occurs when all three equal m/2. We show that this contribution is
proportional to x and that all other values of ℓ, c2 , and µ̂2 make negligible contributions whenever m is sufficiently
small.
Again, we will let ℓ denote the number of unique letters in ~s. We will also let S1 , . . . , Sℓ ∈ [p] denote these unique
letters. However, we choose them so that S1 < S2 < · · · < Sℓ , which can be done in

pℓ

p
≤ (56)
ℓ ℓ!

ways. Again, we let ~σ ∈ [ℓ]m be the string that results from replacing all the instances of Si in ~s with i. However,
because of our different choice of S1 , . . . , Sℓ , we no longer guarantee anything about the ordering of 1, . . . , ℓ in ~σ .
21

We will also take µa be the frequency of a in ~σ for each a = 1, . . . , ℓ. We also define µ̂b to be the number of a such
that µa = b. Observe that
X
ℓ= µ̂b (57)
b
ℓ
X X
m= µa = bµ̂b . (58)
a=1 b

Also recall that since R(~σ ) = ~σ , ~σ has no repeats or unique letters. Thus µa ≥ 2 for each a, or equivalently µ̂1 = 0.
This also implies that ℓ ≤ m/2. Since (56) is maximised when ℓ = ⌊ m 2 ⌋, we will focus on this case first and then show
that other values of ℓ have smaller contributions. Moreover (58) implies that µ̂2 ≤ ℓ and (57), (58) and the fact that
µ̂1 = 0 imply that µ̂2 ≥ 3ℓ − m. Together we have

3ℓ − m ≤ µ̂2 ≤ ℓ. (59)

Thus ℓ is close to m/2 if and only if µ̂2 is as well. This will be useful because strings will be easier to analyze when
almost all letters occur exactly twice.
We now turn to the estimation of Ed [~σ ]. To analyze Ed [~σ ] = E[tr ϕσ1 ϕσ2 · · · ϕσm ], we first introduce the cyclic shift
operator
X
Cm = |i1 , . . . , im ihi2 , . . . , im , i1 |.
i1 ,...,im ∈[d]

Then we use the identity

tr[ϕσ1 ϕσ2 · · · ϕσm ] = tr[Cm (ϕσ1 ⊗ ϕσ2 ⊗ · · · ⊗ ϕσm )]. (60)

Next, we take the expectation. It is a well-known consequence of Schur-Weyl duality (see e.g. Lemma 1.7 of [11]) that
P
⊗t π∈St π
E[ϕ ] = . (61)
d(d + 1) · · · (d + t − 1)

We will apply this to (60) by inserting (61) in the appropriate locations as given by ~σ . Let S~σ := {π ∈ Sm : σi =
σπ(i) ∀i ∈ [m]} be the set of permutations that leaves ~σ (or equivalently ~s) invariant. Then |S~σ | = µ1 ! · · · µℓ ! and

Ed [~σ ] = E[tr Cm (ϕσ1 ⊗ ϕσ2 ⊗ · · · ⊗ ϕσm )] (62a)

P
π∈S~ σ
π
= tr Cm Qℓ (62b)
d(d + 1) · · · (d + µi − 1)
Pi=1
π∈S π
≤ tr Cm Qℓ ~σ (62c)
µi
i=1 d
P
π∈S~
σ
tr Cm π
= m
(62d)
X d
= dcyc(Cm π)−m . (62e)
π∈S~
σ

This last equality follows from the fact that for any permutation ν acting on (Cd )⊗m , we have that

tr ν = dcyc(ν) , (63)

where cyc(ν) is the number of cycles of ν. (Eq. (63) can be proven by first considering the case when cyc(ν) = 1 and
then decomposing a general permutation into a tensor product of cyclic permutations.)
To study cyc(Cm π), we introduce a graphical notation for strings. For any string ~σ , define the letter graph G to
be a directed graph with ℓ vertices such that for i = 1, . . . , ℓ, vertex i has in-degree and out-degree both equal to
µi . (For brevity, we will simply say that i has degree µi .) Thus there are a total of m edges. The edges leaving and
entering vertex i will also be ordered. To construct the edges in G, we add an edge from si to si+1 for i = 1, . . . , m,
with sm+1 := s1 . The ordering on these edges is given by the order we add them in. That is, if letter a appears in
22

1 2 3 4 5

FIG. 7: The letter graph corresponding to the string 123241351352.

1 2 3 4 5

FIG. 8: Example of the letter graph for the case when m = 10 and ℓ = µ̂2 = cmax
2 = m
2
. The corresponding string is 1234543215.

positions i1 , i2 , . . . with i1 < i2 < · · · , then the first edge out of a is directed at si1 +1 , the second out-edge points at
si2 +1 , and so on. Likewise, a’s incoming edges (in order) come from si1 −1 , si2 −1 , . . ..
Now we think of the incoming and outgoing edges of a vertex as linked, so that if we enter on the j th incoming
edge of a vertex, we also exit on the j th outgoing edge. This immediately specifies a cycle through some or all of G.
If we use the ordering specified in the last paragraph then the cycle is in fact an Eulerian cycle (i.e. visits each edge
exactly once) that visits the vertices in the order s1 , s2 , . . . , sm . Thus, from a letter graph G and a starting vertex we
can reconstruct the string ~σ that was used to generate G.
The letter graph of ~σ can also be used to give a cycle decomposition of Cm π. Any permutation π ∈ S~σ can be
thought of as permuting the mapping between in-edges and out-edges for each vertex. The resulting number of edge-
disjoint cycles is exactly cyc(Cm π). To see this, observe that π maps i1 to some i2 for which σi1 = σi2 and then Cm
maps i2 to i2 + 1. In G these two steps simply correspond to following one of the edges out of i1 . Following the path
(or the permutation) until it repeats itself, we see that cycles in G are equivalent to cycles in Cm π.
We now use letter graphs to estimate (62). While methods for exactly enumerating cycle decompositions of directed
graphs do exist[7], for our purposes a crude upper bound will suffice. Observe that because ~σ contains no repeats,
G contains no 1-cycles. Thus, the shortest cycles in G (or equivalently, in Cm π) have length 2. Let c2 (π) denote
the number of 2-cycles in Cm π and cmax 2 = maxπ∈S~σ c2 (π). Sometimes we simply write c2 instead of c2 (π) when the
argument is understood from context. We now observe that c2 obeys bounds analogous to those in (59). In particular,
cmax
2 ≤m 2 , and for any π,

m − 2c2 (π) m + c2 (π)

cyc(Cm π) ≤ c2 (π) + = . (64)
3 3
Since c2 (π) ≤ cmax
2 ≤ m/2, (64) implies that cyc(Cm π) ≤ m/2. Thus the smallest power of 1/d possible in (62) is
m
2. When we combine this with (59), we see that the leading-order contribution (in terms of p and d) is O(xm/2 ),
and that other terms are smaller by powers of 1/p and/or 1/d. Additionally, this leading-order contribution will have
a particularly simple combinatorial factor.
The leading-order term. Consider the case when m is even and ℓ = µ̂2 = cmax 2 = m2 . This corresponds to a graph
with ℓ vertices, each with in-degree and out-degree two. Additionally, there is an ordering of the edges which organizes
them into ℓ 2-cycles. Thus every vertex participates in exactly two 2-cycles. Since the graph is connected, it must
take the form of a single doubly-linked loop. Thus the letter graph of the leading-order term is essentially unique. See
Fig. 8 for an example when m = 10. The only freedom here is the ordering of the vertices, which can be performed
in ℓ! ways. Together with (56), this means the combinatorial contribution is simply ℓ! pℓ ≤ pℓ .

Now we examine the sum in (62). Assume without loss of generality that the vertices 1, . . . , ℓ are connected in the
cycle 1 − 2 − 3 − · · ·− ℓ − 1. Each vertex has two different configurations corresponding to the two permutations in S2 .
In terms of the letter graph these correspond to the two different ways that the two incoming edges can be connected
to the two outgoing edges. Since vertex i has one edge both to and from each of i ± 1, we can either
23

1 2 1 2
i-1 i i+1 i-1 i i+1
1 2 2 1
(a)closed configuration (b)open configuration

FIG. 9: Vertex i is connected to i ± 1 by one edge in either direction. These edges can be connected to each other in two ways,
which are depicted in (a) and (b). We call (a) a “closed” configuration and (b) an “open” configuration.

(a) connect the incoming i − 1 edge to the outgoing i − 1 edge, and the incoming i + 1 edge to the outgoing i + 1
edge (the closed configuration) ; or,
(b) connect the incoming i − 1 edge to the outgoing i + 1 edge, and the incoming i + 1 edge to the outgoing i − 1
edge (the open configuration).
These possibilities are depicted in Fig. 9.
Let c denote the number of vertices in closed configurations. These vertices can be selected in cℓ ways. If 1 ≤ c ≤ ℓ

then c is also the number of cycles: to see this, note that each closed configuration caps two cycles and each cycle
consists of a chain of open configurations that is capped by two closed configurations on either end. The exception is
when c = 0. In this case, there are two cycles, each passing through each vertex exactly once. Thus, the RHS of (62)
evaluates (exactly) to
ℓ
" m2 #
X ℓ m 1 m
d2−m + dc−m = d− 2 1+ + d− 2 (d2 − 1) .
c=1
c d
m
Combining everything, we find a contribution of x 2 (1 + o(1)) as d → ∞. In particular, when m is even this yields
the lower bound claimed in Lemma 9. We now turn to the case when cmax 2 , ℓ and µ̂2 are not all equal to m/2.
The sum over all terms. Our method for handling arbitrary values of cmax 2 , ℓ and µ̂2 is to compare their contribution
with the leading-order term. We find that if one of these variables is decreased we gain combinatorial factors, but
also need to multiply by a power of 1/p or 1/d. The combinatorial factors will turn out to be polynomial in m, so
if m is sufficiently small the contributions will be upper-bounded by a geometrically decreasing series. This process
resembles (in spirit, if not in details) the process leading to Eq. (31) in Section II.
Our strategy is to decompose the graph into a “standard” component which resembles the leading-order terms and
a “non-standard” component that can be organized arbitrarily. The standard component is defined to be the set of
2-cycles between degree-2 vertices. When ℓ = µ̂2 = cmax 2 = m 2 the entire graph is in the standard component, so when
m
ℓ, µ̂2 , cmax
2 ≈ 2 , the non-standard component should be small. Thus, in what follows, it will be helpful to keep in
mind that the largest contributions come from when m 2 − ℓ, m
2 − µ̂2 , m2 − c2
max
are all small, and so our analysis will
focus on this case.
Begin by observing that there are ℓ − µ̂2 vertices with degree greater than two. Together these vertices have
m − 2µ̂2 in- and out-edges. Thus, they (possibly together with some of the degree-2 vertices) can participate in at
most m − 2µ̂2 2-cycles. Fix a permutation π for which c2 (π) = cmax 2 . To account for all the 2-cycles, there must be
at least cmax
2 − (m − 2µ̂2 ) 2-cycles between degree-2 vertices. These 2-cycles amongst degree-2 vertices (the standard
component) account for ≥ 2cmax 2 − 2m + 4µˆ2 edges. Thus the number of non-standard edges entering and leaving
the degree-2 vertices is ≤ 2µ̂2 − (2cmax 2 − 2m + 4µˆ2 ) = 2m − 2cmax2 − 2µ̂2 . Together we have ≤ 3m − 2cmax 2 − 4µˆ2
non-standard edges.
We now bound the number of ways to place the m edges in G. First, we can order the degree-2 vertices in µ̂2 ! ways.
This ordering will later be used to place the 2-cycles of the standard component. Next, we fix an arbitrary ordering
for the ℓ − µ̂2 vertices with degree larger than two. We then place

eNS := 3m − 2cmax
2 − 4µ̂2

non-standard edges. This can be done in ≤ meNS ways. One way to see this is that each non-standard edge
has m choices of destination, since we allow them to target specific incoming edges of their destination vertex.
Call these destination edges {I1 , . . . , IeNS }. These incoming edges correspond to eNS outgoing edges, which we call
{O1 , . . . , OeNS }, and which become the starting points of the non-standard edges. Without loss of generality we can
sort {O1 , . . . , OeNS } according to some canonical ordering; let {O1′ , . . . , Oe′ NS } be the sorted version of the list. Then
we let Oi′ connect to Ii for i = 1, . . . , eNS . Since our ordering of {I1 , . . . , IeNS } was arbitrary, this is enough to specify
24

any valid placement of the edges. Additionally, our choices of {I1 , . . . , IeNS } also determine the degrees µ1 , . . . , µℓ
since they account for all of the incoming edges of the non-degree-2 vertices and out-degree equals in-degree. Note
that nothing prevents non-standard edges from being used to create 2-cycles between degree-2 vertices. However we
conservatively still consider such cycles to be part of the non-standard component.
The remaining m − eNS = 2cmax 2 + 4µ̂2 − 2m edges (if this number is positive) make up 2-cycles between degree-2
vertices, i.e. the standard component. Here we use the ordering of the degree-2 vertices. After the non-standard edges
are placed, some degree-2 vertices will have all of their edges filled, some will have one incoming and one outgoing
edge filled, and some will have none of their edges filled. Our method of placing 2-cycles is simply to place them
between all pairs of neighbors (relative to our chosen ordering) whenever this is possible.
We conclude that the total number of graphs is
max max
−4µ̂2 −4µ̂2
≤ µ̂2 !meNS ≤ µ̂2 !m3m−2c2 ≤ ℓ!m3m−2c2 . (65)

In order to specify a string ~σ , we need to additionally choose a starting edge. However, if we start within the standard
component, the fact that we have already ordered the degree-2 vertices means that this choice is already accounted
for. Thus we need only consider
max
−4µ̂2
eNS + 1 ≤ 2eNS = 23m−2c2 (66)

initial edges, where we have used the fact that 1 + a ≤ 2a for any integer a. The total number of strings corresponding
to given values of ℓ, µ̂2 , cmax
2 is then upper-bounded by the product of (66) and (65):
max
−4µ̂2
ℓ!(2m)3m−2c2 . (67)

Observe that this matches the combinatorial factor for the leading-order term (cmax 2 = µ̂2 = ℓ = m/2) and then
degrades smoothly as cmax2 , µ̂ 2 , ℓ move away from m/2.
Finally, we need to evaluate the sum over permutations in (62). Our choices for non-standard vertices are substan-
tially more complicated than the open or closed options we had for the leading-order case. Fortunately, it suffices to
analyze only whether each 2-cycle is present or absent. Since a 2-cycle consists of a pair of edges of the form (i, j) and
(j, i), each such cycle can independently
max
be present or absent. Thus, max
while there are µ1 ! · · · µℓ ! total elements of S~σ ,
we can break the sum into 2c2 different groups of (µ1 ! · · · µℓ !)/2c2 permutations, each corresponding to a different
subset of present 2-cycles. In other words, there are exactly
max
c2 µ1 ! · · · µ ℓ !
max
c 2 c2

choices of π ∈ S~σ such that c2 (π) = c. Using the fact that cyc(Cm π) ≤ (m + c2 (π))/3, we have
cmax cmax
X cmax µ1 ! · · · µℓ !
2
m+c µ1 ! · · · µℓ ! −2m+c max
2
2 3 −m
2 − 31
Ed [~σ ] ≤ cmax
d = c max d 3 1 + d
c=0
c 2 2 2 2

Finally, observe that µ1 ! · · · µℓ ! is a convex function of µ1 , . . . , µℓ and thus is maximized when µ1 = m − 2ℓ + 2 and
µ2 = · · · = µℓ = 2 (ignoring the fact that we have already fixed µ̂2 ). Thus

max −2m+cmax
2
1
cmax
2
Ed [~σ ] ≤ (m − 2ℓ + 2)!2ℓ−1−c2 d 3 1 + d− 3 (68)
m max −2m+cmax
2 m
≤ mm−2ℓ 2 2 −c2 d 3 e 2d1/3 , (69)

where in the last step we used the facts that 2 ≤ ℓ ≤ m/2 and cmax
2 ≤ m/2.
We now combine (69) with the combinatorial factor in (67) to obtain
k
pℓ

−2m+cmax m
3m−2cmax m−2ℓ m −cmax 2
X X X X
k −4µ̂2
Ed [~s] ≤ ℓ!(2m) 2 m 2 2 2 d 3 e 2d 1/3
(70)
ℓ!
s∈[p]m
~ 0≤cmax
2 ≤m m
2 0≤ℓ≤ 2 3ℓ−m≤µ̂2 ≤ℓ
R(~
s)=~s
m
m km X pℓ− 2 max m max
= x 2 e 2d1/3 (2m)(m−2c2 )+(2m−4µ̂2 ) mk(m−2ℓ) 2k( 2 −c2 ) (71)
d 3 ( 2 −c2 )
k m max
0≤cmax
2 ≤m2
0≤ℓ≤ m
2
3ℓ−m≤µ̂2 ≤ℓ
25

We can bound the sum over µˆ2 by introducing α = ℓ − µ̂2 , so that

m−2ℓ
X X 4α m−2ℓ
(2m)2(m−2µ̂2 ) = (2m)2m−4ℓ (2m) = (2m)2m−4ℓ (1 + 16m4 )m−2ℓ ≤ 65m6 (72)
3ℓ−m≤µ̂2 ≤ℓ α=0

Substituting (72) in (71) and rearranging, we obtain

m m max
X m km X X 5000m2k+12 2 −ℓ 22+k m2 2 −c2
Ed [~s]k ≤ x 2 e 2d1/3 k (73)
p d3
s∈[p]
~ m
0≤cmax
2 ≤m
2 0≤ℓ≤
m
2
R(~
s)=~s
km
e 2d1/3 m
≤ x 2 . (74)
5000m2k+12 22+k m2
1− p 1− k
d3

In the last step we have assumed that both terms in the denominator are positive. This completes the proof of
Lemma 9.

C. Bounding the sum of all strings

For any string ~s ∈ [p]m we will repeatedly remove repeats and unique letters until the remaining string is irreducible.
Each letter in the original string either (a) appears in the final irreducible string, (b) is removed as a repeat of one
of the letters appearing in the final irreducible string, or (c) is removed as part of a completely reducible substring.
√ t
Call the letters A, B or C accordingly. Assign a weight of x y t to each run of t A’s, a weight of y t to each run
P∞ Pt
of t B’s and of t=0 ℓ=0 N (t, ℓ)xℓ y t to each run of t C’s. Here y is an indeterminant, but we will see below that
it can also be thought of as a small number. We will define G(x, y) to be the sum over all finite strings of A’s, B’s
and C’s, weighted according to the above scheme. Note that [y m ]G(x, y) (i.e. the coefficient of y m in G(x, y)) is the
contribution from strings of length m.
We now relate G(x, y) to the sum in (45). Define
km
e 2d1/3
A0 = 2k+12

22+k m2
1 − 5000mp 1− k
d3
√ t
so that Lemma 9 implies that the contribution from all irreducible strings of length t is ≤ A0 x as long as 1 ≤ t ≤ m.
√ 0
We will treat the t = 0 case separately in Lemma 8, but for simplicity allow it to contribute a A0 x term to the
present sum. Similarly, we ignore the fact that there are no irreducible strings of length 1, 2, 3 or 5, since we are only
concerned with establishing an upper bound here. Thus
X G(x, y0 )
Ed [~s]k ≤ A0 [y m ]G(x, y) ≤ A0 , (75)
m
y0m
s∈[p]
~
s)6=∅
R(~

where the second inequality holds for any y0 within the radius of convergence of G. We will choose y0 below, but first
give a derivation of G(x, y).
To properly count the contributions from completely reducible substrings (a.k.a. C’s), we recall that F (x, y) counts
all C strings of length ≥ 0. Thus, it will be convenient to model a general string as starting with a run of 0 or more
C’s, followed by one or more steps, each of which places either an A or a B, and then a run of 0 or more C’s. (We
omit the case where the string consists entirely of C’s, since this corresponds to completely reducible strings.) Thus,
√
X √ n y(1 + x)F 2 (x, y)
G(x, y) = F (x, y) · y(1 + x)F (x, y) = √ , (76)
1 − y(1 + x)F (x, y)
n≥1
√
which converges whenever F converges and y(1 + x)F < 1. However, since we are only interested in the coefficient
m
of y we can simplify our calculations by summing over only n ≤ m. We also omit the n = 0 term, which corresponds
to the case of completely reducible strings, which we treat separately. Thus, we have
m
X √ n
Gm (x, y) := F (x, y) · y(1 + x)F (x, y) ,
n=1
26

and Gm (x, y) satisfies [y m ]Gm (x, y) = [y m ]G(x, y).

√ −2
Now define y0 = λ−1 x) . Rewriting F as 12 y −1 + 1 − x − (y −1 − (1 + x))2 − 4x , we see that
p
+ = (1 +
√ √ √
F (x, y0 ) = 1 + x. Thus y0 (1 + x)F (x, y0 ) = 1 and Gm (x, y0 ) = m(1 + x).
Substituting into (75) completes the proof of the Lemma.

D. Alternate models

We now use the formalism from Section III B to analyze some closely related random matrix ensembles that have
been suggested by the information locking proposals of [23]. The first ensemble we consider is one in which each
|ϕjs i is a random unit vector in Aj ⊗ Bj , then the Bj system is traced out. Let dA = dim A1 = . . . = dim Ak and
dB = dim B1 = . . . = dim Bk . The resulting matrix is
k
X O
Mp,dA [dB ],k := trBj ϕjsj .
s∈[p]m j=1
~

If dB ≪ dA then we expect the states trB ϕs to be nearly proportional to mutually orthogonal rank-dB projectors
and so we expect Mp,dA [dB ],k to be nearly isospectral to Mp,dA /dB ,k ⊗ τd⊗k
B
, where τd := Id /d. Indeed, if we define
m m
Ep,d [d
A B ],k := tr M p,d [d
A B ],k then we have
Lemma 11.
m(m+1)kdB
m m k(1−m)
Ep,d A [dB ],k
≤ Ep,d A /dB ,k
e 2dA
dB .
Proof. Define EdA [dB ] [~s] = tr(trB1 (ϕ1s1 ) · · · trBm (ϕ1sm ). Following the steps of (62), we see that
m m
A
EdA [dB ] [~s] = tr(Cm ⊗ I B )E(ϕs1 ⊗ · · · ϕsm ) (77)
Am
⊗ π Bm
P
Am Bm π∈S~s π
≤ tr(Cm ⊗ I ) (78)
(dA dB )m
X cyc(C π)−m cyc(π)−m
= dA m dB . (79)
π∈S~s

Next, we use the fact (proved in [26]) that for any π ∈ Sm , cyc(Cm π) + cyc(π) ≤ m + 1 to further bound
X cyc(C π)−m 1−cyc(C π) X dA cyc(Cm π)−m
EdA [dB ] [~s] ≤ dA m dB m
= d1−m
B . (80)
dB
π∈S~s π∈S~s

On the other hand, if µ1 , . . . , µp are the letter frequencies of ~s then (62) and (23) yield
cyc(Cm π)
P
π∈S~s d m(m+1) X
Ed [~s] = Qp ≥ e− 2d dcyc(Cm π)−m . (81)
s=1 d(d + 1) · · · (d + µ s − 1)
π∈S~s

Setting d = dA /dB and combining (80) and (81) yields the inequality
m(m+1)dB
EdA [dB ] [~s] ≤ EdA /dB [~s]e 2dA
.
We then raise both sides to the k th power and sum over ~s to establish the Lemma.
m
To avoid lengthy digressions, we avoid presenting any lower bounds for Ep,d A [dB ],k
.
Next, we also consider a model in which some of the random vectors are repeated, which was again first proposed
in [23]. Assume that p1/k is an integer. For s = 1, . . . , p and j = 1, . . . , k, define
& '
s
s(j) := j .
p1− k
Pp
Note that as s ranges from 1, . . . , p, s(j) ranges from 1, . . . , pj/k . Define M̃p,d,k = s=1 |ϕ̃s ihϕ̃s |, where |ϕ̃s i =
|ϕ1s(1) i ⊗ · · · ⊗ |ϕks(k) i. In [23], large-deviation arguments were used to show that for x = o(1), kM̃p,d,k k = 1 + o(1) with
high probability. Here we show that this can yield an alternate proof of our main result on the behavior of kMp,d,k k,
at least for small values of x. In particular, we prove
27

Corollary 12. For all m, p, d, k,

m m
Ẽp,d,k ≤ Ep,d,k .

This implies that if λ̃ is a randomly drawn eigenvalue of M̃p,d,k , λ is a randomly drawn eigenvalue of Mp,d,k and γ
is a real number, then Pr[λ ≥ γ] ≤ Pr[λ̃ ≥ γ]. In particular

Pr[kMp,d,k k ≥ γ] ≤ dk Pr[kM̃p,d,k k ≥ γ].

The proof of Corollary 12 is a direct consequence of the following Lemma, which may be of independent interest.
Lemma 13. If s′i = s′j whenever si = sj for some strings ~s, ~s′ ∈ [p]m then Ed [~s] ≤ Ed [~s′ ].
Proof. The hypothesis of the Lemma can be restated with no loss of generality as saying that ~s′ is obtained from
~s by a series of merges, each of which replaces all instances of letters a, b with the letter a. We will prove the
inequality for a single such merge. Next, we rearrange ~s so that the a’s and b’s are at the start of the string. This
rearrangement corresponds to a permutation π0 , so we have Ed [~s] = tr π0† Cm π0 E[ϕ⊗µ a
a
⊗ ϕ⊗µ
b
b
⊗ ω] and Ed [~s′ ] =
†
tr π0 Cm π0 E[ϕa⊗µa +µb ⊗ ω], where ω is a tensor product of various ϕs , with s 6∈ {a, b}. Taking the expectation over
ω yields a positive linear combination of various permutations, which we absorb into the π0† Cm π0 term by using the
cyclic property of the trace. Thus we find

⊗ ϕ⊗µ
X
Ed [~s] = cπ tr πE[ϕ⊗µ
a
a
b
b
⊗ I m−µa −µb ] (82)
π∈Sm
X
′
Ed [~s ] = cπ tr πE[ϕa⊗µa +µb ⊗ I m−µa −µb ] (83)
π∈Sm

for some cπ ≥ 0. A single term in the Ed [~s] sum has the form cπ E[|hϕa |ϕb i|2f (π) ] for some f (π) ≥ 0, while for
Ed [~s′ ], the corresponding term is simply cπ . Since E[|hϕa |ϕb i|2f (π) ] ≤ 1, this establishes the desired inequality.

IV. APPROACH 3: SCHWINGER-DYSON EQUATIONS

A. Overview

The final method we present uses the Schwinger-Dyson equations[15] to evaluate traces of products of random pure
states. First, we show how the expectation of a product of traces may be expressed as an expectation of a similar
product involving fewer traces. This will allow us to simplify Ed [~s]k , and thus to obtain a recurrence relation for
em
p,d,k .

B. Expressions involving traces of random matrices

1. Eliminating one ϕ: Haar random case

We start by considering the case when k = 1 (i.e. |ϕi i are just Haar-random, without a tensor product structure).
Let ϕ be a density matrix of a Haar-random state over Cd .
Let A1 , . . . , Aj be matrix-valued random variables that are independent of ϕ (but there may be dependencies
between Ai ). We would like to express

E[tr(ϕA1 ϕA2 . . . ϕAj )],

by an expression that depends only on A1 , . . . , Aj . First, if ϕ = |ϕihϕ|, then

tr(ϕA1 ϕ . . . ϕAi ) tr(ϕAi+1 ϕ . . . ϕAj ) = hϕ|A1 ϕ . . . ϕAi |ϕihϕ|Ai+1 ϕ . . . ϕAj |ϕi

= hϕ|A1 ϕ . . . Ai ϕAi+1 . . . ϕAj |ϕi = tr(ϕA1 . . . ϕAj ). (84)

This allows to merge expressions that involve the same matrix ϕ.

Second, observe that ϕ = U |0ih0|U † , where U is a random unitary and |0i is a fixed state. By applying eq. (19)
from [15], we get
j−1
1X
E[tr(ϕA1 ϕA2 . . . ϕAj )] = − E[tr(ϕA1 . . . ϕAi ) tr(ϕAi+1 . . . ϕAj )]
d i=1

j
1X
+ E[tr(ϕA1 . . . Ai−1 ϕ) tr(Ai ϕAi+1 . . . ϕAj )].
d i=1

Because of (84), we can replace each term in the first sum by E[tr(ϕA1 . . . ϕAj )]. Moving those terms to the left hand
d
side and multiplying everything by d+j−1 gives

j
1 X
E[tr(ϕA1 ϕA2 . . . ϕAj )] = E[tr(ϕA1 . . . Ai−1 ϕ) tr(Ai ϕAi+1 . . . ϕAj )]. (85)
d + j − 1 i=1

For i = j, we have

tr(ϕA1 . . . Aj−1 ϕ) tr(Aj ) = tr(ϕA1 . . . Aj−1 ) tr(Aj ). (86)

Here, we have applied tr(AB) = tr(BA) and ϕ2 = ϕ. For i < j, we can rewrite

tr(ϕA1 . . . Ai−1 ϕ) tr(Ai ϕAi+1 . . . ϕAj ) = tr(ϕA1 . . . Ai−1 ) tr(ϕAi+1 . . . ϕAj Ai )

= tr(ϕA1 . . . Ai−1 ϕAi+1 . . . ϕAj Ai ). (87)

By combining (85), (86) and (87), we have

j−1
!
1 X
E[tr(ϕA1 ϕA2 . . . ϕAj )] = E[tr(ϕA1 . . . ϕAj−1 ) tr(Aj )] + E[tr(ϕA1 . . . Ai−1 ϕAi+1 . . . ϕAj Ai )] (88)
d+j−1 i=1
j−1
!
1 X
≤ E[tr(ϕA1 . . . ϕAj−1 ) tr(Aj )] + E[tr(ϕA1 . . . Ai−1 ϕAi+1 . . . ϕAj Ai )] . (89)
d i=1

2. Consequences

Consider E[tr(ϕ1 . . . ϕm )] with ϕi as described in section IV A. Let Y1 , . . . , Yl be the different matrix valued random
variables that occur among ϕ1 , . . . , ϕm . We can use the procedure described above to eliminate all occurrences of Y1 .
Then, we can apply it again to eliminate all occurrences of Y2 , . . ., Yl−1 , obtaining an expression that depends only
on tr(Yl ). Since tr(Yl ) = 1, we can then evaluate the expression.
Each application of (88) generates a sum of trace expressions with positive real coefficients. Therefore, the fi-
nal expression in tr(Yl ) is also a sum of terms that involve tr(Yl ) with positive real coefficients. This means that
E[tr(ϕ1 . . . ϕm )] is always a positive real.

3. Eliminating one ϕ: the tensor product case

We claim
Lemma 14. Let ϕ be a tensor product of k Haar-random states in d dimensions and A1 , . . . , Aj be matrix-valued
random variables which are independent from ϕ and whose values are tensor products of matrices in d dimensions.
Then,
j−1
1 + j k d−1/k jk X
E[tr(ϕA1 ϕA2 . . . ϕAj )] ≤ E[tr(ϕA1 . . . ϕAj−1 ) tr(Aj )] + 1/k E[tr(ϕA1 . . . Ai−1 ϕAi+1 . . . ϕAj Ai )].
d d i=1
29

Proof. Because of the tensor product structure, we can express

ϕ = ϕ1 ⊗ ϕ2 ⊗ . . . ϕk ,

Ai = A1i ⊗ A2i ⊗ . . . Aki .

We have
k
Y
E[tr(ϕA1 ϕA2 . . . ϕAj )] = E[tr(ϕl Al1 ϕl . . . ϕl Alj )].
l=1

We expand each of terms in the product according to (89). Let C0 = E[tr(ϕl Al1 . . . ϕl Alj−1 ) tr(Alj )] and

Ci = E[tr(ϕl Al1 . . . Ali−1 ϕl Ali+1 . . . ϕl Alj Ali )]

for i ∈ {1, 2, . . . , j − 1}. (Since each of k subsystems has equal dimension d and are identically distributed, the
expectations C0 , . . . , Cj−1 are independent of l.) Then, from (89), we get
k j−1 j−1
1 Y 1 X X
E[tr(ϕA1 ϕA2 . . . ϕAj )] ≤ (C0 + C1 + . . . + Cj−1 ) = . . . Ci · · · Cik .
dk dk i =0 i =0 1
l=1 1 k

Consider one term in this sum. Let r be the number of l for which il = 0. We apply the arithmetic-geometric mean
inequality
x1 + x2 + · · · + xk √
≥ k x1 x2 · · · xk
k
to
( 1
d− k (Cil )k if il = 0
xl = r .
d (k−r)k (Cil )k 6 0
if il =

(In cases if r = 0 or r = k, we just define xl = Cil for all l ∈ {1, 2, . . . , k}.) We now upper-bound the coefficients of
(C0 )k in the resulting sum. For (C0 )k , we have a contribution of 1 from the term which has i1 = . . . = ik = 0 and a
contribution of at most d−1/k from every other term. Since there are at most j k terms, the coefficient of (C0 )k is at
most

1 + j k d−1/k .
r
The coefficient of (Cj )k in each term is at most d (k−r)k . Since r ≤ k − 1 (because the r = k terms only contain C0 ’s),
r k−1
we have d (k−r)k ≤ d k . The Lemma now follows from there being at most j k terms.

C. Main results

1. Haar random case

We would like to upper-bound

p p
1 X X
em
p,d,1 = ... E[tr(ϕs1 . . . ϕsm )].
d s =1 s =1
1 m

Lemma 15.
m−2
X
m−l−1 p + m3 m−1
em
p,d,1 ≤ elp,d,1 ep,d,1 + ep,d,1 . (90)
d
l=0
30

Proof. In section IV D 1.
Using e0p,d,1 = tr(I)/d = 1, we can state Lemma 15 equivalently as

m−1
p + m3
X
m−l−1
em
p,d,1 ≤ elp,d,1 ep,d,1 + − 1 em−1
p,d,1 . (91)
d
l=0

Define x̃ = (p + m3 )/d (and note that it is not exactly the same as the variable of the same name in Section II). Then
(91) matches the recurrence for the Narayana coefficients in (54). Thus we have
Corollary 16.
m
X √ 2m
em
p,d,1 ≤ N (m, ℓ)x̃ℓ = βm (x̃) ≤ (1 + x̃) (92)
ℓ=1

Similar arguments (which we omit) establish the lower bound em ℓ

P
p,d,1 ≥ ℓ N (m, ℓ)(p)ℓ /(d + m) , which is only
slightly weaker than the bound stated in Theorem 1 and proved in Lemma 8.

2. Tensor product case

The counterpart of Lemma 15 is

Lemma 17.
m−2
mk X l 3mk+3

m−l−1 p
em
p,d,k ≤ 1 + 1/k ep,d,k ep,d,k + + 1/k em−1
p,d,k (93)
d dk d
l=0
m−1
mk X l mk+3 mk

m−l−1 p
= 1 + 1/k ep,d,k ep,d,k + + 3 1/k − 1 + 1/k em−1
p,d,k (94)
d dk d d
l=0
(95)
k+3
This time we set x̃k = dpk + 3 md1/k . Also define γ = mk /d1/k . Then Lemma 17 implies that em
p,d,k ≤ (1 +
m m
γ) [y ]F̃ (x̃k , y), where F̃ satisfies the recurrence

2 x̃k
F̃ = 1 + y F̃ + y − 1 F̃ . (96)
1+γ

Thus we obtain
Corollary 18.

x̃k
em
p,d,k ≤ (1 + γ) βm m
(97)
1+γ
m
3mk+4

x̃k
≤ βm (x) ≤ exp βm (x) (98)
x xd1/k

Proof. (97) follows from the preceding discussion as well as the relation between βm and the recurrence (96), which
was discussed in Section III A and in [30, 31]. The first inequality in (98) is because βm (x(1 + ǫ)) ≤ (1 + ǫ)m βm (x) for
any ǫ ≥ 0, which in turn follows from the fact that βm (x) is a degree-m polynomial in x with nonnegative coefficients.
The second inequality follows from the inequality 1 + ǫ ≤ eǫ .

D. Proofs

1. Proof of Lemma 15

We divide the terms E[tr(ϕs1 . . . ϕsm )] into several types.

First, we consider terms for which s1 ∈ / {s2 , . . . , sm }. Then, ϕs1 is independent from ϕs2 . . . ϕsm . Because of
linearity of expectation , we have

I 1
E[tr(ϕs1 . . . ϕsm )] = tr(E[ϕs1 ]E[ϕs2 . . . ϕsm ]) = tr E[ϕs2 . . . ϕsm ] = E[tr(ϕs2 . . . ϕsm )].
d d
p
By summing over all possible s1 ∈ [p], the sum of all terms of this type is d times the sum of all possible
/ {s2 , . . . , sm }, i.e., dp times Ep−1,d,1
E[tr(ϕs2 . . . ϕsm )] with s1 ∈ m−1
.
For the other terms, we can express them as

E[tr(ϕs1 Y1 ϕs1 Y2 . . . ϕs1 Yj )] (99)

with Y1 , . . . , Yj being products of ϕi for i 6= s1 . (Some of those products may be empty, i.e. equal to I.)
To simplify the notation, we denote ϕ = ϕs1 . Because of (89), (99) is less than or equal to
j−1
!
1 X
E[tr(ϕY1 ϕ . . . Yi−1 Yi ϕ . . . ϕYj ] + E[tr(ϕY1 ϕY2 . . . ϕYj−1 ) tr(Yj )] (100)
d i=1

We handle each of the two terms in (100) separately. For each the term in the sum, we will upper-bound the sum of
m−1
them all (over all E[tr(ϕY1 ϕY2 . . . ϕYj )]) by 1d Ep,d,1 times the maximum number of times the same term can appear
in the sum.
Therefore, we have to answer the question: given a term E[tr(Z1 . . . Zm−1 )], what is the maximum number of ways
how this term can be generated as E[tr(ϕY1 ϕ . . . Yi−1 Yi ϕ . . . ϕYj ]?
Observe that ϕ = Z1 . Thus, given Z1 . . . Zm−1 , ϕ is uniquely determined. Furthermore, there are at most m
locations in Z1 . . . Zm−1 which could be the boundary between Yi−1 and Yi . The original term E[tr(ϕY1 . . . ϕYi )] can
then be recovered by adding ϕ in that location. Thus, each term can be generated in at most m ways and the sum
m−1
of them all is at most m d Ep,d,1 .
It remains to handle the terms of the form

E[tr(ϕY1 ϕY2 . . . ϕYj−1 ) tr(Yj )]. (101)

We consider two cases:

Case 1: There is no ϕi which occurs both in Yj and in at least one of Y1 , . . . , Yj−1 . Then, the matrix valued
random variables ϕY1 ϕY2 . . . ϕYj−1 and Yj are independent. Therefore, we can rewrite (101) as

E[tr(ϕY1 ϕY2 . . . ϕYj−1 )]E[tr(Yj )]. (102)

Fix Y1 , . . . , Yj−1 . Let l be the length of Yj and let o be the number of different ϕi that occur in Y1 . . . Yj−1 . Then,
there are p − o − 1 different ϕi s which can occur in Yj (i.e., all p possible ϕi s, except for ϕs1 and those o which occur
in Y1 . . . Yj−1 ).
l l l l
Therefore, the sum of E[tr(Yj )] over all possible Yj is exactly Ep−o−1,d,1 . We have Ep−t,d,1 ≤ Ep−o−1,d,1 ≤ Ep,d,1 .
Therefore, the sum of all terms (102) in which Yj is of length l is lower-bounded by the sum of all
l
Ep−t,d,1 E[tr(ϕY1 ϕY2 . . . ϕYj−1 )]

l m−l−1 l m−l−1
which is equal to Ep−t,d,1 Ep,d,1 . Similarly, it is upper-bounded by Ep,d,1 Ep,d,1 .
Case 2: There exists ϕi which occurs both in Yj and in some Yl , l ∈ {1, . . . , j − 1}.
We express Yj = Zϕi W and Yl = Z ′ ϕi W ′ . Then, (101) is equal to

E[tr(ϕY1 . . . Yl−1 ϕZϕi W ϕYl+1 . . . ϕYj−1 ) tr(Z ′ ϕi W ′ ) =

E[tr(ϕY1 . . . Yl−1 ϕZϕi W ′ Z ′ ϕi W ϕYl+1 . . . ϕYj−1 )].

In how many different ways could this give us the same term E[tr(Z1 . . . Zm−1 )]?
Given Z1 , . . . , Zm−1 , we know ϕ = Z1 . Furthermore, we can recover Yj by specifying the location of the first ϕi ,
the second ϕi and the location where W ′ ends and Z ′ begins. There are at most m − 1 choices for each of those three
parameters. Once we specify them all, we can recover the original term (101). Therefore, the sum of all terms (101)
m−1
in this case is at most (m − 1)3 times the sum of all E[tr(Z1 . . . Zm−1 )], which is equal to Ep,d,1 .
32

Overall, we get
m−2
m p m−1 m m−1 X l m−l−1 (m − 1)3 m−1
Ep,d,1 ≤ Ep−1,d,1 + Ep,d,1 + Ep,d,1 Ep,d,1 + Ep,d,1 , (103)
d d d
l=0

with the first term coming from the terms where s1 ∈/ {s2 , . . . , sk }, the second term coming from the bound on the
sum in (100) and the third and the fourth terms coming from Cases 1 and 2. By combining the terms, we can rewrite
(103) as
m−2
m p + m3 m−1 1 X l m−l−1
Ep,d,1 ≤ Ep,d,1 + Ep,d,1 Ep,d,1 . (104)
d d
l=0

Dividing (104) by d completes the proof.

We remark as well that these techniques can yield a lower bound for Ep,d,1 . To do so, we apply the inequality
1/(d + j − 1) ≥ 1/(d + m) to (88), and then combine the lower bounds from the s1 ∈ / {s2 , . . . , sk } case and Case 1.
(For the other cases, we can use 0 as the lower bound, since we know that the expectation of any product of traces is
positive.) This yields
m−2
!
d X
m−l−1 p m−1
ep,d,1 ep−t,d,1 + ep−1,d,1 ≤ em
l
p,d,1 . (105)
d+m d
l=0

2. Proof of Lemma 17

The proof is the same as for Lemma 17, except that, instead of (89) we use Lemma 14.
The first term in (103), dpk Ep−1,d,k
m−1
, remains unchanged. The terms E[tr(ϕY1 . . . Yi−1 Yi ϕ . . . ϕYj )] in (100) are now
jk jk mk m m−1
multiplied by d1/k
instead of d1 . We have d1/k
≤ d1/k
. Therefore, the second term in (103) changes from d Ep,d,1 to
mk+1 m−1
d1/k
Ep,d,k .
jk mk
The terms E[tr(ϕY1 . . . ϕYj−1 ) tr(Yj )] in (100) acquire an additional factor of 1 + d1/k
≤ 1+ d1/k
. This factor is
then acquired by the third and the fourth terms in (103). Thus, we get
m−2
mk+1 m−1 mk X l mk (m − 1)3 m−1

m p m−1 m−l−1
Ep,d,k ≤ E + E + 1 + E E + 1 + Ep,d,k .
dk p−1,d,k d1/k p,d,k d1/k l=0 p,d,k p,d,1 d1/k d

The lemma now follows from merging the second term with the fourth term.

E. Relation to combinatorial approach

The recursive approach of this section appears on its face to be quite different from the diagrammatic and combi-
natorial methods discussed earlier. However, the key recursive step in (85) (or equivalently (88)) can be interpreted
in terms of the sorts of sums over permutations seen in Section III.
Consider an expression of the form X = E[tr(ϕA1 ϕA2 . . . ϕAj )]. For the purposes of this argument, we will ignore
the fact that A1 , . . . , Aj are random variables. Letting Cj denote the j-cycle, we can rewrite X as

X = tr(Cj E[ϕ⊗j ](A1 ⊗ A2 ⊗ · · · ⊗ Aj )) = tr(E[ϕ⊗j ](A1 ⊗ A2 ⊗ · · · ⊗ Aj )),

since Cj |ϕi⊗j = |ϕi⊗j . Next we apply (61) and obtain

P
π∈Sj tr(π(A1 ⊗ A2 ⊗ · · · ⊗ Aj ))
X= .
d(d + 1) · · · (d + j − 1)

We will depart here from the approach in Section III by rewriting the sum over Sj . For 1 ≤ i ≤ j, let (i, j) denote
the permutation that exchanges positions i and j, with (j, j) = e standing for the identity permutation. We also
define Sj−1 ⊂ Sj to be the subgroup of permutations of the first j − 1 positions. Since (1, j), . . . , (j − 1, j), (j, j) are
33

a complete set of coset representatives for Sj−1 , it follows that any π ∈ Sj can be uniquely expressed in the form
(i, j)π ′ with 1 ≤ j and π ′ ∈ Sj−1 . Our expression for X then becomes
j ′
P !
1 π ′ ∈Sj−1 π (A1 ⊗ A2 ⊗ · · · ⊗ Aj )
X
X= tr (i, j) .
d+j−1 i=1
d(d + 1) · · · (d + j − 2)
j
" !#
1 X
⊗j−1
= E tr (i, j)(ϕ ⊗ I)(A1 ⊗ A2 ⊗ · · · ⊗ Aj )
d+j−1 i=1
j−1
" #
1 X
= E tr (ϕA1 ϕA2 · · · ϕAj−1 ) tr(Aj ) + tr(ϕA1 ) · · · tr(ϕAi−1 ) tr(Aj ϕAi ) tr(ϕAi+1 ) · · · tr(ϕAj−1 ) ,
d+j−1 i=1

which matches the expression in (85), or equivalently, (88).

The difference in approaches can then be seen as stemming from the different ways of summing over π ∈ Sj .
In Section III (and to some extent, Section II), we analyzed the entire sum by identifying leading-order terms and
deriving a perturbative expansion that accounted for all the other terms. By contrast, the approach of this section is
based on reducing the sum over Sj to a similar sum over Sj−1 .

V. LOWER BOUNDS ON THE SPECTRUM

The bulk of our paper has been concerned with showing that kM k is unlikely to be too large (Corollary 5). Since
we give asymptotically sharp bounds on d−k E[tr M m ], we in fact obtain asymptotically convergent estimates of the
eigenvalue density of M (Corollary
√ 6). However, this does not rule out the possibility that a single eigenvalue of M
might be smaller than (1 − x)2 ; rather, it states that the expected number of such eigenvalues is o(dk ).
In fact, our method was successful in proving asymptotically sharp estimates on the largest eigenvalue of M . We
now turn to proving bounds on the smallest eigenvalue of M . To use the trace method to show that w.h.p. there are
no small eigenvalues, one would like to upper bound expressions such as E[tr(M − λI)2m ], for an appropriate choice
of λ. If we succeed in bounding such an expression then the λmin (the smallest eigenvalue of M ) is lower bounded by
1/m
E[(λ − λmin )2 ] ≤ E[tr(M − λI)2m ] , (106)

and hence
1/2m
E[λmin ] ≥ λ − E[tr(M − λI)2m ] . (107)

Let us first describe a failed attempt to bound this result, before giving the correct approach. To bound E[tr(M −
λI)2m ], the natural first attempt is to use the expansion
2m 2m−n
2m
X 2m
E[tr(M − λI) ]= E[tr(M n )] −λ . (108)
n=0
n

One might then attempt to estimate each term in the above expansion in turn. Unfortunately, what happens is the
following: the leading order (rainbow) terms for E[tr(M n )] can be summed directly
√ over n. One may
√ show that this
sum contributes a result to E[tr(M − λI)2m ] which grows roughly as max{(( x − 1)2 − λ)2m , (( x + 1)2 − λ)2m }.
That is, it is dominated by either the largest or smallest eigenvalue of the limiting distribution, depending on the
value of λ. However, we are unable to control the corrections to this result. While they are suppressed in powers of
1/d, they grow rapidly with m due to the binomial factor, causing this attempt to fail.
We now describe a simple alternate approach. Let us work within the Feynman diagram framework. By (15), the
spectrum of Mp,d,k is close to that of M̂p,d,k with high probability, so we can translate bounds on λmin in the Gaussian
ensemble to bounds on the smallest eigenvalue in the normalized ensemble.
Having reduced to the Gaussian ensemble, we now construct a diagrammatic series for E[tr(M̂ − λI)2m ]. One way
to construct such a diagrammatic series is to add in extra diagrams, in which rather than having m pairs of vertices,
we instead have n pairs of vertices, interspered with m − n “identity operators”, where nothing happens: the solid
lines simply proceed straight through. However, there already is a particular contraction in our existing diagrammatic
series in which solid lines proceed straight through. This is a particular contraction of neighboring vertices, in which
34

a dashed line connects the two vertices and all vertical lines leaving the two vertices are connected to each other.
So, we can obtain the same result by using our original diagrammatic expansion, but with a change in the rules for
weighting diagrams. If a a diagram has a certain number, c, of pairs of neighboring vertices contracted in the given
way, then we adjust the weight of the diagram by
d−k p − λ c x − λ
= . (109)
d−k p x
If d−k p − λ ≥ 0, then this new series consists only of positive terms and we can use our previous techniques for
estimating the series, bounding it by the sum of rainbow diagrams, plus higher order corrections. The sum of rainbow
diagrams changes in this approach. One could use a new set of generating functionals to evaluate the new sum of
rainbow diagrams, but√we can in fact find √ the result more directly: we can directly use the fact that this sum is
bounded by dk max{(( x − 1)2 − λ)2m , (( x + 1)2 − λ)2m }. The corrections remain small. Taking the smallest value
√ that x − λ ≥ 0, we have λ = x, and so we find that, for x > 1, the sum of these diagrams is bounded
of λ such
by dk (2 x + 1)2m . This gives us a bound that, for any ǫ > 0, the expectation value for the smallest eigenvalue is
asymptotically greater than
√
x − 2 x − 1 − ǫ, (110)
and hence using concentration of measure arguments and the above reduction to the Gaussian ensemble, we can then
show that, for any ǫ > 0, with high probability,
√ the smallest eigenvalue of a matrix chosen randomly from the uniform
ensemble is greater than or equal to x − 2 x − 1 − ǫ.
On the other hand, if x < 1, then we will need to instead consider E[tr(M̂ ′ − λI)2m ] where M̂ ′ is the Gram matrix
of the ensemble. Since M̂ ′ has the same spectrum as M̂ but is only p × p, all of the terms in (108) are identical except
that tr I equals p instead of dk . We can use a similar diagrammatic technique to incorporating the identity terms.
Now each term of M̂ contributes the pair of vertices from Fig. 2(a), but in the opposite order. Along the horizontal,
the solid lines are the internal lines and the dashed lines are external. Now the identity diagrams correspond to the
case when the dashed lines proceed straight through. These components of a diagram initially had a contribution of
1 (with k closed solid loops canceling the natural d−k contribution from each pair of vertices). Thus, adding in the
−λI terms results in a multiplicative factor of (1 − λ) for each vertex pair with the configuration where the dashed
lines go straight through. Now we can choose λ √ to be as large as 1 √
and still have each diagram be nonnegative. The
resulting bound on E[tr(M̂ ′ − λI)2m ] is√p max{(( x − 1)2 − 1)2m , (( x + 1)2 − 1)2m } plus small corrections. We find
that the smallest eigenvalue is ≥ 1 − 2 x − x − ǫ with high probability. √
Combining these bounds, we find that the smallest eigenvalue is asymptotically
√ 2 no lower than (1− x)2 −2 min(1, x).
This is within a 1 − o(1) factor of the unproven-but-true value of (1 − x) in the limits x → 0 and x → √ ∞.
We believe that it should be possible to improve this result to get an asymptotic lower bound of (1 − x)2 , staying
within the framework of trace methods, using any of the three techniques we have used. This will require a more
careful estimate of the negative terms to show that our methods remain valid. We leave the solution of this problem
to future work.

Acknowledgments

We are grateful to Guillaume Aubrun for bringing [2, 3, 27] to our attention, for telling us about his conjecture, and
for many helpful conversations on convex geometry. AA was supported by University of Latvia Research Grant and
Marie Curie grant QAQC (FP7-224886). MBH was supported by U. S. DOE Contract No. DE-AC52-06NA25396.
AWH was supported by U.S. ARO under grant W9111NF-05-1-0294, the European Commission under Marie Curie
grants ASTQIT (FP6-022194) and QAP (IST-2005-15848), and the U.K. Engineering and Physical Science Research
Council through “QIP IRC.” MBH and AWH thank the KITP for hospitality at the workshop on “‘Quantum Infor-
mation Science”.

[1] A. Abeyesinghe, I. Devetak, P. Hayden, and A. Winter. The mother of all protocols: Restructuring quantum information’s
family tree. Proc. Roc. Soc. A, 465(2108):2537–2563, 2009. arXiv:quant-ph/0606225.
[2] R. Adamczak, A. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Quantitative estimates of the convergence of the empirical
covariance matrix in log-concave ensembles. J. Amer. Math. Soc., Oct 2009. arXiv:0903.2323.
[3] R. Adamczak, A. E. Litvak, A. Pajor, and N. Tomczak-Jaegermann. Restricted isometry property of matrices with
independent columns and neighborly polytopes by random sampling, 2009. arXiv:0904.4723.
35

[4] R. Ahlswede and A. Winter. Strong converse for identification via quantum channels. IEEE Trans. Inf. Theory, 48(3):569–
579, 2002. arXiv:quant-ph/0012127.
[5] A. Anderson, R. C. Meyrs, and V. Periwal. Complex random surfaces. Phys. Lett. B, 254(1-2):89 – 93, 1991.
[6] A. Anderson, R. C. Myers, and V. Periwal. Branched polymers from a double-scaling limit of matrix models. Nuclear
Physics B, 360(2-3):463 – 479, 1991.
[7] R. Arratia, B. Bollobás, and G. Sorkin. The interlace polynomial of a graph. J. Comb. Th. B,, 92(2):199–233,, 2004.
[8] A. Ben-Aroya, O. Schwartz, and A. Ta-Shma. Quantum expanders: motivation and construction. In CCC, 2008.
arXiv:0709.0911 and arXiv:quant-ph/0702129.
[9] C. H. Bennett, P. Hayden, D. W. Leung, P. W. Shor, and A. J. Winter. Remote preparation of quantum states. ieeeit,
51(1):56–74, 2005. quant-ph/0307100.
[10] A. Bose and A. Sen. Another look at the moment method for large dimensional random matrices. Elec. J. of Prob.,
13(21):588–628, 2008.
[11] M. Christandl. The structure of bipartite quantum states: Insights from group theory and cryptography. PhD thesis,
University of Cambridge, 2006. arXiv:quant-ph/0604183.
[12] J. Feinberg and A. Zee. Renormalizing rectangles and other topics in random matrix theory. J. Stat. Phys., 87(3–4):473–504,
1997.
[13] P. Forrester. Log-gases and random matrices. unpublished manuscript. Chapter 2.
http://www.ms.unimelb.edu.au/∼matpjf/matpjf.html.
[14] M. B. Hastings. Entropy and entanglement in quantum ground states. Phys. Rev. B, 76:035114, 2007.
arXiv:cond-mat/0701055.
[15] M. B. Hastings. Random unitaries give quantum expanders. Phys. Rev. A, 76:032315, 2007. arXiv:0706.0556.
[16] M. B. Hastings. A counterexample to additivity of minimum output entropy. Nature Physics, 5, 2009. arXiv:0809.3972.
[17] P. Hayden, P. W. S. D. W. Leung, and A. J. Winter. Randomizing quantum states: Constructions and applications.
Comm. Math. Phys., 250:371–391, 2004. arXiv:quant-ph/0307104.
[18] P. Hayden, D. W. Leung, and A. Winter. Aspects of generic entanglement. Comm. Math. Phys., 265:95, 2006.
arXiv:quant-ph/0407049.
[19] P. Hayden and A. J. Winter. Counterexamples to the maximal p-norm multiplicativity conjecture for all p > 1. Comm.
Math. Phys., 284(1):263–280, 2008. arXiv:0807.4753.
[20] R. A. Horn and C. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[21] I. M. Johnstone. On the distribution of the largest eigenvalue in principle components analysis. Annals of Statistics,
29(2):295–327, 2001.
[22] M. Ledoux. The Concentration of Measure Phenomenon. AMS, 2001.
[23] D. W. Leung and A. J. Winter. Locking 2-LOCC distillable common randomness and LOCC-accessible information. in
preparation.
[24] A. Montanaro. On the distinguishability of random quantum states. Comm. Math. Phys., 273(3):619–636, 2007.
arXiv:quant-ph/0607011v2.
[25] R. C. Myers and V. Periwal. From polymers to quantum gravity: Triple-scaling in rectangular random matrix models.
Nuclear Physics B, 390(3):716 – 746, 1993.
[26] A. Nica and R. Speicher. Lectures on the Combinatorics of Free Probability. Cambridge University Press, 2006.
[27] M. Rudelson. Random vectors in the isotropic position. J. Func. Anal., 164(1):60–72, 1999.
[28] G. Smith and J. Smolin. Extensive nonadditivity of privacy. arXiv:0904.4050, 2009.
[29] R. Speicher. Free probability theory and non-crossing partitions. Lothar. Comb, B39c, 1997.
[30] R. P. Stanley. Enumerative Combinatorics, vol. 2. Cambridge University Press, 1999. Exercise 6.36 and references therein.
[31] R. A. Sulanke. The Narayana distribution. J. of Stat. Planning and Inference, 101(1–2):311–326, 2002.
[32] J. Verbaarschot. Spectrum of the QCD Dirac operator and chiral random matrix theory. Phys. Rev. Lett., 72(16):2531–
2533, Apr 1994.
[33] J. Verbaarschot. The spectrum of the Dirac operator near zero virtuality for Nc = 2 and chiral random matrix theory.
Nuclear Physics B, 426(3):559 – 574, 1994.
[34] J. Yard and I. Devetak. Optimal quantum source coding with quantum information at the encoder and decoder, 2007.
arXiv:0706.2907.

MR 20 Nissan PDF
88% (8)
MR 20 Nissan PDF
1,639 pages
Functional Analysis With Applications
100% (1)
Functional Analysis With Applications
338 pages
Lecture Notes For Statistical Mechanics Fall 2010
100% (1)
Lecture Notes For Statistical Mechanics Fall 2010
55 pages
Universality For Real Symmetric Matrix Models
No ratings yet
Universality For Real Symmetric Matrix Models
651 pages
Theoretical Statistical Physics: Prof. Dr. Christof Wetterich Institute For Theoretical Physics Heidelberg University
No ratings yet
Theoretical Statistical Physics: Prof. Dr. Christof Wetterich Institute For Theoretical Physics Heidelberg University
175 pages
250+ TOP MCQs On Geotechnical Engineering and Answers
100% (4)
250+ TOP MCQs On Geotechnical Engineering and Answers
4 pages
Lecture Notes in Statistical Mechanics and Mesoscopics: Doron Cohen
No ratings yet
Lecture Notes in Statistical Mechanics and Mesoscopics: Doron Cohen
158 pages
RM Notes Speicher v2
No ratings yet
RM Notes Speicher v2
141 pages
SK Book Bonus
No ratings yet
SK Book Bonus
188 pages
Random Matrices
No ratings yet
Random Matrices
196 pages
Possible Generalization of Boltzmann-Gibbs Statist
No ratings yet
Possible Generalization of Boltzmann-Gibbs Statist
10 pages
Free Probability Notes
No ratings yet
Free Probability Notes
140 pages
How Often Is The Coordinate of A Harmonic
No ratings yet
How Often Is The Coordinate of A Harmonic
15 pages
Shannon Theory On General Probabilistic Theory
No ratings yet
Shannon Theory On General Probabilistic Theory
18 pages
247A Notestodd-Kemp
No ratings yet
247A Notestodd-Kemp
109 pages
Testing Identity of Collections of Quantum States - Sample Complexity Analysis
No ratings yet
Testing Identity of Collections of Quantum States - Sample Complexity Analysis
29 pages
Optimal Rigidity and Maximum of The Characteristic Polynomial of Wigner Matrices
No ratings yet
Optimal Rigidity and Maximum of The Characteristic Polynomial of Wigner Matrices
62 pages
Dyson 1962
No ratings yet
Dyson 1962
9 pages
Algebraic Approach To Quantum Theory
No ratings yet
Algebraic Approach To Quantum Theory
44 pages
Multispin
No ratings yet
Multispin
21 pages
Tsallis1988 Article PossibleGeneralizationOfBoltzm PDF
No ratings yet
Tsallis1988 Article PossibleGeneralizationOfBoltzm PDF
9 pages
Generalized Maxwell-Boltzmann, Bose-Einstein, Fermi-Dirac and Acharya-Swamy Statistics and The P Olya Urn Model
No ratings yet
Generalized Maxwell-Boltzmann, Bose-Einstein, Fermi-Dirac and Acharya-Swamy Statistics and The P Olya Urn Model
8 pages
Random Matrix Theories and Chaotic Dynamics: Oriol Bohigas
No ratings yet
Random Matrix Theories and Chaotic Dynamics: Oriol Bohigas
116 pages
Tao RMT Survey
No ratings yet
Tao RMT Survey
54 pages
Tsallis
No ratings yet
Tsallis
9 pages
Statistical Mechanics Lectures
No ratings yet
Statistical Mechanics Lectures
148 pages
Ds 11
No ratings yet
Ds 11
15 pages
002 Lecture Statistical Theory
No ratings yet
002 Lecture Statistical Theory
7 pages
Jarzynski 2018 - Berry Conj and Information Theory
No ratings yet
Jarzynski 2018 - Berry Conj and Information Theory
8 pages
Matrix Product States, Random Matrix Theory and The Principle of Maximum Entropy
No ratings yet
Matrix Product States, Random Matrix Theory and The Principle of Maximum Entropy
15 pages
Received 13 December 1980
No ratings yet
Received 13 December 1980
9 pages
Random Matrix Theory: Manjunath Krishnapur Indian Institute of Science, Bangalore
No ratings yet
Random Matrix Theory: Manjunath Krishnapur Indian Institute of Science, Bangalore
77 pages
The Functional Analysis of Quantum Information Theory: Ved Prakash Gupta, Prabha Mandayam and V. S. Sunder
No ratings yet
The Functional Analysis of Quantum Information Theory: Ved Prakash Gupta, Prabha Mandayam and V. S. Sunder
123 pages
10 1103@PhysRevLett 123 254101
No ratings yet
10 1103@PhysRevLett 123 254101
6 pages
Gap Probabilities and Densities of Extreme Eigenvalues of Random Matrices: Exact Results
No ratings yet
Gap Probabilities and Densities of Extreme Eigenvalues of Random Matrices: Exact Results
47 pages
Four Lectures On Statistical Physics of Learning
No ratings yet
Four Lectures On Statistical Physics of Learning
74 pages
MIT6 441S16 Midterm
No ratings yet
MIT6 441S16 Midterm
5 pages
Microcanonical Ensemble Unit 8
No ratings yet
Microcanonical Ensemble Unit 8
12 pages
Mathematical Introduction To Quantum Information Processing: Michael M. Wolf Mar 2023
No ratings yet
Mathematical Introduction To Quantum Information Processing: Michael M. Wolf Mar 2023
150 pages
Mixed States and Pure States: (Dated: April 9, 2009)
No ratings yet
Mixed States and Pure States: (Dated: April 9, 2009)
14 pages
Schrodinger 1936
No ratings yet
Schrodinger 1936
8 pages
Week 3 PDF
No ratings yet
Week 3 PDF
20 pages
Probabilistic and Statistical Aspects of Quantum Theory 1st Edition. Edition Alexander S. Holevo Download
No ratings yet
Probabilistic and Statistical Aspects of Quantum Theory 1st Edition. Edition Alexander S. Holevo Download
151 pages
A Counterexample To Additivity of Minimum Output Entropy: PACS Numbers
No ratings yet
A Counterexample To Additivity of Minimum Output Entropy: PACS Numbers
4 pages
Models of Theoretical Physics Notes
No ratings yet
Models of Theoretical Physics Notes
116 pages
Maximum Entropy Method: Sampling Bias: Jorge - Cossio@cigb - Edu.cu
No ratings yet
Maximum Entropy Method: Sampling Bias: Jorge - Cossio@cigb - Edu.cu
10 pages
Eigenvalue and Gaussian Data Pca Replica Notes
No ratings yet
Eigenvalue and Gaussian Data Pca Replica Notes
14 pages
Oct-2023 Tybsc Cs 363 Web Technologies II
No ratings yet
Oct-2023 Tybsc Cs 363 Web Technologies II
2 pages
Statistical Physics Lecture Notes
No ratings yet
Statistical Physics Lecture Notes
175 pages
Lectures On Random Matrix Theory
No ratings yet
Lectures On Random Matrix Theory
131 pages
Quantitative Edge Eigenvector Universality For Random Regular Graphs: Berry-Esseen Bounds With Explicit Constants
No ratings yet
Quantitative Edge Eigenvector Universality For Random Regular Graphs: Berry-Esseen Bounds With Explicit Constants
38 pages
Formulation of Quantum Statistics
No ratings yet
Formulation of Quantum Statistics
46 pages
Sheet 01 Solutions
No ratings yet
Sheet 01 Solutions
9 pages
C-E Pfister 2005 Nonlinearity 18 237
No ratings yet
C-E Pfister 2005 Nonlinearity 18 237
26 pages
Computational Statistical Physics Notes
No ratings yet
Computational Statistical Physics Notes
61 pages
QM HW 3
No ratings yet
QM HW 3
3 pages
Random Matrix Approximation of Spectra of Integral Operators
No ratings yet
Random Matrix Approximation of Spectra of Integral Operators
55 pages
Shallow Quantum Circuits IGL Paper
No ratings yet
Shallow Quantum Circuits IGL Paper
7 pages
RCC Structure by PANDI MANI
No ratings yet
RCC Structure by PANDI MANI
13 pages
Sports Acoustics
No ratings yet
Sports Acoustics
43 pages
B Math.0000043321.00896.86
No ratings yet
B Math.0000043321.00896.86
10 pages
CSEC Technical Drawing June 2010 P032
No ratings yet
CSEC Technical Drawing June 2010 P032
6 pages
Free Probability and Operator Algebras
100% (1)
Free Probability and Operator Algebras
144 pages
Transformer Test Report
No ratings yet
Transformer Test Report
17 pages
Info Theory Homework Solutions
No ratings yet
Info Theory Homework Solutions
9 pages
Convection Heat Transfer
No ratings yet
Convection Heat Transfer
60 pages
(Lecture Notes in Mathematics 1730) Siegfried Graf, Harald Luschgy (Auth.) - Foundations of Quantization For Probability Distributions (2000, Springer-Verlag Berlin Heidelberg) PDF
No ratings yet
(Lecture Notes in Mathematics 1730) Siegfried Graf, Harald Luschgy (Auth.) - Foundations of Quantization For Probability Distributions (2000, Springer-Verlag Berlin Heidelberg) PDF
237 pages
S220 Loader Service Guide
No ratings yet
S220 Loader Service Guide
29 pages
Tyco Drenchers - TFP807 - 07 - 2014
100% (1)
Tyco Drenchers - TFP807 - 07 - 2014
14 pages
Namma Kalvi 12th Computer Applications Practical Manual em
No ratings yet
Namma Kalvi 12th Computer Applications Practical Manual em
33 pages
Danyal Education: Tanjong Katong Girls' I
No ratings yet
Danyal Education: Tanjong Katong Girls' I
20 pages
New Pattern Input Output Exam Cart
No ratings yet
New Pattern Input Output Exam Cart
55 pages
Chapter 3 Methods of Lead Optimization
No ratings yet
Chapter 3 Methods of Lead Optimization
23 pages
Missel Product List GB 2017 02 Fire Protection PDF
No ratings yet
Missel Product List GB 2017 02 Fire Protection PDF
36 pages
Hack RQD
No ratings yet
Hack RQD
82 pages
Carbon Black Surface Area Analysis
No ratings yet
Carbon Black Surface Area Analysis
39 pages
Strong Swan Documentation (Updated Till Eap-Md5)
No ratings yet
Strong Swan Documentation (Updated Till Eap-Md5)
58 pages
Basic Hydrology Report
No ratings yet
Basic Hydrology Report
6 pages
Diborane: Properties and Applications
No ratings yet
Diborane: Properties and Applications
36 pages
Data Mining
No ratings yet
Data Mining
32 pages
BIOS Instructor Setup Rev 6 65
No ratings yet
BIOS Instructor Setup Rev 6 65
24 pages
Discussion - A Technical Note - Derivation of The LRFD Column Design Equations
No ratings yet
Discussion - A Technical Note - Derivation of The LRFD Column Design Equations
2 pages
DT-10 Owner's Manual: Turning On The Power
No ratings yet
DT-10 Owner's Manual: Turning On The Power
3 pages
Exact Solutions of The Sextic Oscillator From The Bi-Confluent Heun Equation
No ratings yet
Exact Solutions of The Sextic Oscillator From The Bi-Confluent Heun Equation
17 pages
Big Data Computing: Week 8 Quiz
No ratings yet
Big Data Computing: Week 8 Quiz
3 pages
3-in-1 Transducer Install Guide
No ratings yet
3-in-1 Transducer Install Guide
2 pages
Gauss Legendre Quadrature Method
No ratings yet
Gauss Legendre Quadrature Method
7 pages
Mysql Assignment 1
No ratings yet
Mysql Assignment 1
2 pages
C Programming For Dummies 2nd Edition Dan Gookin PDF Download
100% (1)
C Programming For Dummies 2nd Edition Dan Gookin PDF Download
92 pages