Game of Surface Codes
arXiv:1808.02892v3 [quant-ph] 3 Feb 2019

Given a quantum gate circuit, how does one execute it in a fault-tolerant architecture with as little overhead as possible? In this paper, we discuss strategies for surface-code quantum computing on small, intermediate and large scales. They are strategies for space-time trade-offs, going from slow computations using few qubits to fast computations using many qubits. Our schemes are based on surface-code patches, which not only feature a low space cost compared to other surface-code schemes, but are also conceptually simple: simple enough that they can be described as a tile-based game with a small set of rules. Therefore, no knowledge of quantum error correction is necessary to understand the schemes in this paper, only the concepts of qubits and measurements.

The field of quantum computing is fuelled by the promise of fast solutions to classically intractable problems, such as simulating large quantum systems or factoring large numbers. Already ∼100 qubits can be used to solve useful problems that are out of reach for classical computers [1, 2]. Despite the exponential speed-up, the actual time required to solve these problems is orders of magnitude above the coherence times of any physical qubit. In order to store and manipulate quantum information on large time scales, it is necessary to actively correct errors by combining many physical qubits into logical qubits using a quantum error-correcting code [3–5]. Of particular interest are codes that are compatible with the locality constraints of realistic devices such as superconducting qubits, which are limited to operations that are local in two dimensions. The most prominent such code is the surface code [6, 7].

Working with logical qubits introduces additional overhead to the computation. Not only is the space cost drastically increased as physical qubits are replaced by logical qubits, but also the time cost increases due to the restricted set of accessible logical operations. Surface codes, in particular, are limited to a set of 2D-local operations, which means that arbitrary gates in a quantum circuit may require several time steps instead of just one. To keep the cost of surface-code quantum computing low, it is important to find schemes that translate quantum circuits into surface-code layouts with a low space-time overhead. This is also necessary to benchmark how well quantum algorithms perform in a surface-code architecture. There exist several encoding schemes for surface codes, among others, defect-based [7], twist-based [8] and patch-based [9] encodings. In this work, we focus on the latter. Surface-code patches have a low space overhead compared to other schemes, and offer low-overhead Clifford gates [10, 11]. In addition, they are conceptually less difficult to understand, as they do not directly involve braiding of topological defects. Designing computational schemes with surface-code patches only requires the concepts of qubits and measurements. To this end, we describe the operations of surface-code patches as a tile-based game. This is helpful to design protocols and determine their space-time cost. The exact correspondence between this game and surface-code patches is specified in Appendix A, but it is not crucial for understanding this paper. Readers who are interested in the detailed surface-code operations may read Appendix A in parallel to the following section.

Figure 1: Examples of one-qubit (a/b) and two-qubit (c) patches in a 5 × 2 grid of tiles.

Surface codes as a game. The game is played on a board partitioned into a number of tiles. An example of a 5 × 2 grid of tiles is shown in Fig. 1. The tiles can be used to host patches, which are representations of qubits. We denote the Pauli operators of each qubit as X, Y and Z. Patches have dashed and solid edges representing Pauli operators. We consider two types of patches: one-qubit and two-qubit patches. One-qubit patches represent one qubit and consist of two dashed and two solid edges. Each of the two dashed (solid) edges represents the qubit's X (Z) operator. While the square patch in Fig. 1a only occupies one tile, a one-qubit patch can also be shaped to, e.g., occupy three tiles (b). A two-qubit patch (c) consists of six edges and represents two qubits. The first qubit's Pauli operators X1 and Z1 are represented by the two top edges, while
Figure 3 panels (example: 100 qubits, 10⁸ T gates):

                      Sec. 4: Trade-offs    Sec. 5: Trade-offs    Sec. 6: Trade-offs
                      limited by T count    limited by T depth    beyond Clifford+T
  p = 10⁻⁴, d = 13    55,000 qubits         120,000 qubits        1500 × 220,000 = 330m qubits     ···
                      4 hours               22 minutes            1 second                         ···
  p = 10⁻³, d = 27    310,000 qubits        1,000,000 qubits      3000 × 1,500,000 ≈ 4.5b qubits   ···
                      7 hours               45 minutes            1 second                         ···

  Also shown: ∼100 qubits (Appendix C).

Figure 3: Overview of the content of this paper. To illustrate the space-time trade-offs discussed in this work, we show the number of physical qubits and the computational time required for a circuit of 10⁸ T gates distributed over 10⁶ T layers. We consider physical error rates of p = 10⁻⁴ and p = 10⁻³, for which we need code distances d = 13 and d = 27, respectively. We assume that each code cycle takes 1 µs.
cols based on error-correcting codes with transversal T gates, such as punctured Reed-Muller codes [16, 17] and block codes [18–20]. In comparison to braiding-based implementations of distillation protocols, we reduce the space-time cost by up to 90%.

A data block combined with a distillation block constitutes a quantum computer in which T gates are performed one after the other. At this stage, the quantum computer can be sped up by increasing the number of distillation blocks, effectively decreasing the time it takes to distill a single magic state, as we discuss in Sec. 4. In order to illustrate the resulting space-time trade-off, we consider the example of a 100-qubit computation with 10⁸ T gates, which can already be used to solve classically intractable problems [2]. Assuming an error rate of p = 10⁻⁴ and a code-cycle time of 1 µs, a compact data block together with a distillation block can finish the computation in 4 hours using 55,000 physical qubits.¹ Adding 10 more distillation blocks increases the qubit count to 120,000 and decreases the computational time to 22 minutes, using 1 time step per T gate.

For further space-time trade-offs in Sec. 5, we exploit the fact that the T gates of a circuit are arranged in layers of gates that can be executed simultaneously. This enables linear space-time trade-offs down to the execution of one T layer per qubit measurement time, effectively implementing Fowler's time-optimal scheme [21]. If the 10⁸ T gates are distributed over 10⁶ layers, and measurements (and classical processing) can be performed in 1 µs, up to 1500 units of 220,000 qubits can be run in parallel, where each unit is responsible for the execution of one T layer. This way, the computational time can be brought down to 1 second using 330 million qubits. While this is a large number, the units do not necessarily need to be part of the same quantum computer, but can be distributed over up to 1500 quantum computers with 220,000 qubits each, and with the ability to share Bell pairs between neighboring computers.

In Sec. 6, we discuss further space-time trade-offs that are beyond the parallelization of Clifford+T circuits. In particular, we discuss the use of Clifford+ϕ circuits, i.e., circuits containing arbitrary-angle rotations beyond T gates. These require the use of additional resources, but can speed up the computation. We also discuss the possibility of hardware-based trade-offs by using higher code distances, but in turn shorter measurements with a decreased measurement fidelity. Ultimately, the speed of a quantum computer is limited by classical processing, which can only be improved upon by faster classical computing.

Finally, we note that while the number of qubits required for useful quantum computing is orders of magnitude above what is currently available, a proof-of-principle two-qubit device demonstrating all necessary operations using undistilled magic states can be built with 48 physical data qubits, see Appendix C.

¹ We will assume that the total number of physical qubits is twice the number of physical data qubits. This is consistent with superconducting qubit platforms, where the use of measurement ancillas doubles the qubit count. If a platform does not require the use of ancilla qubits, the total qubit count is reduced by 50% compared to the numbers reported in this paper.
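The runtimes quoted above can be reproduced with a one-line formula. The sketch below assumes (this is our assumption, not stated in this section) that one time step corresponds to d code cycles of 1 µs each. Under that assumption, 10⁸ T gates at one time step each take 10⁸ × 13 µs ≈ 21.7 minutes at d = 13 and 45 minutes at d = 27, in line with Fig. 3, and the 4-hour figure corresponds to roughly 11 time steps per T gate. The function name `runtime_s` is illustrative.

```python
CODE_CYCLE_S = 1e-6  # Fig. 3 assumption: one code cycle takes 1 microsecond


def runtime_s(t_gates, steps_per_t, d, code_cycle_s=CODE_CYCLE_S):
    """Runtime in seconds if every T gate costs `steps_per_t` time steps.

    Assumption of this sketch: one time step lasts d code cycles.
    """
    return t_gates * steps_per_t * d * code_cycle_s


# One T gate per time step (1 + 10 = 11 distillation blocks).
print(runtime_s(1e8, 1, d=13) / 60)  # minutes at p = 1e-4, d = 13
print(runtime_s(1e8, 1, d=27) / 60)  # minutes at p = 1e-3, d = 27

# Time steps per T gate implied by the 4-hour figure (one distillation block).
print(4 * 3600 / runtime_s(1e8, 1, d=13))

# Sec. 5/6: 10^6 T layers at one layer per 1-microsecond measurement,
# using 1500 units of 220,000 qubits each.
print(10**6 * 1e-6, 1500 * 220_000)
```

The design choice is deliberately minimal: the only hardware parameters are the code distance and the code-cycle time, which is all that Fig. 3 varies.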
[Figure 4 panels (a)-(c): commutation rules, with the cases PP′ = P′P and PP′ = −P′P in (a) and (c), and the cases P1P′ = −P′P1 and P2P′ = −P′P2 in (b).]

Figure 4: A generic circuit consists of π/4 rotations (orange), π/8 rotations (green) and measurements (blue). The Pauli product in each box specifies the axis of rotation or the basis of measurement. If the Pauli operator is −P instead of P, a minus sign is found in the corner of the box, such that, e.g., Z_{−π/4} corresponds to an S† gate. Using the commutation rules in (a/b), all Clifford gates can be moved to the end of the circuit. Using (c), the Clifford gates can be absorbed by the final measurements.
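The gate identities of Section 1 and the commutation rules summarized in Fig. 4 can be checked numerically. The minimal NumPy sketch below assumes the definition P_ϕ = exp(−iPϕ) from Section 1 and the circuit convention that gates act left to right, so the circuit "A, then B" is the matrix product B·A. The helpers `rot`, `controlled` and `equal_up_to_phase` are our own names. The last check covers the case of Fig. 4b in which P′ anticommutes with both P1 and P2; there the rotation axis after commutation comes out as −P′P1P2.

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)


def rot(P, phi):
    """Pauli product rotation P_phi = exp(-i P phi), valid because P @ P = 1."""
    return np.cos(phi) * np.eye(len(P)) - 1j * np.sin(phi) * P


def equal_up_to_phase(A, B):
    """True if A = exp(i*theta) * B for some global phase theta."""
    k = np.unravel_index(np.argmax(np.abs(B)), B.shape)
    phase = A[k] / B[k]
    return np.isclose(abs(phase), 1.0) and np.allclose(A, phase * B)


def controlled(P1, P2):
    """C(P1, P2) = (P1 x P2)_{pi/4} (1 x P2)_{-pi/4} (P1 x 1)_{-pi/4}."""
    return (rot(np.kron(P1, P2), np.pi / 4)
            @ rot(np.kron(I, P2), -np.pi / 4)
            @ rot(np.kron(P1, I), -np.pi / 4))


S = np.diag([1, 1j])
T = np.diag([1, np.exp(1j * np.pi / 4)])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]])

assert equal_up_to_phase(rot(Z, np.pi / 4), S)
assert equal_up_to_phase(rot(Z, np.pi / 8), T)
assert equal_up_to_phase(rot(Z, np.pi / 4) @ rot(X, np.pi / 4) @ rot(Z, np.pi / 4), H)
assert equal_up_to_phase(controlled(Z, X), CNOT)

# Fig. 4a with anticommuting P = Z, P' = X: the circuit "P_{pi/4}, then P'_phi"
# equals the circuit "(iPP')_phi, then P_{pi/4}".
phi = 0.3  # arbitrary test angle
P, Pp = Z, X
assert np.allclose(rot(Pp, phi) @ rot(P, np.pi / 4),
                   rot(P, np.pi / 4) @ rot(1j * P @ Pp, phi))

# Fig. 4b, P' = X x Z anticommutes with both P1 = Z x 1 and P2 = 1 x X.
C = controlled(Z, X)
P1, P2, Pp2 = np.kron(Z, I), np.kron(I, X), np.kron(X, Z)
assert np.allclose(rot(Pp2, phi) @ C, C @ rot(-Pp2 @ P1 @ P2, phi))
print("all identities hold")
```

All identities for S, T, H and CNOT hold only up to a global phase, which is why they are compared with `equal_up_to_phase` rather than exact equality.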
1 Clifford+T quantum circuits

Our goal is to implement full quantum algorithms with surface codes. The input to our problem is the algorithm's quantum circuit. The universal gate set Clifford+T is well-suited for surface codes, since it separates easy operations from difficult ones. Often, this set is generated using the Hadamard gate H, phase gate S, controlled-NOT (CNOT) gate, and the T gate. Instead, we choose to write our circuits using Pauli product rotations P_ϕ (see Fig. 5), because it simplifies circuit manipulations. Here, P_ϕ = exp(−iPϕ), where P is a Pauli product operator (such as Z, Y ⊗ X, or X ⊗ 1 ⊗ X) and ϕ is an angle. In this sense, up to global phases, S = Z_{π/4}, T = Z_{π/8}, and H = Z_{π/4} · X_{π/4} · Z_{π/4}. The CNOT gate can also be written in terms of Pauli product rotations as CNOT = (Z ⊗ X)_{π/4} · (1 ⊗ X)_{−π/4} · (Z ⊗ 1)_{−π/4}. In fact, we can more generally define P1-controlled-P2 gates as C(P1, P2) = (P1 ⊗ P2)_{π/4} · (1 ⊗ P2)_{−π/4} · (P1 ⊗ 1)_{−π/4}. The CNOT gate is the specific case of C(Z, X).

Figure 5: Clifford+T gates in terms of Pauli rotations. (a) Single-qubit Clifford gates are π/4 rotations, and the T gate is a π/8 rotation. (b/c) P1-controlled-P2 gates are Clifford gates, where C(Z, X) is the CNOT gate.

Getting rid of Clifford gates. Clifford gates are considered to be easy, because, by definition, they map Pauli operators onto other Pauli operators [22]. This can be used to simplify the input circuit. A generic circuit is shown in Fig. 4, consisting of Clifford gates, Z_{π/8} rotations and Z measurements. If all Clifford gates are commuted to the end of the circuit, the Z_{π/8} rotations become Pauli product rotations. The rules for moving P_{π/4} rotations past P′_ϕ gates are shown in Fig. 4a: If P and P′ commute, P_{π/4} can simply be moved past P′_ϕ. If they anticommute, P′_ϕ turns into (iPP′)_ϕ when P_{π/4} is moved to the right. Since C(P1, P2) gates consist of π/4 rotations, similar rules can be derived, as shown in Fig. 4b: If P′ anticommutes with P1 only, P′_ϕ turns into (P′P2)_ϕ after commutation. If P′ anticommutes with P2 only, P′_ϕ turns into (P′P1)_ϕ. If P′ anticommutes with both P1 and P2, P′_ϕ turns into (−P′P1P2)_ϕ.

After moving the Clifford gates to the right, the resulting circuit consists of three parts: a set of π/8 rotations, a set of π/4 rotations, and Z measurements. Because Clifford gates map Pauli operators onto other Pauli operators, the Clifford gates can be absorbed by the final measurements, turning Z measurements into Pauli product measurements. The commutation rules of this final step are shown in Fig. 4c and are similar to the commutation of Clifford gates past rotations.

T count and T depth. Thus, every n-qubit circuit can be written as a number of consecutive π/8 rotations and n final Pauli product measurements, as shown in Fig. 6. We refer to the number of π/8 rotations as the T count. An important part of circuit optimization is the minimization of the T count, for which there exist various approaches [23–26]. The π/8 rotations of a circuit can be grouped into layers. All π/8 rotations that are part of a layer need to mutually commute. Strictly speaking, the number of π/8 layers of a circuit is not the same quantity as the T depth, but we will still refer to it as the T depth and to π/8 layers as T layers. Note that, in the usual definition, only up to n T gates can be part of a layer, whereas in our case, there is no limit. When partitioning π/8 rotations into layers, the naive approach often yields more layers than are necessary. For instance, a naive partitioning of the first 6 T gates of Fig. 6 yields 4 layers. A few commutations can bring the number down to 2 layers. There are a number of algorithms for the optimization of the T depth [27–29]. Here, we use the simple greedy algorithm shown below to reduce the number of layers.

    repeat
        for each layer i do
            for each rotation j in layer i + 1 do
                if (rotation j commutes with all rotations in layer i) then
                    Move rotation j from layer i + 1 to layer i;
                end
            end
        end
    until the partitioning no longer changes;

Algorithm to reduce the T count and T depth.

Note that when a reordering puts two equal π/8 rotations into the same layer, they can be combined into a π/4 rotation that is commuted to the end of the circuit, thereby decreasing the T count. As we discuss in Sec. 6, this kind of algorithm can not only be used with π/8 rotations, but, in principle, with arbitrary Pauli product rotations. The reduction of the circuit depth in terms of non-π/8 rotations can be useful when going beyond Clifford+T circuits.

1.1 Pauli product measurements

When implementing circuits like Fig. 6 with surface codes, one obstacle is that π/8 rotations are not directly part of the set of available operations. Instead, one uses magic states [16] as a resource. These states are π/8-rotated Pauli eigenstates |m⟩ = |0⟩ + e^{iπ/4}|1⟩. They can be consumed in order to perform P_{π/8} rotations. The corresponding circuit [30] is shown in Fig. 7.

Figure 7: Circuit to perform a π/8 rotation by consuming a magic state.
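The greedy layering algorithm above can be sketched in Python. This sketch assumes that each π/8 rotation is represented only by its Pauli-product axis, written as a string over I, X, Y, Z (our representation, with I standing for 1). Signs are dropped because they do not affect commutation. The function names are illustrative, and `naive_layers` is one plausible reading of the "naive" partitioning: it opens a new layer whenever the next rotation fails to commute with the current layer.

```python
def commutes(p, q):
    """Two Pauli strings commute iff they differ non-trivially on an even number of qubits."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def naive_layers(rotations):
    """Sequential partitioning that keeps the circuit order of the rotations."""
    layers = []
    for rot in rotations:
        if layers and all(commutes(rot, other) for other in layers[-1]):
            layers[-1].append(rot)
        else:
            layers.append([rot])
    return layers


def reduce_layers(layers):
    """Greedy pass: pull rotations from layer i + 1 into layer i until nothing moves."""
    layers = [list(layer) for layer in layers]  # do not mutate the caller's lists
    changed = True
    while changed:
        changed = False
        for i in range(len(layers) - 1):
            for rot in list(layers[i + 1]):
                if all(commutes(rot, other) for other in layers[i]):
                    layers[i + 1].remove(rot)
                    layers[i].append(rot)
                    changed = True
        layers = [layer for layer in layers if layer]  # drop emptied layers
    return layers


circuit = ["ZI", "XI", "IZ", "IX"]  # axes of consecutive pi/8 rotations
naive = naive_layers(circuit)
print(len(naive), reduce_layers(naive))  # 3 [['ZI', 'IZ'], ['XI', 'IX']]
```

Moving a rotation forward is always valid here: rotations inside layer i + 1 mutually commute, so any of them can be brought to the front of its layer before being checked against layer i. The sketch only reorders rotations; merging two equal π/8 rotations into a π/4 rotation, as described in the text, would be a separate step.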
Figure 11: (a) Patches can be rotated in 3 time steps to change whether the X or Z operator is adjacent to the compact block's ancilla region. (b) A P_{π/4} gate can be performed explicitly via a P ⊗ Y measurement with a |0⟩ ancilla qubit. (c) Six-step protocol to perform the rotation of Fig. 10 in a compact block. The magic state is consumed after 9 time steps, where steps 2-5 are the two π/4 rotations in Fig. 10, steps 6 and 7 are patch rotations, and step 8 is the Pauli product measurement consuming the magic state.

Y_{π/4} = Z_{π/4} · X_{π/4} · Z_{−π/4}. Rotations with an even number of Y's require two π/4 rotations, while an odd number of Y's can be handled by one rotation. Only the left two π/4 rotations in Fig. 10 need to be performed explicitly. The right two rotations can be commuted to the end of the circuit, changing the subsequent π/8 rotations. Similarly to a π/8 rotation, a P_{π/4} rotation can be executed using a resource state |Y⟩ = |0⟩ + i|1⟩, as shown in Fig. 11b. However, even though this state is a Pauli eigenstate, it cannot be readily prepared in our framework. Instead, we use a |0⟩ state and Y measurements, such that a P_{π/4} rotation is performed by a P ⊗ Y measurement between the qubits and the |0⟩ state. Afterwards, the |0⟩ state is measured in X. If the −P ⊗ Y and X measurements in Fig. 11b yield different outcomes, a Pauli correction is necessary.

In Fig. 11, we go through the steps necessary to perform the (Y ⊗ 1 ⊗ Y ⊗ Z ⊗ Y ⊗ Y)_{π/8} rotation of Fig. 10. In step 1, we start with a 12-tile data block storing 6 qubits in the blue region. The orange region is not part of the data block, but is part of the adjacent distillation block, i.e., it is the source of the magic states. In steps 2-5, we perform the two π/4 rotations that are necessary to replace the Y operators with X's, i.e., the first two π/4 rotations in the circuit of Fig. 10. In step 6, we first rotate patches in the upper row, and then, in step 7, in the lower row. Finally, in step 8, we measure the Pauli product involving the magic state.

This general procedure can be used for any π/8 rotation. First, up to two π/4 rotations are performed in 2 time steps. Next, patches in the upper and lower row are rotated, which takes 3 time steps per row. Finally, the Pauli product is measured in 1 time step, requiring a total of 9 time steps. While this is very slow compared to Fig. 8, the compact block is a valid choice for small quantum computers where the distillation of a magic state takes longer than 9 time steps.

2.2 Intermediate block

One possibility to speed up compact blocks is to store all qubits in one row instead of two. This is the intermediate block shown in Fig. 13a, which uses 2n + 4 tiles to store n qubits. By eliminating one row, all patch rotations can be done simultaneously. In addition, one can save 1 time step by moving all patches to the other side, thereby eliminating the need to move patches back to their row after the rotation. An example is shown in Fig. 12. Suppose we have 5 qubits and need to prepare them for a Z ⊗ X ⊗ Z ⊗ Z ⊗ X measurement. The first, third and fourth qubits are moved to the other side, which takes 1 time step. Simultaneously, the second and fifth qubits are rotated, which takes 2 time steps. Therefore, the total number of time steps to consume a magic state is at most 5, where 2 are used for up to two π/4 rotations, 2 for the patch rotations, and 1 for the Pauli product measurement consuming the magic state.

2.3 Fast block

The disadvantage of square patches is that only one Pauli operator is adjacent to the data block's ancilla region, i.e., available for Pauli product measurements at any given time. Two-tile one-qubit patches as in Fig. 8, on the other hand, allow for the measurement of any Pauli operator, but use two tiles for each qubit. In order to have both compact storage and access to all Pauli operators, we use two-qubit patches for our fast blocks in Fig. 13b. These patches use two tiles to represent two qubits (see Fig. 1), where the first qubit's Pauli operators are in the left two edges, and the second qubit's operators are in the right two edges. Therefore, the example in Fig. 13b is a fast block that stores 18 qubits.

Since all Pauli operators are accessible, the Pauli product measurement protocol of Fig. 8 can be used to consume a magic state every time step. n qubits occupy a square arrangement of tiles with a side length of √(2n) + 1, i.e., a total of 2n + √(8n) + 1 tiles. Even if √(2n) is not an integer, one should keep the block as square-shaped as possible by picking the closest integer as a side length and shortening the last column. While the fast block uses more tiles compared to the compact and intermediate blocks, it has a lower space-time cost, making it more favorable for large quantum computers for which the distillation of a magic state takes less than 5 time steps.

Note that if undistilled magic states are sufficient, then any data block can already be used as a full quantum computer. A proof-of-principle two-qubit device in the spirit of Ref. [31] that constitutes a universal two-qubit quantum computer with undistilled magic states and can demonstrate all the operations that are used in our framework can be realized with six tiles, as shown in Appendix C. This proof-of-principle device uses (3d − 1) · 2d physical data qubits, i.e., 48, 140, or 280 data qubits for distances d = 3, 5 or 7. If ancilla qubits are used for stabilizer measurements, the number of physical qubits roughly doubles, but it is still within reach of near-term devices.

Summary. Data blocks store the data qubits of the computation and consume magic states. Compact blocks use 1.5n + 3 tiles for n qubits and require up to 9 time steps to consume a magic state. Intermediate blocks use 2n + 4 tiles and take up to 5 time steps per magic state. Fast blocks use 2n + √(8n) + 1 tiles and take 1 time step per magic state. Data blocks need to be combined with distillation blocks for large-scale quantum computation.
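The formulas in the summary can be compared directly. The minimal sketch below encodes them as written. The function names are ours, and the "space-time per magic state" metric (tiles × worst-case time steps) is an illustrative simplification that ignores the distillation blocks and the rounding to whole tiles. As sanity checks, the 12-tile compact block storing 6 qubits (Fig. 11) and the 7 × 7 fast block storing 18 qubits (Fig. 13b) come out of the formulas exactly.

```python
import math


def compact_tiles(n):
    return 1.5 * n + 3


def intermediate_tiles(n):
    return 2 * n + 4


def fast_tiles(n):
    # A square block of side sqrt(2n) + 1 has (sqrt(2n) + 1)**2 = 2n + sqrt(8n) + 1 tiles.
    return 2 * n + math.sqrt(8 * n) + 1


# (tile formula, worst-case time steps to consume one magic state)
BLOCKS = {
    "compact": (compact_tiles, 9),
    "intermediate": (intermediate_tiles, 5),
    "fast": (fast_tiles, 1),
}


def space_time_per_magic_state(block, n):
    """Tiles multiplied by time steps spent by the data block per magic state."""
    tiles, steps = BLOCKS[block]
    return tiles(n) * steps


def proof_of_principle_data_qubits(d):
    """Physical data qubits of the six-tile device of Appendix C: (3d - 1) * 2d."""
    return (3 * d - 1) * 2 * d


for name in BLOCKS:
    print(name, round(space_time_per_magic_state(name, 100), 1))
print([proof_of_principle_data_qubits(d) for d in (3, 5, 7)])
```

For n = 100, the ordering fast < intermediate < compact reflects the statement that the fast block has the lowest space-time cost despite using the most tiles.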
Figure 13: (a) Intermediate blocks store n data qubits in 2n + 4 tiles and require up to 5 time steps per magic state. (b) Fast blocks use 2n + √(8n) + 1 tiles and require 1 time step per magic state.

3 Distillation blocks

In this section, we discuss designs of tile blocks that are used for magic state distillation. This is necessary, because with surface codes, the initialization of non-Pauli eigenstates is prone to errors, which means that
π/8 rotations performed using these states may lead to errors. In order to decrease the probability of such an error, magic state distillation [16] is used to convert many low-fidelity magic states into fewer higher-fidelity states. This requires only Clifford gates (i.e., Pauli product measurements), so, in principle, any of the data blocks discussed in the previous section can be used for this purpose. However, magic state distillation is repeated extremely often for large-scale quantum computation, so it is worth optimizing these protocols.

Here, we discuss a general procedure that can be applied to any distillation protocol based on an error-correcting code with transversal T gates, such as punctured Reed-Muller codes [16, 17] or block codes [18–20]. To show the general structure of such a protocol, we go through the example of 15-to-1 distillation [16], i.e., a protocol that uses 15 faulty magic states to distill a single higher-fidelity state.

3.1 15-to-1 distillation

The 15-to-1 protocol is based on a quantum error-correcting code that uses 15 qubits to encode a single logical qubit with code distance 3. The reason why this can be used for magic state distillation is that, for this code, a physical T gate on every physical qubit corresponds to a logical T gate (actually T†) on the encoded qubit, which is called a transversal T gate. The general structure of a distillation circuit based on a code with transversal T gates is shown in Fig. 14 for the example of 15-to-1. It consists of four parts: an encoding circuit, transversal T gates, decoding and measurement.

The circuit begins with 5 qubits initialized in the |+⟩ state and 10 qubits in the |0⟩ state. Qubits 1-4, 5 and 6-15 are associated with the four X stabilizers, the logical X operator, and the ten Z stabilizers of the code. The first five operations are multi-target CNOTs that correspond to the code's encoding circuit. They map the X Pauli operators of qubits 1-4 onto the code's X stabilizers, the X Pauli of qubit 5 onto the logical X operator and the Z operators of qubits 6-15 onto the code's Z stabilizers. Because we start out with +1-eigenstates of X and Z, this circuit prepares the simultaneous stabilizer eigenstate corresponding to the logical |+⟩_L state. Next, a transversal T gate is applied, transforming the logical state to T_L|+⟩_L (actually to T_L†|+⟩_L). Note that the 15 Z_{π/8} rotations are potentially faulty. Finally, the encoding circuit is reverted, shifting the logical qubit information back into qubit 5, and the information about the X and Z stabilizers into qubits 1-4 and 6-15. If no errors occurred, qubit 5 is now a magic state T|+⟩ (actually T†|+⟩). In order to detect whether any of the 15 π/8 rotations were affected by an error, qubits 1-4 and 6-15 are measured in the X and Z basis, respectively, effectively measuring the stabilizers of the code. Since the code distance is 3, up to two errors can be detected, which will yield a −1 measurement outcome on some stabilizers. If any error is detected, all qubits are discarded and the distillation protocol is restarted. This way, if the error probability of each of the 15 T gates is p, the error probability of the output state is reduced to 35p³ to leading order. In other words, this protocol takes 15 magic states with error probability p, and outputs a single magic state with an error of 35p³.
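The factor 35 in 35p³ can be checked by brute force. A faulty π/8 rotation acts as a Z error on its code qubit. A set of Z errors goes undetected if it overlaps every X stabilizer an even number of times, and it corrupts the output if it overlaps the logical X operator an odd number of times. The minimal sketch below counts such patterns, assuming the stabilizer and logical rows of M_{15-to-1} given in Eq. (1) of this section (the variable names are ours).

```python
from itertools import combinations

# Rows of M_{15-to-1} from Eq. (1): four X stabilizers, then the logical X operator.
M15 = [
    "000100001111111",
    "001001110001111",
    "010010110110011",
    "100011011010101",
    "000011101101001",
]
STABILIZERS, LOGICAL = M15[:4], M15[4]


def overlap_parity(row, qubits):
    """Parity of the overlap between an X-type row and a set of Z errors."""
    return sum(int(row[q]) for q in qubits) % 2


def undetected(weight):
    """All Z-error patterns of the given weight that flip no X stabilizer."""
    return [
        qubits
        for qubits in combinations(range(15), weight)
        if all(overlap_parity(row, qubits) == 0 for row in STABILIZERS)
    ]


# Distance 3: every single and double error is detected.
assert undetected(1) == [] and undetected(2) == []

triples = undetected(3)
logical_triples = [t for t in triples if overlap_parity(LOGICAL, t) == 1]
print(len(triples), len(logical_triples))  # 35 35
```

Each of the 35 undetected weight-3 patterns occurs with probability p³(1 − p)¹², and every one of them flips the logical operator, which gives the leading-order output error 35p³.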
Simplifying the circuit. Using the commutation rules of Fig. 4b, we can commute the first set of multi-target CNOTs to the right. This maps the Z_{π/8} rotations onto Z-product π/8 rotations. Since controlled-Pauli gates satisfy C(P1, P2) = C(P1, P2)† (up to a global phase), the multi-target CNOTs of the encoding circuit precisely cancel the multi-target CNOTs of the decoding circuit, leaving a circuit of 15 Z-type π/8 rotations in Fig. 14.

Note that qubits 6-15 in this circuit are entirely redundant. They are initialized in a Z eigenstate, are then part of a Z-type rotation, and are finally measured in the Z basis, trivially yielding the outcome +1. Since they serve no purpose, they can simply be removed to yield the five-qubit circuit in Fig. 15, where we have absorbed the single-qubit π/8 rotations into the initial |+⟩ states and rearranged the remaining 11 rotations. This kind of circuit simplification is equivalent to the space-time trade-offs mentioned in Ref. [17] and can be applied to any protocol that is based on a code with transversal T gates. In general, a code with m_x X stabilizers that uses n qubits to encode k logical qubits yields a circuit of n − m_x π/8 rotations on m_x + k qubits. Each of the m_x + k qubits is associated with either an X stabilizer or one of the k logical qubits. For each of the n qubits of the code, the circuit contains one π/8 rotation with an axis that has a Z on each stabilizer or logical X operator that this qubit is part of. In order to more easily determine the n − m_x rotations, it is useful to write down a matrix with m_x + k rows and n columns that shows the X stabilizers and logical X operators of the code. For 15-to-1, such a matrix could look like this:

$$
M_{\text{15-to-1}} = \left(\begin{array}{ccccccccccccccc}
0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
\hline
0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right) \tag{1}
$$

Each of the first four rows describes one of the four X stabilizers of the code, where 0 stands for the identity 1 and 1 stands for X. For instance, the first row indicates that the first X stabilizer of this 15-qubit code is 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ 1 ⊗ 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X. The rows below the horizontal bar (in this case, the last row) show the logical X operators of the code. The circuit in Fig. 15 is then obtained by placing a |+⟩ state for each row and a π/8 rotation for each column, with the axis of rotation determined by the entries in the column: a 1 for each 0 and a Z for each 1. Note that, in Fig. 15, the first four rotations (columns) of Eq. (1) are absorbed by the initial states.

3.2 Triorthogonal codes

The aforementioned circuit translation can be applied to any code with transversal T gates. One particularly versatile and simple scheme to generate such codes is based on triorthogonal matrices [17, 18], which we briefly review in this section. The first step is to write down a triorthogonal matrix G, such as

$$
G = \left(\begin{array}{cccccccccccccccc}
1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1\\
0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1\\
0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1
\end{array}\right). \tag{2}
$$

Triorthogonality refers to three criteria: i) The number of 1s in each row is a multiple of 8. ii) For each pair of rows, the number of entries where both rows have a 1 is a multiple of 4. iii) For each set of three rows, the number of entries where all three rows have a 1 is a multiple of 2. In other words,

$$
\begin{aligned}
\forall a &: \sum_i G_{a,i} = 0 \pmod{8}\\
\forall a, b &: \sum_i G_{a,i}\,G_{b,i} = 0 \pmod{4}\\
\forall a, b, c &: \sum_i G_{a,i}\,G_{b,i}\,G_{c,i} = 0 \pmod{2}
\end{aligned} \tag{3}
$$

A general procedure based on classical Reed-Muller codes to obtain such matrices is described in Ref. [17].

After obtaining a triorthogonal matrix, such as the one in Eq. (2), the second step is to put it in a row
echelon form by Gaussian elimination:

$$
\tilde{G} = \left(\begin{array}{cccccccccccccccc}
0&0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
0&1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
1&0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right). \tag{4}
$$

The last step is to remove one of the columns that contains a single 1, i.e., one of the first five columns, which is also called puncturing.² Puncturing a triorthogonal matrix with a columns and b rows k times yields a code encoding k logical qubits with m_x = b − k and n = a − k. The rows of the matrix after puncturing that contain an even number of 1s describe X stabilizers, whereas the rows with an odd number of 1s describe X logical operators. In terms of distillation protocols, a code described by such a matrix can be used for n-to-k distillation. Indeed, if we puncture the matrix in Eq. (4) once by removing the first column, we retrieve the 15-to-1 protocol of Eq. (1). We can also puncture it twice by removing the first two columns. This yields the matrix

$$
M_{\text{14-to-2}} = \left(\begin{array}{cccccccccccccc}
0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right), \tag{5}
$$

which describes a 14-to-2 protocol. The corresponding circuit can be simply read off from this matrix. It is almost identical to the 15-to-1 protocol of Fig. 15, except that the fourth qubit is initialized in the |+⟩ state and is not measured at the end of the circuit, but instead outputs a second magic state. However, because the code of 14-to-2 has a code distance of 2, the output error probability is higher, namely 7p² [18]. Puncturing the matrix G̃ any further would yield codes with a distance lower than 2, precluding them from detecting errors and improving the quality of magic states. In fact, the minimum number of qubits in triorthogonal codes was shown to be 14 [33].

² Even though this is commonly called puncturing, it would be perhaps more accurate to refer to this process as shortening (see, e.g., Ref. [32]), as was pointed out to me by a referee.

Semi-triorthogonal codes. There are also codes that are based on "semi-triorthogonal" matrices, where all three conditions of Eq. (3) are only satisfied modulo 2. One example is the matrix

$$
\left(\begin{array}{cccccccccccccccccccccccc}
0&0&0&0&0&0&1&1&0&0&1&1&0&1&1&1&0&1&1&0&1&1&0&1\\
0&0&0&0&0&1&0&1&0&1&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
0&0&0&0&1&0&0&1&1&0&0&1&1&0&1&0&1&1&0&1&1&0&1&1\\
0&0&0&1&0&0&0&0&1&1&1&1&0&1&0&0&0&0&0&0&0&0&1&1\\
0&0&1&0&0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&1&1&1&0&0\\
0&1&0&0&0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&0&0&0&0&0\\
1&0&0&0&0&0&0&0&1&1&1&1&1&0&1&1&0&0&0&0&0&0&0&0
\end{array}\right). \tag{6}
$$

When this matrix is punctured four times, it yields a code that can be used for a 20-to-4 protocol. A scheme to generate such matrices for 3k+8-to-k distillation is shown in Ref. [18]. For the case of the 20-to-4 protocol, the matrix that describes the code,

$$
M_{\text{20-to-4}} = \left(\begin{array}{cccccccccccccccccccc}
0&0&1&1&0&0&1&1&0&1&1&1&0&1&1&0&1&1&0&1\\
0&1&0&1&0&1&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
1&0&0&1&1&0&0&1&1&0&1&0&1&1&0&1&1&0&1&1\\
0&0&0&0&1&1&1&1&0&1&0&0&0&0&0&0&0&0&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&1&1&1&0&0\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&0&0&0&0&0\\
0&0&0&0&1&1&1&1&1&0&1&1&0&0&0&0&0&0&0&0
\end{array}\right), \tag{7}
$$

can be straightforwardly translated into the circuit in Fig. 16. While semi-triorthogonal codes can be used the same way for distillation as properly triorthogonal codes, their caveat is that a Clifford correction may be required. This correction can be obtained by adding columns to the semi-triorthogonal matrix until it becomes properly triorthogonal, e.g., by adding the
Figure 17: Implementation of the 15-to-1 and 20-to-4 distillation protocols in our framework. Each time step in (c) and (d)
corresponds to an auto-corrected π/8 rotation (b), which in turn is based on selective π/4 rotations (a).
columns of the matrix face codes. Distillation protocols are particularly sim-
ple quantum circuits, since they exclusively consist of
0 0 0 0 1 1 1 1 Z-type π/8 rotations. Therefore, we can use a con-
0 0 1 1 0 0 1 1
struction similar to the compact data block, and still
1 1 0 0 0 0 1 1
only require 1 per rotation.
M
Clifford correction 0
= 0 0 0 0 0 0 0 (8)
0 0 0 0 0 0 0 0 Because distillation circuits are relatively short, it is
0 0 0 0 0 0 0 0 useful to avoid the Clifford corrections of Fig. 7 that
0 0 0 0 0 0 0 0 may be required with 50% probability after a magic
state is consumed. These corrections slow down the pro-
to the matrix of Eq. (7). Since the additional columns tocol, because they change the final X measurements to
come in pairs, this Clifford correction always consists of Pauli product measurements. Instead, we use a circuit
Z-type π/4 rotations [18]. which consumes a magic state and automatically per-
In this case, the correction consists of four π/4 rota- forms the Clifford correction. It is based on the selective
tions on the first three qubits, effectively changing the π/4 rotation circuit in Fig. 17a. To perform a Pπ/4 ro-
first (Z ⊗ Z ⊗ Z)π/8 rotation to a (Z ⊗ Z ⊗ Z)−π/8 rota- tation according to the circuit in Fig. 11b, a |0i state
tion, and the initial magic states to |mi = |0i+e−iπ/4 |1i is initialized and P ⊗ Y is measured, which takes 1.
states. The probability of any of the four output states However, the π/4 rotation is only performed if the |0i
being affected by an error is 22p2 . When treating this qubit is measured in X afterwards. If, instead, it is
output error rate as 5.5p2 per magic state, one should measured in Z, the qubit is simply discarded without
take into account that, for multiple output states, er- performing any operation. In other words, the choice
rors can be correlated. Note that 3k+8-to-k protocols of measurement basis determines whether a Pπ/4 or a 1
can be modified to 3k+4-to-k [33–35]. operation is performed. This can be used to construct
the circuit in Fig. 17b. Here, the first step to perform a
Pπ/8 gate is to measure P ⊗ Z between the qubits and a
3.3 Surface-code implementation
magic state |mi, and Z ⊗ Y between |mi and |0i. These
Having outlined the general structure of distillation pro- two measurements commute and can be performed si-
tocols, we now discuss their implementation with sur- multaneously. If the outcome of the first measurement
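Since each pair of identical columns in Eq. (8) stands for two π/8 rotations on the same Pauli product, i.e., one π/4 rotation, the Clifford correction can be read off mechanically. The Python sketch below is illustrative (the name `pi4_rotations` is ours, not the paper's): it groups identical columns of the matrix in Eq. (8), with rows being qubits 1 to 7, and reports the qubit support of each resulting Z-type π/4 rotation. It does not check triorthogonality of the extended matrix, which would also require the entries of Eq. (7).

```python
from collections import Counter

# Clifford-correction matrix of Eq. (8): row i is qubit i + 1,
# each column is one added Z-type pi/8 rotation.
M_CORRECTION = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 1, 1, 0, 0, 1, 1],
    [1, 1, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]


def pi4_rotations(matrix):
    """Combine identical columns pairwise: two Z-type pi/8 rotations with the
    same support equal one Z-type pi/4 rotation. Returns the 1-based qubit
    support of each pi/4 rotation, in order of first appearance."""
    columns = [tuple(row[j] for row in matrix) for j in range(len(matrix[0]))]
    rotations = []
    for column, count in Counter(columns).items():
        if count % 2:
            raise ValueError(f"column {column} is unpaired, not a pure pi/4 correction")
        support = [qubit + 1 for qubit, bit in enumerate(column) if bit]
        rotations.extend([support] * (count // 2))
    return rotations


for support in pi4_rotations(M_CORRECTION):
    print(" ".join(f"Z{q}" for q in support), "(pi/4 rotation)")
```

The output lists Z3, Z2, Z1 and Z1 Z2 Z3: four π/4 rotations, all acting on the first three qubits, as stated above.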
Figure 19: 176-tile block that can be used for 225-to-1 distillation. The qubits highlighted in red are used for the second level of the distillation protocol. The blue ancilla is used to move level-1 magic states into the two |m⟩-|0⟩ blocks of the level-2 distillation.
3.5 Higher-fidelity protocols

So far, we have only explicitly discussed protocols that reduce the input error to ∼p² or ∼p³. There are two strategies to obtain protocols with a higher output fidelity: concatenation and higher-distance codes.

Concatenation. In the 15-to-1 protocol, we use 15 undistilled magic states to obtain a distilled magic state with an error rate of 35p³. If we perform the same protocol, but use 15 distilled magic states from previous 15-to-1 protocols as inputs, the output state will have an error rate of 35(35p³)³ = 1,500,625p⁹. This corresponds to a 225-to-1 protocol obtained from the concatenation of two 15-to-1 protocols. It is also possible to concatenate protocols that are not identical. Strategies to combine high-yield and low-yield protocols are discussed in Ref. [18].

In Fig. 19, we show an unoptimized block that can be used for 225-to-1 distillation. It consists of 11 15-to-1 blocks that are used for the first level of distillation. Since each of these 11 blocks takes 11 time steps to finish, they can be operated such that exactly one of these blocks finishes in every time step. Therefore, in every time step, one first-level magic state can be used for second-level distillation by moving it into one of the two level-2 |m⟩-|0⟩ blocks via the blue ancilla. The qubits that are used for the second level are highlighted in red. Note that since, for the second level, the single-qubit π/8 rotations require distilled magic states, the 15-to-1 protocol of Fig. 15 requires 15 rotations instead of just 11. Therefore, the entire protocol finishes in 15 time steps using 176 tiles, with a total space-time cost of 2640d³.

It should be noted that, since lower-level distillation blocks produce magic states with low fidelity, there is no benefit in using the full code distance to produce these states. The space-time cost of concatenated protocols can be reduced significantly by running the lower-level distillation blocks at a reduced code distance (see, e.g., Refs. [12, 38]), using smaller patches and fewer code cycles. The exact code distance that should be used depends on the protocol and the desired output fidelity.

Higher-distance codes. Alternatively, we can use a code that produces higher-fidelity states. In Ref. [17], several protocols based on punctured Reed-Muller codes are discussed. One of these protocols is a 116-to-12 protocol based on a code with n = 116, k = 12 and $m_x = 17$. It yields 12 magic states, each with an error rate of 41.25p⁴. According to Eq. (9), this protocol can be implemented using 44 tiles for 99 time steps, with a space-time cost of 363d³ per output state and a success probability of (1 − p)¹¹⁶. For protocols with a high space cost such as 116-to-12, the space-time cost can be slightly reduced by introducing additional ancilla space, such that two operations can be performed simultaneously. One possible configuration is shown in Fig. 20. This increases the space cost to 81 tiles, but reduces the time cost to 50 time steps, with a total space-time cost of 337.5d³ per output state.

Output-to-input ratio is not everything. A popular figure of merit when comparing n-to-k distillation
Figure 23: Fast setups using fast data blocks and 11 15-to-1 distillation blocks for p = 10⁻⁴, or five 116-to-12 distillation blocks for p = 10⁻³.
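The figures quoted for concatenated and higher-distance protocols in Sec. 3.5 follow from short arithmetic. The Python sketch below recomputes them; it assumes only numbers stated in the text (error scaling 35p³ per 15-to-1 round, tile counts, time steps, output counts and the success probability (1 − p)¹¹⁶), and it measures space-time cost as tiles × time steps in units of d³, divided by the number of output states. The helper names are illustrative.

```python
def concatenated_15_to_1(p):
    """225-to-1: feed 15-to-1 outputs (error 35 p^3) into a second 15-to-1 round."""
    level_1 = 35 * p**3
    return 35 * level_1**3


def spacetime_per_output(tiles, time_steps, outputs=1):
    """Space-time cost per output state in units of d^3 (one tile for one time step)."""
    return tiles * time_steps / outputs


def steps_per_state(time_steps, outputs, p, n_inputs):
    """Average time steps per output if the protocol succeeds with probability (1 - p)^n_inputs."""
    return time_steps / outputs / (1 - p) ** n_inputs


print(round(concatenated_15_to_1(1.0)))             # 1500625, i.e. 1,500,625 p^9
print(spacetime_per_output(176, 15))                 # 2640.0 (225-to-1 block of Fig. 19)
print(spacetime_per_output(44, 99, outputs=12))      # 363.0  (116-to-12 on 44 tiles)
print(spacetime_per_output(81, 50, outputs=12))      # 337.5  (116-to-12 block of Fig. 20)
print(round(steps_per_state(50, 12, 1e-3, 116), 2))  # 4.68
print(round(steps_per_state(53, 12, 1e-3, 116), 2))  # 4.96 (modified block with 3 extra steps)
```

The last two lines show where the "one magic state every 4.68 time steps" rate for p = 10⁻³ comes from: 50 time steps for 12 outputs, divided by the success probability (1 − p)¹¹⁶ ≈ 0.89.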
as the number of tiles multiplied by 2d², taking measurement qubits into account. The minimal setup for p = 10⁻⁴ uses 164 · 2 · 13² ≈ 55,400 physical qubits and finishes the computation in 13 · 11 · 10⁸ code cycles. With 1 µs per code cycle, this amounts to roughly 4 hours. For p = 10⁻³, the condition changes to

$$210 \cdot 9.27 \cdot 10^{8} \times d \cdot p_L(10^{-3}, d) < 0.01 , \qquad (12)$$

which is satisfied for d = 27 with a final error probability of 0.5%. The final error probability for d = 25 is 4.9%. Thus, the minimal setup uses 210 · 2 · 27² ≈ 306,000 physical qubits and finishes the computation in 27 · 9.27 · 10⁸ code cycles, which amounts to roughly 7 hours. Note that, in principle, a success probability of less than 50% would be sufficient to reach arbitrary precision by repeating computations or running them in parallel. This means that the code distances that we consider may be higher than necessary.

4.4 Step 4: Add distillation blocks

Only a small fraction of the tiles of the minimal setup is used for magic state distillation, i.e., 6.7% for p = 10⁻⁴ and 21% for p = 10⁻³. On the other hand, adding one additional distillation block doubles the rate of magic state production, potentially doubling the speed of computation. Therefore, in order to speed up the computation and decrease the space-time cost, we add distillation blocks to our setup.

For p = 10⁻⁴, adding one more distillation block reduces the time that it takes to distill a magic state to 5.5 time steps per state. However, the compact block can only consume magic states at 9 time steps per state. In order to avoid this bottleneck, we can use the intermediate data block instead, which occupies 204 tiles, but consumes one magic state every 5 time steps. With 22 tiles for distillation (see Fig. 22), this setup uses 226 tiles and finishes the computation after 5.5 · 10⁸ time steps. This increases the number of qubits to 76,400, but reduces the computational time to 2 hours.

For p = 10⁻³, the addition of a distillation block reduces the distillation time to 4.64 time steps. At this point, one should switch to the more efficient 116-to-12 block of Fig. 20, which uses 81 tiles and distills a magic state on average every 4.68 time steps. The intermediate data block cannot keep up with this distillation rate, but we can still use it to consume one magic state every 5 time steps instead of every 4.68. Such a configuration uses 228 data tiles, 81 distillation tiles and 13 storage tiles, i.e., a total of 322 tiles, corresponding to approximately 469,000 physical qubits. The computational time reduces to 5 · 10⁸ time steps, i.e., 3.75 hours. Note that in Fig. 22b, the 12 output states of the 116-to-12 protocol should be chosen as 1, 3, 5, …, 25. They can be moved into the green storage space in the last step of the protocol, since the space denoted as ancilla 2 in Fig. 20 is not being used in the last step.

Trade-offs down to 1 time step per T gate. Adding further distillation blocks can reduce the time per T gate down to 1 time step. For p = 10⁻⁴, 11 distillation blocks produce 1 magic state every time step. To consume these magic states fast enough, we need to use a fast data block. This fast block uses 231 tiles, and the 11 distillation blocks together with their storage tiles use 11 · 12 = 132 tiles, as shown in Fig. 23a. With a total of 363 tiles, this setup uses 123,000 qubits and finishes the computation
Figure 25: Time-optimal implementation of a three-qubit quantum computation consisting of 9 T gates in 3 T layers. Post-corrected π/8 rotations (b) can be used to decide at a later point whether the performed operation was a $P_{\pi/8}$ or a $P_{-\pi/8}$ rotation.
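The resource counts of Steps 3 and 4 rest on two bookkeeping rules stated in the text: physical qubits ≈ tiles × 2d², and one time step lasts d code cycles of 1 µs. The Python sketch below applies them and searches for the smallest odd d satisfying Eq. (12). For that search it assumes the per-tile, per-code-cycle logical error rate $p_L(p, d) = 0.1(100p)^{(d+1)/2}$; this model is not stated in this part of the text, but with it the sketch reproduces the 0.5% (d = 27) and 4.9% (d = 25) quoted above. All function names are hypothetical.

```python
def logical_error(p, d):
    """Assumed error model: logical error per tile per code cycle, 0.1 * (100 p)^((d + 1) / 2)."""
    return 0.1 * (100 * p) ** ((d + 1) / 2)


def total_error(tiles, steps_per_t, t_count, p, d):
    """Left-hand side of Eq. (12): tiles x time steps x d code cycles x p_L(p, d)."""
    return tiles * steps_per_t * t_count * d * logical_error(p, d)


def min_distance(tiles, steps_per_t, t_count, p, budget=0.01):
    """Smallest odd code distance whose total error stays below the budget."""
    d = 3
    while total_error(tiles, steps_per_t, t_count, p, d) >= budget:
        d += 2
    return d


def physical_qubits(tiles, d):
    """Each tile holds d^2 data qubits plus roughly as many measurement qubits."""
    return tiles * 2 * d**2


def runtime_hours(steps_per_t, t_count, d, cycle_us=1.0):
    """Each time step lasts d code cycles; 3.6e9 microseconds per hour."""
    return steps_per_t * t_count * d * cycle_us / 3.6e9


d = min_distance(210, 9.27, 1e8, 1e-3)  # p = 1e-3 minimal setup, Eq. (12)
print(d, round(total_error(210, 9.27, 1e8, 1e-3, d), 4))
print(physical_qubits(210, d), round(runtime_hours(9.27, 1e8, d), 1))
print(physical_qubits(164, 13), round(runtime_hours(11, 1e8, 13), 1))   # p = 1e-4 minimal setup
print(physical_qubits(226, 13), round(runtime_hours(5.5, 1e8, 13), 1))  # intermediate block
print(physical_qubits(322, 27), runtime_hours(5, 1e8, 27))              # 116-to-12 setup
print(physical_qubits(363, 13))                                         # fast setup, Fig. 23a
```

Applying a condition of the same form to the p = 10⁻⁴ minimal setup (164 tiles, 11 time steps per T gate) gives d = 13, consistent with the 13² in the qubit count above; this is a consistency check under the assumed error model, not the paper's own derivation.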
we know whether we should have performed a $P_{\pi/8}$ or a $P_{-\pi/8}$ gate, we use the post-corrected π/8 rotations of Fig. 25b, which are similar to the auto-corrected rotations of Fig. 17b. The post-corrected rotation uses a resource state consisting of two qubits: a magic state |m⟩ and a second qubit that we refer to as a correction qubit |c⟩. The resource state is generated by initializing |c⟩ in |0⟩ and measuring Z ⊗ Y between |m⟩ and |c⟩. In order to perform a post-corrected π/8 rotation, the resource state is consumed by measuring P ⊗ Z involving the magic state, and measuring |m⟩ in X. The correction qubit |c⟩ is stored for later use. It can be used at a later moment to decide whether the rotation should have been a +π/8 or −π/8 rotation, by measuring |c⟩ either in the Z or the X basis. Depending on the measurement outcome, a Pauli correction may be required.

The time-optimal circuit. This can be used to execute multiple T layers simultaneously. If U is a product of mutually commuting π/8 rotations, i.e., a T layer, the teleportation corrections replace all π/8 rotations with post-corrected rotations. An example is shown in Fig. 25 for a three-qubit computation of three T layers, where all three T layers are executed simultaneously. The reason why we can only group together T gates that are part of the same layer is that otherwise the Pauli corrections of the post-corrected rotations would not commute with the other rotations. The time-optimal circuit consists of three steps: the preparation of Bell pairs for each T layer, the application of T gates, and a set of final Bell measurements. At this point, the computation is not finished, as we still need to measure the correction qubits of the post-corrected rotations. Because these involve potential Pauli corrections, the correction qubits of the different T layers need to be measured one after the other. Thus, the T layers are still completed one after the other, where each one requires the time that it takes to measure the correction qubits and perform the classical processing that determines the next set of measurements from the Pauli corrections. We refer to this time as $t_m$. In other words, any Clifford+T circuit consisting of $n_L$ T layers can be executed in $n_L \cdot t_m$, independent of the code distance, which is the main feature of the time-optimal scheme [21].

The circuit in Fig. 25c naively requires $2n \cdot n_L$ qubits
for an n-qubit computation, which scales with the length of the computation. Since we only have a finite number of qubits at our disposal, our goal is to implement the circuit in Fig. 26 instead. Here, the qubits form groups of 2n qubits. We refer to each of these groups as a unit. Using $n_u$ units, $n_u - 1$ layers of T gates can be performed at the same time. In the circuit, the steps of Bell state preparation (BP), post-corrected T layer execution (T) and Bell basis measurement (BM) are performed repeatedly until the end of the computation. We refer to the block of operations (BP-T-BM) as unit preparation. Every time that unit preparation is finished, all qubits except for the correction qubits (not shown in Fig. 26) and half of the qubits of the last unit are discarded. At this point, the next set of unit preparations begins. Simultaneously, the correction qubits of the recently finished units are measured one after the other, which has a time cost of $(n_u - 1) \cdot t_m$. This means that the number of units can be increased to speed up the computation, until $(n_u - 1) \cdot t_m$ reaches the time $t_u$ that it takes to prepare a unit. At this maximum number of units, $n_{\max} = t_u/t_m + 1$, a T layer is executed every $t_m$, and the computation cannot be sped up any further in the Clifford+T framework.

Note that the first and last unit differ from the other units. While all other units need to execute $n_T$ T gates every $t_u$, the first and last unit need to execute $n_T$ T gates only every $2t_u$, where $n_T$ is the number of T gates per layer. Furthermore, the other units need to be able to store up to $2n_T$ correction qubits, since, after the end of a unit preparation, $n_T$ correction qubits are stored and may need to remain stored until the end of the next unit preparation. For the first and last unit, on the other hand, the required storage space is halved.

In the following, we will show how to prepare units in our framework. We find that, for our examples, unit preparation takes 113 time steps. If $t_m = 1$ µs, then $n_{\max}$ is ∼1500 for p = 10⁻⁴ and ∼3000 for p = 10⁻³. Independently of the error rate, the computational time drops to one second.

5.2 Units

Units differ from the fast setups in Fig. 23 in three aspects. First, the number of qubits stored in the data block is doubled. Second, the distillation protocols are modified to output |m⟩-|c⟩ pairs instead of just magic states |m⟩. Third, additional space is required to store the correction qubits |c⟩. Unlike magic-state storage tiles, correction-qubit storage tiles do not need to be connected to the data block's ancilla region.

Modified distillation blocks. In order to have distillation blocks output |m⟩-|c⟩ pairs, extra tiles and operations are required. We show the necessary modifications for the examples of 15-to-1 and 116-to-12 distillation. A modified 15-to-1 block is shown in Fig. 27a. Apart from the standard 11 distillation tiles (orange) and one magic-state storage tile (green), it also contains 19 correction-qubit storage tiles (purple) and an additional tile (gray) that is used for neither distillation nor storage. The additional steps that modify the protocol are shown in Fig. 27c, which zooms into the highlighted region of Fig. 27a. In step 1 of the shown protocol, the distillation has just finished after 11 time steps. The patch of the output state is deformed in step 2, and an additional qubit |c⟩ is initialized in the |0⟩ state. The Y ⊗ Z operator between |c⟩ and |m⟩ is measured in step 3. In step 4, the correction qubit is sent to storage. Finally, in step 5, the magic state |m⟩ is moved to its storage tile. This operation blocks one of the orange tiles that is used for the distillation protocol for 4 time steps. Still, this does not slow down 15-to-1 distillation, since the first 4 rotations of the protocol in Fig. 15 can be chosen such that the output qubit is not needed. Therefore, the modified distillation block outputs one |m⟩-|c⟩ pair every 11 time steps.

Figure 27: Modified 15-to-1 distillation blocks (a) output a |m⟩-|c⟩ pair every 11 time steps. After the end of the distillation protocol, four additional steps (c) are necessary. The modified 116-to-12 distillation block (b) finishes after 53 time steps, due to the three additional steps in (d).

Figure 28: Bell basis measurement (BM) in 2 time steps.

Figure 29: Units consist of fast data blocks, modified distillation blocks and storage tiles. (a) The unit for p = 10⁻³ consists of 54 × 21 = 1134 tiles. (b) For p = 10⁻⁴, the number of tiles is 37 × 21 = 777. (c) A time-optimal setup consists of a row of multiple units, which means that the space above and below the fast data blocks needs to remain free.

For 116-to-12 distillation, a modified block is shown in Fig. 27b. We arrange the qubits such that the 12 output states are found in the positions shown in step 1 of Fig. 27d. Using 2 time steps, correction qubits are prepared and Y ⊗ Z operators are measured. Finally, the patches are deformed back to square patches, all magic states are sent to the green storage, and all correction qubits are sent to the purple storage. This adds 3 time steps to the protocol, meaning that this block outputs 12 |m⟩-|c⟩ pairs every 53 time steps with a success probability of (1 − p)¹¹⁶. For p = 10⁻³, this corresponds to one output every 4.96 time steps.

As mentioned in Sec. 4, modified distillation blocks can also be used with setups in which T gates are performed one after the other, in order to deal with slow classical processing. In this case, only one correction-qubit storage tile per magic state is required.

Units. Modified distillation blocks together with fast data blocks are what we refer to as units. The units for our example computation for p = 10⁻³ and p = 10⁻⁴ are shown in Fig. 29a-b. They both consist of a 200-qubit fast data block, 200 correction-qubit storage tiles, and a number of distillation blocks. Since we will show that unit preparation takes 113 time steps in our case, the number of distillation blocks is chosen such that at least 100 |m⟩-|c⟩ pairs can be distilled in 113 time steps. A full time-optimal quantum computer consists of a row of multiple units, see Fig. 29c. The units shown in the figure contain some unused tiles. This gives the units a rectangular profile, even though this is not strictly required. In our case, the units have a footprint of 54 × 21 and 37 × 21 tiles, respectively. Note that the first and last
unit of a time-optimal setup are smaller, as they only require 100 correction-qubit storage tiles and half the number of distillation blocks.

Unit preparation. In order to implement the time-optimal circuit of Fig. 26 with the setup of Fig. 29, we show protocols that can be used for the BP-T-BM operations. The data blocks of every unit store 2n qubits in n two-qubit patches. We arrange the qubits in such a way that the final Bell measurements (BM) are Z ⊗ Z and X ⊗ X measurements of the two qubits of every two-qubit patch. This Bell measurement can be done in 2 time steps, as shown in Fig. 28.

This arrangement of qubits implies that, for every two-qubit patch, one of the qubits needs to be part of a Bell state preparation (BP) with the neighboring unit above, and the other with the neighboring unit below. For an n-qubit quantum computation, this Bell state preparation can be performed in √n + 1 time steps, as we show in Fig. 30 for the example of n = 9. For this, every qubit is initialized in the |+⟩ state. The Bell state preparation requires a series of Z ⊗ Z measurements. The protocol in Fig. 30 shows that, since an n-qubit computation implies that the number of rows of the data block is √n, these measurements require a total of √n + 1 time steps.

In total, the unit preparation of an n-qubit computation with $n_T$ T gates per layer requires √n + 1 time steps for the Bell state preparation, $n_T$ time steps for the execution of the T layer, and 2 time steps for the Bell basis measurement, i.e., a total of $n_T + \sqrt{n} + 3$ time steps. In our example, this amounts to 113 time steps, which corresponds to $t_u = 1469$ µs for p = 10⁻⁴ and $t_u = 3051$ µs for p = 10⁻³. Thus, time optimality is reached with 1470 units for p = 10⁻⁴ and 3052 units for p = 10⁻³.

Figure 30: Bell state preparation (BP) for a 9-qubit computation (18 qubits per unit) in 4 time steps. All two-qubit patches are initialized in the |+⟩⊗|+⟩ state. Each measurement ancilla is used for a Z ⊗ Z measurement between two qubits in different units. For n-qubit computations, this requires √n + 1 time steps.

Space-time trade-offs. Of course, it is also possible to use fewer units than required for time optimality. Using $n_u$ units means that $n_T \cdot (n_u - 1)$ T gates are performed every $t_u$. In our example, $100 \cdot (n_u - 1)$ T gates are performed every 113 time steps. With three units, the computational time drops to 56.5% of the computational time of the fast setup in Fig. 23. With ten units, it drops to 11%. The number of qubits per unit is ∼260,000 for p = 10⁻⁴ and ∼1,650,000 for p = 10⁻³, so going from the fast setup to parallelized units is, initially, not a favorable space-time trade-off. Since the space-time cost has increased compared to the fast setup, it is also useful to check whether the code distance needs to be readjusted. If we use three units (ignoring that the first and last unit are, in principle, smaller), the space-time cost is still below that of the minimal setup in both cases. Adding more units significantly improves the space-time cost. It is also a prescription to linearly speed up the quantum computer down to the time-optimal limit.

5.3 Distributed quantum computing

Note that, apart from the initial sharing of entangled Bell pairs, the units operate entirely independently of each other. This implies that, if Bell pairs can be shared between different quantum computers, each unit can be located in a separate quantum computer. The shared Bell pairs do not even need to have a high fidelity, as software-based entanglement distillation [39, 40] can be used to convert a large number of low-fidelity Bell pairs into fewer high-fidelity Bell pairs. Recent experiments have made progress towards generating entanglement between different superconducting chips [41–43].

For the time-optimal scheme, quantum computers may be arranged in a circle as shown in Fig. 31a, with the ability to share Bell pairs between neighboring quantum computers. This effectively implements the circuit that is schematically drawn in Fig. 31b. Note that in this circuit, there is no first and last unit. Here, every unit performs $n_T$ π/8 rotations every $t_u$. Therefore, time optimality is reached with one fewer unit, and each unit only needs to store $n_T$ correction qubits instead of $2n_T$. With only 100 correction-qubit storage tiles and ignoring the unused tiles, the qubit count of the units in Fig. 29 drops to ∼220,000 for p = 10⁻⁴ and ∼1,470,000 for p = 10⁻³, which are the numbers that we report in Fig. 3. Thus, if nearest-neighbor communication between quantum computers is feasible, fewer than 2 million physical qubits per quantum computer already suffice to implement the full time-optimal scheme with 1500–3000 quantum computers.

Figure 31: Scheme for distributed quantum computing in a circular arrangement of quantum computers with the ability to share Bell pairs between nearest neighbors. If the Bell-pair fidelity is low, entanglement distillation (ent. dist.) can be used to increase the fidelity. This scheme effectively implements the circular time-optimal circuit drawn schematically in (b).

Entanglement distillation increases the qubit count. Note that it does not slow down the computation, as Bell pairs do not need to be distilled instantly. Entanglement distillation can take up to $t_u$ to distill the $n_T$ Bell pairs required per entanglement distillation block.
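The unit-preparation count above reduces to a few one-line formulas: √n + 1 time steps for BP, $n_T$ for the T layer, 2 for BM, each time step lasting d code cycles of 1 µs, and $n_{\max} = t_u/t_m + 1$. The Python sketch below recomputes the quoted 113 time steps, $t_u$, the unit counts for time optimality, the 56.5% figure for three units in a row, and the per-unit qubit counts from the Fig. 29 footprints (tiles × 2d²). It assumes n is a perfect square, matching the √n-row data block of Fig. 30; the function names are illustrative.

```python
import math


def bell_prep_steps(n_qubits):
    """BP: sqrt(n) + 1 time steps; assumes n is a perfect square (sqrt(n) rows of patches)."""
    root = math.isqrt(n_qubits)
    if root * root != n_qubits:
        raise ValueError("this sketch assumes n is a perfect square")
    return root + 1


def unit_prep_steps(n_qubits, n_t):
    """BP + T layer (n_T time steps) + BM (2 time steps) = n_T + sqrt(n) + 3."""
    return bell_prep_steps(n_qubits) + n_t + 2


def unit_prep_time_us(n_qubits, n_t, d, cycle_us=1.0):
    """t_u: every time step lasts d code cycles."""
    return unit_prep_steps(n_qubits, n_t) * d * cycle_us


def max_units(t_u_us, t_m_us=1.0):
    """n_max = t_u / t_m + 1 for a linear row of units."""
    return t_u_us / t_m_us + 1


def time_fraction_vs_fast_setup(n_units, n_t, steps_per_unit):
    """Linear row: (n_units - 1) * n_t T gates per unit preparation, vs. 1 T gate per time step."""
    return steps_per_unit / (n_t * (n_units - 1))


def physical_qubits(tiles, d):
    return tiles * 2 * d**2


steps = unit_prep_steps(100, 100)
print(steps, bell_prep_steps(9))                                         # 113 4
print(unit_prep_time_us(100, 100, 13), unit_prep_time_us(100, 100, 27))  # 1469.0 3051.0
print(max_units(1469.0), max_units(3051.0))                              # 1470.0 3052.0
print(time_fraction_vs_fast_setup(3, 100, steps))                        # 0.565
print(physical_qubits(777, 13), physical_qubits(1134, 27))               # 262626 1653372
```

The last line gives the ∼260,000 and ∼1,650,000 qubits per unit quoted for p = 10⁻⁴ (d = 13) and p = 10⁻³ (d = 27).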
π/8 rotations from pairs of doubly-controlled gates in a circuit. Reducing the T count by increasing the circuit depth [54] can still be a useful circuit manipulation for T-count-limited setups. We also note that the T count can be reduced by combining gate synthesis and magic state distillation (synthillation) [55, 56].

C(P₁, P₂, P₃, P₄) gates, i.e., triply-controlled Pauli gates, can be written as 15 π/16 rotations, as shown in Fig. 35. While the T depth of this circuit is no longer 1, the rotation depth is. In fact, any multi-controlled Pauli gate C(P₁, …, Pₙ), i.e., one with n − 1 controls, can be constructed from 2ⁿ − 1 $P_{\pi/2^n}$ rotations by following the pattern shown in Figs. 5, 34 and 35. The rotation depth of all these gates is 1. Multi-controlled gates can also be pieced together from C(P₁, P₂, P₃) rotations, but this increases the circuit depth. By using small-angle rotations, any multi-controlled Pauli gate can be executed in one step.

all qubits need to use a higher code distance. Only the correction qubits that are measured to execute each rotation layer need to be larger, and only right before they are measured. The physical qubit measurement does not need to be a quantum non-demolition measurement, but can be a destructive measurement. Ultimately, however, the speed of quantum computation is limited by the speed of classical computation. Exploring superconducting logic [57] to speed up classical computation may be a viable route to speed up quantum computers.

Summary. All the schemes discussed in this paper can be used not only with Clifford+T circuits, but also with Clifford+ϕ circuits. The only difference is that more and different resource states are required. Their distillation and storage require more space than ordinary magic state distillation, but their use can speed up the computation by several orders of magnitude.
Figure 36: Space-time, space, and time cost of the schemes discussed in this paper for the example of a 100-qubit quantum computation with T count 10⁸ and T depth 10⁶, under the assumption of a 1 µs code cycle time and a 1 µs measurement and classical processing time. The solid and dashed lines in M–P are for circular (solid) and linear (dashed) arrangements of units. Schemes: A: compact block + 1 distillation block (Fig. 21); B: intermediate block + 2 distillation blocks (Fig. 22); C–K: fast block + 3–11 distillation blocks (Fig. 23); L: 2 units (Figs. 29, 31); M: 3 units; N: 10 units; O: 100 units; P: 1469/1470 units (time-optimal).
data block and a single distillation block, we traded off space versus time, increasing the size of the quantum computer and, in return, decreasing the computational time. For the example of a computation with a T count of 10⁸ and a T depth of 10⁶ with an error rate of p = 10⁻⁴, the minimal setup consists of 164 tiles and executes one T gate every 11 time steps, corresponding to a computational time of 4 hours with 55,400 physical qubits. From here, the space-time cost is drastically reduced by adding more distillation blocks, as shown in Fig. 36 and Tab. 2. With this strategy, the computational time is reduced to 1 time step per T gate, where the computational cost of a circuit is governed by its T count.

For further space-time trade-offs, we parallelized T layers using units. This increases the space-time cost, especially for linear arrangements of units (dashed line in Fig. 36), but enables further space-time trade-offs. Linearly trading off space versus time, the computational time can be reduced to one measurement per T layer. Units are well-suited for distributed quantum computing, as the sharing of Bell pairs between neighboring units is part of the parallelization scheme.

This exhausts the space-time trade-offs that are possible within the Clifford+T framework. Switching to Clifford+ϕ circuits can provide further trade-offs, as additional resources are introduced for arbitrary-angle rotations. This can be used to execute circuits in a time proportional to their rotation depth, as opposed to their T depth. We have not investigated how this trade-off affects the space-time cost in our scheme.

Room for optimization. In our T-count-limited schemes and for the preparation of units, one T gate is performed after the other. If the input circuit is known, it is reasonable to assume that qubits can be arranged in a way that allows for the parallel execution of multiple T gates in the same data block. Furthermore, there is a strict separation between tiles used for magic state distillation and tiles used for data blocks in our schemes. By sharing tiles between blocks, the space overhead may be reduced. Moreover, we have only considered a handful of distillation protocols. It would be interesting to see which distillation protocols can be used to optimize the cost function of Eq. (9). Finally, concrete tile layouts that can be used to distill and consume the additional resources necessary for Clifford+ϕ computing are still missing.

Beyond surface codes. Even though we designed our schemes with surface codes in mind, they can, in principle, be applied to other toric-code-based patches, such as Majorana surface-code patches [11] or color-code patches [13, 61, 62]. Color codes can reduce the number of physical qubits due to a more compact encoding, but require more elaborate hardware to measure the higher-weight check operators. The space cost is reduced by replacing all surface-code patches by color-code patches, with the exception of Pauli product mea-
Scheme               A         B         C–K               L          M                    N–P
physical qubits      55,400    76,400    90,200–123,000    447,000    679,000 (788,000)    2,230,000–328,000,000 (2,630,000–386,000,000)
computational time   4 h       2 h       79–22 min         12 min     490 s (734 s)        147 s–1 s (163 s–1 s)

Table 2: Space and time cost of the schemes plotted in Fig. 36. The numbers in parentheses are for linear arrangements of units (dashed lines in Fig. 36).
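The computational times in Table 2 follow from two throughput rules used in the text. In the T-count-limited setups A to K, T gates are executed one after the other at a fixed number of time steps per T gate. With units, every unit that executes layers completes one T layer per unit preparation $t_u$ (all $n_u$ units in a circular arrangement, but only $n_u - 1$ in a linear row), and no T layer can take less than $t_m$. The Python sketch below applies these rules to the p = 10⁻⁴ example (T count 10⁸, T depth 10⁶, d = 13, 1 µs code cycles, $t_m$ = 1 µs, $t_u$ = 113 time steps); the mapping from labels to block counts follows the Fig. 36 legend, and the function names are illustrative.

```python
# Example parameters from the text (p = 1e-4): T count 1e8, T depth 1e6, d = 13,
# 1 us code cycles, t_m = 1 us, unit preparation of 113 time steps.
T_COUNT = 10**8
T_DEPTH = 10**6
D = 13
CYCLE_US = 1.0
T_M_US = 1.0
T_U_US = 113 * D * CYCLE_US  # t_u = 1469 us


def t_count_limited_s(steps_per_t):
    """One T gate after the other, each taking steps_per_t time steps of D code cycles."""
    return T_COUNT * steps_per_t * D * CYCLE_US / 1e6


def fast_setup_steps_per_t(n_blocks, block_steps=11):
    """Fast data block fed by n_blocks 15-to-1 blocks; never faster than 1 time step per T gate."""
    return max(block_steps / n_blocks, 1)


def units_s(n_units, circular):
    """Each executing unit finishes one T layer per t_u (all units in a circle,
    n_units - 1 in a linear row); at best one layer per t_m."""
    executing = n_units if circular else n_units - 1
    return T_DEPTH * max(T_U_US / executing, T_M_US) / 1e6


rows = [
    ("A", t_count_limited_s(11) / 3600, "h"),
    ("B", t_count_limited_s(5.5) / 3600, "h"),
    ("C", t_count_limited_s(fast_setup_steps_per_t(3)) / 60, "min"),
    ("K", t_count_limited_s(fast_setup_steps_per_t(11)) / 60, "min"),
    ("L", units_s(2, circular=True) / 60, "min"),
    ("M", units_s(3, circular=True), "s"),
    ("M linear", units_s(3, circular=False), "s"),
    ("N", units_s(10, circular=True), "s"),
    ("N linear", units_s(10, circular=False), "s"),
    ("P", units_s(1469, circular=True), "s"),
]
for label, value, unit in rows:
    print(f"{label:9s} {value:8.1f} {unit}")

# Qubit-seconds with Table 2 qubit counts: three linear units vs. minimal setup A
print(788_000 * units_s(3, circular=False) < 55_400 * t_count_limited_s(11))
```

The final line compares qubit-seconds for three linearly arranged units with setup A; it prints True, in line with the statement in Sec. 5.2 that three units still have a lower space-time cost than the minimal setup for p = 10⁻⁴.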
than Z ⊗ Z, e.g., operators involving the Y operator, as shown in Fig. 40d. First, a patch is deformed to a wider patch by initializing physical qubits in the X basis and measuring the new stabilizers, which takes d code cycles. Below the wide patch, a rectangular ancilla patch is initialized in the |0⟩ state. A column of physical qubits in the center is missing, so that, in the next step, the ancilla can be used for twist-based lattice surgery [11], measuring the Y operator. The product of the operators highlighted in red in the third step corresponds to the logical Y ⊗ Z operator between the two logical qubits. The lattice surgery in the third step involves dislocation operators and a five-qubit twist defect. Even though these stabilizers are irregular, they can still be measured in a square lattice of physical qubits with nearest-neighbor couplings, as we show in Fig. 39. For the measurement of twist operators and wide X and Z stabilizers, up to three measurement ancillas can be used.

Multi-patch measurements. For the multi-patch measurement in Fig. 41, all physical qubits located in the region of the ancilla patch are initialized in the |+⟩ state. Next, new check operators are introduced. The newly introduced X-type stabilizers all yield trivial outcomes, since they are products of physical qubits initialized in an X eigenstate and previously measured check operators. The nontrivial operators are highlighted by a red dot in Fig. 41. Their product is equivalent to the desired operator, i.e., $Y_{|q_1\rangle} \otimes X_{|q_3\rangle} \otimes Z_{|q_4\rangle} \otimes X_{|q_5\rangle}$. The new check operators are measured for d code cycles to account for measurement errors. This procedure corresponds to the multi-body lattice surgery protocol introduced in Ref. [12]. It can be used to measure any product of surface-code-boundary Pauli operators by initializing physical qubits in the |+⟩ state in an ancilla region of width d, and then measuring new check operators, where the product of the nontrivial operators yields the outcome of the desired multi-patch measurement. The ancilla region of width d is required to ensure that the code distance of the stabilizer configuration during the multi-body lattice surgery remains d.

Moving boundaries. The protocol to move patches is similar to lattice surgery. It is shown in Fig. 40c. Extending the patch via its Z boundary in the second step is the same operation as a Z ⊗ Z lattice surgery between the patch and a rectangular |+⟩ ancilla qubit to the right. This needs to be done for d code cycles to account for measurement errors. Finally, the patch is shortened again by measuring the left two thirds of the physical qubits in the X basis.

Moving corners. The movement of corners of a surface-code patch is shown in Fig. 40b. It corresponds to a change of boundary stabilizers. In order to account for measurement errors of the newly measured stabilizers, this requires d code cycles. The top left physical qubit in the second step of Fig. 40b is removed from the patch via an X measurement.

B Extended ruleset

Some surface-code operations are not covered by the rules discussed in the introduction. In particular, we only consider patches with 4 or 6 corners, where we refer to the points where two edges meet as corners. In general, one could also consider patches with a higher number of corners. A patch with 2N + 2 corners represents N qubits, as shown in Fig. 42. The simplest case is a four-corner patch (a/b) representing a single qubit. Six-corner patches (c) are two-qubit patches. The general rule that assigns the operators of N qubits to the edges of a (2N + 2)-corner patch is given in Fig. 42d. Going clockwise, the dashed boundaries correspond to $X_1, X_1X_2, X_2X_3, \ldots, X_{N-1}X_N$ and $X_N$. Starting to the right of $X_1$, the solid edges correspond to $Z_1, Z_2, \ldots, Z_N$ and the product $Z_1Z_2\cdots Z_N$.

One can also consider patches with shortened edges, such that they occupy fewer tiles. The drawback of this is that in every time step, an error corresponding to the Pauli operator represented by the shortened edge will occur with a certain probability $p_{\mathrm{err}}$. An example of a six-corner patch with two shortened X edges is shown in Fig. 43, meaning that this six-corner patch
is susceptible to X errors. In the surface-code imple-
mentation, this corresponds to a patch with boundaries
that are shorter than d physical data qubits, effectively
(b) reducing the code distance of the logical operators en-
coded by the shortened edges. Note that patches with
shortened edges may occupy more than d2 physical data
(d) (2N + 2)-corner patches qubits per tile.
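The edge-assignment rule of Fig. 42d can be made concrete with a short Python sketch. The helper names (`corner_patch_edges`, `anticommute`, `corners_match_anticommutation`) and the representation of an edge as a Pauli type plus a tuple of qubit indices are hypothetical choices made for illustration, and phases are ignored. Besides listing the clockwise edge operators, the sketch checks a property one can derive from the rule: two boundary operators anticommute exactly when their edges meet at a corner.

```python
from itertools import combinations

# Hypothetical representation: an edge operator is (Pauli type, qubit indices),
# e.g. ("X", (1, 2)) stands for X1X2. Phases are ignored.
Edge = tuple[str, tuple[int, ...]]


def corner_patch_edges(n: int) -> list[Edge]:
    """Clockwise boundary operators of a (2n + 2)-corner patch encoding n qubits."""
    if n < 1:
        raise ValueError("a patch encodes at least one qubit")
    # Dashed (X-type) edges: X1, X1X2, X2X3, ..., X(n-1)Xn, Xn.
    x_edges = [(1,)] + [(i, i + 1) for i in range(1, n)] + [(n,)]
    # Solid (Z-type) edges, starting to the right of X1: Z1, ..., Zn, Z1Z2...Zn.
    z_edges = [(i,) for i in range(1, n + 1)] + [tuple(range(1, n + 1))]
    edges: list[Edge] = []
    for xs, zs in zip(x_edges, z_edges):  # dashed and solid edges alternate
        edges += [("X", xs), ("Z", zs)]
    return edges


def label(edge: Edge) -> str:
    """Render an edge operator as a string such as 'X1X2'."""
    kind, qubits = edge
    return "".join(f"{kind}{q}" for q in qubits)


def anticommute(a: Edge, b: Edge) -> bool:
    """A pure-X and a pure-Z product anticommute iff their supports share an odd number of qubits."""
    if a[0] == b[0]:
        return False  # products of the same Pauli type always commute
    return len(set(a[1]) & set(b[1])) % 2 == 1


def corners_match_anticommutation(n: int) -> bool:
    """Check that two edge operators anticommute exactly when their edges are adjacent (share a corner)."""
    edges = corner_patch_edges(n)
    m = len(edges)
    for i, j in combinations(range(m), 2):
        adjacent = j - i in (1, m - 1)  # the last edge wraps around to the first
        if anticommute(edges[i], edges[j]) != adjacent:
            return False
    return True


print([label(e) for e in corner_patch_edges(2)])
# ['X1', 'Z1', 'X1X2', 'Z2', 'X2', 'Z1Z2']
```

For N = 1 the list reduces to X1, Z1, X1, Z1, the four edges of an ordinary single-qubit patch.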
With (2N + 2)-corner patches, the set of operations
needs to be modified. The initialization rule for such
patches is:
Figure 45: Proof-of-principle two-qubit device implemented with 48 physical data qubits.
codes the magic state in a three-qubit repetition code with a logical Z operator $Z_L = Z \otimes Z \otimes Z$. To consume the magic state, $Z_1 \otimes Z_2 \otimes Z_L$ is measured in step 3. This consumes a magic state for the Z ⊗ Z rotation.

The next rotation is a Y ⊗ X rotation. Here, we first need to deform |q1⟩ such that both the X and Z boundaries of the qubit are accessible. Qubit |q2⟩ is rotated in steps 5-8 using the protocol in Fig. 11a. In step 9, a magic state is again initialized in a two-qubit repetition code, now with $Z_L = Z_{a_1} \otimes Z_{a_2}$. In step 10, the magic state is consumed via a $Y_1 \otimes Z_{a_1}$ and an $X_2 \otimes Z_{a_2}$ measurement.

This kind of protocol, consisting of patch deformations and patch rotations, can be used to perform any π/8 rotation with the exception of $(Y \otimes Y)_{\pi/8}$, since there is not enough space to make both Y operators accessible for lattice surgery. For this rotation, we first explicitly execute a Clifford gate that changes $(Y \otimes Y)_{\pi/8}$ into a different rotation. Any Clifford gate that does not commute with Y ⊗ Y will suffice. In our example, we choose a $Z_{\pi/4}$ rotation. It is performed by initializing a |0⟩ state in step 13 and measuring $Z_1 \otimes Y$ between |q1⟩ and the ancilla, following the protocol of Fig. 11b.

This demonstrates that a proof-of-principle experiment can be built with 48 physical data qubits. In general, this requires $6d^2 - 2d$ physical data qubits, i.e., 48 for d = 3, 140 for d = 5 and 280 for d = 7. If measurement qubits are required for syndrome readout, the number of physical qubits roughly doubles.

D Implementation of the 7-to-1 protocol

Even though the distillation of |Y⟩ = |0⟩ + i|1⟩ states has no use in our framework, we show how to implement the 7-to-1 distillation protocol in Fig. 46 for benchmarking purposes. The protocol is based on the 7-qubit Steane code. Its X stabilizers are the faces shown in Fig. 46a, and its logical X operator can be chosen as the X ⊗ X ⊗ X operator with support on the three qubits drawn in red.

Following the procedure in Sec. 3, the distillation circuit is obtained by initializing $m_x + k = 4$ qubits in the |+⟩ state, where the first three qubits are associated with the three X stabilizers and the last qubit is associated with the logical X operator. For each qubit of the Steane code, the circuit contains a π/4 rotation with Z's on each stabilizer and logical operator that the qubit is part of. The three qubits in the corners of the triangle are only part of a single stabilizer and no logical operator; they therefore contribute single-qubit $Z_{\pi/4}$ rotations, which can be absorbed into the initial state. The remaining four rotations are shown in Fig. 46c.

A distillation block that can be used for this protocol is shown in Fig. 46b. Since the consumption of |Y⟩ resource states requires no Clifford correction, this block consists of only 7 tiles. With four rotations, the leading-order space-time cost of this protocol is $7d^2 \cdot 4d = 28d^3$.

Figure 46: The Steane code (a) is the basis of 7-to-1 distillation (c). In our framework, the corresponding distillation block (b) uses 7 tiles for 4.
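The construction of the 7-to-1 circuit described above can be traced in a short Python sketch that derives the rotation axes from the code's supports and evaluates the leading-order cost. The qubit labels (c1, c2, c3 for the corners, m12, m23, m13 for the edge midpoints, o for the center) and the face supports are assumptions matching a triangular layout like Fig. 46a. We also assume the three red qubits are the edge midpoints, which is the only weight-3 set that avoids the corners and overlaps every face on an even number of qubits. All function names are hypothetical.

```python
# Hypothetical labels for the 7 data qubits of a triangular Steane code:
# three corners, three edge midpoints and one center qubit (assumed layout).
CORNERS = ("c1", "c2", "c3")
MIDPOINTS = ("m12", "m23", "m13")
CENTER = ("o",)
DATA_QUBITS = CORNERS + MIDPOINTS + CENTER

# Assumed supports of the three X stabilizers (the faces of the triangle).
STABILIZERS = [
    {"c1", "m12", "m13", "o"},
    {"c2", "m12", "m23", "o"},
    {"c3", "m13", "m23", "o"},
]
# Assumed logical X operator: X on the three edge midpoints (the "red" qubits).
LOGICAL_X = {"m12", "m23", "m13"}


def code_is_consistent() -> bool:
    """The Steane code is self-dual, so its Z stabilizers share the face supports.
    An X-type operator commutes with a Z-type face iff their overlap is even."""
    x_type = STABILIZERS + [LOGICAL_X]
    return all(len(x & face) % 2 == 0 for x in x_type for face in STABILIZERS)


def distillation_rotations() -> list[tuple[str, str]]:
    """For each data qubit, the Z-type pi/4 rotation axis on the m_x + k = 4 circuit
    qubits (stabilizers 1-3, then the logical) whose support contains that qubit."""
    supports = STABILIZERS + [LOGICAL_X]
    return [(q, "".join("Z" if q in s else "I" for s in supports))
            for q in DATA_QUBITS]


def nontrivial_rotations() -> list[str]:
    """Drop weight-1 rotations (the corner qubits); they are absorbed into the initial state."""
    return [axis for _, axis in distillation_rotations() if axis.count("Z") > 1]


def spacetime_cost_7to1(d: int) -> int:
    """Leading-order cost: 7 tiles of d**2 qubits, one d-cycle time step per rotation."""
    tiles = 7
    steps = len(nontrivial_rotations())  # four rotations remain
    return tiles * d**2 * steps * d


assert code_is_consistent()
print(nontrivial_rotations())  # ['ZZIZ', 'IZZZ', 'ZIZZ', 'ZZZI']
print(spacetime_cost_7to1(3))  # 756, i.e. 28 * 3**3
```

Each axis string lists Z or I on the four circuit qubits in the order (stabilizer 1, stabilizer 2, stabilizer 3, logical), so the four strings correspond to the four rotations that remain after the corner rotations are absorbed.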