A Game of Surface Codes: Large-Scale Quantum Computing with Lattice Surgery


Daniel Litinski @ Dahlem Center for Complex Quantum Systems, Freie Universität Berlin, Arnimallee 14, 14195 Berlin, Germany

arXiv:1808.02892v3 [quant-ph] 3 Feb 2019
Accepted in Quantum 2019-02-01, click title to verify

Given a quantum gate circuit, how does one execute it in a fault-tolerant architecture with as little overhead as possible? In this paper, we discuss strategies for surface-code quantum computing on small, intermediate and large scales. They are strategies for space-time trade-offs, going from slow computations using few qubits to fast computations using many qubits. Our schemes are based on surface-code patches, which not only feature a low space cost compared to other surface-code schemes, but are also conceptually simple, simple enough that they can be described as a tile-based game with a small set of rules. Therefore, no knowledge of quantum error correction is necessary to understand the schemes in this paper, but only the concepts of qubits and measurements.

The field of quantum computing is fuelled by the promise of fast solutions to classically intractable problems, such as simulating large quantum systems or factoring large numbers. Already ∼100 qubits can be used to solve useful problems that are out of reach for classical computers [1, 2]. Despite the exponential speed-up, the actual time required to solve these problems is orders of magnitude above the coherence times of any physical qubit. In order to store and manipulate quantum information on large time scales, it is necessary to actively correct errors by combining many physical qubits into logical qubits using a quantum error-correcting code [3–5]. Of particular interest are codes that are compatible with the locality constraints of realistic devices such as superconducting qubits, which are limited to operations that are local in two dimensions. The most prominent such code is the surface code [6, 7].

Working with logical qubits introduces additional overhead to the computation. Not only is the space cost drastically increased as physical qubits are replaced by logical qubits, but also the time cost increases due to the restricted set of accessible logical operations. Surface codes, in particular, are limited to a set of 2D-local operations, which means that arbitrary gates in a quantum circuit may require several time steps instead of just one. To keep the cost of surface-code quantum computing low, it is important to find schemes that translate quantum circuits into surface-code layouts with a low space-time overhead. This is also necessary to benchmark how well quantum algorithms perform in a surface-code architecture.

There exist several encoding schemes for surface codes, among others, defect-based [7], twist-based [8] and patch-based [9] encodings. In this work, we focus on the latter. Surface-code patches have a low space overhead compared to other schemes, and offer low-overhead Clifford gates [10, 11]. In addition, they are conceptually less difficult to understand, as they do not directly involve braiding of topological defects. Designing computational schemes with surface-code patches only requires the concepts of qubits and measurements. To this end, we describe the operations of surface-code patches as a tile-based game. This is helpful to design protocols and determine their space-time cost. The exact correspondence between this game and surface-code patches is specified in Appendix A, but it is not crucial for understanding this paper. Readers who are interested in the detailed surface-code operations may read Appendix A in parallel to the following section.

Figure 1: Examples of one-qubit (a/b) and two-qubit (c) patches in a 5 × 2 grid of tiles.

Surface codes as a game. The game is played on a board partitioned into a number of tiles. An example of a 5 × 2 grid of tiles is shown in Fig. 1. The tiles can be used to host patches, which are representations of qubits. We denote the Pauli operators of each qubit as X, Y and Z. Patches have dashed and solid edges representing Pauli operators. We consider two types of patches: one-qubit and two-qubit patches. One-qubit patches represent one qubit and consist of two dashed and two solid edges. Each of the two dashed (solid) edges represents the qubit’s X (Z) operator. While the square patch in Fig. 1a only occupies one tile, a one-qubit patch can also be shaped to, e.g., occupy three tiles (b). A two-qubit patch (c) consists of six edges and represents two qubits. The first qubit’s Pauli operators $X_1$ and $Z_1$ are represented by the two top edges, while
the second qubit’s operators $X_2$ and $Z_2$ are found in the two bottom edges. The remaining two edges represent the operators $Z_1 \cdot Z_2$ and $X_1 \cdot X_2$.

In the following, we specify the operations that can be used to manipulate the qubits represented by patches. Some of these operations take one time step to complete (a cost of 1), whereas others can be performed instantly (a cost of 0). The goal is to implement quantum algorithms using as few tiles and time steps as possible. There are three types of operations: qubit initialization, qubit measurement and patch deformation.

I. Qubit initialization:

– One-qubit patches can be initialized in the X and Z eigenstates $|+\rangle$ and $|0\rangle$. (Cost: 0)
– Two-qubit patches can be initialized in the states $|+\rangle \otimes |+\rangle$ and $|0\rangle \otimes |0\rangle$. (Cost: 0)
– One-qubit patches can be initialized in an arbitrary state. Unless this state is $|+\rangle$ or $|0\rangle$, an undetected random Pauli error may spoil the qubit with probability p. (Cost: 0)

II. Qubit measurement:

– Single-patch measurements: The qubits represented by patches can be measured in the X or Z basis. For two-qubit patches, the two qubits must be measured simultaneously and in the same basis. This measurement removes the patch from the board, freeing up previously occupied tiles. (Cost: 0)
– Two-patch measurements: If edges of two different patches are positioned in adjacent tiles, the product of the operators of the two edges can be measured. For example, the product Z ⊗ Z between two neighboring square patches can be measured, as highlighted in step 2 of Fig. 2a by the blue rectangle. If the edge of one patch is adjacent to multiple edges of the other patch, the product of all involved Pauli operators can be measured. For instance, if qubit A’s Z edge is adjacent to both qubit B’s X edge and Z edge, the operator $Z_A \otimes Y_B$ can be measured (see step 3 of Fig. 2d), since Y = iXZ. (Cost: 1)
– Multi-patch measurements: An arbitrarily-shaped ancilla patch can be initialized. The product of any number of operators adjacent to the ancilla patch can be measured. The ancilla patch is discarded after the measurement. The example of a $Y_{|q_1\rangle} \otimes X_{|q_3\rangle} \otimes Z_{|q_4\rangle} \otimes X_{|q_5\rangle}$ measurement is shown in Fig. 2e. (Cost: 1)

III. Patch deformation:

– Edges of a patch can be moved to deform the patch. If the edge is moved onto a free tile to increase the size of the patch, this takes 1 time step to complete. If the edge is moved inside the patch to make the patch smaller, the action can be performed instantly.
– Corners of a patch can be moved along the patch boundary to change its shape, as shown in Fig. 2b. (Cost: 1)

Figure 2: Examples of short protocols. (a) Preparation of a two-qubit Bell state in 1 time step. (b) Moving corners of a four-corner patch to change its shape in 1 time step. (c) Moving a square-patch qubit over long distances in 1 time step. (d) Measurement of a square-patch qubit in the Y basis using an ancilla qubit and 2 time steps. (e) A multi-qubit $Y_{|q_1\rangle} \otimes X_{|q_3\rangle} \otimes Z_{|q_4\rangle} \otimes X_{|q_5\rangle}$ measurement in 1 time step.

To illustrate these operations, we go through three short example protocols in Fig. 2a/c/d. The first example (a) is the preparation of a Bell pair. Two square patches are initialized in the $|+\rangle$ state. Next, the operator Z ⊗ Z is measured. Before the measurement, the qubits are in the state $|+\rangle \otimes |+\rangle = (|00\rangle + |01\rangle + |10\rangle + |11\rangle)/2$. If the measurement outcome is +1, the qubits end up in the state $(|00\rangle + |11\rangle)/\sqrt{2}$. For the outcome −1, the state is $(|01\rangle + |10\rangle)/\sqrt{2}$. In both cases, the two
qubits are in a maximally entangled Bell state. This protocol takes 1 time step to complete. The second example (c) is the movement of a square patch into a different tile. For this, the square patch is enlarged by patch deformation, which takes 1 time step, and then made smaller again at no time cost. The third example (d) is the measurement of a square patch in the Y basis. For this, the patch is deformed such that the X and Z edge are on the same side of the patch. An ancillary patch is initialized in the $|0\rangle$ state and the operator Z ⊗ Y between the ancilla and the qubit is measured. The ancilla is discarded by measuring it in the Z basis.

Translation to surface codes. As described in Appendix A, protocols designed within this framework can be straightforwardly translated into surface-code operations. Essentially, patches correspond to surface-code patches with dashed and solid edges as rough and smooth boundaries. Thus, for surface codes with a code distance d, each tile corresponds to $d^2$ physical data qubits. Each time step roughly corresponds to d code cycles, i.e., measuring all surface-code check operators d times. We associate a time step with all surface-code operations which have a time cost that scales with d, but no time step with operations whose time cost is independent of the code distance, but may still be nonzero. For this reason, the correspondence between 1 time step and d code cycles is not exact.

Two-patch and multi-patch measurements correspond to (twist-based) lattice surgery [9, 11] and multi-qubit lattice surgery [12], respectively, which both require d code cycles to account for measurement errors. Qubit initialization has no time cost, since, in the case of X and Z eigenstates, it can be done simultaneously with the subsequent lattice surgery [9, 13]. For arbitrary states, initialization corresponds to state injection [13, 14]. Its time cost does not scale with d. Similarly, single-qubit measurements in the X or Z basis correspond to the simultaneous measurement of all physical data qubits in the corresponding basis and some classical error correction, which does not scale with d either. Patch deformation is code deformation, which requires d code cycles, unless the patch becomes smaller in the process, in which case it corresponds to single-qubit measurements. Note that not all surface-code operations are covered by this framework. An extended set of rules is discussed in Appendix B.

In essence, the framework can be used to estimate the space-time cost of a computation. The leading-order term of the space-time cost (the term that scales with $d^3$) of a protocol that uses s tiles for t time steps is $st \cdot d^3$ in terms of (physical data qubits)·(code cycles). The space cost is $s \cdot d^2$ physical data qubits. Determining the exact time cost requires special care. In some protocols, the subleading contributions due to state injection and classical processing may need to be taken into account. For these protocols, we will show how they can be adapted to prevent such contributions from increasing the time cost beyond $t \cdot d$ code cycles.

Overview

Having established the rules of the game and the correspondence of our framework to surface-code operations, our goal is to implement arbitrary quantum computations. In this work, we discuss strategies to tackle the following problem: Given a quantum circuit, how does one execute it as fast as possible on a surface-code-based quantum computer of a certain size? This is an optimization problem that was shown to be NP-hard [15], so the focus is on heuristics rather than a general solution. The content of this paper is outlined in Fig. 3.

The input to our problem is an arbitrary gate circuit corresponding to the computation. We refer to the qubits that this circuit acts on as data qubits. As we review in Sec. 1, the natural universal gate set for surface codes is Clifford+T, where Clifford gates are cheap and T gates are expensive. In fact, Clifford gates can be treated entirely classically, and T gates require the consumption of a magic state $|0\rangle + e^{i\pi/4}|1\rangle$. Only faulty (undistilled) magic states can be prepared in our framework. To generate higher-fidelity magic states for large-scale quantum computation, a lengthy protocol called magic state distillation [16] is used.

It is therefore natural to partition a quantum computer into a block of tiles that is used to distill magic states (a distillation block) and a block of tiles that hosts the data qubits (a data block) and consumes magic states. The speed of a quantum computer is governed by how fast magic states can be distilled, and how fast they can be consumed by the data block.

In Sec. 2, we discuss how to design data blocks. In particular, we show three designs: compact, intermediate and fast blocks. The compact block uses 1.5n + 3 tiles to store n qubits, but takes up to 9 time steps to consume a magic state. Intermediate blocks use 2n + 4 tiles and require up to 5 time steps per magic state. Finally, the fast block uses $2n + \sqrt{8n} + 1$ tiles, but requires only 1 time step to consume a magic state. The compact block is an option for early quantum computers with few qubits, where the generation of a single magic state takes longer than 9 time steps. The fast block has a better space-time overhead, which makes it more favorable on larger scales.

Data blocks need to be combined with distillation blocks for universal quantum computing. In Sec. 3, we discuss designs of distillation blocks. Since magic state distillation is the main operation of a surface-code-based quantum computer, it is important to minimize its space-time cost. We discuss distillation protocols based on error-correcting codes with transversal T gates, such as punctured Reed-Muller codes [16, 17] and block codes [18–20]. In comparison to braiding-based implementations of distillation protocols, we reduce the space-time cost by up to 90%.

[Figure 3 panel text: Sec. 1: Clifford+T circuits; Sec. 2: Data blocks; Sec. 3: Distillation blocks; Sec. 4: Trade-offs limited by T count; Sec. 5: Trade-offs limited by T depth; Sec. 6: Trade-offs beyond Clifford+T. Example: 100 qubits, $10^8$ T gates.]

  p = 10^-4, d = 13:   55,000 qubits, 4 hours  |  120,000 qubits, 22 minutes    |  1500 × 220,000 = 330m qubits, 1 second     |  ···
  p = 10^-3, d = 27:   310,000 qubits, 7 hours |  1,000,000 qubits, 45 minutes  |  3000 × 1,500,000 ≈ 4.5b qubits, 1 second   |  ···
  ∼100 qubits (Appendix C)

Figure 3: Overview of the content of this paper. To illustrate the space-time trade-offs discussed in this work, we show the number of physical qubits and the computational time required for a circuit of $10^8$ T gates distributed over $10^6$ T layers. We consider physical error rates of $p = 10^{-4}$ and $p = 10^{-3}$, for which we need code distances d = 13 and d = 27, respectively. We assume that each code cycle takes 1 µs.

A data block combined with a distillation block constitutes a quantum computer in which T gates are performed one after the other. At this stage, the quantum computer can be sped up by increasing the number of distillation blocks, effectively decreasing the time it takes to distill a single magic state, as we discuss in Sec. 4. In order to illustrate the resulting space-time trade-off, we consider the example of a 100-qubit computation with $10^8$ T gates, which can already be used to solve classically intractable problems [2]. Assuming an error rate of $p = 10^{-4}$ and a code-cycle time of 1 µs, a compact data block together with a distillation block can finish the computation in 4 hours using 55,000 physical qubits.¹ Adding 10 more distillation blocks increases the qubit count to 120,000 and decreases the computational time to 22 minutes, using 1 time step per T gate.

¹ We will assume that the total number of physical qubits is twice the number of physical data qubits. This is consistent with superconducting qubit platforms, where the use of measurement ancillas doubles the qubit count. If a platform does not require the use of ancilla qubits, the total qubit count is reduced by 50% compared to the numbers reported in this paper.

For further space-time trade-offs in Sec. 5, we exploit that the T gates of a circuit are arranged in layers of gates that can be executed simultaneously. This enables linear space-time trade-offs down to the execution of one T layer per qubit measurement time, effectively implementing Fowler’s time-optimal scheme [21]. If the $10^8$ T gates are distributed over $10^6$ layers, and measurements (and classical processing) can be performed in 1 µs, up to 1500 units of 220,000 qubits can be run in parallel, where each unit is responsible for the execution of one T layer. This way, the computational time can be brought down to 1 second using 330 million qubits. While this is a large number, the units do not necessarily need to be part of the same quantum computer, but can be distributed over up to 1500 quantum computers with 220,000 qubits each, and with the ability to share Bell pairs between neighboring computers.

In Sec. 6, we discuss further space-time trade-offs that are beyond the parallelization of Clifford+T circuits. In particular, we discuss the use of Clifford+ϕ circuits, i.e., circuits containing arbitrary-angle rotations beyond T gates. These require the use of additional resources, but can speed up the computation. We also discuss the possibility of hardware-based trade-offs by using higher code distances, but in turn shorter measurements with a decreased measurement fidelity. Ultimately, the speed of a quantum computer is limited by classical processing, which can only be improved upon by faster classical computing.

Finally, we note that while the number of qubits required for useful quantum computing is orders of magnitude above what is currently available, a proof-of-principle two-qubit device demonstrating all necessary operations using undistilled magic states can be built with 48 physical data qubits, see Appendix C.
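
The cost accounting used in this overview can be sketched in a few lines of Python. This is a minimal illustration and not part of the paper: the function names, the rounding up to whole tiles, and the default factor of 2 for measurement ancillas (taken from footnote 1) are assumptions. It covers data blocks only, so it is not meant to reproduce the qubit counts of Fig. 3, which also include distillation blocks.

```python
import math


def data_block_tiles(n, design):
    """Tiles needed to store n data qubits for the three block designs of Sec. 2.

    Rounding up to a whole number of tiles is an assumption of this sketch.
    """
    if design == "compact":
        return math.ceil(1.5 * n + 3)
    if design == "intermediate":
        return 2 * n + 4
    if design == "fast":
        return math.ceil(2 * n + math.sqrt(8 * n) + 1)
    raise ValueError(f"unknown design: {design}")


def physical_qubits(tiles, d, ancilla_factor=2):
    """Each tile holds d^2 physical data qubits; ancillas double the count (footnote 1)."""
    return ancilla_factor * tiles * d ** 2


def spacetime_cost(tiles, time_steps, d):
    """Leading-order space-time cost s*t*d^3 in (physical data qubits)*(code cycles)."""
    return tiles * time_steps * d ** 3


def runtime_seconds(time_steps, d, cycle_time=1e-6):
    """One time step is roughly d code cycles; cycle_time defaults to 1 microsecond."""
    return time_steps * d * cycle_time


if __name__ == "__main__":
    for design in ("compact", "intermediate", "fast"):
        tiles = data_block_tiles(100, design)
        print(design, tiles, physical_qubits(tiles, d=13))
    print(runtime_seconds(10 ** 8, d=13) / 60, "minutes")
```

For $10^8$ T gates at 1 time step each, d = 13 and a 1 µs code cycle, `runtime_seconds` gives 1300 seconds, which is consistent with the 22 minutes quoted above.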

[Figure 4 panel text: panels (a) and (c) distinguish the cases $PP' = P'P$ and $PP' = -P'P$; panel (b) distinguishes the cases $P_1P' = -P'P_1$ and $P_2P' = -P'P_2$.]

Figure 4: A generic circuit consists of π/4 rotations (orange), π/8 rotations (green) and measurements (blue). The Pauli product in each box specifies the axis of rotation or the basis of measurement. If the Pauli operator is −P instead of P, a minus sign is found in the corner of the box, such that, e.g., $Z_{-\pi/4}$ corresponds to an $S^\dagger$ gate. Using the commutation rules in (a/b), all Clifford gates can be moved to the end of the circuit. Using (c), the Clifford gates can be absorbed by the final measurements.

1 Clifford+T quantum circuits

Our goal is to implement full quantum algorithms with surface codes. The input to our problem is the algorithm’s quantum circuit. The universal gate set Clifford+T is well-suited for surface codes, since it separates easy operations from difficult ones. Often, this set is generated using the Hadamard gate H, phase gate S, controlled-NOT (CNOT) gate, and the T gate. Instead, we choose to write our circuits using Pauli product rotations $P_\varphi$ (see Fig. 5), because it simplifies circuit manipulations. Here, $P_\varphi = \exp(-iP\varphi)$, where P is a Pauli product operator (such as Z, Y ⊗ X, or X ⊗ 1 ⊗ X) and ϕ is an angle. In this sense, $S = Z_{\pi/4}$, $T = Z_{\pi/8}$, and $H = Z_{\pi/4} \cdot X_{\pi/4} \cdot Z_{\pi/4}$. The CNOT gate can also be written in terms of Pauli product rotations as $\mathrm{CNOT} = (Z \otimes X)_{\pi/4} \cdot (1 \otimes X)_{-\pi/4} \cdot (Z \otimes 1)_{-\pi/4}$. In fact, we can more generally define $P_1$-controlled-$P_2$ gates as $C(P_1, P_2) = (P_1 \otimes P_2)_{\pi/4} \cdot (1 \otimes P_2)_{-\pi/4} \cdot (P_1 \otimes 1)_{-\pi/4}$. The CNOT gate is the specific case of C(Z, X).

Figure 5: Clifford+T gates in terms of Pauli rotations. (a) Single-qubit Clifford gates are π/4 rotations, and the T gate is a π/8 rotation. (b/c) $P_1$-controlled-$P_2$ gates are Clifford gates, where C(Z, X) is the CNOT gate.

Getting rid of Clifford gates. Clifford gates are considered to be easy, because, by definition, they map Pauli operators onto other Pauli operators [22]. This can be used to simplify the input circuit. A generic circuit is shown in Fig. 4, consisting of Clifford gates, $Z_{\pi/8}$ rotations and Z measurements. If all Clifford gates are commuted to the end of the circuit, the $Z_{\pi/8}$ rotations become Pauli product rotations. The rules for moving $P_{\pi/4}$ rotations past $P'_\varphi$ gates are shown in Fig. 4a: If P and P′ commute, $P_{\pi/4}$ can simply be moved past $P'_\varphi$. If they anticommute, $P'_\varphi$ turns into $(iPP')_\varphi$ when $P_{\pi/4}$ is moved to the right. Since $C(P_1, P_2)$ gates consist of π/4 rotations, similar rules can be derived as shown
in Fig. 4b: If P′ anticommutes with $P_1$, $P'_\varphi$ turns into $(P'P_2)_\varphi$ after commutation. If P′ anticommutes with $P_2$, $P'_\varphi$ turns into $(P'P_1)_\varphi$. If P′ anticommutes with both $P_1$ and $P_2$, $P'_\varphi$ turns into $(P'P_1P_2)_\varphi$.

After moving the Clifford gates to the right, the resulting circuit consists of three parts: a set of π/8 rotations, a set of π/4 rotations, and Z measurements. Because Clifford gates map Pauli operators onto other Pauli operators, the Clifford gates can be absorbed by the final measurements, turning Z measurements into Pauli product measurements. The commutation rules of this final step are shown in Fig. 4c and are similar to the commutation of Clifford gates past rotations.

T count and T depth. Thus, every n-qubit circuit can be written as a number of consecutive π/8 rotations and n final Pauli product measurements, as shown in Fig. 6. We refer to the number of π/8 rotations as the T count. An important part of circuit optimization is the minimization of the T count, for which there exist various approaches [23–26]. The π/8 rotations of a circuit can be grouped into layers. All π/8 rotations that are part of a layer need to mutually commute. The number of π/8 layers of a circuit is strictly speaking not the same quantity as the T depth, but we will still refer to it as the T depth and to π/8 layers as T layers. Note that, in the usual definition, only up to n T gates can be part of a layer, whereas in our case, there is no limit.

Figure 6: Clifford+T circuits can be written as a number of consecutive π/8 rotations. These gates are grouped into layers of mutually commuting rotations. A simple greedy algorithm can be used to reduce the number of layers, i.e., the T depth.

When partitioning π/8 rotations into layers, the naive approach often yields more layers than are necessary. For instance, a naive partitioning of the first 6 T gates of Fig. 6 yields 4 layers. A few commutations can bring the number down to 2 layers. There are a number of algorithms for the optimization of the T depth [27–29]. Here, we use the simple greedy algorithm shown below to reduce the number of layers.

    repeat
        for each layer i do
            for each rotation j in layer i + 1 do
                if (rotation j commutes with all rotations in layer i) then
                    Move rotation j from layer i + 1 to layer i;
                end
            end
        end
    until the partitioning no longer changes;

Algorithm to reduce the T count and T depth.

Note that when a reordering puts two equal π/8 rotations into the same layer, they can be combined into a π/4 rotation that is commuted to the end of the circuit, thereby decreasing the T count. As we discuss in Sec. 6, this kind of algorithm can not only be used with π/8 rotations, but, in principle, with arbitrary Pauli product rotations. The reduction of the circuit depth in terms of non-π/8 rotations can be useful when going beyond Clifford+T circuits.

1.1 Pauli product measurements

When implementing circuits like Fig. 6 with surface codes, one obstacle is that π/8 rotations are not directly part of the set of available operations. Instead, one uses magic states [16] as a resource. These states are π/8-rotated Pauli eigenstates $|m\rangle = |0\rangle + e^{i\pi/4}|1\rangle$. They can be consumed in order to perform $P_{\pi/8}$ rotations. The corresponding circuit [30] is shown in Fig. 7.

Figure 7: Circuit to perform a π/8 rotation by consuming a magic state.
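
The greedy layering algorithm of Sec. 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the paper's implementation: Pauli products are represented as strings over `I`, `X`, `Y`, `Z` with signs ignored, the helper names (`commute`, `naive_layers`, `greedy_reduce`) are hypothetical, and the merging of two equal π/8 rotations into a π/4 rotation is left out.

```python
def commute(p, q):
    """Two Pauli products commute iff they differ on an even number of
    positions where both act non-trivially."""
    clashes = sum(1 for a, b in zip(p, q) if a != "I" and b != "I" and a != b)
    return clashes % 2 == 0


def naive_layers(rotations):
    """Naive partitioning: open a new layer whenever the next rotation
    fails to commute with some rotation in the current layer."""
    layers = []
    for r in rotations:
        if layers and all(commute(r, s) for s in layers[-1]):
            layers[-1].append(r)
        else:
            layers.append([r])
    return layers


def greedy_reduce(layers):
    """Repeatedly move rotations from layer i+1 into layer i when they
    commute with everything in layer i, until nothing changes."""
    layers = [list(layer) for layer in layers]
    changed = True
    while changed:
        changed = False
        for i in range(len(layers) - 1):
            for r in list(layers[i + 1]):
                if all(commute(r, s) for s in layers[i]):
                    layers[i + 1].remove(r)
                    layers[i].append(r)
                    changed = True
        # Drop layers that were emptied during this pass.
        layers = [layer for layer in layers if layer]
    return layers


if __name__ == "__main__":
    rotations = ["ZI", "XI", "IZ", "IX"]  # axes of four pi/8 rotations on two qubits
    print(naive_layers(rotations))
    print(greedy_reduce(naive_layers(rotations)))
```

Moving a rotation forward is safe here because rotations within a layer mutually commute, so a rotation in layer i + 1 can first be reordered to the front of its own layer and then joined to layer i. In the small example, the naive pass produces three layers and the greedy pass reduces them to two.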

Figure 8: Example of a $Z_{|q_1\rangle} \otimes Y_{|q_2\rangle} \otimes X_{|q_4\rangle} \otimes Z_{|m\rangle}$ measurement to implement a $(Z \otimes Y \otimes 1 \otimes X)_{\pi/8}$ gate.

A $P_{\pi/8}$ rotation corresponds to a P ⊗ Z measurement involving the magic state. If the measurement outcome is P ⊗ Z = −1, then a corrective $P_{\pi/4}$ operation is necessary. Since this is a Clifford gate, it can be simply commuted to the end of the circuit, changing the axes of the subsequent π/8 rotations. Finally, in order to discard the magic state, it is disentangled from the rest of the system by an X measurement. Here, an outcome X = −1 prompts a $P_{\pi/2}$ correction. π/2 rotations correspond to Pauli operators, i.e., $P_{\pi/2} = P$. The Pauli correction can also be commuted to the end of the circuit. When $P_{\pi/2}$ is moved past a P′ rotation or measurement, it changes the axis of rotation or measurement basis to −P′, if P and P′ anticommute.

In essence, if magic states are available, the only operations required for universal quantum computing are Pauli product measurements. In our framework, such operations can be performed in 1 time step via multi-patch measurements, corresponding to multi-qubit lattice surgery. An example is shown in Fig. 8, where a $(Z \otimes Y \otimes 1 \otimes X)_{\pi/8}$ rotation on four qubits $|q_1\rangle$-$|q_4\rangle$ stored in four two-tile one-qubit patches is performed. Using the circuit identity in Fig. 7, this is done by measuring $Z_{|q_1\rangle} \otimes Y_{|q_2\rangle} \otimes X_{|q_4\rangle} \otimes Z_{|m\rangle}$ between the four qubits and a magic state.

Summary. Clifford+T circuits can be written in terms of π/8 rotations, π/4 rotations and measurements. To convert input circuits into a standard form, π/4 rotations can be commuted to the end of the circuit and absorbed by the final measurements. Thus, any quantum computation can be written as a sequence of π/8 rotations grouped into layers of mutually commuting rotations. The number of rotations is the T count and the number of layers is the T depth. Each rotation can be performed by consuming a magic state via a Pauli product measurement. These measurements can be implemented in our framework in 1 time step.

2 Data blocks

Since Clifford+T circuits are a sequence of π/8 rotations, each requiring the consumption of a magic state, it is natural to partition a quantum computer into a set of tiles that are used for magic state distillation (distillation blocks) and a set of tiles that host data qubits and consume magic states via Pauli product measurements (data blocks). In this section, we discuss designs for the latter. In principle, the structure shown in Fig. 8 is a data block, where each qubit is stored in a two-tile patch and magic states can be consumed every time step. However, this sort of design uses 3n tiles to host n data qubits, which is a relatively large space overhead.

2.1 Compact block

The first design that we discuss uses only 1.5n + 3 tiles. This compact block is shown in Fig. 9, where each data qubit is stored in a square patch. This lowers the space cost, but restricts the operators that are accessible by Pauli product measurements, as only the Z operator is free to be measured. Using 3 time steps, patches may also be rotated (see Fig. 11a), such that the X operator becomes accessible instead of the Z operator. The problematic operators are Y operators, which are the reason why the consumption of a magic state can take up to 9 time steps.

Figure 9: A compact block stores n data qubits in 1.5n + 3 tiles. The consumption of a magic state can take up to 9 time steps.

Figure 10: For compact blocks, the worst-case scenario is a Pauli product measurement involving an even number of Y operators, e.g., the measurement required for a $(Y \otimes 1 \otimes Y \otimes Z \otimes Y \otimes Y)_{\pi/8}$ gate. Such measurements require two explicit π/4 rotations (left), and two π/4 rotations that are commuted to the end of the circuit (right).

The worst-case scenario is a π/8 rotation involving an even number of Y operators, such as the one shown in Fig. 10. One possibility to replace Y operators by X or Z operators is via π/4 rotations, since
$Y_{\pi/4} = Z_{\pi/4} X_{\pi/4} Z_{-\pi/4}$. Rotations with an even number of Y’s require two π/4 rotations, while an odd number of Y’s can be handled by one rotation. Only the left two π/4 rotations in Fig. 10 need to be performed explicitly. The right two rotations can be commuted to the end of the circuit, changing the subsequent π/8 rotations. Similarly to a π/8 rotation, a $P_{\pi/4}$ rotation can be executed using a resource state $|Y\rangle = |0\rangle + i|1\rangle$, as shown in Fig. 11b. However, even though this state is a Pauli eigenstate, it cannot be readily prepared in our framework. Instead, we use a $|0\rangle$ state and Y measurements, such that a $P_{\pi/4}$ rotation is performed by a P ⊗ Y measurement between the qubits and the $|0\rangle$ state. Afterwards, the $|0\rangle$ state is measured in X. If the −P ⊗ Y and X measurements in Fig. 11b yield different outcomes, a Pauli correction is necessary.

Figure 11: (a) Patches can be rotated in 3 time steps to change whether the X or Z operator is adjacent to the compact block’s ancilla region. (b) A $P_{\pi/4}$ gate can be performed explicitly via a P ⊗ Y measurement with a $|0\rangle$ ancilla qubit. (c) Eight-step protocol to perform the $(Y \otimes 1 \otimes Y \otimes Z \otimes Y \otimes Y)_{\pi/8}$ rotation of Fig. 10 in a compact block. The magic state is consumed in 9 time steps, where steps 2-5 are the two π/4 rotations in Fig. 10, steps 6 and 7 are patch rotations, and step 8 is the Pauli product measurement consuming the magic state.

In Fig. 11, we go through the steps necessary to perform the $(Y \otimes 1 \otimes Y \otimes Z \otimes Y \otimes Y)_{\pi/8}$ rotation of Fig. 10. In step 1, we start with a 12-tile data block storing 6 qubits in the blue region. The orange region is not part of the data block, but is part of the adjacent distillation block, i.e., it is the source of the magic states. In steps 2-5, we perform the two π/4 rotations that are necessary to replace the Y operators with X’s, i.e., the first two π/4 rotations in the circuit of Fig. 10. In step 6, we first rotate patches in the upper row, and then, in step 7, in the lower row. Finally, in step 8, we measure the Pauli product involving the magic state.

This general procedure can be used for any π/8 rotation. First, up to two π/4 rotations are performed in 2 time steps. Next, patches in the upper and lower row are rotated, which takes 3 time steps per row. Finally, the Pauli product is measured in 1 time step, requiring a total of 2 + 3 + 3 + 1 = 9 time steps. While this is very slow compared to Fig. 8, the compact block is a valid choice for small quantum computers where the distillation of a magic state takes longer than 9 time steps.

2.2 Intermediate block

One possibility to speed up compact blocks is to store all qubits in one row instead of two. This is the intermediate block shown in Fig. 13a, which uses 2n + 4 tiles to store n qubits. By eliminating one row, all patch rotations can be done simultaneously. In addition, one can save 1 time step by moving all patches to the other side, thereby eliminating the need to move patches back to their row after the rotation. An example is shown in Fig. 12. Suppose we have 5 qubits and need to prepare them for a Z ⊗ X ⊗ Z ⊗ Z ⊗ X measurement. The first, third and fourth qubit are moved to the other side, which takes 1 time step. Simultaneously, the second and fifth qubit are rotated, which takes 2 time steps. Therefore, the total


Figure 12: Patch rotations in preparation of a Z ⊗ X ⊗ Z ⊗ Z ⊗ X measurement with an intermediate block.

number of time steps to consume a magic state is at most 5, where 2 are used for up to two π/4 rotations, 2 for the patch rotations, and 1 for the Pauli product measurement consuming the magic state.

2.3 Fast block

The disadvantage of square patches is that only one Pauli operator is adjacent to the data block's ancilla region, i.e., available for Pauli product measurements at any given time. Two-tile one-qubit patches as in Fig. 8, on the other hand, allow for the measurement of any Pauli operator, but use two tiles for each qubit. In order to have both compact storage and access to all Pauli operators, we use two-qubit patches for our fast blocks in Fig. 13b. These patches use two tiles to represent two qubits (see Fig. 1), where the first qubit's Pauli operators are in the left two edges, and the second qubit's operators are in the right two edges. Therefore, the example in Fig. 13b is a fast block that stores 18 qubits.

Since all Pauli operators are accessible, the Pauli product measurement protocol of Fig. 8 can be used to consume a magic state every time step. n qubits occupy a square arrangement of tiles with a side length of 2√(n/2) + 1, i.e., a total of 2n + √(8n) + 1 tiles. Even if √(n/2) is not an integer, one should keep the block as square-shaped as possible by picking the closest integer as a side length and shortening the last column. While the fast block uses more tiles compared to the compact and intermediate blocks, it has a lower space-time cost, making it more favorable for large quantum computers for which the distillation of a magic state takes less than 5 time steps.
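The tile counts and worst-case time steps quoted for the three data blocks in this section can be tabulated with a few lines of code. This is a rough sketch: the function names are hypothetical, and the space-time figure only counts data-block tiles times the time steps needed to consume one magic state, ignoring the distillation blocks.

```python
import math


def compact_block(n):
    """Compact block: 1.5n + 3 tiles, up to 9 time steps per magic state."""
    return {"tiles": 1.5 * n + 3, "time_steps": 9}


def intermediate_block(n):
    """Intermediate block: 2n + 4 tiles, up to 5 time steps per magic state."""
    return {"tiles": 2 * n + 4, "time_steps": 5}


def fast_block(n):
    """Fast block: 2n + sqrt(8n) + 1 tiles, 1 time step per magic state."""
    return {"tiles": 2 * n + math.sqrt(8 * n) + 1, "time_steps": 1}


def space_time_per_magic_state(block):
    """Data-block tiles multiplied by time steps per consumed magic state."""
    return block["tiles"] * block["time_steps"]


# Example: the three block types for n = 18 data qubits.
blocks_18 = {
    "compact": compact_block(18),
    "intermediate": intermediate_block(18),
    "fast": fast_block(18),
}
costs_18 = {name: space_time_per_magic_state(b) for name, b in blocks_18.items()}
```

For n = 18, the fast block occupies the most tiles but has the smallest tiles-times-time product of the three, in line with the space-time argument made in the text.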
Note that if undistilled magic states are sufficient, then any data block can already be used as a full quantum computer. A proof-of-principle two-qubit device in the spirit of Ref. [31] that constitutes a universal two-qubit quantum computer with undistilled magic states and can demonstrate all the operations that are used in our framework can be realized with six tiles, as shown in Appendix C. This proof-of-principle device uses (3d − 1) · 2d physical data qubits, i.e., 48, 140, or 280 data qubits for distances d = 3, 5 or 7. If ancilla qubits are used for stabilizer measurements, the number of physical qubits roughly doubles, but it is still within reach of near-term devices.

Summary. Data blocks store the data qubits of the computation and consume magic states. Compact blocks use 1.5n + 3 tiles for n qubits and require up to 9 time steps to consume a magic state. Intermediate blocks use 2n + 4 tiles and take up to 5 time steps per magic state. Fast blocks use 2n + √(8n) + 1 tiles and take 1 time step per magic state. Data blocks need to be combined with distillation blocks for large-scale quantum computation.

[Figure 13 panels: (a) Intermediate block; (b) Fast block; each panel marks its ancilla region]

Figure 13: (a) Intermediate blocks store n data qubits in 2n + 4 tiles and require up to 5 time steps per magic state. (b) Fast blocks use 2n + √(8n) + 1 tiles and require 1 time step per magic state.

3 Distillation blocks

In this section, we discuss designs of tile blocks that are used for magic state distillation. This is necessary, because with surface codes, the initialization of non-Pauli eigenstates is prone to errors, which means that



Figure 14: Encode-T -decode circuit of the 15-to-1 distillation protocol. The multi-target CNOTs (orange) can be commuted past
the T gates, such that they cancel and leave 15 Z-type Pauli product rotations.

π/8 rotations performed using these states may lead to errors. In order to decrease the probability of such an error, magic state distillation [16] is used to convert many low-fidelity magic states into fewer higher-fidelity states. This requires only Clifford gates (i.e., Pauli product measurements), so, in principle, any of the data blocks discussed in the previous section can be used for this purpose. However, magic state distillation is repeated extremely often for large-scale quantum computation, so it is worth optimizing these protocols.

Here, we discuss a general procedure that can be applied to any distillation protocol based on an error-correcting code with transversal T gates, such as punctured Reed-Muller codes [16, 17] or block codes [18–20]. To show the general structure of such a protocol, we go through the example of 15-to-1 distillation [16], i.e., a protocol that uses 15 faulty magic states to distill a single higher-fidelity state.

3.1 15-to-1 distillation

The 15-to-1 protocol is based on a quantum error-correcting code that uses 15 qubits to encode a single logical qubit with code distance 3. The reason why this can be used for magic state distillation is that, for this code, a physical T gate on every physical qubit corresponds to a logical T gate (actually T†) on the encoded qubit, which is called a transversal T gate. The general structure of a distillation circuit based on a code with transversal T gates is shown in Fig. 14 for the example of 15-to-1. It consists of four parts: an encoding circuit, transversal T gates, decoding and measurement.

The circuit begins with 5 qubits initialized in the |+⟩ state and 10 qubits in the |0⟩ state. Qubits 1-4, 5 and 6-15 are associated with the four X stabilizers, the logical X operator, and the ten Z stabilizers of the code. The first five operations are multi-target CNOTs that correspond to the code's encoding circuit. They map the X Pauli operators of qubits 1-4 onto the code's X stabilizers, the X Pauli of qubit 5 onto the logical X operator and the Z operators of qubits 6-15 onto the code's Z stabilizers. Because we start out with +1-eigenstates of X and Z, this circuit prepares the simultaneous stabilizer eigenstate corresponding to the logical |+⟩_L state. Next, a transversal T gate is applied, transforming the logical state to T_L |+⟩_L (actually to T_L† |+⟩_L). Note that the 15 Z_{π/8} rotations are potentially faulty. Finally, the encoding circuit is reverted, shifting the logical qubit information back into qubit 5, and the information about the X and Z stabilizers into qubits 1-4 and 6-15. If no errors occurred, qubit 5 is now a magic state T|+⟩ (actually T†|+⟩). In order to detect whether any of the 15 π/8 rotations were affected by an error, qubits 1-4 and 6-15 are measured in the X and Z basis, respectively, effectively measuring the stabilizers of the code. Since the code distance is 3, up to two errors can be detected, which will yield a -1 measurement outcome on some stabilizers. If any error is detected, all qubits are discarded and the distillation protocol is restarted. This way, if the error probability of each of the 15 T gates is p, the error probability of the output state is reduced to 35p^3 to leading order. In other words, this protocol takes 15 magic states with error probability p, and outputs a single magic state with an error of 35p^3.



Figure 15: 15-to-1 distillation circuit that uses 5 qubits and 11 π/8 rotations.
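The five-qubit circuit of Fig. 15 can be read off column by column from the stabilizer matrix given as Eq. (1): every column defines one Z-type π/8 rotation, with a Z on each row that has a 1. The sketch below performs this translation for the 15-to-1 matrix. The function name `rotation_axes` and the string encoding of Pauli products are illustrative choices, not notation from the paper.

```python
# Rows: four X stabilizers followed by the logical X operator (Eq. (1)).
M_15_TO_1 = [
    [0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 1, 1],
    [1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1],
]


def rotation_axes(matrix):
    """Return one Pauli string per column: '1' (identity) for 0, 'Z' for 1."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return ["".join("Z" if matrix[r][c] else "1" for r in range(n_rows))
            for c in range(n_cols)]


axes = rotation_axes(M_15_TO_1)
# Single-qubit rotations can be absorbed into the initial states.
absorbed = [axis for axis in axes if axis.count("Z") == 1]
# The remaining multi-qubit rotations are the ones drawn in the circuit.
explicit = [axis for axis in axes if axis.count("Z") > 1]
```

In this encoding the first explicit rotation is `11ZZZ`, i.e., (1 ⊗ 1 ⊗ Z ⊗ Z ⊗ Z)_{π/8}, and the count of explicit rotations matches the 11 rotations named in the caption of Fig. 15.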

Simplifying the circuit. Using the commutation rules of Fig. 4b, we can commute the first set of multi-target CNOTs to the right. This maps the Z_{π/8} rotations onto Z-product π/8 rotations. Since controlled-Pauli gates satisfy C(P1, P2) = C(P1, P2)†, the multi-target CNOTs of the encoding circuit precisely cancel the multi-target CNOTs of the decoding circuit, leaving a circuit of 15 Z-type π/8 rotations in Fig. 14.

Note that qubits 6-15 in this circuit are entirely redundant. They are initialized in a Z eigenstate, are then part of a Z-type rotation, and are finally measured in the Z basis, trivially yielding the outcome +1. Since they serve no purpose, they can simply be removed to yield the five-qubit circuit in Fig. 15, where we have absorbed the single-qubit π/8 rotations into the initial |+⟩ states and rearranged the remaining 11 rotations.

This kind of circuit simplification is equivalent to the space-time trade-offs mentioned in Ref. [17] and can be applied to any protocol that is based on a code with transversal T gates. In general, a code with m_x X stabilizers that uses n qubits to encode k logical qubits yields a circuit of n − m_x π/8 rotations on m_x + k qubits. Each of the m_x + k qubits is associated either with an X stabilizer or with one of the k logical qubits. For each of the n qubits of the code, the circuit contains one π/8 rotation with an axis that has a Z on each stabilizer or logical X operator that this qubit is part of. In order to more easily determine the n − m_x rotations, it is useful to write down an n × (m_x + k) matrix (n columns, m_x + k rows) that shows the X stabilizers and logical X operators of the code. For 15-to-1, such a matrix could look like this:

\[
M_{15\text{-to-}1} = \left(\begin{array}{*{15}{c}}
0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
\hline
0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right) \tag{1}
\]

Each of the first four rows describes one of the four X stabilizers of the code, where 0 stands for the identity 1 and 1 stands for X. For instance, the first row indicates that the first X stabilizer of this 15-qubit code is 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ 1 ⊗ 1 ⊗ 1 ⊗ 1 ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X. The rows below the horizontal bar (in this case the last row) show the logical X operators of the code. The circuit in Fig. 15 is then obtained by placing a |+⟩ state for each row and a π/8 rotation for each column, with the axis of rotation determined by the indices in the column: a 1 for each 0 and a Z for each 1. Note that, in Fig. 15, the first four rotations (columns) of Eq. (1) are absorbed by the initial states.

3.2 Triorthogonal codes

The aforementioned circuit translation can be applied to any code with transversal T gates. One particularly versatile and simple scheme to generate such codes is based on triorthogonal matrices [17, 18], which we briefly review in this section. The first step is to write down a triorthogonal matrix G, such as

\[
G = \left(\begin{array}{*{16}{c}}
1&1&1&1&1&1&1&1&1&1&1&1&1&1&1&1\\
0&0&0&0&0&0&0&0&1&1&1&1&1&1&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&1\\
0&0&1&1&0&0&1&1&0&0&1&1&0&0&1&1\\
0&1&0&1&0&1&0&1&0&1&0&1&0&1&0&1
\end{array}\right) . \tag{2}
\]

Triorthogonality refers to three criteria: i) The number of 1s in each row is a multiple of 8. ii) For each pair of rows, the number of entries where both rows have a 1 is a multiple of 4. iii) For each set of three rows, the number of entries where all three rows have a 1 is a multiple of 2. In other words,

\[
\begin{aligned}
\forall a &: \sum_i G_{a,i} = 0 \pmod 8 \\
\forall a, b &: \sum_i G_{a,i} G_{b,i} = 0 \pmod 4 \\
\forall a, b, c &: \sum_i G_{a,i} G_{b,i} G_{c,i} = 0 \pmod 2
\end{aligned} \tag{3}
\]

A general procedure based on classical Reed-Muller codes to obtain such matrices is described in Ref. [17].

After obtaining a triorthogonal matrix, such as the one in Eq. (2), the second step is to put it in a row



Figure 16: 20-to-4 distillation circuit that uses 7 qubits and 17 π/8 rotations.

echelon form by Gaussian elimination

\[
\tilde{G} = \left(\begin{array}{*{16}{c}}
0&0&0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&0&0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
0&0&1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
0&1&0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
1&0&0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right) . \tag{4}
\]

The last step is to remove one of the columns that contains a single 1, i.e., one of the first five columns, which is also called puncturing.² Puncturing an a × b triorthogonal matrix (a columns, b rows) k times yields a code encoding k logical qubits with m_x = b − k and n = a − k. The rows of the matrix after puncturing that contain an even number of 1s describe X stabilizers, whereas the rows with an odd number of 1s describe X logical operators. In terms of distillation protocols, a code described by such a matrix can be used for n-to-k distillation. Indeed, if we puncture the matrix in Eq. (4) once by removing the first column, we retrieve the 15-to-1 protocol of Eq. (1). We can also puncture it twice by removing the first two columns. This yields the matrix

\[
M_{14\text{-to-}2} = \left(\begin{array}{*{14}{c}}
0&0&1&0&0&0&0&1&1&1&1&1&1&1\\
0&1&0&0&1&1&1&0&0&0&1&1&1&1\\
1&0&0&1&0&1&1&0&1&1&0&0&1&1\\
0&0&0&1&1&0&1&1&0&1&0&1&0&1\\
0&0&0&1&1&1&0&1&1&0&1&0&0&1
\end{array}\right) , \tag{5}
\]

which describes a 14-to-2 protocol. The corresponding circuit can be simply read off from this matrix. It is almost identical to the 15-to-1 protocol of Fig. 15, except that the fourth qubit is initialized in the |+⟩ state and is not measured at the end of the circuit, but instead outputs a second magic state. However, because the code of 14-to-2 has a code distance of 2, the output error probability is higher, namely 7p^2 [18]. Puncturing the matrix G̃ any further would yield codes with a distance lower than 2, precluding them from detecting errors and improving the quality of magic states. In fact, the minimum number of qubits in triorthogonal codes was shown to be 14 [33].

² Even though this is commonly called puncturing, it would be perhaps more accurate to refer to this process as shortening (see, e.g., Ref. [32]), as was pointed out to me by a referee.

Semi-triorthogonal codes. There are also codes that are based on "semi-triorthogonal" matrices, where all three conditions of Eq. (3) are only satisfied modulo 2. One example is the matrix

\[
\left(\begin{array}{*{24}{c}}
0&0&0&0&0&0&1&1&0&0&1&1&0&1&1&1&0&1&1&0&1&1&0&1\\
0&0&0&0&0&1&0&1&0&1&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
0&0&0&0&1&0&0&1&1&0&0&1&1&0&1&0&1&1&0&1&1&0&1&1\\
0&0&0&1&0&0&0&0&1&1&1&1&0&1&0&0&0&0&0&0&0&0&1&1\\
0&0&1&0&0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&1&1&1&0&0\\
0&1&0&0&0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&0&0&0&0&0\\
1&0&0&0&0&0&0&0&1&1&1&1&1&0&1&1&0&0&0&0&0&0&0&0
\end{array}\right) . \tag{6}
\]

When this matrix is punctured four times, it yields a code that can be used for a 20-to-4 protocol. A scheme to generate such matrices for 3k+8-to-k distillation is shown in Ref. [18]. For the case of the 20-to-4 protocol, the matrix that describes the code

\[
M_{20\text{-to-}4} = \left(\begin{array}{*{20}{c}}
0&0&1&1&0&0&1&1&0&1&1&1&0&1&1&0&1&1&0&1\\
0&1&0&1&0&1&0&1&1&1&0&1&1&0&1&1&0&1&1&0\\
1&0&0&1&1&0&0&1&1&0&1&0&1&1&0&1&1&0&1&1\\
0&0&0&0&1&1&1&1&0&1&0&0&0&0&0&0&0&0&1&1\\
0&0&0&0&1&1&1&1&0&0&0&0&0&0&0&1&1&1&0&0\\
0&0&0&0&1&1&1&1&0&0&0&0&1&1&1&0&0&0&0&0\\
0&0&0&0&1&1&1&1&1&0&1&1&0&0&0&0&0&0&0&0
\end{array}\right) \tag{7}
\]

can be straightforwardly translated into the circuit in Fig. 16. While semi-triorthogonal codes can be used the same way for distillation as properly triorthogonal codes, their caveat is that a Clifford correction may be required. This correction can be obtained by adding columns to the semi-triorthogonal matrix until it becomes properly triorthogonal, e.g., by adding the



[Figure 17 panels: (a) Selective π/4 rotation; (b) Auto-corrected π/8 rotation; (c) Implementation of the 15-to-1 circuit in Fig. 15 (Steps 1, 2, 3, ..., 22, 23, finishing after 11 time steps); (d) Implementation of the 20-to-4 circuit in Fig. 16 (Steps 1, 2, ..., 34, 35, finishing after 17 time steps)]

Figure 17: Implementation of the 15-to-1 and 20-to-4 distillation protocols in our framework. Each time step in (c) and (d)
corresponds to an auto-corrected π/8 rotation (b), which in turn is based on selective π/4 rotations (a).

columns of the matrix

\[
M_{\text{Clifford correction}} = \left(\begin{array}{*{8}{c}}
0&0&0&0&1&1&1&1\\
0&0&1&1&0&0&1&1\\
1&1&0&0&0&0&1&1\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0\\
0&0&0&0&0&0&0&0
\end{array}\right) \tag{8}
\]

to the matrix of Eq. (7). Since the additional columns come in pairs, this Clifford correction always consists of Z-type π/4 rotations [18].

In this case, the correction consists of four π/4 rotations on the first three qubits, effectively changing the first (Z ⊗ Z ⊗ Z)_{π/8} rotation to a (Z ⊗ Z ⊗ Z)_{−π/8} rotation, and the initial magic states to |m⟩ = |0⟩ + e^{−iπ/4}|1⟩ states. The probability of any of the four output states being affected by an error is 22p^2. When treating this output error rate as 5.5p^2 per magic state, one should take into account that, for multiple output states, errors can be correlated. Note that 3k+8-to-k protocols can be modified to 3k+4-to-k [33–35].

3.3 Surface-code implementation

Having outlined the general structure of distillation protocols, we now discuss their implementation with surface codes. Distillation protocols are particularly simple quantum circuits, since they exclusively consist of Z-type π/8 rotations. Therefore, we can use a construction similar to the compact data block, and still only require 1 time step per rotation.

Because distillation circuits are relatively short, it is useful to avoid the Clifford corrections of Fig. 7 that may be required with 50% probability after a magic state is consumed. These corrections slow down the protocol, because they change the final X measurements to Pauli product measurements. Instead, we use a circuit which consumes a magic state and automatically performs the Clifford correction. It is based on the selective π/4 rotation circuit in Fig. 17a. To perform a P_{π/4} rotation according to the circuit in Fig. 11b, a |0⟩ state is initialized and P ⊗ Y is measured, which takes 1 time step. However, the π/4 rotation is only performed if the |0⟩ qubit is measured in X afterwards. If, instead, it is measured in Z, the qubit is simply discarded without performing any operation. In other words, the choice of measurement basis determines whether a P_{π/4} or an identity (1) operation is performed. This can be used to construct the circuit in Fig. 17b. Here, the first step to perform a P_{π/8} gate is to measure P ⊗ Z between the qubits and a magic state |m⟩, and Z ⊗ Y between |m⟩ and |0⟩. These two measurements commute and can be performed simultaneously. If the outcome of the first measurement



is +1, no Clifford correction is required and |0⟩ is read out in Z. If the outcome is -1, |0⟩ is measured in X, yielding the required Clifford correction.

This can be used to implement the 15-to-1 protocol of Fig. 15 in 11 time steps using 11 tiles, as shown in Fig. 17c. Four qubits are initialized in |m⟩, and a fifth in |+⟩. A 2 × 2 block of tiles to the left is reserved for the |m⟩ and |0⟩ qubits of the auto-corrected π/8 rotations. Two additional tiles are used for the ancilla of the multi-patch measurement. In step 2, the first π/8 rotation (1 ⊗ 1 ⊗ Z ⊗ Z ⊗ Z)_{π/8} is performed. Depending on the measurement outcome of step 2, the |0⟩ ancilla is read out in the X or Z basis. This is repeated 11 times, once for each of the 11 rotations in Fig. 15. Finally, in step 23, qubits 1-4 are measured in X. If all four outcomes are +1, the distillation protocol yields a distilled magic state in tile 5. Since 11 tiles are used for 11 time steps, the space-time cost is 121d^3 in terms of (physical data qubits)·(code cycles) to leading order. Similarly, the 20-to-4 protocol of Fig. 16 is implemented in Fig. 17d using 14 tiles for 17 time steps, i.e., with a leading-order space-time cost of 238d^3.

Caveat. Even though our leading-order estimate of the time cost of 11d code cycles for 15-to-1 or 17d code cycles for 20-to-4 is correct, the full time cost also contains contributions that do not scale with d. The two processes that may require special care in the magic state distillation protocol are state injection and classical processing. Every time step requires the initialization of a magic state and a short classical computation to determine whether the |0⟩ state needs to be measured in X or Z. While neither of these processes scales with d, they can slow down the distillation protocol, depending on the injection scheme and the control hardware that is used. This slowdown can be avoided by using additional 2 × 2 blocks of |0⟩-|m⟩ pairs, as shown in Fig. 18 for 15-to-1 distillation with one additional block. Here, the left and right block can be used in an alternating fashion, i.e., the left block for rotations 1, 3, 5, . . . and the right block for rotations 2, 4, 6, . . . While one block is being used for a rotation, the other one can be used to prepare a new magic state and to process the measurement outcomes of the previous rotation.

General space-time cost. The scheme of Fig. 17 can be used to implement any protocol based on a triorthogonal code. For an n-qubit code with k logical qubits and m_x X stabilizers, the protocol uses 1.5(m_x + k) + 4 tiles for (n − m_x) time steps. In this time, it distills k magic states with a success probability of ∼(1 − p)^n, since any error will result in failure. Therefore, such a protocol distills k magic states on average every (n − m_x)/(1 − p)^n time steps. Thus, the space-time cost per magic state is

\[
\mathrm{cost}(n, m_x, k, p, d) = \frac{[1.5(m_x + k) + 4](n - m_x)}{k(1 - p)^n}\, d^3 . \tag{9}
\]

In order to minimize the space-time cost for distillation in our framework, one should pick a distillation protocol that minimizes this quantity for a given input and target error rate.

3.4 Benchmarking

We can use the previously described 15-to-1 and 20-to-4 schemes to benchmark our implementations. In Ref. [36], these schemes were implemented with lattice surgery and their cost compared to implementations based on braiding of hole defects. In addition, the 7-to-1 scheme was considered, which is a scheme to distill |Y⟩ states. The distillation of these states is not necessary in our framework, but for benchmarking purposes we show the 7-to-1 protocol in Appendix D. It can be implemented using 7 tiles for 4 time steps, i.e., with a space-time cost of 28d^3.

We summarize the leading-order space-time costs of the three protocols in Table 1. The comparison shows drastic reductions in space-time cost compared to schemes based on braiding of hole defects and compared to other approaches to optimizing lattice surgery. Compared to the braiding-based scheme, the space-time cost of 7-to-1, 15-to-1 and 20-to-4 is reduced by 60%, 84% and 90%, respectively.
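The cost function of Eq. (9) is simple enough to transcribe directly, which makes it easy to compare candidate protocols for a given input error rate. The sketch below is such a transcription. The function name `distillation_cost` is hypothetical, and the worked layouts in the text quote integer tile counts (for instance 11 tiles for 15-to-1), so their costs need not coincide exactly with this leading-order expression.

```python
def distillation_cost(n, m_x, k, p, d):
    """Space-time cost per magic state following Eq. (9).

    n:   number of qubits of the code (input magic states)
    m_x: number of X stabilizers
    k:   number of logical qubits (output magic states)
    p:   error probability of each input magic state
    d:   code distance
    The result is in units of (physical data qubits) * (code cycles).
    """
    tiles = 1.5 * (m_x + k) + 4          # space cost in tiles
    time_steps = n - m_x                 # one time step per explicit rotation
    success = (1 - p) ** n               # any error leads to a restart
    return tiles * time_steps / (k * success) * d ** 3


# Example: 15-to-1 (n=15, m_x=4, k=1) and 20-to-4 (n=20, m_x=3, k=4)
# in units of d^3, with and without input errors.
cost_15_ideal = distillation_cost(15, 4, 1, p=0.0, d=1)
cost_15_noisy = distillation_cost(15, 4, 1, p=1e-3, d=1)
cost_20_ideal = distillation_cost(20, 3, 4, p=0.0, d=1)
```

Dividing by the success probability models the average number of attempts needed per accepted output, which is why the noisy cost is always at least as large as the ideal one.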
Figure 18: Two 2 × 2 ancilla blocks can be used to prevent state injection and classical processing from slowing down the 15-to-1 protocol.

                          7-to-1     15-to-1    20-to-4
Hole braiding [20, 37]    70d^3      750d^3     2344d^3
Lattice surgery [36]      140d^3     540d^3     1134d^3
Our framework             28d^3      121d^3     238d^3

Table 1: Comparison of the leading-order space-time cost of 7-to-1, 15-to-1 and 20-to-4 with defect-based schemes, optimized lattice surgery in Ref. [36] and our schemes. The space-time cost is in terms of (physical data qubits)·(code cycles).
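The percentages quoted for Table 1 follow from the table entries, and the entries of the last row follow from the tile counts and time steps given for each layout. A short sketch that recomputes both is shown below. The dictionary layout and names are illustrative only, and all costs are in units of d^3.

```python
# Leading-order space-time costs from Table 1, in units of d^3.
TABLE_1 = {
    "7-to-1": {"hole_braiding": 70, "lattice_surgery": 140, "this_work": 28},
    "15-to-1": {"hole_braiding": 750, "lattice_surgery": 540, "this_work": 121},
    "20-to-4": {"hole_braiding": 2344, "lattice_surgery": 1134, "this_work": 238},
}

# (tiles, time steps) of the layouts described in the text.
LAYOUTS = {"7-to-1": (7, 4), "15-to-1": (11, 11), "20-to-4": (14, 17)}


def reduction_percent(old, new):
    """Relative reduction from old to new, in percent."""
    return 100 * (1 - new / old)


reductions = {
    name: round(reduction_percent(row["hole_braiding"], row["this_work"]))
    for name, row in TABLE_1.items()
}
layout_costs = {name: tiles * steps for name, (tiles, steps) in LAYOUTS.items()}
```

The same helper can be applied to the `lattice_surgery` column to compare against the optimized lattice-surgery schemes of Ref. [36].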

Figure 19: 176-tile block that can be used for 225-to-1 distillation. The qubits highlighted in red are used for the second level of
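The 225-to-1 block of Fig. 19 concatenates two levels of 15-to-1 distillation. The sketch below recomputes the leading-order output error of the concatenated protocol and the effect of a finite level-1 success probability on the space-time cost. It assumes the simple model that the expected cost is the ideal cost divided by the level-1 success probability. The function names are hypothetical.

```python
def fifteen_to_one_error(p):
    """Leading-order output error of one round of 15-to-1 distillation."""
    return 35 * p ** 3


def concatenated_error(p, levels):
    """Feed the output error of each level into the next level."""
    error = p
    for _ in range(levels):
        error = fifteen_to_one_error(error)
    return error


def expected_cost(tiles, time_steps, success_probability):
    """Tiles times time steps, inflated by the expected number of attempts."""
    return tiles * time_steps / success_probability


p = 1e-3
two_level_error = concatenated_error(p, levels=2)   # 35 * (35 p^3)^3
level1_success = (1 - p) ** 15                      # one level-1 block succeeds
ideal_cost = expected_cost(176, 15, 1.0)            # in units of d^3
adjusted_cost = expected_cost(176, 15, level1_success)
```

For p = 10^{-3} this gives an output error of about 1.5 · 10^{-21}, and the adjusted cost stays within a few percent of the ideal 176 tiles × 15 time steps.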
the distillation protocol. The blue ancilla is used to move level-1 magic states into the two |mi-|0i blocks of the level-2 distillation.

3.5 Higher-fidelity protocols

So far, we have only explicitly discussed protocols that reduce the input error to ∼p^2 or ∼p^3. There are two strategies to obtain protocols with a higher output fidelity: concatenation and higher-distance codes.

Concatenation. In the 15-to-1 protocol, we use 15 undistilled magic states to obtain a distilled magic state with an error rate of 35p^3. If we perform the same protocol, but use 15 distilled magic states from previous 15-to-1 protocols as inputs, the output state will have an error rate of 35(35p^3)^3 = 1500625p^9. This corresponds to a 225-to-1 protocol obtained from the concatenation of two 15-to-1 protocols. It is also possible to concatenate protocols that are not identical. Strategies to combine high-yield and low-yield protocols are discussed in Ref. [18].

In Fig. 19, we show an unoptimized block that can be used for 225-to-1 distillation. It consists of 11 15-to-1 blocks that are used for the first level of distillation. Since each of these 11 blocks takes 11 time steps to finish, they can be operated such that exactly one of these blocks finishes in every time step. Therefore, in every time step, one first-level magic state can be used for second-level distillation by moving it into one of the two level-2 |m⟩-|0⟩ blocks via the blue ancilla. The qubits that are used for the second level are highlighted in red. Note that since, for the second level, the single-qubit π/8 rotations require distilled magic states, the 15-to-1 protocol of Fig. 15 requires 15 rotations instead of just 11. Therefore, the entire protocol finishes in 15 time steps using 176 tiles with a total space-time cost of 2640d^3.

It should be noted that, since lower-level distillation blocks produce magic states with low fidelity, there is no benefit in using the full code distance to produce these states. The space-time cost of concatenated protocols can be reduced significantly by running the lower-level distillation blocks at a reduced code distance (see, e.g., Refs. [12, 38]), using smaller patches and fewer code cycles. The exact code distance that should be used depends on the protocol and the desired output fidelity.

Higher-distance codes. Alternatively, we can use a code that produces higher-fidelity states. In Ref. [17], several protocols based on punctured Reed-Muller codes are discussed. One of these protocols is a 116-to-12 protocol based on a code with n = 116, k = 12 and m_x = 17. It yields 12 magic states which each have an error rate of 41.25p^4. According to Eq. (9), this protocol can be implemented using 44 tiles for 99 time steps with a space-time cost of 363d^3 per output state and a success probability of (1 − p)^116. For protocols with a high space cost such as 116-to-12, the space-time cost can be slightly reduced by introducing additional ancilla space, such that two operations can be performed simultaneously. One possible configuration is shown in Fig. 20. This increases the space cost to 81 tiles, but reduces the time cost to 50 time steps, with a total space-time cost of 337.5d^3 per output state.

Output-to-input ratio is not everything. A popular figure of merit when comparing n-to-k distillation



protocols is the ratio k/n. One of the protocols in Ref. [17] is a 912-to-112 protocol with n = 912, k = 112 and m_x = 64, which yields 112 output states, each with an error rate of 10.63p^6. While the output fidelity is not as high as for 225-to-1, the output-to-input ratio is much higher. For p = 10^{-3}, the output error rate of 225-to-1 is ∼1.5 × 10^{-21}, while it is only ∼10^{-17} for 912-to-112. Therefore, if output-to-input ratio were a good figure of merit, we would expect the 912-to-112 protocol to be considerably less costly compared to 225-to-1. If we use an implementation in the spirit of Fig. 20, the space cost is roughly 2.5(m_x + k) tiles and the protocol takes (n − m_x)/2 time steps. Thus, 912-to-112 uses 440 tiles for 424 time steps. This would put the space-time cost per state at 1665d^3, which is indeed lower than that of 225-to-1. However, the success probability of 912-to-112 for p = 10^{-3} is only at ∼40%, which more than doubles the actual space-time cost. On the other hand, the space-time cost of 225-to-1 is barely affected by the success probability, as each of the level-1 15-to-1 blocks finishes with 98.5% success probability. This means that, with 1.5% probability, a time step of 225-to-1 is skipped, since the necessary level-1 state is missing. This only increases the space-time cost from 2640d^3 to 2680d^3. Even without further decreasing the space-time cost of 225-to-1 by reducing the code distance of the level-1 distillation blocks, this indicates that the output-to-input ratio is not a good figure of merit in our framework.

Figure 20: 81-tile block that can be used for the 116-to-12 protocol. Here, two π/8 rotations can be performed at the same time, where one rotation uses the ancilla space denoted as ancilla 1, and the other one uses ancilla 2.

Summary. The class of magic state distillation protocols that are based on an n-qubit error-correcting code with m_x X stabilizers and k logical qubits can be implemented using 1.5(m_x + k) + 4 tiles and n − m_x time steps. Such protocols output k magic states with a success probability of (1 − p)^n. Therefore, if the input fidelity and desired output fidelity are known, the distillation protocol should minimize the cost function given in Eq. (9).

4 Trade-offs limited by T count

Having discussed data blocks and distillation blocks in the previous two sections, we are now ready to piece them together to a full quantum computer. In order to illustrate the steps that are necessary to calculate the space and time cost of a computation and to trade off space against time, we consider an example computation with a T count of 10^8 and a T depth of 10^6. We consider two different scenarios: an error rate of p = 10^{-3} and an error rate of p = 10^{-4}. The error rate determines how many physical qubits are required per logical qubit and which distillation protocol should be used. It is only a meaningful number if we specify an error model for the physical qubits and undistilled magic states. We will assume circuit-level noise for the physical qubits, i.e., faulty qubits, gates and measurements. The error model for undistilled magic states depends on the specific state-injection protocol. We will assume that raw magic states are affected by random Pauli errors with probability p. To calculate concrete numbers, we assume that the quantum computer can perform a code cycle every 1 µs. We want to perform the 10^8-T-gate computation in a way that the probability of any one of the T gates being affected by an error stays below 1%. In addition, we require that the probability of an error affecting any of the logical qubits encoded in surface-code patches stays below 1%. This results in a 2% chance that the quantum computation will yield a wrong result. In order to exponentially increase the precision of the computation, it can be repeated multiple times or run in parallel on multiple quantum computers.

4.1 Step 1: Determine distillation protocol

The first step is to determine which distillation protocol is sufficient for the computation. In order to stay below 1% error probability with 10^8 T gates, each magic state needs to have an error rate below 10^{-10}. For p = 10^{-4}, the 15-to-1 protocol is sufficient, since it yields an output error rate of 35p^3 = 3.5 · 10^{-11}. For p = 10^{-3}, 15-to-1 is not enough. On the other hand, two levels of 15-to-1, i.e., 225-to-1, yield magic states with an error rate of 1.5 · 10^{-21}, which is many orders of magnitude lower than what is required. A less costly protocol is 116-to-12, which yields output states with an error rate of 41.25p^4 = 4.125 · 10^{-11}, which suffices for our purposes.

4.2 Step 2: Construct a minimal setup

In order to determine the necessary code distance, we first construct a minimal setup, i.e., a configuration of tiles that can be used for the computation and uses as little space as possible. The reason why this is useful

Accepted in Quantum 2019-02-01, click title to verify 16


Figure 21: Minimal setups using compact data blocks for $p = 10^{-4}$ (with 15-to-1 distillation) and $p = 10^{-3}$ (with 116-to-12 distillation). (a) Minimal setup for $p = 10^{-4}$. (b) Minimal setup for $p = 10^{-3}$. Blue tiles are data block tiles, orange tiles are distillation block tiles, green tiles are used for magic state storage and gray tiles are unused tiles.

Figure 22: Intermediate setups using intermediate data blocks and two 15-to-1 distillation blocks for $p = 10^{-4}$ or one compact 116-to-12 distillation block for $p = 10^{-3}$. (a) Intermediate setup for $p = 10^{-4}$. (b) Intermediate setup for $p = 10^{-3}$.
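The protocol choice of Step 1 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, not part of the original analysis: the function names and the ordering of candidates from cheapest to most expensive are assumptions made for illustration, while the output error rates ($35p^3$, $41.25p^4$ and two concatenated levels of 15-to-1) are the ones quoted in the text.

```python
# Output error rates quoted in the text (leading order in the input error p).
def error_15_to_1(p):
    return 35 * p**3


def error_116_to_12(p):
    return 41.25 * p**4


def error_225_to_1(p):
    # Two concatenated levels of 15-to-1.
    return error_15_to_1(error_15_to_1(p))


# Assumed ordering from least to most costly; the text only states that
# 116-to-12 is less costly than 225-to-1.
CANDIDATES = [
    ("15-to-1", error_15_to_1),
    ("116-to-12", error_116_to_12),
    ("225-to-1", error_225_to_1),
]


def choose_protocol(p, t_count, failure_budget=0.01):
    """Return the first candidate whose output error is below budget / T count."""
    target = failure_budget / t_count
    for name, rate in CANDIDATES:
        out = rate(p)
        if out < target:
            return name, out
    raise ValueError("none of the listed protocols is sufficient")


if __name__ == "__main__":
    for p in (1e-4, 1e-3):
        print(p, choose_protocol(p, 10**8))
```

With $10^8$ T gates and a 1% budget, the target per magic state is $10^{-10}$, so the sketch selects 15-to-1 for $p = 10^{-4}$ and 116-to-12 for $p = 10^{-3}$, in line with the text.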

to determine the code distance is that the initial space-time trade-offs that we discuss significantly improve the overall space-time cost. Therefore, the minimal setup can be used to comfortably upper-bound the required code distance.

For $p = 10^{-4}$, a minimal setup consists of a compact data block and a 15-to-1 distillation block, see Fig. 21a. The compact block stores 100 qubits in 153 tiles and requires up to 9 time steps to consume a magic state. The 15-to-1 distillation block uses 11 tiles and outputs a magic state every 11 time steps with 99.9% success. To ensure that the tile of the distillation block that is occupied by qubit 5 is not blocked during the first time step of the distillation protocol, the first π/8 rotation of the protocol should be chosen such that it does not involve qubit 5, e.g., the fourth rotation of Fig. 15. In total, this minimal setup uses 164 tiles and performs a T gate every 11 time steps, i.e., finishes the computation in $11 \cdot 10^8$ time steps.

For $p = 10^{-3}$, a minimal setup consists of a compact data block and a 116-to-12 distillation block, as shown in Fig. 21b. For the minimal setup, we do not use the larger and faster distillation block shown in Fig. 20, but instead a block in the spirit of the 15-to-1 block. This 116-to-12 distillation block uses 44 tiles and distills 12 magic states in 99 time steps with 89% success probability, i.e., on average one state every 9.27 time steps. Because this distillation protocol outputs magic states in bursts, i.e., 12 at the same time, these states need to be stored before being consumed. Therefore, we introduce additional storage tiles (green tiles in Fig. 21b). Here, we choose the 12 output states to be qubits 6, 8, 10, ..., 26 and 27. In the last step of the protocol these states are moved into the green space, where they are consumed by the data block one after the other. This minimal setup uses 153 tiles for the data block, 44 tiles for the distillation block and 13 tiles for storage. In total, it uses 210 tiles and finishes the computation in $9.27 \cdot 10^8$ time steps.

4.3 Step 3: Determine code distance

Since each tile corresponds to $d \times d$ physical data qubits and each time step corresponds to $d$ code cycles, 164 encoded logical qubits need to survive for $(11 \cdot 10^8)d$ code cycles for the minimal setup with $p = 10^{-4}$. The probability of a single logical error on any of these 164 qubits needs to stay below 1% at the end of the computation. The logical error rate per logical qubit per code cycle can be approximated [12] as

$$p_L(p, d) = 0.1\,(100p)^{(d+1)/2} \qquad (10)$$

for circuit-level noise. Therefore, the condition to determine the required code distance is

$$164 \cdot 11 \cdot 10^8 \cdot d \cdot p_L(10^{-4}, d) < 0.01 . \qquad (11)$$

For distance $d = 11$, the final error probability is at 19.8%, which is too high. Distance $d = 13$ is sufficient, with a final error probability of 0.2%. The number of physical qubits used in the minimal setup can be calculated



Figure 23: Fast setups using fast data blocks and 11 15-to-1 distillation blocks for $p = 10^{-4}$ or 5 116-to-12 distillation blocks for $p = 10^{-3}$. (a) Fast setup for $p = 10^{-4}$. (b) Fast setup for $p = 10^{-3}$. Legend: distillation block, storage tiles, fast data block, unused tiles.
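The code-distance condition of Step 3 lends itself to a short numerical check. The sketch below is illustrative only: the function names are hypothetical, the search over odd distances starting at 3 is an assumption, and the tile counts, time steps and 1 µs code cycle are the values quoted in the text for the two minimal setups.

```python
def logical_error_rate(p, d):
    """Eq. (10): logical error rate per logical qubit per code cycle."""
    return 0.1 * (100 * p) ** ((d + 1) / 2)


def final_error(tiles, time_steps, p, d):
    """Left-hand side of Eqs. (11)/(12): tiles * code cycles * p_L."""
    return tiles * time_steps * d * logical_error_rate(p, d)


def min_distance(tiles, time_steps, p, budget=0.01, d_max=99):
    """Smallest odd distance that satisfies the error budget (assumed search)."""
    for d in range(3, d_max + 1, 2):
        if final_error(tiles, time_steps, p, d) < budget:
            return d
    raise ValueError("no distance up to d_max satisfies the budget")


def resources(tiles, time_steps, d, cycle_time_us=1.0):
    """Physical qubits (2 d^2 per tile) and runtime in hours."""
    qubits = tiles * 2 * d * d
    hours = time_steps * d * cycle_time_us * 1e-6 / 3600
    return qubits, hours


if __name__ == "__main__":
    # (tiles, time steps, p) of the two minimal setups described in the text.
    for tiles, steps, p in ((164, 11e8, 1e-4), (210, 9.27e8, 1e-3)):
        d = min_distance(tiles, steps, p)
        print(p, d, final_error(tiles, steps, p, d), resources(tiles, steps, d))
```

Under these assumptions the search returns $d = 13$ for $p = 10^{-4}$ and $d = 27$ for $p = 10^{-3}$, the distances used in the text.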

as the number of tiles multiplied by $2d^2$, taking measurement qubits into account. The minimal setup for $p = 10^{-4}$ uses $164 \cdot 2 \cdot 13^2$ ≈ 55,400 physical qubits and finishes the computation in $13 \cdot 11 \cdot 10^8$ code cycles. With 1 µs per code cycle, this amounts to roughly 4 hours.

For $p = 10^{-3}$, the condition changes to

$$210 \cdot 9.27 \cdot 10^8 \cdot d \cdot p_L(10^{-3}, d) < 0.01 , \qquad (12)$$

which is satisfied for $d = 27$ with a final error probability of 0.5%. The final error probability for $d = 25$ is at 4.9%. Thus, the minimal setup uses $210 \cdot 2 \cdot 27^2$ ≈ 306,000 physical qubits and finishes the computation in $27 \cdot 9.27 \cdot 10^8$ code cycles, which amounts to roughly 7 hours. Note that, in principle, a success probability of more than 50% would be sufficient to reach arbitrary precision by repeating computations or running them in parallel. This means that the code distances that we consider may be higher than what is necessary.

4.4 Step 4: Add distillation blocks

Only a small fraction of the tiles of the minimal setup is used for magic state distillation, i.e., 6.7% for $p = 10^{-4}$ and 21% for $p = 10^{-3}$. On the other hand, adding one additional distillation block doubles the rate of magic state production, potentially doubling the speed of computation. Therefore, in order to speed up the computation and decrease the space-time cost, we add additional distillation blocks to our setup.

For $p = 10^{-4}$, adding one more distillation block reduces the time that it takes to distill a magic state to 5.5 time steps per state. However, the compact block can only consume magic states at 9 time steps per state. In order to avoid this bottleneck, we can use the intermediate data block instead, which occupies 204 tiles, but consumes one magic state every 5 time steps. With 22 tiles for distillation (see Fig. 22), this setup uses 226 tiles and finishes the computation after $5.5 \cdot 10^8$ time steps. This increases the number of qubits to 76,400, but reduces the computational time to 2 hours.

For $p = 10^{-3}$, the addition of a distillation block reduces the distillation time to 4.64 time steps. At this point, one should switch to the more efficient 116-to-12 block of Fig. 20, which uses 81 tiles and distills a magic state on average every 4.68 time steps. The intermediate data block cannot keep up with this distillation rate, but we can still use it to consume one magic state every 5 time steps instead of 4.68. Such a configuration uses 228 data tiles, 81 distillation tiles and 13 storage tiles, i.e., a total of 322 tiles corresponding to approximately 469,000 physical qubits. The computational time reduces to $5 \cdot 10^8$ time steps, i.e., 3.75 hours. Note that in Fig. 22b, the 12 output states of the 116-to-12 protocol should be chosen as 1, 3, 5, ..., 25. They can be moved into the green storage space in the last step of the protocol, since the space denoted as ancilla 2 in Fig. 20 is not being used in the last step.

Trade-offs down to 1 time step per T gate. Adding additional distillation blocks can reduce the time per T gate down to 1 time step. For $p = 10^{-4}$, 11 distillation blocks produce 1 magic state every time step. To consume these magic states fast enough, we need to use a fast data block. This fast block uses 231 tiles and the 11 distillation blocks together with their storage tiles use $11 \cdot 12 = 132$ tiles, as shown in Fig. 23a. With a total of 363 tiles, this setup uses 123,000 qubits and finishes the computation



in $10^8$ time steps, i.e., in 21 minutes and 40 seconds.

For $p = 10^{-3}$, parallelizing 5 distillation blocks produces a magic state every 0.936 time steps. This is faster than the fast block can consume the states, but allows for the execution of a T gate every time step. With 231 tiles for the fast block, 405 distillation tiles and 60 storage tiles, the total space cost is 696 tiles. The setup shown in Fig. 23b contains four unused tiles to make sure that all storage lines are connected to the data block. Storage lines need to be connected to the ancilla space of the data block either directly, via other storage lines or via unused tiles. In any case, this corresponds to roughly 1,020,000 physical qubits. The computation finishes after 45 minutes.

Avoiding the classical overhead. Every consumption of a magic state corresponds to a Pauli product measurement, the outcome of which determines whether a Clifford correction is required. This correction is commuted past the subsequent rotations, potentially changing the axis of rotation. Therefore, the computation cannot continue before the measurement outcome is determined. This involves a small classical computation to process the physical measurements (i.e., decoding and feed-forward), which could slow down the quantum computation. In order to avoid this, the magic state consumption can be performed using the auto-corrected π/8 rotations of Fig. 17b. Here, the classical computation merely determines whether the ancilla qubit, which we refer to as the correction qubit $|c\rangle$, is measured in the $X$ or $Z$ basis. While this classical computation is running, the magic state for the subsequent π/8 rotation can be consumed, as the auto-corrected rotation involves no Clifford correction. This means that distillation blocks should output $|m\rangle$-$|c\rangle$ pairs, for which we construct modified distillation blocks in the following section. If the classical computation is, on average, faster than 1 time step (i.e., $d$ code cycles), then classical processing does not slow down the quantum computation in the T-count-limited schemes.

Summary. Data blocks combined with distillation blocks can be used for large-scale quantum computing. The first step is to determine a sufficiently high-fidelity distillation protocol. Next, one constructs a minimal setup from a compact data block and a single distillation block to upper-bound the required code distance. Finally, one can trade off space against time by using fast data blocks and adding more distillation blocks. This can reduce the time per T gate down to 1 time step. In our example, the trade-off also reduces the space-time cost compared to the minimal setup by a factor of 5 for $p = 10^{-4}$ and by a factor of 2.8 for $p = 10^{-3}$. In order to fully exploit the space-time trade-offs discussed in this section, the input circuit should be optimized for T count.

5 Trade-offs limited by T depth

In the previous section, we parallelized distillation blocks to finish computations in a time proportional to the T count. In this section, we combine the previous constructions of data and distillation blocks to what we refer to as units. By parallelizing units, we exploit the fact that, in our example, the $10^8$ T gates are arranged in $10^6$ layers of 100 T gates to finish the computation in a time proportional to the T depth. We first slightly increase the space-time cost compared to the previous section, in order to speed up the computation down to one measurement per T layer. In this sense, we implement Fowler's time-optimal scheme [21].

5.1 T layer parallelization

The main concept used to parallelize T layers is quantum teleportation. The teleportation circuit is shown in Fig. 24a. It starts with the generation of a Bell pair $(|00\rangle + |11\rangle)/\sqrt{2}$ by the $Z \otimes Z$ measurement of $|+\rangle \otimes |+\rangle$. An arbitrary gate $U$ is performed on the second half of the Bell pair. Next, a qubit $|\psi\rangle$ and the first half of the Bell pair are measured in the Bell basis, i.e., in $X \otimes X$ and $Z \otimes Z$. After the measurement, the first two qubits are discarded and $|\psi\rangle$ is teleported to the third qubit through the gate $U$. This means that the output state is $U|\psi\rangle$ if the teleportation is successful. However, it is only successful if both Bell basis measurements yield a +1 outcome. In the other three cases, the teleported state is $UX|\psi\rangle$, $UY|\psi\rangle$ or $UZ|\psi\rangle$. Note that the correction operation to recover the state $U|\psi\rangle$ is not a Pauli operation $P$, but instead $UPU^\dagger$, which, in general, is as difficult to perform as $U$ itself.

Figure 24: (a) Teleportation circuit: circuit for quantum teleportation of $|\psi\rangle$ through a gate $U$. Only if both Bell basis measurements yield +1, the teleported state is $U|\psi\rangle$. If $Z \otimes Z = -1$, the state is $UX|\psi\rangle$. If $X \otimes X = -1$, the state is $UZ|\psi\rangle$. If both measurements yield −1, the state is $UY|\psi\rangle$. (b) Teleportation through a π/8 rotation: if $U$ is a π/8 rotation, the corrective Paulis change $P_{\pi/8}$ to $P_{-\pi/8}$.

If $U$ is a $P_{\pi/8}$ rotation, as in Fig. 24b, the Pauli errors change $P_{\pi/8}$ to $P_{-\pi/8}$ up to a Pauli correction. Since it is only after the Bell basis measurement that



Figure 25: Time-optimal implementation of a three-qubit quantum computation consisting of 9 T gates in 3 T layers. (a) Clifford+T circuit with T layers 1, 2 and 3. (b) Post-corrected π/8 rotation. (c) Time-optimal Clifford+T circuit. Post-corrected π/8 rotations (b) can be used to decide at a later point whether the performed operation was a $P_{\pi/8}$ or a $P_{-\pi/8}$ rotation.
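The statement that a Pauli byproduct of the teleportation turns a $P_{\pi/8}$ rotation into a $P_{-\pi/8}$ rotation, up to a Pauli correction, can be checked with explicit 2×2 matrices. The following is a minimal single-qubit sketch; it assumes the convention $P_\varphi = e^{-i\varphi P}$ for Pauli rotations, and the helper names are hypothetical. It verifies $P_\varphi Q = Q P_{-\varphi}$ when the byproduct $Q$ anticommutes with $P$, and $P_\varphi Q = Q P_\varphi$ when it commutes.

```python
import math

I2 = [[1, 0], [0, 1]]
X = [[0, 1], [1, 0]]
Y = [[0, -1j], [1j, 0]]
Z = [[1, 0], [0, -1]]


def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def rotation(pauli, phi):
    """exp(-i*phi*P) = cos(phi) I - i sin(phi) P, valid because P^2 = I."""
    c, s = math.cos(phi), math.sin(phi)
    return [[c * I2[i][j] - 1j * s * pauli[i][j] for j in range(2)]
            for i in range(2)]


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))


def effective_angle(pauli, byproduct, phi):
    """Return the angle phi' for which P_phi * Q equals Q * P_phi'."""
    lhs = matmul(rotation(pauli, phi), byproduct)
    for candidate in (phi, -phi):
        if close(lhs, matmul(byproduct, rotation(pauli, candidate))):
            return candidate
    raise ValueError("byproduct is not a Pauli operator")


if __name__ == "__main__":
    for name, q in (("I", I2), ("X", X), ("Y", Y), ("Z", Z)):
        print(name, effective_angle(Z, q, math.pi / 8))
```

For a $Z_{\pi/8}$ rotation, the byproducts $X$ and $Y$ flip the sign of the angle, while $I$ and $Z$ leave it unchanged, which is why the sign of the rotation is only known after the Bell basis measurement.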

we know whether we should have performed a $P_{\pi/8}$ or a $P_{-\pi/8}$ gate, we use post-corrected π/8 rotations in Fig. 25b, which are similar to the auto-corrected rotations of Fig. 17b. The post-corrected rotation uses a resource state consisting of two qubits, a magic state $|m\rangle$ and a second qubit that we refer to as a correction qubit $|c\rangle$. The resource state is generated by initializing $|c\rangle$ in $|0\rangle$ and measuring $Z \otimes Y$ between $|m\rangle$ and $|c\rangle$. In order to perform a post-corrected π/8 rotation, the resource state is consumed by measuring $P \otimes Z$ involving the magic state, and measuring $|m\rangle$ in $X$. The correction qubit $|c\rangle$ is stored for later use. It can be used at a later moment to decide whether the rotation should have been a +π/8 or −π/8 rotation by measuring $|c\rangle$ either in the $Z$ or $X$ basis. Depending on the measurement outcome, a Pauli correction may be required.

The time-optimal circuit. This can be used to execute multiple T layers simultaneously. If $U$ is a product of mutually commuting π/8 rotations, i.e., a T layer, the teleportation corrections replace all π/8 rotations with post-corrected rotations. An example is shown in Fig. 25 for a three-qubit computation of three T layers, where all three T layers are executed simultaneously. The reason why we can only group up T gates that are part of the same layer is that otherwise the Pauli corrections of the post-corrected rotation would not commute with the other rotations. The time-optimal circuit consists of three steps: the preparation of Bell pairs for each T layer, the application of T gates, and a set of final Bell measurements. At this point, the computation is not finished, as we still need to measure the correction qubits of the post-corrected rotations. Because these involve potential Pauli corrections, the correction qubits of the different T layers need to be measured one after the other. Thus, every T layer is executed one after the other, where each execution requires the time that it takes to measure the correction qubits and perform the classical processing to determine the next set of measurements from the Pauli corrections. We refer to this time as $t_m$. In other words, any Clifford+T circuit consisting of $n_L$ T layers can be executed in $n_L \cdot t_m$, independent of the code distance, which is the main feature of the time-optimal scheme [21].

The circuit in Fig. 25c naively requires $2n \cdot n_L$ qubits



Figure 26: An example of a time-optimal circuit using four units. In this case, each unit consists of six qubits, i.e., it is a three-qubit
quantum computation, where three T layers can be executed simultaneously.

for an $n$-qubit computation, which scales with the length of the computation. Since we only have a finite number of qubits at our disposal, our goal is to implement the circuit in Fig. 26 instead. Here, the qubits form groups of $2n$ qubits. We refer to each of these groups as a unit. Using $n_u$ units, $n_u - 1$ layers of T gates can be performed at the same time. In the circuit, the steps of Bell state preparation (BP), post-corrected T layer execution (T) and Bell basis measurement (BM) are performed repeatedly until the end of the computation. We refer to the block of operations (BP-T-BM) as unit preparation. Every time that unit preparation is finished, all qubits except for the correction qubits (not shown in Fig. 26) and half of the qubits of the last unit are discarded. At this point, the next set of unit preparations begins. Simultaneously, the correction qubits of the recently finished units are measured one after the other, which has a time cost of $(n_u - 1) \cdot t_m$. This means that the number of units can be increased to speed up the computation, until $(n_u - 1) \cdot t_m$ reaches the time $t_u$ that it takes to prepare a unit. At this maximum number of units $n_{\max} = t_u/t_m + 1$, a T layer is executed every $t_m$ and the computation cannot be sped up any further in the Clifford+T framework.

Note that the first and last unit differ from the other units. While all other units need to execute $n_T$ T gates every $t_u$, the first and last unit need to execute $n_T$ T gates only every $2t_u$, where $n_T$ is the number of T gates per layer. Furthermore, the other blocks need to be able to store up to $2n_T$ correction qubits, since, after the end of a unit preparation, $n_T$ correction qubits are stored, and may need to remain stored until the end of the next unit preparation. For the first and last block, on the other hand, the required storage space is halved.

In the following, we will show how to prepare units in our framework. We find that, for our examples, unit preparation takes 113 time steps. If $t_m = 1$ µs, then $n_{\max}$ is ∼1500 for $p = 10^{-4}$ and ∼3000 for $p = 10^{-3}$. Independently of the error rate, the computational time drops to one second.

5.2 Units

Units differ from the fast setups in Fig. 23 in three aspects. First, the number of qubits stored in the data block is doubled. Secondly, the distillation protocols are modified to output $|m\rangle$-$|c\rangle$ pairs, instead of just magic states $|m\rangle$. Thirdly, in order to store correction qubits $|c\rangle$, additional space is required. Contrary to magic-state storage tiles, correction-qubit storage tiles do not need to be connected to the data block's ancilla region.

Modified distillation blocks. In order to have distillation blocks output $|m\rangle$-$|c\rangle$ pairs, extra tiles and operations are required. We show the necessary modifications for the example of 15-to-1 and 116-to-12 distillation. A modified 15-to-1 block is shown in Fig. 27a. Apart from the standard 11 distillation tiles (orange) and one magic-state storage tile (green), it also contains 19 correction-qubit storage tiles (purple) and an additional tile (gray) that is used for neither distillation nor storage. The additional steps that modify the protocol are shown in Fig. 27c, which zooms into the highlighted region of Fig. 27a. In step 1 of the shown protocol, the distillation has just finished after 11 time steps. The patch of the output state is deformed in step 2, and an additional qubit $|c\rangle$ is initialized in the $|0\rangle$ state. The $Y \otimes Z$ operator between $|c\rangle$ and $|m\rangle$ is measured in step 3. In step 4, the correction qubit is sent to storage. Finally, in step 5, the magic state $|m\rangle$ is moved to its storage tile. This operation blocks one of the orange tiles that is used for the distillation protocol for 4 time steps. Still, this does not slow down 15-to-1 distillation, since the first 4 rotations of the protocol in Fig. 15 can be chosen such that the output qubit is not needed. Therefore, the modified distillation block outputs one $|m\rangle$-$|c\rangle$ pair every 11 time steps.

For 116-to-12 distillation, a modified block is shown in Fig. 27b. We arrange the qubits such that the 12 output states are found in the positions shown in step 1 of Fig. 27d. Using 2 time steps, correction qubits are prepared and $Y \otimes Z$ operators are measured. Finally, the patches are deformed back to square patches and all magic states are sent to the green storage, while all correction qubits are sent to the purple storage. This adds 3 time steps to the protocol, meaning that this block outputs 12 $|m\rangle$-$|c\rangle$ pairs every 53 time steps with a success probability of $(1 - p)^{116}$. For $p = 10^{-3}$, this corresponds to one output every 4.96 time steps.

As mentioned in Sec. 4, modified distillation blocks can also be used with setups in which T gates are performed one after the other, in order to deal with slow classical processing. In this case, only one correction-qubit storage tile per magic state is required.

Figure 27: Modified 15-to-1 distillation blocks (a) output a $|m\rangle$-$|c\rangle$ pair every 11 time steps. After the end of the distillation protocol, four additional steps (c) are necessary. The modified 116-to-12 distillation block (b) finishes after 53 time steps, due to the three additional steps in (d). Panels: (a) modified 15-to-1 block; (b) modified 116-to-12 block; (c) modified 15-to-1 protocol, steps 1 to 5 at time steps 11 to 15; (d) modified 116-to-12 protocol, steps 1 to 3 at time steps 50, 52 and 53.

Figure 28: Bell basis measurement (BM) in 2 time steps. Panels: steps 1 to 5 at time steps 0, 1, 1, 2 and 2.

Units. Modified distillation blocks together with fast data blocks are what we refer to as units. The units for our example computation for $p = 10^{-3}$ and $p = 10^{-4}$ are shown in Fig. 29a-b. They both consist of a 200-qubit fast data block, 200 correction-qubit storage tiles, and a number of distillation blocks. Since we will show that unit preparation takes 113 time steps in our case, the number of distillation blocks is chosen such that at least 100 $|m\rangle$-$|c\rangle$ pairs can be distilled in 113 time steps. A full time-optimal quantum computer consists of a row of multiple units, see Fig. 29c. The units shown in the figure contain some unused tiles. This gives the units a rectangular profile, even though this is not necessarily required. In our case, the units have a footprint of 54 × 21 and 37 × 21 tiles, respectively. Note that the first and last



Figure 29: Units consist of fast data blocks, modified distillation blocks and storage tiles. (a) The unit for $p = 10^{-3}$ consists of 54 × 21 = 1134 tiles. (b) For $p = 10^{-4}$, the number of tiles is 37 × 21 = 777. (c) A time-optimal setup consists of a row of multiple units (unit 1 to unit 4 are shown), which means that the space to the bottom and top of the fast data blocks needs to remain free. Legend: data, distillation, $|m\rangle$ storage, $|c\rangle$ storage, unused tiles.

unit of a time-optimal setup are smaller, as they only require 100 correction-qubit storage tiles and half the number of distillation blocks.

Unit preparation. In order to implement the time-optimal circuit of Fig. 26 with the setup of Fig. 29, we show protocols that can be used for the BP-T-BM operations. The data blocks of every unit store $2n$ qubits in $n$ two-qubit patches. We arrange the qubits in such a way that the final Bell measurements (BM) are $Z \otimes Z$ and $X \otimes X$ measurements of the two qubits of every two-qubit patch. This Bell measurement can be done in 2 time steps, as shown in Fig. 28.

This arrangement of qubits implies that, for every two-qubit patch, one of the qubits needs to be part of a Bell state preparation (BP) with the neighboring unit to the top, and the other with a neighboring unit to the bottom. For an $n$-qubit quantum computation, this Bell state preparation can be performed in $\sqrt{n} + 1$ time steps, as we show in Fig. 30 for the example of $n = 9$. For this, every qubit is initialized in the $|+\rangle$ state. The Bell state preparation requires a series of $Z \otimes Z$ measurements. The protocol in Fig. 30 shows that, since an $n$-qubit computation implies that the number of rows of the data block is $\sqrt{n}$, these measurements require a total of $\sqrt{n} + 1$ time steps.

In total, the unit preparation of an $n$-qubit computation with $n_T$ T gates per layer requires $\sqrt{n} + 1$ time steps for the Bell state preparation, $n_T$ time steps for the execution of the T layer, and 2 time steps for the Bell basis measurement, i.e., a total of $n_T + \sqrt{n} + 3$ time steps. In



our example, this amounts to 113 time steps, which corresponds to $t_u = 1469$ µs for $p = 10^{-4}$ and $t_u = 3051$ µs for $p = 10^{-3}$. Thus, time optimality is reached with 1470 units for $p = 10^{-4}$ and 3052 units for $p = 10^{-3}$.

Figure 30: Bell state preparation (BP) for a 9-qubit computation (18 qubits per unit) in 4 time steps (steps 1 to 4). All two-qubit patches are initialized in the $|+\rangle^{\otimes 2}$ state. Each measurement ancilla is used for a $Z \otimes Z$ measurement between two qubits in different units. For $n$-qubit computations, this requires $\sqrt{n} + 1$ time steps.

Space-time trade-offs. Of course, it is also possible to use fewer units than required for time optimality. Using $n_u$ units means that $n_T \cdot (n_u - 1)$ T gates are performed every $t_u$. In our example, $100 \cdot (n_u - 1)$ T gates are performed every 113 time steps. With three units, the computational time drops to 56.5% of the computational time of the fast setup in Fig. 23. With ten units, it drops to 12.6%. The number of qubits per unit is ∼260,000 for $p = 10^{-4}$ and ∼1,650,000 for $p = 10^{-3}$, so going from the fast setup to parallelized units is, initially, not a favorable space-time trade-off. Since the space-time cost has increased compared to the fast setup, it is also useful to check whether the code distance needs to be readjusted. If we use three units (ignoring that the first and last unit are, in principle, smaller), the space-time cost is still below the space-time cost of the minimal setup in both cases. Adding more units significantly improves the space-time cost. It is also a prescription to linearly speed up the quantum computer down to the time-optimal limit.

5.3 Distributed quantum computing

Note that, apart from the initial sharing of entangled Bell pairs, the units operate entirely independently of each other. This implies that, if Bell pairs can be shared between different quantum computers, each unit can be located in a separate quantum computer. The shared Bell pairs do not even need to have a high fidelity, as software-based entanglement distillation [39, 40] can be used to convert a large number of low-fidelity Bell pairs into fewer high-fidelity Bell pairs. Recent experiments have made progress towards generating entanglement between different superconducting chips [41–43].

Figure 31: Scheme for distributed quantum computing in a circular arrangement of quantum computers with the ability to share Bell pairs between nearest neighbors. If the Bell-pair fidelity is low, entanglement distillation (ent. dist.) can be used to increase the fidelity. This scheme effectively implements the circular time-optimal circuit drawn schematically in (b). Panels: (a) distributed quantum computing; (b) effective circuit.

For the time-optimal scheme, quantum computers may be arranged in a circle as shown in Fig. 31a, with the ability to share Bell pairs between neighboring quantum computers. This effectively implements the circuit that is schematically drawn in Fig. 31b. Note that in this circuit, there is no first and last unit. Here, every unit performs $n_T$ π/8 rotations every $t_u$. Therefore, time optimality is reached with one fewer unit, and each unit only needs to store $n_T$ correction qubits instead of $2n_T$. With only 100 correction-qubit storage tiles and ignoring the unused tiles, the qubit count of the units in Fig. 29 drops to ∼220,000 for $p = 10^{-4}$ and ∼1,470,000 for $p = 10^{-3}$, which are the numbers that we report in Fig. 3. Thus, if nearest-neighbor communication between quantum computers is feasible, already fewer than 2 million physical qubits per quantum computer can be used to implement the full time-optimal scheme with 1500-3000 quantum computers.

Entanglement distillation increases the qubit count. Note that it does not slow down the computation, as Bell pairs do not need to be distilled instantly. Entanglement distillation can take up to $t_u$ to distill the $n_T$ Bell pairs required per entanglement distillation block.



Summary. In order to speed up an $n$-qubit quantum computation beyond 1 time step per T gate, we parallelize T layers using units. With an average of $n_T$ T gates per layer, a unit consists of $4n + 4\sqrt{n} + 1$ tiles for the data block, $2n_T$ storage tiles for the correction qubits, and enough distillation blocks to distill $n_T$ $|m\rangle$-$|c\rangle$ pairs in the time it takes to prepare a unit, which is $n_T + \sqrt{n} + 3$ time steps. If the unit preparation time is $t_u$ and the time for single-qubit measurements and classical processing is $t_m$, a time-optimal setup consists of $t_u/t_m + 1$ units, executing one T layer every $t_m$. Using fewer units results in a linear space-time trade-off. With $n_u$ units, $n_T \cdot (n_u - 1)$ T gates are performed in $t_u$. A circular arrangement of units can be used for distributed quantum computing. This also reduces the number of correction-qubit storage tiles to $n_T$ and the number of units in a time-optimal setup to $t_u/t_m$. In order to fully exploit the space-time trade-offs discussed in this section, the input circuit should be optimized for T depth.

6 Trade-offs beyond Clifford+T

Under the assumption that measurements and feed-forward can be done in 1 µs, we described how to perform a $10^8$-T-gate computation in just 1 second. A more conservative assumption would be a measurement and feed-forward time of 10 µs, which increases the computation time to 10 seconds. Although this seems fast, many quantum computations have T counts that are significantly higher than $10^8$. While the T count of Hubbard model simulations [2] is indeed in this range, quantum chemistry simulations can be more demanding. In particular, the simulation of FeMoco [1], a structure that plays an important role in nitrogen fixation, can have a T count of up to $10^{15}$. With a serial execution of one T gate every 10 µs, the computation takes 317 years to finish. Even if the gates are grouped into 100 T gates per layer, the computation still takes over 3 years.

While Clifford+T is a gate set that is very well suited for surface codes, it is often not the gate set which is natural to the quantum computations in question. In particular, quantum simulation based on Trotterization consists of many small-angle rotations. In the Clifford+T framework, each small-angle rotation is translated into a series of T gates via gate synthesis. Depending on the desired precision, this can require ∼100 T gates for each rotation [44], which must be executed in series. In order to speed up computations beyond their T count or T depth, it is therefore constructive to consider additional resources for gates other than T gates.

6.1 Clifford+ϕ circuits

Instead of requiring an input circuit that consists of Clifford gates and π/8 rotations, we consider circuits that consist of Clifford gates and arbitrary ϕ rotations, which we call Clifford+ϕ circuits. Using the procedure in Sec. 1, Clifford gates can be commuted to the end of the circuit, such that we end up with a circuit like the one in Fig. 32. Rotations that mutually commute can be grouped up into layers. The algorithm of Sec. 1 can be used to reduce the number of layers. It can even reduce the number of rotations, since, if two rotations $P_{\varphi_1}$ and $P_{\varphi_2}$ with the same axis of rotation are moved into the same layer, they can be combined into a single rotation $P_{\varphi_1 + \varphi_2}$. Clifford+ϕ circuits are characterized by their rotation count (or ϕ count) and rotation depth (or ϕ depth), rather than T count and T depth.

Figure 32: Clifford+ϕ circuit. The first two rotation layers (ϕ layers, labeled layer 1 and layer 2) with three rotations per layer are shown.

Each ϕ rotation can be performed using a $|\varphi\rangle = |0\rangle + e^{i(2\varphi)}|1\rangle$ resource state. When this state is consumed to perform a $P_\varphi$ rotation, there is a 50% chance that a $P_{-\varphi}$ rotation is performed instead. For π/8 rotations, this is not very problematic, since the correction operation is a π/4 rotation, which can simply be commuted to the end of the circuit. For general $P_{-\varphi}$, the correction is a $P_{2\varphi}$ rotation, which requires the use of a $|2\varphi\rangle$ state. If this fails, the next correction is a $P_{4\varphi}$ rotation requiring a $|4\varphi\rangle$ state and so on. Thus, a wide variety of resource states is required to execute arbitrary-angle rotations. In the case of $\varphi = \pi/2^k$ for an integer $k$, $|\varphi\rangle$ states can be distilled using specialized protocols [35, 45]. For other angles, $|\varphi\rangle$ states can be approximated using $|\pi/2^k\rangle$ states, or pieced together from ordinary magic states $|m\rangle$ via circuit synthesis. Ordinary magic states can also generate states that can be used for V gates [46–48], which are Pauli rotations with an angle $\theta = \arccos(3/5)$.

All the schemes discussed in this work can be used with Clifford+ϕ circuits by replacing magic state distillation blocks by distillation blocks that produce resource states for arbitrary-angle rotations. In order to consume these states in a systematic way similar to the

Accepted in Quantum 2019-02-01, click title to verify 25
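The serial-execution estimates quoted in Sec. 6 (317 years for a T count of 10^15 at one T gate every 10 µs, and over 3 years with 100 T gates per layer) can be checked with a few lines of arithmetic. The following Python sketch is only an illustration; the function name and the use of a 365.25-day year are assumptions made here, not part of the schemes.

```python
# Assumption: a year of 365.25 days is used for the unit conversion.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def serial_runtime_years(t_count, step_time_s, gates_per_layer=1):
    """Runtime in years if one layer of T gates is executed per step."""
    layers = t_count / gates_per_layer
    return layers * step_time_s / SECONDS_PER_YEAR

# One T gate every 10 microseconds, T count of 10^15
serial = serial_runtime_years(1e15, 10e-6)
# Same circuit with 100 T gates grouped into each layer
layered = serial_runtime_years(1e15, 10e-6, gates_per_layer=100)

print(f"serial: {serial:.0f} years, layered: {layered:.2f} years")
```

Under these assumptions the two values come out at roughly 317 years and a little over 3 years, in line with the numbers in the text.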


post-corrected π/8 rotations in Fig. 25b, we can use the post-corrected version of ϕ rotations shown in Fig. 33. First, the n resource states are entangled with the data qubits via a C(P, Z^{⊗n}) gate. Just like magic state consumption, this can be done every time step, since the data qubits are only part of one measurement in the measurement circuit in Fig. 33b. Next, the |ϕ⟩ state is measured in Z. If the outcome of this measurement is +1, then the rotation is successful and all other resource states are discarded by measuring them in X. If, instead, the outcome is -1, the |2ϕ⟩ state is measured in Z. If the outcome of this Z measurement is +1, the correction is successful, and the remaining resource states are discarded by X measurements. For -1, the corrections continue with a Z measurement of |4ϕ⟩. Note that, in most cases, this cascade of measurements finishes in the second step. Therefore, on average, it takes 2t_m to perform these measurements. However, sufficiently many resource states are required in order to be prepared for the most unlikely situations, in which many measurement steps are required. The probability to require n measurement steps (i.e., n resource states down to |2^n ϕ⟩) is exponentially low, 2^{-n}. Therefore, the number of resource states that need to be generated for each ϕ rotation scales logarithmically with the rotation count of the circuit, if one wants to stay below a certain probability that any of these rotations is slowed down by a missing resource state. If |π/2^k⟩ states are used, the cascade of measurements terminates after k steps. This technique of cascading resource state measurements is also referred to as programmable ancilla rotations [49]. Note that the cascade of measurements can also be postponed to a later point, such that the post-corrected ϕ rotations can be used in the time-optimal scheme.

Figure 33: (a) A post-corrected ϕ rotation can be used to decide at a later point, whether the performed operation was a P_ϕ or a P_{−ϕ} gate. (b) A C(P1, P2) gate can be performed explicitly using a |+⟩ ancilla and Pauli product measurements.

Using the T-count-limited scheme of Sec. 4, we can execute a ϕ rotation every time step. For 100 T gates per ϕ rotation, this speeds up the computation by a factor of 100. Also, the time-optimal setting of Sec. 5 can be used with Clifford+ϕ circuits. However, the execution of a ϕ layer can take more than 2t_m, as the measurement cascades for all rotations in the layer need to terminate. For instance, for 100 rotations per layer, each layer execution takes, on average, 8t_m. For 100 T gates per rotation, ϕ layer parallelization reduces the computational time by a factor of 12.5 compared to T layer parallelization, i.e., from over 3 years to 3 months. In the specific case of quantum chemistry simulations, their T count can be reduced significantly by using more advanced algorithms [50–52], which also profit from arbitrary-angle rotations. Thus, if distributed quantum computing is feasible, Clifford+ϕ circuits such as the ones used for quantum chemistry can be executed with qubit counts per quantum computer not far above the numbers reported in Fig. 3. The only difference to Clifford+T units is that larger distillation blocks are required to produce and store the |ϕ⟩ resource states.

Figure 34: C(P1, P2, P3) gate in terms of seven π/8 rotations.

Multi-controlled Pauli gates. Other gates that are used extensively in quantum algorithms are multi-controlled Paulis, such as Toffoli or CCZ gates. In Fig. 5, we have shown how C(P1, P2) gates can be written in terms of π/4 rotations. A similar decomposition is possible for multi-controlled Pauli gates. In Fig. 34, we show how a C(P1, P2, P3) gate is a product of 7 π/8 rotations. For instance, C(Z, Z, X) is the Toffoli gate. From the circuit, it is evident that the T depth of C(P1, P2, P3) gates is one [28]. In principle, these doubly-controlled Pauli gates can be written with just four T gates [53], but this increases the number of layers and a similar effect can be obtained by cancelling

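The statistics of the measurement cascade for post-corrected ϕ rotations can be made concrete with a short calculation. The Python sketch below assumes, as stated in the text, that every step of a cascade independently fails with probability 1/2; the function names and the union-bound estimate for the number of stored resource states are illustrative assumptions made here.

```python
import math

def prob_more_than(k):
    """Probability that a single cascade needs more than k measurement steps."""
    return 0.5 ** k

def expected_steps_single(max_terms=200):
    # E[steps] = sum over k >= 0 of P(steps > k); the sum is truncated at max_terms
    return sum(prob_more_than(k) for k in range(max_terms))

def expected_steps_layer(rotations, max_terms=200):
    # A layer finishes once its slowest cascade has finished:
    # E[max] = sum over k >= 0 of (1 - P(all cascades need at most k steps))
    return sum(1.0 - (1.0 - prob_more_than(k)) ** rotations for k in range(max_terms))

def states_needed(rotation_count, failure_prob):
    # Union bound (an assumption of this sketch): rotation_count * 2^-n <= failure_prob
    return math.ceil(math.log2(rotation_count / failure_prob))

single = expected_steps_single()       # average cascade length in units of t_m
layer = expected_steps_layer(100)      # average layer time for 100 rotations per layer
speedup = 100 / layer                  # versus 100 T gates per rotation executed in series
print(single, layer, speedup, states_needed(10**8, 0.01))
```

With these assumptions, a single cascade takes 2t_m on average and a layer of 100 rotations takes close to 8t_m, which is where the quoted factor of about 12.5 comes from. The number of resource states per rotation grows only logarithmically with the rotation count.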


Figure 35: C(P1, P2, P3, P4) gate in terms of 15 π/16 rotations.
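The rotation counts in Figs. 5, 34 and 35 (3 π/4 rotations, 7 π/8 rotations, and 15 π/16 rotations) can be checked numerically for the special case in which all Pauli operators are Z. The Python sketch below is a minimal check under two assumptions that are not taken from this section: the convention P_ϕ = exp(−iϕP), and a sign pattern in which products of an odd number of Z operators are rotated by +π/2^n and products of an even number by −π/2^n. Gates with other Pauli operators differ from this case only by Clifford basis changes.

```python
import cmath
import math
from itertools import combinations

def z_product_eigenvalue(subset, bits):
    """Eigenvalue (+1 or -1) of the product of Z operators on `subset` for basis state `bits`."""
    return -1 if sum(bits[q] for q in subset) % 2 else 1

def rotation_product(n):
    """Diagonal of the product of all 2^n - 1 commuting rotations exp(-i * phi_S * Z_S)."""
    angle = math.pi / 2 ** n
    subsets = [s for k in range(1, n + 1) for s in combinations(range(n), k)]
    diagonal = []
    for index in range(2 ** n):
        bits = [(index >> q) & 1 for q in range(n)]
        exponent = 0.0
        for s in subsets:
            phi = angle if len(s) % 2 else -angle   # assumed sign pattern
            exponent += -phi * z_product_eigenvalue(s, bits)
        diagonal.append(cmath.exp(1j * exponent))
    return len(subsets), diagonal

def is_multi_controlled_z(diagonal):
    """True if the diagonal equals diag(1, ..., 1, -1) up to a global phase."""
    relative = [p / diagonal[0] for p in diagonal]
    return all(abs(p - 1) < 1e-9 for p in relative[:-1]) and abs(relative[-1] + 1) < 1e-9

for n in (2, 3, 4):
    count, diagonal = rotation_product(n)
    print(n, count, is_multi_controlled_z(diagonal))
```

Since all rotations in the product are diagonal, they mutually commute, which is consistent with a rotation depth of 1.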

π/8 rotations from pairs of doubly-controlled gates in a circuit. Reducing the T count by increasing the circuit depth [54] can still be a useful circuit manipulation for T-count-limited setups. We also note that the T count can be reduced by combining gate synthesis and magic state distillation (synthillation) [55, 56].

C(P1, P2, P3, P4) gates, i.e., triply-controlled Pauli gates, can be written as 15 π/16 rotations, as shown in Fig. 35. While the T depth of this circuit is no longer 1, the rotation depth is. In fact, any multi-controlled Pauli gate C(P1, ..., Pn) with n Pauli operators, i.e., n − 1 controls, can be constructed from 2^n − 1 P_{π/2^n} rotations by following the pattern shown in Figs. 5, 34 and 35. The rotation depth of all these gates is 1. Multi-controlled gates can also be pieced together from C(P1, P2, P3) gates, but this increases the circuit depth. By using small-angle rotations, any multi-controlled Pauli gate can be executed in one step.

6.2 Shorter measurements

If the bottleneck of slow classical processing can be overcome, then the only hardware-based restriction to the speed of quantum computation is the time it takes to measure a physical qubit. In the time-optimal scheme, the execution time of each rotation layer is governed by the measurement time. This measurement time only needs to be long if the measurement fidelity is required to be sufficiently high. In order to speed up the computation, one can use shorter qubit measurements. This exponentially decreases the measurement fidelity. On the other hand, the measurement fidelity of encoded surface-code qubits increases exponentially with the number of qubits comprising the logical qubit. Thus, by using twice as many physical qubits to encode the measured logical qubit, the measurement time can be decreased by a factor of two, doubling the computational speed of the quantum computer. In fact, not all qubits need to use a higher code distance. Only the correction qubits that are measured to execute each rotation layer need to be larger, and only right before they are measured. The physical qubit measurement does not need to be a quantum non-demolition measurement, but can be a destructive measurement. Ultimately, however, the speed of quantum computation is limited by the speed of classical computation. Exploring superconducting logic [57] to speed up classical computation may be a viable route to speed up quantum computers.

Summary. All the schemes discussed in this paper can not only be used with Clifford+T circuits, but also with Clifford+ϕ circuits. The only difference is that more and different resource states are required. Their distillation and storage requires more space than ordinary magic state distillation, but their use can speed up the computation by several orders of magnitude.

7 Conclusion

In this work, we described how full quantum computations can be performed in surface-code-based architectures of different sizes. Previous works on the translation of quantum computations into surface-code schemes [36, 58–60] attempted to optimize the logical qubit arrangement via algorithms that take a quantum circuit as an input. Here, we took a different approach by discussing computational schemes that do not require any prior knowledge about the input circuit. This has the advantage that a resource count with our schemes only requires the T count and T depth of the input circuit, and that the schemes consist of modular blocks that can be optimized independently of each other. In addition, the space-time cost is lower compared to earlier works [20, 36].

Big quantum computers are fast. Starting from the minimal setup in Fig. 21 that consists of a compact



[Figure 36 plot, three panels over the schemes A-P: space-time cost normalized to minimal setup (20% to 100%), space cost normalized to minimal setup (10^0 to 10^4), and time cost normalized to minimal setup (10^0 down to 10^-4).]

A: Compact block + 1 distillation block (Fig. 21)
B: Intermediate block + 2 distillation blocks (Fig. 22)
C-K: Fast block + 3-11 distillation blocks (Fig. 23)
L: 2 units (Figs. 29, 31)
M: 3 units
N: 10 units
O: 100 units
P: 1469/1470 units (time-optimal)

Figure 36: Space-time, space, and time cost of the schemes discussed in this paper for the example of a 100-qubit quantum computation with T count 10^8 and T depth 10^6, under the assumption of a 1 µs code cycle time, and a 1 µs measurement and classical processing time. The solid and dashed lines in M-P are for circular (solid) and linear (dashed) arrangements of units.
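The entries for the minimal setup (scheme A) in Fig. 36 and Tab. 2 can be reproduced with simple arithmetic. The Python sketch below is an illustration under assumptions that are not stated in this section: a code distance of d = 13, roughly 2d^2 physical qubits per tile (data plus measurement qubits), and one time step lasting d code cycles. The function names are hypothetical.

```python
def physical_qubits(tiles, d):
    # Assumption: each tile uses about 2 * d^2 physical qubits
    return tiles * 2 * d * d

def runtime_seconds(t_count, time_steps_per_t_gate, d, code_cycle_s=1e-6):
    # Assumption: one time step lasts d code cycles
    return t_count * time_steps_per_t_gate * d * code_cycle_s

D = 13  # assumed code distance for p = 10^-4 and a T count of 10^8

qubits_a = physical_qubits(164, D)             # minimal setup with 164 tiles
hours_a = runtime_seconds(1e8, 11, D) / 3600   # one T gate every 11 time steps
minutes_k = runtime_seconds(1e8, 1, D) / 60    # one T gate every time step

print(qubits_a)    # 55432
print(hours_a)     # close to 4 hours
print(minutes_k)   # close to 22 minutes
```

Under these assumptions the results agree with the quoted 55,400 physical qubits and 4 hours for scheme A, and with the 22 minutes at the fast end of schemes C-K.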

data block and a single distillation block, we traded off space versus time, increasing the size of the quantum computer and, in return, decreasing the computational time. For the example of a computation with a T count of 10^8 and a T depth of 10^6 with an error rate of p = 10^-4, the minimal setup consists of 164 tiles and executes one T gate every 11 time steps, corresponding to a computational time of 4 hours with 55,400 physical qubits. From here, the space-time cost is drastically reduced by adding more distillation blocks, as shown in Fig. 36 and Tab. 2. With this strategy, the computational time is reduced to 1 time step per T gate, where the computational cost of a circuit is governed by its T count.

scheme   physical qubits                                      computational time
A        55,400                                               4 h
B        76,400                                               2 h
C-K      90,200 - 123,000                                     79-22 min
L        447,000                                              12 min
M        679,000 (788,000)                                    490 sec (734 sec)
N-P      2,230,000 - 328,000,000 (2,630,000 - 386,000,000)    147 sec - 1 sec (163 sec - 1 sec)

Table 2: Space and time cost of the schemes plotted in Fig. 36. The numbers in parentheses are for linear arrangements of units (dashed lines in Fig. 36).

For further space-time trade-offs, we parallelized T layers using units. This is an increase in space-time cost, especially for linear arrangements of units (dashed line in Fig. 36), but enables further space-time trade-offs. Linearly trading off space versus time, the computational time can be reduced to one measurement per T layer. Units are well-suited for distributed quantum computing, as the sharing of Bell pairs between neighboring units is part of the parallelization scheme.

This exhausts the space-time trade-offs that are possible within the Clifford+T framework. Switching to Clifford+ϕ circuits can provide further trade-offs, as additional resources are introduced for arbitrary-angle rotations. This can be used to execute circuits in a time proportional to their rotation depth, as opposed to their T depth. We have not investigated how this trade-off affects the space-time cost in our scheme.

Room for optimization. In our T-count-limited schemes and for the preparation of units, one T gate is performed after the other. If the input circuit is known, it is reasonable to assume that qubits can be arranged in a way that allows for the parallel execution of multiple T gates in the same data block. Furthermore, there is a strict separation between tiles used for magic state distillation and tiles used for data blocks in our schemes. By sharing tiles between blocks, the space overhead may be reduced. Moreover, we have only considered a handful of distillation protocols. It would be interesting to see which distillation protocols can be used to optimize the cost function of Eq. (9). Finally, concrete tile layouts that can be used to distill and consume the additional resources necessary for Clifford+ϕ computing are still missing.

Beyond surface codes. Even though we designed our schemes with surface codes in mind, they can, in principle, be applied to other toric-code-based patches, such as Majorana surface-code patches [11] or color-code patches [13, 61, 62]. Color codes can reduce the number of physical qubits due to more compact encoding, but require more elaborate hardware to measure the higher-weight check operators. The space cost is reduced by replacing all surface-code patches by color-code patches, with the exception of Pauli product measurement ancillas. In order to keep the space cost low, measurement ancillas should remain surface-code patches and color-to-surface code lattice surgery [63] should be used during the Pauli product measurement protocol, as described in Ref. [64].

Outlook. If the number of qubits continues to double every 8 months [65], the 60,000 - 300,000 physical qubits necessary for classically intractable Hubbard model simulations with a T count of 10^8 will be available in 7-9 years, assuming qubit quality improves accordingly. If multiple quantum computers can be connected in a network, time-optimal quantum computing becomes available shortly thereafter, facilitating the implementation of more difficult algorithms such as quantum chemistry simulations or Shor’s algorithm. Classical processing in terms of measurements, feed-forward and decoding is expected to be a significant roadblock in speeding up quantum computers. Ultimately, faster classical control hardware will be necessary to build faster quantum computers. I hope that the schemes discussed in this work are a useful roadmap towards large-scale quantum computing, and that the patch-based framework is a valuable toolbox for constructions of surface-code-based implementations of quantum algorithms.

Acknowledgments

This work would not have been possible without insightful discussion with Austin Fowler and Craig Gidney about Pauli product measurements and 15-to-1 distillation, with Jens Eisert, Markus Kesselring and Felix von Oppen about Clifford tracking and space-time trade-offs, with Jeongwan Haah and Matthew Hastings about magic state distillation, with Guang Hao Low and Nathan Wiebe about quantum simulation algorithms, and with Ali Lavasani about few-qubit surface-code architectures. This work has been supported by the Deutsche Forschungsgemeinschaft (Bonn) within the network CRC TR 183.

References

[1] M. Reiher, N. Wiebe, K. M. Svore, D. Wecker, and M. Troyer, Elucidating reaction mechanisms on quantum computers, PNAS 114, 7555 (2017).
[2] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Encoding electronic spectra in quantum circuits with linear T complexity, Phys. Rev. X 8, 041015 (2018).
[3] J. Preskill, Reliable quantum computers, Proc. Roy. Soc. Lond. A 454, 385 (1998).
[4] B. M. Terhal, Quantum error correction for quantum memories, Rev. Mod. Phys. 87, 307 (2015).
[5] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads towards fault-tolerant universal quantum computation, Nature 549, 172 (2017).
[6] A. Y. Kitaev, Fault-tolerant quantum computation by anyons, Ann. Phys. 303, 2 (2003).
[7] A. G. Fowler, M. Mariantoni, J. M. Martinis, and A. N. Cleland, Surface codes: Towards practical large-scale quantum computation, Phys. Rev. A 86, 032324 (2012).
[8] H. Bombin, Topological order with a twist: Ising anyons from an abelian model, Phys. Rev. Lett. 105, 030403 (2010).
[9] C. Horsman, A. G. Fowler, S. Devitt, and R. V. Meter, Surface code quantum computing by lattice surgery, New J. Phys. 14, 123011 (2012).
[10] B. J. Brown, K. Laubscher, M. S. Kesselring, and J. R. Wootton, Poking holes and cutting corners to achieve Clifford gates with the surface code, Phys. Rev. X 7, 021029 (2017).
[11] D. Litinski and F. v. Oppen, Lattice Surgery with a Twist: Simplifying Clifford Gates of Surface Codes, Quantum 2, 62 (2018).
[12] A. G. Fowler and C. Gidney, Low overhead quantum computation using lattice surgery, arXiv:1808.06709 (2018).
[13] A. J. Landahl and C. Ryan-Anderson, Quantum computing by color-code lattice surgery, arXiv:1407.5103 (2014).
[14] Y. Li, A magic state's fidelity can be superior to the



operations that created it, New J. Phys. 17, 023037 (2015).
[15] D. Herr, F. Nori, and S. J. Devitt, Optimization of lattice surgery is NP-hard, npj Quant. Inf. 3, 35 (2017).
[16] S. Bravyi and A. Kitaev, Universal quantum computation with ideal Clifford gates and noisy ancillas, Phys. Rev. A 71, 022316 (2005).
[17] J. Haah and M. B. Hastings, Codes and Protocols for Distilling T, controlled-S, and Toffoli Gates, Quantum 2, 71 (2018).
[18] S. Bravyi and J. Haah, Magic-state distillation with low overhead, Phys. Rev. A 86, 052329 (2012).
[19] C. Jones, Multilevel distillation of magic states for quantum computing, Phys. Rev. A 87, 042305 (2013).
[20] A. G. Fowler, S. J. Devitt, and C. Jones, Surface code implementation of block code state distillation, Scientific Rep. 3, 1939 (2013).
[21] A. G. Fowler, Time-optimal quantum computation, arXiv:1210.4626 (2012).
[22] D. Gottesman, The Heisenberg representation of quantum computers, Proc. XXII Int. Coll. Group. Th. Meth. Phys. 1, 32 (1999).
[23] V. Kliuchnikov, D. Maslov, and M. Mosca, Fast and efficient exact synthesis of single-qubit unitaries generated by Clifford and T gates, Quantum Info. Comput. 13, 607 (2013).
[24] V. Kliuchnikov, D. Maslov, and M. Mosca, Asymptotically optimal approximation of single qubit unitaries by Clifford and T circuits using a constant number of ancillary qubits, Phys. Rev. Lett. 110, 190502 (2013).
[25] D. Gosset, V. Kliuchnikov, M. Mosca, and V. Russo, An algorithm for the T-count, arXiv:1308.4134 (2013).
[26] L. E. Heyfron and E. T. Campbell, An efficient quantum compiler that reduces T count, Quantum Sci. Technol. 4, 015004 (2018).
[27] M. Amy, D. Maslov, M. Mosca, and M. Roetteler, A meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 32, 818 (2013).
[28] P. Selinger, Quantum circuits of T-depth one, Phys. Rev. A 87, 042302 (2013).
[29] M. Amy, D. Maslov, and M. Mosca, Polynomial-time T-depth optimization of Clifford+T circuits via matroid partitioning, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 33, 1476 (2014).
[30] D. Litinski and F. von Oppen, Quantum computing with Majorana fermion codes, Phys. Rev. B 97, 205404 (2018).
[31] A. Lavasani and M. Barkeshli, Low overhead Clifford gates from joint measurements in surface, color, and hyperbolic codes, Phys. Rev. A 98, 052319 (2018).
[32] J. I. Hall, Notes on Coding Theory Chapter 6: Modifying Codes, https://users.math.msu.edu/users/jhall/classes/codenotes/Mod.pdf, accessed: 2019-01-30.
[33] E. T. Campbell and M. Howard, Magic state parity-checker with pre-distilled components, Quantum 2, 56 (2018).
[34] A. M. Meier, B. Eastin, and E. Knill, Magic-state distillation with the four-qubit code, Quant. Inf. Comp. 13, 195 (2013).
[35] E. T. Campbell and J. O’Gorman, An efficient magic state approach to small angle rotations, Quantum Sci. Technol. 1, 015007 (2016).
[36] D. Herr, F. Nori, and S. J. Devitt, Lattice surgery translation for quantum computation, New J. Phys. 19, 013034 (2017).
[37] A. G. Fowler and S. J. Devitt, A bridge to lower overhead quantum computation, arXiv:1209.0510 (2012).
[38] C. Gidney and A. G. Fowler, Efficient magic state factories with a catalyzed |CCZ⟩ to 2|T⟩ transformation, arXiv:1812.01238 (2018).
[39] C. H. Bennett, G. Brassard, S. Popescu, B. Schumacher, J. A. Smolin, and W. K. Wootters, Purification of noisy entanglement and faithful teleportation via noisy channels, Phys. Rev. Lett. 76, 722 (1996).
[40] C. H. Bennett, H. J. Bernstein, S. Popescu, and B. Schumacher, Concentrating partial entanglement by local operations, Phys. Rev. A 53, 2046 (1996).
[41] C. Dickel, J. J. Wesdorp, N. K. Langford, S. Peiter, R. Sagastizabal, A. Bruno, B. Criger, F. Motzoi, and L. DiCarlo, Chip-to-chip entanglement of transmon qubits using engineered measurement fields, Phys. Rev. B 97, 064508 (2018).
[42] P. Campagne-Ibarcq, E. Zalys-Geller, A. Narla, S. Shankar, P. Reinhold, L. Burkhart, C. Axline, W. Pfaff, L. Frunzio, R. J. Schoelkopf, and M. H. Devoret, Deterministic remote entanglement of superconducting circuits through microwave two-photon transitions, Phys. Rev. Lett. 120, 200501 (2018).
[43] C. J. Axline, L. D. Burkhart, W. Pfaff, M. Zhang, K. Chou, P. Campagne-Ibarcq, P. Reinhold, L. Frunzio, S. Girvin, L. Jiang, et al., On-demand quantum state transfer and entanglement between remote microwave cavity memories, Nat. Phys. 14, 705 (2018).



[44] N. J. Ross and P. Selinger, Optimal ancilla-free Clifford+T approximation of z-rotations, arXiv:1403.2975 (2014).
[45] G. Duclos-Cianci and D. Poulin, Reducing the quantum-computing overhead with complex gate distillation, Phys. Rev. A 91, 042315 (2015).
[46] A. W. Harrow, B. Recht, and I. L. Chuang, Efficient discrete approximations of quantum gates, Journal of Mathematical Physics 43, 4445 (2002).
[47] G. Duclos-Cianci and K. M. Svore, Distillation of nonstabilizer states for universal quantum computation, Phys. Rev. A 88, 042325 (2013).
[48] A. Bocharov, Y. Gurevich, and K. M. Svore, Efficient decomposition of single-qubit gates into v basis circuits, Phys. Rev. A 88, 012313 (2013).
[49] N. C. Jones, J. D. Whitfield, P. L. McMahon, M.-H. Yung, R. V. Meter, A. Aspuru-Guzik, and Y. Yamamoto, Faster quantum chemistry simulation on fault-tolerant quantum computers, New J. Phys. 14, 115023 (2012).
[50] G. H. Low and I. L. Chuang, Hamiltonian simulation by qubitization, arXiv:1610.06546 (2016).
[51] G. H. Low and I. L. Chuang, Optimal Hamiltonian simulation by quantum signal processing, Phys. Rev. Lett. 118, 010501 (2017).
[52] R. Babbush, D. W. Berry, J. R. McClean, and H. Neven, Quantum simulation of chemistry with sublinear scaling to the continuum, arXiv:1807.09802 (2018).
[53] C. Jones, Low-overhead constructions for the fault-tolerant Toffoli gate, Phys. Rev. A 87, 022328 (2013).
[54] C. Gidney, Halving the cost of quantum addition, Quantum 2, 74 (2018).
[55] E. T. Campbell and M. Howard, Unified framework for magic state distillation and multiqubit gate synthesis with reduced resource cost, Phys. Rev. A 95, 022316 (2017).
[56] J. O’Gorman and E. T. Campbell, Quantum computation with realistic magic-state factories, Phys. Rev. A 95, 032338 (2017).
[57] K. K. Likharev and V. K. Semenov, RSFQ logic/memory family: A new Josephson-junction technology for sub-terahertz-clock-frequency digital systems, IEEE Transactions on Applied Superconductivity 1, 3 (1991).
[58] A. G. Fowler, S. J. Devitt, and C. Jones, Synthesis of arbitrary quantum circuits to topological assembly: Systematic, online and compact, Scientific Rep. 7, 10414 (2017).
[59] A. Paler, I. Polian, K. Nemoto, and S. J. Devitt, Fault-tolerant, high-level quantum circuits: form, compilation and description, Quantum Sci. Technol. 2, 025003 (2017).
[60] L. Lao, B. van Wee, I. Ashraf, J. van Someren, N. Khammassi, K. Bertels, and C. G. Almudever, Mapping of lattice surgery-based quantum circuits on surface code architectures, Quantum Sci. Technol. 4, 015005 (2018).
[61] H. Bombin and M. A. Martin-Delgado, Topological quantum distillation, Phys. Rev. Lett. 97, 180501 (2006).
[62] M. S. Kesselring, F. Pastawski, J. Eisert, and B. J. Brown, The boundaries and twist defects of the color code and their applications to topological quantum computation, Quantum 2, 101 (2018).
[63] H. P. Nautrup, N. Friis, and H. J. Briegel, Fault-tolerant interface between quantum memories and quantum processors, Nat. Commun. 8, 1321 (2017).
[64] D. Litinski and F. von Oppen, Braiding by Majorana tracking and long-range CNOT gates with color codes, Phys. Rev. B 96, 205413 (2017).
[65] IBM doubling qubits every 8 months, https://www.nextbigfuture.com/2018/02/ibm-doubling-qubits-every-8-months-and-ecommerce-cryptography-at-risk-in-7-15-years.html, accessed: 2018-08-01.

A Surface-code qubits and lattice-surgery operations

To illustrate the translation of protocols in our framework into surface-code patches, we show how the patches of Fig. 1 and the rules of the game and protocols of Fig. 2 are implemented with surface codes.

Figure 37: Surface-code implementation of the patches shown in Fig. 1. Physical qubits are placed on vertices. Bright faces correspond to Z stabilizers and dark faces to X stabilizers.

Surface-code patches. Each patch corresponds to a surface-code patch with code distance d. Therefore, each tile corresponds to d^2 physical data qubits, as shown in Fig. 37 for d = 5. In our surface-code patches,



physical qubits are placed on the vertices, bright faces correspond to Z stabilizers and dark faces to X stabilizers. Solid and dashed boundaries correspond to X and Z boundaries (also called rough and smooth boundaries). For one-qubit patches, the product of all d physical X (Z) operators along any of the X (Z) boundaries is the logical X (Z) operator of the encoded qubit. For two-qubit patches with six boundaries, the string operators located at the boundaries correspond to the logical operators shown in Fig. 1, i.e., going clockwise, X1, Z1, X1·X2, Z2, X2, and Z1·Z2. Note that, in principle, the width of two-tile patches can be 2d − 1 instead of 2d, potentially reducing the space cost [11]. Furthermore, the correspondence between solid and dashed, and X and Z boundaries is interchangeable.

State initialization. We now show how the operations and protocols of Fig. 2 are implemented with surface codes for d = 5, and motivate their time cost in the framework, where the reasoning is that a cost of one time step is associated with operations whose time cost scales with d. Surface-code patches can be initialized in the logical |0⟩ or |+⟩ state by initializing all physical qubits of the patch in |0⟩ or |+⟩, and then measuring all stabilizers.

Naively, one would expect that there should be a time cost associated with this operation, since the stabilizers need to be measured for d code cycles to account for measurement errors. However, this can be done simultaneously with the subsequent lattice-surgery operation, as will become apparent in the example of the Bell state preparation. For arbitrary states, the logical states are prepared via state injection. This is a non-fault-tolerant procedure with a constant time cost that does not scale with d, which is why we do not associate a time step with it. One such state-injection protocol is described in Ref. [13] and is shown in Fig. 38 for the preparation of a logical magic state |m⟩. In the left panel, a physical magic state is prepared, along with a stabilizer state by measuring the shown stabilizers for three code cycles. Note that any single-qubit error during these three code cycles will corrupt the logical information. Next, the stabilizer configuration is switched to the ordinary surface code in the right panel. Here, the stabilizers are, again, only measured for three code cycles, independently of d, since the state-injection protocol is, in any case, non-fault-tolerant, i.e., produces logical states with an error rate proportional to the physical error rate p.

[Figure 38 panels: two stabilizer configurations for the injection of |m⟩, each labeled "3 code cycles".]

Figure 38: State-injection protocol of Ref. [13].

Figure 39: Twist-based lattice surgery in a square lattice of qubits with nearest-neighbor couplings. The black dots are physical data qubits and the white dots are physical measurement qubits.

Patch measurement and Bell state preparation. Surface-code patches are measured in the X or Z basis by measuring all physical qubits in the corresponding basis and performing some classical error correction, where the time cost does not scale with d. Two-patch measurements correspond to lattice surgery and can be demonstrated via the preparation of a Bell state, as shown in Fig. 40a. Two surface-code patches are initialized in the logical |+⟩ state by initializing all physical qubits in |+⟩ and measuring the stabilizers. Simultaneously, lattice surgery between the two patches is performed, measuring the logical Z ⊗ Z operator. The measurement outcome is the product of the newly introduced Z stabilizers highlighted in red, as the product of these stabilizers corresponds to the product of the logical Z operators encoded in the two surface-code Z boundaries. To account for measurement errors, this measurement is repeated for d code cycles. Finally, the patch is split into two patches again, leaving the two logical surface-code qubits in an entangled Bell state.

Y measurements. Two-patch measurements can be used to measure products of two Pauli operators other



[Figure 40 panels: (a) Bell state preparation, (b) Moving corners, (c) Qubit movement, (d) Y basis measurement.]

Figure 40: Surface-code implementation of the protocols in Fig. 2a-d.
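At the logical level, the Bell state preparation of Fig. 40a amounts to a Z ⊗ Z measurement on two qubits that both start in |+⟩. The following Python sketch illustrates only this logical-level statement with a plain state-vector projection; it does not model physical qubits, stabilizers, or measurement errors, and the function name is a hypothetical helper introduced here.

```python
import math

def measure_zz(state, outcome):
    """Project a two-qubit state (amplitudes ordered 00, 01, 10, 11) onto ZZ = outcome.

    Returns the normalized post-measurement state and the probability of the outcome.
    """
    parities = [1, -1, -1, 1]   # eigenvalue of ZZ on the basis states 00, 01, 10, 11
    projected = [a if p == outcome else 0.0 for a, p in zip(state, parities)]
    probability = sum(abs(a) ** 2 for a in projected)
    norm = math.sqrt(probability)
    return [a / norm for a in projected], probability

plus_plus = [0.5, 0.5, 0.5, 0.5]             # both logical qubits in |+>

even, p_even = measure_zz(plus_plus, +1)     # (|00> + |11>) / sqrt(2)
odd, p_odd = measure_zz(plus_plus, -1)       # (|01> + |10>) / sqrt(2)
print(even, p_even)
print(odd, p_odd)
```

Both outcomes occur with probability 1/2, and either one leaves the two qubits in a maximally entangled state; the two cases differ only by a Pauli X on one qubit.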

than Z ⊗ Z, e.g., operators involving the Y operator, as shown in Fig. 40d. First, a patch is deformed to a wider patch by initializing physical qubits in the X basis and measuring the new stabilizers, which takes d code cycles. Below the wide patch, a rectangular ancilla patch is initialized in the |0⟩ state. A column of physical qubits in the center is missing, so that, in the next step, the ancilla can be used for twist-based lattice surgery [11], measuring the Y operator. The product of the operators highlighted in red in the third step corresponds to the logical Y ⊗ Z operator between the two logical qubits. The lattice surgery in the third step involves dislocation operators and a five-qubit twist defect. Even though these stabilizers are irregular, they can still be measured in a square lattice of physical qubits with nearest-neighbor couplings, as we show in Fig. 39. For the measurement of twist operators and wide X and Z stabilizers, up to three measurement ancillas can be used.

Multi-patch measurements. For a multi-patch measurement in Fig. 41, all physical qubits located in the region of the ancilla patch are initialized in the |+⟩ state. Next, new check operators are introduced. The newly introduced X-type stabilizers all yield trivial outcomes, since they are products of physical qubits initialized in an X eigenstate and previously measured check operators. The nontrivial operators are highlighted by a red dot in Fig. 41. Their product is equivalent to the desired operator, i.e., Y_{|q1⟩} ⊗ X_{|q3⟩} ⊗ Z_{|q4⟩} ⊗ X_{|q5⟩}. The new check operators are measured for d code cycles to account for measurement errors. This procedure corresponds to the multi-body lattice surgery protocol introduced in Ref. [12]. It can be used to measure any product of surface-code-boundary Pauli operators by initializing physical qubits in the |+⟩ state in an ancilla region of width d, and then measuring new check operators, where the product of the nontrivial operators yields the outcome of the desired multi-patch measurement. The ancilla region of width d is required to ensure that the code distance of the stabilizer configuration during the multi-body lattice surgery remains d.

Moving boundaries. The protocol to move patches is similar to lattice surgery. It is shown in Fig. 40c. Extending the patch via its Z boundary in the second step is the same operation as a Z ⊗ Z lattice surgery between the patch and a rectangular |+⟩ ancilla qubit to the right. This needs to be done for d code cycles to account for measurement errors. Finally, the patch is shortened again by measuring the left two thirds of physical qubits in the X basis.

Moving corners. The movement of corners of a surface-code patch is shown in Fig. 40b. It corresponds to a change of boundary stabilizers. In order to account for measurement errors of the newly measured stabiliz-

Accepted in Quantum 2019-02-01, click title to verify 33


Figure 41: Surface-code implementation of the multi-patch measurement in Fig. 2e. The measurement outcome is the product of
all check operators with a red dot.

ers, this requires d code cycles. The top left physical higher number of corners. A patch with 2N + 2 cor-
qubit in the second step of Fig. 40b is removed from ners represents N qubits, as shown in Fig. 42. The
the patch via an X measurement. simplest case is a four-corner patch (a/b) representing
a single qubit. Six-corner patches (c) are two-qubit
patches. The general rule that assigns the operators
B Extended ruleset of N qubits to the edges of a (2N + 2)-corner patch is
given in Fig. 42d. Going clockwise, the dashed bound-
Some surface-code operations are not covered by the aries correspond to X1 , X1 X2 , X2 X3 , . . . , XN −1 XN and
rules discussed in the introduction. In particular, we XN . Starting to the right of X1 , the solid edges corre-
only consider patches with 4 or 6 corners, where we spond to Z1 , Z2 , . . . , ZN and the product Z1 Z2 · · · ZN .
refer to the points where two edges meet as corners. One can also consider patches with shortened edges,
In general, one could also consider patches with a such that they occupy fewer tiles. The drawback of this
is that in every time step, an error corresponding to
Four-, six- and eight-corner patches the Pauli operator represented by the shortened edge
will occur with a certain probability perr . An exam-
ple of a six-corner patch with two shortened X edges
(a) (c) is shown in Fig. 43, meaning that this six-corner patch
is susceptible to X errors. In the surface-code imple-
mentation, this corresponds to a patch with boundaries
that are shorter than d physical data qubits, effectively
(b) reducing the code distance of the logical operators en-
coded by the shortened edges. Note that patches with
shortened edges may occupy more than d2 physical data
(d) (2N + 2)-corner patches qubits per tile.
With (2N + 2)-corner patches, the set of operations
needs to be modified. The initialization rule for such
patches is:

– Qubits can be initialized in the X and Z eigenstates


Figure 42: Patches with 2N + 2 corners represent N qubits. |+i and |0i. All qubits that are part of one patch
Their 2N + 2 edges represent the shown Pauli operators. must be initialized in the same state. (Cost: 0)
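As a hedged illustration of the edge-labelling rule for (2N + 2)-corner patches quoted from Fig. 42d, the following Python sketch lists the dashed (X-type) and solid (Z-type) edge operators for a given N. The function names `patch_edges` and `label`, and the representation of an operator as a tuple of qubit indices, are assumptions made for this example and do not come from the paper. The clockwise geometric placement of the edges on the board is not modeled.

```python
def patch_edges(n):
    """Edge operators of a (2n + 2)-corner patch encoding n qubits.

    Returns (dashed, solid), where each operator is a tuple of the
    1-based qubit indices it acts on (illustrative representation).
    """
    if n < 1:
        raise ValueError("a patch must encode at least one qubit")
    # Dashed edges: X1, X1X2, X2X3, ..., X(n-1)Xn, Xn
    dashed = [(1,)] + [(i, i + 1) for i in range(1, n)] + [(n,)]
    # Solid edges: Z1, Z2, ..., Zn and the product Z1 Z2 ... Zn
    solid = [(i,) for i in range(1, n + 1)] + [tuple(range(1, n + 1))]
    return dashed, solid


def label(pauli, indices):
    """Format an operator such as ('X', (1, 2)) as the string 'X1X2'."""
    return "".join(f"{pauli}{i}" for i in indices)


if __name__ == "__main__":
    for n in (1, 2, 3):
        dashed, solid = patch_edges(n)
        # Every patch has n + 1 dashed and n + 1 solid edges.
        assert len(dashed) + len(solid) == 2 * n + 2
        print(n,
              [label("X", e) for e in dashed],
              [label("Z", e) for e in solid])
```

For N = 3 this gives the Z edges Z1, Z2, Z3 and Z1Z2Z3, which are the four Z edges of the 8-corner ancilla patch used in the Pauli product measurement of Fig. 44.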



Figure 43: Surface-code implementation of a six-corner patch with shortened boundaries.

Similarly, the single-patch measurement rule is modified to

– Qubits can be measured in the X or Z basis. All qubits that are part of the same patch are measured simultaneously and in the same basis. This measurement removes the patch from the board. (Cost: 0)

Pauli product measurements. Using multi-corner patches with shortened boundaries, the multi-patch measurement rule is, in principle, redundant. For instance, the Pauli product measurement of Fig. 8 can be equivalently performed in 1 time step via the protocol shown in Fig. 44. An 8-corner ancilla patch is initialized in the $|+\rangle^{\otimes 3}$ state. The shape of this patch is chosen such that each of the four Z edges is adjacent to one of the four operators that are part of the measurement. Note that this means that some of the X edges are shortened, such that the qubits are susceptible to X errors. In this case, this is not a problem, since the qubits are initialized in X eigenstates and random X errors will cause no change to the states. Next, in step 3, we measure the four Pauli products $Z_{|q_1\rangle} \otimes Z_1$, $Y_{|q_2\rangle} \otimes Z_2$, $Z_{|m\rangle} \otimes Z_3$ and $X_{|q_4\rangle} \otimes (Z_1 \cdot Z_2 \cdot Z_3)$. Because the ancilla is initialized in an X eigenstate, the operators $Z_1$, $Z_2$ and $Z_3$ are unknown, and the outcome of each of the four aforementioned measurements is entirely random. However, multiplying the four measurement outcomes yields $Z_{|q_1\rangle} \otimes Y_{|q_2\rangle} \otimes X_{|q_4\rangle} \otimes Z_{|m\rangle} \otimes (Z_1 \cdot Z_2 \cdot Z_3 \cdot Z_1 \cdot Z_2 \cdot Z_3)$. Since every $Z_i$ appears twice in the last factor, this is precisely the operator $Z_{|q_1\rangle} \otimes Y_{|q_2\rangle} \otimes X_{|q_4\rangle} \otimes Z_{|m\rangle}$ that we wanted to measure. Finally, to discard the ancilla patch, we measure its three qubits in the X basis. Again, X errors will have no effect, as they commute with the measurement basis. Measurement outcomes of $X_i = -1$ prompt a Pauli correction. If, in the previous step, the $Z_i$ edge was measured together with a Pauli operator P, the correction is a $P_{\pi/2}$ gate. For instance, if in Fig. 8 the final measurements yield $X_2 = -1$ and $X_3 = -1$, the corrections are a $Y_{\pi/2}$ rotation on $|q_2\rangle$ and a $Z_{\pi/2}$ rotation on $|m\rangle$.

This type of protocol can be used to measure any product of n Pauli operators. An ancilla patch needs to be initialized in the $|+\rangle^{\otimes n}$ state with Z edges adjacent to the n operators that are part of the measurement. The surface-code implementation of this protocol is identical to the surface-code implementation of multi-patch measurements in Fig. 41.

Figure 44: Pauli product measurement protocol. (a) Example of a measurement of the operator $Z \otimes Y \otimes \mathbb{1} \otimes X \otimes Z$ of the qubits $|q_1\rangle$, $|q_2\rangle$, $|q_3\rangle$, $|q_4\rangle$ and $|m\rangle$. (b) Ancilla patch used during the measurement.

While multi-corner patches and shortened edges increase the number of surface-code operations that are covered by the framework, there are still rules that can be added to the ruleset to account for more operations, such as the movement of corners inside a patch [10]. Also, for the initialization of non-Pauli eigenstates, error models other than random Pauli errors can be considered.

C Proof-of-principle device

Here, we discuss how (3d − 1) · 2d physical data qubits can be used to build a proof-of-principle device that is a universal two-qubit error-corrected quantum computer that uses undistilled magic states and can demonstrate all the operations required for large-scale quantum computing. We go through the example of a computation that starts with three π/8 rotations around Z ⊗ Z, Y ⊗ X and Y ⊗ Y in Fig. 45. For the first rotation, we need to measure $Z_1 \otimes Z_2 \otimes Z_{|m\rangle}$. A magic state is initialized in a long patch in step 2, which is equivalent to initializing a magic state and measuring X ⊗ X between the magic state and neighboring $|0\rangle$ ancillas. This effectively encodes the magic state in a three-qubit repetition code with a logical Z operator $Z_L = Z \otimes Z \otimes Z$. To consume the magic state, $Z_1 \otimes Z_2 \otimes Z_L$ is measured in step 3. This consumes a magic state for the Z ⊗ Z rotation.

Figure 45: Proof-of-principle two-qubit device implemented with 48 physical data qubits (steps 1 to 14).

The next rotation is a Y ⊗ X rotation. Here, we first need to deform $|q_1\rangle$, such that both the X and Z boundaries of the qubit are accessible. Qubit $|q_2\rangle$ is rotated in steps 5-8 using the protocol in Fig. 11a. In step 9, again, a magic state is initialized in a two-qubit repetition code with $Z_L = Z_{a_1} \otimes Z_{a_2}$. In step 10, the magic state is consumed via a $Y_1 \otimes Z_{a_1}$ and an $X_2 \otimes Z_{a_2}$ measurement, whose product is $Y_1 \otimes X_2 \otimes Z_L$.

This kind of protocol consisting of patch deformations and patch rotations can be used to perform any π/8 rotation with the exception of $(Y \otimes Y)_{\pi/8}$, since there is not enough space to make both Y operators accessible for lattice surgery. For this rotation, we first explicitly execute a Clifford gate to change $(Y \otimes Y)_{\pi/8}$ to any other rotation. Any Clifford gate that does not commute with Y ⊗ Y will suffice. In our example, we choose a $Z_{\pi/4}$ rotation. It is performed by initializing a $|0\rangle$ state in step 13, and measuring $Z_1 \otimes Y$ between $|q_1\rangle$ and the ancilla, following the protocol of Fig. 11b.

This demonstrates that a proof-of-principle experiment can be built with 48 physical data qubits. In general, this requires $6d^2 - 2d$ qubits, i.e., 48 for d = 3, 140 for d = 5 and 280 for d = 7. If measurement qubits are required for syndrome readout, the number of physical qubits roughly doubles.

D Implementation of the 7-to-1 protocol

Even though the distillation of $|Y\rangle = |0\rangle + i|1\rangle$ states has no use in our framework, we show how to implement the 7-to-1 distillation protocol for benchmarking purposes in Fig. 46. The protocol is based on the 7-qubit Steane code. Its X stabilizers are the faces shown in Fig. 46a, and its logical X operator can be chosen as the X ⊗ X ⊗ X operator with support on the three qubits drawn in red.

Figure 46: The Steane code (a) is the basis of 7-to-1 distillation (c). In our framework, the corresponding distillation block (b) uses 7 tiles for 4 time steps. Panels: (a) Steane code, (b) distillation block, (c) 7-to-1 distillation circuit.

Following the procedure in Sec. 3, the distillation circuit is obtained by initializing $m_x + k = 4$ qubits in the $|+\rangle$ state, where the first three qubits are associated with the three X stabilizers, and the last qubit is associated with the logical X operator. For each qubit of the Steane code, the circuit contains a π/4 rotation with Z's on each stabilizer and logical operator that the qubit is part of. The three qubits in the corners of the triangle are only part of a single stabilizer and no logical operator. They therefore contribute single-qubit $Z_{\pi/4}$ rotations, which can be absorbed into the initial state. The remaining four rotations are shown in Fig. 46c.

A distillation block that can be used for this protocol is shown in Fig. 46b. Since the consumption of $|Y\rangle$ resource states requires no Clifford correction, this block consists of only 7 tiles. With four rotations, the leading order of the space-time cost of this protocol is $7d^2 \cdot 4d = 28d^3$.
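To make the counting in this appendix concrete, here is a minimal Python sketch. It assumes the standard Hamming-matrix presentation of the Steane code, with X stabilizers on qubits {1, 3, 5, 7}, {2, 3, 6, 7} and {4, 5, 6, 7}, and a weight-three logical X on qubits {3, 5, 6}. This labeling is an assumption chosen for illustration, since the qubit numbering of Fig. 46a cannot be recovered from the text. The names `STABILIZERS`, `LOGICAL_X`, `rotation_supports` and `leading_cost` are likewise hypothetical.

```python
# Assumed labeling of the Steane code (not taken from Fig. 46a).
STABILIZERS = {
    "S1": {1, 3, 5, 7},
    "S2": {2, 3, 6, 7},
    "S3": {4, 5, 6, 7},
}
LOGICAL_X = {3, 5, 6}  # assumed weight-3 representative of logical X


def rotation_supports():
    """For each code qubit, list the circuit qubits its pi/4 rotation acts on.

    Circuit qubits S1, S2, S3 stand for the X stabilizers and L for the
    logical X operator, giving m_x + k = 4 circuit qubits in total.
    """
    rows = list(STABILIZERS.items()) + [("L", LOGICAL_X)]
    return {q: [name for name, support in rows if q in support]
            for q in range(1, 8)}


def leading_cost(d, tiles=7, time_steps=4):
    """Leading-order space-time cost (tiles * d^2) * (time_steps * d)."""
    return (tiles * d**2) * (time_steps * d)


if __name__ == "__main__":
    # The Z stabilizers of the Steane code share these supports, so a valid
    # logical X must overlap each of them on an even number of qubits.
    assert all(len(LOGICAL_X & s) % 2 == 0 for s in STABILIZERS.values())

    supports = rotation_supports()
    single = [q for q, s in supports.items() if len(s) == 1]
    multi = [q for q, s in supports.items() if len(s) > 1]
    print("single-qubit rotations from code qubits:", single)
    print("multi-qubit rotations from code qubits:", multi)
    print("leading cost for d = 3:", leading_cost(3))
```

Under these assumptions, three code qubits yield single-qubit rotations and four yield multi-qubit rotations, in line with the four remaining rotations mentioned above, and `leading_cost(d)` evaluates 28d³.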
