0% found this document useful (0 votes)

72 views13 pages

Neural Network Decoders for Surface Codes

Recent works have shown that small distance quantum error correction codes can be efficiently decoded by employing machine learning techniques like neural networks. Various techniques employing neural networks have been used to increase the decoding performance, however, there is no framework that analyses a step-by-step procedure in designing such a neural network based decoder. The main objective of this work is to present a detailed framework that investigates the way that various neural netw

Uploaded by

janusz.matejko83

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

72 views13 pages

Neural Network Decoders for Surface Codes

Uploaded by

janusz.matejko83

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 13

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

net/publication/329362532

Designing neural network based decoders for surface codes

Preprint · November 2018

CITATIONS READS
0 48,258

3 authors, including:

Savvas Varsamopoulos Koen Bertels

PASQAL Ghent University
10 PUBLICATIONS 391 CITATIONS 345 PUBLICATIONS 5,977 CITATIONS

SEE PROFILE SEE PROFILE

All content following this page was uploaded by Savvas Varsamopoulos on 06 December 2018.

The user has requested enhancement of the downloaded file.

Designing neural network based decoders for surface codes

Savvas Varsamopoulos,1, 2 Koen Bertels,1, 2 and Carmen G. Almudever1, 2

1
Quantum Computer Architecture Lab, Delft University of Technology, The Netherlands
2
QuTech, Delft University of Technology, P.O. Box 5046, 2600 GA Delft, The Netherlands
Email: [email protected]
Recent works have shown that small distance quantum error correction codes can be efficiently decoded
by employing machine learning techniques like neural networks. Various techniques employing neural net-
works have been used to increase the decoding performance, however, there is no framework that analyses a
step-by-step procedure in designing such a neural network based decoder. The main objective of this work is
to present a detailed framework that investigates the way that various neural network parameters affect the
decoding performance, the training time and the execution time of a neural network based decoder. We focus
on neural network parameters such as the size of the training dataset, the structure, and the type of the neural
network, the batch size and the learning rate used during training. We compare various decoding strategies
and showcase how these influence the objectives for different error models. For the noiseless syndrome mea-
surements, we reach up to 50% improvement against the benchmark algorithm (Blossom) and for the noisy
syndrome measurements, we show efficient decoding performance up to 97 qubits for the rotated Surface code.
Furthermore, we show that the scaling of the execution time (time calculated during the inference phase) is
linear with the number of qubits, which is a significant improvement compared to existing classical decoding
algorithms that scale polynomially with the number of qubits.

I. Introduction In this work, we investigate various designs of neural net-

Constant active quantum error correction (QEC) is re- work based decoders and compare them in terms of the de-
garded as necessary in order to perform reliable quantum coding accuracy, training time and execution time. We ana-
computation and storage due to the unreliable nature of cur- lyze how different neural network parameters like the size of
rent quantum technology. Qubits inadvertently interact with the training dataset, the structure and type of the neural net-
their environment even when no operation is applied, forc- work, the batch size and the learning rate used during train-
ing their state to change (decohere). Moreover, application ing, affect the accuracy, the training and execution time of
of imperfect quantum gates results in the introduction of er- such a decoder.
rors in the quantum system. Quantum error correction is the The rest of the paper is organized as follows: in sections
mechanism that reverses these errors and restores the state II and III we provide a brief introduction to quantum error
to the desired one. correction and neural networks, respectively. In section IV,
Quantum error correction involves an encoding and a de- we explain how the different designs of neural network de-
coding process. Encoding is used to enhance protection coders work, as found in literature. Section V, presents the
against errors by employing more resources (qubits). The guidelines of utilization of the neural network parameters.
encoding scheme that is used in this work is the rotated sur- In section VI, we provide the results with the best neural
face code [1]. Decoding is the process that is used to iden- network decoder for the different error models. Finally, in
tify the location and type of error that occurred. As part of section VII, we draw our conclusions about this research.
quantum error correction, decoding has a limited time budget
that is determined by the time of a single round of error cor- II. Quantum error correction
rection. In the case that the decoding time exceeds the time Similar to classical error correction, quantum error correc-
of quantum error correction, either the quantum operations tion encodes a set of unreliable physical qubits to a more
are stalled or a backlog of inputs to the decoding algorithm reliable qubit, known as logical qubit.
is created [2]. Many classical decoding algorithms have been Various quantum error correcting codes have been devel-
proposed with the most widely used being the Blossom algo- oped so far, but in this work we only consider the surface code
rithm [3]. Blossom algorithm has been shown to reach high [13–16], one of the most promising QEC codes. The surface
decoding accuracy, but its decoding time scales polynomially code is a topological stabilizer code that has a simple struc-
with the number of qubits [4], which can be problematic for ture, local interactions between qubits and is proven to have
large quantum systems needed to solve complex problems. high tolerance against errors [17]. It is usually defined over a
However, there are optimized versions of Blossom for topo- 2D lattice of qubits, although higher dimensions can be used.
logical codes, that report linear scaling with the number of In the surface code, a logical qubit consists of physi-
qubits [5] and even a parallel version stating that the average cal qubits that store quantum information, known as data
processing time per detection round is constant, independent qubits, and physical qubits that can be used to detect errors
of the size of the system [6]). We propose the employment of in the logical qubit through their measurement, known as an-
neural networks, since they exhibit constant execution time cillary or ancilla qubits (see Figure 1).
after being trained without any parallelization required and A logical qubit is defined by its logical operators (X̄,Z̄)
have been proven to provide better decoding performance that define how the logical state of the qubit can be changed.
than many classical decoding algorithms[7–12]. Any operator of the form X ⊗n or Z ⊗n that creates a chain
2

that span both boundaries of the same type can be regarded The circuit that is used to collect the ancilla measurements
as a logical operator, with n being the number of data qubits for the surface code is known as syndrome extraction cir-
that are included in the logical operator. Typically, the logical cuit. It is presented in Figure 2 and it signifies one round
operator with the smallest n is selected. For instance, in Fig- of error correction. It includes the preparation of the an-
ure 1, a logical X̄ can be performed by applying these three cilla in the appropriate state, followed by 4 (2) CNOT gates
bit-flip operations: X0 X3 X6 . that entangle the ancilla qubit with its 4 (2) neighboring data
An important feature of the surface code is the code dis- qubits and then the measurement of the ancilla qubit in the
tance. Code distance, referred to as d, describes the degree appropriate basis. The measurement result of the ancilla is a
of protection against errors. More accurately, is the minimum parity-check, which is a value that is calculated as the parity
number of physical operations required to change the logical between the state of the data qubits connected to it. Each an-
state [1] [18]. In surface code, the degree of errors (d.o.e.) cilla performs a parity-check of the form of X ⊗4 /Z ⊗4 (square
that can be successfully corrected, is calculated according to tile) and X ⊗2 /Z ⊗2 (semi-circle tile), as presented in Figure 2.
the following equation: When the state of the data qubits involved in a parity-check
has not changed, then the parity-check will return the same
value as in the previous error correction cycle. In the case
d−1 where the state of an odd number of data qubits involved in a
d.o.e. = (1)
2 parity-check is changed compared to the previous error cor-
rection cycle, the parity-check will return a different value
Therefore, for a d=3 code, only single X- and Z-type errors than the one of the previous cycle (0 ↔ 1). The change in a
(degree = 1) can be guaranteed to be corrected successfully. parity-check in consecutive error correction cycles is known
One of the smallest surface codes, which is currently being as a detection event.
experimentally implemented, is the rotated surface code pre-
Note that the parity-checks are used to identify errors in
sented in Figure 1. It consists of 9 data qubits placed at the
the data qubits without having to measure the data qubits
corners of the square tiles and 8 ancilla qubits placed inside
explicitly and collapse their state. The state of the ancilla
the square and semi-circle tiles. Each ancilla qubit can inter-
qubit at the end of every parity-check is collapsed through
act with its neighboring 4 (square tile) or 2 (semi-circle tile)
the ancilla measurement, but is initialized once more in the
data qubits.
beginning of the next error correction cycle [19].
The parity-checks must conform to the following rules: i)
must commute with each other, ii) must anti-commute with
errors and iii) must commute with the logical operators. An
easier way to describe these parity-checks is to view them as
a matrix, as presented in Figure 1. A matrix containing the
4 X-type parity checks and a matrix containing the 4 Z-type
parity-checks for the d=3 rotated surface code is presented.
The notation Di refers to the ith data qubit used in a given
parity-check. A 1 in the matrix represents that a data qubit
is involved in the parity check.
Gathering all measurement outcomes, forms the error
FIG. 1. Rotated surface code with code distance 3. Data qubits are syndrome. Surface codes can be decoded by collecting the
enumerated from 0 to 8 (D0-D8). X-type ancilla are in the center of ancilla measurements out of one or multiple rounds of error
the white tiles and Z-type ancilla are in the center of grey tiles. correction and providing them to a decoding algorithm that
identifies the errors and outputs data qubit corrections.
As mentioned, ancilla qubits are used to detect errors in We provide a simple example of decoding for the d=3 ro-
the data qubits. Although quantum errors are continuous, the tated surface code presented in Figure 3 on the left side. The
measurement outcome of each ancilla is discretized and then measurement of each parity-check operator returns a binary
forwarded to the decoding algorithm. Quantum errors are value indicating the absence or presence of a nearby data
discretized into bit-flip (X) and phase-flip (Z) errors, that can qubit error. Assume that a Z-type error has occurred on data
be detected by Z-type ancilla and X-type ancilla, respectively. qubit 4 and the initial parity-check has a value of 0. Ancilla
AX1 and AX2 will return a value of 1, which indicates that a
data qubit error has occurred in their proximity and ancilla
AX0 and AX3 will return a value of 0 according to the parity-
checks provided in Figure 1. However, due to the degenerate
nature of the surface code, two different but complementary
sets of errors can produce the same error syndrome. Regard-
FIG. 2. Syndrome extraction circuit for individual Z-type (left) and less of which of the two potential sets of errors has occurred,
X-type (right) ancilla, with the ancilla placed in the bottom. The the decoder is going to provide the same corrections every
ancilla qubit resides at the center of each grey or white tile, respec- time the same error syndrome is observed. Therefore, there
tively. is an assumption when the decoder is designed of which cor-
3

rections are going to be selected when each error syndrome tiple error correction cycles and corrections for data qubit
is observed. For example, when the decoder observes ancilla errors are more confidently proposed by observing the accu-
AX1 and AX2 returning a value of 1, then there are two sets mulation of errors throughout the error correction cycles.
of errors that might have occurred: a Z error at data qubit There exist classical algorithms that can decode efficiently
4 or a Z error at data qubit 2 and at data qubit 6. If the de- the surface code, however optimal decoding is a NP-hard
coder is programmed to output a Z-type correction at data problem. For example, maximum likelihood decoding (MLD)
qubit 4, then in one case the error is going to be erased, but searches for the most probable error that produced the er-
in the other case a logical error will be created. Based on that ror syndrome, whereas minimum weight perfect matching
fact, there is no decoder that can always provide the right set (MWPM) searches for the least amount of errors that pro-
of corrections, since there is a chance of misinterpretation of duced the error syndrome [5, 17]. MLD has a running time of
the error that have occurred. O(nχ3 ) according to [20] where χ is a parameter that con-
A single error on a data qubit will be signified by a pair of trols the approximation precision and the optimized version
neighboring parity-checks changing value from the previous of the Blossom algorithm shows linear scaling with the num-
error correction cycle. In the case where an error occurs at ber of qubits. Also, a parallel version of the Blossom algo-
the sides of the lattice, only one parity-check will provide in- rithm is described in [6] that claims constant execution time
formation about the error, because there is only one parity- regardless of the size of the system.
check available due to the structure of the rotated surface Furthermore, current qubit technology offers a small time
code. Multiple data qubit errors that occur near each other, budget for decoding, making most decoders unusable for
form one dimensional chains of errors which create only two near-term experiments. For example, an error correction cy-
detection events located at the endpoints of the chains (see cle of a d=3 rotated surface code with superconducting qubits,
Figure 3 on the left side and the red line on the right side). On takes ~700nsec [2], which provides the upper limit of the al-
the other hand, a measurement error, which is an error dur- lowed decoding time in this scenario. If noisy error syndrome
ing the measurement itself, is described as a chain between measurements are also assumed, then d error correction cy-
the same parity-check over multiple error correction cycles cles are required to provide the necessary information to the
(see the blue line in Figure 3 on the right side). This blue line decoder, so in this scenario ~2.1µsec will be the upper limit
represents an alternating pattern of the measurement values for the time budget of decoding.
(0-1-0 or 1-0-1) coming from the same parity-check for con- The way that Blossom algorithm performs the decoding
secutive error correction cycles. If such a pattern is identi- is through a MWPM in a graph that includes the detection
fied and is not correlated with a data qubit error, then it is events that have occurred during the error correction cycles
considered a measurement error, so no corrections should be taken into account for decoding. If the number of qubits is
applied. increased, then the graph will be bigger and the decoding
time will increase, assuming no parallelization.
An alternative decoding approach is to use neural net-
works to assist or perform the decoding procedure, since neu-
ral networks provide fast and constant execution time, while
maintaining high application performance. In this paper, we
are going to analyze decoders that include neural networks
and compare them to each other and to an un-optimized ver-
sion of the Blossom algorithm as described in [21].

III. Neural Networks

Artificial neural networks have been shown to reach high
FIG. 3. Rotated surface code with code distance 3. Left: Phase-flip application performance and constant execution time after
(Z) error at data qubit 4, which causes two neighboring parity checks being trained on a set of data generated by the application.
to return a value of 1 (detection events shown in red). Right: Three An artificial neural network is a collection of weighted in-
consecutive rounds of error correction. The red dots indicate de- terconnected nodes that can transmit signals to each other.
tection events that arise from a data qubit error and the blue dots The receiving node processes the incoming signal and sends
indicate detection events that arise from a measurement error. the processed result to its connected node(s). The processing
at each node is different based on the different neural net-
The parity-checks of the surface code protect the quan- work parameters being used.
tum state against quantum errors through continuous error In this work, we focus on two types of neural networks
correction and decoding. However, totally suppressing noise known as Feed-forward neural networks (FFNN) and Re-
is not achievable, because as mentioned the decoder might current neural networks (RNN). Feed-forward neural net-
misinterpret data qubit or measurement errors. Therefore, works are considered to be the simplest type of neural net-
there have been developed algorithmic techniques that in- work, allowing information to move strictly from input to
volve multiple error correction cycles to be run before cor- output, whereas recurrent neural networks are considered to
rections are proposed. In that way, measurement errors are be more sophisticated, including feedback loops. The simple
easily identified by observing the error syndrome out of mul- construction of FFNNs makes them extremely fast in appli-
4

the input gate [22]. The equations that describe the behavior
of all gates in the LSTM cell are described in Figure 6.

FIG. 4. The structure of an artificial neural network with an input

layer (x), a hidden layer (h) and an output layer (y). FIG. 6. Structure of the LSTM cell and equations that describe the
gates of an LSTM cell.

cations, however, RNNs are able to produce better results for The way that neural networks solve problems is not by ex-
more complex problems. plicit programming, rather “learning” the solution based on
In Feed-forward neural networks, input signals xi are given examples. There exist many ways to “teach” the neu-
passed to the nodes of the hidden layer hi and the output ral network how to provide the right answer, however in this
of each node in the hidden layer acts as an input to the nodes work we are focusing on supervised learning. Learning is a
of the following hidden layer, until the output of the nodes procedure which involves the creation of a map between an
of the last hidden layer is passed to the nodes of the output input and a corresponding output and in supervised learning
layer yi . The weighted connections between nodes of differ- the (input, output) pair is provided to the neural network.
ent layers are denoted as Wi and b is the bias of each layer. During training, the neural network adjusts its weights in
σ denotes the activation function that is selected, with popu- order to provide the correct output based on the given in-
lar options being the sigmoid, the hyperbolic tangent (tanh) put. Theoretically, at the end of training, the neural network
and the rectified linear unit (ReLU). The output of the FFNN should be able to infer the right output even for inputs that
presented in Figure 4 is calculated as follows: were not provided during training, which is known as gen-
eralization.
Training is stopped when the neural network can suffi-
~y = σ(Ŵ0 σ(Ŵh ~x + b~h ) + b~0 ) (2) ciently predict the right output to each training input. How-
ever, a definition of the closeness between the desired value
In recurrent neural networks there is feedback that takes and the predicted value needs to be defined. This metric is
into account output from previous time steps yt−1 , ht−1 . known as cost/loss function and guides the neural network
RNNs have a feedback loop at every node, which allows in- towards the desired outcome by estimating the closeness be-
formation to move in both directions. Due to the feedback tween the predicted and the desired value. The cost function
nature (see Figure 5), recurrent neural networks can identify is calculated at the end of every training iteration after the
temporal patterns of widely separated events in noisy input weights have been updated. The cost function that we used
streams. is known as mean squared error, which tries to minimize
the average squared error between the desired output and
the predicted output, given by

n
1X
cost = (Yi − Ŷi )2 (3)
n i=1
FIG. 5. A conceptual visualization of the recurrent nature of an RNN.
, where n is the number of data, Yi is the target value and
Ŷi is the predicted value.
In this work, Long Short-Term Memory (LSTM) cells are The procedure in which the weights are updated during
used as the nodes of recurrent neural networks (see Figure training in order to minimize the cost function is known as
6). In an LSTM cell there are extra gates, namely the input, backpropagation. Backpropagation is a method that cal-
forget and output gate that are used in order to decide which culates the gradient of the cost function with respect to the
signals are going to be forwarded to another node. W is the weights, through the process of stochastic gradient descent.
recurrent connection between the previous hidden layer and In order to be able to use neural networks to find solutions to
current hidden layer. U is the weight matrix that connects a variety of applications (linear and non-linear), it is required
the inputs to the hidden layer. Ce is a candidate hidden state to have a non-linear activation function at the processing
that is computed based on the current input and the previous step of every node. This function defines the contribution
hidden state. C is the internal memory of the unit, which is a of this node to the subsequent nodes that it is connected to.
combination of the previous memory, multiplied by the for- The activation function that was used in this work was the
get gate, and the newly computed hidden state, multiplied by Rectified Linear Unit.
5

IV. Designing neural network based decoders C. Evaluating performance

Neural network based decoders for quantum error correct- Designs for low level decoders typically include a single
ing codes have been recently proposed [7–12]. There are neural network. To obtain the predicted corrections for the
two categories in which they can be divided: i) decoders low level decoder, we sample from the probability distribu-
that search for exact corrections at the physical level and ii) tion that corresponds to the observed error syndrome for
decoders that search for corrections that restore the logical each data qubit, and predict whether a correction should
state. We are going to refer to the former ones as low level be applied at each data qubit. However, this prediction
decoders [9, 10] and the latter ones as high level decoders needs to be verified before being used as a correction, be-
[7, 8, 11, 12]. cause the proposed corrections must generate the same er-
Since there are multiple techniques in designing a neural ror syndrome as the one that was observed. Otherwise,
network based decoder, there is merit in figuring out which the corrections are not valid (see Figure 7a-b). Only when
is the best design strategy. To achieve that, we implemented the two error syndromes match, the predictions are used
both designs as found in the literature and investigated their as corrections on the data qubits (see Figure 7a-c). If the
differences and similarities. Furthermore, we evaluate the de- observed syndrome does not match the syndrome obtained
coding performance achieved with both decoders and inves- from the predicted corrections, then the predictions must be
tigate their training and execution time. We provide an anal- re-evaluated by re-sampling from the probability distribu-
ysis for various design parameters. tion. This re-evaluation step makes the decoding time non-
constant, which can be a big disadvantage. There are ways
A. Inputs/Outputs
to minimize the average amount of re-evaluations, however
this is highly influenced by the physical error rate, the code
Low level decoders take as input the error syndrome and distance and the strategy of re-sampling.
produce as output an error probability distribution for each In Figure 7, the decoding procedure of the low level de-
data qubit based on the observed syndrome. Therefore, a pre- coder is described with an example. On 7a, we present an
diction is made that attempts to correct exactly all physical observed error syndrome shown in red dots and the bit-flip
errors that have occurred. errors on physical data qubits (shown with X on top of them)
High level decoders take as input the error syndrome and that created that syndrome. On 7b, the decoder predicts a
produce as output an error probability for the logical state of set of corrections on physical data qubits and the error syn-
the logical qubit. Based on this scheme, the neural network drome resulting from these corrections is compared against
does not have to predict corrections for all data qubits, rather the observed error syndrome. As can be seen from 7a and
just for the state of the logical qubit, which makes the predic- 7b, the two error syndromes do not match therefore the pre-
tion simpler. This is due to the fact that there are only 4 op- dicted corrections are deemed invalid. On 7c, the decoder
tions as potential logical errors, I,
¯ X̄, Z̄, Ȳ , compared to the
predicts a different set of corrections and the corresponding
case of the low level decoder where the output is equivalent error syndrome to these corrections is compared against the
to the number of data qubits. Moreover, trying to correctly observed error syndrome. In the case of 7a and 7c, the pre-
predict each physical error requires a level of high granular- dicted error syndrome matches the observed one, therefore
ity which is not necessary for error correcting codes like the the corrections are deemed valid.
surface code.

B. Sampling and training process

During the sampling process, multiple error correction cy-
cles are run and the corresponding inputs and outputs for
each decoder are stored. Due to the degenerate nature of the
surface code, the same error syndrome might be produced by
different sets of errors. Therefore, we need to keep track of
the frequency of occurrence of each set of errors that provide
the same error syndrome.
For the low level decoder, based on these frequencies, we FIG. 7. Description of the decoding process of the low level decoder
create an error probability distribution for each data qubit for a d=5 rotated surface code. (a) Observed error syndrome shown
based on the observed error syndrome. For the high level de- in red dots and bit-flip errors on physical data qubits shown with X
coder, based on these frequencies, we create an error proba- on top of them. (b) Invalid data qubits corrections and the corre-
bility distribution for each logical state based on the observed sponding error syndrome. (c) Valid data qubits corrections and the
corresponding error syndrome.
error syndrome.
When sampling is terminated, we train the neural net-
work to map all stored inputs to their corresponding outputs. Designs for high level decoders typically involve two de-
Training is terminated when the neural network is able to coding modules that work together to achieve high speed and
correctly predict at least 99% of the training inputs. Further high level of decoding performance. Either both decoding
information about the training process are provided in sec- modules can be neural networks [8] or one can be a classical
tion V. module and the other can be a neural network[7, 11]. The
6

classical module of the latter design will only receive the er-
ror syndrome out of the last error correction cycle and pre-
dict a set of corrections. In our previous experimentation [7],
this classical module was called simple decoder. The correc-
tions proposed by the simple decoder do not need to exactly
match the errors that occurred, as long as the corrections cor-
respond to the observed error syndrome (valid corrections).
The other module which in both cases is a neural network,
should be trained to receive the error syndromes out of all FIG. 8. Description of the decoding process of the high level decoder
error correction cycles and predict whether the corrections for a d=5 rotated surface code. (a) Observed error syndrome shown
that are going to be proposed by the simple decoder are go- in red dots and bit-flip errors on physical data qubits shown with X
ing to lead to a logical error or not. In that case, the neural on top of them. (b) Corrections proposed by the simple decoder for
network outputs extra corrections, which are the appropriate the observed error syndrome. (c) Additional corrections in the form
logical operator that erases the logical error. The output of of the X̄ logical operator to cancel the logical error generated from
the proposed corrections of the simple decoder.
both modules is combined and any logical error created by
the corrections of the simple decoder will be canceled due to
the added corrections of the neural network (see Figure 8).
error correction cycles are run and decoding is applied
Furthermore, the simple decoder is purposely designed in
in frequent windows. Depending on the error model, a
the simplest way in order to remain fast, regardless of the
single error correction cycle might be enough to suc-
quality of proposed corrections. By adding the simple de-
cessfully decode, as in the case of perfect error syn-
coder alongside the neural network, the corrections can be
drome measurements (window = 1 cycle), or multiple
given at one step and the execution time of the decoder re-
error correction cycles might be required, as in the case
mains small, since both modules are fast and operate in par-
of imperfect error syndrome measurements (window
allel.
> 1 cycle). When the lifetime simulations are stopped,
In Figure 8, the decoding procedure of the high level de-
the decoding performance is evaluated as the ratio of
coder is described with an example. On 8a, we present an
the number of logical errors found over the number of
observed error syndrome shown in red dots and the bit-flip
windows run until the simulations are stopped.
errors on physical data qubits (shown with X on top of them)
that created that syndrome. On 8b, we present the decod- • The training time is the time required by the neural
ing of the classical module known as simple decoder. The network to adjust its weights in a way that the training
simple decoder receives the last error syndrome of the decod- inputs provide the corresponding outputs as provided
ing procedure and proposes corrections on physical qubits by by the training dataset and adequate generalization can
creating chains between each detection event and the near- be achieved.
est boundary of the same type as the parity-check type of the
detection event. In Figure 8b, the corrections on the physi- • The execution time is the time that the decoder needs
cal qubits are shown with X on top of them, indicating the to perform the decoding after being trained. It is cal-
way that the simple decoder functions. The simple decoder culated as the difference between the time when the
corrections are always deemed valid, due to the fact that the decoder receives the first error syndrome of the decod-
predicted and observed error syndrome always match based ing window and the time when it provides the output.
on the construction of the simple decoder. In the case of Fig-
A. Error model
ure 8a-b, the proposed corrections of the simple decoder are
going to lead to an X̄ logical error, therefore we use the neu- These decoders were tested for two error models, the de-
ral network to identify this case and propose the application polarizing error model and the circuit noise model.
of the X̄ logical operator as additional corrections to the sim- The depolarizing model assigns X,Z,Y errors with equal
ple decoder corrections, as presented in 8c. probability p/3, known as depolarizing noise, only on the data
qubits. No errors are inserted on the ancilla qubits and per-
V. Implementation parameters fect parity-check measurements are used. Therefore, only a
In this section, we implement and compare both types of single cycle of error correction is required to find all errors.
neural network based decoders as discussed in the previous The circuit noise model assigns depolarizing noise on the
section and argue about the better strategy to create such a data qubits and the ancilla qubits. Furthermore, each single-
decoder. The chosen strategy will be determined by investi- qubit gate is assumed perfect but is followed by depolarizing
gation of how different implementation parameters affect the noise with probability p/3 and each two-qubit gate is assumed
i) decoding performance, ii) training time and iii) execution perfect but is followed by a two-bit depolarizing map where
time. each two-bit Pauli has probability p/15, except the error-free
case, which has a probability of 1 − p. Depolarizing noise is
• The decoding performance indicates the accuracy of also used at the preparation of a state and the measurement
the algorithm during the decoding process. The typical operation with probability p, resulting in the wrong prepared
way that decoding performance is evaluated is through state or a measurement error, respectively. An important as-
lifetime simulations. In lifetime simulations, multiple sumption is that the error probability of a data qubit error is
7

equal to the probability of a measurement error, therefore d probabilities datasets approach. Each dedicated training
cycles of error correction are deemed enough to decode prop- dataset that was created by a specific physical error proba-
erly. bility is used to test the decoding performance at that same
physical error probability and the probabilities close to that,
B. Choosing the best dataset but not all physical probabilities tested. Moreover, by sam-
The best dataset for a neural network based decoder is pling, training and testing the performance for the same
the dataset that produces the best decoding performance.
physical error rate, the decoder has the most relevant infor-
Naively, one could suggest that including all possible error
mation to perform the task of decoding.
syndromes, would lead to the best decoding performance,
The first step when designing a neural network based
however, as the size of the quantum system increases, includ-
decoder is gathering data that will be used as the training
ing all error syndrome becomes impossible. Therefore, we
dataset. However, as the code distance increases, the size of
need to include as little but as diverse as possible error syn-
the space including all potential error syndromes gets expo-
dromes, which will provide the maximum amount of gener-
nentially large. Therefore, we need to decide at which point
alization, thus the best decoding performance, after training.
the sampling process is going to be terminated.
In our previous experimentation[7], we showed that sam-
Based on the sampling probability (physical error rate), dif-
pling at a single physical error rate that always produces
ferent error syndromes will be more frequent than others. We
the fewest amount of corrections, is enough to decode small chose to include the most frequent error syndromes in the
distance rotated surface codes with a decent level of decod- training dataset. In order to find the best possible dataset, we
ing performance. This concept of always choosing the fewer increase the dataset size until it stops yielding better results
amount of corrections is similar to the Minimum Weight Per- in terms of decoding performance. For each training dataset
fect Matching that Blossom algorithm uses. size, we train a neural network and evaluate the decoding
After sampling and training the neural network at a sin- performance.
gle physical probability, the decoder is tested against a large It is not straightforward to claim that the optimal size of
variety of physical error rates and its decoding performance a training dataset is found, because there is no way to en-
is observed. We call this approach, the single probability sure that we found the minimum number of training samples
dataset approach, because we create only one dataset based that provide the best weights for the neural network, there-
on a single physical error rate and test it against many. Us- fore generalization, after being perfectly trained. Thus, we
ing the single probability dataset approach to decode vari- rely heavily on the decoding performance that each training
ous physical error probabilities is not optimal, because when dataset achieves and typically use more training samples than
sampling at low physical error rates, less diverse samples are the least amount required.
collected, therefore the dataset is not diverse enough to cor-
rectly generalize to unknown training inputs. C. Structure of the neural network
The single probability approach is valid for a real exper- While investigating the optimal size of a dataset, some pre-
iment, since in an experiment there is a single physical er- liminary investigation of the structure has been done, how-
ror probability that the quantum system operates and at that ever only after the dataset is defined, the structure in terms
probability the sampling, training and testing of the decoder of layers and nodes is explored in depth (see Figure 9).
will occur. However, this is not a good strategy for testing A variety of different configurations of layers and nodes
the decoding performance over a wide range of error prob- needs to be tested, so that the configuration with the highest
abilities. This is attributed to the degenerate nature of the accuracy of training in the least amount of training time can
surface code, since different sets of errors generate the same be adopted. The main factors that affect the structure of the
error syndrome. One set of errors is more probable when the neural network are the size of the training dataset, the sim-
physical error rate is small and another when it is high. Based ilarity between the training samples and the type of neural
on the design principles of the decoder, only one of these sets network.
of errors, and always the same, are going to be selected when We found in our investigation that the number of layers
a given syndrome is observed regardless of the physical error selected for training are affected more by the samples, e.g.
rate. Therefore, training a neural network based decoder in the similarity of the input samples, and less by the size of
one physical error rate and testing its decoding performance the training dataset. The number of nodes of the last hidden
in a widely different physical error rate is not beneficial. The layer is selected to be equal to the number of output nodes.
main benefit of this approach lies in the fact that only a single The rest of the hidden layers were selected to have decreasing
neural network has to be trained and used to evaluate the de- number of nodes going from the first to the last layer, but we
coding performance for all the physical error rates that were do not claim that this is necessarily the optimal strategy.
tested. In the single probability dataset approach, the set with We implemented both decoder designs with feed-forward
the fewer errors was always selected, because this set is more and recurrent neural networks. The more sophisticated
probable for the range of physical error rates that we are in- recurrent neural network seems to outperform the feed-
terested in. forward neural network in both the depolarizing and the cir-
To avoid such violations, we created different datasets that cuit noise model. In Figure 10, it is evident that even for the
were obtained by sampling at various physical error rates small decoding problem of d=3 rotated surface code for the
and trained a different neural network at each physical error depolarizing error model, the RNN outperforms the FFNN in
rate taken into account. We call this approach, the multiple decoding performance. This is even more obvious at larger
8

TABLE I. Pseudo-threshold values for the tested decoders (d=3) un-

der depolarizing error model
Decoder Pseudo-threshold
FFNN lld 0.0911
RNN lld 0.0949
FFNN hld 0.0970
RNN hld 0.0969
Blossom 0.0825

FIG. 9. Different configurations of layers and nodes for the d=3 ro-
tated surface code for the depolarizing error model. The nodes of
the tested hidden layers are presented in the legend. Training stops
at 500 training epochs for all configurations, since a good indication
of the training accuracy achieved is evident by that point. Then, the
one that reached the highest training accuracy continues training
until the training accuracy cannot increase any more.

code distances and for the circuit noise model, where the re-
current neural network naturally fits better due to its nature. FIG. 10. Left: Comparison of decoding performance between Blos-
Moreover, training of the FFNN becomes much harder com- som algorithm, low level decoder and high level decoder for the
pared to the RNN as the size of the dataset increases, making d=3 rotated surface code for the depolarizing error model. Right:
the experimentation with FFNN even more difficult. Zoomed in at the region defined by the square.
The metric that we use to compare the different designs
is the pseudo-threshold. The pseudo-threshold is defined
as the highest physical error rate that the quantum device high level decoder is outperforming the low level decoder.
should operate in order for error correction to be beneficial. Although there are ways to increase the decoding perfor-
Operating at higher than the pseudo-threshold probabilities mance of the latter, mainly by re-designing the repetition step
will cause worse decoding performance compared to an un- to find the valid corrections in less repetitions, we found no
encoded qubit. The protection provided by error correction is merit in doing so, since the decoding performance would still
increasing as the physical error rate becomes smaller than the be similar to the high level decoder's and the repetition step
pseudo-threshold value, therefore a higher pseudo-threshold would still not be eliminated.
for a code distance signifies higher decoding performance. Based on these observations, the results presented in Fig-
The pseudo-threshold metric is used when a single code ures 13 and 14 in the Results section were obtained with the
distance is being tested. When a variety of code distances are high level decoder with recurrent neural networks.
investigated, then we use the threshold metric. The thresh-
D. Training process
old is a metric that represents the protection against noise for 1. Batch size
a family of error correcting codes, like the surface code. For Training in batches instead of the whole dataset at once,
the surface code, each code distance has a different pseudo- can be beneficial for the training accuracy and training time.
threshold value, but the threshold value of the code is only By training in batches, the weights of the neural network are
one. updated multiple times per training iteration, which typically
The pseudo-threshold values for all decoders investigated leads to faster convergence. We used batches of 1000 or 10000
in Figure 10 can be found as the points of intersection be- samples, based on the size of the training dataset.
tween the decoder curve and the black dashed line, which
represents the points where the physical error probability is 2. Learning rate
equal to the logical error probability (y = x). The pseudo- Another important parameter of training that can directly
thresholds acquired from Figure 10 are presented in Table I. affect the training accuracy and training time is the learning
The threshold value is defined as the point of intersection rate. The learning rate is the parameter that defines how big
of all the curves of multiple code distances, therefore cannot the updating steps will be for each weight at every training
be seen from Figure 10, since all curves involve d=3 decoders, iteration. Larger learning rates in the beginning of training
but can be found in Figures 13 14 for the depolarizing and can lead the training process to a minimum faster during gra-
circuit noise model, respectively. dient descent, whereas smaller learning rates near the end of
Another observation from Figure 10 and Table I is that the training can increase the training accuracy. Therefore, we
9

devise a strategy of a step-wise decrease of the learning rate

throughout the training process. If the training accuracy has
not increased after a specified number of training iterations
(e.g. 50), then the learning rate is decreased. The learning
rates used range from 0.01 to 0.0001.

3. Generalization
The training process should not only be focused on the cor-
rect prediction of known inputs, but also the correct predic-
tion of inputs unknown to training, known as generalization.
Without generalization, the neural network acts as a Look-
Up Table (LUT), which will lead to sub-optimal behavior as
the code distance increases. In order to achieve high level of
generalization, we continue training until the closeness be-
tween the desired and predicted value up to the 3rd decimal FIG. 11. Execution time for the high level decoder (hld) and the low
digit is higher than 95% over all training samples. level decoder (lld) for Feed-forward (FFNN) and Recurrent neural
networks (RNN) for d=3 rotated surface code for the depolarizing
4. Training and execution time error model.
Timing is a crucial aspect of decoding and in the case of
neural network decoders we need to minimize both the exe-
cution time and the training time as much as possible.
Moreover, the recurrent neural network typically uses
The training time is proportional to the size of the training
more weights compared to the feed-forward neural network,
dataset and the number of qubits. The number of qubits is in-
which translates to higher execution time. However, the de-
creasing in a quadratic fashion, 2d2 − 1, and the selected size
coding performance and the training accuracy achieved with
of the training dataset in our experimentation is increasing in
2 recurrent neural networks leads to better decoding perfor-
an exponential way, 2d −1 . Therefore, training time should mance. Thus, we decided to create high level decoders based
increase exponentially while scaling up. on recurrent neural networks while taking into account all
However, the platform that the training occurs, affects the parameters mentioned above.
the training time immensely, since training in one/multiple
The execution time for high level decoders appears to in-
CPU(s) or one/multiple GPU(s) or a dedicated chip in hard-
crease linearly with the number of qubits. This is justified
ware will result in widely different training times. The neural
by the fact that as the code distance increases, the operation
networks that were used to obtain the results in this work, re-
of the simple decoder does not require more time, since all
quired between half a day to 3 days, depending on the num-
detection events are matched in parallel and independently
ber of weights and the inputs/outputs of the neural network,
to each other, and the size of the neural network increases in
on a CPU with 28 hyper thread cores at 2GHz with 384GB of
such a way that only a linear increase in the execution time is
memory.
calculated. In Table II, we provide the calculated average time
In our simulations in a CPU, we observed the constant
of decoding a surface code cycle under depolarizing noise for
time behavior that was anticipated for the execution time,
all distances tested with the high level decoder with recurrent
however a realistic estimation taking into account all the de-
neural networks.
tails of a hardware implementation that such a decoder might
run, has not been performed by this or any other group yet.
The time budget for decoding is different for different qubit TABLE II. Average time for surface code cycle under depolarizing
technologies, however due to the inadequate fidelity of the error model
quantum operations, it is extremely small for the time being, Code distance Avg. time / cycle
for example ~700nsec for a surface code cycle with supercon- d=3 4.14msec
ducting qubits [19]. d=5 11.19msec
In Figure 11, we present the constant and non-constant ex- d=7 28.53msec
ecution time for the d=3 rotated surface code for the depolar- d=9 31.34msec
izing noise model with perfect error syndrome measurements
for the high level decoder and the low level decoder, respec-
tively. There are factors such as the number of qubits, the type
The low level decoder has to repeat its predictions before of neural network being used and the number of inputs/out-
it predicts a valid set of corrections which makes the exe- puts of the neural network that influence the execution time.
cution time non-constant. With careful design of the repe- The main advantage against classical algorithms is that the
tition step, the average number of predictions can decrease, execution time of such neural network based decoders is in-
however the execution time will remain non-constant. Based dependent of the physical error probability.
on the non-constant execution time and the inferior decod- In the following section we are presenting the results in
ing performance compared to the high level decoder, the low terms of the decoding performance for different code dis-
level decoder was rejected. tances.
10

VI. Results
As we previously mentioned, the way that decoding per- TABLE III. Pseudo-threshold values for the depolarizing error model
formance is tested is by running simulations that sweep a Decoder d=3 d=5 d=7 d=9
large amount of physical error rates and calculate the cor- Blossom 0.08234 0.10343 0.11366 0.11932
responding logical error rate for each of them. This type of Single prob. dataset 0.09708 0.10963 0.12447 N/A
simulations are frequently referred to as lifetime simulations Multiple prob. dataset 0.09815 0.12191 0.12721 0.12447
and the logical error is calculated as the ratio of logical errors
found over the error correction cycles performed to accumu-
late these logical errors.
The design of the neural network based decoder that was
used to obtain the results is described in Figure 12 for the
depolarizing and the circuit error model. For the case of the
depolarizing error model, neural network 1 is not used, so
the input is forwarded directly to the simple decoder since
perfect syndrome measurements are assumed. The decoding
process is similar to the one presented in Figure 8.
The decoding algorithm for the circuit noise model con-
sists of a simple decoder and 2 neural networks. Both neural
networks receive the error syndrome as input. Neural net-
work 1 predicts which detection events at the error syndrome
belong to data qubit errors and which belong to measurement
errors. Then, it outputs the error syndrome relieved of the FIG. 13. Decoding performance comparison between the high level
detection events that belong to measurement errors to the decoder trained on a single probability dataset, the high level de-
simple decoder. The simple decoder provides a set of correc- coder trained on multiple probabilities datasets and Blossom algo-
rithm for the depolarizing error model with perfect error syn-
tions based on the received error syndrome. Neural network
drome measurements. Each point has a confidence interval of 99.9%.
2 receives the initial error syndrome and predicts whether the
simple decoder will make a logical error and outputs a set of
corrections which combine with the simple decoder correc-
high level decoder is trained to identify the most frequently
tions at the output.
encountered error syndromes based on a given physical er-
ror rate, results in more accurate decoding information. An-
other reason for the improvement against the Blossom algo-
rithm, is the ability of identifying correlated errors (-iY=XZ).
For the depolarizing noise model with perfect error syndrome
measurements, the Blossom algorithm is proven to be near-
optimal, so we are not able to observe a large improvement in
the decoding performance. Furthermore, the comparison is
against the un-optimized version of Blossom algorithm [21],
therefore it is mainly performed to get a frame of reference
FIG. 12. The design for the high level decoder that was used for the rather than an explicit numerical comparison.
depolarizing and the circuit noise model. We observe that for the range of physical error rates that
we are interested in, which are below the pseudo-threshold,
A. Depolarizing error model the improvement against Blossom algorithm is reaching up
For the depolarizing error model, we used 5 training to 18.7%, 58.8% and 53.9% for code distance 3, 5 and 7, respec-
datasets that were sampled at these physical error rates : 0.2, tively for the smallest physical error probabilities tested.
0.15, 0.1, 0.08, 0.05. Perfect error syndrome measurements are The threshold of the rotated surface code for the depolar-
assumed, so the logical error rate can be calculated per error izing model has improved from 0.14 for the single probabil-
correction cycle. ity dataset approach to 0.146 for the multiple probabilities
In Table III, we present the pseudo-thresholds achieved datasets approach. The threshold of Blossom is calculated to
from the investigated decoders for the depolarizing error be 0.142.
model with perfect error syndrome measurements for dif-
ferent distances. As expected, when the distance increases, B. Circuit noise model
the pseudo-threshold also increases. Furthermore, the neu- For the circuit noise model, we used 5 training datasets
ral network based decoder with the multiple probabilities that were sampled at these physical error rates : 4.5x10−3 ,
datasets exhibits higher pseudo-threshold values, which is 1.5x10−3 , 8.0x10−3 , 4.5x10−4 , 2.5x10−4 . Since, imperfect er-
expected since it has more relevant information in its dataset. ror syndrome measurements are assumed the logical error
As can be seen from Figure 13, the multiple probabili- rate is calculated per window of d error correction cycles.
ties datasets approach is providing better decoding perfor- In Table IV, we present the pseudo-thresholds achieved for
mance for all code distances simulated. The fact that the the circuit noise model with imperfect error syndrome mea-
11

surements. Again, the neural network based decoder with to perform good decoding beyond d=7. The reason for this
multiple probabilities datasets is performing better than the exponential growth is due to the way that we provide the
single probability dataset. We were not able to use the Blos- data to the neural network. Currently, we gather all error
som algorithm with imperfect measurements for code dis- syndromes out of all the error correction cycles and create
tances higher than 3, therefore we decided not to include it. lists out of them. Then, we provide these lists to the recur-
However, we note that the results that were obtained are sim- rent neural network all-together. Since the recurrent neu-
ilar to the results in the literature corresponding to the circuit ral network can identify patterns both in space and time, we
noise model [23, 24]. also provide the error correction cycle that provided that er-
ror syndrome (time stamp of each error syndrome). Then,
the recurrent neural network is able to differentiate between
TABLE IV. Pseudo-threshold values for the circuit noise model consecutive error correction cycles and find patterns of er-
Decoder d=3 d=5 d=7
rors in them.
Single prob. dataset 3.99x10−4 9.23x10−4 N/A
In order to obtain efficient decoding regardless of the ex-
Multiple prob. dataset 4.44x10−4 1.12x10−3 1.66x10−3
ponentially large state space, we restrict the space that we
sample to the one containing the most frequent error syn-
dromes occurring at the specified sampling (physical) error
probability. However, even by employing such a technique,
it seems impossible to continue beyond d=7 for the circuit
noise model with the decoding approach that we used in this
work. At the circuit noise model for d=7 for example, we
gather error syndromes out of 10 error correction cycles and
each error syndrome contains 48 ancilla qubits. Therefore,
the full space that needs to be explored is 210∗48 , which is
infeasible.
A different approach that minimizes the space that the
neural network needs to search would be extremely valuable.
A promising idea would be to provide the error syndromes
of each error correction cycle one at a time, instead of giv-
ing them all-together, and keep an intermediate state of the
logical qubit.
FIG. 14. Decoding performance comparison between the high level
decoder trained on a single probability dataset and the high level
decoder trained on multiple probabilities datasets for the circuit VII. Conclusions
noise model with imperfect error syndrome measurements. Each This work focused on researching various design strategies
point has a confidence interval of 99.9%. for neural network based decoders. Such kind of decoders are
currently being investigated due to their good decoding per-
We observe from Figure 14 that the results with the mul- formance and constant execution time. They seem to have
tiple probabilities datasets for the circuit noise model are an upper limit at around 160 qubits, however by designing
significantly better, especially as the code distance is in- smarter approaches in the future, we can have neural net-
creased. The case of the d=3 is small and simple enough to work based decoders for larger quantum systems.
be solved equally well by both approaches. The increased de- We emphasized mainly on the design aspects and the pa-
coding performance achieved with the multiple probabilities rameters that affect the performance of the neural networks
datasets approach is based on the more accurate information and devised a detailed plan on how to approach them. We
for the physical error probability that is being tested. showed that we can have high decoding performance for
The threshold of the rotated surface code for the circuit quantum systems of about 100 qubits for both the depolar-
noise model has improved from 1.77x10−3 for the single izing and the circuit noise model. We showed that a neural
probability dataset approach to 3.2x10−3 for the multiple network based decoder that uses the neural network as an
probabilities datasets approach, that signifies that the use of auxiliary module to a classical decoder leads to higher de-
dedicated datasets when decoding a given physical error rate coding performance.
is highly advantageous. Furthermore, we presented the constant execution time of
As mentioned, the single probability dataset is collected such a decoder and showed that it increases linearly with
at a low physical error rate, for example around the pseudo- the code distance in our simulations. We compared differ-
threshold value. Therefore, the size of the training dataset ent types of neural networks, in terms of decoding perfor-
is similar for both the single and the multiple probabilities mance and execution time, concluding that recurrent neu-
datasets for the low physical error rates. For higher physical ral networks can be more powerful than feed-forward neural
error rates, we gather larger training datasets for the multiple networks for such applications.
probabilities datasets approach, which are also more relevant. Finally, we showed that having a dedicated dataset for the
The space that needs to be sampled is getting exponentially physical error rate that the quantum system operates can in-
larger to a point that is infeasible to gather enough samples crease the decoding performance.
12

References [13] A. Kitaev, “Fault-tolerant quantum computation by anyons,”

[1] D. Gottesman, Stabilizer Codes and Quantum Error Correction. Annals of Physics, vol. 303, no. 1, pp. 2 – 30, 2003. [Online].
Caltech Ph.D. Thesis, 1997. Available: http://www.sciencedirect.com/science/article/pii/
[2] T. E. O’Brien, B. Tarasinski, and L. DiCarlo, “Density-matrix S0003491602000180
simulation of small surface codes under current and projected [14] M. H. Freedman and D. A. Meyer, “Projective plane and pla-
experimental noise,” npj Quantum Information, vol. 3, 2017. nar quantum codes,” Foundations of Computational Mathemat-
[3] J. Edmonds, “Paths, trees, and flowers,” Canadian Journal of ics, vol. 1, no. 3, pp. 325–332, 2001.
Mathematics, vol. 17, pp. 449–467, 1965. [Online]. Available: [15] S. B. Bravyi and A. Y. Kitaev, “Quantum codes on a lattice with
http://dx.doi.org/10.4153/CJM-1965-045-4 boundary,” arXiv preprint quant-ph/9811052, 1998.
[4] E. Dennis, A. Kitaev, A. Landahl, and J. Preskill, “Topological [16] H. Bombin and M. A. Martin-Delgado, “Optimal resources
quantum memory,” Journal of Mathematical Physics, vol. 43, for topological two-dimensional stabilizer codes: Comparative
no. 9, pp. 4452–4505, 2002. [Online]. Available: http: study,” Physical Review A, vol. 76, no. 1, p. 012305, 2007.
//dx.doi.org/10.1063/1.1499754 [17] A. G. Fowler, A. M. Stephens, and P. Groszkowski, “High-
[5] A. G. Fowler, “Optimal complexity correction of correlated er- threshold universal quantum computation on the surface
rors in the surface code,” arXiv:1310.0863, 2013. code,” Physical Review A, vol. 80, no. 5, p. 052312, 2009.
[6] A. Fowler, “Minimum weight perfect matching of fault-tolerant [18] D. A. Lidar and T. A. Brun, Quantum Error Correction. Cam-
topological quantum error correction in average o(1) parallel bridge University Press, 2013.
time,” arXiv:1307.1740v3, 2014. [19] R. Versluis, S. Poletto, N. Khammassi, B. Tarasinski, N. Haider,
[7] S. Varsamopoulos, B. Criger, and K. Bertels, “Decoding small D. J. Michalak, A. Bruno, K. Bertels, and L. DiCarlo, “Scalable
surface codes with feedforward neural networks,” Quantum quantum circuit and control for a superconducting surface
Science and Technology, vol. 3, no. 1, p. 015004, 2018. [Online]. code,” Phys. Rev. Applied, vol. 8, p. 034021, Sep 2017. [Online].
Available: http://stacks.iop.org/2058-9565/3/i=1/a=015004 Available: https://link.aps.org/doi/10.1103/PhysRevApplied.8.
[8] P. Baireuther, T. E. O’Brien, B. Tarasinski, and C. W. J. 034021
Beenakker, “Machine-learning-assisted correction of cor- [20] S. Bravyi, M. Suchara, and A. Vargo, “Efficient algorithms
related qubit errors in a topological code,” Quan- for maximum likelihood decoding in the surface code,” Phys.
tum, vol. 2, p. 48, Jan. 2018. [Online]. Available: Rev. A, vol. 90, p. 032326, Sep 2014. [Online]. Available:
https://doi.org/10.22331/q-2018-01-29-48 https://link.aps.org/doi/10.1103/PhysRevA.90.032326
[9] G. Torlai and R. G. Melko, “Neural decoder for topological [21] V. Kolmogorov, “Blossom V: a new implementation of a
codes,” Phys. Rev. Lett., vol. 119, p. 030501, 7 2017. [Online]. minimum cost perfect matching algorithm,” Mathematical
Available: https://link.aps.org/doi/10.1103/PhysRevLett.119. Programming Computation, vol. 1, pp. 43–67, 2009. [Online].
030501 Available: http://dx.doi.org/10.1007/s12532-009-0002-8
[10] S. Krastanov and L. Jiang, “Deep neural network probabilistic [22] I. Changhau, “Lstm and gru – formula summary,” July
decoder for stabilizer codes,” Scientific Reports, vol. 7, no. 11003, 2017. [Online]. Available: https://isaacchanghau.github.io/
2017. post/lstm-gru-formula/
[11] C. Chamberland and P. Ronagh, “Deep neural decoders for [23] A. G. Fowler, A. C. Whiteside, A. L. McInnes, and A. Rabbani,
near term fault-tolerant experiments,” Quantum Science and “Topological code autotune,” PHYSICAL REVIEW X, vol. 2, p.
Technology, vol. 3, no. 4, p. 044002, 2018. [Online]. Available: 041003, 2012.
http://stacks.iop.org/2058-9565/3/i=4/a=044002 [24] A. G. Fowler, A. M. Stephens, and P. Groszkowski, “High
[12] M. Maskara, A. Kubica, and T. Jochym-O’Connor, “Advantages threshold universal quantum computation on the surface
of versatile neural-network decoding for topological codes,” code,” arXiv:0803.0272, 2012.
arXiv:1802.08680, 2018.

View publication stats

Higgott Thesis Final Version
No ratings yet
Higgott Thesis Final Version
266 pages
Information Theory &amp Coding (ECE) by Nitin Mittal
0% (1)
Information Theory &amp Coding (ECE) by Nitin Mittal
15 pages
Quantum Error Correction For Quantum Memories
No ratings yet
Quantum Error Correction For Quantum Memories
47 pages
Quantum Computation With Topological Codes - From Qubit To Topological Fault-Tolerance
No ratings yet
Quantum Computation With Topological Codes - From Qubit To Topological Fault-Tolerance
155 pages
Decoding Small Surface Codes With Feedforward Neural Networks
No ratings yet
Decoding Small Surface Codes With Feedforward Neural Networks
12 pages
Learning High-Accuracy Error Decoding For Quantum Processors
No ratings yet
Learning High-Accuracy Error Decoding For Quantum Processors
28 pages
Daniel Gottesman Book
No ratings yet
Daniel Gottesman Book
322 pages
Qecgpt: Decoding Quantum Error-Correcting Codes With Generative Pre-Trained Transformers
No ratings yet
Qecgpt: Decoding Quantum Error-Correcting Codes With Generative Pre-Trained Transformers
18 pages
Quantum Error Correction Below The Surface Code Threshold
No ratings yet
Quantum Error Correction Below The Surface Code Threshold
27 pages
High-Distance Error-Correcting Codes For Fermion-to-Qubit Mappings in 2D and 3D
No ratings yet
High-Distance Error-Correcting Codes For Fermion-to-Qubit Mappings in 2D and 3D
18 pages
Neural Decoding for Quantum Codes
No ratings yet
Neural Decoding for Quantum Codes
15 pages
Quantum Error Correction Fault Tolerance Ultra
No ratings yet
Quantum Error Correction Fault Tolerance Ultra
3 pages
Overview of State of The Art 2D QEC
No ratings yet
Overview of State of The Art 2D QEC
106 pages
Advantage of Quantum Neural Networks As Quantum Information Decoders
No ratings yet
Advantage of Quantum Neural Networks As Quantum Information Decoders
25 pages
PHY265 Lecture Notes: Introducing Quantum Error Correction: A. C. Quillen March 17, 2025
No ratings yet
PHY265 Lecture Notes: Introducing Quantum Error Correction: A. C. Quillen March 17, 2025
57 pages
Linear Block Codes Complete Chapter Notes
No ratings yet
Linear Block Codes Complete Chapter Notes
31 pages
Delfosse Lecture 4
No ratings yet
Delfosse Lecture 4
37 pages
Day 1 - Presentation 2
No ratings yet
Day 1 - Presentation 2
29 pages
Topological Quantum Memory
No ratings yet
Topological Quantum Memory
54 pages
Quantum Error Correction in MERA
No ratings yet
Quantum Error Correction in MERA
115 pages
Cong Thesis Final
No ratings yet
Cong Thesis Final
220 pages
Quantum Computing Research
No ratings yet
Quantum Computing Research
1 page
A Course in Error-Correcting Codes - Justesen and Høholdt
100% (1)
A Course in Error-Correcting Codes - Justesen and Høholdt
204 pages
Learning-Based Approach For Designing Error-Correcting Codes
No ratings yet
Learning-Based Approach For Designing Error-Correcting Codes
35 pages
The Domain Wall Color Code: Konstantin - Tiurev@quantumsimulations - de
No ratings yet
The Domain Wall Color Code: Konstantin - Tiurev@quantumsimulations - de
17 pages
Lect 04
No ratings yet
Lect 04
3 pages
Simultaneous Discovery Quantum Error Code Encoders
No ratings yet
Simultaneous Discovery Quantum Error Code Encoders
17 pages
Delfosse Lecture 3
No ratings yet
Delfosse Lecture 3
44 pages
Leveraging Biased Noise For More Efficient Quantum Error Correction at The Circuit-Level With Two-Level Qubits
No ratings yet
Leveraging Biased Noise For More Efficient Quantum Error Correction at The Circuit-Level With Two-Level Qubits
24 pages
Quantum Error Correction Below The Surface Code Threshold: Article
No ratings yet
Quantum Error Correction Below The Surface Code Threshold: Article
8 pages
Error Coding Reed Solomon
No ratings yet
Error Coding Reed Solomon
28 pages
Network-Integrated Decoding System For Real-Time Quantum Error Correction With Lattice Surgery
No ratings yet
Network-Integrated Decoding System For Real-Time Quantum Error Correction With Lattice Surgery
13 pages
Introducing Quantum Error Correction
No ratings yet
Introducing Quantum Error Correction
46 pages
Application of Vector Space
No ratings yet
Application of Vector Space
4 pages
Sequential Decoding of The XYZ Hexagonal Stabilizer Code:,, Y Zzy
No ratings yet
Sequential Decoding of The XYZ Hexagonal Stabilizer Code:,, Y Zzy
14 pages
Machine Learning Logical Gates For Quantum Error Correction
No ratings yet
Machine Learning Logical Gates For Quantum Error Correction
17 pages
Correlated Decoding of Logical Algorithms With Transversal Gates
No ratings yet
Correlated Decoding of Logical Algorithms With Transversal Gates
19 pages
s41586 024 08449 y - Reference
No ratings yet
s41586 024 08449 y - Reference
14 pages
An Introduction To Quantum Error Correction and Fault-Tolerant Quantum Computation
No ratings yet
An Introduction To Quantum Error Correction and Fault-Tolerant Quantum Computation
46 pages
Magic State Injection On The Rotated Surface Code
No ratings yet
Magic State Injection On The Rotated Surface Code
8 pages
Guillaud - Repetition Cat Qubits For Fault-Tolerant Quantum Computation
No ratings yet
Guillaud - Repetition Cat Qubits For Fault-Tolerant Quantum Computation
23 pages
Information Theory, Coding and Cryptography Unit-3 by Arun Pratap Singh
50% (4)
Information Theory, Coding and Cryptography Unit-3 by Arun Pratap Singh
64 pages
Quantum Error Correction Insights
No ratings yet
Quantum Error Correction Insights
1 page
A New Efficient Decoder of Linear Block Codes Based On Ensemble Learning Methods
No ratings yet
A New Efficient Decoder of Linear Block Codes Based On Ensemble Learning Methods
11 pages
Quantum Error Correction Advances
No ratings yet
Quantum Error Correction Advances
10 pages
Geometrical Approach To Logical Qubit Fi
No ratings yet
Geometrical Approach To Logical Qubit Fi
14 pages
Quantum Error Correction Codes
No ratings yet
Quantum Error Correction Codes
22 pages
SciPostPhysLectNotes 49
No ratings yet
SciPostPhysLectNotes 49
68 pages
Hyperbolic Floquet Codes
No ratings yet
Hyperbolic Floquet Codes
10 pages
Advances in Error Mitigation For Noisy Intermediate-Scale Quantum Processors
No ratings yet
Advances in Error Mitigation For Noisy Intermediate-Scale Quantum Processors
5 pages
Benchmarking Machine Learning Models For Quantum Error Correction
No ratings yet
Benchmarking Machine Learning Models For Quantum Error Correction
19 pages
Unit 3 10 Error
No ratings yet
Unit 3 10 Error
93 pages
Artificial Intelligence For Quantum Error Correction: A Comprehensive Review
No ratings yet
Artificial Intelligence For Quantum Error Correction: A Comprehensive Review
20 pages
The Idiots Guide To Quantum Error Correction
No ratings yet
The Idiots Guide To Quantum Error Correction
38 pages
Quantum Error Correction: An Introductory Guide
No ratings yet
Quantum Error Correction: An Introductory Guide
29 pages
Stability - Ai Image, @quantshah
No ratings yet
Stability - Ai Image, @quantshah
22 pages
Neural Error Mitigation
No ratings yet
Neural Error Mitigation
20 pages
Quantum Error-Correction Basics
No ratings yet
Quantum Error-Correction Basics
11 pages
Nptel: Coding Theory - Video Course
No ratings yet
Nptel: Coding Theory - Video Course
4 pages
High-Threshold and Low-Overhead Fault-Tolerant Quantum Memory
No ratings yet
High-Threshold and Low-Overhead Fault-Tolerant Quantum Memory
11 pages
Quantum Error Suppression in Surface Codes
No ratings yet
Quantum Error Suppression in Surface Codes
7 pages
Itc Information Theory and Coding Dec 2020
No ratings yet
Itc Information Theory and Coding Dec 2020
2 pages
Low Universities
No ratings yet
Low Universities
9 pages
Exponential Suppression of Bit or Phase Flip Errors With Repetitive Error Correction
No ratings yet
Exponential Suppression of Bit or Phase Flip Errors With Repetitive Error Correction
32 pages
Weiss Sievert Lecture
No ratings yet
Weiss Sievert Lecture
25 pages
Hamming Codes for Math Students
100% (2)
Hamming Codes for Math Students
9 pages
Corrección Eficaz Del Error Cuántico de La Difamación Inducida Por Un Fluctuante Común
No ratings yet
Corrección Eficaz Del Error Cuántico de La Difamación Inducida Por Un Fluctuante Común
6 pages
Introduction To Coding Theory: Basic Codes and Shannon'S Theorem
No ratings yet
Introduction To Coding Theory: Basic Codes and Shannon'S Theorem
7 pages
Error Control Coding Guide
No ratings yet
Error Control Coding Guide
70 pages
Unit 1 (Pulse Modulation)
No ratings yet
Unit 1 (Pulse Modulation)
27 pages
Linear Block Code: by Fikreselam Gared (PHD)
No ratings yet
Linear Block Code: by Fikreselam Gared (PHD)
44 pages
Digital Communications Course File
No ratings yet
Digital Communications Course File
15 pages
LDPC Coded MIMO-CEM System Analysis
No ratings yet
LDPC Coded MIMO-CEM System Analysis
4 pages
Final Exam - Cheatsheet
No ratings yet
Final Exam - Cheatsheet
3 pages
Interference Alignment at Finite SNR For Time-Invariant Channels
No ratings yet
Interference Alignment at Finite SNR For Time-Invariant Channels
5 pages
Cyclic Codes - Detailed Study Notes
No ratings yet
Cyclic Codes - Detailed Study Notes
24 pages
SVM Ecc
No ratings yet
SVM Ecc
6 pages
Integrated Circuits and Applications: CO1 CO2 CO3 CO4 CO5
No ratings yet
Integrated Circuits and Applications: CO1 CO2 CO3 CO4 CO5
72 pages
Digital Communication Systems by Simon Haykin-107
No ratings yet
Digital Communication Systems by Simon Haykin-107
6 pages
Expander Codes
No ratings yet
Expander Codes
13 pages
Neural Network Decoders for Surface Codes
No ratings yet
Neural Network Decoders for Surface Codes
13 pages
Decoding of The Extended Golay Code by The Simplified Successive-Cancellation List Decoder Adapted To Multi-Kernel Polar Codes
No ratings yet
Decoding of The Extended Golay Code by The Simplified Successive-Cancellation List Decoder Adapted To Multi-Kernel Polar Codes
9 pages
Generalized Reed-Solomon Codes: 5.1 Basics
No ratings yet
Generalized Reed-Solomon Codes: 5.1 Basics
13 pages
On The Theory of Low-Density Convolutional Codes I
No ratings yet
On The Theory of Low-Density Convolutional Codes I
31 pages
DC 2 Marks Q&A
No ratings yet
DC 2 Marks Q&A
29 pages
An Fpga Implementation of Successive Cancellation List Decoding For Polar Codes
No ratings yet
An Fpga Implementation of Successive Cancellation List Decoding For Polar Codes
85 pages
Algorithmic Results in List Decoding
No ratings yet
Algorithmic Results in List Decoding
91 pages

Neural Network Decoders for Surface Codes

Uploaded by

Neural Network Decoders for Surface Codes

Uploaded by

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Designing neural network based decoders for surface codes

Preprint · November 2018

Savvas Varsamopoulos Koen Bertels

SEE PROFILE SEE PROFILE

The user has requested enhancement of the downloaded file.

Savvas Varsamopoulos,1, 2 Koen Bertels,1, 2 and Carmen G. Almudever1, 2

I. Introduction In this work, we investigate various designs of neural net-

III. Neural Networks

FIG. 4. The structure of an artificial neural network with an input

IV. Designing neural network based decoders C. Evaluating performance

B. Sampling and training process

TABLE I. Pseudo-threshold values for the tested decoders (d=3) un-

devise a strategy of a step-wise decrease of the learning rate

References [13] A. Kitaev, “Fault-tolerant quantum computation by anyons,”

View publication stats

You might also like