
Advances in

Mathematical Programming-Based
Error-Correction Decoding

Michael Helmling // Dissertation // July 2015 // Universität Koblenz-Landau


Cover art: Viola Simon, Binary Channel, b/w photograph, 2015. Included with kind permission
of the artist.
Advances in Mathematical Programming-Based
Error-Correction Decoding

by

Michael Helmling
born on 19 February 1986 in Kaiserslautern

Accepted dissertation for the attainment of the academic degree of
Doctor of Natural Sciences (Doktor der Naturwissenschaften)

Fachbereich 3: Mathematik/Naturwissenschaften
(Department 3: Mathematics/Natural Sciences)
Universität Koblenz-Landau

Reviewers:
Prof. Dr. Stefan Ruzika
Prof. Dr. Øyvind Ytrehus
Prof. Dr. Pascal O. Vontobel

Date of the oral examination: 17 July 2015


Abstract

The formulation of the decoding problem for linear block codes as an integer program (IP)
with a rather tight linear programming (LP) relaxation has made a central part of channel
coding accessible to the theory and methods of mathematical optimization, especially integer
programming, polyhedral combinatorics and also algorithmic graph theory, since the important
class of turbo codes exhibits an inherent graphical structure.
We present several novel models, algorithms and theoretical results for error-correction decoding based on mathematical optimization. Our contribution includes a partly combinatorial
LP decoder for turbo codes, a fast branch-and-cut algorithm for maximum-likelihood (ML)
decoding of arbitrary binary linear codes, a theoretical analysis of the LP decoder’s performance
for 3-dimensional turbo codes, compact IP models for various heuristic algorithms as well as
ML decoding in combination with higher-order modulation, and, finally, first steps towards an
implementation of the LP decoder in specialized hardware.
The scientific contributions are presented in the form of seven revised reprints of papers that appeared in peer-reviewed international journals or conference proceedings. They are accompanied by an extensive introductory part that reviews the basics of mathematical optimization,
coding theory, and the previous results on LP decoding that we rely on afterwards.

Acknowledgements

Every work of an individual is the work of many. This is especially true for a PhD thesis, which
is why I devote my greatest thanks to all of my “many”, who supported me throughout the
eventful period of my life from which this thesis has emerged, non-exhaustively (sorry for
everyone I forgot) including my family, friends, colleagues and co-authors (this listing is by
no means without overlap!). Very special thanks go to my supervisor Stefan (Ruzika), who in my view belongs to the friend-or-colleague rather than the boss category; to Eirik, with whom it is always a pleasure to work and who has been extremely supportive throughout my entire time as a PhD student; and to Stefan (Scholl), my colleague from microelectronic system design, who has always been a patient contact for questions related to communications. I am also much obliged to my “predecessor” Akın, who gently introduced me to the subject during the period in which I was allowed to be his research assistant, and to the remaining co-authors Frank, Alex and Florian.
The optimization group at the University of Kaiserslautern, with which I was affiliated during the first half of my PhD, deserves special thanks for being such a great environment, not only for work but also for social activities, as does the ever-growing group at the University of Koblenz.
This work would not exist without financial support by different institutions. I would like to
gratefully thank the Center for Mathematical and Computational Modelling (CM)², the German
Research Council (DFG; grant RU-1524/2-1) and the German Academic Exchange Service (DAAD)
for funding my work and enabling me to be part of an international research community by
supporting numerous visits to conferences and foreign universities.
You would not be able to read this book if it hadn’t been written down. While the typesetting system LaTeX that I availed myself of for this task does cost nerves from time to time, putting a mathematical text (and figures!) down on paper without it would, without question, be true horror. I hence express my gratitude to the huge community that makes all this software available for free. Regarding the process of reading this book, special thanks are due to everyone who helped proofread the text, including the anonymous referees of the papers that constitute its contributions.
Last but not least, let me especially thank you, dear reader, for showing interest in the results
of a sometimes escapist-seeming work.

Contents

Abstract iii
Acknowledgements v
1 Introduction 1

I Foundations 3

2 Notation and Preliminaries 7

3 Optimization Background 9
3.1 Polyhedral Theory 10
3.2 Linear Programming 14
3.3 Integer Programming 18
3.4 Combinatorial Optimization 20

4 Coding Theory Background 23
4.1 System Setup and the Noisy-Channel Coding Theorem 23
4.2 MAP and ML Decoding 27
4.3 The AWGN Channel 27
4.4 Binary Linear Block Codes 28
4.5 Convolutional and Turbo Codes 30
4.6 Non-Binary Codes and Higher-Order Modulation 33

5 Optimization Into Coding: The Connection 35
5.1 ML Decoding as Integer Program 35
5.2 LP Decoding 37
5.3 Adaptive LP Decoding 38
5.4 Analysis of LP Decoding 39
5.5 LP Decoding of Turbo Codes 44

II Contributions 47

6 Paper I – Mathematical Programming Decoding of Binary Linear Codes: Theory and Algorithms 49
6.1 Introduction 52
6.2 Basics and Notation 52
6.3 Complexity and Polyhedral Properties 56
6.4 Basics of LPD 60
6.5 LPD Formulations for Various Code Classes 62
6.6 Efficient LP Solvers for BLPD 69
6.7 Improving the Error-Correcting Performance of BLPD 75
6.8 Conclusion 81
References 82

7 Paper II – ML vs. BP Decoding of Binary and Non-Binary LDPC Codes 89
7.1 Introduction 92
7.2 Binary and Non-Binary LDPC Codes 93
7.3 IP-based ML Decoding 96
7.4 Results 98
7.5 Conclusion 101
References 102

8 Paper III – Integer Programming as a Tool for Analysis of Channel Codes 105
8.1 Introduction 108
8.2 Preliminaries 109
8.3 Fundamentals of Solving IP Problems 110
8.4 IP Formulations for Linear Block Codes 111
8.5 Specialized IP Formulation for Turbo Codes 114
8.6 Simulation Results 115
8.7 Conclusion 120
References 120

9 Paper IV – Towards an Exact Combinatorial Algorithm for LP Decoding of Turbo Codes 123
9.1 Introduction 126
9.2 Background and Notation 127
9.3 The LP Relaxation and Conventional Solution Methods 131
9.4 An Equivalent Problem in Constraints Space 132
9.5 The Complete Algorithm 141
9.6 Numerical Results 141
9.7 Improving Error-Correcting Performance 145
9.8 Conclusion and Outlook 146
References 147

10 Paper V – Towards Hardware Implementation of the Simplex Algorithm for LP Decoding 149
10.1 Introduction 152
10.2 The Simplex Algorithm 153
10.3 Designing Hardware 155
10.4 Simplex Performance Study 157
10.5 Conclusion 162
10.6 Outlook and Future Research 162
References 163

11 Paper VI – Efficient Maximum-Likelihood Decoding of Linear Block Codes on Binary Memoryless Channels 165
11.1 Introduction 168
11.2 Notation and Background 169
11.3 Basic Branch-and-Bound Algorithm 170
11.4 Improvements 172
11.5 Minimum Distance Computation 175
11.6 Numerical Results 176
References 178

12 Paper VII – Minimum Pseudoweight Analysis of 3-Dimensional Turbo Codes 181
12.1 Introduction 184
12.2 Coding Scheme 185
12.3 LP Decoding of 3D-TCs 186
12.4 Finite Graph Covers 188
12.5 Ensemble-Average Pseudoweight Enumerators 191
12.6 Searching for the Minimum Pseudoweight 195
12.7 Numerical Results 197
12.8 Conclusion 201
12.A Proof of Proposition 12.1 202
12.B Proof of Proposition 12.3 203
12.C Proof of Lemma 12.5 206
12.D Proof of Lemma 12.6 207
References 208

III Closing 211

13 Conclusions and Future Work 213
Bibliography for Parts I and III 215
Curriculum Vitae 219
Chapter 1

Introduction

The use of error-correcting codes in digital communication systems for the purpose of forward
error correction is one of the essential pillars of modern information technology, as most
applications of microelectronic devices rely on the assumption that data can be transmitted
practically error-free.
Coding is the mathematical answer to the problem of unavoidable noise in data transmission,
be it due to environmental influences on the carrier medium (e.g. background noise, physical
obstacles, …), technical limitations of the transmitter and receiver electronics, or the like. The
key idea is to include redundancy in the messages before they are sent in order to make them
robust against the noise introduced during transmission. In mathematical terms, this is usually
accomplished by a linear map, called the code, operating on fixed-sized words of the input.
The receiver contains a complementary decoder that estimates the sent data, based on a given
input signal it reads from the channel. If code and decoder are well designed, the end-to-end
probability of error can in theory be made arbitrarily small, as long as the coding rate is below
the capacity of the channel.
For a given code, an optimal decoder would always output the maximum a-posteriori probability
(MAP) codeword, that is, the codeword that was sent with highest probability, given the observed
noisy output of the transmission channel and a probabilistic model of that channel. Under
certain assumptions (which are always met in the context of this thesis), the MAP codeword
is equivalent to the maximum-likelihood (ML) codeword, which is the one maximizing the
corresponding likelihood function, i.e., the probability of the received noisy message given that a
specific codeword was sent. One notorious problem when it comes to practical implementations of communication systems lies in the complexity of the ML decoding problem, which appears to become harder the stronger the code itself becomes. While specific classes of codes along with corresponding decoders that achieve a remarkable error-correction performance have been developed over the last two decades—most prominently, low-density parity-check (LDPC) and turbo codes [1, 2]¹—their decoding algorithms based on iterative message-passing are of heuristic nature, which makes them both suboptimal (they are not ML decoders) and hard to analyze theoretically.

¹ The reference list for this part of the thesis can be found on page 215.

In contrast, linear programming decoding algorithms, which are based on viewing the decoding problem as a combinatorial optimization problem formulated as an integer linear program (IP),

exhibit the potential to fill this gap, as they make it possible to exploit the rigorous, well-developed
theory of mathematical optimization, notably polyhedral combinatorics, to tackle the decoding
problem. The proposal of linear programs (LPs) for decoding both LDPC and turbo codes
by Feldman et al. [3–5] has established a whole new research area of applied mathematics,
subsumed under the term “LP decoding”, in which several groups have subsequently produced
improved formulations, extensions to more general classes of codes, theoretical analysis of
the LP decoder’s behavior and algorithms to efficiently solve the decoding LP—see Paper I
(Chapter 6) for a recent overview of the advances.
This thesis presents my contributions to the research area of LP decoding in form of both theory
and algorithms. It is organized in three parts. Part I thoroughly reviews the fundamentals of the
three topics involved: mathematical optimization, coding theory, and the basics of their link, LP
decoding. That part aims to be theoretically self-contained up to undergraduate mathematics;
as the presentation is nevertheless rather compact, suggestions to several textbooks that cover
the matter more elaborately are made in each chapter.
Part II of this thesis, which starts on page 49, contains seven peer-reviewed research papers
that make up my scientific contributions. Within the thesis, they are referred to as Paper I to
Paper VII and can be found in Chapter 6 to Chapter 12, ordered by their respective date of
appearance. As I have worked within different projects and together with several groups, the
topics of the papers are, to a certain extent, rather diverse, while still being connected via the
overall topic LP decoding.
Paper I constitutes a literature survey of the LP models and algorithms that have been proposed
until its appearance. Paper II is concerned with the comparison of binary and non-binary
LDPC codes under both optimal (ML) and suboptimal decoding. Instead of formulating the
ML decoding problem via the LP relaxation of an integer program, Paper III explores the power of the integer-programming approach for other coding-related problems, including several
heuristic decoders whose performance can be quickly evaluated using an IP. Papers IV and VII
are concerned with the special class of turbo codes. Since codewords of turbo codes are related
to paths in a graph, they are of particular interest from a combinatorial optimizer’s view. In
Paper IV, a novel mathematical algorithm is presented that efficiently solves a certain class
of optimization problems in which the turbo decoding problem falls. Paper VII on the other
hand comprises a thorough study of LP decoding of 3-dimensional turbo codes, a class of codes
that has been invented to overcome certain shortcomings of “traditional” turbo codes. The goal of
Paper V is to provide a first step towards an implementation of the LP decoder in specialized
hardware. To that end, the decoder’s complexity is analyzed from a hardware implementation point of view, and a study of its performance under limited-precision arithmetic is conducted. Finally,
Paper VI presents a true ML decoder that is based on a highly efficient branch-and-cut algorithm
which incorporates several decoding-specific methods for generating primal and dual bounds.
In Part III, the closing part of this document, conclusions are drawn and outlook for future
research is given. It also contains the bibliography for Parts I and III (note that each paper in
Part II has its own list of references) and the author’s academic CV (in German).

Part I

Foundations

In this first part of the thesis we present the background that is necessary for the main content
in Part II. Because our subject is the application of mathematical optimization to the field
of coding theory, the presentation splits naturally into the areas mathematical optimization
(Chapter 3), coding theory (Chapter 4) and, finally, a review of the previous contributions by
which these two areas have found common ground, in Chapter 5. The following chapters are not
solely meant as a preparation for the study of Part II of this thesis, but also as an introduction
in its own right to a particularly beautiful field of mathematics which is located on the rather
uncrowded boundary between information theory, discrete mathematics, and optimization.
While the scope of this text demands a certain compactness of presentation, and a thesis introduction cannot possibly replace a textbook or lecture, we have nevertheless made it our aim that a reader who is familiar with mathematics in general but has no specific proficiency in either optimization or coding theory will be able to comprehend the most important ideas and concepts of the matter. Due to length restrictions, we unfortunately could not include detailed examples of most of the presented concepts and results. Instead, those are accompanied, whenever applicable, by schematic figures that try to illustrate the underlying intuitions in a non-rigorous manner.
The assumptions we make as to the reader’s mathematical background mainly consist of
undergraduate linear algebra. To a smaller extent, familiarity with graph theory and algorithmic
concepts will be helpful, and occasionally we will encounter probabilities. The notation and
nomenclature for these background topics are fixed in Chapter 2.
A consistent notation throughout the entire thesis is unfortunately rendered impossible by its
cumulative nature: the papers in Part II, as they cover a relatively broad selection of topics and
are moreover written by diverse groups of authors, use different notational conventions, at
least in some aspects. For this introductory part, we have tried to extract the most common
notation wherever possible, so that for the attentive reader it should be easy to quickly adapt
to the specific notation that is introduced in each of the papers.

Chapter 2

Notation and Preliminaries

Notation
Within the next chapters, we use the symbols ℕ and ℤ for the natural (starting with 1) and integral numbers, respectively, 𝔽_2 for the binary field (which is sometimes called GF(2)), ℚ for the rational numbers and ℝ for the reals. If x ∈ ℝ, then ⌊x⌋ and ⌈x⌉ denote the largest integer ≤ x and the smallest integer ≥ x, respectively.
For a set R and n ∈ ℕ, R^n denotes the n-dimensional vector space over R if R is a field, and the set of n-tuples of elements of R otherwise. In either case, the n-tuples are called vectors, and operations like addition or comparison, whenever applicable, are understood element-wise: if x = (x_1, …, x_n) and y = (y_1, …, y_n), then

    x + y = (x_1 + y_1, …, x_n + y_n)

and

    x ≤ y if and only if x_i ≤ y_i for i = 1, …, n.

Matrices and Vectors

By X^{m×n} we denote the set of matrices with m rows and n columns and entries from the set X. For a matrix A, by A_{i,j} we denote the element at the i-th row and j-th column of A; A_{i,•} and A_{•,j} are the i-th row and j-th column, respectively, and the transpose of A is the n × m matrix A^T with entries A^T_{i,j} = A_{j,i}. Throughout the text we regard vectors x ∈ X^k as column vectors, i.e., identify them with k × 1 matrices. We sometimes use (ordered) index sets instead of individual indexes: for instance, if x ∈ X^n and I = (i_1, …, i_k), where {i_1, …, i_k} ⊆ {1, …, n}, then x_I = (x_{i_1}, …, x_{i_k}). The same notation is used for row and column indexing, respectively, of matrices.

When A ∈ R^{m×n} is an m × n matrix over a ring R, an elementary row operation on A is one of the following operations: (a) multiply a row of A by a scalar, A_{i,•} ← s A_{i,•}, or (b) replace a row by the weighted sum of itself and another row, A_{i,•} ← s A_{i,•} + t A_{k,•}. If R is a field, any finite sequence of elementary row operations (taking the scalars from R) leaves the range {Ax : x ∈ R^n} of A unaltered. Such a sequence is called a Gaussian pivot on A_{i,j} if it turns the j-th column of A into the i-th unit vector, and Gaussian elimination if it changes a submatrix of A into a triangular or diagonal matrix (the terms Gauss-Jordan pivot and, for diagonalization, Gauss-Jordan elimination are also used).
[Margin example (Gaussian pivot on A_{2,2}): (1 2 3; 4 2 6) → (1 2 3; 2 1 3) → (−3 0 −3; 2 1 3), which turns the second column into the second unit vector.]
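To make the pivot operation concrete, here is a minimal Python sketch (NumPy is assumed purely for illustration; the function name is ours) that performs a Gaussian pivot on A_{i,j} using exactly the two elementary row operations defined above:

```python
import numpy as np

def gaussian_pivot(A, i, j):
    """Turn column j of A into the i-th unit vector using
    elementary row operations (a Gaussian pivot on A[i, j])."""
    A = A.astype(float).copy()
    if A[i, j] == 0:
        raise ValueError("pivot element must be non-zero")
    A[i, :] /= A[i, j]                     # (a) scale the pivot row
    for k in range(A.shape[0]):
        if k != i:
            A[k, :] -= A[k, j] * A[i, :]   # (b) eliminate the other rows
    return A

# The margin example: pivoting on entry (2, 2) of the 2x3 matrix below
print(gaussian_pivot(np.array([[1, 2, 3], [4, 2, 6]]), 1, 1))
# -> [[-3.  0. -3.]
#     [ 2.  1.  3.]]
```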

Graphs
A graph G = (V, E) is defined by a finite set V, called the nodes or vertices of G, and a set E ⊆ V × V of edges or arcs of G. There are both undirected graphs, in which an edge (u, v) ∈ E is identified with the unordered set {u, v}, and directed graphs, where (u, v) is perceived as an ordered pair. For a node v ∈ V of a directed graph, we define δ+(v) = {(v, u) : (v, u) ∈ E} and δ−(v) = {(u, v) : (u, v) ∈ E} as the outbound and inbound edges, respectively, of v.
A finite sequence P = (v_1, …, v_k) ∈ V^k is called a v_1–v_k path, or simply path, of G, if (v_i, v_{i+1}) ∈ E for i = 1, …, k − 1. Each path can alternatively be defined by a sequence of edges, P = (e_1, …, e_{k−1}), where e_i = (v_i, v_{i+1}) for i = 1, …, k − 1. The path P is called a cycle if v_1 = v_k. A directed graph is acyclic if no cycle exists.
[Margin figure: an acyclic graph with the path P = (v_1, v_3, v_4) (using vertices) or P = (e_2, e_4) (using edges) highlighted.]
For a graph G = (V, E) and V′ ⊆ V, the subgraph induced by V′ is the graph G′ = (V′, E′) where E′ = {(u, v) ∈ E : u ∈ V′ and v ∈ V′} consists of all edges that connect two vertices of V′. Finally, a graph is called bipartite if there is a partition V = V_1 ∪̇ V_2 such that E ⊆ {V_1 × V_2} ∪ {V_2 × V_1}, i.e., no edge connects two vertices of the same set V_i, i = 1, 2.
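These definitions translate directly into code; a tiny Python sketch (names and the example data are ours, mirroring the margin figure) computing δ+(v), δ−(v) and checking the path property:

```python
def delta_plus(E, v):        # outbound edges of v
    return {(a, b) for (a, b) in E if a == v}

def delta_minus(E, v):       # inbound edges of v
    return {(a, b) for (a, b) in E if b == v}

def is_path(E, P):
    """True iff consecutive vertices of the sequence P are joined by edges."""
    return all((P[i], P[i + 1]) in E for i in range(len(P) - 1))

E = {("v1", "v2"), ("v1", "v3"), ("v2", "v4"), ("v3", "v4")}
print(delta_plus(E, "v1"))            # {('v1', 'v2'), ('v1', 'v3')}
print(is_path(E, ["v1", "v3", "v4"]))  # True
```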

Complexity
The symbols P and NP are used to denote the complexity classes of problems that are solvable
(P) and verifiable (NP) in polynomial time, and a problem is called NP-hard if it is “at least as
hard” as every problem in NP, i.e., each problem in NP can be reduced to it in polynomial time.
The Landau notation f(n) = O(g(n)) states that the asymptotic growth rate of f : ℕ → ℝ is upper bounded by g(n), i.e., there exist M > 0 and n_0 ∈ ℕ such that f(n) ≤ M·g(n) for n > n_0.
Note that a thorough understanding of complexity theory is not a prerequisite to reading this
text.

Probability
Since we only come across basic probability calculations and they are not central to our text, we use simplified notation. Namely, we frequently denote both a random variable and its outcomes by the same symbol, say x, and use P(x) for the probability mass or density function of x, whatever applies—the exact meaning of the symbols will always become clear from the context. If y is another random variable, P(x ∣ y_0) is the conditional probability function of x given the event {y ∈ y_0}. Similarly, P(x_0 ∣ y) is called the likelihood function of y, given {x ∈ x_0}. Bayes’ theorem states that

    P(x_0 ∣ y_0) = P(y_0 ∣ x_0) P(x_0) / P(y_0)

for any two events x_0 of x and y_0 of y, provided P(y_0) ≠ 0.
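As a toy numerical illustration (all numbers invented): for a binary symmetric channel with crossover probability 0.1 and a uniform prior on the transmitted bit, Bayes’ theorem yields the posterior after observing y = 1:

```python
# Bayes' theorem on a toy binary symmetric channel (illustrative numbers):
# prior P(x=0) = P(x=1) = 0.5, crossover probability 0.1, observed y = 1.
p_y_given_x = {(1, 0): 0.1, (1, 1): 0.9}   # likelihoods P(y=1 | x)
p_x = {0: 0.5, 1: 0.5}
p_y1 = sum(p_y_given_x[(1, x)] * p_x[x] for x in (0, 1))   # P(y=1) = 0.5
posterior = {x: p_y_given_x[(1, x)] * p_x[x] / p_y1 for x in (0, 1)}
print(posterior)   # {0: 0.1, 1: 0.9} -- bit 1 was sent with probability 0.9
```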
Chapter 3

Optimization Background

Mathematical optimization is a discipline of mathematics that is concerned with the solution of problems arising from mathematical models that typically describe real-world problems, e.g. in the areas of transportation, production planning, or organization processes. These models are generally of the form

    min  f(x)
    subject to (s.t.)  x ∈ X,

where X ⊆ ℝ^n, for some n ∈ ℕ, is the feasible set and f : X → ℝ the objective function that evaluates a feasible point x ∈ X; f(x) is called the objective value of x. An x* ∈ X minimizing the objective function is called an optimal solution, the corresponding value z* = f(x*) the optimal objective value. If X = ∅, the problem is said to be infeasible and we define z* = ∞ in that case. If, on the other hand, f(x) is unbounded from below on X, we define z* = −∞.
The theory of optimization further subdivides into several areas that depend on the structure of
both 𝑓 and 𝑋. Within this thesis we will encounter three major problem classes: linear programs
(LPs), integer linear programs (IPs), and combinatorial optimization problems, which are often
modeled by means of an IP.
A common ground in the analysis of these three types of problems is the polyhedral structure
of the feasible set. The part of polyhedral theory that is necessary for this text is reviewed in
Section 3.1. For an LP, the feasible set is a polyhedron that is given explicitly by means of a
defining set of linear inequalities, which makes these problems relatively easy to solve. LPs and
the simplex method, the most important algorithm to solve them, are covered in Section 3.2.
In contrast to LPs, the feasible region of an IP is given only implicitly as the set of integral
points within a polyhedron. Although the result again exhibits a polyhedral structure (as long as finding an optimal solution is the concern), IPs are much harder to solve than LPs; in
fact, integer programming in general is an NP-hard optimization problem. Some theoretical
foundations and techniques to nonetheless tackle such problems are collected in Section 3.3.
Proofs, examples and detailed explanations are widely omitted in this chapter. For a very
exhaustive and detailed textbook on linear programming and the simplex method, see [6]. A
complete and rigorous yet challenging reference for linear and integer programming is the
book by Schrijver [7]. A well-written algorithm-centric introduction to linear, integer and also


nonlinear optimization can be found in [8]. Finally, for integer and combinatorial optimization
we refer to [9].

3.1 Polyhedral Theory

The mathematical objects that we talk about in this section live in the n-dimensional Euclidean space ℝ^n. Before we begin to discuss polyhedra and the theory around them, we briefly rush through some basic concepts which are necessary for that task.

3.1.1 Convex Sets and Cones

Convexity is one of the most important concepts in mathematical optimization: intuitively speaking, a geometric object is convex if the straight line between any two points of the object lies completely inside of it.
[Margin figure: a convex and a nonconvex set.]

3.1 Definition (convex and conic sets and hulls): A set X ⊆ ℝ^n is called convex if for any x_1, x_2 ∈ X and λ ∈ [0, 1] also λx_1 + (1 − λ)x_2 ∈ X.

A convex combination of X is a sum of the form

    ∑_{x∈X} λ_x x with λ_x ≥ 0 for all x ∈ X and ∑_{x∈X} λ_x = 1,    (3.1)

where in the case of an infinite X we assume that almost all λ_x = 0 so that the above expression makes sense. The convex hull conv(X) of X is the smallest convex set containing X or, alternatively, the set of all convex combinations of elements of X.
[Margin figure: convex hull of six points.]

An important class of convex sets is the one of convex cones.

3.2 Definition (convex cones): X ⊆ ℝ^n is called a (convex) cone if for any x_1, x_2 ∈ X and λ_1, λ_2 ≥ 0 also λ_1 x_1 + λ_2 x_2 ∈ X. A conic combination is defined like (3.1) but without the condition ∑_{x∈X} λ_x = 1. Analogously to the above, the conic hull conic(X) is the smallest cone containing X, or the set of all conic combinations of elements of X.
[Margin figure: conic hull of four points.]

Geometrically, the conic hull is the largest set that “looks the same” as conv(𝑋), from the
perspective of an observer that sits at the origin.
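Membership in a convex hull is itself a linear feasibility problem: p ∈ conv(X) if and only if there are λ_x ≥ 0 with ∑ λ_x = 1 and ∑ λ_x x = p. A small Python sketch (SciPy assumed, purely illustrative) makes this concrete and also previews the linear programs of Section 3.2:

```python
import numpy as np
from scipy.optimize import linprog

def in_convex_hull(points, p):
    """Is p a convex combination of the given points? Solve for lambda >= 0
    with sum(lambda) = 1 and points^T lambda = p (a pure feasibility LP)."""
    P = np.asarray(points, dtype=float)       # shape (k, n)
    k = P.shape[0]
    A_eq = np.vstack([P.T, np.ones((1, k))])  # stack both equation systems
    b_eq = np.append(np.asarray(p, dtype=float), 1.0)
    res = linprog(np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * k, method="highs")
    return res.success

square = [[0, 0], [1, 0], [0, 1], [1, 1]]
print(in_convex_hull(square, [0.5, 0.5]))   # True
print(in_convex_hull(square, [1.5, 0.5]))   # False
```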


3.1.2 Affine Spaces

Recall the notion of linear independence: a set of points X = {x_1, …, x_k} ⊆ ℝ^n is linearly independent if no element x of X is contained in the linear subspace that is spanned by the remainder X ∖ {x} or, equivalently, if span(X′) ≠ span(X) for all X′ ⊊ X. Here, the span or linear hull of a set X ⊆ ℝ^n means the smallest linear subspace containing X:

    span(X) = ⋂ {𝒱 : 𝒱 is a subspace of ℝ^n and X ⊆ 𝒱}.

The linear hull has an alternative algebraic characterization by means of linear combinations,

    span(X) = { ∑_{i=1}^k λ_i x_i : λ_i ∈ ℝ for all i ∈ {1, …, k} },    (3.2)

which gives rise to the well-known algebraic formulation of linear independence: X is linearly independent if and only if the system

    ∑_{i=1}^k λ_i x_i = 0    (3.3)

has the unique solution λ_1 = ⋯ = λ_k = 0. To see this, note that if there was some j ∈ {1, …, k} such that 0 ≠ x_j ∈ span(X ∖ {x_j}), then (3.2) would give rise to a non-zero solution of (3.3).
[Margin figure: x_1 and x_2 are linearly dependent, but both are pairwise linearly independent with x_3.]

For the study of polyhedra, we need the concepts of affine hulls and affine independence, respectively, which are centered around the notion of an affine subspace in a very similar fashion as in the linear case above.

3.3 Definition (affine spaces, hulls, and independence): A set 𝒜 ⊆ ℝ^n is an affine subspace of ℝ^n if there exists a linear subspace 𝒱 ⊆ ℝ^n and some b ∈ ℝ^n such that

    𝒜 = b + 𝒱 = {b + v : v ∈ 𝒱}.    (3.4)

The affine hull aff(X) of a set X = {x_1, …, x_k} ⊆ ℝ^n is the smallest affine subspace containing X. The set X is called affinely independent if no x ∈ X fulfills x ∈ aff(X ∖ {x}) or, equivalently, if aff(X) ≠ aff(X′) for all X′ ⊊ X.
Finally, the dimension dim(𝒜) of 𝒜 in (3.4) is defined to be the dimension of 𝒱.
[Margin figure: an affine space 𝒜 = b + 𝒱 of dimension 1; x_1, x_2 ∈ 𝒜 are affinely independent.]

An affine subspace can thus be envisioned as a linear space that has been translated by a vector. This principle is reflected by the algebraic characterization of the affine hull of X = {x_1, …, x_k} ⊆ ℝ^n by means of affine combinations: first, move to an arbitrary vector of X (without loss of generality, let this be x_1), then add any linear combination of the directions from x_1 to the other k − 1 elements of X:

    aff(X) = { x_1 + ∑_{i=2}^k λ_i (x_i − x_1) : λ_i ∈ ℝ for all i ∈ {2, …, k} };

by defining λ_1 = 1 − ∑_{i=2}^k λ_i one can easily derive the equivalent definition

    aff(X) = { ∑_{i=1}^k λ_i x_i : λ_i ∈ ℝ for all i ∈ {1, …, k} and ∑_{i=1}^k λ_i = 1 }.

An algebraic definition of affine dependence can be derived in exactly the same way as for linear dependence: x_j ∈ aff(X ∖ {x_j}) if and only if one can write x_j as

    x_j = x_l + ∑_{i≠j,l} λ_i (x_i − x_l),

where l ≠ j, which is in turn equivalent to the fact that

    ∑_{i=1}^k λ_i (x_i, 1)^T = 0    (3.5)

has a solution with λ_j ≠ 0 (to see this, let λ_l = 1 − ∑_{i≠j,l} λ_i and λ_j = −1). Hence, X is affinely independent if and only if (3.5) has the unique solution λ_1 = ⋯ = λ_k = 0.
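Criterion (3.5) says that affine independence of k points is linear independence of the lifted vectors (x_i, 1)^T; the following Python sketch (NumPy assumed, illustrative only) tests it via a rank computation:

```python
import numpy as np

def affinely_independent(X):
    """X: k points as rows of a (k, n) array. By criterion (3.5), the points
    are affinely independent iff the augmented matrix [X | 1] has rank k."""
    X = np.asarray(X, dtype=float)
    augmented = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.linalg.matrix_rank(augmented) == X.shape[0]

print(affinely_independent([[0, 0], [1, 0], [0, 1]]))  # True (a triangle)
print(affinely_independent([[0, 0], [1, 1], [2, 2]]))  # False (collinear)
```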

3.1.3 Polyhedra

A polyhedron is, intuitively speaking, a closed convex body whose surface decomposes into “flat” pieces. Mathematically, this flatness is grasped by the concept of halfspaces.

3.4 Definition (polyhedra and polytopes): Let n ∈ ℕ. A subset H ⊆ ℝ^n is called a hyperplane of ℝ^n if there exist a ∈ ℝ^n, a ≠ 0 and β ∈ ℝ such that

    H = {x ∈ ℝ^n : a^T x = β}.

Likewise, a (closed) halfspace of ℝ^n is a set of the form

    ℋ = {x ∈ ℝ^n : a^T x ≤ β},

where again 0 ≠ a ∈ ℝ^n and β ∈ ℝ. One says in the above situation that the hyperplane or halfspace, respectively, is induced by the pair (a, β). The intersection of finitely many halfspaces is called a polyhedron. A polytope is a polyhedron that is bounded.
[Margin figures: a hyperplane H and corresponding halfspace ℋ; a 2-dimensional polytope (the affine hull is ℝ^2) defined by five halfspaces.]
The dimension dim(𝒫) of a polyhedron 𝒫 is defined to be the dimension of aff(𝒫), if 𝒫 ≠ ∅, and −1 otherwise. In both cases dim(𝒫) is one less than the maximum number of affinely independent vectors in 𝒫.

Polyhedra are the fundamental structure of linear and integer linear optimization. From the above definition, it follows that for a polyhedron 𝒫 there is a matrix A ∈ ℝ^{m×n} and a vector b ∈ ℝ^m, for some m ∈ ℕ, such that

    𝒫 = 𝒫(A, b) = {x ∈ ℝ^n : Ax ≤ b},

i.e., 𝒫 is the solution set of a system of linear inequalities, each of which defines a halfspace. Note that polyhedra, being intersections of convex sets, are convex themselves. Complementary to the above implicit definition as the solution set of a system Ax ≤ b, every polyhedron admits, by a theorem of Minkowski, an explicit characterization by means of convex and conic combinations.

3.5 Theorem (Minkowski): The set 𝒫 ⊆ ℝ^n is a polyhedron if and only if there are finite sets V, W ⊆ ℝ^n such that 𝒫 = conv(V) + conic(W).

If 𝒫 in Theorem 3.5 is a polytope, then, since every nonempty cone is unbounded, W = ∅ must hold; hence, every polytope is the convex hull of its so-called vertices or extreme points, which are the “corners” of the polytope as shown in the picture on the margin. Vertices are a special instance of faces of a polyhedron, which are defined next.
[Margin figure: the same polytope as the convex hull of its five vertices x_1, …, x_5.]

3.6 Definition: Let 𝒫 ⊆ ℝ^n be a polyhedron. An inequality of the form

    a^T x ≤ β    (3.6)

with a ∈ ℝ^n and β ∈ ℝ is said to be valid for 𝒫 if it is satisfied for all x ∈ 𝒫. In that event, the set

    F_{a,β} = {x ∈ 𝒫 : a^T x = β}

constitutes a face of 𝒫, namely the face induced by (3.6). A zero-dimensional face is called a vertex of 𝒫, one-dimensional faces are edges of 𝒫. A face F for which dim(F) = dim(𝒫) − 1 is called a facet of 𝒫.

Note that a face of a polyhedron is a polyhedron itself, as it is obtained by adding the inequalities a^T x ≤ β and a^T x ≥ β to the system Ax ≤ b. For a representation 𝒫(A, b) of the polytope 𝒫, each face has another characterization.
[Margin figure: two faces induced by valid inequalities, with dim(F_1) = 0 and dim(F_2) = 1.]

3.7 Lemma: Let 𝒫 = 𝒫(A, b) with A ∈ ℝ^{m×n}. For E ⊆ {1, …, m}, the set

    F_E = {x ∈ 𝒫 : A_{E,•} x = b_E}

is a face of 𝒫. If F_E ≠ ∅, its dimension is n − rank(A_{eq(F_E),•}), where eq(F_E) is the equality set of F_E defined by eq(F_E) = {i : A_{i,•} x = b_i for all x ∈ F_E}.
[Margin figure: the same two faces with their defining equality sets highlighted.]

Facets are of special importance because they are necessary and sufficient to describe a polyhedron:

3.8 Theorem: Let 𝒫 = 𝒫(A, b) be a polyhedron, and assume that no inequality in Ax ≤ b is redundant, i.e., could be removed without altering 𝒫. Let I ∪ J denote the partition of row indices defined by I = {i : A_{i,•} x = b_i for all x ∈ 𝒫}. Then, the inequalities in A_{J,•} x ≤ b_J are in one-to-one correspondence (via Lemma 3.7) to the facets of 𝒫.

To describe a polyhedron 𝒫, we thus need only n − dim(𝒫) equations plus as many inequalities as 𝒫 has facets. Any inequality that induces neither a facet nor the whole polytope 𝒫 can be dropped without changing the feasible set, and every system Ãx ≤ b̃ describing 𝒫 needs to include at least one facet-inducing inequality for every facet of 𝒫.

3.2 Linear Programming

A linear program (LP) is an optimization problem that asks for the minimization of a linear functional over a polyhedron. The most simple form would be

    min  c^T x     (3.7a)
    s.t. Ax ≤ b,   (3.7b)

while an LP is said to be in standard form if it is stated as follows:

    min  c^T x     (3.8a)
    s.t. Ax = b    (3.8b)
         x ≥ 0.    (3.8c)

In both cases, x ∈ ℝ^n, A ∈ ℝ^{m×n} and b ∈ ℝ^m are given, and we denote the feasible set by the letter 𝒫 (for (3.7), 𝒫 = 𝒫(A, b) in the notation of Section 3.1). Note that both forms are equivalent in the sense that each can be transformed into the other. For example, given an LP in polyhedral form, we can replace x by variables x+ ∈ ℝ^n and x− ∈ ℝ^n, representing the positive and negative part of x, respectively, and introduce auxiliary variables s ∈ ℝ^m to rewrite (3.7) in standard form as

    min  c^T x+ − c^T x−
    s.t. Ax+ − Ax− + s = b
         x+ ≥ 0, x− ≥ 0, s ≥ 0.

Moreover, if we had a maximization problem with objective max c^T x, it could be converted to the above forms by the relation

    max {c^T x : x ∈ 𝒫} = − min {−c^T x : x ∈ 𝒫}.

In view of this equivalence of different LP forms, we will in the following assume whatever form allows for a clear presentation.
Note that if (3.7) has an optimal solution x* with objective value z* = c^T x*, we can represent the set of optimal solutions by {x : Ax ≤ b and c^T x = z*}. This shows that the optimal set is always a face of 𝒫. In addition, it is easy to show that if 𝒫 has any vertex (which is always the case if the LP is in standard form), then every nonempty face of 𝒫 contains a vertex. Hence we can conclude:

3.9 Observation: If an LP in standard form has a finite optimal objective value, there is always a vertex of 𝒫 which is an optimal solution of the LP.
[Margin figure: an example objective c such that c^T x is minimized by x*; the dotted lines show the hyperplanes c^T x = 0 and c^T x = c^T x*.]

The rest of this section is about characterizing and algorithmically finding such an optimal vertex.
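As a concrete illustration, the toy LP below (data invented) is stated in the polyhedral form (3.7) and solved with SciPy's linprog, an off-the-shelf solver assumed here purely for demonstration:

```python
from scipy.optimize import linprog

# min c^T x  s.t.  Ax <= b, x >= 0  -- a tiny instance of form (3.7)
c = [-1, -2]                 # minimizing -x1 - 2*x2 maximizes x1 + 2*x2
A = [[1, 1], [1, 3]]
b = [4, 6]

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, None), (0, None)],
              method="highs")
print(res.x, res.fun)        # optimal vertex [3. 1.] with value -5.0
```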

3.2.1 Duality

One of the most important concepts in linear programming is that of duality, meaning that certain structures always occur in closely related pairs. In optimization, those structures include convex cones and systems of linear (in)equalities. For our purposes, the most important result is that of LP duality, a close relation of two LPs, as reviewed below.

3.10 Definition: Let an LP in standard form (3.8) be given; we call this the primal problem. The associated LP

    max  b^T y        (3.9a)
    s.t. A^T y ≤ c    (3.9b)

is called the linear-programming dual of (3.8).

Since we have seen that any LP can be transformed into standard form, one can also compute a dual for every LP. In particular, it is easy to verify that the dual of the dual results in the primal again. The motivation for LP duality lies in the following fundamental theorem.

3.11 Theorem (strong duality): Assume that either (3.8) or (3.9) is feasible. Then

    min {c^T x : Ax = b, x ≥ 0} = max {b^T y : A^T y ≤ c},

where we include the values ±∞ as described on page 9. If both are feasible, then both have an optimal solution.

Note that Theorem 3.11 implies the statement of weak duality, namely that whenever x is feasible for the primal and y is feasible for the dual, then c^T x ≥ b^T y.
[Margin figure: the values c^T x of primal feasible solutions lie above the values b^T y of dual feasible solutions; at optimality, c^T x* = b^T y*.]
LP duality is extremely useful because it allows for very compact proofs of optimality: if one wants to show that a certain solution x* of the primal LP is optimal, it suffices to provide a dual feasible y* with the property that c^T x* = b^T y*. Such a y* is called a witness for the optimality of x*.
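Strong duality is easy to observe numerically; the snippet below (SciPy assumed, data invented) solves a small primal–dual pair and confirms that both optimal values coincide, so the dual solution acts as a witness:

```python
from scipy.optimize import linprog

A = [[1.0, 2.0, 1.0]]          # primal: min c^T x  s.t.  Ax = b, x >= 0
b = [4.0]
c = [2.0, 3.0, 4.0]

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 3, method="highs")
# dual: max b^T y  s.t.  A^T y <= c; linprog minimizes, so negate b
dual = linprog([-bi for bi in b],
               A_ub=[[row[j] for row in A] for j in range(3)],  # A^T
               b_ub=c, bounds=[(None, None)], method="highs")
print(primal.fun, -dual.fun)   # strong duality: both values equal 6.0
```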

3.2.2 Primal and Dual Basic Solutions

In this section, we show how to represent a vertex of 𝒫 by means of a basis. It is assumed that an LP is given in standard form (3.8) and that A has full row rank m.
By Lemma 3.7, any vertex x̄ of 𝒫, which is a 0-dimensional face, can be characterized by a subset of the constraints of (3.8) that is fulfilled with equality and has rank n. Since (3.8b) has rank m by assumption, x̄ is a vertex if and only if there is an index set N with |N| = n − m such that x̄ is the unique solution of the system

    Ax = b,  x_N = 0.    (3.10)

For i ∈ N we can thus disregard the corresponding i-th column of the system Ax = b. Hence, we can represent x̄ by m linearly independent columns of A. Such a submatrix of A is called a (simplex) basis and the corresponding set of column indices is denoted by B = {1, …, n} ∖ N. The variables x_B are called the basic variables, x_N are the non-basic variables. By (3.10), every vertex x̄ is a basic solution for (3.8b), i.e.,

    x̄_B = A_{•,B}^{-1} b  and  x̄_N = 0    (3.11)

for a basis B of A. Conversely, an arbitrary basic solution of the form (3.11) is a vertex of 𝒫 only if additionally x̄ ≥ 0, i.e., it is feasible for (3.8); it is then called a basic feasible solution (BFS) of the LP. In conclusion, x̄ being a BFS is necessary and sufficient for x̄ being a vertex of 𝒫, while it should be noted that, in general, more than one basis may correspond to the same vertex.
[Margin example: for the constraint (1/2  1)(x_1, x_2)^T = 1 with x_1, x_2 ≥ 0 and B = {1}, we get A_{•,B}^{-1} b = 2 · 1 = 2, so x̄ = (x̄_B, x̄_N) = (2, 0) is the corresponding BFS.]
Let B be a basis of A and denote the feasible region of the dual (3.9) by 𝒟. Arguing similarly as above, one can show that a vertex ȳ of 𝒟 must fulfill A_{•,B}^T ȳ = c_B (read A_{•,B}^T as (A_{•,B})^T) and, in order to be feasible, also

    c − A^T y ≥ 0    (3.12)

needs to hold. Hence the vector ȳ defined by ȳ^T = c_B^T A_{•,B}^{-1} is called the dual basic solution associated to x̄ defined in (3.11); it is a dual BFS if ȳ ∈ 𝒟, i.e., if (3.12) holds.

3.2.3 The Simplex Method

If x̄ and ȳ are an associated pair of primal and dual basic solutions for a basis B of A, it holds that

    c^T x̄ = c_B^T x̄_B + c_N^T x̄_N = c_B^T A_{•,B}^{-1} b + 0 = b^T ȳ,

i.e., the objective values of x̄ and ȳ for the primal and dual LP, respectively, coincide. In view of Observation 3.9 and Theorem 3.11, this shows that solving an LP is tantamount to finding a basis B for which the associated primal and dual basic solutions x̄ and ȳ are both feasible (ȳ then is a witness of the optimality of x̄). The several variants of the simplex method comprise algorithms that determine such a basis by a sequence of basis exchange operations, in each of which a single element of B is exchanged.
To be more specific, denoting the objective value by z = c^T x, simple calculations starting from the form A_{•,B} x_B + A_{•,N} x_N = b of (3.8b) yield the following representation of z and x_B with respect to B, in dependence of the values of x_N:

    z   = c_B^T A_{•,B}^{-1} b + (c_N^T − c_B^T A_{•,B}^{-1} A_{•,N}) x_N = z̄ + c̄_N^T x_N,
    x_B = A_{•,B}^{-1} b − A_{•,B}^{-1} A_{•,N} x_N = b̄ − Ā_N x_N,                     (3.13)

where in the second step we have introduced suitable abbreviations z̄, b̄, c̄_N and Ā_N. In this form, we can immediately read off the values b̄ of the basic variables and the objective value z̄ for the current basic solution that is defined by x_N = 0. The vector c̄_N^T = c_N^T − ȳ^T A_{•,N} encodes the dual feasibility (3.12) of that basis. Consequently, B must be an optimal basis if both b̄ ≥ 0 (primal feasibility) and c̄_N ≥ 0 (dual feasibility) hold in (3.13).

Otherwise, we can perform a simplex step: assume that the (i, k)-th entry of Ā_N is non-zero. It can be shown that by performing a Gaussian pivot on that entry, i.e., turning the relevant column of (3.13) into a unit vector by elementary row operations, one essentially computes a representation of the form (3.13) with respect to the adjacent basis B′ = B ∖ {i} ∪ {j}, where j ∈ N is the k-th entry of N. This notion of adjacency translates to the geometric interpretation, since vertices corresponding to adjacent bases always share an edge of the polyhedron.
[Margin figure: example run of the primal simplex, starting in x_2 and performing three basis exchanges until the optimal vertex x_5 is reached; c^T x decreases along the path.]

The primal simplex algorithm starts with a primal BFS (consult the literature for a method called phase 1 to find such an initial BFS) and then iteratively performs the following steps:

(1) Choose a column (variable entering the basis) for which the entry of c̄_N is negative, i.e., the corresponding entry of ȳ is not yet dually feasible. This ensures that z is nonincreasing, and it usually decreases.

(2) Choose a row (variable leaving the basis) in such a way as to ensure that the subsequent simplex step maintains primal feasibility; this can be achieved by a simple test called the min-ratio rule.

(3) Perform the simplex step by pivoting on the column and row selected above.

The corresponding sequence of objective function values is nonincreasing. Under simple conditions on the method of selecting indices, one can show that this procedure results in an optimal basis, indicated by c̄_N ≥ 0, after a finite number of steps.

The dual simplex algorithm, as the name suggests, sets off from a dual BFS (c̄_N ≥ 0) and then does essentially the same as its primal counterpart (with the roles of rows and columns of (3.13) swapped during the basis exchange), maintaining dual feasibility and a nondecreasing objective function until primal feasibility (b̄ ≥ 0) is established.

Numerous variants and optimizations of the basic method described above exist. An important one is the so-called revised simplex, which is based on the observation that, especially for n ≫ m, it is wasteful to pivot the complete system (3.13) in each step. Instead, one maintains a representation of A_{•,B}^{-1} (usually in the form of an LU factorization), which can be shown to be sufficient to carry out an iteration of the algorithm. Furthermore, it should be noted that there exists an efficient method to incorporate upper bounds on the variables, e.g. of the form 0 ≤ x ≤ 1, without having to increase the size of the formulation by the explicit addition of constraints x ≤ 1. Paper V compares several variants of the simplex algorithm when applied to LP decoding as introduced in Section 5.2.

It has been shown that the worst-case complexity of the simplex algorithm is exponential in the problem size [10]. The very contrary empirical observation, however, is that the number of pivots before optimality is usually in O(m). This explains why, although LP solving algorithms with polynomial worst-case complexity exist, the simplex method is still the most prevalent one in practice.
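The basis-exchange mechanics can be condensed into a few lines; below is a minimal revised-simplex sketch in Python (NumPy assumed; it expects an initial feasible basis, uses a simple entering rule and the min-ratio rule, and omits phase 1 and anti-cycling safeguards, so it is an illustration rather than a robust solver):

```python
import numpy as np

def primal_simplex(A, b, c, B):
    """Minimize c^T x subject to Ax = b, x >= 0, starting from a basis B
    (m column indices whose basic solution is feasible). Returns (x, z)."""
    m, n = A.shape
    B = list(B)
    while True:
        AB_inv = np.linalg.inv(A[:, B])   # in practice: keep an LU factorization
        x_B = AB_inv @ b                  # values of the basic variables (b-bar)
        y = c[B] @ AB_inv                 # dual solution y^T = c_B^T A_B^{-1}
        c_bar = c - y @ A                 # reduced costs; zero on basic columns
        N = [j for j in range(n) if j not in B]
        entering = next((j for j in N if c_bar[j] < -1e-9), None)
        if entering is None:              # c-bar >= 0: dual feasible, hence optimal
            x = np.zeros(n)
            x[B] = x_B
            return x, float(c @ x)
        d = AB_inv @ A[:, entering]       # entering column in the current basis
        ratios = [(x_B[i] / d[i], i) for i in range(m) if d[i] > 1e-9]
        if not ratios:                    # x_entering can grow without bound
            raise ValueError("LP is unbounded")
        _, leaving = min(ratios)          # min-ratio rule keeps primal feasibility
        B[leaving] = entering             # basis exchange

# Example: min -x1 - x2  s.t.  x1 + x2 + s = 1, starting from basis {s}
A = np.array([[1.0, 1.0, 1.0]])
print(primal_simplex(A, np.array([1.0]), np.array([-1.0, -1.0, 0.0]), [2]))
```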


3.3 Integer Programming

In integer programming, we are concerned with LPs augmented by the additional requirement that the solution be integral. Formally, we define an integer linear program (IP) as an optimization problem of the form

    min  c^T x       (3.14a)
    s.t. Ax ≤ b      (3.14b)
         x ∈ ℤ^n,    (3.14c)

where we assume that all entries of A, b and c are rational. The LP that results when replacing (3.14c) by x ∈ ℝ^n is called its LP relaxation.
[Margin figure: feasible set 𝒫 ∩ ℤ^2 and LP relaxation of an IP.]
Let 𝒫 = 𝒫(A, b) as before denote the feasible set of the LP relaxation and 𝒫_I = conv(𝒫(A, b) ∩ ℤ^n) the convex hull of integer points in 𝒫. Under the above assumption, it can be shown that 𝒫_I is a polyhedron. Note that solving (3.14) is essentially equivalent to solving min {c^T x : x ∈ 𝒫_I}. Thus an IP can, in principle, be solved by an LP: if A′ and b′ are such that 𝒫_I = 𝒫(A′, b′), then the optimal IP solution is also optimal for the LP

    min  c^T x
    s.t. A′x ≤ b′
         x ∈ ℝ^n.

The problem is that, in general, it is hard to derive a description of 𝒫_I from 𝒫. In fact, IPs are NP-hard to solve in general, whereas we have seen above that linear programming is contained in P.
[Margin figure: the equivalent LP polytope 𝒫_I = conv(𝒫 ∩ ℤ^2) and the optimal IP solution x*_IP.]
Two complementary approaches for solving IPs are important for this thesis: one is to tighten the LP relaxation of (3.14) by adding cuts, i.e., inequalities that are valid for 𝒫_I but not for 𝒫. The other, named branch-&-bound, recursively divides the feasible space 𝒫 ∩ ℤ^n into smaller subproblems among which the optimal solution is searched, interleaved with the generation of bounds that allow most of these subproblems to be skipped.

3.3.1 Cutting Planes

Assume that we try to solve the IP (3.14) by solving its LP relaxation, i.e., minimize c^T x over 𝒫 instead of 𝒫_I. If the LP solution x^1 happens to be integral, it is clear that x^1 is also optimal for (3.14). Otherwise, by Theorem 3.8 there must exist at least one inequality that is valid for 𝒫_I but not for x^1. Any such inequality is called a cutting plane or simply cut for x^1. If we add a cut to the LP relaxation (3.14) and solve the LP again, we necessarily get a new solution x^2 ≠ x^1 (because the cut is violated by x^1). Since the feasible space was reduced, c^T x^2 ≥ c^T x^1 must hold.
[Margin figure: example cutting plane that cuts off the non-integral initial LP solution x^1 but is valid for 𝒫_I; it implies a new LP solution x^2 with improved objective value.]
This method can be iterated as long as new cuts for the current solution x^t can be generated, leading to a sequence of LP solutions (x^t)_t such that (c^T x^t)_t is monotonically nondecreasing. If the cuts are “good enough” (especially if they include the facets of 𝒫_I), some x^t will eventually be feasible for 𝒫_I and hence equal the IP solution x*. While there exists a cut-generation algorithm, called the Gomory-Chvátal method, that provably terminates in x* after a finite number of steps, the number of cuts it usually introduces is prohibitive for practical applications. For many classes of IPs, however, there exist special methods to derive cuts that are based on the specific structure of the problem—we will encounter such a case in Section 5.3.

3.3.2 Branch-&-Bound

Let us again assume that we solve the LP relaxation of (3.14) and obtain a non-integral LP solution x^∅ ∈ 𝒫 ∖ ℤ^n with, say, x^∅_j ∉ ℤ. The key idea of LP-based branch-&-bound is to define two disjoint subproblems S_0 and S_1 of the original problem S_∅ = (3.14) with the property that the optimal IP solution x* of S_∅ is contained in either S_0 or S_1. To be specific, note that necessarily x*_j ∈ ℤ, so either x*_j ≤ ⌊x^∅_j⌋ or x*_j ≥ ⌈x^∅_j⌉ needs to hold, which gives rise to the two subproblems

    S_0:  min  c^T x           S_1:  min  c^T x
          s.t. Ax ≤ b                s.t. Ax ≤ b
               x_j ≤ ⌊x^∅_j⌋              x_j ≥ ⌈x^∅_j⌉
               x ∈ ℤ^n                    x ∈ ℤ^n.

[Margin figure: all integral points in 𝒫 are contained in either S_0 = {x ∈ 𝒫 : x_2 ≥ 1} or S_1 = {x ∈ 𝒫 : x_2 ≤ 0}; the two LP solutions x^{S_0} and x^{S_1} have larger objective value than x^∅.]
For both, we can again solve the LP relaxation (with the additional constraint on x_j) to obtain LP solutions x^0 and x^1 with objective function values z^0 and z^1, respectively. If both solutions are integral, clearly the one with smaller objective function value is optimal for (3.14). Otherwise, the process can be recursed to split a subproblem into two sub-subproblems (e.g., S_00 and S_01), and so forth, creating a binary tree of “problem nodes” whose leaves accord to problems that either have an integral LP solution or are infeasible—in both cases, no further subdivision is necessary.

While this technique already reduces the search space in a substantial way, advanced bounding makes it possible to reduce the size of the above-mentioned tree even further. Note that any feasible solution x̂ ∈ 𝒫 ∩ ℤ^n gives an upper bound on the optimal objective value z* = c^T x*, i.e., c^T x̂ ≥ c^T x*. Furthermore, in any subproblem S, the objective value z^S of the optimal solution x^S of the corresponding LP relaxation is a lower bound on the optimal integral solution value ẑ^S of that subproblem. If we now solve the LP relaxation of some subproblem S, obtaining z^S, and it holds that z^S ≥ c^T x̂ for some feasible x̂ that has been found earlier, there is no need to subdivide S (even if x^S is not integral): no feasible point of S can improve upon the objective value of x̂.
[Margin figure: an example branch-&-bound enumeration tree; the leaves are not further explored because the respective LPs are either infeasible or have integral solutions.]
If we otherwise split S into subproblems S′ and S″ and solve the respective LP relaxations to obtain z^{S′} and z^{S″}, we can conclude that min{z^{S′}, z^{S″}} ≤ ẑ^S = c^T x̂^S, i.e., the smaller of both is a lower bound on ẑ^S, because the optimal integral solution x̂^S for S must be in either S′ or S″. If this minimum is larger than z^S, we can improve the lower bound for S. This bound update can possibly be propagated to the parent of S if S has a sibling for which a lower bound has already been computed, and so forth.

The algorithm terminates when either no unexplored subproblems are left, or a feasible solution x̂ for (3.14) and a lower bound z^∅ on the optimal objective value z* have been found such that c^T x̂ = z^∅: it is then clear that no better solution than x̂ exists, so x̂ must be optimal.
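The recursion just described fits in a page of Python; the sketch below (SciPy's linprog assumed as the LP oracle; depth-first search and branching on the first fractional variable are simplifications of ours) implements LP-based branch-&-bound with the pruning rule z^S ≥ c^T x̂:

```python
import numpy as np
from scipy.optimize import linprog

def branch_and_bound(c, A_ub, b_ub, bounds):
    """Minimize c^T x s.t. A_ub x <= b_ub, x integral within the given
    variable bounds (list of (lo, hi) pairs). Returns (x, z)."""
    best_x, best_z = None, float("inf")
    nodes = [list(bounds)]                 # stack of subproblems (variable bounds)
    while nodes:
        bnd = nodes.pop()
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bnd, method="highs")
        if not res.success or res.fun >= best_z:
            continue                       # infeasible, or pruned by the bound
        frac = [j for j, v in enumerate(res.x) if abs(v - round(v)) > 1e-6]
        if not frac:                       # integral LP solution: new incumbent
            best_x, best_z = np.round(res.x), res.fun
            continue
        j, v = frac[0], res.x[frac[0]]     # branch on a fractional variable x_j
        lo, hi = bnd[j]
        down, up = list(bnd), list(bnd)
        down[j] = (lo, np.floor(v))        # subproblem with x_j <= floor(v)
        up[j] = (np.ceil(v), hi)           # subproblem with x_j >= ceil(v)
        nodes += [down, up]
    return best_x, best_z

# Toy IP: max x1 + x2 (i.e. min -x1 - x2) s.t. 2*x1 + x2 <= 4, x1 + 2*x2 <= 4
print(branch_and_bound([-1, -1], [[2, 1], [1, 2]], [4, 4],
                       [(0, None), (0, None)]))
```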

In practice, the cutting-plane and branch-&-bound approaches are often interleaved within a
branch-&-cut algorithm, where cutting planes may be inserted at each node of the branch-&-
bound tree. The algorithm presented in Paper VI is based on such a branch-&-cut strategy.
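
To make the branch-&-bound scheme concrete, the following Python sketch (an illustration added here, not one of the algorithms developed in this thesis) implements LP-based branch-&-bound for a binary program min 𝑐ᵀ𝑥 s.t. 𝐴𝑥 ≤ 𝑏, 𝑥 ∈ {0, 1}ⁿ using SciPy's LP solver; the small knapsack-type instance at the bottom is invented for demonstration.

    import numpy as np
    from scipy.optimize import linprog

    def branch_and_bound(c, A, b):
        """LP-based branch-&-bound for min c'x s.t. Ax <= b, x in {0,1}^n."""
        best = {"obj": np.inf, "x": None}          # incumbent = upper bound

        def solve_node(lo, hi):
            res = linprog(c, A_ub=A, b_ub=b, bounds=list(zip(lo, hi)))
            if not res.success:                    # node infeasible: prune
                return
            if res.fun >= best["obj"]:             # bound: cannot beat incumbent
                return
            x = res.x
            j = int(np.argmax(np.minimum(x, 1 - x)))   # most fractional variable
            if min(x[j], 1 - x[j]) < 1e-6:         # LP solution is integral
                best["obj"], best["x"] = res.fun, np.round(x)
                return
            for val in (0, 1):                     # branch on x_j <= 0 / x_j >= 1
                lo2, hi2 = lo.copy(), hi.copy()
                lo2[j] = hi2[j] = val
                solve_node(lo2, hi2)

        n = len(c)
        solve_node(np.zeros(n), np.ones(n))
        return best["x"], best["obj"]

    # invented example: min -5x1 - 4x2 - 3x3  s.t.  2x1 + 3x2 + x3 <= 4
    x, obj = branch_and_bound(np.array([-5.0, -4.0, -3.0]),
                              np.array([[2.0, 3.0, 1.0]]), np.array([4.0]))
    print(x, obj)   # [1. 0. 1.] -8.0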

3.4 Combinatorial Optimization

An optimization problem is called combinatorial if the feasible set comprises all subsets of a
finite ground set Ξ = {𝜉₁, … , 𝜉_𝑠} that fulfill a certain property, and the objective value of an
𝑆 ⊆ Ξ has the form 𝑓(𝑆) = ∑_{𝜉∈𝑆} 𝑐_𝜉 for given cost values 𝑐_𝜉 ∈ ℝ associated to each element
𝜉 ∈ Ξ.

A popular example is the shortest-path problem: given a graph 𝐺 = (𝑉, 𝐸), two vertices 𝑠, 𝑡 ∈ 𝑉
and cost 𝑐ₑ associated to each edge 𝑒 ∈ 𝐸, it asks for an 𝑠–𝑡 path 𝑃∗ (in fact, only paths in which
each edge occurs at most once are allowed) with minimum total cost 𝑓(𝑃∗), where the objective
function 𝑓(𝑃) = ∑_{𝑒∈𝑃} 𝑐ₑ accumulates, for each edge 𝑒 visited by the path 𝑃, the cost value 𝑐ₑ
assigned to 𝑒. Here, the ground set consists of the set of edges Ξ = 𝐸, and a subset of edges is
feasible if it forms (in appropriate ordering) a path in 𝐺.

[Margin figure: an example graph with Ξ = {𝑒₁, … , 𝑒₇}; feasible are {𝑒₁, 𝑒₅}, {𝑒₁, 𝑒₄, 𝑒₇}, {𝑒₂, 𝑒₆}, and {𝑒₃, 𝑒₇}.]

In a combinatorial optimization problem, one can identify each subset 𝑆 ⊆ Ξ by its characteristic
(or incidence) vector 𝑥^𝑆 ∈ {0, 1}^𝑠, where

    𝑥^𝑆ᵢ = 1 if 𝜉ᵢ ∈ 𝑆, and 𝑥^𝑆ᵢ = 0 otherwise.

Then, the set 𝑋 = {𝑥^𝑆 ∶ 𝑆 ⊆ Ξ feasible} that represents the feasible solutions is a subset of
{0, 1}^𝑠. As furthermore 𝑓(𝑆) = 𝑐ᵀ𝑥^𝑆 with 𝑐 = (𝑐_𝜉₁, … , 𝑐_𝜉ₛ)ᵀ, every combinatorial optimization
problem with such an objective function can be represented by the LP

    min  𝑐ᵀ𝑥
    s.t. 𝑥 ∈ conv(𝑋)

whose feasible polytope 𝒫 is a subset of the unit hypercube [0, 1]^𝑠.

In case of the shortest 𝑠–𝑡 path problem, an explicit formulation of the path polytope, i.e., the
convex hull of incidence vectors of 𝑠–𝑡 paths, is known. The corresponding LP to solve the


shortest path problem is

    min  𝑐ᵀ𝑥 = ∑_{𝑒∈𝐸} 𝑐ₑ𝑥ₑ     (3.15a)
    s.t. ∑_{𝑒∈𝛿⁺(𝑠)} 𝑥ₑ − ∑_{𝑒∈𝛿⁻(𝑠)} 𝑥ₑ = 1     (3.15b)
         ∑_{𝑒∈𝛿⁺(𝑡)} 𝑥ₑ − ∑_{𝑒∈𝛿⁻(𝑡)} 𝑥ₑ = −1     (3.15c)
         ∑_{𝑒∈𝛿⁺(𝑣)} 𝑥ₑ − ∑_{𝑒∈𝛿⁻(𝑣)} 𝑥ₑ = 0   for all 𝑣 ∉ {𝑠, 𝑡}     (3.15d)
         𝑥 ∈ [0, 1]^|𝐸|,     (3.15e)

where (3.15b) and (3.15c) ensure that the path starts in 𝑠 and ends in 𝑡, respectively, and the
so-called flow conservation constraints (3.15d) state that the path must leave any other vertex 𝑣
as often as it enters 𝑣.
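
For illustration, the snippet below (an addition using an invented four-node digraph) assembles the constraints (3.15b)–(3.15d) as rows of a node–arc incidence matrix and solves the LP with SciPy; since that matrix is totally unimodular, the optimal LP solution is automatically the incidence vector of a path.

    import numpy as np
    from scipy.optimize import linprog

    nodes = ["s", "a", "b", "t"]
    edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
    cost = np.array([1.0, 4.0, 1.0, 5.0, 1.0])

    # row v of A_eq: +1 for edges leaving v, -1 for edges entering v
    A_eq = np.zeros((len(nodes), len(edges)))
    for j, (u, v) in enumerate(edges):
        A_eq[nodes.index(u), j] = 1.0
        A_eq[nodes.index(v), j] = -1.0
    b_eq = np.array([1.0, 0.0, 0.0, -1.0])   # (3.15b)-(3.15d): +1 at s, -1 at t

    res = linprog(cost, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * len(edges))
    print(res.x, res.fun)   # incidence vector of s -> a -> b -> t, cost 3.0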
In general, however, it is highly nontrivial to find an explicit representation of 𝒫. For a large
class of hard problems one can nevertheless at least formulate some LP relaxation 𝒫′ ⊇ conv(𝑋)
such that 𝒫′ ∩ {0, 1}^𝑠 = 𝑋, i.e., there is an integer programming model

    min  𝑐ᵀ𝑥
    s.t. 𝑥 ∈ 𝒫′
         𝑥 ∈ {0, 1}^𝑠

of the problem, which can then be tackled by the methods presented in Section 3.3. For
several problems—in this thesis, most notably the decoding problem introduced in the following
chapter—this polyhedral approach to combinatorial optimization has led to the most efficient
algorithms known.

Chapter 4

Coding Theory Background

How can information be transmitted reliably via an error-prone transmission system? The
mathematical study of this question initiated the emergence of information theory and, since
one particularly important answer lies in the use of error-correcting codes, of coding theory as
a mathematical discipline in its own right.
In contrast to physical approaches to increase the reliability of communication (e.g. increased
transmit power, more sensitive antennas, …), error-correcting codes offer a completely ideal
solution to the problem. It has been shown that, by coding, an arbitrary level of reliability is
achievable, despite the unavoidable inherent unreliability of any technological equipment—in
fact, the unparalleled development of microelectronic devices and their ability to communicate
via both wired and wireless networks would not have been possible without the accompanying
progress in coding theory.

[Margin figure: a noisy data transmission scenario.]
This chapter reviews the basics of error-correcting codes and their application to reliable
communication. For a more complete coverage of these topics, we refer the interested reader to
the broad literature on the subject: the birth of information theory was Shannon’s seminal work
“A mathematical theory of communication” [11]. Recommendable textbooks on information
theory and its applications are [12, 13]. There are several books covering “classical” coding
theory (this term is elucidated later), e.g. [14], while modern aspects of coding are collected in
[15].

4.1 System Setup and the Noisy-Channel Coding Theorem

The principle of error-correction coding is to preventively include redundancy in the transferred
messages, thus communicating more than just the actual information, with the goal of enabling
the receiver to recover that information, even in the presence of noise on the transmission
channel. The general system setup we consider is as depicted in Figure 4.1:
• the function by which these “bloated” messages are computed from the original ones is
called the encoder;
• the channel introduces noise to the transmitted signal, i.e., at the receiver there is uncer-
tainty about what was actually sent;


• the decoder tries to recover the original information from the received signal.

[Figure 4.1: Model of the transmission system: the sender's information is encoded (adding redundancy), transmitted over the noisy channel, and the decoder recovers the information from the perturbed message at the receiver.]
Throughout this text, we are concerned only with block codes, which means that the information
enters the encoder in form of chunks of uniform size (the information words), each of which is
encoded into a unique coded message of again uniform (but larger) size (the codewords). For
now, we additionally restrict ourselves to the binary case, i.e., the alphabet for both information
and codewords is 𝔽2 = {0, 1}. This leads to the following definitions.
4.1 Definition (code): An (𝑛, 𝑘) code is a subset 𝒞 ⊆ 𝔽₂ⁿ of cardinality 2ᵏ. A bijective map
from 𝔽₂ᵏ onto 𝒞 is called an encoding function for 𝒞.
Whenever the context specifies a concrete encoding function, and if there is no risk of ambiguity,
the symbol 𝒞 will be used interchangeably for both the code and its encoding function.
The numbers 𝑘 and 𝑛 are referred to as information length and block length, respectively. Their
quotient 𝑟 = 𝑘/𝑛 < 1 represents the amount of information per coded bit and is called the rate
of 𝒞. C

[Margin figure: block encoding of information words into codewords: the information words 1000, 0010, 1110, 0110 ∈ 𝔽₂ᵏ are mapped to the codewords 𝒞(1000), 𝒞(0010), 𝒞(1110), 𝒞(0110) ∈ 𝔽₂ⁿ.]

The concept of redundancy is entailed by the fact that 𝒞 is a strict (and usually very small)
subset of the space 𝔽₂ⁿ, i.e., most of the vectors in 𝔽₂ⁿ are not codewords, which is intended to
make the codewords much easier to distinguish from each other than the information words
of 𝔽₂ᵏ.
Note that in this definition the encoder is secondary to the code. This reflects the fact that, for
the topics covered by this text, the structure of the set of codewords is more important than the
actual encoding function.
We assume that the channel through which the codewords are sent is memoryless, i.e., that the
noise affects each individual bit independently; it thus can be defined as follows.

[Margin figure: transmission of five bits 1 0 0 1 1 through a noisy channel with 𝒴 = [0, 1], yielding the outputs 0.8, 0.15, 0.6, 0.7, 0.5.]

4.2 Definition (binary-input memoryless channel): A binary-input memoryless channel
is characterized by an output domain 𝒴 and the two conditional probability functions

    𝑃(𝑦ᵢ ∣ 𝑥ᵢ = 0) and 𝑃(𝑦ᵢ ∣ 𝑥ᵢ = 1)     (4.1)

that specify how the output 𝑦ᵢ ∈ 𝒴 depends on the two possible inputs 0 and 1, respectively (we
assume that these two functions are not identical—otherwise, the output would be independent


of the input and would thus not contain any information about the latter). Even more compactly,
the frequently used log-likelihood ratio (LLR)

    𝜆ᵢ = ln(𝑃(𝑦ᵢ ∣ 𝑥ᵢ = 0) / 𝑃(𝑦ᵢ ∣ 𝑥ᵢ = 1))     (4.2)

represents the entire information revealed by the channel about the sent symbol 𝑥ᵢ. If 𝜆ᵢ > 0,
then 𝑥ᵢ = 0 is more likely than 𝑥ᵢ = 1, and vice versa if 𝜆ᵢ < 0. The absolute value of 𝜆ᵢ
indicates the reliability of this tendency. C

[Margin figure: example LLR values −1.39, 1.73, −0.41, −0.85, 0 for the above transmission.]
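
Numerically, (4.2) is a one-liner; the sketch below reproduces the margin example, where the densities 𝑃(𝑦ᵢ ∣ 0) = 2(1 − 𝑦ᵢ) and 𝑃(𝑦ᵢ ∣ 1) = 2𝑦ᵢ on 𝒴 = [0, 1] are an assumption chosen here so that the resulting numbers match the margin figures.

    import numpy as np

    def llr(y):
        # lambda_i = ln P(y_i | x_i = 0) / P(y_i | x_i = 1), cf. (4.2),
        # with the assumed densities P(y|0) = 2(1-y) and P(y|1) = 2y
        return np.log((2 * (1 - y)) / (2 * y))

    y = np.array([0.8, 0.15, 0.6, 0.7, 0.5])   # the five channel outputs above
    print(np.round(llr(y), 2))                 # [-1.39  1.73 -0.41 -0.85  0.  ]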
When the receiver observes the result 𝜆 ∈ ℝⁿ (we mostly use LLRs in favor of 𝑦 ∈ 𝒴ⁿ from
now on) of the transmission of an encoded information word 𝑥 = 𝒞(𝑢) through the channel, it
has to answer the following question: which codeword 𝑥 ∈ 𝒞 do I believe has been sent, under
consideration of 𝑦? This “decision maker” is called the decoder, which is an algorithm realizing
a decoding function

    decode ∶ ℝⁿ → [0, 1]ⁿ;     (4.3)

the decoder is intentionally (for reasons that will become clear later) allowed to output not
only elements of 𝔽₂ⁿ but arbitrary points of the unit hypercube [0, 1]ⁿ (which includes 𝔽₂ⁿ via
the canonical embedding). We speak of decoding success if 𝑥 ∈ 𝒞 was sent and decode(𝜆) = 𝑥,
while a decoding error occurs if decode(𝜆) = 𝑥′ ≠ 𝑥, which includes the cases that 𝑥′ ∈ 𝒞, i.e.,
the decoder outputs a codeword but not the one that was sent, and 𝑥′ ∉ 𝒞, i.e., the decoder
does not output a codeword at all.

[Margin figures: decoding success, and a decoding failure (two faulty bits), for the above transmission.]

Assuming a uniform prior on the sender’s side, i.e., that all possible information words occur
with the same probability (source coding, the task of accomplishing this assumption, is not
covered here), the error-correcting performance of a communication system consisting of code,
channel, and decoding algorithm can be evaluated by means of its average frame-error rate

    FER = (1/|𝒞|) ∑_{𝑥∈𝒞} 𝑃(decode(𝜆) ≠ 𝑥 ∣ 𝑥 was sent).     (4.4)

We can now state the main task of coding theory: given a certain channel model (4.1), design an
(𝑛, 𝑘) code and an accompanying decoder (4.3) such that the demands requested by the desired
application are fulfilled, which may include:
• The frame-error rate (4.4) should be sufficiently small in order to ensure reliable commu-
nication.
• The rate 𝑟 = 𝑘/𝑛 should be as large as possible, because a small rate corresponds to a
large number of transmitted bits per information bit, i.e., large coding overhead.
• The block length 𝑛 should be small: high block lengths generally increase the complexities
of both encoder and decoder and may additionally introduce undesirable latencies in e.g.
telephony applications.
• The complexity of the decoding algorithm needs to be appropriate.


It is intuitively clear that some of the above goals are opposed to each other. The first two,
however, are not as incompatible as one might suspect—Claude Shannon proved a stunning
result [11] which implies that, at a fixed positive code rate, the error probability can be made
arbitrarily small.

4.3 Theorem (noisy-channel coding theorem): For given 𝜀 > 0 and 𝑟 < 𝐶, where 𝐶 > 0
depends only on the channel, there exists a code 𝒞 with rate at least 𝑟, and a decoding algorithm
for 𝒞 such that the frame-error rate (4.4) of the system is below 𝜀. C

As beautiful as both the result and its proof (which is explained thoroughly in [12]) are, they
are unfortunately completely non-constructive in several ways:

• the decoding error probability vanishes only for the block length 𝑛 going to infinity;

• the proof makes use of a random coding argument, hence it does not say anything about
the performance of a concrete, finite code;

• the running time of the theoretical decoding algorithm used in the proof is intractable
for practical purposes.

As a consequence of the first two aspects, the search for and construction of “good” codes, i.e.,
codes that allow for the best error correction at a given finite block length 𝑛 and rate 𝑟, has
emerged as a research area of its own, which is nowadays often nicknamed “classical coding
theory.” For a long time, however, the problem of decoder complexity was not a major focus
of the coding theory community. The term “modern coding theory” nowadays refers to a
certain paradigm shift that has taken place since the early 1990’s, governed by the insight that
suboptimal codes which are developed jointly with harmonizing low-complexity decoding
algorithms can lead to a higher overall error-correcting performance in practical applications
than the “best” codes, if no decoder is able to exploit their theoretical strength within reasonable
running time (see [16] for the historical development of coding theory).

The rest of this chapter is organized as follows. In Section 4.2, we discuss both the optimal
MAP and the ML decoding rule, which are equivalent in our case. Afterwards, the prevalent
additive white Gaussian noise (AWGN) channel model is explained in Section 4.3. Section 4.4
introduces binary linear block codes, a subclass of general block codes that is most important
in practice and with some exceptions assumed throughout this thesis. A special type of linear
block codes, called turbo codes, is presented in Section 4.5. Finally, Section 4.6 explains how
codes and channels can be generalized to the non-binary case.

Note that this chapter does not cover any specific decoding algorithm, as both Chapter 5 and the
entire Part II are concerned with various decoding approaches using mathematical optimization.
For other decoding algorithms, e.g. the ones that are used in today’s electronic devices, we
refer to the literature.


4.2 MAP and ML Decoding

An optimal decoder (with respect to frame-error rate) would always return the codeword that
was sent with highest probability among all codewords 𝑥 ∈ 𝒞, given the observed channel
output 𝑦:
    𝑥_MAP = arg max_{𝑥∈𝒞} 𝑃(𝑥 ∣ 𝑦).     (4.5)

This is called MAP decoding. By Bayes’ theorem, we have

    𝑃(𝑥 ∣ 𝑦) = 𝑃(𝑦 ∣ 𝑥) 𝑃(𝑥) / 𝑃(𝑦).

Since 𝑃(𝑦) is independent of the sent codeword 𝑥 and by assumption 𝑃(𝑥) is constant on 𝒞, we
obtain the equivalent ML decoding rule:

    𝑥_ML = arg max_{𝑥∈𝒞} 𝑃(𝑦 ∣ 𝑥).     (4.6)

Unfortunately, ML decoding is NP-hard in general [17], which motivates the search for special
classes of codes that are both strong and allow for an efficient decoding algorithm which at
least approaches the ML error-correction performance. On the other hand, it is desirable to
know the frame-error rate for a given code under exact ML decoding, because it (a) constitutes
the ultimate theoretical performance measure of the code itself and (b) serves as a “benchmark”
for the quality of suboptimal decoding algorithms.

4.3 The AWGN Channel

The most immediate and simple example of a binary-input memoryless channel as defined
in Definition 4.2 is called the binary symmetric channel (BSC): it flips a bit with probability
𝑝 < 1/2 and correctly transmits it with probability 𝑞 = 1 − 𝑝, and hence 𝜆ᵢ = ± ln(𝑝/𝑞), i.e.,
there are only two possible channel outputs.

While the conceptual simplicity of the BSC is appealing, for practical applications it turns out
to be simplified too much. Imagine a device in which some incoming electromagnetic signal is
translated by a circuit (consisting of e.g. antenna, electronic filters etc.) into a voltage 𝑣 with
expected values 𝑣₀ and 𝑣₁ for the transmission of a 0 and 1, respectively. For a BSC channel
model, we could round 𝑣 to the closest of those values and pass that “hard” information (either
0 or 1) to the decoder. But clearly, knowing how far 𝑣 is from the value it is rounded to contains
valuable information about the reliability of the received signal—a value of 𝑣 close to the mean
(𝑣₀ + 𝑣₁)/2 is less reliable than one close to either 𝑣₀ or 𝑣₁ (cf. the figures along Definition 4.2).
Consequently, the decoder should take that “soft” information into account.

[Margin figure: densities 𝑃(𝑦ᵢ ∣ 𝑥ᵢ = 1) (left, centered at −√𝐸c) and 𝑃(𝑦ᵢ ∣ 𝑥ᵢ = 0) (right, centered at √𝐸c) of an AWGN channel.]

The most prominent soft-output channel model is the AWGN channel, in which independent
Gaussian noise (as it appears frequently in nature) is added to each transmitted symbol. It is


characterized by a Gaussian distribution with mean (−1)^{𝑥ᵢ}√𝐸c and variance 𝜎². Here, 𝐸c is
the transmit energy per channel symbol and 𝜎² is the noise energy of the channel [15]:

    𝑃(𝑦ᵢ ∣ 𝑥ᵢ) = (1/√(2𝜋𝜎²)) · exp(−½ · ((𝑦ᵢ − (−1)^{𝑥ᵢ}√𝐸c)/𝜎)²).     (4.7)

Note that the AWGN channel challenges the conceptual distinction between transmission success
and transmission error in favor of a ubiquitous presence of noise: the expected value ±√𝐸c
that corresponds to a “noiseless” transmission will be received only with probability zero.
For a given code rate 𝑟, an AWGN channel can be specified by a single quantity, called
information-oriented signal-to-noise ratio (SNR)

    SNRb = 𝐸b/𝑁₀ = 𝐸c/(𝑟 · 2𝜎²)     (4.8)

where 𝐸b = 𝐸c/𝑟 is the energy per information bit and 𝑁₀ = 2𝜎² is called the double-sided
power spectral density. It can be shown that the 𝑖-th LLR value 𝜆ᵢ of an AWGN channel is itself
a normally distributed random variable,

    𝜆ᵢ ∼ 𝒩(4𝑟(−1)^{𝑥ᵢ} · SNRb, 8𝑟 · SNRb),     (4.9)

hence the specific values of 𝐸b and 𝜎 are irrelevant for the channel law.
In order to evaluate the performance of a specific code/decoder pair, it is common to state
the frame-error rate not only for a single SNR, but instead to plot (4.4) for a whole range of
SNR values (see e.g. Figure 7.3 on page 96). Since in the majority of cases the FER cannot
be determined analytically, these performance curves are usually obtained by Monte Carlo
simulation. To that end, (4.9) is utilized to generate a large number of channel outputs, until
a sufficient number of decoding errors (decode(𝑦) ≠ 𝑥) allows for a statistically significant
estimation of (4.4).
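
A minimal version of such a simulation loop is sketched below; it samples the LLRs directly from (4.9) for the all-zero codeword (which is representative whenever the error probability is independent of the sent codeword, cf. Theorem 5.6 in Section 5.4), and `decode` is a placeholder for an arbitrary decoding function.

    import numpy as np

    def simulate_fer(decode, n, rate, snr_b_db, max_errors=100, max_frames=10**6):
        rng = np.random.default_rng()
        snr = 10 ** (snr_b_db / 10)             # SNR_b is given in dB
        mean, std = 4 * rate * snr, np.sqrt(8 * rate * snr)   # (4.9), x_i = 0
        errors = frames = 0
        while errors < max_errors and frames < max_frames:
            lam = rng.normal(mean, std, size=n)   # LLRs of one received frame
            if np.any(decode(lam) != 0):          # decoder misses the sent word
                errors += 1
            frames += 1
        return errors / frames

    # e.g. bitwise hard decisions, i.e., the "uncoded" decoder:
    print(simulate_fer(lambda lam: (lam < 0).astype(int), n=100,
                       rate=1.0, snr_b_db=4.0))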

4.4 Binary Linear Block Codes

4.4 Definition (linear code): A binary (𝑛, 𝑘) code 𝒞 is called linear if 𝒞 is a linear subspace
of 𝔽₂ⁿ. Consequently, a linear code admits a linear encoding function. C

Linear codes constitute the by far most important class of codes that are studied in literature.
This is justified by the fact that, for binary-input symmetric memoryless channels, the results
of Theorem 4.3 continue to hold when restricting to linear codes only [13, Ch. 6.2].
Linearity implies a vast amount of structure and allows codes to be compactly defined by
matrices, as introduced below. Note that all operations on binary vectors in this section are
performed in 𝔽2 , i.e., “modulo 2”.


4.5 Definition (dual code and parity-check matrices): The orthogonal complement

    𝒞⊥ = {𝜉 ∈ 𝔽₂ⁿ ∶ 𝜉ᵀ𝑥 = 0 for all 𝑥 ∈ 𝒞}

of a linear (𝑛, 𝑘) code 𝒞 is called the dual code of 𝒞; the elements of 𝒞⊥ are dual codewords of
𝒞.
A matrix 𝐻 ∈ 𝔽₂^{𝑚×𝑛} is a parity-check matrix for 𝒞 if its rows generate 𝒞⊥ (i.e., the rows contain
a basis of 𝒞⊥) and hence the equation

    𝒞 = {𝑥 ∶ 𝐻𝑥 = 0}     (4.10)

completely characterizes 𝒞. In practice, 𝒞 is often defined by stating a parity-check matrix 𝐻
in the first place; in that event, we also speak of 𝐻 as the parity-check matrix of 𝒞. C

Since 𝒞⊥ is an (𝑛, 𝑛−𝑘) code, it follows that 𝑚 ≥ 𝑛 − 𝑘 in the above definition; in practice,
𝑚 = 𝑛 − 𝑘 is usually the case. Because 𝒞 is linear, a linear encoding function can likewise
be defined by means of a so-called generator matrix 𝐺, the rows of which form a basis of 𝒞.
Within the scope of this text, however, parity-check matrices are by far more important.
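
In code, checking (4.10) is a single matrix–vector product modulo 2; the small (7, 4) parity-check matrix below is the one reappearing as the factor-graph example in Section 4.4.2.

    import numpy as np

    H = np.array([[1, 1, 0, 1, 1, 0, 0],    # each row is one parity check
                  [0, 1, 1, 1, 0, 1, 0],
                  [0, 0, 0, 1, 1, 1, 1]])

    def is_codeword(H, x):
        return not np.any(H @ x % 2)         # H x = 0 over F_2?

    print(is_codeword(H, np.array([1, 1, 1, 0, 0, 0, 0])))   # True
    print(is_codeword(H, np.array([1, 0, 0, 0, 0, 0, 0])))   # False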

4.4.1 Minimum Hamming Distance

It is intuitively clear that, for a code to be robust against channel noise, the codewords should
be maximally “distinguishable” from each other, i.e., there should be enough space between
any two of them. Hence, one of the most important measures for the quality of a single code is
its minimum distance, as defined below.

[Margin figure: intuition of a good code with evenly spread codewords in 𝔽₂ⁿ, and a bad code.]

4.6 Definition (minimum distance): The Hamming weight 𝑤H(𝑥) of a binary vector 𝑥 is
defined to be the number of 1s among 𝑥. The Hamming distance 𝑑H(𝑥, 𝑦) = 𝑤H(𝑥 − 𝑦) of two
vectors 𝑥, 𝑦 of equal length is the number of positions in which they differ. The minimum
(Hamming) distance of a linear code 𝒞 is defined as

    𝑑min(𝒞) = min_{𝑥,𝑦∈𝒞, 𝑥≠𝑦} 𝑑H(𝑥, 𝑦) = min_{𝑥,𝑦∈𝒞, 𝑥≠𝑦} ∣{𝑖 ∶ 𝑥ᵢ ≠ 𝑦ᵢ}∣;

it is equivalent to the minimum Hamming weight among all non-zero codewords,

    𝑑min(𝒞) = min_{𝑥∈𝒞⧵{0}} 𝑤H(𝑥),     (4.11)

by linearity. C

The problem of finding the minimum distance of a general linear code is NP-hard [18].
Nevertheless, integer programming techniques allow to compute 𝑑min for codes which are
not too large; this topic occurs in Papers III, VI and VII. In Section 5.4, we will encounter a
similar weight measure, the pseudoweight, which is specific to the LP decoding algorithm.
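
For small codes, (4.11) can also be evaluated by brute force, which serves as a baseline for the IP-based methods; in the sketch below, the generator matrix (with the parameters of the (7, 4) Hamming code) is an assumption for demonstration.

    import itertools
    import numpy as np

    def d_min(G):
        """Minimum distance via (4.11): enumerate all 2^k codewords of G."""
        k, n = G.shape
        best = n
        for bits in itertools.product((0, 1), repeat=k):
            w = int(((np.array(bits) @ G) % 2).sum())   # Hamming weight
            if 0 < w < best:
                best = w
        return best

    G = np.array([[1, 0, 0, 0, 1, 1, 0],    # assumed (7,4) generator matrix
                  [0, 1, 0, 0, 0, 1, 1],
                  [0, 0, 1, 0, 1, 1, 1],
                  [0, 0, 0, 1, 1, 0, 1]])
    print(d_min(G))   # 3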


4.4.2 Factor Graphs

Let 𝒞 be a binary linear code and 𝐻 ∈ 𝔽₂^{𝑚×𝑛} a parity-check matrix for 𝒞. As noted before in
(4.10), the condition

    𝐻𝑥 = 0     (4.12)

is necessary and sufficient for 𝑥 being a codeword of 𝒞. A row-wise viewpoint of (4.12) leads to
the following definition.

4.7 Definition: Let 𝒞 be a linear (𝑛, 𝑘) code defined by the 𝑚 × 𝑛 parity-check matrix 𝐻.
Any code 𝒞′ such that 𝒞 ⊆ 𝒞′ is called a supercode of 𝒞. For 𝑗 ∈ {1, … , 𝑚}, the particular
supercode

    𝒞ⱼ = {𝑥 ∶ 𝐻ⱼ,• 𝑥 = 0}

is called the 𝑗-th parity-check of 𝒞. It is a so-called single parity check (SPC) code, simply placing
a parity condition on the entries {𝑥ᵢ ∶ 𝐻ⱼ,ᵢ = 1}. C

An obvious yet important consequence of the above definition is that

    𝒞 = ⋂ⱼ 𝒞ⱼ,     (4.13)

i.e., a linear code 𝒞 is the intersection of the supercodes defined by the rows of a parity-check
matrix for 𝒞.
The fact that a linear code is characterized by several parity-check conditions placed on subsets
of the variables is neatly visualized by a factor graph (or Tanner graph).

[Margin figure: the parity-check matrix

    𝐻 = ( 1 1 0 1 1 0 0
          0 1 1 1 0 1 0
          0 0 0 1 1 1 1 )

and the corresponding factor graph of a (7, 4) code, with variable nodes 𝑥₁, …, 𝑥₇ and check nodes 𝒞₁, 𝒞₂, 𝒞₃.]

4.8 Definition (factor graph): The factor graph representing a parity-check matrix 𝐻 ∈ 𝔽₂^{𝑚×𝑛}
of a linear code 𝒞 is a bipartite undirected graph 𝐺 = (𝑉 ∪̇ 𝐶, 𝐸) that has 𝑚 check nodes
𝐶 = {𝒞₁, … , 𝒞_𝑚}, 𝑛 variable nodes 𝑉 = {𝑥₁, … , 𝑥_𝑛} and an edge (𝒞ⱼ, 𝑥ᵢ) whenever 𝐻ⱼ,ᵢ = 1. C

The factor graph representation plays an important role in the analysis and design of codes
and decoding algorithms. One of today’s most prominent decoding methods, named belief
propagation, works by iteratively exchanging messages (representing momentary beliefs or
probabilities) between the check and variable nodes, respectively, of the factor graph [19].
Moreover, graph covers of the factor graph, as defined below in Section 5.4.3 and used extensively
in Paper VII, have become an important tool to analyze LP decoding, belief propagation decoding,
and their mutual relation [20].

4.5 Convolutional and Turbo Codes

Turbo codes constitute an important class of linear codes, as they were the first to closely
approach the promises of Theorem 4.3 using a very efficient decoding algorithm [2]. They are
constructed by combining two or more (terminated) convolutional codes. For the matter of


[Figure 4.2: An example trellis graph with 𝑘 = 9 segments 𝑆₁, …, 𝑆₉ and 2^𝑑 = 4 states 𝑠₀, …, 𝑠₃. Dashed edges have input bit in(𝑒) = 0, for solid edges the input is in(𝑒) = 1. Hence, for example, the zero input sequence 𝑢 = (0, … , 0) corresponds to the horizontal path (𝑣₁,₀, 𝑣₂,₀, … , 𝑣₁₀,₀) in 𝑇.]

this work, it is important that the codewords of a convolutional code are in correspondence to
certain paths in a graph, as described below.
A terminated convolutional (𝑛, 𝑘) code 𝒞 with rate 𝑟 (where 1/𝑟 = 𝑛/𝑘 ∈ ℕ) and memory 𝑑 ∈ ℕ
can be compactly described by a finite state machine (FSM) with 2^𝑑 states 𝑆 = {𝑠₀, … , 𝑠_{2^𝑑−1}}
and a state transition function 𝛿 ∶ 𝑆 × 𝔽₂ → 𝑆 × 𝔽₂^{𝑛/𝑘} that defines the encoding of an information
word 𝑢 ∈ 𝔽₂ᵏ as follows. Initially, the FSM is in state 𝑠⁽¹⁾ = 𝑠₀. Then the bits 𝑢ᵢ of 𝑢 are
subsequently fed into the FSM to determine the codeword 𝒞(𝑢), i.e., in each step 𝑖 ∈ ℕ, the
current state 𝑠⁽ⁱ⁾ ∈ 𝑆 together with the 𝑖-th input bit 𝑢ᵢ determines via

    𝛿(𝑠⁽ⁱ⁾, 𝑢ᵢ) = (𝑠⁽ⁱ⁺¹⁾, 𝑥⁽ⁱ⁾)

the subsequent state 𝑠⁽ⁱ⁺¹⁾ as well as 𝑛/𝑘 output bits 𝑥⁽ⁱ⁾ = (𝑥⁽ⁱ⁾₁, … , 𝑥⁽ⁱ⁾_{𝑛/𝑘}) that constitute the
part of the codeword that belongs to 𝑢ᵢ. Finally, the machine has to terminate in the zero state,
i.e., 𝑠⁽ᵏ⁺¹⁾ = 𝑠₀ is required (this entails that some of the input bits are not free to choose and thus
have to be considered as part of the output instead; in favor of a clear presentation, however,
we ignore this inexactness and assume that 𝑢 is in advance chosen such that 𝑠⁽ᵏ⁺¹⁾ = 𝑠₀; see
e.g. Paper IV for a more rigorous construction). The encoded word 𝑥 = 𝒞(𝑢) now consists of a
concatenation of the 𝑥⁽ⁱ⁾, namely,

    𝑥 = (𝑥⁽¹⁾₁, … , 𝑥⁽¹⁾_{𝑛/𝑘}, … , 𝑥⁽ᵏ⁾₁, … , 𝑥⁽ᵏ⁾_{𝑛/𝑘}).

[Margin figure: FSM of a rate-1 convolutional code with states 𝑠₀, …, 𝑠₃; the edge labels 𝑢/𝑥 give the respective input (𝑢) and output (𝑥) bits.]

The FSM of a convolutional code is always defined in such a way that this encoding is a linear
map, and hence 𝒞 is a linear code.
By “unfolding” the FSM along the time domain, we now associate a directed acyclic graph
𝑇 = (𝑉, 𝐸) to the convolutional code 𝒞, called its trellis (see Figure 4.2). Each vertex of 𝑇
corresponds to a state of the FSM at a specific time step, such that 𝑉 = {1, … , 𝑘 + 1} × 𝑆, where
we denote the vertex (𝑖, 𝑠) corresponding to state 𝑠 ∈ 𝑆 at step 𝑖 shortly by 𝑣ᵢ,ₛ ∈ 𝑉.


The edges of 𝑇 in turn correspond to valid state transitions. For each 𝑖 ∈ {1, … , 𝑘} and 𝑠 ∈ 𝑆,
there are two edges emerging from 𝑣ᵢ,ₛ, according to the two possible values of 𝑢ᵢ (which are
encoded in the input labels in(𝑒) ∈ {0, 1} of the edges); both their output labels out(𝑒) ∈ 𝔽₂^{𝑛/𝑘}
and target vertices 𝑣ᵢ₊₁,ₛ′ are determined by the state transition function via

    𝛿(𝑠, in(𝑒)) = (𝑠′, out(𝑒)).

Hence, each edge 𝑒 = (𝑣ᵢ,ₛ, 𝑣ᵢ₊₁,ₛ′) of 𝑇 corresponds to the input of one bit at a specific step 𝑖
and a specific state 𝑠 of the encoder FSM that is in state 𝑠′ afterwards; the labels of 𝑒 define the
value of the input bit 𝑢ᵢ = in(𝑒) and the output sequence 𝑥⁽ⁱ⁾ = out(𝑒), respectively. The trellis
𝑇 is thus “(𝑘 + 1)-partite” in the sense that 𝑉 partitions into 𝑘 + 1 subsets 𝑉ᵢ such that edges
only exist between two subsequent sets 𝑉ᵢ and 𝑉ᵢ₊₁. This motivates the definition of the 𝑖-th
trellis segment 𝑆ᵢ = (𝑉ᵢ ∪ 𝑉ᵢ₊₁, 𝐸ᵢ) according to the 𝑖-th encoding step as the subgraph induced
by 𝑉ᵢ ∪ 𝑉ᵢ₊₁.
The transition function 𝛿 is always designed in such a way that if 𝛿(𝑠, 0) = (𝑠′, 𝑥′) and
𝛿(𝑠, 1) = (𝑠″, 𝑥″) then 𝑥′ ≠ 𝑥″, i.e., at each encoding step, the two outputs corresponding to
an input bit 0 and 1, respectively, must be different. As a consequence, the codewords of 𝒞
are in one-to-one correspondence with the paths from 𝑣₁,₀ to 𝑣ₖ₊₁,₀ in 𝑇: at step 𝑖 in state 𝑠,
the next 𝑛/𝑘 bits of the codeword determine which edge to follow from 𝑣ᵢ,ₛ, while conversely
the output label of such an edge fixes the next 𝑛/𝑘 code bits. Due to the boundary constraints
𝑠⁽¹⁾ = 𝑠⁽ᵏ⁺¹⁾ = 𝑠₀, some vertices and edges in the leading as well as the endmost 𝑑 segments are
not part of any such path and therefore are usually removed from 𝑇, as shown in the figure.
In a turbo code 𝒞TC, now, several convolutional codes are concatenated in order to improve
upon the rather weak error-correction performance of plain convolutional codes. In the
most common form, two identical convolutional codes 𝒞a and 𝒞b with rate 𝑟 = 1 each are
concatenated in parallel, which means that the information word 𝑢 is encoded by both, but the
entries of 𝑢 are permuted by a fixed permutation (the interleaver) 𝜋 ∈ 𝕊ₖ before entering
the second component code 𝒞b. A codeword of the turbo code 𝒞TC then consists of the
concatenation of a copy of 𝑢, 𝒞a(𝑢) and 𝒞b(𝜋(𝑢)), so that the overall rate of 𝒞TC is 𝑟 = 1/3
(here, again, a small rate loss due to termination is ignored). In a more general setting, the
term turbo-like codes refers to schemes that include serial concatenation, where the output of
one convolutional code is used as input for another convolutional code, or any combination
of parallel and serial concatenations—an example are 3-D turbo codes which are studied in
Paper VII.

[Margin figure: encoding scheme of a turbo code: 𝑢 ∈ 𝔽₂ᵏ is fed to 𝒞a directly and to 𝒞b through the interleaver 𝜋; the codeword is (𝑢, 𝒞a(𝑢), 𝒞b(𝜋(𝑢))).]

Taking the path representation of codewords of 𝒞a and 𝒞b in their respective trellis graphs 𝑇a
and 𝑇b into account, from the above definition of a turbo code 𝒞TC we can derive a one-to-one
correspondence between codewords of 𝒞TC and pairs (𝑃a = (𝑒ᵃ₁, … , 𝑒ᵃₖ), 𝑃b = (𝑒ᵇ₁, … , 𝑒ᵇₖ)) of
paths in 𝑇a and 𝑇b, respectively, which additionally fulfill that

    in(𝑒ᵃᵢ) = in(𝑒ᵇ_{𝜋(𝑖)}),     (4.14)

i.e., the 𝑖-th edge in 𝑃a must have the same input label as the 𝜋(𝑖)-th edge in 𝑃b, because
both equal the 𝑖-th input bit 𝑢ᵢ. The application of this path–code relationship to decoding by


mathematical optimization is introduced in Section 5.5, as it is the basis for the contributions
presented in Papers IV and VII.

4.6 Non-Binary Codes and Higher-Order Modulation

So far, we assumed binary data processing throughout coding and data transmission, as intro-
duced in Section 4.1. While today’s microelectronic systems, as is generally known, internally
rely on the binary representation of data, the above simplification is, in two different but related
ways, not the whole story in case of channel coding.
First, observe that the definition of (linear) codes can straightforwardly be generalized to any
finite field 𝔽_𝑞 for a prime power 𝑞: the information and codewords still lie in vector spaces and
linear maps can be defined as usual; the parity-check matrix of such a non-binary code then has
entries in {0, … , 𝑞 − 1}. Several constructions of strong codes rely on a non-binary field, thus a
restriction to the binary case would prevent us from using those codes.
Secondly, in many practical transmission systems the signal space is modeled by the complex
plane, where the real and imaginary axis, respectively, correspond to two different carrier waves
(e.g., two sines that are out of phase by 𝜋/2) such that any complex number 𝑧 represents a linear
combination of both waves that is then emitted onto the carrier medium. This technique is called
modulation. At the receiver’s side, a complementary demodulator measures the potentially
distorted wave and reconstructs (e.g. via Fourier analysis) a point 𝑧̃ in the complex plane.
In the most simple binary case, two complex numbers (e.g. 𝑧₀ = 1 + 0𝑖 and 𝑧₁ = −1 + 0𝑖) are
chosen that represent the values 0 and 1, respectively, of a single bit. If we now assume that
the channel adds independent Gaussian noise to both carrier waves, this case reduces to the
binary AWGN channel as introduced in Section 4.3.

[Margin figures: binary modulation with 𝑧₁ and 𝑧₀ on the real axis, and modulation with 𝑄 = 4 signals 𝑧₀, …, 𝑧₃.]

In higher-order modulation, however, more than one bit of information is transmitted at once by
choosing 𝑄 > 2 (usually, 𝑄 is a power of 2) possible complex signals {𝑧₀, … , 𝑧_{𝑄−1}}. Hence, the
channel can be modeled by 𝑄 probability functions 𝑃(𝑧̃ ∣ 0), … , 𝑃(𝑧̃ ∣ 𝑄 − 1) where 𝑧̃ ∈ ℂ.
Non-binary codes and higher-order modulation can be combined in several ways. Most
obviously, if 𝑞 = 𝑄 then to each complex symbol 𝑧₀, … , 𝑧_{𝑄−1} an element of 𝔽_𝑞 can be assigned,
such that the channel transmits one entry of the codeword per signal. On the other hand, if e.g.
𝑞 = 2 and 𝑄 = 2ᵏ, 𝑘 bits of the codeword can be sent at once by mapping the 2ᵏ possible bit
configurations to the 2ᵏ chosen complex symbols.

In both cases, the expression for ML decoding (4.6) is more complex than in the binary case. In
particular, the linearization used to formulate ML decoding as an IP (Section 5.1) is not possible in
the same way because there is no individual LLR value corresponding to each channel signal.
Nevertheless, ML decoding of non-binary codes was formulated as an IP in [21], and in Paper II
we furthermore show how to incorporate higher-order modulation into the IP model.

[Margin figure: example mapping of 𝑘 = 2 bits (01, 00, 10, 11) to 4 = 2ᵏ complex symbols.]

Chapter 5

Optimization Into Coding: The Connection

So far, the two subjects introduced above—linear and integer optimization in Chapter 3 on the
one hand and coding theory in Chapter 4 on the other—may seem to have little in common:
while the latter consists of a mixture of (linear) algebra and probability, the former is concerned
with solution algorithms for specific linear or discrete problems. The two areas become linked,
however, by the observation that decoding a received signal, in particular the (optimal) ML
decoding as introduced in Section 4.2, amounts to solving a combinatorial optimization problem
that can be formulated as an IP.

Therefore, this section introduces the abovementioned connection by reviewing the IP formu-
lation of ML decoding and is then mainly concerned with a particular LP relaxation of that
formulation, called LP decoding. The style of writing is intentionally a little more verbose than
in the two previous chapters because, first, this is the part that the audience is most likely to
be unfamiliar with and, secondly, due to the recency of the subject, we are not aware of any
up-to-date, tutorial-like, yet mathematically stringent document that covers what we believe to
be its most important aspects.

A well-written resource for LP decoding is the dissertation of its inventor Jon Feldman [4].
Large parts of Section 5.4 are elaborately presented in [20] which is abounding in examples.
Not least, Paper I includes a literature survey of the algorithmic aspects of optimization-based
decoding until the time of its writing, as well as a thorough yet short coverage of the underlying
theory.

5.1 ML Decoding as Integer Program

While the perception of ML decoding (at least on the BSC) as a combinatorial optimization
problem is probably as old as coding theory itself (for example, the proof of its NP-hardness
by Berlekamp, McEliece, and van Tilborg in 1978 [17] constitutes an obvious connection to the
optimization community), it has been only in 1998 that an integer programming formulation
of the problem was given [22], which consists of linearizations (with respect to ℝ) of both
the objective function and the code structure. In the following, we present a slightly modified
version (as in [23]) of that construction.


Recall that the ML codeword maximizes the likelihood function 𝑃(𝑦 ∣ 𝑥) for a received channel
output 𝑦 (4.6). Since the channel is assumed to be memoryless, we have [4, 22]

    𝑥_ML = arg max_{𝑥∈𝒞} ∏ᵢ₌₁ⁿ 𝑃(𝑦ᵢ ∣ 𝑥ᵢ)     (5.1a)
         = arg min_{𝑥∈𝒞} −∑ᵢ₌₁ⁿ ln 𝑃(𝑦ᵢ ∣ 𝑥ᵢ)     (5.1b)
         = arg min_{𝑥∈𝒞} ∑ᵢ₌₁ⁿ (ln 𝑃(𝑦ᵢ ∣ 0) − ln 𝑃(𝑦ᵢ ∣ 𝑥ᵢ))     (5.1c)
         = arg min_{𝑥∈𝒞} ∑_{𝑖∶𝑥ᵢ=1} ln(𝑃(𝑦ᵢ ∣ 0) / 𝑃(𝑦ᵢ ∣ 1)).     (5.1d)

Since the fraction in the last term exactly matches the LLR value 𝜆ᵢ (4.2) which is known to the
observer, we see that ML decoding is equivalent to minimizing the linear functional 𝜆ᵀ𝑥 over
all codewords 𝑥 ∈ 𝒞.
How can we grasp this condition “𝑥 ∈ 𝒞” by an IP? The answer lies in the code-defining
equation 𝐻𝑥 = 0 (4.12) for a given parity-check matrix 𝐻 ∈ 𝔽₂^{𝑚×𝑛}, which can be ℝ-linearized
in virtue of auxiliary integer variables 𝑧 ∈ ℤᵐ as follows: The condition 𝑥 ∈ 𝒞 is equivalent to
𝐻𝑥 = 0 (mod 2), which in turn is fulfilled if and only if the result of 𝐻𝑥, as an operation in the
reals, is a vector whose entries are even numbers. It is thus clear that the formulation

    min  𝜆ᵀ𝑥     (5.2a)
    s.t. 𝐻𝑥 − 2𝑧 = 0     (5.2b)
         𝑥 ∈ 𝔽₂ⁿ, 𝑧 ∈ ℤᵐ     (5.2c)

models ML decoding because (5.2b) can be achieved by an integral vector 𝑧 if and only if 𝐻𝑥 is
even.

[Margin figure: feasible integer points of the linearization 𝐻ⱼ,•𝑥 − 2𝑧ⱼ = 0.]

Note that any IP formulation of the ML decoding problem can be easily modified to output the
minimum distance 𝑑min of a code: in view of (4.11), this is equivalent to determining a codeword
of minimum Hamming weight. By setting 𝜆 = (1, … , 1), the objective function value (5.2a)
equals the Hamming weight of 𝑥, and an additional linear constraint ∑ 𝑥ᵢ ≥ 1 excludes the
all-zero codeword, such that the IP solution must be a codeword of minimum Hamming weight
([24, 25], see also Papers VI and VII).
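
A direct transcription of (5.2) into SciPy's mixed-integer interface (available from SciPy 1.9 on) might look as follows; this is a minimal sketch rather than the branch-&-cut solver of Paper VI. Replacing 𝜆 by the all-ones vector and adding ∑𝑥ᵢ ≥ 1 would turn the same model into the 𝑑min computation just described.

    import numpy as np
    from scipy.optimize import milp, LinearConstraint, Bounds

    def ml_decode(H, lam):
        """Solve the IP (5.2): min lam'x s.t. Hx - 2z = 0, x binary, z integer."""
        m, n = H.shape
        c = np.concatenate([lam, np.zeros(m)])            # (5.2a); z has no cost
        A = np.hstack([H.astype(float), -2 * np.eye(m)])  # (5.2b)
        bounds = Bounds(np.zeros(n + m),
                        np.concatenate([np.ones(n), np.full(m, np.inf)]))
        res = milp(c, constraints=LinearConstraint(A, 0, 0),
                   integrality=np.ones(n + m), bounds=bounds)
        return res.x[:n].round().astype(int)

    H = np.array([[1, 1, 0, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0, 1, 0],
                  [0, 0, 0, 1, 1, 1, 1]])
    lam = np.array([-1.39, 1.73, -0.41, -0.85, 0.3, 1.2, -0.6])
    print(ml_decode(H, lam))   # [1 0 1 1 0 0 1]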

Interestingly, the above linearization of the code was apparently “forgotten” and several years
later reinvented in 2009 [26] and 2010 [25]. One possible explanation might be that while (5.2)
is very compact in terms of size, its LP relaxation is essentially useless: if (5.2c) is replaced
by its continuous counterpart, then the feasible region of the 𝑥 variables is the entire unit
hypercube—for any configuration of 𝑥, a corresponding real 𝑧 can be found such that (5.2b) is
fulfilled. It should be noted, however, that the formulation (5.2) has found recent justification
by the fact that it appears to perform very well with commercial IP solvers [24, 27].


5.2 LP Decoding

It was the LP decoder introduced by Feldman [4, 5] that established linear programming in
the field of decoding by providing several equivalent IP formulations for which even the LP
relaxations exhibit a decoding performance that is of interest for practical considerations.
The essence of Feldman’s LP decoder lies in the representation (4.13) of a code, together with
the fact that for the SPC codes 𝒞ⱼ (Definition 4.7), a polynomially sized description of conv(𝒞ⱼ)
by means of (in)equalities and potential auxiliary variables is possible. Instead of providing an
LP description of conv(𝒞) (which in view of the NP-hardness of ML decoding is unlikely to be
tractable), the LP decoder thus operates on the relaxation polytope

    𝒫(𝐻) = ⋂ⱼ conv(𝒞ⱼ) ⊇ conv(𝒞),     (5.3)

called the fundamental polytope [20] of the parity-check matrix 𝐻. The vertices of 𝒫(𝐻) are
also called pseudocodewords.

[Margin figure: two constituent polytopes conv(𝒞₁) and conv(𝒞₂); the circular dots represent 𝔽₂ⁿ.]
5.1 Definition (LP decoder): Let 𝐻 be a parity-check matrix for the linear code 𝒞 and 𝜆 the
vector of channel LLR values. The LP decoder LP-decode(𝜆) outputs, for given 𝜆 ∈ ℝⁿ, the
optimal solution 𝑥̂ of the LP

    min  𝜆ᵀ𝑥     (5.4a)
    s.t. 𝑥 ∈ 𝒫(𝐻),     (5.4b)

where 𝒫(𝐻) is as defined in (5.3). C

[Margin figure: 𝒫(𝐻) = conv(𝒞₁) ∩ conv(𝒞₂) is a superset of conv(𝒞) with additional fractional pseudocodewords ∉ 𝔽₂ⁿ.]

The above definition is a meaningful relaxation of conv(𝒞) because one can easily show that
𝒫(𝐻) ∩ {0, 1}ⁿ = 𝒞, i.e., the codewords of 𝒞 and the integral vertices of 𝒫(𝐻) coincide, which
proves the following theorem.

5.2 Theorem (ML certificate [4]): The LP decoder has the ML certificate property:

    𝑥̂ = LP-decode(𝜆) ∈ {0, 1}ⁿ ⇒ 𝑥̂ = 𝑥_ML,

i.e., if 𝑥̂ is integral, it must be the ML codeword. Put another way, solving (5.4) as an IP with the
additional constraint 𝑥 ∈ {0, 1}ⁿ constitutes a true ML decoder. C

Note that if we had 𝒫(𝐻) = conv(𝒞), the LP decoder would actually be an ML decoder. Because
this is not the case in general (moreover, it apparently does not hold for any interesting code;
see [28]), the inclusion conv(𝒞) ⊆ 𝒫(𝐻) is usually strict, and the difference 𝒫(𝐻) ⧵ conv(𝒞)
must be due to additional fractional vertices of 𝒫(𝐻), i.e., vectors for which at least one entry
is neither 0 nor 1.

[Margin figure: an LLR vector 𝜆 for which a fractional vertex 𝑥_LP ∉ 𝔽₂ⁿ is optimal.]

Feldman gave three different formulations of conv(𝒞ⱼ), the convex hull of the SPC codes
constituting 𝒞, to be used in (5.4b). In the context of this work, only the one described below,
which is named Ω in [4] and based on [29], is relevant.


5.3 Theorem: Let 𝐻 and 𝒞 as above and let 𝑁ⱼ = {𝑖 ∶ 𝐻ⱼ,ᵢ = 1} be the indices covered by the 𝑗-th
parity check 𝒞ⱼ of 𝒞. Then the inequalities

    ∑_{𝑖∈𝑆} 𝑥ᵢ − ∑_{𝑖∈𝑁ⱼ⧵𝑆} 𝑥ᵢ ≤ |𝑆| − 1   for all 𝑆 ⊆ 𝑁ⱼ with |𝑆| odd     (5.5a)
    0 ≤ 𝑥ᵢ ≤ 1   for 𝑖 = 1, … , 𝑛     (5.5b)

precisely define the convex hull of 𝒞ⱼ. C

As each inequality (5.5a) explicitly forbids one odd-sized set 𝑆, i.e., a configuration for which
𝐻ⱼ,•𝑥 ≡ 1 (mod 2) (it is violated by a binary vector 𝑥 if and only if 𝑥ᵢ = 1 for 𝑖 ∈ 𝑆 and 𝑥ᵢ = 0
for 𝑖 ∈ 𝑁ⱼ ⧵ 𝑆), they are also called forbidden-set inequalities. Note that the number of such
inequalities is exponential in the size of 𝑁ⱼ, which is why LP decoding was first proposed for
codes defined by a sparse matrix 𝐻, so-called LDPC codes [1, 30]. It will however turn out
in the following review of adaptive LP decoding that the inequalities (5.5a) can be efficiently
separated, which renders their exponential quantity harmless in practice.

5.3 Adaptive LP Decoding

The prohibitive size of the LP decoding formulation (5.4), especially for dense 𝐻 and larger block
lengths, can be overcome by a cutting plane algorithm (cf. Section 3.3.1), called adaptive LP
decoding, as proposed in [31, 32]. It starts with the trivial problem of minimizing the objective
function over the unit hypercube:

    min  𝜆ᵀ𝑥
    s.t. 𝑥 ∈ [0, 1]ⁿ

and then iteratively refines the domain of optimization by inserting those forbidden-set in-
equalities (5.5a) that are violated by the current solution, and hence constitute valid cuts. The
procedure to find a cut in the 𝑗-th row of 𝐻 (it is shown in [31] that, at any time, one row of 𝐻
can provide at most one cut) is based on the following reformulation of (5.5a):

    ∑_{𝑖∈𝑆} (1 − 𝑥ᵢ) + ∑_{𝑖∈𝑁ⱼ⧵𝑆} 𝑥ᵢ ≥ 1.     (5.6)

[Margin figures: adaptive LP decoding of the previous example: the initial optimization over [0, 1]ⁿ yields 𝑥⁽¹⁾; a cut for 𝑥⁽¹⁾ leads to the next solution 𝑥⁽²⁾; another cut yields the same solution 𝑥_LP as before.]

To find a violating inequality (if it exists) of the form (5.6), an odd-sized set 𝑆 needs to be found
that minimizes the left-hand side of (5.6). It is easy to show [32] that this can be accomplished
by taking all 𝑖 with 𝑥ᵢ > 1/2 and, if that set is even-sized, remove or add the index 𝑖∗ for which
𝑥ᵢ∗ is closest to 1/2.
When no more violating inequalities are found, the solution equals that of (5.4) and the algorithm
terminates. The total number of inequalities in the final model is however bounded by 𝑛²,


which shows that the adaptive approach indeed overcomes, with respect to size, the problems
of the model (5.5).
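
The complete separation step for a single row reduces to a few lines of code, as the sketch below illustrates; it returns the odd set 𝑆 together with the left-hand side of (5.6), or None if the row yields no cut.

    import numpy as np

    def separate(x, Nj):
        """Most violated forbidden-set inequality (5.6) of row j, if any."""
        S = [i for i in Nj if x[i] > 0.5]
        if len(S) % 2 == 0:                    # |S| must be odd: toggle i*
            i_star = min(Nj, key=lambda i: abs(x[i] - 0.5))
            S = [i for i in S if i != i_star] if i_star in S else S + [i_star]
        lhs = sum(1 - x[i] for i in S) + sum(x[i] for i in Nj if i not in S)
        return (S, lhs) if lhs < 1 else None   # violated cut, or no cut

    x = np.array([0.9, 0.8, 0.05])             # a fractional LP solution
    print(separate(x, [0, 1, 2]))              # ([0], 0.95): x0 - x1 - x2 <= 0 is cut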
An important advantage of the separation approach is that one can immediately incorporate
additional types of cutting planes—if it is known how to solve the corresponding separation
problem, i.e., find violated cuts from the current LP solution—in order to tighten the LP relaxation
(5.4). A successful method of doing so is by using redundant parity-checks.

5.4 Definition: Let 𝒞 be a linear code defined by a parity-check matrix 𝐻. A dual codeword
𝜉 ∈ 𝒞⊥ that does not appear as a row of 𝐻 is called a redundant parity-check (RPC). An RPC 𝜉
is said to induce a cut at the current LP solution 𝑥 if one of the inequalities (5.5a) derived from
𝜉 is violated by 𝑥. C

[Margin figure: an example RPC cut (in the picture, the cut is a facet of conv(𝒞)) that would lead from the above LP solution to the ML codeword 𝑥_ML.]

RPCs are called “redundant” because the rows of 𝐻 already contain a basis of 𝒞⊥ by definition,
thus every RPC must be the (modulo-2) sum of two or more rows of 𝐻. The following result
[23] gives a strong clue which RPCs might potentially induce cuts.

5.5 Lemma: Let 𝜉 ∈ 𝒞⊥ be a dual codeword and 𝑥 an intermediate solution of the adaptive LP
decoding algorithm. If

    ∣{𝑖 ∶ 𝜉ᵢ = 1 and 𝑥ᵢ ∉ {0, 1}}∣ = 1,

i.e., exactly one index of the fractional part of 𝑥 is contained in the support of 𝜉, then 𝜉 induces an
RPC cut for 𝑥. C

An efficient method to search for RPC cuts in view of the above observation works as follows
[23, 33]: Given an intermediate LP solution 𝑥,

(1) sort the columns of 𝐻 according to an ascending ordering of ∣𝑥ᵢ − 1/2∣,

(2) perform Gaussian elimination on the reordered 𝐻 to diagonalize its leftmost part, resulting
in an alternative parity-check matrix 𝐻̃, then

(3) search for cuts among the rows of 𝐻̃ as in adaptive LP decoding.

[Margin figure: structure of 𝐻̃: diagonalized part at the left after reordering of columns by ∣𝑥ᵢ − 1/2∣.]

The motivation behind this approach is that, if the submatrix of 𝐻 corresponding to the fractional
part of 𝑥 has full column rank, the leftmost part of 𝐻̃ will be a diagonal matrix, and hence
by Lemma 5.5 every row of 𝐻̃ would induce a cut for 𝑥. The results reported in [23] and [33]
furthermore suggest that, even if this is not the case and thus the requirements of Lemma 5.5 are
not necessarily met, this “sort-&-diagonalize” strategy very often leads to cuts and substantially
improves the error-correcting capability of the plain LP decoder that does not involve RPCs.
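
Steps (1) and (2) are a column reordering followed by Gauss–Jordan elimination over 𝔽₂, sketched below; step (3) then simply runs the separation routine of Section 5.3 on the rows of the returned matrix, which all lie in the row space of 𝐻 and hence are dual codewords.

    import numpy as np

    def sort_and_diagonalize(H, x):
        order = np.argsort(np.abs(x - 0.5))    # most fractional columns first
        Ht = (H[:, order] % 2).astype(int)
        m, n = Ht.shape
        row = 0
        for col in range(n):
            if row == m:
                break
            piv = np.nonzero(Ht[row:, col])[0]   # pivot search in this column
            if piv.size == 0:
                continue
            Ht[[row, row + piv[0]]] = Ht[[row + piv[0], row]]
            for r in range(m):                   # eliminate over F_2
                if r != row and Ht[r, col]:
                    Ht[r] ^= Ht[row]
            row += 1
        return Ht[:, np.argsort(order)]        # undo the column reordering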

5.4 Analysis of LP Decoding

While the aspects of LP decoding discussed so far include some useful theoretical results
about an individual run of the algorithm (most importantly, the ML certificate property given


in Theorem 5.2), there is no immediate theoretical approach to determine the average error-
correction performance (4.4) of a given code and channel under LP decoding, other than using
simulations as described in Section 4.3.

In the following, we briefly outline an approach to a theoretical performance analysis of LP
decoding that is based on a channel-specific rating of the vertices of the LP decoding polytope,
called pseudoweight. The theory presented in this section is based on the “plain” LP decoder as
defined in Definition 5.1, i.e., does not take the improvement via RPC cuts (Definition 5.4 and
the discussion thereafter) into account; most of the results can however be extended to that
case in a straightforward manner.

5.4.1 All-Zero Decoding and the Pseudoweight

First we introduce the very useful all-zeros assumption.

5.6 Theorem (Feldman [4]): If the LP decoder (5.4) is used on a binary-input memoryless
symmetric channel, the probability of decoding error is independent of the sent codeword: the FER
(4.4) satisfies

    FER = 𝑃(LP-decode(𝜆) ≠ 0 ∣ 0 was sent). C

The proof of Theorem 5.6 relies on the symmetry of both the channel and the polytope. The
latter is due to the linearity of the code (which implies that conv(𝒞) basically “looks the same”
from any codeword 𝑥) and the 𝒞-symmetry [4, Ch. 4.4] of 𝒫(𝐻), which extends that symmetry
to the relaxed LP polytope. As a consequence of the theorem, when examining the LP decoder’s
error probability we can always assume that the all-zero codeword 0 ∈ 𝒞 was sent, which
greatly simplifies analysis.

Assume now that the all-zero codeword is sent through a channel and the result 𝜆 is decoded
by the LP decoder which solves (5.4) to obtain the optimal solution 𝑥̂. The decoder fails if there
is a vertex 𝑥 of 𝒫(𝐻) such that

    𝜆ᵀ𝑥 < 𝜆ᵀ0 = 0     (5.7)

(we assume here and in the following that in case of ties, i.e., 𝜆ᵀ𝑥 = 0 for some non-zero vertex
𝑥, the LP decoder correctly outputs 0; for the AWGN channel, ties can be neglected since they
occur with probability 0). The probability 𝑃(𝜆ᵀ𝑥 < 0) of the event (5.7), also called the pairwise
error probability between 𝑥 and 0, depends on the channel. In case of the AWGN channel, by
(4.9) we have

    𝜆ᵢ𝑥ᵢ ∼ 𝒩(4𝑟 · SNRb 𝑥ᵢ, 8𝑟 · SNRb 𝑥ᵢ²)

and because the channel treats symbols independently and furthermore the sum of independent
Gaussian variables is again Gaussian with mean and variance simply summing up, we obtain

    𝜆ᵀ𝑥 ∼ 𝒩(4𝑟 · SNRb ‖𝑥‖₁, 8𝑟 · SNRb ‖𝑥‖₂²).     (5.8)


Hence, 𝜆ᵀ𝑥 is again Gaussian, and the probability that 𝜆ᵀ𝑥 < 0 computes as (using the
abbreviations 𝜇 = 4𝑟 · SNRb ‖𝑥‖₁ and 𝜎² = 8𝑟 · SNRb ‖𝑥‖₂²)

    𝑃(𝜆ᵀ𝑥 < 0) = (1/√(2𝜋𝜎²)) ∫_{−∞}^{0} exp(−(𝑥−𝜇)²/(2𝜎²)) d𝑥 = (1/√(2𝜋)) ∫_{𝜇/𝜎}^{∞} exp(−𝑥²/2) d𝑥.

Introducing the 𝑄-function as 𝑄(𝑎) = (1/√(2𝜋)) ∫_𝑎^∞ exp(−𝑥²/2) d𝑥, we get

    𝑃(𝜆ᵀ𝑥 < 0) = 𝑄(𝜇/𝜎) = 𝑄(√(2𝑟 · SNRb · ‖𝑥‖₁²/‖𝑥‖₂²)) = 𝐹(‖𝑥‖₁²/‖𝑥‖₂²)

for a monotone function 𝐹, which motivates the following definition.

5.7 Definition ([20, 34]): Let 𝑥 be a non-zero vertex of 𝒫(𝐻). The (AWGN) pseudoweight of 𝑥
is defined as

    𝑤p^AWGN(𝑥) = ‖𝑥‖₁² / ‖𝑥‖₂².     (5.9)
C

Observe that the pairwise error probability 𝑃(𝜆ᵀ𝑥 < 0) is a strictly monotonically decreasing
function of 𝑤p^AWGN(𝑥): the lower the pseudoweight is, the higher is the probability that the
LP decoder wrongly runs into 𝑥 instead of 0. The AWGN pseudoweight is thus a compact
expression that measures the “danger” of decoding error due to a specific vertex 𝑥 ∈ 𝒫(𝐻).
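
Both quantities are immediate to evaluate numerically, as the following sketch shows; `pairwise_error_prob` uses the identity 𝑄(𝜇/𝜎) = 𝑄(√(2𝑟 · SNRb · 𝑤p^AWGN(𝑥))) obtained in the computation above.

    import numpy as np
    from scipy.stats import norm

    def pseudoweight(x):
        """AWGN pseudoweight (5.9); pseudocodeword entries are >= 0."""
        x = np.asarray(x, dtype=float)
        return np.sum(x) ** 2 / np.sum(x ** 2)

    def pairwise_error_prob(x, rate, snr_b):
        return norm.sf(np.sqrt(2 * rate * snr_b * pseudoweight(x)))   # Q(.)

    print(pseudoweight([1, 1, 1, 0]))            # 3.0 = Hamming weight
    print(pseudoweight([1.0, 0.5, 0.5, 0.5]))    # 6.25 / 1.75 = 3.57...
    print(pairwise_error_prob([1, 1, 1, 0], rate=0.5, snr_b=2.0))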

5.4.2 The Fundamental Cone

One simple parameter for estimating the average performance of the LP decoder, for a given
code 𝒞 and a parity-check matrix 𝐻, is the minimal pseudoweight among the non-zero vertices
of 𝒫(𝐻),

    𝑤p,min^AWGN(𝐻) = min {𝑤p^AWGN(𝑥) ∶ 𝑥 ≠ 0 is a vertex of 𝒫(𝐻)},

which corresponds to the most probable non-zero vertex that accidentally becomes optimal
instead of the all-zero one. Note that for an integral vertex 𝑥 we have 𝑤p^AWGN(𝑥) = 𝑤H(𝑥) (cf.
Definition 4.6), which shows that for an ML decoder the minimum Hamming weight takes on
this role.
The minimum pseudoweight alone is however still a rather rough estimate of the decoding
performance: both the quantity of minimum-pseudoweight vertices and the (quantities of the)
next larger pseudoweights influence the error probability 𝑃(LP-decode(𝜆) ≠ 0). Therefore,
the pseudoweight enumerator of 𝒫(𝐻), i.e., a table containing all occurring pseudoweights of
the non-zero vertices alongside their frequencies, would allow for a better estimation of the
decoding performance. Finally, for an exact computation of the error rate we would need a
description of the region

    Λ = {𝜆 ∶ 𝜆ᵀ𝑥 ≥ 0 for all 𝑥 ∈ 𝒫(𝐻)}     (5.10)


of channel outputs 𝜆 for which 0 is the optimal solution of (5.4), and then compute the
probability 1 − 𝑃(𝜆 ∈ Λ) by integrating the density function given by (5.8) over Λ. In
optimization language, Λ is called the dual cone of 𝒫(𝐻).
While the three tasks stated above appear to be ascendingly difficult—no efficient algorithm is
known to compute the minimum pseudoweight in general—it turns out that Λ as defined in
(5.10) can be determined by LP duality: assume that the LP decoder (5.4) is given in the form

    min  𝜆ᵀ𝑥     (5.11a)
    s.t. 𝐴𝑥 ≤ 𝑏,     (5.11b)

where 𝐴 and 𝑏 represent (5.5). The dual of (5.11) is

    max  −𝑏ᵀ𝑦     (5.12a)
    s.t. 𝐴ᵀ𝑦 = −𝜆     (5.12b)
         𝑦 ≥ 0.     (5.12c)

By Theorem 3.11, 0 is optimal for (5.11) if and only if there is a 𝑦 that is feasible for (5.12)
with 𝑏ᵀ𝑦 = 0. Now 0 is feasible for (5.11), hence 𝑏 ≥ 0, which together with (5.12c) implies
that 𝑦ⱼ = 0 whenever 𝑏ⱼ ≠ 0 in a solution 𝑦 with 𝑏ᵀ𝑦 = 0. Taking a closer look at (5.12), we
conclude

    𝜆 ∈ Λ ⇔ (−𝜆) ∈ conic({𝐴ⱼ,• ∶ 𝑏ⱼ = 0}).

Note that Λ is also a polyhedron by Theorem 3.5.

[Margin figure: dual cone Λ of 𝒫(𝐻) spanned by two rows 𝐴₁,•, 𝐴₂,•, and an example 𝜆 with −𝜆 ∈ Λ.]

As a by-product of the above calculations, it appears that the rows of 𝐴𝑥 ≤ 𝑏 for which 𝑏ⱼ ≠ 0 are
irrelevant for Λ and hence for the question whether the decoder fails or not. It is easy to show
that deleting those rows leads to conic(𝒫(𝐻)), which motivates the following definition.
5.8 Definition: The conic hull of the fundamental polytope,

    𝒦(𝐻) = conic(𝒫(𝐻)),

is called the fundamental cone of 𝐻. By

    𝒦₁(𝐻) = {𝑥 ∈ 𝒦(𝐻) ∶ ‖𝑥‖₁ = 1}

we denote the intersection of 𝒦(𝐻) with the unit simplex. C

[Margin figure: the fundamental cone 𝒦(𝐻), the polytope 𝒫(𝐻), and the intersection 𝒦₁(𝐻) with the simplex ‖𝑥‖₁ = 1.]

From the above discussion we can now formulate the following equivalent conditions for the
LP decoder to succeed.

5.9 Corollary: The following are equivalent:
(1) The LP decoder correctly decodes 𝜆 to 0.
(2) 𝜆 ∈ Λ.
(3) There is no 𝑥 ∈ 𝒦(𝐻) with 𝜆ᵀ𝑥 < 0.


(4) There is no 𝑥 ∈ 𝒦₁(𝐻) with 𝜆ᵀ𝑥 < 0. C

As a consequence, we can study either of the three sets 𝒫(𝐻), 𝒦(𝐻) or 𝒦₁(𝐻) in order to
characterize the LP decoder. Note that while the set 𝒦(𝐻) is larger than 𝒫(𝐻), its description
complexity is much lower, because we need only as many forbidden-set inequalities (5.5a) as
there are 1-entries in 𝐻. In addition, observe that the pseudoweight is invariant to scaling, i.e.,
𝑤p^AWGN(𝜏𝑥) = 𝑤p^AWGN(𝑥) for 𝜏 > 0. Consequently, the search for the minimum pseudoweight
can be restrained to either 𝒦(𝐻) or 𝒦₁(𝐻) as well. For the latter, since ‖𝑥‖₁ = 1 there, it takes
on the particularly simple form

    𝑤p,min^AWGN(𝐻) = 1 / max{‖𝑥‖₂² ∶ 𝑥 ∈ 𝒦₁(𝐻)}.

While the maximization of ‖·‖₂² over a polytope is NP-hard in general, the most effective
algorithms to approach the minimum pseudoweight rely on the above formulation; see [35, 36]
and Paper VII.

5.4.3 Graph Covers

There is a fascinating combinatorial characterization of the fundamental polytope 𝒫(𝐻) derived
from the factor graph 𝐺 of a parity-check matrix 𝐻 (cf. Definition 4.8). Central to it is the
following definition.

5.10 Definition (graph cover): Let 𝐺 = (𝑉 ∪ 𝐶, 𝐸) be the factor graph associated to a parity-
check matrix 𝐻 with variable nodes 𝑉, check nodes 𝐶, and edge set 𝐸. For 𝑚 ∈ ℕ, an 𝑚-cover
of 𝐺 is a factor graph 𝐺̄ with variable nodes 𝑉̄ = 𝑉 × {1, … , 𝑚}, check nodes 𝐶̄ = 𝐶 × {1, … , 𝑚}
and a set of |𝐸| permutations {𝜋ₑ ∈ 𝕊ₘ ∶ 𝑒 ∈ 𝐸} such that the edge set of 𝐺̄ is

    𝐸̄ = {(𝒞ⱼ⁽ᵏ⁾, 𝑥ᵢ⁽ˡ⁾) ∶ (𝒞ⱼ, 𝑥ᵢ) = 𝑒 ∈ 𝐸 and 𝜋ₑ(𝑘) = 𝑙},

where by 𝒞ⱼ⁽ᵏ⁾ = (𝒞ⱼ, 𝑘) ∈ 𝐶̄ we denote the 𝑘-th copy of 𝒞ⱼ ∈ 𝐶 (and 𝑥ᵢ⁽ˡ⁾ analogously). C

[Margin figure: a 3-cover of the (7, 4) code from Section 4.4.2.]

Despite the somewhat heavy notation in the above definition, the idea of a graph cover is rather
simple: make 𝑚 identical copies of 𝐺 and then, for every edge 𝑒 = (𝒞ⱼ, 𝑥ᵢ) ∈ 𝐸, arbitrarily
“rewire” the 𝑚 copies of 𝑒 in a one-to-one fashion between the copies of 𝒞ⱼ and 𝑥ᵢ.

Since every graph cover of a factor graph 𝐺 defining a code 𝒞 is a factor graph itself, it defines
a code 𝒞̄ = 𝒞̄(𝐺̄) that has 𝑚 times the block length of 𝒞. Let

    𝑥̃ = (𝑥₁⁽¹⁾, … , 𝑥₁⁽ᵐ⁾, … , 𝑥ₙ⁽¹⁾, … , 𝑥ₙ⁽ᵐ⁾) ∈ 𝒞̄

be such a codeword, where the entries are ordered in the obvious way. Then, the rational
𝑛-vector 𝑥 defined by

    𝑥ᵢ = (1/𝑚) ∑ₖ₌₁ᵐ 𝑥ᵢ⁽ᵏ⁾

is called the scaled pseudocodeword of 𝒞 associated to 𝑥̃. Let 𝒬(𝐻) denote the union, over all
𝑚 > 0 and all 𝑚-covers 𝐺̄ of 𝐺, of the scaled pseudocodewords associated to all the codewords
of 𝒞̄(𝐺̄). It then holds that

    𝒬(𝐻) = 𝒫(𝐻) ∩ ℚⁿ,

i.e., 𝒬(𝐻) contains exactly the rational points of 𝒫(𝐻), and hence 𝒫(𝐻) is the closure of 𝒬(𝐻).
Graph covers have been proposed in [20], among other things, to study the relationship between
LP decoding and iterative methods, which can be shown to always compute solutions that are
optimal for some graph cover of 𝐺. In Paper VII, we use graph covers to estimate the minimum
pseudoweight of ensembles of codes.

5.5 LP Decoding of Turbo Codes

Since one can show that turbo(-like) codes, as introduced in Section 4.5, are special instances
of linear block codes, one could compute, for a given turbo code 𝒞TC, a parity-check matrix
𝐻 defining 𝒞TC, and apply all of the abovementioned theory and algorithms to decode them
using linear and integer optimization. In doing so, however, one would neglect the immedi-
ate combinatorial structure embodied in 𝒞TC in virtue of the trellis graphs of the constituent
convolutional codes.
In fact, for an individual convolutional code 𝒞, it can be shown that ML decoding can be
performed by computing a shortest path (due to the simple structure of 𝑇, this can be achieved
in 𝑂(𝑘) time) in the trellis 𝑇 of 𝒞, after having introduced a cost value 𝑐ₑ for each trellis edge
𝑒 = (𝑣ᵢ,ₛ, 𝑣ᵢ₊₁,ₛ′) ∈ 𝐸: as every time step produces 𝑛/𝑘 output bits, 𝑒 determines the entries
𝑥⁽ⁱ⁾ = (𝑥_{(𝑖−1)𝑛/𝑘+1}, … , 𝑥_{𝑖𝑛/𝑘}) = 𝑥_𝐼, for an appropriate index set 𝐼, of the codeword 𝑥. In view of
(5.1), we thus have to define

    𝑐ₑ = ∑_{𝑗 ∶ out(𝑒)ⱼ=1} 𝜆_{𝐼ⱼ}

such that the edge cost 𝑐ₑ reflects the portion of the objective function 𝜆ᵀ𝑥 contributed by
including 𝑒 in the path.
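
This linear-time shortest-path computation is the classical Viterbi dynamic program; the sketch below assumes, for brevity, a rate-1 code (one output bit per edge, so 𝑐ₑ = 𝜆ᵢ · out(𝑒)) and an FSM transition function given as a plain dictionary.

    def trellis_ml(delta, num_states, lam):
        """Min-cost path v_{1,0} -> v_{k+1,0}; delta[(s,u)] = (s', out_bit)."""
        k, INF = len(lam), float("inf")
        dist = [0.0] + [INF] * (num_states - 1)   # start in state s_0
        pred = []
        for i in range(k):
            new, choice = [INF] * num_states, [None] * num_states
            for s in range(num_states):
                if dist[s] == INF:
                    continue
                for u in (0, 1):                  # the two outgoing edges
                    s2, out = delta[(s, u)]
                    c = dist[s] + lam[i] * out    # edge cost c_e
                    if c < new[s2]:
                        new[s2], choice[s2] = c, (s, u)
            dist, pred = new, pred + [choice]
        bits, s = [], 0                           # backtrack from state s_0
        for i in reversed(range(k)):
            s, u = pred[i][s]
            bits.append(u)
        return bits[::-1], dist[0]

    # invented example: 2-state accumulator, x_i = u_i XOR s, next state = x_i
    delta = {(0, 0): (0, 0), (0, 1): (1, 1), (1, 0): (1, 1), (1, 1): (0, 0)}
    print(trellis_ml(delta, 2, [-1.0, 2.0, -0.5, 1.5]))   # ([1, 1, 1, 1], -1.5)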
When making the transition to a turbo code 𝒞TC, independently computing a shortest path
𝑃a and 𝑃b in each component trellis 𝑇a and 𝑇b, respectively, would not ensure that 𝑃a and 𝑃b
are agreeable, i.e., fulfill (4.14), and hence match a codeword of 𝒞TC. An ML turbo decoding
algorithm would thus need to compute the minimum-cost pair of paths (𝑃a, 𝑃b) in 𝑇a and 𝑇b
that additionally is agreeable. While there is no known combinatorial algorithm that efficiently
solves such a type of problem (which can be viewed as a generalization of what is called the
equal flow problem [37]), it is nonetheless possible to combine LP decoding with the trellis
structure of turbo codes.
One approach is to resort to an LP formulation of the two shortest path problems on 𝑇a and 𝑇b as described in Section 3.4, and then link them by adding linear constraints that represent (4.14): first, write down the constraints of (3.15) for each trellis 𝑇a and 𝑇b, where we assume that the decision variable representing an edge 𝑒 is called 𝑓_𝑒 instead of 𝑥_𝑒. Then, for 𝑖 = 1, … , 𝑘, add an additional constraint

∑_{𝑒 ∈ 𝐸^a_𝑖 ∶ in(𝑒)=1} 𝑓_𝑒 = ∑_{𝑒 ∈ 𝐸^b_{𝜋(𝑖)} ∶ in(𝑒)=1} 𝑓_𝑒

to model (4.14), where 𝐸^x_𝑖 is the edge set of the 𝑖th segment of trellis 𝑇x, x ∈ {a, b}. By adding these constraints, the resulting polytope is no longer integral, i.e., the constraints introduce fractional vertices that do not correspond to an agreeable pair of paths. Note that there are no 𝑥 variables for the codeword in the model, since their values are uniquely determined by the values of the 𝑓_𝑒; it is hence not necessary to include them during the optimization process.
In a similar way as described above (see Paper IV for the details), a cost value derived from the LLR vector can be assigned to each edge such that the integer programming version of the model with additional constraints 𝑓_𝑒 ∈ {0, 1} is equivalent to ML decoding. Its LP relaxation, called the turbo LP decoder, thus has similar properties to the LP decoder from Definition 5.1; in particular, it exhibits the ML certificate property.

While the error-correction performance of the turbo LP decoder has been shown to be better than that of the usual LP decoder run on a parity-check matrix representation of the turbo code, solving the turbo LP with a generic LP solver, such as the simplex method, does not exploit the abovementioned possibility to decode the constituent convolutional codes in linear time. In Paper IV, we present an algorithm using this linear-time method as a subroutine to solve the turbo decoding LP in a very efficient way.
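To convey the flavor of such an approach, the skeleton below shows a plain Lagrangian relaxation of the agreement constraints with a subgradient update; this is a textbook-style illustration under our own interface assumptions and is not the algorithm of Paper IV, which differs in important details:

def lagrangian_turbo_sketch(decode_a, decode_b, pi, k, iters=200, step0=1.0):
    # decode_x(offsets) is assumed to run the linear-time shortest-path
    # decoder on trellis T^x, adding offsets[i] to the cost of every
    # input-1 edge in segment i, and to return the information bits.
    u = [0.0] * k                  # one multiplier per agreement constraint
    for it in range(iters):
        off_b = [0.0] * k
        for i in range(k):
            off_b[pi[i]] = -u[i]   # segment pi(i) of T^b gets -u_i
        info_a = decode_a(u)       # segment i of T^a gets +u_i
        info_b = decode_b(off_b)
        g = [info_a[i] - info_b[pi[i]] for i in range(k)]  # subgradient
        if all(gi == 0 for gi in g):
            return info_a          # agreeable pair found
        step = step0 / (it + 1)    # diminishing step size
        u = [u[i] + step * g[i] for i in range(k)]
    return None                    # no agreement within the iteration budget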

Part II

Contributions

Chapter 6

Paper I: Mathematical Programming
Decoding of Binary Linear Codes:
Theory and Algorithms

Michael Helmling, Stefan Ruzika, and Akın Tanatmis

The following is a reformatted and revised copy of a preprint that is publicly available online
(http://arxiv.org/abs/1107.3715). The same content appeared in the following publication:

M. Helmling, S. Ruzika, and A. Tanatmis. “Mathematical programming decoding of binary


linear codes: theory and algorithms”. IEEE Transactions on Information Theory 58.7 (July
2012), pp. 4753–4769. doi: 10.1109/TIT.2012.2191697. arXiv: 1107.3715 [cs.IT]

This work was supported in part by the Center of Mathematical and Computational Modeling
(CM)², University of Kaiserslautern, Kaiserslautern, Germany.

Mathematical Programming
Decoding of Binary Linear Codes:
Theory and Algorithms

Michael Helmling Stefan Ruzika Akın Tanatmis

Mathematical programming is a branch of applied mathematics and has recently been used to derive new decoding approaches, challenging established but often heuristic algorithms based on iterative message passing. Concepts from mathematical programming used in the context of decoding include linear, integer, and nonlinear programming, network flows, notions of duality as well as matroid and polyhedral theory. This survey article reviews and categorizes decoding methods based on mathematical programming approaches for binary linear codes over binary-input memoryless symmetric channels.

6.1 Introduction

Based on an integer programming (IP)1 formulation of the maximum likelihood decoding (MLD) problem for binary linear codes, linear programming decoding (LPD) was introduced by Feldman et al. [1, 2]. Since then, LPD has been intensively studied in a variety of articles
especially dealing with low-density parity-check (LDPC) codes. LDPC codes are generally
decoded by heuristic approaches called iterative message-passing decoding (IMPD) subsuming
sum-product algorithm decoding (SPAD) [3, 4] and min-sum algorithm decoding (MSAD) [5].
In these algorithms, probabilistic information is iteratively exchanged and updated between
component decoders. Initial messages are derived from the channel output. IMPD exploits the
sparse structure of parity-check matrices of LDPC and turbo codes very well and achieves good
performance. However, IMPD approaches are neither guaranteed to converge nor do they have
the maximum likelihood (ML) certificate property, i.e., if the output is a codeword, it is not
necessarily the ML codeword. Furthermore, performance of IMPD is poor for arbitrary linear
block codes with a dense parity-check matrix. In contrast, LPD offers some advantages and
thus has become an important alternative decoding technique. First, this approach is derived
from the discipline of mathematical programming which provides analytical statements on
convergence, complexity, and correctness of decoding algorithms. Second, LPD is not limited
to sparse matrices.
This article is organized as follows. In Section 6.2, notation is fixed and well-known but relevant
results from coding theory and polyhedral theory are recalled. Complexity and polyhedral
properties of MLD are discussed in Section 6.3. In Section 6.4 a general description of LPD
is given. Several linear programming (LP) formulations dedicated to codes with low-density
parity-check matrices, codes with high-density parity-check matrices, and turbo-like codes are
categorized and their commonalities and differences are emphasized in Section 6.5. Based on
these LP formulations, different streams of research on LPD have evolved. Methods focusing
on efficient realization of LPD are summarized in Section 6.6, while approaches improving
the error-correcting performance of LPD at the cost of increased complexity are reviewed in
Section 6.7. Some concluding comments are made in Section 6.8.

6.2 Basics and Notation

This section briefly introduces a number of definitions and results from linear coding theory
and polyhedral theory which are most fundamental for the subsequent text.
A binary linear block code 𝐶 with cardinality 2^𝑘 and block length 𝑛 is a 𝑘-dimensional subspace of the vector space {0, 1}^𝑛 defined over the binary field 𝔽₂. 𝐶 ⊆ {0, 1}^𝑛 is given by 𝑘 basis vectors of length 𝑛 which are arranged in a 𝑘 × 𝑛 matrix 𝐺, called the generator matrix of the code 𝐶.2
1 See the table on page 83 for a list of the acronyms used in this paper.
2 Note that single vectors in this paper are generally column vectors; however, in coding theory they are often used as rows of matrices. The transposition of column vector 𝑥 makes it a row vector, denoted by 𝑥ᵀ.


      ⎛ 1 1 1 0 1 0 0 0 ⎞
𝐻 =   ⎜ 1 1 0 1 0 1 0 0 ⎟
      ⎜ 1 0 1 1 0 0 1 0 ⎟
      ⎝ 0 1 1 1 0 0 0 1 ⎠

Figure 6.1: Parity-check matrix and Tanner graph of an (8, 4) code. The square nodes represent the check nodes (index set 𝐼), while the round ones correspond to the variables (index set 𝐽). [Tanner graph drawing omitted.]

The orthogonal subspace 𝐶⟂ of 𝐶 is defined as

𝐶⊥ = {𝑦 ∈ {0, 1}^𝑛 ∶ ∑_{𝑗=1}^{𝑛} 𝑥_𝑗 𝑦_𝑗 ≡ 0 (mod 2) for all 𝑥 ∈ 𝐶}

and has dimension 𝑛 − 𝑘. It can also be interpreted as a binary linear code of dimension 𝑛 − 𝑘 which is referred to as the dual code of 𝐶. A matrix 𝐻 ∈ {0, 1}^{𝑚×𝑛} whose 𝑚 ≥ 𝑛 − 𝑘 rows form a spanning set of 𝐶⟂ is called a parity-check matrix of 𝐶. It follows from this definition that 𝐶 is the null space of 𝐻 and thus a vector 𝑥 ∈ {0, 1}^𝑛 is contained in 𝐶 if and only if 𝐻𝑥 ≡ 0 (mod 2). Normally, 𝑚 = 𝑛 − 𝑘 and the rows of 𝐻 ∈ {0, 1}^{(𝑛−𝑘)×𝑛} constitute a basis of 𝐶⟂.
pointed out, however, that most LPD approaches (see Section 6.7) benefit from parity-check
matrices being extended by redundant rows. Moreover, additional rows of 𝐻 never degrade the
error-correcting performance of LPD. This is a major difference to IMPD which is generally
weakened by redundant parity checks, since they introduce cycles to the Tanner graph.
Let 𝑥, 𝑥′ ∈ {0, 1}^𝑛. The Hamming distance between 𝑥 and 𝑥′ is the number of entries (bits) with different values, i.e., 𝑑(𝑥, 𝑥′) = |{1 ≤ 𝑗 ≤ 𝑛 ∶ 𝑥_𝑗 ≠ 𝑥′_𝑗}|. The minimum (Hamming) distance of a code, 𝑑(𝐶), is given by 𝑑(𝐶) = min{𝑑(𝑥, 𝑥′) ∶ 𝑥, 𝑥′ ∈ 𝐶, 𝑥 ≠ 𝑥′}. The Hamming weight of a codeword 𝑥 ∈ 𝐶 is defined as 𝑤(𝑥) = 𝑑(𝑥, 0), i.e., the number of ones in 𝑥. The minimum Hamming weight of 𝐶 is 𝑤(𝐶) = min{𝑤(𝑥) ∶ 𝑥 ∈ 𝐶, 𝑥 ≠ 0}. For binary linear codes it holds that 𝑑(𝐶) = 𝑤(𝐶). The error-correcting performance of a code is, at least at high signal-to-noise ratio (SNR), closely related to its minimum distance.
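Because 𝑑(𝐶) = 𝑤(𝐶), the minimum distance of a small code can be computed by enumerating all 2^𝑘 codewords spanned by a generator matrix 𝐺 and taking the smallest non-zero weight; a minimal Python sketch (our illustration only, feasible for small 𝑘):

import itertools
import numpy as np

def minimum_distance(G):
    # d(C) = w(C): smallest Hamming weight of a non-zero codeword.
    k, n = G.shape
    best = n
    for msg in itertools.product((0, 1), repeat=k):
        x = (np.asarray(msg) @ G) % 2
        w = int(x.sum())
        if 0 < w < best:
            best = w
    return best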
Let 𝐴 ∈ ℝ^{𝑚×𝑛} denote an 𝑚 × 𝑛 matrix and 𝐼 = {1, … , 𝑚}, 𝐽 = {1, … , 𝑛} be the row and column index sets of 𝐴, respectively. The entry in row 𝑖 ∈ 𝐼 and column 𝑗 ∈ 𝐽 of 𝐴 is given by 𝐴_{𝑖,𝑗}. The 𝑖th row and 𝑗th column of 𝐴 are denoted by 𝐴_{𝑖,·} and 𝐴_{·,𝑗}, respectively. A vector 𝑒 ∈ ℝ^𝑚 is called the 𝑖th unit column vector if 𝑒_𝑖 = 1, 𝑖 ∈ 𝐼, and 𝑒_ℎ = 0 for all ℎ ∈ 𝐼 ⧵ {𝑖}.
A parity-check matrix 𝐻 can be represented by a bipartite graph 𝐺 = (𝑉, 𝐸), called its Tanner graph (Figure 6.1). The vertex set 𝑉 of 𝐺 consists of the two disjoint node sets 𝐼 and 𝐽. The nodes in 𝐼 are referred to as check nodes and correspond to the rows of 𝐻 whereas the nodes in 𝐽 are referred to as variable nodes and correspond to columns of 𝐻. An edge [𝑖, 𝑗] ∈ 𝐸 connects nodes 𝑖 and 𝑗 if and only if 𝐻_{𝑖,𝑗} = 1. Let 𝑁_𝑖 = {𝑗 ∈ 𝐽 ∶ 𝐻_{𝑖𝑗} = 1} denote the index set of variables incident to check node 𝑖, and analogously 𝑁_𝑗 = {𝑖 ∈ 𝐼 ∶ 𝐻_{𝑖𝑗} = 1} for 𝑗 ∈ 𝐽. The degree of a check node 𝑖 is the number of edges incident to node 𝑖 in the Tanner graph or, equivalently, 𝑑_𝑐(𝑖) = |𝑁_𝑖|. The maximum check node degree 𝑑_𝑐^max is the degree of the check node 𝑖 ∈ 𝐼 with the largest number of incident edges. The degree of a variable node 𝑗, 𝑑_𝑣(𝑗), and the maximum variable node degree 𝑑_𝑣^max are defined analogously.


Tanner graphs are an example of factor graphs, a general concept of graphical models which is prevalently used to describe probabilistic systems and related algorithms. The term stems from viewing the graph as the representation of some global function in several variables that factors into a product of subfunctions, each depending only on a subset of the variables. In case of Tanner graphs, the global function is the indicator function of the code, and the subfunctions are the parity checks according to single rows of 𝐻. A different type of factor graph will appear later in order to describe turbo codes. Far beyond this purely descriptive purpose, factor graphs have proven successful in modern coding theory, primarily in the context of describing and analyzing IMPD algorithms. See [6] for a more elaborate introduction.

Let 𝐶 be a binary linear code with parity-check matrix 𝐻 and 𝑥 ∈ 𝐶 ⊆ {0, 1}^𝑛. The index set supp(𝑥) = {𝑗 ∈ 𝐽 ∶ 𝑥_𝑗 = 1} is called the support of the codeword 𝑥. A codeword 0 ≠ 𝑥 ∈ 𝐶 is called a minimal codeword if there is no codeword 0 ≠ 𝑦 ∈ 𝐶 such that supp(𝑦) ⊊ supp(𝑥). Finally, 𝐷 is called a minor code of 𝐶 if 𝐷 can be obtained from 𝐶 by a series of shortening and puncturing operations.

The relationship between binary linear codes and polyhedral theory follows from the observation that a binary linear code can be considered a set of points in ℝ^𝑛, i.e., 𝐶 ⊆ {0, 1}^𝑛 ⊆ ℝ^𝑛. In the following, some relevant results from polyhedral theory are recalled. For a comprehensive review on polyhedral theory the reader is referred to [7].

6.1 Definition: A subset 𝒫(𝐴, 𝑏) ⊆ ℝ^𝑛 such that 𝒫(𝐴, 𝑏) = {𝜈 ∈ ℝ^𝑛 ∶ 𝐴𝜈 ≤ 𝑏}, where 𝐴 ∈ ℝ^{𝑚×𝑛} and 𝑏 ∈ ℝ^𝑚, is called a polyhedron. C

In this article, polyhedra are assumed to be rational, i.e., the entries of 𝐴 and 𝑏 are taken from ℚ. The 𝑖th row vector of 𝐴 and the 𝑖th entry of 𝑏 together define a closed halfspace {𝜈 ∈ ℝ^𝑛 ∶ 𝐴_{𝑖,·} 𝜈 ≤ 𝑏_𝑖}. In other words, a polyhedron is the intersection of a finite set of closed halfspaces. A bounded polyhedron is called a polytope. It is known from polyhedral theory that a polytope can equivalently be defined as the convex hull of a finite set of points. In this work, the convex hull of a binary linear code 𝐶 is denoted by conv(𝐶) and referred to as the codeword polytope.

Some characteristics of a polyhedron are its dimension, faces, and facets. To define them, the
notion of a valid inequality is needed.

6.2 Definition: An inequality 𝑟ᵀ𝜈 ≤ 𝑡, where 𝑟 ∈ ℝ^𝑛 and 𝑡 ∈ ℝ, is valid for a set 𝒫(𝐴, 𝑏) ⊆ ℝ^𝑛 if 𝒫(𝐴, 𝑏) ⊆ {𝜈 ∶ 𝑟ᵀ𝜈 ≤ 𝑡}. C

The following definition of an active inequality is used in several LPD algorithms.

6.3 Definition: An inequality 𝑟ᵀ𝜈 ≤ 𝑡, where 𝑟, 𝜈 ∈ ℝ^𝑛 and 𝑡 ∈ ℝ, is active at 𝜈∗ ∈ ℝ^𝑛 if 𝑟ᵀ𝜈∗ = 𝑡. C

Valid inequalities which contain points of 𝒫(𝐴, 𝑏) are of special interest.


6.4 Definition: Let 𝒫(𝐴, 𝑏) ⊆ ℝ^𝑛 be a polyhedron, let 𝑟ᵀ𝜈 ≤ 𝑡 be a valid inequality for 𝒫(𝐴, 𝑏) and define 𝐹 = {𝜈 ∈ 𝒫(𝐴, 𝑏) ∶ 𝑟ᵀ𝜈 = 𝑡}. Then 𝐹 is called a face of 𝒫(𝐴, 𝑏). 𝐹 is a proper face if 𝐹 ≠ ∅ and 𝐹 ≠ 𝒫(𝐴, 𝑏). C

The dimension dim(𝒫(𝐴, 𝑏)) of 𝒫(𝐴, 𝑏) ⊆ ℝ^𝑛 is given by the maximum number of affinely independent points in 𝒫(𝐴, 𝑏) minus one. Recall that a set of vectors 𝑣₁, … , 𝑣_𝑘 is affinely independent if the system {∑_{𝑖=1}^{𝑘} 𝜆_𝑖 𝑣_𝑖 = 0, ∑_{𝑖=1}^{𝑘} 𝜆_𝑖 = 0} has no solution other than 𝜆_𝑖 = 0 for 𝑖 = 1, … , 𝑘. If dim(𝒫(𝐴, 𝑏)) = 𝑛, then the polyhedron is full-dimensional. It is a well-known result that if 𝒫(𝐴, 𝑏) is not full-dimensional, then there exists at least one inequality 𝐴_{𝑖,·} 𝜈 ≤ 𝑏_𝑖 such that 𝐴_{𝑖,·} 𝜈 = 𝑏_𝑖 holds for all 𝜈 ∈ 𝒫(𝐴, 𝑏) (see e.g. [7]). Also, we have dim(𝐹) ≤ dim(𝒫(𝐴, 𝑏)) − 1 for any proper face 𝐹 of 𝒫(𝐴, 𝑏). A face 𝐹 ≠ ∅ of 𝒫(𝐴, 𝑏) is called a facet of 𝒫(𝐴, 𝑏) if dim(𝐹) = dim(𝒫(𝐴, 𝑏)) − 1.

In the set of inequalities defined by (𝐴, 𝑏), some inequalities 𝐴_{𝑖,·} 𝜈 ≤ 𝑏_𝑖 may be redundant, i.e., dropping these inequalities does not change the solution set defined by 𝐴𝜈 ≤ 𝑏. A standard result states that the facet-defining inequalities give a complete non-redundant description of a polyhedron 𝒫(𝐴, 𝑏) [7].

A point 𝜈 ∈ 𝒫(𝐴, 𝑏) is called a vertex of 𝒫(𝐴, 𝑏) if there exist no two other points 𝜈₁, 𝜈₂ ∈ 𝒫(𝐴, 𝑏) such that 𝜈 = 𝜇₁𝜈₁ + 𝜇₂𝜈₂ with 0 ≤ 𝜇₁ ≤ 1, 0 ≤ 𝜇₂ ≤ 1, and 𝜇₁ + 𝜇₂ = 1. Alternatively, vertices are zero-dimensional faces of 𝒫(𝐴, 𝑏). In an LP problem, a linear cost function is minimized over a polyhedron, i.e., min{𝑐ᵀ𝑥 ∶ 𝑥 ∈ 𝒫(𝐴, 𝑏)}, 𝑐 ∈ ℝ^𝑛. Unless the LP is infeasible or unbounded, the minimum is attained at one of the vertices.

The number of constraints of an LP problem may be very large, e.g. Section 6.5 contains LPD
formulations whose description complexity grows exponentially with the block length for
general codes. In such a case it would be desirable to only include the constraints which are
necessary to determine the optimal solution of the LP with respect to a given objective function.
This can be accomplished by iteratively solving the associated separation problem, defined as
follows.

6.5 Definition: Let 𝒫(𝐴, 𝑏) ⊂ ℝ^𝑛 be a rational polyhedron and 𝜈∗ ∈ ℝ^𝑛 a rational vector. The separation problem is to either conclude that 𝜈∗ ∈ 𝒫(𝐴, 𝑏) or, if not, find a rational vector (𝑟, 𝑡) ∈ ℝ^𝑛 × ℝ such that 𝑟ᵀ𝜈 ≤ 𝑡 for all 𝜈 ∈ 𝒫(𝐴, 𝑏) and 𝑟ᵀ𝜈∗ > 𝑡. In the latter case, (𝑟, 𝑡) is called a valid cut. C

We will see applications of this approach in Sections 6.6 and 6.7.

There is a famous result about the equivalence of optimization and separation by Grötschel et
al. [8].

6.6 Theorem: Let 𝒫 be a proper class of polyhedra (see e.g. [7] for a definition). The optimization
problem for 𝒫 is polynomial time solvable if and only if the separation problem is polynomial
time solvable. C


6.3 Complexity and Polyhedral Properties

In this section, after referencing important NP-hardness results for the decoding problem, we
state useful properties of the codeword polytope, exploiting a close relation between coding
and matroid theory.

Integer programming provides powerful means for modeling several real-world problems. MLD for binary linear codes is modeled as an IP problem in [2, 9]. Let 𝑦 ∈ ℝ^𝑛 be the channel output. In MLD the probability (or, in case of a continuous-output channel, the probability density) 𝑃(𝑦 ∣ 𝑥) is maximized over all codewords 𝑥 ∈ 𝐶. Let 𝑥∗ denote the ML codeword. It is shown in [1] that for a symmetric memoryless channel the calculation of 𝑥∗ amounts to the minimization of a linear cost function, namely

𝑥∗ = arg max_{𝑥∈𝐶} 𝑃(𝑦 ∣ 𝑥) = arg min_{𝑥∈𝐶} ∑_{𝑗=1}^{𝑛} 𝜆_𝑗 𝑥_𝑗, (6.1)

where the values 𝜆_𝑗 = log(𝑃(𝑦_𝑗 ∣ 𝑥_𝑗 = 0) / 𝑃(𝑦_𝑗 ∣ 𝑥_𝑗 = 1)) are the so-called log-likelihood ratios (LLR). Consequently, the IP formulation of MLD is implicitly given as

min{𝜆ᵀ𝑥 ∶ 𝑥 ∈ 𝐶}. (6.2)
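For concreteness: on the BIAWGN channel with BPSK mapping 0 ↦ +1, 1 ↦ −1 and noise variance 𝜎², the LLRs reduce to 𝜆_𝑗 = 2𝑦_𝑗/𝜎². The sketch below (our illustration; enumeration is exponential in 𝑘 and only meant to make (6.1)–(6.2) tangible) computes the LLR vector and solves (6.2) by brute force:

import itertools
import numpy as np

def awgn_llr(y, sigma2):
    # lambda_j = log P(y_j | x_j = 0) / P(y_j | x_j = 1) = 2 * y_j / sigma^2
    return 2.0 * np.asarray(y) / sigma2

def ml_decode(G, llr):
    # Solve (6.2) by enumerating all codewords x = m G (mod 2).
    k, n = G.shape
    best_x, best_cost = None, float("inf")
    for msg in itertools.product((0, 1), repeat=k):
        x = (np.asarray(msg) @ G) % 2
        cost = float(llr @ x)
        if cost < best_cost:
            best_x, best_cost = x, cost
    return best_x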

Berlekamp et al. have shown that MLD is NP-hard in [10] by a polynomial-time reduction
of the three-dimensional matching problem to the decision version of MLD. An alternative
proof is via matroid theory: as shall be exposed shortly, there is a one-to-one correspondence
between binary matroids and binary linear codes. In virtue of this analogy, MLD is equivalent
to the minimum-weight cycle problem on binary matroids. Since the latter contains the max-
cut problem, which is known to be NP-hard [11], as a special case, the NP-hardness of MLD
follows.

Another problem of interest in the framework of coding theory is the computation of the
minimum distance of a given code. Berlekamp et al. [10] conjectured that computing the
distance of a binary linear code is NP-hard as well, which was proved by Vardy [12] about two
decades later. The minimum distance problem can again be reformulated in a matroid theoretic
setting. In 1969 Welsh [13] formulated it as the problem of finding a minimum cardinality
circuit in linear matroids.

In the following, we assume 𝐶 ⊆ {0, 1}^𝑛 to be canonically embedded in ℝ^𝑛 when referring to conv(𝐶) (see Figure 6.2 for an example). Replacing 𝐶 by conv(𝐶) in (6.2) leads to a linear
programming problem over a polytope with integer vertices. In general, computing an explicit
representation of conv(𝐶) is intractable. Nevertheless, some properties of conv(𝐶) are known
from matroid theory due to the equivalence of binary linear codes and binary matroids. In
the following, some definitions and results from matroid theory are presented. An extensive
investigation of matroids can be found in [14] or [15]. The definition of a matroid in general is
rather technical.


Figure 6.2: The codewords of the single parity-check code 𝐶 = {𝑥 ∈ 𝔽₂³ ∶ 𝑥₁ + 𝑥₂ + 𝑥₃ ≡ 0 (mod 2)} and the polytope conv(𝐶) in ℝ³. [Two 3-D plots omitted: the four codewords (0, 0, 0), (1, 1, 0), (1, 0, 1), (0, 1, 1) and their convex hull.]

6.7 Definition: A matroid ℳ is an ordered pair ℳ = (𝐽, 𝒰) where 𝐽 is a finite ground set
and 𝒰 is a collection of subsets of 𝐽, called the independent sets, such that (1)–(3) hold.
(1) ∅ ∈ 𝒰.
(2) If 𝑢 ∈ 𝒰 and 𝑣 ⊂ 𝑢, then 𝑣 ∈ 𝒰.
(3) If 𝑢1 , 𝑢2 ∈ 𝒰 and ∣𝑢1 ∣ < ∣𝑢2 ∣ then there exists 𝑗 ∈ 𝑢2 ⧵ 𝑢1 such that 𝑢1 ∪ {𝑗} ∈ 𝒰. C

In this work, the class of 𝔽₂-representable (i.e., binary) matroids is of interest. A binary 𝑚 × 𝑛 matrix 𝐻 defines an 𝔽₂-representable matroid ℳ[𝐻] as follows. The ground set 𝐽 = {1, … , 𝑛} is defined to be the index set of the columns of 𝐻. A subset 𝑈 ⊆ 𝐽 is independent if and only if the column vectors 𝐻_{·,𝑢}, 𝑢 ∈ 𝑈, are linearly independent in the vector space defined over the field 𝔽₂. A minimal dependent set, i.e., a set 𝒱 ∈ 2^𝐽 ⧵ 𝒰 such that all proper subsets of 𝒱 are in 𝒰, is called a circuit of ℳ[𝐻]. If a subset of 𝐽 is a disjoint union of circuits then it is called a cycle.
The incidence vector 𝑥^𝒞 ∈ ℝ^𝑛 corresponding to a cycle 𝒞 ⊆ 𝐽 is defined by

𝑥^𝒞_𝑗 = 1 if 𝑗 ∈ 𝒞, and 𝑥^𝒞_𝑗 = 0 if 𝑗 ∉ 𝒞.

The cycle polytope is the convex hull of the incidence vectors corresponding to all cycles of a
binary matroid.
Some more relationships between coding theory and matroid theory (see also [16]) can be listed:
a binary linear code corresponds to a binary matroid, the support of a codeword corresponds to
a cycle (therefore, each codeword corresponds to the incidence vector of a cycle), the support of
a minimal codeword corresponds to a circuit, and the codeword polytope conv(𝐶) corresponds
to the cycle polytope. Let 𝐻 be a binary matrix, ℳ [𝐻] be the binary matroid defined by 𝐻
(𝐻 is a representation matrix of ℳ [𝐻]) and 𝐶 be the binary linear code defined by 𝐻 (𝐻 is a
parity-check matrix of 𝐶). It can easily be shown that the dual 𝐶⊥ of 𝐶 is the same object as
the dual of the binary matroid ℳ [𝐻]. We denote the dual matroid by ℳ [𝐺], where 𝐺 is the
generator matrix of 𝐶. Usually the matroid related terms are dualized by the prefix “co”. For


example, the circuits and cycles of a dual matroid are called cocircuits and cocycles, respectively.
The supports of minimal codewords and the supports of codewords in 𝐶⊥ are associated with
cocircuits and cocycles of ℳ [𝐻], respectively.
A minor of a parent matroid ℳ = (𝐽, 𝒰) is the sub-matroid obtained from ℳ after any
combination of contraction and restriction operations (see e.g. [14]). In the context of coding
theory, contraction corresponds to puncturing, i.e., the deletion of one or more columns from
the generator matrix of a parent code, and restriction corresponds to shortening, i.e., the
deletion of one or more columns from the parity-check matrix of a parent code.
Next, some results from Barahona and Grötschel [17] which are related to the structure of the
cycle polytope are rewritten in terms of coding theory. Kashyap provides a similar transfer in
[18]. Several results are collected in Theorem 6.8.

6.8 Theorem: Let 𝐶 be a binary linear code.

(a) If 𝑑(𝐶⊥) ≥ 3 then the codeword polytope is full-dimensional.
(b) The box inequalities

0 ≤ 𝑥_𝑗 ≤ 1 for all 𝑗 ∈ 𝐽 (6.3)

and the cocircuit inequalities

∑_{𝑗∈ℱ} 𝑥_𝑗 − ∑_{𝑗∈supp(𝑞)⧵ℱ} 𝑥_𝑗 ≤ |ℱ| − 1 for all ℱ ⊆ supp(𝑞) with |ℱ| odd, (6.4)

where supp(𝑞) is the support of a dual minimal codeword 𝑞, are valid for the codeword polytope.
(c) The box inequalities 𝑥_𝑗 ≥ 0, 𝑥_𝑗 ≤ 1 define facets of the codeword polytope if 𝑑(𝐶⊥) ≥ 3 and 𝑗 ∈ 𝐽 is not contained in the support of a codeword in 𝐶⊥ with weight three.
(d) If 𝑑(𝐶⊥) ≥ 3 and 𝐶 does not contain 𝐻₇^⊥ (the (7, 3, 4) simplex code) as a minor, and if there exists a dual minimal codeword 𝑞 of weight 3, then the cocircuit inequalities derived from supp(𝑞) are facets of conv(𝐶). C

Part (b) of Theorem 6.8 implies that the set of cocircuit inequalities derived from the supports
of all dual minimal codewords provide a relaxation of the codeword polytope. In the polyhedral
analysis of the codeword polytope the symmetry property stated below plays an important
role.

6.9 Theorem: [17] If 𝑎ᵀ𝑥 ≤ 𝛼 defines a face of conv(𝐶) of dimension 𝑑, and 𝑦 is a codeword, then the inequality 𝑎̄ᵀ𝑥 ≤ 𝛼̄ also defines a face of conv(𝐶) of dimension 𝑑, where

𝑎̄_𝑗 = 𝑎_𝑗 if 𝑗 ∉ supp(𝑦), and 𝑎̄_𝑗 = −𝑎_𝑗 if 𝑗 ∈ supp(𝑦),

and 𝛼̄ = 𝛼 − 𝑎ᵀ𝑦. C


Using this theorem, a complete description of conv(𝐶) can be derived from all facets containing
a single codeword [17].
Let 𝑞 be a dual minimal codeword. To identify if the cocircuit inequalities derived from supp(𝑞)
are facet-defining it should be checked if supp(𝑞) has a chord. For the formal definition
of chord, the symmetric difference △ which operates on two finite sets is used, defined by
𝐴△𝐵 = (𝐴 ⧵ 𝐵) ∪ (𝐵 ⧵ 𝐴). Note that if 𝐴 = supp(𝑞1 ), 𝐵 = supp(𝑞2 ) and supp(𝑞0 ) = 𝐴△𝐵, then
𝑞0 ≡ 𝑞1 + 𝑞2 (mod 2).

6.10 Definition: Let 𝑞0 , 𝑞1 , 𝑞2 ∈ 𝐶⊥ be dual minimal codewords. If

supp(𝑞0 ) = supp(𝑞1 )△ supp(𝑞2 ) and supp(𝑞1 ) ∩ supp(𝑞2 ) = {𝑗},

then 𝑗 is called a chord of supp(𝑞0 ). C

6.11 Theorem: [17] Let 𝐶 be a binary linear code without the (7, 3, 4) simplex code as a minor and let supp(𝑞) be the support of a dual minimal codeword with Hamming weight at least 3 and without chord. Then for all ℱ ⊆ supp(𝑞) with |ℱ| odd, the inequality

∑_{𝑗∈ℱ} 𝑥_𝑗 − ∑_{𝑗∈supp(𝑞)⧵ℱ} 𝑥_𝑗 ≤ |ℱ| − 1

defines a facet of conv(𝐶). C

Optimizing a linear cost function over the cycle polytope, known as the cycle problem in terms of matroid theory, is investigated by Grötschel and Truemper [19]. The work of Feldman et al. [2] makes it possible to use these matroid-theoretic results in the coding theory context. As shown above, solving the MLD problem for a binary linear code is equivalent to solving the cycle problem on a binary matroid. In [19], binary matroids for which the cycle problem can be solved in polynomial time are classified, based on Seymour's matroid decomposition theory [20]. Kashyap [16] shows that results from [19] are directly applicable to binary linear codes. The MLD problem as well as the minimum distance problem can be solved in polynomial time for the code families for which the cycle problem on the associated binary matroid can be solved in polynomial time. This code family is called polynomially almost-graphic codes [16]. An interesting subclass of polynomially almost-graphic codes are the geometrically perfect codes. Kashyap translates the sum of circuits property (see [19]) to the realm of binary linear codes. If the binary matroid associated with code 𝐶 has the sum of circuits property, then conv(𝐶) can be described completely and non-redundantly by the box inequalities (6.3) and the cocircuit inequalities (6.4). These codes are referred to as geometrically perfect codes in [16]. The associated binary matroids of geometrically perfect codes can be decomposed in polynomial time into minors which are either graphic (see [14]) or contained in a finite list of matroids. From a coding theoretic point of view, a family of error-correcting codes is asymptotically bad if either dimension or minimum distance grows only sublinearly with the code length. Kashyap proves that the family of geometrically perfect codes unfortunately fulfills this property. We refer to [16] for generalizations of this result.


6.4 Basics of LPD

LPD was first introduced in [2]. This decoding method is, in principle, applicable to any binary
linear code over any binary-input memoryless channel.3 In this section, we review the basics
of the LPD approach based on [1].
Although several structural properties of conv(𝐶) are known, it is in general infeasible to
compute a concise description of conv(𝐶) by means of linear inequalities. In LPD, the linear
cost function of the IP formulation is minimized on a relaxed polytope 𝒫 where conv(𝐶) ⊆ 𝒫 ⊆ ℝ^𝑛. Such a relaxed polytope 𝒫 should have the following desirable properties:
• 𝒫 should be easy to describe, and
• integral vertices of 𝒫 should correspond to codewords.
Together with the linear representation (6.1) of the likelihood function, this leads to one of
the major benefits of LPD, the so-called ML certificate property: If the LP decoder outputs
an integral optimal solution, it is guaranteed to be the ML codeword. This is a remarkable
difference to IMPD: If no general optimality condition applies (see e.g. [23, Sec. 10.3]), there is
no method to provably decide the optimality of a solution obtained by IMPD.
Each row (check node) 𝑖 ∈ 𝐼 of a parity-check matrix 𝐻 defines the local code

𝐶_𝑖 = {𝑥 ∈ {0, 1}^𝑛 ∶ ∑_{𝑗=1}^{𝑛} 𝐻_{𝑖𝑗} 𝑥_𝑗 ≡ 0 (mod 2)}

that consists of the bit sequences which satisfy the 𝑖th parity-check constraint; these are called local codewords. A particularly interesting relaxation of conv(𝐶) is

𝒫 = conv(𝐶₁) ∩ ⋯ ∩ conv(𝐶_𝑚) ⊆ [0, 1]^𝑛,

known as the fundamental polytope [24]. The vertices of the fundamental polytope, the so-called pseudocodewords, are a superset of 𝐶, where the difference consists only of non-integral vertices. Consequently, optimizing over 𝒫 implies the ML certificate property. These observations are formally stated in the following result (note that 𝐶 = 𝐶₁ ∩ ⋯ ∩ 𝐶_𝑚).

6.12 Lemma ([24]): Let 𝒫 = conv(𝐶₁) ∩ ⋯ ∩ conv(𝐶_𝑚). If 𝐶 = 𝐶₁ ∩ ⋯ ∩ 𝐶_𝑚 then conv(𝐶) ⊆ 𝒫 and 𝐶 = 𝒫 ∩ {0, 1}^𝑛. C

The description complexity of the convex hull of any local code conv(𝐶_𝑖), and thus of 𝒫, is usually much smaller than the description complexity of the codeword polytope conv(𝐶).
LPD can be written as optimizing the linear objective function over the fundamental polytope 𝒫, i.e.,

min{𝜆ᵀ𝑥 ∶ 𝑥 ∈ 𝒫}. (6.5)


3 In fact, Flanagan et al. [21] have recently generalized a substantial portion of the LPD theory to the nonbinary case. Similarly, work has been done to include channels with memory; see e.g. [22].


Based on (6.5), the LPD algorithm which we refer to as bare linear programming decoding (BLPD) is derived.

Bare LP decoding (BLPD)

Input: 𝜆 ∈ ℝ^𝑛, 𝒫 ⊆ [0, 1]^𝑛.
Output: ML codeword or error.
1: Solve the LP given in (6.5).
2: if LP solution 𝑥∗ is integral then
3: Output 𝑥∗
4: else
5: Output error
6: end if

Because of the ML certificate property, if BLPD outputs a codeword, then it is the ML codeword.
BLPD succeeds if the transmitted codeword is the unique optimum of the LP given in (6.5).
BLPD fails if the optimal solution is non-integral or the ML codeword is not the same as the
transmitted codeword. Note that the difference between the performance of BLPD and MLD
is caused by the decoding failures for which BLPD finds a non-integral optimal solution. It
should be emphasized that in case of multiple optima it is assumed that BLPD fails.

In some special cases, the fundamental polytope 𝒫 is equivalent to conv(𝐶), e.g., if the under-
lying Tanner graph is a tree or forest [24]. In these cases MLD can be achieved by BLPD. Note
that in those cases also MSAD achieves MLD performance [5].

Observe that the minimum distance of a code can be understood as the minimum ℓ₁ distance between any two different codewords of 𝐶. Likewise, the fractional distance of the fundamental polytope 𝒫 can be defined as follows.

6.13 Definition: [2] Let 𝑉(𝒫) be the set of vertices (pseudocodewords) of 𝒫. The fractional distance 𝑑_frac(𝒫) is the minimum ℓ₁ distance between a codeword and any other vertex of 𝑉(𝒫), i.e.,

𝑑_frac(𝒫) = min{∑_{𝑗=1}^{𝑛} |𝑥_𝑗 − 𝑣_𝑗| ∶ 𝑥 ∈ 𝐶, 𝑣 ∈ 𝑉(𝒫), 𝑥 ≠ 𝑣}. C

It follows that the fractional distance is a lower bound for the minimum distance of a code:
𝑑(𝐶) ≥ 𝑑frac (𝒫). Moreover, both definitions are related as follows. Recall that on the binary
symmetric channel (BSC), MLD corrects at least ⌈𝑑(𝐶)/2⌉ − 1 bit flips. As shown in [1], LPD
succeeds if at most ⌈𝑑frac (𝒫)/2⌉ − 1 errors occur on the BSC.

Analogously to the minimum distance, the fractional distance is equivalent to the minimum ℓ₁ weight of a non-zero vertex of 𝒫. This property is used by the fractional distance algorithm (FDA) to compute the fractional distance of a binary linear code [1]. If ℳ is the set of inequalities describing 𝒫, let ℳ₀ be the subset of those inequalities which are not active at the all-zero codeword. Note that these are exactly the inequalities with a non-zero right-hand side. In FDA, the weight function ∑_{𝑗∈𝐽} 𝑥_𝑗 is subsequently minimized on 𝒫 ∩ 𝑓 for all 𝑓 ∈ ℳ₀ in order to find the minimum-weight non-zero vertex of 𝒫.

Fractional distance algorithm (FDA)

Input: 𝒫 ⊆ [0, 1]^𝑛.
Output: Minimum-weight non-zero vertex of 𝒫.
1: for all 𝑓 ∈ ℳ₀ do
2: Set 𝒫′ = 𝒫 ∩ 𝑓.
3: Solve min{∑_{𝑗∈𝐽} 𝑥_𝑗 ∶ 𝑥 ∈ 𝒫′}.
4: end for
5: Choose the minimum value obtained over all 𝒫′.

A more significant distance measure than 𝑑_frac is the so-called pseudo-distance, which quantifies the probability that the optimal solution under LPD changes from one vertex of 𝒫 to another [24, 25]. Likewise, the minimum pseudo-weight is defined as the minimum pseudo-distance from 0 to any other vertex of 𝒫 and therefore identifies the vertex (pseudocodeword) which is most likely to cause a decoding failure. Note that the pseudo-distance takes the channel's probability measure into account and thus depends on the chosen channel model.

Although no efficient algorithms are known to compute the exact minimum pseudo-weight of the fundamental polytope of a code, promising heuristics as well as analytical bounds have been proposed [24–26].

6.5 LPD Formulations for Various Code Classes

This section reviews various formulations of the polytope 𝒫 from (6.5), leading to optimized
versions of the general BLPD algorithm for different classes of codes.

In Step 1 of BLPD the LP problem is solved by a general purpose LP solver. These solvers usually employ the simplex method since it performs well in practice. The simplex method iteratively examines vertices of the underlying polytope until the vertex corresponding to the optimal solution is reached. If there exists a neighboring vertex for which the objective function improves, the simplex method moves to this vertex; otherwise it stops. The procedure of moving from one vertex to another is called a simplex iteration. Details on the simplex algorithm can be found in classical books about linear programming (see e.g. [27]).

The efficiency of the simplex method depends on the complexity of the constraint set describing
the underlying polytope. Several such explicit descriptions of the fundamental polytope 𝒫
have been proposed in the LPD literature. Some can be used for any binary linear code whereas
others are specialized for a specific code class. Using alternative descriptions of 𝒫, alternative
LP decoders are obtained. In the following, we are going to present different LP formulations.


6.5.1 LP Formulations for LDPC Codes

The solution algorithm referred to as BLPD in Section 6.4 was introduced by Feldman et al. [2].
In order to describe 𝒫 explicitly, three alternative constraint sets are suggested by the authors, given by the formulations BLPD1, BLPD2, and BLPD3. In the following, some abbreviations are used
to denote both the formulation and the associated solution (decoding) algorithm, e.g., solving
an LP, subgradient optimization, neighborhood search. The meaning will be clear from the
context.

The first LP formulation, BLPD1, of [2] is applicable to LDPC codes.

min 𝜆ᵀ𝑥 (BLPD1) (6.6a)
s.t. ∑_{𝑆∈𝐸_𝑖} 𝑤_{𝑖,𝑆} = 1, 𝑖 = 1, … , 𝑚 (6.6b)
𝑥_𝑗 = ∑_{𝑆∈𝐸_𝑖 ∶ 𝑗∈𝑆} 𝑤_{𝑖,𝑆}, ∀𝑗 ∈ 𝑁_𝑖, 𝑖 = 1, … , 𝑚 (6.6c)
0 ≤ 𝑥_𝑗 ≤ 1, 𝑗 = 1, … , 𝑛 (6.6d)
0 ≤ 𝑤_{𝑖,𝑆} ≤ 1, ∀𝑆 ∈ 𝐸_𝑖, 𝑖 = 1, … , 𝑚 (6.6e)

Here, 𝐸_𝑖 = {𝑆 ⊆ 𝑁_𝑖 ∶ |𝑆| even} is the set of valid bit configurations within 𝑁_𝑖. The auxiliary variables 𝑤_{𝑖,𝑆} used in this formulation indicate which bit configuration 𝑆 ∈ 𝐸_𝑖 is taken at parity check 𝑖. In case of an integral solution, (6.6b) ensures that exactly one such configuration is attained at every check node, while (6.6c) connects the actual code bits, modeled by the variables 𝑥_𝑗, to the auxiliary variables: 𝑥_𝑗 = 1 if and only if the set 𝑆 ∈ 𝐸_𝑖 contains 𝑗 for every check node 𝑖. Note that here we consider the LP relaxation, so it is not guaranteed that a solution of the above program is indeed integral.

A second linear programming formulation for LDPC codes, BLPD2, is obtained by employing the so-called forbidden set (FS) inequalities [28]. The FS inequalities are motivated by the observation that one can explicitly forbid those value assignments to variables where |𝑆| is odd. For all local codewords in 𝐶_𝑖 it holds that

∑_{𝑗∈𝑆} 𝑥_𝑗 − ∑_{𝑗∈𝑁_𝑖⧵𝑆} 𝑥_𝑗 ≤ |𝑆| − 1 for all 𝑆 ∈ Σ_𝑖,

where Σ_𝑖 = {𝑆 ⊆ 𝑁_𝑖 ∶ |𝑆| odd}. Feldman et al. show in [2] that for each single parity-check code 𝐶_𝑖, the FS inequalities together with the box inequalities 0 ≤ 𝑥_𝑗 ≤ 1, 𝑗 ∈ 𝐽, completely and non-redundantly describe conv(𝐶_𝑖) (the case |𝑁_𝑖| = 3 as depicted in Figure 6.2 is the only exception where the box inequalities are not needed). In a more general setting, Grötschel proved this result for cardinality homogeneous set systems [29].

If the rows of 𝐻 are considered as dual codewords, the set of FS inequalities is a reinvention of the cocircuit inequalities explained in Section 6.3. BLPD2 is given below.

min 𝜆ᵀ𝑥 (BLPD2)
s.t. ∑_{𝑗∈𝑆} 𝑥_𝑗 − ∑_{𝑗∈𝑁_𝑖⧵𝑆} 𝑥_𝑗 ≤ |𝑆| − 1, ∀𝑆 ∈ Σ_𝑖, 𝑖 = 1, … , 𝑚
0 ≤ 𝑥_𝑗 ≤ 1, 𝑗 = 1, … , 𝑛
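A deliberately naive realization of BLPD with the BLPD2 constraint set, using an off-the-shelf LP solver, might look as follows (our own sketch based on scipy, not code from the paper; the constraint generation is exponential in the check node degrees and hence only viable for low-density matrices):

import itertools
import numpy as np
from scipy.optimize import linprog

def fs_constraints(H):
    # One row per FS inequality:
    #   sum_{j in S} x_j - sum_{j in N_i \ S} x_j <= |S| - 1,  S in Sigma_i.
    m, n = H.shape
    A, b = [], []
    for i in range(m):
        N_i = np.flatnonzero(H[i])
        for size in range(1, len(N_i) + 1, 2):       # odd |S| only
            for S in itertools.combinations(N_i, size):
                row = np.zeros(n)
                row[N_i] = -1.0
                row[list(S)] = 1.0
                A.append(row)
                b.append(size - 1.0)
    return np.array(A), np.array(b)

def blpd2_decode(H, llr):
    A, b = fs_constraints(H)
    res = linprog(llr, A_ub=A, b_ub=b,
                  bounds=[(0.0, 1.0)] * H.shape[1], method="highs")
    x = res.x
    if np.allclose(x, np.round(x), atol=1e-6):
        return np.round(x).astype(int)   # integral: ML codeword (certificate)
    return None                          # fractional optimum: decoding error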

Feldman et al. [2] apply BLPD using formulations BLPD1 or BLPD2 to LDPC codes. Under the BSC, the error-correcting performance of BLPD is compared with MSAD on a random rate-1/2 LDPC code with 𝑛 = 200, 𝑑_𝑣 = 3, 𝑑_𝑐 = 6; with MSAD and SPAD on a random rate-1/4 LDPC code with 𝑛 = 200, 𝑑_𝑣 = 3, 𝑑_𝑐 = 4; and with MSAD, SPAD, and MLD on a random rate-1/4 LDPC code with 𝑛 = 60, 𝑑_𝑣 = 3, 𝑑_𝑐 = 4. On these codes, BLPD performs better than MSAD but worse than SPAD. Using BLPD2, the FDA is applied to random rate-1/4 LDPC codes with 𝑛 = 100, 200, 300, 400, 𝑑_𝑣 = 3 and 𝑑_𝑐 = 4 from an ensemble of Gallager [30]. For (𝑛 − 1, 𝑛) Reed-Muller codes [31] with 4 ≤ 𝑛 ≤ 512 they compare the classical distance with the fractional distance. The numerical results suggest that the gap between both distances grows with increasing block length.

Another formulation for LDPC codes is given in Section 6.6.2 in the context of efficient imple-
mentations.

In a remarkable work, Feldman and Stein [32] have shown that the Shannon capacity of a
channel can be achieved with LP decoding, which implies a polynomial-time decoder and the
availability of an ML certificate. To this end, they use a slightly modified version of BLPD1
restricted to expander codes, which are a subclass of LDPC codes. See [32] for a formal definition
of expander codes as well as the details of the corresponding decoder.

6.5.2 LP Formulations for Codes with High-Density Parity-Check Matrices

The number of variables and constraints in BLPD1, as well as the number of constraints in BLPD2, increase exponentially in the check node degree. Thus, for codes with high-density parity-check matrices, BLPD1 and BLPD2 are computationally inefficient. A polynomial-sized formulation, BLPD3, is based on the parity polytope of Yannakakis [33]. There are two types of auxiliary variables in BLPD3. The variable 𝑝_{𝑖,𝑘} is set to one if 𝑘 variable nodes are set to one in the neighborhood of parity-check 𝑖, for 𝑘 in the index set

𝐾_𝑖 = {0, 2, … , 2⌊|𝑁_𝑖|/2⌋}.


Figure 6.3: Check node decomposition for high-density parity-check codes according to [34]. [Drawing omitted: a degree-4 check on 𝑣₁, … , 𝑣₄ is replaced by two degree-3 checks sharing the auxiliary variable 𝑣₅.]

Furthermore, the variable 𝑞_{𝑖,𝑗,𝑘} is set to one if variable node 𝑗 is one of the 𝑘 variable nodes set to one in the neighborhood of check node 𝑖.

min 𝜆ᵀ𝑥 (BLPD3)
s.t. 𝑥_𝑗 = ∑_{𝑘∈𝐾_𝑖} 𝑞_{𝑖,𝑗,𝑘}, ∀𝑖 ∈ 𝑁_𝑗, 𝑗 = 1, … , 𝑛
∑_{𝑘∈𝐾_𝑖} 𝑝_{𝑖,𝑘} = 1, 𝑖 = 1, … , 𝑚
∑_{𝑗∈𝑁_𝑖} 𝑞_{𝑖,𝑗,𝑘} = 𝑘 𝑝_{𝑖,𝑘}, 𝑘 ∈ 𝐾_𝑖, 𝑖 = 1, … , 𝑚
0 ≤ 𝑥_𝑗 ≤ 1, 𝑗 = 1, … , 𝑛
0 ≤ 𝑝_{𝑖,𝑘} ≤ 1, 𝑘 ∈ 𝐾_𝑖, 𝑖 = 1, … , 𝑚
0 ≤ 𝑞_{𝑖,𝑗,𝑘} ≤ 𝑝_{𝑖,𝑘}, 𝑘 ∈ 𝐾_𝑖, 𝑗 = 1, … , 𝑛, 𝑖 ∈ 𝑁_𝑗

Feldman et al. [2] show that BLPD1, BLPD2, and BLPD3 are equivalent in the sense that the
𝑥-variables of the optimal solutions in all three formulations take the same values.

The number of variables and constraints in BLPD3 increases as 𝑂(𝑛³). By applying a decomposition approach, Yang et al. [34] show that an alternative LP formulation can be obtained whose size is linear in the block length and the check node degrees (it should be noted that, independently from [34], a similar decomposition approach was also proposed in [35]). In the LP formulation of [34] a high-degree check node is decomposed into several low-degree check nodes. Thus, the resulting Tanner graph contains auxiliary check and variable nodes. Figure 6.3 illustrates this decomposition technique: a check node with degree 4 is decomposed into 2 parity checks, each with degree at most 3. The parity-check nodes are illustrated by squares. In the example, original variables are denoted by 𝑣₁, … , 𝑣₄ while the auxiliary variable node is named 𝑣₅. In general, this decomposition technique is iteratively applied until every check node has degree less than 4. The authors show that the total number of variables in the formulation is less than doubled by the decomposition. We refer to [34] for the details of the decomposition.
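A possible implementation of one decomposition step is sketched below (a chain-style variant written by us for illustration; [34] contains the authoritative construction): it takes the support of a single check and returns checks of degree at most 3 over the original plus auxiliary variables.

def decompose_check(support, next_aux):
    # Replace x_{j1} + ... + x_{jd} = 0 (mod 2) by a chain of checks of
    # degree at most 3; next_aux is the index of the first new auxiliary
    # variable.  Returns (list of checks, number of auxiliaries used).
    if len(support) <= 3:
        return [list(support)], 0
    checks, aux = [], next_aux
    carry = support[0]
    for j in support[1:-2]:
        checks.append([carry, j, aux])   # carry + x_j + new auxiliary
        carry = aux
        aux += 1
    checks.append([carry, support[-2], support[-1]])
    return checks, aux - next_aux

For a degree-4 check this yields the two degree-3 checks of Figure 6.3 with a single auxiliary variable; in general a degree-𝑑 check needs 𝑑 − 3 auxiliaries, consistent with the variable count being less than doubled.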

For ease of notation, suppose 𝐾 is the set of parity-check nodes after decomposition. If 𝑑_𝑐(𝑘) = 3, 𝑘 ∈ 𝐾, then the parity-check constraint 𝑘 is of the form 𝑣_{𝑘₁} + 𝑣_{𝑘₂} + 𝑣_{𝑘₃} ≡ 0 (mod 2). Note that with our notation some of these variables 𝑣_{𝑘ᵢ} might represent the same variable node 𝑣_𝑗; e.g., 𝑣₅ from Figure 6.3 would appear in two constraints of the above form, once for each of the two decomposed checks. Yang et al. show that the parity-check constraint 𝑣_{𝑘₁} + 𝑣_{𝑘₂} + 𝑣_{𝑘₃} ≡ 0 (mod 2) can
be replaced by the linear constraints

𝑣_{𝑘₁} + 𝑣_{𝑘₂} + 𝑣_{𝑘₃} ≤ 2,
𝑣_{𝑘₁} − 𝑣_{𝑘₂} − 𝑣_{𝑘₃} ≤ 0,
𝑣_{𝑘₂} − 𝑣_{𝑘₁} − 𝑣_{𝑘₃} ≤ 0,
𝑣_{𝑘₃} − 𝑣_{𝑘₁} − 𝑣_{𝑘₂} ≤ 0

(for a single check node of degree 3 the box inequalities are not needed). If 𝑑_𝑐(𝑘) = 2 then 𝑣_{𝑘₁} = 𝑣_{𝑘₂} along with the box constraints models the parity check. The constraint set of the resulting LP formulation, which we call cascaded linear programming decoding (CLPD), is the union of all constraints modeling the |𝐾| parity checks.

min 𝜆̄ᵀ𝑣 (CLPD)
s.t. ∑_{𝑗∈𝑆} 𝑣_𝑗 − ∑_{𝑗∈𝑁_𝑘⧵𝑆} 𝑣_𝑗 ≤ |𝑆| − 1, ∀𝑆 ∈ Σ_𝑘, 𝑘 = 1, … , |𝐾|
0 ≤ 𝑣_𝑗 ≤ 1 if 𝑑_𝑐(𝑖) ≤ 2 for all 𝑖 with 𝑗 ∈ 𝑁_𝑖

In the objective function, only the 𝑣 variables corresponding to the original 𝑥 variables have non-zero coefficients. Thus, the objective function of CLPD is the same as that of BLPD1. The constraints in CLPD are the FS inequalities used in BLPD2, with the property that the degree of each check node is less than 4.
Yang et al. prove that the formulations introduced in [2] and CLPD are equivalent. Again, equivalence is used in the sense that in an optimal solution the 𝑥-variables of BLPD1, BLPD2, BLPD3, and the variables of the CLPD formulation which correspond to original 𝑥-variables take the same values. Moreover, it is shown that CLPD can be used in FDA. As a result, the computation of the fractional distance for codes with high-density parity-check matrices is also facilitated. Note that using BLPD2, the FDA algorithm has polynomial running time only for LDPC codes. If 𝒫 is described by the constraint set of CLPD, then in the first step of the FDA it is sufficient to choose the set ℱ from the facets formed by cutting planes of type 𝑣_{𝑘₁} + 𝑣_{𝑘₂} + 𝑣_{𝑘₃} = 2, where 𝑣_{𝑘₁}, 𝑣_{𝑘₂}, and 𝑣_{𝑘₃} are variables of the CLPD formulation. Additionally, an adaptive branch & bound method is suggested in [36] to find better bounds for the minimum distance of a code. On a random rate-1/4 LDPC code with 𝑛 = 60, 𝑑_𝑣 = 3 and 𝑑_𝑐 = 4, it is demonstrated that this yields a better lower bound than the fractional distance does.

6.5.3 LP Formulations for Turbo-Like Codes

The various LP formulations outlined so far have in common that they are derived from a parity-
check matrix which defines a specific code. A different approach is to describe the encoder by
means of a finite state machine, which is the usual way to define so-called convolutional codes.
The bits of the information word are subsequently fed into the machine, each causing a state
change that emits a fixed number of output bits depending on both the current state and the
input. In a systematic code, the output always contains the input bit. The codeword, consisting


Figure 6.4: Excerpt from a trellis graph with four states and initial state 0. The style of an edge indicates the respective information bit, while the labels refer to the single parity bit. [Drawing omitted.]

of the concatenation of all outputs, can thus be partitioned into the systematic part which is a
copy of the input and the remaining bits, being referred to as the parity output.

A convolutional code is naturally represented by a trellis graph (Figure 6.4), which is obtained
by unfolding the state diagram in the time domain. Each vertex of the trellis represents the state
at a specific point in time, while edges correspond to valid transitions between two subsequent
states and are labelled by the corresponding input and output bits. Each path from the starting
node to the end node corresponds to a codeword.4 The cost of a codeword is derived from the
received LLR values and the edge labels on the path associated with this codeword. See [23] for
an in-depth survey of these concepts.

Convolutional codes are the building blocks of turbo codes, which revolutionized coding theory because of their near Shannon limit error-correcting performance [37]. An (𝑛, 𝑘) turbo code consists of two convolutional codes 𝐶_𝑎 and 𝐶_𝑏, each of input length 𝑘, which are linked by a so-called interleaver that requires the information bits of the two encoders to match after being scrambled by some permutation 𝜋 ∈ 𝕊_𝑘 which is fixed for a given code.5 It is this coupling of rather weak individual codes and the increase of complexity arising therefrom that entails the vast performance gain of turbo codes. A typical turbo code (and only this case is covered here; it is straightforward to generalize) consists of two identical systematic encoders of rate 1/2 each. Only one of the encoders 𝐶_𝑎 and 𝐶_𝑏, however, contributes its systematic part to the resulting codeword, yielding an overall rate of 1/3, i.e., 𝑛 = 3𝑘 (since their systematic parts differ only by a permutation, including both would imply an embedded repetition code). We thus partition a codeword 𝑥 into the systematic part 𝑥^𝑠 and the parity outputs 𝑥^𝑎 and 𝑥^𝑏 of 𝐶_𝑎 and 𝐶_𝑏, respectively.
4 We intentionally do not discuss trellis termination here and assume that the encoder always ends in a fixed terminal state; cf. [23] for details.
5 Using exactly two constituent convolutional encoders eases notation and is the most common case, albeit not being essential for the concept—in fact, recent developments suggest that the error-correcting performance benefits from adding a third encoder [38].


Figure 6.5: The Forney-style factor graph of a turbo code. The interleaver 𝜋 links the systematic bits 𝑥^𝑠 of both encoders 𝐶_𝑎 (upper part) and 𝐶_𝑏 (lower part). [Drawing omitted.]

A turbo code can be compactly represented by a so-called Forney-style factor graph (FFG) as shown in Figure 6.5. As opposed to Tanner graphs, in an FFG all nodes are functional nodes, whereas the (half-)edges correspond to variables. In our case, there are variables of two types, namely state variables 𝑠^𝜈_𝑗 (𝜈 ∈ {𝑎, 𝑏}), reflecting the state of 𝐶_𝜈 at time step 𝑗, and a variable for each bit of the codeword 𝑥. Each node 𝑇^𝜈_𝑗 represents the indicator function for a valid state transition in 𝐶_𝜈 at time 𝑗 and is thus incident to one systematic and one parity variable as well as the “before” and “after” states 𝑠^𝜈_{𝑗−1} and 𝑠^𝜈_𝑗, respectively. Note that such a node 𝑇^𝜈_𝑗 corresponds to a vertical “slice” (often called a segment) of the trellis graph of 𝐶_𝜈, and each valid configuration of 𝑇^𝜈_𝑗 is represented by exactly one edge in the respective segment.

Turbo codes are typically decoded by IMPD techniques operating on the factor graph. Feldman [1], in contrast, introduced an LP formulation, turbo code linear programming decoding (TCLPD), for this purpose. This serves as an example that mathematical programming is a promising approach in decoding even beyond formulations based on parity-check matrices.
In TCLPD, the trellis graph of each constituent encoder 𝐶_𝜈 is modeled by flow conservation and capacity constraints [39], along with side constraints appropriately connecting the flow variables 𝑓^𝜈_𝑒 to the auxiliary variables 𝑥^𝑎 and 𝑥^𝑏, respectively, which embody the codeword bits.

For 𝜈 ∈ {𝑎, 𝑏}, let 𝐺_𝜈 = (𝑆_𝜈, 𝐸_𝜈) be the trellis corresponding to 𝐶_𝜈, where 𝑆_𝜈 is the index set of nodes (states) and 𝐸_𝜈 is the set of edges (state transitions) 𝑒 in 𝐺_𝜈. Let 𝑠_{start,𝜈} and 𝑠_{end,𝜈} denote the unique start and end node, respectively, of 𝐺_𝜈. We can now define a feasible flow 𝑓^𝜈 in the trellis 𝐺_𝜈 by the system

∑_{𝑒∈out(𝑠_{start,𝜈})} 𝑓^𝜈_𝑒 = 1, ∑_{𝑒∈in(𝑠_{end,𝜈})} 𝑓^𝜈_𝑒 = 1, (6.7a)
∑_{𝑒∈out(𝑠)} 𝑓^𝜈_𝑒 = ∑_{𝑒∈in(𝑠)} 𝑓^𝜈_𝑒 ∀ 𝑠 ∈ 𝑆_𝜈 ⧵ {𝑠_{start,𝜈}, 𝑠_{end,𝜈}}, (6.7b)
𝑓^𝜈_𝑒 ≥ 0 ∀ 𝑒 ∈ 𝐸_𝜈. (6.7c)

Let 𝐼^𝜈_𝑗 and 𝑂^𝜈_𝑗 denote the sets of edges in 𝐺_𝜈 whose corresponding input and output bit, respectively, is a 1 (both being subsets of the 𝑗-th segment of 𝐺_𝜈); the following constraints relate the codeword bits to the flow variables:

𝑥^𝜈_𝑗 = ∑_{𝑒∈𝑂^𝜈_𝑗} 𝑓^𝜈_𝑒 for 𝑗 = 1, … , 𝑘 and 𝜈 ∈ {𝑎, 𝑏}, (6.8a)
𝑥^𝑠_𝑗 = ∑_{𝑒∈𝐼^𝑎_𝑗} 𝑓^𝑎_𝑒 for 𝑗 = 1, … , 𝑘, (6.8b)
𝑥^𝑠_{𝜋(𝑗)} = ∑_{𝑒∈𝐼^𝑏_𝑗} 𝑓^𝑏_𝑒 for 𝑗 = 1, … , 𝑘. (6.8c)

We can now state TCLPD as

min (𝜆^𝑠)ᵀ𝑥^𝑠 + ∑_{𝜈∈{𝑎,𝑏}} (𝜆^𝜈)ᵀ𝑥^𝜈 (TCLPD)
s.t. (6.7a)–(6.8c) hold,

where 𝜆 is split in the same way as 𝑥.
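In code, the flow system (6.7a)–(6.7c) for one trellis is simply a stack of equality rows; a schematic numpy helper (our assumptions: edges given as (tail, head) pairs, one flow variable per edge) could read:

import numpy as np

def flow_system(nodes, edges, s_start, s_end):
    # Rows of A_eq f = b_eq encoding (6.7a) and (6.7b); together with
    # f >= 0 (6.7c) this feeds a standard LP solver.  The unit-inflow
    # condition at s_end is implied by the other rows and omitted here.
    A, b = [], []
    row = np.zeros(len(edges))
    for idx, (u, v) in enumerate(edges):
        if u == s_start:
            row[idx] = 1.0                # unit flow out of the start node
    A.append(row); b.append(1.0)
    for s in nodes:
        if s in (s_start, s_end):
            continue
        row = np.zeros(len(edges))
        for idx, (u, v) in enumerate(edges):
            if v == s: row[idx] += 1.0    # inflow to s
            if u == s: row[idx] -= 1.0    # outflow from s
        A.append(row); b.append(0.0)      # conservation at node s
    return np.array(A), np.array(b)

The side constraints (6.8a)–(6.8c) then merely add further equality rows coupling the 𝑥 variables to the 𝑓 variables.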


The formulation straightforwardly generalizes to all sorts of “turbo-like” codes, i.e., codes built
by convolutional codes plus interleaver conditions. In particular, Feldman and Karger have
applied TCLPD to repeat-accumulate (RA(𝑙)) codes [40]. The encoder of an RA(𝑙) repeats the
information bits 𝑙 times, and then sends them to an interleaver followed by an accumulator,
which is a two-state convolutional encoder. The authors derive bounds on the error rate of
TCLPD for RA codes which were later improved and extended by Halabi and Even [41] as well
as by Goldenberg and Burshtein [42].
Note that all 𝑥 variables in TCLPD are auxiliary: we could replace each occurrence by the sum of flow variables defining it. In doing so, (6.8b) and (6.8c) break down to the condition

∑_{𝑒∈𝐼^𝑎_{𝜋(𝑗)}} 𝑓^𝑎_𝑒 = ∑_{𝑒∈𝐼^𝑏_𝑗} 𝑓^𝑏_𝑒 for 𝑗 = 1, … , 𝑘. (6.9)

Because the rest of the constraints defines a standard network flow, TCLPD models a minimum
cost flow problem plus the 𝑘 additional side constraints (6.9). Using a general purpose LP
solver does not exploit this combinatorial substructure. As was suggested already in [1], in
[43] Lagrangian relaxation is applied to (6.9) in order to recover the underlying shortest-path
problem. Additionally, the authors of [43] use a heuristic based on computing the 𝐾 shortest
paths in a trellis to improve the decoding performance. Via the parameter 𝐾 the trade-off
between algorithmic complexity and error-correcting performance can be controlled.

6.6 Efficient LP Solvers for BLPD

A successful realization of BLPD requires an efficient LP solver. To this end, several ideas have
been suggested in the literature. CLPD (cf. Section 6.5) can be considered an efficient LPD
approach since the number of variables and constraints are significantly reduced. We review
several others in this section.


6.6.1 Solving the Separation Problem

The approach of Taghavi and Siegel [44] tackles the large number of constraints in BLPD2. In their separation approach, called adaptive linear programming decoding (ALPD), not all FS inequalities are included in the LP formulation as in BLPD2. Instead, they are iteratively added when needed. As in Definition 6.5, the general idea is to start with a crude LP formulation and then improve it. Note that this idea can also be used to improve the error-correcting performance (see Section 6.7). In the initialization step, the trivial LP min{𝜆ᵀ𝑥 ∶ 𝑥 ∈ [0, 1]^𝑛} is solved. Let (𝑥∗)^𝑘 be the optimal solution in iteration 𝑘. Taghavi and Siegel show that it can be checked in 𝑂(𝑚 𝑑_𝑐^max + 𝑛 log 𝑛) time whether (𝑥∗)^𝑘 violates any FS inequality derived from 𝐻_{𝑖,·} 𝑥 ≡ 0 (mod 2) for all 𝑖 ∈ 𝐼 (recall that 𝑚 × 𝑛 is the dimension of 𝐻 and 𝑑_𝑐^max is the maximum check node degree). This check can be considered as a special case of the greedy separation algorithm (GSA) introduced in [29]. If some of the FS inequalities are violated then these inequalities are added to the formulation and the modified LP is solved again with the new inequalities. ALPD stops if the current optimal solution (𝑥∗)^𝑘 satisfies all FS inequalities. If (𝑥∗)^𝑘 is integral then it is the ML codeword, otherwise an error is output. ALPD does not yield an improvement in terms of frame error rate since the same solutions are found as in the formulations in the previous section. However, the computational complexity is reduced.
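The heart of this separation step can be sketched as follows (our reading of the procedure in [44]; all names are ours): for check node 𝑖, let 𝑆 = {𝑗 ∈ 𝑁_𝑖 ∶ 𝑥∗_𝑗 > 1/2}, and if |𝑆| is even, toggle membership of the index whose value is closest to 1/2; the resulting odd-cardinality set indexes the only FS inequality of check 𝑖 that can possibly be violated at 𝑥∗.

import numpy as np

def violated_fs_cut(H_row, x, tol=1e-9):
    # Returns the unique candidate violated FS set S of this check node,
    # or None if all FS inequalities of the check are satisfied at x*.
    N_i = np.flatnonzero(H_row)
    S = {j for j in N_i if x[j] > 0.5}
    if len(S) % 2 == 0:                  # make |S| odd with least damage
        flip = min(N_i, key=lambda j: abs(x[j] - 0.5))
        S ^= {flip}                      # toggle membership of flip
    lhs = sum(x[j] for j in S) - sum(x[j] for j in N_i if j not in S)
    if lhs > len(S) - 1 + tol:
        return sorted(S)
    return None

Embedded in a cutting-plane loop that starts from min{𝜆ᵀ𝑥 ∶ 𝑥 ∈ [0, 1]^𝑛}, adds the returned cuts, and re-solves, this yields the ALPD iteration.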

An important algorithmic result of [44] is that ALPD converges to the same optimal solution as BLPD2 with significantly fewer constraints. It is shown empirically that the last iteration of ALPD uses fewer constraints than the formulations BLPD2, BLPD3, and CLPD. Taghavi and Siegel [44] prove that their algorithm converges to the optimal solution on the fundamental polytope after at most 𝑛 iterations with at most 𝑛(𝑚 + 2) constraints.

Under the binary-input additive white Gaussian noise channel (BIAWGNC), [44] uses various random (𝑑_𝑣, 𝑑_𝑐)-regular codes to test the effect of changing the check node degree, the block length, and the code rate on the number of FS inequalities generated and the convergence of their algorithm. Setting 𝑛 = 360 and rate 𝑅 = 1/2, the authors vary the check node degree in the range of 4 to 40 in their computational testing. It is observed that the average and the maximum number of FS inequalities remain below 270. The effect of changing the block length 𝑛 between 30 and 1920 under 𝑅 = 1/2 is demonstrated on a (3, 6)-regular LDPC code. For these codes, it is demonstrated that the number of FS inequalities used in the final iteration is generally between 0.6𝑛 and 0.7𝑛. Moreover, it is reported that the number of iterations remains below 16. The authors also investigate the effect of the rate on the number of FS inequalities created. Simulations are performed on codes with 𝑛 = 120 and 𝑑_𝑣 = 3 where the number of parity checks 𝑚 varies between 15 and 90. For most values of 𝑚 it is observed that the average number of FS inequalities ranges between 1.1𝑚 and 1.2𝑚. For ALPD, BLPD2, and SPAD (50 iterations), the average decoding time is tested for (3, 6)-regular and (4, 8)-regular LDPC codes with various block lengths. It is shown that ALPD outperforms BLPD with respect to computation time, while still being slower than SPAD. Furthermore, increasing the check node degree does not increase the computation time of ALPD as much as that of BLPD. The behavior of ALPD, in terms of the number of iterations and the FS inequalities used, under increasing SNR is tested on a (3, 6)-regular LDPC code with 𝑛 = 240. It is concluded that ALPD performs more

iterations and uses more FS inequalities for the instances where it fails. Thus, decoding time decreases with increasing SNR.

In [45] ALPD is improved further in terms of complexity. The authors use some structural properties of the fundamental polytope. Let (𝑥∗)^𝑘 be an optimal solution in iteration 𝑘. In [44] it is shown that, if (𝑥∗)^𝑘 does not satisfy an FS inequality derived from check node 𝑖, then (𝑥∗)^𝑘 satisfies all other FS inequalities derived from 𝑖 with strict inequality. Based on this result, Taghavi et al. [45] modify ALPD and propose the decoding approach we refer to as modified adaptive linear programming decoding (MALPD). In the (𝑘 + 1)st iteration of MALPD, it is checked in 𝑂(𝑚 𝑑_𝑐^max) time whether (𝑥∗)^𝑘 violates any FS inequality derived from 𝐻_{𝑖,·} 𝑥 ≡ 0 (mod 2) for some 𝑖 ∈ 𝐼. This check is performed only for those parity checks 𝑖 ∈ 𝐼 which do not induce any active FS inequality at (𝑥∗)^𝑘. Moreover, it is proved that inactive FS inequalities at iteration 𝑘 can be dropped. In any iteration of MALPD, there are at most 𝑚 FS inequalities. However, the dropped inequalities might be inserted again in a later iteration; therefore the number of iterations for MALPD can be higher than for ALPD.

6.6.2 Message Passing-Like Algorithms

An approach towards low complexity LPD of LDPC codes was proposed by Vontobel and Kötter
in [46]. Based on an FFG representation of an LDPC code, they derive an LP, called primal
linear programming decoding (PLPD), which is based on BLPD1. The FFG, shown in Figure 6.6,
and the Tanner graph are related as follows.

[Figure omitted: the FFG drawing shows check nodes 𝐶i with incident edge variables 𝑣i,j,
repetition-code nodes 𝐴j with incident edge variables 𝑢i,j (among them 𝑢0,j, attached to the
codeword variable 𝑥j), and “=” nodes connecting the two families of variables.]

Figure 6.6: A Forney-style factor graph for PLPD.

For each parity check, the FFG exhibits a node 𝐶i which is incident to a variable-edge 𝑣i,j for
each 𝑗 ∈ 𝑁i and demands those adjacent variables to form a configuration that is valid for the
local code 𝐶i, i.e., their sum must be even. This corresponds to a check node in the Tanner
graph and thus to (6.6b) and (6.6c), except that now there are, for the moment, independent
local variables 𝑣i,j for each 𝐶i. Additionally, the FFG generalizes the concept of row-wise local
codes 𝐶i to the columns of 𝐻, in such a way that the 𝑗th column is considered a local repetition
code 𝐴j that requires the auxiliary variables 𝑢i,j for each 𝑖 ∈ 𝑁j ∪ {0} to be either all 1 or all
0. By this, the variable nodes of the Tanner graph are replaced by check nodes 𝐴j—recall that
in an FFG all nodes have to be check nodes. There is a third type of factor nodes, labelled by
“=”, which simply require all incident variables to take on the same value. These are used to
establish consistency between the row-wise variables 𝑣i,j and the column-wise variables 𝑢i,j as
well as to connect the codeword variables 𝑥j to the configurations of the 𝐴j.
From this discussion it is easily seen that the FFG indeed ensures that any configuration of the
𝑥j is a valid codeword. The outcome of writing down the constraints for each node and relaxing
integrality conditions on all variables is the LP

    min  𝜆ᵀ𝑥                                                        (PLPD)
    s.t. 𝑥j = 𝑢0,j                        for 𝑗 = 1, …, 𝑛,
         𝑢i,j = 𝑣i,j                      for all (𝑖, 𝑗) ∈ 𝐼 × 𝐽 with 𝐻i,j = 1,
         𝑢i,j = ∑_{𝑆 ∈ 𝐴j : 𝑆 ∋ 𝑖} 𝛼S,j   for all 𝑖 ∈ 𝑁j ∪ {0}, 𝑗 = 1, …, 𝑛,
         ∑_{𝑆 ∈ 𝐴j} 𝛼S,j = 1              for 𝑗 = 1, …, 𝑛,
         𝑣i,j = ∑_{𝑆 ∈ 𝐸i : 𝑆 ∋ 𝑗} 𝑤S,i   for all 𝑗 ∈ 𝑁i, 𝑖 = 1, …, 𝑚,
         ∑_{𝑆 ∈ 𝐸i} 𝑤S,i = 1              for 𝑖 = 1, …, 𝑚,
         𝛼S,j ≥ 0                         for all 𝑆 ∈ 𝐴j, 𝑗 = 1, …, 𝑛,
         𝑤S,i ≥ 0                         for all 𝑆 ∈ 𝐸i, 𝑖 = 1, …, 𝑚,

where the sets 𝐸i are defined as in BLPD1 and the sum over 𝑆 ∈ 𝐴j ranges over the valid
configurations of the repetition code 𝐴j.


While bloating BLPD1 in this manner seems inefficient at first glance, the reason behind it is that
the LP dual of PLPD leads to an FFG which is topologically equivalent to that of the primal LP,
which makes it possible to exploit the graphical structure for solving the dual. After manipulating
constraints of the dual problem to obtain a closely related, “softened” dual linear programming
decoding (SDLPD) formulation, the authors propose a coordinate-ascent-type algorithm resembling
the min-sum algorithm and show convergence under certain assumptions. In this algorithm, all
edges of the FFG are updated according to some schedule. It is shown that the update calculations
required during each iteration can be efficiently performed by the SPAD. The coordinate-ascent-
type algorithm for SDLPD is guaranteed to converge if all edges of the FFG are updated
cyclically.
Under the BIAWGNC, the authors compare the error-correcting performance of the coordinate-
ascent-type algorithm (max iterations: 64, 256) against the performance of the MSAD (max
iterations: 64, 256) on a (3, 6)-regular LDPC code with 𝑛 = 1000 and rate 𝑅 = 1/2. MSAD
performs slightly better than the coordinate-ascent-type algorithm. In summary, [46] shows
that it is possible to develop LP-based algorithms with complexities similar to IMPD.
The convergence and the complexity of the coordinate-ascent-type algorithm proposed in [46]
are studied further in [47] by Burshtein. His algorithm has a new scheduling scheme, and its
convergence rate and computational complexity are analyzed under this scheduling. With
this new scheduling scheme, the decoding algorithm from [46] yields an iterative approximate
LPD algorithm for LDPC codes with complexity in 𝑂(𝑛). The main difference between the
two algorithms is the selection and update of edges of the FFG. In [46] all edges are updated
cyclically during one iteration, whereas in [47] only a few selected edges are updated during
one particular iteration. The edges are chosen according to the variable values obtained during
previous iterations.

6.6.3 Nonlinear Programming Approach

As an approximation of BLPD for LDPC codes, Yang et al. [36] introduce box-constrained
quadratic programming decoding (BCQPD), whose time complexity is linear in the code length.
BCQPD is a nonlinear programming approach derived from the Lagrangian relaxation (see [7]
for an introduction to Lagrangian relaxation) of BLPD1. To obtain BCQPD, a subset of the
constraints is incorporated into the objective function. To simplify notation, Yang
et al. rewrite the constraint blocks (6.6b) and (6.6c) in the general form 𝐴𝑦 = 𝑏 by defining a
single variable vector 𝑦 = (𝑥, 𝑤)ᵀ ∈ {0, 1}^𝐾 (so 𝐾 is the total number of variables in BLPD1) and
choosing 𝐴 and 𝑏 appropriately. Likewise, the objective function coefficients are rewritten in a
vector 𝑐, which equals 𝜆 followed by the appropriate number of zeros. The resulting formulation
is min{𝑐ᵀ𝑦 : 𝐴𝑦 = 𝑏, 𝑦 ∈ [0, 1]^𝐾}. Using a multiplier 𝛼 > 0, the Lagrangian of this problem is

    min  𝑐ᵀ𝑦 + 𝛼 (𝐴𝑦 − 𝑏)ᵀ(𝐴𝑦 − 𝑏)
    s.t. 0 ≤ 𝑦k ≤ 1    for 𝑘 = 1, …, 𝐾.

If 𝐴𝑦 = 𝑏 is violated, then a positive value is added to the original objective function 𝑐ᵀ𝑦, i.e.,
the solution 𝑦 is penalized. Setting 𝑄 = 2𝛼𝐴ᵀ𝐴 and 𝑟 = 𝑐 − 2𝛼𝐴ᵀ𝑏, the BCQPD problem

    min  𝑦ᵀ𝑄𝑦 + 2𝑟ᵀ𝑦                                              (BCQPD)
    s.t. 0 ≤ 𝑦k ≤ 1    for 𝑘 = 1, …, 𝐾

is obtained. Since 𝑄 is a positive semi-definite matrix, i.e., the objective function is convex, and
since the set of constraints constitutes a box, each 𝑦k can be minimized separately. This leads
to efficient serial and parallel decoding algorithms. Two methods are proposed in [36] to solve
the BCQPD problem, the projected successive overrelaxation method (PSORM) and the parallel
gradient projection method (PGPM). These methods are generalizations of Gauss-Seidel and
Jacobi methods [48] with the benefit of faster convergence if proper weight factors are chosen.
PSORM and PGPM benefit from the low-density structure of the underlying parity-check
matrix.
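
To illustrate the flavor of these methods, the following is a minimal projected-gradient sketch
for BCQPD (a simplified stand-in for PGPM, not the algorithm of [36] itself; the constant step
size and iteration count are our assumptions, and the weight factors needed for the stated
convergence behavior are omitted):

    import numpy as np

    def bcqpd_projected_gradient(A, b, c, alpha=1.0, step=0.05, iters=200):
        # Penalized problem from the text: Q = 2*alpha*A^T A, r = c - 2*alpha*A^T b
        Q = 2 * alpha * A.T @ A
        r = c - 2 * alpha * A.T @ b
        y = np.full(c.shape, 0.5)                   # start in the center of the box
        for _ in range(iters):
            grad = 2 * Q @ y + 2 * r                # gradient of y^T Q y + 2 r^T y
            y = np.clip(y - step * grad, 0.0, 1.0)  # project back onto the box [0,1]^K
        return y

For sparse parity-check matrices, the product 𝑄𝑦 can be carried out implicitly as 2𝛼·𝐴ᵀ(𝐴𝑦),
which is where the low-density structure pays off.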
One of the disadvantages of IMPD is the difficulty of analyzing the convergence behavior of such
algorithms. Yang et al. showed both theoretically and empirically that BCQPD converges under
some assumptions if PSORM or PGPM is used to solve the quadratic programming problem.
Moreover, the complexity of BCQPD is smaller than the complexity of SPAD. For numerical
tests, the authors use a product code with block length 4^5 = 1024 and rate (3/4)^5 ≈ 0.237. The
BIAWGNC is used. It is observed that the PSORM method converges faster than PGPM. The
error-correcting performance of SPAD is poor for product codes due to their regular structure.
For the chosen product code, Yang et al. demonstrate that PSORM outperforms SPAD in
computational complexity as well as in error-correcting performance.


6.6.4 Efficient LPD of SPC Product Codes

The class of single parity-check (SPC) product codes is of special interest in [34]. The authors
prove that for SPC product codes the fractional distance is equal to the minimum Hamming
distance, which explains why the error-correcting performance of LPD approaches MLD for
these codes at high SNR values. Furthermore, they propose a low-complexity algorithm which
approximately computes the CLPD optimum for SPC product codes. This approach is based on
the observation that the parity-check matrix of an SPC product code can be decomposed into
component SPC codes. A Lagrangian relaxation of CLPD is obtained by keeping the constraints
from only one component code in the formulation and moving all other constraints to the
objective function with a penalty vector. The resulting Lagrangian dual problem is solved
by subgradient algorithms (see [7]). Two alternatives, subgradient decoding (SD) and joint
subgradient decoding (JSD) are proposed. It can be proved that subgradient decoders converge
under certain assumptions.
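
Schematically, such a scheme can be sketched as follows (a generic projected-subgradient
iteration under our notation, not the exact SD or JSD algorithms of [34]; solve_component, the
minimization oracle over the kept component-code polytope, and the step-size rule are
placeholders):

    import numpy as np

    def subgradient_dual(lam, A2, b2, solve_component, steps=20):
        # Dualize the constraints A2 x = b2 of the other component codes
        # with multipliers u; the kept component is handled by the oracle.
        u = np.zeros(A2.shape[0])
        x = None
        for t in range(1, steps + 1):
            x = solve_component(lam + A2.T @ u)  # inner minimization
            g = A2 @ x - b2                      # subgradient of the dual function
            u = u + (1.0 / t) * g                # diminishing step size
        return x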

The convergence behavior of SD as a function of the number of iterations is tested on
the (4,4) SPC product code, which has length 𝑛 = 256 and rate 𝑅 = (3/4)^4 ≈ 0.32 and is defined
as the product of four SPC codes of length 4 each. All variants tested (obtained by keeping
the constraints from component code 𝑗 = 1, 2, 3, 4 in the formulation) converge in less than 20
iterations. For demonstrating the error-correcting performance of SD if the number of iterations
is set to 5, 10, 20, 100, the (5,2) SPC product code (𝑛 = 25, rate 𝑅 = (4/5)^2 = 0.64) is used. The
error-correcting performance is improved by increasing the number of iterations. Under the
BIAWGNC, this code and the (4,4) SPC product code are used to compare the error-correcting
performance of SD and JSD with the performance of BLPD and MLD. It should be noted that for
increasing SNR values, the error-correcting performance of BLPD converges to that of MLD for
SPC codes. JSD and SD approach the BLPD curve for the code with 𝑛 = 25. For the SPC product
code with 𝑛 = 256 the subgradient algorithms perform worse than BLPD. For both codes, the
error-correcting performance of JSD is superior to SD. Finally, the (10, 3) SPC product code with
𝑛 = 1000 and rate 𝑅 = (9/10)^3 = 0.729 is used to compare the error-correcting performance
of SD and JSD with the SPAD. Again the BIAWGNC is used. It is observed that SD performs
slightly better than the SPAD with a similar computational complexity. JSD improves the
error-correcting performance of the SD at the cost of increased complexity.

6.6.5 Interior Point Algorithms

Efficient LPD approaches based on interior point algorithms are studied by Vontobel [49],
Wadayama [50], and Taghavi et al. [45]. The use of interior point algorithms to solve LP
problems as an alternative to the simplex method was initiated by Karmarkar [51]. In these
algorithms, a starting point in the interior of the feasible set is chosen. This starting point
is iteratively improved by moving through the interior of the polyhedron in some descent
direction until the optimal solution or an approximation is found. There are various interior
point algorithms, and for some of them polynomial-time convergence can be proved. This is an
advantage over the simplex method, which has exponential worst-case complexity.


The proposed interior point algorithms aim at using the special structure of the LP problem.
The resulting running time is a low-degree polynomial in the block length. Thus,
fast decoding algorithms based on interior point algorithms may be developed for codes with
large block lengths. In particular, affine scaling algorithms [49], primal-dual interior point
algorithms [45, 49], and a primal path-following interior point algorithm [50] are considered.
The bottleneck operation in interior point methods is to solve a system of linear equations
depending on the current iteration of the algorithm. Efficient approaches to solve this system
of equations are proposed in [45, 49], the former containing an extensive study, including
investigation of appropriate preconditioners for the often ill-conditioned equation system.
The speed of convergence to the optimal vertex of the algorithms in [50] and [45] under the
BIAWGNC is demonstrated on a nearly (3, 6)-regular LDPC code with 𝑛 = 1008, 𝑅 = 1/2 and
on a randomly generated (3, 6)-regular LDPC code with 𝑛 = 2000, respectively.

6.7 Improving the Error-Correcting Performance of BLPD

The error-correcting performance of BLPD can be improved by techniques from integer pro-
gramming. Most of the improvement techniques can be grouped into cutting plane or branch
& bound approaches. In this section, we review the improved LPD approaches mainly with
respect to this categorization.

6.7.1 Cutting Plane Approaches

The fundamental polytope 𝒫 can be tightened by cutting plane approaches. In the following,
we refer to valid inequalities as inequalities satisfied by all points in conv(𝐶). Valid cuts are
valid inequalities which are violated by some non-integral vertex of the LP relaxation. Feldman
et al. [2] already address this concept; besides applying the “Lift and project” technique which
is a generic tightening method for integer programs [52], they also strengthen the relaxation
by introducing redundant rows into the parity-check matrix (or, equivalently, redundant parity-
checks into the Tanner graph) of the given code (cf. Section 6.2). When using the BLPD2
formulation, we derive additional FS inequalities from the redundant parity-checks without
increasing the number of variables. We refer to such inequalities as redundant parity-check
(RPC) inequalities. RPC inequalities may include valid cuts which increase the possibility that
LPD outputs a codeword. An interesting question relates to the types of inequalities required to
describe the codeword polytope conv(𝐶) exactly. It turns out that conv(𝐶) cannot be described
completely by using only FS and box inequalities; the (7, 3, 4) simplex code (dual of the (7, 4, 3)
Hamming code) is given as a counter-example in [2]. More generally, it can be concluded from
[53] that these types of inequalities do not suffice to describe all facets of a simplex code.
RPCs can also be interpreted as dual codewords. As such, for interesting codes there are
exponentially many RPC inequalities. The RPC inequalities cutting off the non-integral optimal
solutions are called RPC cuts [44]. An analytical study of the circumstances under which RPCs
can induce cuts is carried out in [24]. Most notably, it is shown that RPCs obtained by adding no
more than (𝑔 − 2)/2 dual codewords, where 𝑔 is the length of a shortest cycle in the Tanner
graph, never change the fundamental polytope.
There are several heuristic approaches in the LPD literature to find cut inducing RPCs [2,
44, 54, 55]. In [2], RPCs which result from adding any two rows of 𝐻 are appended to the
original parity-check matrix. The authors of [45] find RPCs by randomly choosing cycles in the
fractional subgraph of the Tanner graph, which is obtained by choosing only the fractional
variable nodes and the check nodes directly connected to them. They give a theorem which
states that every possible RPC cut must be generated by such a cycle. Their approach is a
heuristic one since the converse of that theorem does not hold. In [54] the column index set
corresponding to an optimal LP solution is sorted. By re-arranging 𝐻 and bringing it to row
echelon form, RPC cuts are searched. In [55], the parity-check matrix is reformulated such
that unit vectors are obtained in the columns of the parity-check matrix which correspond to
fractional valued bits in the optimal solution of the current LP. RPC cuts are derived from the
rows of the modified parity-check matrix.
The approaches in [28], [44], and [55] rely on a noteworthy structural property of the fun-
damental polytope. Namely, it can be shown that no check node of the associated Tanner
graph (regardless of the existence of redundant parity-checks) can be adjacent to exactly one
non-integral valued variable node.
Feldman et al. [2] test the lift and project technique on a random rate-1/4 LDPC code with
𝑛 = 36, 𝑑v = 3 and 𝑑c = 4 under the BIAWGNC. Moreover, a random rate-1/4 LDPC code with
𝑛 = 40, 𝑑v = 3, and 𝑑c = 4 is used to demonstrate the error-correcting performance of BLPD
when the original parity-check matrix is extended by all those RPCs obtained by adding any
two rows of the original matrix. Both tightening techniques improve the error-correcting
performance of BLPD, though the benefit of the latter is rather small, due to the above-mentioned
condition on cycle lengths.
The idea of tightening the fundamental polytope is usually implemented as a cutting plane
algorithm, i.e., the separation problem is solved (see Definition 6.5 and Section 6.6.1). In
cutting plane algorithms, an LP is solved which contains only a subset of the constraints of
the corresponding optimization problem. If the optimal LP solution is a codeword then the
cutting plane algorithm terminates and outputs the ML codeword. Otherwise, valid cuts from a
predetermined family of valid inequalities are searched for. If some valid cuts are found, they are
added to the LP formulation and the LP is re-solved. In [44, 54, 55] the family of valid cuts
consists of FS inequalities derived from RPCs.
In [54] the main motivation for the greedy cutting plane algorithm is to improve the fractional
distance. This is demonstrated for the (7, 4, 3) Hamming code, the (24, 12, 8) Golay code and a
(204, 102) LDPC code. As a byproduct, it is shown under the BSC on the (24, 12, 8) Golay code
and a (204, 102) LDPC code that the RPC-based approach of [54] improves the error-correcting
performance of BLPD.
In the improved LPD approach of [44], first ALPD (see Section 6.6) is applied. If the solution is
non-integral, an RPC cut search algorithm is employed. This algorithm can be briefly outlined
as follows:


(1) Given a non-integral optimal LP solution 𝑥∗, remove from the Tanner graph all variable
    nodes 𝑗 for which 𝑥∗j is integral.
(2) Find a cycle by randomly walking through the pruned Tanner graph.
(3) Sum up (in 𝔽2) the rows of 𝐻 which correspond to the check nodes in the cycle.
(4) Check if the resulting RPC induces a cut.
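
A minimal sketch of steps (1)–(4) in Python (our illustration; find_cycle, the random walk of
step (2), is assumed given, and most_violated_fs_cut is the FS separation routine sketched in
Section 6.6.1):

    import numpy as np

    def rpc_cut_search(H, x, find_cycle, tol=1e-6):
        # H: 0/1 integer matrix; x: fractional optimal LP solution
        frac = [j for j in range(H.shape[1]) if tol < x[j] < 1 - tol]    # step (1)
        rows = find_cycle(H, frac)                   # check nodes on a cycle, step (2)
        rpc = np.bitwise_xor.reduce(H[rows, :], axis=0)  # F2-sum of the rows, step (3)
        S = most_violated_fs_cut(x, set(np.flatnonzero(rpc)))            # step (4)
        return (rpc, S) if S is not None else None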
The improved decoder of [44] performs noticeably better than BLPD and SPAD. This is shown
under the BIAWGNC on (3, 4)-regular LDPC codes with 𝑛 = 32, 100, 240.
The cutting plane approach of [55] is based on an IP formulation of MLD, which is referred to
as IPD (note that this formulation was already mentioned in [9]). Auxiliary variables 𝑧 ∈ ℤ^𝑚
model the binary constraints 𝐻𝑥 = 0 over 𝔽2 in real arithmetic:

    min  𝜆ᵀ𝑥                                                       (IPD)
    s.t. 𝐻𝑥 − 2𝑧 = 0,
         𝑥 ∈ {0, 1}^𝑛, 𝑧 ∈ ℤ^𝑚.
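
For concreteness, the IPD model is easy to state with an off-the-shelf MIP modeling tool; the
following sketch uses the PuLP library (our choice of tool, not that of [55]):

    import pulp

    def ipd_decode(H, lam):
        # H: list of 0/1 rows; lam: LLR vector
        m, n = len(H), len(H[0])
        prob = pulp.LpProblem("IPD", pulp.LpMinimize)
        x = [pulp.LpVariable(f"x{j}", cat="Binary") for j in range(n)]
        # z_i <= |N_i| / 2 suffices, since at most |N_i| incident bits can be 1
        z = [pulp.LpVariable(f"z{i}", lowBound=0, upBound=sum(H[i]) // 2,
                             cat="Integer") for i in range(m)]
        prob += pulp.lpSum(lam[j] * x[j] for j in range(n))
        for i in range(m):
            prob += pulp.lpSum(H[i][j] * x[j] for j in range(n)) - 2 * z[i] == 0
        prob.solve()                               # default CBC backend
        return [int(round(v.value())) for v in x]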

In [55], the LP relaxation of IPD is the initial LP problem, which is solved by a cutting plane
algorithm. Note that the LP relaxation of IPD is not equivalent to the LP relaxations given in
Section 6.5. In almost all LPD approaches with improved error-correcting performance reviewed
in this article, BLPD is run first. If BLPD fails, some technique to improve BLPD
is used with the goal of finding the ML codeword at the cost of increased complexity. In
contrast, the approach by Tanatmis et al. in [55] does not elaborate on the solution of BLPD, but
immediately searches for cuts which can be derived from arbitrary dual codewords. To this end,
the parity-check matrix is modified and the conditions under which certain RPCs define cuts are
checked. The average number of iterations performed and the average number of cuts generated
in the separation algorithm decoding (SAD) of [55] are presented for random (3, 6)-regular
codes with 𝑛 = 40, 80, 160, 200, 400 and for the (31, 10), (63, 39), (127, 99), (255, 223) BCH
codes. Both performance measures seem to be directly proportional to the block length. The
error-correcting performance of SAD is measured on random (3, 4)-regular LDPC codes
with block lengths 100 and 200, and on Tanner’s (155, 64) group-structured LDPC code [56]. It is
demonstrated that the improved LPD approach of [55] performs better than BLPD applied in
the adaptive setting [44] and better than SPAD. One significant numerical result is that the SAD
proposed in [55] performs much better than BLPD for the (63, 39) and (127, 99) BCH codes,
which have high-density parity-check matrices. In all numerical simulations the BIAWGNC is
used.
Yufit et al. [57] improve SAD [55] and ALPD [44] by employing several techniques. The authors
propose to improve the error-correcting performance of these decoding methods by using RPC
cuts derived from alternative parity-check matrices obtained via the automorphism group
Aut(𝐶) of the code 𝐶. In the alternative parity-check matrices, the columns of the original
parity-check matrix are permuted according to some scheme. At the first stage of Algorithm 1
of [57], SAD is used to solve the MLD problem. If the ML codeword is found then Algorithm 1
terminates; otherwise an alternative parity-check matrix induced by Aut(𝐶) is randomly chosen
and the SAD is applied again. In the worst case this procedure is repeated 𝑁 times, where 𝑁
denotes a predetermined constant. A similar approach is also used to improve ALPD in
Algorithm 2 of [57]. Yufit et al. enhance Algorithm 1 with two techniques to improve the
error-correcting performance and complexity. The first technique, called parity-check matrix
adaptation, is to alter the parity-check matrix prior to decoding such that the columns which
correspond to the least reliable bits, i.e., bits with the smallest absolute LLR values, become
unit vectors. The second technique, which is motivated by MALPD of [45], is to drop the
inactive inequalities at each iteration of SAD, in order to avoid that the problem size increases
from iteration to iteration. Under the BIAWGNC, it is demonstrated on the (63, 36, 11) BCH
code and the (63, 39, 9) BCH code that SAD can be improved both in terms of error-correcting
performance and computational complexity.

6.7.2 Facet Guessing Approaches

Based on BLPD2, Dimakis et al. [28] improve the error-correcting performance of BLPD with
an approach similar to FDA (see Section 6.4). They introduce facet guessing algorithms which
iteratively solve a sequence of related LP problems. Let 𝑥∗ be a non-integral optimal solution of
BLPD, 𝑥ML be the ML codeword, and ℱ be a set of faces of 𝒫 which do not contain 𝑥∗ . This set
ℱ is given by the set of inequalities which are not active at 𝑥∗ .

The set of active inequalities at a pseudocodeword 𝑣 is denoted by 𝔸(𝑣). In facet guessing
algorithms, the objective function 𝜆ᵀ𝑥 is minimized over 𝑓 ∩ 𝒫 for all 𝑓 ∈ 𝒦 ⊆ ℱ, where 𝒦 is
an arbitrary subset of ℱ. The optimal solutions are stored in a list. In random facet guessing
decoding (RFGD), |𝒦| of the faces 𝑓 ∈ ℱ are chosen randomly. If 𝒦 = ℱ then exhaustive facet
guessing decoding (EFGD) is obtained. From the list of optimal solutions, the facet guessing
algorithms output the integer solution with minimum objective function value. It is shown that
EFGD fails if for every 𝑓 ∈ 𝔸(𝑥ML) there exists a pseudocodeword 𝑣 ∈ 𝑓 such that 𝜆ᵀ𝑣 < 𝜆ᵀ𝑥ML.
For suitable expander codes this result is combined with the following structural property,
also proven by the authors: the number of active inequalities at a codeword is much higher
than at a non-integral pseudocodeword. Consequently, theoretical
bounds on the decoding success conditions of the polynomial time algorithms EFGD and RFGD
for expander codes are derived. The numerical experiments are performed under the BIAWGNC,
on Tanner’s (155, 64) group-structured LDPC code and on a random LDPC code with 𝑛 = 200,
𝑑v = 3, 𝑑c = 4. For these codes the RFGD algorithm performs better than the SPAD.

6.7.3 Branch & Bound Approaches

Linear programming based branch & bound is an implicit enumeration technique in which a
difficult optimization problem is divided into multiple, but easier subproblems by fixing the
values of certain discrete variables. We refer to [7] for a detailed description. Several authors
improved LPD using the branch & bound approach.


Breitbach et al. [9] solve IPD by a branch & bound approach. Depth-first and breadth-first
search techniques are suggested for exploring the search tree. The authors point out the
necessity of finding good bounds in the branch & bound algorithm and suggest a neighborhood
search heuristic as a means of computing upper bounds. In the heuristic, a formulation is used
which is slightly different from IPD. We refer to this formulation as alternative integer programming
decoding (AIPD). AIPD can be obtained by using error vectors. Let 𝑦̄ = (1/2)(1 − sign(𝜆)) be the
hard decision for the LLR vector 𝜆 obtained from the BIAWGNC. Comparing 𝑦̄ ∈ {0, 1}^𝑛 with a
codeword 𝑥 ∈ 𝐶 results in an error vector 𝑒 ∈ {0, 1}^𝑛, i.e., 𝑒 = 𝑦̄ + 𝑥 (mod 2). Let 𝑠 = 𝐻𝑦̄ (mod 2),
and define 𝜆̄ by 𝜆̄j = |𝜆j|. IPD can be reformulated as

    min  𝜆̄ᵀ𝑒                                                      (AIPD)
    s.t. 𝐻𝑒 − 2𝑧 = 𝑠,
         𝑒 ∈ {0, 1}^𝑛, 𝑧 ∈ ℤ^𝑚.

In the neighborhood search heuristic of [9], first a feasible starting solution 𝑒0 is calculated by
setting the coordinates of 𝑒0 corresponding to the 𝑛 − 𝑚 most reliable bits (i.e., those 𝑗 ∈ 𝐽 for
which |𝜆j| is largest) to 0. These are the non-basic variables, while the 𝑚 basic variables are
determined from the vector 𝑠 ∈ {0, 1}^𝑚. Starting from this solution, a neighborhood search is
performed by exchanging basic and non-basic variables. The tuple of variables yielding a locally
best improvement in the objective function is selected for iterating to the next feasible solution.
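
For illustration, the quantities 𝑦̄, 𝑠, and 𝜆̄ entering the heuristic are obtained from the channel
output as follows (a small NumPy sketch under our notation):

    import numpy as np

    def aipd_setup(lam, H):
        y_bar = ((1 - np.sign(lam)) // 2).astype(int)  # hard-decision vector
        s = H @ y_bar % 2                              # syndrome s = H*y_bar (mod 2)
        lam_bar = np.abs(lam)                          # reliabilities |lambda_j|
        return y_bar, s, lam_bar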

In [9], numerical experiments are performed under the BIAWGNC, on the (31, 21, 5) BCH
code, the (64, 42, 8) Reed-Muller code, the (127, 85, 13) BCH code and the (255, 173, 23) BCH
code. The neighborhood search with single position exchanges performs very similarly to MLD
for the (31, 21, 5) BCH code. As the block length increases, the error-correcting performance
of the neighborhood search with single position exchanges gets worse. An extension of this
heuristic allowing two position exchanges is applied to the (64, 42, 8) Reed-Muller code, the
(127, 85, 13) BCH code, and the (255, 173, 23) BCH code. The extended neighborhood search
heuristic improves the error-correcting performance at the cost of increased complexity. A
branch & bound algorithm is simulated on the (31, 21, 5) BCH code and different search tree
exploration schemes are investigated. The authors suggest a combination of depth-first and
breadth-first search.

In [58], Draper et al. improve the ALPD approach of [44] with a branch & bound technique.
Branching is done on the least certain variable, i.e., the 𝑥j, 𝑗 ∈ 𝐽, for which |𝑥∗j − 1/2| is smallest.
Under the BSC, it is observed on Tanner’s (155, 64, 20) code that the ML codeword is found
after few iterations in many cases.

In [36] two branch & bound approaches for LDPC codes are introduced. In ordered constant
depth decoding (OCDD) and ordered variable depth decoding (OVDD), first BLPD1 is solved.
If the optimal solution 𝑥∗ is non-integral, a subset 𝒯 ⊆ ℰ of the set ℰ of all non-integral bits
is chosen. Let 𝑔 = |𝒯|. The subset 𝒯 consists of the least certain bits; the term
“ordered” in OCDD and OVDD is motivated by this construction. It is experimentally shown in
[36] that choosing the least certain bits is advantageous in comparison to a random choice of
bits. OVDD is a breadth-first branch & bound algorithm where the depth of the search tree is
restricted to 𝑔. Since this approach is common in integer programming, we do not give the
details of OVDD and refer to [7] instead. For OVDD, the number of LPs solved in the worst
case is 2^(𝑔+1) − 1.
In OCDD, 𝑚-element subsets ℳ of 𝒯, i.e., ℳ ⊆ 𝒯 with 𝑚 = |ℳ|, are chosen. Let 𝑏 ∈ {0, 1}^𝑚.
For any ℳ ⊆ 𝒯, 2^𝑚 LPs are solved, each time adding the constraint block

    𝑥k = 𝑏k    for all 𝑘 ∈ ℳ

to BLPD1, thus fixing 𝑚 bits. Let 𝑥̂ be the solution with the minimum objective function value
among the 2^𝑚 LPs solved. If 𝑥̂ is integral, OCDD outputs 𝑥̂; otherwise another subset ℳ ⊆ 𝒯
is chosen. Since OCDD exhausts all 𝑚-element subsets of 𝒯, in the worst case (𝑔 choose 𝑚) · 2^𝑚 + 1
LPs are solved.
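
Schematically, the OCDD search reads as follows (our sketch; solve_blpd1 is assumed to solve
BLPD1 under the given bit fixings and to return an object with obj and is_integral fields, or
None if infeasible):

    from itertools import combinations, product

    def ocdd(T, m, solve_blpd1):
        # T: indices of the g least certain bits
        for M in combinations(T, m):                   # all m-element subsets of T
            best = None
            for b in product((0, 1), repeat=m):        # all 2^m fixings x_k = b_k
                sol = solve_blpd1(dict(zip(M, b)))
                if sol is not None and (best is None or sol.obj < best.obj):
                    best = sol
            if best is not None and best.is_integral:  # integral minimizer found
                return best
        return None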
The branch & bound based improved LPD of Yang et al. [36] can be applied to LDPC codes with
short block length. For the following numerical tests, the BIAWGNC is used. Under various
settings of 𝑚 and 𝑔 it is shown on a random LDPC code with 𝑛 = 60, 𝑅 = 1/4, 𝑑c = 4, and 𝑑v = 3
that OCDD has a better error-correcting performance than BLPD and SPAD. Several simulations
are done to analyze the trade-off between complexity and error-correcting performance of
OCDD and OVDD. For the test instances and parameter settings used in [36] (the parameters 𝑔
and 𝑚 are chosen such that OVDD and OCDD have similar worst-case complexity), it has been
observed on the above-mentioned code that OVDD outperforms OCDD. This behavior is
explained by the observation that OVDD applies the branch & bound approach on the most
unreliable bits. On a longer random LDPC code with 𝑛 = 1024, 𝑅 = 1/4, 𝑑c = 4, and 𝑑v = 3, it is
demonstrated that OVDD performs better than BLPD and SPAD.
Another improved LPD technique which can be interpreted as a branch & bound approach is
randomized bit guessing decoding (RBGD) of Dimakis et al. [28]. RBGD is inspired by the
special case that all facets chosen by RFGD (see Section 6.7.2) correspond to constraints of type
𝑥j ≥ 0 or 𝑥j ≤ 1. In RBGD, 𝑘 = 𝑐 log 𝑛 variables, where 𝑐 > 0 is a constant, are chosen randomly.
Because there are 2^𝑘 different possible configurations of these 𝑘 variables, BLPD2 is run 2^𝑘
times with associated constraints for each assignment. The best integer-valued solution in terms
of the objective function 𝜆 is the output of RBGD. Note that by setting 𝑘 to 𝑐 log 𝑛, a polynomial
complexity in 𝑛 is ensured. Under the assumption that there exists a unique ML codeword,
exactly one of the 2^𝑘 bit settings matches the bit configuration in the ML codeword. Thus,
RBGD fails if a non-integral pseudocodeword with a better objective function value coincides
with the ML codeword in all 𝑘 components. For some expander codes, the probability that
RBGD finds the ML codeword is given in [28]. To find this probability expression, the authors
first prove that, for some expander-based codes, the number of non-integral components in
any pseudocodeword scales linearly in block length.
Chertkov and Chernyak [59] apply the loop calculus approach [60], [61] to improve BLPD.
Loop calculus is an approach from statistical physics and related to cycles in the Tanner
graph representation of a code. In the context of improved LPD, it is used to either modify
objective function coefficients [59] or to find branching rules for branch & bound [62]. Given
a parity-check matrix and a channel output, linear programming erasure decoding (LPED)
[59] first solves BLPD. If a codeword is found then the algorithm terminates. If a non-integral
pseudocodeword is found then a so-called critical loop is searched by employing loop calculus.
The indices of the variable nodes along the critical loop form an index set ℳ ⊆ 𝐽. LPED lowers
the objective function coefficients 𝜆j of the variables 𝑥j, 𝑗 ∈ ℳ, by multiplying 𝜆j with 𝜖,
where 0 ≤ 𝜖 < 1. After updating the objective function coefficients, BLPD is solved again. If
BLPD does not find a codeword then the selection criterion for the critical loop is improved.
LPED is tested on the list of pseudocodewords found in [35] for Tanner’s (155, 64, 20) code. It
is demonstrated that LPED corrects the decoding errors of BLPD for this code.

In [62], Chertkov combines the loop calculus approach used in LPED [59] with RFGD [28].
We refer to the combined algorithm as loop guided guessing decoding (LGGD). LGGD differs
from RFGD in the sense that the constraints chosen are of type 𝑥j ≥ 0 or 𝑥j ≤ 1, where 𝑗 is
in the index set ℳ of the variable nodes in the critical loop. LGGD starts with
solving BLPD. If the optimal solution is non-integral then the critical loop is found with the
loop calculus approach. Next, a variable 𝑥j, 𝑗 ∈ ℳ, is selected randomly and two partial LPD
problems are deduced. These differ from the original problem by only one equality constraint,
𝑥j = 0 or 𝑥j = 1. LGGD chooses the minimum of the objective values of the two subproblems.
If the corresponding pseudocodeword is integral then the algorithm terminates. Otherwise
the equality constraints are dropped, a new 𝑗 ∈ ℳ along the critical loop is chosen, and two
new subproblems are constructed. If the set ℳ is exhausted, the selection criterion of the
critical loop is improved. LGGD is very similar to OCDD of [36] for the case that 𝑔 = |ℳ| and
𝑚 = 1. In LGGD branching is done on the bits in the critical loop whereas in OCDD branching
is done on the least reliable bits. As in [59], LGGD is tested on the list of pseudocodewords
generated in [35] for Tanner’s (155, 64, 20) code. It is shown that LGGD improves BLPD under
the BIAWGNC.

SAD of [55] is improved in terms of error-correcting performance by a branch & bound approach
in [57]. In Algorithm 3 of [57], first SAD is employed. If the solution is non-integral then a
depth-first branch & bound is applied. The non-integral valued variable with the smallest
absolute LLR value is chosen as the branching variable. Algorithm 3 terminates as soon as the
search tree reaches the maximum allowed depth 𝐷max. Under the BIAWGNC, on the (63, 36, 11)
BCH code and the (63, 39, 9) BCH code Yufit et al. [57] demonstrate that the decoding
performance of Algorithm 3 (enhanced with parity-check matrix adaptation) approaches MLD.

6.8 Conclusion

In this survey we have shown how the decoding of binary linear block codes benefits from
a wide range of concepts which originate from mathematical optimization—mostly linear
programming, but also quadratic (nonlinear) and integer programming, duality theory, branch
& bound methods, Lagrangian relaxation, network flows, and matroid theory. Bringing together
both fields of research does lead to promising new algorithmic decoding approaches as well as
deeper structural understanding of linear block codes in general and special classes of codes—
like LDPC and turbo-like codes—in particular. The most important reason for the success of
this connection is the formulation of MLD as the minimization of a linear function over the
codeword polytope conv(𝐶). We have reviewed a variety of techniques for approximating
this polytope, whose complete description is in general too large to be computed efficiently.
For further research on LPD of binary linear codes, two general directions can be distinguished.
One is to decrease the algorithmic complexity of LPD in order to reduce the gap between LPD
and IMPD, the latter of which still outperforms LPD in practice. The other direction aims at
increasing the error-correcting performance, tightening the relaxation up to MLD performance.
This includes a continued study of RPCs as well as the characterization of other, non-RPC
facet-defining inequalities of the codeword polytope.
There are other lines of research related to LPD and IMPD which are not covered in this article.
Flanagan et al. [21] have generalized LP decoding, along with several related concepts, to
nonbinary linear codes. Another possible generalization is to extend to different channel models
[22]. Connecting two seemingly different decoding approaches, the structural relationship between
LPD and IMPD has been discussed in [63]. Moreover, the discovery that both decoding methods
are closely related to the Bethe free energy approximation, a tool from statistical physics, has
initiated vital research [64]. Also, of course, research on IMPD itself, independent of LPD, is
still ongoing with high activity. A promising direction of research is certainly the application
of message passing techniques to mathematical programming problems beyond LPD [65].

Acknowledgment

We would like to thank Pascal O. Vontobel and Frank Kienle for their comments and suggestions.
We also thank the anonymous referees for the helpful reviews.

References

[1] J. Feldman. “Decoding error-correcting codes via linear programming”. PhD thesis. Cam-
bridge, MA: Massachusetts Institute of Technology, 2003.
[2] J. Feldman, M. J. Wainwright, and D. R. Karger. “Using linear programming to decode
binary linear codes”. IEEE Transactions on Information Theory 51.3 (Mar. 2005), pp. 954–
972. doi: 10.1109/TIT.2004.842696. url: www.eecs.berkeley.edu/~wainwrig/Papers/
FelWaiKar05.pdf.
[3] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. “Factor graphs and the sum-product
algorithm”. IEEE Transactions on Information Theory 47.2 (Feb. 2001), pp. 498–519. doi:
10.1109/18.910572. url: www.comm.utoronto.ca/frank/papers/KFL01.pdf.
[4] S. M. Aji and R. J. McEliece. “The generalized distributive law”. IEEE Transactions on
Information Theory 46.2 (Mar. 2000), pp. 325–343. doi: 10.1109/18.825794.
[5] N. Wiberg. “Codes and decoding on general graphs”. PhD thesis. Linköping, Sweden:
Linköping University, 1996.


Table 6.7: List of abbreviations used in Paper I.


AIPD = alternative integer programming decoding
ALPD = adaptive linear programming decoding
BCQPD = box-constrained quadratic programming decoding
BIAWGNC = binary-input additive white Gaussian noise channel
BLPD = bare linear programming decoding
BSC = binary symmetric channel
CLPD = cascaded linear programming decoding
EFGD = exhaustive facet guessing decoding
FDA = fractional distance algorithm
FFG = Forney-style factor graph
FS = forbidden set
GSA = greedy separation algorithm
IMPD = iterative message-passing decoding
IP = integer programming
IPD = integer programming decoding
JSD = joint subgradient decoding
LLR = log-likelihood ratio
LDPC = low-density parity-check
LGGD = loop guided guessing decoding
LP = linear programming
LPD = linear programming decoding
LPED = linear programming erasure decoding
MALPD = modified adaptive linear programming decoding
ML = maximum likelihood
MLD = maximum-likelihood decoding
MSAD = min-sum algorithm decoding
OCDD = ordered constant depth decoding
OVDD = ordered variable depth decoding
PGPM = parallel gradient projection method
PLPD = primal linear programming decoding
PSORM = projected successive overrelaxation method
RA = repeat accumulate
RBGD = randomized bit guessing decoding
RFGD = randomized facet guessing decoding
RPC = redundant parity-check
SAD = separation algorithm decoding
SD = subgradient decoding
SDLPD = softened dual linear programming decoding
SNR = signal-to-noise ratio
SPAD = sum-product algorithm decoding
SPC = single parity-check
TCLPD = turbo code linear programming decoding


[6] H.-A. Loeliger. “An introduction to factor graphs”. IEEE Signal Processing Magazine 21.1
(Jan. 2004), pp. 28–41. issn: 1053-5888. doi: 10.1109/MSP.2004.1267047.
[7] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. Wiley-
Interscience series in discrete mathematics and optimization, John Wiley & Sons, 1988.
[8] M. Grötschel, L. Lovász, and A. Schrijver. Geometric Algorithms and Combinatorial Opti-
mization. 2nd ed. Algorithms and Combinatorics. Springer, 1993.
[9] M. Breitbach et al. “Soft-decision decoding of linear block codes as optimization problem”.
European Transactions on Telecommunications 9 (1998), pp. 289–293.
[10] E. Berlekamp, R. J. McEliece, and H. C. A. van Tilborg. “On the inherent intractability
of certain coding problems”. IEEE Transactions on Information Theory 24.3 (May 1978),
pp. 384–386. doi: 10.1109/TIT.1978.1055873.
[11] R. M. Karp. “Reducibility among combinatorial problems”. In: Complexity of Computer
Computations. Ed. by R. E. Miller, J. W. Thatcher, and J. D. Bohlinger. The IBM Research
Symposia Series. Springer US, 1972, pp. 85–103. isbn: 978-1-4684-2003-6. doi: 10.1007/978-
1-4684-2001-2_9.
[12] A. Vardy. “The intractability of computing the minimum distance of a code”. IEEE Trans-
actions on Information Theory 43.6 (Nov. 1997), pp. 1757–1766. doi: 10.1109/18.641542.
[13] D. J. A. Welsh. “Combinatorial problems in matroid theory”. In: Combinatorial Mathe-
matics and its Applications. Ed. by D. J. A. Welsh. London, U.K.: Academic Press, 1971,
pp. 291–307.
[14] J. G. Oxley. Matroid Theory. Oxford University Press, 1992.
[15] D. J. A. Welsh. Matroid Theory. L. M. S. Monographs. Academic Press, 1976.
[16] N. Kashyap. “A decomposition theory for binary linear codes”. IEEE Transactions on
Information Theory 54.7 (July 2008), pp. 3035–3058. doi: 10.1109/TIT.2008.924700. url:
http://www.ece.iisc.ernet.in/~nkashyap/Papers/code_decomp_final.pdf.
[17] F. Barahona and M. Grötschel. “On the cycle polytope of a binary matroid”. Journal of
Combinatorial Theory Series B 40 (1986), pp. 40–62.
[18] N. Kashyap. “On the convex geometry of binary linear codes”. In: Proceedings of the
Inaugural UC San Diego Workshop on Information Theory and its Applications (ITA). La
Jolla, CA, Feb. 2006. url: http://ita.ucsd.edu/workshop/06/talks.
[19] M. Grötschel and K. Truemper. “Decomposition and optimization over cycles in binary
matroids”. Journal of Combinatorial Theory Series B 46 (1989), pp. 306–337.
[20] P. D. Seymour. “Decomposition of regular matroids”. Journal of Combinatorial Theory
Series B 28 (1980), pp. 305–359.
[21] M. Flanagan et al. “Linear-programming decoding of nonbinary linear codes”. IEEE
Transactions on Information Theory 55.9 (Sept. 2009), pp. 4134–4154. issn: 0018-9448. doi:
10.1109/TIT.2009.2025571. arXiv: 0804.4384 [cs.IT].


[22] A. Cohen et al. “LP decoding for joint source-channel codes and for the non-ergodic
Polya channel”. IEEE Communications Letters 12.9 (2008), pp. 678–680. issn: 1089-7798.
doi: 10.1109/LCOMM.2008.080713.
[23] S. Lin and D. Costello Jr. Error Control Coding. 2nd ed. Upper Saddle River, NJ: Prentice-
Hall, Inc., 2004. isbn: 0130426725.
[24] P. O. Vontobel and R. Koetter. Graph-cover decoding and finite-length analysis of message-
passing iterative decoding of LDPC codes. 2005. arXiv: cs/0512078 [cs.IT].
[25] G. D. Forney Jr. et al. “On the effective weights of pseudocodewords for codes defined
on graphs with cycles”. In: Codes, systems, and graphical models. Ed. by B. Marcus and
J. Rosenthal. Vol. 123. The IMA Volumes in Mathematics and its Applications. Springer
Verlag, New York, Inc., 2001, pp. 101–112.
[26] M. Chertkov and M. G. Stepanov. “Polytope of correct (linear programming) decoding
and low-weight pseudo-codewords”. In: Proceedings of IEEE International Symposium on
Information Theory. St. Petersburg, Russia, July 2011, pp. 1648–1652. doi: 10.1109/ISIT.
2011.6033824.
[27] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986.
[28] A. G. Dimakis, A. A. Gohari, and M. J. Wainwright. “Guessing facets: polytope structure
and improved LP decoding”. IEEE Transactions on Information Theory 55.8 (Aug. 2009),
pp. 4134–4154. doi: 10.1109/TIT.2009.2023735. arXiv: 0709.3915 [cs.IT].
[29] M. Grötschel. “Cardinality homogeneous set systems, cycles in matroids, and associated
polytopes”. In: The Sharpest Cut: The Impact of Manfred Padberg and His Work. MPS-SIAM
Series on Optimization. Society for Industrial Mathematics, 2004. Chap. 8, pp. 99–120.
[30] R. G. Gallager. “Low-density parity-check codes”. IRE Transactions on Information Theory
8.1 (Jan. 1962), pp. 21–28. issn: 0096-1000. doi: 10.1109/TIT.1962.1057683.
[31] G. D. Forney Jr. “Codes on graphs: normal realizations”. IEEE Transactions on Information
Theory 47.2 (Feb. 2001), pp. 529–548. doi: 10.1109/18.910573.
[32] J. Feldman and C. Stein. “LP decoding achieves capacity”. In: Proceedings of the 16th Annual
ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA, Jan. 2005, pp. 460–469.
isbn: 0-89871-585-7.
[33] M. Yannakakis. “Expressing combinatorial optimization problems by linear programs”.
Journal of Computer and System Sciences 43 (1991), pp. 441–466. doi: 10.1145/62212.62232.
[34] K. Yang, X. Wang, and J. Feldman. “A new linear programming approach to decoding
linear block codes”. IEEE Transactions on Information Theory 54.3 (Mar. 2008), pp. 1061–
1072. doi: 10.1109/TIT.2007.915712.
[35] M. Chertkov and M. Stepanov. “Pseudo-codeword landscape”. In: Proceedings of IEEE
International Symposium on Information Theory. Nice, France, June 2007, pp. 1546–1550.
doi: 10.1109/ISIT.2007.4557442.
[36] K. Yang, J. Feldman, and X. Wang. “Nonlinear programming approaches to decoding
low-density parity-check codes”. IEEE Journal on Selected Areas in Communications 24.8
(Aug. 2006), pp. 1603–1613. doi: 10.1109/JSAC.2006.879405.


[37] C. Berrou and A. Glavieux. “Near optimum error correcting coding and decoding: turbo-
codes”. IEEE Transactions on Communications 44.10 (Oct. 1996), pp. 1261–1271. issn:
0090-6778. doi: 10.1109/26.539767.
[38] C. Berrou et al. “Improving the distance properties of turbo codes using a third component
code: 3D turbo codes”. IEEE Transactions on Communications 57.9 (Sept. 2009), pp. 2505–
2509. doi: 10.1109/TCOMM.2009.09.070521.
[39] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows. Prentice-Hall, 1993.
[40] J. Feldman and D. R. Karger. “Decoding turbo-like codes via linear programming”. Journal
of Computer and System Sciences 68 (4 June 2004), pp. 733–752. issn: 0022-0000. doi:
10.1016/j.jcss.2003.11.005.
[41] N. Halabi and G. Even. “Improved bounds on the word error probability of RA(2) codes
with linear-programming-based decoding”. IEEE Transactions on Information Theory 51.1
(Jan. 2005), pp. 265–280. doi: 10.1109/TIT.2004.839509.
[42] I. Goldenberg and D. Burshtein. “Error bounds for repeat-accumulate codes decoded via
linear programming”. In: Proceedings of the International Symposium on Turbo Codes and
Iterative Information Processing. Brest, France, Sept. 2010, pp. 487–491.
[43] A. Tanatmis, S. Ruzika, and F. Kienle. “A Lagrangian relaxation based decoding algorithm
for LTE turbo codes”. In: Proceedings of the International Symposium on Turbo Codes and
Iterative Information Processing. Brest, France, Sept. 2010, pp. 369–373. doi: 10.1109/ISTC.
2010.5613906.
[44] M. H. Taghavi and P. H. Siegel. “Adaptive methods for linear programming decoding”.
IEEE Transactions on Information Theory 54.12 (Dec. 2008), pp. 5396–5410. doi: 10.1109/
TIT.2008.2006384. arXiv: cs/0703123 [cs.IT].
[45] M. H. Taghavi, A. Shokrollahi, and P. H. Siegel. “Efficient implementation of linear
programming decoding”. IEEE Transactions on Information Theory 57.9 (Sept. 2011),
pp. 5960–5982. doi: 10.1109/TIT.2011.2161920. arXiv: 0902.0657 [cs.IT].
[46] P. O. Vontobel and R. Koetter. “On low-complexity linear-programming decoding of
LDPC codes”. European Transactions on Telecommunications 18 (2007), pp. 509–517.
[47] D. Burshtein. “Iterative approximate linear programming decoding of LDPC codes with
linear complexity”. IEEE Transactions on Information Theory 55.11 (2009), pp. 4835–4859.
issn: 0018-9448. doi: 10.1109/TIT.2009.2030477.
[48] D. P. Bertsekas and J. N. Tsitsiklis. Parallel and Distributed Computation: Numerical
Methods. Belmont, Massachusetts: Athena Scientific, 1997.
[49] P. O. Vontobel. “Interior-point algorithms for linear-programming decoding”. In: Pro-
ceedings of the IEEE Information Theory Workshop. UC San Diego. La Jolla, CA, Jan. 2008,
pp. 433–437.
[50] T. Wadayama. “An LP decoding algorithm based on primal path-following interior point
method”. In: Proceedings of IEEE International Symposium on Information Theory. Seoul,
Korea, June 2009, pp. 389–393. doi: 10.1109/ISIT.2009.5205741.


[51] N. Karmarkar. “A new polynomial-time algorithm for linear programming”. Combinatorica
4.4 (1984), pp. 373–396.
[52] L. Lovász and A. Schrijver. “Cones of matrices and set-functions and 0-1 optimization”.
SIAM Journal on Optimization 1.2 (1991), pp. 166–190.
[53] M. Grötschel and K. Truemper. “Master polytopes for cycles in binary matroids”. Linear
Algebra and its Applications 114/115 (1989), pp. 523–540.
[54] M. Miwa, T. Wadayama, and I. Takumi. “A cutting-plane method based on redundant
rows for improving fractional distance”. IEEE Journal on Selected Areas in Communications
27.6 (Aug. 2009), pp. 1005–1012. issn: 0733-8716. doi: 10.1109/JSAC.2009.090818.
[55] A. Tanatmis et al. “A separation algorithm for improved LP-decoding of linear block
codes”. IEEE Transactions on Information Theory 56.7 (July 2010), pp. 3277–3289. issn:
0018-9448. doi: 10.1109/TIT.2010.2048489.
[56] R. M. Tanner et al. “LDPC block and convolutional codes based on circulant matrices”.
IEEE Transactions on Information Theory 50.12 (2004), pp. 2966–2984. doi: 10.1109/TIT.
2004.838370.
[57] A. Yufit, A. Lifshitz, and Y. Be’ery. “Efficient linear programming decoding of HDPC
codes”. IEEE Transactions on Communications 59.3 (Mar. 2011), pp. 758–766. issn: 0090-
6778. doi: 10.1109/TCOMM.2011.122110.090729.
[58] S. C. Draper, J. S. Yedidia, and Y. Wang. “ML decoding via mixed-integer adaptive linear
programming”. In: Proceedings of IEEE International Symposium on Information Theory.
Nice, France, June 2007, pp. 1656–1660. doi: 10.1109/ISIT.2007.4557459.
[59] M. Chertkov and V. Y. Chernyak. “Loop calculus helps to improve belief propagation and
linear programming decodings of low-density-parity-check codes”. In: Proceedings of the
44th Annual Allerton Conference on Communication, Control and Computing. Monticello,
IL, Sept. 2006. arXiv: cs/0609154 [cs.IT].
[60] M. Chertkov and V. Y. Chernyak. “Loop calculus in statistical physics and information
science”. Physical Review E 73.6 (June 2006), p. 065102. doi: 10.1103/PhysRevE.73.065102.
arXiv: cond-mat/0601487 [cond-mat.stat-mech].
[61] M. Chertkov and V. Y. Chernyak. “Loop series for discrete statistical models on graphs”.
Journal of Statistical Mechanics: Theory and Experiment 2006 (2006), P06009. doi: 10.1088/
1742-5468/2006/06/P06009. arXiv: cond-mat/0603189 [cond-mat.stat-mech].
[62] M. Chertkov. “Reducing the error floor”. In: Proceedings of the IEEE Information Theory
Workshop. Tahoe City, CA, Sept. 2007, pp. 230–235. doi: 10.1109/ITW.2007.4313079.
[63] P. O. Vontobel and R. Koetter. “On the relationship between linear programming decoding
and min-sum algorithm decoding”. In: Proceedings of the International Symposium on
Information Theory and Applications (ISITA). Parma, Italy, Oct. 2004, pp. 991–996.
[64] J. S. Yedidia, W. T. Freeman, and Y. Weiss. “Constructing free-energy approximations and
generalized belief propagation algorithms”. IEEE Transactions on Information Theory 51.7
(2005), pp. 2282–2312. doi: 10.1109/TIT.2005.850085.


[65] M. Bayati, D. Shah, and M. Sharma. “Max-product for maximum weight matching:
convergence, correctness, and LP duality”. IEEE Transactions on Information Theory 54.3
(2008), pp. 1241–1251. doi: 10.1109/TIT.2007.915695.

Chapter 7

Paper II: ML vs. BP Decoding of Binary and
Non-Binary LDPC Codes

Stefan Scholl, Frank Kienle, Michael Helmling, and Stefan Ruzika

This chapter is a reformatted and revised version of the following publication that was presented
at the ISTC conference and appeared in the refereed conference proceedings:

S. Scholl, F. Kienle, M. Helmling, and S. Ruzika. “ML vs. BP decoding of binary and
non-binary LDPC codes”. In: Proceedings of the International Symposium on Turbo Codes
and Iterative Information Processing. Gothenburg, Sweden, Aug. 2012, pp. 71–75. doi:
10.1109/ISTC.2012.6325201

ML vs. BP Decoding of Binary and
Non-Binary LDPC Codes

Stefan Scholl, Frank Kienle, Michael Helmling, and Stefan Ruzika

It has been shown that non-binary LDPC codes exhibit a better error-
correction performance than binary codes for short block lengths. How-
ever, this advantage has so far only been shown under belief-propagation
decoding. To gain new insights, we investigate binary and non-binary
codes under ML decoding. Our analysis includes different modulation
schemes and decoding algorithms. For ML decoding under different
modulation schemes, a flexible integer programming formulation is pre-
sented. In this paper, we show that short non-binary LDPC codes are
not necessarily superior to binary codes with respect to ML decoding.
The decoding gain observed under BP decoding originates mainly from
the more powerful non-binary decoding algorithm.

7.1 Introduction

Forward error correction is a vital part of digital communication systems. LDPC codes [1] have
been adopted in many communication standards, e.g. WiMAX [2], and show an error-correction
capability near the Shannon limit for long block lengths. However, for short block lengths
there is still a gap in communications performance to the theoretical limit.
There exist both binary and non-binary LDPC codes. For short block lengths, non-binary
LDPC codes show superior performance. The gain of non-binary codes is especially pronounced
under higher-order modulation, when the information in the receiver can be fully processed on
symbol level. These advantages of non-binary LDPC codes have been shown by the DAVINCI
project [3], but solely under BP decoding.
When considering a communication scheme based on LDPC channel codes, we have different
orthogonal options regarding:
• channel code: binary LDPC codes or non-binary LDPC codes,
• modulation: BPSK or higher order modulation with bit-level demodulator or symbol-level
demodulator,
• decoding algorithm: suboptimal heuristics (binary/non-binary BP) or optimal decoding
(ML).
These three aspects can be combined and form a large design space. Not all combinations
have been investigated in the literature yet, e.g., ML decoding under higher-order modulation.
In this paper we analyze each of the schemes with respect to their communications performance.
An overview of all considered cases is given in Table 7.1.
In general, the communications performance of a channel coding system is determined by two
important factors: (a) the channel code (code structure) used at the transmitter, and (b) the
applied decoding algorithm at the receiver.
Under BP decoding it is not possible to distinguish between the performance contribution of
the code structure and the contribution of the decoding algorithm. Up to now, simulations for
non-binary codes have only been presented under suboptimal BP decoding. To investigate the
performance loss of these suboptimal BP algorithms, we utilize an ML decoder to derive ML
bounds on the decoding performance.
We use an integer programming (IP) formulation to tackle the ML decoding problem. A key
feature of the utilized IP formulation is its high flexibility to cover all transmission cases using
ML decoding (see Table 7.1). We merge two already known IP formulations for binary and
non-binary codes to obtain a new IP formulation that can handle various types of channel
codes and modulation schemes. This IP problem is solved optimally and thus exactly solves the
ML problem.
The task of this paper is to explore the different transmission schemes with respect to the
performance loss of the suboptimal BP and its low-complexity variant for non-binary codes,
the performance gain of symbol-level demodulation, and a comparison of the binary and
non-binary code structures.

    decoding algorithm             BPSK              64-QAM bit demod.   64-QAM symbol demod.
    ------------------------------------------------------------------------------------------
    binary channel codes
      iterative BP (suboptimal)    binary BP         binary BP
      ML (optimal)                 IP decoder [4]    IP decoder [4]      our new IP
    non-binary channel codes
      iterative BP (suboptimal)    non-binary BP     non-binary BP       non-binary BP
      ML (optimal)                 IP decoder [4]    IP decoder [4]      our new IP
                                   on binary image   on binary image
      low-complexity decoding [5]                                        our new IP or modified BP

Table 7.1: Design space for transmission systems based on LDPC codes.

7.2 Binary and Non-Binary LDPC Codes

In this section we present the used transmission scheme, the decoding algorithms and the code
constructions.

Transmission Scheme

Throughout this paper we assume the following transmission scheme. A code 𝐶 is defined
by its parity-check matrix 𝐇 of dimension (𝑁 − 𝐾) × 𝑁, which has elements from a Galois
field GF(𝑞 = 2^𝑚) = {𝛼0, 𝛼1, …, 𝛼q−1} as entries. The sent codewords are denoted by 𝐱 =
(𝑥0, 𝑥1, …, 𝑥N−1), 𝑥t ∈ GF(𝑞).
For a non-binary code (𝑞 > 2) the parity-check matrix can alternatively be described by its
binary image, a binary (𝑁b − 𝐾b) × 𝑁b matrix 𝐇b [6] (with 𝑁b = 𝑁𝑚 and 𝐾b = 𝐾𝑚). Accordingly,
a non-binary codeword can also be expressed as a binary vector 𝐱b = (𝑥b,0, …, 𝑥b,Nb−1). In case of
binary codes we have 𝐱 = 𝐱b and 𝐇 = 𝐇b.
The modulator groups 𝑄 code bits of 𝐱b and maps them onto one complex modulation symbol
𝑠t ∈ Σ. Here, Σ denotes the modulation alphabet of size 2^𝑄. In case of 64-QAM modulation,
Gray mapping is used. For 𝐱 ∈ 𝐶, denote by 𝐬(𝐱) = (𝑠(𝐱)0, …, 𝑠(𝐱)Ns−1) ∈ Σ^Ns the vector
of transmitted modulation symbols, where 𝑁s = 𝑁b/𝑄 is the number of symbols per code-
word. A transmission system using a binary code with 64-QAM is called bit-interleaved coded
modulation (BICM).


We assume an AWGN channel with noise variance 𝜎² that outputs the received vector 𝐲.
Regarding the demodulator, we have three possibilities: BPSK demodulation, 64-QAM
demodulation on bit level, and 64-QAM demodulation on symbol level.

The bit-level demodulators (BPSK and 64-QAM bit-level demodulation) calculate a log-likelihood
ratio (LLR) 𝝀 for each code bit 𝑥b,u� , namely

∑u�(𝐱) u�b,u� =0 𝑃(𝑦u� ∣ 𝑠(𝐱)u� )


u� ∶
𝜆(𝑥b,u� ∣ 𝑦u� ) = ln
∑u�(𝐱) u�b,u� =1 𝑃(𝑦u� ∣ 𝑠(𝐱)u� )
u� ∶

2
which reduces to 𝜆u� = 𝑦
u�2 u�
in case of BPSK demodulation.
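To illustrate, the following Python sketch computes such bit-level LLRs for a single received
symbol; the data layout (constellation array, Gray-label matrix) and the noise-variance
convention are our own assumptions for this illustration, not prescribed by the paper.

```python
import numpy as np

def bit_llrs(y, constellation, labels, sigma2):
    """Bit-level LLRs for one received symbol y (illustration only).

    constellation: array of the 2**Q complex channel symbols
    labels:        (2**Q, Q) 0/1 array of Gray labels, one row per symbol
    sigma2:        assumed noise-variance convention of the AWGN channel
    """
    # Gaussian likelihoods P(y | s_k); constant factors cancel in the ratio
    lik = np.exp(-np.abs(y - constellation) ** 2 / (2 * sigma2))
    Q = labels.shape[1]
    return np.array([np.log(lik[labels[:, l] == 0].sum() /
                            lik[labels[:, l] == 1].sum()) for l in range(Q)])
```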

The symbol demodulator for 64-QAM calculates a log-density ratio (LDR) for each sent symbol.
Each LDR consists of $q$ reliabilities $\lambda(k)$, $\alpha_k \in \mathrm{GF}(q)$. The demodulation on symbol level can
be described by

$$\lambda_j(k) = \ln \left( \frac{P(y_j \mid s_j = s_k)}{P(y_j \mid s_j = s_0)} \right) = -\frac{1}{2\sigma^2} \|y_j - s_k\|_2^2 + \frac{1}{2\sigma^2} \|y_j - s_0\|_2^2.$$
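In code, the symbol-level LDRs are just differences of scaled squared distances; a minimal
numpy sketch, with the same assumed conventions as above:

```python
import numpy as np

def symbol_ldrs(y, constellation, sigma2):
    """LDRs lambda(k) for one received symbol y, relative to s_0 (sketch)."""
    d = np.abs(y - constellation) ** 2    # squared distances to all s_k
    return -(d - d[0]) / (2 * sigma2)     # entry for k = 0 is zero by construction
```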

Throughout this paper we use BPSK or 64-QAM modulation and codes over GF(2) and
GF(64).

Belief-Propagation Decoding

LDPC codes can be efficiently decoded with the well-known belief-propagation or sum-product
algorithm. The general algorithm for codes over GF(𝑞) is briefly reviewed in the following.

The BP for LDPC codes [7] operates on the Tanner graph of the parity-check matrix 𝐇 consisting
of variable nodes (corresponding to the code bits or symbols) and check nodes (corresponding
to the parity-check constraints). The variable nodes and check nodes are connected by edges
and iteratively exchange probabilistic messages over these edges. The number of exchanged
messages per edge is equal to 𝑞 − 1. For further details, the reader is referred to [7].

Decoders for binary LDPC codes can be efficiently implemented in hardware. Non-binary BP
decoders are far more complex, especially because of the need to store all exchanged messages
in memory.

One possibility to overcome these large memory requirements is to restrict the number of
exchanged messages to a maximum of 𝑛m per edge (𝑛m ≪ 𝑞). This approach is used in the
extended min-sum (EMS) algorithm [5]. Choosing the value of 𝑛m carefully can considerably
reduce the memory requirements without introducing significant extra loss in communications
performance.
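The selection step can be sketched as follows; note that the actual EMS algorithm of [5] also
applies a compensation rule for the discarded entries, which this illustration omits.

```python
import numpy as np

def truncate_message(msg, n_m):
    """Keep the n_m most reliable entries of a length-q LDR message (sketch).

    All discarded entries are replaced by the smallest kept value; the
    compensation rule used in the real EMS algorithm is more refined.
    """
    keep = np.argsort(msg)[-n_m:]               # indices of the n_m largest LDRs
    out = np.full_like(msg, msg[keep].min())    # floor for discarded entries
    out[keep] = msg[keep]
    return out
```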


            1 0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0  \
            0 1 0 0 0 0 0 0 0 0 1 1 0 1 0 0 0 0 0 0 0 0 0 0   |  CN layers to protect
            0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0   |  degree-1 VNs
            0 0 0 1 0 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0  /
            0 0 0 0 1 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 1
            0 0 0 0 0 1 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 0 0
            0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0 0
H_Macro  =  0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0 0
            0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 0
            0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0  \
            0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0 0   |  CN layers to initialize
            0 0 0 0 0 1 0 0 0 1 0 1 0 0 0 0 0 0 0 0 1 1 0 0   |  hidden VNs
            0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1 0   |
            0 0 0 0 0 0 0 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 1 1  /

(Column groups, from left to right: information nodes, hidden nodes, parity nodes.)

Figure 7.2: Macro matrix of a multi-edge type LDPC code of rate 𝑅 = 1/2.

Code Construction

For a fair comparison of binary and non-binary LDPC codes it is crucial to use state-of-the-art
codes. The codes are designed for proper BP performance in the convergence region. We use
multi-edge-type codes [8] as binary LDPC codes and row-optimized non-binary LDPC codes
[6]. In the following, the code constructions are described.

We use a so-called macro matrix to describe the code structure of multi-edge type LDPC codes.
This kind of structure, which can be efficiently implemented in hardware and is utilized in
many communication systems (e.g. [2]), is the common way to represent LDPC codes.

A parity-check matrix is obtained by lifting the macro matrix. Lifting describes the process of
placing shifted $z \times z$ identity matrices at the positions of ones in the macro matrix. A zero
indicates an all-zero matrix of size $z \times z$ in the parity-check matrix. The corresponding cyclic
shift values are derived by a progressive edge-growth technique [9].
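A minimal sketch of the lifting step, where the cyclic shift values (in practice obtained from
the progressive edge-growth procedure of [9]) are assumed to be given as an input array:

```python
import numpy as np

def lift(macro, shifts, z):
    """Lift a 0/1 macro matrix to a parity-check matrix (sketch).

    Each one in `macro` becomes a z-by-z identity matrix cyclically shifted
    by the corresponding entry of `shifts`; each zero becomes a z-by-z
    all-zero block.
    """
    M, N = macro.shape
    H = np.zeros((M * z, N * z), dtype=np.uint8)
    for r in range(M):
        for c in range(N):
            if macro[r, c]:
                H[r*z:(r+1)*z, c*z:(c+1)*z] = np.roll(
                    np.eye(z, dtype=np.uint8), shifts[r, c], axis=1)
    return H
```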

One showcase macro matrix for a rate 𝑅 = 1/2 multi-edge type code is shown in Figure 7.2. Note
that the hidden nodes themselves require an elaborate connectivity structure. Hidden nodes
are initialized with a zero LLR; thus, we have to avoid stopping sets for the belief-propagation
algorithm during code design. Special CN layers are utilized, which ensure that the hidden
nodes are initialized with values within the first decoding iteration.

The non-binary LDPC codes are regular (2,4) quasi-cyclic codes over GF(64). We have con-
structed these codes using the graphs by Declercq [10] as macro matrices. To obtain larger
graphs of appropriate length, we again use a lifting process utilizing a progressive edge-growth
technique [9]. The non-binary matrix entries are chosen so as to optimize the local minimum
distance of the rows [6]. This optimization ensures state-of-the-art performance in the
convergence region.

[Plot: frame error rate vs. $E_b/N_0$ (dB); curves for binary multi-edge type ($K_b = 1150$),
binary WiMAX ($K_b = 1152$) and GF(64) ($K_b = 1152$) codes.]

Figure 7.3: BP decoding of three similar binary and non-binary LDPC codes, $N_b \approx 2300$. For
each code, the figure shows performance curves with 5, 10, 20 and 40 iterations
(fewer iterations correspond to higher frame error rates).

With the presented code constructions, we designed a binary multi-edge type code and a
non-binary code over GF(64) that compete with a WiMAX LDPC code. Simulation results
under BP decoding are presented in Figure 7.3. The multi-edge type code and the non-binary
code achieve a significant gain over the WiMAX code. However, the gain of the non-binary code
over the multi-edge type code is negligible for the considered block length of about 2300 bits.

For our further investigations we restrict ourselves to much shorter block lengths (𝑁 = 96),
because non-binary codes are stronger for shorter codes and codes of this size can be tackled
with the IP-based ML decoder.

7.3 IP-based ML Decoding

ML decoding is an NP-complete problem, and for most block codes of practical size no efficient
ML decoding algorithms exist. However, by using an IP formulation, the ML problem for short
codes can be tackled with commercially available IP solvers.


Our goal is an ML performance analysis based on IP decoding for the different transmission
schemes shown in Table 7.1. We need an IP formulation that is capable of decoding binary and
non-binary codes under bit-level and symbol-level demodulation.

In this section, we present a new flexible IP formulation that can be used for all considered
transmission schemes. We combine two known approaches, merging the binary IP formulation
from [4] with a symbol-level input stage from [11]. This fully flexible formulation can be
applied to all transmission schemes in Table 7.1, covering binary and non-binary codes as well
as higher-order modulation.

The details of the utilized formulation are presented in the following. For each received symbol
$y_j$ and each $s_k \in \Sigma \setminus \{s_0\}$, we introduce a binary decision variable $\xi_{jk}$ acting as an indicator that
$s(\mathbf{x})_j = s_k$ (the case $s(\mathbf{x})_j = s_0$ is covered by the configuration that all $\xi_{jk} = 0$, $k = 1, \ldots, q-1$).
We assume that $s_0$ is the symbol which $\mathbf{0}$ is mapped to by the Gray mapping. The constraints

$$\sum_{k=1}^{q-1} \xi_{jk} \le 1 \qquad \text{for } j = 0, \ldots, N_s - 1 \tag{7.1}$$

ensure that the decoder decides for exactly one symbol for each transmission. The Gray code
used in 64-QAM maps each binary $Q$-tuple $\mathbf{b} = (b_0, \ldots, b_{Q-1})$ to one symbol $g(\mathbf{b}) \in \Sigma$. For
$l = 0, \ldots, Q-1$, let $\bar{g}(l) = \{s_k \in \Sigma : (g^{-1}(s_k))_l = 1\}$ be the set of channel symbols for which
the $l$-th corresponding bit is 1. With this notation, we can relate the decision variables of the
binary code, $\mathbf{x}_b = (x_{b,0}, \ldots, x_{b,N_b-1})$, to the channel symbols $\xi_{jk}$: Let $i = j_i \cdot m + l_i$ be the division
with remainder of $i$ by $m$; then the constraint for $x_{b,i}$ is

$$x_{b,i} = \sum_{s_k \in \bar{g}(l_i)} \xi_{j_i k}. \tag{7.2}$$

For the underlying binary code we employ the compact formulation

$$\mathbf{H}_b \mathbf{x}_b - 2\mathbf{z} = \mathbf{0} \tag{7.3}$$

with auxiliary variables $\mathbf{z} \in \mathbb{Z}^{N_b - K_b}$ [4]. This completes the constraint set; as for the
objective function, a linear expression can be derived immediately from the ML condition:

$$\mathbf{x}^{\mathrm{ML}} = \arg\max_{\mathbf{x} \in C} P(\mathbf{y} \mid \mathbf{s}(\mathbf{x})) = \arg\max_{\mathbf{x} \in C} \sum_{j=0}^{N_s-1} \sum_{k=1}^{q-1} \xi_{jk} \lambda_j(k)$$


Now we have all the necessary ingredients at hand to state the complete IP:

$$\begin{aligned}
\max \quad & \sum_{j=0}^{N_s-1} \sum_{k=1}^{q-1} \xi_{jk} \lambda_j(k) \\
\text{s.t.} \quad & \sum_{k=1}^{q-1} \xi_{jk} \le 1 && \text{for } j = 0, \ldots, N_s-1 \\
& x_{b,i} = \sum_{s_k \in \bar{g}(l_i)} \xi_{j_i k} && \text{for } i = 0, \ldots, N_b-1 \\
& \mathbf{H}_b \mathbf{x}_b - 2\mathbf{z} = \mathbf{0} \\
& \boldsymbol{\xi} \in \{0,1\}^{(q-1) \times N_s} \\
& \mathbf{x}_b \in \{0,1\}^{N_b}, \ \mathbf{z} \in \mathbb{Z}^{N_b - K_b}
\end{aligned}$$
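As an illustration, the IP can be assembled with the open-source PuLP modeling package as
sketched below; the data layout of the LDR lists and Gray-label sets is our own assumption,
and any general-purpose MILP solver could then be applied to the resulting model.

```python
import pulp

def build_ml_ip(ldr, Hb, gbar, q, m):
    """Assemble the symbol-level ML decoding IP (sketch).

    ldr[j][k]: LDR lambda_j(k) for k = 1, ..., q-1 (index 0 unused)
    Hb:        binary-image parity-check matrix as a list of 0/1 rows
    gbar[l]:   indices k of symbols whose l-th Gray-label bit is 1
    """
    Ns, Nb = len(ldr), len(Hb[0])
    prob = pulp.LpProblem("ML_decoding", pulp.LpMaximize)
    xi = {(j, k): pulp.LpVariable(f"xi_{j}_{k}", cat="Binary")
          for j in range(Ns) for k in range(1, q)}
    xb = [pulp.LpVariable(f"xb_{i}", cat="Binary") for i in range(Nb)]
    z = [pulp.LpVariable(f"z_{r}", lowBound=0, cat="Integer")
         for r in range(len(Hb))]
    prob += pulp.lpSum(ldr[j][k] * xi[j, k]
                       for j in range(Ns) for k in range(1, q))
    for j in range(Ns):                       # (7.1): at most one symbol
        prob += pulp.lpSum(xi[j, k] for k in range(1, q)) <= 1
    for i in range(Nb):                       # (7.2): bit/symbol coupling
        j_i, l_i = divmod(i, m)
        prob += xb[i] == pulp.lpSum(xi[j_i, k] for k in gbar[l_i])
    for r, row in enumerate(Hb):              # (7.3): parity checks mod 2
        prob += pulp.lpSum(row[i] * xb[i] for i in range(Nb)) - 2 * z[r] == 0
    return prob
```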

7.4 Results

In this chapter, we present simulation results for binary and non-binary LDPC codes. We
compare codes with similar code parameters under BP and ML decoding, focusing on the
convergence region. As example codes we use a binary multi-edge type code (𝐾 = 50) and a
non-binary code over GF(64) (𝐾b = 48). For this very short block length the macro matrix
(Figure 7.2) of the multi-edge type code is slightly modified to optimize the minimum distance.
All codes are of rate 𝑅 = 1/2. The BP decoders run with 40 iterations of layered decoding.
In the following we consider three different cases:
(1) Binary and non-binary BP: comparison with respect to ML performance.
(2) Low-complexity decoding: influence of input message truncation.
(3) Higher-order modulation (64-QAM): comparison of demodulation on symbol level and
bit level.

Case 1: Binary and Non-Binary BP

In the first case, we compare binary and non-binary LDPC codes under BPSK modulation.
Figure 7.4 shows the simulation results for ML and BP decoding. Using BP decoding the
non-binary code outperforms the binary one by 0.5 dB.
Furthermore, as a byproduct we show the performance degradation of the suboptimal BP for
LDPC codes. For short block length this degradation is on the order of 1 dB for binary LDPC
codes and 0.5 dB for codes over GF(64).
By utilizing an optimal ML decoder we can distinguish between the gain introduced by the non-
binary code structure and the contribution of the non-binary decoding algorithm. Comparing the
ML performance (Figure 7.4) reveals that the non-binary code structure is not superior to the
binary one. The gain of the non-binary system merely arises from the more powerful but also
more complex non-binary BP. This fact raises the question of whether it is a good idea to use the
high-complexity non-binary BP to improve the error-correction performance in new communication
systems, or rather to spend some extra effort (and complexity) on improved decoding of binary
LDPC codes. Various approaches for improved binary BP algorithms can be found in the
literature, e.g. [12].

[Plot: frame error rate vs. $E_b/N_0$ (dB); curves for the GF(64) and binary multi-edge type
codes, each under BP and ML decoding.]

Figure 7.4: Binary vs. non-binary LDPC codes under ML and BP decoding using BPSK modula-
tion.

However, low-complexity versions of the non-binary BP have to be analyzed and compared to
possible new binary decoders with improved decoding performance.

Case 2: Low-Complexity Algorithm

In the second case, we present an analysis of low-complexity EMS decoding [5]. The EMS
decoding algorithm is a suboptimal BP that operates with truncated messages (LDRs). This
truncation reduces the high memory requirements in hardware implementations [13]. Recall,
however, that this truncation introduces a loss in communications performance.

In our case study we analyze the influence of the message truncation at the decoder input.
After the input message truncation we use an ML or an unmodified BP decoder. The results are
lower bounds for the performance of the EMS and similar message truncation algorithms.


[Plot: frame error rate vs. $E_b/N_0$ (dB); curves without truncation and with $n_m$ = 9, 13, 17.]

Figure 7.5: Low-complexity decoding: performance loss using truncated messages at the de-
coder input. The solid lines show the performance of the EMS algorithm, while the
dashed ones correspond to ML decoding.

Figure 7.5 shows the simulation results for different numbers of used messages $n_m$. If we process
only 17 out of 64 messages, the memory is greatly reduced and the additional loss is less than
0.1 dB using BP decoding. For the ML decoder the loss is slightly higher. If we use only 9
out of 64 input messages for the BP, the complexity reduction is even larger, but more loss in
error-correction performance is introduced. However, even with $n_m = 9$ the non-binary code
still outperforms the binary code by a few tenths of a dB.

Case 3: Higher-Order Modulation (64-QAM)

In the third case, we consider higher-order modulation (64-QAM). We present a comparison
between bit-level and symbol-level demodulators in conjunction with codes over GF(64).

One advantage of non-binary LDPC codes over GF(64) is that a 64-QAM modulator can map
the code symbols directly onto modulation symbols. On the receiver side, a symbol-level
demodulator and a non-binary decoder can be used to fully process the information on symbol
level, resulting in joint demodulation and decoding. This advantage is expected to give additional
gain.

We measure the performance gain of the symbol-level demodulator over the bit-level demodula-
tor for the short GF(64) code. The results for decoding under BP and ML are given in Figure 7.6.


[Plot: frame error rate vs. $E_b/N_0$ (dB); curves for bit demod. BP, symbol demod. BP,
bit demod. ML and symbol demod. ML.]

Figure 7.6: 64-QAM: bit-level demodulation vs. symbol-level demodulation.

Under BP decoding the performance gain of the symbol-level demodulator is approximately
0.2 dB. Under ML decoding, the gain is slightly larger.

These simulations show the superior performance of symbol-level demodulation for non-binary
LDPC codes. In terms of hardware complexity, the symbol-level demodulator needs to calculate
64 distances for each received symbol. However, the EMS decoding algorithm greatly reduces
demodulation complexity, because only 𝑛m distances need to be computed.
Now, we repeat the analysis for a binary code and BICM. With our new IP formulation it is
possible to determine the ML performance of the binary code under symbol-level demodulation.
Simulation results are presented in Figure 7.7. The symbol-level demodulation provides a gain of
a few tenths of a dB. Additionally, we want to point out that the binary BP loses 2 dB performance
with respect to the ML decoder.

7.5 Conclusion

In this paper we investigated binary and non-binary LDPC codes under different modulation
schemes and decoding algorithms. Especially the new ML decoding results led to new insights
into the decoding performance of non-binary LDPC codes. We have shown that the advantage of
non-binary LDPC codes is mainly due to their non-binary decoding algorithm. Furthermore, we
analyzed low-complexity decoding by message truncation and showed the superior performance
of symbol-level demodulation for higher-order modulation.


[Plot: frame error rate vs. $E_b/N_0$ (dB); curves for bit demod. BP, bit demod. ML and
symbol demod. ML; the gap between bit-level BP and ML is about 2 dB.]

Figure 7.7: BICM: bit-level demodulation vs. symbol-level demodulation for a binary LDPC
code with 𝐾b = 48.

Acknowledgment

We thank Norbert Wehn and Horst W. Hamacher for their valuable comments and suggestions.
We gratefully acknowledge financial support by the Center of Mathematical and Computational
Modelling of the University of Kaiserslautern.

References

[1] R. G. Gallager. Low-Density Parity-Check Codes. Cambridge, Massachusetts: M.I.T. Press,


1963.
[2] IEEE Standard for Local and Metropolitan Area Networks Part 16: Air Interface for Fixed
and Mobile Broadband Wireless Access Systems Amendment 2: Physical and Medium
Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and
Corrigendum 1. Feb. 2006. doi: 10.1109/IEEESTD.2006.99107.
[3] S. Pfletschinger et al. “Performance evaluation of non-binary LDPC codes on wireless
channels”. In: Proc. ICT Mobile Summit. Santander, Spain, June 2009.
[4] M. Breitbach et al. “Soft-decision decoding of linear block codes as optimization problem”.
European Transactions on Telecommunications 9 (1998), pp. 289–293.


[5] A. Voicila et al. “Low-complexity, low-memory EMS algorithm for non-binary LDPC
codes”. In: IEEE International Conference on Communications. Glasgow, UK, June 2007,
pp. 671–676. doi: 10.1109/ICC.2007.115.
[6] C. Poulliat, M. Fossorier, and D. Declercq. “Design of regular (2,dc )-LDPC codes over
GF(𝑞) using their binary images”. IEEE Transactions on Communications 56.10 (Oct. 2008),
pp. 1626–1635. doi: 10.1109/TCOMM.2008.060527.
[7] D. Declercq and M. Fossorier. “Decoding algorithms for nonbinary LDPC codes over
GF(𝑞)”. IEEE Transactions on Communications 55.4 (Apr. 2007), pp. 633–643. doi: 10.1109/
TCOMM.2007.894088.
[8] T. J. Richardson and R. L. Urbanke. Modern Coding Theory. Cambridge University Press,
2008. isbn: 978-0-521-85229-6.
[9] X.-Y. Hu, E. Eleftheriou, and D. Arnold. “Regular and irregular progressive edge-growth
tanner graphs”. IEEE Transactions on Information Theory 51.1 (Jan. 2005), pp. 386–398.
doi: 10.1109/TIT.2004.839541.
[10] D. Declercq. Sparse Graphs and non-binary LDPC codes Database. 2011. url: http://perso-
etis.ensea.fr/~declercq/graphs.php.
[11] M. F. Flanagan. “A unified framework for linear-programming based communication
receivers”. IEEE Transactions on Communications 59.12 (2011), pp. 3375–3387. doi: 10.
1109/TCOMM.2011.100411.100417.
[12] N. Varnica, M. P. C. Fossorier, and A. Kavcic. “Augmented belief propagation decoding of
low-density parity check codes”. IEEE Transactions on Communications 55.7 (July 2007),
pp. 1308–1317. doi: 10.1109/TCOMM.2007.900611.
[13] T. Lehnigk-Emden and N. Wehn. “Complexity evaluation of non-binary Galois field
LDPC code decoders”. In: Proceedings of the International Symposium on Turbo Codes and
Iterative Information Processing. Brest, France, Sept. 2010, pp. 53–57. doi: 10.1109/ISTC.
2010.5613874.

Chapter 8

Paper III: Integer Programming as a Tool for
Analysis of Channel Codes

Stefan Scholl, Frank Kienle, Michael Helmling, and Stefan Ruzika

This chapter is a reformatted and revised version of the following publication that was presented
at the SCC conference and appeared in the refereed conference proceedings:

S. Scholl, F. Kienle, M. Helmling, and S. Ruzika. “Integer programming as a tool for analysis
of channel codes”. In: Proceedings of the 9th international ITG conference on Systems,
Communications and Coding. Munich, Germany, Jan. 2013, pp. 1–6

Integer Programming as a Tool for
Analysis of Channel Codes

Stefan Scholl          Michael Helmling
Frank Kienle           Stefan Ruzika

Linear and integer programming have recently gained interest as new
approaches for decoding channel codes. In this paper, we present a
framework for the analysis of arbitrary linear block codes based on
integer programming. We review how to analyze ML decoding perfor-
mance and minimum distance. It is shown that integer programming
offers an efficient way for ML decoding. Frame error rates for ML de-
coding of various codes with block lengths of several hundred bits are
simulated. Furthermore, we introduce new formulations for weight
distributions and reliability-based decoding heuristics, like Chase and
ordered statistics decoding. New simulation results for WiMAX LDPC
and LTE turbo codes under ML decoding are also shown.

8.1 Introduction

Forward error correction is an essential part of modern digital communication systems. Differ-
ent channel codes are used in today’s communications applications, like LDPC codes, turbo
codes, BCH and Reed-Solomon (RS) codes.
In hardware implementations of channel decoders, usually suboptimal decoding heuristics are
used that exhibit a low hardware complexity. LDPC codes [1] are decoded by the sum-product
algorithm (SPA) [2] based on the Tanner-graph representation [3]. For turbo codes, the turbo
decoding scheme based on two convolutional decoders plus an interleaver is utilized [4]. For
soft decoding of BCH and RS codes reliability-based heuristics are emerging [5, 6].
The optimal decoding strategy is maximum-likelihood (ML) decoding. However, it is very
difficult to carry out ML decoding on block codes of practical interest because of the high com-
putational complexity. For many codes from communication standards, even the performance
of ML decoding in terms of frame error rate (FER) is unknown.
Another approach to the decoding problem is mathematical optimization theory. The decoding
problem is then formulated as an integer program (IP) or its relaxation as a linear program (LP).
Recently, the optimization-based approach has led to new insights in channel coding and to
new decoding algorithms. In this paper, we show how mathematical optimization can be used
to analyze channel codes. We present a framework, based on IP formulations, which allows us
to analyze:
• the ML decoding performance,
• minimum-distance properties and the error floor, and
• reliability-based decoding heuristics.
After introducing the notation and common decoding algorithms in Section 8.2, we present
a short introduction to IP in Section 8.3. Different techniques to solve IP problems, like
branch-and-bound, cutting-plane algorithms and branch-and-cut, are reviewed.
In Section 8.4 we show IP formulations for analyzing general linear block codes. First, two IP
formulations for ML decoding are reviewed. This approach allows for ML decoding simulations
of WiMAX LDPC [7] and LTE turbo codes [8] of reasonable length (up to 1000 bits). Moreover,
we present two new IP formulations for modelling reliability-based decoding algorithms: Chase
decoding [5] and ordered-statistics decoding (OSD) [6]. A third IP formulation deals with the
code’s minimum-distance properties and allows for error-floor estimation using the weight
distribution and the union bound [9].
In Section 8.5 we present an IP formulation dedicated to turbo codes based on network flows.
It takes into account the special trellis structure of the turbo codes and has proven to be more
efficient than the model for general linear block codes.
Simulation results for ML decoding, the union bound and reliability-based decoding can be
found in Section 8.6.


8.2 Preliminaries

In this section we introduce the terminology and briefly describe the widely used decoding
algorithms for linear block codes.

8.2.1 Transmission Scheme

Throughout this paper we assume the following transmission scheme: A code $C$ is defined by
its parity-check matrix $\mathbf{H}$ of dimension $(N-K) \times N$. The code rate is given by $R = K/N$. A
codeword is denoted by $\mathbf{x} = (x_0, x_1, \ldots, x_{N-1})$, $x_i \in \{0, 1\}$, and satisfies $\mathbf{H}\mathbf{x} = \mathbf{0} \pmod 2$. During
transmission, the bits are modulated using BPSK and sent over an additive white Gaussian noise
(AWGN) channel that outputs the received log-likelihood ratios (LLRs) $\mathbf{y} = (y_0, y_1, \ldots, y_{N-1})$.
Let $\bar{\mathbf{y}}$ be the hard-decision vector of $\mathbf{y}$; then the transmission can be interpreted as an addition of
a binary error vector $\mathbf{e}$, such that $\bar{\mathbf{y}} = \mathbf{x} + \mathbf{e}$. The syndrome vector is denoted by $\mathbf{s} = \mathbf{H}\bar{\mathbf{y}} = \mathbf{H}\mathbf{e}$.

8.2.2 Maximum-Likelihood Decoding

A straightforward ML decoder compares the received LLRs to all codewords of the used channel
code. Thus $2^K$ metrics have to be calculated. Since $2^K$ is very large for codes of practical length,
this approach is very time consuming.
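For reference, such an exhaustive decoder is trivial to write down; the sketch below assumes
the full codeword list has been precomputed, which is only feasible for toy codes.

```python
import numpy as np

def brute_force_ml(y, codewords):
    """Exhaustive ML decoding over an explicit codeword list (toy sketch).

    Maximizing the correlation of the BPSK image (1 - 2x) with the LLRs y
    is equivalent to ML decoding on the AWGN channel.
    """
    metrics = [np.dot(1 - 2 * np.asarray(c, dtype=float), y) for c in codewords]
    return codewords[int(np.argmax(metrics))]
```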

8.2.3 Belief-Propagation Decoding

LDPC codes can be decoded efficiently but suboptimally by the well-known SPA [2]. The SPA
for LDPC codes works on the Tanner-graph representation of the parity-check matrix 𝐇. In
the Tanner graph, probabilistic messages are exchanged iteratively between the variable nodes
and the check nodes.

Decoders for LDPC codes based on the suboptimal iterative SPA can be implemented efficiently
in hardware, but introduce a loss in communications performance due to cycles in the Tanner
graph. This problem has a higher impact for codes of smaller block length and higher code
rate.

8.2.4 Turbo Decoding

Turbo codes are usually decoded with the well-known turbo decoding algorithm. It basically
consists of two convolutional decoders with soft output stages that are connected by an
interleaver. The two convolutional decoders iteratively exchange probabilistic messages [4].
Turbo decoders based on this iterative heuristic can efficiently be implemented in hardware.
However, they also introduce a loss in communications performance.


8.2.5 Reliability-Based Decoding

Reliability-based decoding algorithms are alternative decoding algorithms for linear block
codes. They do not demand a sparse parity-check matrix, so they are suitable for soft-decoding
of BCH and RS codes as well. The two main representatives are the Chase algorithm [5] and
OSD [6].

In a Chase decoder, at first the 𝑙 least reliable bit positions (LRP) are determined. All configura-
tions of these bits are enumerated and an algebraic hard decision decoder is applied to each
configuration, thereby forming a list of codeword candidates. A list decoding procedure then
selects the most probably sent codeword.
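A compact sketch of the procedure, with the algebraic hard-decision decoder left as a
hypothetical callback and a correlation metric as one standard choice for the final
list-decoding step:

```python
import itertools
import numpy as np

def chase_decode(y, l, hd_decode):
    """Chase-type decoding sketch: flip all patterns of the l LRPs."""
    hard = (np.asarray(y) < 0).astype(np.uint8)   # hard decisions from LLRs
    lrp = np.argsort(np.abs(y))[:l]               # l least reliable positions
    best, best_corr = None, -np.inf
    for flips in itertools.product((0, 1), repeat=l):
        test = hard.copy()
        test[lrp] ^= np.array(flips, dtype=np.uint8)
        cand = hd_decode(test)                    # algebraic decoder, may fail
        if cand is None:
            continue
        corr = np.dot(1 - 2 * np.asarray(cand, dtype=float), y)
        if corr > best_corr:                      # keep best list candidate
            best, best_corr = cand, corr
    return best
```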

OSD is based on a re-encoding procedure and is applicable to arbitrary linear block codes. First,
the most reliable independent bit positions (MRIP) have to be found. The MRIPs are treated
as information bits and are re-encoded using a modified parity-check or generator matrix to
obtain a codeword. To improve decoding performance, this process is repeated several times,
but with some of the MRIPs flipped. The process of flipping all possible combinations of $l$
bits and applying re-encoding is called order-$l$ reprocessing. A list decoder selects the most
probably sent codeword.

8.3 Fundamentals of Solving IP Problems

In this section, we introduce the general framework of integer programming models and recap
some solution strategies.

Let $\mathbf{c} \in \mathbb{Q}^n$, $\mathbf{b} \in \mathbb{Q}^m$, and $\mathbf{A} \in \mathbb{Q}^{m \times n}$ be given. Let $\mathbf{x} \in \mathbb{R}^n$ denote the vector of variables.
Integer programming is the mathematical discipline which deals with the optimization of a
linear objective function $\mathbf{c}^T \mathbf{x}$ over the feasible set

$$P_I := P \cap \mathbb{Z}^n := \{\mathbf{x} : \mathbf{A}\mathbf{x} = \mathbf{b}\} \cap \mathbb{Z}^n = \{\mathbf{x} \in \mathbb{Z}^n : \mathbf{A}\mathbf{x} = \mathbf{b}\}.$$

A general integer programming problem can thus be written as

$$\begin{aligned}
\min \text{ or } \max \quad & \mathbf{c}^T \mathbf{x} \\
\text{s.t.} \quad & \mathbf{A}\mathbf{x} = \mathbf{b} \\
& \mathbf{x} \in \mathbb{Z}^n.
\end{aligned}$$

Without loss of generality, we may consider minimization problems only. In contrast to linear
programming problems, solutions are required to be integral. General IP problems—as well as
many special cases—are NP-hard. However, due to extensive studies of theoretical properties,
the development of sophisticated methods, as well as increasing computational capabilities,
IP proved to be very useful to model and solve many real-world optimization problems (e. g.
production planning, scheduling, routing problems, …).


IP problems can be solved in a naïve way by explicitly enumerating all possible values of the
variable vector 𝐱 and choosing the one that yields the minimal objective function value. Though
correct, this procedure is guaranteed to be exponential in the number of components of 𝐱. To
avoid excessive computational effort, a theory inducing more efficient algorithms was developed
which relies on techniques like implicit enumeration, relaxation and bounds, and the usage of
problem-specific structural properties. In the following, we will highlight some basic concepts
while an in-depth exposition to this field can be found in [10].
Branch-and-bound is a wide-spread technique realizing the concept of divide-and-conquer in
the context of IP: for each $i \in \{0, \ldots, n-1\}$ and $v \in \mathbb{Z}$, an optimal solution $\mathbf{x}^*$ either satisfies
$x_i^* \le v$ or $x_i^* \ge v+1$. These two constraints induce two subproblems (each possessing a smaller
feasible set) and, for at least one of them, $\mathbf{x}^*$ is optimal. Iterative application of this branching
step yields IP problems of manageable size. For each (sub)problem, primal and dual bounds are
obtained by relaxation techniques and heuristics. They allow pruning branches of the search
tree, thus reducing the search area (implicit enumeration). More advanced topics like variable
fixing, warm-start techniques, or the computation of bounds contribute to the efficiency of the
procedure.
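A minimal skeleton of this scheme for 0/1 minimization problems, with the problem-specific
parts (a dual bound, e.g. from an LP relaxation, and the evaluation of complete assignments)
left as hypothetical callbacks:

```python
def branch_and_bound(lower_bound, evaluate, n):
    """Depth-first branch-and-bound skeleton for 0/1 minimization (sketch).

    lower_bound(fixed): dual bound for the subproblem in which the first
                        len(fixed) variables take the values in `fixed`
    evaluate(fixed):    objective of a complete assignment, or None if
                        infeasible
    """
    best = [float("inf"), None]

    def recurse(fixed):
        if lower_bound(fixed) >= best[0]:
            return                                   # prune this subtree
        if len(fixed) == n:
            val = evaluate(fixed)
            if val is not None and val < best[0]:
                best[0], best[1] = val, list(fixed)  # new incumbent
            return
        for v in (0, 1):                             # branch on next variable
            recurse(fixed + [v])

    recurse([])
    return best[0], best[1]
```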
Cutting-plane algorithms rely on the idea of solving the IP problem as the linear programming
problem

$$\min\{\mathbf{c}^T \mathbf{x} : \mathbf{x} \in \mathrm{conv}(P_I)\}.$$
Yet, the convex hull of the feasible set of the IP problem is in general hard to compute explicitly.
Therefore, approaches are developed which iteratively solve the LP relaxation of the IP and
compute a cutting plane, i.e., a valid inequality separating the feasible set 𝑃u� from the optimal
solution of the LP relaxation. These cuts are added to the formulation in all subsequent iterations.
Important questions address the convergence of this procedure, polyhedral properties of the
polyhedra of interest, as well as the generation of strong valid inequalities.
A mixture of strategies such as branch-and-cut and the utilization of problem-intrinsic structure
might lead to enhanced solution strategies. Implementing an efficient IP problem solver is
certainly a demanding task. For a special purpose solver, the problem has to be thoroughly
understood, a suitable algorithm has to be chosen and realized efficiently. But there also exists
a range of all-purpose solvers, both open-source (like GLPK, COIN-OR) and commercial
(e.g. CPLEX, Gurobi), which may be sufficient to solve various different IP problems such as
the ones mentioned in this article.

8.4 IP Formulations for Linear Block Codes

In this section, several IP formulations are presented that cover different aspects of code analysis.
All formulations are general and can therefore be applied to arbitrary linear block codes. In the
first part of this section we review two closely related IP formulations for the ML decoding
problem. In the second part we present two new IP formulations that model Chase decoding
and OSD. In the third part, minimum distance properties are investigated.


8.4.1 ML Decoding

The ML decoding problem can be formulated as an integer linear optimization problem in two
ways as presented in [11] and [12] as follows:

$$\begin{aligned}
\min \quad & \sum_{i=0}^{N-1} y_i x_i & \qquad\qquad \min \quad & \sum_{i=0}^{N-1} |y_i| e_i \\
\text{s.t.} \quad & \mathbf{H}\mathbf{x} - 2\mathbf{z} = \mathbf{0} & \text{s.t.} \quad & \mathbf{H}\mathbf{e} - 2\mathbf{z} = \mathbf{s} \\
& \mathbf{x} \in \{0,1\}^N, \ \mathbf{z} \in \mathbb{Z}^{N-K} & & \mathbf{e} \in \{0,1\}^N, \ \mathbf{z} \in \mathbb{Z}^{N-K}.
\end{aligned} \tag{8.1}$$

The formulation on the left relies on the parity-check equation, whereas the one on the right
relies on the syndrome and the error vector 𝑒. Both formulations model the ML decoding
problem exactly and are thus equivalent.

The variables 𝐱 model bits of the codeword, while 𝐳 is a vector of artificial variables to account
for the modulo-2 arithmetic. Interestingly, these formulations prove to be efficient enough
such that a general purpose IP solver can tackle the problem for codes of practical interest. For
Monte-Carlo simulation of the FER performance, this IP has to be solved millions of times, once
for every simulated frame.

We are able to run ML simulations for LDPC codes, turbo codes, BCH codes and the binary
image [13] of Reed-Solomon codes with the same IP formulation.
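For illustration, the left formulation can be handed to an off-the-shelf solver in a few lines;
the sketch below uses the open-source PuLP package (the actual simulations in this paper
were run with CPLEX, see Section 8.6).

```python
import pulp

def ml_decode(H, y):
    """Solve the parity-check IP on the left of (8.1) (sketch)."""
    M, N = len(H), len(y)
    prob = pulp.LpProblem("ML_decoding", pulp.LpMinimize)
    x = [pulp.LpVariable(f"x_{i}", cat="Binary") for i in range(N)]
    z = [pulp.LpVariable(f"z_{j}", lowBound=0, cat="Integer") for j in range(M)]
    prob += pulp.lpSum(y[i] * x[i] for i in range(N))
    for j in range(M):                 # Hx - 2z = 0 emulates the mod-2 checks
        prob += pulp.lpSum(H[j][i] * x[i] for i in range(N)) - 2 * z[j] == 0
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [int(v.value()) for v in x]
```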

8.4.2 Chase Decoding and OSD

With some modifications, IP formulations can model not only the ML decoding problem, but
also some decoding heuristics. We introduce two new formulations, one for Chase decoding
and one for OSD, that take into account the reliability-based decoding. The key elements of the
new IP formulations are additional constraints which ensure that reliable and unreliable bits
are treated fundamentally differently.

Chase decoding allows all errors at the least reliable positions to be corrected, but only 𝑡 errors
at the most reliable positions (MRP) (𝑡 is the number of bits the algebraic decoder can correct).
The set of indices containing the MRP is denoted by 𝐼MRP . Chase decoding is modeled as

$$\begin{aligned}
\min \quad & \sum_{i=0}^{N-1} |y_i| e_i \\
\text{s.t.} \quad & \mathbf{H}\mathbf{e} - 2\mathbf{z} = \mathbf{s} \\
& \sum_{i \in I_{\mathrm{MRP}}} e_i \le t \\
& \mathbf{e} \in \{0,1\}^N, \ \mathbf{z} \in \mathbb{Z}^{N-K}.
\end{aligned}$$


OSD decoding corrects 𝑙 errors at the most reliable independent positions (MRIP) and all errors
at the remaining low reliable positions. The set of indices containing the MRIP is denoted by
𝐼MRIP . OSD decoding is modeled by the IP

$$\begin{aligned}
\min \quad & \sum_{i=0}^{N-1} |y_i| e_i \\
\text{s.t.} \quad & \mathbf{H}\mathbf{e} - 2\mathbf{z} = \mathbf{s} \\
& \sum_{i \in I_{\mathrm{MRIP}}} e_i \le l \\
& \mathbf{e} \in \{0,1\}^N, \ \mathbf{z} \in \mathbb{Z}^{N-K}.
\end{aligned}$$

Note that these formulations induce tighter feasibility polyhedra compared to those in (8.1)
due to the additional constraints. It can thus be expected that the related problems are also
easier to solve.

8.4.3 IP Formulation for Weight-Profile Analysis

IP formulations are not only able to analyze the FER performance of channel codes and their
decoding algorithms. Other interesting code parameters like the minimum distance and the weight
distribution $\{A_1, A_2, \ldots, A_N\}$ can be investigated using IP.

In [14] an IP formulation was presented to calculate the minimum distance. In order to obtain
the weight distribution coefficients, we slightly modify this formulation by replacing the weight
constraint. The modified formulation for the weight distribution looks for valid codewords 𝐱
with weight 𝑤. The number of solutions for a specific weight 𝑤 equals the weight distribution
coefficient 𝐴u� .

The two IP formulations for minimum-distance computation (left) and weight-distribution
analysis (right) are as follows:

$$\begin{aligned}
\min \quad & \sum_{i=0}^{N-1} x_i & \qquad\qquad \min \quad & 0 \\
\text{s.t.} \quad & \mathbf{H}\mathbf{x} - 2\mathbf{z} = \mathbf{0} & \text{s.t.} \quad & \mathbf{H}\mathbf{x} - 2\mathbf{z} = \mathbf{0} \\
& \sum_{i=0}^{N-1} x_i \ge 1 & & \sum_{i=0}^{N-1} x_i = w \\
& \mathbf{x} \in \{0,1\}^N, \ \mathbf{z} \in \mathbb{Z}^{N-K} & & \mathbf{x} \in \{0,1\}^N, \ \mathbf{z} \in \mathbb{Z}^{N-K},
\end{aligned} \tag{8.2}$$

where with the latter formulation we instruct the IP solver to output all optimal solutions. Note
that in contrast to other minimum-distance calculation methods, this approach is applicable to
arbitrary linear block codes.
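For very small codes, the coefficients delivered by the IP solver can be cross-checked by
exhaustive enumeration; the following (exponential-time) sketch merely clarifies what the
coefficients $A_w$ count.

```python
import itertools
import numpy as np

def weight_distribution(H):
    """Count codewords of each weight by enumerating all binary vectors.

    Exponential in the block length N; usable only for toy codes.
    Returns A with A[w] = number of codewords of Hamming weight w.
    """
    H = np.asarray(H, dtype=np.int64)
    N = H.shape[1]
    A = np.zeros(N + 1, dtype=np.int64)
    for bits in itertools.product((0, 1), repeat=N):
        x = np.array(bits, dtype=np.int64)
        if not ((H @ x) % 2).any():    # x is a codeword iff Hx = 0 (mod 2)
            A[x.sum()] += 1
    return A                           # A[0] = 1 counts the zero codeword
```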


8.5 Specialized IP Formulation for Turbo Codes

Up to now, we have presented IP formulations that are general and can be applied to every
linear block code. However, the decoding problem can be modeled more efficiently if special
code structures are taken into account. In this section, we present a dedicated formulation for
ML decoding of turbo codes based on network flows.

The formulation is based on the trellis graphs $G_\nu$ of the constituent encoders $C_\nu$, $\nu \in \{a, b\}$,
and the fact that ML decoding amounts to the determination of a shortest path through both
trellises which respects certain equality constraints in order to guarantee consistency with the
interleaver.

The trellis graphs of the constituent encoders $C_\nu$ are modeled by flow conservation and capacity
constraints [15], along with side constraints appropriately connecting the flow variables $\mathbf{f}^\nu$
($\nu \in \{a, b\}$) to auxiliary variables $\mathbf{x}^a$ and $\mathbf{x}^b$, respectively, which embody the codeword bits.

For $\nu \in \{a, b\}$, let $G_\nu = (S_\nu, E_\nu)$ be the trellis according to $C_\nu$, where $S_\nu$ is the index set of
nodes (states) and $E_\nu$ is the set of edges (state transitions) $e$ in $G_\nu$. For $s \in S_\nu$ let $\mathrm{out}(s)$ and
$\mathrm{in}(s)$ denote the sets of outgoing and inbound edges, respectively, of $s$. Let $s_{\mathrm{start},\nu}$ and $s_{\mathrm{end},\nu}$
denote the unique start and end node, respectively, of $G_\nu$. We can then define a feasible flow
$\mathbf{f}^\nu$ in the trellis $G_\nu$ by the system

$$\begin{aligned}
\sum_{e \in \mathrm{out}(s_{\mathrm{start},\nu})} f_e^\nu &= 1, && \text{(8.3a)} \\
\sum_{e \in \mathrm{in}(s_{\mathrm{end},\nu})} f_e^\nu &= 1, && \text{(8.3b)} \\
\sum_{e \in \mathrm{out}(s)} f_e^\nu &= \sum_{e \in \mathrm{in}(s)} f_e^\nu && \text{for all } s \in S_\nu \setminus \{s_{\mathrm{start},\nu}, s_{\mathrm{end},\nu}\}, && \text{(8.3c)} \\
f_e^\nu &\in \{0, 1\} && \text{for all } e \in E_\nu. && \text{(8.3d)}
\end{aligned}$$

The integrality conditions (8.3d) ensure that the solution is actually a path, i.e., no fractional
flow values can occur. For the related problem of LP decoding, these constraints are relaxed to
$0 \le f_e^\nu \le 1$.

Let $I_j^\nu$ and $O_j^\nu$ denote the sets of edges in $G_\nu$ whose corresponding input and output bit,
respectively, is a 1 (both being subsets of the $j$-th segment of $G_\nu$). The following constraints
relate the bits of the codeword $\mathbf{x} = (\mathbf{x}^s, \mathbf{x}^a, \mathbf{x}^b)$ to the flow variables:

$$\begin{aligned}
x_j^\nu &= \sum_{e \in O_j^\nu} f_e^\nu && \text{for } j = 1, \ldots, K \text{ and } \nu \in \{a, b\}, && \text{(8.4a)} \\
x_j^s &= \sum_{e \in I_j^a} f_e^a && \text{for } j = 1, \ldots, K, && \text{(8.4b)} \\
x_{\pi(j)}^s &= \sum_{e \in I_j^b} f_e^b && \text{for } j = 1, \ldots, K. && \text{(8.4c)}
\end{aligned}$$


We can now state the network-flow-based turbo code formulation (TC-NFP) as

$$\begin{aligned}
\min \quad & \sum_{\nu \in \{a,b\}} (\mathbf{y}^\nu)^T \mathbf{x}^\nu + (\mathbf{y}^s)^T \mathbf{x}^s && \text{(TC-NFP)} \\
\text{s.t.} \quad & \text{(8.3)–(8.4) hold,}
\end{aligned}$$

where the LLR vector $\mathbf{y}$ is split in the same way as $\mathbf{x}$.

Since all $\mathbf{x}$ variables in TC-NFP are auxiliary, we can replace each occurrence by the sum of
flow variables defining it. In doing so, (8.4b) and (8.4c) break down to the condition

$$\sum_{e \in I_{\pi(j)}^a} f_e^a = \sum_{e \in I_j^b} f_e^b \qquad \text{for } j = 1, \ldots, K. \tag{8.5}$$

This shows that TC-NFP constitutes a shortest-path problem (in the LP relaxation: a minimum-
cost flow problem) plus the $K$ additional side constraints (8.5).

Due to the presence of equations (8.3d), TC-NFP is an integer program and hard to solve in
general. However, commercial IP solvers like CPLEX are able to detect the network flow
sub-structure and utilize it to speed up the solution process.

8.6 Simulation Results

In this section we present simulation results obtained by the IP formulations described above.
The corresponding IP problems have been solved using the commercial state-of-the-art IP
solver CPLEX [16]. The codes are investigated with respect to ML performance, error floor and
performance of reliability-based decoding. The following codes are considered:

• WiMAX LDPC codes (𝑅 = 5/6),

• WiMAX-like LDPC codes (𝑅 = 1/2),

• BCH and RS codes,

• LTE turbo codes.

In the following, we use the union bound

$$P_e \le \sum_{i=1}^{N} A_i \, Q\!\left(\sqrt{2 i R \frac{E_b}{N_0}}\right)$$

for an error-floor estimation, where $E_b$ is the energy per information bit, $Q(x)$ is the Q-function
and $R$ is the code rate [9]. The weight distribution coefficients $A_i$ have been obtained by the IP
formulations from (8.2).
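Evaluating this bound is straightforward once the coefficients are known; a small sketch
using $Q(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$:

```python
import math

def union_bound(A, R, ebn0_db):
    """Union-bound FER estimate from weight coefficients A[i] = A_i (sketch)."""
    ebn0 = 10 ** (ebn0_db / 10)                       # dB to linear scale
    Q = lambda x: 0.5 * math.erfc(x / math.sqrt(2))   # Gaussian tail function
    return sum(A[i] * Q(math.sqrt(2 * i * R * ebn0))
               for i in range(1, len(A)))
```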


block length   A_5   A_6   A_7   A_8
N = 576         24     0   312
N = 672          0     0   168   924

Table 8.1: Weight distribution for the WiMAX LDPC codes with 𝑅 = 5/6.

[Plot: frame error rate vs. $E_b/N_0$ (dB); SPA and ML curves for the (576,480), (672,560),
(864,720) and (1056,880) codes, with union-bound (UB) curves for the two smallest codes.]

Figure 8.2: ML vs. SPA decoding of four WiMAX LDPC codes with 𝑅 = 5/6. The dotted lines
show the union bound estimation of the two smallest codes.

8.6.1 WiMAX-Compliant LDPC, 𝑅 = 5/6

First we consider some high-rate LDPC codes from the WiMAX standard [7] with code rate
𝑅 = 5/6 (Figure 8.2). ML decoding is done with the parity-check formulation (left hand side of
(8.1)). Furthermore, the union bound for error-floor estimation based on the weight distributions
in Table 8.1 is given.

The ML results are compared with SPA decoding using 40 iterations with layered scheduling.
The loss of SPA ranges from 0.5 to 1 dB. A significant improvement in the error floor region
can also be observed. Note that in hardware implementations of WiMAX LDPC decoders
usually fewer iterations of the SPA are carried out. Moreover, often suboptimal check-node
implementations are used that introduce additional loss.


[Plot: frame error rate vs. $E_b/N_0$ (dB); SPA, ML and union-bound (UB) curves for the
(96,48), (240,120) and (384,192) codes.]

Figure 8.3: ML vs. SPA decoding of three WiMAX-like LDPC codes with 𝑅 = 1/2. The dotted
lines show the union bound estimation.

8.6.2 WiMAX-like LDPC, 𝑅 = 1/2

The block lengths of the standard-compliant WiMAX LDPC codes with rate 1/2 are too large
to be tackled by the IP solver. Therefore we designed some non-standard LDPC codes with
shorter block lengths that have a similar structure as the WiMAX-compliant codes. During
code design, we follow the design process of the WiMAX compliant codes. We take the macro
matrix from the WiMAX standard and apply a lifting process to obtain a parity-check matrix
[7]. In contrast to the (longer) WiMAX-compliant codes, we use smaller matrices for lifting
and apply a progressive edge-growth technique [17] to determine the shift values.

The results are LDPC codes with shorter block lengths than specified in the WiMAX standard,
but they have the same underlying structure. Therefore, we call these codes “WiMAX-like”
LDPC codes. Figure 8.3 shows the simulation results under ML decoding and SPA. The SPA
performance is obtained by using 40 iterations and layered scheduling.

8.6.3 BCH and RS Codes

Decoding of BCH and RS codes is considered using soft information (Figure 8.4). We present
results for ML decoding, Chase decoding and OSD for the BCH(1023,993) code. Furthermore,
the RS(63,55) code is investigated under OSD for its binary image [13].


[Plot: frame error rate vs. $E_b/N_0$ (dB); curves for the BCH code under Chase(5), Chase(10),
OSD(1), OSD(2) and ML, and for the RS code under OSD(2), OSD(3) and OSD(4).]

Figure 8.4: Decoding performance of the BCH(1023,993) and RS(63,55) codes under Chase, OSD
and ML decoding.

8.6.4 LTE Compliant Turbo Codes

ML decoding of turbo codes is done using the network-flow-based IP formulation (Section 8.5).
This formulation exploits the turbo code structure and therefore has a higher simulation speed.

In Figure 8.5, simulation results for LTE turbo codes [8] are given. The ML performance is
compared to the turbo decoding algorithm using Log-MAP and 8 iterations. Note that in
hardware implementations usually the Max-Log-MAP is used, which introduces an additional
loss of about 0.1 dB.

ML as Lower Bound for New Decoding Algorithms

The newly obtained ML decoding simulation results are lower bounds on the FER for decod-
ing algorithms. This is very useful to evaluate the correction capability of (new) decoding
algorithms.

Figure 8.6 shows different decoding algorithms for an exemplary (96,48) LDPC code. Besides
the SPA, we apply OSD and the SPA with information correction (6 bits corrected) proposed in [18].
It can easily be read off that OSD(3) nearly achieves ML performance, while OSD(2) comes within
0.3 dB of the ML performance. The SPA with information correction of 6 bits performs within
0.5 dB.


[Plot: frame error rate vs. $E_b/N_0$ (dB); Log-MAP and ML curves for 𝐾 = 40, 72 and 120.]

Figure 8.5: Comparison of ML and turbo decoding (Log-MAP with 8 iterations) using LTE
turbo codes of three different lengths with 𝑅 = 1/3.

[Plot: frame error rate vs. $E_b/N_0$ (dB); curves for SPA (20 iter.), SPA (500 iter.), SPA with
information correction, OSD(2), OSD(3) and ML.]

Figure 8.6: Comparison of different decoding algorithm for an exemplary (96,48) LDPC code.


8.7 Conclusion

This paper provides a framework to analyze linear block codes using integer programming.
Several IP formulations have been presented that allow the analysis of ML decoding performance,
minimum-distance properties, the error floor and reliability-based decoding algorithms. In
particular, new IP formulations have been presented for the Chase algorithm and OSD as well
as for evaluating weight distributions.
very efficient. Block codes of lengths up to 1000 bits have been successfully simulated in a
reasonable amount of time. New simulation results for ML decoding of various linear block
codes have been shown. The presented as well as additional ML simulation results can be found
online [19].

Acknowledgment

We gratefully acknowledge partial financial support by the DFG (project-ID: KI 1754/1-1) as
well as the DAAD project MathTurbo (project-ID: 54565400).

References

[1] R. G. Gallager. Low-Density Parity-Check Codes. Cambridge, Massachusetts: M.I.T. Press,
1963.
[2] D. J. C. MacKay. “Good error-correcting codes based on very sparse matrices”. IEEE
Transactions on Information Theory 45.2 (Mar. 1999), pp. 399–431. doi: 10.1109/18.748992.
[3] R. M. Tanner. “A recursive approach to low complexity codes”. IEEE Transactions on
Information Theory 27.5 (Sept. 1981), pp. 533–547. issn: 0018-9448. doi: 10.1109/TIT.1981.
1056404.
[4] C. Berrou, A. Glavieux, and P. Thitimajshima. “Near shannon limit error-correcting
coding and decoding: turbo-codes”. In: IEEE International Conference on Communications.
May 1993, pp. 1064–1070. doi: 10.1109/ICC.1993.397441.
[5] D. Chase. “Class of algorithms for decoding block codes with channel measurement
information”. IEEE Transactions on Information Theory 18.1 (Jan. 1972), pp. 170–182. doi:
10.1109/TIT.1972.1054746.
[6] M. P. C. Fossorier and S. Lin. “Soft-decision decoding of linear block codes based on
ordered statistics”. IEEE Transactions on Information Theory 41.5 (Sept. 1995), pp. 1379–
1396. doi: 10.1109/18.412683.
[7] IEEE Standard for Local and Metropolitan Area Networks Part 16: Air Interface for Fixed
and Mobile Broadband Wireless Access Systems Amendment 2: Physical and Medium
Access Control Layers for Combined Fixed and Mobile Operation in Licensed Bands and
Corrigendum 1. Feb. 2006. doi: 10.1109/IEEESTD.2006.99107.


[8] Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and Channel Coding
(Release 8). Technical Specification. 3rd Generation Partnership Project; Group Radio
Access Network, Dec. 2008.
[9] S. Lin and D. Costello Jr. Error Control Coding. 2nd ed. Upper Saddle River, NJ: Prentice-
Hall, Inc., 2004. isbn: 0130426725.
[10] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. Wiley-
Interscience series in discrete mathematics and optimization, John Wiley & Sons, 1988.
[11] M. Breitbach et al. “Soft-decision decoding of linear block codes as optimization problem”.
European Transactions on Telecommunications 9 (1998), pp. 289–293.
[12] A. Tanatmis et al. “A separation algorithm for improved LP-decoding of linear block
codes”. IEEE Transactions on Information Theory 56.7 (July 2010), pp. 3277–3289. issn:
0018-9448. doi: 10.1109/TIT.2010.2048489.
[13] M. El-Khamy and R. J. McEliece. “Iterative algebraic soft-decision list decoding of Reed-
Solomon codes”. IEEE Journal on Selected Areas in Communications 24.3 (Mar. 2006),
pp. 481–490. doi: 10.1109/JSAC.2005.862399.
[14] M. Punekar et al. “Calculating the minimum distance of linear block codes via integer
programming”. In: Proceedings of the International Symposium on Turbo Codes and Iterative
Information Processing. Brest, France, Sept. 2010, pp. 329–333. doi: 10.1109/ISTC.2010.
5613894.
[15] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows. Prentice-Hall, 1993.
[16] IBM ILOG CPLEX Optimization Studio. Software Package. Version 12.4. 2011.
[17] X.-Y. Hu, E. Eleftheriou, and D. Arnold. “Regular and irregular progressive edge-growth
tanner graphs”. IEEE Transactions on Information Theory 51.1 (Jan. 2005), pp. 386–398.
doi: 10.1109/TIT.2004.839541.
[18] N. Varnica and M. Fossorier. “Belief-propagation with information correction: improved
near maximum-likelihood decoding of low-density parity-check codes”. In: Proceedings
of IEEE International Symposium on Information Theory. Chicago, IL, June–July 2004. doi:
10.1109/ISIT.2004.1365380.
[19] M. Helmling and S. Scholl. Database of ML Simulation Results. Ed. by University of
Kaiserslautern. 2014. url: http://www.uni-kl.de/channel-codes.

Chapter 9

Paper IV: Towards an Exact Combinatorial
Algorithm for LP Decoding of Turbo Codes

Michael Helmling and Stefan Ruzika

The following chapter is a reformatted and revised copy of a preprint that is publicly available
online (http://arxiv.org/abs/1301.6363). A reduced version of the paper (due to the limitation of
5 pages for conference submissions, all proofs were omitted) was presented at the 2013 ISIT
conference and appeared in the following refereed conference proceedings:

M. Helmling and S. Ruzika. “Towards combinatorial LP turbo decoding”. In: Proceedings of
IEEE International Symposium on Information Theory. Istanbul, Turkey, July 2013, pp. 1491–
1495. doi: 10.1109/ISIT.2013.6620475. arXiv: 1301.6363 [cs.IT]

Towards an
Exact Combinatorial Algorithm
for LP Decoding of Turbo Codes

Michael Helmling          Stefan Ruzika

We present a novel algorithm that solves the turbo code LP decoding
problem in a finite number of steps by Euclidean distance minimizations,
which in turn rely on repeated shortest-path computations in the trellis
graph representing the turbo code. Previous attempts to exploit the
combinatorial graph structure only led to algorithms which are either
of heuristic nature or do not guarantee finite convergence. A numerical
study shows that our algorithm clearly beats the running time, up to a
factor of 100, of generic commercial LP solvers for medium-sized codes,
especially for high SNR values.

9.1 Introduction

Since its introduction by Feldman et al. in 2002 [1], Linear Programming-based channel
decoding has gained tremendous interest because of its analytical power—LP decoding exhibits
the maximum-likelihood (ML) certificate property [2], and the decoding behavior is completely
determined by the explicitly described “fundamental” polytope [3]—combined with noteworthy
error-correcting performance and the availability of efficient decoding algorithms.
Turbo codes, invented by Berrou et al. in 1993 [4], are a class of concatenated convolutional
codes that, together with a heuristic iterative decoding algorithm, feature remarkable error-
correcting performance.
While the first paper on LP decoding [1] actually dealt with turbo codes, the majority of
publications in the area of LP decoding now focusses on LDPC codes [5] which provide similar
performance (cf. [6] for a recent overview). Nevertheless, turbo codes have some analytical
advantages, most importantly the inherent combinatorial structure by means of the trellis
graph representations of the underlying convolutional encoders. ML Decoding of turbo codes
is closely related to shortest-path and minimum-network-flow problems, both being classical,
well-studied topics in optimization theory for which plenty efficient solution methods exist.
The hardness of ML decoding is caused by additional conditions on the path through the trellis
graphs (they are termed agreeability constraints in [1]) posed by the turbo code’s interleaver.
Thus ML (LP) decoding is equivalent to solving a (LP-relaxed) shortest-path problem with
additional linear side constraints.
So far, two methods for solving the LP have been proposed: General purpose LP solvers like
CPLEX [7] are based on the matrix representation of the LP problem. They utilize either the
simplex method or interior point approaches [8], but do not exploit any structural properties
of the specific problem. Lagrangian relaxation in conjunction with subgradient optimization
[1, 9], on the other hand, utilizes this structure, but has practical limitations, most notably it
usually converges very slowly.
This paper presents a new approach to solve the LP decoding problem exactly by an algorithm
that exploits its graphical substructure, thus combining the analytical power of the LP approach
with the running-time benefits of a combinatorial method, which seems to be a necessary
requirement for practical implementation. Our basic idea is to construct an alternative polytope
in the space defined by the additional constraints (called constraints space) and show how
the LP solution corresponds to a specific point $z_{\mathcal{Q}}^{LP}$ of that polytope. Then, we show how to
computationally find $z_{\mathcal{Q}}^{LP}$ by a geometric algorithm that relies on a sequence of shortest-path
computations in the trellis graphs.
The reinterpretation of constrained optimization problems in constraints space was first de-
veloped in the context of multicriteria optimization in [10], where it is applied to minimum
spanning tree problems with a single side constraint. In 2010, Tanatmis [11] applied this theory
to the turbo decoding problem. His algorithm showed a drastic speedup compared to a general
purpose LP solver, however it only works for up to two constraints, while in real-world turbo
codes the number of constraints equals the information length.


By adapting an algorithm by Wolfe [12] that computes the point of minimum Euclidean norm
in a polytope, we are able to overcome these limitations and decode turbo codes with lengths
of practical interest. The algorithm is, compared to previous methods, advantageous not only in
terms of running time, but also gives valuable information that can help to improve the error-
correcting performance. Furthermore, branch-and-bound methods for integer programming-
based ML decoding depend upon fast lower bound computations, mostly given by LP relaxations,
and can often be significantly improved by dedicated methods that evaluate combinatorial
properties of the LP solutions. Since our LP decoder contains such information, it could also be
considered a step towards IP-based algorithms with the potential of practical implementation.

9.2 Background and Notation

9.2.1 Definition of Turbo Codes

A $k$-dimensional subspace $C$ of the vector space $\mathbb{F}_2^n$ (where $\mathbb{F}_2 = \{0, 1\}$ denotes the binary field)
is called an $(n, k)$ binary linear block code, where $n$ is the block length and $k$ the information (or
input) length. One way to define a code is by an appropriate encoding function $e_C$, for which
any bijective linear mapping from $\mathbb{F}_2^k$ onto $C$ qualifies. This paper deals with turbo codes [4], a
special class of block codes built by interconnecting (at least) two convolutional codes (see e.g.
[13]). For the sake of clear notation, we focus on turbo codes as used in the 3GPP LTE standard
[14]—i.e., systematic, parallel concatenated turbo codes with two identical terminated rate-1
constituent encoders—despite the fact that our approach is applicable to arbitrary turbo coding
schemes. An in-depth covering of turbo code construction can be found in [15].

An $(n, k)$ turbo code $\mathrm{TC} = \mathrm{TC}(C, \pi)$ is defined by a rate-1 convolutional $(n_C, k)$ code $C$ with
constraint length $d$ and a permutation $\pi \in \mathbb{S}_k$ such that $n = k + 2 \cdot n_C$. Because we consider
terminated convolutional codes only (i.e., there is a designated terminal state of the encoder),
the final $d$ bits of the information sequence (also called the tail) are not free to choose and thus
cannot carry any information. Consequently, those bits together with the corresponding $d$
output bits are considered part of the output, which yields $n_C = k + 2 \cdot d$ and a code rate slightly
below 1. Let $e_C \colon \mathbb{F}_2^k \longrightarrow \mathbb{F}_2^{n_C}$ be the associated encoding function. Then, the encoding function
of TC is defined as

$$\begin{aligned}
e_{\mathrm{TC}} \colon \mathbb{F}_2^k &\longrightarrow \mathbb{F}_2^{k + 2 \cdot n_C} \\
e_{\mathrm{TC}}(x) &= (x \mid e_C(x) \mid e_C(\pi(x)))
\end{aligned} \tag{9.1}$$

where $\pi(x) = (x_{\pi(1)}, \ldots, x_{\pi(k)})$. In other words, the codeword for an input word $x$ is obtained
by concatenating
by concatenating
• a copy of 𝑥 itself,
• a copy of 𝑥 encoded by 𝐶, and
• a copy of 𝑥, permuted by 𝜋 and encoded by 𝐶 afterwards.
Figure 9.1 shows a circuit-type visualization of this definition.


[Diagram: the input $x \in \mathbb{F}_2^k$ is passed through directly, encoded by $C$ to give $e_C(x)$,
and encoded by $C$ after interleaving by $\pi$ to give $e_C(\pi(x))$.]

Figure 9.1: Turbo encoder with two convolutional encoders 𝐶 and interleaver 𝜋.

9.2.2 Trellis Graphs of Convolutional Codes

A convolutional code with a specific length is represented naturally by its trellis graph, which
is obtained by unfolding the code-defining finite state machine in the time domain: Each vertex
of the trellis represents the state at a specific point in time, while edges correspond to valid
transitions between two subsequent states and exhibit labels with the corresponding input and
output bit, respectively. The following description of convolutional codes loosely follows [6,
Section V.C], albeit the notation slightly differs.

We denote a trellis by $T = (V, E)$ with vertex set $V$ and edge set $E$. Vertices are indexed by time
step and state; i.e., $v_{i,s}$ denotes the vertex corresponding to state $s \in \{0, \ldots, 2^d - 1\}$ at time
$i \in \{1, \ldots, k + d + 1\}$. An edge in turn is identified by the time and state of its tail vertex plus
its input label, so $e_{i,s,b}$ denotes the edge outgoing from $v_{i,s}$ with input bit $b \in \{0, 1\}$. We call
vertical “slices”, i.e., the subgraphs induced by the edges of a single time step, segments of the
trellis. Formally, the segment at time $i$ is

$$S_i = (V_i, E_i), \quad \text{where } V_i = \{v_{j,s} \in V : j \in \{i, i+1\}\} \text{ and } E_i = \{e_{j,s,b} \in E : j = i\}.$$

Because the initial and final state of the convolutional encoder are fixed, the leading as well as
the trailing $d$ segments contain fewer than $2^d$ vertices. Figure 9.2 shows the first few segments of
a trellis with $d = 2$.

By construction, the paths from the starting node to the end node in a trellis of a convolutional code 𝐶 are in one-to-one correspondence with the codewords of 𝐶: Let 𝐼_𝑖 ⊂ 𝐸_𝑖 and 𝑂_𝑖 ⊂ 𝐸_𝑖 be those edges of 𝑆_𝑖 whose input label and output label, respectively, is a 1. The correspondence between a codeword 𝑦 ∈ 𝔽_2^{𝑘+2⋅𝑑} and the corresponding path 𝑃 = (𝑒_1, …, 𝑒_{𝑘+𝑑}) in 𝑇 is given by

    𝑦_𝑖 = 1  ⇔  𝑒_{𝑘+𝑖} ∈ 𝐼_{𝑘+𝑖}   for 1 ≤ 𝑖 ≤ 𝑑,
    𝑦_𝑖 = 1  ⇔  𝑒_{𝑖−𝑑} ∈ 𝑂_{𝑖−𝑑}   for 𝑑 < 𝑖 ≤ 𝑘 + 2⋅𝑑,    (9.2)

where the first part accounts for the 𝑑 “input” tail bits that are prepended by convention. From (9.2), for each 𝑒 ∈ 𝐸 an index set 𝐽(𝑒) can be computed with the property that 𝑒 ∈ 𝑃 ⇒ 𝑦_𝑗 = 1 for all 𝑗 ∈ 𝐽(𝑒). In our case, |𝐽(𝑒)| varies from 0 (for edges in 𝑆_𝑖, 𝑖 ≤ 𝑘, with output label 0) to 2 (for edges in 𝑆_𝑖, 𝑘 + 1 ≤ 𝑖 ≤ 𝑘 + 𝑑, with both input and output label 1).


[Figure 9.2: Excerpt from a trellis graph with four states and initial state 0, showing segments 𝑆_1 and 𝑆_2. The style of an edge indicates the respective information bit, while the labels refer to the single parity bit.]

The path–codeword relation can be exploited for maximum-likelihood (ML) decoding if the codewords are transmitted through a memoryless binary-input output-symmetric (MBIOS) channel: Let 𝜆 ∈ ℝ^{𝑘+2⋅𝑑} be the vector of LLR values of the received signal. If we assign to each edge 𝑒 ∈ 𝐸 the cost

    𝑐(𝑒) = ∑_{𝑗 ∈ 𝐽(𝑒)} 𝜆_𝑗,

it can be shown [2] that the shortest path in 𝑇 corresponds to the ML codeword.
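Because the trellis is acyclic and organized in segments, this shortest path can be computed by a single forward sweep. The following sketch illustrates the idea; the data layout (per time step, a list of edges as (tail state, head state, cost) tuples) is our own illustration, not prescribed by the paper.

    import math

    def trellis_shortest_path(segments, num_states, start=0, end=0):
        """segments: list over time steps; each entry is a list of
        (tail_state, head_state, cost) tuples. Returns the cost of the
        shortest start->end path; negative costs are fine because the
        graph is acyclic. Keeping predecessor pointers alongside `dist`
        would recover the path itself."""
        dist = [math.inf] * num_states
        dist[start] = 0.0
        for edges in segments:                    # one segment per time step
            new_dist = [math.inf] * num_states
            for tail, head, cost in edges:
                if dist[tail] + cost < new_dist[head]:
                    new_dist[head] = dist[tail] + cost
            dist = new_dist
        return dist[end]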

9.2.3 Trellis Representation of Turbo Codes

For turbo codes, we have two isomorphic trellis graphs, 𝑇_1 and 𝑇_2, according to the two component convolutional encoders. Let formally 𝑇 = (𝑉_1 ∪ 𝑉_2, 𝐸_1 ∪ 𝐸_2), and denote by 𝑃 = 𝑃_1 ∘ 𝑃_2 the path that consists of 𝑃_1 in 𝑇_1 and 𝑃_2 in 𝑇_2. Only certain paths, called agreeable, actually correspond to codewords; namely, an agreeable path 𝑃_1 ∘ 𝑃_2 = (𝑒^1_1, …, 𝑒^1_{𝑘+𝑑}, 𝑒^2_1, …, 𝑒^2_{𝑘+𝑑}) must obey the 𝑘 consistency constraints

    𝑒^1_𝑖 ∈ 𝐼^1_𝑖  ⇔  𝑒^2_{𝜋(𝑖)} ∈ 𝐼^2_{𝜋(𝑖)}   for 𝑖 = 1, …, 𝑘    (9.3)

because both encoders operate on the same information word, only that it is permuted for the second encoder. Consequently, ML decoding for turbo codes can be formulated as finding the shortest agreeable path in 𝑇. If an agreeable path contains 𝑒^1_𝑖 ∈ 𝐼^1_𝑖, it must also contain 𝑒^2_{𝜋(𝑖)} ∈ 𝐼^2_{𝜋(𝑖)}, and thus 𝑖 ∈ 𝐽(𝑒) for both 𝑒 = 𝑒^1_𝑖 and 𝑒 = 𝑒^2_{𝜋(𝑖)}. To avoid counting the LLR values 𝜆_𝑖 of the systematic bits (1 ≤ 𝑖 ≤ 𝑘) twice in the objective function, we use the modified cost

    𝑐̂(𝑒) = ∑_{𝑗 ∈ 𝐽(𝑒)} 𝜆̂_𝑗   with   𝜆̂_𝑗 = 𝜆_𝑗/2 if 1 ≤ 𝑗 ≤ 𝑘,  and  𝜆̂_𝑗 = 𝜆_𝑗 otherwise.    (9.4)


Then, the ML decoding problem for turbo codes can be stated as the combinatorial optimization problem

    (TC-ML)   min  ∑_{𝑒 ∈ 𝑃 = 𝑃_1 ∘ 𝑃_2} 𝑐̂(𝑒)    (9.5a)
              s.t. 𝑃_1 is a path in 𝑇_1    (9.5b)
                   𝑃_2 is a path in 𝑇_2    (9.5c)
                   𝑃 is agreeable.    (9.5d)

The codeword variables 𝑦_𝑖 can be included into (TC-ML) by the constraints

    𝑦_𝑖 = ∑_{𝑒 ∶ 𝐽(𝑒) ∋ 𝑖} 𝑓_𝑒 / 2   for 1 ≤ 𝑖 ≤ 𝑘,
    𝑦_𝑖 = ∑_{𝑒 ∶ 𝐽(𝑒) ∋ 𝑖} 𝑓_𝑒       for 𝑖 > 𝑘,    (9.6)

where the factor 1/2 is analogous to (9.4). However, these variables are purely auxiliary in the LP and thus not needed.

It is straightforward to formulate TC-ML as an integer linear program by introducing a binary flow variable 𝑓_𝑒 ∈ {0, 1} for each 𝑒 ∈ 𝐸_1 ∪ 𝐸_2. The constraints (9.5b) and (9.5c) can be restated in terms of flow conservation and capacity constraints [16] which define the path polytopes 𝒫_1^path and 𝒫_2^path, respectively. By also transforming (9.3) and (9.5a), we obtain

    (TC-IP)   min  ∑_{𝑒 ∈ 𝐸_1 ∪ 𝐸_2} 𝑐̂(𝑒) ⋅ 𝑓_𝑒    (9.7a)
              s.t. 𝑓^1 ∈ 𝒫_1^path    (9.7b)
                   𝑓^2 ∈ 𝒫_2^path    (9.7c)
                   ∑_{𝑒 ∈ 𝐼^1_𝑖} 𝑓_𝑒 = ∑_{𝑒 ∈ 𝐼^2_{𝜋(𝑖)}} 𝑓_𝑒,   𝑖 = 1, …, 𝑘    (9.7d)
                   𝑓_𝑒 ∈ {0, 1},   𝑒 ∈ 𝐸.    (9.7e)
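For illustration, the following sketch builds TC-IP with the open-source PuLP modeling library. The trellis data structures (edge objects with .cost, .tail, .head, the index-set maps I1 and I2, and the source/sink vertex sets) are hypothetical containers assumed to be prepared beforehand; they are not part of the formulation itself.

    # Hypothetical-input sketch of TC-IP with PuLP; I1[i] / I2[pi[i]] hold
    # the edges of I^1_i and I^2_{pi(i)}, respectively.
    from pulp import LpProblem, LpMinimize, LpVariable, lpSum

    def build_tc_ip(edges, vertices, sources, sinks, I1, I2, pi, k):
        prob = LpProblem("TC_IP", LpMinimize)
        f = {e: LpVariable(f"f_{id(e)}", cat="Binary") for e in edges}
        prob += lpSum(e.cost * f[e] for e in edges)        # objective (9.7a)
        for v in vertices:                                 # flow conservation -> (9.7b), (9.7c)
            inflow = lpSum(f[e] for e in edges if e.head == v)
            outflow = lpSum(f[e] for e in edges if e.tail == v)
            demand = (1 if v in sinks else 0) - (1 if v in sources else 0)
            prob += inflow - outflow == demand
        for i in range(1, k + 1):                          # consistency (9.7d)
            prob += lpSum(f[e] for e in I1[i]) == lpSum(f[e] for e in I2[pi[i]])
        return prob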

9.2.4 Polyhedral Theory Background

Besides coding theory, this paper requires some bits of polyhedral theory. A polytope is the convex hull of a finite number of points: 𝒫 = conv(𝑣^1, …, 𝑣^𝑡). It can be described either by its vertices (or extreme points), i.e., the unique minimal set fulfilling this defining property, or as the intersection of a finite number of halfspaces: 𝒫 = ⋂_{𝑖=1}^{𝑚} {𝑥 ∶ 𝑎_𝑖^T 𝑥 ≤ 𝑏_𝑖}. An inequality 𝑎^T 𝑥 ≤ 𝑏 is called valid for 𝒫 if it is true for all 𝑥 ∈ 𝒫. In that case, the set 𝐹_{𝑎,𝑏} = {𝑥 ∈ 𝒫 ∶ 𝑎^T 𝑥 = 𝑏} is called the face induced by the inequality. For any 𝑟 satisfying 𝑎^T 𝑟 ≥ 𝑏 (𝑎^T 𝑟 > 𝑏) we say that the inequality separates (strongly separates) 𝑟 from 𝒫.


9.3 The LP Relaxation and Conventional Solution Methods

ML decoding of general linear block codes is known to be NP-hard [17]. While the computational complexity of TC-IP is still open, it is widely believed that this problem is NP-hard as well, which would imply that no polynomial-time algorithm can solve TC-IP unless P = NP.¹ By relaxing (9.7e) to 𝑓_𝑒 ∈ [0, 1], we get the LP relaxation (referred to as TC-LP) of the integer program TC-IP, which in contrast can be solved efficiently by the simplex method or interior point approaches [8]. Feldman et al. [1] were the first to analyze this relaxation and demonstrated its reasonable decoding performance.

A general-purpose LP solver, however, does not make use of the combinatorial substructure contained in TC-IP via (9.7b) and (9.7c) and thus wastes some potential of solving the problem more efficiently—while LPs are solvable in polynomial time, they do not scale too well, and the number of variables (about 2⋅|𝑉| = (𝑘 + 𝑑)⋅2^{𝑑+2}) and constraints (|𝑉| + 𝑘) in TC-LP is very large (practical values of 𝑑 range roughly from 3 to 8).

Note that without the consistency constraints (9.7d), we could solve TC-LP by simply computing shortest paths in both trellis graphs, which is possible in time 𝒪(2^𝑑 ⋅ (𝑘 + 𝑑)) (as 𝑑 is usually considered fixed, this amounts to a linear time complexity), even in the presence of negative weights, because the graphs are acyclic [19]. A popular approach for solving optimization problems that comprise an “easy” subproblem plus some “complicating” additional constraints is to solve the Lagrangian dual [20] by subgradient optimization. If we define 𝑔_𝑖(𝑓) = ∑_{𝑒 ∈ 𝐼^1_𝑖} 𝑓_𝑒 − ∑_{𝑒 ∈ 𝐼^2_{𝜋(𝑖)}} 𝑓_𝑒, the constraints (9.7d) can be compactly rewritten as

    𝑔_𝑖(𝑓) = 0   for 𝑖 = 1, …, 𝑘.    (9.8)

The Lagrangian relaxation with multiplier 𝜇 ∈ ℝ^𝑘 is defined as

    (TC-LR)   𝑧(𝜇) = min  ∑_{𝑒 ∈ 𝐸_1 ∪ 𝐸_2} 𝑐̂(𝑒) ⋅ 𝑓_𝑒 + ∑_{𝑖=1}^{𝑘} 𝜇_𝑖 ⋅ 𝑔_𝑖(𝑓)    (9.9a)
                     s.t. 𝑓^1 ∈ 𝒫_1^path    (9.9b)
                          𝑓^2 ∈ 𝒫_2^path    (9.9c)
                          𝑓_𝑒 ∈ {0, 1},   𝑒 ∈ 𝐸.    (9.9d)

For all 𝜇 ∈ ℝ^𝑘, the objective value of TC-LR is less than or equal to that of TC-LP. The Lagrangian dual problem is to find multipliers 𝜇 that maximize this objective, thus minimizing the gap to the LP solution. It can be shown that in the optimal case both values coincide. Note that the feasible region of TC-LR is the combined path polytope of both 𝑇_1 and 𝑇_2, so it can be solved by a shortest-path routine in both trellises with modified costs, and the integrality condition on 𝑓 is fulfilled automatically. Applying Lagrangian relaxation to turbo decoding was already proposed by Feldman et al. [1] and further elaborated by Tanatmis et al. [9]; the latter reference combines the approach with a heuristic to tighten the integrality gap between TC-LP and TC-IP.

¹ Note that with state-of-the-art software and prohibitive computational effort, ML turbo decoding can be simulated off-line on desktop computers at least for small and medium code sizes [18].

The Lagrangian dual is typically solved by a subgradient algorithm that iteratively adjusts the multipliers 𝜇, converging (under some mild conditions) to the optimal value [20]. However, the convergence is often slow in practice, and the limit is not guaranteed to ever be reached exactly. Additionally, the dual only informs us about the objective value; recovering the actual solution of the problem requires additional work. In summary, subgradient algorithms suffer from three major flaws. The main result of this paper is an alternative algorithm which exhibits none of these.
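For reference, a textbook subgradient iteration for this dual might look as follows. The shortest-path oracle solve_tc_lr is an assumed black box returning a minimizer of (9.9) for given multipliers; this is our illustration of the classical scheme, not the algorithm proposed in this paper.

    def subgradient_dual(solve_tc_lr, g, k, iterations=200, step0=1.0):
        """solve_tc_lr(mu) -> (f, z): minimizer and value of TC-LR for mu.
        g(i, f) evaluates the violation g_i(f). Returns the best dual bound,
        which lower-bounds the TC-LP optimum."""
        mu = [0.0] * k
        best = float("-inf")
        for t in range(1, iterations + 1):
            f, z = solve_tc_lr(mu)
            best = max(best, z)
            subgrad = [g(i, f) for i in range(k)]   # supergradient of z(mu)
            step = step0 / t                        # diminishing step size
            mu = [mu[i] + step * subgrad[i] for i in range(k)]
        return best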

9.4 An Equivalent Problem in Constraints Space

Like Lagrangian dualization, our algorithm also uses a relaxed formulation of TC-IP with a modified objective function that resembles TC-LR. However, via a geometric interpretation of the image of the path polytope in the “constraints space”, as defined below, the exact LP solution is found in finitely many steps.

9.4.1 The Image Polytope 𝒬

Let 𝒫^path = 𝒫_1^path × 𝒫_2^path be the feasible region of TC-LR. We define the map

    𝔇 ∶ 𝒫^path → ℝ^{𝑘+1}
    𝑓 ↦ (𝑔_1(𝑓), …, 𝑔_𝑘(𝑓), 𝑐(𝑓))^T,

where 𝑐(𝑓) = ∑_{𝑒 ∈ 𝐸_1 ∪ 𝐸_2} 𝑐̂(𝑒) ⋅ 𝑓_𝑒 is a shorthand for the objective function value of TC-LP. For a path 𝑓, the first 𝑘 coordinates 𝑣_𝑖, 𝑖 = 1, …, 𝑘, of 𝑣 = 𝔇(𝑓) tell if and how the conditions 𝑔_𝑖(𝑓) = 0 are violated, while the last coordinate 𝑣_{𝑘+1} equals the cost of 𝑓. Let 𝒬 = 𝔇(𝒫^path) be the image of the path polytope under 𝔇. The following results are immediate:

9.1 Lemma:

(1) 𝒬 is a polytope.

(2) If 𝑓 represents an agreeable path in 𝑇, then 𝔇(𝑓) is located on the (𝑘 + 1)st axis (henceforth called 𝑐-axis or 𝐴_𝑐).

(3) If 𝑣 is a vertex of 𝒬 and 𝑣 = 𝔇(𝑓) for some 𝑓 ∈ 𝒫^path, then 𝑓 is also a vertex of 𝒫^path. C

In the situation that 𝑣 = 𝔇(𝑓) we will also write 𝑓 = 𝔇^{−1}(𝑣) with the meaning that 𝑓 is any preimage of 𝑣, which need not be unique.


We consider the auxiliary problem

    (TC-LP𝒬)   𝑧^𝒬_LP = min  𝑣_{𝑘+1}    (9.10a)
               s.t.  𝑣 ∈ 𝒬    (9.10b)
                     𝑣 ∈ 𝐴_𝑐,    (9.10c)

the solution of which is the lower “piercing point” of the axis 𝐴_𝑐 through 𝒬. Note that due to (9.10c), 𝑘 of the 𝑘 + 1 variables in TC-LP𝒬 are fixed to zero, thus the problem is in a sense one-dimensional, the feasible region being the (one-dimensional) projection of 𝒬 onto 𝐴_𝑐. Nevertheless, the following theorem shows that TC-LP𝒬 and TC-LP are essentially equivalent.

9.2 Theorem: Let 𝑣_LP be an optimal solution of TC-LP𝒬 with objective value 𝑧^𝒬_LP and 𝑓_LP = 𝔇^{−1}(𝑣_LP) ∈ 𝒫^path the corresponding flow. Then 𝑧^𝒬_LP = 𝑧_LP, the optimal objective value of TC-LP, and 𝑓_LP is an optimal solution of TC-LP. C

Proof. First we show 𝑧^𝒬_LP ≤ 𝑧_LP. Let 𝑓_LP be an optimal solution of TC-LP with cost 𝑐(𝑓_LP) = 𝑧_LP. Then 𝔇(𝑓_LP) = (0, …, 0, 𝑧_LP)^T by definition of 𝔇, since 𝑓_LP is feasible and thus 𝑔_1(𝑓_LP) = ⋯ = 𝑔_𝑘(𝑓_LP) = 0. Hence 𝔇(𝑓_LP) ∈ 𝐴_𝑐 ∩ 𝒬 with 𝔇(𝑓_LP)_{𝑘+1} = 𝑧_LP, from which it follows that 𝑧^𝒬_LP ≤ 𝑧_LP.

If we assume on the other hand that 𝑧^𝒬_LP < 𝑧_LP, there must be a 𝑣 ∈ 𝐴_𝑐 ∩ 𝒬 such that 𝑣_{𝑘+1} < 𝑧_LP. By definition of 𝔇 this implies the existence of a flow 𝑓 = 𝔇^{−1}(𝑣) with 𝑔_1(𝑓) = ⋯ = 𝑔_𝑘(𝑓) = 0, hence a feasible one, and 𝑐(𝑓) = 𝑣_{𝑘+1} < 𝑧_LP, contradicting optimality of 𝑧_LP. □

While we do not have an explicit representation of 𝒬 (by means of either vertices or inequalities) at hand, we can easily minimize linear functionals over 𝒬:

9.3 Observation: The problem

    (LP𝒬)   min  𝛾^T 𝑣
            s.t. 𝑣 ∈ 𝒬

can be solved by first computing an optimal solution 𝑓* of the weighted sum problem

    (TC-WS)   min  ∑_{𝑖=1}^{𝑘} 𝛾_𝑖 ⋅ 𝑔_𝑖(𝑓) + 𝛾_{𝑘+1} ⋅ 𝑐(𝑓)
              s.t. 𝑓 ∈ 𝒫^path

and then taking the image of 𝑓* under 𝔇. As noted before, this can be achieved within running time 𝒪(𝑛). C

Note that TC-WS is closely related to TC-LR: as long as 𝛾_{𝑘+1} ≠ 0, we get the same problem by setting 𝜇_𝑖 = 𝛾_𝑖/𝛾_{𝑘+1} in TC-LR.
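Concretely, TC-WS is again a plain shortest-path problem, because the whole objective can be folded into the edge costs: each 𝑔_𝑖(𝑓) is a signed sum of flow variables. A minimal sketch of this cost construction follows (our own illustration; edge objects and index sets as in the earlier sketches):

    def weighted_sum_costs(edges, I1, I2, pi, gamma, k):
        """Fold the TC-WS objective into per-edge costs: gamma[k] (the
        (k+1)st component) scales the original cost c_hat(e); gamma[i-1] is
        added for edges in I^1_i and subtracted for edges in I^2_{pi(i)}."""
        cost = {e: gamma[k] * e.cost for e in edges}
        for i in range(1, k + 1):
            for e in I1[i]:
                cost[e] += gamma[i - 1]
            for e in I2[pi[i]]:
                cost[e] -= gamma[i - 1]
        return cost   # shortest paths w.r.t. `cost` in T_1 and T_2 solve TC-WS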


9.4.2 Solving TC-LP𝒬 with Nearest-Point Calculations

Our algorithm solves TC-LP𝒬 by a series of nearest-point computations between 𝒬 and reference points 𝑟^𝑖 on 𝐴_𝑐, the last of which gives a face of 𝒬 containing the optimal solution 𝑣_LP.

For each 𝑟 ∈ ℝ^{𝑘+1}, we denote by

    NP(𝑟) = arg min_{𝑣 ∈ 𝒬} ‖𝑣 − 𝑟‖_2

the nearest point to 𝑟 in 𝒬 with respect to the Euclidean norm, and define 𝑎(𝑟) = 𝑟 − NP(𝑟) and 𝑏(𝑟) = 𝑎(𝑟)^T NP(𝑟). The following well-known result will be used frequently below.

9.4 Lemma: The inequality

    𝑎(𝑟)^T 𝑣 ≤ 𝑏(𝑟)    (9.11)

is valid for 𝒬 and induces a face containing NP(𝑟), which we call NF(𝑟). If 𝑟 ∉ 𝒬, (9.11) strongly separates 𝑟 from 𝒬. C

The following theorem is the foundation of our algorithm.

9.5 Theorem: There exists an 𝜀 > 0 such that 𝑣_LP ∈ NF(𝑟) holds for all 𝑟 inside the open line segment (𝑣_LP, 𝑣_LP − (0, …, 0, 𝜀)^T). C

Our constructive proof of Theorem 9.5 shows how to find a point inside the interval mentioned in the theorem. The outline is as follows: At first, start with a reference point 𝑟 ∈ 𝐴_𝑐 that is guaranteed to be located below 𝑣_LP. Then, we iteratively compute NF(𝑟) and update 𝑟 to be the intersection of 𝐴_𝑐 with the hyperplane defining NF(𝑟). The following lemmas show that this procedure, which is illustrated in Figure 9.3 for the two-dimensional case, is valid and finite. The first result is that the hyperplane defining NF(𝑟) is always oriented “downwards”.

9.6 Lemma: Let 𝑟 = (0, …, 0, 𝜌)^T with 𝜌 < 𝑧^𝒬_LP and let 𝑎(𝑟)^T 𝑣 ≤ 𝑏(𝑟) be the inequality defined in (9.11). Then 𝑎(𝑟)_{𝑘+1} < 0. C

Proof. Assuming 𝑎(𝑟)_{𝑘+1} ≥ 0, we obtain 𝑎(𝑟)^T 𝑣_LP = 𝑎(𝑟)_{𝑘+1} 𝑧^𝒬_LP ≥ 𝑎(𝑟)_{𝑘+1} 𝜌 = 𝑎(𝑟)^T 𝑟 > 𝑏(𝑟), which contradicts 𝑣_LP ∈ 𝒬 by Lemma 9.4. Note that the equalities hold because both 𝑣_LP and 𝑟 are elements of 𝐴_𝑐, the first inequality stems from the assumptions on 𝑎(𝑟)_{𝑘+1} and 𝜌, and the second follows from Lemma 9.4. □

Next we show that updating 𝑟 leads to a different nearest face, unless we have arrived at the optimal solution.

9.7 Lemma: Under the same assumptions as in Lemma 9.6, let 𝑠 ∈ ℝ^{𝑘+1} with

    𝑠 = (0, …, 0, 𝑏(𝑟)/𝑎(𝑟)_{𝑘+1})^T

be the point where the separating hyperplane and 𝐴_𝑐 intersect. If NF(𝑟) = NF(𝑠), then 𝑠 = 𝑣_LP. C


[Figure 9.3: Schematic execution of Algorithm 9.1 in image space, in four panels showing 𝒬 and the axis 𝐴_𝑐.
(a) Step 𝑖: 𝑣^𝑖 = 𝔇(𝑓^𝑖) is found as nearest point to some reference point 𝑟^{𝑖−1}. The intersection of the separating hyperplane with the axis 𝐴_𝑐, 𝑟^𝑖, will be the reference point of the next iteration.
(b) Step 𝑖 + 1: Note that the induced face of 𝒬 here is a facet, while it was a 0-dimensional face in step 𝑖.
(c) Step 𝑖 + 2 (zoomed in): The facet induced in this step intersects 𝐴_𝑐 at 𝑣_LP, but the algorithm cannot yet detect this.
(d) Step 𝑖 + 3: Optimality is detected by 𝑣^{𝑖+3} = 𝑟^{𝑖+2}. The solution 𝔇^{−1}(𝑣_LP) is returned.]


Proof. We use contraposition to show that 𝑠 ≠ 𝑣_LP implies NF(𝑟) ≠ NF(𝑠), so assume 𝑠 ≠ 𝑣_LP. We know that 𝑎(𝑟)^T 𝑣 ≤ 𝑏(𝑟) is valid for 𝒬 and 𝑎(𝑟)^T 𝑠 = 𝑎(𝑟)_{𝑘+1} 𝑠_{𝑘+1} = 𝑏(𝑟) by construction. This implies that 𝑠 ∉ 𝒬; otherwise we would have 𝑠 = 𝑣_LP because for all 𝜁 < 𝑠_{𝑘+1}, 𝑎(𝑟)^T (𝜁𝑒_{𝑘+1}) > 𝑏(𝑟), so 𝑠 would really be the lowest point on 𝐴_𝑐 that is also in 𝒬 and thus optimal.

It follows that 𝑦_𝑠 = NP(𝑠) ≠ 𝑠. Since 𝑦_𝑠 ∈ 𝒬 and 𝑎(𝑟)^T 𝑣 ≤ 𝑏(𝑟) is valid for 𝒬, we have 𝑎(𝑟)^T 𝑦_𝑠 ≤ 𝑏(𝑟).

Case 1: 𝑎(𝑟)^T 𝑦_𝑠 < 𝑏(𝑟). Then 𝑦_𝑠 ∉ NF(𝑟), but 𝑦_𝑠 ∈ NF(𝑠) by definition, which proves the claim for this case.

Case 2: 𝑎(𝑟)^T 𝑦_𝑠 = 𝑏(𝑟). From 𝑎(𝑟)^T 𝑟 > 𝑏(𝑟) and 𝑎(𝑟)^T 𝑠 = 𝑏(𝑟) we obtain

    𝑎(𝑟)^T 𝑟 > 𝑎(𝑟)^T 𝑠    (9.12a)
    ⇒ 𝑎(𝑟)^T (𝑟 − 𝑠) > 0    (9.12b)
    ⇒ 𝑎(𝑟)_{𝑘+1} (𝑟_{𝑘+1} − 𝑠_{𝑘+1}) > 0    (9.12c)
    ⇒ 𝑟_{𝑘+1} < 𝑠_{𝑘+1},    (9.12d)

where we have used again 𝑎(𝑟)_{𝑘+1} < 0 and the fact that 𝑟, 𝑠 ∈ 𝐴_𝑐.

Applying Lemma 9.6 to 𝑠 as reference point we obtain 𝑎(𝑠)_{𝑘+1} = (𝑠 − 𝑦_𝑠)_{𝑘+1} < 0, hence

    (𝑦_𝑠)_{𝑘+1} > 𝑠_{𝑘+1}    (9.13a)
    ⇒ (𝑦_𝑠)_{𝑘+1} (𝑠_{𝑘+1} − 𝑟_{𝑘+1}) > 𝑠_{𝑘+1} (𝑠_{𝑘+1} − 𝑟_{𝑘+1})   by (9.12d)    (9.13b)
    ⇒ 𝑦_𝑠^T (𝑠 − 𝑟) > 𝑠^T (𝑠 − 𝑟)    (9.13c)
    ⇒ 𝑦_𝑠^T 𝑠 − 𝑦_𝑠^T 𝑟 + 𝑠^T 𝑟 − 𝑠^T 𝑠 > 0.    (9.13d)

Plugging the definitions into 𝑎(𝑟)^T 𝑦_𝑠 = 𝑏(𝑟) = 𝑎(𝑟)^T 𝑠 yields (𝑟 − NP(𝑟))^T 𝑦_𝑠 = (𝑟 − NP(𝑟))^T 𝑠 or 𝑟^T 𝑠 − 𝑟^T 𝑦_𝑠 = NP(𝑟)^T 𝑠 − NP(𝑟)^T 𝑦_𝑠. Using this we continue from (9.13d) with

    ⇒ 𝑦_𝑠^T 𝑠 + NP(𝑟)^T 𝑠 − NP(𝑟)^T 𝑦_𝑠 − 𝑠^T 𝑠 > 0    (9.13e)
    ⇒ NP(𝑟)^T (𝑠 − 𝑦_𝑠) > 𝑠^T (𝑠 − 𝑦_𝑠)    (9.13f)
    ⇒ 𝑎(𝑠)^T NP(𝑟) > 𝑎(𝑠)^T 𝑠 > 𝑏(𝑠).    (9.13g)

Thus, NP(𝑟) ∉ NF(𝑠) = {𝑣 ∈ 𝒬 ∶ 𝑎(𝑠)^T 𝑣 = 𝑏(𝑠)}, but NP(𝑟) ∈ NF(𝑟) by definition, so those faces must differ. □

Now we show the auxiliary result that if two inequalities induce the same face, then also every
convex combination of them does.


9.8 Lemma: Let 𝒫 be a polytope, 𝑥^1, 𝑥^2 ∈ 𝒫, and 𝑟^1, 𝑟^2 ∉ 𝒫. If the inequalities

    𝐻_1 ∶ (𝑟^1 − 𝑥^1)^T 𝑥 ≤ (𝑟^1 − 𝑥^1)^T 𝑥^1
    and 𝐻_2 ∶ (𝑟^2 − 𝑥^2)^T 𝑥 ≤ (𝑟^2 − 𝑥^2)^T 𝑥^2

both induce the same face 𝐹 of 𝒫, then also

    𝐻̄ ∶ (𝑟̄ − 𝑥̄)^T 𝑥 ≤ (𝑟̄ − 𝑥̄)^T 𝑥̄

with 𝑟̄ = 𝜆𝑟^1 + (1 − 𝜆)𝑟^2, 𝑥̄ = 𝜆𝑥^1 + (1 − 𝜆)𝑥^2, 0 ≤ 𝜆 ≤ 1, is valid and induces 𝐹. C

Proof. We first show that 𝐻̄ is valid. For 𝑥 ∈ 𝒫,

    (𝑟̄ − 𝑥̄)^T 𝑥 = 𝜆(𝑟^1 − 𝑥^1)^T 𝑥 + (1 − 𝜆)(𝑟^2 − 𝑥^2)^T 𝑥
                ≤ 𝜆(𝑟^1 − 𝑥^1)^T 𝑥^1 + (1 − 𝜆)(𝑟^2 − 𝑥^2)^T 𝑥^2
                = (𝜆(𝑟^1 − 𝑥^1) + (1 − 𝜆)(𝑟^2 − 𝑥^2))^T (𝜆𝑥^1 + (1 − 𝜆)𝑥^2)
                = (𝑟̄ − 𝑥̄)^T 𝑥̄,

where we have used the fact that 𝐻_1 is satisfied with equality for 𝑥 = 𝑥^2 and vice versa because of the assumptions. Since we have shown that 𝐻̄ is valid, it must induce a face 𝐹̄. It remains to show that 𝐹 = 𝐹̄.

𝐹 ⊆ 𝐹̄: 𝑥 ∈ 𝐹 fulfills both 𝐻_1 and 𝐻_2 with equality, so we can carry out the above calculation with an “=” in the second line to conclude 𝑥 ∈ 𝐹̄.

𝐹̄ ⊆ 𝐹: Let 𝑥 ∈ 𝐹̄ and assume 𝑥 ∉ 𝐹, which implies (𝑟^𝑖 − 𝑥^𝑖)^T 𝑥 < (𝑟^𝑖 − 𝑥^𝑖)^T 𝑥^𝑖 for 𝑖 ∈ {1, 2}. Then 𝜆(𝑟^1 − 𝑥^1)^T 𝑥^1 + (1 − 𝜆)(𝑟^2 − 𝑥^2)^T 𝑥^2 > 𝜆(𝑟^1 − 𝑥^1)^T 𝑥 + (1 − 𝜆)(𝑟^2 − 𝑥^2)^T 𝑥 = (𝑟̄ − 𝑥̄)^T 𝑥 = (𝑟̄ − 𝑥̄)^T 𝑥̄ = 𝜆(𝑟^1 − 𝑥^1)^T 𝑥^1 + (1 − 𝜆)(𝑟^2 − 𝑥^2)^T 𝑥^2, which is a contradiction. This concludes the proof. □

The above lemma is used to show that the part of 𝐴_𝑐 that lies below 𝑣_LP dissects into intervals such that reference points within one interval yield the same face of 𝒬.

9.9 Lemma: If 𝑟^1, 𝑟^2 ∈ 𝐴_𝑐 with 𝑟^1_{𝑘+1} < 𝑟^2_{𝑘+1} < 𝑧^𝒬_LP and NF(𝑟^1) = NF(𝑟^2), then NF(𝑟) = NF(𝑟^1) for all 𝑟 ∈ [𝑟^1, 𝑟^2]. C

Proof. Let 𝑣^𝑖 = NP(𝑟^𝑖) for 𝑖 ∈ {1, 2}. By Lemma 9.8, for each 𝜆 ∈ (0, 1) and 𝑟̄ = 𝜆𝑟^1 + (1 − 𝜆)𝑟^2, 𝑣̄ = 𝜆𝑣^1 + (1 − 𝜆)𝑣^2, it holds that

    {𝑣 ∈ 𝒬 ∶ (𝑟̄ − 𝑣̄)^T 𝑣 = (𝑟̄ − 𝑣̄)^T 𝑣̄} = NF(𝑟^1),

and by applying the converse statement from Lemma 9.4 it follows that 𝑣̄ = NP(𝑟̄), so NF(𝑟̄) = NF(𝑟^1) as claimed. □

Now we have all the ingredients at hand to prove our theorem.


Proof (of Theorem 9.5). First we show that there exists at least one 𝑟 with the desired properties. Choose some arbitrary 𝑟^0 ∈ 𝐴_𝑐 with 𝑟^0_{𝑘+1} < 𝑧^𝒬_LP (thus 𝑟^0 ∉ 𝒬). If 𝑣_LP ∈ NF(𝑟^0), we are done. Otherwise, Lemma 9.7 tells us how to find an 𝑟^1 with 𝑟^1_{𝑘+1} > 𝑟^0_{𝑘+1} such that NF(𝑟^1) ≠ NF(𝑟^0). Iterating this argument and assuming that 𝑣_LP is never contained in the induced face results in a sequence (𝑟^𝑖)_𝑖 with 𝑟^{𝑖+1}_{𝑘+1} > 𝑟^𝑖_{𝑘+1} for all 𝑖. Because of Lemma 9.9, NF(𝑟^{𝑖+1}) ≠ NF(𝑟^𝑖) implies NF(𝑟^{𝑖+1}) ≠ NF(𝑟^𝑙) for all 0 ≤ 𝑙 < 𝑖 + 1, so that all NF(𝑟^𝑖) are distinct. But since there are only finitely many faces of 𝒬, this cannot be true, so eventually there must be an 𝑖* such that 𝑣_LP ∈ NF(𝑟^{𝑖*}).

Now let 𝑟* ∈ 𝐴_𝑐 be any such point whose existence we have just proven, 𝑣* = NP(𝑟*) and 𝜆 ∈ (0, 1]. Let 𝑟̄ = 𝜆𝑟* + (1 − 𝜆)𝑣_LP and 𝑣̄ = 𝜆𝑣* + (1 − 𝜆)𝑣_LP. We use similar arguments as in the proof of Lemma 9.8 to show that

    (𝑟̄ − 𝑣̄)^T 𝑣 ≤ (𝑟̄ − 𝑣̄)^T 𝑣̄    (9.14)

induces NF(𝑟*). For 𝑣 ∈ 𝒬,

    (𝑟̄ − 𝑣̄)^T 𝑣 = (𝜆(𝑟* − 𝑣*) + (1 − 𝜆)(𝑣_LP − 𝑣_LP))^T 𝑣
                = 𝜆(𝑟* − 𝑣*)^T 𝑣
                ≤ 𝜆(𝑟* − 𝑣*)^T 𝑣*
                = 𝜆(𝜆(𝑟* − 𝑣*)^T 𝑣* + (1 − 𝜆)(𝑟* − 𝑣*)^T 𝑣_LP)
                = 𝜆(𝑟* − 𝑣*)^T (𝜆𝑣* + (1 − 𝜆)𝑣_LP)
                = (𝑟̄ − 𝑣̄)^T 𝑣̄.

So the inequality is valid, and since again for 𝑣 ∈ NF(𝑟*) equality holds in the third line, we know that the face 𝐹̄ induced by (9.14) contains NF(𝑟*). Now let 𝑣 ∈ 𝐹̄, i.e., (𝑟̄ − 𝑣̄)^T 𝑣 = (𝑟̄ − 𝑣̄)^T 𝑣̄. From the above equations we conclude 𝜆(𝑟* − 𝑣*)^T 𝑣 = 𝜆(𝑟* − 𝑣*)^T 𝑣*, and because 𝜆 > 0 this implies 𝑣 ∈ NF(𝑟*).

Because the above holds for any 0 < 𝜆 ≤ 1, we can choose 𝑟̄ arbitrarily close to 𝑣_LP on 𝐴_𝑐, which completes the proof. □

9.4.3 Solving the Nearest Point Problems

It remains to show how to solve the nearest point problems arising in the discussion above. To that end, we utilize an algorithm by Wolfe [12] that finds the point of minimum Euclidean norm in a polytope. Wolfe’s algorithm operates on a set of vertices of the polytope that are obtained via minimization of linear objective functions. In our situation, this means that LP𝒬 has to be solved repeatedly, which by Observation 9.3 boils down to the linear-time solvable weighted-sum shortest-path problem TC-WS. Note that by subtracting 𝑟 from the results of LP𝒬 and adding 𝑟 to the final result, the algorithm can be used to calculate the minimum distance between 𝒬 and 𝑟 also in the case 𝑟 ≠ 0.


The algorithm in [12] maintains in each iteration a subset 𝑃 of the vertex set 𝑉(𝒬) and a point 𝑥 such that 𝑥 = NP(aff(𝑃)) lies in the relative interior of conv(𝑃), where aff(𝑃) is the affine hull of 𝑃. Such a set is called a corral, and we denote the nearest point in aff(𝑃) by 𝑣^aff_𝑃.

Initially 𝑃 = {𝑣^0} for an arbitrary vertex 𝑣^0 and 𝑥 = 𝑣^0. Note that then 𝑣^aff_𝑃 = 𝑣^0 and 𝑃 is indeed a corral. Then the following is executed iteratively (we explain afterwards how the computations are actually carried out):

(1) Solve 𝑝 = arg min_{𝑣 ∈ 𝒬} 𝑥^T 𝑣.

(2) If 𝑝 = 0 (0 is optimal) or 𝑥^T 𝑝 = 𝑥^T 𝑥 (𝑥 is optimal), stop. Otherwise, set 𝑃 ∶= 𝑃 ∪ {𝑝} and compute 𝑦 = 𝑣^aff_𝑃.

(3) If 𝑦 is in the relative interior of conv(𝑃), 𝑃 is a corral. Set 𝑥 ∶= 𝑦 and continue at (1).

(4) Determine 𝑧 ∈ conv(𝑃) ∩ conv{𝑥, 𝑦} with minimum distance to 𝑦; 𝑧 will be a boundary point of conv(𝑃).

(5) Remove from 𝑃 some point that is not on the smallest face of conv(𝑃) containing 𝑧, set 𝑥 ∶= 𝑧, and continue at (3).

The algorithm will eventually find a corral 𝑃 such that the nearest point of 𝒬 equals 𝑣^aff_𝑃.

The computations in each step are performed as follows:

(1) This matches the solution of TC-WS.

(2) If we interchangeably use the symbol 𝑃 for both the set of points and the matrix that contains the elements of 𝑃 as columns, every 𝑣 ∈ aff(𝑃) can be characterized by some 𝜆 ∈ ℝ^{|𝑃|} such that 𝑣 = 𝑃𝜆 and 𝑒^T 𝜆 = 1. Thus, the subproblem of determining 𝑣^aff_𝑃 can be written as

        min  ‖𝑃𝜆‖²_2 = 𝜆^T 𝑃^T 𝑃 𝜆
        s.t. 𝑒^T 𝜆 = 1.

    It can be shown [12] that this is equivalent to solving the system of linear equations

        (𝑒𝑒^T + 𝑃^T 𝑃)𝜇 = 𝑒,
        𝜆 = 𝜇 / ‖𝜇‖_1.    (9.15)

    As an efficient method to solve (9.15), Wolfe suggests to maintain an upper triangular matrix 𝑅 such that 𝑅^T 𝑅 = 𝑒𝑒^T + 𝑃^T 𝑃. Then the solution 𝜇 can be found by first solving 𝑅^T 𝜇̄ = 𝑒 for 𝜇̄ (forward substitution) and then 𝑅𝜇 = 𝜇̄ for 𝜇 (backward substitution); see the sketch after this list. When 𝑃 changes, 𝑅 can be updated relatively easily without the necessity of a complete recomputation [12].

(3) 𝑦 is in the relative interior of conv(𝑃) if and only if all coefficients 𝜆_𝑣 in the convex representation of 𝑦 satisfy 𝜆_𝑣 > 0.


(4) By construction 𝑥 ∈ conv(𝑃). Let 𝑥 = ∑_{𝑣∈𝑃} 𝜆_𝑣 𝑣 and 𝑦 = ∑_{𝑣∈𝑃} 𝜇_𝑣 𝑣, where ∑_{𝑣∈𝑃} 𝜆_𝑣 = ∑_{𝑣∈𝑃} 𝜇_𝑣 = 1, but 𝜇_𝑝 ≤ 0 for at least one 𝑝. The goal can then be restated as finding the minimal 𝜃 ∈ [0, 1] such that 𝑧_𝜃 = 𝜃𝑥 + (1 − 𝜃)𝑦 ∈ conv(𝑃). Substituting the above expressions yields

        𝑧_𝜃 = ∑_{𝑣∈𝑃} (𝜃𝜆_𝑣 + (1 − 𝜃)𝜇_𝑣) 𝑣,

    and the condition is that all coefficients are nonnegative. Thus, for all 𝑣 with 𝜇_𝑣 ≤ 0,

        𝜃 ≥ 𝜇_𝑣 / (𝜇_𝑣 − 𝜆_𝑣)

    must hold. In summary, 𝜃 can be computed by the rule

        𝜃 = min {1, max {𝜇_𝑣 / (𝜇_𝑣 − 𝜆_𝑣) ∶ 𝜇_𝑣 < 0}}.

(5) A point not contained in the smallest face of conv(𝑃) containing 𝑧 is not needed for the convex description of 𝑧 = ∑_{𝑣∈𝑃} 𝜆_𝑣 𝑣; thus it can be identified by 𝜆_𝑣 = 0.
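The following numpy sketch illustrates the two triangular solves behind (9.15). It is a simplified illustration that recomputes the Cholesky factor from scratch instead of updating it incrementally as Wolfe proposes.

    import numpy as np
    from scipy.linalg import solve_triangular

    def nearest_point_in_affine_hull(P):
        """P: (k+1) x t matrix whose columns are the current points.
        Returns (v, lam) with v = P @ lam the norm-minimal point of aff(P)."""
        t = P.shape[1]
        e = np.ones(t)
        M = np.outer(e, e) + P.T @ P                 # e e^T + P^T P
        R = np.linalg.cholesky(M).T                  # upper triangular, R^T R = M
        mu_bar = solve_triangular(R, e, trans="T")   # forward solve  R^T mu_bar = e
        mu = solve_triangular(R, mu_bar)             # backward solve R mu = mu_bar
        lam = mu / mu.sum()                          # normalize so that e^T lam = 1
        return P @ lam, lam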

9.4.4 Recovering the Optimal Flow and Pseudocodeword

So far we have shown how to compute the optimal objective value, but not the LP solution, i.e., the flow 𝑓_LP ∈ 𝒫^path and the (pseudo)codeword 𝑦. The algorithm yields its solution 𝑣_LP by means of a convex combination of extreme points of 𝒬:

    𝑣_LP = ∑_{𝑗=1}^{𝑡} 𝜆_𝑗 𝑣^𝑗,   𝜆_𝑗 ≥ 0,   ∑_{𝑗=1}^{𝑡} 𝜆_𝑗 = 1.

During its execution, the preimage paths 𝑓^𝑗 = 𝔇^{−1}(𝑣^𝑗) can be stored alongside the 𝑣^𝑗. Then, the LP-optimal flow 𝑓_LP is obtained by summing up the paths with the same weight coefficients 𝜆, i.e.,

    𝑓_LP = ∑_{𝑗=1}^{𝑡} 𝜆_𝑗 𝔇^{−1}(𝑣^𝑗).
In order to get the corresponding pseudocodeword, a simple computation based on (9.6) suffices.
For most applications, however, the values of 𝑦 are of interest only in the case that the decoder
has found a valid codeword, i.e., 𝑡 = 1 in the above sum. In such a case, the most recent solution
of (TC-WS) is an agreeable path that immediately gives the codeword. No intermediate paths
have to be stored, which can save a substantial amount of space and running time.
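In code, this recovery step is just a weighted sum over the stored preimage paths (a sketch with numpy arrays as flow vectors; the variable names are ours):

    import numpy as np

    def recover_flow(paths, lam):
        """paths: list of t stored flow vectors (0/1 numpy arrays) with
        D(f^j) = v^j; lam: matching convex weights. Returns the LP-optimal
        (possibly fractional) flow f_LP."""
        return sum(l * f for l, f in zip(lam, paths))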

9.4.5 Efficient Reference Point Updates

As suggested by the proof of Theorem 9.5, the nearest point algorithm is run iteratively, and between two runs the (𝑘 + 1)st component of 𝑟 is increased by means of the rule 𝑟_{𝑘+1} = 𝑏(𝑟)/𝑎(𝑟)_{𝑘+1}. This section describes how some information from the previous iteration can be re-used to provide an efficient warm start for the next nearest point run.

Assume that in iteration 𝑖 the point NP(𝑟^𝑖) = 𝑣^{𝑖+1} has been found, inducing the face NF(𝑟^𝑖) of 𝒬 defined by 𝑎(𝑟^𝑖)^T 𝑣 ≤ 𝑏(𝑟^𝑖). Recall that the nearest point algorithm (NPA) internally computes the minimum ℓ_2-norm point of 𝒬 − 𝑟^𝑖. Thus, it outputs 𝑣̄^{𝑖+1} = 𝑣^{𝑖+1} − 𝑟^𝑖 as the convex combination of 𝑡 ≤ 𝑘 + 1 points 𝑣̄^𝑗 = 𝑣^𝑗 − 𝑟^𝑖, all of which are located on the corresponding face N̂F(𝑟^𝑖) of 𝒬 − 𝑟^𝑖:

    𝑣̄^{𝑖+1} = 𝑣^{𝑖+1} − 𝑟^𝑖 = ∑_{𝑗=1}^{𝑡} 𝜆_𝑗 (𝑣^𝑗 − 𝑟^𝑖) = ∑_{𝑗=1}^{𝑡} 𝜆_𝑗 𝑣̄^𝑗.

In the subsequent nearest point calculation, the norm of 𝒬 − 𝑟^{𝑖+1} is minimized. Obviously N̂F(𝑟^𝑖) corresponds to a face N̂F(𝑟^{𝑖+1}) of 𝒬 − 𝑟^{𝑖+1}, and we can initialize the algorithm with that face by simply adding 𝑟^𝑖 − 𝑟^{𝑖+1} to 𝑣̄^{𝑖+1} and each 𝑣̄^𝑗, 𝑗 = 1, …, 𝑡, which yields

    𝑣̄^{𝑖+1} + 𝑟^𝑖 − 𝑟^{𝑖+1} = 𝑣^{𝑖+1} − 𝑟^{𝑖+1} = ∑_{𝑗=1}^{𝑡} 𝜆_𝑗 (𝑣^𝑗 − 𝑟^{𝑖+1}),

and all 𝑣^𝑗 − 𝑟^{𝑖+1} are vertices of 𝒬 − 𝑟^{𝑖+1}. Note that 𝑟^𝑖 − 𝑟^{𝑖+1} is zero in all but the last component, so this update takes only 𝑡 ≤ 𝑘 + 1 steps.
In order to warm-start the nearest point algorithm, the auxiliary matrix 𝑅 has to be recomputed as well. Using its definition 𝑅^T 𝑅 = 𝑒𝑒^T + 𝑃^T 𝑃, where 𝑃 now contains the shifted points as columns, we can efficiently compute 𝑅 by Cholesky decomposition. After these updates we can directly start the nearest point algorithm in Step 2. Numerical experiments have shown that this speeds up LP decoding by a factor of two; in particular, the computation time of the Cholesky decomposition is negligible.
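A sketch of this warm-start update (our illustration; P holds the stored points as columns, r_old and r_new are the consecutive reference points):

    import numpy as np

    def warm_start(P, r_old, r_new):
        """Shift the stored points from Q - r_old to Q - r_new and
        refactorize; only the last coordinate of the reference point
        changes between iterations."""
        P_shifted = P + (r_old - r_new)[:, None]   # add r_old - r_new to every column
        t = P_shifted.shape[1]
        e = np.ones(t)
        R = np.linalg.cholesky(np.outer(e, e) + P_shifted.T @ P_shifted).T
        return P_shifted, R                        # resume the NPA at step (2)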

9.5 The Complete Algorithm

Algorithm 9.1 formalizes the procedure developed in Section 9.4 in pseudocode. The initial reference point 𝑟^0 is generated by first minimizing 𝑐(𝑓) on 𝒫^path (thus solving TC-WS with 𝛾 = (0, …, 0, 1)) and projecting the result in constraints space onto 𝐴_𝑐 (Line 4). Thereby we ensure that either 𝑟^0 ∉ 𝒬 or it is located on the boundary of 𝒬, in which case it already is the optimal LP solution. The solution of the nearest point problem and the recovery of the original flow are encapsulated in Lines 7 and 8.

9.6 Numerical Results

9.6.1 Running Time Comparison

To evaluate the computational performance of our algorithm, we compare its running time
with the commercial general purpose LP solver CPLEX [7] which is said to be one of the most
competitive implementations available.


Algorithm 9.1 Combinatorial Turbo LP Decoder (CTLP)

 1: Initialize edge costs 𝑐̂(𝑒) by (9.4).
 2: 𝑓^0 ← arg min {𝑐(𝑓) ∶ 𝑓 ∈ 𝒫^path}
 3: 𝑣^0 ← 𝔇(𝑓^0)
 4: 𝑟^0 ← (0, …, 0, 𝑣^0_{𝑘+1})^T
 5: 𝑖 ← 0
 6: while 𝑣^𝑖 ≠ 𝑟^𝑖 do
 7:     𝑣^{𝑖+1} ← NP(𝑟^𝑖) = arg min_{𝑣∈𝒬} ‖𝑣 − 𝑟^𝑖‖_2
 8:     𝑓^{𝑖+1} ← 𝔇^{−1}(𝑣^{𝑖+1})
 9:     𝑎^{𝑖+1} ← 𝑣^{𝑖+1} − 𝑟^𝑖
10:     𝑏^{𝑖+1} ← (𝑎^{𝑖+1})^T 𝑣^{𝑖+1}
11:     𝑟^{𝑖+1} ← (0, …, 0, 𝑏^{𝑖+1}/𝑎^{𝑖+1}_{𝑘+1})^T
12:     𝑖 ← 𝑖 + 1
13: end while
14: return 𝑓^𝑖
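In Python-like form, the main loop reads as follows; nearest_point stands for the Wolfe-based oracle of Section 9.4.3, and D, D_inv for 𝔇 and a preimage lookup (all hypothetical interfaces). A numerical tolerance replaces the exact equality test of Line 6.

    import numpy as np

    def ctlp(D, D_inv, nearest_point, f0, k, tol=1e-9):
        """Sketch of Algorithm 9.1; f0 is a cost-minimal path pair in P^path.
        Index k addresses the (k+1)st (c-axis) coordinate."""
        v = D(f0)                                 # image of the initial flow
        r = np.zeros(k + 1); r[k] = v[k]          # project onto the c-axis
        f = f0
        while np.linalg.norm(v - r) > tol:        # Line 6: until v^i = r^i
            v = nearest_point(r)                  # Line 7: NP(r^i)
            f = D_inv(v)                          # Line 8
            a = v - r                             # Line 9
            b = a @ v                             # Line 10
            r = np.zeros(k + 1); r[k] = b / a[k]  # Line 11: next piercing point
        return f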

SNR (dB)                  0     1     2     3     4     5
time CPLEX (s × 10^−2)    9.1   9.5   9.6   9.6   9.6   9.8
time CTLP (s × 10^−2)     1.4   0.9   0.5   0.29  0.24  0.22
ratio                     6.5   10.6  19    33    40    45

Table 9.4: Average CPU time per decoded instance (in hundredths of a second) for the (132, 40) LTE turbo code, and the ratio by which CPLEX takes longer than our algorithm.

Simulations were run using LTE turbo codes with block lengths 132, 228, and 396, respectively, and a three-dimensional turbo code with block length 384 (taken from [21]) with various SNR values. For each SNR value, we have generated up to 10^5 noisy frames, where the computation was stopped when 200 decoding errors occurred. This should ensure sufficient significance of the average results shown in Tables 9.4 and 9.6 to 9.8. The ML performance results are taken from [22].

As one can see, the benefit of using the new algorithm is larger for high SNR values. This becomes most evident for the 3-D code, for which the dimension of 𝒬 is the highest, where the new algorithm is slower than CPLEX for SNRs up to 2. The reason for this behavior can

SNR (dB)                  0     1     2     3     4     5
time CPLEX (s × 10^−1)    3.1   3.4   4.2   4.6   4.7   4.7
time CTLP (s × 10^−1)     0.7   0.4   0.15  0.05  0.04  0.04
ratio                     4.4   8.5   28    92    118   118

Table 9.6: Average decoding CPU time per instance for the (228, 72) LTE turbo code.


[Figure 9.5: CPU time comparison for the (132, 40) LTE turbo code depending on the SNR value: per-instance CPU time of CPLEX and CTLP over SNR_b (dB), on a logarithmic time scale.]

SNR (dB)                  0     1     2     3     4
time CPLEX (s × 10^−1)    4.4   4.2   3.6   3.3   3.2
time CTLP (s × 10^−1)     6.3   4.1   0.6   0.09  0.08
ratio                     0.7   1     6     37    40

Table 9.7: Average decoding CPU time per instance for the (396, 128) LTE turbo code.

SNR (dB) 0 1 2 3 4
time CPLEX (s) 1.4 1.2 0.9 0.72 0.57
time CTLP (s) 4.5 3.1 0.8 0.04 0.014
ratio 0.31 0.39 1.1 18 41

Table 9.8: Average decoding CPU time per instance for a (384, 128) 3-D turbo code.


SNR (dB) 0 2 4
optimal face dimension 25.2 3.6 0.01
integral LP solution 0.26 0.89 0.9995
trivial instances 0 0.13 0.64
major nearest-point cycles 221 53 4
main loops of Algorithm 9.1 4.36 1.9 0.7

Table 9.9: Statistical data for the (132, 40) LTE turbo code; average per-instance counts.

be explained by analyzing statistical information about various internal parameters of the


algorithm when run with different SNR values:

• The average dimension of the optimal nearest face, found in the last iteration of the algorithm, drops substantially with increasing SNR. Intuitively, it is not surprising that a face whose description requires fewer vertices can be found more efficiently.

• In particular, the share of instances for which the LP solution is integral (and thus, the
face dimension is zero) increases with the SNR.

• There are some trivial instances where the initial shortest path among both trellis graphs
is already a valid codeword. This occurs more often for low channel noise and allows for
extremely fast solution (no nearest point calculations have to be carried out).

• The average number of major cycles of the nearest point algorithm performed per instance
is seen to drop rapidly with increasing SNR.

• Likewise, the average number of main loops (Line 6 of Algorithm 9.1) drops, reducing the number of required nearest-point computations.

Table 9.9 exemplarily contains the average per-instance values of these parameters for the
(132, 40) LTE code and SNRs 0, 2, and 4.

9.6.2 Numerical Stability

For larger codes, the dimension of 𝒬 becomes very large, which leads to numerical difficulties in the nearest point algorithm: the equation systems solved during the execution sometimes have rank “almost zero”, which leads to division by very small numbers, resulting in the floating-point value NaN. Careful adjustment of the tolerance values for equivalence checks helps to eliminate this problem, at least for the block lengths presented in this numerical study.

In addition, it has proven beneficial to divide all objective values by 10 in advance. Intuitively,
this compresses 𝒬 along the 𝑐-axis, evening out the extensiveness of the polytope in the
different dimensions (note that for all axes other than 𝑐, the values only range from −1 to 1).


[Figure 9.10: Average per-instance CPU time spent on various subroutines of the algorithm decoding the (132, 40) LTE code: update of 𝑅, shortest-path computations, solution of the least-squares problems, and generation of solutions in path space; plotted over SNR_b (dB) on a logarithmic time scale.]

9.7 Improving Error-Correcting Performance

As discussed above, Algorithm 9.1 can easily be modified to return a list of paths 𝑓^𝑖, 𝑖 = 1, …, 𝑡, such that the LP solution is a convex combination of these paths. Each 𝑓^𝑖 can be split into paths 𝑓^𝑖_1 and 𝑓^𝑖_2 through trellis 𝑇_1 and 𝑇_2, respectively. A path in a trellis, in turn, can be uniquely extended to a codeword. Thus, we have a total of 2𝑡 candidate codewords. By selecting among them the codeword with minimum objective function value, we obtain a heuristic decoder (Heuristic A in the following) that always outputs a valid codeword and has the potential of a better error-correcting performance than pure LP decoding.
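Heuristic A thus reduces to a simple argmin over the candidate list, as in the following sketch; path_to_codeword denotes the assumed unique extension of a trellis path to a codeword.

    def heuristic_a(paths, path_to_codeword, llr):
        """paths: the 2t candidate trellis paths; llr: channel LLR vector.
        Returns the candidate codeword y minimizing sum_i llr[i] * y[i]."""
        candidates = [path_to_codeword(p) for p in paths]
        return min(candidates,
                   key=lambda y: sum(l for l, bit in zip(llr, y) if bit))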

A slightly better decoding performance, at the cost of once more increased running time, is
reached if we consider not only the paths that constitute the final LP solution but rather all
intermediate solutions of TC-WS. We call this modification Heuristic B.

Simulation results for the (132, 40) LTE code are shown in Figure 9.11. As one can see, the
frame error rate indeed drops notably when using the heuristics, but for low SNR values there
still remains a substantial gap to the ML decoding curve. At 5 dB, Heuristic B empirically
reaches ML performance; for lower SNR values it is comparable to a Log-MAP turbo decoder
with 8 iterations.


[Figure 9.11: Decoding performance (FER over SNR_b) of the proposed heuristic enhancements on the (132, 40) LTE turbo code, comparing plain LP decoding, Heuristic A, Heuristic B, and ML decoding.]

9.8 Conclusion and Outlook

We have shown how the inherent combinatorial network-flow structure of turbo codes, in the form of the trellis graphs, can be utilized to construct a highly efficient LP solver specialized for that class of codes. The decrease in running time, compared to a general purpose solver, is dramatic, and in contrast to classical approaches based on Lagrangian dualization, the algorithm is guaranteed to terminate after a finite number of steps with the exact LP solution.
It is still an open question, however, if and how the LP can be solved in a completely combinatorial
manner. The nearest point algorithm suggested in this paper introduces a numerical component,
which is necessary at this time but rather undesirable since it can lead to numerical problems
in high dimension.
Another direction for further research is to examine the usefulness of our decoder as a building block of branch-and-bound methods that solve the integer programming problem, i.e., ML decoders. Several properties of the decoder suggest that this might be a valuable task. For instance, the shortest paths can be computed even faster if a portion of the variables is fixed, or the algorithm could be terminated prematurely if the reference point exceeds a known upper bound at the current node of the branch-and-bound tree.
Finally, the concepts presented here might be of inner-mathematical interest as well. Optimiza-
tion problems that are easy to solve in principle but have some complicating constraints are
very common in mathematical optimization. Being able to efficiently solve their LP relaxation
is a key component of virtually all fast exact or approximate solution algorithms.


Acknowledgments

We would like to acknowledge the German Research Council (DFG), the German Academic
Exchange Service (DAAD), and the Center for Mathematical and Computational Modelling
(CMCM) of the University of Kaiserslautern for financial support.

References

[1] J. Feldman, D. R. Karger, and M. Wainwright. “Linear programming-based decoding of turbo-like codes and its relation to iterative approaches”. In: Proceedings of the 40th Annual Allerton Conference on Communication, Control and Computing. Monticello, IL, 2002, pp. 467–477.
[2] J. Feldman. “Decoding error-correcting codes via linear programming”. PhD thesis. Cambridge, MA: Massachusetts Institute of Technology, 2003.
[3] P. O. Vontobel and R. Koetter. Graph-cover decoding and finite-length analysis of message-
passing iterative decoding of LDPC codes. 2005. arXiv: cs/0512078 [cs.IT].
[4] C. Berrou, A. Glavieux, and P. Thitimajshima. “Near shannon limit error-correcting
coding and decoding: turbo-codes”. In: IEEE International Conference on Communications.
May 1993, pp. 1064–1070. doi: 10.1109/ICC.1993.397441.
[5] R. G. Gallager. “Low-density parity-check codes”. IRE Transactions on Information Theory
8.1 (Jan. 1962), pp. 21–28. issn: 0096-1000. doi: 10.1109/TIT.1962.1057683.
[6] M. Helmling, S. Ruzika, and A. Tanatmis. “Mathematical programming decoding of binary
linear codes: theory and algorithms”. IEEE Transactions on Information Theory 58.7 (July
2012), pp. 4753–4769. doi: 10.1109/TIT.2012.2191697. arXiv: 1107.3715 [cs.IT].
[7] IBM ILOG CPLEX Optimization Studio. Software Package. Version 12.4. 2011.
[8] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986.
[9] A. Tanatmis, S. Ruzika, and F. Kienle. “A Lagrangian relaxation based decoding algorithm
for LTE turbo codes”. In: Proceedings of the International Symposium on Turbo Codes and
Iterative Information Processing. Brest, France, Sept. 2010, pp. 369–373. doi: 10.1109/ISTC.
2010.5613906.
[10] S. Ruzika. “On Multiple Objective Combinatorial Optimization”. PhD thesis. Kaiserslautern, Germany: University of Kaiserslautern, 2007.
[11] A. Tanatmis. “Mathematical Programming Approaches for Decoding of Binary Linear
Codes”. PhD thesis. Kaiserslautern, Germany: University of Kaiserslautern, Jan. 2011.
[12] P. Wolfe. “Finding the nearest point in a polytope”. Mathematical Programming 11 (1
1976), pp. 128–149. issn: 0025-5610. doi: 10.1007/BF01580381.
[13] D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge
University Press, 2003. url: http://www.inference.phy.cam.ac.uk/itprnn/book.html.


[14] TS 36.212 v11.0.0: LTE E-UTRA Multiplexing and Channel Coding. 3rd Generation Partnership Project (3GPP), Oct. 2012. url: http://www.etsi.org/deliver/etsi_ts/136200_136299/136212/11.00.00_60/ts_136212v110000p.pdf.
[15] S. Lin and D. Costello Jr. Error Control Coding. 2nd ed. Upper Saddle River, NJ: Prentice-Hall, Inc., 2004. isbn: 0130426725.
[16] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows. Prentice-Hall, 1993.
[17] E. Berlekamp, R. J. McEliece, and H. C. A. van Tilborg. “On the inherent intractability
of certain coding problems”. IEEE Transactions on Information Theory 24.3 (May 1978),
pp. 954–972. doi: 10.1109/TIT.1978.1055873.
[18] A. Tanatmis et al. “Numerical comparison of IP formulations as ML decoders”. In: IEEE
International Conference on Communications. Cape Town, South Africa, May 2010, pp. 1–5.
doi: 10.1109/ICC.2010.5502303.
[19] T. H. Cormen et al. Introduction to Algorithms. 2nd ed. MIT Press, 2001.
[20] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. Wiley-
Interscience series in discrete mathematics and optimization, John Wiley & Sons, 1988.
[21] E. Rosnes, M. Helmling, and A. Graell i Amat. “Pseudocodewords of linear programming
decoding of 3-dimensional turbo codes”. In: Proceedings of IEEE International Symposium
on Information Theory. St. Petersburg, Russia, July 2011, pp. 1643–1647. doi: 10.1109/ISIT.
2011.6033823.
[22] M. Helmling and S. Scholl. Database of ML Simulation Results. Ed. by University of
Kaiserslautern. 2014. url: http://www.uni-kl.de/channel-codes.

Chapter 10

Paper V: Towards Hardware Implementation of the Simplex Algorithm for LP Decoding

Florian Gensheimer, Michael Helmling, and Stefan Scholl

This chapter is a reformatted revised version of the following publication that was presented
at the 2013 (CM)² Young Researcher Symposium and appeared in the refereed conference
proceedings:

F. Gensheimer, M. Helmling, and S. Scholl. “Towards hardware implementation of the


simplex algorithm for LP decoding”. In: Proceedings of the Young Researcher Symposium.
Center for Mathematical & Computational Modelling (CM)2 . Kaiserslautern, Germany:
Fraunhofer, Nov. 2013, pp. 12–17. url: http://d-nb.info/1044179767

Towards Hardware Implementation of the Simplex Algorithm for LP Decoding

Florian Gensheimer Michael Helmling Stefan Scholl

Combining methods of mathematical optimization theory and applications from communications engineering recently led to new approaches for error correction in data transmission systems. In this new discipline, also called LP decoding, researchers have so far focused on theoretical aspects. In this paper, we study how these new algorithms from mathematical theory can be accelerated using dedicated hardware implementations. We address complexity issues of the widely used simplex algorithm and show how new interesting mathematical problems arise when aspects from hardware design are considered.

10.1 Introduction

Linear programming (LP) is one of the main topics of mathematical optimization. LP problems
arise in many economical and industrial areas like production planning, scheduling or logistics.
They also play an important role as subproblems in branch-and-bound and cutting-plane
algorithms for integer programs, which have a large field of application. The most important
algorithm for solving linear programs is the simplex algorithm by G. B. Dantzig (see e.g. [1]).
Although the simplex algorithm has an exponential worst-case complexity, it turned out to
be very efficient in practice. A rather new application of linear programming is forward error
correction or channel coding (see [2, 3] for an introduction).
Channel coding, an engineering discipline, is an essential technique used for correcting errors in
digital communication systems that occur during the transmission of data. These transmission
errors occur due to bad reception quality, interference between different devices (e.g. mobile
phones) or because of physical damage of the devices (storage). Channel coding is used in
nearly every communication device today, including smartphones, TV and radio broadcasting,
DSL internet access or satellite communications and also in storage devices such as DVDs or
USB memory sticks.
A channel coding system consists of two stages: encoder and decoder. The encoder is placed at
the transmitter and adds redundant bits before the data is transmitted. The decoder is placed at
the receiver and uses the previously added redundancy in order to correct potential transmission
errors without the need of retransmission. The decoder is the heart of every channel coding
system, since it performs the actual data correction with sophisticated algorithms.
One new approach for error correction decoding is the use of algorithms from the field of
mathematical programming: as shown by Feldman et al. [4], it is possible to formulate the
decoding problem as a linear program which in turn can be solved and analyzed by methods
of mathematical optimization. This symbiosis of linear programming from mathematics and
channel coding from engineering is called LP decoding. LP decoding recently led to new
interesting mathematical problems and better algorithms for error correction systems.
Mathematical optimization algorithms are commonly implemented in software and executed
on a platform based on a general purpose processor, such as high performance servers, PCs or
laptops, that require large space and power. However, many popular communication devices,
like smartphones, need to be small and portable and exhibit low power consumption while
processing data with high speed.
A general purpose device like a PC processor is designed for great flexibility which allows it to run many different applications. However, if a hardware chip is dedicatedly designed for just one single algorithm, the chip can be highly optimized for exactly this algorithm. In such a case, data operations and memory requirements are well known and the chip architecture can be tailored to the specific application’s requirements.
Implementing algorithms as a dedicated hardware circuit shows several advantages over general
purpose hardware:


• speed: speed-ups of several orders of magnitude can be achieved over a PC,
• portability and area: the chip area is often within the region of only several mm²,
• power: power consumption is often in the order of milliwatts (PC/laptop: several tens to several hundreds of watts).
In the remainder of this paper we consider such dedicated electrical circuits implemented on a chip and call them a hardware implementation or simply hardware. As we will see later, it is a challenging task to implement an algorithm as hardware, because different aspects of the hardware and the algorithm itself have to be considered.
Hardware implementations for the simplex algorithm and LP decoding have not been inves-
tigated deeply so far. In [5], a hardware implementation of the simplex algorithm has been
presented. However, this implementation uses the standard simplex method without modi-
fications and does not consider LP decoding. For an efficient hardware implementation it is
advantageous to exploit problem specific properties and modifications already at the algorithm
level.
As alternatives to the simplex method, interior-point algorithms [6] and a quadratic-program-
ming approach [7] for LP decoding were studied in literature. However, both papers only
consider software implementations, and the proposed algorithms are substantially more complex
to implement in hardware than the simplex method.
While LP decoding so far has been studied mainly from a theoretical point of view, the aim of
the present paper is to investigate how LP decoding algorithms can actually be implemented to
work in a communication device. We analyze different variants of the simplex algorithm for LP
decoding and reveal large differences in hardware complexity. Additionally, we propose a chip
architecture of an LP decoder that can be implemented efficiently.

10.2 The Simplex Algorithm

We start by introducing the fundamentals of linear programming and the simplex method,
before reviewing advanced methods based on duality.
A linear program consists of a linear objective function, the value of which is to be optimized, and a set of linear constraints, i.e., (in)equalities, that limit the values of the variables in the program. Any LP can be written in the standard form

    min  𝑧 = 𝑐^T 𝑥    (10.1a)
    s.t. 𝐴𝑥 = 𝑏    (10.1b)
         𝑥 ≥ 0,    (10.1c)

where 𝑥 ∈ ℝ^𝑛 are the decision variables, 𝐴 ∈ ℝ^{𝑚×𝑛} is a matrix with 𝑚 ≤ 𝑛, and 𝑏 ∈ ℝ^𝑚, 𝑐 ∈ ℝ^𝑛 are vectors. 𝐴 and 𝑏 describe the so-called functional constraints (10.1b), while (10.1c) defines the nonnegativity constraints. The vector 𝑐 ∈ ℝ^𝑛 represents the objective function.


Each row of (10.1b) and (10.1c) defines a hyperplane or a halfspace of ℝ^𝑛, respectively. Thus, the feasible region 𝒫 = {𝑥 ∈ ℝ^𝑛 ∶ 𝐴𝑥 = 𝑏, 𝑥 ≥ 0} of the LP is a polyhedron, i.e., the intersection of a finite number of hyperplanes and halfspaces. For a fixed objective value 𝑧, (10.1a) is a hyperplane. Geometrically, the minimization can be interpreted as pushing that hyperplane in direction −𝑐 as far as possible without leaving the polyhedron.
The (primal) simplex algorithm is based on the fact that an optimal solution is obtained in a
vertex of the polyhedron. The idea of the algorithm is to move iteratively from one vertex of
the polyhedron to another in such a way that the sequence of objective function values (10.1a)
of the vertices is nonincreasing. Its key observation is that each vertex corresponds to a basis,
i.e. a linearly independent size-𝑚 subset of 𝐴’s columns. In order to move to an adjacent vertex,
exactly one nonbasic column of 𝐴 is exchanged with a basic column. For each nonbasic column
a reduced cost value can be computed that tells if adding this column potentially improves the
objective value.
In practice, the algorithm operates on the simplex tableau 𝑇, a two-dimensional array that contains information about the LP as well as the current basis and reduced costs. Every iteration of the algorithm consists of three main steps:

(1) computation of the reduced costs: these determine the column 𝑗 that enters the basis,
(2) min-ratio rule: determines the column that leaves the basis (via its row 𝑖*),
(3) basis exchange: carried out by a pivot operation on 𝑇_{𝑖*,𝑗}, i.e., a Gaussian elimination step of the form:

    for 𝑖 ∈ {1, …, 𝑚} ⧵ {𝑖*} do
        if 𝑇_{𝑖,𝑗} ≠ 0 then
            𝑇_{𝑖,•} ← 𝑇_{𝑖,•} − (𝑇_{𝑖,𝑗}/𝑇_{𝑖*,𝑗}) ⋅ 𝑇_{𝑖*,•}
        end if
    end for

The pivot operation constitutes the main computational effort of the simplex algorithm. It is
therefore crucial to implement this operation efficiently. This operation is repeated in every
iteration until the reduced cost vector is nonnegative, which indicates that the vertex of the
current basis is optimal.
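As a point of reference for the hardware discussion, a direct numpy rendering of the pivot step might look like this (our sketch; the additional normalization of the pivot row is a common convention, not part of the pseudocode above):

    import numpy as np

    def pivot(T, i_star, j):
        """Gaussian elimination step on (float) tableau T with pivot element
        T[i_star, j]: normalize the pivot row, then eliminate column j from
        all other rows."""
        T[i_star, :] /= T[i_star, j]
        for i in range(T.shape[0]):
            if i != i_star and T[i, j] != 0.0:
                T[i, :] -= T[i, j] * T[i_star, :]
        return T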
A second major approach for solving linear programs stems from duality theory. The basis of this theory is the so-called dual program, a special linear program that corresponds to the original LP. For the linear program in standard form (10.1), the dual program (DLP) has the form

    max  𝑏^T 𝜋    (10.2a)
    s.t. 𝐴^T 𝜋 ≤ 𝑐,    (10.2b)

where 𝜋 ∈ ℝ^𝑚 are the dual variables and 𝐴^T 𝜋 ≤ 𝑐 are the dual constraints. Both linear programs (LP) and (DLP) have the same optimal objective value. Hence, one can equivalently solve the dual problem instead of the program (LP).


The dual simplex algorithm works on the usual simplex tableau T of (LP). The main difference
to the primal simplex algorithm is the fact that the working solution 𝑥 is infeasible during
execution, while the optimality condition is always fulfilled. This is contrary to the primal
simplex which always stays feasible, but does not terminate before optimality is established.
In many applications, like e.g. the LP decoding problem, all variables 𝑥_𝑖 have an upper bound of 1. Both the primal and dual simplex algorithm would have to introduce an artificial variable and an additional side constraint for each such variable, because the problem has to be transformed into “standard form” (see e.g. [1]). This leads to a larger simplex tableau and a less efficient simplex algorithm. Special versions of the simplex algorithm (both primal and dual) avoid those extra variables and constraints by implicitly handling upper bounds (see [8, 9]).

Linear programs for practical problems often suffer from degenerate pivot operations, which means that the step length when moving from one vertex to another reduces to zero, hence the objective function value does not improve. This can dramatically increase the running time and, if occurring frequently, leads to numerical instabilities in the algorithm. Especially for LPs that are highly degenerate—unfortunately, the LP decoding problem was shown to be among those—special care must be taken to avoid degeneracy. We have adapted the so-called ad-hoc procedure by P. Wolfe [10] to our case, which can be implemented with negligible computational overhead and effectively eliminates the influence of degeneracy for the LP decoding problem.

10.3 Designing Hardware

In this section we present basics and challenges of hardware design. We show how hardware
implementations that are highly optimized for one application can outperform general purpose
platforms.
As already mentioned in Section 10.1, dedicated hardware implementations are often faster,
smaller and consume less power than software implementations running on a general purpose
PC. In general, three design goals of hardware implementation can be identified:
• high calculation speed, i.e., high clock frequency,
• small area (leads to reduced production cost and small chip size), and
• small power consumption (to avoid power supply and heat dissipation problems).
To accelerate an algorithm using hardware, it has to be implemented in the form of a digital electrical circuit by the hardware designer. Since software (algorithm) and hardware (microchip) development are fundamentally different, this task is not straightforward. In the following we will present some aspects that become important when implementing algorithms in hardware.

10.3.1 Memories for Data Storage

Usually a modern PC provides a large amount of memory (up to several gigabytes) for storing
data during the calculations, in order to meet the requirements of a wide range of applications.


In dedicated hardware implementations the memory sizes can be tailored exactly to the applica-
tion’s requirement. This allows for large memory reductions; the resulting memory is often in
the order of kilobytes. Additionally, the algorithm itself can be modified to reduce the amount
of required storage, finally resulting in a smaller hardware implementation.

10.3.2 Number Representation

Another important aspect is the representation of numerical values that are processed by the
algorithm. In a digital system, numbers are represented by vectors of bits. Two fundamental
ways to interpret bit vectors as numbers are the floating-point and the fixed-point representations
[11]. On a PC the values are usually represented by double-precision floating-point numbers
with a large number of bits (commonly 64). Double-precision floating-point values provide
a very good resolution and a wide range. However, floating-point numbers require complex
arithmetic hardware, and are therefore avoided in hardware design whenever possible. Instead,
it is desired to use fixed-point numbers which lead to a much lower hardware complexity.

A second question arises consequently: How many bits are sufficient to represent the numbers? Usually a low number of bits allows for faster and smaller hardware units and is therefore beneficial (see Figure 10.1). Furthermore, smaller memories can be used for storing data if the number of bits is small. Choosing a low number of bits, however, leads to a bad resolution of the values and very limited accuracy of the calculations, which may or may not affect the outcome of the algorithm (cf. Table 10.2). There is a clear trade-off between calculation accuracy and hardware complexity (i.e., calculation speed, area and power consumption). This has to be considered when implementing algorithms in hardware. The exact number of required bits in the hardware implementation is mostly determined by extensive simulations.
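For illustration, the following sketch quantizes a real value onto a signed fixed-point grid with a given number of integer and fractional bits. This is a generic textbook scheme, not the specific format of any decoder discussed here.

    def to_fixed_point(x, int_bits=3, frac_bits=1):
        """Round x to the nearest representable value of a 2s-complement
        fixed-point format with int_bits + frac_bits bits in total."""
        scale = 1 << frac_bits                    # e.g. 2 for one fractional bit
        lo = -(1 << (int_bits - 1))               # most negative value
        hi = (1 << (int_bits - 1)) - 1.0 / scale  # most positive value
        q = round(x * scale) / scale              # snap to the grid
        return min(max(q, lo), hi)                # saturate to the range

With int_bits=3 and frac_bits=1 this reproduces the 4-bit column of Table 10.2: 16 levels from −4.0 to 3.5 in steps of 0.5.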

10.3.3 Arithmetic Units

For arithmetic operations that are carried out in an algorithm, arithmetic hardware units such
as adders, multipliers, and so forth have to be implemented. Arithmetic operations have to be
used carefully. Assuming fixed-point numbers, additions and subtractions are often cheap to
implement. However, multiplications and divisions consume large amounts of hardware area
and power and lead to high hardware complexity (see Figure 10.1). Also complex arithmetic
operations like logarithms, roots, trigonometric functions etc. exhibit high hardware complexity.
Sometimes these functions can be simplified by using approximations such as linearization
or simple iterative heuristics. Note that multiplications or divisions by powers of two can be
implemented very easily by binary shift operations.

The aspects mentioned so far, although this listing is by no means complete, show important differences between software and hardware design. Therefore, the term “complexity” in software engineering differs from “complexity” in hardware design: when dealing with hardware implementations, additional and more sophisticated measures of complexity have to be considered.


Figure 10.1: Comparison of chip area for adders and multipliers for different numbers of bits of the fixed-point operands.

2-bit representation           4-bit representation (with decimal point)
                               100.0 = −4.0
                               100.1 = −3.5
                                   ⋮
10 = −2                        000.0 = 0.0
11 = −1                        000.1 = 0.5
00 = 0                         001.0 = 1.0
01 = 1                         001.1 = 1.5
                                   ⋮
                               011.1 = 3.5

4 different levels (resolution = 1)    16 different levels (resolution = 0.5)

Table 10.2: Example for fixed-point numbers using 2 and 4 bits (2s complement), respectively. Note the difference in range and resolution.


10.4 Simplex Performance Study

In this section, we evaluate the variants of the simplex algorithm mentioned in Section 10.2:
• the revised primal simplex,
• the revised primal simplex for bounded variables, and
• the dual simplex for bounded variables.
We analyze their running-time performance and their behavior under limited numerical precision with fixed-point arithmetic. The numerical comparisons are done with the (7, 4) Hamming code and a random (20, 10) LDPC code (see [2] for details).


                   Primal       Primal (bounded)   Dual (bounded)
(7,4) Hamming code
variables          38           31                 31
constraints        31           24                 24
tableau size       32 × 32      25 × 25            25 × 32
(20,10) LDPC code
variables          120          100                100
constraints        100          80                 80
tableau size       101 × 101    81 × 81            81 × 101

Table 10.3: Comparison of LP size parameters for the primal simplex and both the primal and dual simplex with bounded variables.

10.4.1 Running-Time Performance

The number of pivot operations, the number of variables and constraints in the LP, and the size of the tableau matrix have a major influence on the running-time complexity of the LP solver. The sizes of the LPs for our example decoding problems are shown in Table 10.3. As one can see, the standard primal simplex leads to the largest LP, because it does not handle upper bounds efficiently.

As mentioned in Section 10.2, the simplex method iteratively performs pivot steps. The running time is proportional to the number of pivots, which may vary for different objective functions. In the channel decoding application, this function is related to the channel noise. Hence, to estimate the running time under realistic conditions, we generate random noise for different signal-to-noise ratios (SNRs); a higher SNR value corresponds to less noise on the channel. In Figure 10.4, the relative frequencies of the iteration counts are shown for the (7, 4) Hamming code. The continuous lines show the values for SNR 0. The performance of the primal simplex variants proved to be independent of the SNR value. Only the dual simplex benefits substantially from a lower noise level, which is indicated by the dotted curve.

As one can see, the dual simplex method dramatically outperforms the primal variants in terms of iteration counts. An explanation for this effect can be found in the initialization procedure of the simplex algorithms: in the primal simplex, the same fixed starting basis is always used. On average, the corresponding vertex of the polyhedron will be rather far away from the optimal vertex, such that many pivots are necessary. In contrast, the dual simplex for bounded variables [9] chooses an initial solution that would be optimal in the absence of channel noise. Intuitively, this basis is likely to be “closer” to the optimum even if noise is present. Because this starting solution of the dual simplex is not primal feasible, it cannot be used to speed up the primal algorithms.

Figure 10.5 shows for the larger code how the average number of iterations per instance depends on the noise level. Again, the dual simplex is much faster, finishing after 1.66 iterations on average for the highest SNR value.


Figure 10.4: Number of iterations until optimality is reached for different simplex variants, using the decoding LP for the (7, 4) Hamming code. [Plot: frequency of occurrence versus number of simplex iterations, for the primal, primal (bounded), and dual (bounded) variants at SNR 0, and the dual (bounded) variant at SNR 4.]

Figure 10.5: Comparison of the average number of iterations per LP between the three different simplex variants on the (20, 10) LDPC code. [Plot: average number of iterations versus SNR_b (dB) for the primal simplex, primal simplex (bounded), and dual simplex (bounded).]


Figure 10.6: Decoding performance (frame error rate versus SNR_b in dB) of the primal simplex algorithms with fixed-point arithmetic (standard and bounded variables, respectively) on the Hamming code, for the 6.2, 10.4, and 16.8 fixed-point formats compared to 64-bit floating point. The dual algorithms are not shown here because their decoding performance did not change compared to floating-point numbers.


10.4.2 Comparison of Fixed- and Floating-Point Implementation

As outlined in Section 10.3, using fixed-point numbers with low precision is preferable from
the hardware point of view. On the other hand, if the numerical precision is too low, round-off
errors can break the algorithm in numerous ways, leading to wrong solutions or even infinite
loops. Hence a compromise needs to be found between simplicity and correctness. For the LP
decoding application, the latter directly translates into the frame error rate, i.e., the average
probability of a decoding error. Depending on the usage scenario, a slightly increased error
rate might be tolerable, if this in return allows for low-complexity hardware.

For the (7, 4) Hamming code, we compare fixed-point arithmetic with resolutions 16.8, 10.4, and 6.2 with usual double-precision floating-point numbers. Here, the notation x.y means that each number is represented by x bits, y of which constitute the fractional part. This implies that numbers from −2^{x−y−1} to 2^{x−y−1} − 2^{−y} can be represented, with a resolution of 2^{−y}.
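A small sketch of this x.y quantization, assuming two's-complement storage as described above (the helper names are ours, not taken from any hardware library):

    def to_fixed(value: float, x: int, y: int) -> int:
        """Quantize to the nearest representable x.y number; the result is
        the underlying two's-complement integer."""
        lo, hi = -(1 << (x - 1)), (1 << (x - 1)) - 1   # integer range of x bits
        v = round(value * (1 << y))                    # scale by 2**y, round
        return max(lo, min(hi, v))                     # saturate on overflow

    def from_fixed(v: int, y: int) -> float:
        return v / (1 << y)

    # 6.2 format: range -8.0 ... 7.75 in steps of 0.25
    assert from_fixed(to_fixed(1.3, 6, 2), 2) == 1.25
    assert from_fixed(to_fixed(100.0, 6, 2), 2) == 7.75   # saturated at the maximum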

In our experiments, the dual simplex proves extremely stable, showing practically the same
error-correcting performance even with the very poor 6.2 resolution. The primal variants
achieve this performance with 16.8; with 10.4 bits the performance drops but is still reasonable,
while both primal algorithms are practically useless with the lowest resolution. The resulting
error-performance curves for the primal variants are shown in Figure 10.6.


Figure 10.7: Decoding performance (frame error rate versus SNR_b in dB) of the primal and dual simplex algorithms on the LDPC code with different number representations: 64-bit floating point, and 16.8 / 8.3 fixed point for the dual, primal (bounded), and primal variants.

For the larger code, we compare the floating-point implementation with fixed-point precisions 16.8 and 8.3. Results are shown in Figure 10.7. Again, the dual algorithm is most stable with low precision and practically achieves floating-point performance with 8.3 fixed-point bits. The primal algorithms significantly lose performance with this resolution, but are comparable to the floating-point performance with 16.8.
For both codes, the two primal simplex variants largely show the same behavior, whereas the dual simplex for bounded variables is clearly superior, in terms of both the number of iterations (and thus overall running time) and the robustness against low-precision numerical resolution.

10.4.3 Proposed Hardware Architecture

In Figure 10.8 we propose a hardware architecture that implements the dual simplex for bounded variables. It shows the five main parts required by the simplex method. The current tableau is stored in the tableau memory. A pivot unit performs the pivoting of a column on the fly when it is read. It is followed by an additional unit that performs simple bit-flip operations to take care of the variable bounds. The entering-variable block determines the entering variable according to some pricing rule and checks for optimality, i.e., whether the simplex has to stop. The leaving-variable block computes the leaving variable using, e.g., the min-ratio rule. Additionally, a controller is shown. The controller takes care of the initialization of the other hardware units, keeps track of the basis variable positions, and extracts the values of the variables when the result is declared optimal.


Figure 10.8: Proposed hardware architecture for the dual simplex algorithm. [Block diagram: a tableau RAM storing the current tableau with columnwise access; a column-pivoting unit that pivots a column and updates it in the tableau RAM; a bound-handling unit for upper-bound violations; a unit determining the entering variable (pricing, e.g., steepest edge rule); a unit determining the leaving variable (pivot row, e.g., by the min-ratio rule); and a controller.]


10.5 Conclusion

In this paper, we have studied the LP decoding problem with three variants of the simplex algorithm. In terms of running time, the dual simplex for bounded variables has been shown to outperform the competing primal variants, finding the optimal solution substantially faster (up to a factor of 50). With respect to hardware implementation complexity, we have also analyzed the behavior of these algorithms under low-precision fixed-point arithmetic. Again, the dual simplex has proven to be the most stable, achieving a similar error-correction level as the 64-bit floating-point implementation with 6 and 8 fixed-point bits per number for the Hamming and LDPC code, respectively.

Both the low number of iterations and the small size of the number representation are important steps towards a low-power, high-performance hardware LP decoder implementation.

10.6 Outlook and Future Research

Other improvements of the simplex algorithm, specialized for hardware requirements, might
further decrease the complexity of hardware LP decoding. Some possible directions are:

10.6.1 Simplex in the Log-Domain

The simplex algorithm usually involves a lot of multiplications and divisions. However, these two operations are costly to implement in hardware, as we have shown in Section 10.3.


One option to transform multiplications and divisions into much simpler operations is to perform calculations in the logarithmic domain. If the logarithm of all operands is used, a multiplication transforms into an addition and a division into a simple subtraction, thus avoiding costly operations. However, this avoidance comes at the additional cost of computing logarithms and exponentials. Further investigations have to show how far the hardware complexity can be reduced by this approach.
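A toy illustration of the idea (plain Python floats; in hardware, log and exp would be realized by small lookup tables or piecewise-linear approximations):

    import math

    A, B = 6.25, 2.5
    a, b = math.log(A), math.log(B)

    assert math.isclose(math.exp(a + b), A * B)   # multiplication becomes addition
    assert math.isclose(math.exp(a - b), A / B)   # division becomes subtraction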

10.6.2 Adaptive LP Decoding

For larger codes, the complete LP decoding formulation contains a large number of constraints, most of which are not actually needed to describe the optimal solution. Taghavi and Siegel [12] have proposed an adaptive method that starts with an empty LP and inserts inequalities on demand, greatly reducing the number of constraints. Two aspects make this approach particularly appealing: firstly, as shown in [12], the number of necessary constraints can be upper bounded by a very small number compared to the complete LP. Secondly, inserting additional constraints is possible without any extra operations if one uses the dual simplex algorithm. Since the dual simplex algorithm turned out to be the most efficient one in our numerical study in Section 10.4, using it for adaptive LP decoding is extremely promising.

10.6.3 Polyhedral Theory for Fixed-Point Numbers

Mathematically, fixed-point numbers constitute a uniform, discrete grid in Euclidean space. Ordinary polyhedral theory always assumes a continuous space, i.e., real numbers with infinite precision. A theoretical study of “discrete” polyhedral theory could, besides being an interesting field of research on its own, lead to more efficient algorithms exploiting the grid structure. Additionally, it might be possible to derive theoretical results about numerical stability for a given fixed-point precision.

Acknowledgment

We gratefully acknowledge the Center of Mathematical and Computational Modelling of the University of Kaiserslautern and the German Research Council (DFG) for financial support.

References

[1] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986.
[2] S. Lin and D. Costello Jr. Error Control Coding. 2nd ed. Upper Saddle River, NJ: Prentice-
Hall, Inc., 2004. isbn: 0130426725.


[3] M. Helmling, S. Scholl, and A. Tanatmis. “Mathematical optimization based channel coding: current achievements and future research”. In: Proceedings of the Young Researcher Symposium. Center for Mathematical & Computational Modelling (CM)². Kaiserslautern, Feb. 2011, pp. 16–21. url: http://nbn-resolving.de/urn:nbn:de:0074-750-0.
[4] J. Feldman, M. J. Wainwright, and D. R. Karger. “Using linear programming to decode
binary linear codes”. IEEE Transactions on Information Theory 51.3 (Mar. 2005), pp. 954–
972. doi: 10.1109/TIT.2004.842696. url: www.eecs.berkeley.edu/~wainwrig/Papers/
FelWaiKar05.pdf.
[5] S. Bayliss et al. “An FPGA implementation of the simplex algorithm”. In: Proceedings
of the IEEE International Conference on Field Programmable Technology (FPT). Bangkok,
Thailand, Dec. 2006, pp. 49–56. doi: 10.1109/FPT.2006.270294.
[6] T. Wadayama. “An LP decoding algorithm based on primal path-following interior point
method”. In: Proceedings of IEEE International Symposium on Information Theory. Seoul,
Korea, June 2009, pp. 389–393. doi: 10.1109/ISIT.2009.5205741.
[7] K. Yang, X. Wang, and J. Feldman. “A new linear programming approach to decoding
linear block codes”. IEEE Transactions on Information Theory 54.3 (Mar. 2008), pp. 1061–
1072. doi: 10.1109/TIT.2007.915712.
[8] H. W. Hamacher and K. Klamroth. Lineare und Netzwerk-Optimierung. 2nd ed. Vieweg,
Apr. 2006. isbn: 978-3-528-03155-8.
[9] H. M. Wagner. “The dual simplex algorithm for bounded variables”. Naval Research
Logistics Quarterly 5.3 (1958), pp. 257–261. issn: 1931-9193. doi: 10.1002/nav.3800050306.
[10] P. Wolfe. “A technique for resolving degeneracy in linear programming”. Journal of the
Society for Industrial and Applied Mathematics 11.2 (1963), pp. 205–211. doi: 10.1137/
0111016. eprint: http://epubs.siam.org/doi/pdf/10.1137/0111016.
[11] D. A. Patterson and J. Hennessy. Computer Organization and Design. 3rd ed. San Francisco,
CA, USA: Morgan Kaufmann Publishers Inc., 2004. isbn: 1558606041.
[12] M. H. Taghavi and P. H. Siegel. “Adaptive methods for linear programming decoding”.
IEEE Transactions on Information Theory 54.12 (Dec. 2008), pp. 5396–5410. doi: 10.1109/
TIT.2008.2006384. arXiv: cs/0703123 [cs.IT].

Chapter 11

Paper VI: Efficient Maximum-Likelihood Decoding of Linear Block Codes on Binary Memoryless Channels

Michael Helmling, Eirik Rosnes, Stefan Ruzika, and Stefan Scholl

The following chapter is a reformatted and revised copy of a preprint that is publicly available
online (http://arxiv.org/abs/1403.4118). Its contents were presented at the 2014 ISIT conference
and appeared in the following refereed conference proceedings:

M. Helmling, E. Rosnes, S. Ruzika, and S. Scholl. “Efficient maximum-likelihood decoding of linear block codes on binary memoryless channels”. In: Proceedings of IEEE International Symposium on Information Theory. Honolulu, HI, USA, June–July 2014, pp. 2589–2593. doi: 10.1109/ISIT.2014.6875302. arXiv: 1403.4118 [cs.IT]

This work was partially funded by the DFG (grant RU-1524/2-1) and by DAAD / RCN (grant
54565400 within the German-Norwegian Collaborative Research Support Scheme).

Efficient Maximum-Likelihood Decoding of Linear Block Codes on Binary Memoryless Channels

Michael Helmling, Eirik Rosnes, Stefan Ruzika, and Stefan Scholl

In this work, we consider efficient maximum-likelihood decoding of linear block codes for small-to-moderate block lengths. The presented approach is a branch-and-bound algorithm using the cutting-plane approach of Zhang and Siegel (IEEE Trans. Inf. Theory, 2012) for obtaining lower bounds. We have compared our proposed algorithm to the state-of-the-art commercial integer program solver CPLEX, and for all considered codes our approach is faster for both low and high signal-to-noise ratios. For instance, for the benchmark (155, 64) Tanner code our algorithm is more than 11 times as fast as CPLEX for an SNR of 1.0 dB on the additive white Gaussian noise channel. By a small modification, our algorithm can be used to calculate the minimum distance, which we have again verified to be much faster than using the CPLEX solver.

11.1 Introduction

Determining the optimal decoding behavior of error-correcting codes is of significant importance, e.g., to benchmark different coding schemes. When no a priori information on the transmitted codeword is known, maximum-likelihood decoding (MLD) is an optimal decoding strategy. It is known that this problem is NP-hard in general [1], such that its complexity grows exponentially in the block length of the code, unless P=NP. Currently, the best known approach for general block codes is to use a state-of-the-art (commercial) integer program (IP) solver (see [2]) like CPLEX [3].

In this work, we present a branch-and-bound approach for efficient MLD of linear block
codes. The problem of MLD is closely related to that of calculating the minimum distance
of a code, which has attracted some attention recently. For instance, in [4, 5], Rosnes et
al. proposed an efficient branch-and-bound algorithm to determine all low-weight stopping
sets/codewords in a low-density parity-check (LDPC) code. Although the two problems are
similar, the bounding step in the algorithm from [4, 5] cannot efficiently be adapted to the
scenario of MLD. Conversely, however, the algorithm presented here can also calculate the
minimum distance, and our numerical experiments show that this is very efficient compared to
CPLEX.

Linear programming (LP) decoding of binary linear codes, as first introduced by Feldman et al.
in [6], approximates MLD by relaxing the decoding problem into an easier to solve LP problem.
The LP problem contains a set of linear inequalities that are derived from the parity-check
constraints of a (redundant) parity-check matrix representing the code. As shown in [7], these
constraints can iteratively and adaptively be added to the decoding problem, which significantly
reduces the overall complexity of LP decoding. Elaborating on this idea, Zhang and Siegel [8]
proposed an efficient search algorithm for new violated (redundant) parity-check constraints (or
“cuts”) that tighten the decoding polytope. Depending on the structure of the underlying code,
for some codes, this “cutting-plane” LP decoding algorithm performs close to MLD for high
signal-to-noise ratios (SNRs) on the additive white Gaussian noise (AWGN) channel, although
for most codes, e. g., the (155, 64) Tanner code [9], there is still a gap in decoding performance
to MLD [8]. For lower values of the SNR, there could be a significant performance degradation
with respect to MLD.

The algorithm proposed in this work closes that gap by using the cutting-plane algorithm for
lower bounds and the well-known sum-product (SP) decoder [10] with order-𝑖 re-encoding [11]
for upper bounds within a sophisticated branch-and-bound framework such that the output
always and provably is the maximum-likelihood (ML) codeword. Our numerical study in
Section 11.6 shows that it is much faster than CPLEX for all codes under consideration, and
moreover is able to decode some of the codes on which CPLEX fails completely.


11.2 Notation and Background

This section establishes some basic definitions and results needed for the rest of the paper. Let 𝒞 denote a binary linear code of length n represented by an m × n parity-check matrix 𝐇. The code is used on a binary-input memoryless output-symmetric channel with input 𝐜 = (c_0, …, c_{n−1}) ∈ 𝒞 and channel output denoted by the length-n vector 𝐫 = (r_0, …, r_{n−1}).
The ML decoder can be described by the following optimization problem [12]:

    𝐜̂_ML = arg min_{𝐜∈𝒞} ψ_λ(𝐜) = arg min_{𝐜∈conv(𝒞)} ψ_λ(𝐜)    (11.1)

where ψ_λ(𝐜) = 𝝀 ⋅ 𝐜^T and (⋅)^T denotes the transpose of its argument, 𝝀 = (λ_0, …, λ_{n−1}) is a vector of log-likelihood ratios (LLRs) defined by

    λ_i = log( Pr(r_i ∣ c_i = 0) / Pr(r_i ∣ c_i = 1) )

for all i, 0 ≤ i ≤ n − 1, and conv(𝒞) is the convex hull of 𝒞 in ℝ^n, where ℝ denotes the real numbers. The MLD problem in (11.1) can be formulated as an IP, which in general is an NP-hard problem. As an approximation to MLD, Feldman et al. [6] relaxed the codeword polytope conv(𝒞) in the following way. Define

    𝒞_j = {𝐜 ∈ {0, 1}^n : 𝐡_j ⋅ 𝐜^T = 0 (mod 2)}

where 𝐡_j = (h_{j,0}, …, h_{j,n−1}) is the jth row of the parity-check matrix 𝐇 and 0 ≤ j ≤ m − 1. Furthermore, let conv(𝒞_j) denote the convex hull of 𝒞_j in ℝ^n. The fundamental polytope 𝒫(𝐇) of the parity-check matrix 𝐇 is defined as [13]

    𝒫(𝐇) = ⋂_{j=0}^{m−1} conv(𝒞_j).    (11.2)

The MLD problem in (11.1) can now be relaxed to

    𝐩̂_LP = arg min_{𝐩∈𝒫(𝐇)} ψ_λ(𝐩)

where the solution, by definition, is a pseudocodeword with fractional entries in general. Note that the LP decoder has the ML certificate property, which means that in case 𝐩̂_LP is integral, it is an optimal solution to (11.1).
For each row 𝐡_j, 0 ≤ j ≤ m − 1, of the matrix 𝐇, the linear inequalities behind the fundamental polytope in (11.2) are

    ∑_{i∈𝒱} p_i − ∑_{i∈𝒩(j)∖𝒱} p_i ≤ |𝒱| − 1,    for all odd-sized 𝒱 ⊆ 𝒩(j)    (11.3)

where 𝒩(j) = {i : h_{j,i} = 1, 0 ≤ i ≤ n − 1}.


For a given row 𝐡_j of a parity-check matrix 𝐇 and a vector 𝐩 ∈ [0, 1]^n: if there exists an odd-sized set 𝒱 ⊆ 𝒩(j) such that the corresponding inequality from (11.3) does not hold, then we say that the jth parity-check constraint induces a cut at 𝐩.
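To make the cut condition concrete, the following small Python sketch (ours, not a verbatim transcription of Algorithm 1 in [8]) checks whether row j induces a cut at a point 𝐩 by testing only the most promising odd-sized set 𝒱:

    def find_cut(row_support, p):
        """row_support: the set N(j); p: current LP solution, entries in [0,1].
        Returns a violated odd-sized set V, or None if row j induces no cut at p."""
        V = {i for i in row_support if p[i] > 0.5}
        if len(V) % 2 == 0:            # |V| must be odd: toggle the element
            i0 = min(row_support, key=lambda i: abs(p[i] - 0.5))
            V = V ^ {i0}               # of N(j) whose value is closest to 1/2
        lhs = sum(p[i] for i in V) - sum(p[i] for i in row_support - V)
        return V if lhs > len(V) - 1 else None

    # Example: N(j) = {0, 1, 2}; the fractional point below violates (11.3)
    print(find_cut({0, 1, 2}, [0.2, 0.2, 0.9]))   # -> {2}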
Central to our branch-and-bound algorithm is the concept of constraint sets. A constraint set F is a set

    {(ρ_i, c_{ρ_i}) : c_{ρ_i} ∈ {0, 1} ∀ρ_i ∈ Γ},

where Γ ⊆ {0, …, n − 1}. If (ρ_i, c_{ρ_i}) is a constraint, then position ρ_i is said to be c_{ρ_i}-constrained, which means that position ρ_i is committed to the value c_{ρ_i} in a codeword, while positions not in F are uncommitted.
Let 𝒞^{(F)} denote the subset of codewords from 𝒞 consistent with the constraint set F. Then, we can define

    ψ^{(F)}_{min,λ} = min_{𝐜∈𝒞^{(F)}} ψ_λ(𝐜)

as the minimum value of the objective function for codewords consistent with F. In the following, ψ̄^{(F)}_{min,λ} will denote any lower bound on ψ^{(F)}_{min,λ}. Also, a constraint set F is said to be valid if 𝒞^{(F)} is nonempty.

Our proposed branch-and-bound MLD algorithm relies heavily on tight lower bounds on ψ^{(F)}_{min,λ}, which are provided by an LP-based decoding algorithm by Zhang and Siegel [8]. We briefly describe this algorithm, denoted as the ZS decoding algorithm, below in Section 11.2.1.

11.2.1 Zhang and Siegel’s LP-Based Decoding Algorithm

The ZS decoding algorithm is based on adaptive LP decoding as described in [7] and incorporates an efficient cut-search algorithm, as described in [8]. First, the LP problem is initialized with the box constraints. (†) Solve the LP problem to get an optimal solution 𝐩*. If 𝐩* is integral, then terminate the algorithm and return the ML codeword 𝐩*. Otherwise, the cut-search algorithm (Algorithm 1 in [8]) is applied to each row of the parity-check matrix of the code. If at least one cut is found, add all found cuts to the LP problem and repeat the procedure from (†). Otherwise, search for cuts from redundant parity checks. To this end, reduce 𝐇 by Gaussian elimination to reduced row echelon form, where the columns of 𝐇 are processed in the order of the “fractionality” (i.e., closeness to 1/2) of the corresponding coordinate of 𝐩*. Now, the cut-search algorithm is applied to each row of the resulting modified matrix 𝐇̃. If no cut is found, then terminate. Otherwise, add all found cuts to the LP problem as constraints and repeat the procedure from (†). The algorithm above is detailed in Algorithm 2 of [8], and we refer the interested reader to [8] for further details.
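As a rough illustration only, the overall loop can be sketched as follows. All callables passed in are hypothetical placeholders (an LP solver, a cut search returning the possibly empty list of cuts found for a row, the initial box constraints, and the Gaussian-elimination step); they do not refer to any particular library.

    def zs_decode(H, llr, solve_lp, cut_search, box_constraints, rref_by_fractionality):
        constraints = box_constraints(len(llr))    # 0 <= p_i <= 1
        while True:
            p = solve_lp(llr, constraints)         # minimize llr . p
            if all(v in (0.0, 1.0) for v in p):
                return p, True                     # integral: ML codeword
            cuts = [c for row in H for c in cut_search(row, p)]
            if not cuts:
                # no cut in H itself: search redundant parity checks, with
                # columns ordered by the fractionality of p
                cuts = [c for row in rref_by_fractionality(H, p)
                          for c in cut_search(row, p)]
                if not cuts:
                    return p, False                # fractional LP optimum
            constraints.extend(cuts)               # add all found cuts, re-solve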

11.3 Basic Branch-and-Bound Algorithm

Our proposed algorithm is a branch-and-bound algorithm on constraint sets and uses the ZS
decoding algorithm, as briefly described above, as a basic component in the bounding step.


Thus, there is a one-to-one correspondence between the nodes in the search tree and constraint sets. In the following, when we speak about the left and right child constraint set, denoted by F^{↓0} and F^{↓1}, respectively, we mean the constraint set of the left and right child nodes in the search tree.

Now, to each constraint set F, we associate three real numbers

    ψ̄^{(F)}_{min,λ},  ψ̄^{(F↓0)}_{min,λ},  and  ψ̄^{(F↓1)}_{min,λ}

which are current lower bounds on

    ψ^{(F)}_{min,λ},  ψ^{(F↓0)}_{min,λ},  and  ψ^{(F↓1)}_{min,λ},

respectively. When a constraint set is created, these values are initiated to −∞.
The algorithm maintains a list L of active constraint sets, which is initiated with the unconstrained set ∅. In each iteration, a constraint set F is selected from the list according to the node selection rule (see below). A valid codeword, i.e., a feasible solution of the IP, is generated by decoding the LLR vector (where the constraints imposed by F are enforced by altering the respective LLR values to ±∞) using the SP algorithm [10] with order-i re-encoding [11], for some integer i, as a post-processing step. The upper bound on the objective function decreases if the decoder output has a lower objective function value than the previous best candidate. Afterwards, a lower bound on ψ^{(F)}_{min,λ} is computed by running the ZS algorithm, where the variables contained in F are fixed to their corresponding values. If an integral solution is returned, i.e., a pseudocodeword with no fractional coordinates, it is considered another candidate codeword in the same way as above. Otherwise, and if the computed lower bound is less than the current upper bound τ, the algorithm branches on a fractional position, selected by the branching rule (see below), of the pseudocodeword by adding two constraint sets, namely F augmented by the chosen branching position fixed to 0 and 1, respectively, to the list of active nodes. Then, the next set is chosen from L until one of the termination criteria in Step 2 of Algorithm 11.1 (which gives a formal description of the overall algorithm) is fulfilled. Note that the computations to produce a lower bound on ψ^{(F)}_{min,λ} for a given constraint set F are collected into Algorithm 11.2, denoted by LUBD.

11.3.1 Bounding Step

The complexity of Algorithm 11.1 depends heavily on the tightness of the lower bounds computed in Step 5 (from Algorithm 11.2), i.e., on how close ψ_λ(𝐩̂) is to the value ψ^{(F)}_{min,λ}. To find the best pseudocodeword 𝐩̂, we have used the procedure detailed in Algorithm 11.2.

Note that

    min{ψ^{(F↓0)}_{min,λ}, ψ^{(F↓1)}_{min,λ}} = ψ^{(F)}_{min,λ}

for any node (constraint set) F. This allows us to update the current lower bound ψ̄^{(F)}_{min,λ} of F (and, recursively, also the ancestors of F), potentially increasing its value, once both of its children have been processed (see Steps 18 to 22 of Algorithm 11.1). Tightening the bounds in the search tree is important for decreasing the complexity of the algorithm because nodes whose lower bound exceeds the objective value of the currently best candidate solution (i.e., the current upper bound) can be skipped, thereby reducing the search space.



11.3.2 Branching Step

We have used the following simple branching rule to select the position p in Step 15 of Algorithm 11.1: take an unconstrained position where the corresponding entry in the decoded pseudocodeword 𝐩̂ is closest to 1/2. This simple procedure seems to work very well in practice.
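In code, this rule is essentially a one-liner (a sketch; constrained stands for the set of positions fixed by F):

    def branch_position(p_hat, constrained):
        free = [i for i in range(len(p_hat)) if i not in constrained]
        return min(free, key=lambda i: abs(p_hat[i] - 0.5))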

11.3.3 The Processing Order of the List 𝐿

The node selection rule, i.e., the method by which a constraint set F is selected from L in Step 3 of Algorithm 11.1, has great influence on the overall complexity. The most common schemes are depth-first search, according to LIFO (last in – first out) processing of L, and breadth-first search, where L is processed in FIFO (first in – first out) fashion. Another popular method, called best-bound search, selects the next constraint set by F′ = arg min_{F∈L} ψ̄^{(F)}_{min,λ}, with the goal of tightening the overall lower bound as fast as possible.

In our experiments, the following mixed strategy has proven to be most efficient: apply depth-first processing in general, but every M iterations, for a fixed integer M, and only if ψ̄^{(F)}_{min,λ} < τ − δ for a fixed δ > 0, where F is the constraint set from the previous iteration, select the next node by the best-bound rule above.
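A minimal sketch of this mixed strategy (the node objects and their lower_bound attribute are hypothetical; the defaults M = 30 and delta = 2 are the values used later in Section 11.6):

    def select_node(L, it, prev_bound, tau, M=30, delta=2.0):
        if it % M == 0 and prev_bound < tau - delta:
            best = min(L, key=lambda node: node.lower_bound)  # best-bound rule
            L.remove(best)
            return best
        return L.pop()                                        # depth-first (LIFO)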

11.4 Improvements

In this section, we present some improvements to the basic algorithm from Section 11.3.

11.4.1 Tuning the ZS Algorithm for Adaptive LP Decoding

A linear inequality constraint of the general form 𝐚 ⋅ 𝐱^T ≤ b, where 𝐚 and b are constants, is called active at the point 𝐱* if it holds with equality for 𝐱 = 𝐱*. Otherwise, it is called inactive. For an LP problem with a set of linear inequality constraints, the optimal solution 𝐱_LP is a vertex of the polytope formed by the hyperplanes of all constraints that are active at 𝐱_LP. Thus, constraints inactive at 𝐱_LP can be removed without changing the optimal solution.

The ZS decoding algorithm uses adaptive LP decoding, which implies that a lot of linear programs (of increasing size in the number of constraints) are solved successively. Consequently, a simple way to reduce the overall complexity is to remove inactive constraints from time to time.


Algorithm 11.1 Maximum-Likelihood Decoding (MLD)

Input: The received LLR vector 𝝀 and the order i of re-encoding.
Output: An ML decoded codeword 𝐜̂_ML.
1: Initialize τ ← ∞ and L ← {∅}
2: while L ≠ ∅ and ψ̄^{(∅)}_{min,λ} < τ do
3:   Choose and remove a constraint set F from L.
4:   if F is valid and ψ̄^{(F)}_{min,λ} < τ then
5:     let (𝐜̂, 𝐩̂) ← LUBD(𝝀, F)
6:     if ψ_λ(𝐜̂) < τ then
7:       let 𝐜̂_ML ← 𝐜̂ and τ ← ψ_λ(𝐜̂)
8:     end if
9:     ψ̄^{(F)}_{min,λ} ← ψ_λ(𝐩̂)
10:    if 𝐩̂ is integral then
11:      if ψ_λ(𝐩̂) < τ then
12:        let 𝐜̂_ML ← 𝐩̂ and τ ← ψ_λ(𝐩̂)
13:      end if
14:    else if ψ_λ(𝐩̂) < τ then
15:      choose an unconstrained position p based on 𝐩̂, construct two new constraint sets F′ = F ∪ {(p, 0)} and F″ = F ∪ {(p, 1)}, and append them to L.
16:    end if
17:  end if
18:  if F ≠ ∅ then
19:    determine (the unique) F̃ such that F = F̃^{↓i}, where i = 0 or 1, and update parent bounds as follows:
20:    ψ̄^{(F̃↓i)}_{min,λ} ← max{ψ̄^{(F̃↓i)}_{min,λ}, ψ̄^{(F)}_{min,λ}}
21:    ψ̄^{(F̃)}_{min,λ} ← max{ψ̄^{(F̃)}_{min,λ}, min{ψ̄^{(F̃↓0)}_{min,λ}, ψ̄^{(F̃↓1)}_{min,λ}}}
22:    If ψ̄^{(F̃)}_{min,λ} increased in the previous step, recurse to Step 18 with F replaced by F̃.
23:  end if
24: end while
25: Return 𝐜̂_ML.


Algorithm 11.2 Lower and Upper Bound Algorithm (LUBD)

Input: The received LLR vector 𝝀 and a constraint set F.
Output: The pair (𝐜̂, 𝐩̂).
1: Perform SP decoding with order-i re-encoding on an LLR vector constrained according to F as follows:
   • +∞ for positions corresponding to 0-constraints.
   • −∞ for positions corresponding to 1-constraints.
   • The original channel LLRs for positions not in F.
   The resulting decoded codeword is denoted by 𝐜̂.
2: Perform ZS decoding on the received LLR vector 𝝀 with equality constraints according to F. Denote the decoded pseudocodeword by 𝐩̂.
3: Return the pair (𝐜̂, 𝐩̂).

Our implementation uses the dual simplex method for solving LP problems, which is very effective in the case of iteratively added constraints by employing a warm-start technique that reuses the basis information of the previously optimal solution (see, e.g., [14] for details). The removal of constraints, however, is expensive because afterwards a new simplex basis has to be computed. Thus, we remove the inactive constraints only when the number of constraints in the current LP problem exceeds T, for some integer T. Note that this differs from the algorithm called MALP-B in [8], where the inactive constraints are removed in each iteration.

Another way to decrease the running time of the ZS algorithm in some cases is to terminate the ZS decoder prematurely as soon as the objective value exceeds the current upper bound τ in Algorithm 11.1. In that event, the objective function value cannot possibly be improved below the current node, and it can be skipped immediately.

11.4.2 Tradeoff Between Tightness and Speed of the ZS Algorithm

The cut-search procedure used in the ZS decoding algorithm yields tight lower bounds on the MLD solution at the cost of a high number of cuts and thus increased processing time spent in the LP solver and for Gaussian elimination (see Section 11.2.1). A tradeoff between speed and tightness can be realized in two different ways: first, limit the maximum number of times R that the search for redundant parity-check cuts is applied; second, only add a cut if the cutoff, i.e., the distance between the current (infeasible) solution and the cutting hyperplane, exceeds a fixed quantity γ > 0.
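Assuming a cut of the form 𝐚 ⋅ 𝐱 ≤ b that is violated at the point 𝐩*, its cutoff can be computed as the Euclidean point-to-hyperplane distance (a minimal sketch; the helper name is ours):

    import math

    def cutoff(a, b, p_star):
        violation = sum(ai * pi for ai, pi in zip(a, p_star)) - b   # a . p* - b > 0
        return violation / math.sqrt(sum(ai * ai for ai in a))      # divide by ||a||_2

    # a cut would then only be added if cutoff(a, b, p_star) > gamma, e.g. gamma = 0.2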

Our numerical experiments have shown that both approaches help to significantly reduce the running-time complexity of our algorithm. Additionally, for the first approach, it has proven helpful to use a higher value R_bb in those iterations where a best-bound node has been selected (cf. Section 11.3.3).


decoder     1.0 dB          1.5 dB          2.0 dB          2.5 dB          3.0 dB          3.5 dB
            T_avg   N_avg   T_avg   N_avg   T_avg   N_avg   T_avg   N_avg   T_avg   N_avg   T_avg   N_avg
(155,64) Tanner code [9]
our alg.    0.81    51      0.24    15      0.05    3.5     0.014   1.4     0.005   1.04    0.004   1.004
CPLEX       9.49    4795    2.95    1816    0.63    370     0.17    61      0.095   5.4     0.086   0.3
(155,64) Tanner code [9] with all-zero decoding
our alg.    0.24    14      0.11    6.6     0.025   2.1     0.006   1.18    0.001   1.05    0.0005  1.002
CPLEX       3.23    2799    1.16    963     0.28    210     0.06    38      0.02    4.5     0.01    0.3
(204,102) MacKay code [15] with all-zero decoding
our alg.    2.2     9       0.73    30.5    0.15    6.5     0.02    1.66    0.003   1.04    0.0005  1.003
CPLEX       14.6    12364   4.7     3421    0.83    573     0.12    61      0.03    4.9     0.018   0.19
(127,85) BCH code with all-zero decoding
our alg.    86      7617    67      5855    33      2549    9       655     2.2     159     0.29    19
CPLEX       —       —       —       —       —       —       —       —       —       —       3.5     4132

Table 11.1: Numerical comparison of our proposed algorithm and CPLEX for several codes using different values of the SNR on the AWGN channel. The number T_avg is the average decoding CPU time per frame in seconds, and N_avg is the average number of nodes (per frame) processed by the branch-and-bound algorithms. In all cases but the first we used all-zero decoding as described in Section 11.4.3.

11.4.3 Special Case: MLD Performance Simulation

For benchmarking purposes we are only interested in the actual MLD curve, in which case the MLD algorithm can be simplified. First, since the underlying code is always linear, the error probability of MLD is independent of the actual transmitted codeword; thus we can always, without loss of generality, transmit the all-zero codeword. Furthermore, when the all-zero codeword is transmitted and a codeword 𝐜 with objective value ψ_λ(𝐜) < 0 = ψ_λ(𝟎) has been identified, the search can be terminated, since any ML decoder would also fail on this received LLR vector 𝝀.

11.5 Minimum Distance Computation

The MLD problem is closely related to the computation of the minimum distance d_min of a code as follows. If the all-zero codeword is explicitly forbidden, then an MLD algorithm with the input 𝝀 = 𝟏 will output a codeword of minimum weight:

    d_min(𝒞) = min_{𝐜∈𝒞∖{𝟎}} ψ_𝟏(𝐜) = min_{𝐜∈conv(𝒞∖{𝟎})} ψ_𝟏(𝐜).    (11.4)

Our proposed decoding algorithm can be modified to exclude the all-zero codeword by the following changes:
(1) Extend the condition in Step 10 of Algorithm 11.1 to “𝐩̂ is integral and 𝐩̂ ≠ 𝟎”, which avoids decoding to the all-zero codeword.


Figure 11.2: MLD performance of the codes considered in this paper. [Plot: frame error rate versus E_b/N_0 (dB) for the (127, 85) BCH code, the (204, 102) MacKay code, and the (155, 64) Tanner code.]

(2) In the order-i re-encoding performed in Algorithm 11.2, exclude 𝟎 from the set of candidate codewords.

Moreover, note that all feasible solutions (i.e., codewords) of (11.4) have an integral objective value. This allows us to change the right-hand side in Steps 2, 4, and 14 to τ − 1 + ε, for a small ε > 0, since ⌈ψ̄^{(F)}_{min,𝟏}⌉ = τ implies that ψ^{(F)}_{min,𝟏} ≥ τ.

11.6 Numerical Results

In this section, we present some numerical results for our proposed MLD algorithm, with all the improvements outlined above in Section 11.4, for several codes on the AWGN channel. We have used order-2 re-encoding (i = 2 in Algorithm 11.2) and the open-source GLPK library [16] to solve the LP problems. The following set of parameters was heuristically found to perform well for all codes: M = 30, δ = 2, T = 100, R = 5, R_bb = 100, and γ = 0.2.

As a benchmark, we use the CPLEX IP solver [3], with the IP formulation named IPD1 in [2], which was found to be most efficient in that paper. In case of all-zero decoding, we configured CPLEX to terminate as soon as a codeword with objective value below zero was found, mimicking the adaptations of our algorithm described in Section 11.4.3.

We compare the algorithms with respect to both (single-core) average CPU time T_avg and the average number N_avg of branch-and-bound nodes processed per frame. For our algorithm, N_avg equals the number of times the main loop (Step 2 of Algorithm 11.1) is processed, while for CPLEX we report the attribute “number of processed nodes”. Note that the latter drops below one for high SNR, which is probably due to CPLEX's presolve strategy that establishes optimality in some cases without ever starting the branch-and-bound procedure.


                          d_min   T_MLD   T_CPLEX   N_MLD    N_CPLEX
(155, 64) Tanner code     20      137 s   3682 s    42 785   21 842 224
(204, 102) MacKay code    8       1.6 s   11.49 s   371      44 830
(408, 204) MacKay code    14      152 s   6893 s    9345     936 570

Table 11.3: Numerical results for minimum distance computation.

All calculations were performed on a desktop PC with an Intel Core i5-3470 CPU (3.2 GHz) and
8 GB of RAM.

11.6.1 Maximum-Likelihood Decoding

A comparison of CPLEX and our MLD algorithm, for the different codes outlined below, is given in Table 11.1, both in terms of T_avg and N_avg. A dash indicates that CPLEX was not able to decode a sufficient number of frames without running out of memory. The corresponding MLD performance curves are plotted in Figure 11.2. For the curves, we have counted 100 erroneous frames for each simulation point.
The (155, 64) Tanner code from [9] is often used as a benchmark code, and was also considered
in [8]. For all SNRs, the ZS decoding algorithm showed a performance loss compared to the
MLD curve [8].
As can be seen from Table 11.1, our algorithm is more than 11 times as fast as CPLEX for an SNR of 1.0 dB. For higher values of the SNR, the advantage of our proposed algorithm over CPLEX is even larger. In the case of all-zero decoding, both algorithms are faster by a factor of 2 to 3, while the relative performance gain of our algorithm remains roughly the same.
The second example is a (3, 6)-regular (204, 102) LDPC code taken from the online database of
sparse graph codes from MacKay’s website [15] (called 204.33.484 there). As can be seen from
Table 11.1, also for this code, our algorithm is significantly faster than CPLEX for all simulated
SNRs.
In order to evaluate the performance of our algorithm for dense codes, Table 11.1 includes
results for the (127, 85) BCH code. CPLEX was not able to decode a significant number of
frames for this code, and to our knowledge the MLD curve, as presented in Figure 11.2, was
previously unknown.

11.6.2 Minimum Distance Computation

In the case of computing the minimum distance, we used different values for some of the parameters, namely M = 120, R = 1, R_bb = 1, and γ = 0.3. Results are shown in Table 11.3, which additionally contains the (408, 204) MacKay code (named 408.33.844 at the website [15]) that was also used in [8]. Note that in the case of the (155, 64) Tanner code, we can exploit the symmetry and fix c_0 = 1 before starting the algorithm.


We compare our algorithm to CPLEX with the same formulation as for MLD and the additional constraint ∑_{i=0}^{n−1} c_i ≥ 1 to exclude the all-zero codeword.¹

References

[1] E. Berlekamp, R. J. McEliece, and H. C. A. van Tilborg. “On the inherent intractability of certain coding problems”. IEEE Transactions on Information Theory 24.3 (May 1978), pp. 384–386. doi: 10.1109/TIT.1978.1055873.
[2] A. Tanatmis et al. “Numerical comparison of IP formulations as ML decoders”. In: IEEE
International Conference on Communications. Cape Town, South Africa, May 2010, pp. 1–5.
doi: 10.1109/ICC.2010.5502303.
[3] IBM ILOG CPLEX Optimization Studio. Software Package. Version 12.6. 2013.
[4] E. Rosnes et al. “Addendum to ‘An Efficient Algorithm to Find All Small-Size Stopping
Sets of Low-Density Parity-Check Matrices’”. IEEE Transactions on Information Theory
58.1 (Jan. 2012), pp. 164–171. issn: 0018-9448. doi: 10.1109/TIT.2011.2171531.
[5] E. Rosnes and Ø. Ytrehus. “An efficient algorithm to find all small-size stopping sets of
low-density parity-check matrices”. IEEE Transactions on Information Theory 55.9 (Sept.
2009), pp. 4167–4178. issn: 0018-9448. doi: 10.1109/TIT.2009.2025573.
[6] J. Feldman, M. J. Wainwright, and D. R. Karger. “Using linear programming to decode
binary linear codes”. IEEE Transactions on Information Theory 51.3 (Mar. 2005), pp. 954–
972. doi: 10.1109/TIT.2004.842696. url: www.eecs.berkeley.edu/~wainwrig/Papers/
FelWaiKar05.pdf.
[7] M. H. Taghavi and P. H. Siegel. “Adaptive methods for linear programming decoding”.
IEEE Transactions on Information Theory 54.12 (Dec. 2008), pp. 5396–5410. doi: 10.1109/
TIT.2008.2006384. arXiv: cs/0703123 [cs.IT].
[8] X. Zhang and P. H. Siegel. “Adaptive cut generation algorithm for improved linear
programming decoding of binary linear codes”. IEEE Transactions on Information Theory
58.10 (Oct. 2012), pp. 6581–6594. doi: 10 . 1109 / TIT . 2012 . 2204955. arXiv: 1105 . 0703
[cs.IT].
[9] R. M. Tanner, D. Sridhara, and T. Fuja. “A class of group-structured LDPC codes”. In:
International Symposium on Communication Theory and Applications (ISCTA). Ambleside,
England, July 2001.
[10] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. “Factor graphs and the sum-product
algorithm”. IEEE Transactions on Information Theory 47.2 (Feb. 2001), pp. 498–519. doi:
10.1109/18.910572. url: www.comm.utoronto.ca/frank/papers/KFL01.pdf.

¹ As a remark, for the (408, 204) MacKay code we have used the previous CPLEX 12.5 instead of 12.6; apparently there is a bug in the latter, causing it to output a d_min of 20 instead of the correct value 14.


[11] M. P. C. Fossorier and S. Lin. “Soft-decision decoding of linear block codes based on
ordered statistics”. IEEE Transactions on Information Theory 41.5 (Sept. 1995), pp. 1379–
1396. doi: 10.1109/18.412683.
[12] M. Helmling, S. Ruzika, and A. Tanatmis. “Mathematical programming decoding of binary
linear codes: theory and algorithms”. IEEE Transactions on Information Theory 58.7 (July
2012), pp. 4753–4769. doi: 10.1109/TIT.2012.2191697. arXiv: 1107.3715 [cs.IT].
[13] P. O. Vontobel and R. Koetter. Graph-cover decoding and finite-length analysis of message-
passing iterative decoding of LDPC codes. 2005. arXiv: cs/0512078 [cs.IT].
[14] U. Faigle, W. Kern, and G. Still. Algorithmic Principles of Mathematical Programming.
Vol. 24. Kluwer Academic Publishers, 2010.
[15] D. J. C. MacKay. Encyclopedia of sparse graph codes. 2014. url: http://www.inference.phy.
cam.ac.uk/mackay/codes/data.html.
[16] The GNU Project. GNU Linear Programming Kit (GLPK). Software Library. Version 4.52.
url: http://www.gnu.org/software/glpk.

Chapter 12

Paper VII: Minimum Pseudoweight Analysis of 3-Dimensional Turbo Codes

Eirik Rosnes, Michael Helmling, and Alexandre Graell i Amat

The following chapter is a reformatted copy of a preprint that is publicly available online
(http://arxiv.org/abs/1103.1559). A paper with the same content appeared in the following
publication:

E. Rosnes, M. Helmling, and A. Graell i Amat. “Minimum pseudoweight analysis of 3-dimensional turbo codes”. IEEE Transactions on Communications 62.7 (July 2014), pp. 2170–2182. doi: 10.1109/TCOMM.2014.2329690. arXiv: 1103.1559 [cs.IT]

A subset of the results was previously presented at the 2011 ISIT conference and appeared in
the conference proceedings:

E. Rosnes, M. Helmling, and A. Graell i Amat. “Pseudocodewords of linear programming decoding of 3-dimensional turbo codes”. In: Proceedings of IEEE International Symposium on Information Theory. St. Petersburg, Russia, July 2011, pp. 1643–1647. doi: 10.1109/ISIT.2011.6033823

The work of E. Rosnes and M. Helmling was supported in part by the DFG under Grant RU-1524/2-1 and in part by the DAAD/RCN under Grant 54565400 within the German–Norwegian Collaborative Research Support Scheme. The work of A. Graell i Amat was supported by the Swedish Research Council under Grant 2011-596.

Minimum Pseudoweight Analysis of 3-Dimensional Turbo Codes

Eirik Rosnes, Michael Helmling, and Alexandre Graell i Amat

In this work, we consider pseudocodewords of (relaxed) linear programming (LP) decoding of 3-dimensional turbo codes (3D-TCs). We present a relaxed LP decoder for 3D-TCs, adapting the relaxed LP decoder for conventional turbo codes proposed by Feldman in his thesis. We show that the 3D-TC polytope is proper and C-symmetric, and make a connection to finite graph covers of the 3D-TC factor graph. This connection is used to show that the support set of any pseudocodeword is a stopping set of iterative decoding of 3D-TCs using maximum a posteriori constituent decoders on the binary erasure channel. Furthermore, we compute ensemble-average pseudoweight enumerators of 3D-TCs and perform a finite-length minimum pseudoweight analysis for small cover degrees. Also, an explicit description of the fundamental cone of the 3D-TC polytope is given. Finally, we present an extensive numerical study of small-to-medium block length 3D-TCs, which shows that 1) typically (i.e., in most cases) when the minimum distance d_min and/or the stopping distance h_min is high, the minimum pseudoweight (on the additive white Gaussian noise channel) is strictly smaller than both the d_min and the h_min, and 2) the minimum pseudoweight grows with the block length, at least for small-to-medium block lengths.

12.1 Introduction

Turbo codes (TCs) have gained considerable attention since their introduction by Berrou et
al. in 1993 [1] due to their near-capacity performance and low decoding complexity. The
conventional TC is a parallel concatenation of two identical recursive systematic convolutional
encoders separated by a pseudorandom interleaver. To improve the performance of TCs in
the error floor region, hybrid concatenated codes (HCCs) can be used. In [2], a powerful HCC
nicknamed 3-dimensional turbo code (3D-TC) was introduced. The coding scheme consists
of a conventional turbo encoder and a patch, where a fraction 𝜆 of the parity bits from the
turbo encoder are post-encoded by a third rate-1 encoder. The value of 𝜆 can be used to trade
off performance in the waterfall region with performance in the error floor region. As shown
in [2], this coding scheme is able to provide very low error rates for a wide range of block
lengths and code rates at the expense of a small increase in decoding complexity with respect
to conventional TCs. In a recent work [3], an in-depth performance analysis of 3D-TCs was
conducted. Stopping sets for 3D-TCs were treated in [4]. Finally, we remark that tuned TCs
[5] are another family of HCC ensembles where performance in the waterfall and error floor
regions can be traded off using a tuning parameter.
The use of linear programming (LP) to decode turbo-like codes was introduced by Feldman et al.
[6, 7]. They proposed an LP formulation that resembles a shortest path problem, based on the
trellis graph representation of the constituent convolutional codes. The natural polytope for LP
decoding would be the convex hull of all codewords, in which case LP decoding is equivalent to
maximum-likelihood (ML) decoding. However, an efficient description of that polytope is not
known in general, i.e., its description length most likely grows exponentially with the block
length. The formulation proposed by Feldman et al. [6, 7], which grows only linearly with
the block length, is a relaxation in the sense that it describes a superset of the ML decoding
polytope, introducing additional, fractional vertices. The vertices of the relaxed polytope (both
integral and fractional) are what the authors called pseudocodewords in [8].
The same authors also introduced a different LP formulation to decode low-density parity-check
(LDPC) codes [7, 8] that has been extensively studied since then by various researchers. For
that LP decoder, Vontobel and Koetter [9] showed that the set of the polytope’s points with
rational coordinates (which in particular includes all of its vertices) is equal to the set of all
pseudocodewords derived from all finite graph covers of the Tanner graph. In [10], a similar
result was established (but with no proof included) for the case of conventional TCs. Here, we
prove that statement for the case of 3D-TCs.
In this work, we study (relaxed) LP decoding of 3D-TCs, explore the connection to finite graph covers of the 3D-TC factor graph [11], and adapt the concept of a pseudocodeword. Furthermore, we compute ensemble-average pseudoweight enumerators and perform a finite-length minimum additive white Gaussian noise (AWGN) pseudoweight analysis which shows that the minimum AWGN pseudoweight grows with the block length, at least for small-to-medium block lengths. Furthermore, we show by several examples and by computer search that typically (i.e., in most cases) when the minimum distance d_min and/or the stopping distance h_min is high, the minimum AWGN pseudoweight, denoted by w^AWGN_min, is strictly smaller than both the d_min and the h_min for these codes.


In [12], Chertkov and Stepanov presented an updated, more efficient (compared to the algorithm from [13]) minimum pseudoweight search algorithm based entirely on the concept of the fundamental cone [9] of the LDPC code LP decoder. We will show that such a fundamental cone can be described also for the 3D-TC LP decoder.

Some other work related to the content of this paper is the work on pseudocodeword analysis
of binary and nonbinary (protograph-based) LDPC and generalized LDPC codes. See, for
instance, [14–16], and references therein. For such codes, the component codes are the trivial
repetition code and single parity-check codes, or in the case of generalized LDPC codes,
more advanced classical linear block codes. However, not much work has been done on
pseudocodeword analysis for turbo-like codes with trellis-based constituent codes. In contrast
to these previous works, the trellis-based approach in this paper is different and provides a
pseudocodeword analysis of 3D-TCs that can be adapted also to other trellis-based turbo-like
codes or concatenated codes based on two or more convolutional codes. We remark that some
results on enumerating pseudocodewords for convolutional codes have already been provided
by Boston and Conti in [17, 18]. Finally, it is worth mentioning [19] which presents, among
several results, a combinatorial characterization of the Bethe entropy function in terms of finite
graph covers of the factor graph under consideration. In particular, a characterization in terms
of the average number of preimages of a pseudomarginal vector of rational entries. In contrast
to the general framework in [19], this paper discusses techniques to numerically deal with
large-degree function nodes representing the indicator function of (long) convolutional codes.

The remainder of the paper is organized as follows. In Section 12.2, we describe the system
model and introduce some notation. In Section 12.3, we describe (relaxed) LP decoding of 3D-
TCs. The connection to finite graph covers of the 3D-TC factor graph is explored in Section 12.4.
Ensemble-average pseudoweight enumerators of 3D-TCs are computed in Section 12.5. These enumerators are subsequently used to perform a probabilistic finite-length minimum AWGN pseudoweight analysis of 3D-TCs. In Section 12.6, an efficient heuristic for searching for low-AWGN-pseudoweight pseudocodewords is discussed. Finally, in Section 12.7, an extensive numerical study is presented, and some conclusions are drawn in Section 12.8.

12.2 Coding Scheme

A block diagram of the 3D-TC is depicted in Figure 12.1. The information data sequence 𝐮 of length K bits is encoded by a binary conventional turbo encoder. By a conventional turbo encoder we mean the parallel concatenation of two identical rate-1 recursive convolutional encoders, denoted by C_a and C_b, respectively. Here, C_a and C_b are 8-state recursive convolutional encoders with generator polynomial g(D) = (1 + D + D³)/(1 + D² + D³), i.e., the 8-state constituent encoder specified in the 3GPP LTE standard [20]. The code sequences of C_a and C_b are denoted by 𝐱_a and 𝐱_b, respectively. We also denote by 𝐱_TC the codeword obtained by alternating bits from 𝐱_a and 𝐱_b. A fraction λ (0 ≤ λ ≤ 1), called the permeability rate, of the parity bits from 𝐱_TC are permuted by the interleaver Π_c (of length N_c = 2λK) and encoded by a rate-1 encoder C_c with generator polynomial g(D) = 1/(1 + D²), called the patch or the post-encoder [2].

Figure 12.1: 3D turbo encoder. A fraction λ of the parity bits from both constituent encoders C_a and C_b are grouped by a parallel/serial multiplexer, permuted by the interleaver Π_c, and encoded by the rate-1 post-encoder C_c.

This can be properly represented by a puncturing pattern 𝐩 applied to 𝐱_TC (see Figure 12.1) of period N_p containing λN_p ones (where a one means that the bit is not punctured). Note that the encoder of the patch is like two accumulators, one operating on the even bits and one operating on the odd bits. The fraction 1 − λ of parity bits which are not encoded by C_c is sent directly to the channel. Equivalently, this can be represented by a puncturing pattern 𝐩̄, the complement of 𝐩. We denote by 𝐱_c the code sequence produced by C_c. Also, we denote by 𝐱_a^ch and 𝐱_b^ch the sub-codewords of 𝐱_a and 𝐱_b, respectively, sent directly to the channel, and by 𝐱_ch the codeword obtained by multiplexing (in some order) the bits from 𝐱_a^ch and 𝐱_b^ch. Likewise, we denote by 𝐱_a^p and 𝐱_b^p the sub-codewords of 𝐱_a and 𝐱_b, respectively, encoded by C_c, and by 𝐱_p the codeword obtained by multiplexing (in some order) the bits from 𝐱_a^p and 𝐱_b^p. Finally, the information sequence and the code sequences 𝐱_ch and 𝐱_c are multiplexed to form the code sequence 𝐱, of length N bits, transmitted over the channel. Note that the overall nominal code rate of the 3D-TC is R = K/N = 1/3, the same as for the conventional TC without the patch. Higher code rates can be obtained either by puncturing 𝐱_ch or by puncturing the output of the patch, 𝐱_c.

In [2], regular puncturing patterns of period 2/λ were considered for 𝐩. For instance, if λ = 1/4, every fourth bit from each of the encoders of the outer TC is encoded by encoder C_c. The remaining bits are sent directly to the channel, and it follows that 𝐩 = [11000000] and 𝐩̄ = [00111111]. Note that with this particular puncturing pattern, and even with the generator polynomial g(D) = 1/(1 + D²) for C_c (which is like two separate accumulators operating on even and odd bits, respectively), the bit streams 𝐱_a and 𝐱_b are in general intermingled because of the interleaver Π_c.

12.3 LP Decoding of 3D-TCs

In this section, we consider relaxed LP decoding of 3D-TCs, adapting the relaxation proposed
in [7] for conventional TCs to 3D-TCs.


Let $T_x = T_x(V_x, E_x)$ denote the information bit-oriented trellis of $C_x$, $x = a, b, c$, where the
vertex set $V_x$ partitions as $V_x = \bigcup_{t=0}^{I_x} V_{x,t}$, which also induces the partition $E_x = \bigcup_{t=0}^{I_x-1} E_{x,t}$ of
the edge set $E_x$, where $I_x$ is the trellis length of $T_x$. In the following, the encoders $C_x$ (and their
corresponding trellises $T_x$) are assumed (with some abuse of notation) to be systematic, in the
sense that the output bits are prefixed with the input bits. Thus, $C_x$ is regarded as a rate-1/2
encoder, and the trellis $T_x$ has an output label containing two bits, for $x = a, b, c$. Now, let
$e \in E_{x,t}$ be an arbitrary edge from the $t$th trellis section. The $i$th bit in the output label of $e$ is
denoted by $c_i(e)$, $i = 0, \dots, n_{x,t} - 1$, the starting state of $e$ by $s^{\mathrm{s}}(e)$, and the ending state of $e$ by
$s^{\mathrm{e}}(e)$, where $n_{x,t}$ is the number of bits in the output label of an edge $e \in E_{x,t}$.
For $x = a, b, c$, we define the path polytope $\mathcal{Q}_x$ to be the set of all $\mathbf{f}_x \in [0,1]^{|E_x|}$ satisfying

$$\sum_{e \in E_{x,0}} f_{x,e} = 1 \qquad (12.1a)$$

$$\sum_{e \in E_x:\, s^{\mathrm{e}}(e) = v} f_{x,e} = \sum_{e \in E_x:\, s^{\mathrm{s}}(e) = v} f_{x,e} \quad \text{for all } v \in V_{x,t} \text{ and } t = 1, \dots, I_x - 1 \qquad (12.1b)$$

and let $\mathcal{Q} = \mathcal{Q}_a \times \mathcal{Q}_b \times \mathcal{Q}_c$. Note that $\mathcal{Q}$ is the set of all feasible network flows through the
three trellis graphs $T_x$, $x = a, b, c$.

Next, we define the polytope $\mathcal{Q}_{\Pi,\Pi_c}$ as the set of pairs $(\tilde{\mathbf{y}}, \mathbf{f})$, where $\tilde{\mathbf{y}} \in [0,1]^{N+2\lambda K}$ and
$\mathbf{f} = (\mathbf{f}_a, \mathbf{f}_b, \mathbf{f}_c) \in [0,1]^{|E_a \cup E_b \cup E_c|}$, meeting the constraints

$$(\mathbf{f}_a, \mathbf{f}_b, \mathbf{f}_c) \in \mathcal{Q} \qquad (12.2a)$$

$$\sum_{e \in E_{x,t}:\, c_i(e)=1} f_{x,e} = \tilde{y}_{\rho_x(\phi_x(t,i))} \quad \text{for } t = 0, \dots, I_x - 1,\ i = 0, \dots, n_{x,t} - 1, \text{ and } x = a, b, c. \qquad (12.2b)$$

Here, $\phi_x(t,i) = \sum_{t'=0}^{t-1} n_{x,t'} + i$, and $\rho_x(\cdot)$ denotes the mapping of codeword indices of the
constituent encoder $C_x$ to codeword indices of the overall codeword of the 3D-TC appended
with the $2\lambda K$ parity bits from encoders $C_a$ and $C_b$ which are sent to the patch.
Finally, let

$$\dot{\mathcal{Q}}_{\Pi,\Pi_c} = \{\mathbf{y} \in [0,1]^N : \exists\, \hat{\mathbf{y}} \in [0,1]^{2\lambda K},\ \mathbf{f} \in \mathcal{Q} \text{ with } ((\mathbf{y}, \hat{\mathbf{y}}), \mathbf{f}) \in \mathcal{Q}_{\Pi,\Pi_c}\}$$

be the projection of $\mathcal{Q}_{\Pi,\Pi_c}$ onto the first $N$ variables.


Relaxed LP decoding (on a binary-input memoryless channel) of 3D-TCs can be described by
the linear program

$$\text{minimize } \sum_{l=0}^{N-1} \lambda_l y_l \quad \text{subject to } \mathbf{y} \in \dot{\mathcal{Q}}_{\Pi,\Pi_c} \qquad (12.3)$$

where

$$\lambda_l = \log\left(\frac{\Pr\{r_l \mid c_l = 0\}}{\Pr\{r_l \mid c_l = 1\}}\right), \quad l = 0, \dots, N - 1,$$

$c_l$ is the $l$th codeword bit, and $r_l$ is the $l$th component of the received vector. If instead of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$
we use the convex hull of the codewords of the 3D-TC, then solving the linear program in (12.3)
is equivalent to ML decoding.
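
For the binary-input AWGN channel with BPSK mapping $0 \mapsto +1$, $1 \mapsto -1$ and noise variance $\sigma^2$, the LLR definition above collapses to $\lambda_l = 2r_l/\sigma^2$. A minimal sketch of the cost-vector computation for (12.3) (plain Python; the helper name is hypothetical):

```python
def awgn_llrs(received, sigma):
    """LLRs lambda_l = log(Pr(r|c=0)/Pr(r|c=1)) for a binary-input AWGN
    channel with BPSK mapping 0 -> +1, 1 -> -1.  The two Gaussian densities
    differ only in their means, so the log-ratio simplifies to 2*r/sigma^2."""
    return [2.0 * r / sigma**2 for r in received]

# Example: a noisy all-zero codeword (all-(+1) BPSK signal).
print(awgn_llrs([0.9, -0.1, 1.2], sigma=1.0))  # [1.8, -0.2, 2.4]
```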


The notion of a proper and 𝐶-symmetric polytope was introduced in [7, Ch. 4] where the author
proved that the probability of error of LP decoding is independent of the transmitted codeword
on a binary-input output-symmetric memoryless channel when the underlying code is linear
and the polytope is proper and 𝐶-symmetric.

12.1 Proposition: Let 𝐶 denote a given 3D-TC with interleavers Π and Πc . The polytope 𝒬̇ Π,Πc
is proper, i.e., 𝒬̇ Π,Πc ∩ {0, 1}^𝑁 = 𝐶, and 𝐶-symmetric, i.e., for any 𝐲 ∈ 𝒬̇ Π,Πc and 𝐜 ∈ 𝐶 it holds
that ∣𝐲 − 𝐜∣ ∈ 𝒬̇ Π,Πc . C

Feldman proved a similar statement in the context of LDPC codes (Lemma 5.2 and Theorem 5.4
in [7]). However, note that his proof is based explicitly on the inequalities of the LP formulation
for LDPC codes, and therefore does not generalize to the polytope 𝒬̇ Π,Πc .

In Appendix 12.A, we give a formal proof of both statements in a much more general setting.

12.4 Finite Graph Covers

Let 𝐶 denote a given 3D-TC with interleavers Π and Πc , and constituent codes 𝐶x , x =
a, b, c. The factor graph [11] of 𝐶x , denoted by Γ(𝐶x ), is composed of state, input, parity, and
check vertices. The state vertices 𝑠x,0 , … , 𝑠x,𝐼x in Γ(𝐶x ) represent the state spaces of the length-𝐼x
information bit-oriented trellis 𝑇x of 𝐶x . The 𝑙th check vertex represents the 𝑙th trellis section,
i.e., it is an indicator function for the set of allowed combinations of left state, input symbol,
parity symbol, and right state. A factor graph Γ(𝐶) of 𝐶 is constructed as follows.

(1) Remove all the input vertices of Γ(𝐶b ) by connecting the 𝑙th input vertex of Γ(𝐶a ) to
the Π(𝑙)th check vertex of Γ(𝐶b ).

(2) Remove all the input vertices of Γ(𝐶c ) by connecting the parity vertex (from either Γ(𝐶a )
or Γ(𝐶b )) corresponding to the 𝑙th bit in 𝐱p to the Πc (𝑙)th check vertex of Γ(𝐶c ).

To construct a degree-𝑚 cover of Γ(𝐶), denoted by Γ^(𝑚)(𝐶), we first make 𝑚 identical copies of
Γ(𝐶). Now, any permutation of the edges, denoted by 𝐸 = 𝐸(Γ^(𝑚)(𝐶)), connecting the copies
of the constituent factor graphs Γ(𝐶x ) such that the following conditions are satisfied, will
give a valid cover of Γ(𝐶).

(1) The 𝑚 copies of the 𝑙th input vertex of Γ(𝐶a ) should be connected by a one-to-one
mapping (or permutation) to the 𝑚 copies of the Π(𝑙)th check vertex of Γ(𝐶b ).

(2) The 𝑚 copies of the parity vertex (from either Γ(𝐶a ) or Γ(𝐶b )) corresponding to the 𝑙th
bit in 𝐱p should be connected by a permutation to the 𝑚 copies of the Πc (𝑙)th check
vertex of Γ(𝐶c ).


Figure 12.2: Factor graph of a nominal rate-1/3 3D-TC with 𝜆 = 1/4, using the regular
puncturing pattern 𝐩 = [11000000].

Figure 12.3: Degree-2 cover of the factor graph in Figure 12.2. The six rectangular blue boxes
are permutations of size 2 that can be chosen arbitrarily. The nodes highlighted in
yellow and pink belong to the first and second cover copy, respectively.


The corresponding code is denoted by $C^{(m)}$. Let $\mathbf{x}^{(m)} = (x^{(0)}_0, \dots, x^{(0)}_{N-1}, \dots, x^{(m-1)}_0, \dots, x^{(m-1)}_{N-1})$
denote a codeword in $C^{(m)}$, define

$$\omega_l(\mathbf{x}^{(m)}) = \frac{|\{i : x^{(i)}_l = 1\}|}{m}$$

and let $\boldsymbol{\omega} = \boldsymbol{\omega}(\mathbf{x}^{(m)}) = (\omega_0(\mathbf{x}^{(m)}), \dots, \omega_{N-1}(\mathbf{x}^{(m)}))$. Now, $\boldsymbol{\omega}$ as defined above is said to be a
graph-cover pseudocodeword of degree $m$.
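
A small sketch illustrating this definition (plain Python; the cover codeword is laid out as the $m$ concatenated copies above):

```python
def pseudocodeword(cover_codeword, m, n):
    """Graph-cover pseudocodeword of a degree-m cover codeword.

    cover_codeword is the concatenation (x^(0), ..., x^(m-1)) of m binary
    words of length n; omega_l is the fraction of the m copies in which
    bit l equals 1."""
    assert len(cover_codeword) == m * n
    return [sum(cover_codeword[i * n + l] for i in range(m)) / m
            for l in range(n)]

# Degree-2 cover: copies 1010 and 1100 yield omega = (1, 0.5, 0.5, 0).
print(pseudocodeword([1, 0, 1, 0, 1, 1, 0, 0], m=2, n=4))
```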

12.2 Example: Figure 12.2 depicts the factor graph of a nominal rate-1/3 3D-TC with 𝜆 = 1/4
and input length 𝐾 = 4, using the regular puncturing pattern 𝐩 = [11000000]. The upper part
to the left is the factor graph of 𝐶a , the lower part to the left is the factor graph of 𝐶b , and
the right part is the factor graph of 𝐶c . The brown squares in the graph are check vertices
corresponding to trellis sections. The single circles are input and parity vertices, and the
double circles are state vertices. Figure 12.3 depicts a degree-2 cover of the factor graph from
Figure 12.2. The six small blue rectangular boxes are permutations of size 2 that can be chosen
arbitrarily. C

12.3 Proposition: The following statements are true:


(1) The points in 𝒬̇ Π,Πc ∩ ℚ^𝑁 are in one-to-one correspondence with 𝒫Π,Πc , where ℚ is the
set of rational numbers and 𝒫Π,Πc is the set of all graph-cover pseudocodewords from all
finite graph covers of the 3D-TC factor graph.
(2) All vertices of 𝒬̇ Π,Πc have rational entries. C

A similar result was proved in [9] for LDPC codes. We give a formal proof for the case of
3D-TCs in Appendix 12.B. It is inspired by the one in [9], but is a bit more involved because
pseudocodewords are defined only indirectly by a linear image of the polytope 𝒬. Note that
our proof does not depend on the detailed set-up of 3D-TCs, so it can be extended to all sorts of
turbo-like coding schemes.
When decoding 3D-TCs by solving the linear program in (12.3), there is always a vertex of
𝒬̇ Π,Πc at which the optimum value is attained. We can therefore assume that the LP decoder
always returns a vertex and hence, by Proposition 12.3, a (graph-cover) pseudocodeword.
Furthermore, the pseudoweight on the AWGN channel of a nonzero pseudocodeword $\boldsymbol{\omega}$ is
defined as [9, 21]

$$w_{\mathrm{AWGN}}(\boldsymbol{\omega}) = \frac{\|\boldsymbol{\omega}\|_1^2}{\|\boldsymbol{\omega}\|_2^2} = \frac{\bigl(\sum_{l=0}^{N-1} \omega_l\bigr)^2}{\sum_{l=0}^{N-1} \omega_l^2} \qquad (12.4)$$

where $\|\cdot\|_p$ is the $\ell_p$-norm of a vector.
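
Definition (12.4) is immediate to evaluate; the following sketch also illustrates that a fractional pseudocodeword can have a smaller pseudoweight than the Hamming weight of a comparable codeword:

```python
def awgn_pseudoweight(omega):
    """AWGN pseudoweight (12.4): squared 1-norm over squared 2-norm.
    For a 0/1-valued codeword this reduces to its Hamming weight."""
    s1 = sum(omega)
    s2 = sum(w * w for w in omega)
    return s1 * s1 / s2

print(awgn_pseudoweight([1, 1, 1, 0]))      # 3.0 (a weight-3 codeword)
print(awgn_pseudoweight([1, 0.5, 0.5, 0]))  # 4/1.5 = 2.666..., smaller than 3
```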

12.4 Proposition: Let 𝐶 denote a given 3D-TC with interleavers Π and Πc . For any pseudocode-
word 𝝎, the support set 𝜒(𝝎) of 𝝎, i.e., the index set of the nonzero coordinates, is a stopping set
according to [4, Def. 1]. Conversely, for any stopping set 𝒮 = 𝒮(Π, Πc ) of the 3D-TC there exists
a pseudocodeword 𝝎 with support set 𝜒(𝝎) = 𝒮. C


Proof. This result can be proved in the same manner as the corresponding result for conven-
tional TCs [10, Lem. 2]. The proof given in [10] is based on the linearity of the subcodes 𝐶ā and
𝐶b̄ (from the stopping set definition in [10, Def. 1]). For 3D-TCs the same proof applies using
the linearity of all the three subcodes 𝐶â , 𝐶b̂ , and 𝐶ĉ from [4, Def. 1]. 

As a consequence of Proposition 12.4, it follows that the 𝑤AWGN min of 𝐶 is upper-bounded by the
ℎmin of 𝐶.
We remark that the 𝑑min can be computed exactly by solving the integer program in (12.3)
with $\lambda_l = 1$ for all $l$, with integer constraints on all the flow variables $\mathbf{f}$ in (12.1) and (12.2),
i.e., $f_l \in \{0, 1\}$ for all $l$, and with the constraint $\sum_{l=0}^{N-1} y_l \geq 1$ to avoid obtaining the all-zero
codeword (see [22, Prop. 3.6]). The exact 𝑑min of 3D-TCs has not been computed before in the
literature, but will be computed later in this paper for several codes. For instance, in [3], only
estimates of 𝑑min were provided. Finally, note that the exact ℎmin can be computed in a similar
manner using extended trellis modules in 𝑇x (see [23] for details).

12.5 Ensemble-Average Pseudoweight Enumerators

In this section, we describe how to compute the ensemble-average pseudoweight enumerator of
3D-TCs for a given graph cover degree $m$.

Here, we first introduce the concept of a pseudocodeword vector-weight enumerator (PCVWE)
$\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}$ of the constituent code $C_x$. In particular, $\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}$ is the number of pseudocodewords in the
constituent code $C_x$, $x = a, b, c$, of Hamming vector-weight $\mathbf{h} = (h_1, \dots, h_m)$ corresponding
to input sequences of Hamming vector-weight $\mathbf{w} = (w_1, \dots, w_m)$. The pseudocodewords of a
constituent code $C_x$ are obtained as follows. Let $C^{(m)}_x$ denote the degree-$m$ cover of constituent
code $C_x$, which is obtained by concatenating $C_x$ by itself $m$ times, i.e.,

$$C^{(m)}_x = \{(\mathbf{x}_0, \dots, \mathbf{x}_{m-1}) : \mathbf{x}_i \in C_x,\ \forall i \in \{0, \dots, m-1\}\}.$$

Now, let

$$\mathbf{x}^{(m)} = (x^{(0)}_0, \dots, x^{(0)}_{N_x-1}, \dots, x^{(m-1)}_0, \dots, x^{(m-1)}_{N_x-1})$$

denote a codeword in $C^{(m)}_x$, where $(x^{(i)}_0, \dots, x^{(i)}_{N_x-1})$ is a codeword in $C_x$ for all $i$, $0 \leq i \leq m-1$,
and $N_x$ is the block length of $C_x$. The corresponding unnormalized pseudocodeword is

$$\boldsymbol{\omega}(\mathbf{x}^{(m)}) = \Bigl(\sum_{i=0}^{m-1} x^{(i)}_0, \dots, \sum_{i=0}^{m-1} x^{(i)}_{N_x-1}\Bigr) \qquad (12.5)$$

where addition is integer addition, which means that each component of a pseudocodeword is
an integer between 0 and $m$. The $j$th component $h_j$ of the vector-weight $\mathbf{h} = (h_1, \dots, h_m)$ of the
pseudocodeword in (12.5) is the number of components in the pseudocodeword with value $j$,
i.e.,

$$h_j = \Bigl|\Bigl\{l : \sum_{i=0}^{m-1} x^{(i)}_l = j \text{ and } l \in \{0, \dots, N_x - 1\}\Bigr\}\Bigr|.$$
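
In other words, the vector-weight is the histogram of the values $1, \dots, m$ among the components of the unnormalized pseudocodeword; a minimal sketch:

```python
def vector_weight(omega_unnormalized, m):
    """Hamming vector-weight h = (h_1, ..., h_m): h_j counts the components
    of the unnormalized pseudocodeword that are equal to j."""
    return [sum(1 for v in omega_unnormalized if v == j)
            for j in range(1, m + 1)]

# Unnormalized degree-2 pseudocodeword (2, 1, 1, 0) -> h = (h_1, h_2) = (2, 1)
print(vector_weight([2, 1, 1, 0], m=2))
```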


The PCVWE of constituent code $C_x$ can be computed using a nonbinary trellis constructed from
the ordinary (information bit-oriented) trellis $T_x$. This trellis will be called the pseudocodeword
trellis and is denoted by $T^{\mathrm{PC}}_{x,m}$ for constituent code $C_x$. The procedure to construct $T^{\mathrm{PC}}_{x,m}$ from $T_x$
is described below.

12.5.1 Constructing $T^{\mathrm{PC}}_{x,m}$ From $T_x$

The pseudocodeword trellis $T^{\mathrm{PC}}_{x,m} = T^{\mathrm{PC}}_{x,m}(V^{\mathrm{PC}}_{x,m}, E^{\mathrm{PC}}_{x,m})$, where $V^{\mathrm{PC}}_{x,m}$ is the vertex set and $E^{\mathrm{PC}}_{x,m}$ is
the edge set, can be constructed from the trellis $T_x$ in the following way. First, define the sets

$$\tilde{V}^{\mathrm{PC}}_{x,m,t} = \underbrace{V_{x,t} \times V_{x,t} \times \cdots \times V_{x,t}}_{m}$$

$$\tilde{E}^{\mathrm{PC}}_{x,m,t} = \bigl\{\bigl((v^{(0)}_{\mathrm{l}}, \dots, v^{(m-1)}_{\mathrm{l}}), (v^{(0)}_{\mathrm{r}}, \dots, v^{(m-1)}_{\mathrm{r}})\bigr) : (v^{(i)}_{\mathrm{l}}, v^{(i)}_{\mathrm{r}}) \in E_{x,t},\ \forall i \in \{0, \dots, m-1\}\bigr\}$$

where the time index $t$ runs from 0 to $I_x$ (resp. $I_x - 1$) for the vertices (resp. edges). The label
of an edge $\bigl((v^{(0)}_{\mathrm{l}}, \dots, v^{(m-1)}_{\mathrm{l}}), (v^{(0)}_{\mathrm{r}}, \dots, v^{(m-1)}_{\mathrm{r}})\bigr) \in \tilde{E}^{\mathrm{PC}}_{x,m,t}$ is the integer sum of the labels of its
constituent edges $(v^{(i)}_{\mathrm{l}}, v^{(i)}_{\mathrm{r}}) \in E_{x,t}$ for all $i$, $0 \leq i \leq m-1$, which makes the trellis (to be
constructed below) nonbinary in general.
Let $\Psi(\cdot)$ denote a permutation that reorders the components of a vertex $(v^{(0)}, \dots, v^{(m-1)}) \in \tilde{V}^{\mathrm{PC}}_{x,m,t}$ according to their labels in a nondecreasing order. As an example, for $m = 3$,

$$\Psi(v_1, v_0, v_2) = (v_0, v_1, v_2) \quad \text{and} \quad \Psi(v_2, v_1, v_0) = (v_0, v_1, v_2),$$

assuming that vertex $v_i$ has label $i$. Now, define the vertex set $V^{\mathrm{PC}}_{x,m,t}$ by expurgating vertices
from the vertex set $\tilde{V}^{\mathrm{PC}}_{x,m,t}$ as follows:

$$V^{\mathrm{PC}}_{x,m,t} = \{\Psi(v^{(0)}, \dots, v^{(m-1)}) : (v^{(0)}, \dots, v^{(m-1)}) \in \tilde{V}^{\mathrm{PC}}_{x,m,t}\}.$$

The edge set $E^{\mathrm{PC}}_{x,m,t}$ is defined by expurgating edges from the edge set $\tilde{E}^{\mathrm{PC}}_{x,m,t}$ as follows:

$$E^{\mathrm{PC}}_{x,m,t} = \bigl\{\bigl(\Psi(v^{(0)}_{\mathrm{l}}, \dots, v^{(m-1)}_{\mathrm{l}}), \Psi(v^{(0)}_{\mathrm{r}}, \dots, v^{(m-1)}_{\mathrm{r}})\bigr) : \bigl((v^{(0)}_{\mathrm{l}}, \dots, v^{(m-1)}_{\mathrm{l}}), (v^{(0)}_{\mathrm{r}}, \dots, v^{(m-1)}_{\mathrm{r}})\bigr) \in \tilde{E}^{\mathrm{PC}}_{x,m,t}\bigr\}$$

where all duplicated edges (edges with the same left and right vertex and edge label) are
expurgated. The final pseudocodeword trellis is constructed by concatenating the trellis sections
$T^{\mathrm{PC}}_{x,m,t} = T^{\mathrm{PC}}_{x,m,t}(V^{\mathrm{PC}}_{x,m,t}, E^{\mathrm{PC}}_{x,m,t})$, $t = 0, \dots, I_x - 1$.

As an example, in Figure 12.4, we show both the standard trellis section $T_{x,t}$ (on the left) and
the pseudocodeword trellis section $T^{\mathrm{PC}}_{x,2,t}$ for $m = 2$ (on the right), both being invariant with
respect to the time index $t$, of the accumulator code with generator polynomial $1/(1+D)$. Note
that for the pseudocodeword trellis section there are two edges with labels 2/1 and 0/1,
respectively, from the middle vertex to the middle vertex.

12.5 Lemma: For $m = 2$, $|V^{\mathrm{PC}}_{x,2,t}| = |V_{x,t}| + \binom{|V_{x,t}|}{2}$ and $|E^{\mathrm{PC}}_{x,2,t}| = 2|V_{x,t}|^2 + |V_{x,t}|$. C


The proof of this lemma is given in Appendix 12.C.


For a 4-state encoder this means 10 states, and for an 8-state encoder this means 36 states. For
an accumulator we only have 3 states as can be seen in Figure 12.4 (the right trellis section).
Similar formulas for the number of vertices and edges can be derived for 𝑚 > 2, but are omitted
for brevity here.
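
For the accumulator, the counts of Lemma 12.5 can also be verified by brute force. The sketch below builds the $m = 2$ product section of Figure 12.4, applies the sorting map $\Psi$ from Section 12.5.1, lets duplicated edges collapse, and recovers the 3 states and $2 \cdot 2^2 + 2 = 10$ edges stated above; the edge list for $1/(1+D)$ is our assumption of the standard realization (state = previous output bit):

```python
from itertools import product

# Accumulator 1/(1+D): from state s, input u produces output u XOR s,
# which is also the next state; edges are (left state, input, output, right state).
edges = [(s, u, u ^ s, u ^ s) for s in (0, 1) for u in (0, 1)]

# Degree-2 product section: pair up two constituent edges, add labels over
# the integers, and sort each vertex pair (the map Psi of Section 12.5.1).
pc_vertices, pc_edges = set(), set()
for (l1, u1, o1, r1), (l2, u2, o2, r2) in product(edges, repeat=2):
    vl, vr = tuple(sorted((l1, l2))), tuple(sorted((r1, r2)))
    pc_vertices.update([vl, vr])
    pc_edges.add((vl, u1 + u2, o1 + o2, vr))  # duplicates collapse in the set

print(len(pc_vertices), len(pc_edges))  # 3 10, matching Lemma 12.5 for |V| = 2
```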
Now, we define an equivalence relation on the set of pseudocodewords as follows. The pseudocodewords $\boldsymbol{\omega}_1$ and $\boldsymbol{\omega}_2$ are said to be equivalent if and only if there exists a positive real
number $\Delta$ such that $\boldsymbol{\omega}_1 = \Delta \cdot \boldsymbol{\omega}_2$, i.e., they are scaled versions of each other. As a consequence,
only a single pseudocodeword from an equivalence class can be a vertex of the decoding polytope, which justifies counting equivalence classes only. Running a Viterbi-like algorithm (see
Section 12.5.2 below for details) on the pseudocodeword trellis $T^{\mathrm{PC}}_{x,m}$ constructed above will, in
general, count pseudocodewords from the same equivalence class.
However, counting pseudocodewords instead of their equivalence classes does not violate the
bounding argument of Section 12.5.5 below, but may lead to a loose bound. For $m = 2$, for
instance, these duplicates can be removed by a simple procedure which removes all terms of
$\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}$ with vector-weights $(\mathbf{w}, \mathbf{h}) = ((0, w), (0, h))$.

As a final remark, the issue of counting pseudocodewords from the same equivalence class is
not considered in [14, 24] in the context of LDPC code ensembles.

12.5.2 Computing the PCVWE $\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}$

The computation of the PCVWE $\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}$ for a constituent code $C_x$ can be performed (for a
given cover degree $m$) on the corresponding pseudocodeword trellis $T^{\mathrm{PC}}_{x,m}$, similarly to the
computation (on the trellis $T_x$) of the input-output weight enumerator. The algorithm performs
$I_x$ steps in the trellis. At trellis depth $t$, and for each state $s$, it computes the partial enumerator
$\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}(t, s)$ giving the number of paths in the trellis merging to state $s$ at trellis depth $t$ with
input vector-weight $\mathbf{w}$ and output vector-weight $\mathbf{h}$. In particular, $\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}(t, s)$ is computed from
$\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}(t-1, s^{\mathrm{s}}(e))$ by considering all edges $e = (s^{\mathrm{s}}(e), s^{\mathrm{e}}(e))$ from starting state $s^{\mathrm{s}}(e)$ (at time
$t-1$) to ending state $s^{\mathrm{e}}(e) = s$ (at time $t$) according to the dynamic programming principle.
Finally, $\mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}} = \mathcal{P}^{C_x}_{\mathbf{w},\mathbf{h}}(I_x, s_0)$, where $s_0$ denotes the all-zero state.

The number of required computations per trellis section is $|E^{\mathrm{PC}}_{x,m,t}| \cdot w_{\max}^{m} \cdot h_{\max}^{m}$, where $t =
0, \dots, I_x - 1$ and $w_{\max}$ (resp. $h_{\max}$) is the maximum entry of $\mathbf{w}$ (resp. $\mathbf{h}$) that we consider. Note
that the computational complexity and the memory requirements scale exponentially with the
cover degree $m$ ($m = 1$ corresponds to the codeword input-output weight enumerator).
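
As an illustration of this recursion, the following sketch (our simplified rendering, not the implementation used for the numerical results below) enumerates path counts by input/output vector-weight on a pseudocodeword trellis given as a list of sections of edges (left state, input label, output label, right state):

```python
from collections import defaultdict

def pcvwe(sections, s0):
    """Forward DP for the pseudocodeword vector-weight enumerator.

    `sections` is a list with one entry per trellis section; each entry is a
    list of edges (left_state, input_label, output_label, right_state) with
    integer labels in 0..m.  Vector-weights are tracked as length-m tuples
    counting, for each value j = 1..m, how often that label occurred.
    Returns a dict mapping (w, h) -> number of paths from s0 back to s0.
    """
    m = max(max(e[1], e[2]) for sec in sections for e in sec)  # cover degree
    zero = (0,) * m
    table = {(s0, zero, zero): 1}               # (state, w, h) -> path count
    for sec in sections:
        nxt = defaultdict(int)
        for (s, w, h), cnt in table.items():
            for (sl, u, o, sr) in sec:
                if sl != s:
                    continue
                w2 = tuple(w[j] + (u == j + 1) for j in range(m))
                h2 = tuple(h[j] + (o == j + 1) for j in range(m))
                nxt[(sr, w2, h2)] += cnt
        table = dict(nxt)
    return {(w, h): c for (s, w, h), c in table.items() if s == s0}

# E.g., four copies of the m = 2 accumulator section built in the sketch
# above: pcvwe([list(pc_edges)] * 4, s0=(0, 0))
```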

12.5.3 Average Pseudoweight Enumerator With Random Puncturing Pattern 𝐩

We assume a random puncturing pattern for 𝐩. In particular, the puncturing patterns are
sampled uniformly at random from the ensemble of puncturing patterns 𝐩 with a fraction of 𝜆



Figure 12.4: The standard trellis section $T_{x,t}$ (on the left) and the pseudocodeword trellis section
$T^{\mathrm{PC}}_{x,2,t}$ (on the right), both being invariant with respect to the time index $t$, of the
accumulator code for $m = 2$. Note that for the pseudocodeword trellis section
there are two edges with labels 2/1 and 0/1, respectively, from the middle vertex
to the middle vertex. The vertices of the pseudocodeword trellis section are labeled
according to the vertex labeling of the standard trellis section on the left.

ones. Now, using the concept of the uniform interleaver, the ensemble-average pseudocodeword
input-output vector-weight enumerator is

$$\bar{\mathcal{P}}_{\mathbf{w},\mathbf{h}} = \sum_{\mathbf{q},\mathbf{q}_a,\mathbf{n}} \frac{\mathcal{P}^{C_a}_{\mathbf{w},\mathbf{q}_a}\, \mathcal{P}^{C_b}_{\mathbf{w},\mathbf{q}-\mathbf{q}_a}}{\binom{K}{w_1,w_2,\dots,w_m}} \cdot \frac{\Bigl[\prod_{j=1}^{m}\binom{q_j}{n_j}\Bigr]\binom{2K-\sum_{j=1}^{m} q_j}{2\lambda K-\sum_{j=1}^{m} n_j}}{\binom{2K}{2\lambda K}} \cdot \frac{\mathcal{P}^{C_c}_{\mathbf{n},\mathbf{h}-\mathbf{w}-\mathbf{q}+\mathbf{n}}}{\binom{2\lambda K}{n_1,n_2,\dots,n_m}} \qquad (12.6)$$

where $\bar{\mathcal{P}}_{\mathbf{w},\mathbf{h}}$ gives the average number (over all interleavers) of unnormalized pseudocodewords
of input vector-weight $\mathbf{w}$ and output vector-weight $\mathbf{h}$. In (12.6),

$$\binom{K}{w_1, w_2, \dots, w_m} = \frac{K!}{w_1! \cdots w_m!\, \bigl(K - \sum_{j=1}^{m} w_j\bigr)!},$$

$\mathbf{q}_a$ is the output vector-weight from the constituent code $C_a$, $\mathbf{q}$ is the total output vector-weight
from the outer turbo code, and $\mathbf{n}$ is the input vector-weight for the inner constituent code $C_c$.
We remark that (12.6) can be seen as a nonbinary version of [3, Eq. (2)].
Now, the ensemble-average pseudoweight enumerator on channel $\mathcal{H}$ is

$$\bar{\mathcal{P}}^{\mathcal{H}}_{w} = \sum_{\mathbf{w}} \sum_{\mathbf{h}:\, w_{\mathcal{H}}(\mathbf{h}) = w} \bar{\mathcal{P}}_{\mathbf{w},\mathbf{h}}$$

where $w_{\mathcal{H}}(\mathbf{h})$ is the weight metric on $\mathcal{H}$. For instance, if $\mathcal{H}$ is the AWGN channel, then

$$w_{\mathcal{H}}(\mathbf{h}) = \Bigl(\sum_{j=1}^{m} j \cdot h_j\Bigr)^2 \Big/ \sum_{j=1}^{m} j^2 \cdot h_j$$

and if $\mathcal{H}$ is the binary erasure channel, then $w_{\mathcal{H}}(\mathbf{h}) = \sum_{j=1}^{m} h_j$.
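
Evaluating the AWGN weight metric of a vector-weight is a one-liner; note that it agrees with (12.4) applied to the corresponding unnormalized pseudocodeword, since the pseudoweight is invariant to scaling:

```python
def awgn_weight_metric(h):
    """AWGN weight metric of a vector-weight h = (h_1, ..., h_m):
    (sum_j j*h_j)^2 / sum_j j^2*h_j."""
    num = sum(j * hj for j, hj in enumerate(h, start=1)) ** 2
    den = sum(j * j * hj for j, hj in enumerate(h, start=1))
    return num / den

# h = (2, 1) is the vector-weight of (2, 1, 1, 0), i.e., of omega = (1, .5, .5, 0):
print(awgn_weight_metric([2, 1]))  # (2 + 2)^2 / (2 + 4) = 2.666..., as in (12.4)
```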


12.5.4 Average Pseudoweight Enumerator With Regular Puncturing Pattern 𝐩

In a similar fashion as for the case with a random puncturing pattern 𝐩, we can modify [3,
Eq. (3)] to arrive at a similar expression (to (12.6)) for the ensemble-average pseudocodeword
input-output vector-weight enumerator. Details are omitted for brevity.

12.5.5 Finite-Length Minimum Pseudoweight Analysis

The ensemble-average pseudoweight enumerator $\bar{\mathcal{P}}^{\mathcal{H}}_{w}$ can be used to bound the minimum
pseudoweight on $\mathcal{H}$, denoted by $w^{\mathcal{H}}_{\min}$, of the 3D-TC ensemble in the finite-length regime.
In particular, the probability that a code randomly chosen from the ensemble has minimum
pseudoweight $w^{\mathcal{H}}_{\min} < \bar{w}$ on $\mathcal{H}$ is upper-bounded by [25]

$$\Pr(w^{\mathcal{H}}_{\min} < \bar{w}) \leq \sum_{0 < w < \bar{w}} \bar{\mathcal{P}}^{\mathcal{H}}_{w}. \qquad (12.7)$$

The upper bound in (12.7) can be used to obtain a probabilistic lower bound on the minimum
pseudoweight of a code ensemble. For a fixed value of $\epsilon$, where $\epsilon$ is any positive value between
0 and 1, we define the probabilistic lower bound with probability $\epsilon$, denoted by $w^{\mathcal{H}}_{\min,\mathrm{LB},\epsilon}$, to be
the largest real number $\bar{w}$ such that the right-hand side of (12.7) is at most $\epsilon$. This guarantees
that $\Pr(w^{\mathcal{H}}_{\min} \geq \bar{w}) \geq 1 - \epsilon$.
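
Computationally, $w^{\mathcal{H}}_{\min,\mathrm{LB},\epsilon}$ is found by accumulating the spectrum in (12.7) until the budget $\epsilon$ would be exceeded; a sketch with a hypothetical data layout (pseudoweight mapped to average multiplicity):

```python
def prob_lower_bound(enumerator, eps):
    """Largest w_bar such that the sum of average multiplicities P_w over
    0 < w < w_bar stays <= eps, cf. (12.7); `enumerator` maps each
    pseudoweight to its ensemble-average multiplicity."""
    total = 0.0
    for w in sorted(enumerator):
        if total + enumerator[w] > eps:
            return w  # all strictly smaller weights still satisfy the bound
        total += enumerator[w]
    return float("inf")  # the entire spectrum mass stays below eps

# Toy spectrum: Pr(w_min < 14) <= 0.2, Pr(w_min < 17.5) <= 0.2 + 0.4 > 0.5
print(prob_lower_bound({10.0: 0.2, 14.0: 0.4, 17.5: 1.3}, eps=0.5))  # 14.0
```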

12.6 Searching for the Minimum Pseudoweight

In this section, we present an efficient heuristic to search for low-weight pseudocodewords of
3D-TCs. We use the recently published improved minimum pseudoweight estimation algorithm
by Chertkov and Stepanov [12]. In the following, we review that algorithm, restated for 3D-TCs
and in a more convenient language.
Recall that the determination of $w^{\mathrm{AWGN}}_{\min}$ amounts to minimizing (12.4) over all nonzero vertices
of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$. Some important observations allow us to state an equivalent but simpler problem.
Koetter and Vontobel [26] already noted that

$$w^{\mathrm{AWGN}}_{\min} = \min_{\substack{\boldsymbol{\omega} \in \mathrm{conic}(\dot{\mathcal{Q}}_{\Pi,\Pi_c}) \\ \boldsymbol{\omega} \neq \mathbf{0}}} w_{\mathrm{AWGN}}(\boldsymbol{\omega})$$

where $\mathrm{conic}(\dot{\mathcal{Q}}_{\Pi,\Pi_c})$ is the conic hull of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$ (also termed the fundamental cone). The
statement follows immediately from the fact that $w_{\mathrm{AWGN}}(\boldsymbol{\omega}) = w_{\mathrm{AWGN}}(\tau\boldsymbol{\omega})$ for any pseudocodeword $\boldsymbol{\omega}$ and for all $\tau > 0$. The same property allows us to further restrict the search
region to the conic section

$$\mathcal{F}_{\mathrm{sec}} = \mathrm{conic}(\dot{\mathcal{Q}}_{\Pi,\Pi_c}) \cap \{\boldsymbol{\omega} : \|\boldsymbol{\omega}\|_1 = 1\}$$

because every nonzero pseudocodeword may be scaled to satisfy the normalizing condition
without changing the pseudoweight. The benefit of this step is twofold: First, in contrast to
the minimization over the cone above, the domain of optimization now is a polytope that can
be stated explicitly by means of (in)equalities. Secondly, minimizing the pseudoweight
$w_{\mathrm{AWGN}}(\boldsymbol{\omega}) = \|\boldsymbol{\omega}\|_1^2 / \|\boldsymbol{\omega}\|_2^2$ now is equivalent to maximizing $\|\boldsymbol{\omega}\|_2^2$, since the numerator is
constant on $\mathcal{F}_{\mathrm{sec}}$.
We are thus in the situation of maximizing a convex function ($\|\cdot\|_2^2$) on a convex polytope. While
this is an NP-hard problem in general, the following heuristic proposed in [12] gives very good
results in practice.
For $\boldsymbol{\omega} \in \mathcal{F}_{\mathrm{sec}}$, it holds that

$$\|\boldsymbol{\omega}\|_2^2 = \|\boldsymbol{\omega} - \mathbf{1}/N\|_2^2 + \frac{1}{N},$$

where $\mathbf{1}/N = (1, \dots, 1)/N$, i.e., our goal is to maximize, within $\mathcal{F}_{\mathrm{sec}}$, the distance to the central
point $\mathbf{1}/N$ (the constant $\frac{1}{N}$ does not affect the maximization). Chertkov and Stepanov proposed
to first generate a random point $\boldsymbol{\omega}^{(0)} \neq \mathbf{1}/N$ on $\mathcal{F}_{\mathrm{sec}}$, serving as the initial search direction.
Then, the linear program

$$\boldsymbol{\omega}^{(k+1)} = \arg\max_{\boldsymbol{\omega} \in \mathcal{F}_{\mathrm{sec}}} (\boldsymbol{\omega}^{(k)} - \mathbf{1}/N)^{\mathsf{T}} \boldsymbol{\omega}$$

where $(\cdot)^{\mathsf{T}}$ denotes the transpose of its argument, is solved iteratively until the stopping criterion
$\boldsymbol{\omega}^{(k+1)} = \boldsymbol{\omega}^{(k)}$ is reached. In each iteration, the objective value increases and therefore
$\|\boldsymbol{\omega}^{(k)} - \mathbf{1}/N\|_2$ increases as well, and the result is a local maximum. The search is repeated an
arbitrary number of times in different random directions.
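
A generic implementation of this heuristic only needs an LP oracle for $\mathcal{F}_{\mathrm{sec}}$. The sketch below uses scipy.optimize.linprog and assumes that $\mathcal{F}_{\mathrm{sec}}$ is given in the form $A_{\mathrm{eq}}\boldsymbol{\omega} = \mathbf{b}_{\mathrm{eq}}$, $A_{\mathrm{ub}}\boldsymbol{\omega} \leq \mathbf{b}_{\mathrm{ub}}$, $\boldsymbol{\omega} \geq \mathbf{0}$ (for 3D-TCs, these would be the flow and coupling constraints of Section 12.3 together with the normalization $\|\boldsymbol{\omega}\|_1 = 1$); all parameter names are placeholders:

```python
import numpy as np
from scipy.optimize import linprog

def local_max_pseudoweight(A_eq, b_eq, A_ub, b_ub, n, rng, tol=1e-9):
    """One run of the iterative heuristic of Chertkov and Stepanov [12]:
    repeatedly maximize the linear functional (omega_k - 1/n)^T omega over
    the conic section F_sec until the iterate stops moving."""
    center = np.full(n, 1.0 / n)
    # Random initial point: maximize a random direction over F_sec.
    omega = linprog(-rng.standard_normal(n), A_ub=A_ub, b_ub=b_ub,
                    A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).x
    while True:
        res = linprog(-(omega - center), A_ub=A_ub, b_ub=b_ub,
                      A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
        if np.linalg.norm(res.x - omega) < tol:
            break
        omega = res.x
    # Pseudoweight (12.4); on F_sec the numerator ||omega||_1^2 equals 1.
    return 1.0 / np.dot(omega, omega)

# Repeat for many random directions and keep the smallest value found.
```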
In the case of LDPC codes which are covered in [12], an explicit description of the polytope
in question by means of inequalities is available, thus the fundamental cone can be described
explicitly as well by omitting those inequalities which are not tight at 𝝎 = 𝟎 [26]. This is
however not the case for the polytope 𝒬̇ Π,Πc which is only implicitly given as the projection
of 𝒬Π,Πc onto 𝐲. Instead, as we will now show, the cone can be obtained by dropping upper
bound constraints on all variables while ensuring that the total flow is equal on all three trellis
graphs.
For $x = a, b, c$, let $\mathcal{Q}^{\tau}_x$ be defined as the set of all $\mathbf{f}_x \in \mathbb{R}^{|E_x|}_{\geq 0}$, where $\mathbb{R}$ is the set of real numbers,
satisfying (12.1b) and the following modified version of (12.1a):

$$\sum_{e \in E_{x,0}} f_{x,e} = \tau$$

and let

$$\mathcal{F} = \{\mathbf{f} = (\mathbf{f}_a, \mathbf{f}_b, \mathbf{f}_c) : \exists\, \tau > 0 : \mathbf{f}_x \in \mathcal{Q}^{\tau}_x \text{ for } x = a, b, c\}$$

which is, like $\mathcal{Q}$, the set of all network flows in the trellis graphs, but now with an arbitrary
positive total flow $\tau$ instead of 1. Analogously to $\mathcal{Q}_{\Pi,\Pi_c}$, we define $\mathcal{F}_{\Pi,\Pi_c}$ as the set of pairs $(\tilde{\mathbf{y}}, \mathbf{f})$
where $\tilde{\mathbf{y}} \in \mathbb{R}^{N+2\lambda K}_{\geq 0}$ and $\mathbf{f} = (\mathbf{f}_a, \mathbf{f}_b, \mathbf{f}_c) \in \mathcal{F}$ and additionally (12.2b) is satisfied. The following
lemma shows that the projection of $\mathcal{F}_{\Pi,\Pi_c}$ onto $\mathbf{y}$ indeed yields the fundamental cone of 3D-TC
LP decoding.


Figure 12.5: Estimated 𝑤AWGN min, using the algorithm from [13] (which is straightforward to
apply to 3D-TCs), and exact 𝑑min for 3D-TCs with 100 randomly selected pairs of
interleavers (blue plus signs) and with 100 randomly selected pairs of QPP-based
interleavers (green x-marks). The diagonal line gives the trivial upper bound of
𝑑min on 𝑤AWGN min provided by Proposition 12.4. 𝐾 = 128 and 𝑅 = 1/3.

12.6 Lemma: Let ℱ̇ Π,Πc be the projection of ℱΠ,Πc onto the first 𝑁 variables. Then, ℱ̇ Π,Πc =
conic(𝒬̇ Π,Πc ). C

See Appendix 12.D on page 207 for a proof.

12.7 Numerical Results

In this section, we present some numerical results when the interleaver pair (Π, Πc ) is taken
from the set of all possible interleaver pairs, and when it is taken from the set of pairs of quadratic
permutation polynomials (QPPs) over integer rings. Permutation polynomial based interleavers
over integer rings for conventional TCs were first proposed in [27]. These interleavers are
fully algebraic and maximum contention-free [28], which makes them very suitable for parallel
implementation in the turbo decoder. QPP-based interleavers for conventional TCs were also
recently adopted for the 3GPP LTE standard [20]. We remark that for the results below, 𝜆 = 1/4
and the regular puncturing pattern 𝐩 = [11000000] are assumed. As shown in [3], 𝜆 = 1/4
gives a suitable trade-off between performance in the waterfall and error floor regions. Finally,
we emphasize that all the numerical estimates of the 𝑑min and 𝑤AWGN min given below are actually
also upper bounds on the exact values.


12.7.1 Ensemble-Average Results for 𝐾 = 128 and 𝑅 = 1/3

In Figure 12.5, we present the exact 𝑑min and an estimate of 𝑤AWGN min (which is also an upper
bound), denoted by 𝑤̂ AWGN min, of unpunctured 3D-TCs with 𝐾 = 128 and with 100 randomly
selected pairs of interleavers (Π, Πc ) (blue plus signs). The corresponding results with QPP-based interleaver pairs (and with no constraints on the inverse polynomials) are also displayed
(green x-marks). For all codes, except 11, the estimated 𝑤AWGN min is at most equal to the 𝑑min.
The values of 𝑤AWGN min were estimated using the algorithm from [13] (which is straightforward
to apply to 3D-TCs) with a signal-to-noise ratio (SNR) of 2.0 dB and 500 evaluations of the
algorithm, while the 𝑑min was computed exactly as described in the second paragraph following
the proof of Proposition 12.4. Note that when the 𝑑min is strictly smaller than the estimated
𝑤AWGN min (points above the diagonal line), the estimation algorithm from [13] was unable to
provide an estimate that beats the trivial upper bound provided by Proposition 12.4. From the
figure, it follows that QPPs give better codes (can provide a higher 𝑑min and a higher 𝑤̂ AWGN min),
and that 𝑤AWGN min is strictly lower than 𝑑min for most codes when the 𝑑min is large. As a side
remark, the algorithm from Section 12.6 gives slightly worse results (the average 𝑤̂ AWGN min
increases by approximately 0.05) than the algorithm from [13] with the same number of
runs (500) per instance. However, the algorithm from Section 12.6 is significantly faster.

12.7.2 Exhaustive/Random Search Optimizing 𝑤AWGN min

In this subsection, we present the results of a computer search for pairs of QPPs with a quadratic
inverse for 𝐾 = 128, 256, and 320 for unpunctured 𝑅 = 1/3 3D-TCs. The objective of the search
was to find pairs of QPPs giving a large estimated 𝑤AWGN min. To speed up the search, an adaptive
threshold on the minimum AWGN pseudoweight 𝑤AWGN min was set in the search, in the sense
that if a pseudocodeword of AWGN pseudoweight smaller than the threshold was found, then
this particular candidate pair of QPPs was rejected.

For 𝐾 = 128, we performed an exhaustive search over all 2¹⁷ pairs of QPPs (with a quadratic
inverse). The minimum AWGN pseudoweight was estimated using the algorithm from [13]
(which is straightforward to apply to 3D-TCs) with an SNR of 1.7 dB and 500 evaluations of
the algorithm.

In Figure 12.6, we plot the exact 𝑑min (red circles), the exact ℎmin (green x-marks), and 𝑤̂ AWGN min
(blue plus signs) of the 75 3D-TCs with the best 𝑤̂ AWGN min. For each point in the figure, the
𝑥-coordinate corresponds to the sample index (the results are ordered by increasing 𝑑min ), while
the 𝑦-coordinate is either the exact 𝑑min , the exact ℎmin , or 𝑤̂ AWGN min. From the figure, we observe
that the best 𝑤AWGN min (which is at most 30.2139) is strictly smaller than the best possible 𝑑min or
ℎmin . The best possible 𝑑min was established to be 38 (exhaustive search), and for this particular
code ℎmin = 36, but the estimate of 𝑤AWGN min is not among the 75 best; it is only 29.6042 (see
Table 12.7, which shows the results of an exhaustive/random search optimizing the 𝑑min for
pairs of QPPs with a quadratic inverse).


Figure 12.6: Exact 𝑑min (red circles), exact ℎmin (green x-marks), and 𝑤̂ AWGN min (blue plus signs)
of the 75 best (in terms of 𝑤̂ AWGN min) QPP-based interleaver pairs for the 3D-TC with
input block length 𝐾 = 128 and code rate 𝑅 = 1/3.

For 𝐾 = 256, only a partial search has been conducted. The largest found value for 𝑤̂ AWGN min,
after taking about 180000 samples, which is close to 17% of the whole space, is 43.0335. As
for 𝐾 = 128, the minimum AWGN pseudoweight was estimated using the algorithm from [13]
(which is straightforward to apply to 3D-TCs) with an SNR of 1.7 dB and 500 evaluations of
the algorithm.

For 𝐾 = 320, we again performed an exhaustive search over the 2¹⁸ pairs of QPPs with a
quadratic inverse. This time we used the algorithm presented in Section 12.6 with 500 iterations
per code. The largest estimated minimum pseudoweight 𝑤̂ AWGN min that we found was 46.0612,
which is considerably larger than that for the code in Table 12.7 (which shows the results of an
exhaustive/random search optimizing the 𝑑min for pairs of QPPs with a quadratic inverse).

12.7.3 Exhaustive/Random Search Optimizing 𝑑min

We also performed an exhaustive/random search optimizing the 𝑑min for pairs of QPPs with a
quadratic inverse for selected values of 𝐾 for unpunctured 𝑅 = 1/3 3D-TCs. For 𝐾 = 128, 160,
192, and 208, the search was exhaustive, in the sense that each pair of interleavers was looked
at. In the search, the 𝑑min was estimated using the triple impulse method [29]. The results are
given in Table 12.7 for selected values of 𝐾, where 𝑓(𝑥) = 𝑓1𝑥 + 𝑓2𝑥² (mod 𝐾) generates the TC
interleaver, 𝑓̃(𝑥) = 𝑓̃1𝑥 + 𝑓̃2𝑥² (mod 𝑁c ) generates the permutation in the patch, and 𝑑̂min and
𝑤̂ AWGN min denote the estimated 𝑑min and the estimated 𝑤AWGN min, respectively. The estimates of
𝑤AWGN min were obtained by using the algorithm from [13] (which is straightforward to apply to
3D-TCs) with SNR and number of evaluations of the algorithm given in the ninth and tenth

𝐾       𝑓1    𝑓2    𝑁c    𝑓̃1   𝑓̃2    𝑑̂min   𝑤̂ AWGN min   SNR      Evaluations
128 a   55    96    64    9    16    38 b    29.6042      2.0 dB   2000
160 a   131   60    80    9    20    42 b    30.0000      1.7 dB   500
192 a   35    24    96    11   12    46      32.9046      1.7 dB   500
208 a   165   182   104   37   26    49      36.3370      1.7 dB   500
256     239   192   128   37   32    52      42.7816      1.7 dB   500
320     183   280   160   57   20    58      41.3818      1.7 dB   500
512 c   175   192   256   15   192   67      45.5872      2.0 dB   500

Table 12.7: Results from an exhaustive/random search for pairs of QPPs with 𝜆 = 1/4, both
with a quadratic inverse, in which the first QPP 𝑓(𝑥) = 𝑓1𝑥 + 𝑓2𝑥² (mod 𝐾) generates
the TC interleaver and the second QPP 𝑓̃(𝑥) = 𝑓̃1𝑥 + 𝑓̃2𝑥² (mod 𝑁c ) generates the
permutation in the patch. Moreover, terms like “SNR” are explained in the text.
a
Exhaustive search, which implies that the corresponding 𝑑̂min is an upper bound on the optimum 𝑑min (the
true optimum 𝑑min when the estimate 𝑑̂min is exact) for this input block length.
b
This is the exact 𝑑min , and we can observe a large gap between 𝑑min and 𝑤AWGN min.
c
The QPPs are taken from [3].

column of the table, respectively. Finally, we remark that the codes in the first and second
rows, for 𝐾 = 128 and 160, are 𝑑min -optimal, in the sense that there does not exist any pair of
QPPs (with a quadratic inverse) giving a 𝑑min strictly larger than 38 and 42, respectively, for
the unpunctured 3D-TC.
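
For reference, generating a QPP and testing (by brute force, which suffices for the block lengths of Table 12.7) whether it admits a quadratic inverse without constant term can be sketched as follows; the helper names are ours:

```python
def qpp(f1, f2, K):
    """Quadratic permutation polynomial f(x) = f1*x + f2*x^2 (mod K),
    as used for the TC interleaver and the patch permutation."""
    perm = [(f1 * x + f2 * x * x) % K for x in range(K)]
    assert sorted(perm) == list(range(K)), "not a permutation"
    return perm

def has_quadratic_inverse(f1, f2, K):
    """Brute-force check whether some QPP g inverts f, i.e. g(f(x)) = x
    for all x (O(K^3), fine for the block lengths of Table 12.7)."""
    f = qpp(f1, f2, K)
    return any(all((g1 * y + g2 * y * y) % K == x for x, y in enumerate(f))
               for g1 in range(K) for g2 in range(K))

print(qpp(55, 96, 128)[:4])            # first QPP of Table 12.7 (K = 128)
print(has_quadratic_inverse(55, 96, 128))
```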

12.7.4 Ensemble-Average Results for Various 𝐾 and 𝑅 = 1/3

In Figure 12.8, we present the average estimated (now using the algorithm from Section 12.6)
minimum AWGN pseudoweight of 3D-TCs for 𝐾 = 128, 160, 192, 208, 256, 320, 512, 640, 768,
1024, and 1504. Both random interleaver pairs and QPP-based (with a quadratic inverse)
interleaver pairs have been considered. In both cases, we generated 40 interleaver pairs of each
size. For each code we ran 𝐾/10 trials of the estimation algorithm described in Section 12.6.
From Figure 12.8, we observe that the average 𝑤̂ AWGN min grows with 𝐾 for both random interleaver
pairs and QPP-based interleaver pairs. For all values of 𝐾, as expected, the average 𝑤̂ AWGN min is
higher for QPP-based interleaver pairs than for random interleaver pairs. As a comparison,
we have also plotted the corresponding theoretical values 𝑤AWGN min,LB,0.5 from Section 12.5 (using
(12.7)) for graph cover degree 2. Also, for comparison, we have plotted the corresponding
lower bounds on the 𝑑min and the ℎmin using a similar ensemble analysis as the one from
Section 12.5. For details, we refer the interested reader to [3, 4]. Note that the curves coincide
for small values of 𝐾. The reason that the curve for the probabilistic lower bound on 𝑤AWGN min
of the 3D-TC ensemble is higher than the corresponding curve for ℎmin is that the cover degree
is limited to 𝑚 = 2. In general, the pseudocodewords with support set equal to a small-size
stopping set which is not a codeword have a cover degree which is quite large. We would expect
the curve for the probabilistic lower bound on 𝑤AWGN min of the 3D-TC ensemble to go further



Figure 12.8: The average estimated minimum AWGN pseudoweight for 3D-TCs for different
information block lengths 𝐾, both for QPP-based (with a quadratic inverse) and random
interleaver pairs. The lower curves show the probabilistic lower bounds on 𝑑min ,
𝑤AWGN min, and ℎmin of the 3D-TC ensemble (for cover degrees of at most 𝑚 = 2).

down for 𝑚 larger than 2. However, it is currently infeasible to do the actual computations
for larger values of 𝑚 (both the computational complexity and the memory requirements scale
exponentially with 𝑚). For ease of computation we have used random puncturing patterns 𝐩
to compute the curves, while the estimated average values are for regular patterns, which in
general give better results.

12.8 Conclusion

In this work, we performed a minimum pseudoweight analysis of pseudocodewords of (relaxed)
LP decoding of 3D-TCs, adapting the LP relaxation proposed by Feldman in his thesis for
conventional TCs. We proved that the 3D-TC polytope is proper and 𝐶-symmetric, and made a
connection to finite graph covers of the 3D-TC factor graph. This connection was used to show
that the support set of any pseudocodeword is a stopping set (as defined in [4, Def. 1]), and
enabled a finite-length minimum pseudoweight analysis. Furthermore, an explicit description
of the fundamental cone of the 3D-TC polytope was given. Finally, both a theoretical and an
extensive numerical study of the minimum AWGN pseudoweight of small-to-medium block
length 3D-TCs was presented, which showed that 1) typically (i.e., in most cases) when the
𝑑min and/or the ℎmin is high, 𝑤AWGN min is strictly smaller than both the 𝑑min and the ℎmin for
these codes, and 2) that 𝑤AWGN min grows with the block length, at least for small-to-medium

block lengths. For instance, the exhaustive search for 𝐾 = 128 over the entire class of QPP-
based interleaver pairs (with a quadratic inverse) revealed that the best minimum AWGN
pseudoweight is strictly smaller than the best minimum/stopping distance. It is expected that
the 𝑤AWGN min will dominate the decoding performance for high SNRs.

12.A Proof of Proposition 12.1

We first prove a more general result and then show how it applies to our case.

12.7 Lemma: Let $C_\delta$, $\delta = 1, \dots, \Delta$, be linear block codes of the same length $N$ and let $C = \bigcap_{\delta=1}^{\Delta} C_\delta$. Then,

(1) $\mathrm{conv}(C_\delta)$ is proper and $C_\delta$-symmetric for all $\delta$, and

(2) $\bigcap_{\delta=1}^{\Delta} \mathrm{conv}(C_\delta)$ is proper and $C$-symmetric. C

Proof. (1) The convex hull $\mathrm{conv}(C_\delta)$ is proper because all codewords are by definition
vertices of the polytope. Moreover, because no vertex of the unit hypercube is a convex
combination of others, $\mathrm{conv}(C_\delta)$ cannot contain any other integral points. To show
$C_\delta$-symmetry, choose $\mathbf{a} \in \mathrm{conv}(C_\delta)$ and $\mathbf{c} \in C_\delta$ arbitrarily. By construction, $\mathbf{a}$ can be
written as a convex combination of codewords of $C_\delta$, i.e.,

$$\mathbf{a} = \sum_{i=1}^{|C_\delta|} \lambda_i \mathbf{c}_i \quad \text{where} \quad \sum_{i=1}^{|C_\delta|} \lambda_i = 1 \text{ and } \lambda_i \geq 0.$$

We claim that

$$|\mathbf{a} - \mathbf{c}| = \sum_{i=1}^{|C_\delta|} \lambda_i (\mathbf{c}_i \oplus \mathbf{c}) = \sum_{i=1}^{|C_\delta|} \lambda_i \tilde{\mathbf{c}}_i \qquad (12.8)$$

where $\oplus$ denotes integer addition modulo 2 and $\tilde{\mathbf{c}}_i$ is the $i$th codeword of $C_\delta$ using a
different ordering. This would imply $C_\delta$-symmetry, i.e., if $\mathbf{a} \in \mathrm{conv}(C_\delta)$ and $\mathbf{c} \in C_\delta$,
then $|\mathbf{a} - \mathbf{c}| \in \mathrm{conv}(C_\delta)$.
Let $a_j$, $c_j$, and $c_{i,j}$ denote the $j$th coordinate of $\mathbf{a}$, $\mathbf{c}$, and $\mathbf{c}_i$, respectively. The first equality
in (12.8) follows for $c_j = 0$ from

$$|a_j - c_j| = a_j = \sum_{i=1}^{|C_\delta|} \lambda_i c_{i,j} = \sum_{i=1}^{|C_\delta|} \lambda_i (c_{i,j} \oplus c_j)$$

and for $c_j = 1$ because

$$|a_j - c_j| = 1 - a_j = \sum_{i=1}^{|C_\delta|} \lambda_i (1 - c_{i,j}) = \sum_{i=1}^{|C_\delta|} \lambda_i (c_{i,j} \oplus c_j).$$

The second part of (12.8) holds because $\mathbf{c} \oplus C_\delta = C_\delta$ due to the linearity of $C_\delta$.


(2) Let $\mathcal{P} = \bigcap_{\delta=1}^{\Delta} \mathrm{conv}(C_\delta)$. The properness of $\mathcal{P}$ for $C$ follows immediately from the
properness of $\mathrm{conv}(C_\delta)$, $\delta = 1, \dots, \Delta$, and the definition of $C$. Now, if $\mathbf{a} \in \mathcal{P}$ and $\mathbf{c} \in C$,
then for $\delta = 1, \dots, \Delta$ we have $\mathbf{a} \in \mathrm{conv}(C_\delta)$ and $\mathbf{c} \in C_\delta$, so by (1) $|\mathbf{a} - \mathbf{c}| \in \mathrm{conv}(C_\delta)$
and thus $|\mathbf{a} - \mathbf{c}| \in \mathcal{P}$. 

Now, let 𝐶 be a 3D-TC. By 𝐶̃ we denote the code of length 𝑁 + 2𝜆𝐾 obtained by appending
the hidden parity bits from 𝐶a and 𝐶b which are sent to the patch. For x = a, b, c we define a
supercode 𝐶̃x of 𝐶̃ by unconstraining all bits that are not connected to the constituent code 𝐶x ,
i.e., 𝐱̃ ∈ 𝐶̃x if and only if $(\tilde{x}_{\rho_x(0)}, \dots, \tilde{x}_{\rho_x(N_x-1)}) \in C_x$, where 𝑁x is the block length of 𝐶x and
𝜌x (⋅) is defined in (12.2b), and $\tilde{x}_i \in \{0,1\}$ for all remaining 𝑖. Observe that 𝐶̃a ∩ 𝐶̃b ∩ 𝐶̃c = 𝐶̃.

Next, define polytopes 𝒬xΠ,Πc that are obtained from 𝒬Π,Πc by dropping in (12.2b) all constraints not corresponding to 𝐶x , and let 𝒬̃xΠ,Πc be the projection of 𝒬xΠ,Πc onto the 𝐲̃ variables.
Finally, define 𝒬̃Π,Πc in analogy to 𝒬̇Π,Πc as the projection of 𝒬Π,Πc onto the first 𝑁 + 2𝜆𝐾
variables.

Due to the trellis structure, it is easily seen that 𝒬̃xΠ,Πc = conv(𝐶̃x ) for x = a, b, c, and by
comparing the polytope definitions we see that 𝒬̃aΠ,Πc ∩ 𝒬̃bΠ,Πc ∩ 𝒬̃cΠ,Πc = 𝒬̃Π,Πc . Applying
Lemma 12.7 shows that 𝒬̃Π,Πc is both proper and 𝐶̃-symmetric. Now, 𝐶 is the projection of 𝐶̃
onto the first 𝑁 variables, and 𝒬̇Π,Πc the corresponding projection of 𝒬̃Π,Πc .

To show that 𝒬̇Π,Πc is proper, first observe that the projection of any 𝐜̃ ∈ 𝐶̃ onto the first 𝑁
variables is obviously contained in 𝒬̇Π,Πc . Conversely, let 𝐚 ∈ 𝒬̇Π,Πc ∩ {0, 1}^𝑁. By definition
there exists 𝐚̃ ∈ 𝒬̃Π,Πc such that 𝐚̃ = (𝐚, 𝐚̂). In order to show that 𝐚̂ is integral, note that the
systematic part of 𝐚̃ is contained in 𝐚 and thus integral. Again by the trellis structure, this
implies a unique and integral configuration of the flow variables in 𝑇a and 𝑇b , and consequently
also the variables corresponding to the output of those encoders, including the hidden bits sent
to the patch, must be integral. Because 𝒬̃Π,Πc is proper it follows that 𝐚̃ ∈ 𝐶̃ and thereby
also 𝐚 ∈ 𝐶, which proves properness. Finally, note that 𝐶-symmetry is trivially preserved by
projections, which concludes the proof.

12.B Proof of Proposition 12.3

Let $\mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c} \subset \mathcal{Q}$ be the projection of $\mathcal{Q}_{\Pi,\Pi_c}$ onto $\mathbf{f}$. We call a flow $\mathbf{f} \in \mathcal{Q}$ agreeable if $\mathbf{f} \in \mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$.
For an agreeable flow $\mathbf{f}$ let $\tilde{\mathbf{y}} = \tilde{\mathbf{y}}(\mathbf{f})$ be the uniquely determined element of $[0,1]^{N+2\lambda K}$ such
that $(\tilde{\mathbf{y}}, \mathbf{f}) \in \mathcal{Q}_{\Pi,\Pi_c}$. Analogously, $\mathbf{y}(\mathbf{f})$ is the projection of $\tilde{\mathbf{y}}(\mathbf{f})$ onto the first $N$ variables. Note
that $\tilde{\mathbf{y}}(\mathbf{f})$ (and $\mathbf{y}(\mathbf{f})$) can be read off from $\mathbf{f}$ by (12.2b).

For $\mathbf{f} = (\mathbf{f}_a, \mathbf{f}_b, \mathbf{f}_c) \in \mathcal{Q}$, but not necessarily $\mathbf{f} \in \mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$, we can still use (12.2b) to deduce local
values of $\tilde{\mathbf{y}}$. More precisely, if we define

$$\Phi(x) = \{j : j = \rho_x(\phi_x(t,i)) \text{ for some } (t,i)\}$$


then for all $j \in \Phi(x)$ we can deduce $\tilde{y}^x_j(\mathbf{f}) = \sum_{e \in E_{x,t}:\, c_i(e)=1} f_{x,e}$, where $t \in \{0, \dots, I_x - 1\}$ and $i \in
\{0, \dots, n_{x,t} - 1\}$ are determined by (12.2b). This implies that $\mathbf{f} \in \mathcal{Q}$ is agreeable if and only if

$$\tilde{y}^x_j(\mathbf{f}) = \tilde{y}^{x'}_j(\mathbf{f}) \qquad (12.9)$$

for all $(j, x, x')$ such that $j \in \Phi(x) \cap \Phi(x')$ and $(x, x') \in \{(a,b), (a,c), (b,c)\}$, where the first
case amounts to the outer interleaver $\Pi$ and the remaining cases are due to the connections to
the patch from $C_a$ and $C_b$, respectively, via $\Pi_c$. We denote the set of these triples $(j, x, x')$ by
$\mathcal{A}$.

12.8 Lemma: The relation $\mathcal{P}_{\Pi,\Pi_c} \subseteq \dot{\mathcal{Q}}_{\Pi,\Pi_c}$ holds. C

Proof. Let $\boldsymbol{\omega}(\mathbf{x}^{(m)}) \in \mathcal{P}_{\Pi,\Pi_c}$ be a graph-cover pseudocodeword of $C$, i.e., there exists a
degree-$m$ cover code $C^{(m)}$ of $C$ such that $\mathbf{x}^{(m)}$ is a codeword of $C^{(m)}$. As before, we can extend
$\mathbf{x}^{(m)}$ to

$$\tilde{\mathbf{x}}^{(m)} = (\tilde{x}^{(0)}_0, \dots, \tilde{x}^{(0)}_{N+2\lambda K-1}, \dots, \tilde{x}^{(m-1)}_0, \dots, \tilde{x}^{(m-1)}_{N+2\lambda K-1})$$

by appending the parity bits of the copies of $C_a$ and $C_b$ that are sent to copies of $C_c$.

For each $l = 0, \dots, m-1$, $(\tilde{x}^{(l)}_0, \dots, \tilde{x}^{(l)}_{N+2\lambda K-1})$ induces via trellis encoding a flow $\mathbf{f}^{(l)}_m$ in $\mathcal{Q}$ with
entries only from $\{0, 1\}$. In general, $\mathbf{f}^{(l)}_m$ is not agreeable because the connections are mixed
with different copies in the cover graph. However, from the definition of a graph cover we can
conclude that

$$\tilde{y}^x_j(\mathbf{f}^{(l)}_m) = \tilde{y}^{x'}_j(\mathbf{f}^{(\pi_j(l))}_m) \qquad (12.10)$$

for all $(j, x, x') \in \mathcal{A}$ and all $l = 0, \dots, m-1$, where $\pi_j$ is the corresponding permutation
introduced by the graph cover, either (in the case $x = a$ and $x' = b$) on connections from an
input vertex of $\Gamma(C_a)$ to a check vertex of $\Gamma(C_b)$ or (if $x' = c$) on connections from a parity
vertex of $C_a$ or $C_b$ to a check vertex of $\Gamma(C_c)$.

We claim that

$$\mathbf{f}_m = \frac{1}{m} \sum_{l=0}^{m-1} \mathbf{f}^{(l)}_m \qquad (12.11)$$

is agreeable and that $\mathbf{y}(\mathbf{f}_m) = \boldsymbol{\omega}(\mathbf{x}^{(m)})$.

First, note that $\mathbf{f}_m$ is a convex combination of elements from the convex set $\mathcal{Q}$, so $\mathbf{f}_m \in \mathcal{Q}$ as
well. To prove agreeability, we verify (12.9) for all $(j, x, x') \in \mathcal{A}$:

$$\tilde{y}^x_j(\mathbf{f}_m) = \sum_{\substack{e \in E_{x,t}:\\ c_i(e)=1}} f_{x,e,m} = \sum_{\substack{e \in E_{x,t}:\\ c_i(e)=1}} \frac{1}{m}\sum_{l=0}^{m-1} f^{(l)}_{x,e,m}
= \frac{1}{m}\sum_{l=0}^{m-1} \tilde{y}^x_j(\mathbf{f}^{(l)}_m) = \frac{1}{m}\sum_{l=0}^{m-1} \tilde{y}^{x'}_j(\mathbf{f}^{(\pi_j(l))}_m)
= \frac{1}{m}\sum_{l=0}^{m-1} \tilde{y}^{x'}_j(\mathbf{f}^{(l)}_m) = \tilde{y}^{x'}_j(\mathbf{f}_m)$$

where we have used (12.2b), (12.11), and (12.10). The second-to-last equality follows since $\pi_j$ is
a permutation of $\{0, \dots, m-1\}$. This shows that $\mathbf{f}_m$ is agreeable and thus $\mathbf{y}(\mathbf{f}_m)$ is well-defined.

Now, fix $j \in \{0, \dots, N-1\}$ and pick any $x$ such that $j \in \Phi(x)$. Then

$$y_j(\mathbf{f}_m) = \frac{1}{m}\sum_{l=0}^{m-1} \tilde{y}^x_j(\mathbf{f}^{(l)}_m) = \frac{1}{m}\sum_{l=0}^{m-1} x^{(l)}_j = \frac{1}{m}\bigl|\{l : x^{(l)}_j = 1\}\bigr| = \omega_j(\mathbf{x}^{(m)})$$

which concludes the proof. 

Before proving the other direction for rational points, we first show part (2) of Proposition 12.3.

12.9 Lemma: All vertices of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$ have rational entries. C

Proof. Let $\mathbf{y}$ be a vertex of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$. Because $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$ is a projection of the polytope $\mathcal{Q}_{\Pi,\Pi_c}$, and
$\mathcal{Q}_{\Pi,\Pi_c}$ is the image of $\mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$ under a linear map, there exists some vertex $\mathbf{f}$ of $\mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$ such
that $(\tilde{\mathbf{y}}(\mathbf{f}), \mathbf{f})$ is also a vertex of $\mathcal{Q}_{\Pi,\Pi_c}$ and $\mathbf{y}$ is the projection of $\tilde{\mathbf{y}}$ onto the first $N$ variables.

Now, $\mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$ is a rational polyhedron (i.e., it is defined by (in)equalities with rational entries
only), so every vertex of $\mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$ is rational [30, p. 123]. Since by (12.2b) each $\tilde{y}_j$, $j = 0, \dots, N + 2\lambda K - 1$, is just a sum of
entries of $\mathbf{f}$, $\tilde{\mathbf{y}}$ and thus $\mathbf{y}$ must be rational as well. 

12.10 Lemma: For every $\mathbf{y} \in \dot{\mathcal{Q}}_{\Pi,\Pi_c} \cap \mathbb{Q}^N$ there exists a rational point $(\tilde{\mathbf{y}}, \mathbf{f}) \in \mathcal{Q}_{\Pi,\Pi_c}$ such
that $\mathbf{y} = \mathbf{y}(\mathbf{f})$. C

Proof. Let $\mathbf{y}$ be a rational point of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$. Because $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$ is a polytope, $\mathbf{y}$ can be written as a
convex combination of vertices of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$, i.e., $\mathbf{y} = \sum_{k=0}^{d} \lambda_k \mathbf{y}_k$ where $\lambda_k \geq 0$ for $k = 0, \dots, d$
and $\sum_{k=0}^{d} \lambda_k = 1$. Furthermore, by Carathéodory's theorem (e.g., [31, p. 94]), this is even
possible with some $d \leq N$ such that the $\mathbf{y}_k$, $k = 0, \dots, d$, are affinely independent. Consequently,
$\boldsymbol{\lambda}$ is the unique solution of the system

$$\begin{pmatrix} \mathbf{y}_0 & \mathbf{y}_1 & \cdots & \mathbf{y}_d \\ 1 & 1 & \cdots & 1 \end{pmatrix}
\begin{pmatrix} \lambda_0 \\ \vdots \\ \lambda_d \end{pmatrix} = \begin{pmatrix} \mathbf{y} \\ 1 \end{pmatrix}$$

and by applying Cramer's rule for solving linear equation systems (and Lemma 12.9, which
guarantees that all $\mathbf{y}_k$ have rational entries) we see that $\lambda_k \in \mathbb{Q}$ for $k = 0, \dots, d$. Furthermore,
the proof of Lemma 12.9 tells us that for each $\mathbf{y}_k$ there is a rational $\mathbf{f}_k \in \mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$ such that $\mathbf{y}_k =
\mathbf{y}(\mathbf{f}_k)$. The flow $\mathbf{f} = \sum_{k=0}^{d} \lambda_k \mathbf{f}_k$ satisfies $\mathbf{y} = \mathbf{y}(\mathbf{f})$ (because $\mathbf{y}(\cdot)$ is linear) and $(\tilde{\mathbf{y}}(\mathbf{f}), \mathbf{f}) \in \mathcal{Q}_{\Pi,\Pi_c}$
is rational, which concludes the proof. 

Now, we are able to prove the missing counterpart to Lemma 12.8.

12.11 Lemma: It holds that $\dot{\mathcal{Q}}_{\Pi,\Pi_c} \cap \mathbb{Q}^N \subseteq \mathcal{P}_{\Pi,\Pi_c}$. C


Proof. Let $\mathbf{y}$ be a rational point of $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$. By Lemma 12.10, there exist rational $\mathbf{f} \in \mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$
and rational $\tilde{\mathbf{y}} = (\mathbf{y}, \hat{\mathbf{y}})$ such that $(\tilde{\mathbf{y}}, \mathbf{f}) \in \mathcal{Q}_{\Pi,\Pi_c}$. Let $m$ be the least common denominator of
the entries of $\mathbf{f}$. Then, $\mathbf{f}_m = m\mathbf{f}$ is a flow with integral values between 0 and $m$. Applying the
flow decomposition theorem [32, p. 80] in this context guarantees that $\mathbf{f}_m$ can be split up into
$m$ binary flows, i.e.,

$$\mathbf{f}_m = \sum_{l=0}^{m-1} \mathbf{f}^{(l)}_m \qquad (12.12)$$

where $\mathbf{f}^{(l)}_m$ has entries from $\{0, 1\}$ and represents a valid path for each trellis $T_x$, $x = a, b, c$.

Because $\mathbf{f} \in \mathcal{Q}^{\mathbf{f}}_{\Pi,\Pi_c}$, we conclude from (12.9) that $\tilde{y}^x_j(\mathbf{f}) = \tilde{y}^{x'}_j(\mathbf{f})$ for all $(j, x, x') \in \mathcal{A}$. This
is equivalent (by linearity) to $\tilde{y}^x_j(\mathbf{f}_m) = \tilde{y}^{x'}_j(\mathbf{f}_m)$, which by (12.12) means that $\sum_{l=0}^{m-1} \tilde{y}^x_j(\mathbf{f}^{(l)}_m) =
\sum_{l=0}^{m-1} \tilde{y}^{x'}_j(\mathbf{f}^{(l)}_m)$. Because all $\mathbf{f}^{(l)}_m$ are $\{0,1\}$-valued, this last equation implies

$$|\{l : \tilde{y}^x_j(\mathbf{f}^{(l)}_m) = 1\}| = |\{l : \tilde{y}^{x'}_j(\mathbf{f}^{(l)}_m) = 1\}|$$

and consequently for each $(j, x, x') \in \mathcal{A}$ a permutation $\pi_j$ on $\{0, \dots, m-1\}$ can be chosen
such that $\tilde{y}^x_j(\mathbf{f}^{(l)}_m) = \tilde{y}^{x'}_j(\mathbf{f}^{(\pi_j(l))}_m)$ for all $l = 0, \dots, m-1$. These $\pi_j$ define an $m$-cover $\Gamma^{(m)}(C)$ of
$\Gamma(C)$, and by construction

$$\mathbf{x}^{(m)} = (x^{(0)}_0, \dots, x^{(0)}_{N-1}, \dots, x^{(m-1)}_0, \dots, x^{(m-1)}_{N-1})$$

is a codeword of $C^{(m)}$, where we define $x^{(l)}_j = \tilde{y}^x_j(\mathbf{f}^{(l)}_m)$ for the first $x$ among $(a, b, c)$ such that
$j \in \Phi(x)$. Finally, we see that

$$\omega_j(\mathbf{x}^{(m)}) = \frac{1}{m}\sum_{l=0}^{m-1} x^{(l)}_j = \frac{1}{m}\sum_{l=0}^{m-1} \tilde{y}^x_j(\mathbf{f}^{(l)}_m) \text{ (for some } x\text{)}
= \frac{1}{m}\tilde{y}^x_j(\mathbf{f}_m) = \tilde{y}^x_j(\mathbf{f}) = y_j$$

(by definition of $\mathcal{Q}_{\Pi,\Pi_c}$) for any $j = 0, \dots, N-1$, which shows that $\boldsymbol{\omega}(\mathbf{x}^{(m)}) = \mathbf{y}$. 

12.C Proof of Lemma 12.5

The number of vertices in $V^{\mathrm{PC}}_{x,2,t}$ will be the number of distinct ordered 2-tuples of integers
modulo $|V_{x,t}|$, which is $|V_{x,t}| + \binom{|V_{x,t}|}{2}$.

Now, let us consider the number of edges, and in particular, the edges from vertex $\Psi(v_{\mathrm{l}}, u_{\mathrm{l}})$ to
vertex $\Psi(v_{\mathrm{r}}, u_{\mathrm{r}})$ in $T^{\mathrm{PC}}_{x,2,t}$, where $v_{\mathrm{l}}, u_{\mathrm{l}}, v_{\mathrm{r}}, u_{\mathrm{r}} \in V_{x,t}$. We have four possible constituent edges to
consider, namely the edges $v_{\mathrm{l}} \xrightarrow{a/b} v_{\mathrm{r}}$, $v_{\mathrm{l}} \xrightarrow{c/d} u_{\mathrm{r}}$, $u_{\mathrm{l}} \xrightarrow{e/f} u_{\mathrm{r}}$, and $u_{\mathrm{l}} \xrightarrow{g/h} v_{\mathrm{r}}$, where the labels
above the arrows are the input/output labels. Note that $a \neq c$ and $e \neq g$ when $v_{\mathrm{r}} \neq u_{\mathrm{r}}$. Also,
the (integer) label of a vertex $v \in V_{x,t}$ will be denoted by $\ell(v)$.

• Case $v_{\mathrm{l}} \neq u_{\mathrm{l}}$ and $v_{\mathrm{r}} \neq u_{\mathrm{r}}$: All four constituent edges are distinct, and there will be four
edges between the vertices $\Psi(v_{\mathrm{l}}, u_{\mathrm{l}})$ and $\Psi(v_{\mathrm{r}}, u_{\mathrm{r}})$ in $T^{\mathrm{PC}}_{x,2,t}$ with labels $a{+}e/b{+}f$, $c{+}g/d{+}h$,
$g{+}c/h{+}d$, and $e{+}a/f{+}b$. Since there are only two distinct labels ($a{+}e/b{+}f$ and
$c{+}g/d{+}h$ are always distinct for a minimal trellis/encoder, details omitted for brevity),
two of the edges can be removed.

• Case $v_{\mathrm{l}} \neq u_{\mathrm{l}}$ and $v_{\mathrm{r}} = u_{\mathrm{r}}$: In this case there are only two distinct constituent edges to
consider, and there will be two edges between the vertices $\Psi(v_{\mathrm{l}}, u_{\mathrm{l}})$ and $\Psi(v_{\mathrm{r}}, u_{\mathrm{r}})$ in $T^{\mathrm{PC}}_{x,2,t}$
with labels $a{+}g/b{+}h$ and $g{+}a/h{+}b$ (or $c{+}e/d{+}f$ and $e{+}c/f{+}d$). Since both labels
are the same, one of the edges can be removed.

In summary, for the first two cases above, we get a total of $\binom{|V_{x,t}|}{2} \cdot (2 + 1 + 1)$ edges in $T^{\mathrm{PC}}_{x,2,t}$,
since there are $\binom{|V_{x,t}|}{2}$ 2-tuples $(v_{\mathrm{l}}, u_{\mathrm{l}})$ with $\ell(v_{\mathrm{l}}) < \ell(u_{\mathrm{l}})$ and $v_{\mathrm{l}}, u_{\mathrm{l}} \in V_{x,t}$, and two possible
values for $v_{\mathrm{r}} = u_{\mathrm{r}}$ in the second case (the label is either $a{+}g/b{+}h$ or $c{+}e/d{+}f$).

• Case $v_{\mathrm{l}} = u_{\mathrm{l}}$ and $v_{\mathrm{r}} \neq u_{\mathrm{r}}$: In this case there are again only two distinct constituent edges
to consider, and there will be two edges between the vertices $\Psi(v_{\mathrm{l}}, u_{\mathrm{l}})$ and $\Psi(v_{\mathrm{r}}, u_{\mathrm{r}})$ in
$T^{\mathrm{PC}}_{x,2,t}$ with labels $a{+}c/b{+}d$ and $c{+}a/d{+}b$. Since both labels are the same, one of the
edges can be removed.

• Case $v_{\mathrm{l}} = u_{\mathrm{l}}$ and $v_{\mathrm{r}} = u_{\mathrm{r}}$: In this case there is only one distinct constituent edge to
consider, and there will be a single edge between the vertices $\Psi(v_{\mathrm{l}}, u_{\mathrm{l}})$ and $\Psi(v_{\mathrm{r}}, u_{\mathrm{r}})$ in
$T^{\mathrm{PC}}_{x,2,t}$ with label $a{+}a/b{+}b$ (or $c{+}c/d{+}d$).

In summary, for the last two cases above, we get a total of $|V_{x,t}| \cdot (1 + 1 + 1)$ edges in $T^{\mathrm{PC}}_{x,2,t}$,
since there are $|V_{x,t}|$ 2-tuples $(v_{\mathrm{l}}, u_{\mathrm{l}})$ with $v_{\mathrm{l}} = u_{\mathrm{l}}$ and $v_{\mathrm{l}}, u_{\mathrm{l}} \in V_{x,t}$, and two possible values for
$v_{\mathrm{r}} = u_{\mathrm{r}}$ in the fourth case (the label is either $a{+}a/b{+}b$ or $c{+}c/d{+}d$).

In total, there are $4\binom{|V_{x,t}|}{2} + 3|V_{x,t}| = 2|V_{x,t}|^2 + |V_{x,t}|$ edges in $T^{\mathrm{PC}}_{x,2,t}$, which is the desired result.

12.D Proof of Lemma 12.6

At first we show that $\dot{\mathcal{F}}_{\Pi,\Pi_c} \subseteq \mathrm{conic}(\dot{\mathcal{Q}}_{\Pi,\Pi_c})$, so let $\mathbf{y} \in \dot{\mathcal{F}}_{\Pi,\Pi_c}$. By definition of $\dot{\mathcal{F}}_{\Pi,\Pi_c}$, this
implies the existence of some $\mathbf{f} = (\mathbf{f}_a, \mathbf{f}_b, \mathbf{f}_c)$, $\hat{\mathbf{y}} \in \mathbb{R}^{2\lambda K}_{\geq 0}$, and $\tau > 0$ such that $((\mathbf{y}, \hat{\mathbf{y}}), \mathbf{f}) \in \mathcal{F}_{\Pi,\Pi_c}$
and $\mathbf{f}_x \in \mathcal{Q}^{\tau}_x$ for $x = a, b, c$. We will show that

$$(\tilde{\mathbf{y}}_\tau, \mathbf{f}_\tau) = \Bigl(\frac{1}{\tau}(\mathbf{y}, \hat{\mathbf{y}}), \frac{1}{\tau}\mathbf{f}\Bigr) \in \mathcal{Q}_{\Pi,\Pi_c},$$

from which the claim follows because then $\mathbf{y} = \tau\mathbf{y}_\tau$ is a positive multiple of an element of
$\dot{\mathcal{Q}}_{\Pi,\Pi_c}$.

Conditions (12.1b) and (12.2b), which hold for $((\mathbf{y}, \hat{\mathbf{y}}), \mathbf{f})$ by definition of $\mathcal{F}_{\Pi,\Pi_c}$, are invariant
to scaling, so they hold for $(\tilde{\mathbf{y}}_\tau, \mathbf{f}_\tau)$ as well. Because $\mathbf{f}_x \in \mathcal{Q}^{\tau}_x$, it also follows that $\mathbf{f}_\tau$ satisfies
(12.1a) for all $x = a, b, c$.


Equation (12.1a) also ensures that the total $\mathbf{f}_\tau$-value in the first segment of each trellis $T_x$ equals
$\frac{1}{\tau} \cdot \tau = 1$, and because of (12.1b) this must hold for all other trellis segments as well. Since $\mathbf{f}_\tau$ is
also nonnegative, we can conclude from this that each entry of $\mathbf{f}_\tau$ lies in $[0,1]$. But then also
$\tilde{\mathbf{y}}_\tau \in [0,1]^{N+2\lambda K}$, because each $\tilde{y}_j$, $j = 0, \dots, N + 2\lambda K - 1$, is a sum over a subset of the total flow
through a single segment and thus upper-bounded by 1, which concludes this part of the proof.
Now, we show that $\mathrm{conic}(\dot{\mathcal{Q}}_{\Pi,\Pi_c}) \subseteq \dot{\mathcal{F}}_{\Pi,\Pi_c}$. Let $\mathbf{y} \in \mathrm{conic}(\dot{\mathcal{Q}}_{\Pi,\Pi_c})$. Since $\mathrm{conic}(\dot{\mathcal{Q}}_{\Pi,\Pi_c})$ is
the conic hull of the convex set $\dot{\mathcal{Q}}_{\Pi,\Pi_c}$, this implies the existence of some $\tau > 0$ and $\mathbf{y}_{\mathcal{Q}} \in
\dot{\mathcal{Q}}_{\Pi,\Pi_c}$ such that $\mathbf{y} = \tau \cdot \mathbf{y}_{\mathcal{Q}}$. For $\mathbf{y}_{\mathcal{Q}}$ there must then exist $\hat{\mathbf{y}}_{\mathcal{Q}}$ and $\mathbf{f}_{\mathcal{Q}}$ such that $((\mathbf{y}_{\mathcal{Q}}, \hat{\mathbf{y}}_{\mathcal{Q}}), \mathbf{f}_{\mathcal{Q}}) \in
\mathcal{Q}_{\Pi,\Pi_c}$, from which it immediately follows that $((\mathbf{y} = \tau \cdot \mathbf{y}_{\mathcal{Q}}, \tau\hat{\mathbf{y}}_{\mathcal{Q}}), \tau\mathbf{f}_{\mathcal{Q}}) \in \mathcal{F}_{\Pi,\Pi_c}$, and thus
$\mathbf{y} \in \dot{\mathcal{F}}_{\Pi,\Pi_c}$.

Acknowledgment

The authors wish to thank the anonymous reviewers for their valuable comments and sugges-
tions that helped improve the presentation of the paper.

References

[1] C. Berrou, A. Glavieux, and P. Thitimajshima. “Near Shannon limit error-correcting


coding and decoding: turbo-codes”. In: IEEE International Conference on Communications.
May 1993, pp. 1064–1070. doi: 10.1109/ICC.1993.397441.
[2] C. Berrou et al. “Improving the distance properties of turbo codes using a third component
code: 3D turbo codes”. IEEE Transactions on Communications 57.9 (Sept. 2009), pp. 2505–
2509. doi: 10.1109/TCOMM.2009.09.070521.
[3] E. Rosnes and A. Graell i Amat. “Performance analysis of 3-D turbo codes”. IEEE Trans-
actions on Information Theory 57.6 (June 2011), pp. 3707–3720. issn: 0018-9448. doi:
10.1109/TIT.2011.2133610.
[4] A. Graell i Amat and E. Rosnes. “Stopping set analysis of 3-dimensional turbo code
ensembles”. In: Proceedings of IEEE International Symposium on Information Theory. Austin,
TX, June 2010, pp. 2013–2017. doi: 10.1109/ISIT.2010.5513362.
[5] C. Koller et al. “Analysis and design of tuned turbo codes”. IEEE Transactions on Informa-
tion Theory 58.7 (July 2012), pp. 4796–4813.
[6] J. Feldman, D. R. Karger, and M. Wainwright. “Linear programming-based decoding
of turbo-like codes and its relation to iterative approaches”. In: Proceedings of the 40th
Annual Allerton Conference on Communication, Control and Computing. Monticello, IL,
2002, pp. 467–477.
[7] J. Feldman. “Decoding error-correcting codes via linear programming”. PhD thesis. Cam-
bridge, MA: Massachusetts Institute of Technology, 2003.


[8] J. Feldman, M. J. Wainwright, and D. R. Karger. “Using linear programming to decode


binary linear codes”. IEEE Transactions on Information Theory 51.3 (Mar. 2005), pp. 954–
972. doi: 10.1109/TIT.2004.842696. url: www.eecs.berkeley.edu/~wainwrig/Papers/
FelWaiKar05.pdf.
[9] P. O. Vontobel and R. Koetter. Graph-cover decoding and finite-length analysis of message-
passing iterative decoding of LDPC codes. 2005. arXiv: cs/0512078 [cs.IT].
[10] E. Rosnes. “On the connection between finite graph covers, pseudo-codewords, and
linear programming decoding of turbo codes”. In: Proceedings of the 4th International
Symposium on Turbo Codes & Related Topics. Munich, Germany, Apr. 2006, pp. 1–6.
[11] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. “Factor graphs and the sum-product
algorithm”. IEEE Transactions on Information Theory 47.2 (Feb. 2001), pp. 498–519. doi:
10.1109/18.910572. url: www.comm.utoronto.ca/frank/papers/KFL01.pdf.
[12] M. Chertkov and M. G. Stepanov. “Polytope of correct (linear programming) decoding
and low-weight pseudo-codewords”. In: Proceedings of IEEE International Symposium on
Information Theory. St. Petersburg, Russia, July 2011, pp. 1648–1652. doi: 10.1109/ISIT.
2011.6033824.
[13] M. Chertkov and M. G. Stepanov. “An efficient pseudocodeword search algorithm for
linear programming decoding of LDPC codes”. IEEE Transactions on Information Theory
54.4 (Apr. 2008), pp. 1514–1520. issn: 0018-9448. doi: 10.1109/TIT.2008.917682.
[14] S. Abu-Surra, D. Divsalar, and W. E. Ryan. “Enumerators for protograph-based ensembles
of LDPC and generalized LDPC codes”. IEEE Transactions on Information Theory 57.2
(Feb. 2011), pp. 858–886. doi: 10.1109/TIT.2010.2094819.
[15] D. Divsalar and L. Dolecek. “Graph cover ensembles of non-binary protograph LDPC
codes”. In: Proceedings of IEEE International Symposium on Information Theory. Cambridge,
MA, July 2012, pp. 2526–2530. doi: 10.1109/ISIT.2012.6283972.
[16] D. Divsalar and L. Dolecek. “Ensemble analysis of pseudocodewords of protograph-based
non-binary LDPC codes”. In: Proceedings of the IEEE Information Theory Workshop. Paraty,
Brazil, Oct. 2011, pp. 340–344. doi: 10.1109/ITW.2011.6089475.
[17] N. Boston. “A multivariate weight enumerator for tail-biting trellis pseudocodewords”. In:
Proceedings of the Workshop on Algebra, Combinatorics and Dynamics. Belfast, Northern
Ireland, Aug. 2009.
[18] D. Conti and N. Boston. “Matrix representations of trellises and enumerating trellis pseu-
docodewords”. In: Proceedings of the 49th Annual Allerton Conference on Communication,
Control and Computing. Monticello, IL, Sept. 2011, pp. 1438–1445.
[19] P. O. Vontobel. “Counting in graph covers: a combinatorial characterization of the Bethe
entropy function”. IEEE Transactions on Information Theory 59.9 (Sept. 2013), pp. 6018–
6048. doi: 10.1109/TIT.2013.2264715.
[20] Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and Channel Coding
(Release 8). Technical Specification. 3rd Generation Partnership Project; Group Radio
Access Network, Dec. 2008.


[21] G. D. Forney Jr. “Codes on graphs: normal realizations”. IEEE Transactions on Information
Theory 47.2 (Feb. 2001), pp. 529–548. doi: 10.1109/18.910573.
[22] A. Tanatmis et al. “Valid inequalities for binary linear codes”. In: Proceedings of IEEE
International Symposium on Information Theory. Seoul, Korea, June 2009, pp. 2216–2220.
doi: 10.1109/ISIT.2009.5205846.
[23] E. Rosnes and Ø. Ytrehus. “Turbo decoding on the binary erasure channel: finite-length
analysis and turbo stopping sets”. IEEE Transactions on Information Theory 53.11 (Nov.
2007), pp. 4059–4075. doi: 10.1109/TIT.2007.907496.
[24] M. F. Flanagan. “Exposing pseudoweight layers in regular LDPC code ensembles”. In:
Proceedings of the IEEE Information Theory Workshop. Taormina, Italy, Oct. 2009, pp. 60–64.
doi: 10.1109/ITW.2009.5351212. arXiv: 0908.1298 [cs.IT].
[25] H. D. Pfister and P. H. Siegel. “The serial concatenation of rate-1 codes through uniform
random interleavers”. IEEE Transactions on Information Theory 49.6 (June 2003), pp. 1425–
1438. doi: 10.1109/TIT.2003.811907.
[26] R. Koetter and P. O. Vontobel. “Graph covers and iterative decoding of finite-length
codes”. In: Proceedings of the 3rd International Symposium on Turbo Codes & Related Topics.
Brest, France, Sept. 2003, pp. 75–82.
[27] J. Sun and O. Y. Takeshita. “Interleavers for turbo codes using permutation polynomials
over integer rings”. IEEE Transactions on Information Theory 51.1 (Jan. 2005), pp. 101–119.
doi: 10.1109/TIT.2004.839478.
[28] O. Y. Takeshita. “On maximum contention-free interleavers and permutation polynomials
over integer rings”. IEEE Transactions on Information Theory 52.3 (Mar. 2006), pp. 1249–
1253. doi: 10.1109/TIT.2005.864450.
[29] S. Crozier, P. Guinand, and A. Hunt. “Computing the minimum distance of turbo-codes
using iterative decoding techniques”. In: Proceedings of the 22nd Biennial Symposium on
Communications. Kingston, ON, Canada, May–June 2004, pp. 306–308.
[30] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. Wiley-
Interscience series in discrete mathematics and optimization, John Wiley & Sons, 1988.
[31] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986.
[32] R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows. Prentice-Hall, 1993.

210
Part III

Closing

Chapter 13

Conclusions and Future Work

In this thesis, we have shown that mathematical optimization can contribute substantially to the area of coding theory in aspects as diverse as fundamental theory, algorithm development, and even hardware implementation. To a certain extent, this is due to the formulation of the ML decoding problem as an integer linear program with a useful LP relaxation by Feldman in 2003 [4]. More generally, however, a wide range of subdisciplines of mathematical optimization find application in the decoding problem. In Paper IV, for instance, the graphical trellis structure of turbo codes was exploited in a very fast combinatorial solution algorithm for the subproblems, while ideas from computational geometry and multi-criteria optimization provide the “glue” that assembles the optimal LP solution from a finite number of those subproblems. Further connections exist that did not play a central role in this thesis, e.g., the interpretation of a linear code as a matroid mentioned in Paper I (see also [28]) and the recent advances in applying non-linear optimization methods to the LP decoding problem [38].
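To fix notation for what follows (a schematic restatement of the formulation from [4], not a new result): for a binary linear code C of length n with parity checks j = 1, ..., m and a vector \lambda of log-likelihood ratios, ML decoding and its LP relaxation read

    \min_x \; \lambda^T x \quad \text{s.t.} \quad x \in \operatorname{conv}(C)    (ML)

    \min_x \; \lambda^T x \quad \text{s.t.} \quad x \in \mathcal{P} = \bigcap_{j=1}^{m} \operatorname{conv}(C_j)    (LP)

where C_j denotes the local code obtained by considering only the j-th parity check. An integral optimum of (LP) is an ML codeword, while the fractional vertices of \mathcal{P} are pseudocodewords.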
The potential of mathematical optimization approaches for decoding is beyond question. The algorithm developed in Paper VI is, to our knowledge, the fastest available ML decoder for general linear codes, and the outstanding performance of our turbo LP decoder (Paper IV) suggests that, once it is embedded in a branch-and-cut procedure similar to that of Paper VI, the same will hold for turbo and turbo-like codes. Paper III has shown that IP formulations can even be used, in some cases, to simulate the error-correction performance of heuristic decoding methods faster than by running these heuristics themselves. Finally, Paper VII has illustrated that the theory of linear programming decoding, together with the related concepts of LP pseudocodewords and graph covers, makes it possible to evaluate individual codes as well as complete ensembles with respect to their pseudoweight spectrum, which determines the decoding behavior of both the LP and belief-propagation decoders.
However, the “transfer of ideas” present in the contributions of this thesis is not simply one-directional, going from optimization to coding problems. The findings in Paper IV, while driven by the specific application of LP decoding of turbo codes, are to a large extent independent of the concrete problem setup and thus applicable to a whole class of combinatorial optimization problems with additional complicating equality constraints. The question of how this approach relates and compares to other methods is of great interest for future research; in fact, after publication of Paper IV we discovered a close relationship to the Dantzig-Wolfe reformulation of the problem (see [39], and the sketch below) that remains to be examined further.
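To sketch that relationship (a schematic of the classical principle [39]; working out the exact correspondence is part of the examination announced above): a problem \min \{ c^T x : Dx = d, \; x \in X \} with complicating constraints Dx = d and a tractable combinatorial set X is rewritten over the extreme points x^1, ..., x^K of \operatorname{conv}(X) as

    \min_{\mu \geq 0} \; \sum_{k=1}^{K} \mu_k (c^T x^k) \quad \text{s.t.} \quad \sum_{k=1}^{K} \mu_k (D x^k) = d, \qquad \sum_{k=1}^{K} \mu_k = 1,

and solved by column generation, where the pricing step optimizes over X alone. In the setting of Paper IV, X would correspond to the trellis paths of the constituent encoders and Dx = d to the interleaver coupling constraints, so that pricing becomes a shortest-path computation on the trellis.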

Beyond that, there are several other natural directions in which to continue the studies begun in this thesis. The application of the branch-and-cut ML decoder to turbo codes was already mentioned above. Another question is whether and how non-linear LP decoders, in particular the ADMM method [38], can be employed as a high-performance subroutine within the branch-and-cut procedure (see the sketch below). The work in Paper V opens up an entirely new line of research on hardware implementations of optimization-based decoding algorithms, and, finally, it is an interesting open question how to generalize the algorithms we developed to the settings of non-binary codes and/or higher-order modulation that were examined in Paper II.
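To make the ADMM suggestion more concrete, the following is a minimal sketch of the iteration from [38] in scaled form (here \mu > 0 is the penalty parameter, P_j selects the variables participating in check j, and \Pi_j denotes Euclidean projection onto the parity polytope of check j, for which [38] derives an efficient algorithm):

    x^{(k+1)} = \operatorname{argmin}_{x \in [0,1]^n} \; \lambda^T x + \frac{\mu}{2} \sum_j \lVert P_j x - z_j^{(k)} + u_j^{(k)} \rVert^2,
    z_j^{(k+1)} = \Pi_j \bigl( P_j x^{(k+1)} + u_j^{(k)} \bigr),
    u_j^{(k+1)} = u_j^{(k)} + P_j x^{(k+1)} - z_j^{(k+1)}.

The x-update separates over the coordinates and admits a closed form, so the per-iteration cost is dominated by the projections; whether such iterations can be warm-started across the nodes of a branch-and-cut tree is precisely the question raised above.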

Bibliography for Parts I and III

[1] D. J. C. MacKay. “Good error-correcting codes based on very sparse matrices”. IEEE
Transactions on Information Theory 45.2 (Mar. 1999), pp. 399–431. doi: 10.1109/18.748992.
[2] C. Berrou and A. Glavieux. “Near optimum error correcting coding and decoding: turbo-
codes”. IEEE Transactions on Communications 44.10 (Oct. 1996), pp. 1261–1271. issn:
0090-6778. doi: 10.1109/26.539767.
[3] J. Feldman, D. R. Karger, and M. Wainwright. “Linear programming-based decoding
of turbo-like codes and its relation to iterative approaches”. In: Proceedings of the 40th
Annual Allerton Conference on Communication, Control and Computing. Monticello, IL,
2002, pp. 467–477.
[4] J. Feldman. “Decoding error-correcting codes via linear programming”. PhD thesis. Cam-
bridge, MA: Massachusetts Institute of Technology, 2003.
[5] J. Feldman, M. J. Wainwright, and D. R. Karger. “Using linear programming to decode
binary linear codes”. IEEE Transactions on Information Theory 51.3 (Mar. 2005), pp. 954–
972. doi: 10.1109/TIT.2004.842696. url: www.eecs.berkeley.edu/~wainwrig/Papers/
FelWaiKar05.pdf.
[6] G. B. Dantzig. Linear Programming and Extensions. Princeton University Press, Aug. 1963.
[7] A. Schrijver. Theory of Linear and Integer Programming. John Wiley & Sons, 1986.
[8] U. Faigle, W. Kern, and G. Still. Algorithmic Principles of Mathematical Programming.
Vol. 24. Kluwer Academic Publishers, 2010.
[9] G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. Wiley-
Interscience series in discrete mathematics and optimization, John Wiley & Sons, 1988.
[10] V. Klee and G. J. Minty. “How good is the simplex algorithm?” In: Proc. Inequalities III.
Los Angeles, CA: Academic Press, Sept. 1972, pp. 159–175.
[11] C. E. Shannon. “A mathematical theory of communication”. The Bell System Technical
Journal 27 (July 1948), pp. 379–423, 623–656.
[12] D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge
University Press, 2003. url: http://www.inference.phy.cam.ac.uk/itprnn/book.html.
[13] R. G. Gallager. Information Theory and Reliable Communication. John Wiley & Sons, 1968.
[14] F. J. MacWilliams and N. J. A. Sloane. The Theory of Error-Correcting Codes. Vol. 16.
North-Holland, 1977.
[15] T. J. Richardson and R. L. Urbanke. Modern Coding Theory. Cambridge University Press,
2008. isbn: 978-0-521-85229-6.

[16] D. Costello Jr. and G. D. Forney Jr. “Channel coding: the road to channel capacity”.
Proceedings of the IEEE 95.6 (June 2007), pp. 1150–1177. doi: 10.1109/JPROC.2007.895188.
[17] E. Berlekamp, R. J. McEliece, and H. C. A. van Tilborg. “On the inherent intractability
of certain coding problems”. IEEE Transactions on Information Theory 24.3 (May 1978),
pp. 384–386. doi: 10.1109/TIT.1978.1055873.
[18] A. Vardy. “The intractability of computing the minimum distance of a code”. IEEE Trans-
actions on Information Theory 43.6 (Nov. 1997), pp. 1757–1766. doi: 10.1109/18.641542.
[19] F. R. Kschischang, B. J. Frey, and H.-A. Loeliger. “Factor graphs and the sum-product
algorithm”. IEEE Transactions on Information Theory 47.2 (Feb. 2001), pp. 498–519. doi:
10.1109/18.910572. url: www.comm.utoronto.ca/frank/papers/KFL01.pdf.
[20] P. O. Vontobel and R. Koetter. Graph-cover decoding and finite-length analysis of message-
passing iterative decoding of LDPC codes. 2005. arXiv: cs/0512078 [cs.IT].
[21] M. Flanagan et al. “Linear-programming decoding of nonbinary linear codes”. IEEE
Transactions on Information Theory 55.9 (Sept. 2009), pp. 4134–4154. issn: 0018-9448. doi:
10.1109/TIT.2009.2025571. arXiv: 0804.4384 [cs.IT].
[22] M. Breitbach et al. “Soft-decision decoding of linear block codes as optimization problem”.
European Transactions on Telecommunications 9 (1998), pp. 289–293.
[23] A. Tanatmis et al. “A separation algorithm for improved LP-decoding of linear block
codes”. IEEE Transactions on Information Theory 56.7 (July 2010), pp. 3277–3289. issn:
0018-9448. doi: 10.1109/TIT.2010.2048489.
[24] M. Punekar et al. “Calculating the minimum distance of linear block codes via integer
programming”. In: Proceedings of the International Symposium on Turbo Codes and Iterative
Information Processing. Brest, France, Sept. 2010, pp. 329–333. doi: 10.1109/ISTC.2010.
5613894.
[25] A. B. Keha and T. M. Duman. “Minimum distance computation of LDPC codes using
a branch and cut algorithm”. IEEE Transactions on Communications 58.4 (Apr. 2010),
pp. 1072–1079. doi: 10.1109/TCOMM.2010.04.090164.
[26] A. Tanatmis et al. “Valid inequalities for binary linear codes”. In: Proceedings of IEEE
International Symposium on Information Theory. Seoul, Korea, June 2009, pp. 2216–2220.
doi: 10.1109/ISIT.2009.5205846.
[27] A. Tanatmis et al. “Numerical comparison of IP formulations as ML decoders”. In: IEEE
International Conference on Communications. Cape Town, South Africa, May 2010, pp. 1–5.
doi: 10.1109/ICC.2010.5502303.
[28] N. Kashyap. “A decomposition theory for binary linear codes”. IEEE Transactions on
Information Theory 54.7 (July 2008), pp. 3035–3058. doi: 10.1109/TIT.2008.924700. url:
http://www.ece.iisc.ernet.in/~nkashyap/Papers/code_decomp_final.pdf.
[29] R. G. Jeroslow. “On defining sets of vertices of the hypercube by linear inequalities”.
Discrete Mathematics 11.2 (1975), pp. 119–124. issn: 0012-365X. doi: 10 . 1016 / 0012 -
365X(75)90003-5.

[30] R. G. Gallager. “Low-density parity-check codes”. PhD thesis. Cambridge, MA: Mas-
sachusetts Institute of Technology, Sept. 1960.
[31] M. H. Taghavi and P. H. Siegel. “Adaptive methods for linear programming decoding”.
IEEE Transactions on Information Theory 54.12 (Dec. 2008), pp. 5396–5410. doi: 10.1109/
TIT.2008.2006384. arXiv: cs/0703123 [cs.IT].
[32] M. H. Taghavi, A. Shokrollahi, and P. H. Siegel. “Efficient implementation of linear
programming decoding”. IEEE Transactions on Information Theory 57.9 (Sept. 2011),
pp. 5960–5982. doi: 10.1109/TIT.2011.2161920. arXiv: 0902.0657 [cs.IT].
[33] X. Zhang and P. H. Siegel. “Adaptive cut generation algorithm for improved linear
programming decoding of binary linear codes”. IEEE Transactions on Information Theory
58.10 (Oct. 2012), pp. 6581–6594. doi: 10 . 1109 / TIT . 2012 . 2204955. arXiv: 1105 . 0703
[cs.IT].
[34] G. D. Forney Jr. et al. “On the effective weights of pseudocodewords for codes defined
on graphs with cycles”. In: Codes, systems, and graphical models. Ed. by B. Marcus and
J. Rosenthal. Vol. 123. The IMA Volumes in Mathematics and its Applications. Springer
Verlag, New York, Inc., 2001, pp. 101–112.
[35] R. Koetter and P. O. Vontobel. “Graph covers and iterative decoding of finite-length
codes”. In: Proceedings of the 3rd International Symposium on Turbo Codes & Related Topics.
Brest, France, Sept. 2003, pp. 75–82.
[36] M. Chertkov and M. G. Stepanov. “Polytope of correct (linear programming) decoding
and low-weight pseudo-codewords”. In: Proceedings of IEEE International Symposium on
Information Theory. St. Petersburg, Russia, July 2011, pp. 1648–1652. doi: 10.1109/ISIT.
2011.6033824.
[37] A. I. Ali, J. Kennington, and B. Shetty. “The equal flow problem”. European Journal
of Operational Research 36.1 (1988), pp. 107–115. issn: 0377-2217. doi: 10.1016/0377-
2217(88)90012-4.
[38] S. Barman et al. “Decomposition methods for large-scale LP decoding”. IEEE Transactions
on Information Theory 59.12 (Dec. 2013), pp. 7870–7886. doi: 10.1109/TIT.2013.2281372.
[39] G. B. Dantzig and P. Wolfe. “Decomposition principle for linear programs”. Operations
Research 8.1 (1960), pp. 101–111. issn: 0030-364X. url: http://www.jstor.org/stable/167547.

Curriculum Vitae

Michael Helmling
born on February 19, 1986, in Kaiserslautern

03/2005            General higher-education entrance qualification (Abitur), Albert-Schweitzer-Gymnasium, Kaiserslautern.

05/2005–01/2006    Civilian service, Westpfalz-Klinikum, Kaiserslautern.

2005               Early entry into the physics program at Technische Universität Kaiserslautern, via participation in the distance-study program.

04/2006–01/2011    Studies in mathematics with computer science as minor subject, TU Kaiserslautern.

10/2007–09/2008    Parallel studies in computer science, TU Kaiserslautern; four examinations passed (no degree).

01/2011            Diplom in mathematics with computer science as minor subject, TU Kaiserslautern. Title of the Diplom thesis: Band Matrix Constrained Optimization Problems with Applications to Coding Theory.

03/2011–02/2013    Doctoral studies in mathematics, TU Kaiserslautern, supported by a scholarship of the state research initiative (CM)² – Center for Mathematical and Computational Modelling.

since 03/2013      Research associate, Mathematical Institute of Universität Koblenz-Landau, Koblenz campus; continuation of the doctoral studies begun in Kaiserslautern.