TEOI-Capacity of Discrete Channels
Information Theory
Degree in Data Science and Engineering
Lesson 5: Capacity of discrete channels
2019/20 - Q1
Definition of communication
What is the origin of errors? Thermal noise in all electronic devices, and also:
dust or scratches in HDD and DVD, solar wind in satellite comms, nearby
transmissions in cellular communications, cosmic rays in solid state memories,
propellers and biological noise in underwater comms, black body radiation, ...
Goals
We want to elucidate...
Formal definitions
$$p(y^n \mid x^n) = \prod_{i=1}^{n} p(y_i \mid x_i)$$

The conditional probability of error given that index $i$ was sent is

$$\lambda_i = \Pr\left( g(Y^n) \neq i \mid x^n(i) \right) = \sum_{y^n \in \mathcal{Y}^n} p\left( y^n \mid x^n(i) \right) I\left( g(y^n) \neq i \right)$$
The maximal probability of error for an $(M, n)$ code is

$$\lambda^{(n)} = \max_{i \in \{1, 2, \ldots, M\}} \lambda_i$$

The average probability of error $P_e^{(n)}$ for an $(M, n)$ code is

$$P_e^{(n)} = \frac{1}{M} \sum_{i=1}^{M} \lambda_i$$

The rate $R$ of an $(M, n)$ code is $R = \frac{\log M}{n}$ bits per transmission.

A rate $R$ is said to be achievable if there exists a sequence of $(\lceil 2^{nR} \rceil, n)$ codes such that the maximal probability of error $\lambda^{(n)}$ tends to $0$ as $n \to \infty$.
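To make these definitions concrete, the following sketch (an illustration added here, not part of the original material) evaluates $\lambda_i$, $\lambda^{(n)}$, $P_e^{(n)}$ and $R$ for a length-3 repetition code used over a BSC with an assumed crossover probability $p = 0.1$ and majority-vote decoding.

```python
from itertools import product
from math import log2

p = 0.1                                # BSC crossover probability (assumed value)
codebook = [(0, 0, 0), (1, 1, 1)]      # M = 2 codewords, n = 3 (repetition code)
M, n = len(codebook), len(codebook[0])

def g(y):
    """Majority-vote decoder: returns the index of the decoded codeword."""
    return 0 if sum(y) < n / 2 else 1

def p_y_given_x(y, x):
    """Memoryless channel: p(y^n | x^n) = prod_i p(y_i | x_i)."""
    prob = 1.0
    for yi, xi in zip(y, x):
        prob *= (1 - p) if yi == xi else p
    return prob

# lambda_i = sum over all y^n of p(y^n | x^n(i)) * 1{g(y^n) != i}
lam = [sum(p_y_given_x(y, x) for y in product([0, 1], repeat=n) if g(y) != i)
       for i, x in enumerate(codebook)]

lam_max = max(lam)            # maximal probability of error lambda^(n)
Pe = sum(lam) / M             # average probability of error P_e^(n)
R = log2(M) / n               # rate in bits per transmission
print(lam, lam_max, Pe, R)    # each lambda_i = 3 p^2 (1-p) + p^3 ≈ 0.028, R = 1/3
```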
Range of application
But the model is also useful for transmission schemes where the modulator and demodulator are considered part of the channel.
Channel capacity
$$C = \max_{p(x)} I(X; Y)$$
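The maximization over $p(x)$ rarely has a closed form, but it can be carried out numerically. Below is a minimal sketch of the Blahut-Arimoto iteration (an addition for illustration; the function name, tolerance and test channel are assumptions, not part of the slides).

```python
import numpy as np

def channel_capacity(W, tol=1e-9, max_iter=10_000):
    """Blahut-Arimoto estimate of C = max_{p(x)} I(X;Y) in bits.

    W[x, y] = p(y|x) is the transition matrix of a DMC (rows sum to 1).
    Returns the estimated capacity and the maximizing input distribution.
    """
    nx = W.shape[0]
    r = np.full(nx, 1.0 / nx)                              # start from a uniform input
    for _ in range(max_iter):
        py = r @ W                                         # current output distribution
        q = (r[:, None] * W) / np.where(py > 0, py, 1.0)   # posterior p(x|y)
        # Multiplicative update: r(x) proportional to prod_y q(x|y)^W[x,y]
        r_new = np.exp(np.sum(np.where(W > 0, W * np.log(np.where(q > 0, q, 1.0)), 0.0), axis=1))
        r_new /= r_new.sum()
        done = np.max(np.abs(r_new - r)) < tol
        r = r_new
        if done:
            break
    py = np.where(r @ W > 0, r @ W, 1.0)
    logterm = np.where(W > 0, np.log2(np.where(W > 0, W, 1.0) / py), 0.0)
    return float(np.sum(r[:, None] * W * logterm)), r

# Sanity check on a BSC with crossover probability 0.1: C = 1 - H(0.1) ≈ 0.531 bits
print(channel_capacity(np.array([[0.9, 0.1], [0.1, 0.9]]))[0])
```

The iteration is known to converge to the capacity-achieving input distribution; here it simply serves as a numerical cross-check for the examples that follow.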
Noiseless channel
$$p(y_i \mid x_j) = \delta_{i,j}$$
Noisy typewriter

We can compute the capacity as follows. With a uniform input distribution, the posterior of the input given, for instance, output $B$ is

$$p(x \mid y = B) = \begin{cases} 1/2 & \text{if } x = A \\ 1/2 & \text{if } x = B \\ 0 & \text{otherwise} \end{cases}$$

so that

$$H(X \mid Y) = \sum_{y=A}^{Z} p(y) H(X \mid Y = y) = 1 \text{ bit}$$

and, since the uniform input maximizes $I(X;Y)$ by symmetry, $C = H(X) - H(X \mid Y) = \log 26 - 1 = \log 13 \approx 3.7$ bits per transmission.
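As a numerical cross-check (added here for illustration, not on the slides), the snippet below builds the $26 \times 26$ noisy-typewriter transition matrix, in which each letter is received either unchanged or shifted to the next letter with probability $1/2$, and evaluates $H(X \mid Y)$ and $I(X;Y)$ under a uniform input.

```python
import numpy as np

A = 26                               # alphabet size (A..Z)
W = np.zeros((A, A))                 # W[x, y] = p(y | x)
for x in range(A):
    W[x, x] = 0.5                    # received unchanged
    W[x, (x + 1) % A] = 0.5          # or shifted to the next letter

px = np.full(A, 1.0 / A)             # uniform input distribution
pxy = px[:, None] * W                # joint distribution p(x, y)
py = pxy.sum(axis=0)                 # output distribution (also uniform)
p_x_given_y = pxy / py               # posterior p(x | y): two entries of 1/2 per column

H_X_given_Y = -np.sum(pxy * np.log2(np.where(pxy > 0, p_x_given_y, 1.0)))
I = np.log2(A) - H_X_given_Y         # I(X;Y) = H(X) - H(X|Y)
print(H_X_given_Y, I, np.log2(13))   # 1.0, ~3.7004, ~3.7004
```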
For large block lengths, every channel looks like the noisy typewriter: any input is
likely to produce a channel output in a small subset of the output alphabet.
Then, capacity is obtained from the non-confusable subset of inputs that
produce disjoint output sequences (will be discussed later in this lesson in the
noisy-channel capacity theorem).
For the binary symmetric channel with crossover probability $p$, the capacity is

$$C = 1 - H(p) \text{ bits per transmission}$$

Note that the rate at which we can transmit information is not $(1 - p)$ bits per channel use, since the receiver does not know when an error occurs. In fact, if $p = \tfrac{1}{2}$, we cannot transmit any information at all!
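A quick numerical illustration of this formula (added here; the values of $p$ are arbitrary examples):

```python
from math import log2

def H2(p):
    """Binary entropy function in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

for p in (0.0, 0.01, 0.1, 0.25, 0.5):
    print(f"p = {p:<4}  C = 1 - H(p) = {1 - H2(p):.3f} bits/transmission")
# p = 0.5 gives C = 0: the output is then independent of the input.
```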
Let us compute

$$H(Y) = -\sum_{y \in \mathcal{Y}} p(y) \log p(y), \quad \text{where } p(y) = \sum_{x \in \mathcal{X}} p(y \mid x) p(x),$$

and maximize it with respect to $p(x)$.
$$C = \max_{\pi} (1 - \alpha) H(\pi) = 1 - \alpha$$

The intuition for this expression is the following: since a fraction $\alpha$ of the symbols is lost, we can only transmit a fraction $(1 - \alpha)$ of them; the maximum is attained at $\pi = 1/2$.
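This maximization can be checked numerically; the sketch below (illustrative, with an assumed $\alpha = 0.3$) evaluates $(1-\alpha)H(\pi)$ on a grid of input probabilities $\pi = \Pr(X = 1)$ and confirms that the maximum, $1 - \alpha$, is attained at $\pi = 1/2$.

```python
import numpy as np

alpha = 0.3                                    # erasure probability (assumed value)
pi = np.linspace(1e-9, 1 - 1e-9, 100_001)      # grid over Pr(X = 1)
H_pi = -pi * np.log2(pi) - (1 - pi) * np.log2(1 - pi)
I = (1 - alpha) * H_pi                         # I(X;Y) for the binary erasure channel
print(pi[np.argmax(I)], I.max())               # ~0.5 and ~0.7 = 1 - alpha
```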
Symmetric channel

Examples: $Y = (X + Z) \bmod c$, with the noise $Z$ independent of $X$.

$$I(X; Y) = H(Y) - H(Y \mid X) = H(Y) - H(\mathbf{q})$$

where $\mathbf{q}$ is a row of the transition matrix. With a uniform input, $p(y)$ is proportional to $c(y)$, where $c(y)$ is the sum of the elements in the $y$-th column of the transition matrix; $c(y)$ is constant for both symmetric and quasi-symmetric channels, so $H(Y)$ reaches $\log |\mathcal{Y}|$. Therefore the capacity is

$$C = \max_{p(x)} I(X; Y) = \log |\mathcal{Y}| - H(\mathbf{q})$$
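A small check of this formula (added for illustration; the noise distribution $\mathbf{q}$ below is an assumed example): for $Y = (X + Z) \bmod 3$ with $Z \sim \mathbf{q}$, the mutual information under a uniform input equals $\log_2 3 - H(\mathbf{q})$.

```python
import numpy as np

q = np.array([0.8, 0.1, 0.1])          # distribution of the noise Z (assumed, strictly positive)
c = len(q)
W = np.array([np.roll(q, x) for x in range(c)])   # row x: p(y|x) = q[(y - x) mod c]

px = np.full(c, 1.0 / c)               # uniform input (capacity-achieving)
py = px @ W                            # output is uniform, so H(Y) = log2(c)
H_Y = -np.sum(py * np.log2(py))
H_Y_given_X = -np.sum(px[:, None] * W * np.log2(W))   # every row has entropy H(q)
H_q = -np.sum(q * np.log2(q))
print(H_Y - H_Y_given_X, np.log2(c) - H_q)            # both ~0.663 bits
```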
Pattern recognition
Consider the problem of recognizing handwritten digits. In this case the input
to the channel is a decimal digit $X \in \mathcal{X} = \{0, 1, 2, \ldots, 9\}$. What comes out is
a pattern of ink on paper that can be represented as a vector $y$.
Natural evolution
Natural evolution can be considered as a channel that models how information
about the environment is transferred to the genome.
$$C = \max_{p(x)} I(X; Y)$$

1. $C \geq 0$, since $I(X; Y) \geq 0$.
2. $C \leq \log |\mathcal{X}|$, since $C = \max_{p(x)} I(X; Y) \leq \max_{p(x)} H(X) = \log |\mathcal{X}|$.
3. $C \leq \log |\mathcal{Y}|$, for the same reason.
4. $I(X; Y)$ is a continuous function of $p(x)$.
5. $I(X; Y)$ is a concave function of $p(x)$.
6. Reminder of the relations between information and entropy in graphic form (diagram omitted).
A magnetic hard disk drive (HDD) records data by magnetizing a thin film of
ferromagnetic material in flat circular disks. Bits are stored by changing the
direction of magnetization through a magnetic coil head. A reading head is
used to detect the magnetization of the material underneath.
For $P_e^N = 10^{-15}$, $N$ must be at least 68: we need that many parallel disks to achieve the target bit error probability!
Definition
The set $A_\epsilon^{(n)}$ of jointly typical sequences $(x^n, y^n)$ is the set of $n$-sequences with empirical entropies $\epsilon$-close to the true ones, that is,

$$A_\epsilon^{(n)} = \left\{ (x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n : \left| -\tfrac{1}{n} \log p(x^n) - H(X) \right| < \epsilon, \; \left| -\tfrac{1}{n} \log p(y^n) - H(Y) \right| < \epsilon, \; \left| -\tfrac{1}{n} \log p(x^n, y^n) - H(X, Y) \right| < \epsilon \right\}$$

where $p(x^n, y^n) = \prod_{i=1}^{n} p(x_i, y_i)$.
Example
Let us evaluate the joint typicality of two sequences of length $n = 100$: $x^n$, drawn i.i.d. with $p(x = 0) = 0.9$, was transmitted through a binary symmetric channel with crossover probability $0.2$, and $y^n$ was received.
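A minimal sketch of this check (added here; the sequences are drawn at random rather than copied from the slide, and the margin $\epsilon = 0.2$ is an assumed value):

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 100, 0.2                     # block length and typicality margin (eps assumed)
p0, pflip = 0.9, 0.2                  # Pr(x = 0) and BSC crossover probability

x = (rng.random(n) > p0).astype(int)                 # x_i = 1 with probability 0.1
y = x ^ (rng.random(n) < pflip).astype(int)          # BSC output

px = np.array([p0, 1 - p0])
pyx = np.array([[1 - pflip, pflip], [pflip, 1 - pflip]])   # p(y|x)
pxy = px[:, None] * pyx                                    # joint p(x, y)
py = pxy.sum(axis=0)

H_X = -np.sum(px * np.log2(px))
H_Y = -np.sum(py * np.log2(py))
H_XY = -np.sum(pxy * np.log2(pxy))

# Empirical per-symbol log-likelihoods -(1/n) log2 p(.) of the observed sequences
e_x = -np.mean(np.log2(px[x]))
e_y = -np.mean(np.log2(py[y]))
e_xy = -np.mean(np.log2(pxy[x, y]))

typical = (abs(e_x - H_X) < eps) and (abs(e_y - H_Y) < eps) and (abs(e_xy - H_XY) < eps)
print((e_x, H_X), (e_y, H_Y), (e_xy, H_XY), typical)
```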
Theorem (5.2)

For sufficiently large $n$, $(1 - \epsilon)\, 2^{n(H(X,Y) - \epsilon)} \leq \left| A_\epsilon^{(n)} \right| \leq 2^{n(H(X,Y) + \epsilon)}$.

Theorem (5.3)

If $(\tilde{X}^n, \tilde{Y}^n) \sim p(x^n)\, p(y^n)$ (they are independent with the same marginals as $X^n$ and $Y^n$), then the probability of being jointly typical is upper bounded by

$$\Pr\left( (\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)} \right) \leq 2^{-n(I(X;Y) - 3\epsilon)}$$
Theorem (Channel coding theorem)

For a discrete memoryless channel, all rates below capacity are achievable: for every $R < C$ there exists a sequence of $(\lceil 2^{nR} \rceil, n)$ codes whose maximal probability of error $\lambda^{(n)} \to 0$. Conversely, any sequence of codes with $\lambda^{(n)} \to 0$ must have $R \leq C$.

Proof. See annex 2 for a separate proof of the achievability and converse parts.
Which rates are achievable beyond capacity if some error can be accepted?

Theorem (Rate distortion)

For a channel of capacity $C$, transmission rates up to

$$R = \frac{C}{1 - H(P_e)}$$

can be achieved at a probability of bit error $P_e$.
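For instance (a numeric illustration with assumed values): with $C = 0.5$ bits per transmission and a tolerated bit error probability $P_e = 0.1$, rates up to roughly $0.94$ bits per transmission become achievable.

```python
from math import log2

def H2(p):
    return -p * log2(p) - (1 - p) * log2(1 - p)   # binary entropy in bits

C, Pe = 0.5, 0.1                 # assumed capacity and tolerated bit error probability
print(C / (1 - H2(Pe)))          # ≈ 0.94 bits per transmission
```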
For a discrete memoryless channel, feedback does not increase the capacity:

$$C_{FB} = C = \max_{p(x)} I(X; Y)$$
Example
Feedback cannot provide higher rates, but it helps in simplifying encoding and
decoding in practical systems.
As an example, let us use feedback in the binary erasure channel: transmit the same bit through the channel repeatedly until it is received without erasure, so that the information bit arrives correctly.
Can you compute the average number of uses of the channel it takes to
transmit an information bit through this channel?
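As a hint (an addition, not on the slide): each transmission succeeds independently with probability $1 - \alpha$, so the number of uses per bit is geometric with mean $1/(1 - \alpha)$. A minimal simulation with an assumed $\alpha$ confirms this.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, bits = 0.3, 200_000               # erasure probability (assumed) and number of bits sent

# For each information bit, the number of channel uses until the first non-erasure
uses = rng.geometric(p=1 - alpha, size=bits)
print(uses.mean(), 1 / (1 - alpha))      # both ~1.43 channel uses per information bit
```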
Conclusions
Way through...
Proof. We show that, with high probability, the sequences $(X^n, Y^n)$ of length $n$ are jointly typical. By the weak law of large numbers,

$$-\frac{1}{n} \log p(X^n) \to -E\left[ \log p(X) \right] = H(X),$$

and similarly for $-\frac{1}{n} \log p(Y^n)$ and $-\frac{1}{n} \log p(X^n, Y^n)$. Hence, given $\epsilon > 0$, there exist $n_1$, $n_2$, $n_3$ such that for all $n > n_1$, $n > n_2$, $n > n_3$, respectively,

$$\Pr\left( \left| -\frac{1}{n} \log p(X^n) - H(X) \right| \geq \epsilon \right) < \frac{\epsilon}{3},$$

$$\Pr\left( \left| -\frac{1}{n} \log p(Y^n) - H(Y) \right| \geq \epsilon \right) < \frac{\epsilon}{3},$$

$$\Pr\left( \left| -\frac{1}{n} \log p(X^n, Y^n) - H(X, Y) \right| \geq \epsilon \right) < \frac{\epsilon}{3}.$$

Then, choosing $n > \max(n_1, n_2, n_3)$ and using the union bound $\Pr(A \cup B \cup C) \leq \Pr(A) + \Pr(B) + \Pr(C)$, the probability that at least one of the three conditions fails is less than $\epsilon$. Hence, for $n$ sufficiently large, the probability of the set $A_\epsilon^{(n)}$ is greater than $1 - \epsilon$.
For the lower bound, take Theorem 5.1: if $n$ is sufficiently large,

$$1 - \epsilon < \Pr\left( A_\epsilon^{(n)} \right) \leq \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} 2^{-n(H(X,Y) - \epsilon)} = 2^{-n(H(X,Y) - \epsilon)} \left| A_\epsilon^{(n)} \right|,$$

and hence $\left| A_\epsilon^{(n)} \right| \geq (1 - \epsilon)\, 2^{n(H(X,Y) - \epsilon)}$. The set may be small; its size depends on the joint entropy of $X$ and $Y$.
Proof. Under the conditions stated in the theorem, and using Theorem 5.2,

$$\Pr\left( (\tilde{X}^n, \tilde{Y}^n) \in A_\epsilon^{(n)} \right) = \sum_{(x^n, y^n) \in A_\epsilon^{(n)}} p(x^n)\, p(y^n) \leq 2^{n(H(X,Y) + \epsilon)}\, 2^{-n(H(X) - \epsilon)}\, 2^{-n(H(Y) - \epsilon)} = 2^{-n(I(X;Y) - 3\epsilon)}$$
$$\mathcal{C} = \begin{bmatrix} x^n(1) \\ \vdots \\ x^n(2^{nR}) \end{bmatrix} = \begin{bmatrix} x_1(1) & x_2(1) & \cdots & x_n(1) \\ \vdots & \vdots & \ddots & \vdots \\ x_1(2^{nR}) & x_2(2^{nR}) & \cdots & x_n(2^{nR}) \end{bmatrix}$$
ii. The random code used is revealed to both sender and receiver, who also know the channel transition matrix $p(y \mid x)$.
vi. The receiver guesses which message was sent according to the joint typicality criterion: $\hat{w}$ is declared to have been sent if the following conditions are satisfied:
- $(x^n(\hat{w}), y^n)$ is jointly typical, and
- there is no other index $w' \neq \hat{w}$ such that $(x^n(w'), y^n) \in A_\epsilon^{(n)}$;
otherwise, an error is declared.
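A schematic implementation of this decoding rule (an illustrative sketch only: the codebook, channel, block length and $\epsilon$ below are assumed toy values, and joint typicality is tested exactly as in the definition of $A_\epsilon^{(n)}$):

```python
import numpy as np

def jointly_typical(x, y, px, pyx, eps):
    """Test the three conditions defining the jointly typical set A_eps^(n)."""
    pxy = px[:, None] * pyx
    py = pxy.sum(axis=0)
    H_X = -np.sum(px * np.log2(px))
    H_Y = -np.sum(py * np.log2(py))
    H_XY = -np.sum(pxy * np.log2(pxy))
    return (abs(-np.mean(np.log2(px[x])) - H_X) < eps and
            abs(-np.mean(np.log2(py[y])) - H_Y) < eps and
            abs(-np.mean(np.log2(pxy[x, y])) - H_XY) < eps)

def typical_set_decoder(y, codebook, px, pyx, eps):
    """Return w_hat if exactly one codeword is jointly typical with y, else None (error)."""
    hits = [w for w, xw in enumerate(codebook) if jointly_typical(xw, y, px, pyx, eps)]
    return hits[0] if len(hits) == 1 else None

# Toy usage: a random codebook over a BSC, one transmitted codeword
rng = np.random.default_rng(2)
n, M, pflip, eps = 1000, 4, 0.1, 0.08
px = np.array([0.5, 0.5])
pyx = np.array([[1 - pflip, pflip], [pflip, 1 - pflip]])
codebook = rng.integers(0, 2, size=(M, n))
w = 2
y = codebook[w] ^ (rng.random(n) < pflip).astype(int)
print(typical_set_decoder(y, codebook, px, pyx, eps))   # decodes w = 2 with high probability
```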
Example of typical-set decoding for a codebook with 4 codewords: $y_a^n$ is not jointly typical with any codeword, $y_b^n$ is jointly typical with $x^n(3)$, $y_c^n$ is jointly typical with $x^n(4)$, and $y_d^n$ is jointly typical with more than one codeword.
A key point: we can make the last term in red smaller than $\epsilon$ by increasing $n$, provided that $R < I(X; Y) - 3\epsilon$, and then $\Pr(E) \leq 2\epsilon$.
c. Throw away the worst half of the codewords (so that we can bound the maximal probability of error $\lambda_{\max}(\mathcal{C}^*)$, not only the average):

$$\Pr(E \mid \mathcal{C}^*) = \frac{1}{2} \left[ \frac{1}{2^{nR-1}} \sum_{\text{best } i} \lambda_i(\mathcal{C}^*) + \frac{1}{2^{nR-1}} \sum_{\text{worst } i} \lambda_i(\mathcal{C}^*) \right] \leq 2\epsilon$$
A typical random code (left), where a small fraction of the codewords are sufficiently
close to each other that the probability of error when either codeword is transmitted is
not tiny. We obtain a new code by deleting all these confusable codewords (right).
The resulting code has fewer codewords, so it has a lower rate, and its maximal probability of error is greatly reduced.
The number of codewords has changed to $2^{nR-1}$, and therefore the rate is

$$\frac{1}{n} \log\left( 2^{nR-1} \right) = \frac{1}{n} (nR - 1) = R - \frac{1}{n}$$

In short: we have been able to turn a noisy channel into an essentially noiseless one, as long as the transmission rate is below the capacity, just by constructing a code of rate

$$R' = R - \frac{1}{n},$$

whose maximal probability of error is $\lambda^{(n)} \leq 4\epsilon$.
$$W \to X^n(W) \to Y^n \to \hat{W}$$

If $W$ has a uniform distribution, $\Pr(\hat{W} \neq W) = P_e^{(n)} = \frac{1}{2^{nR}} \sum_i \lambda_i$, and hence

$nR = H(W)$
$\quad = H(W \mid \hat{W}) + I(W; \hat{W})$  (entropy identity)
$\quad \leq 1 + P_e^{(n)} nR + I(W; \hat{W})$  (Fano's inequality, upper bounding $H(P_e^{(n)}) < 1$)
$\quad \leq 1 + P_e^{(n)} nR + I(X^n; Y^n)$  (data processing inequality)
$\quad \leq 1 + P_e^{(n)} nR + nC$  (repeated use of the channel does not increase capacity; see annex 6)

Dividing by $n$ we obtain

$$R \leq P_e^{(n)} R + C + \frac{1}{n}$$

For any sequence of codes with $P_e^{(n)} \to 0$ (that is, for any achievable rate), letting $n \to \infty$ gives $R \leq C$.
Since we are interested in lossy transmission, let us reverse the use of the encoder and decoder, i.e. the BSC decoder is used as a lossy encoder, whose input is a sequence of $n$ symbols and whose output is a codeword of length $k < n$; its rate is $R' = n/k = 1/(1 - H(q))$.
Proof (cont.). The lossy decoder will take a sequence of $k$ symbols and convert it into a sequence of $n$ symbols. Let us concatenate them with the capacity-$C$ channel together with its own optimum encoder/decoder as follows...

The lossy encoding is a surjective mapping and the lossy decoding is a bijective mapping. Both are designed using the joint typicality principle for the BSC, so $\hat{w}$ and $w$ will differ in about $nq$ symbols, hence $P_e = q$.

Now the rate of the transmission with errors is

$$R = \frac{k}{\#\text{transmissions}} \cdot \frac{n}{k} = C \cdot R' = \frac{C}{1 - H(P_e)}$$
$$nR \leq 1 + P_e^{(n)} nR + nC,$$

so when $n \to \infty$ (and $P_e^{(n)} \to 0$), $R \leq C$.
$$\Pr\left( v^n \neq \hat{v}^n \right) \leq \Pr\left( v^n \notin A_\epsilon^{(n)} \right) + \Pr\left( g(y^n) \neq v^n \mid v^n \in A_\epsilon^{(n)} \right),$$

and both terms can be made small for large $n$. Therefore we can reconstruct the original sequence with low probability of error if $H(V) \leq C$.
Proof that repeated use of the channel does not increase capacity, i.e. $I(X^n; Y^n) \leq nC$:

$I(X^n; Y^n) = H(Y^n) - H(Y^n \mid X^n)$
$\quad = H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid Y_1, \ldots, Y_{i-1}, X^n)$  (chain rule of entropy)
$\quad = H(Y^n) - \sum_{i=1}^{n} H(Y_i \mid X_i)$  (definition of a memoryless channel)
$\quad \leq \sum_{i=1}^{n} H(Y_i) - \sum_{i=1}^{n} H(Y_i \mid X_i)$  (since $H(Y^n) \leq \sum_{i} H(Y_i)$, the independence bound on entropy)
$\quad = \sum_{i=1}^{n} I(X_i; Y_i)$  (definition of mutual information)
$\quad \leq nC$  (each $I(X_i; Y_i) \leq C$ by the definition of capacity)