Optimal Demodulation
As we saw in Chapter 4, we can send bits over a channel by choosing one of a set of waveforms to
send. For example, when sending a single 16QAM symbol, we are choosing one of 16 passband
waveforms:
$$s_{b_c,b_s}(t) = b_c\, p(t)\cos 2\pi f_c t - b_s\, p(t)\sin 2\pi f_c t$$
where bc , bs each take values in {±1, ±3}. We are thus able to transmit log2 16 = 4 bits of
information. In this chapter, we establish a framework for recovering these 4 bits when the
received waveform is a noisy version of the transmitted waveform. More generally, we consider
the fundamental problem of M-ary signaling in additive white Gaussian noise (AWGN): one of
M signals, s1 (t), ..., sM (t) is sent, and the received signal equals the transmitted signal plus white
Gaussian noise (WGN).
At the receiver, we are faced with a hypothesis testing problem: we have M possible hypotheses
about which signal was sent, and we have to make our “best” guess as to which one holds, based
on our observation of the received signal. We are interested in finding a guessing strategy, more
formally termed a decision rule, which is the “best” according to some criterion. For communi-
cations applications, we are typically interested in finding a decision rule which minimizes the
probability of error (i.e., the probability of making a wrong guess). We can now summarize the
goals of this chapter as follows.
Goals: We wish to design optimal receivers when the received signal is modeled as follows:
$$H_i: y(t) = s_i(t) + n(t), \quad i = 1, \ldots, M$$
where Hi is the ith hypothesis, corresponding to signal si (t) being transmitted, and where n(t)
is white Gaussian noise. We then wish to analyze the performance of such receivers, to see how
performance measures such as the probability of error depend on system parameters. It turns
out that, for the preceding AWGN model, the performance depends only on the received signal-
to-noise ratio (SNR) and on the “shape” of the signal constellation {s1 (t), ..., sM (t)}. Underlying
both the derivation of the optimal receiver and its analysis is a geometric view of signals and
noise as vectors, which we term signal space concepts. Once we have this background, we are in
a position to discuss elementary power-bandwidth tradeoffs. For example, 16QAM has higher
bandwidth efficiency than QPSK, so it makes sense that it has lower power efficiency; that is, it
requires higher SNR, and hence higher transmit power, for the same probability of error. We will
be able to quantify this intuition, previewed in Chapter 4, based on the material in this chapter.
We will also be able to perform link budget calculations: for example, how much transmit power
is needed to attain a given bit rate using a given constellation as a function of range, and transmit
and receive antenna gains?
Chapter Plan: The prerequisites for this chapter are Chapter 4 (digital modulation) and the
material on Gaussian random variables (Section 5.6) and noise modeling (Section 5.8) in Chap-
ter 5. We build up the remaining background required to attain our goals in this chapter in a
step-by-step fashion, as follows.
Hypothesis testing: In Section 6.1, we establish the basic framework for hypothesis testing, de-
rive the form of optimal decision rules, and illustrate the application of this framework for
finite-dimensional observations.
Signal space concepts: In Section 6.2, we show that continuous time M-ary signaling in AWGN
can be reduced to an equivalent finite-dimensional system, in which transmitted signal vectors
are corrupted by vector WGN. This is done by projecting the continuous time signal into the
finite-dimensional signal space spanned by the set of possible transmitted signals, s1 , ..., sM . We
apply the hypothesis testing framework to derive the optimal receiver for the finite-dimensional
system, and from this we infer the optimal receiver in continuous time.
Performance analysis: In Section 6.3, we analyze the performance of optimal reception. We show
that performance depends only on SNR and the relative geometry of the signal constellation. We
provide exact error probability expressions for binary signaling. While the probability of error for
larger signal constellations must typically be computed by simulation or numerical integration,
we obtain bounds and approximations, building on the analysis for binary signaling, that provide
quick insight into power-bandwidth tradeoffs.
Link budget analysis: In Section 6.5, we illustrate how performance analysis is applied to obtain-
ing the “link budget” for a typical radio link, which is the tool used to obtain coarse guidelines
for the design of hardware, including transmit power, transmit and receive antennas, and receiver
noise figure.
Software: Software Lab 6.1 in this chapter builds on Software Lab 4.1, providing a hands-on
feel for Nyquist signaling over an AWGN channel. In turn, we build on this lab in Software Lab
8.1, which adds in channel dispersion to the model.
Notational shortcut: In this chapter, we make extensive use of the notational simplification
discussed at the end of Section 5.3. Given a random variable X, a common notation for probabil-
ity density function or probability mass function is pX (x), with X denoting the random variable,
and x being a dummy variable which we might integrate out when computing probabilities.
However, when there is no scope for confusion, we use the less cumbersome (albeit incomplete)
notation p(x), using the dummy variable x not only as the argument of the density, but also
to indicate that the density corresponds to the random variable X. (Similarly, we would use
p(y) to denote the density for a random variable Y .) The same convention is used for joint and
conditional densities as well. For random variables X and Y , we use the notation p(x, y) in-
stead of pX,Y (x, y), and p(y|x) instead of pY |X (y|x), to denote the joint and conditional densities,
respectively.
6.1 Hypothesis Testing

We have already seen a hypothesis testing problem in Example 5.6.3, where we decided between two Gaussian distributions with different means and equal variances by comparing the observation against a threshold midway between the means. If, however, the noise statistics are different under the two hypotheses, then it is no longer clear that splitting the difference between the means is the right thing to do. We therefore need a systematic framework for hypothesis testing, which allows us to derive good decision rules for a variety of statistical models.
In this section, we consider the general problem of M-ary hypothesis testing, in which we must
decide which of M possible hypotheses, H0 , ..., HM −1, “best explains” an observation Y . For
our purpose, the observation Y can be a scalar or vector, and takes values in an observation
space Γ. The link between the hypotheses and observation is statistical: for each hypothesis
Hi , we know the conditional distribution of Y given Hi . We denote the conditional density
of Y given Hi as p(y|i), i = 0, 1, ..., M − 1. We may also know the prior probabilities of the
hypotheses (i.e., the probability of each hypothesis prior to seeing the observation), denoted by
πi = P[Hi], i = 0, 1, ..., M − 1, which satisfy $\sum_{i=0}^{M-1} \pi_i = 1$. The final ingredient of the hypothesis
testing framework is the decision rule: for each possible value Y = y of the observation, we must
decide which of the M hypotheses we will bet on. Denoting this guess as δ(y), the decision rule
δ(·) is a mapping from the observation space Γ to {0, 1, ..., M − 1}, where δ(y) = i means that
we guess that Hi is true when we see Y = y. The decision rule partitions the observation space
into decision regions, with Γi denoting the set of values of Y for which we guess Hi . That is,
Γi = {y ∈ Γ : δ(y) = i}, i = 0, 1, ..., M − 1. We summarize these ingredients of the hypothesis
testing framework as follows.
Ingredients of hypothesis testing framework
• Hypotheses H0 , H1 , ..., HM −1
• Observation Y ∈ Γ
• Conditional densities p(y|i), for i = 0, 1, ..., M − 1
• Prior probabilities πi = P[Hi], i = 0, 1, ..., M − 1, with $\sum_{i=0}^{M-1} \pi_i = 1$
• Decision rule δ : Γ → {0, 1, ..., M − 1}
• Decision regions Γi = {y ∈ Γ : δ(y) = i}, i = 0, 1, ..., M − 1
To make the concepts concrete, let us quickly recall Example 5.6.3, where we have M = 2
hypotheses, with H0 : Y ∼ N(0, v 2 ) and H1 : Y ∼ N(m, v 2 ). The “sensible” decision rule in this
example can be written as
$$\delta(y) = \begin{cases} 0, & y \le m/2 \\ 1, & y > m/2 \end{cases}$$
so that Γ0 = (−∞, m/2] and Γ1 = (m/2, ∞). Note that this decision rule need not be optimal
if we know the prior probabilities. For example, if we know that π0 = 1, we should say that H0 is true, regardless of the value of Y: this would reduce the probability of error from $Q\left(\frac{m}{2v}\right)$ (for the “sensible” rule) to zero!
Conditional Error Probabilities: The conditional error probability, conditioned on Hi, is the probability of guessing wrong when Hi is true: $P_{e|i} = P[Y \notin \Gamma_i | H_i]$.
Conditional Probabilities of Correct Decision: These are defined as
$$P_{c|i} = 1 - P_{e|i} = P[Y \in \Gamma_i | H_i] \tag{6.2}$$
Average Error Probability: This is given by averaging the conditional error probabilities
using the priors:
$$P_e = \sum_{i=0}^{M-1} \pi_i P_{e|i} \tag{6.3}$$
Properties of the MAP rule:
• The MAP rule reduces to the ML rule for equal priors.
• The MAP rule minimizes the probability of error. In other words, it is also the Minimum
Probability of Error (MPE) rule.
The first property follows from (6.6) by setting πi ≡ 1/M: in this case πi does not depend on i
and can therefore be dropped when maximizing over i. The second property is important enough
to restate and prove as a theorem.
Theorem 6.1.1 The MAP rule (6.6) minimizes the probability of error.
Proof of Theorem 6.1.1: We show that the MAP rule maximizes the probability of correct
decision. To do this, consider an arbitrary decision rule δ, with corresponding decision regions
{Γi}. The conditional probabilities of correct decision are given by
$$P_{c|i} = P[Y \in \Gamma_i | H_i] = \int_{\Gamma_i} p(y|i)\,dy, \quad i = 0, 1, ..., M-1$$
so that the average probability of correct decision is
$$P_c = \sum_{i=0}^{M-1} \pi_i P_{c|i} = \sum_{i=0}^{M-1} \int_{\Gamma_i} \pi_i\, p(y|i)\,dy$$
Any point y ∈ Γ can belong to exactly one of the M decision regions. If we decide to put it in Γi, then the point contributes the term πi p(y|i) to the integrand. Since we wish to maximize the overall integral, we choose to put y in the decision region for which it makes the largest contribution to the integrand. Thus, we put it in Γi so as to maximize πi p(y|i), which is precisely the MAP rule (6.6).
[Figure: the densities p(y|0) and p(y|1) plotted versus y, crossing at the ML threshold y ≈ 1.85.]
Figure 6.1: Hypothesis testing with exponentially distributed observations.
Example 6.1.1 (Hypothesis testing with exponentially distributed observations): A
binary hypothesis problem is specified as follows:
$$H_0: Y \sim \mathrm{Exp}(1), \qquad H_1: Y \sim \mathrm{Exp}(1/4)$$
where Exp(µ) denotes an exponential distribution with density $\mu e^{-\mu y}$, CDF $1 - e^{-\mu y}$, and complementary CDF $e^{-\mu y}$, for y ≥ 0 (all the probability mass falls on the nonnegative numbers).
Note that the mean of an Exp(µ) random variable is 1/µ. Thus, in our case, the mean under H0
is 1, while the mean under H1 is 4.
(a) Find the ML rule and the corresponding conditional error probabilities.
(b) Find the MPE rule when the prior probability of H1 is 1/5. Also find the conditional and
average error probabilities.
Solution:
(a) The ML rule compares p(y|1) = (1/4)e^{−y/4} against p(y|0) = e^{−y}; taking logarithms and simplifying, we obtain (as shown in Figure 6.1)
$$y \;\gtrless_{H_0}^{H_1}\; \frac{4}{3}\log 4 = 1.8484$$
The corresponding conditional error probabilities are
$$P_{e|0} = P[Y > 1.8484 | H_0] = e^{-1.8484} \approx 0.157, \qquad P_{e|1} = P[Y \le 1.8484 | H_1] = 1 - e^{-1.8484/4} \approx 0.370$$
(b) With π1 = 1/5 and π0 = 4/5, the MPE rule (6.6) compares π1 p(y|1) against π0 p(y|0), which reduces to
$$\frac{1}{5}\cdot\frac{1}{4}e^{-y/4} \;\gtrless_{H_0}^{H_1}\; \frac{4}{5}\,e^{-y}$$
This gives
$$y \;\gtrless_{H_0}^{H_1}\; \frac{4}{3}\log 16 = 3.6968$$
Proceeding as in (a), we obtain
$$P_{e|0} = e^{-3.6968} \approx 0.025, \qquad P_{e|1} = 1 - e^{-3.6968/4} \approx 0.603$$
so that the average error probability is $P_e = \frac{4}{5}P_{e|0} + \frac{1}{5}P_{e|1} \approx 0.14$.
Since the prior probability of H1 is small, the MPE rule is biased towards guessing that H0 is
true. In this case, the decision rule is so skewed that the conditional probability of error under
H1 is actually worse than a random guess. Taking this one step further, if the prior probability
of H1 actually becomes zero, then the MPE rule would always guess that H0 is true. In this case,
the conditional probability of error under H1 would be one! This shows that we must be careful
about modeling when applying the MAP rule: if we are wrong about our prior probabilities, and
H1 does occur with nonzero probability, then our performance would be quite poor.
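The closed-form expressions above are easy to check numerically. The following MATLAB fragment, written in the style of the simulation code later in this chapter, evaluates the thresholds and error probabilities of Example 6.1.1; it is a quick sketch for sanity-checking, and the variable names are ours.

t_ml = (4/3)*log(4); %ML threshold
t_mpe = (4/3)*log(16); %MPE threshold for priors (4/5, 1/5)
pe0 = @(t) exp(-t); %P[Y > t | H0], Y ~ Exp(1)
pe1 = @(t) 1 - exp(-t/4); %P[Y <= t | H1], Y ~ Exp(1/4)
[pe0(t_ml) pe1(t_ml)] %conditional error probabilities for the ML rule
pe_mpe = (4/5)*pe0(t_mpe) + (1/5)*pe1(t_mpe) %average error probability, MPE rule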
Both the ML and MAP rules involve comparison of densities, and it is convenient to express
them in terms of a ratio of densities, or likelihood ratio, as discussed next.
Binary hypothesis testing and the likelihood ratio: For binary hypothesis testing, the ML rule (6.5) reduces to
$$p(y|1) \;\gtrless_{H_0}^{H_1}\; p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \;\gtrless_{H_0}^{H_1}\; 1 \tag{6.7}$$
The ratio of conditional densities appearing above is defined to be the likelihood ratio (LR) L(y), a function of fundamental importance in hypothesis testing. Formally, we define the likelihood ratio as
$$L(y) = \frac{p(y|1)}{p(y|0)}, \quad y \in \Gamma \tag{6.8}$$
Likelihood ratio test: A likelihood ratio test (LRT) is a decision rule in which we compare the likelihood ratio to a threshold:
$$L(y) \;\gtrless_{H_0}^{H_1}\; \gamma$$
where the choice of γ depends on our performance criterion. An equivalent form is the log
likelihood ratio test (LLRT), where the log of the likelihood ratio is compared with a threshold.
We have already shown in (6.7) that the ML rule is an LRT with threshold γ = 1. From (6.6), we see that the MAP, or MPE, rule is also an LRT:
$$\pi_1\, p(y|1) \;\gtrless_{H_0}^{H_1}\; \pi_0\, p(y|0), \quad \text{or} \quad \frac{p(y|1)}{p(y|0)} \;\gtrless_{H_0}^{H_1}\; \frac{\pi_0}{\pi_1}$$
We can summarize both rules in terms of the likelihood ratio:
$$L(y) \;\gtrless_{H_0}^{H_1}\; 1 \quad \text{or} \quad \log L(y) \;\gtrless_{H_0}^{H_1}\; 0 \qquad \text{ML rule} \tag{6.9}$$
$$L(y) \;\gtrless_{H_0}^{H_1}\; \frac{\pi_0}{\pi_1} \quad \text{or} \quad \log L(y) \;\gtrless_{H_0}^{H_1}\; \log\frac{\pi_0}{\pi_1} \qquad \text{MAP/MPE rule} \tag{6.10}$$
We now specialize further to the setting of Example 5.6.3. The conditional densities are as shown
in Figure 6.2. Since this example is fundamental to our understanding of signaling in AWGN,
let us give it a name, the basic Gaussian example, and summarize the set-up in the language of
hypothesis testing.
[Figure 6.2: The conditional densities p(y|0) and p(y|1), Gaussians centered at 0 and m, with the ML threshold at m/2.]
$$H_0: Y \sim N(0, v^2), \quad H_1: Y \sim N(m, v^2), \text{ or}$$
$$p(y|0) = \frac{\exp\left(-\frac{y^2}{2v^2}\right)}{\sqrt{2\pi v^2}}\,; \qquad p(y|1) = \frac{\exp\left(-\frac{(y-m)^2}{2v^2}\right)}{\sqrt{2\pi v^2}} \tag{6.11}$$
Likelihood ratio for basic Gaussian example: Substituting (6.11) into (6.8) and simplifying (this is left as an exercise), we obtain that the likelihood ratio for the basic Gaussian example is
$$L(y) = \exp\left(\frac{1}{v^2}\left(my - \frac{m^2}{2}\right)\right), \qquad \log L(y) = \frac{1}{v^2}\left(my - \frac{m^2}{2}\right) \tag{6.12}$$
ML and MAP rules for basic Gaussian example: Using (6.12) in (6.9), we leave it as an exercise to check that the ML rule reduces to
$$Y \;\gtrless_{H_0}^{H_1}\; m/2 \qquad \text{ML rule } (m > 0) \tag{6.13}$$
(check that the inequalities get reversed for m < 0). This is exactly the “sensible” rule that we analyzed in Example 5.6.3. Using (6.12) in (6.10), we obtain the MAP rule:
$$Y \;\gtrless_{H_0}^{H_1}\; \frac{m}{2} + \frac{v^2}{m}\log\frac{\pi_0}{\pi_1} \qquad \text{MAP rule } (m > 0) \tag{6.14}$$
Example 6.1.2 (ML versus MAP for the basic Gaussian example): For the basic Gaus-
sian example, we now know that the decision rule in Example 5.6.3 is the ML rule, and we
showed in that example that the performance of this rule is given by
$$P_{e|0} = P_{e|1} = P_e = Q\left(\frac{m}{2v}\right) = Q\left(\sqrt{\mathrm{SNR}/2}\right)$$
We also saw that at 13 dB SNR, the error probability for the ML rule is
$$P_{e,ML} = 7.8 \times 10^{-4}$$
regardless of the prior probabilities. For equal priors, the ML rule is also MPE, and we cannot
hope to do better than this. Let us now see what happens when the prior probability of H0 is π0 = 1/3. The ML rule is no longer MPE, and we should be able to do better by using the MAP rule. We leave it as an exercise to show that the conditional error probabilities for the MAP rule are given by
$$P_{e|0} = Q\left(\frac{m}{2v} + \frac{v}{m}\log\frac{\pi_0}{\pi_1}\right), \qquad P_{e|1} = Q\left(\frac{m}{2v} - \frac{v}{m}\log\frac{\pi_0}{\pi_1}\right) \tag{6.15}$$
Plugging in the numbers for SNR of 13 dB and π0 = 1/3 (so that π1 = 2/3), we obtain $P_{e|0} \approx 1.1 \times 10^{-3}$, $P_{e|1} \approx 5.4 \times 10^{-4}$, and $P_{e,MAP} = \pi_0 P_{e|0} + \pi_1 P_{e|1} \approx 7.4 \times 10^{-4}$, a modest improvement over the ML rule.
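These numbers are easily reproduced in MATLAB. The fragment below is a sketch of the computation; it assumes, as in Example 5.6.3, that SNR = m²/(2v²), so that m/(2v) = √(SNR/2).

q = @(x) 0.5*erfc(x/sqrt(2)); %Q function in terms of erfc
snr = 10^(13/10); m2v = sqrt(snr/2); %13 dB SNR; m2v stands for m/(2v)
pi0 = 1/3; pi1 = 2/3;
bias = log(pi0/pi1)/(2*m2v); %(v/m) log(pi0/pi1)
pe_ml = q(m2v) %ML error probability, approximately 7.8e-4
pe0 = q(m2v + bias); pe1 = q(m2v - bias); %conditional error probabilities (6.15)
pe_map = pi0*pe0 + pi1*pe1 %MAP error probability, approximately 7.4e-4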
[Figure: two panels of error probabilities on a log scale. (a) Dependence on SNR (dB) for π0 = 0.3; (b) dependence on the prior probability of H0 for SNR = 10 dB. Curves shown: P(error) (ML), P(error) (MAP), P(error|0) (MAP), P(error|1) (MAP).]
Figure 6.3: Conditional and average error probabilities for the MAP receiver compared to the
error probability for the ML receiver. We consider the basic Gaussian example, fixing the priors
and varying SNR in (a), and fixing SNR and varying the priors in (b). For the MAP rule,
the conditional error probability given a hypothesis increases as the prior probability of the
hypothesis decreases. The average error probability for the MAP rule is always smaller than that of the ML rule (which is the MAP rule for equal priors) when π0 ≠ 1/2. The MAP error probability tends towards zero as π0 → 0 or π0 → 1.
Soft decisions: The rules we have discussed so far produce hard decisions: a single guess for which hypothesis is true. But the observation typically tells us more than that: in the basic Gaussian example, we would guess H1 whether the observation is barely above the threshold m/2 or is far above it, but we would be a lot more confident about our guess in the latter instance. Rather than
throwing away this information, we can employ soft decisions that convey reliability information
which could be used at a higher layer, for example, by a decoder which is processing a codeword
consisting of many bits.
Actually, we already know how to compute soft decisions: the posterior probabilities P [Hi |Y = y],
i = 0, 1, ..., M − 1, that appear in the MAP rule are actually the most information that we can
hope to get about the hypotheses from the observation. For notational compactness, let us
denote these by πi (y). The posterior probabilities can be computed using Bayes’ rule as follows:
$$\pi_i(y) = P[H_i|Y = y] = \frac{\pi_i\, p(y|i)}{p(y)} = \frac{\pi_i\, p(y|i)}{\sum_{j=0}^{M-1} \pi_j\, p(y|j)} \tag{6.16}$$
In practice, we may settle for quantized soft decisions which convey less information than the
posterior probabilities due to tradeoffs in precision or complexity versus performance.
Example 6.1.3 (Soft decisions for 4PAM in AWGN): Consider a 4-ary hypothesis testing problem modeled as follows:
$$H_i: Y \sim N(m_i, \sigma^2), \quad i = 0, 1, 2, 3, \quad \text{where } (m_0, m_1, m_2, m_3) = (-3A, -A, A, 3A)$$
This is a model that arises for 4PAM signaling in AWGN, as we see later. For σ² = 1, A = 1 and Y = −1.5, find the posterior probabilities if π0 = 0.4 and π1 = π2 = π3 = 0.2.
Solution: The posterior probability for the ith hypothesis is of the form
$$\pi_i(y) = c\,\pi_i\, e^{-\frac{(y - m_i)^2}{2\sigma^2}}$$
where mi ∈ {±A, ±3A} is the conditional mean under Hi , and where c is a constant that does
not depend on i. Since the posterior probabilities must sum to one, we have
$$\sum_{j=0}^{3} \pi_j(y) = c \sum_{j=0}^{3} \pi_j\, e^{-\frac{(y - m_j)^2}{2\sigma^2}} = 1$$
which determines c. Plugging in y = −1.5, A = 1 and σ² = 1, we obtain
$$\pi_0(-1.5) \approx 0.41, \quad \pi_1(-1.5) \approx 0.56, \quad \pi_2(-1.5) \approx 0.028, \quad \pi_3(-1.5) \approx 2.5 \times 10^{-5}$$
The MPE hard decision in this case is δM P E (−1.5) = 1, but note that the posterior probability
for H0 is also quite high, which is information which would have been thrown away if only
hard decisions were reported. However, if the noise strength is reduced, then the hard decision
becomes more reliable. For example, for σ 2 = 0.1, we obtain
$$\pi_0(-1.5) = 9.08 \times 10^{-5}, \quad \pi_1(-1.5) = 0.9999, \quad \pi_2(-1.5) = 9.36 \times 10^{-14}, \quad \pi_3(-1.5) = 3.72 \times 10^{-44}$$
where it is not wise to trust some of the smaller numbers. Thus, we can be quite confident about
the hard decision from the MPE rule in this case.
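The posterior computation is a one-line application of Bayes' rule, as in the following MATLAB sketch (the variable names are ours, not from the example):

A = 1; sigma2 = 1; y = -1.5; %try sigma2 = 0.1 to see a more confident decision
m = [-3 -1 1 3]*A; %conditional means under H0,...,H3
priors = [0.4 0.2 0.2 0.2];
weights = priors.*exp(-(y - m).^2/(2*sigma2)); %unnormalized posteriors
posteriors = weights/sum(weights) %normalize so that they sum to one
[~,imax] = max(weights); mpe_decision = imax - 1 %MPE hard decision (zero-based index)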
For binary hypothesis testing, it suffices to output one of the two posterior probabilities, since they sum to one. However, it is often more convenient to output the log of the ratio of the posteriors, termed the log likelihood ratio (LLR):
$$\mathrm{LLR}(y) = \log\frac{\pi_1(y)}{\pi_0(y)} = \log\frac{\pi_1}{\pi_0} + \log\frac{p(y|1)}{p(y|0)} \tag{6.17}$$
Notice how the information from the priors and the information from the observations, each of
which also takes the form of an LLR, add up in the overall LLR. This simple additive combining of
information is exploited in sophisticated decoding algorithms in which information from one part
of the decoder provides priors for another part of the decoder. Note that the LLR contribution
due to the priors is zero for equal priors.
Example 6.1.4 (LLRs for binary antipodal signaling): Consider H1 : Y ∼ N(A, σ 2 ) versus
H0 : Y ∼ N(−A, σ 2 ). We shall see later how this model arises for binary antipodal signaling in
AWGN. We leave it as an exercise to show that the LLR is given by
$$\mathrm{LLR}(y) = \frac{2Ay}{\sigma^2}$$
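As a quick numerical check of this formula (our own sketch, assuming equal priors so that the prior term vanishes):

A = 1; sigma2 = 0.25; y = 0.3;
g = @(y,m) exp(-(y - m).^2/(2*sigma2))/sqrt(2*pi*sigma2); %N(m, sigma2) density
llr_direct = log(g(y,A)/g(y,-A)) %directly from the densities
llr_formula = 2*A*y/sigma2 %closed form: 2*1*0.3/0.25 = 2.4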
6.2 Signal Space Concepts
We have seen in the previous section that the statistical relationship between the hypotheses {Hi} and the observation Y is expressed in terms of the conditional densities p(y|i). We are now interested in applying this framework to derive optimal decision rules (and the receiver structures required to implement them) for the problem of M-ary signaling in AWGN. In the language of
hypothesis testing, the observation here is the received signal y(t) modeled as follows:
Hi : y(t) = si (t) + n(t), i = 0, 1, ..., M − 1 (6.18)
where si (t) is the transmitted signal corresponding to hypothesis Hi , and n(t) is WGN with PSD
σ 2 = N0 /2. Before we can apply the framework of the previous section, however, we must figure
out how to define conditional densities when the observation is a continuous-time signal. Here
is how we do it:
• We first observe that, while the signals si (t) live in an infinite-dimensional, continuous-time
space, if we are only interested in the M signals that could be transmitted under each of the M
hypotheses, then we can limit attention to a finite-dimensional subspace of dimension at most
M. We call this the signal space. We can then express the signals as vectors corresponding to
an expansion with respect to an orthonormal basis for the subspace.
• The projection of WGN onto the signal space gives us a noise vector whose components are
i.i.d. Gaussian. Furthermore, we observe that the component of the received signal orthogonal to
the signal space is irrelevant: that is, we can throw it away without compromising performance.
• We can therefore restrict attention to projection of the received signal onto the signal space
without loss of performance. This projection can be expressed as a finite-dimensional vector
which is modeled as a discrete time analogue of (6.18). We can now apply the hypothesis testing
framework of Section 6.1 to infer the optimal (ML and MPE) decision rules.
• We then translate the optimal decision rules back to continuous time to infer the structure of
the optimal receiver.
Figure 6.4: For linear modulation with no intersymbol interference, the complex symbols them-
selves provide a two-dimensional signal space representation. Three different constellations are
shown here.
Example 6.2.1 (Signal space for two-dimensional modulation): Consider a single complex-
valued symbol b = bc + jbs (assume that there is no intersymbol interference) sent using two-
dimensional passband linear modulation. The set of possible transmitted signals are given by
$$s_{b_c,b_s}(t) = b_c\, p(t)\cos 2\pi f_c t - b_s\, p(t)\sin 2\pi f_c t$$
where (bc , bs ) takes M possible values for an M-ary constellation (e.g., M = 4 for QPSK, M = 16
for 16QAM), and where p(t) is a baseband pulse of bandwidth smaller than the carrier frequency
fc. Setting φc(t) = p(t) cos 2πfc t and φs(t) = −p(t) sin 2πfc t, we see that we can write the set of transmitted signals as a linear combination of these signals as follows:
$$s_{b_c,b_s}(t) = b_c\,\phi_c(t) + b_s\,\phi_s(t)$$
so that the signal space has dimension at most 2. From Chapter 2, we know that φc and φs
are orthogonal (I-Q orthogonality), and hence linearly independent. Thus, the signal space has
dimension exactly 2. Noting that $\|\phi_c\|^2 = \|\phi_s\|^2 = \frac{1}{2}\|p\|^2$, the normalized versions of φc and φs provide an orthonormal basis for the signal space:
$$\psi_c(t) = \frac{\phi_c(t)}{\|\phi_c\|}, \qquad \psi_s(t) = \frac{\phi_s(t)}{\|\phi_s\|}$$
We can now write
$$s_{b_c,b_s}(t) = \frac{1}{\sqrt{2}}\|p\|\, b_c\, \psi_c(t) + \frac{1}{\sqrt{2}}\|p\|\, b_s\, \psi_s(t)$$
With respect to this basis, the signals can be represented as two-dimensional vectors:
$$s_{b_c,b_s}(t) \leftrightarrow \mathbf{s}_{b_c,b_s} = \frac{\|p\|}{\sqrt{2}} \begin{pmatrix} b_c \\ b_s \end{pmatrix}$$
That is, up to scaling, the signal space representations of the transmitted signals are simply the two-dimensional symbols (bc, bs)T. Indeed, while we have been careful about keeping track of
the scaling factor in this example, we shall drop it henceforth, because, as we shall soon see,
what matters in performance is the signal-to-noise ratio, rather than the absolute signal or noise
strength.
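The I-Q orthogonality underlying this example is easy to verify numerically. The following MATLAB sketch checks that ψc and ψs are orthonormal for a rectangular pulse (our choice, made for illustration; any baseband pulse with bandwidth well below fc behaves similarly):

dt = 1e-4; T = 1; fc = 10; t = dt/2:dt:T; %time grid; fc is an integer multiple of 1/T here
p = ones(size(t)); %rectangular baseband pulse
phic = p.*cos(2*pi*fc*t); phis = -p.*sin(2*pi*fc*t);
psic = phic/sqrt(sum(phic.^2)*dt); psis = phis/sqrt(sum(phis.^2)*dt); %normalize to unit energy
[sum(psic.^2)*dt, sum(psis.^2)*dt, sum(psic.*psis)*dt] %expect approximately 1, 1, 0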
Orthogonal modulation provides another example where an orthonormal basis for the signal space is immediately obvious. For example, if s1, ..., sM are orthogonal signals with equal energy $\|s_i\|^2 \equiv E_s$, then $\psi_i(t) = \frac{s_i(t)}{\sqrt{E_s}}$ provide an orthonormal basis for the signal space, and the vector representation of the ith signal is the scaled unit vector $\sqrt{E_s}\,(0, ..., 0, 1 \text{ (in } i\text{th position)}, 0, ..., 0)^T$.
Yet another example where an orthonormal basis can be determined by inspection is shown in Figures 6.5 and 6.6, and discussed in Example 6.2.2.
[Figure 6.5: A 4-ary signal set s0(t), s1(t), s2(t), s3(t), consisting of piecewise constant waveforms on the interval [0, 3].]
Figure 6.6: An orthonormal basis for the signal set in Figure 6.5, obtained by inspection.
Example 6.2.2 (Developing a signal space representation for a 4-ary signal set): Con-
sider the example depicted in Figure 6.5, where there are 4 possible transmitted signals, s0 , ..., s3 .
It is clear from inspection that these span a three-dimensional signal space, with a convenient
choice of basis signals
ψ0 (t) = I[0,1] (t), ψ1 (t) = I[1,2] (t), ψ2 (t) = I[2,3] (t)
as shown in Figure 6.6. Let $\mathbf{s}_i = (s_i[0], s_i[1], s_i[2])^T$ denote the vector representation of the signal si with respect to the basis, for i = 0, 1, 2, 3. That is, the coefficients of the vector $\mathbf{s}_i$ are such that
$$s_i(t) = \sum_{k=0}^{2} s_i[k]\,\psi_k(t)$$
Now that we have seen some examples, it is time to be more precise about what we mean
by the “signal space.” The signal space S is the finite-dimensional subspace (of dimension
n ≤ M) spanned by s0 (t), ..., sM −1 (t). That is, S consists of all signals of the form a0 s0 (t) +
... + aM −1 sM −1 (t), where a0 , ..., aM −1 are arbitrary scalars. Let ψ0 (t), ..., ψn−1 (t) denote an or-
thonormal basis for S. We have seen in the preceding examples that such a basis can often be
determined by inspection. In general, however, given an arbitrary set of signals, we can always
construct an orthonormal basis using the Gram-Schmidt procedure described below. We do not
need to use this procedure often–in most settings of interest, the way to go from continuous to
discrete time is clear–but state it below for completeness.
Gram-Schmidt orthogonalization: The idea is to build up an orthonormal basis step by
step, with the basis after the mth step spanning the first m signals. The first basis function is
a scaled version of the first signal (assuming this is nonzero–otherwise we proceed to the second
signal without adding a basis function). We then consider the component of the second signal
orthogonal to the first basis function. This projection is nonzero if the second signal is linearly
independent of the first; in this case, we introduce a basis function that is a scaled version of
the projection. See Figure 6.7. This procedure goes on until we have covered all M signals. The
number of basis functions n equals the dimension of the signal space, and satisfies n ≤ M. We
can summarize the procedure as follows.
Letting Sk−1 denote the subspace spanned by s0 , ..., sk−1 , the Gram-Schmidt algorithm proceeds
iteratively: given an orthonormal basis for Sk−1 , it finds an orthonormal basis for Sk . The
procedure stops when k = M. The method is identical to that used for finite-dimensional
vectors, except that the definition of the inner product involves an integral, rather than a sum,
for the continuous-time signals considered here.
[Figure 6.7: One step of Gram-Schmidt orthogonalization: ψ0(t) = φ0(t)/||φ0|| with φ0 = s0, and ψ1(t) = φ1(t)/||φ1||, where φ1(t) is the component of s1(t) orthogonal to ψ0.]
The signal φk(t) is the component of sk(t) orthogonal to the subspace Sk−1. If φk ≠ 0, define a new basis function $\psi_m(t) = \frac{\phi_k(t)}{\|\phi_k\|}$, and update the basis as Bk = {ψ0, ..., ψm−1, ψm}. If φk = 0, then sk ∈ Sk−1, and it is not necessary to update the basis; in this case, we set Bk = Bk−1 = {ψ0, ..., ψm−1}.
The procedure terminates at step M, which yields a basis B = {ψ0 , ..., ψn−1 } for the signal space
S = SM −1 . The basis is not unique, and may depend (and typically does depend) on the order in
which we go through the signals in the set. We use the Gram-Schmidt procedure here mainly as
a conceptual tool, in assuring us that there is indeed a finite-dimensional vector representation
for a finite set of continuous-time signals.
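For signals represented by samples, the procedure takes only a few lines of MATLAB. The following function is a sketch (the name and interface are ours): each row of S holds one signal sampled with spacing dt, inner products are approximated by sums, and the rows of Psi are the resulting orthonormal basis functions.

function Psi = gram_schmidt(S, dt)
%GRAM_SCHMIDT Orthonormal basis for the row space of S (sampled signals).
tol = 1e-9; Psi = [];
for k = 1:size(S,1)
    phi = S(k,:); %start with the k-th signal
    for m = 1:size(Psi,1) %subtract projections onto the current basis
        phi = phi - (sum(phi.*Psi(m,:))*dt)*Psi(m,:);
    end
    E = sum(phi.^2)*dt; %energy of the orthogonal component
    if E > tol %add a new basis function only if the component is nonzero
        Psi = [Psi; phi/sqrt(E)];
    end
end

Applied to sampled versions of the four signals in Figure 6.5 (in the order s0, s1, s2, s3), this yields a basis of the form shown in Figure 6.8.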
Figure 6.8: An orthonormal basis for the signal set in Figure 6.5, obtained by applying the
Gram-Schmidt procedure. The unknowns A, B, and C are to be determined in Exercise 6.2.1.
Exercise 6.2.1 asks you to fill in the missing numbers. While the basis thus obtained is not as “nice” as the one obtained by inspection in Figure 6.6, the Gram-Schmidt procedure has the advantage of general applicability.
Inner products are preserved: We shall soon see that the performance of M-ary signaling
in AWGN depends only on the inner products between the signals, if the noise PSD is fixed.
Thus, an important observation when mapping the continuous time hypothesis testing problem
to discrete time is to check that these inner products are preserved when projecting onto the
signal space. Consider the continuous time inner products
$$\langle s_i, s_j \rangle = \int s_i(t)\,s_j(t)\,dt = \sum_{k=0}^{n-1} s_i[k]\,s_j[k] = \langle \mathbf{s}_i, \mathbf{s}_j \rangle, \quad i, j = 0, 1, ..., M-1 \tag{6.19}$$
where the extreme right-hand side is the inner product of the signal vectors $\mathbf{s}_i = (s_i[0], ..., s_i[n-1])^T$ and $\mathbf{s}_j = (s_j[0], ..., s_j[n-1])^T$. This makes sense: the geometric relationship between signals
(which is what the inner products capture) should not depend on the basis with respect to which
they are expressed.
Then we can write the noise n(t) as follows:
$$n(t) = \sum_{i=0}^{n-1} N_i\, \psi_i(t) + n^{\perp}(t)$$
where n⊥ (t) is the projection of the noise orthogonal to the signal space. Thus, we can decom-
pose the noise into two parts: a noise vector N = (N0 , ..., Nn−1 )T corresponding to the projection
onto the signal space, and a component n⊥ (t) orthogonal to the signal space. In order to charac-
terize the statistics of these quantities, we need to consider random variables obtained by linear
processing of WGN. Specifically, consider random variables generated by passing WGN through
correlators:
$$Z_1 = \int_{-\infty}^{\infty} n(t)\,u_1(t)\,dt = \langle n, u_1 \rangle, \qquad Z_2 = \int_{-\infty}^{\infty} n(t)\,u_2(t)\,dt = \langle n, u_2 \rangle$$
where u1 and u2 are deterministic, finite energy signals. We can now state the following result.
Theorem 6.2.1 (WGN through correlators): The random variables Z1 = ⟨n, u1⟩ and Z2 = ⟨n, u2⟩ are zero mean, jointly Gaussian, with
$$\mathrm{cov}(Z_1, Z_2) = \sigma^2 \langle u_1, u_2 \rangle$$
In particular, $\mathrm{var}(Z_i) = \sigma^2\|u_i\|^2$ for i = 1, 2.
Proof of Theorem 6.2.1: The random variables Z1 = ⟨n, u1⟩ and Z2 = ⟨n, u2⟩ are zero mean and jointly Gaussian, since n is zero mean and Gaussian. Their covariance is computed as
$$\begin{aligned} \mathrm{cov}(\langle n, u_1\rangle, \langle n, u_2\rangle) &= E[\langle n, u_1\rangle\langle n, u_2\rangle] = E\left[\int n(t)u_1(t)\,dt \int n(s)u_2(s)\,ds\right] \\ &= \iint u_1(t)u_2(s)\,E[n(t)n(s)]\,dt\,ds = \iint u_1(t)u_2(s)\,\sigma^2\delta(t-s)\,dt\,ds \\ &= \sigma^2\int u_1(t)u_2(t)\,dt = \sigma^2\langle u_1, u_2\rangle \end{aligned}$$
The preceding computation is entirely analogous to the ones we did in Example 5.8.2 and in
Section 5.10, but it is important enough that we repeat some points that we had mentioned
then. First, we need to use two different variables of integration, t and s, in order to make sure
we capture all the cross terms. Second, when we take the expectation inside the integrals, we
must group all random terms inside it. Third, the two integrals collapse into one because the
autocorrelation function of WGN is impulsive. Finally, specializing the covariance to get the
variance leads to the remaining results stated in the theorem.
We can now provide the following geometric interpretation of WGN.
Remark 6.2.1 (Geometric interpretation of WGN): Theorem 6.2.1 implies that the pro-
jection of WGN along any “direction” in the space of signals (i.e., the result of correlating WGN
with a unit energy signal) has variance σ 2 = N0 /2. Also, its projections in orthogonal directions
are jointly Gaussian and uncorrelated random variables, and are therefore independent.
Noise projection on the signal space is discrete time WGN: It follows from the preceding remark that the noise projections Ni = ⟨n, ψi⟩ along the orthonormal basis functions {ψi} for the signal space are i.i.d. N(0, σ²) random variables. In other words, the noise vector N = (N0, ..., Nn−1)T ∼ N(0, σ²I); that is, the components of N constitute discrete time white Gaussian noise (“white” in this case means uncorrelated and having equal variance across all components).
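This geometric picture is easy to check by simulation. The MATLAB sketch below (our own illustration) approximates WGN with PSD σ² by i.i.d. samples of variance σ²/dt, projects many noise realizations onto the basis of Figure 6.6, and verifies that the resulting vectors have covariance close to σ²I:

dt = 1e-3; t = dt/2:dt:3; %time grid on [0, 3]
sigma2 = 0.5; ntrials = 10000; %noise PSD sigma^2 = N0/2; number of realizations
psi = double([t<=1; (t>1)&(t<=2); t>2]); %orthonormal basis of Figure 6.6
noise = sqrt(sigma2/dt)*randn(ntrials, length(t)); %discrete-time stand-in for WGN
N = noise*psi'*dt; %each row is (N0, N1, N2), the projections of one realization
cov(N) %should be close to sigma2*eye(3)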
Figure 6.9: Illustration of signal space concepts. The noise projection n⊥ (t) orthogonal to the
signal space is irrelevant. The relevant part of the received signal is the projection onto the signal
space, which equals the vector Y = si + N under hypothesis Hi .
Now that we have the signal and noise models, we can put them together in our hypothesis
testing framework. Let us condition on hypothesis Hi . The received signal is given by
y(t) = si (t) + n(t) (6.22)
Projecting this onto the signal space by correlating against the orthonormal basis functions, we
get
$$Y[k] = \langle y, \psi_k \rangle = \langle s_i + n, \psi_k \rangle = s_i[k] + N[k], \quad k = 0, 1, ..., n-1$$
Collecting these into an n-dimensional vector, we get the model
Hi : Y = si + N
Note that the vector Y = (Y[0], ..., Y[n−1])T completely describes the component of the received signal y(t) in the signal space, given by
$$y_S(t) = \sum_{j=0}^{n-1} \langle y, \psi_j \rangle\,\psi_j(t) = \sum_{j=0}^{n-1} Y[j]\,\psi_j(t)$$
The component of the received signal orthogonal to the signal space is given by
y ⊥ (t) = y(t) − yS (t)
It is shown in Appendix 6.A that this component is irrelevant to our decision. There are two
reasons for this, as elaborated in the appendix: first, there is no signal contribution orthogonal
to the signal space (by definition); second, for the WGN model, the noise component orthogonal
to the signal space carries no information regarding the noise vector in the signal space. As illus-
trated in Figure 6.9, this enables us to reduce our infinite-dimensional problem to the following
finite-dimensional vector model, without loss of optimality.
Model for received vector in signal space
Hi : Y = si + N , i = 0, 1, ..., M − 1 (6.23)
Figure 6.10: A signal space view of QPSK. In the scenario shown, s0 is the transmitted vector,
and Y = s0 + N is the received vector after noise is added. The noise components Nc , Ns are
i.i.d. N(0, σ 2 ) random variables.
Two-dimensional modulation (Example 6.2.1 revisited): For a single symbol sent using
two-dimensional modulation, we have the hypotheses
$$H_{b_c,b_s}: y(t) = s_{b_c,b_s}(t) + n(t)$$
where
$$s_{b_c,b_s}(t) = b_c\,p(t)\cos 2\pi f_c t - b_s\,p(t)\sin 2\pi f_c t$$
Restricting attention to the two-dimensional signal space identified in the example, we obtain
the model
$$H_{b_c,b_s}: \mathbf{Y} = \begin{pmatrix} Y_c \\ Y_s \end{pmatrix} = \begin{pmatrix} b_c \\ b_s \end{pmatrix} + \begin{pmatrix} N_c \\ N_s \end{pmatrix}$$
where we have absorbed scale factors into the symbol (bc , bs ), and where the I and Q noise compo-
nents Nc , Ns are i.i.d. N(0, σ 2 ). This is illustrated for QPSK in Figure 6.10. Thus, conditioned
on Hbc,bs, Yc ∼ N(bc, σ²) and Ys ∼ N(bs, σ²), and Yc, Ys are conditionally independent. The conditional density of Y = (Yc, Ys)T conditioned on Hbc,bs is therefore given by
$$p(y_c, y_s | H_{b_c,b_s}) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{(y_c - b_c)^2 + (y_s - b_s)^2}{2\sigma^2}\right)$$
We can now infer the ML and MPE rules using our hypothesis testing framework. However, since
the same reasoning applies to signal spaces of arbitrary dimensions, we provide a more general
discussion in the next section, and then return to examples of two-dimensional modulation.
$$H_i: \mathbf{Y} = \mathbf{s}_i + \mathbf{N}, \quad i = 0, 1, ..., M-1 \tag{6.24}$$
where N ∼ N(0, σ 2 I) is discrete time WGN. The ML and MPE rules for this problem are given
as follows. As usual, we denote the prior probabilities required to specify the MPE rule by {πi, i = 0, 1, ..., M − 1}, with $\sum_{i=0}^{M-1}\pi_i = 1$.
ML rule
$$\delta_{ML}(\mathbf{y}) = \arg\min_{0 \le i \le M-1}\; \|\mathbf{y} - \mathbf{s}_i\|^2 = \arg\max_{0 \le i \le M-1}\; \langle \mathbf{y}, \mathbf{s}_i \rangle - \frac{\|\mathbf{s}_i\|^2}{2} \tag{6.25}$$
MPE rule
$$\delta_{MPE}(\mathbf{y}) = \arg\min_{0 \le i \le M-1}\; \|\mathbf{y} - \mathbf{s}_i\|^2 - 2\sigma^2\log\pi_i = \arg\max_{0 \le i \le M-1}\; \langle \mathbf{y}, \mathbf{s}_i \rangle - \frac{\|\mathbf{s}_i\|^2}{2} + \sigma^2\log\pi_i \tag{6.26}$$
Interpretation of optimal decision rules: The ML rule can be interpreted in two ways.
The first is as a minimum distance rule, choosing the transmitted signal which has minimum
Euclidean distance to the noisy received signal. The second is as a “template matcher”: choosing
the transmitted signal with highest correlation with the noisy received signal, while adjusting
for the fact that the energies of different transmitted signals may be different. The MPE rule
adjusts the ML cost function to reflect prior information: the adjustment term depends on the
noise level and the prior probabilities. The MPE cost functions decompose neatly into a sum of
the ML cost function (which depends on the observation) and a term reflecting prior knowledge
(which depends on the prior probabilities and the noise level). The latter term scales with the
noise variance σ 2 . Thus, we rely more on the observation at high SNR (small σ), and more on
prior knowledge at low SNR (large σ).
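Both rules amount to maximizing a correlator output plus a per-signal constant, as in the following MATLAB sketch (our own illustration; the function name and interface are ours): the columns of S are the signal vectors, and the function returns zero-based indices of the ML and MPE decisions.

function [i_ml, i_mpe] = detect(y, S, sigma2, priors)
%DETECT ML and MPE decisions for Y = s_i + N, with N ~ N(0, sigma2*I).
corr = S'*y - 0.5*sum(S.^2,1)'; %<y,s_i> - ||s_i||^2/2 for each i, as in (6.25)
[~, i_ml] = max(corr); %ML rule
[~, i_mpe] = max(corr + sigma2*log(priors(:))); %MPE rule (6.26)
i_ml = i_ml - 1; i_mpe = i_mpe - 1; %zero-based hypothesis indices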
Derivation of optimal receiver structures (6.25) and (6.26): Under hypothesis Hi , Y is
a Gaussian random vector with mean si and covariance matrix σ 2 I (the translation of the noise
vector N by the deterministic signal vector si does not change the covariance matrix), so that
$$p_{\mathbf{Y}|i}(\mathbf{y}|H_i) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{\|\mathbf{y} - \mathbf{s}_i\|^2}{2\sigma^2}\right) \tag{6.27}$$
Plugging (6.27) into the ML rule (6.5), we obtain the rule (6.25) upon simplification. Similarly, we obtain (6.26) by substituting (6.27) in the MPE rule (6.6).
We now map the optimal decision rules in discrete time back to continuous time to obtain optimal
detectors for the original continuous-time model (6.18), as follows.
Optimal Demodulation for Signaling in Continuous Time AWGN
ML rule
$$\delta_{ML}(y) = \arg\max_{0 \le i \le M-1}\; \langle y, s_i \rangle - \frac{\|s_i\|^2}{2} \tag{6.28}$$
MPE rule
$$\delta_{MPE}(y) = \arg\max_{0 \le i \le M-1}\; \langle y, s_i \rangle - \frac{\|s_i\|^2}{2} + \sigma^2\log\pi_i \tag{6.29}$$
Derivation of optimal receiver structures (6.28) and (6.29): Due to the irrelevance of y ⊥ ,
the continuous time model (6.18) reduces to the discrete time model (6.24) by projecting onto
the signal space. It remains to map the optimal decision rules (6.25) and (6.26) for discrete time
observations, back to continuous time. These rules involve correlation between the received and
transmitted signals, and the transmitted signal energies. It suffices to show that these quantities
are the same for both the continuous time model and the equivalent discrete time model. We
know now that signal inner products are preserved, so that $\|s_i\|^2 = \|\mathbf{s}_i\|^2$. Further, the continuous-time correlator output can be written as
$$\langle y, s_i \rangle = \langle y_S + y^{\perp}, s_i \rangle = \langle y_S, s_i \rangle + \langle y^{\perp}, s_i \rangle = \langle y_S, s_i \rangle = \langle \mathbf{y}, \mathbf{s}_i \rangle$$
where the last equality follows because the inner product between the signals yS and si (which
both lie in the signal space) is the same as the inner product between their vector representations.
Why don’t we have a “minimum distance” rule in continuous time? Notice that the
optimal decision rules for the continuous time model do not contain the continuous time version
of the minimum distance rule for discrete time. This is because of a technical subtlety. In
continuous time, the squares of the distances would be
$$\|y - s_i\|^2 = \|y_S - s_i\|^2 + \|y^{\perp}\|^2 = \|y_S - s_i\|^2 + \|n^{\perp}\|^2$$
Under the AWGN model, the noise power orthogonal to the signal space is infinite, hence from
a purely mathematical point of view, the preceding quantities are infinite for each i (so that we
cannot minimize over i). Hence, it only makes sense to talk about the minimum distance rule
in a finite-dimensional space in which the noise power is finite. The correlator based form of
the optimal detector, on the other hand, automatically achieves the projection onto the finite-
dimensional signal space, and hence does not suffer from this technical difficulty. Of course, in
practice, even the continuous time received signal may be limited to a finite-dimensional space by
filtering and time-limiting, but correlator-based detection still has the practical advantage that
only components of the received signal which are truly useful appear in the decision statistics.
Bank of Correlators or Matched Filters: The optimal receiver involves computation of the
decision statistics
$$\langle y, s_i \rangle = \int y(t)\,s_i(t)\,dt$$
and can therefore be implemented using a bank of correlators, as shown in Figure 6.11. Of
course, any correlation operation can also be implemented using a matched filter, sampled at the
appropriate time. Defining si,mf (t) = si (−t) as the impulse response of the filter matched to si ,
we have
$$\langle y, s_i \rangle = \int y(t)\,s_i(t)\,dt = \int y(t)\,s_{i,mf}(-t)\,dt = (y * s_{i,mf})(0)$$
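The equivalence of the correlator and the sampled matched filter is easy to verify numerically, as in the MATLAB sketch below (our own illustration, using an arbitrary example signal):

dt = 1e-3; t = dt/2:dt:1;
s = sin(2*pi*3*t); %an example signal
y = s + 0.5*randn(size(t)); %noisy received signal
z_corr = sum(y.*s)*dt %correlator output <y, s>
smf = fliplr(s); %matched filter impulse response s(-t) (time-reversed samples)
z_mf = conv(y, smf)*dt; %filter the received signal
z_mf(length(t)) %sampling at the appropriate time recovers <y, s>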
Figure 6.11: The optimal receiver for an AWGN channel can be implemented using a bank of
correlators. For the ML rule, the constants ai = ||si ||2/2; for the MPE rule, ai = ||si ||2 /2 −
σ 2 log πi .
Figure 6.12: An alternative implementation for the optimal receiver using a bank of matched
filters. For the ML rule, the constants ai = ||si ||2 /2; for the MPE rule, ai = ||si ||2 /2 − σ 2 log πi .
Figure 6.12 shows an alternative implementation for the optimal receiver using a bank of matched
filters.
Figure 6.13: The passband correlations required by the optimal receiver can be implemented in
complex baseband. Since the I and Q components are lowpass waveforms, correlation with them
is an implicit form of lowpass filtering. Thus, the LPFs after the mixers could potentially be
eliminated, which is why they are shown within dashed boxes.
Figure 6.14: The ML decision boundary when testing between si and sj is the perpendicular
bisector of the line joining the signal points, which is an (n − 1)-dimensional hyperplane for an
n-dimensional signal space.
We can visualize a plane containing the decision boundary coming out of the paper for a three-
dimensional signal space. While it is hard to visualize signal spaces of more than 3 dimensions,
the computation for deciding which side of the ML decision boundary the received vector y lies
on is straightforward: simply compare the Euclidean distances ||y − si || and ||y − sj ||.
[Figure 6.15: ML decision region Γ1 for signal s1 in a two-dimensional constellation {s1, ..., s6}. The boundaries L12, L13, L14, L15, L16 are the perpendicular bisectors of the lines joining s1 to the other signal points; their intersection defines Γ1.]
The ML decision regions are constructed from drawing these pairwise decision regions. For any
given i, draw a line between si and sj for all j 6= i. The perpendicular bisector of the line between
si and sj defines two half-spaces (half-planes for n = 2), one in which we choose si over sj , the
other in which we choose sj over si . The intersection of the half-spaces in which si is chosen over
sj , for j 6= i, defines the decision region Γi . This procedure is illustrated for a two-dimensional
signal space in Figure 6.15. The line L1i is the perpendicular bisector of the line between s1 and
si . The intersection of these lines defines Γ1 as shown. Note that L16 plays no role in determining
Γ1 , since signal s6 is “too far” from s1 , in the following sense: if the received signal is closer to s6
than to s1 , then it is also closer to si than to s1 for some i = 2, 3, 4, 5. This kind of observation
plays an important role in the performance analysis of ML reception in Section 6.3.
[Figure 6.16: ML decision regions for QPSK, 8PSK and 16QAM.]
The preceding procedure can now be applied to the simpler scenario of two-dimensional constel-
lations to obtain ML decision regions as shown in Figure 6.16. For QPSK, the ML regions are
simply the four quadrants. For 8PSK, the ML regions are sectors of a circle. For 16QAM, the
ML regions take a rectangular form.
Now, let us apply the same reasoning to the decision boundary corresponding to making an
ML decision between two signals s0 and s1 , as shown in Figure 6.18. Suppose that s0 is sent.
Figure 6.17: Only the component of noise perpendicular to the decision boundary, Nperp , can
cause the received vector to cross the decision boundary, starting from the signal point s.
Figure 6.18: When making an ML decision between s0 and s1 , the decision boundary is at
distance D = d/2 from each signal point, where d = ||s1 − s0 || is the Euclidean distance between
the two points.
What is the probability that the noise vector N, when added to it, sends the received vector into
the wrong region by crossing the decision boundary? We know from (6.31) that the answer is
Q(D/σ), where D is the distance between s0 and the decision boundary. For ML reception, the
decision boundary is the plane that is the perpendicular bisector of the line between s0 and s1 ,
whose length equals d = ||s1 − s0||, the Euclidean distance between the two signal vectors. Thus, D = d/2 = ||s1 − s0||/2, and the probability of crossing the ML decision boundary between the two signal vectors (starting from either of the two signal points) is
$$P[\text{cross ML boundary between } s_0 \text{ and } s_1] = Q\left(\frac{\|\mathbf{s}_1 - \mathbf{s}_0\|}{2\sigma}\right) = Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) \tag{6.32}$$
where we note that the Euclidean distance between the signal vectors and the corresponding
continuous time signals is the same.
Notation: Now that we have established the equivalence between working with continuous time
signals and the vectors that represent their projections onto signal space, we no longer need to
be careful about distinguishing between them. Accordingly, we drop the use of boldface notation
henceforth, using the notation y, si and n to denote the received signal, the transmitted signal,
and the noise, respectively, in both settings.
Binary signaling: Consider binary signaling in AWGN, modeled as
$$H_1: y(t) = s_1(t) + n(t), \qquad H_0: y(t) = s_0(t) + n(t) \tag{6.33}$$
Geometric computation of error probability: The ML decision boundary for this problem is as in Figure 6.18. The conditional error probability is simply the probability that, starting from
one of the signal points, the noise makes us cross the boundary to the wrong side, the probability
of which we have already computed in (6.32). Since the conditional error probabilities are equal,
they also equal the average error probability regardless of the priors. We therefore obtain the
following expression.
Error probability for binary signaling with ML reception
$$P_{e,ML} = P_{e|1} = P_{e|0} = Q\left(\frac{\|s_1 - s_0\|}{2\sigma}\right) = Q\left(\frac{d}{2\sigma}\right) \tag{6.34}$$
where d = ||s1 − s0 || is the distance between the two possible received signals.
Algebraic computation: While this geometric computation is intuitively pleasing, it is impor-
tant to also master algebraic approaches to computing the probabilities of errors due to WGN.
It is easiest to first consider on-off keying, modeled as
$$H_1: y(t) = s(t) + n(t), \qquad H_0: y(t) = n(t) \tag{6.35}$$
For this model, the ML rule (6.28) reduces to
$$\langle y, s \rangle \;\gtrless_{H_0}^{H_1}\; \frac{\|s\|^2}{2} \tag{6.36}$$
Setting Z = ⟨y, s⟩, we wish to compute the conditional error probabilities given by
$$P_{e|1} = P\left[Z < \frac{\|s\|^2}{2}\,\Big|\, H_1\right], \qquad P_{e|0} = P\left[Z > \frac{\|s\|^2}{2}\,\Big|\, H_0\right] \tag{6.37}$$
We have actually already done these computations in Example 5.8.2, but it pays to review them
quickly. Note that, conditioned on either hypothesis, Z is a Gaussian random variable. The
conditional mean and variance of Z under H0 are given by
$$E[Z|H_0] = E[\langle n, s \rangle] = 0, \qquad \mathrm{var}(Z|H_0) = \mathrm{var}(\langle n, s \rangle) = \sigma^2\|s\|^2$$
where we have used Theorem 6.2.1, and the fact that n(t) has zero mean. The corresponding computation under H1 is as follows:
$$E[Z|H_1] = E[\langle s + n, s \rangle] = \|s\|^2, \qquad \mathrm{var}(Z|H_1) = \sigma^2\|s\|^2$$
noting that covariances do not change upon adding constants. Thus, Z ∼ N(0, v 2 ) under H0 and
Z ∼ N(m, v 2 ) under H1 , where m = ||s||2 and v 2 = σ 2 ||s||2. Substituting in (6.37), it is easy to
check that
$$P_{e|1} = P_{e|0} = Q\left(\frac{\|s\|}{2\sigma}\right) \tag{6.38}$$
Going back to the more general binary signaling problem (6.33), the ML rule is given by (6.28)
to be
$$\langle y, s_1 \rangle - \frac{\|s_1\|^2}{2} \;\gtrless_{H_0}^{H_1}\; \langle y, s_0 \rangle - \frac{\|s_0\|^2}{2}$$
We can analyze this system by considering the joint distribution of the correlator statistics ⟨y, s1⟩ and ⟨y, s0⟩, which are jointly Gaussian conditioned on each hypothesis. However, it is simpler and more illuminating to rewrite the ML decision rule as
$$\langle y, s_1 - s_0 \rangle \;\gtrless_{H_0}^{H_1}\; \frac{\|s_1\|^2}{2} - \frac{\|s_0\|^2}{2}$$
This is consistent with the geometry depicted in Figure 6.18: only the projection of the received
signal along the line joining the signals matters in the decision, and hence only the noise along
this direction can produce errors. The analysis now involves the conditional distributions of the
single decision statistic Z = ⟨y, s1 − s0⟩, which is conditionally Gaussian under either hypothesis. The computation of the conditional error probabilities is left as an exercise, but we already know that the answer should work out to (6.34).
A quicker approach is to consider a transformed system with received signal ỹ(t) = y(t) − s0 (t).
Since this transformation is invertible, the performance of an optimal rule is unchanged under
it. But the transformed received signal ỹ(t) falls under the on-off signaling model (6.35), with
s(t) = s1 (t) − s0 (t). The ML error probability formula (6.34) therefore follows from the formula
(6.38).
Scale Invariance: The formula (6.34) illustrates that the performance of the ML rule is scale-
invariant: if we scale the signals and noise by the same factor α, the performance does not
change, since both ||s1 − s0 || and σ scale by α. Thus, the performance is determined by the ratio
of signal and noise strengths, rather than individually on the signal and noise strengths. We now
define some standard measures for these quantities, and then express the performance of some
common binary signaling schemes in terms of them.
Energy per bit, Eb: For binary signaling, this is given by
$$E_b = \frac{1}{2}\left(\|s_0\|^2 + \|s_1\|^2\right)$$
assuming that 0 and 1 are equally likely to be sent.
Scale-invariant parameters: If we scale up both s1 and s0 by a factor A, Eb scales up by a
factor A2 , while the distance d scales up by a factor A. We can therefore define the scale-invariant
parameter
$$\eta_P = \frac{d^2}{E_b} \tag{6.39}$$
Now, substituting $d = \sqrt{\eta_P E_b}$ and $\sigma = \sqrt{N_0/2}$ into (6.34), we obtain that the ML performance is given by
$$P_{e,ML} = Q\left(\sqrt{\frac{\eta_P E_b}{2N_0}}\right) = Q\left(\sqrt{\frac{d^2}{E_b}\,\frac{E_b}{2N_0}}\right) \tag{6.40}$$
Figure 6.19: Signal space representations with conveniently chosen scaling for three binary sig-
naling schemes.
On-off keying: Here s1(t) = s(t) and s0(t) = 0. As shown in Figure 6.19, the signal space is one-dimensional. For the scaling in the figure, we have d = 1 and $E_b = \frac{1}{2}(1^2 + 0^2) = \frac{1}{2}$, so that $\eta_P = \frac{d^2}{E_b} = 2$. Substituting into (6.40), we obtain $P_{e,ML} = Q\left(\sqrt{E_b/N_0}\right)$.
Antipodal signaling: Here s1(t) = −s0(t), leading again to a one-dimensional signal space representation. One possible realization of antipodal signaling is BPSK, discussed in the previous chapter. For the scaling chosen, d = 2 and $E_b = \frac{1}{2}(1^2 + (-1)^2) = 1$, which gives $\eta_P = \frac{d^2}{E_b} = 4$. Substituting into (6.40), we obtain $P_{e,ML} = Q\left(\sqrt{2E_b/N_0}\right)$.
Equal-energy, orthogonal signaling: Here s1 and s0 are orthogonal, with $\|s_1\|^2 = \|s_0\|^2$. This is a two-dimensional signal space. As discussed in the previous chapter, possible realizations of orthogonal signaling include FSK and Walsh-Hadamard codes. From Figure 6.19, we have $d = \sqrt{2}$ and $E_b = 1$, so that $\eta_P = \frac{d^2}{E_b} = 2$. This gives $P_{e,ML} = Q\left(\sqrt{E_b/N_0}\right)$.
Thus, on-off keying (which is orthogonal signaling with unequal energies) and equal-energy or-
thogonal signaling have the same power efficiency, while the power efficiency of antipodal signaling
is a factor of two (i.e., 3 dB) better.
In plots of error probability versus SNR, we typically express error probability on a log scale (in order to capture its rapid decay with SNR) and SNR in decibels (in order to span a large range). We provide such a plot for antipodal and orthogonal signaling in Figure 6.20.
Figure 6.20: Error probability versus Eb /N0 (dB) for binary antipodal and orthogonal signaling
schemes.
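Figure 6.20 can be reproduced with a few lines of MATLAB, as in the following sketch (using the closed forms derived above, with the Q function expressed via erfc):

ebnodb = 0:0.1:20; ebno = 10.^(ebnodb/10);
q = @(x) 0.5*erfc(x/sqrt(2)); %Q function in terms of erfc
semilogy(ebnodb, q(sqrt(ebno)), '--', ebnodb, q(sqrt(2*ebno)), '-');
xlabel('E_b/N_0 (dB)'); ylabel('Probability of error');
legend('Orthogonal (FSK/OOK)', 'Antipodal (BPSK)');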
Before doing detailed computations, let us discuss some general properties that greatly simplify
the framework for performance analysis.
Scale Invariance: For binary signaling, we have observed through explicit computation of
the error probability that performance depends only on signal-to-noise ratio (Eb /N0 ) and the
geometry of the signal set (which determines the power efficiency d2 /Eb ). Actually, we can make
such statements in great generality for M-ary signaling without explicit computations. First, let
us note that the performance of an optimal receiver does not change if we scale both signal and
noise by the same factor. Specifically, optimal reception for the model
$$H_i: y(t) = A\,s_i(t) + A\,n(t), \quad i = 0, 1, ..., M-1$$
does not depend on A. This is inferred from the following general observation: the performance
of an optimal receiver is unchanged when we pass the observation through an invertible transfor-
mation. Specifically, suppose z(t) = F (y(t)) is obtained by passing y(t) through an invertible
transformation F . If the optimal receiver for z does better than the optimal receiver for y, then
we could apply F to y to get z, then do optimal reception for z. This would perform better
than the optimal receiver for y, which is a contradiction. Similarly, if the optimal receiver for y
does better than the optimal receiver for z, then we could apply F −1 to z to get y, and then do
optimal reception for y to perform better than the optimal receiver for z, again a contradiction.
The preceding argument implies that performance depends only on the signal-to-noise ratio,
once we have fixed the signal constellation. Let us now figure out what properties of the signal
constellation are relevant in determining performance. For M = 2, we have seen that all that
matters is the scale-invariant quantity d2 /Eb . What are the analogous quantities for M > 2? To
determine these, let us consider the conditional error probabilities for the ML rule.
Conditional error probability: The conditional error probability, conditioned on Hi , is given
by
$$P_{e|i} = P[\mathbf{y} \notin \Gamma_i \,|\, i \text{ sent}] = P[Z_i < Z_j \text{ for some } j \ne i \,|\, i \text{ sent}] \tag{6.43}$$
While computation of the conditional error probability in closed form is typically not feasible,
we can actually get significant insight on what parameters it depends on by examining the
conditional distributions of the decision statistics. Since y = si + n conditioned on Hi , the
decision statistics are given by
$$Z_j = \langle y, s_j \rangle - \frac{\|s_j\|^2}{2} = \langle s_i + n, s_j \rangle - \frac{\|s_j\|^2}{2} = \langle n, s_j \rangle + \langle s_i, s_j \rangle - \frac{\|s_j\|^2}{2}, \quad 0 \le j \le M-1$$
By the Gaussianity of n(t), the decision statistics {Zj } are jointly Gaussian (conditioned on Hi ).
Their joint distribution is therefore completely characterized by their means and covariances.
Since the noise is zero mean, we obtain
$$E[Z_j|H_i] = \langle s_i, s_j \rangle - \frac{\|s_j\|^2}{2}$$
Using Theorem 6.2.1, and noting that covariance is unaffected by translation, we obtain that
$$\mathrm{cov}(Z_j, Z_k|H_i) = \mathrm{cov}(\langle n, s_j \rangle, \langle n, s_k \rangle) = \sigma^2\langle s_j, s_k \rangle$$
Thus, conditioned on Hi, the joint distribution of {Zj} depends only on the noise variance σ² and the signal inner products $\{\langle s_i, s_j \rangle,\; 0 \le i, j \le M-1\}$. Now that we know the joint distribution,
we can in principle compute the conditional error probabilities Pe|i . In practice, this is often
difficult, and we often resort to Monte Carlo simulations. However, what we have found out
about the joint distribution can now be used to refine our concepts of scale-invariance.
Performance only depends on normalized inner products: Let us replace Zj by Zj /σ 2 .
Clearly, since we are simply picking the maximum among the decision statistics, scaling by a
common factor does not change the decision (and hence the performance). However, we now
obtain that
$$E\left[\frac{Z_j}{\sigma^2}\,\Big|\,H_i\right] = \frac{\langle s_i, s_j \rangle - \|s_j\|^2/2}{\sigma^2}$$
and
$$\mathrm{cov}\left(\frac{Z_j}{\sigma^2}, \frac{Z_k}{\sigma^2}\,\Big|\,H_i\right) = \frac{1}{\sigma^4}\,\mathrm{cov}(Z_j, Z_k|H_i) = \frac{\langle s_j, s_k \rangle}{\sigma^2}$$
Thus, the joint distribution of the normalized decision statistics {Zj/σ²}, conditioned on any of the hypotheses, depends only on the normalized inner products $\{\frac{\langle s_i, s_j \rangle}{\sigma^2},\; 0 \le i, j \le M-1\}$ (note that $\|s_j\|^2 = \langle s_j, s_j \rangle$). Of course, this means that the performance also depends only on these normalized inner products.
Let us now carry these arguments further, still without any explicit computations. We define
energy per symbol and energy per bit for M-ary signaling as follows.
Energy per symbol, Es : For M-ary signaling with equal priors, the energy per symbol Es is
given by
$$E_s = \frac{1}{M}\sum_{i=0}^{M-1} \|s_i\|^2$$
Energy per bit, Eb : Since M-ary signaling conveys log2 M bits/symbol, the energy per bit is
given by
$$E_b = \frac{E_s}{\log_2 M}$$
If all signals in an M-ary constellation are scaled up by a factor A, then Es and Eb get scaled up by A², as do all inner products {⟨si, sj⟩}. Thus, we can define scale-invariant inner products $\{\frac{\langle s_i, s_j \rangle}{E_b}\}$, which depend only on the shape of the signal constellation. Indeed, we can define the shape of a constellation as these scale-invariant inner products. Setting σ² = N0/2, we can now write the normalized inner products determining performance as follows:
$$\frac{\langle s_i, s_j \rangle}{\sigma^2} = \frac{\langle s_i, s_j \rangle}{E_b}\,\frac{2E_b}{N_0} \tag{6.44}$$
We can now make the following statement.
We can now make the following statement.
Performance depends only on Eb/N0 and constellation shape (as specified by the scale-invariant inner products): We have shown that the performance depends only on the normalized inner products $\{\frac{\langle s_i, s_j \rangle}{\sigma^2}\}$. From (6.44), we see that these in turn depend only on Eb/N0 and the scale-invariant inner products $\{\frac{\langle s_i, s_j \rangle}{E_b}\}$. The latter depend only on the shape of the signal constellation, and are completely independent of the signal and noise strengths. What this means is that we can choose any convenient scaling that we want for the signal constellation when investigating its performance, as long as we keep track of the signal-to-noise ratio. We illustrate this via an example where we determine the error probability by simulation.
Typically, Eb /N0 is specified in dB, so we need to convert it to the “raw” Eb /N0 . We now have
a simulation consisting of the following steps, repeated over multiple symbol transmissions:
Step 1: Choose a symbol s at random from A. For this symmetric constellation, we can actually
keep sending the same symbol in order to compute the performance of the ML rule, since the
conditional error probabilities are all equal. For example, set s = (1, 0)T .
Step 2: Generate two i.i.d. N(0, 1) random variables Uc and Us . The I and Q noises can now be
set as Nc = σUc and Ns = σUs , so that N = (Nc , Ns )T .
Step 3: Set the received vector y = s + N.
Step 4: Compute the ML decision arg maxi hy, si i (the energy terms can be dropped, since the
signals are of equal energy) or arg mini ||y − si ||2 .
Step 5: If there is an error, increment the error count.
The error probability is estimated as the error count, divided by the number of symbols trans-
mitted. We repeat this simulation over a range of Eb /N0 , and typically plot the error probability
on a log scale versus Eb /N0 in dB.
These steps are carried out in the following code fragment, which generates Figure 6.21 comparing
a simulation-based estimate of the error probability for 8PSK against the intelligent union bound,
an analytical estimate that we develop shortly. The analytical estimate requires very little
computation (evaluation of a single Q function), but its agreement with simulations is excellent.
As we shall see, developing such analytical estimates also gives us insight into how errors are
most likely to occur for M-ary signaling in AWGN.
The code fragment is written for transparency rather than computational efficiency. The code
contains an outer for-loop for varying SNR, and an inner for-loop for computing minimum dis-
tances for the symbols sent at each SNR. The inner loop can be avoided and the program sped up
considerably by computing all minimum distances for all symbols at once using matrix operations
(try it!). We use a less efficient program here to make the operations easy to understand.
Figure 6.21: Symbol error probability (log scale) versus Eb/N0 (dB) for 8PSK: simulation versus the intelligent union bound.
%assumed setup (defined in text omitted from this fragment): number of
%symbols per SNR point and the unit-energy 8PSK constellation
nsymbols = 20000;
constellation = exp(1i*2*pi*(0:7)'/8);
%the Q function in terms of erfc (q_function is assumed to be defined this way elsewhere)
q_function = @(x) 0.5*erfc(x/sqrt(2));
ebnodb = 0:0.1:10;
number_snrs = length(ebnodb);
perr_estimate = zeros(number_snrs,1);
for k=1:number_snrs, %SNR for loop
    ebnodb_now = ebnodb(k);
    ebno = 10^(ebnodb_now/10);
    sigma = sqrt(1/(6*ebno)); %Es = 1, Eb = 1/3, so sigma^2 = 1/(6 Eb/N0)
    %send first symbol without loss of generality, add 2d Gaussian noise
    received = 1 + sigma*randn(nsymbols,1) + 1i*sigma*randn(nsymbols,1);
    decisions = zeros(nsymbols,1);
    for n=1:nsymbols, %symbol for loop (can/should be avoided for fast implementation)
        distances = abs(received(n)-constellation);
        [min_dist,decisions(n)] = min(distances);
    end
    errors = (decisions ~= 1);
    perr_estimate(k) = sum(errors)/nsymbols;
end
semilogy(ebnodb,perr_estimate);
hold on;
%COMPARE WITH INTELLIGENT UNION BOUND
etaP = 6-3*sqrt(2); %power efficiency of 8PSK
Ndmin = 2; %number of nearest neighbors
ebno = 10.^(ebnodb/10);
perr_union = Ndmin*q_function(sqrt(etaP*ebno/2));
semilogy(ebnodb,perr_union,':r');
xlabel('Eb/N0 (dB)');
ylabel('Symbol error probability');
legend('Simulation','Intelligent Union Bound','Location','NorthEast');
Figure 6.22: QPSK signal points s0, s1, s2, s3 at spacing d in the (Nc, Ns) plane. If s0 is sent, an error occurs if Nc or Ns is negative enough to make the received vector fall out of the first quadrant.
Exact analysis for QPSK: Let us find Pe|0, the conditional error probability for the ML rule
conditioned on s0 being sent. For the scaling shown in Figure 6.22, s0 = (d/2, d/2)ᵀ, and the ML
decision region for s0 is the first quadrant. A correct decision is made if and only if the noise
satisfies Nc > −d/2 and Ns > −d/2. Since Nc and Ns are i.i.d. N(0, σ²), we obtain

Pe|0 = 1 − (1 − Q(d/(2σ)))² = 2Q(d/(2σ)) − Q²(d/(2σ))

By symmetry, Pe = Pe|0. Using d²/Eb = 4 for QPSK, this gives the exact error probability

Pe = 2Q(√(2Eb/N0)) − Q²(√(2Eb/N0))   (6.47)
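The resulting expression is easy to evaluate numerically; a minimal sketch (ours):

%numerical evaluation of the exact QPSK symbol error probability (6.47)
qfun = @(x) 0.5*erfc(x/sqrt(2)); %Gaussian tail function
ebnodb = 0:10;
ebno = 10.^(ebnodb/10);
pe_exact = 2*qfun(sqrt(2*ebno)) - qfun(sqrt(2*ebno)).^2;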
Figure 6.23: Signal points s0, ..., s4, with Γ0 the ML decision region for s0. The noise random variables N1, N2, N3 which can drive the received vector outside the decision region Γ0 are correlated, which makes it difficult to find an exact expression for Pe|0.
Applying (6.49) to (6.48), we obtain that, for the scenario depicted in Figure 6.23, the conditional
error probability can be upper bounded as follows:

Pe|0 ≤ P[N1 > ||s1 − s0||/2] + P[N2 > ||s2 − s0||/2] + P[N3 > ||s3 − s0||/2]
    = Q(||s1 − s0||/(2σ)) + Q(||s2 − s0||/(2σ)) + Q(||s3 − s0||/(2σ))   (6.50)
Thus, the conditional error probability is upper bounded by a sum of probabilities, each of which
corresponds to the error probability for a binary decision: s0 versus s1 , s0 versus s2 , and s0 versus
s3 . This approach applies in great generality, as we show next.
Union Bound and variants: Pictures such as the one in Figure 6.23 typically cannot be
drawn when the signal space dimension is high. However, we can still find union bounds on error
probabilities, as long as we can enumerate all the signals in the constellation. To do this, let us
rewrite (6.43), the conditional error probability, conditioned on Hi , as a union of M − 1 events
as follows:
Pe|i = P[∪_{j≠i} {Zi < Zj} | i sent]
where {Zj } are the decision statistics. Using the union bound (6.49), we obtain
Pe|i ≤ Σ_{j≠i} P[Zi < Zj | i sent]   (6.51)
But the jth term on the right-hand side above is simply the error probability of ML reception
for binary hypothesis testing between the signals si and sj . From the results of Section 6.3.2, we
therefore obtain the following pairwise error probability:
P[Zi < Zj | i sent] = Q(||sj − si||/(2σ))
Substituting into (6.51), we obtain upper bounds on the conditional error probabilities and the
average error probability as follows.
Union Bound on conditional error probabilities: The conditional error proba-
bilities for the ML rule are bounded as
Pe|i ≤ Σ_{j≠i} Q(||sj − si||/(2σ)) = Σ_{j≠i} Q(dij/(2σ))   (6.52)

where dij = ||si − sj|| denotes the distance between signal points si and sj.
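As an illustration of (6.52), the following sketch (ours; 8PSK is an assumed example) computes the union bound on the average error probability directly from a list of complex-valued signal points:

%union bound (6.52) for an arbitrary two-dimensional constellation
s = exp(1i*2*pi*(0:7)'/8); %example: unit-energy 8PSK
ebno = 10^(6/10); %Eb/N0 of 6 dB (linear scale)
M = length(s);
Eb = mean(abs(s).^2)/log2(M);
qfun = @(x) 0.5*erfc(x/sqrt(2));
pe_i = zeros(M,1);
for i=1:M,
    d = abs(s - s(i)); %pairwise distances d_ij
    d(i) = []; %exclude j = i
    pe_i(i) = sum(qfun(sqrt(d.^2/Eb*ebno/2))); %Q(d_ij/(2 sigma))
end
pe_union = mean(pe_i); %average over equal priors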
We can now rewrite the union bound in terms of Eb/N0 and the scale-invariant squared
distances d²ij/Eb as follows:

Pe|i ≤ Σ_{j≠i} Q(√((d²ij/Eb)(Eb/(2N0))))   (6.54)

Pe = Σ_i πi Pe|i ≤ Σ_i πi Σ_{j≠i} Q(√((d²ij/Eb)(Eb/(2N0))))   (6.55)
Notice that this answer is different from the one we had in (6.50). This is because the fourth
term corresponds to the signal s4 , which is “too far away” from s0 to play a role in determining
the decision region Γ0 . Thus, when we do have a more detailed geometric understanding of the
decision regions, we can do better than the generic union bound (6.52) and get a tighter bound,
as in (6.50). We term this the intelligent union bound, and give a general formulation in the
following.
Denote by Nml (i) the indices of the set of neighbors of signal si (we exclude i from Nml (i) by
definition) that characterize the ML decision region Γi . That is, the half-planes that we intersect
to obtain Γi correspond to the perpendicular bisectors of lines joining si and sj , j ∈ Nml (i). For
example, in Figure 6.23, Nml (0) = {1, 2, 3}; s4 is excluded from this set, since it does not play a
role in determining Γ0. The decision region in (6.41) can now be expressed as

Γi = {y : Zi ≥ Zj for all j ∈ Nml(i)}   (6.56)

We can now say the following: y falls outside Γi if and only if Zi < Zj for some j ∈ Nml(i). We
can therefore write
Pe|i = P[y ∉ Γi | i sent] = P[Zi < Zj for some j ∈ Nml(i) | i sent]   (6.57)
and from there, following the same steps as in the union bound, get a tighter bound, which we
express as follows.
Intelligent Union Bound: A better bound on Pe|i is obtained by considering only
the neighbors of si that determine its ML decision region, as follows:
Pe|i ≤ Σ_{j∈Nml(i)} Q(||sj − si||/(2σ))   (6.58)
(the bound on the average error probability Pe is computed as before by averaging the
bounds on Pe|i using the priors).
Union Bound for QPSK: For QPSK, we infer from Figure 6.22 that the union bound for Pe|0
is given by

Pe = Pe|0 ≤ Q(d01/(2σ)) + Q(d02/(2σ)) + Q(d03/(2σ)) = 2Q(d/(2σ)) + Q(√2 d/(2σ))

since d01 = d02 = d and d03 = √2 d. Using d²/Eb = 4, we obtain the union bound in terms of
Eb/N0 to be

Pe ≤ 2Q(√(2Eb/N0)) + Q(√(4Eb/N0))   QPSK union bound (6.60)
For moderately large Eb /N0 , the dominant term in terms of the decay of the error probability is
the first one, since Q(x) falls off rapidly as x gets large. Thus, while the union bound (6.60) is
larger than the exact error probability (6.47), as it must be, it gets the multiplicity and argument
of the dominant term right. Tightening the analysis using the intelligent union bound, we get
Pe|0 ≤ Q(d01/(2σ)) + Q(d02/(2σ)) = 2Q(√(2Eb/N0))   QPSK intelligent union bound (6.61)
since Nml (0) = {1, 2} (the decision region for s0 is determined by the neighbors s1 and s2 ).
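For a numerical feel, here is a spot check (ours) comparing the QPSK union bound (6.60), the intelligent union bound (6.61), and the exact error probability (6.47) at Eb/N0 = 6 dB:

%QPSK bounds versus exact error probability at Eb/N0 of 6 dB
qfun = @(x) 0.5*erfc(x/sqrt(2));
ebno = 10^(6/10);
pe_union = 2*qfun(sqrt(2*ebno)) + qfun(sqrt(4*ebno)); %union bound (6.60)
pe_iub = 2*qfun(sqrt(2*ebno)); %intelligent union bound (6.61)
pe_exact = 2*qfun(sqrt(2*ebno)) - qfun(sqrt(2*ebno))^2; %exact (6.47)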
Another common approach for getting a better (and quicker to compute) estimate than the
original union bound is the nearest neighbors approximation. This is a loose term employed to
describe a number of different methods for pruning the terms in the summation (6.52). Most
commonly, it refers to regular signal sets in which each signal point has a number of nearest
neighbors at distance dmin from it, where dmin = min_{i≠j} ||si − sj||. Letting Ndmin(i) denote the
number of nearest neighbors of si , we obtain the following approximation.
Nearest Neighbors Approximation

Pe|i ≈ Ndmin(i) Q(dmin/(2σ))   (6.62)

Averaging over i, we obtain that

Pe ≈ N̄dmin Q(dmin/(2σ))   (6.63)
where N̄dmin denotes the average number of nearest neighbors for a signal point. The rationale
for the nearest neighbors approximation is that, since Q(x) decays rapidly, Q(x) ∼ e^(−x²/2), as
x gets large, the terms in the union bound corresponding to the smallest arguments for the Q
function dominate at high SNR.
The corresponding formulas as a function of scale-invariant quantities and Eb/N0 are:

Pe|i ≈ Ndmin(i) Q(√((d²min/Eb)(Eb/(2N0))))   (6.64)

It is also worth explicitly writing down an expression for the average error probability, averaging
the preceding over i:

Pe ≈ N̄dmin Q(√((d²min/Eb)(Eb/(2N0))))   (6.65)
where

N̄dmin = (1/M) Σ_{i=1}^{M} Ndmin(i)
is the average number of nearest neighbors for the signal points in the constellation.
For QPSK, we have from Figure 6.22 that each signal point has Ndmin(i) = 2 nearest neighbors
at distance dmin = d, with d²min/Eb = 4, yielding

Pe ≈ 2Q(√(2Eb/N0))
In this case, the nearest neighbors approximation coincides with the intelligent union bound
(6.61). This happens because the ML decision region for each signal point is determined by its
nearest neighbors for QPSK. Indeed, the latter property holds for many regular constellations,
including all of the PSK and QAM constellations whose ML decision regions are depicted in
Figure 6.16.
Power Efficiency: While exact performance analysis for M-ary signaling can be computation-
ally demanding, we have now obtained simple enough estimates that we can define concepts such
as power efficiency, analogous to the development for binary signaling. In particular, comparing
the nearest neighbors approximation (6.63) with the error probability for binary signaling (6.40),
we define in analogy the power efficiency of an M-ary signaling scheme as
ηP = d²min/Eb   (6.66)

Rewriting the nearest neighbors approximation (6.65) in terms of ηP gives

Pe ≈ N̄dmin Q(√(ηP Eb/(2N0)))   (6.67)

Since the argument of the Q function in (6.67) plays a bigger role than the multiplicity N̄dmin for
moderately large SNR, ηP offers a means of quickly comparing the power efficiency of different
signaling constellations, as well as for determining the dependence of performance on Eb/N0.
Figure 6.24: ML decision regions for 16QAM with scaling chosen for convenience in computing power efficiency (I and Q components each take values in {±1, ±3}).
Performance analysis for 16QAM: We now apply the preceding performance analysis to the
16QAM constellation depicted in Figure 6.24, where we have chosen a convenient scale for the
constellation. We now compute the nearest neighbors approximation, which coincides with the
intelligent union bound, since the ML decision regions are determined by the nearest neighbors.
Noting that the number of nearest neighbors is four for the four innermost signal points, two for
the four outermost signal points, and three for the remaining eight signal points, we obtain upon
averaging
N̄dmin = 3 (6.68)
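The neighbor counts just described are easy to verify numerically; a sketch (ours) for the scaled 16QAM constellation of Figure 6.24:

%counting nearest neighbors for 16QAM (verifies that the average is 3)
alphabet = [-3 -1 1 3];
[I, Q] = meshgrid(alphabet, alphabet);
s = I(:) + 1i*Q(:);
D = abs(s - s.'); %matrix of pairwise distances
D(1:length(s)+1:end) = inf; %ignore the diagonal (d_ii = 0)
dmin = min(D(:)); %minimum distance: 2 for this scaling
Nd = sum(abs(D - dmin) < 1e-9, 2); %nearest neighbor count for each point
Nbar = mean(Nd); %average number of nearest neighbors: 3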
It remains to compute the power efficiency ηP and apply (6.67). We had done this in the preview
in Chapter 4, but we repeat it here. For the scaling shown, we have dmin = 2. The energy per
symbol is obtained as follows:

Es = (average energy of I component) + (average energy of Q component) = 2 × (average energy of I component)

by symmetry. Since the I component is equally likely to take the four values ±1 and ±3, we have

average energy of I component = (1/2)(1² + 3²) = 5

and

Es = 10
We therefore obtain

Eb = Es/log2 M = 10/log2 16 = 5/2

The power efficiency is therefore given by

ηP = d²min/Eb = 2²/(5/2) = 8/5   (6.69)

Applying (6.67) with N̄dmin = 3 and ηP = 8/5, we obtain

Pe ≈ 3Q(√(4Eb/(5N0)))
as the nearest neighbors approximation and intelligent union bound for 16QAM. The bandwidth
efficiency for 16QAM is 4 bits/2 dimensions, which is twice that of QPSK, whose bandwidth
efficiency is 2 bits/2 dimensions. It is not surprising, therefore, that the power efficiency of
16QAM (ηP = 1.6) is smaller than that of QPSK (ηP = 4). We often encounter such tradeoffs
between power and bandwidth efficiency in the design of communication systems, including when
the signaling waveforms considered are sophisticated codes that are constructed from multiple
symbols drawn from constellations such as PSK and QAM.
Figure 6.25: Symbol error probability (log scale) versus Eb/N0 (dB) for QPSK and 16QAM: intelligent union bounds (IUB) versus exact results.
Figure 6.25 shows the symbol error probabilities for QPSK and 16QAM, comparing the intelligent
union bounds (which coincide with nearest neighbors approximations) with exact results. The
exact computations for 16QAM use the closed form expression (6.70) derived in Problem 6.21. We
see that the exact error probability and intelligent union bound are virtually indistinguishable.
The power efficiencies of the constellations (which determine the argument of the Q function)
accurately predict the distance between the curves: ηP(QPSK)/ηP(16QAM) = 4/1.6 = 2.5, which
equals about 4 dB.
From Figure 6.25, we see that the distance between the QPSK and 16QAM curves at small error
probabilities (high SNR) is indeed about 4 dB.
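The quoted gap is just the ratio of power efficiencies on the dB scale; as a one-line check (ours):

%power efficiency gap between QPSK and 16QAM in dB
gap_dB = 10*log10(4/1.6); %3.98, i.e., about 4 dB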
Figure 6.26: Signal space picture for BPSK with a receiver phase offset: the signal points s0 and s1 are rotated relative to the decision boundary used by the receiver.
The performance analysis techniques developed here can also be applied to suboptimal receivers.
Suppose, for example, that the receiver LO in a BPSK system is offset from the incoming carrier
by a phase shift θ, but that the receiver uses decision regions corresponding to no phase offset.
The signal space picture is now as in Figure 6.26. The distance from each rotated signal point
to the receiver's decision boundary is D = √Eb cos θ, so the error probability is now given by

Pe = Pe|0 = Pe|1 = Q(D/σ) = Q(√((D²/Eb)(2Eb/N0))) = Q(√((2Eb/N0) cos²θ))

so that there is a loss of 10 log10 cos²θ dB in performance due to the phase offset (e.g., θ = 10°
leads to a loss of 0.13 dB, while θ = 30° leads to a loss of 1.25 dB).
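The dB loss is easy to tabulate; a one-line sketch (ours):

%loss in dB due to receiver phase offset for BPSK
theta = [10 30]*pi/180;
loss_dB = -10*log10(cos(theta).^2); %0.13 dB and 1.25 dB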
M-ary orthogonal signaling: Let us first quickly derive the union bound. Without loss of generality, take the M orthogonal
signals as unit vectors along the M axes in our signal space. With this scaling, we have ||si||² ≡ 1,
so that Es = 1 and Eb = 1/log2 M. Since the signals are orthogonal, the squared distance between
any two signals is

d²ij = ||si − sj||² = ||si||² + ||sj||² − 2⟨si, sj⟩ = 2Es = 2 ,  i ≠ j

Thus, dmin ≡ dij (i ≠ j) and the power efficiency

ηP = d²min/Eb = 2 log2 M
The union bound, intelligent union bound and nearest neighbors approximation all coincide, and
we get

Pe ≡ Pe|i ≤ Σ_{j≠i} Q(dij/(2σ)) = (M − 1) Q(√((d²min/Eb)(Eb/(2N0))))
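Numerically, since ηP = 2 log2 M, the Q-function argument simplifies to √(Eb log2 M/N0); a sketch (ours):

%union bound for M-ary orthogonal signaling at a given Eb/N0
qfun = @(x) 0.5*erfc(x/sqrt(2));
M = 16; ebno = 10^(8/10); %Eb/N0 of 8 dB (linear scale)
pe_bound = (M-1)*qfun(sqrt(log2(M)*ebno));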
Exact expressions: By symmetry, the error probability equals the conditional error probability,
conditioned on any one of the hypotheses; similarly, the probability of correct decision equals
the probability of correct decision given any of the hypotheses. Let us therefore condition on
hypothesis H0 (i.e., that s0 is sent), so that the received signal y = s0 + n. The decision statistics are

Zi = ⟨s0 + n, si⟩ = Es δ0i + Ni ,  i = 0, 1, ..., M − 1

where {Ni = ⟨n, si⟩} are jointly Gaussian, zero mean, with

cov(Ni, Nj) = σ²⟨si, sj⟩ = σ²Es δij

Thus, Ni ∼ N(0, σ²Es) are i.i.d. We therefore infer that, conditioned on s0 sent, the {Zi} are
conditionally independent, with Z0 ∼ N(Es, σ²Es), and Zi ∼ N(0, σ²Es) for i = 1, ..., M − 1.
Let us now express the decision statistics in scale-invariant terms, by replacing Zi by Zi/(σ√Es). This
gives Z0 ∼ N(m, 1), Z1, ..., ZM−1 ∼ N(0, 1), conditionally independent, where

m = Es/(σ√Es) = √(Es/σ²) = √(2Es/N0) = √(2Eb log2 M/N0)
The conditional probability of correct reception is now given by

Pc|0 = P[Z1 ≤ Z0, ..., ZM−1 ≤ Z0 | H0] = ∫ P[Z1 ≤ x, ..., ZM−1 ≤ x | Z0 = x, H0] pZ0|H0(x|H0) dx
     = ∫ P[Z1 ≤ x | H0] ⋯ P[ZM−1 ≤ x | H0] pZ0|H0(x|H0) dx

where we have used the conditional independence of the {Zi}. Plugging in the conditional
distributions, we get the following expression for the probability of correct reception.
Probability of correct reception for M-ary orthogonal signaling

Pc = Pc|i = ∫_{−∞}^{∞} [Φ(x)]^(M−1) (1/√(2π)) e^(−(x−m)²/2) dx   (6.72)

where m = √(2Es/N0) = √(2Eb log2 M/N0).
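The integral in (6.72) is straightforward to evaluate numerically; a sketch (ours):

%numerical evaluation of (6.72) for M-ary orthogonal signaling
M = 16; ebno = 10^(8/10); %Eb/N0 of 8 dB
m = sqrt(2*ebno*log2(M));
Phi = @(x) 0.5*erfc(-x/sqrt(2)); %standard Gaussian CDF
integrand = @(x) Phi(x).^(M-1).*exp(-(x-m).^2/2)/sqrt(2*pi);
Pc = integral(integrand, -inf, inf); %probability of correct reception
Pe = 1 - Pc; %probability of symbol error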
The probability of error is, of course, one minus the preceding expression. But for small error
probabilities, the probability of correct reception is close to one, and it is difficult to get good
estimates of the error probability using (6.72). We therefore develop an expression for the error
probability that can be directly computed, as follows:
Pe|0 = Σ_{j≠0} P[Zj = max_i Zi | H0] = (M − 1) P[Z1 = max_i Zi | H0]
Figure 6.27: Probability of symbol error (log scale) versus Eb/N0 (dB) for M-ary orthogonal signaling, M = 2, 4, 8, 16, with the asymptotic limit of −1.6 dB also marked.
Asymptotics for large M: The error probability for M-ary orthogonal signaling exhibits an
interesting thresholding effect as M gets large:

lim_{M→∞} Pe = 0 if Eb/N0 > ln 2 , and lim_{M→∞} Pe = 1 if Eb/N0 < ln 2   (6.74)
That is, by letting M get large, we can get arbitrarily reliable performance as long as Eb /N0
exceeds -1.6 dB (ln 2 expressed in dB). This result is derived in one of the problems. Actually, we
can show using the tools of information theory that this is the best we can do over the AWGN
channel in the limit of bandwidth efficiency tending to zero. That is, M-ary orthogonal signaling
is asymptotically optimum in terms of power efficiency.
Figure 6.27 shows the probability of symbol error as a function of Eb /N0 for several values of M.
We see that the performance is quite far away from the asymptotic limit of -1.6 dB (also marked
on the plot) for the moderate values of M considered. For example, the Eb /N0 required for
achieving an error probability of 10⁻⁶ for M = 16 is more than 9 dB away from the asymptotic
limit.
6.4 Bit Error Probability
We now know how to design rules for deciding which of M signals (or symbols) has been sent,
and how to estimate the performance of these decision rules. Sending one of M signals conveys
m = log2 M bits, so that a hard decision on one of these signals actually corresponds to hard
decisions on m bits. In this section, we discuss how to estimate the bit error probability, or the
bit error rate (BER), as it is often called.
Figure 6.28: QPSK with Gray coding: the symbol labels are 00 (first quadrant), 10 (second), 11 (third) and 01 (fourth) in the (Nc, Ns) plane.
QPSK with Gray coding: We begin with the example of QPSK, with the bit mapping shown
in Figure 6.28. This bit mapping is an example of a Gray code, in which the bits corresponding
to neighboring symbols differ by exactly one bit (since symbol errors are most likely going to
occur by decoding into neighboring decision regions, this reduces the number of bit errors). Let
us denote the symbol labels as b[1]b[2] for the transmitted symbol, where b[1] and b[2] each take
values 0 and 1. Letting b̂[1]b̂[2] denote the label for the ML symbol decision, the probabilities of
bit error are given by p1 = P[b̂[1] ≠ b[1]] and p2 = P[b̂[2] ≠ b[2]]. The average probability of bit
error, which we wish to estimate, is given by pb = (1/2)(p1 + p2). Conditioned on 00 being sent, the
probability of making an error on b[1] is as follows:

P[b̂[1] = 1 | 00 sent] = P[ML decision is 10 or 11 | 00 sent] = P[Nc < −d/2] = Q(d/(2σ)) = Q(√(2Eb/N0))

where, as before, we have expressed the result in terms of Eb/N0 using the power efficiency
d²/Eb = 4. We also note, by the symmetry of the constellation and the bit map, that the conditional
probability of error of b[1] is the same, regardless of which symbol we condition on. Moreover,
exactly the same analysis holds for b[2], except that errors are caused by the noise random
variable Ns. We therefore obtain that

pb = p1 = p2 = Q(√(2Eb/N0))   (6.75)
The fact that this expression is identical to the bit error probability for binary antipodal signaling
is not a coincidence. QPSK with Gray coding can be thought of as two independent BPSK
systems, one signaling along the I component, and the other along the Q component.
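This equivalence is easy to check by simulation; a minimal sketch (ours), treating Gray coded QPSK as two independent BPSK systems with bc, bs ∈ {±1}, so that Eb = 1 and σ² = N0/2:

%simulation check of (6.75): QPSK with Gray coding as two BPSK systems
nsym = 1e6; ebno = 10^(6/10); %Eb/N0 of 6 dB
sigma = sqrt(1/(2*ebno)); %Eb = 1 for this scaling
b = sign(randn(nsym,2)); %I and Q bits, each +1 or -1
y = b + sigma*randn(nsym,2); %per-dimension AWGN
ber = mean(y(:).*b(:) < 0); %compare with qfun(sqrt(2*ebno))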
Gray coding is particularly useful at low SNR (e.g., for heavily coded systems), where symbol
errors happen more often. For example, in a coded system, we would pass up fewer bit errors to
the decoder for the same number of symbol errors. We define it in general as follows.
Gray Coding: Consider a 2n -ary constellation in which each point is represented by a binary
string b = (b1, ..., bn). The bit assignment is said to be Gray coded if, for any two constellation
points b and b′ which are nearest neighbors, the bit representations b and b′ differ in exactly
one bit location.
Nearest neighbors approximation for BER with Gray coded constellation: Consider
the ith bit bi in an n-bit Gray code for a regular constellation with minimum distance dmin . For
a Gray code, there is at most one nearest neighbor which differs in the ith bit, and the pairwise
error probability of decoding to that neighbor is Q(dmin/(2σ)). We therefore have

P(bit error) ≈ Q(√(ηP Eb/(2N0)))   with Gray coding (6.76)

where ηP = d²min/Eb is the power efficiency.
Figure 6.29: BER for 16QAM and 16PSK with Gray coding.
Figure 6.29 shows the BER of 16QAM and 16PSK with Gray coding, comparing the nearest
neighbors approximation with exact results (obtained analytically for 16QAM, and by simulation
for 16PSK). The slight pessimism and ease of computation of the nearest neighbors approximation
make it an excellent tool for link design.
Gray coding may not always be possible. Indeed, for an arbitrary set of M = 2n signals, we may
not understand the geometry well enough to assign a Gray code. In general, a necessary (but
not sufficient) condition for an n-bit Gray code to exist is that the number of nearest neighbors
for any signal point should be at most n.
BER for orthogonal modulation: For M = 2^m-ary equal energy, orthogonal modulation,
each of the m bits splits the signal set into two halves. By the symmetric geometry of the signal set,
any of the M − 1 wrong symbols is equally likely to be chosen, given a symbol error, and M/2 of
these will correspond to an error in a given bit. We therefore have

P(bit error) = ((M/2)/(M − 1)) P(symbol error) ,  BER for M-ary orthogonal signaling (6.77)
Note that Gray coding is out of the question here, since there are only m bits and 2^m − 1
neighbors, all at the same distance.
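For example, the BER follows from any symbol error rate estimate (such as a numerical evaluation of (6.72)) by the scaling in (6.77); a trivial sketch (ours):

%BER from SER for M-ary orthogonal signaling via (6.77)
M = 16;
ber_from_ser = @(ser) (M/2)/(M-1)*ser;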
6.5 Link Budget Analysis
We have seen now that performance over the AWGN channel depends only on constellation ge-
ometry and Eb /N0 . In order to design a communication link, however, we must relate Eb /N0 to
physical parameters such as transmit power, transmit and receive antenna gains, range and the
quality of the receiver circuitry. Let us first take stock of what we know:
(a) Given the bit rate Rb and the signal constellation, we know the symbol rate (or more gen-
erally, the number of modulation degrees of freedom required per unit time), and hence the
minimum Nyquist bandwidth Bmin . We can then factor in the excess bandwidth a dictated
by implementation considerations to find the bandwidth B = (1 + a)Bmin required. (However,
assuming optimal receiver processing, we show below that the excess bandwidth does not affect
the link budget.)
(b) Given the constellation and a desired bit error probability, we can infer the Eb/N0 we need
to operate at. Since the SNR satisfies SNR = (Eb/N0)(Rb/B), we have

SNRreqd = (Eb/N0)reqd (Rb/B)   (6.78)
(c) Given the receiver noise figure F (dB), we can infer the noise power Pn = N0 B = N0,nom 10^(F/10) B,
and hence the minimum required received signal power is given by

PRX(min) = SNRreqd Pn = (Eb/N0)reqd (Rb/B) N0 B = Rb N0,nom 10^(F/10) (Eb/N0)reqd   (6.79)
This is called the required receiver sensitivity, and is usually quoted in dBm, as PRX,dBm(min) =
10 log10 PRX(min)(mW). Using (5.93), we obtain that

PRX,dBm(min) = (Eb/N0)reqd,dB + 10 log10 Rb − 174 + F   (6.80)
where Rb is in bits per second. Note that dependence on bandwidth B (and hence on excess
bandwidth) cancels out in (6.79), so that the final expression for receiver sensitivity depends
only on the required Eb /N0 (which depends on the signaling scheme and target BER), the bit
rate Rb , and the noise figure F .
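As a quick illustration of (6.80), with example values consistent with Example 6.5.1 below (Gray coded QPSK at BER 10⁻⁶, 30 Mbps, 6 dB noise figure):

%receiver sensitivity from (6.80)
ebno_reqd_dB = 10.2; %required Eb/N0 in dB
Rb = 30e6; %bit rate in bits per second
F = 6; %noise figure in dB
Prx_min_dBm = ebno_reqd_dB + 10*log10(Rb) - 174 + F; %about -83 dBm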
Once we know the receiver sensitivity, we need to determine the link parameters (e.g., transmitted
power, choice of antennas, range) such that the receiver actually gets at least that much power,
plus a link margin (typically expressed in dB). We illustrate such considerations via the Friis
formula for propagation loss in free space, which we can think of as modeling a line-of-sight
wireless link. While deriving this formula from basic electromagnetics is beyond our scope here,
let us provide some intuition before stating it.
Suppose that a transmitter emits power PTX that radiates uniformly in all directions. The power
per unit area at a distance R from the transmitter is PTX/(4πR²), where we have divided by the area
of a sphere of radius R. The receive antenna may be thought of as providing an effective area,
termed the antenna aperture, for catching a portion of this power. (The aperture of an antenna
is related to its size, but the relation is not usually straightforward.) If we denote the receive
antenna aperture by ARX, the received power is given by

PRX = ARX PTX/(4πR²)
Now, if the transmitter can direct power selectively in the direction of the receiver rather than
radiating it isotropically, we get

PRX = GTX ARX PTX/(4πR²)   (6.81)
where GT X is the transmit antenna’s gain towards the receiver, relative to a hypothetical isotropic
radiator. We now have a formula for received power in terms of transmitted power, which depends
on the gain of the transmit antenna and the aperture of the receive antenna. We would like to
express this formula solely in terms of antenna gains or antenna apertures. To do this, we need
to relate the gain of an antenna to its aperture. To this end, we state without proof that the
aperture of an isotropic antenna is given by A = λ²/(4π). Since the gain of an antenna is the ratio
of its aperture to that of an isotropic antenna, the relation between gain and aperture can be
written as

G = A/(λ²/(4π)) = 4πA/λ²   (6.82)
Assuming that the aperture A scales up in some fashion with antenna size, this implies that, for
a fixed form factor, we can get higher antenna gains as we decrease the carrier wavelength, or
increase the carrier frequency.
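For instance, a sketch (ours, with a hypothetical aperture) evaluating (6.82) at 60 GHz:

%antenna gain from aperture via (6.82)
lambda = 3e8/60e9; %60 GHz carrier gives a 5 mm wavelength
A = 0.01^2; %hypothetical 1 cm x 1 cm aperture
G = 4*pi*A/lambda^2; %linear gain
G_dBi = 10*log10(G); %about 17 dBi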
Using (6.82) in (6.81), we get two versions of the Friis formula:
Friis formula for free space propagation

PRX = PTX GTX GRX λ²/(16π²R²) ,  in terms of antenna gains (6.83)

PRX = PTX ATX ARX/(λ²R²) ,  in terms of antenna apertures (6.84)

where
• GTX, ATX are the gain and aperture, respectively, of the transmit antenna,
• GRX, ARX are the gain and aperture, respectively, of the receive antenna,
• λ = c/fc is the carrier wavelength (c = 3 × 10⁸ meters/sec is the speed of light, fc the carrier
frequency),
• R is the range (line-of-sight distance between transmitter and receiver).
The first version (6.83) of the Friis formula tells us that, for antennas with fixed gain, we should
try to use as low a carrier frequency (as large a wavelength) as possible. On the other hand,
the second version tells us that, if we have antennas of a given form factor, then we can get
better performance as we increase the carrier frequency (decrease the wavelength), assuming of
course that we can “point” these antennas accurately at each other. Of course, higher carrier
frequencies also have the disadvantage of incurring more attenuation from impairments such as
obstacles, rain, fog. Some of these tradeoffs are explored in the problems.
In order to apply the Friis formula (let us focus on version (6.83) for concreteness) to link budget
analysis, it is often convenient to take logarithms, converting the multiplications into addition.
On a logarithmic scale, antenna gains are expressed in dBi, where GdBi = 10 log10 G for an
antenna with raw gain G. Expressing powers in dBm, we have

PRX,dBm = PTX,dBm + GTX,dBi + GRX,dBi + 10 log10(λ²/(16π²R²))   (6.85)
More generally, we have the link budget equation
PRX,dBm = PT X,dBm + GT X,dBi + GRX,dBi − Lpathloss,dB (R) (6.86)
where Lpathloss,dB(R) is the path loss in dB. For free space propagation, we have from the Friis
formula (6.85) that

Lpathloss,dB(R) = 10 log10(16π²R²/λ²)   path loss in dB for free space propagation (6.87)
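Numerically, for the 5 GHz link considered in Example 6.5.1 below (a sketch, ours):

%free space path loss (6.87) at 5 GHz and roughly 107 m range
c = 3e8; fc = 5e9; lambda = c/fc; %wavelength of 0.06 m
R = 107; %range in meters
Lpath_dB = 10*log10(16*pi^2*R^2/lambda^2); %about 87 dB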
While the Friis formula is our starting point, the link budget equation (6.86) applies more gen-
erally, in that we can substitute other expressions for path loss, depending on the propagation
environment. For example, for wireless communication in a cluttered environment, the signal
power may decay as 1/R⁴ rather than the free space decay of 1/R². A mixture of empirical mea-
surements and statistical modeling is typically used to characterize path loss as a function of
range for the environments of interest. For example, the design of wireless cellular systems is
accompanied by extensive “measurement campaigns” and modeling. Once we decide on the path
loss formula (Lpathloss,dB (R)) to be used in the design, the transmit power required to attain a
given receiver sensitivity can be determined as a function of range R. Such a path loss formula
typically characterizes an “average” operating environment, around which there might be sig-
nificant statistical variations that are not captured by the model used to arrive at the receiver
sensitivity. For example, the receiver sensitivity for a wireless link may be calculated based on the
AWGN channel model, whereas the link may exhibit rapid amplitude variations due to multipath
fading, and slower variations due to shadowing (e.g., due to buildings and other obstacles). Even
if fading/shadowing effects are factored into the channel model used to compute BER, and the
model for path loss, the actual environment encountered may be worse than that assumed in
the model. In general, therefore, we add a link margin Lmargin,dB , again expressed in dB, in an
attempt to budget for potential performance losses due to unmodeled or unforeseen impairments.
The size of the link margin depends, of course, on the confidence of the system designer in the
models used to arrive at the rest of the link budget.
Putting all this together, if PRX,dBm(min) is the desired receiver sensitivity (i.e., the minimum
required received power), then we compute the transmit power for the link to be

Required transmit power

PTX,dBm = PRX,dBm(min) + Lpathloss,dB(R) + Lmargin,dB − GTX,dBi − GRX,dBi   (6.88)
Example 6.5.1 Consider again the 5 GHz WLAN link of Example 5.8.1. We wish to utilize a
20 MHz channel, using Gray coded QPSK and an excess bandwidth of 33 %. The receiver has
a noise figure of 6 dB.
(a) What is the bit rate?
(b) What is the receiver sensitivity required to achieve a BER of 10⁻⁶?
(c) Assuming transmit and receive antenna gains of 2 dBi each, what is the range achieved for
100 mW transmit power, using a link margin of 20 dB? Use link budget analysis based on free
space path loss.
Solution (a) For bandwidth B and fractional excess bandwidth a, the symbol rate

Rs = 1/T = B/(1 + a) = 20/(1 + 0.33) ≈ 15 Msymbols/sec

and the bit rate for an M-ary constellation is Rb = Rs log2 M, which for QPSK (M = 4) gives
Rb ≈ 30 Mbps.
We can now invert the formula for free space loss, (6.87), noting that fc = 5 GHz, which implies
λ = c/fc = 0.06 m. We get a range R of 107 meters, which is of the order of the advertised ranges
for WLANs under nominal operating conditions. The range decreases, of course, for higher bit
rates using larger constellations. What happens, for example, when we use 16QAM or 64QAM?
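Since parts of the solution above are omitted in this fragment, the following sketch (ours, using values consistent with the example: sensitivity of about −83 dBm from (6.80), 2 dBi antenna gains, 20 dB margin, 100 mW transmit power) reproduces the range calculation by inverting (6.87):

%range for Example 6.5.1(c) by inverting the free space path loss
Prx_min_dBm = -83; %sensitivity from (6.80): 10.2 + 10*log10(30e6) - 174 + 6
Ptx_dBm = 10*log10(100); %100 mW is 20 dBm
Gtx = 2; Grx = 2; Lmargin = 20; %antenna gains (dBi) and link margin (dB)
lambda = 3e8/5e9; %wavelength at 5 GHz
Lpath_max = Ptx_dBm + Gtx + Grx - Lmargin - Prx_min_dBm; %allowed path loss: 87 dB
R = sqrt(10^(Lpath_max/10)*lambda^2/(16*pi^2)); %about 107 meters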
Example 6.5.2 Consider an indoor link at 10 meters range using unlicensed spectrum at 60
GHz. Suppose that the transmitter and receiver each use antennas with horizontal beamwidths
of 60◦ and vertical beamwidths of 30◦ . Use the following approximation to calculate the resulting
antenna gains:

G ≈ 41000/(θhoriz θvert)
where G denotes the antenna gain (linear scale), θhoriz and θvert denote horizontal and vertical
beamwidths (in degrees). Set the noise figure to 8 dB, and assume a link margin of 10 dB at a
BER of 10⁻⁶.
(a) Calculate the bandwidth and transmit power required for a 2 Gbps link using Gray coded
QPSK and 50% excess bandwidth.
(b) How do your answers change if you change the signaling scheme to Gray coded 16QAM,
keeping the same bit rate as in (a)?
(c) If you now employ Gray coded 16QAM keeping the same symbol rate as in (a), what is the
bit rate attained and the transmit power required?
(d) How do the answers in the setting of (a) change if you increase the horizontal beamwidth to
120◦ , keeping all other parameters fixed?
Solution: (a) A 2 Gbps link using QPSK corresponds to a symbol rate of 1 Gsymbols/sec.
Factoring in the 50% excess bandwidth, the required bandwidth is B = 1.5 GHz. The target
BER and constellation are as in the previous example, hence we still have (Eb /N0 )reqd,dB ≈ 10.2
dB. Plugging in Rb = 2 Gbps and F = 8 dB in (6.80), we obtain that the required receiver
sensitivity is PRX,dBm (min) = −62.8 dBm.
The antenna gains at each end are given by

G ≈ 41000/(60 × 30) = 22.78
Converting to dB scale, we obtain GT X,dBi = GRX,dBi = 13.58 dBi.
The transmit power for a range of 10 m can now be obtained using (6.88) to be 8.1 dBm.
(b) For the same bit rate of 2 Gbps, the symbol rate for 16QAM is 0.5 Gsymbols/sec, so that
the bandwidth required is 0.75 GHz, factoring in 50% excess bandwidth. The nearest neighbors
approximation to BER for Gray coded 16QAM is given by Q(√(4Eb/(5N0))). Using this, we find that
a target BER of 10⁻⁶ requires (Eb/N0)reqd,dB ≈ 14.54 dB, an increase of 4.34 dB relative to (a).
This leads to a corresponding increase in the receiver sensitivity to −58.45 dBm, which leads to
the required transmit power increasing to 12.4 dBm.
(c) If we keep the symbol rate fixed at 1 Gsymbols/sec, the bit rate with 16QAM is Rb = 4 Gbps.
As in (b), (Eb /N0 )reqd,dB ≈ 14.54 dB. The receiver sensitivity is therefore given by -55.45 dBm,
a 3 dB increase over (b), corresponding to the doubling of the bit rate. This translates directly
to a 3 dB increase, relative to (b), in transmit power to 15.4 dBm, since the path loss, antenna
gains, and link margin are as in (b).
(d) We now go back to the setting of (a), but with different antenna gains. The bandwidth is,
of course, unchanged from (a). The new antenna gains are 3 dB smaller because of the doubling
of horizontal beamwidth. The receiver sensitivity, path loss and link margin are as in (a), thus
the 3 dB reduction in antenna gains at each end must be compensated for by a 6 dB increase in
transmit power relative to (a). Thus, the required transmit power is 14.1 dBm.
Discussion: The parameter choices in the preceding examples illustrate how physical character-
istics of the medium change with choice of carrier frequency, and affect system design tradeoffs.
The 5 GHz system in Example 6.5.1 employs essentially omnidirectional antennas with small
gains of 2 dBi, whereas it is possible to realize highly directional yet small antennas (e.g., using
electronically steerable printed circuit antenna arrays) for the 60 GHz system in Example 6.5.2
by virtue of the small (5 mm) wavelength. 60 GHz waves are easily blocked by walls, hence the
range in Example 6.5.2 corresponds to in-room communication. We have also chosen parameters
such that the transmit power required for 60 GHz is smaller than that at 5 GHz, since it is
more difficult to produce power at higher radio frequencies. Finally, the link margin for 5 GHz
is chosen higher than for 60 GHz: propagation at 60 GHz is near line-of-sight, whereas fading
due to multipath propagation at 5 GHz can be more significant, and hence may require a higher
link margin relative to the AWGN benchmark which provides the basis for our link budget.
6.6 Summary

Hypothesis testing
• For equal priors, the MPE rule coincides with the ML rule.
• For binary hypothesis testing, ML and MPE rules can be written as likelihood, or log likelihood,
ratio tests (decide H1 if the test statistic exceeds the threshold, H0 otherwise):

L(y) = p1(y)/p0(y) ≷ 1 , or log L(y) ≷ 0   (ML rule)

L(y) ≷ π0/π1 , or log L(y) ≷ log(π0/π1)   (MPE/MAP rule)
Signal space
• M-ary signaling in AWGN in continuous time can be reduced, without loss of information,
to M-ary signaling in finite-dimensional vector space with each dimension seeing i.i.d. N(0, σ 2 )
noise, which corresponds to discrete time WGN. This is accomplished by projecting the received
signal onto the signal space spanned by the M possible signals.
• Decision rules derived using hypothesis testing in the finite-dimensional signal space map
directly back to continuous time because of two key reasons: signal inner products are preserved,
and the noise component orthogonal to the signal space is irrelevant. Because of this equivalence,
we can stop making a distinction between continuous time signals and finite-dimensional vector
signals in our notation.
Optimal demodulation
• For the model Hi : y = si + n, 0 ≤ i ≤ M − 1, optimum demodulation involves computation of
the correlator outputs Zi = ⟨y, si⟩. This can be accomplished by using a bank of correlators or
matched filters, but any other receiver structure that yields the statistics {Zi} would also
preserve all of the relevant information.
• The ML and MPE rules are given by

δML(y) = arg max_{0≤i≤M−1} ⟨y, si⟩ − ||si||²/2

δMPE(y) = arg max_{0≤i≤M−1} ⟨y, si⟩ − ||si||²/2 + σ² log πi

When the received signal lies in a finite-dimensional space in which the noise has finite energy,
the ML rule can be written as a minimum distance rule (and the MPE rule as a variant thereof)
as follows:

δML(y) = arg min_{0≤i≤M−1} ||y − si||²

δMPE(y) = arg min_{0≤i≤M−1} ||y − si||² − 2σ² log πi
Geometry of ML rule: ML decision boundaries are formed from hyperplanes that bisect lines
connecting signal points.
Performance analysis
• For binary signaling, the error probability for the ML rule is given by

Pe = Q(d/(2σ)) = Q(√((d²/Eb)(Eb/(2N0))))

where d = ||s1 − s0|| is the Euclidean distance between the signals. The performance therefore
depends on the power efficiency ηP = d²/Eb and the SNR Eb/N0. Since the power efficiency is scale-
invariant, we may choose any convenient scaling when computing it for a given constellation.
• For M-ary signaling, closed form expressions for the error probability may not be available,
but we know that the performance depends only on the scale-invariant inner products {⟨si, sj⟩/Eb},
which depend on the constellation “shape” alone, and on Eb/N0.
• The conditional error probabilities for M-ary signaling can be bounded using the union bound
(these can then be averaged to obtain an upper bound on the average error probability):

Pe|i ≤ Σ_{j≠i} Q(dij/(2σ)) = Σ_{j≠i} Q(√((d²ij/Eb)(Eb/(2N0))))

where dij = ||si − sj|| are the pairwise distances between signal points.
• When we understand the shape of the decision regions, we can tighten the union bound into
an intelligent union bound:

Pe|i ≤ Σ_{j∈Nml(i)} Q(dij/(2σ)) = Σ_{j∈Nml(i)} Q(√((d²ij/Eb)(Eb/(2N0))))
where Nml (i) denotes the set of neighbors of si which define the decision region Γi .
• For regular constellations, the nearest neighbors approximation is given by

Pe|i ≈ Ndmin(i) Q(dmin/(2σ)) = Ndmin(i) Q(√((d²min/Eb)(Eb/(2N0))))

Pe ≈ N̄dmin Q(dmin/(2σ)) = N̄dmin Q(√((d²min/Eb)(Eb/(2N0))))

with ηP = d²min/Eb providing a measure of power efficiency which can be used to compare across
constellations.
• If Gray coding is possible, the bit error probability can be estimated as

P(bit error) ≈ Q(√(ηP Eb/(2N0)))
Link budget: This relates (e.g., using the Friis formula for free space propagation) the per-
formance of a communication link to physical parameters such as transmit power, transmit and
receive antenna gains, range, and receiver noise figure. A link margin is typically introduced to
account for unmodeled impairments.
6.7 Endnotes
The geometric signal space approach for deriving and analyzing optimal receivers is now standard
in textbooks on communication theory, such as [7, 8]. It was first developed by Russian pioneer Vladimir
Kotelnikov [33], and presented in a cohesive fashion in the classic textbook by Wozencraft and
Jacobs [9].
A number of details of receiver design have been swept under the rug in this chapter. Our
model for the received signal is that it equals the transmitted signal plus WGN. In practice,
the transmitted signal can be significantly distorted by the channel (e.g., scaling, delay, multi-
path propagation). However, the basic M-ary signaling model is still preserved: if M possible
signals are sent, then, prior to the addition of noise, M possible signals are received after the
deterministic (but a priori unknown) transformations due to channel impairments. The receiver
can therefore estimate noiseless copies of the latter and then apply the optimum demodula-
tion techniques developed here. This approach leads, for example, to the optimal equalization
strategies developed by Forney [34] and Ungerboeck [35]; see Chapter 5 of [7] for a textbook
exposition. Estimation of the noiseless received signals involves tasks such as carrier phase and
frequency synchronization, timing synchronization, and estimation of the channel impulse re-
sponse or transfer function. In modern digital communication transceivers, these operations
are typically all performed using DSP on the complex baseband received signal. Perhaps the
best approach for exploring further is to acquire a basic understanding of the relevant estima-
tion techniques, and to then go to technical papers of specific interest (e.g., IEEE conference
and journal publications). Classic texts covering estimation theory include Kay [36], Poor [37]
and Van Trees [38]. Several graduate texts in communications contain a brief discussion of the
modern estimation-theoretic approach to synchronization that may provide a helpful orientation
prior to going to the research literature; for example, see [7] (Chapter 4) and [11, 39] (Chapter
8).
6.8 Problems
Hypothesis Testing
Problem 6.1 The received signal in a digital communication system is given by

y(t) = s(t) + n(t) if 1 is sent
y(t) = n(t) if 0 is sent

where n is AWGN with PSD σ² = N0/2 and s(t) is as shown below. The received signal is passed
through a filter, and the output is sampled to yield a decision statistic. An ML decision rule is
employed based on the decision statistic. The set-up is shown in Figure 6.30.

Figure 6.30: Set-up for Problem 6.1: s(t) takes value 1 on [0, 2] and −1 on (2, 4]; the received signal is passed through a filter h(t) and sampled at t = t0, followed by an ML decision rule.
(a) For h(t) = s(−t), find the error probability as a function of Eb /N0 if t0 = 1.
(b) Can the error probability in (a) be improved by choosing the sampling time t0 differently?
(c) Now, find the error probability as a function of Eb /N0 for h(t) = I[0,2] and the best possible
choice of sampling time.
(d) Finally, comment on whether you can improve the performance in (c) by using a linear com-
bination of two samples as a decision statistic, rather than just using one sample.
Problem 6.2 Consider binary hypothesis testing based on the decision statistic Y , where Y ∼
N(2, 9) under H1 and Y ∼ N(−2, 4) under H0 .
(a) Show that the optimal (ML or MPE) decision rule is equivalent to comparing a function of
the form ay 2 + by to a threshold.
(b) Specify the MPE rule explicitly (i.e., specify a, b and the threshold) when π0 = 1/4.
(c) Express the conditional error probability Pe|0 for the decision rule in (b) in terms of the Q
function with positive arguments. Also provide a numerical value for this probability.
Problem 6.3 Find and sketch the decision regions for a binary hypothesis testing problem with
observation Z, where the hypotheses are equally likely, and the conditional distributions are
given by
H0 : Z is uniform over [−2, 2]
H1 : Z is Gaussian with mean 0 and variance 1.
Problem 6.4 The receiver in a binary communication system employs a decision statistic Z
which behaves as follows:
Z = N if 0 is sent
Z = 4 + N if 1 is sent
where N is modeled as Laplacian with density

pN(x) = (1/2) e^(−|x|) ,  −∞ < x < ∞
Note: Parts (a) and (b) can be done independently.
(a) Find and sketch, as a function of z, the log likelihood ratio

K(z) = log L(z) = log(p(z|1)/p(z|0))
where p(z|i) denotes the conditional density of Z given that i is sent (i = 0, 1).
(b) Find Pe|1, the conditional error probability given that 1 is sent, for the decision rule

δ(z) = 0 for z < 1 , and δ(z) = 1 for z ≥ 1
(c) Is the rule in (b) the MPE rule for any choice of prior probabilities? If so, specify the prior
probability π0 = P [ 0 sent] for which it is the MPE rule. If not, say why not.
Problem 6.5 Consider the MAP/MPE rule for the hypothesis testing problem in Example
6.1.1.
(a) Show that the MAP rule always says H1 if the prior probability of H0 is smaller than some
positive threshold. Specify this threshold.
(b) Compute and plot the conditional probabilities Pe|0 and Pe|1 , and the average error proba-
bility Pe , versus π0 as the latter varies in [0, 1].
(c) Discuss any trends that you see from the plots in (b).
Problem 6.6 Consider a MAP receiver for the basic Gaussian example, as discussed in Example
6.1.2. Fix SNR at 13 dB. We wish to explore the effect of prior mismatch, by quantifying the
performance degradation of a MAP receiver if the actual priors are different from the priors for
which it has been designed.
(a) Plot the average error probability for a MAP receiver designed for π0 = 0.2, as π0 varies
from 0 to 1. As usual, use a log scale for the probabilities. On the same plot, also plot the error
probability of the ML receiver as a benchmark.
(b) From the plot in (a), comment on how much error you can tolerate in the prior probabilities
before the performance of the MAP receiver designed for the given prior becomes unacceptable.
(c) Repeat (a) and (b) for a MAP receiver designed for π0 = 0.4. Is the performance more or
less sensitive to errors in the priors?
Problem 6.7 Consider binary hypothesis testing in which the observation Y is modeled as uni-
formly distributed over [−2, 2] under H0 , and has conditional density p(y|1) = c(1−|y|/3)I[−3,3](y)
under H1 , where c > 0 is a constant to be determined.
(a) Find c.
(b) Find and sketch the decision regions Γ0 and Γ1 corresponding to the ML decision rule.
(c) Find the conditional error probabilities.
Problem 6.8 Consider binary hypothesis testing with scalar observation Y . Under hypothesis
H0 , Y is modeled as uniformly distributed over [−5, 5]. Under H1 , Y has conditional density
p(y|1) = (1/8) e^(−|y|/4) ,  −∞ < y < ∞.
(a) Specify the ML rule and clearly draw the decision regions Γ0 and Γ1 on the real line.
(b) Find the conditional probabilities of error for the ML rule under each hypothesis.
Problem 6.9 For the setting of Problem 6.8, suppose that the prior probability of H0 is 1/3.
(a) Specify the MPE rule and draw the decision regions.
(b) Find the conditional error probabilities and the average error probability. Compare with the
corresponding quantities for the ML rule considered in Problem 6.8.
Problem 6.10 The receiver output Z in an on-off keyed optical communication system is mod-
eled as a Poisson random variable with mean m0 = 1 if 0 is sent, and mean m1 = 10 if 1 is sent.
(a) Show that the ML rule consists of comparing Z to a threshold, and specify the numerical
value of the threshold. Note that Z can only take nonnegative integer values.
(b) Compute the conditional error probabilities for the ML rule (compute numerical values in
addition to deriving formulas).
(c) Find the MPE rule if the prior probability of sending 1 is 0.1.
(d) Compute the average error probability for the MPE rule.
Problem 6.11 The received sample Y in a binary communication system is modeled as follows:
Y = A + N if 0 is sent, and Y = −A + N if 1 is sent, where N is Laplacian noise with density

pN(x) = (λ/2) e^(−λ|x|) ,  −∞ < x < ∞
(a) Find the ML decision rule. Simplify as much as possible.
(b) Find the conditional error probabilities for the ML rule.
(c) Now, suppose that the prior probability of sending 0 is 1/3. Find the MPE rule, simplifying
as much as possible.
(d) In the setting of (c), find the LLR log(P[0|Y = A/2]/P[1|Y = A/2]).
Problem 6.12 Consider binary hypothesis testing with scalar observation Y . Under hypothesis
H0 , Y is modeled as an exponential random variable with mean 5. Under hypothesis H1 , Y is
modeled as uniformly distributed over the interval [0, 10].
(a) Specify the ML rule and clearly draw the decision regions Γ0 and Γ1 on the real line.
(b) Find the conditional probability of error for the ML rule, given that H0 is true.
(c) Suppose that the prior probability of H0 is 1/3. Compute the posterior probability of H0
given that we observe Y = 4 (i.e., find P [H0 |Y = 4]).
Problem 6.13 Consider hypothesis testing in which the observation Y is given by the following
model:
H1 : Y = 6 + N
H0 : Y = N
where the noise N has density pN(x) = (1/10)(1 − |x|/10) I[−10,10](x).
(a) Find the conditional error probability given H1 for the following decision rule:
Y ≷ 4 (decide H1 if Y > 4, and H0 if Y < 4)
(b) Are there a set of prior probabilities for which the decision rule in (a) minimizes the error
probability? If so, specify them. If not, say why not.
Receiver design and performance analysis for the AWGN channel
Problem 6.14 Consider binary signaling in AWGN, with s1(t) = (1 − |t|)I[−1,1](t) and s0(t) =
−s1(t). The received signal is given by y(t) = si(t) + n(t), i = 0, 1, where the noise n has PSD
σ² = N0/2 = 0.1. For all of the error probabilities computed in this problem, specify in terms of
the Q function with positive arguments and also give numerical values.
(a) How would you implement the ML receiver using the received signal y(t)? What is its
conditional error probability given that s0 is sent?
Now, consider a suboptimal receiver, where the receiver generates the following decision statistics:

y0 = ∫_{−1}^{−0.5} y(t) dt ,  y1 = ∫_{−0.5}^{0} y(t) dt ,  y2 = ∫_{0}^{0.5} y(t) dt ,  y3 = ∫_{0.5}^{1} y(t) dt
(b) Specify the conditional distribution of y = (y0 , y1 , y2, y3 )T , conditioned on s0 being sent.
(c) Specify the ML rule when the observation is y. What is its conditional error probability
given that s0 is sent?
(d) Specify the ML rule when the observation is y0 + y1 + y2 + y3 . What is its conditional error
probability, given that s0 is sent?
(e) Among the error probabilities in (a), (c) and (d), which is the smallest? Which is the biggest?
Could you have rank ordered these error probabilities without actually computing them?
Problem 6.15 The received signal in an on-off keyed digital communication system is given by

y(t) = s(t) + n(t) if 1 is sent
y(t) = n(t) if 0 is sent

where n is AWGN with PSD σ² = N0/2, and s(t) = A(1 − |t|)I[−1,1](t), where A > 0. The received
signal is passed through a filter with impulse response h(t) = I[0,1] (t) to obtain z(t) = (y ∗ h)(t).
Remark: It would be helpful to draw a picture of the system before you start doing the calculations.
(a) Consider the decision statistic Z = z(0) + z(1). Specify the conditional distribution of Z
given that 0 is sent, and the conditional distribution of Z given that 1 is sent.
(b) Assuming that the receiver must make its decision based on Z, specify the ML rule and its
error probability in terms of Eb /N0 (express your answer in terms of the Q function with positive
arguments).
(c) Find the error probability (in terms of Eb /N0 ) for ML decisions based on the decision statistic
Z2 = z(0) + z(0.5) + z(1).
Problem 6.16 Consider binary signaling in AWGN using the signals depicted in Figure 6.31.
The received signal is given by

y(t) = s1(t) + n(t) if 1 is sent
y(t) = s0(t) + n(t) if 0 is sent

where n(t) is WGN with PSD σ² = N0/2.
(a) Show that the ML decision rule can be implemented by comparing Z = ∫ y(t)a(t) dt to a
threshold γ. Sketch a(t) and specify the corresponding value of γ.
(b) Specify the error probability of the ML rule as a function of Eb /N0 .
(c) Can the MPE rule, assuming that the prior probability of sending 0 is 1/3, be implemented
using the same receiver structure as in (a)? What would need to change? (Be specific.)
(d) Consider now a suboptimal receiver structure in which y(t) is passed through a filter with
impulse response h(t) = I[0,1] (t), and we take three samples: Z1 = (y ∗ h)(1), Z2 = (y ∗ h)(2),
Z3 = (y ∗ h)(3). Specify the conditional distribution of Z = (Z1 , Z2 , Z3 )T given that 0 is sent.
(e) (more challenging) Specify the ML rule based on Z and the corresponding error probability
as a function of Eb /N0 .
Figure 6.31: The signals s1(t) and s0(t) used in Problem 6.16.
Problem 6.17 Let p1 (t) = I[0,1] (t) denote a rectangular pulse of unit duration. Consider two
4-ary signal sets as follows:
Signal Set A: si (t) = p1 (t − i), i = 0, 1, 2, 3.
Signal Set B: s0 (t) = p1 (t) + p1 (t − 3), s1 (t) = p1 (t − 1) + p1 (t − 2), s2 (t) = p1 (t) + p1 (t − 2),
s3 (t) = p1 (t − 1) + p1 (t − 3).
(a) Find signal space representations for each signal set with respect to the orthonormal basis
{p1 (t − i), i = 0, 1, 2, 3}.
(b) Find union bounds on the average error probabilities for both signal sets as a function of
Eb /N0. At high SNR, what is the penalty in dB for using signal set B?
(c) Find an exact expression for the average error probability for signal set B as a function of
Eb /N0.
Figure 6.32: The 4-ary signal set {a(t), b(t), c(t), d(t)} for Problem 6.18.
Problem 6.18 Consider the 4-ary signaling set shown in Figure 6.32, to be used over an AWGN
channel.
(a) Find a union bound, as a function of Eb /N0 , on the conditional probability of error given
that c(t) is sent.
(b) True or False: This constellation is more power efficient than QPSK. Justify your answer.
Figure 6.33: Three 8-ary constellations for Problem 6.19: two QAM constellations (QAM1, QAM2) and an 8-PSK constellation of radius R.
Problem 6.19 Three 8-ary signal constellations are shown in Figure 6.33.
(a) Express R and dmin^(2) in terms of dmin^(1) so that all three constellations have the same Eb.
(b) For a given Eb /N0 , which constellation do you expect to have the smallest bit error probability
over a high SNR AWGN channel?
(c) For each constellation, determine whether you can label signal points using 3 bits so that the
label for nearest neighbors differs by at most one bit. If so, find such a labeling. If not, say why
not and find some “good” labeling.
(d) For the labelings found in part (c), compute nearest neighbors approximations for the average
bit error probability as a function of Eb /N0 for each constellation. Evaluate these approximations
for Eb/N0 = 15 dB.
Problem 6.20 Consider the signal constellation shown in Figure 6.34, which consists of two
QPSK constellations of different radii, offset from each other by π/4. The constellation is to be
used to communicate over a passband AWGN channel.
(a) Carefully redraw the constellation (roughly to scale, to the extent possible) for r = 1 and
R = √2. Sketch the ML decision regions.
(b) For r = 1 and R = √2, find an intelligent union bound for the conditional error probability,
given that a signal point from the inner circle is sent, as a function of Eb/N0.
(c) How would you choose the parameters r and R so as to optimize the power efficiency of the
constellation (at high SNR)?
Problem 6.21 (Exact symbol error probabilities for rectangular constellations) As-
suming each symbol is equally likely, derive the following expressions for the average error prob-
ability for 4PAM and 16QAM:
Pe = (3/2) Q(√(4Eb/(5N0))) ,  symbol error probability for 4PAM (6.90)

Pe = 3Q(√(4Eb/(5N0))) − (9/4) Q²(√(4Eb/(5N0))) ,  symbol error probability for 16QAM (6.91)
(Assume 4PAM with equally spaced levels symmetric about the origin, and rectangular 16QAM
equivalent to two 4PAM constellations independently modulating the I and Q components.)
Figure 6.35: The constellation for Problem 6.22 (rectangular 16QAM with the outer corner points moved to the I and Q axes, with adjacent points at spacing d).
Problem 6.22 The signal constellation shown in Figure 6.35 is obtained by moving the outer
corner points in rectangular 16QAM to the I and Q axes.
(a) Sketch the ML decision regions.
(b) Is the constellation more or less power-efficient than rectangular 16QAM?
Problem 6.23 Consider a 16-ary signal constellation with 4 signals with coordinates (±1, ±1),
four others with coordinates (±3, ±3), and two each having coordinates (±3, 0), (±5, 0), (0, ±3),
and (0, ±5), respectively.
(a) Sketch the signal constellation and indicate the ML decision regions.
(b) Find an intelligent union bound on the average symbol error probability as a function of
Eb /N0.
(c) Find the nearest neighbors approximation to the average symbol error probability as a func-
tion of Eb /N0 .
(d) Find the nearest neighbors approximation to the average symbol error probability for 16QAM
as a function of Eb /N0 .
(e) Comparing (c) and (d) (i.e., comparing the performance at high SNR), which signal set is
more power efficient?
Problem 6.24 A QPSK demodulator is designed to put out an erasure when the decision is
ambivalent. Thus, the decision regions are modified as shown in Figure 6.36, where the cross-
hatched region corresponds to an erasure. Set α = d1/d, where 0 ≤ α ≤ 1.
(a) Use the intelligent union bound to find approximations to the probability p of symbol error
and the probability q of symbol erasure in terms of Eb /N0 and α.
(b) Find exact expressions for p and q as functions of Eb /N0 and α.
Figure 6.36: QPSK decision regions modified to declare erasures in a cross-hatched zone of width d1 (Problem 6.24).
(c) Using the approximations in (a), find an approximate value for α such that q = 2p for
Eb/N0 = 4 dB.
Remark: The motivation for (c) is that a typical error-correcting code can correct twice as many
erasures as errors.
Figure 6.37: Two QPSK constellations on concentric circles of radii r and R (Problem 6.25).
Problem 6.25 The constellation shown in Figure 6.37 consists of two QPSK constellations lying
on concentric circles, with inner circle of radius r and outer circle of radius R.
(a) For r = 1 and R = 2, redraw the constellation, and carefully sketch the ML decision regions.
(b) Still keeping r = 1 and R = 2, find an intelligent union bound for the symbol error probability
as a function of Eb /N0 .
(c) For r = 1, find the best choice of R in terms of high SNR performance. Compute the gain in
power efficiency (in dB), if any, over the setting in (a)-(b).
Problem 6.26 Consider the constant modulus constellation shown in Figure 6.38, where θ ≤
π/4. Each symbol is labeled by 2 bits (b1 , b2 ) as shown. Assume that the constellation is used
over a complex baseband AWGN channel with noise Power Spectral Density (PSD) N0 /2 in each
dimension. Let (b̂1 , b̂2 ) denote the maximum likelihood (ML) estimates of (b1 , b2 ).
(a) Find Pe1 = P [b̂1 ≠ b1 ] and Pe2 = P [b̂2 ≠ b2 ] as a function of Es /N0 , where Es denotes the
signal energy.
(b) Assume now that the transmitter is being heard by two receivers, R1 and R2, and that R2 is
twice as far away from the transmitter as R1. Assume that the received signal energy falls off as
1/r 4 , where r is the distance from the transmitter, and that the noise PSD for both receivers is
Figure 6.38: Signal constellation with unequal error protection (Problem 6.26); the four constant-modulus points carry the bit labels (0,0), (1,0), (0,1), and (1,1), with angular offset θ.
identical. Suppose that R1 can demodulate both bits b1 and b2 with error probability at least as
good as 10⁻³, i.e., so that max{Pe1 (R1), Pe2 (R1)} = 10⁻³. Design the signal constellation (i.e.,
specify θ) so that R2 can demodulate at least one of the bits with the same error probability,
i.e., such that min{Pe1 (R2), Pe2 (R2)} = 10⁻³.
Remark: You have designed an unequal error protection scheme in which the receiver that sees
a poorer channel can still extract part of the information sent.
Figure 6.39: Two-dimensional constellation for Problem 6.27, with points s0 , s1 , s2 , s3 in the I-Q plane (s1 and s2 on the Q axis at ±2; s0 and s3 on the I axis, which is marked at −3, −1, 0, 1, 3).
Problem 6.27 The 2-dimensional constellation shown in Figure 6.39 is to be used for signaling
over an AWGN channel.
(a) Specify the ML decision if the observation is (I, Q) = (1, −1).
(b) Carefully redraw the constellation and sketch the ML decision regions.
(c) Find an intelligent union bound for the symbol error probability conditioned on s0 being sent,
as a function of Eb /N0 .
Problem 6.28 (Demodulation with amplitude mismatch) Consider a 4PAM system us-
ing the constellation points {±1, ±3}. The receiver has an accurate estimate of its noise level.
An automatic gain control (AGC) circuit is supposed to scale the decision statistics so that the
noiseless constellation points are in {±1, ±3}. ML decision boundaries are set according to this
nominal scaling.
(a) Suppose that the AGC scaling is faulty, and the actual noiseless signal points are at {±0.9, ±2.7}.
Sketch the points and the mismatched decision regions. Find an intelligent union bound for the
symbol error probability in terms of the Q function and Eb /N0 .
(b) Repeat (a), assuming that faulty AGC scaling puts the noiseless signal points at {±1.1, ±3.3}.
(c) AGC circuits try to maintain a constant output power as the input power varies, and can be
viewed as imposing a scale factor on the input inversely proportional to the square root of the
input power. In (a), does the AGC circuit overestimate or underestimate the input power?
Problem 6.29 (Demodulation with phase mismatch) Consider a BPSK system in which
the receiver’s estimate of the carrier phase is off by θ.
(a) Sketch the I and Q components of the decision statistic, showing the noiseless signal points
and the decision region.
(b) Derive the BER as a function of θ and Eb /N0 (assume that θ < π/2).
(c) Assuming now that θ is a random variable taking values uniformly in [−π/4, π/4], numerically
compute the BER averaged over θ, and plot it as a function of Eb /N0 . Plot the BER without
phase mismatch as well, and estimate the dB degradation due to the phase mismatch.
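For the numerical averaging in part (c), here is a minimal MATLAB sketch, assuming part (b) yields the conditional BER Q(√(2Eb /N0 ) cos θ) (the signal projection onto the receiver's I axis shrinks by cos θ):

Q = @(x) 0.5*erfc(x/sqrt(2));
EbN0dB = 0:0.5:10;  g = 10.^(EbN0dB/10);
theta = linspace(-pi/4, pi/4, 1001);              % grid for averaging over theta
BERavg = zeros(size(g));
for k = 1:length(g)
    BERavg(k) = mean(Q(sqrt(2*g(k))*cos(theta))); % uniform average over theta
end
semilogy(EbN0dB, BERavg, EbN0dB, Q(sqrt(2*g))); grid on;
xlabel('E_b/N_0 (dB)'); ylabel('BER'); legend('phase mismatch', 'no mismatch');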
Problem 6.30 (Simplex signaling set) Let s0 (t), ..., sM −1 (t) denote a set of equal energy,
orthogonal signals. Construct a new M-ary signal set from these by subtracting out the average
of the M signals from each signal:

uk (t) = sk (t) − (1/M) ∑_{j=0}^{M−1} sj (t) ,   k = 0, 1, ..., M − 1
Problem 6.31 (Soft decisions for BPSK) Consider a BPSK system in which 0 and 1 are
equally likely to be sent, with 0 mapped to +1 and 1 to -1 as usual. Thus, the decision statistic
Y = A + N if 0 is sent, and Y = −A + N if 1 is sent, where A > 0 and N ∼ N(0, σ 2 ).
(a) Show that the LLR is conditionally Gaussian given the transmitted bit, and that the condi-
tional distribution is scale-invariant, depending only on Eb /N0 .
(b) If the BER for hard decisions is 10%, specify the conditional distribution of the LLR, given
that 0 is sent.
Problem 6.32 (Soft decisions for PAM) Consider soft decisions for 4PAM signaling as in
Example 6.1.3. Assume that the signals have been scaled to ±1, ±3 (i.e., set A = 1 in Example
6.1.3). The system is operating at Eb /N0 of 6 dB. Bits b1 , b2 ∈ {0, 1} are mapped to the symbols
using Gray coding. Assume that (b1 , b2 ) = (0, 0) for symbol -3, and (1, 0) for symbol +3.
(a) Sketch the constellation, along with the bit maps. Indicate the ML hard decision boundaries.
(b) Find the posterior symbol probability P [−3|y] as a function of the noisy observation y. Plot
it as a function of y.
Hint: The noise variance σ 2 can be inferred from the signal levels and SNR.
(c) Find P [b1 = 1|y] and P [b2 = 1|y], and plot as a function of y.
Remark: The posterior probability of b1 = 1 equals the sum of the posterior probabilities of all
symbols which have b1 = 1 in their labels.
(d) Display the results of part (c) in terms of LLRs.
LLR1 (y) = log( P [b1 = 0|y] / P [b1 = 1|y] ) ,   LLR2 (y) = log( P [b2 = 0|y] / P [b2 = 1|y] )
Plot the LLRs as a function of y, saturating the values at ±50.
(e) Try other values of Eb /N0 (e.g., 0 dB, 10 dB). Comment on any trends you notice. How do
the LLRs vary as a function of distance from the noiseless signal points? How do they vary as
you change Eb /N0 ?
(f) In order to characterize the conditional distribution of the LLRs, simulate the system over
multiple symbols at Eb /N0 such that the BER is about 5%. Plot the histograms of the LLRs
for each of the two bits, and comment on whether they look Gaussian. What happens as you
increase or decrease Eb /N0 ?
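A minimal MATLAB sketch for parts (b)-(d), assuming the Gray map −3 ↔ 00, −1 ↔ 01, +1 ↔ 11, +3 ↔ 10 implied by the problem statement:

s = [-3 -1 1 3];                        % symbol levels (A = 1)
b1 = [0 0 1 1];  b2 = [0 1 1 0];        % bit labels per symbol (assumed Gray map)
EbN0 = 10^(6/10);                       % 6 dB operating point
sigma2 = (mean(s.^2)/2) / (2*EbN0);     % from Eb = Es/2 and N0 = 2*sigma^2
y = linspace(-6, 6, 601);
lik = exp(-(y - s').^2 / (2*sigma2));   % 4 x 601 array of likelihoods
post = lik ./ sum(lik, 1);              % posteriors P[s|y] (equal priors)
LLR1 = log((1-b1)*post) - log(b1*post); % log P[b1=0|y] - log P[b1=1|y]
LLR2 = log((1-b2)*post) - log(b2*post);
LLR1 = max(min(LLR1,50),-50);  LLR2 = max(min(LLR2,50),-50);  % saturate at +/-50
plot(y, LLR1, y, LLR2); grid on; xlabel('y'); ylabel('LLR');

Here post(1,:) is the posterior P [−3|y] asked for in part (b).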
Hint: Use L’Hospital’s rule on the log of the expression whose limit is to be evaluated.
(c) Substitute (b) into the integral in (a) to infer the desired result.
Problem 6.34 (Effect of Rayleigh fading) Constructive and destructive interference between
multiple paths in wireless systems lead to large fluctuations in received amplitude, modeled as a
Rayleigh random variable A (see Problem 5.21 for a definition). The energy per bit is therefore
proportional to A2 , which, using Problem 5.21(c), is an exponential random variable. Thus,
we can model Eb /N0 as an exponential random variable with mean Ēb /N0 , where Ēb is the
average energy per bit. Simplify notation by setting Eb /N0 = X, and the mean Ēb /N0 = 1/µ, so that
X ∼ Exp(µ).
(a) Show that the average error probability for BPSK with Rayleigh fading can be written as
Pe = ∫₀^∞ Q(√(2x)) µe^{−µx} dx

Hint: The error probability for BPSK is given by Q(√(2Eb /N0 )), where Eb /N0 is a random variable.
We now find the expected error probability by averaging over the distribution of Eb /N0 .
(b) Integrating by parts and simplifying, show that the average error probability can be written
as

Pe = (1/2) [1 − (1 + µ)^{−1/2}] = (1/2) [1 − (1 + N0 /Ēb )^{−1/2}]
Hint: Q(x) is defined via an integral, so we can find its derivative (when integrating by parts)
using the fundamental theorem of calculus.
(c) Using the approximation that (1 + a)b ≈ 1 + ba for |a| small, show that
Pe ≈ 1/[4(Ēb /N0 )]
at high SNR. Comment on how this decay of error probability with the reciprocal of SNR
compares with the decay for the AWGN channel.
(d) Plot the error probability versus Ēb /N0 for BPSK over the AWGN and Rayleigh fading channels
(BER on log scale, Ēb /N0 in dB). Note that Ēb = Eb for the AWGN channel. At a BER of 10⁻³, what
is the degradation in dB due to Rayleigh fading?
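The closed form in part (b) can be sanity-checked against direct numerical integration of the expression in part (a); a minimal MATLAB sketch:

Q = @(x) 0.5*erfc(x/sqrt(2));
EbN0dB = 0:5:30;  mu = 1./(10.^(EbN0dB/10));  % mu = 1/(average Eb/N0)
Pe_closed = 0.5*(1 - (1 + mu).^(-1/2));       % part (b) closed form
Pe_num = zeros(size(mu));
for k = 1:length(mu)
    Pe_num(k) = integral(@(x) Q(sqrt(2*x)).*mu(k).*exp(-mu(k)*x), 0, Inf);
end
disp([Pe_closed(:) Pe_num(:)]);               % the two columns should agree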
Problem 6.37 Consider a line-of-sight communication link operating in the 60 GHz band (where
large amounts of unlicensed bandwidth have been set aside by regulators). From version 1 of
the Friis formula (6.83), we see that the received power scales as λ2 , and hence as the inverse
square of the carrier frequency, so that 60 GHz links have much worse propagation than, say, 5
GHz links when antenna gains are fixed. However, from (6.82), we see that we can get much
better antenna gains at small carrier wavelengths for a fixed form factor, and version 2 of the
Friis formula (6.84) shows that the received power scales as 1/λ2 , which improves with increasing
carrier frequency. Furthermore, electronically steerable antenna arrays with high gains can be
implemented with compact form factor (e.g., patterns of metal on circuit board) at higher carrier
frequencies such as 60 GHz. Suppose, now, that we wish to design a 2 Gbps link using QPSK
with an excess bandwidth of 50%. The receiver noise figure is 8 dB, and the desired link margin
is 10 dB.
(a) What is the transmit power in dBm required to attain a range of 10 meters (e.g., for in-room
communication), assuming that the transmit and receive antenna gains are each 10 dBi?
(b) For a transmit power of 20 dBm, what are the antenna gains required at the transmitter and
receiver (assume that the gains at both ends are equal) to attain a range of 200 meters (e.g., for
an outdoor last-hop link)?
(c) For the antenna gains found in (b), what happens to the attainable range if you account for
additional path loss due to oxygen absorption (typical in the 60 GHz band) of 16 dB/km?
(d) In (c), what happens to the attainable range if there is a further path loss of 30 dB/km due
to heavy rain (on top of the loss due to oxygen absorption)?
dB, and the transmit and receive antenna gains are 10 dBi each. This is the baseline scenario
against which each of the scenarios in (a)-(c) is to be compared.
(a) Suppose that you change the carrier frequency to 5 GHz, keeping all other link parameters
the same. What is the new range?
(b) Suppose that you change the carrier frequency to 5 GHz and increase the transmit and receive
antenna gains by 3 dBi each, keeping all other link parameters the same. What is the new range?
(c) Suppose you change the carrier frequency to 5 GHz, increase the transmit and receive antenna
directivities by 3 dBi each, and increase the data rate to 40 Mbps, still using 16QAM with excess
bandwidth of 25%. All other link parameters are the same. What is the new range?
Laboratory Assignment
3) BPSK symbol generation: Use part 1 to generate 12000 0/1 bits. Map these to BPSK (±1)
bits using bpskmap. Pass these through the transmit and receive filter in lab 1 to get noiseless
received samples at rate 4/T , as before.
4) Adding noise: We consider discrete time additive white Gaussian noise (AWGN). At the input
to the receive filter, add independent and identically distributed (iid) complex Gaussian noise,
such that the real and imaginary parts of each sample are iid N(0, σ²) (you will choose σ² = N0 /2
corresponding to a specified value of Eb /N0 , as described in part 5). Pass these (rate 4/T ) noise
samples through the receive filter, and add the result to the output of part 3.
Remark: If the nth transmitted symbol is b[n], the average received energy per symbol is
Es = E[|b[n]|²] ||gT ∗ gC ||². Divide that by the number of bits per symbol to get Eb . The
noise variance per dimension is σ² = N0 /2. This enables you to compute Eb /N0 for your simula-
tion model. The signal-to-noise ratio Eb /N0 is usually expressed in decibels (dB): Eb /N0 (dB) =
10 log₁₀ Eb /N0 (raw). Thus, if you fix the transmit and channel filter coefficients, then you can
simulate any given value of Eb /N0 in dB by varying the value of the noise variance σ².
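One way to organize this calibration is sketched below; b, gT, gC, and txsamples are placeholder names for the symbol vector, transmit filter, channel filter, and noiseless rate-4/T samples from the earlier parts:

EbN0dB = 7;                                   % example operating point in dB
Es = mean(abs(b).^2) * norm(conv(gT, gC))^2;  % energy per symbol, per the remark
Eb = Es/1;                                    % BPSK: 1 bit per symbol
sigma2 = Eb / (2*10^(EbN0dB/10));             % sigma^2 = N0/2 for the target Eb/N0
noise = sqrt(sigma2)*(randn(size(txsamples)) + 1j*randn(size(txsamples)));
rxin = txsamples + noise;                     % noise enters at the receive filter input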
5) Plot the ideal bit error probability for BPSK, which is given by Q(√(2Eb /N0 )), on a log scale
as a function of Eb /N0 in dB over the range 0-10 dB. Find the value of Eb /N0 that corresponds
to an error probability of 10⁻².
6) For the value of Eb /N0 found in part 5, choose the corresponding value of σ² in part 4. Find
the decision statistics corresponding to the transmitted symbols at the input and output of the
receive filter, as in lab 1 (parts 5 and 6). Plot the imaginary versus the real parts of the decision
statistics; you should see a noisy version of the constellation.
7) Using an appropriate decision rule, make decisions on the 12000 transmitted bits based on
the 12000 decision statistics, and measure the error probability obtained at the input and the
output. Compare the results with the ideal error probability from part 5. You should find that
the error probability based on the receiver input samples is significantly worse than that based
on the receiver output, and that the latter is a little worse than the ideal performance because
of the ISI in the decision statistics.
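A minimal sketch of part 7's decision and error count, with zRx and bits as placeholder names for the 12000 complex decision statistics and the transmitted 0/1 bits:

bitsHat = (real(zRx) < 0);           % ML rule for BPSK: 0 -> +1, 1 -> -1
BER = mean(bitsHat(:) ~= bits(:));   % measured error probability
fprintf('Measured BER = %.4f\n', BER);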
8) Now, map 12000 0/1 bits into 6000 4PAM symbols using function fourpammap (use as input
2 parallel vectors of 6000 bits). As shown in Chapter 6, a good approximation (the nearest
neighbors approximation) to the ideal bit error probability for Gray coded 4PAM is given by
Q(√(4Eb /(5N0 ))). As in part 5, plot this on a log scale as a function of Eb /N0 in dB over the range
0-10 dB. What is the value of Eb /N0 (dB) corresponding to a bit error probability of 10⁻²?
9) Choose the value of the noise variance σ² corresponding to the Eb /N0 found in part 8. Now,
find decision statistics for the 6000 transmitted symbols based on the receive filter output only.
(a) Plot the imaginary versus the real parts of the decision statistics, as before.
(b) Determine an appropriate decision rule for estimating the two parallel bit streams of 6000
bits from the 6000 complex decision statistics.
(c) Measure the bit error probability, and compare it with the ideal bit error probability.
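A sketch of a nearest-level slicer for part (b), assuming the decision statistics have been normalized so that the noiseless levels sit at ±1, ±3, and assuming the Gray map 00 → −3, 01 → −1, 11 → +1, 10 → +3 (check this against the convention used by fourpammap):

z = real(zRx);                              % 4PAM rides on the I component here
levels = [-3 -1 1 3];
[~, idx] = min(abs(z(:) - levels), [], 2);  % nearest-level (ML) decisions
map = [0 0; 0 1; 1 1; 1 0];                 % assumed Gray labels for the four levels
bitsHat = map(idx, :);                      % 6000 x 2 array of decided bit pairs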
10) Repeat parts 8 and 9 for QPSK, the ideal bit error probability for which, as a function of
Eb /N0, is the same as for BPSK.
11) Repeat parts 8 and 9 for 16QAM (4 bit streams of length 3000 each), the ideal bit error
probability for which, as a function of Eb /N0 , is the same as for 4PAM.
12) Repeat parts 8 and 9 for 8PSK (3 bit streams of length 4000 each). The ideal bit error
probability for Gray coded 8PSK is approximated by (using the nearest neighbors approximation)
Q(√((6 − 3√2)Eb /(2N0 ))).
13) Since all your answers above will be off from the ideal answers because of some ISI, run a
simulation with 12000 bits sent using Gray-coded 16-QAM with no ISI. To do this, generate the
decision statistics by adding noise directly to the transmitted symbols, setting the noise variance
appropriately to operate at the required Eb /N0 . Do this for two different values of Eb /N0 , the one
in part 11 and a value 3 dB higher. In each case, compare the nearest neighbors approximation
to the measured bit error probability, and plot the imaginary versus real part of the decision
statistics.
Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.
Tips: Vectorize as many of the functions as possible, including both the bit-to-symbol maps and
the decision rules. Do BPSK and 4-PAM first, where you will only use the real part of the complex
decision statistics. Leverage this for QPSK and 16-QAM, by replicating what you did for the
imaginary part of the decision statistics as well. To avoid confusion, keep different matlab files
for simulations regarding different signal constellations, and keep the analytical computations
and plots separate from the simulations.
Laboratory Assignment
Let us consider the following simple model of a wireless channel (obtained after filtering and
sampling at the symbol rate, and assuming that there is no ISI). If {b[n]} is the transmitted
symbol sequence, then the complex-valued received sequence is given by

y[n] = h[n] b[n] + w[n]   (6.93)

where {w[n] = wc [n] + jws [n]} is an iid complex Gaussian noise sequence with wc [n], ws [n] i.i.d.
N(0, σ² = N0 /2) random variables. We say that w[n] has variance σ² per dimension. The channel
sequence {h[n]} is a time-varying sequence of complex gains.
Equation (6.93) models the channel at a given time as a simple scalar gain h[n]. On the other
hand, as discussed in Example 2.5.6, a multipath wireless channel cannot be modeled as a simple
scalar gain: it is dispersive in time, and exhibits frequency selectivity. However, it is shown in
Chapter 8 that we can decompose complicated dispersive channels into scalar models by using
frequency-domain modulation, or OFDM, which transmits data in parallel over narrow enough
frequency slices such that the channel over each slice can be modeled as a complex scalar.
Equation (6.93) could therefore be interpreted as modeling time variations in such scalar gains.
Rayleigh fading: The channel gain sequence {h[n] = hc [n] + jhs [n]}, where {hc [n]} and {hs [n]}
are zero mean, independent and identically distributed colored Gaussian random processes. The
reason this is called Rayleigh fading is that |h[n]| = √(hc²[n] + hs²[n]) is a Rayleigh random variable.
Remark: The Gaussianity arises because the overall channel gain results from a superposition of
gains from multiple reflections off scatterers.
Simulation of Rayleigh fading: We will use a simple model wherein the colored channel gain
sequence {h[n]} is obtained by passing white Gaussian noise through a first-order recursive filter,
as follows:
hc [n] = ρ hc [n − 1] + u[n]
hs [n] = ρ hs [n − 1] + v[n]   (6.94)
where {u[n]} and {v[n]} are independent real-valued white Gaussian sequences, with i.i.d. N(0, β 2 )
elements. The parameter ρ (0 < ρ < 1) determines how rapidly the channel varies. The model for
I and Q gains in (6.94) are examples of first-order autoregressive (AR(1)) random processes: au-
toregressive because future values depend on the past in a linear fashion, and first order because
only the immediately preceding value affects the current one.
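A minimal sketch of this generator, using MATLAB's filter to implement the recursion (6.94):

rho = 0.99;  beta = 0.01;  N = 10000;         % programmable parameters
u = beta*randn(1, N);  v = beta*randn(1, N);  % i.i.d. N(0, beta^2) driving noise
hc = filter(1, [1 -rho], u);                  % hc[n] = rho*hc[n-1] + u[n]
hs = filter(1, [1 -rho], v);                  % hs[n] = rho*hs[n-1] + v[n]
h = hc + 1j*hs;                               % complex Rayleigh fading gains

Since filter starts from zero initial conditions, either discard an initial transient or initialize the recursion at steady state, as suggested in part 2 below.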
Setting up the fading simulator
1) Set up the AR(1) Rayleigh fading model in matlab, with ρ and β 2 as programmable parameters.
2) Calculate E[|h[n]|²] = 2E[hc²[n]] = 2v² analytically as a function of ρ and β². Use simulation
to verify your results, setting ρ = .99 and β = .01. You may choose to initialize hc [0] and hs [0]
as iid N(0, v²) in your simulation. Use at least 10,000 samples.
3) Plot the instantaneous channel power relative to the average channel power, |h[n]|²/(2v²), in dB as
a function of n. Thus, 0 dB corresponds to the average value of 2v 2 . You will occasionally see
sharp dips in the power, which are termed deep fades.
4) Define the channel phase θ[n] = angle(h[n]) = tan⁻¹(hs [n]/hc [n]). Plot θ[n] versus n. Compare with
the plot in part 3; you should see sharp phase changes corresponding to deep fades.
QPSK in Rayleigh fading
Now, implement the model (6.93), where {b[n]} correspond to Gray coded QPSK, using an AR(1)
simulation of Rayleigh fading as in (a). Assume that the receiver has perfect knowledge of the
channel gains {h[n]}, and employs the decision statistic Z[n] = h∗ [n]y[n].
Remark: In practice, the channel estimation required for implementing this is achieved by insert-
ing pilot symbols periodically into the data stream. The performance will, of course, be worse
than with the ideal channel estimates considered here.
5) Do scatter plots of the two-dimensional received symbols {y[n]}, and of the decision statistics
{Z[n]}. What does multiplying by h∗ [n] achieve?
6) Implement a decision rule for the bits encoded in the QPSK symbols based on the statistics
{Z[n]}. Estimate by simulation, and plot, the bit error probability (log scale) as a function of
the average Eb /N0 (dB), where Eb /N0 ranges from 0 to 30 dB. Use at least 10,000 symbols for
your estimate. On the same plot, also plot the analytical bit error probability as a function of
Eb /N0 when there is no fading. You should see a marked degradation due to fading. How do
you think the error probability in fading varies with Eb /N0 ?
Relating simulation parameters to Eb /N0 : The average symbol energy is Es = E[|b[n]|²] E[|h[n]|²],
and Eb = Es / log₂ M. This is a function of the constellation scaling and the parameters β² and ρ in
the fading simulator (see (b)). You can therefore fix Es , and hence Eb , by fixing β, ρ (e.g., as in
part 2), and fixing the scaling of the {b[n]} (e.g., keep the constellation points as ±1 ± j). Eb /N0
can now be varied by varying the variance σ² of the noise in (6.93).
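Putting the pieces together, here is a sketch of the core loop for parts 5 and 6; h is the fading sequence from the AR(1) generator, sigma2 the per-dimension noise variance, and bitsI, bitsQ placeholder names for the two transmitted bit streams (match the sign convention to your own Gray map):

b = (1 - 2*bitsI) + 1j*(1 - 2*bitsQ);   % Gray-coded QPSK: bit 0 -> +1, bit 1 -> -1
w = sqrt(sigma2)*(randn(size(b)) + 1j*randn(size(b)));
y = h.*b + w;                           % the model (6.93)
Z = conj(h).*y;                         % decision statistic Z[n] = h*[n] y[n]
bitsIhat = (real(Z) < 0);               % I bit decisions
bitsQhat = (imag(Z) < 0);               % Q bit decisions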
Diversity
The severe degradation due to Rayleigh fading can be mitigated by using diversity: the proba-
bility that two paths are simultaneously in a deep fade is less likely than the probability that a
single path is in a deep fade. Consider a receive antenna diversity system, where the received
signals y1 and y2 at the two antennas are given by

y1 [n] = h1 [n] b[n] + w1 [n]
y2 [n] = h2 [n] b[n] + w2 [n]   (6.95)

Thus, you get two looks at the data stream, through two different channels.
Implement the two-fold diversity system in (6.95) as you implemented (6.93), keeping the fol-
lowing in mind:
• The noises w1 and w2 are independent white noise sequences with variance σ 2 = N20 per di-
mension as before.
• The channels h1 and h2 are generated by passing independent white noise streams through a
first-order recursive filter. In relating the simulation parameters to Eb /N0 , keep in mind that the
average symbol energy now is Es = E[|b[n]|2 ]E[|h1 [n]|2 + |h2 [n]|2 ].
• Use the following maximal ratio combining rule to obtain the decision statistic:

Z2 [n] = h1∗ [n] y1 [n] + h2∗ [n] y2 [n]

The decision statistic above can be written as

Z2 [n] = (|h1 [n]|² + |h2 [n]|²) b[n] + w̃[n]
where w̃[n] is zero mean complex Gaussian with variance σ 2 (|h1 [n]|2 + |h2 [n]|2 ) per dimension.
Thus, the instantaneous SNR is given by

SNR[n] = E[ |(|h1 [n]|² + |h2 [n]|²) b[n]|² ] / E[ |w̃[n]|² ] = (|h1 [n]|² + |h2 [n]|²) E[|b[n]|²] / (2σ²)
7) Plot |h1 [n]|2 + |h2 [n]|2 in dB as a function of n, with 0 dB representing the average value as
before. You should find that the fluctuations around the average are less than in part 3.
8) Implement a decision rule for the bits encoded in the QPSK symbols based on the statistics
{Z2 [n]}. Estimate by simulation, and plot (on the same plot as in part 5), the bit error proba-
bility (log scale) as a function of the average Eb /N0 (dB), where Eb /N0 ranges from 0 to 30 dB.
Use at least 10,000 symbols for your estimate. You should see an improvement compared to the
situation with no diversity.
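A sketch of the combiner, with h1, h2, w1, w2 generated independently as described in the bullets above and b as before:

y1 = h1.*b + w1;  y2 = h2.*b + w2;                      % two-antenna model (6.95)
Z2 = conj(h1).*y1 + conj(h2).*y2;                       % maximal ratio combining
bitsIhat = (real(Z2) < 0);  bitsQhat = (imag(Z2) < 0);  % QPSK decisions as before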
Lab Report: Your lab report should document the results of the preceding steps in order.
Describe the reasoning you used and the difficulties you encountered.
Bonus: A Glimpse of differential modulation and demodulation
Throughout this chapter, we have assumed that a noiseless “template” for the set of possible
transmitted signals is available at the receiver. In the present context, it means assuming that
estimates for the time-varying fading channel are available. But what if these estimates, which
we used to generate the decision statistics earlier in this lab, are not available? One approach that
avoids the need for explicit channel estimation is based on exploiting the fact that the channel
does not change much from symbol to symbol. Let us illustrate this for the case of QPSK. The
model is exactly as in (6.93) or (6.95), but the channel sequence(s) is(are) unknown a priori. This
necessitates encoding the data in a different way. Specifically, let d[n] be a Gray coded QPSK
information sequence, which contains information about the bits of interest. Instead of sending
d[n] directly, we generate the transmitted sequence b[n] by differential encoding as follows:

b[n] = d[n] b[n − 1]
(You can initialize b(0) as any element of the constellation, known by agreement to both trans-
mitter and receiver. Or, just ignore the first information symbol in your demodulation). At
the receiver, use differential demodulation to generate the decision statistic for the information
symbol d[n] as follows:
Znc [n] = y[n] y ∗ [n − 1]   (single path)
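A sketch of the encoder and demodulator, assuming the information symbols d[n] are unit-magnitude phase increments in {1, j, −1, −j} carrying the Gray-coded bit pairs:

b = zeros(1, length(d)+1);
b(1) = (1+1j)/sqrt(2);                 % agreed-upon initial symbol
for n = 1:length(d)
    b(n+1) = d(n)*b(n);                % differential encoding: b[n] = d[n] b[n-1]
end
% transmit b over the fading channel to obtain y, then at the receiver:
Znc = y(2:end).*conj(y(1:end-1));      % Znc[n] = y[n] y*[n-1], single path

The phase of Znc[n] is approximately that of d[n] when the channel varies slowly, so the information can be recovered without any channel estimate.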
6.A Irrelevance of component orthogonal to signal space
Conditioning on Hi , we have y(t) = si (t)+n(t). The component of the received signal orthogonal
to the signal space is given by
y⊥ (t) = y(t) − yS (t) = y(t) − ∑_{k=0}^{n−1} Y [k]ψk (t) = si (t) + n(t) − ∑_{k=0}^{n−1} (si [k] + N[k]) ψk (t)

Since si (t) = ∑_{k=0}^{n−1} si [k]ψk (t) lies entirely in the signal space, the signal terms cancel, leaving
y⊥ (t) = n(t) − ∑_{k=0}^{n−1} N[k]ψk (t) = n⊥ (t).
What we have just shown is that the component of the received signal orthogonal to the signal
space contains the noise component n⊥ only, and thus does not depend on which signal is sent
under a given hypothesis. Since n⊥ is independent of N, the noise vector in the signal space,
knowing n⊥ does not provide any information about N. These two observations imply that y ⊥
is irrelevant for our hypothesis problem. The preceding discussion is illustrated in Figure 6.9,
and enables us to reduce our infinite-dimensional problem to a finite-dimensional vector model
restricted to the signal space.
Note that our irrelevance argument depends crucially on the property of WGN that its projec-
tions along orthogonal directions are independent. Even though y ⊥ does not contain any signal
component (since these by definition fall into the signal space), if n⊥ and N exhibited statis-
tical dependence, one could hope to learn something about N from n⊥ , and thereby improve
performance compared to a system in which y ⊥ is thrown away. However, since n⊥ and N are
independent for WGN, we can restrict attention to the signal space for our hypothesis testing
problem.